  • A CNN-LSTM with a ResNet backend for video classification. Getting started: prerequisites are PyTorch (0.4 or later), FFmpeg/FFprobe and Python 3. To try your own dataset: mkdir data, mkdir data/video_data, then put your video dataset into data/video_...
  • A PyTorch implementation of conditional random fields (CRF): vectorized computation of the CRF loss and vectorized Viterbi decoding. Usage: training data should be formatted as token/tag token/tag token/tag ... (one sentence per line). See each ...
  • A pytorch + lstm part-of-speech (POS) tagging example, shared as a reference.
  • A Tree-LSTM implementation as described in the paper by Kai Sheng Tai, Richard Socher and Christopher Manning. On the semantic-similarity task with the SICK dataset this implementation reaches Pearson: 0.8492 and MSE: 0.2842 with hyperparameters --lr 0.010 --wd 0...
  • While writing an algorithm recently I found that the BiLSTM + Attention implementations online vary wildly and many are wrong, so I wrote my own version on top of PyTorch, mainly using two tricks: LSTM over variable-length sequences and masked softmax. The code is below ...

    While writing an algorithm recently I found that the BiLSTM-plus-Attention implementations floating around online vary wildly and many of them are simply wrong, so I implemented my own version on top of PyTorch. It mainly relies on two tricks: running the LSTM over variable-length sequences and a masked softmax. The code is as follows:

    1、attention_utils.py

    from typing import Dict, Optional
    
    import numpy as np
    import torch
    import torch.nn.functional as F
    from torch import Tensor
    
    
    def create_src_lengths_mask(
        batch_size: int, src_lengths: Tensor, max_src_len: Optional[int] = None
    ):
        """
        Generate boolean mask to prevent attention beyond the end of source
        Inputs:
          batch_size : int
          src_lengths : [batch_size] of sentence lengths
          max_src_len: Optionally override max_src_len for the mask
        Outputs:
          [batch_size, max_src_len]
        """
        if max_src_len is None:
            max_src_len = int(src_lengths.max())
        src_indices = torch.arange(0, max_src_len).unsqueeze(0).type_as(src_lengths)
        src_indices = src_indices.expand(batch_size, max_src_len)
        src_lengths = src_lengths.unsqueeze(dim=1).expand(batch_size, max_src_len)
        # returns [batch_size, max_seq_len]
        return (src_indices < src_lengths).int().detach()
    
    
    def masked_softmax(scores, src_lengths, src_length_masking=True):
        """Apply source length masking then softmax.
        Input and output have shape bsz x src_len"""
        if src_length_masking:
            bsz, max_src_len = scores.size()
            # compute masks
            src_mask = create_src_lengths_mask(bsz, src_lengths)
            # Fill pad positions with -inf
            scores = scores.masked_fill(src_mask == 0, -np.inf)
    
        # Cast to float and then back again to prevent loss explosion under fp16.
        return F.softmax(scores.float(), dim=-1).type_as(scores)
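
    A quick sanity check of masked_softmax (a sketch I am adding for illustration; it only assumes the two functions above are in scope): feed two toy sequences padded to the same length and verify that the padded position gets zero probability.

    # Illustrative check: batch of 2 score rows, real lengths 3 and 2 (last position of row 1 is padding)
    scores = torch.tensor([[1.0, 2.0, 3.0],
                           [1.0, 2.0, 0.0]])
    src_lengths = torch.tensor([3, 2])
    print(masked_softmax(scores, src_lengths))
    # Row 0 is an ordinary softmax over 3 positions; row 1 puts 0.0 on the padded position.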

    2、layers.py

    import torch
    from torch import nn
    import torch.nn.functional as F
    
    from utils.attention_utils import masked_softmax
    
    
    # s(x, q) = v.T * tanh (W * x + b)
    class MLPAttentionNetwork(nn.Module):
    
        def __init__(self, hidden_dim, attention_dim, src_length_masking=True):
            super(MLPAttentionNetwork, self).__init__()
    
            self.hidden_dim = hidden_dim
            self.attention_dim = attention_dim
            self.src_length_masking = src_length_masking
    
            # W * x + b
            self.proj_w = nn.Linear(self.hidden_dim, self.attention_dim, bias=True)
            # v.T
            self.proj_v = nn.Linear(self.attention_dim, 1, bias=False)
    
        def forward(self, x, x_lengths):
            """
            :param x: seq_len * batch_size * hidden_dim
            :param x_lengths: batch_size
            :return: batch_size * seq_len, batch_size * hidden_dim
            """
            seq_len, batch_size, _ = x.size()
            # (seq_len * batch_size, hidden_dim)
            # flat_inputs = x.view(-1, self.hidden_dim)
            flat_inputs = x.reshape(-1, self.hidden_dim)
            # (seq_len * batch_size, attention_dim)
            mlp_x = self.proj_w(flat_inputs)
            # (batch_size, seq_len)
            att_scores = self.proj_v(mlp_x).view(seq_len, batch_size).t()
            # (seq_len, batch_size)
            normalized_masked_att_scores = masked_softmax(
                att_scores, x_lengths, self.src_length_masking
            ).t()
            # (batch_size, hidden_dim)
            attn_x = (x * normalized_masked_att_scores.unsqueeze(2)).sum(0)
    
            return normalized_masked_att_scores.t(), attn_x
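
    As a quick shape check (again something I am adding, not part of the original post): with seq_len = 5, batch_size = 4 and hidden_dim = 6, the layer should return attention weights of shape (4, 5) and a context vector of shape (4, 6).

    # Illustrative shape check for MLPAttentionNetwork
    attn = MLPAttentionNetwork(hidden_dim=6, attention_dim=3)
    x = torch.randn(5, 4, 6)                 # (seq_len, batch_size, hidden_dim)
    x_lengths = torch.tensor([5, 3, 4, 2])   # real length of each of the 4 sequences
    scores, context = attn(x, x_lengths)
    print(scores.shape)    # torch.Size([4, 5])
    print(context.shape)   # torch.Size([4, 6])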

    3、model.py

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
    
    from layers import MLPAttentionNetwork
    
    
    class BiLSTMAttentionNetwork(nn.Module):
    
        def __init__(self, num_vocab, embedding_dim, max_len, hidden_dim, num_layers, bidirectional, attention_dim, num_classes):
    
            super(BiLSTMAttentionNetwork, self).__init__()
    
            # Vocabulary size: number of real words + 1 padding placeholder
            self.num_vocab = num_vocab
            # Maximum sequence length
            self.max_len = max_len
            # Word embedding dimension
            self.embedding_dim = embedding_dim
            # Hidden dimension of the LSTM
            self.hidden_dim = hidden_dim
            # Number of recurrent layers
            self.num_layers = num_layers
            # Whether the RNN is bidirectional (bool)
            self.bidirectional = bidirectional
            # Dimension of the attention layer
            self.attention_dim = attention_dim
            # Number of labels
            self.num_classes = num_classes
            # Embedding layer
            self.embedding_layer = nn.Embedding(self.num_vocab, self.embedding_dim, padding_idx=0)
            # RNN layer
            self.bilstm_layer = nn.LSTM(self.embedding_dim, self.hidden_dim, self.num_layers, bidirectional=self.bidirectional,
                                  batch_first=True)
            # MLP attention layer
            self.mlp_attention_layer = MLPAttentionNetwork(2 * self.hidden_dim, self.attention_dim)
            # Fully connected layer
            self.fc_layer = nn.Linear(2 * self.hidden_dim, self.num_classes)
            # Single-layer softmax classifier
            self.softmax_layer = nn.Softmax(dim=1)
    
        def forward(self, x, lengths):
            """
            :param x: 填充好的序列
            :param lengths:
            :return:
            """
    
            # x: t.tensor([[1,2,3],[6,0,0],[4,5,0], [3, 7, 1]])
            # lengths: t.tensor([3, 1, 2, 3])、序列的实际长度
    
            x_input = self.embedding_layer(x)
            # print(x_input)
            x_packed_input = pack_padded_sequence(input=x_input, lengths=lengths, batch_first=True, enforce_sorted=False)
            # print(x_packed_input)
            packed_out, _ = self.bilstm_layer(x_packed_input)
            # print(packed_out)
            outputs, _ = pad_packed_sequence(packed_out, batch_first=True, total_length=self.max_len, padding_value=0.0)
            # print(out)
            atten_scores, atten_out = self.mlp_attention_layer(outputs.permute(1, 0, 2), lengths)
            # print(atten_out)
            # (batch_size, num_classes)
            logits = self.softmax_layer(self.fc_layer(atten_out))
            return atten_scores, logits

    测试代码如下:

    if __name__ == '__main__':
        # num_vocab, embedding_dim, max_len, hidden_dim, num_layers, bidirectional, attention_dim, num_classes
        b_a = BiLSTMAttentionNetwork(20, 3, 3, 2, 1, bidirectional=True, attention_dim=5, num_classes=10)
        x = torch.tensor([[1, 2, 3], [6, 0, 0], [4, 5, 0], [3, 7, 1]])
        lengths = torch.tensor([3, 1, 2, 3])
        atten_scores, logits = b_a(x, lengths)
        print('---------------------> attention distribution')
        print(atten_scores)
        print('---------------------> predicted probabilities')
        print(logits)

    The output looks like this:

    ---------------------> attention distribution
    tensor([[0.3346, 0.3320, 0.3335],
            [1.0000, 0.0000, 0.0000],
            [0.4906, 0.5094, 0.0000],
            [0.3380, 0.3289, 0.3330]], grad_fn=<TBackward>)
    ---------------------> predicted probabilities
    tensor([[0.0636, 0.0723, 0.1241, 0.0663, 0.0671, 0.0912, 0.1244, 0.1446, 0.1223,
             0.1240],
            [0.0591, 0.0745, 0.1264, 0.0650, 0.0657, 0.0853, 0.1273, 0.1478, 0.1186,
             0.1303],
            [0.0634, 0.0727, 0.1178, 0.0678, 0.0688, 0.0925, 0.1203, 0.1492, 0.1228,
             0.1249],
            [0.0615, 0.0739, 0.1253, 0.0633, 0.0675, 0.0872, 0.1226, 0.1490, 0.1224,
             0.1274]], grad_fn=<SoftmaxBackward>)

     

  • Wind-speed forecasting with LSTM in PyTorch
  • model.py: #!/usr/bin/python # -*- coding: utf-8 -*- import torch from torch import nn import numpy as np from torch.autograd import Variable import torch.nn.functional as F class TextRNN(nn.Module):...
  • This repository implements an LSTM-CRF model for named entity recognition. The model is the same as that of ..., except that there is no final tanh layer after the BiLSTM. It reaches SOTA performance on both the CoNLL-2003 and OntoNotes 5.0 English datasets (please check the results using GloVe and ELMo ...)
  • A from-scratch PyTorch implementation of LSTM


    LSTM

    Exploring the matrix-multiplication operator @

    No matter how many dimensions the two input tensors have, @ only performs matrix multiplication over the last two dimensions and broadcasts over the leading (batch) dimensions.

    import torch
    x = torch.randn(5,4,3)
    y = torch.randn(5,3,8)
    z = x@y
    
    z.shape
    
    torch.Size([5, 4, 8])
    
    y = torch.randn(3,8)
    z = x@y
    
    z.shape
    
    torch.Size([5, 4, 8])
    
    x = torch.randn(4,3)
    y = torch.randn(3,8)
    z = x@y
    
    z.shape
    
    torch.Size([4, 8])
    
    x = torch.randn(2,5,4,3)
    y = torch.randn(2,5,3,8)
    z = x@y
    
    z.shape
    
    torch.Size([2, 5, 4, 8])
    

    A hand-written LSTM version

    Comparing the volatile and requires_grad attributes (volatile comes from pre-0.4 PyTorch): when you are certain you will never call .backward(), volatile is more convenient than requires_grad. As long as volatile is True, requires_grad is automatically False. Moreover, if any single input of a computation has volatile=True, its result is volatile as well; by contrast, the result only ends up with requires_grad=False when every input has requires_grad=False.
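
    In PyTorch 0.4 and later the volatile flag described above no longer exists; the modern equivalent of "I will never call .backward()" is the torch.no_grad() context. A minimal sketch of that replacement, added here for reference:

    import torch

    x = torch.randn(3, requires_grad=True)
    with torch.no_grad():           # plays the role the old volatile=True flag used to play
        y = x * 2
    print(y.requires_grad)          # False: no autograd graph is recorded inside no_grad()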

    Parameter is a subclass of Variable. Parameters behave specially when used together with Modules: when a Parameter is assigned as an attribute of a Module, it is automatically added to the module's parameter list (it will appear in the parameters() iterator). Assigning a Variable to a Module attribute has no such effect. The reason is that we sometimes need to cache temporary state, such as the last hidden state of an RNN in a model; without the Parameter class, these temporaries would also get registered as model parameters.
    Another difference between Variable and Parameter is that a Parameter cannot be set to volatile=True and has requires_grad=True by default, whereas a Variable defaults to requires_grad=False.

    Up to now we have always created model parameters by writing nn.Linear() or nn.Conv2d() and never touched Parameter directly, but if you look at the source code of nn.Linear or nn.Conv2d (https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py), you will find that their internal weight and bias are themselves Parameters.
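
    A minimal sketch of what that registration means in practice (my own example, not from the original post): a tensor wrapped in nn.Parameter and assigned to a Module attribute shows up in parameters(), while a plain tensor does not.

    import torch
    from torch import nn


    class TinyModule(nn.Module):
        def __init__(self):
            super().__init__()
            # registered automatically: visible to parameters() and hence to optimizers
            self.weight = nn.Parameter(torch.randn(3, 3))
            # a plain tensor attribute: not registered, optimizers never see it
            self.cached_state = torch.randn(3, 3)


    m = TinyModule()
    print([name for name, _ in m.named_parameters()])   # ['weight']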

    import math
    from typing import Optional, Tuple

    import torch
    from torch import nn, Tensor
    from torch.nn import Parameter, init


    class NaiveLSTM(nn.Module):
        """Naive LSTM like nn.LSTM"""
        def __init__(self, input_size: int, hidden_size: int):
            super(NaiveLSTM, self).__init__()
            self.input_size = input_size
            self.hidden_size = hidden_size
    
            # input gate: weight matrices and bias vectors
            self.w_ii = Parameter(Tensor(hidden_size, input_size))
            self.w_hi = Parameter(Tensor(hidden_size, hidden_size))
            self.b_ii = Parameter(Tensor(hidden_size, 1))
            self.b_hi = Parameter(Tensor(hidden_size, 1))
    
            # forget gate: weight matrices and bias vectors
            self.w_if = Parameter(Tensor(hidden_size, input_size))
            self.w_hf = Parameter(Tensor(hidden_size, hidden_size))
            self.b_if = Parameter(Tensor(hidden_size, 1))
            self.b_hf = Parameter(Tensor(hidden_size, 1))
    
            # output gate: weight matrices and bias vectors
            self.w_io = Parameter(Tensor(hidden_size, input_size))
            self.w_ho = Parameter(Tensor(hidden_size, hidden_size))
            self.b_io = Parameter(Tensor(hidden_size, 1))
            self.b_ho = Parameter(Tensor(hidden_size, 1))
            
            # cell (candidate state g): weight matrices and bias vectors
            self.w_ig = Parameter(Tensor(hidden_size, input_size))
            self.w_hg = Parameter(Tensor(hidden_size, hidden_size))
            self.b_ig = Parameter(Tensor(hidden_size, 1))
            self.b_hg = Parameter(Tensor(hidden_size, 1))
    
            self.reset_weigths()
    
        def reset_weigths(self):
            """reset weights
            """
            stdv = 1.0 / math.sqrt(self.hidden_size)
            for weight in self.parameters():
                init.uniform_(weight, -stdv, stdv)
    
        def forward(self, inputs: Tensor, state: Optional[Tuple[Tensor, Tensor]]) \
            -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
    #       "->" annotates the return type of the function
            """Forward
            Args:
                inputs: [1, 1, input_size]
                state: ([1, 1, hidden_size], [1, 1, hidden_size])
            """
    
    #         batch_size, seq_size , _ = inputs.size()
    
            if state is None:
                h_t = torch.zeros(1, self.hidden_size).t()
                c_t = torch.zeros(1, self.hidden_size).t()
            else:
                (h, c) = state
                h_t = h.squeeze(0).t()
                c_t = c.squeeze(0).t()
    
            hidden_seq = []
    
            seq_size = 1
            for t in range(seq_size):
                x = inputs[:, t, :].t()
                # input gate
                i = torch.sigmoid(self.w_ii @ x + self.b_ii + self.w_hi @ h_t +
                                  self.b_hi)
                # forget gate
                f = torch.sigmoid(self.w_if @ x + self.b_if + self.w_hf @ h_t +
                                  self.b_hf)
                # cell
                g = torch.tanh(self.w_ig @ x + self.b_ig + self.w_hg @ h_t
                               + self.b_hg)
                # output gate
                o = torch.sigmoid(self.w_io @ x + self.b_io + self.w_ho @ h_t +
                                  self.b_ho)
                
                c_next = f * c_t + i * g
                h_next = o * torch.tanh(c_next)
                c_next_t = c_next.t().unsqueeze(0)
                h_next_t = h_next.t().unsqueeze(0)
                hidden_seq.append(h_next_t)
    
            hidden_seq = torch.cat(hidden_seq, dim=0)
            return hidden_seq, (h_next_t, c_next_t)
    
    def reset_weigths(model):
        """reset weights
        """
        for weight in model.parameters():
            init.constant_(weight, 0.5)
    
    inputs = torch.ones(1, 1, 10)
    h0 = torch.ones(1, 1, 20)
    c0 = torch.ones(1, 1, 20)
    print(h0.shape, h0)
    print(c0.shape, c0)
    print(inputs.shape, inputs)
    
    torch.Size([1, 1, 20]) tensor([[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
              1., 1., 1.]]])
    torch.Size([1, 1, 20]) tensor([[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
              1., 1., 1.]]])
    torch.Size([1, 1, 10]) tensor([[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]])
    
    # test naive_lstm with input_size=10, hidden_size=20
    naive_lstm = NaiveLSTM(10, 20)
    reset_weigths(naive_lstm)
    
    output1, (hn1, cn1) = naive_lstm(inputs, (h0, c0))
    
    print(hn1.shape, cn1.shape, output1.shape)
    print(hn1)
    print(cn1)
    print(output1)
    
    torch.Size([1, 1, 20]) torch.Size([1, 1, 20]) torch.Size([1, 1, 20])
    tensor([[[0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640,
              0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640,
              0.9640, 0.9640, 0.9640, 0.9640]]], grad_fn=<UnsqueezeBackward0>)
    tensor([[[2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000,
              2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000,
              2.0000, 2.0000, 2.0000, 2.0000]]], grad_fn=<UnsqueezeBackward0>)
    tensor([[[0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640,
              0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640,
              0.9640, 0.9640, 0.9640, 0.9640]]], grad_fn=<CatBackward>)
    

    Comparison with the official implementation

    # Use official lstm with input_size=10, hidden_size=20
    lstm = nn.LSTM(10, 20)
    reset_weigths(lstm)
    
    output2, (hn2, cn2) = lstm(inputs, (h0, c0))
    print(hn2.shape, cn2.shape, output2.shape)
    print(hn2)
    print(cn2)
    print(output2)
    
    torch.Size([1, 1, 20]) torch.Size([1, 1, 20]) torch.Size([1, 1, 20])
    tensor([[[0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640,
              0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640,
              0.9640, 0.9640, 0.9640, 0.9640]]], grad_fn=<StackBackward>)
    tensor([[[2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000,
              2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000, 2.0000,
              2.0000, 2.0000, 2.0000, 2.0000]]], grad_fn=<StackBackward>)
    tensor([[[0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640,
              0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640, 0.9640,
              0.9640, 0.9640, 0.9640, 0.9640]]], grad_fn=<StackBackward>)
    

    Grad of LSTM

    import random

    import numpy as np
    import matplotlib.pyplot as plt

    # use a GPU if one is available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


    def setup_seed(seed):
        """
        Make every run produce the same results.
        """
        torch.manual_seed(seed)
        np.random.seed(seed)
        random.seed(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False  # benchmarking must be off for deterministic runs
    
    def show_gates(i_s, o_s, f_s):
        """Show input gate, output gate, forget gate for LSTM
        """
        plt.plot(i_s, "r", label="input gate")
        plt.plot(o_s, "b", label="output gate")
        plt.plot(f_s, "g", label="forget gate")
        plt.title('Input gate, output gate and forget gate of LSTM')
        plt.xlabel('t', color='#1C2833')
        plt.ylabel('Mean Value', color='#1C2833')
        plt.legend(loc='best')
        plt.grid()
        plt.show()
    
    def lstm_step(x, h, c, w_ii, b_ii, w_hi, b_hi,
                      w_if, b_if, w_hf, b_hf,
                      w_ig, b_ig, w_hg, b_hg,
                      w_io, b_io, w_ho, b_ho, use_forget_gate=True):
        """run lstm a step
        """
        x_t = x.t()
        h_t = h.t()
        c_t = c.t()
        i = torch.sigmoid(w_ii @ x_t + b_ii + w_hi @ h_t + b_hi)
        o = torch.sigmoid(w_io @ x_t + b_io + w_ho @ h_t + b_ho)
        g = torch.tanh(w_ig @ x_t + b_ig + w_hg @ h_t + b_hg)
        f = torch.sigmoid(w_if @ x_t + b_if + w_hf @ h_t + b_hf)
        if use_forget_gate:
            c_next = f * c_t + i * g
        else:
            c_next = c_t + i * g
        h_next = o * torch.tanh(c_next)
        c_next_t = c_next.t()
        h_next_t = h_next.t()
        
        i_avg = torch.mean(i).detach()
        o_avg = torch.mean(o).detach()
        f_avg = torch.mean(f).detach()
        
        return h_next_t, c_next_t, f_avg, i_avg, o_avg
    
    hidden_size = 50
    input_size = 100
    sequence_len = 100
    high = 1000000
    test_idx = torch.randint(high=high, size=(1, sequence_len)).to(device)
    setup_seed(45)
    embeddings = nn.Embedding(high, input_size).to(device)
    test_embeddings = embeddings(test_idx).to(device)
    h_0 = torch.zeros(1, hidden_size, requires_grad=True).to(device)
    c_0 = torch.zeros(1, hidden_size, requires_grad=True).to(device)
    h_t = h_0
    c_t = c_0
    print(test_embeddings)
    print(h_0)
    print(c_0)
    
    tensor([[[ 0.5697,  0.7304, -0.4647,  ...,  0.7549,  0.3112, -0.4582],
             [ 1.5171,  0.7328,  0.0803,  ...,  1.2385,  1.2259, -0.5259],
             [-0.2804, -0.4395,  1.5441,  ..., -0.8644,  0.1858, -0.9446],
             ...,
             [ 0.5019, -0.8431, -0.9560,  ...,  0.2607,  1.2035,  0.6892],
             [-0.5062,  0.8530,  0.3743,  ..., -0.4148, -0.3384,  0.9264],
             [-2.1523,  0.6292, -0.9732,  ..., -0.2591, -1.6320, -0.1915]]],
           device='cuda:2', grad_fn=<EmbeddingBackward>)
    tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0.]], device='cuda:2', grad_fn=<CopyBackwards>)
    tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0.]], device='cuda:2', grad_fn=<CopyBackwards>)
    

    Grad of LSTM (Not Using forget gate)

    lstm = NaiveLSTM(input_size, hidden_size).to(device)
    iters = test_embeddings.size(1)
    lstm_grads = []
    i_s = []
    o_s = []
    f_s = []
    for t in range(iters):
        h_t, c_t, f, i, o = lstm_step(test_embeddings[: , t, :], h_t, c_t, 
                                   lstm.w_ii, lstm.b_ii, lstm.w_hi, lstm.b_hi,
                                   lstm.w_if, lstm.b_if, lstm.w_hf, lstm.b_hf,
                                   lstm.w_ig, lstm.b_ig, lstm.w_hg, lstm.b_hg,
                                   lstm.w_io, lstm.b_io, lstm.w_ho, lstm.b_ho,
                                   use_forget_gate=False)
        loss = h_t.abs().sum()
        h_0.retain_grad()
        loss.backward(retain_graph=True)
        lstm_grads.append(torch.norm(h_0.grad).item())
        i_s.append(i)
        o_s.append(o)
        f_s.append(f)
        h_0.grad.zero_()
        lstm.zero_grad()
    
    plt.plot(lstm_grads)
    

    [Figure: norm of the gradient w.r.t. h_0 at each time step (forget gate disabled)]

    show_gates(i_s, o_s, f_s)
    

    [Figure: mean input-, output- and forget-gate activations over time (forget gate disabled)]

    Grad of LSTM (Using forget gate)

    setup_seed(45)
    embeddings = nn.Embedding(high, input_size).to(device)
    test_embeddings = embeddings(test_idx).to(device)
    h_0 = torch.zeros(1, hidden_size, requires_grad=True).to(device)
    c_0 = torch.zeros(1, hidden_size, requires_grad=True).to(device)
    h_t = h_0
    c_t = c_0
    print(test_embeddings)
    print(h_0)
    print(c_0)
    
    tensor([[[ 0.5697,  0.7304, -0.4647,  ...,  0.7549,  0.3112, -0.4582],
             [ 1.5171,  0.7328,  0.0803,  ...,  1.2385,  1.2259, -0.5259],
             [-0.2804, -0.4395,  1.5441,  ..., -0.8644,  0.1858, -0.9446],
             ...,
             [ 0.5019, -0.8431, -0.9560,  ...,  0.2607,  1.2035,  0.6892],
             [-0.5062,  0.8530,  0.3743,  ..., -0.4148, -0.3384,  0.9264],
             [-2.1523,  0.6292, -0.9732,  ..., -0.2591, -1.6320, -0.1915]]],
           device='cuda:2', grad_fn=<EmbeddingBackward>)
    tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0.]], device='cuda:2', grad_fn=<CopyBackwards>)
    tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0.]], device='cuda:2', grad_fn=<CopyBackwards>)
    
    lstm = NaiveLSTM(input_size, hidden_size).to(device)
    ## BIG CHANGE!!
    lstm.b_hf.data = torch.ones_like(lstm.b_hf) * 1/2
    lstm.b_if.data = torch.ones_like(lstm.b_if) * 1/2
    iters = test_embeddings.size(1)
    lstm_grads = []
    i_s = []
    o_s = []
    f_s = []
    for t in range(iters):
        h_t, c_t, f, i, o = lstm_step(test_embeddings[: , t, :], h_t, c_t, 
                                   lstm.w_ii, lstm.b_ii, lstm.w_hi, lstm.b_hi,
                                   lstm.w_if, lstm.b_if, lstm.w_hf, lstm.b_hf,
                                   lstm.w_ig, lstm.b_ig, lstm.w_hg, lstm.b_hg,
                                   lstm.w_io, lstm.b_io, lstm.w_ho, lstm.b_ho,
                                   use_forget_gate=True)
        loss = h_t.abs().sum()
        h_0.retain_grad()
        loss.backward(retain_graph=True)
        lstm_grads.append(torch.norm(h_0.grad).item())
        i_s.append(i)
        o_s.append(o)
        f_s.append(f)
        h_0.grad.zero_()
        lstm.zero_grad()
    
    plt.plot(lstm_grads)
    

    [Figure: norm of the gradient w.r.t. h_0 at each time step (forget gate enabled)]

    show_gates(i_s, o_s, f_s)
    

    [Figure: mean input-, output- and forget-gate activations over time (forget gate enabled)]

  • Run main.py first to serialize the text, then train.py to train the model. dataset.py: from torch.utils.data import DataLoader,Dataset import torch import os from utils import tokenlize import config class ImdbDataset(Dataset): ...
  • This repository contains a PyTorch implementation of a BiLSTM-CRF model for named entity recognition. Code layout: at the project root you will see: ├── pyner | └── callback | | └── lrscheduler.py  | | └── trainingmonitor.py  | | └...
  • To address the fact that vanilla RNNs cannot capture long-range dependencies, two RNN variants, LSTM and GRU, were introduced. LSTM (Long Short-Term Memory) is literally a long "short-term memory": it still models short-term memory, but memory that lasts long enough to partially solve...
  • Sharing PyTorch code for regression with LSTM


    I have been learning about RNNs recently and tried to use an LSTM-style network for regression. Getting started with recurrent networks is fairly hard, and the code I first found on CSDN was not satisfactory. After a few days I finally got my network to train, so I am open-sourcing my code and dataset here for others to learn from.

    Sharing the LSTM regression code

    A brief introduction to LSTM

    For the theory, see the blog post "LSTM这一篇就够了"; I will not repeat it here. My intuition for this kind of network: if a signal has strong temporal correlation, an RNN-family network should train well on it.

    The dataset

    The dataset comes from a research project whose details cannot be disclosed, so only a plot of the data is shown here. [Figure: overview of the dataset]
    The x-axis is time and the y-axis is amplitude. The data fluctuates strongly, and the underlying physics makes it clear that the output at time step n+1 is closely tied to the output at time step n, which is why the long short-term memory (LSTM) network was chosen.

    The data was first fitted with a DNN consisting of one input layer (1 unit), three hidden layers (10 * 50 * 10 units) and one output layer (1 unit). The network structure and training result are shown below:
    [Figure: DNN structure and training result]
    The left panel shows the DNN structure and the right panel the fit. Blue marks the actual data and the red curve the fit. The red curve roughly follows the blue points but loses much of the oscillation, probably because of poorly tuned hyperparameters or too few training iterations; with better tuning a better result should be achievable.

    The same experiment was then run with the LSTM-style network: a very simple single-input, single-output network was built, and the training result is shown below:
    [Figure: RNN training result]
    There is some oscillation at the start of training, after which the output basically matches the dataset shown in the first figure. A comparison between the original data and the trained output follows:

    Cropping part of the third figure for comparison shows that the trained network predicts the experimental data well and preserves the fluctuations in the data.

    The code

    """
    user:liujie
    time:2020.10.07
    """
    import torch
    import torch.nn as nn
    import numpy as np
    import matplotlib.pyplot as plt
    #引入相关的文件及数据集,数据集的数据来源于csv文件
    import pandas as pd                             #导入pandas包
    #csv文件读取
    data = pd.read_csv("patientdata.csv")
    #读取的文件首先进行列表化并转置。随后转存为float64的格式,默认格式为flaot32
    data = np.transpose(np.array(data)).astype(np.float32)
    x_data = data[0, :3000]                         #数据切片,x_data表示自变量
    y_data = data[1, :3000]                         #数据切片,y_data表示因变量
    # 设置超参数
    input_size = 1                                  #定义超参数输入层,输入数据为1维
    output_size = 1                                 #定义超参数输出层,输出数据为1维
    num_layers = 1                                  #定义超参数rnn的层数,层数为1层
    hidden_size = 32                                #定义超参数rnn的循环神经元个数,个数为32个
    learning_rate = 0.02                            #定义超参数学习率
    train_step = 1000                                #定义训练的批次,3000个数据共训练1000次,
    time_step = 3                                  #定义每次训练的样本个数每次传入3个样本
    h_state = None                                  #初始化隐藏层状态
    use_gpu = torch.cuda.is_available()             #使用GPU加速训练
    class RNN(nn.Module):
        """搭建rnn网络"""
        def __init__(self, input_size, hidden_size, num_layers, output_size):
            super(RNN, self).__init__()
            self.rnn = nn.RNN(
                input_size=input_size,
                hidden_size=hidden_size,
                num_layers=num_layers,
                batch_first=True,)                  #传入四个参数,这四个参数是rnn()函数中必须要有的
            self.output_layer = nn.Linear(in_features=hidden_size, out_features=output_size)
        def forward(self, x, h_state):
            # x (batch, time_step, input_size)
            # h_state (n_layers, batch, hidden_size)
            # rnn_out (batch, time_step, hidden_size)
            rnn_out, h_state = self.rnn(x, h_state)     #h_state是之前的隐层状态
            out = []
            for time in range(rnn_out.size(1)):
                every_time_out = rnn_out[:, time, :]    #相当于获取每个时间点上的输出,然后过输出层
                out.append(self.output_layer(every_time_out))
            return torch.stack(out, dim=1), h_state     #torch.stack扩成[1, output_size, 1]
    # 显示由csv提供的样本数据图
    plt.figure(1)
    plt.plot(x_data, y_data, 'r-', label='target (Ca)')
    plt.legend(loc='best')
    plt.show()
    #对CLASS RNN进行实例化时向其中传入四个参数
    rnn = RNN(input_size, hidden_size, num_layers, output_size)
    # 设置优化器和损失函数
    #使用adam优化器进行优化,输入待优化参数rnn.parameters,优化学习率为learning_rate
    optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
    loss_function = nn.MSELoss()                #损失函数设为常用的MES均方根误差函数
    plt.figure(2)                               #新建一张空白图片2
    plt.ion()
    # 按照以下的过程进行参数的训练
    for step in range(train_step):
        start, end = step*time_step, (step+1)*time_step#
        steps = np.linspace(start, end, (end-start), dtype=np.float32)#该参数仅仅用于画图过程中使用
        x_np = x_data[start:end]        #按照批次大小从样本中切片出若干个数据,用作RNN网络的输入
        y_np = y_data[start:end]        #按照批次大小从样本中切片出若干个数据,用作与神经网络训练的结果对比求取损失
        x = torch.from_numpy(x_np[np.newaxis, :, np.newaxis])
        y = torch.from_numpy(y_np[np.newaxis, :, np.newaxis])
        pridect, h_state = rnn.forward(x, h_state)
        h_state = h_state.detach()     # 重要!!! 需要将该时刻隐藏层的状态作为下一时刻rnn的输入
    
        loss = loss_function(pridect, y)#求解损失值,该损失值用于后续参数的优化
        optimizer.zero_grad()           #优化器的梯度清零,这一步必须要做
    
        loss.backward()                 #调用反向传播网络对损失值求反向传播,优化该网络
        optimizer.step()                #调用优化器对rnn中所有有关参数进行优化处理
    
        plt.plot(steps, pridect.detach().numpy().flatten(), 'b-')
        plt.draw()
        plt.pause(0.05)
        plt.ioff()
        plt.show()
    
    
    
    

    This code was developed with Python 3.6 + PyTorch and is commented in some detail. I hope it serves as a starting point; if you spot problems, please do point them out.

  • Using LSTMs in PyTorch: “There is no rule on how to write. Sometimes it comes easily and perfectly: sometimes it’s like drilling rock and then blasting it out with charges” — Ernest Hemingway ...

    Using LSTMs in PyTorch

    “There is no rule on how to write. Sometimes it comes easily and perfectly: sometimes it’s like drilling rock and then blasting it out with charges” — Ernest Hemingway

    “没有写法的规则。 有时候它变得轻松而完美:有时候就像钻石头,然后用炸药炸开它” –欧内斯特·海明威(Ernest Hemingway)

    The aim of this blog is to explain the building of an end-to-end model for text generation by implementing a powerful architecture based on LSTMs.

    该博客的目的是通过实现基于LSTM的强大架构来解释用于文本生成的端到端模型。

    The blog is divided into the following sections:

    该博客分为以下几部分:

    • Introduction

      介绍

    • Text preprocessing

      文字预处理

    • Sequence generation

      序列产生

    • Model architecture

      模型架构

    • Training phase

      训练阶段

    • Text generation

      文字产生

    You can find the complete code at: https://github.com/FernandoLpz/Text-Generation-BiLSTM-PyTorch

    您可以在以下位置找到完整的代码: https : //github.com/FernandoLpz/Text-Generation-BiLSTM-PyTorch

    介绍 (Introduction)

    Over the years, various proposals have been launched to model natural language, but how is this? what does the idea of “modeling natural language” refer to? We could think that “modeling natural language” refers to the reasoning given to the semantics and syntax that make up the language, in essence, it is, but it goes further.

    多年以来,已经提出了各种建议来模拟自然语言 ,但这是怎么回事? “ 模拟自然语言 ”的想法指的是什么? 我们可以认为“ 对自然语言建模 ”是指对构成语言的语义和语法的推理,从本质上讲,确实如此,但它可以走得更远。

    Nowadays, the field of Natural Language Processing (NLP) deals with different tasks that refer to reasoning, understanding and modeling of language through different methods and techniques. The field of NLP (Natural Language processing) has been growing extremely fast in this past decade. It has been proposed in plenty of models to solve different NLP tasks from different perspectives. Likewise, the common denominator among the most popular proposals is the implementation of Deep Learning based models.

    如今, 自然语言处理 ( NLP )领域通过不同的方法和技术处理涉及语言推理,理解和建模的不同任务。 在过去的十年中,NLP(自然语言处理)领域的发展非常Swift。 已经提出了许多模型来从不同角度解决不同的NLP任务。 同样,最受欢迎的提案中的共同点是基于深度学习的模型的实现。

    As already mentioned, NLP field addresses a huge number of problems, specifically in this blog we will address the problem of text generation by making use of deep learning based models, such as the recurrent neural networks LSTM and Bi-LSTM. Likewise, we will use one of the most sophisticated frameworks today to develop deep learning models, specifically we will use the LSTMCell class from PyTorch to develop the proposed architecture.

    如前所述, NLP领域解决了大量问题,特别是在此博客中,我们将通过使用基于深度学习的模型 (例如递归神经网络 LSTMBi-LSTM)来解决文本生成问题。 同样,我们将用最先进的框架之一,今天深发展的学习模式,特别是我们将使用LSTMCell PyTorch发展所提出的架构。

    If you want to dig into the mechanics of the LSTM, as well as how it is implemented in PyTorch, take a look at this amazing explanation: From a LSTM Cell to a Multilayer LSTM Network with PyTorch

    如果您想了解LSTM的原理以及在PyTorch中的实现方式 ,请看一下以下令人惊奇的解释: 从LSTM单元到带有PyTorch的多层LSTM网络

    问题陈述 (Problem statement)

    Given a text, a neural network will be fed through character sequences in order to learn the semantics and syntactics of the given text. Subsequently, a sequence of characters will be randomly taken and the next character will be predicted.

    给定文本,将通过字符序列提供神经网络,以学习给定文本的语义和句法。 随后,将随机抽取一系列字符,并预测下一个字符。

    So, let’s get started!

    所以,让我们开始吧!

    文字预处理 (Text preprocessing)

    First, we are going to need a text which we are going to work with. There are different resources where you can find different texts in plain text, I recommend you take a look at the Gutenberg Project.

    首先,我们需要一个将要使用的文本。 您可以在不同的资源中找到纯文本的不同文本,我建议您看一下Gutenberg项目

    In this case, I will use the book called Jack Among the Indians by George Bird Grinnell, the one you can find here: link to the book. So, the first lines of chapter 1 look like:

    在这种情况下,我将使用乔治·伯德·格林纳尔 ( George Bird Grinnell)所著的《印第安人杰克之中 》一书,您可以在这里找到该书链接至该书 。 因此,第一章的第一行如下所示:

    The train rushed down the hill, with a long shrieking whistle, and then began to go more and more slowly. Thomas had brushed Jack off and thanked him for the coin that he put in his hand, and with the bag in one hand and the stool in the other now went out onto the platform and down the steps, Jack closely following.

    As you can see, the text contains uppercase, lowercase, line breaks, punctuation marks, etc. What is suggested to do is to try to adapt the text to a form which allows us to handle it in a better way and which mainly reduces the complexity of the model that we are going to develop. So we are going to transform each character to its lowercase form. Also, it is advisable to handle the text as a list of characters, that is, instead of having a “big string of characters”, we will have a list of characters. The purpose of having the text as a sequence of characters is for better handling when generating the sequences which the model will be fed with (we will see this in the next section in detail).

    如您所见,文本包含大写,小写,换行符,标点符号等。建议做的是尝试使文本适应某种形式,以使我们可以更好地处理它,并且主要减少我们将要开发的模型的复杂性。 因此,我们将每个字符转换为小写形式 。 另外,建议将文本作为字符列表来处理,也就是说,我们将拥有一个字符列表,而不是使用“ 大字符字符串 ”。 将文本作为字符序列的目的是为了更好地处理生成模型将要使用的序列(我们将在下一节中详细介绍)。

    So let’s do it!

    让我们开始吧!

    Code snippet 1. Preprocessing
    代码段1.预处理

    As we can see, in line 2 we are defining the characters to be used, all other symbols will be discarded, we only keep the “white space” symbol. In lines 6 and 10 we are reading the raw file and transforming it into its lowercase form. In the loops of lines 14 and 19 we are creating a string which represents the entire book and generating a list of characters. In line 23 we are filtering the text list by only keeping the letters defined in line 2.

    如我们所见,在第2行中,我们定义了要使用的字符,所有其他符号都将被丢弃,我们仅保留“ 空白 ”符号。 在第6和10行中,我们正在读取原始文件并将其转换为小写形式。 在第14和19行的循环中,我们创建并表示整个书籍的字符串,并生成一个字符列表。 在第23行中,我们仅保留第2行中定义的字母来过滤文本列表
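
    The embedded gist itself did not survive here, so the snippet below is only my reconstruction of the described preprocessing (the file name and variable names are assumptions):

    # Hypothetical reconstruction of Code snippet 1 (not the author's exact code)
    import string

    allowed_chars = set(string.ascii_lowercase + ' ')    # letters plus the white-space symbol

    with open('book.txt', 'r', encoding='utf-8') as f:   # 'book.txt' is a placeholder path
        raw_text = f.read().lower()

    # keep the text as a list of characters, discarding anything outside the allowed set
    text = [ch for ch in raw_text if ch in allowed_chars]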

    So, once the text is loaded and preprocessed, we will go from having a text like this:

    因此,一旦加载并预处理了文本,我们将不再需要这样的文本:

    text = "The train rushed down the hill."

    to have a list of characters like this:

    具有这样的字符列表:

    text = ['t','h','e',' ','t','r','a','i','n',' ','r','u','s','h','e','d',' ','d','o','w','n',
    ' ','t','h','e',' ','h','i','l','l']

    Well, we already have the full text as a list of characters. As it’s well known, we cannot introduce raw characters directly to a neural network, we require a numerical representation, therefore, we need to transform each character to a numerical representation. For this, we are going to create a dictionary which will help us to save the equivalence “character-index” and “index-character”.

    好吧,我们已经有了全文作为字符列表。 众所周知,我们无法将原始字符直接引入神经网络,我们需要一个数字表示形式 ,因此,我们需要将每个字符转换为一个数字表示形式。 为此,我们将创建一个字典,该字典将帮助我们保存等价的“ character-index ”和“ i ndex-character ”。

    So, let’s do it!

    所以,让我们开始吧!

    Code snippet 2. Dictionary creation
    代码段2.字典创建

    As we can notice, in lines 11 and 12 the “char-index” and “index-char” dictionaries are created.

    我们可以注意到,在第11和12行中,创建了“ char-index ”和“ index-char ”字典。
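
    The gist for this step is also missing; a sketch of the two lookup tables, reusing the text list from the preprocessing sketch above, could be:

    # Hypothetical reconstruction of the "char-index" / "index-char" dictionaries
    vocab = sorted(set(text))
    char_to_idx = {ch: idx for idx, ch in enumerate(vocab)}
    idx_to_char = {idx: ch for idx, ch in enumerate(vocab)}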

    So far we have already shown how to load the text and save it in the form of a list of characters, we have also created a couple of dictionaries that will help us to encode-decode each character. Now, it is time to see how we will generate the sequences that will be introduced to the model. So, let’s go to the next section!

    到目前为止,我们已经展示了如何加载文本并以字符列表的形式保存文本,我们还创建了两个字典,可以帮助我们对每个字符进行编码/解码。 现在,该看一下我们如何生成将引入模型的序列了。 因此,让我们进入下一部分!

    序列产生 (Sequence generation)

    The way in which the sequences are generated depends entirely on the type of model that we are going to implement. As already mentioned, we will use recurrent neural networks of the LSTM type, which receive data sequentially (time steps).

    生成序列的方式完全取决于我们将要实现的模型的类型。 如前所述,我们将使用LSTM类型的递归神经网络,该网络顺序地(时间步长)接收数据。

    For our model, we need to form sequences of a given length which we will call “window”, where the character to predict (the target) will be the character next to the window. Each sequence will be made up of the characters included in the window. To form a sequence, the window is sliced one character to the right at a time. The character to predict will always be the character following the window. We can clearly see this process in Figure 1.

    对于我们的模型,我们需要形成给定长度的序列,我们将其称为“ 窗口 ”,其中要预测的字符( 目标 )将是窗口旁边的字符。 每个序列将由窗口中包含的字符组成。 为了形成一个序列,将窗口一次向右切一个字符。 要预测的字符始终是跟随窗口的字符。 我们可以在图1中清楚地看到此过程。

    Figure 1. Sequences generation. In this example the window has a size of 4, meaning that it will contain 4 characters. The target is the first character next to the window | Image by the author
    图1.序列生成。 在此示例中,窗口的大小为4,这意味着它将包含4个字符。 目标是窗口旁边的第一个字符。 图片由作者提供

    Well, so far we have seen how to generate the character sequences in a simple way. Now we need to transform each character to its respective numerical format, for this we will use the dictionary generated in the preprocessing phase. This process can be visualized in Figure 2.

    好了,到目前为止,我们已经看到了如何以简单的方式生成字符序列。 现在我们需要将每个字符转换为其各自的数字格式,为此,我们将使用在预处理阶段生成的字典。 此过程可以在图2中看到。

    Figure 2. Transforming from chars to numerical format | Image by the author
    图2.从字符转换为数字格式 图片由作者提供

    Great, now we know how to generate the character sequences using a window that slides one character at a time and how we transform the characters into a numeric format, the following code snippet shows the process described.

    太好了,现在我们知道如何使用可一次滑动一个字符的窗口生成字符序列,以及如何将字符转换为数字格式,以下代码片段显示了所描述的过程。

    Code snippet 3. Sequences generation
    代码段3.序列生成

    Fantastic, now we know how to preprocess raw text, how to transform it into a list of characters and how to generate sequences in a numeric format. Now we go to the most interesting part, the model architecture.

    太神奇了,现在我们知道如何预处理原始文本,如何将其转换为字符列表以及如何以数字格式生成序列。 现在我们来看最有趣的部分,即模型架构。

    模型架构 (Model architecture)

    As you already read in the title of this blog, we are going to make use of Bi-LSTM recurrent neural networks and standard LSTMs. Essentially, we make use of this type of neural network due to its great potential when working with sequential data, such as the case of text-type data. Likewise, there are a large number of articles that refer to the use of architectures based on recurrent neural networks (e.g. RNN, LSTM, GRU, Bi-LSTM, etc.) for text modeling, specifically for text generation [1, 2].

    正如您已经在该博客的标题中阅读的那样,我们将使用Bi-LSTM递归神经网络和标准LSTM 。 本质上,由于这种类型的神经网络在处理顺序数据(例如文本类型数据)时具有巨大的潜力,因此我们会使用这种类型的神经网络。 同样,有很多文章引用了基于递归神经网络(例如RNN, LSTMGRUBi-LSTM等)的架构进行文本建模,尤其是用于文本生成[1,2]。

    The architecture of the proposed neural network consists of an embedding layer followed by a Bi-LSTM as well as a LSTM layer. Right after, the latter LSTM is connected to a linear layer.

    所提出的神经网络的体系结构由嵌入层, Bi-LSTM以及LSTM层组成。 之后,将后者的LSTM连接到线性层

    方法 (Methodology)

    The methodology consists of passing each sequence of characters to the embedding layer, this to generate a representation in the form of a vector for each element that makes up the sequence, therefore we would be forming a sequence of embedded characters. Subsequently, each element of the sequence of embedded characters will be passed to the Bi-LSTM layer. Subsequently, a concatenation of each output of the LSTMs that make up the Bi-LSTM (the forward LSTM and the backward LSTM) will be generated. Right after, each forward + backward concatenated vector will be passed to the LSTM layer from which the last hidden state will be taken to feed the linear layer. This last linear layer will have as activation function a Softmax function in order to represent the probability of each character. Figure 3 show the described methodology.

    该方法包括将每个字符序列传递给嵌入层,从而为构成该序列的每个元素生成矢量形式的表示形式,因此我们将形成一个嵌入字符序列 。 随后, 嵌入字符序列中的每个元素都将传递到Bi-LSTM 。 随后,将生成组成Bi-LSTM的LSTM的每个输出( 正向LSTM反向LSTM )的串联。 之后,每个正向和反向连接的向量将传递到LSTM 层,从该将获取最后的隐藏状态以馈送线性层 。 最后的线性层将具有Softmax函数作为激活函数,以便表示每个字符的概率。 图3显示了所描述的方法。

    Figure 3 . BiLSTM-LSTM model. In this image the word “bear” is passed through the BiLSTM-LSTM model for text generation | Image by the author
    图3。 BiLSTM-LSTM模型。 在此图像中,单词“ bear”通过BiLSTM-LSTM模型传递以生成文本| 图片由作者提供

    Fantastic, so far we have already explained the architecture of the model for text generation as well as the implemented methodology. Now we need to know how to do all this with the PyTorch framework, but first, I would like to briefly explain how the Bi-LSTM and the LSTM work together to later see how we would do it in code, so let’s see how a Bi-LSTM network works.

    太棒了,到目前为止,我们已经解释了文本生成模型的体系结构以及实现的方法。 现在我们需要知道如何使用PyTorch框架来完成所有这些工作,但是首先,我想简要地解释一下Bi-LSTMLSTM如何一起工作,以便以后在代码中看到我们将如何做,所以让我们看看Bi-LSTM 网络有效。

    Bi-LSTM和LSTM (Bi-LSTM & LSTM)

    The key difference between a standard LSTM and a Bi-LSTM is that the Bi-LSTM is made up of 2 LSTMs, better known as “forward LSTM” and “backward LSTM”. Basically, the forward LSTM receives the sequence in the original order, while the backward LSTM receives the sequence in reverse. Subsequently and depending on what is intended to be done, each hidden state for each time step of both LSTMs can be joined or only the last states of both LSTMs will be operated. In the proposed model, we suggest joining both hidden states for each time step.

    标准LSTMBi-LSTM之间的主要区别在于Bi-LSTM 2 个LSTM 组成 ,通常称为“ 正向 LSTM ”和“ 反向 LSTM ”。 基本上, 前向 LSTM按原始顺序接收序列,而后 LSTM 接收相反的顺序。 随后,根据要执行的操作,两个LSTM每个时间步的每个隐藏状态都可以合并,或者仅两个LSTM最后一个状态都将被操作。 在提出的模型中,我们建议为每个时间步加入两个隐藏状态

    Perfect, now we understand the key difference between a Bi-LSTM and an LSTM. Going back to the example we are developing, Figure 4 represents the evolution of each sequence of characters when they are passed through the model.

    完美,现在我们了解了Bi-LSTMLSTM之间的关键区别。 回到我们正在开发的示例,图4表示每个字符序列在通过模型时的演变。

    Figure 4. BiLSTM-LSTM model. A simple example showing the evolution of each character when passed through the model | Image by the author
    图4. BiLSTM-LSTM模型。 一个简单的示例,显示通过模型时每个角色的演变| 图片由作者提供

    Great, once everything about the interaction between Bi-LSTM and LSTM is clear, let’s see how we do this in code using only LSTMCells from the great PyTorch framework.

    太好了,一旦有关Bi-LSTMLSTM之间的交互的所有事情都清楚了,让我们看看我们如何仅使用来自PyTorch框架的LSTMCells在代码中执行此操作。

    So, first let’s understand how we make the constructor of the TextGenerator class, let’s take a look at the following code snippet:

    因此,首先让我们了解如何制作TextGenerator类的构造函数,让我们看一下以下代码片段:

    Code snippet 4. Constructor of text generator class
    代码段4.文本生成器类的构造函数

    As we can see, from lines 6 to 10 we define the parameters that we will use to initialize each layer of the neural network. It is important to mention that input_size is equal to the size of the vocabulary (that is, the number of elements that our dictionary generated in the preprocessing contains). Likewise, the number of classes to be predicted is also the same size as the vocabulary and sequence_length refers to the size of the window.

    如我们所见,从第6行到第10行,我们定义了用于初始化神经网络每一层的参数。 重要的是要提到input_size等于词汇表大小 (即预处理中生成的字典中包含的元素数)。 同样,要预测的类数量也与词汇表相同,并且sequence_length指的是窗口的大小。

    On the other hand, in lines 20 and 21 we are defining the two LSTMCells that make up the Bi-LSTM (forward and backward). In line 24 we define the LSTMCell that will be fed with the output of the Bi-LSTM. It is important to mention that the hidden state size is double compared to the Bi-LSTM, this is because the output of the Bi-LSTM is concatenated. Later on line 27 we define the linear layer, which will be filtered later by the softmax function.

    另一方面,在第20行和第21行中,我们定义了两个构成 Bi-LSTM的 LSTMCell ( 正向反向 )。 在第24行中,我们定义了将与Bi-LSTM的输出一起馈入的LSTMCell 。 值得一提的是, 隐藏状态的大小是Bi-LSTM的两倍,这是因为Bi-LSTM的输出是串联的。 在第27行的后面,我们定义了线性层 ,稍后将通过softmax函数对其进行过滤。
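
    Since the constructor gist is not embedded here, the following is only a sketch consistent with that description (two LSTMCells forming the Bi-LSTM, a third LSTMCell working on the doubled hidden size, then a linear layer); the class and parameter names are assumptions:

    import torch
    import torch.nn as nn


    class TextGenerator(nn.Module):
        """Sketch of the BiLSTM + LSTM + Linear architecture described above."""

        def __init__(self, vocab_size, hidden_size, sequence_len):
            super().__init__()
            self.hidden_size = hidden_size
            self.sequence_len = sequence_len
            self.num_classes = vocab_size                       # one output class per character

            self.embedding = nn.Embedding(vocab_size, hidden_size)
            # forward and backward cells of the Bi-LSTM
            self.lstm_cell_forward = nn.LSTMCell(hidden_size, hidden_size)
            self.lstm_cell_backward = nn.LSTMCell(hidden_size, hidden_size)
            # the LSTM fed with the concatenated Bi-LSTM output, hence 2 * hidden_size
            self.lstm_cell = nn.LSTMCell(hidden_size * 2, hidden_size * 2)
            # final projection onto the vocabulary, filtered later by a softmax
            self.linear = nn.Linear(hidden_size * 2, vocab_size)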

    Once the constructor is defined, we need to create the tensors that will contain the cell state (cs) and hidden state (hs) for each LSTM. So, we proceed to do it as follows:

    定义构造函数后,我们需要创建张量,其中将包含每个LSTM单元状态 ( cs )和隐藏状态 ( hs )。 因此,我们继续执行以下操作:

    Code snippet 5. Weights initialization
    代码段5.权重初始化

    Fantastic, once the tensors that will contain the hidden state and cell state have been defined, it is time to show how the assembly of the entire architecture is done, let’s go for it!

    太棒了,一旦定义了包含隐藏状态单元状态的张量,就该展示整个架构的组装方式了,那就开始吧!

    First, let’s take a look at the following code snippet:

    首先,让我们看一下以下代码片段:

    Code snippet 6. BiLSTM + LSTM + Linear layer
    代码段6. BiLSTM + LSTM +线性层

    For a better understanding, we are going to explain the assembly with some defined values, in such a way that we can understand how each tensor is passed from one layer to another. So say we have:

    为了更好地理解,我们将以一些定义的值来解释装配,这样我们可以理解每个张量如何从一层传递到另一层。 所以说我们有:

    batch_size = 64
    hidden_size = 128
    sequence_len = 100
    num_classes = 27

    so the x input tensor will have a shape:

    因此x输入张量将具有以下形状:

    # torch.Size([batch_size, sequence_len])
    x : torch.Size([64, 100])

    then, in line 2 is passed the x tensor through the embedding layer, so the output would have a size:

    然后,在第2行中将x张量通过嵌入层,因此输出将具有以下大小:

    # torch.Size([batch_size, sequence_len, hidden_size])
    x_embedded : torch.Size([64, 100, 128])

    It is important to notice that in line 5 we are reshaping the x_embedded tensor. This is because we need to have the sequence length as the first dimension, essentially because in the Bi-LSTM we will iterate over each sequence, so the reshaped tensor will have a shape:

    请注意,在第5行中,我们正在重塑 x_embedded张量。 这是因为我们需要将序列长度作为第一维,主要是因为在Bi-LSTM中,我们将遍历每个序列,因此重塑后的张量将具有以下形状:

    # torch.Size([sequence_len, batch_size, hidden_size])
    x_embedded_reshaped : torch.Size([100, 64, 128])

    Right after, in lines 7 and 8 the forward and backward lists are defined. There we will store the hidden states of the Bi-LSTM.

    之后,在第7和8行中,定义了向前向后的列表。 在那里,我们将存储Bi-LSTM隐藏状态

    So it’s time to feed the Bi-LSTM. First, in line 12 we are iterating over forward LSTM, we are also saving the hidden states of each time step (hs_forward). In line 19 we are iterating over the backward LSTM, at the same time we are saving the hidden states of each time step (hs_backward). You can notice that the loop is done in the same sequence, the difference is that it’s read in reversed form. Each hidden state will have the following shape:

    因此,现在该喂Bi-LSTM了 。 首先,在第12行中,我们遍历正向LSTM ,还保存了每个时间步隐藏状态 ( hs_forward )。 在第19行中,我们遍历向后 LSTM ,同时,我们保存每个时间步隐藏状态 ( hs_backward )。 您可能会注意到,循环以相同的顺序完成,不同之处在于它是以相反的形式读取的。 每个隐藏状态将具有以下形状:

    # hs_forward : torch.Size([batch_size, hidden_size])
    hs_forward : torch.Size([64, 128])# hs_backward : torch.Size([batch_size, hidden_size])
    hs_backward: torch.Size([64, 128])

    Great, now let’s see how to feed the latest LSTM layer. For this, we make use of the forward and backward lists. In line 26 we are iterating through each hidden state corresponding to forward and backward which are concatenated in line 27. It is important to note that by concatenating both hidden states, the dimension of the tensor will increase 2X, that is, the tensor will have the following shape:

    太好了,现在让我们看看如何添加最新的LSTM层 。 为此,我们使用前向后向列表。 在第26行中,我们遍历与在第27行中串联的 前向后向相对应的每个隐藏状态 。重要的是要注意,通过将这两个隐藏状态 串联在一起 ,张量的尺寸将增加2倍,也就是说,张量将具有以下形状:

    # input_tensor : torch.Size([batch_size, hidden_size * 2])
    input_tensor : torch.Size([64, 256])

    Finally, the LSTM will return a hidden state of size:

    最后,LSTM将返回size的隐藏状态

    # last_hidden_state: torch.Size([batch_size, num_classes])
    last_hidden_state: torch.Size([64, 27])

    At the very end, the last hidden state of the LSTM will be passed through a linear layer, as shown on line 31. So, the complete forward function is shown in the following code snippet:

    最后LSTM最后一个隐藏状态将通过inear层传递,如第31行所示。因此,下面的代码片段显示了完整的forward函数:

    Code snippet 7. Forward function
    代码段7.转发功能
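
    The forward-pass gist is not embedded either, so here is a sketch that follows the shapes discussed above. It is written as a standalone function over the hypothetical TextGenerator sketched earlier (it would normally live as that class's forward method), and it zero-initializes the hidden and cell states for simplicity:

    import torch


    def text_generator_forward(model, x):
        """x: LongTensor of shape (batch_size, sequence_len); returns (batch_size, num_classes)."""
        batch_size = x.size(0)

        # hidden/cell states for the forward cell, the backward cell and the top LSTM cell
        hs_f = torch.zeros(batch_size, model.hidden_size)
        cs_f = torch.zeros(batch_size, model.hidden_size)
        hs_b = torch.zeros(batch_size, model.hidden_size)
        cs_b = torch.zeros(batch_size, model.hidden_size)
        hs = torch.zeros(batch_size, model.hidden_size * 2)
        cs = torch.zeros(batch_size, model.hidden_size * 2)

        out = model.embedding(x)           # (batch_size, sequence_len, hidden_size)
        out = out.permute(1, 0, 2)         # (sequence_len, batch_size, hidden_size)

        forward, backward = [], []
        for t in range(model.sequence_len):                 # forward LSTM reads the sequence left to right
            hs_f, cs_f = model.lstm_cell_forward(out[t], (hs_f, cs_f))
            forward.append(hs_f)
        for t in reversed(range(model.sequence_len)):       # backward LSTM reads it right to left
            hs_b, cs_b = model.lstm_cell_backward(out[t], (hs_b, cs_b))
            backward.append(hs_b)
        backward.reverse()                                  # align backward states with the time steps

        # feed each concatenated forward/backward pair into the top LSTM cell
        for f, b in zip(forward, backward):
            input_tensor = torch.cat((f, b), dim=1)         # (batch_size, hidden_size * 2)
            hs, cs = model.lstm_cell(input_tensor, (hs, cs))

        # the last hidden state of the top LSTM goes through the linear layer
        return model.linear(hs)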

    Congratulations! Up to this point we already know how to assemble the neural networks using LSTMCell in PyTorch. Now it’s time to see how we do the training phase, so let’s move on to the next section.

    恭喜你! 至此,我们已经知道如何在PyTorch中使用LSTMCell组装神经网络。 现在是时候看看我们如何进行培训了,接下来让我们继续下一节。

    训练阶段 (Training phase)

    Great, we’ve come to training. To perform the training we need to initialize the model and the optimizer, later we need to iterate for each epoch and for each mini-batch, so let’s do it!

    太好了,我们来训练了 。 为了执行训练,我们需要初始化模型优化器 ,稍后我们需要针对每个时期和每个迷你批处理进行迭代,让我们开始吧!

    Code snippet 8. Training phase
    代码段8.培训阶段
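
    The training gist is also missing here; the loop below is only a bare-bones sketch under the setup described above, and the optimizer (Adam) and loss (cross-entropy over the next character) are my assumptions, not something the article states:

    import numpy as np
    import torch
    import torch.nn as nn


    def train(model, sequences, targets, n_epochs=50, batch_size=128, lr=0.001):
        """sequences: int array (n_samples, sequence_len); targets: int array (n_samples,)."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()           # applies log-softmax internally

        for epoch in range(n_epochs):
            for i in range(0, len(sequences), batch_size):
                x_batch = torch.LongTensor(np.asarray(sequences[i:i + batch_size]))
                y_batch = torch.LongTensor(np.asarray(targets[i:i + batch_size]))

                logits = text_generator_forward(model, x_batch)   # forward pass (sketch above)
                loss = criterion(logits, y_batch)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            print(f"epoch {epoch + 1}: loss {loss.item():.4f}")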

    Once the model is trained, we will need to save the weights of the neural network to later use them to generate text. For this we have two options, the first is to define a fixed number of epochs and then save the weights, the second is to determine a stop function to obtain the best version of the model. In this particular case, we are going to opt for the first option. After training the model under a certain number of epochs, we save the weights as follows:

    训练完模型后,我们将需要保存神经网络的权重 ,以便以后使用它们生成文本 。 为此,我们有两个选择,第一个是定义固定数量的纪元 ,然后保存权重,第二个是确定停止函数以获得模型的最佳版本。 在这种情况下,我们将选择第一个选项。 在一定时期内训练模型后,我们将权重保存如下:

    Code snippet 9. Save weights
    代码段9.节省重量
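
    A typical way to do that in PyTorch (the path below is a placeholder, not necessarily the repository's actual file name):

    # save the trained weights ...
    torch.save(model.state_dict(), "weights/text_generator.pt")
    # ... and load them back later, before generating text
    model.load_state_dict(torch.load("weights/text_generator.pt"))
    model.eval()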

    Perfect, up to this point we have already seen how to train the text generator and how to save the weights, now we are going to the top part of this blog, the text generation! So let’s go to the next section.

    完美,到目前为止,我们已经了解了如何训练文本生成器以及如何节省权重 ,现在我们将转到本博客的顶部,即文本生成! 因此,让我们进入下一部分。

    文字产生 (Text generation)

    Fantastic, we have reached the final part of the blog, the text generation. For this, we need to do two things: the first is to load the trained weights and the second is to take a random sample from the set of sequences as the pattern to start generating the next character. So let’s take a look at the following code snippet:

    太棒了,我们已经到达了博客的最后一部分, 即文本生成 。 为此,我们需要做两件事:第一是加载训练后的权重 ,第二是从序列集中获取随机样本作为模式,以开始生成下一个字符。 因此,让我们看一下以下代码片段:

    Code snippet 10. Text generator
    代码段10。文本生成器
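
    As with the other snippets, the generation gist is not embedded, so the following is a sketch of the described procedure (pick a random training sequence as the seed pattern, then repeatedly predict the next character and slide the window); greedy argmax decoding is my choice here, the article does not specify the sampling strategy:

    import numpy as np
    import torch
    import torch.nn.functional as F


    def generate(model, sequences, idx_to_char, n_chars=500):
        model.eval()
        # start from a randomly chosen training sequence (the seed "pattern")
        pattern = list(sequences[np.random.randint(len(sequences))])
        generated = [idx_to_char[idx] for idx in pattern]

        with torch.no_grad():
            for _ in range(n_chars):
                x = torch.LongTensor([pattern])                    # (1, sequence_len)
                logits = text_generator_forward(model, x)          # (1, num_classes)
                probs = F.softmax(logits, dim=1).squeeze(0)
                next_idx = int(torch.argmax(probs))                # greedy choice of the next character
                generated.append(idx_to_char[next_idx])
                pattern = pattern[1:] + [next_idx]                 # slide the window one character
        return ''.join(generated)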

    So, by training the model under the following characteristics:

    因此,通过根据以下特征训练模型:

    window : 100
    epochs : 50
    hidden_dim : 128
    batch_size : 128
    learning_rate : 0.001

    we can generate the following:

    我们可以生成以下内容:

    Seed:one of the prairie swellswhich gave a little wider view than most of them jack saw quite close to thePrediction:one of the prairie swellswhich gave a little wider view than most of them jack saw quite close to the wnd banngessejang boffff we outheaedd we band r hes tller a reacarof t t alethe ngothered uhe th wengaco ack fof ace ca  e s alee bin  cacotee tharss th band fofoutod we we ins sange trre anca y w farer we sewigalfetwher d e  we n s shed pack wngaingh tthe we the we javes t supun f the har man bllle s ng ou   y anghe ond we nd ba a  she t t anthendwe wn me anom ly tceaig t i isesw arawns t d ks wao thalac tharr jad  d anongive where the awe w we he is ma mie cack seat sesant sns t imes hethof riges we he d ooushe he hang out f t thu inong bll llveco we see s the he haa is s igg merin ishe d t san wack owhe o or th we sbe se we we inange t ts wan br seyomanthe harntho thengn  th me ny we ke in acor offff  of wan  s arghe we t angorro the wand be thing a sth t tha alelllll willllsse of s wed w brstougof bage orore he anthesww were ofawe ce qur the he sbaing tthe bytondece nd t llllifsffo acke o t in ir me hedlff scewant pi t bri pi owasem the awh thorathas th we hed ofainginictoplid we me

    As we can see, the generated text may not make any sense, however there are some words and phrases that seem to form an idea, for example:

    我们可以看到,生成的文本可能没有任何意义,但是有些单词和短语似乎构成了一个想法,例如:

    we, band, pack, the, man, where, he, hang, out, be, thing, me, were

    Congratulations, we have reached the end of the blog!

    恭喜,我们已经结束了博客!

    结论 (Conclusion)

    Throughout this blog we have shown how to make an end-to-end model for text generation using PyTorch’s LSTMCell and implementing an architecture based on recurring neural networks LSTM and Bi-LSTM.

    在整个博客中,我们展示了如何使用PyTorch的LSTMCell创建端到端的文本生成模型,以及如何实现基于递归神经网络LSTMBi-LSTM的体系结构

    It is important to comment that the suggested model for text generation can be improved in different ways. Some suggested ideas would be to increase the size of the text corpus to be trained, increase the number of epochs as well as the memory size for each LSTM. On the other hand, we could think of an interesting architecture based on Convolutional-LSTM (maybe a topic for another blog).

    重要的是要评论可以以不同方式改进建议的文本生成模型。 一些建议的想法将是增加要训练的文本语料库的大小增加每个LSTM 的时期数以及内存大小 另一方面,我们可以想到一个基于卷积LSTM (也许是另一个博客的主题)的有趣架构。

    Translated from: https://towardsdatascience.com/text-generation-with-bi-lstm-in-pytorch-5fda6e7cc22c

  • Learning notes on an lstm-cnn PyTorch implementation
  • Baseline code for the Chinese word segmentation task; models include BiLSTM-CRF, BERT + X (softmax / CRF / BiLSTM + CRF) and RoBERTa + X (softmax / CRF / BiLSTM + CRF). Dataset: the second Chinese word segmentation bakeoff ...
  • A walkthrough of the PyTorch LSTM code and a custom bidirectional LSTM operator. 1. Theory: see the paper Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition ...
  • Why RNNs are hard to train (exploding and vanishing gradients) and the LSTM forget gate. The repeated products in the RNN gradient cause it to explode or vanish. Exploding gradients show up as the loss suddenly jumping, e.g. from 0.25 or 0.24 to 1.7 or 2.3; the remedy is to apply ... to the gradients.
  • Implementing LSTM in PyTorch (with code). The theory of LSTM is skipped (look it up if needed); the focus is on the code. Dataset: the IGBT accelerated-ageing dataset published by the NASA PCoE research centre, which contains IGBT ageing data under four experimental conditions ...
  • Stock prediction with LSTM, PyTorch version (120+ lines of code). There are TensorFlow versions online but no PyTorch one that I could find, so I wrote my own without referring to any TensorFlow implementation (my understanding of TensorFlow is limited); it is fairly simple and seems easy to reproduce ...
  • A PyTorch implementation of the BI-LSTM-CRF model. Features / improvements over ...: full mini-batch support; a fully vectorized implementation (in particular, all loops in the "score sentence" algorithm were removed, which greatly speeds up training); CUDA support; a very simple API ...
  • A worked PyTorch LSTM example (1): invert the scaling with inverse_transform, e.g. actual_predictions = scaler.inverse_transform(np.array(test_inputs[train_window:]).reshape(-1, 1)), print(actual_predictions), then plot the predictions against the actual values, x = np.arange(132,...
  • The small named-entity recognition (NER) example shipped with PyTorch is concise and clear and goes into the principles and implementation details, which makes it well suited to people who want to dig deeper but lack a good entry point. However, it is rather terse and omits some theoretical background, so beginners may still find it hard to ...
  • Applying a BiLSTM in PyTorch: using a BiLSTM (built on PyTorch) to solve a concrete problem, predicting the next word of a given long sentence. If you are not familiar with LSTM, read my two earlier posts on LSTM and on LSTM in PyTorch first; the post then goes straight into the code ...
  • [Deep learning] Sentiment analysis with a bidirectional LSTM in PyTorch: word2vec-style embeddings are used, and remarkably only an embedding layer needs to be added, because the network learns the embedding matrix by itself; the embedded words are then fed into the LSTM cells of a recurrent network, so the word-order information ...
  • The LSTM model structure in PyTorch: 1. the LSTM model structure, 2. the LSTM network, 3. the LSTM input format, 4. LSTM in PyTorch (4.1 the model definition, 4.2 the expected input format, 4.3 the output format), 5. combining LSTM with other networks
  • PyTorch 1.0 plus the other Python libraries listed in requirements.txt. Usage, step 1: set up a Python virtual environment: $ virtualenv .env $ source .env/bin/activate (.env) $ pip install --upgrade pip (.env) $ pip install -r ...
  • A PyTorch implementation of the convolutional LSTM (convLSTM) proposed in "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting" for precipitation forecasting; the architecture models both the spatial correlations in the input and the temporal ...
  • A PyTorch LSTM implementation: a question about code for training an LSTM model in PyTorch.
