  • Recurrent/Recursive Neural Networks

    1. In natural language processing, each word of the input language is fed into the neural network as a vector. How, then, is language converted into vector form?

    The usual approach is 1-of-N (one-hot) encoding, which works as follows:

    For the underlying principle, see the notes at:

    http://blog.csdn.net/chloezhao/article/details/53484471
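
    As a small, hedged illustration (a sketch with a made-up toy vocabulary, not the code from the referenced notes), 1-of-N encoding can be written in Python as:

    # Minimal sketch of 1-of-N (one-hot) encoding over a toy vocabulary.
    vocab = ["apple", "bag", "cat", "dog", "elephant"]
    word_to_index = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        # All zeros except a single 1 at the word's index.
        vec = [0] * len(vocab)
        vec[word_to_index[word]] = 1
        return vec

    print(one_hot("bag"))  # [0, 1, 0, 0, 0]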

    2. The structure of a Long Short-Term Memory (LSTM) unit is shown in the figure below:

    As the figure shows, an LSTM has three gates and one memory cell. As the name suggests, a gate has two states, open and closed, so each of the three gates has its own control signal that sets its state. An ordinary neural-network node has one input and one output, whereas an LSTM unit has four inputs (one network input plus three gate-control signals) and one output.

    Its operation is illustrated in the figure below:

    The activation function here is usually the sigmoid, whose output lies between 0 and 1. Note that the forget gate acts on the memory by multiplication, while the gated input is combined with the memory by addition.
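
    To make the "four inputs, one output" description concrete, here is a hedged sketch of one LSTM step on scalar values, using the common sigmoid-gate formulation (the weight names are illustrative, not taken from the figures):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def lstm_step(x, c_prev, w_in, w_i, w_f, w_o):
        # Four inputs derived from x: the candidate value plus three gate signals.
        g = math.tanh(w_in * x)        # value offered to the memory cell
        i = sigmoid(w_i * x)           # input gate
        f = sigmoid(w_f * x)           # forget gate
        o = sigmoid(w_o * x)           # output gate
        c = f * c_prev + i * g         # forget multiplies the old memory, the gated input is added
        y = o * math.tanh(c)           # single output
        return y, c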

    A worked LSTM example follows.

    The input vector is x and the output is y.

    Here x1 is the data the network actually processes; the roles of x2 and x3 are defined as follows:

    Note that both the forget gate and the input gate are controlled by x2, so x2 can take three values: -1, 0, and 1. x3 is the output-gate control signal. The actual processing is as follows:

    The first row of numbers shows how the value stored in memory changes.

    The concrete processing steps are shown below (only some of them are listed).

    Notice in the figure that every node receives the same inputs; what makes each node act differently is the weight given to each of its inputs. Besides x1, x2 and x3 there is one more input, a constant 1, whose purpose is explained below. Looking at the gates from the point of view of these weights: in the LSTM unit's input module only the weight on x1 is non-zero and all the others are zero, so only x1 affects the module's input. Likewise, for the input gate and the forget gate only x2 and the constant 1 have an effect, and for the output gate only x3 and the constant 1 have an effect.

    From the analysis above, the constant 1 only plays a role at the input, output and forget gates, and its effect at the forget gate differs from its effect at the other two. The role of the 1 is best explained starting from the activation function: the gate nodes of an LSTM normally use the sigmoid, whose output lies between 0 and 1, while the effective output we want is one of the two states 0 or 1. So the sigmoid output is thresholded at 0.5: anything above 0.5 is treated as 1, otherwise as 0.

    Because the input gate and the output gate give the constant 1 the same weight, they behave the same way, so it is enough to analyse one of them. For the input gate, x2 is the controlling signal and takes the three values -1, 0 and 1. When x2 = 1, the sigmoid's input is 100 - 10 = 90, its output is above 0.5, and after thresholding the gate outputs 1, i.e. the input gate opens. When x2 = -1, the sigmoid's input is -110, the output is below 0.5, it thresholds to 0, and the gate closes. When x2 = 0 the constant 1 shows its purpose: without it the sigmoid's input would be 0 and its output exactly 0.5, which cannot be cleanly mapped to 0 or 1; with the constant 1 the sigmoid's input is -10, the output is below 0.5, and it thresholds to 0. The output gate works in the same way.

    For the forget gate, only x2 = -1 has an effect: it resets the memory, i.e. it drives the sigmoid output to 0. When x2 = -1 the sigmoid's input is -90, the output is below 0.5 and thresholds to 0, so the memory is reset; when x2 = 1 the thresholded sigmoid output is 1; and when x2 = 0, without the constant 1 we would again be unable to threshold to 0 or 1, whereas with it the output thresholds to 1.

    The data flow is illustrated in the figure below:

    Initially the value in memory is 0.

    After the first column of values is fed in, each node's output is as follows:

    After the second column is fed in, the outputs are as follows; note that the value in memory is added to the previous one.

    Third column:

    Fourth column; the output is 7.

    Fifth column:

    At this point the whole example has finished running.

    This concludes the illustrated walkthrough of the LSTM network structure.

    3. Characteristics of LSTM

    An ordinary neural network looks like this:

    After replacing each neuron with an LSTM unit, the number of parameters becomes four times the original.

    4. LSTM Variants

    In the figure above, the vector c denotes the value already stored in memory; the input vector xt, transformed by different sets of weights, produces four vectors z that correspond to the four inputs of the LSTM.

    In the figure above, each LSTM input also receives the previous LSTM's memory value and output value, which are fed to the four input ports. Equivalently, the memory value and output of the previous step are routed to the four input ports of the next LSTM; this connection is called a peephole connection.

    Multi-layer connection:

    The form shown here is the standard one.

    So what does an LSTM mainly learn?

    It mainly learns the weights that feed the memory-cell value back into the memory at the next step, such as the value w in the figure.

    The learning algorithm is Back-Propagation Through Time (BPTT); its principles are covered later.

    However, the RNN discussed above does not always train to a satisfactory result in practice; the cost function sometimes fluctuates wildly, as shown below:

    The main reason is that on the cost function's error surface the gradient either vanishes or explodes, as shown below:

    The root cause is the learned parameter w, for the following reason:

    If we simply fix w to 1, which amounts to adding the old memory value directly into the new memory, the vanishing-gradient problem can be avoided, as shown below:

    Because w is held constant at 1, it follows from the diagram above that:

    When w = 1 the output equals the input and is not altered by w, so as long as the input is non-zero the output y never becomes 0, and hence the gradient does not vanish.

    An LSTM with w fixed to 1 in this way is the GRU unit.

  • CNTK - Recurrent Neural Network

    CNTK - Recurrent Neural Network

    Now, let us understand how to construct a Recurrent Neural Network (RNN) in CNTK.

    Introduction

    We learned how to classify images with a neural network, which is one of the iconic jobs in deep learning. But another area where neural networks excel, and where a lot of research is happening, is Recurrent Neural Networks (RNN). Here, we are going to see what an RNN is and how it can be used in scenarios where we need to deal with time-series data.

    What is Recurrent Neural Network?

    Recurrent neural networks (RNNs) may be defined as the special breed of NNs that are capable of reasoning over time. RNNs are mainly used in scenarios, where we need to deal with values that change over time, i.e. time-series data. In order to understand it in a better way, let’s have a small comparison between regular neural networks and recurrent neural networks −

    • As we know, in a regular neural network we can provide only one input, which limits it to producing only one prediction. To give an example, we can do a text-translation job by using regular neural networks.

    • On the other hand, in recurrent neural networks, we can provide a sequence of samples that result in a single prediction. In other words, using RNNs we can predict an output sequence based on an input sequence. For example, there have been quite a few successful experiments with RNN in translation tasks.

    Uses of Recurrent Neural Network

    RNNs can be used in several ways. Some of them are as follows −

    Predicting a single output

    Before diving deep into the steps of how an RNN can predict a single output based on a sequence, let's see what a basic RNN looks like −

    [Figure: a basic RNN producing a single output]

    As we can see in the above diagram, the RNN contains a loopback connection to the input, and whenever we feed in a sequence of values it will process each element in the sequence as a time step.

    Moreover, because of the loopback connection, RNN can combine the generated output with input for the next element in the sequence. In this way, RNN will build a memory over the whole sequence which can be used to make a prediction.

    In order to make prediction with RNN, we can perform the following steps−

    • First, to create an initial hidden state, we need to feed the first element of the input sequence.

    • After that, to produce an updated hidden state, we need to take the initial hidden state and combine it with the second element in the input sequence.

    • At last, to produce the final hidden state and to predict the output for the RNN, we need to take the final element in the input sequence.

    In this way, with the help of this loopback connection we can teach a RNN to recognize patterns that happen over time.

    Predicting a sequence

    The basic RNN model discussed above can be extended to other use cases as well. For example, we can use it to predict a sequence of values based on a single input. In this scenario, in order to make predictions with the RNN we can perform the following steps −

    • First, to create an initial hidden state and predict the first element in the output sequence, we need to feed an input sample into the neural network.

    • After that, to produce an updated hidden state and the second element in the output sequence, we need to combine the initial hidden state with the same sample.

    • At last, to update the hidden state one more time and predict the final element in output sequence, we feed the sample another time.

    Predicting sequences

    We have seen how to predict a single value based on a sequence and how to predict a sequence based on a single value. Now let's see how we can predict sequences from sequences. In this scenario, in order to make predictions with the RNN we can perform the following steps −

    • First, to create an initial hidden state and predict the first element in the output sequence, we need to take the first element in the input sequence.

    • After that, to update the hidden state and predict the second element in the output sequence, we need to take the initial hidden state.

    • At last, to predict the final element in the output sequence, we need to take the updated hidden state and the final element in the input sequence.

    Working of RNN

    To understand the working of recurrent neural networks (RNNs) we need to first understand how the recurrent layers in the network work. So first let's discuss how we can predict the output with a standard recurrent layer.

    Predicting output with standard RNN layer

    As we discussed earlier, a basic layer in an RNN is quite different from a regular layer in a neural network. In the previous section, we also showed the basic architecture of an RNN in a diagram. In order to update the hidden state for the first time step in the sequence we can use the following formula −

    [Formula image: RNN layer update]

    In the above equation, we calculate the new hidden state by calculating the dot product between the initial hidden state and a set of weights.

    Now for the next step, the hidden state for the current time step is used as the initial hidden state for the next time step in the sequence. That’s why, to update the hidden state for the second time step, we can repeat the calculations performed in the first-time step as follows −

    [Formula image: hidden-state update for the second time step]

    Next, we can repeat the process of updating the hidden state for the third and final step in the sequence as below −

    [Formula image: hidden-state update for the final time step]

    And when we have processed all the above steps in the sequence, we can calculate the output as follows −

    [Formula image: calculating the output]

    For the above formula, we have used a third set of weights and the hidden state from the final time step.

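    The formula images are not reproduced here, but from the description above (a dot product of the hidden state with one set of weights, the input with another, and a third set of weights applied to the final hidden state) the computation can be sketched in LaTeX as follows; the symbols W_in, W_rec and W_out are placeholders, not necessarily the names used in the original images −

    h_1 = f(W_{rec} h_0 + W_{in} x_1)
    h_2 = f(W_{rec} h_1 + W_{in} x_2)
    h_3 = f(W_{rec} h_2 + W_{in} x_3)
    \hat{y} = g(W_{out} h_3)
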
    Advanced Recurrent Units

    The main issue with the basic recurrent layer is the vanishing gradient problem, which makes it not very good at learning long-term correlations. In simple words, a basic recurrent layer does not handle long sequences very well. That is why there are other recurrent layer types that are much better suited to working with longer sequences, as follows −

    Long-Short Term Memory (LSTM)

    [Figure: LSTM architecture]

    Long-short term memory (LSTMs) networks were introduced by Hochreiter & Schmidhuber. It solved the problem of getting a basic recurrent layer to remember things for a long time. The architecture of LSTM is given above in the diagram. As we can see it has input neurons, memory cells, and output neurons. In order to combat the vanishing gradient problem, Long-short term memory networks use an explicit memory cell (stores the previous values) and the following gates −

    • Forget gate− As name implies, it tells the memory cell to forget the previous values. The memory cell stores the values until the gate i.e. ‘forget gate’ tells it to forget them.

    • Input gate− As name implies, it adds new stuff to the cell.

    • Output gate− As name implies, output gate decides when to pass along the vectors from the cell to the next hidden state.

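    For reference, the standard LSTM update can be sketched as follows (a common textbook formulation; CNTK's internal parameterisation may differ in detail, and the symbols are not taken from this tutorial) −

    f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)        % forget gate
    i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)        % input gate
    o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)        % output gate
    c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)   % memory cell
    h_t = o_t \odot \tanh(c_t)                       % next hidden state
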
    Gated Recurrent Units (GRUs)

    [Figure: GRU architecture]

    Gated recurrent units (GRUs) are a slight variation of the LSTM network. They have one gate fewer and are wired slightly differently than LSTMs. The architecture is shown in the above diagram. It has input neurons, gated memory cells, and output neurons. A Gated Recurrent Unit network has the following two gates −

    • Update gate− It determines the following two things−

      • What amount of the information should be kept from the last state?

      • What amount of the information should be let in from the previous layer?

    • Reset gate− The functionality of reset gate is much like that of forget gate of LSTMs network. The only difference is that it is located slightly differently.

    In contrast to Long-short term memory network, Gated Recurrent Unit networks are slightly faster and easier to run.

    Creating RNN structure

    Before we can start making predictions about the output from any of our data sources, we first need to construct the RNN, and constructing an RNN is quite similar to how we built a regular neural network in the previous section. Following is the code to build one −

    
    from cntk.losses import squared_error
    from cntk.io import CTFDeserializer, MinibatchSource, INFINITELY_REPEAT, StreamDefs, StreamDef
    from cntk.learners import adam
    from cntk.logging import ProgressPrinter
    from cntk.train import TestConfig
    BATCH_SIZE = 14 * 10
    EPOCH_SIZE = 12434
    EPOCHS = 10
    
    

    Stacking multiple layers

    We can also stack multiple recurrent layers in CNTK. For example, we can use the following combination of layers−

    
    from cntk import sequence, default_options, input_variable
    from cntk.layers import Recurrence, LSTM, Dropout, Dense, Sequential, Fold
    features = sequence.input_variable(1)
    with default_options(initial_state = 0.1):
       model = Sequential([
          Fold(LSTM(15)),
          Dense(1)
       ])(features)
    target = input_variable(1, dynamic_axes=model.dynamic_axes)
    
    

    As we can see in the above code, we have the following two ways in which we can model RNN in CNTK −

    • First, if we only want the final output of a recurrent layer, we can use the Fold layer in combination with a recurrent layer, such as GRU, LSTM, or even RNNStep.

    • Second, as an alternative, we can also use the Recurrence block, as sketched after this list.

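    As a hedged sketch of the second option (same layer sizes as above; Recurrence applies the LSTM step at every element of the input sequence, so the model emits one value per time step instead of only the final one):

    from cntk import sequence, default_options, input_variable
    from cntk.layers import Recurrence, LSTM, Dense, Sequential

    features = sequence.input_variable(1)
    with default_options(initial_state = 0.1):
       # Unlike Fold, Recurrence keeps the whole output sequence.
       model = Sequential([
          Recurrence(LSTM(15)),
          Dense(1)
       ])(features)
    target = input_variable(1, dynamic_axes=model.dynamic_axes)
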
    Training RNN with time series data

    Once we build the model, let’s see how we can train RNN in CNTK −

    
    from cntk import Function
    @Function
    def criterion_factory(z, t):
       loss = squared_error(z, t)
       metric = squared_error(z, t)
       return loss, metric
    loss = criterion_factory(model, target)
    learner = adam(model.parameters, lr=0.005, momentum=0.9)
    
    

    Now, to load the data into the training process, we have to deserialize sequences from a set of CTF files. The following code defines the create_datasource function, a useful utility for creating both the training and the test data source.

    
    # Note: the original listing is missing the function header; it is reconstructed here
    # from the two calls below (sweeps defaults to INFINITELY_REPEAT, imported earlier).
    def create_datasource(filename, sweeps=INFINITELY_REPEAT):
       target_stream = StreamDef(field='target', shape=1, is_sparse=False)
       features_stream = StreamDef(field='features', shape=1, is_sparse=False)
       deserializer = CTFDeserializer(filename, StreamDefs(features=features_stream, target=target_stream))
       datasource = MinibatchSource(deserializer, randomize=True, max_sweeps=sweeps)
       return datasource
    train_datasource = create_datasource('Training data filename.ctf')   # location of the training file created from our dataset
    test_datasource = create_datasource('Test filename.ctf', sweeps=1)   # location of the test file created from our dataset
    
    

    Now that we have set up the data sources, the model, and the loss function, we can start the training process. It is quite similar to what we did in previous sections with basic neural networks.

    
    progress_writer = ProgressPrinter(0)
    test_config = TestConfig(test_datasource)
    input_map = {
       features: train_datasource.streams.features,
       target: train_datasource.streams.target
    }
    history = loss.train(
       train_datasource,
       epoch_size=EPOCH_SIZE,
       parameter_learners=[learner],
       model_inputs_to_streams=input_map,
       callbacks=[progress_writer, test_config],
       minibatch_size=BATCH_SIZE,
       max_epochs=EPOCHS
    )
    
    

    We will get output similar to the following −

    Output −

    
    average  since  average  since  examples
    loss      last  metric  last
    ------------------------------------------------------
    Learning rate per minibatch: 0.005
    0.4      0.4    0.4      0.4      19
    0.4      0.4    0.4      0.4      59
    0.452    0.495  0.452    0.495   129
    […]
    
    

    Validating the model

    Actually, predicting with an RNN is quite similar to making predictions with any other CNTK model. The only difference is that we need to provide sequences rather than single samples.

    Now that our RNN is finally done with training, we can validate the model by testing it using a few sample sequences as follows −

    
    import pickle

    with open('test_samples.pkl', 'rb') as test_file:
       test_samples = pickle.load(test_file)

    # NORMALIZE is assumed to be the scaling constant used when the dataset was prepared.
    model(test_samples) * NORMALIZE
    
    

    Output −

    
    array([[ 8081.7905],
    [16597.693 ],
    [13335.17 ],
    ...,
    [11275.804 ],
    [15621.697 ],
    [16875.555 ]], dtype=float32)
    
    

    Translated from: https://www.tutorialspoint.com/microsoft_cognitive_toolkit/microsoft_cognitive_toolkit_recurrent_neural_network.htm

  • FAU Lecture Notes on Deep Learning: Recurrent Neural Networks (Part 5)

    FAU LECTURE NOTES ON DEEP LEARNING

    These are the lecture notes for FAU’s YouTube Lecture “Deep Learning”. This is a full transcript of the lecture video & matching slides. We hope, you enjoy this as much as the videos. Of course, this transcript was created with deep learning techniques largely automatically and only minor manual modifications were performed. Try it yourself! If you spot mistakes, please let us know!

    Navigation

    Previous Lecture / Watch this Video / Top Level / Next Lecture

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    Welcome back to the final part of our video series on recurrent neural networks! Today, we want to talk a bit about the sampling of recurrent neural networks. When I mean sampling, I mean that we want to use recurrent neural networks to actually generate sequences of symbols. So, how can we actually do that?

    Well, if you train your neural networks in the right way. You can actually create them in a way that they predict the probability distribution of the next element. So, if I train them to predict the next symbol in the sequence, you can also use them actually for generating sequences. The idea here is that you start with the empty symbol and then you use the RNN to generate some output. Then, you take this output and put it into the next state’s input. If you go ahead and do so, then you can see that you can actually generate whole sequences from your trained recurrent neural network.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    So, the simple strategy is to perform a greedy search. So here we start with the empty symbol. Then, we just pick the most likely element as the input to the RNN in the next state and generate the next one and the next one and the next one and this generates exactly one sample sequence per experiment. So, this would be a greedy search and you can see that we exactly get one sentence that is constructed here. The sentence that we are constructing here is “let’s go through time”. Well, the drawback is, of course, there is no look-ahead possible. So, let’s say the most likely word after “let’s go” is “let’s”. So you could be generating loops like “let’s go let’s go” and so on. So, you’re not able to detect that “let’s go through time” has a higher total probability. So, it tends to repeat sequences of frequent words “and”, “the”, “some” and so on in speech.

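    As a minimal sketch of this greedy strategy in Python (rnn_step, the start symbol and the end symbol are placeholders standing in for a trained model, not the lecture's code):

    def greedy_decode(rnn_step, start_symbol, end_symbol, max_len=20):
        # rnn_step(symbol, state) is assumed to return (probabilities, new_state),
        # where probabilities maps each possible next symbol to its probability.
        state, symbol, output = None, start_symbol, []
        for _ in range(max_len):
            probs, state = rnn_step(symbol, state)
            symbol = max(probs, key=probs.get)   # always pick the single most likely symbol
            if symbol == end_symbol:
                break
            output.append(symbol)
        return output
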
    [Image: CC BY 4.0 from the Deep Learning Lecture]

    Now, we are interested in alleviating this problem. This can be done with a beam search. Now, the beam search concept is to select the k most likely elements. k is essentially the beam width or size. So, here you then roll out k possible sequences. You have the one with these k elements as prefix and take the k most probable ones. So, in the example that we show here on the right-hand side, we start with the empty word. Then, we take the two most likely ones which would be “let’s” and “through”. Next, we generate “let’s” as output if we take “through”. If we take “let’s”, we generate “go” and we can continue this process and with our beam of the size of two. We can keep the two most likely sequences in the beam search. So now, we generate two sequences at a time. One is “let’s go through time” and the other one is “through let’s go time”. So, you see that we can use this beam idea to generate multiple sequences. In the end, we can determine which one we like best or which one generated the most total probability. So, we can generate multiple sequences in one go which typically then also contains better sequences than in the greedy search. I would say this is one of the most common techniques actually to sample from an RNN.

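    A hedged sketch of beam search with beam width k, under the same assumed rnn_step interface as in the greedy sketch above (log-probabilities are summed so that whole sequences can be compared):

    import math

    def beam_search(rnn_step, start_symbol, k=2, steps=4):
        # Each beam entry is (sequence so far, log-probability, hidden state, last symbol).
        beams = [([], 0.0, None, start_symbol)]
        for _ in range(steps):
            candidates = []
            for seq, logp, state, symbol in beams:
                probs, new_state = rnn_step(symbol, state)
                for nxt, p in probs.items():
                    candidates.append((seq + [nxt], logp + math.log(p), new_state, nxt))
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]   # keep the k best
        return [(seq, logp) for seq, logp, _, _ in beams]
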
    [Image: CC BY 4.0 from the Deep Learning Lecture]

    Of course, there are also other things like random sampling. Here, the idea is that you select the next one according to the output probability distribution. You remember, we encoded our word as one-hot-encoded vectors. Then, we can essentially interpret the output of the RNN as a probability distribution and sample from it. This then allows us to generate many different sequences. So let’s say if “let’s” has an output probability of 0.8, it is sampled 8 out of 10 times as the next word. This creates very diverse results and it may look too random. So, you see here we get quite diverse results and the sequences that we are generating here. There’s quite some randomness that you can also observe in the generated sequences. To reduce the randomness, you can increase the probability or decrease the probability of probable or less probable words. This can be done for example by temperature sampling. Here you see that we introduced this temperature 𝜏 that we then use in order to steer the probability sampling. This is a common technique that you have already seen in various instances in this class.

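    Random and temperature sampling can be sketched as follows (again only a sketch; raising each probability to the power 1/τ and renormalising is equivalent to dividing the logits by the temperature τ, so τ < 1 sharpens the distribution and τ > 1 flattens it):

    import random

    def sample_with_temperature(probs, tau=1.0):
        # probs is a dict mapping symbols to the output probabilities of the RNN.
        weights = {s: p ** (1.0 / tau) for s, p in probs.items()}
        total = sum(weights.values())
        r = random.uniform(0.0, total)
        running = 0.0
        for symbol, w in weights.items():
            running += w
            if r <= running:
                return symbol
        return symbol   # guard against floating-point rounding
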
    [Image: Character-level RNNs. CC BY 4.0 from the Deep Learning Lecture]

    So let’s look into some examples and one thing that I found very interesting is character-based language modeling with RNNs. There’s a great blog post by Andrew Kaparthy which we have here. I also put it as a link to the description below. There he essentially trained an RNN for text generation based on Shakespeare. It’s trained on the character level. So, you only have one character as input and then you generate the sequence. It generates very interesting sequences. So here, you can see typical examples that have been generated. Let me read this to you:

    “Pandarus Alas I think he shall be come approached and the dayWhen little srain would be attain’d into being never fed, And who is but a chain and subjects of his death, I should not sleep.”

    Excerpt from Karpathy's blog

    and so on. So, you can see that this is very interesting that the type of language that is generated this very close to Shakespeare but if you read through these examples, you can see that they’re essentially complete nonsense. Still, it’s interesting that the tone of the language that is generated is still present and is very typical for Shakespeare. So, that’s really interesting.

    [Image: Composing folk music. CC BY 4.0 from the Deep Learning Lecture]

    Of course, you can generate many, many other things. One of a very nice example that I want to show to you today is composing folk music. So, music composition is typically tackled with RNNS and you can find different examples in literature, also by Jürgen Schmidhuber. The idea here is to use bigger deeper networks to generate folk music. So, what they employ is a character level RNN using ABC format including generating the title. So one example that I have here is this small piece of music. Yeah, as you can hear, it is really folk music. So, this is completely automatically generated. Interesting isn’t it? If you listen very closely, then you can also hear that folk music may be particularly suited for this because you could argue it’s kind a bit of repetitive. Still, it’s pretty awesome that the entire song is completely automatically generated. There are actually people meeting playing computer-generated songs like these folks on real instruments. Very interesting observation. So, I also put the link here for your reference if you’re interested in this. You can listen to many more examples on this website.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    So there are also RNNs for non-sequential tasks. RNNs can also be used for stationary inputs like image generation. Then, the idea is to model the process from rough sketch to final image. You can see one example here where we start essentially by drawing numbers from blurry to sharp. In this example, they use an additional attention mechanism telling the network where to look. This then generates something similar to brushstrokes. It actually uses a variational autoencoder which we will talk about when we talk on the topic of unsupervised deep learning.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    So let’s summarize this a little bit. You’ve seen recurrent neural networks are able to directly model sequential algorithms. You train via truncated backpropagation through time. The simple units suffer extremely from the exploding and vanishing gradients. We have seen that the LSTMs and GRUs are improved RNNs that explicitly model this forgetting and remembering operation. What we haven’t talked about is that there are many, many more developments that we can’t cover in this short lecture. So, it would be interesting also to talk about memory networks, neural Turing machines, and what we only touched at the moment is attention and recurrent neural networks. We’ll talk a bit more about attention in one of the next videos as well.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    So, next time in deep learning, we want to talk about visualization. In particular, we want to talk about visualizing architectures the training process, and of course also the inner workings of the network. We want to figure out what is actually happening inside the network and there are quite a few techniques — and to be honest — we’ve already seen some of them earlier in this class. In this lecture, we will really want to look into those methods and understand how they actually work in order to figure out what’s happening inside of deep neural networks. One interesting observation is that this is also related to neural network art. Another thing that deserves some little more thought is attention mechanisms and this will also be covered in one of the videos very soon to follow.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    So, I have some comprehensive questions: “What’s the strength of RNNs compared to feed-forward networks?” Then, of course: “How do you train an RNN?”, “What are the challenges?”, “What’s the main idea behind LSTMs?” So you should be able to describe the unrolling of RNNs during the training. You should be able to describe the Elman cell, the LSTM, and the GRU. So, these are really crucial things that you should know if you have to take some tests in the very close future. So, better be prepared for questions like this one. Ok, we have some further reading below. There’s this very nice blog post by Andrew Kaparthy. There is a very cool blog post about CNN’s for a machine translation that I really recommend reading and a cool blog post for music generation which you can also find below. Of course, we also have plenty of scientific references. So, I hope you enjoyed this video and see you in the next one. Bye-bye!

    If you liked this post, you can find more essays here, more educational material on Machine Learning here, or have a look at our Deep Learning Lecture. I would also appreciate a follow on YouTube, Twitter, Facebook, or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced. If you are interested in generating transcripts from video lectures try AutoBlog.

    RNN Folk Music

    FolkRNN.org · MachineFolkSession.com · The Glass Herry Comment 14128

    Links

    Character RNNs · CNNs for Machine Translation · Composing Music with RNNs

    Translated from: https://towardsdatascience.com/recurrent-neural-networks-part-5-885fc3357792

  • FAU Lecture Notes on Deep Learning: Recurrent Neural Networks (Part 2)

    FAU LECTURE NOTES ON DEEP LEARNING

    These are the lecture notes for FAU’s YouTube Lecture “Deep Learning”. This is a full transcript of the lecture video & matching slides. We hope, you enjoy this as much as the videos. Of course, this transcript was created with deep learning techniques largely automatically and only minor manual modifications were performed. If you spot mistakes, please let us know!

    Navigation

    Previous Lecture / Watch this Video / Top Level / Next Lecture

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    Welcome back to deep learning! Today we want to talk a little bit more about recurrent neural networks and in particular look into the training procedure. So how does our RNN training work? Let’s look at a simple example and we start with a character level language model. So, we want to learn a character probability distribution from an input text and our vocabulary is going to be very easy. It’s gonna be the letters h, e, l, and o. We’ll encode them as one-hot vectors which then gives us for example for h the vector (1 0 0 0)ᵀ. Now, we can go ahead and train our RNN on the sequence “hello” and we should learn that given “h” as the first input, the network should generate the sequence “hello”. Now, the network needs to know previous inputs when presented with an l because it needs to know whether it needs to generate an l or an o. It’s the same input but two different outputs. So, you have to know the context.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    Let’s look at this example and here you can already see how the decoding takes place. So, we put in essentially on the input layer again as one-hot encoded vectors the inputs. Then, we produce the hidden state h subscript t with the matrices that we’ve seen previously and produce outputs and you can see. Now, we feed in the different letters and this then produces some outputs that can then be mapped via one-hot encoding back to letters. So, this gives us essentially the possibility to run over the entire sequence and produce the desired outputs. Now, for the training, the problem is how can we determine all of these weights? Of course, we want to maximize these weights with respect to predicting the correct component.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    This all can be achieved with the backpropagation through time algorithm. The idea is we train on the unfolded network. So here’s a short sketch on how to do this. The idea is that we unfold the network. So, we compute the forward path for the full sequence and then we can apply the loss. So, we essentially then backpropagate over the entire sequence such that even things that happen in the very last state can have an influence on the very beginning. So, we compute the backward pass through the full sequence to get the gradients and the weight update. So, for one update with backpropagation through time, I have to unroll this complete network that then is generated by the input sequence. Then, I can compare the output that was created with the desired output and compute the update.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    So, let’s look at this in a little bit more detail. The forward pass is, of course, just the computation of the hidden states and the output. So, we know that we have some input sequence that is x subscript 1 to x subscript T, where T is the sequence length. Now, I just repeat update our u subscript t which is the linear part before the respective activation function. Then, we compute the activation function to get our new hidden state then we compute the o subscript t which is essentially the linear part before the sigmoid function. Then, we apply the sigmoid to produce the y hat that is essentially the output of our network.

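    Written out as formulas (a hedged reconstruction from the spoken description above and the gradient derivation below; the weight matrices are W_xh, W_hh and W_hy, with biases b_h and b_y):

    u_t = W_{hh} h_{t-1} + W_{xh} x_t + b_h          % linear part before the activation
    h_t = \tanh(u_t)                                 % new hidden state
    o_t = W_{hy} h_t + b_y                           % linear part before the sigmoid
    \hat{y}_t = \sigma(o_t)                          % network output
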
    [Image: meme from imgflip]

    If we do so, then we can unroll the entire network and produce all of the respective information that we need to then actually compute the update for the weights.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    Now the backpropagation through time then essentially produces a loss function. Now, the loss function is summing up essentially the losses that we already know from our previous lectures, but we sum it up over the actual observations at every time t. So, we can, for example, take cross-entropy, then we compare the predicted output with the ground truth and compute the gradient of the loss function in a similar way as we already know it. We want to get the parameter update for our parameter vector θ that is composed of those three matrices, the two bias vectors, and the vector h. So, the update of the parameters can then also be done using a learning rate in a very similar way as we have been doing this throughout the entire class. Now, the question is, of course, how do we get those derivatives and the idea is now to go back in time through the entire network.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    So what do we do? Well, we start at time t equals T and then iteratively compute the gradients for T up to 1. So just keep in mind that our y hat was produced by the sigma of o subscript t which is composed of those two matrices. So, if we want to compute the partial derivative with respect to o subscript t, then we need the derivative of the sigmoid functions of o subscript t times the partial derivative of the loss function with respect to y hat subscript t. Now, you can see that the gradient with respect to W subscript hy is going to be given as the gradient of o subscript t times h subscript t transpose. The gradient with respect to the bias is going to be given simply as the gradient of o subscript t. So, the gradient hsubscript t now depends on two elements: the hidden state that is influenced by o subscript t and the next hidden state hsubscript t+1. So, we can get the gradient of h subscript t as the partial derivative of h subscript t+1 with respect to hsubscript t transpose times the gradient of h subscript t+1. Then, we still have to add the partial derivative of o subscript t with respect to h subscript t transposed times the gradient of o subscript t. This can then be expressed as the weight matrix W subscript hh transpose times the tangens hyperbolicus derivative of W subscript hh times h subscript t plus Wsubscript xh times x subscript t+1 plus the bias h multiplied with the gradient of h subscript t+1 plus W subscript hy transposed times the gradient of o subscript t. So, you can see that we can also implement this gradient with respect to matrices. Now, you already have all the updates for the hidden state.

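    In symbols, the derivation just described amounts to the following (a hedged transcription of the spoken math, using the notation of the forward pass above; ⊙ denotes element-wise multiplication):

    \nabla_{o_t} L = \sigma'(o_t) \odot \nabla_{\hat{y}_t} L
    \nabla_{W_{hy}} L = (\nabla_{o_t} L) \, h_t^\top
    \nabla_{b_y} L = \nabla_{o_t} L
    \nabla_{h_t} L = W_{hh}^\top \big[ (1 - \tanh^2(W_{hh} h_t + W_{xh} x_{t+1} + b_h)) \odot \nabla_{h_{t+1}} L \big] + W_{hy}^\top \nabla_{o_t} L
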
    [Image: CC BY 4.0 from the Deep Learning Lecture]

    Now, we also want to compute the updates for the other weight matrices. So, let’s see how this is possible. We now have established essentially the way of computing the derivative with respect to our h subscript t. So, now we can already propagate through time. So for each t, we essentially get one element in the sum and because we can compute the gradient h subscript t, we can now get the remaining gradients. In order to compute h subscript t, you see that we need the tanh of u subscript t which then contains the remaining weight matrices. So we essentially get the derivative respect to the two missing matrices and the bias. By using the gradient h subscript t times the tangens hyperbolicus derivative of u subscript t. Then, depending on which matrix you want to update, it’s gonna be h subscript t-1 transpose, or x subscript t transpose. For the bias, you don’t need to multiply with anything extra. So, these are essentially the ingredients that you need in order to compute the remaining updates. What we see now is that we can compute the gradients, but they are dependent on t. Now, the question is how do we get the gradient for the sequence. What we see is that the network that emerges in the unrolled state is essentially a network of shared weights. This means that we can update simply by the sum over all time steps. So this then allows us to compute essentially all the updates for the weights and every time t. Then, the final gradient update is gonna be the sum of all those gradient steps. Ok, so we’ve seen how to compute all these steps and yes: It’s maybe five lines of pseudocode, right?

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    Well, there are some problems with normal backpropagation through time. You need to unroll the entire sequence and for long sequences and complex networks, this can mean a lot of memory consumption. A single parameter update is very expensive. So, you could do a splitting approach like the naive approach that we’re suggesting here, but if you would just split the sequence into batches and then start again initializing the hidden state, then you can probably train but you lose dependencies over long periods of time. In this example, the first input can never be connected to the last output here. So, we need a better idea of how to proceed and save memory and, of course, there’s an approach to do so. This is called the truncated backpropagation through time algorithm.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    Now, the truncated backpropagation through time algorithm keeps the processing of the sequence as a whole, but it adapts the frequency and depth of the updates. So every k₁ time steps, you run a backpropagation through time for k₂ time steps and the parameter update is gonna be cheap if k₂ is small. The hidden states are still exposed to many time steps as you will see in the following. So, the idea is for time t from 1 to T to run our RNN for one step computing h subscript t and ysubscript t and then if we are at the k₁ step, then we run backpropagation through time from T down to t minus k₂.

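    A hedged sketch of this schedule (rnn_step, backprop_through_time and apply_update are placeholders for the actual model code; every k1 steps the gradients are propagated back over only the last k2 steps, while the hidden state itself is carried forward and never reset):

    def truncated_bptt(sequence, rnn_step, backprop_through_time, apply_update, k1=4, k2=4):
        h = None            # hidden state is kept across windows, not discarded
        history = []        # cached activations needed by the backward pass
        for t, (x_t, y_t) in enumerate(sequence, start=1):
            h, cache = rnn_step(x_t, h)
            history.append((cache, y_t))
            if t % k1 == 0:
                grads = backprop_through_time(history[-k2:])   # backward over the last k2 steps only
                apply_update(grads)
        return h
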
    [Image: CC BY 4.0 from the Deep Learning Lecture]

    This then emerges in the following setup: What you can see here is that we essentially step over 4 time steps. If we are in the fourth time step, then we can backpropagate through time until the beginning of the sequence. Once we did that, we process ahead and we always keep the hidden state. We don’t discard it. So, we can model this interaction. So, does this solve all of our problems? Well, no because if we have a very long temporal context, it will not be able to update. So let’s say, the first element is responsible for changing something in the last element of your sequence, then you see they will never be connected. So, we are not able to learn this long temporal context anymore. This is a huge problem with long term dependency and basic RNNs.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    So, let’s say you have this long term dependency. You want to predict the next word in “the clouds are in the sky”. You can see that the clouds are probably a relevant context for this. Here, the context information is rather nearby. So, we can encode it in the hidden state rather easily. Now, if we have very long sequences, then it will be much harder because we have to backpropagate over so many steps. You have seen also that we had these problems in deep networks where we had the vanishing gradient problem. We were not able to find updates that connect parts of networks that are very far apart from each other.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    You can see here that if we have this example: a sentence like “I grew up in Germany” and then say something else and “I speak fluent”, it’s probably German. I have to be able to remember that “I grew up in Germany”. So, the contextual information is far away and this makes a difference because we have to propagate through many layers.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    This means that we have to multiply with each other. You can see that those gradients are prone to vanishing and exploding as by the way identified by Hochreiter and Schmidhuber in [12]. Now, you still have this problem that you could have an exploding gradient. Well, you can truncate the gradient but the vanishing gradient is much harder to solve. There’s another problem the memory overwriting because the hidden state is overwritten in each time step. So detecting long-term dependencies will be even more difficult if you don’t have enough space in your hidden state vector. This is also a problem that may occur in your recurrent neural network. So, can we do better than this? The answer is again: yes.

    [Image: CC BY 4.0 from the Deep Learning Lecture]

    This is something we will discuss in the next video, where we then talk about long short-term memory units and the contributions that were done by Hochreiter and Schmidhuber.

    [Image: meme from imgflip]

    So thank you very much for listening to this video and hope to see you in the next one. Thank you and goodbye!

    If you liked this post, you can find more essays here, more educational material on Machine Learning here, or have a look at our Deep Learning Lecture. I would also appreciate a follow on YouTube, Twitter, Facebook, or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced.

    RNN Folk Music

    FolkRNN.org · MachineFolkSession.com · The Glass Herry Comment 14128

    Links

    Character RNNs · CNNs for Machine Translation · Composing Music with RNNs

    Translated from: https://towardsdatascience.com/recurrent-neural-networks-part-2-5f45c1c612c4
