• Recursive Neural Networks

2020-01-07 22:01:16
Recursive Neural Network (RNN)

Recurrent neural network

unrolled along the time dimension
processes sequential information

Recursive neural network

unrolled along the spatial dimension
processes spatial/structural information

Training

propagation of error terms
computation of weight gradients
weight updates


• Convolutional neural networks and recursive neural networks (building neural networks for data processing, including convolutional and recursive neural networks)
• CNTK - Recurrent Neural Network
Now, let us understand how to construct a Recurrent Neural Network (RNN) in CNTK.
Introduction
We learned how to classify images with a neural network, which is one of the iconic jobs in deep learning. But another area where neural networks excel, and where a lot of research is happening, is recurrent neural networks (RNNs). Here, we are going to see what an RNN is and how it can be used in scenarios where we need to deal with time-series data.
What is a Recurrent Neural Network?
Recurrent neural networks (RNNs) may be defined as a special breed of neural networks that are capable of reasoning over time. RNNs are mainly used in scenarios where we need to deal with values that change over time, i.e. time-series data. In order to understand them better, let's make a small comparison between regular neural networks and recurrent neural networks:
As we know, in a regular neural network we can provide only one input, which limits it to producing only one prediction. For example, we could try to do a text-translation job using a regular neural network.

On the other hand, in recurrent neural networks we can provide a sequence of samples that results in a single prediction. In other words, using RNNs we can predict an output sequence based on an input sequence. For example, there have been quite a few successful experiments with RNNs in translation tasks.

Uses of Recurrent Neural Networks
RNNs can be used in several ways. Some of them are as follows:
Predicting a single output
Before diving into the steps by which an RNN can predict a single output based on a sequence, let's see what a basic RNN looks like:

As we can see in the above diagram, an RNN contains a loopback connection to the input, and whenever we feed it a sequence of values it processes each element in the sequence as a time step.
Moreover, because of the loopback connection, an RNN can combine the generated output with the input for the next element in the sequence. In this way, the RNN builds a memory over the whole sequence which can be used to make a prediction.
In order to make predictions with an RNN, we can perform the following steps:
• First, to create an initial hidden state, we need to feed in the first element of the input sequence.
• After that, to produce an updated hidden state, we need to take the initial hidden state and combine it with the second element in the input sequence.
• At last, to produce the final hidden state and to predict the output of the RNN, we need to take the final element in the input sequence.

In this way, with the help of this loopback connection, we can teach an RNN to recognize patterns that happen over time.
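The steps above can be sketched as a short NumPy loop. This is an illustrative toy, not CNTK code; the weight matrices W, U, V, the tanh activation, and all shapes are assumptions made for the example:

```python
import numpy as np

def rnn_predict_single(xs, W, U, V, h0):
    """Run a minimal RNN over a sequence and return one prediction.

    Each step combines the previous hidden state with the current
    element, so the final hidden state summarizes the whole sequence.
    """
    h = h0
    for x in xs:                      # one time step per element
        h = np.tanh(U @ x + W @ h)    # update the hidden state
    return V @ h                      # single output from the final state

rng = np.random.default_rng(0)
W, U, V = rng.normal(size=(4, 4)), rng.normal(size=(4, 2)), rng.normal(size=(1, 4))
xs = [rng.normal(size=2) for _ in range(5)]
y = rnn_predict_single(xs, W, U, V, np.zeros(4))
print(y.shape)  # (1,)
```

Because the final hidden state has seen every element, a single output can summarize the whole sequence.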
Predicting a sequence
The basic RNN model discussed above can be extended to other use cases as well. For example, we can use it to predict a sequence of values based on a single input. In this scenario, in order to make predictions with the RNN, we can perform the following steps:
• First, to create an initial hidden state and predict the first element in the output sequence, we need to feed an input sample into the neural network.
• After that, to produce an updated hidden state and the second element in the output sequence, we need to combine the initial hidden state with the same sample.
• At last, to update the hidden state one more time and predict the final element in the output sequence, we feed in the sample another time.

Predicting sequences
We have seen how to predict a single value based on a sequence and how to predict a sequence based on a single value. Now let's see how we can predict sequences from sequences. In this scenario, in order to make predictions with the RNN, we can perform the following steps:
• First, to create an initial hidden state and predict the first element in the output sequence, we need to take the first element of the input sequence.
• After that, to update the hidden state and predict the second element in the output sequence, we need to take the initial hidden state.
• At last, to predict the final element in the output sequence, we need to take the updated hidden state and the final element of the input sequence.
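The sequence-to-sequence steps above can be sketched the same way, this time emitting one output per time step (again a hedged NumPy toy with assumed weight names and a tanh activation, not the CNTK implementation):

```python
import numpy as np

def rnn_predict_sequence(xs, W, U, V, h0):
    """Emit one output per input element, carrying the hidden state forward."""
    h, ys = h0, []
    for x in xs:
        h = np.tanh(U @ x + W @ h)   # updated state depends on all past inputs
        ys.append(V @ h)             # per-step output
    return ys

rng = np.random.default_rng(1)
W, U, V = rng.normal(size=(3, 3)), rng.normal(size=(3, 2)), rng.normal(size=(2, 3))
ys = rnn_predict_sequence([rng.normal(size=2) for _ in range(4)], W, U, V, np.zeros(3))
print(len(ys))  # 4
```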
Working of RNN
To understand the working of recurrent neural networks (RNNs), we first need to understand how the recurrent layers in the network work. So let's first discuss how we can predict the output with a standard recurrent layer.
Predicting output with a standard RNN layer
As we discussed earlier, a basic layer in an RNN is quite different from a regular layer in a neural network. In the previous section, we also showed the basic architecture of an RNN in the diagram. To update the hidden state for the first time step in the sequence, we can use the following formula:

In the above equation, we compute the new hidden state as the dot product between the initial hidden state and a set of weights.
Now, for the next step, the hidden state for the current time step is used as the initial hidden state for the next time step in the sequence. That is why, to update the hidden state for the second time step, we can repeat the calculations performed in the first time step, as follows:

Next, we can repeat the process of updating the hidden state for the third and final step in the sequence, as below:

And when we have processed all the above steps in the sequence, we can calculate the output as follows:

For the above formula, we have used a third set of weights and the hidden state from the final time step.
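The per-step updates described above can be written out concretely. The names W_h, W_x, W_y (hidden, input, and output weight sets) and the tanh activation are assumptions for illustration, not the tutorial's exact formulas:

```python
import numpy as np

rng = np.random.default_rng(42)
W_h = rng.normal(size=(3, 3))   # hidden-to-hidden weights
W_x = rng.normal(size=(3, 2))   # input-to-hidden weights
W_y = rng.normal(size=(1, 3))   # third weight set, used only for the output

x1, x2, x3 = (rng.normal(size=2) for _ in range(3))
h0 = np.zeros(3)

h1 = np.tanh(W_h @ h0 + W_x @ x1)  # first time step
h2 = np.tanh(W_h @ h1 + W_x @ x2)  # second step reuses the same weights
h3 = np.tanh(W_h @ h2 + W_x @ x3)  # third and final step
y  = W_y @ h3                      # output from the final hidden state
print(y.shape)  # (1,)
```

Note that the same weight sets are reused at every time step; only the hidden state changes.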
The main issue with the basic recurrent layer is the vanishing gradient problem, and because of it the layer is not very good at learning long-term correlations. In simple words, the basic recurrent layer does not handle long sequences very well. That is why there are other recurrent layer types which are much better suited to working with longer sequences, as follows:
Long Short-Term Memory (LSTM)

Long short-term memory (LSTM) networks were introduced by Hochreiter & Schmidhuber. They solved the problem of getting a basic recurrent layer to remember things for a long time. The architecture of an LSTM is given in the diagram above. As we can see, it has input neurons, memory cells, and output neurons. In order to combat the vanishing gradient problem, long short-term memory networks use an explicit memory cell (which stores the previous values) and the following gates:
• Forget gate - As the name implies, it tells the memory cell to forget the previous values. The memory cell stores the values until the gate, i.e. the 'forget gate', tells it to forget them.
• Input gate - As the name implies, it adds new stuff to the cell.
• Output gate - As the name implies, the output gate decides when to pass the vectors from the cell along to the next hidden state.

Gated Recurrent Units (GRUs)

Gated recurrent units (GRUs) are a slight variation of the LSTM network. A GRU has one gate fewer and is wired slightly differently than an LSTM. Its architecture is shown in the diagram above. It has input neurons, gated memory cells, and output neurons. A gated recurrent unit network has the following two gates:
• Update gate - It determines the following two things:
  - How much of the information from the last state should be kept?
  - How much of the information from the previous layer should be let in?
• Reset gate - The functionality of the reset gate is much like that of the forget gate of the LSTM network. The only difference is that it is located slightly differently.

In contrast to a long short-term memory network, gated recurrent unit networks are slightly faster and easier to run.
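As a sketch of how those two gates interact, here is one GRU step written out with the standard update equations; the weight names and sizes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wc, Uc):
    """One GRU step: the update gate z decides how much old state to keep,
    and the reset gate r decides how much old state feeds the candidate."""
    z = sigmoid(Wz @ x + Uz @ h)             # update gate
    r = sigmoid(Wr @ x + Ur @ h)             # reset gate
    h_cand = np.tanh(Wc @ x + Uc @ (r * h))  # candidate state
    return (1 - z) * h + z * h_cand          # blend old and new state

rng = np.random.default_rng(7)
# alternate input-to-hidden (3x2) and hidden-to-hidden (3x3) matrices
mats = [rng.normal(size=(3, 2)) if i % 2 == 0 else rng.normal(size=(3, 3)) for i in range(6)]
h = gru_step(rng.normal(size=2), np.zeros(3), *mats)
print(h.shape)  # (3,)
```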
Creating the RNN structure
Before we can start making predictions about the output from any of our data sources, we first need to construct the RNN, and constructing an RNN is quite similar to how we built a regular neural network in the previous section. Following is the code to build one:

from cntk.losses import squared_error
from cntk.io import CTFDeserializer, MinibatchSource, INFINITELY_REPEAT, StreamDefs, StreamDef
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig
BATCH_SIZE = 14 * 10
EPOCH_SIZE = 12434
EPOCHS = 10

Stacking multiple layers
We can also stack multiple recurrent layers in CNTK. For example, we can use the following combination of layers:

from cntk import sequence, default_options, input_variable
from cntk.layers import Recurrence, LSTM, Dropout, Dense, Sequential, Fold

features = sequence.input_variable(1)

with default_options(initial_state = 0.1):
    model = Sequential([
        Fold(LSTM(15)),
        Dense(1)
    ])(features)

target = input_variable(1, dynamic_axes=model.dynamic_axes)
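The Fold layer in the snippet above keeps only the final state of the recurrence, whereas CNTK's Recurrence wrapper yields a state for every step. That distinction can be mimicked outside CNTK with plain Python (a toy analogy, not the CNTK API):

```python
import numpy as np

def recurrence(step, xs, h0):
    """Recurrence-style: collect the hidden state at every time step."""
    h, states = h0, []
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states

def fold(step, xs, h0):
    """Fold-style: same loop, but keep only the final state."""
    return recurrence(step, xs, h0)[-1]

step = lambda x, h: np.tanh(x + 0.5 * h)
xs = [np.array([0.1]), np.array([0.2]), np.array([0.3])]
print(len(recurrence(step, xs, np.zeros(1))))  # 3 states, one per step
print(fold(step, xs, np.zeros(1)).shape)       # (1,)
```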

As we can see in the above code, we have the following two ways in which we can model an RNN in CNTK:

• First, if we only want the final output of a recurrent layer, we can use the Fold layer in combination with a recurrent layer such as GRU, LSTM, or even RNNStep.
• Second, as an alternative, we can also use the Recurrence block.

Training the RNN with time-series data
Once we have built the model, let's see how we can train the RNN in CNTK:

from cntk import Function

@Function
def criterion_factory(z, t):
    loss = squared_error(z, t)
    metric = squared_error(z, t)
    return loss, metric

loss = criterion_factory(model, target)

Now, to load the data into the training process, we have to deserialize sequences from a set of CTF files. The following code contains the create_datasource function, which is a useful utility for creating both the training and test data sources.

# The function header below is inferred from the calls that follow.
def create_datasource(filename, sweeps=INFINITELY_REPEAT):
    target_stream = StreamDef(field='target', shape=1, is_sparse=False)
    features_stream = StreamDef(field='features', shape=1, is_sparse=False)
    deserializer = CTFDeserializer(filename, StreamDefs(features=features_stream, target=target_stream))
    datasource = MinibatchSource(deserializer, randomize=True, max_sweeps=sweeps)
    return datasource

# Provide the locations of the training and test files created from the dataset.
train_datasource = create_datasource('Training data filename.ctf')
test_datasource = create_datasource('Test filename.ctf', sweeps=1)

Now, as we have set up the data sources, the model, and the loss function, we can start the training process. It is quite similar to what we did in previous sections with basic neural networks.

from cntk.learners import sgd

# No learner is defined earlier in this excerpt; plain SGD with the
# learning rate shown in the output below (0.005) is assumed here.
learner = sgd(model.parameters, lr=0.005)

progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)

input_map = {
    features: train_datasource.streams.features,
    target: train_datasource.streams.target
}

history = loss.train(
    train_datasource,
    epoch_size=EPOCH_SIZE,
    parameter_learners=[learner],
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer, test_config],
    minibatch_size=BATCH_SIZE,
    max_epochs=EPOCHS
)

We will get output similar to the following:

Output:

average    since    average    since    examples
   loss     last     metric     last
------------------------------------------------------
Learning rate per minibatch: 0.005
    0.4      0.4      0.4      0.4        19
    0.4      0.4      0.4      0.4        59
  0.452    0.495    0.452    0.495       129
[…]

Validating the model
Actually, predicting with an RNN is quite similar to making predictions with any other CNTK model. The only difference is that we need to provide sequences rather than single samples.
Now, as our RNN is finally done training, we can validate the model by testing it with a few sample sequences, as follows:

import pickle

with open('test_samples.pkl', 'rb') as test_file:
    test_samples = pickle.load(test_file)  # deserialize the test sequences

model(test_samples) * NORMALIZE  # NORMALIZE: dataset scaling constant, defined elsewhere

Output:

array([[ 8081.7905],
       [16597.693 ],
       [13335.17  ],
       ...,
       [11275.804 ],
       [15621.697 ],
       [16875.555 ]], dtype=float32)

Translated from: https://www.tutorialspoint.com/microsoft_cognitive_toolkit/microsoft_cognitive_toolkit_recurrent_neural_network.htm
• Convolutional neural networks and recurrent neural networks
Artificial intelligence (AI) is bridging the gap between technology and humans by allowing machines to automatically learn things from data and become more 'human-like'; thus, becoming more 'intelligent'. In this case, intelligence can be considered to be the ability to process information which can be used to inform future decisions. This is ideal because humans can spontaneously put information together by recognizing old patterns, developing new connections, and perceiving something that they have learnt in a new light to develop new and effective processes. When combined with a machine's computational power, tremendous results can be achieved.
The combination of automatic learning and computational efficiency can best be described by deep learning. This is a subset of AI and machine learning (ML) where algorithms are made to determine a pattern in data and develop a target function which best maps an input variable, x, to a target variable, y. The goal here is to automatically extract the most useful pieces of information needed to inform future decisions. Deep learning models are very powerful and they can be used to tackle a wide variety of problems; from predicting the likelihood that a student will pass a course, to recognizing an individual's face to unlock their iPhones using Face ID.
Image by Author
Deep learning models are built on the idea of 'neural networks', and this is what allows the models to learn from raw data. Simply put, the deep neural network is created by stacking perceptrons, where a perceptron is a single neuron. Information is propagated forward through this system by having a set of inputs, x, where each input has a corresponding weight, w. The input should also include a 'bias term' which is independent of x. The bias term is used to shift the function being used accordingly, given the problem at hand. Each corresponding input and weight are then multiplied, and the sum of products is calculated. The sum then passes through a non-linear activation function, and an output, y, is generated.
However, this 'feed-forward' type of model is not always applicable, and its fundamental architecture makes it difficult to apply to certain scenarios. For example, consider a model that is designed to predict where a flying object will go next, given a snapshot of that flying object. This is a sequential problem because the object will be covering some distance over time, and the current position of the object will depend on where the object was previously. If no information about the object's previous position is given, then predicting where the object will go next is no better than a random guess.
Image by Author
Let us consider another simple, yet important problem: predicting the next word. Models which do this are common now, as they are used in applications such as autofill and autocorrect, and they are often taken for granted. This is a sequential task since the most appropriate 'next word' depends on the words which came before it. A feed-forward network would not be appropriate for this task because it would require a sentence of a particular length as an input to then predict the next word. However, this is an issue because we cannot guarantee an input of the same length each time, and the model's performance would then be negatively affected.
A potential way to combat this issue is to only look at a subsection of the input sentence, such as the last two words. This combats the issue of variable-length inputs because, whatever the total input length, the model will only use the last two words of the sentence to predict the next word. But this is still not ideal because the model now cannot account for long-term dependencies. That is, consider the sentence "I grew up in Berlin and only moved to New York a year ago. I can speak fluent …". By only considering the last two words, every language would be equally likely. But when the entire sentence is considered, German would be most likely.
Image by Author
The best way to overcome these issues is to have an entirely new network structure; one that can update information over time. This is a Recurrent Neural Network (RNN). It is similar to a perceptron in that, over time, information is fed forward through the system by a set of inputs, x, where each input has a weight, w. Each corresponding input and weight are then multiplied, and the sum of products is calculated. The sum then passes through a non-linear activation function, and an output, y, is generated.
The difference is that, in addition to the output, the network also generates an internal state update, u. This update is then used when analyzing the next set of input information, and it provides a different output that is also dependent on the previous information. This is ideal because information persists throughout the network over time. As the name suggests, this update function is essentially a recurrence relation that happens at every step of the sequential process, where u is a function of the previous u and the current input, x.
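In code, this internal-state recurrence is just a loop that threads u through each step; f and g below stand for whatever state-update and output functions the network learns (a minimal sketch, not any specific library's API):

```python
def run_rnn(f, g, xs, u0):
    """Thread the internal state u through the sequence:
    u_t = f(u_{t-1}, x_t), with a per-step output y_t = g(u_t, x_t)."""
    u, ys = u0, []
    for x in xs:
        u = f(u, x)          # recurrence relation: new state from old state + input
        ys.append(g(u, x))   # output depends on the accumulated state
    return u, ys

# toy example: the state is a running sum, the output is the state scaled by the input
u, ys = run_rnn(lambda u, x: u + x, lambda u, x: u * x, [1, 2, 3], 0)
print(u, ys)  # 6 [1, 6, 18]
```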
Image by Author
The concept of looping through the RNN's system over time might be a bit abstract and difficult to grasp. Another way to think of an RNN is to actually unfold the system over time. That is, think of the RNN as a set of singular feed-forward models, where each model is linked to the next by the internal state update. Viewing the RNN like this can truly provide some insight as to why this structure is suitable for sequential tasks. At each step of the sequence, there is an input, some process being performed on that input, and a related output. For the next step of the sequence, the step before it must have some influence: it does not affect the input, but it does affect the related output.
If we go back to either the flying-object scenario or the word-prediction scenario, and we consider them using the unfolded RNN, we can understand the solutions better. At each previous position of the flying object, we can predict a possible path. The predicted path updates as the model receives more information about where the object was previously, and this information updates itself to then feed into the future sequences of the model. Similarly, as each new word from the sentence scenario is fed into the model, a new combination of likely words is generated.
Image by Author
Neural networks are an essential part of AI and ML as they allow models to automatically learn from data, and they combine a version of human learning with great computational ability. However, applying a non-sequential structure to a sequential task will result in poor model performance, and the true power of neural networks would not be harnessed. RNNs are artificial learning systems which internally update themselves based on previous information, in order to predict the most accurate results over time.
dspace.mit.edu/bitstream/handle/1721.1/113146/1018306404-MIT.pdf?sequence=1
stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
karpathy.github.io/2015/05/21/rnn-effectiveness/

Other Useful Material:

deeplearning.mit.edu/
neuralnetworksanddeeplearning.com/
towardsdatascience.com/the-mathematics-behind-deep-learning-f6c35a0fe077
Translated from: https://towardsdatascience.com/introducing-recurrent-neural-networks-f359653d7020
• Recurrent Neural Networks / Deep Learning, Natural Language Processing

You asked Siri about the weather today, and it brilliantly resolved your queries.
But how did it happen? How did it convert your speech to text and feed it to the search engine?
Photo by Morning Brew on Unsplash
This is the magic of Recurrent Neural Networks.
Recurrent Neural Networks (RNNs) lie under the umbrella of Deep Learning. They are utilized in operations involving Natural Language Processing. Nowadays, since the range of AI is expanding enormously, we can easily spot recurrent operations going on around us. They play an important role in everything from speech translation and music composition to predicting the next word on your mobile keyboard.
The types of problems that RNNs cater to are:

• Outputs are dependent on previous inputs (sequential data).
• The length of the input isn't fixed.

Sequential Data
Photo by Erik Mclean on Unsplash
To understand sequential data, let us suppose you have a dog standing still.
Now, you're supposed to predict which direction he will move in. With only this limited information, how would you do it? You can certainly take a guess, but in my opinion what you'd come up with would be a random guess. Without knowledge of where the dog has been, you wouldn't have enough data to predict where he'll be going.
Photo by Marcus Benedix on Unsplash
But now, if the dog starts running in a particular direction and you record his movements, you'll be pretty sure of the direction he'll be choosing, because at this instant you have enough information to make a better prediction.
So a sequence is a particular order in which one thing follows another. With this information, you can now see that the dog is moving towards you.
Text and audio are also illustrations of sequence data.
When you're talking to someone, there is a sequence to the words you utter. Similarly, when you e-mail someone, based on your text there is some certainty about what your next words will be.
Sequential Memory
As mentioned earlier, RNNs cater to problems that involve inter-dependency between outputs and previous inputs. That indirectly means there is some memory affiliated with this kind of neural network.
Sequential memory is something that helps an RNN achieve its goal.
Photo by Jessicah Hast on Unsplash
Try reciting the alphabet from A to Z in your head. That was an easy task; if you were taught this specific sequence, it should come to you quickly.
Now, suppose I ask you to recall the alphabet in reverse.
I bet this task is much harder, and in my opinion it will give you a hard time.
The reason the former task proved easier is that you have learned the alphabet as a sequence. Sequential memory makes it easier for your brain to recognize patterns.
How do Recurrent Neural Networks differ from Neural Networks?
As discussed earlier, recurrent neural networks come under deep learning, but so do neural networks in general. Due to the absence of an internal state, however, artificial neural networks are not what we use to process sequential data.
Feedforward Network
To develop a neural network that is robust for sequential data, we add an internal state to our feedforward neural network that provides us with internal memory. In a nutshell, a recurrent neural network is a generalization of a feedforward neural network that has internal memory. An RNN implements the abstract concept of sequential memory, which helps it by providing previous experience and thus allowing it to predict better on sequential data.
Recurrent Neural Network
An RNN proves its recurrent nature by performing the same function for every input, while the output for the current input depends upon the past inputs. Compared with a feedforward neural network, in an RNN all the inputs are inter-dependent on each other, unlike in the vanilla form.
Working of RNN
Okay, but how does an RNN replicate that internal memory and actually work?
Since an RNN depends solely on sequential memory, we expect our model to break the sentence up into individual words.
At first, "What" is fed into the RNN. Our model then encodes it and presents us with an output.
For the next part, we feed in the word "is" and the former output that we got from the word "What". The RNN now has access to the information imparted by both words: "What" and "is".
The same process is iterated until we reach the end of our sequence. In the end, we can expect the RNN to have encoded information from all the words present in our sequence.
What is your name?
Since the last output is developed by combining the former outputs and the last input, we can pass the final output to the feedforward layer to achieve our goal.
To create the context, let us denote the input by x, the output by y, and the state vector by a.
When we pass our first input, i.e. x0 ("What"), we are provided with the output y1 and a state vector a1, which is passed to the next step s1 to accommodate the past output of x0.
The process iterates until we reach the end of the sequence. At the end we are left with the state vector a5, which assures us that all inputs <x1, x2, x3, x4, x5> have been fed to our model, and an output is generated to which all inputs contribute.
State Vector
Single RNN cell

Pseudocode for RNN
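A minimal Python version of the forward pass just described, consistent with the state vector a and the inputs x1…x5 above; the weight names Wa, Wx, Wy and the tanh activation are illustrative assumptions:

```python
import numpy as np

def rnn_forward(xs, Wa, Wx, Wy, a0):
    """Feed each word vector in turn; the state vector a carries
    everything seen so far, and the final output combines it all."""
    a = a0
    for x in xs:                      # x1 ... x5, e.g. "What is your name ?"
        a = np.tanh(Wa @ a + Wx @ x)  # state update, same weights every step
    return Wy @ a                     # final output from the last state

rng = np.random.default_rng(3)
Wa, Wx, Wy = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
y = rnn_forward([rng.normal(size=3) for _ in range(5)], Wa, Wx, Wy, np.zeros(4))
print(y.shape)  # (2,)
```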
Types of RNN architectures
One to One
One to Many: these RNN architectures are usually used for image captioning / story captioning.
Many to One: these RNN architectures are used for sentiment analysis.
Many to Many: these RNN architectures are used for part-of-speech tagging, i.e. where we are expected to find a property for each word.
Encoder-Decoder: these are the most complex type and are used for language translation.

Drawbacks of RNN
Short-term Memory
I hope you have pondered the odd color distribution in our final RNN cell.
(Figure: final output produced by the RNN)
This is an illustration of short-term memory. In an RNN, at each new time step (new input) the old information gets morphed by the current input. One can imagine that after t time steps, the information stored at time step (t−k) is almost completely morphed away.
And thus, RNNs cannot be used for very long sequences.
The cause of this short-term memory is the vanishing gradient, which is present in every type of neural network due to the nature of backpropagation.
When we train a neural network, there are three major steps. First, a forward pass is made to produce a prediction. Next, the prediction is compared to the target value, producing a loss. Lastly, since we aim to make our prediction better, we run backpropagation, which revises the values at each node.
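These three steps can be seen in a toy example. The sketch below (the single-neuron model and all numbers are illustrative, not from the original post) fits y = w·x + b to data drawn from the rule y = 2x + 1 by repeating forward pass, loss computation, and gradient update:

```python
import numpy as np

# Toy model: a single linear neuron y = w*x + b, fitted to data generated
# from y = 2x + 1. Every iteration runs the three training steps.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y_true = 2 * x + 1

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    y_pred = w * x + b                          # 1. forward pass
    loss = np.mean((y_pred - y_true) ** 2)      # 2. compare to target (MSE loss)
    dw = np.mean(2 * (y_pred - y_true) * x)     # 3. backpropagate: dL/dw
    db = np.mean(2 * (y_pred - y_true))         #    and dL/db
    w -= lr * dw                                #    revise the parameters
    b -= lr * db
```

After 200 iterations, w and b should sit very close to the true values 2 and 1.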
“After calculating the loss, we are pretty sure that our model is doing something wrong and we need to inspect it, but it is practically impossible to check every neuron individually. Yet the only way for us to salvage our model is to work backwards.
Steps for Backpropagation 反向传播的步骤
We compute the loss at the output and try to figure out which node was responsible for the inefficiency.
To do so, we backtrack through the whole network.
Suppose we find that the second layer (w3h2 + b2) is responsible for our loss, and we try to change it. But if we ponder the network, w3 and b2 are independent entities, whereas h2 depends on w2, b1 and h1, and h1 in turn depends on the inputs x1, x2, x3, …, xn. Since we have no control over the inputs, we instead try to amend w1 and b1. To compute these changes we use the chain rule.”
When we perform backpropagation, we calculate updates to the weights and biases of each node. But if the gradient reaching the earlier layers is meager, the adjustment to the current layer will be much smaller still. This causes gradients to diminish dramatically, leading to almost no change in our model, and because of that our model stops learning and stops improving.
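The shrinking can be seen with plain arithmetic. Backpropagation multiplies one local derivative per layer (or per unrolled time step); if each factor is below 1 (the sigmoid's derivative never exceeds 0.25), the product collapses toward zero. An illustrative computation:

```python
import numpy as np

# Chain rule over 50 layers / unrolled time steps, each contributing a
# local derivative of 0.25 (the maximum of the sigmoid's derivative).
local_derivatives = np.full(50, 0.25)
gradient_at_first_layer = np.prod(local_derivatives)
print(gradient_at_first_layer)   # ~7.9e-31: effectively zero
```

The earliest layer receives essentially no learning signal, which is exactly the short-term-memory behaviour described above.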
LSTMs and GRUs
To combat the drawbacks of RNNs, we have the LSTM (Long Short-Term Memory) and the GRU (Gated Recurrent Unit). LSTMs and GRUs are essentially advanced versions of RNNs, with small tweaks that overcome the vanishing-gradient problem and learn long-term dependencies using components known as “gates”. Gates are tensor operations that learn to regulate the flow of information, so short-term memory is not an issue for them.
During forward propagation, the gates control the flow of information, preventing any irrelevant information from being written to the state. During backpropagation, the gates control the flow of the gradient and can scale it so that it does not vanish.
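Gates keep gradients from vanishing, but gradients can still grow too large, and in practice this is handled with gradient clipping. A minimal sketch of clipping by global norm (the function name and threshold are illustrative, not from the original post):

```python
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    """Rescale `grad` so its L2 norm never exceeds `max_norm`."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)   # shrink, keeping the direction
    return grad
```

A gradient whose norm is already within the bound passes through unchanged; an oversized one is scaled down to the threshold before the parameter update.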
LSTM does not solve the problem of exploding gradients; therefore, we tend to use gradient clipping when implementing LSTMs.

Conclusion
Feel free to connect:
Instagram ~ https://www.instagram.com/_daksh_trehan_/
Github ~ https://github.com/dakshtrehan
Follow for further Machine Learning / Deep Learning blogs.
Detecting COVID-19 Using Deep Learning
The Inescapable AI Algorithm: TikTok
An insider’s guide to Cartoonization using Machine Learning
Why are YOU responsible for George Floyd’s Murder and Delhi Communal Riots?
Convolution Neural Network for Dummies
Diving Deep into Deep Learning
Why Choose Random Forest and Not Decision Trees
Clustering: What it is? When to use it?
Start off your ML Journey with k-Nearest Neighbors
Naive Bayes Explained
Activation Functions Explained
Parameter Optimization Explained
Logistic Regression Explained
Linear Regression Explained
Determining Perfect Fit for your ML Model
Cheers!

Translated from: https://medium.com/towards-artificial-intelligence/recurrent-neural-networks-for-dummies-8d2c4c725fbe