FAU LECTURE NOTES ON DEEP LEARNING
These are the lecture notes for FAU’s YouTube Lecture “Deep Learning”. This is a full transcript of the lecture video with matching slides. We hope you enjoy these as much as the videos. Of course, this transcript was created largely automatically with deep learning techniques, and only minor manual corrections were made. Try it yourself! If you spot mistakes, please let us know!
Welcome back to the final part of our video series on recurrent neural networks! Today, we want to talk a bit about sampling from recurrent neural networks. By sampling, I mean that we want to use recurrent neural networks to actually generate sequences of symbols. So, how can we do that?
Well, if you train your neural network in the right way, you can set it up so that it predicts the probability distribution of the next element. So, if I train it to predict the next symbol in the sequence, I can also use it to generate sequences. The idea is that you start with the empty symbol and use the RNN to generate some output. Then, you take this output and feed it back in as the input at the next time step. If you keep doing this, you can generate whole sequences from your trained recurrent neural network.
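To make this feedback loop concrete, here is a minimal NumPy sketch. Everything here is invented for illustration: the tiny vocabulary, the random untrained weights, and the Elman-style `rnn_step` all stand in for a properly trained recurrent cell.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<s>", "let's", "go", "through", "time", "</s>"]
V, H = len(VOCAB), 8

# Toy stand-in for a trained RNN cell (random weights, for illustration only).
W_h = rng.normal(size=(H, H))
W_x = rng.normal(size=(H, V))
W_o = rng.normal(size=(V, H))

def rnn_step(h, x):
    """One Elman-style step: new hidden state and a softmax over the next symbol."""
    h = np.tanh(W_h @ h + W_x @ x)
    logits = W_o @ h
    p = np.exp(logits - logits.max())
    return p / p.sum(), h

def generate(max_len=10):
    h, idx, out = np.zeros(H), 0, []      # start from the empty/start symbol "<s>"
    for _ in range(max_len):
        x = np.eye(V)[idx]                # one-hot encoding of the current symbol
        p, h = rnn_step(h, x)
        idx = int(rng.choice(V, p=p))     # sample the next symbol ...
        if VOCAB[idx] == "</s>":          # ... and feed it back in as the next input
            break
        out.append(VOCAB[idx])
    return out

print(generate())
```

With a trained cell, the same loop is all you need: the only design choice is how `idx` is picked from `p`, which is exactly what the following strategies differ in.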
So, the simplest strategy is a greedy search. Here, we start with the empty symbol. Then, we always pick the most likely element as the input to the RNN at the next time step and generate the next symbol, and the next, and so on. This produces exactly one sample sequence per experiment. So, this is greedy search, and you can see that exactly one sentence is constructed here: “let’s go through time”. The drawback is, of course, that no look-ahead is possible. Let’s say the most likely word after “let’s go” is “let’s”. Then you could end up generating loops like “let’s go let’s go” and so on, because you are unable to detect that “let’s go through time” has a higher total probability. Greedy search therefore tends to repeat sequences of frequent words like “and”, “the”, “some”, and so on.
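The looping pitfall is easy to reproduce with a toy bigram model. The table below is invented purely for illustration: after “let’s” the locally most likely word is “go”, and after “go” it is “let’s” again, so greedy decoding cycles forever and never reaches the end symbol.

```python
# Toy bigram transition probabilities (invented for illustration).
P = {
    "<s>":     {"let's": 0.6, "through": 0.4},
    "let's":   {"go": 0.9, "</s>": 0.1},
    "go":      {"let's": 0.5, "through": 0.4, "</s>": 0.1},
    "through": {"time": 0.9, "let's": 0.1},
    "time":    {"</s>": 1.0},
}

def greedy_decode(max_len=6):
    word, out = "<s>", []
    for _ in range(max_len):
        word = max(P[word], key=P[word].get)   # always take the single most likely successor
        if word == "</s>":
            break
        out.append(word)
    return out

print(greedy_decode())   # ["let's", "go", "let's", "go", "let's", "go"]
```

Each local choice is optimal, yet the total probability of the looping sequence keeps shrinking with every step, which is exactly the problem beam search addresses.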
Now, we are interested in alleviating this problem. This can be done with beam search. The idea of beam search is to select the k most likely elements, where k is essentially the beam width or beam size. So, you roll out k possible sequences: you expand the sequences that have these k elements as prefixes and keep the k most probable ones. In the example shown on the right-hand side, we start with the empty word. Then, we take the two most likely candidates, which are “let’s” and “through”. If we take “through”, we generate “let’s” as output; if we take “let’s”, we generate “go”. We can continue this process with our beam of size two and keep the two most likely sequences at every step. So now, we generate two sequences at a time: one is “let’s go through time” and the other is “through let’s go time”. You see that we can use this beam idea to generate multiple sequences. In the end, we can determine which one we like best, or which one has the highest total probability. So, we can generate multiple sequences in one go, which typically also contain better sequences than greedy search produces. I would say this is actually one of the most common techniques for sampling from an RNN.
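A minimal beam search sketch over the same kind of toy bigram table (again with invented probabilities) looks like this. Hypotheses are scored by accumulated log-probability, and only the k best are kept after each expansion step.

```python
import math

# Toy bigram transition probabilities (invented for illustration).
P = {
    "<s>":     {"let's": 0.6, "through": 0.4},
    "let's":   {"go": 0.9, "</s>": 0.1},
    "go":      {"let's": 0.5, "through": 0.4, "</s>": 0.1},
    "through": {"time": 0.9, "let's": 0.1},
    "time":    {"</s>": 1.0},
}

def beam_search(k=2, max_len=5):
    # Each hypothesis is (sequence, accumulated log-probability).
    beams = [(["<s>"], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, lp in beams:
            if seq[-1] == "</s>":                 # finished hypotheses are carried along
                candidates.append((seq, lp))
                continue
            for w, p in P[seq[-1]].items():       # expand with every possible successor
                candidates.append((seq + [w], lp + math.log(p)))
        # Prune: keep only the k most probable hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return [(seq[1:], lp) for seq, lp in beams]   # strip the start symbol

for seq, lp in beam_search():
    print(seq, round(math.exp(lp), 4))
```

With this table, greedy decoding loops forever, while the beam keeps the finished hypothesis “through time” (total probability 0.36) alongside a second candidate, illustrating how a little look-ahead finds sequences with higher total probability.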
Of course, there are also other approaches, such as random sampling. Here, the idea is to select the next element according to the output probability distribution. You remember that we encoded our words as one-hot vectors. So, we can essentially interpret the output of the RNN as a probability distribution and sample from it. This allows us to generate many different sequences. Let’s say “let’s” has an output probability of 0.8; then it is sampled as the next word 8 out of 10 times. This creates very diverse results, but they may look too random. So, you see here that we get quite diverse sequences, with quite a bit of randomness observable in the generated output. To reduce the randomness, you can increase the probability of probable words and decrease the probability of less probable words. This can be done, for example, by temperature sampling. Here, you see that we introduce a temperature 𝜏 that we then use to sharpen or flatten the distribution before sampling. This is a common technique that you have already seen in various instances in this class.
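A short sketch of temperature sampling, with an assumed four-word output distribution: 𝜏 < 1 sharpens the distribution toward the most likely word, 𝜏 > 1 flattens it toward uniform, and 𝜏 = 1 leaves it unchanged.

```python
import numpy as np

def apply_temperature(p, tau):
    """Rescale a probability vector p by temperature tau and renormalize."""
    logits = np.log(p) / tau
    q = np.exp(logits - logits.max())   # subtract max for numerical stability
    return q / q.sum()

# Assumed model output, e.g. P("let's") = 0.8 as in the text.
p = np.array([0.8, 0.1, 0.05, 0.05])

rng = np.random.default_rng(0)
for tau in (0.5, 1.0, 2.0):
    q = apply_temperature(p, tau)
    idx = int(rng.choice(len(q), p=q))  # random sampling from the rescaled distribution
    print(tau, np.round(q, 3), idx)
```

At 𝜏 = 0.5 the leading word is picked almost every time; at 𝜏 = 2 the tail words get sampled noticeably more often, which is the knob between repetitive and overly random output.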
So, let’s look at some examples. One thing that I found very interesting is character-based language modeling with RNNs. There’s a great blog post on this by Andrej Karpathy, which I also link in the description below. There, he essentially trained an RNN for text generation on Shakespeare. It is trained on the character level, so you only have one character as input and then generate the sequence, and it produces very interesting output. Here, you can see typical examples that have been generated. Let me read this to you:
“Pandarus Alas I think he shall be come approached and the day When little srain would be attain’d into being never fed, And who is but a chain and subjects of his death, I should not sleep.”
Excerpt from Karpathy’s blog
and so on. So, it is very interesting that the type of language that is generated is very close to Shakespeare, but if you read through these examples, you can see that they are essentially complete nonsense. Still, it’s interesting that the tone of the generated language is present and is very typical of Shakespeare. So, that’s really interesting.
Of course, you can generate many, many other things. One very nice example that I want to show you today is composing folk music. Music composition is typically tackled with RNNs, and you can find different examples in the literature, also by Jürgen Schmidhuber. The idea here is to use bigger, deeper networks to generate folk music. What they employ is a character-level RNN working on the ABC format, including generating the title. One example that I have here is this small piece of music. As you can hear, it really is folk music. This is completely automatically generated. Interesting, isn’t it? If you listen very closely, you can also hear that folk music may be particularly suited for this, because you could argue it’s a bit repetitive. Still, it’s pretty awesome that the entire song is completely automatically generated. There are actually people meeting to play computer-generated songs like these on real instruments. A very interesting observation. I also put the link here for your reference if you’re interested; you can listen to many more examples on that website.
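For reference, this is roughly what the ABC format mentioned above looks like. The tune below is a hand-written illustration of the format, not one of the generated pieces: the title (T:), meter (M:), key (K:), and the note sequence itself are all plain characters, which is what makes character-level modeling of entire tunes, title included, feasible.

```abc
X:1
T:Example Reel
M:4/4
L:1/8
K:Gmaj
|:GABc dedB|dedB dedB|c2ec B2dB|A2GA BAGE:|
```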
There are also RNNs for non-sequential tasks. RNNs can be used on stationary inputs, for example for image generation. The idea is then to model the process from a rough sketch to the final image. You can see one example here where we essentially draw numbers, going from blurry to sharp. In this example, an additional attention mechanism tells the network where to look, which then generates something similar to brushstrokes. It actually uses a variational autoencoder, which we will discuss when we get to the topic of unsupervised deep learning.
So, let’s summarize this a little bit. You’ve seen that recurrent neural networks are able to directly model sequential algorithms, and that you train them via truncated backpropagation through time. Simple units suffer severely from exploding and vanishing gradients. We have seen that LSTMs and GRUs are improved RNNs that explicitly model the forgetting and remembering operations. What we haven’t talked about is that there are many, many more developments that we can’t cover in this short lecture. It would also be interesting to talk about memory networks, neural Turing machines, and attention in recurrent neural networks, which we have only touched on so far. We’ll talk a bit more about attention in one of the next videos as well.
So, next time in deep learning, we want to talk about visualization. In particular, we want to talk about visualizing architectures, the training process, and of course also the inner workings of the network. We want to figure out what is actually happening inside the network, and there are quite a few techniques for this; to be honest, we’ve already seen some of them earlier in this class. In this lecture, we really want to look into those methods and understand how they work, in order to figure out what’s happening inside deep neural networks. One interesting observation is that this is also related to neural network art. Another thing that deserves a little more thought is attention mechanisms, which will also be covered in one of the videos soon to follow.
So, I have some comprehension questions: “What is the strength of RNNs compared to feed-forward networks?” Then, of course: “How do you train an RNN?”, “What are the challenges?”, “What’s the main idea behind LSTMs?” You should be able to describe the unrolling of RNNs during training, and you should be able to describe the Elman cell, the LSTM, and the GRU. These are really crucial things to know if you have to take a test in the near future, so better be prepared for questions like these. We have some further reading below: the very nice blog post by Andrej Karpathy, a very cool blog post about CNNs for machine translation that I really recommend reading, and a cool blog post on music generation, which you can also find below. Of course, we also have plenty of scientific references. So, I hope you enjoyed this video and see you in the next one. Bye-bye!
If you liked this post, you can find more essays here, more educational material on Machine Learning here, or have a look at our Deep Learning Lecture. I would also appreciate a follow on YouTube, Twitter, Facebook, or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced. If you are interested in generating transcripts from video lectures try AutoBlog.