此为《Gans in Action》（对抗神经网络实战）第一章读书笔记
Chapter 1. Introduction to GANs 对抗神经网络介绍
This chapter covers
- An overview of Generative Adversarial Networks
- What makes this class of machine learning algorithms special
- Some of the exciting GAN applications that this book covers
The notion of whether machines can think is older than the computer itself. In 1950, the famed mathematician, logician, and computer scientist Alan Turing—perhaps best known for his role in decoding the Nazi wartime enciphering machine, Enigma—penned a paper that would immortalize his name for generations to come, “Computing Machinery and Intelligence.”
In the paper, Turing proposed a test he called the imitation game, better known today as the Turing test. In this hypothetical scenario, an unknowing observer talks with two counterparts behind a closed door: one, a fellow human; the other, a computer. Turing reasons that if the observer is unable to tell which is the person and which is the machine, the computer passed the test and must be deemed intelligent.
Anyone who has attempted to engage in a dialogue with an automated chatbot or a voice-powered intelligent assistant knows that computers have a long way to go to pass this deceptively simple test. However, in other tasks, computers have not only matched human performance but also surpassed it—even in areas that were until recently considered out of reach for even the smartest algorithms, such as superhumanly accurate face recognition or mastering the game of Go.
:See “Surpassing Human-Level Face Verification Performance on LFW with GaussianFace,” by Chaochao Lu and Xiaoou Tang, 2014, https://arXiv.org/abs/1404.3840. See also the New York Times article “Google’s AlphaGo Defeats Chinese Go Master in Win for A.I.,” by Paul Mozur, 2017, http://mng.bz/07WJ.
Machine learning algorithms are great at recognizing patterns in existing data and using that insight for tasks such as classification (assigning the correct category to an example) and regression (estimating a numerical value based on a variety of inputs). When asked to generate new data, however, computers have struggled. An algorithm can defeat a chess grandmaster, estimate stock price movements, and classify whether a credit card transaction is likely to be fraudulent. In contrast, any attempt at making small talk with Amazon’s Alexa or Apple’s Siri is doomed. Indeed, humanity’s most basic and essential capacities—including a convivial conversation or the crafting of an original creation—can leave even the most sophisticated supercomputers in digital spasms.
This all changed in 2014 when Ian Goodfellow, then a PhD student at the University of Montreal, invented Generative Adversarial Networks (GANs). This technique has enabled computers to generate realistic data by using not one, but two, separate neural networks. GANs were not the first computer program used to generate data, but their results and versatility set them apart from all the rest. GANs have achieved remarkable results that had long been considered virtually impossible for artificial systems, such as the ability to generate fake images with real-world-like quality, turn a scribble into a photograph-like image, or turn video footage of a horse into a running zebra—all without the need for vast troves of painstakingly labeled training data.
直到2014年，博士生 Ian Goodfellow提出了生成对抗网络（GAN）。GAN是由两个神经网络组成，在产生新数据方面具有很好的通用性也被广泛应用
A telling example of how far machine data generation has been able to advance thanks to GANs is the synthesis of human faces, illustrated in figure 1.1. As recently as 2014, when GANs were invented, the best that machines could produce was a blurred countenance—and even that was celebrated as a groundbreaking success. By 2017, just three years later, advances in GANs enabled computers to synthesize fake faces whose quality rivals high-resolution portrait photographs. In this book, we look under the hood of the algorithm that made all this possible.
Figure 1.1. Progress in human face generation
(Source: “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation,” by Miles Brundage et al., 2018, https://arxiv.org/abs/1802.07228.)
1.1. What are Generative Adversarial Networks? 什么是GAN
Generative Adversarial Networks (GANs) are a class of machine learning techniques that consist of two simultaneously trained models: one (the Generator) trained to generate fake data, and the other (the Discriminator) trained to discern the fake data from real examples.
The word generative indicates the overall purpose of the model: creating new data. The data that a GAN will learn to generate depends on the choice of the training set. For example, if we want a GAN to synthesize images that look like Leonardo da Vinci’s, we would use a training dataset of da Vinci’s artwork.
The term adversarial points to the game-like, competitive dynamic between the two models that constitute the GAN framework: the Generator and the Discriminator. The Generator’s goal is to create examples that are indistinguishable from the real data in the training set. In our example, this means producing paintings that look just like da Vinci’s. The Discriminator’s objective is to distinguish the fake examples produced by the Generator from the real examples coming from the training dataset. In our example, the Discriminator plays the role of an art expert assessing the authenticity of paintings believed to be da Vinci’s. The two networks are continually trying to outwit each other: the better the Generator gets at creating convincing data, the better the Discriminator needs to be at distinguishing real examples from the fake ones.
Finally, the word networks indicates the class of machine learning models most commonly used to represent the Generator and the Discriminator: neural networks. Depending on the complexity of the GAN implementation, these can range from simple feed-forward neural networks (as you’ll see in chapter 3) to convolutional neural networks (as you’ll see in chapter 4) or even more complex variants, such as the U-Net (as you’ll see in chapter 9).
1.2. How do GANs work? GAN工作原理
The mathematics underpinning GANs are complex (as you’ll explore in later chapters, especially chapters 3 and 5); fortunately, many real-world analogies can make GANs easier to understand. Previously, we discussed the example of an art forger (the Generator) trying to fool an art expert (the Discriminator). The more convincing the fake paintings the forger makes, the better the art expert must be at determining their authenticity. This is true in the reverse situation as well: the better the art expert is at telling whether a particular painting is genuine, the more the forger must improve to avoid being caught red-handed.
Another metaphor often used to describe GANs—one that Ian Goodfellow himself likes to use—is that of a criminal (the Generator) who forges money, and a detective (the Discriminator) who tries to catch him. The more authentic-looking the counterfeit bills become, the better the detective must be at detecting them, and vice versa.
In more technical terms, the Generator’s goal is to produce examples that capture the characteristics of the training dataset, so much so that the samples it generates look indistinguishable from the training data. The Generator can be thought of as an object recognition model in reverse. Object recognition algorithms learn the patterns in images to discern an image’s content. Instead of recognizing the patterns, the Generator learns to create them essentially from scratch; indeed, the input into the Generator is often no more than a vector of random numbers.
The Generator learns through the feedback it receives from the Discriminator’s classifications. The Discriminator’s goal is to determine whether a particular example is real (coming from the training dataset) or fake (created by the Generator). Accordingly, each time the Discriminator is fooled into classifying a fake image as real, the Generator knows it did something well. Conversely, each time the Discriminator correctly rejects a Generator-produced image as fake, the Generator receives the feedback that it needs to improve.
The Discriminator continues to improve as well. Like any classifier, it learns from how far its predictions are from the true labels (real or fake). So, as the Generator gets better at producing realistic-looking data, the Discriminator gets better at telling fake data from the real, and both networks continue to improve simultaneously.
Table 1.1 summarizes the key takeaways about the two GAN subnetworks.
1.3. GANs in action GAN实战
Now that you have a high-level understanding of GANs and their constituent networks, let’s take a closer look at the system in action. Imagine that our goal is to teach a GAN to produce realistic-looking handwritten digits. (You’ll learn to implement such a model in chapter 3 and expand on it in chapter 4.) Figure 1.2 illustrates the core GAN architecture.
Figure 1.2. The two GAN subnetworks, their inputs and outputs, and their interactions
Let’s walk through the details of the diagram:
1. Training dataset— The dataset of real examples that we want the Generator to learn to emulate with near-perfect quality. In this case, the dataset consists of images of handwritten digits. This dataset serves as input (x) to the Discriminator network.
2. Random noise vector— The raw input (z) to the Generator network. This input is a vector of random numbers that the Generator uses as a starting point for synthesizing fake examples.
3. Generator network— The Generator takes in a vector of random numbers (z) as input and outputs fake examples (x*). Its goal is to make the fake examples it produces indistinguishable from the real examples in the training dataset.
4. Discriminator network— The Discriminator takes as input either a real example (x) coming from the training set or a fake example (x*) produced by the Generator. For each example, the Discriminator determines and outputs the probability of whether the example is real.
5. Iterative training/tuning— For each of the Discriminator’s predictions, we determine how good it is—much as we would for a regular classifier—and use the results to iteratively tune the Discriminator and the Generator networks through backpropagation:
- The Discriminator’s weights and biases are updated to maximize its classification accuracy (maximizing the probability of correct prediction: x as real and x* as fake).
- The Generator’s weights and biases are updated to maximize the probability that the Discriminator misclassifies x* as real.
1.3.1. GAN training GAN训练
Learning about the purpose of the various GAN components may feel like looking at a snapshot of an engine: it cannot be understood fully until we see it in motion. That’s what this section is all about. First, we present the GAN training algorithm; then, we illustrate the training process so you can see the architecture diagram in action.
GAN training algorithm
For each training iteration do
1. Train the Discriminator:
1. Take a random real example x from the training dataset.
2. Get a new random noise vector z and, using the Generator network, synthesize a fake example x*.
3. Use the Discriminator network to classify x and x*.
4. Compute the classification errors and backpropagate the total error to update the Discriminator’s trainable parameters, seeking to minimize the classification errors.
2. Train the Generator:
1. Get a new random noise vector z and, using the Generator network, synthesize a fake example x*.
2. Use the Discriminator network to classify x*.
3. Compute the classification error and backpropagate the error to update the Generator’s trainable parameters, seeking to maximize the Discriminator’s error.
4. 计算分类损失，反向传播总损失更新识别器参数， 以减少分类损失
1. 获取随机噪声向量z，使用生成器网络，合成假样本 x*
3. 计算分类损失，反向传播更新生成器参数， 以增大识别器损失
GAN training visualized
Figure 1.3 illustrates the GAN training algorithm. The letters in the diagram refer to the list of steps in the GAN training algorithm.
1.3.2. Reaching equilibrium 达到平衡
You may wonder when the GAN training loop is meant to stop. More precisely, how do we know when a GAN is fully trained so that we can determine the appropriate number of training iterations? With a regular neural network, we usually have a clear objective to achieve and measure. For example, when training a classifier, we measure the classification error on the training and validation sets, and we stop the process when the validation error starts getting worse (to avoid overfitting). In a GAN, the two networks have competing objectives: when one network gets better, the other gets worse. How do we determine when to stop?
Those familiar with game theory may recognize this setup as a zero-sum game—a situation in which one player’s gains equal the other player’s losses. When one player improves by a certain amount, the other player worsens by the same amount. All zero-sum games have a Nash equilibrium, a point at which neither player can improve their situation or payoff by changing their actions.
GAN reaches Nash equilibrium when the following conditions are met:
- The Generator produces fake examples that are indistinguishable from the real data in the training dataset.
- The Discriminator can at best randomly guess whether a particular example is real or fake (that is, make a 50/50 guess whether an example is real).
Nash equilibrium is named after the American economist and mathematician John Forbes Nash Jr., whose life story and career were captured in the biography titled A Beautiful Mind and inspired the eponymous film.
Let us convince you of why this is the case. When each of the fake examples (x*) is truly indistinguishable from the real examples (x) coming from the training dataset, there is nothing the Discriminator can use to tell them apart from one another. Because half of the examples it receives are real and half are fake, the best the Discriminator can do is to flip a coin and classify each example as real or fake with 50% probability.
The Generator is likewise at a point where it has nothing to gain from further tuning. Because the examples it produces are already indistinguishable from the real ones, even a tiny change to the process it uses to turn the random noise vector (z) into a fake example (x*) may give the Discriminator a cue for how to discern the fake example from the real data, making the Generator worse off.
With equilibrium achieved, GAN is said to have converged. Here is when it gets tricky. In practice, it is nearly impossible to find the Nash equilibrium for GANs because of the immense complexities involved in reaching convergence in nonconvex games (more on convergence in later chapters, particularly chapter 5). Indeed, GAN convergence remains one of the most important open questions in GAN research.
Fortunately, this has not impeded GAN research or the many innovative applications of generative adversarial learning. Even in the absence of rigorous mathematical guarantees, GANs have achieved remarkable empirical results. This book covers a selection of the most impactful ones, and the following section previews some of them.
1.4. Why study GANs? 为什么研究GAN
Since their invention, GANs have been hailed by academics and industry experts as one of the most consequential innovations in deep learning. Yann LeCun, the director of AI research at Facebook, went so far as to say that GANs and their variations are “the coolest idea in deep learning in the last 20 years.”
:See “Google’s Dueling Neural Networks Spar to Get Smarter,” by Cade Metz, Wired, 2017, http://mng.bz/KE1X.
The excitement is well justified. Unlike other advancements in machine learning that may be household names among researchers but would elicit no more than a quizzical look from anyone else, GANs have captured the imagination of researchers and the wider public alike. They have been covered by the New York Times, the BBC, Scientific American, and many other prominent media outlets. Indeed, it was one of those exciting GAN results that probably drove you to buy this book in the first place. (Right?)
Perhaps most notable is the capacity of GANs to create hyperrealistic imagery. None of the faces in figure 1.4 belongs to a real human; they are all fake, showcasing GANs’ ability to synthesize images with photorealistic quality. The faces were produced using Progressive GANs, a technique covered in chapter 6.
Figure 1.4. These photorealistic but fake human faces were synthesized by a Progressive GAN trained on high-resolution portrait photos of celebrities.
(Source: “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” by Tero Karras et al., 2017, https://arxiv.org/abs/1710.10196.)
Another remarkable GAN achievement is image-to-image translation. Similarly to the way a sentence can be translated from, say, Chinese to Spanish, GANs can translate an image from one domain to another. As shown in figure 1.5, GANs can turn an image of a horse into an image of zebra (and back!), and a photo into a Monet-like painting—all with virtually no supervision and no labels whatsoever. The GAN variant that made this possible is called CycleGAN; you’ll learn all about it in chapter 9.
Figure 1.5. By using a GAN variant called CycleGAN, we can turn a Monet painting into a photograph or turn an image of a zebra into a depiction of a horse, and vice versa.
(Source: See “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” by Jun-Yan Zhu et al., 2017, https://arxiv.org/abs/1703.10593.)
The more practically minded GAN use cases are just as fascinating. The online giant Amazon is experimenting with harnessing GANs for fashion recommendations: by analyzing countless outfits, the system learns to produce new items matching any given style. In medical research, GANs are used to augment datasets with synthetic examples to improve diagnostic accuracy. In chapter 11—after you’ve mastered the ins and outs of training GANs and their variants—you’ll explore both of these applications in detail.
: See “Amazon Has Developed an AI Fashion Designer,” by Will Knight, MIT Technology Review, 2017, http://mng.bz/9wOj.
: See “Synthetic Data Augmentation Using GAN for Improved Liver Lesion Classification,” by Maayan Frid-Adar et al., 2018, https://arxiv.org/abs/1801.02385.
GANs are also seen as an important stepping stone toward achieving artificial general intelligence, an artificial system capable of matching human cognitive capacity to acquire expertise in virtually any domain—from motor skills involved in walking, to language, to creative skills needed to compose sonnets.
: See “OpenAI Founder: Short-Term AGI Is a Serious Possibility,” by Tony Peng, Synced, 2018, http://mng.bz/j5Oa. See also “A Path to Unsupervised Learning Through Adversarial Networks,” by Soumith Chintala, f Code, 2016, http://mng.bz/WOag.
But with the ability to generate new data and imagery, GANs also have the capacity to be dangerous. Much has been discussed about the spread and dangers of fake news, but the potential of GANs to create credible fake footage is disturbing. At the end of an aptly titled 2018 piece about GANs—“How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos”—the New York Times journalists Cade Metz and Keith Collins discuss the worrying prospect of GANs being exploited to create and spread convincing misinformation, including fake video footage of statements by world leaders. Martin Giles, the San Francisco bureau chief of MIT Technology Review, echoes their concern and mentions another potential risk in his 2018 article “The GANfather: The Man Who’s Given Machines the Gift of Imagination”: in the hands of skilled hackers, GANs can be used to intuit and exploit system vulnerabilities at an unprecedented scale. These concerns are what motivated us to discuss the ethical considerations of GANs in chapter 12.
GANs can do much good for the world, but all technological innovations have misuses. Here the philosophy has to be one of awareness: because it is impossible to “uninvent” a technique, it is crucial to make sure people like you are aware of this technique’s rapid emergence and its substantial potential.
In this book, we are only able to scratch the surface of what is possible with GANs. However, we hope that this book will provide you with the necessary theoretical knowledge and practical skills to continue exploring any facet of this field that you find most interesting.
So, without further ado, let’s dive in!
- GANs are a deep learning technique that uses a competitive dynamic between two neural networks to synthesize realistic data samples, such as fake photorealistic imagery. The two networks that constitute a GAN are as follows:
- The Generator, whose goal is to fool the Discriminator by producing data indistinguishable from the training dataset
- The Discriminator, whose goal is to correctly distinguish between real data coming from the training dataset and the fake data produced by the Generator
- GANs have extensive applications across many different sectors, such as fashion, medicine, and cybersecurity.