Generative Adversarial Network Training
Deep Learning
What are Generative Adversarial Networks (GANs)?
Designed by Ian Goodfellow and his colleagues in 2014, GANs consist of two neural networks that are trained together in a zero-sum game, where one player's loss is the other player's gain.
To understand GANs, we need to be familiar with generative models and discriminative models.
Generative models try to output new data points that follow the distribution of the training set. In other words, they generate new data instances. These models capture the joint probability p(X, Y).
Types of generative models:
1. Explicit density models
2. Implicit density models
Explicit density models define an explicit density function, while implicit density models define a stochastic procedure that can directly generate data.
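To make the distinction concrete, here is a minimal sketch in plain Python. The names `gaussian_pdf` and `implicit_sampler` are made up for illustration, and the transform inside the sampler is arbitrary; it only stands in for what a trained network would do.

```python
import math
import random

# Explicit density model: we can write down and evaluate p(x) directly.
def gaussian_pdf(x, mean=0.0, std=1.0):
    """Density of a 1-D Gaussian evaluated at x."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mean) / std) ** 2)

# Implicit density model: we only have a stochastic procedure that turns
# noise into samples; no formula for p(x) is ever evaluated.
def implicit_sampler(rng):
    """Hypothetical sampler: pushes uniform noise through a fixed transform."""
    z = rng.uniform(-1.0, 1.0)   # random noise input
    return z ** 3 + 0.5 * z      # arbitrary transform, standing in for a network

rng = random.Random(0)
density_at_zero = gaussian_pdf(0.0)                  # we can ask "how likely is x?"
samples = [implicit_sampler(rng) for _ in range(5)]  # we can only ask for samples
```

A GAN belongs to the second kind: the generator can produce samples, but it never gives us a density value for a given image.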
If you are interested in reading more about generative models, check out this popular GitHub repository below.
On the other hand, discriminative models capture the conditional probability p(Y | X) and learn to tell different kinds of data instances apart.

Generative models solve the harder task. They have to attend to far more detail than discriminative models do. Simply speaking, generative models do more work, because they try to approximate the real data distribution as closely as possible.
In the figure above we can see that the discriminative model only tries to separate the data space of the 0's from that of the 1's, whereas the generative model closely approximates where the 0's and the 1's lie in that space.
Now that you know the basic definitions of generative and discriminative models, let us learn about GANs.
The Discriminator & Generator networks: The GAN Game

Generative Adversarial Networks (GANs) are generative models. They generate whole images in parallel. GANs consist of 2 networks: the discriminator network and the generator network.

GANs use a differentiable function, usually a neural network, which we call the generator network. The generator takes random inputs, which are simply noise, and transforms and reshapes that noise into a recognizable structure such as an image. The image that comes out depends heavily on the noise that goes in.
For various noise inputs, we can generate many images. However, the generator network doesn't start giving out realistic images immediately. We need to train it.
How do we train this generator network? Probably the same way as any other network? Actually no!
The generator has to output images that look as if they were drawn from the same probability distribution as the many training images. How is that done? 👀
Here comes the discriminator, a regular neural network classifier. The discriminator guides our generator network.
For the sake of simplicity, let us call the output images of the generator network fake images. The fake images are given to the discriminator as input. The discriminator also sees so-called real images from the training data. The discriminator then outputs the probability that its input is a real image: ideally 1 for real images and 0 for fake images. Meanwhile, the generator tries to output images that the discriminator would assign a probability of 1.
Most machine learning models try to minimize some cost function by optimizing their parameters. If we were to assign cost functions to the two networks of a GAN, we can say the cost for the discriminator is the negative of the cost of the generator, and vice versa.
So let us try to understand how GANs work by treating the discriminator and the generator as 2 players that share a function f.
The generator tries to decrease the output value of the function f, while the discriminator tries to increase it. Let us assume this goes on until we reach an equilibrium where the generator can no longer decrease the value of f and the discriminator can no longer increase it. Since we run 2 optimization algorithms simultaneously, one for the generator and the other for the discriminator, we may never actually reach that equilibrium. In practice, the Adam optimizer is a good choice for both networks.
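The function f in this game is usually written as the value function V(D, G) from the original GAN formulation. As a sketch, with p_data standing for the real data distribution and p_z for the noise distribution (symbols that are not used elsewhere in this article):

```latex
\min_{G} \max_{D} V(D, G) =
    \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator D increases V by pushing D(x) towards 1 on real images and D(G(z)) towards 0 on fakes. The generator G decreases V by pushing D(G(z)) towards 1.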
Briefly speaking, the generator and discriminator compete: the generator gives fake data to the discriminator, and the discriminator, which also sees training data, predicts whether each image it receives is real or fake.
Look at the example below from Google Developers' Machine Learning Crash Course.
The generator begins with unrealistic images and quickly learns to fool the discriminator.

Thus, the generator is trained over time to fool the discriminator by making its fake images look much like the real ones that the discriminator sees.
So what does the training process look like?
During the training process of the discriminator, it is shown real images from the training set and fake images from the generator, and it classifies both. The discriminator loss penalizes the discriminator whenever it classifies an image incorrectly. Through backpropagation, the discriminator updates its weights.
Similarly, the generator is given noise inputs to generate fake images. These images are given to the discriminator, and the generator loss penalizes the generator for producing a sample that the discriminator classifies as fake. The gradients are backpropagated through the discriminator into the generator, and only the generator's weights are updated.
It is important to note that the generator must be held constant during the discriminator training phase. Similarly, the discriminator is held constant during the generator training phase. Thus GAN training proceeds in an alternating fashion.
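One way to see the alternating scheme in isolation is the toy sketch below. The two `nn.Linear` layers and their sizes are hypothetical stand-ins for the real networks, and calling `.detach()` on the fake batch is just one common way to keep gradients away from the generator during the discriminator phase. It is not necessarily how the full training code in this article does it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in networks (hypothetical sizes, for illustration only)
G = nn.Linear(4, 8)   # "generator": noise of size 4 -> sample of size 8
D = nn.Linear(8, 1)   # "discriminator": sample of size 8 -> one logit

d_optimizer = torch.optim.Adam(D.parameters(), lr=0.002)
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.002)
criterion = nn.BCEWithLogitsLoss()

g_before = G.weight.detach().clone()
d_before = D.weight.detach().clone()

# Discriminator phase: only d_optimizer steps, so G stays constant.
d_optimizer.zero_grad()
fake = G(torch.randn(16, 4)).detach()   # detach: no gradient reaches G
d_loss = criterion(D(fake), torch.zeros(16, 1))
d_loss.backward()
d_optimizer.step()
g_unchanged_in_d_phase = torch.equal(G.weight, g_before)
d_changed_in_d_phase = not torch.equal(D.weight, d_before)

# Generator phase: gradients flow through D, but only g_optimizer steps.
d_after_d_phase = D.weight.detach().clone()
g_optimizer.zero_grad()
g_loss = criterion(D(G(torch.randn(16, 4))), torch.ones(16, 1))  # flipped labels
g_loss.backward()
g_optimizer.step()
d_unchanged_in_g_phase = torch.equal(D.weight, d_after_d_phase)
g_changed_in_g_phase = not torch.equal(G.weight, g_before)
```

Each phase changes the weights of exactly one network, because each optimizer only holds the parameters of its own network.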
MNIST GAN
In this section, we will learn to design a GAN that can generate new images of handwritten digits. We will use the famous MNIST dataset. Get it here.
The Discriminator architecture
The discriminator is going to be a typical classifier built from fully connected (linear) layers.
The activation function we will be using is Leaky ReLU.

Why Leaky ReLU?
We should use a leaky ReLU to allow gradients to flow backward through the layer unhindered. A leaky ReLU is like a normal ReLU, except that there is a small non-zero output for negative input values.
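A quick way to check this is to compare the two activations on a few hand-picked values. This is a small sketch; the input values are arbitrary, and the slope of 0.2 matches the one used in the networks in this article.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 1.0, 3.0], requires_grad=True)

# Forward pass: ReLU zeroes out negatives, leaky ReLU scales them by the slope.
relu_out = F.relu(x)                              # negatives become 0
leaky_out = F.leaky_relu(x, negative_slope=0.2)   # negatives are multiplied by 0.2

# Backward pass: the gradient of leaky ReLU is never exactly zero.
leaky_out.sum().backward()
leaky_grad = x.grad.clone()

x.grad.zero_()
F.relu(x).sum().backward()
relu_grad = x.grad.clone()
```

For the negative inputs, the ReLU gradient is 0, so nothing flows back through those units. The leaky ReLU gradient there is 0.2, so the layers below still receive a signal.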
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self, input_size, hidden_dim, output_size):
        super(Discriminator, self).__init__()
        # define hidden linear layers
        self.fc1 = nn.Linear(input_size, hidden_dim*4)
        self.fc2 = nn.Linear(hidden_dim*4, hidden_dim*2)
        self.fc3 = nn.Linear(hidden_dim*2, hidden_dim)
        # final fully-connected layer
        self.fc4 = nn.Linear(hidden_dim, output_size)
        # dropout layer
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        # flatten image
        x = x.view(-1, 28*28)
        # all hidden layers
        x = F.leaky_relu(self.fc1(x), 0.2) # (input, negative_slope=0.2)
        x = self.dropout(x)
        x = F.leaky_relu(self.fc2(x), 0.2)
        x = self.dropout(x)
        x = F.leaky_relu(self.fc3(x), 0.2)
        x = self.dropout(x)
        # final layer: raw logits, the sigmoid is applied inside the loss
        out = self.fc4(x)
        return out
The Generator architecture
The generator uses latent samples to make fake images. These latent samples are vectors that are mapped to the fake images. A latent vector is just a compressed, feature-level representation of an image!
To understand what a latent sample is, consider an autoencoder. The outputs that connect the encoder and decoder portions of the network form a compressed representation that could also be referred to as a latent vector.
The activation function for all the hidden layers remains the same as in the discriminator, except that we will be using Tanh at the output.

Why Tanh at the output?
The generator has been found to perform best with tanh at its output, which scales the output to be between -1 and 1 instead of between 0 and 1.
import torch  # needed for torch.tanh below

class Generator(nn.Module):
    def __init__(self, input_size, hidden_dim, output_size):
        super(Generator, self).__init__()
        # define hidden linear layers
        self.fc1 = nn.Linear(input_size, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim*2)
        self.fc3 = nn.Linear(hidden_dim*2, hidden_dim*4)
        # final fully-connected layer
        self.fc4 = nn.Linear(hidden_dim*4, output_size)
        # dropout layer
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        # all hidden layers
        x = F.leaky_relu(self.fc1(x), 0.2) # (input, negative_slope=0.2)
        x = self.dropout(x)
        x = F.leaky_relu(self.fc2(x), 0.2)
        x = self.dropout(x)
        x = F.leaky_relu(self.fc3(x), 0.2)
        x = self.dropout(x)
        # final layer with tanh applied (torch.tanh replaces the older F.tanh)
        out = torch.tanh(self.fc4(x))
        return out
Scaling images
We want the output of the generator to be comparable to the pixel values of the real images. The real images are normalized to values between 0 and 1, while the tanh output of the generator lies between -1 and 1. Thus, we'll have to rescale our real input images to have pixel values between -1 and 1 when we train the discriminator. This will be done during the training phase.
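The rescaling itself is a one-line affine map. As a small sketch with made-up pixel values:

```python
import torch

# A few example pixel values in the normalized [0, 1] range
images = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Map [0, 1] to [-1, 1] so real images match the generator's tanh output range
scaled = images * 2 - 1

# The inverse map is handy when displaying generated images
restored = (scaled + 1) / 2
```

Here 0 maps to -1, 0.5 maps to 0, and 1 maps to 1, so both kinds of input reach the discriminator on the same scale.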
Generalization
To help the discriminator generalize better, the labels for real images are reduced a bit, from 1.0 to 0.9. For this, we'll use the parameter smooth; if it is True, we smooth our labels. In PyTorch, this looks like:
labels = torch.ones(size) * 0.9
We also make use of dropout layers to avoid overfitting.
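To see what the smoothed labels do, we can compare the loss of a very confident prediction against hard and smoothed targets. This is only a sketch: the logit value 6.0 is an arbitrary choice that stands for "almost certainly real".

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

# A very confident "real" prediction from a hypothetical discriminator
logits = torch.tensor([6.0, 6.0, 6.0])

hard_labels = torch.ones(3)            # real labels = 1.0
smooth_labels = torch.ones(3) * 0.9    # smoothed real labels = 0.9

hard_loss = criterion(logits, hard_labels)
smooth_loss = criterion(logits, smooth_labels)
```

With hard labels the loss is close to zero, so extreme confidence is rewarded. With smoothed labels the same prediction gets a noticeably larger loss, which discourages the discriminator from becoming overconfident.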
Loss calculation
The discriminator's goal is to output 1 for real images and 0 for fake images. On the other hand, the generator wants to make fake images that closely resemble the real ones.
Thus, if D(x) denotes the discriminator's output for an image x, the following can be stated:
The goal of the discriminator: D(real_images) = 1 & D(fake_images) = 0
The goal of the generator: D(fake_images) = 1 (the generator has no influence on D(real_images))
# Calculate losses
def real_loss(D_out, smooth=False):
    batch_size = D_out.size(0)
    # label smoothing
    if smooth:
        # smooth, real labels = 0.9
        labels = torch.ones(batch_size)*0.9
    else:
        labels = torch.ones(batch_size) # real labels = 1
    # numerically stable loss
    criterion = nn.BCEWithLogitsLoss()
    # calculate loss
    loss = criterion(D_out.squeeze(), labels)
    return loss

def fake_loss(D_out):
    batch_size = D_out.size(0)
    labels = torch.zeros(batch_size) # fake labels = 0
    criterion = nn.BCEWithLogitsLoss()
    # calculate loss
    loss = criterion(D_out.squeeze(), labels)
    return loss
We will use BCEWithLogitsLoss, which combines a sigmoid activation function (we want the discriminator's output to be turned into a value between 0 and 1 indicating whether an image is real or fake) with binary cross-entropy loss. This is why the discriminator's final layer returns raw logits without a sigmoid.
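The following sketch checks this on random inputs: the fused loss gives the same value as applying a sigmoid followed by `nn.BCELoss`. The tensors here are random placeholders, not real discriminator outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(8)                       # raw discriminator outputs
labels = torch.randint(0, 2, (8,)).float()    # random 0/1 targets

# One fused step: sigmoid + binary cross-entropy
fused = nn.BCEWithLogitsLoss()(logits, labels)

# Two separate steps: explicit sigmoid, then binary cross-entropy
two_step = nn.BCELoss()(torch.sigmoid(logits), labels)
```

The two values agree up to floating-point error. The fused version is preferred because it stays numerically stable for logits of large magnitude.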

Training
As mentioned earlier, Adam is a suitable optimizer.
The generator takes in a latent vector z and outputs fake images. The discriminator alternates between training on the real images and training on the fake images produced by the generator.
Steps involved in discriminator training:
- Compute the loss on real images
- Generate fake images
- Compute the loss on fake images
- Add up the losses on the real and fake images
- Perform backpropagation and update the weights of the discriminator
Steps involved in generator training:
- Generate fake images
- Compute the loss on fake images with flipped labels
- Perform backpropagation and update the weights of the generator
import numpy as np
import pickle as pkl
import torch.optim as optim

# Assumes D, G (model instances), z_size (latent vector length) and
# train_loader (MNIST DataLoader) have already been defined.

# Optimizers
lr = 0.002
# Create optimizers for the discriminator and generator
d_optimizer = optim.Adam(D.parameters(), lr)
g_optimizer = optim.Adam(G.parameters(), lr)

# training hyperparams
num_epochs = 100
# keep track of loss and generated, "fake" samples
samples = []
losses = []
print_every = 400

# Get some fixed data for sampling. These are latent vectors that are held
# constant throughout training, and allow us to inspect the model's performance
sample_size=16
fixed_z = np.random.uniform(-1, 1, size=(sample_size, z_size))
fixed_z = torch.from_numpy(fixed_z).float()

# train the network
D.train()
G.train()
for epoch in range(num_epochs):
    for batch_i, (real_images, _) in enumerate(train_loader):
        batch_size = real_images.size(0)

        ## Important rescaling step ##
        real_images = real_images*2 - 1 # rescale input images from [0,1) to [-1, 1)

        # ============================================
        # TRAIN THE DISCRIMINATOR
        # ============================================
        d_optimizer.zero_grad()

        # 1. Train with real images
        # Compute the discriminator losses on real images
        # smooth the real labels
        D_real = D(real_images)
        d_real_loss = real_loss(D_real, smooth=True)

        # 2. Train with fake images
        # Generate fake images
        z = np.random.uniform(-1, 1, size=(batch_size, z_size))
        z = torch.from_numpy(z).float()
        fake_images = G(z)

        # Compute the discriminator losses on fake images
        D_fake = D(fake_images)
        d_fake_loss = fake_loss(D_fake)

        # add up loss and perform backprop
        # (only d_optimizer steps here, so the generator weights stay fixed)
        d_loss = d_real_loss + d_fake_loss
        d_loss.backward()
        d_optimizer.step()

        # =========================================
        # TRAIN THE GENERATOR
        # =========================================
        g_optimizer.zero_grad()

        # 1. Train with fake images and flipped labels
        # Generate fake images
        z = np.random.uniform(-1, 1, size=(batch_size, z_size))
        z = torch.from_numpy(z).float()
        fake_images = G(z)

        # Compute the discriminator losses on fake images
        # using flipped labels!
        D_fake = D(fake_images)
        g_loss = real_loss(D_fake) # use real loss to flip labels

        # perform backprop
        # (only g_optimizer steps here, so the discriminator weights stay fixed)
        g_loss.backward()
        g_optimizer.step()

        # Print some loss stats
        if batch_i % print_every == 0:
            # print discriminator and generator loss
            print('Epoch [{:5d}/{:5d}] | d_loss: {:6.4f} | g_loss: {:6.4f}'.format(
                epoch+1, num_epochs, d_loss.item(), g_loss.item()))

    ## AFTER EACH EPOCH ##
    # append discriminator loss and generator loss
    losses.append((d_loss.item(), g_loss.item()))

    # generate and save sample, fake images
    G.eval() # eval mode for generating samples
    samples_z = G(fixed_z)
    samples.append(samples_z)
    G.train() # back to train mode

# Save training generator samples
with open('train_samples.pkl', 'wb') as f:
    pkl.dump(samples, f)
Training loss
We shall plot the generator and discriminator losses against the number of epochs.
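import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
losses = np.array(losses)
# one (d_loss, g_loss) pair was recorded per epoch
plt.plot(losses.T[0], label='Discriminator')
plt.plot(losses.T[1], label='Generator')
plt.title("Training Losses")
plt.legend()
plt.show()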

Samples generated by the generator
At the start
# helper function for viewing a list of passed in sample images
def view_samples(epoch, samples):
    fig, axes = plt.subplots(figsize=(7,7), nrows=4, ncols=4, sharey=True, sharex=True)
    for ax, img in zip(axes.flatten(), samples[epoch]):
        img = img.detach()
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        im = ax.imshow(img.reshape((28,28)), cmap='Greys_r')

# Load samples from generator, taken while training
with open('train_samples.pkl', 'rb') as f:
    samples = pkl.load(f)

# -1 indicates final epoch's samples (the last in the list);
# pass 0 instead to view the samples from the start of training
view_samples(-1, samples)

Over time
rows = 10 # split epochs into 10, so 100/10 = every 10 epochs
cols = 6
fig, axes = plt.subplots(figsize=(7,12), nrows=rows, ncols=cols, sharex=True, sharey=True)
for sample, ax_row in zip(samples[::int(len(samples)/rows)], axes):
    for img, ax in zip(sample[::int(len(sample)/cols)], ax_row):
        img = img.detach()
        ax.imshow(img.reshape((28,28)), cmap='Greys_r')
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)

This way the generator starts out with noisy images and learns over time.
You can check out the code and readme file on my GitHub profile as well.
Conclusions
Ever since Ian Goodfellow and his colleagues at the University of Montreal designed GANs, their popularity has exploded. The number of applications is remarkable. GANs have been further improved through many variations, such as CycleGAN, Conditional GAN, and Progressive GAN. To read more about these, check out this link. Now open a Jupyter notebook and try to implement whatever you learned.
Thank you. See you at the next one.
Translated from: https://towardsdatascience.com/generative-adversarial-networks-6a17673db367