Generative Adversarial Networks — Demystified
Those of you who have already heard about GANs and are wondering what the hype is about should definitely go through this kernel to see the immense potential of this new species of network.
And if you are hearing the name for the first time, you will be all the more amazed when you learn what these networks can do.
First things first (what are they?)
GANs are a class of Unsupervised Learning Algorithms that do much more than just recognize images or speech, predict, or translate. They implement deep neural networks or CNNs and are comprised of two parts, pitting one against the other (thus the “adversarial”). These two parts are called the Generator and the Discriminator.
Generator — The generator takes the role of a forger and tries to create music, images, or speech from random noise. It learns to map from a latent space to a particular data distribution of interest. It generally implements a deconvolutional network to do so.
Discriminator — The discriminator, on the other hand, takes the role of the evaluator and tries to distinguish the fake data (created by the generator) from the real data. It is usually implemented as a convolutional network.
With that said, what follows is a loop in which:
1. The generator tries to maximize the probability of fooling the discriminator, making its images (for example) closer to real at each step so that the discriminator classifies them as real.
2. The discriminator, by classifying the generator’s images as fake, guides the generator to produce more realistic images.
Let’s fit this into an analogy
You can think of a GAN as a game of cat and mouse between a counterfeiter (Generator) and a cop (Discriminator). The counterfeiter is learning to create fake money, and the cop is learning to detect it. Both are learning and improving: the counterfeiter is constantly producing better fakes, and the cop is constantly getting better at spotting them. This competition drives both to refine their methods until the counterfeits are indistinguishable from genuine money.
GANs have incredible potential, because they can learn to imitate any distribution of data. That is, GANs can learn to create worlds spookily similar to our own in any domain: images, music, speech.
GANs have a variety of applications, ranging from reconstructing 3D models of objects from images to creating the 2018 painting *Edmond de Belamy*, which sold for $432,500 (woah).
Some of them are-
* Image denoising
* Inpainting
* Super Resolution
* Structured Prediction
* Exploration in Reinforcement Learning
* Image to Image Translation
In this kernel I’m implementing a Deep Convolutional GAN, based on this paper on [DCGAN], which contrasts with but builds on [Ian Goodfellow’s paper]. Goodfellow’s paper is the first paper on GANs and implements a dense network in both the generator and the discriminator rather than a CNN. I am using the images (without class labels) from the **CIFAR_10** dataset. These images, along with the fake ones, will be fed in batches to the discriminator. Let’s take a look at the steps our GAN will follow:
1. The Generator takes in random numbers and returns an image.
2. This generated image is fed into the Discriminator alongside a stream of images taken from the actual dataset.
3. The Discriminator takes in both real and fake images and returns probabilities, a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing fake.
There are 2 feedback loops:
1. The Discriminator is in a feedback loop with the ground truth of the images (are they real or fake), which we know.
2. The Generator is in a feedback loop with the Discriminator (did the Discriminator label it real or fake, regardless of the truth).
Without any further delay, let’s import the libraries, load the dataset, and get going.
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam,SGD
import keras
import matplotlib.pyplot as plt
import sys
import numpy as np
import os
print(os.listdir("../input"))
This code imports several classes from the Keras library along with some standard libraries, and prints the list of files in the specified directory.
First, it imports the following modules from Keras.
Input: A class for defining an input layer for a Keras model.
Dense: A class for defining a fully connected layer for a Keras model.
Reshape: A class for reshaping the input or output of a Keras layer.
Flatten: A class for flattening the input or output of a Keras layer.
Dropout: A class for applying dropout regularization to a Keras layer.
BatchNormalization: A class for applying batch normalization to a Keras layer.
Activation: A class for specifying the activation function of a Keras layer.
ZeroPadding2D: A class for adding zero padding to a 2D input.
LeakyReLU: A class for specifying a leaky ReLU activation function.
UpSampling2D: A class for upsampling a 2D input.
Conv2D: A class for defining a 2D convolutional layer for a Keras model.
Sequential: A class for defining a linear stack of layers for a Keras model.
Model: A class for defining a more complex Keras model with multiple inputs and outputs.
Adam: An optimizer class for training a Keras model using the Adam optimization algorithm.
SGD: An optimizer class for training a Keras model using the stochastic gradient descent (SGD) optimization algorithm.
Finally, the code prints the list of files in the “../input” directory. This is a relative path (the default input location on Kaggle), so the actual directory may differ depending on where the script or notebook is run. Note that no model is defined at this point; these imports simply lay the groundwork for the generator and discriminator built below.
from os import listdir, makedirs
from os.path import join, exists, expanduser
cache_dir = expanduser(join('~', '.keras'))
if not exists(cache_dir):
    makedirs(cache_dir)
datasets_dir = join(cache_dir, 'datasets')  # /cifar-10-batches-py
if not exists(datasets_dir):
    makedirs(datasets_dir)
!cp ../input/cifar-10-python.tar.gz ~/.keras/datasets/
!ln -s ~/.keras/datasets/cifar-10-python.tar.gz ~/.keras/datasets/cifar-10-batches-py.tar.gz
!tar xzvf ~/.keras/datasets/cifar-10-python.tar.gz -C ~/.keras/datasets/
This code first imports necessary functions from the “os” and “os.path” modules, which allow for manipulating the file system. Specifically, the functions being imported are “listdir”, “makedirs”, “join”, “exists”, and “expanduser”.
The code then sets the “cache_dir” variable to the path “~/.keras”, which is the default cache directory used by Keras (a deep learning framework). If this directory does not exist, it is created using the “makedirs” function.
Next, the “datasets_dir” variable is set to the path “~/.keras/datasets”, which is a subdirectory of the cache directory. If this directory does not exist, it is created using the “makedirs” function.
Afterwards, the code copies a file called “cifar-10-python.tar.gz” from a directory called “../input” to the “datasets” directory using the “cp” command. It then creates a symbolic link from this file to a file called “cifar-10-batches-py.tar.gz” using the “ln” command. Finally, it extracts the contents of the “cifar-10-python.tar.gz” file into the “datasets” directory using the “tar” command.
This code appears to be setting up a directory structure and copying some data files into it, likely in preparation for training a neural network on the CIFAR-10 dataset (a common image classification benchmark dataset).
This code loads the CIFAR10 dataset using the Keras deep learning framework’s built-in “load_data” function. The dataset consists of 60,000 32x32 color images in 10 classes, divided into a training set of 50,000 images (5,000 per class) and a test set of 10,000 images. Here the test set is discarded, since we only need real images for the discriminator.
# Load CIFAR10 data
(X_train, y_train), (_, _) = keras.datasets.cifar10.load_data()
# Select a single class images (birds)
X_train = X_train[y_train.flatten() == 2]
Next, the code selects a single class of images, specifically the “birds” class. “y_train” is flattened to 1-D and compared with 2 (the class index for birds), producing a boolean mask; indexing “X_train” with this mask filters out all images in the other classes and only leaves the images in the birds class.
The resulting “X_train” array contains only the bird images from CIFAR10, which are the images our GAN will learn to imitate.
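A tiny numpy illustration of this boolean-mask selection (the arrays here are made-up stand-ins, not the real dataset):

```python
import numpy as np

# Hypothetical miniature labels and "images": four samples, with labels
# shaped (4, 1) just like y_train comes out of cifar10.load_data()
y = np.array([[0], [2], [1], [2]])
X = np.array([10, 20, 30, 40])   # stand-ins for four images

mask = y.flatten() == 2          # boolean mask: [False, True, False, True]
birds = X[mask]                  # keeps only the samples labelled 2
print(birds)                     # [20 40]
```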
# Input shape
img_rows = 32
img_cols = 32
channels = 3
img_shape = (img_rows, img_cols, channels)
latent_dim = 100
This code defines some variables for use in a deep learning model.
The variables “img_rows” and “img_cols” are both set to 32, which corresponds to the height and width of the input images that will be fed into the model. The variable “channels” is set to 3, which corresponds to the number of color channels in the images (red, green, and blue).
The variable “img_shape” is then defined as a tuple containing the values of “img_rows”, “img_cols”, and “channels”. This tuple represents the shape of the input images that will be fed into the model.
Finally, the variable “latent_dim” is defined and set to 100. This variable represents the dimensionality of the latent space of the model. The latent space is an intermediate representation of the input images that the model uses to generate new images. Increasing the value of “latent_dim” can allow the model to generate more complex and diverse images, but it can also increase the computational complexity of the model.
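For concreteness, the latent vectors fed to the generator are just batches of Gaussian noise (the batch size here is illustrative):

```python
import numpy as np

latent_dim = 100
batch_size = 32

# One batch of latent vectors: each row is a 100-dimensional point z ~ N(0, 1)
noise = np.random.normal(0, 1, (batch_size, latent_dim))
print(noise.shape)  # (32, 100)
```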
The Generator
To learn a generator distribution p_g over data x, the generator builds a mapping function from a prior noise distribution p_z(z) to data space, G(z). The discriminator outputs a single scalar, D(x), representing the probability that x came from the training data rather than from p_g.
G and D are both trained simultaneously: we adjust parameters for G to minimize log(1 − D(G(z))), while parameters for D are adjusted to maximize the probability of assigning the correct label to both training examples and samples from G, as if they were playing a two-player min-max game over a value function V(D, G).
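That value function, as given in Goodfellow’s original paper, is:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] +
  \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```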
In the generator we use a method called [Upsampling] to produce images. I have used UpSampling2D, but Conv2DTranspose with strides or PixelShuffle could be used instead.
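UpSampling2D performs nearest-neighbour upsampling by default, i.e. each pixel is repeated along both spatial axes. A small numpy sketch of the same operation (the 2x2 feature map here is made up):

```python
import numpy as np

# A single-image batch holding one 2x2 feature map with 1 channel,
# in the same (batch, height, width, channels) layout Keras uses
x = np.array([[0., 1.],
              [2., 3.]]).reshape(1, 2, 2, 1)

# Nearest-neighbour upsampling by 2: repeat rows, then repeat columns,
# doubling both spatial dimensions just like UpSampling2D()
up = np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)

print(up[0, :, :, 0])
# [[0. 0. 1. 1.]
#  [0. 0. 1. 1.]
#  [2. 2. 3. 3.]
#  [2. 2. 3. 3.]]
```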
def build_generator():
    model = Sequential()
    model.add(Dense(128 * 8 * 8, activation="relu", input_dim=latent_dim))
    model.add(Reshape((8, 8, 128)))
    model.add(UpSampling2D())  # upsamples to 16x16x128
    model.add(Conv2D(128, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(UpSampling2D())  # upsamples to 32x32x128
    model.add(Conv2D(64, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(Conv2D(channels, kernel_size=3, padding="same"))
    model.add(Activation("tanh"))
    # outputs an image of 32x32x3
    noise = Input(shape=(latent_dim,))
    img = model(noise)
    return Model(noise, img)
This code defines a function called “build_generator” that creates a generator model for a deep learning application. The generator model is used to create new images that are similar to the images in a given dataset.
The generator model is built using the Keras Sequential API, which allows for creating a model layer by layer. The generator model takes as input a vector of random noise with shape (latent_dim,), and outputs an image with shape (img_rows, img_cols, channels).
The model first adds a Dense layer with 128 * 8 * 8 neurons, which corresponds to a fully connected layer with 8x8x128=8192 outputs. This layer takes the random noise vector as input and applies a Rectified Linear Unit (ReLU) activation function to it.
Next, a Reshape layer is added to reshape the output of the Dense layer into a 3D tensor with shape (8, 8, 128). This tensor represents an 8x8 grid of 128-dimensional feature maps.
Then, an UpSampling2D layer is added to double the spatial dimensions of the feature maps, resulting in a 16x16 grid of 128-dimensional feature maps.
A Conv2D layer with 128 filters and a kernel size of 3x3 is then added, followed by a BatchNormalization layer and a ReLU activation (note that the activation is applied after the normalization, as separate layers).
Another UpSampling2D layer is added to double the spatial dimensions of the feature maps, resulting in a 32x32 grid of 128-dimensional feature maps.
Another Conv2D layer with 64 filters and a kernel size of 3x3 follows, again followed by batch normalization and a ReLU activation.
Finally, a Conv2D layer is added with channels filters and a kernel size of 3x3. This layer applies a convolution operation to the feature maps and uses a hyperbolic tangent (tanh) activation function. The output of this layer is an image with shape (img_rows, img_cols, channels), which represents the generated image.
The function returns a Keras Model object that takes as input the random noise vector and outputs the generated image. Overall, this code defines the architecture of the generator model and returns it as a Keras Model object that can be compiled and trained on a dataset.
The Discriminator
The discriminator is also a CNN with leaky ReLU activations. Many activation functions will work fine with this basic GAN architecture. However, leaky ReLUs are very popular because they help the gradients flow easier through the architecture.
A regular ReLU function works by truncating negative values to 0, which blocks gradients from flowing back through the network. Instead of being zero for negative inputs, a leaky ReLU lets a small negative value pass through: it computes max(αx, x) for some small factor α.
Leaky ReLUs represent an attempt to solve the dying ReLU problem. This situation occurs when the neurons get stuck in a state in which ReLU units always output 0s for all inputs. For these cases, the gradients are completely shut to flow back through the network.
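A one-line numpy version makes the difference concrete (α = 0.2 matches the alpha used in the discriminator):

```python
import numpy as np

def relu(x):
    # truncates all negative values to 0
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.2):
    # max(alpha * x, x): identity for positive inputs,
    # a small negative slope for negative inputs
    return np.maximum(alpha * x, x)

x = np.array([-2.0, -0.5, 0.0, 3.0])
print(relu(x))        # negatives become 0, so their gradients die
print(leaky_relu(x))  # negatives become -0.4 and -0.1, so a gradient survives
```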
> *This is especially important for GANs since the only way the generator has to learn is by receiving the gradients from the discriminator.*
Finally, the discriminator needs to output probabilities. We use a Sigmoid Activation for that.
def build_discriminator():
    model = Sequential()
    model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=img_shape, padding="same"))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    # no normalization for the first layer
    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
    model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    img = Input(shape=img_shape)
    validity = model(img)
    return Model(img, validity)
This code defines a function called “build_discriminator” that creates a discriminator model for a deep learning application. The discriminator model is used to distinguish between real images from a given dataset and fake images generated by a generator model.
The discriminator model is also built using the Keras Sequential API, but it takes as input an image with shape (img_rows, img_cols, channels) instead of a vector of random noise.
The model first adds a Conv2D layer with 32 filters, a kernel size of 3x3, and a stride of 2. This layer applies a convolution operation to the input image and uses a Leaky Rectified Linear Unit (LeakyReLU) activation function with a slope of 0.2.
A Dropout layer is added to regularize the activations of the previous layer and prevent overfitting.
Another Conv2D layer is added with 64 filters, a kernel size of 3x3, and a stride of 2. This layer applies another convolution operation to the feature maps and uses a LeakyReLU activation function with a slope of 0.2. A ZeroPadding2D layer is added to pad the feature maps with an extra row and column of zeros.
A BatchNormalization layer is added to normalize the activations of the previous layer.
Another Dropout layer is added to regularize the activations of the previous layer.
Another Conv2D layer is added with 128 filters, a kernel size of 3x3, and a stride of 2. This layer applies another convolution operation to the feature maps and uses a LeakyReLU activation function with a slope of 0.2.
Another BatchNormalization layer is added to normalize the activations of the previous layer.
Another Dropout layer is added to regularize the activations of the previous layer.
Another Conv2D layer is added with 256 filters, a kernel size of 3x3, and a stride of 1. This layer applies another convolution operation to the feature maps and uses a LeakyReLU activation function with a slope of 0.2.
Another BatchNormalization layer is added to normalize the activations of the previous layer.
Another Dropout layer is added to regularize the activations of the previous layer.
Finally, a Flatten layer is added to flatten the output of the previous layer into a 1D vector. A Dense layer with a single neuron and a sigmoid activation function is added to output a probability score that represents the likelihood that the input image is real.
The function returns a Keras Model object that takes as input an image and outputs a probability score. Overall, this code defines the architecture of the discriminator model and returns it as a Keras Model object that can be compiled and trained on a dataset.
A lot of changes have been made to GAN architectures since Goodfellow’s original paper, but some things remain the same:
* Normalizing the input
* The activation function in all but the last layer of the generator is ReLU.
* The last layer of the generator (here a Conv2D layer) uses a tanh activation.
* Similarly for the discriminator: all layers except the last use LeakyReLU, and the final Dense layer uses a sigmoid activation.
* We use binary cross-entropy loss for both adversaries. (Though some papers, such as [Wasserstein gan], use different loss functions.)
Now, some hacks/tips that have been introduced in papers over the last few years to make GANs train better:
* Using BatchNormalization in all layers except the input layer in the generator and the output layer in the discriminator.
* Using Adam Optimizer for the generator and SGD for the discriminator.
* Adding some random noise to the labels before feeding them to the discriminator.
* Sampling from a Gaussian Distribution instead of a Uniform distribution.
* Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images.
* Pre-training the discriminator.
* Adding some noise to the images before feeding them to the discriminator.
It’s not necessary that all of the above tricks will work for your model. You will have to find the ones that do.
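For example, the label-noise trick above can be as simple as replacing the hard 0/1 targets with slightly randomized ones before each discriminator update (the batch size and noise range here are illustrative choices, not from the kernel):

```python
import numpy as np

batch_size = 32

# Hard targets the vanilla setup would use
valid = np.ones((batch_size, 1))
fake = np.zeros((batch_size, 1))

# Noisy / smoothed targets: real labels land in (0.9, 1.0],
# fake labels land in [0.0, 0.1)
noisy_valid = valid - 0.1 * np.random.random((batch_size, 1))
noisy_fake = fake + 0.1 * np.random.random((batch_size, 1))

assert (noisy_valid > 0.9).all() and (noisy_valid <= 1.0).all()
assert (noisy_fake >= 0.0).all() and (noisy_fake < 0.1).all()
```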
# Build and compile the discriminator
discriminator = build_discriminator()
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(0.0002, 0.5),
                      metrics=['accuracy'])
# Build the generator
generator = build_generator()
# The generator takes noise as input and generates imgs
z = Input(shape=(latent_dim,))
img = generator(z)
# For the combined model we will only train the generator
discriminator.trainable = False
# The discriminator takes generated images as input and determines validity
valid = discriminator(img)
# The combined model (stacked generator and discriminator)
# Trains the generator to fool the discriminator
combined = Model(z, valid)
combined.compile(loss='binary_crossentropy', optimizer=Adam(0.0002,0.5))
This code builds and compiles a generative adversarial network (GAN) model using the discriminator and generator models defined earlier. The discriminator is compiled first with its own optimizer; setting “discriminator.trainable = False” before building the combined model then freezes its weights there, so training “combined” updates only the generator.
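What typically follows (not shown in this excerpt) is an alternating loop over these models. The sketch below is runnable on its own by using a stub in place of the Keras train_on_batch calls, purely to show the batch and label bookkeeping; the batch size, step count, and stub loss are illustrative:

```python
import numpy as np

latent_dim = 100
batch_size = 32
X_train = np.random.random((256, 32, 32, 3))  # placeholder real images

valid = np.ones((batch_size, 1))              # label 1 = real
fake = np.zeros((batch_size, 1))              # label 0 = fake

def train_on_batch(x, y):
    # Stub standing in for discriminator.train_on_batch / combined.train_on_batch;
    # returns a dummy "loss" so the loop runs here without Keras
    return float(np.abs(y.mean() - 0.5))

history = []
for step in range(3):
    # Train the discriminator: a real batch labelled 1, a generated batch labelled 0
    idx = np.random.randint(0, X_train.shape[0], batch_size)
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    gen_imgs = np.tanh(np.random.normal(0, 1, (batch_size, 32, 32, 3)))  # stub generator output
    d_loss = 0.5 * (train_on_batch(X_train[idx], valid) + train_on_batch(gen_imgs, fake))
    # Train the generator via the combined model: fakes labelled 1,
    # so the gradients push D(G(z)) towards "real"
    g_loss = train_on_batch(noise, valid)
    history.append((d_loss, g_loss))

print(history)  # three (d_loss, g_loss) pairs
```

Note the real-only and fake-only discriminator batches, which is exactly the separate-mini-batches tip from the list above.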