CS 5043: HW7: Conditional Generative Adversarial Networks
Assignment notes:
- Deadline: Tuesday, April 23rd @11:59pm.
- Hand-in procedure: submit a zip file to Gradescope
- This work is to be done on your own. However, you may share
solution-specific code snippets in the open on
Slack (only!), but not full solutions. In addition, downloading
solution-specific code is not allowed.
The Problem
For this assignment, we will create a network that generates synthetic
satellite images given a semantic labeling of an image. Specifically,
we will come back to the Chesapeake Bay data set. The inputs to our
generator will be a 1-hot encoded representation of pixel-level class
labels (7 classes in total) and a set of tensors that contain random
values. In response, the generator will produce an RGB image that
plausibly matches the semantic labels. Note that this is not a
one-to-one mapping: for a single semantic input, the generator should
produce different images over a set of queries. This is possible
because:
- We only require that the generator fool a discriminator network
into "believing" that the image is real.
- For each generated image, the random inputs will be different.
Deep Learning Experiment
Implement a Generative Adversarial Network to solve this problem.
This GAN requires the implementation of three different Keras Models;
the two base models are very similar to what you implemented in HW 3
and 4:
- A discriminator model (a classifier!) that takes two input
tensors: an RGB image and the semantic labeling of the image, and
produces as output a probability that the input image is real
and corresponds to the semantic input. We will train this
model on its own.
- A generator model (a U-net!) that takes as input the semantic
label image and several noise tensors, and produces an RGB image
as output. We will not directly train this model - instead, it
is just a means for producing synthetic images.
- A "meta-model" that combines the two. This meta-model takes as
input the semantic label image and several noise tensors, and
produces a scalar output probability that the generated image
is real (and corresponds to the semantic input). We will use
this model to train the generator.
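The sketch below shows one way the three models might be wired
together in Keras. It is a minimal sketch, not the required
implementation: build_discriminator() and build_generator() stand in
for your own model-building code, and the optimizer, learning rate,
and input ordering are assumptions.

    from tensorflow import keras

    image_size = 64
    n_classes = 7

    # Hypothetical helpers standing in for your own model-building code.
    # Assumed interfaces: discriminator([rgb, labels]) -> p(real),
    # generator([labels, z64, z32, z16]) -> rgb
    discriminator = build_discriminator(image_size, n_classes)
    discriminator.compile(optimizer=keras.optimizers.Adam(1e-4),
                          loss='binary_crossentropy')

    # Freeze the discriminator AFTER compiling it and BEFORE compiling
    # the meta-model (see the Hints section)
    discriminator.trainable = False

    generator = build_generator(image_size, n_classes)

    # Meta-model: labels + noise -> generated image -> discriminator output
    label_in = keras.Input(shape=(image_size, image_size, n_classes))
    noise_ins = [keras.Input(shape=(s, s, 1)) for s in (64, 32, 16)]
    fake_image = generator([label_in] + noise_ins)
    p_real = discriminator([fake_image, label_in])

    meta_model = keras.Model(inputs=[label_in] + noise_ins, outputs=p_real)
    meta_model.compile(optimizer=keras.optimizers.Adam(1e-4),
                       loss='binary_crossentropy')

Because the discriminator is frozen inside the meta-model, training
the meta-model only updates the generator's weights.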
Data Set
We will use the same Chesapeake data loader as we did in HW 4. The
provided notebook file gives an example of using this loader; your
data sets will be composed of batches of Image / Semantic Label
pairs.
For the sake of computational feasibility, we will be using
image_size=64 (i.e., 64x64 images).
Training Process
With GANs, we have two models that are attempting to satisfy competing
objectives: the discriminator is trying to differentiate real from
generated images, while the generator is attempting to produce images
that fool the discriminator. Training the combined system requires us
to switch between performing one or more epochs of gradient descent on
the discriminator and then performing one or more epochs on the
meta-model. This process is repeated over a large set of meta
epochs.
As discussed in class and as implemented in the provided
gan_train_loop.py, for each meta epoch we sample three batches from
the training data set. These
are:
- Real Image / Semantic Label pairs. The objective of the
discriminator is to label these as real (output = 1).
- Generated Image / Semantic Label pairs, where the generated
image is a function of the semantic labels. The objective of the
discriminator is to label these as fake (output = 0), while the
objective of the generator is for the discriminator to label
them as real (output = 1).
- Real Images / Semantic Labels, but the labels are shuffled
across the examples so that the pairs do not correspond to each
other. The objective of the discriminator is to label these as
fake (output = 0).
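The provided gan_train_loop.py is authoritative; the following is
only a rough sketch of the logic of one meta epoch, assuming that a
hypothetical sample_batch() returns a (images, labels) pair of numpy
arrays and that the models use the input ordering from the sketch
above.

    import numpy as np

    def make_noise(batch_size, sizes=(64, 32, 16)):
        # One noise tensor per U-net stage (see U-Net Details below)
        return [np.random.normal(size=(batch_size, s, s, 1)) for s in sizes]

    def meta_epoch(generator, discriminator, meta_model, sample_batch,
                   batch_size=32):
        ones = np.ones((batch_size, 1))
        zeros = np.zeros((batch_size, 1))

        # 1) Real image / real label pairs: discriminator target is 1
        real_images, real_labels = sample_batch(batch_size)
        discriminator.train_on_batch([real_images, real_labels], ones)

        # 2) Generated image / real label pairs: discriminator target is 0
        _, labels2 = sample_batch(batch_size)
        noise = make_noise(batch_size)
        fake_images = generator.predict([labels2] + noise, verbose=0)
        discriminator.train_on_batch([fake_images, labels2], zeros)

        # 3) Real images with shuffled labels: discriminator target is 0
        images3, labels3 = sample_batch(batch_size)
        shuffled = labels3[np.random.permutation(batch_size)]
        discriminator.train_on_batch([images3, shuffled], zeros)

        # Generator step: train the meta-model so that generated images
        # are scored as real (target 1)
        meta_model.train_on_batch([labels2] + noise, ones)

In practice you may perform one or more discriminator updates and one
or more meta-model updates per meta epoch; the balance between the
two affects training stability.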
U-Net Details
The U-net that you are implementing is not unlike what you implemented
in HW 4. The key difference is that we are also bringing in random
tensors. These random tensors should be connected to the decoder side
of the U-net. I recommend that this be done using concatenation (and
not addition). The connections should be:
- Some point in the middle of the stack at the bottom of the U.
- Immediately following each UpSampling step (in fact, these
concatenations can happen at the same time that the skip
connections are concatenated).
For a 64x64 base image with two down/up stages in the U, you will
need three random tensors whose rows and columns match the image
size at each stage. Their shapes will be:
- 64, 64, 1
- 32, 32, 1
- 16, 16, 1
The training loop includes code that will generate the random numpy
arrays of these shapes.
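As a hedged illustration only (filter counts, activations, and the
exact position of each concatenation are arbitrary choices, not
requirements), a two-stage 64x64 U-net with the noise connections
described above might look like this:

    from tensorflow import keras
    from tensorflow.keras import layers

    n_classes = 7

    labels = keras.Input(shape=(64, 64, n_classes))
    z64 = keras.Input(shape=(64, 64, 1))
    z32 = keras.Input(shape=(32, 32, 1))
    z16 = keras.Input(shape=(16, 16, 1))

    # Encoder
    e1 = layers.Conv2D(32, 3, padding='same', activation='elu')(labels)  # 64x64
    p1 = layers.MaxPooling2D()(e1)                                        # 32x32
    e2 = layers.Conv2D(64, 3, padding='same', activation='elu')(p1)
    p2 = layers.MaxPooling2D()(e2)                                        # 16x16

    # Bottom of the U: concatenate the 16x16 noise tensor
    b = layers.Concatenate()([p2, z16])
    b = layers.Conv2D(64, 3, padding='same', activation='elu')(b)

    # Decoder: concatenate the skip connection and the matching noise
    # tensor immediately after each UpSampling step
    u2 = layers.UpSampling2D()(b)                                         # 32x32
    u2 = layers.Concatenate()([u2, e2, z32])
    u2 = layers.Conv2D(64, 3, padding='same', activation='elu')(u2)

    u1 = layers.UpSampling2D()(u2)                                        # 64x64
    u1 = layers.Concatenate()([u1, e1, z64])
    u1 = layers.Conv2D(32, 3, padding='same', activation='elu')(u1)

    # Assumes pixel values scaled to [0, 1]
    rgb = layers.Conv2D(3, 1, activation='sigmoid')(u1)

    generator = keras.Model([labels, z64, z32, z16], rgb)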
Experiments
Settle on a complete architecture implementation. You are welcome to
use your HW3 and HW4 model building implementations as starting
points. Or - you may use the provided network_support.py
implementation tools. Get this model working well.
Next, make one change to this model and train it to its best
performance. Possible experiments to try include:
- Dropping the third case from the meta epoch (real images and
labels, but with the labels shuffled).
- Increasing the depth of your U by at least two more stages.
- Changing the inclusion of the random tensors to only use a
single tensor at the bottom of the U.
- Increasing the resolution of the images (128x128 or 256x256).
Once you have completed your experiment with the two models, produce
the following:
- Figure 0a,b,c,d: Model architectures from plot_model()
(including the two different generators).
- Figure 1a,b: for each generator, show an interesting set of 25
examples (i.e., there should be good variety in the semantic
label inputs, but there should also be a couple of examples
where the semantic labels are identical).
- Figure 2a,b: for each discriminator, show the distribution of the
output probabilities as three histograms, one for each batch type
(see the plotting sketch after this list).
- Reflection: answer the following questions:
- Describe your experimental modification to the model.
- Describe how this modification changed the model's
performance. Answer in detail.
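For Figure 2, a minimal plotting sketch (the three probability
arrays are placeholders for the discriminator outputs you collect
for each batch type; the file name is just an example):

    import matplotlib.pyplot as plt

    def plot_discriminator_histograms(p_real_pairs, p_generated_pairs,
                                      p_shuffled_pairs, fname='figure_2a.png'):
        # Each argument: 1D array of discriminator output probabilities
        fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
        batches = [('Real image / real label', p_real_pairs),
                   ('Generated image / real label', p_generated_pairs),
                   ('Real image / shuffled label', p_shuffled_pairs)]
        for ax, (title, probs) in zip(axes, batches):
            ax.hist(probs, bins=20, range=(0, 1))
            ax.set_title(title)
            ax.set_xlabel('Discriminator output')
        axes[0].set_ylabel('Count')
        fig.tight_layout()
        fig.savefig(fname)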
What to Hand In
Turn in a single zip file that contains:
- All of your python code (.py) and any notebook file (.ipynb)
- Figures 0-2
- Reflection
Grading
- 30 pts: Clean, general code for model building (including
in-code documentation)
- 10 pts: Figures 0a,b,c,d
- 15 pts: Figure 1a,b
- 15 pts: Figure 2a,b
- 15 pts: Convincing generated images (interesting textures and
no strange colors)
- 15 pts: Reflection
- 10 pts: Bonus if you can convincingly generate roads, buildings
or water features
Hints
- You should use the dnn environment.
- Don't forget to turn off trainability of the
discriminator after you compile the discriminator model and
before you compile the meta-model.
- Use binary cross-entropy (binary_crossentropy in Keras) as your
loss function.
- The discriminator should quickly start to perform above
baseline performance, and stay at that level through most of
the training process.
- Three up/down stages seem to produce reasonable performance.
- I use 2-3 convolutional layers per stage in the U, with a kernel
size of 3.
- This problem can feasibly be completed in CPU mode on a modest
computer (including laptops). However, using a GPU (even on a
laptop) can reduce training time by a factor of 5.
Frequently Asked Questions
- Yes, we only need to use the training data set for this
assignment.
- There is no need to use WandB.
andrewhfagg -- gmail.com
Last modified: Sat Apr 20 14:08:55 2024