CS 5043: HW3: Convolutional Neural Networks
Objectives
- Implement general code that constructs Convolutional Neural
Network models based on a set of parameters.
- Identify two model configurations that perform reasonably well on the
validation set.
- Perform a full set of 5 rotations to demonstrate consistency in
the model performance.
Assignment Notes
- Deadline: Thursday, March 9th @11:59pm.
- Hand-in procedure: submit a zip file to the HW3 dropbox on
Gradescope (details below)
- This work is to be done mostly on your own. As in previous
assignments, general discussion about Python, Keras, and
TensorFlow is encouraged. New for this assignment: you may share
solution-specific code snippets in the open on Slack, but
not full solutions. In addition, downloading
solution-specific code is not allowed.
- Do not submit MSWord documents.
Data Set
The Core50 data set
is a large database of videos of objects as they are being
moved/rotated under a variety of different lighting and background
conditions. Our general task is to classify the object being shown in a
single frame of one of these videos.
Data Organization
Provided Code
We are providing the following code posted on the main course web page:
- hw3_base.py: An experiment-execution module. Handles parameter
organization, data loading, experiment execution, and saving of
results
- core50.py: A class that will translate the metadata file
into TF Datasets for the training, validation and testing sets
- hw3.sh: A sample batch file
- exp.txt, oscer.txt, and net_shallow.txt:
command line parameter files that are a starting point for your
work
- job_control.py was provided in a previous assignment
Prediction Problem
We will focus on classifying one of four object classes: mugs, cans,
balls and cups. For HW 3, all object instances are included in each
data fold. However, different folds are composed of different
background conditions. Hence, we are building a model that can
distinguish these objects from one another in the context of
arbitrary backgrounds (i.e., we are building models that can
distinguish all mugs from all cans, balls and cups).
The provided code (starting at load_data_set()) will set up the object
classes and TF Datasets for you.
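Before building any models, it can help to peek at one batch from the
resulting TF Datasets to confirm the image shape and label format. Below is
a minimal sketch; the variable name ds_train and the idea that
load_data_set() yields batched (image, label) Datasets are assumptions, so
adjust to match the actual return signature of the provided code.

    import tensorflow as tf

    # Hypothetical names: adjust to match what load_data_set() actually returns
    # ds_train, ds_valid, ds_test = load_data_set(args)

    # Peek at one batch: images should be 4D (batch, rows, cols, channels) and
    # labels should be integer class indices in {0, 1, 2, 3}
    for images, labels in ds_train.take(1):
        print('image batch shape:', images.shape)
        print('label batch shape:', labels.shape)
        print('label examples:', labels.numpy()[:8])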
Architectures
You will create two convolutional neural networks to distinguish these
four classes: one will be a shallow network and the other will be a deep
network. Each will nominally have the following structure:
- One or more convolutional layers, each (possibly) followed by a
max pooling layer.
- Use your favorite activation function
- In most cases, each conv/pooling layer will involve some
degree of size reduction (striding)
- Convolutional filters should not be larger than 5x5
(as the size of the filter gets larger, the memory
requirements explode)
- GlobalMaxPool
- One or more dense layers
- Choose your favorite activation function
- One output layer with four units (one for each class). The
activation for this layer should be softmax
- Loss: sparse categorical cross-entropy. The data set contains a
scalar desired output that is an integer class index
(0,1,2,3). The "sparse" form of the loss automatically handles the
translation to the equivalent 1-hot encoded desired output.
- Additional metric: sparse categorical accuracy
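A minimal Keras sketch of this nominal structure is given below. The filter
counts, layer sizes, activation choice, and the 128x128x3 input shape are
placeholder assumptions rather than prescribed values; substitute the shapes
produced by core50.py and your own hyper-parameter choices.

    from tensorflow import keras

    def build_example_model(image_size=(128, 128, 3), n_classes=4):
        """Minimal sketch of the nominal structure; all sizes are placeholders."""
        model = keras.Sequential([
            keras.Input(shape=image_size),

            # Conv/pooling blocks: small (<= 5x5) filters, each followed by max pooling
            keras.layers.Conv2D(16, (3, 3), padding='same', activation='elu'),
            keras.layers.MaxPooling2D(pool_size=(2, 2)),
            keras.layers.Conv2D(32, (3, 3), padding='same', activation='elu'),
            keras.layers.MaxPooling2D(pool_size=(2, 2)),

            # Collapse the remaining spatial dimensions
            keras.layers.GlobalMaxPooling2D(),

            # Dense layer(s), then the 4-unit softmax output
            keras.layers.Dense(64, activation='elu'),
            keras.layers.Dense(n_classes, activation='softmax'),
        ])

        # Integer class labels (0..3) pair with the *sparse* loss and metric
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                      loss=keras.losses.SparseCategoricalCrossentropy(),
                      metrics=[keras.metrics.SparseCategoricalAccuracy()])
        return model

As noted in the hints below, check model.summary() against your expectations
before launching any long runs.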
Since the data set is relatively small (in terms of the number of
distinct objects), it is important to take steps to
address the over-fitting problem. Here are the key tools that you have:
- L1 or L2 regularization
- Dropout. Only use dropout with Dense layers
- Spatial Dropout. Only use spatial dropout with Convolutional layers
- Try to keep the number of trainable parameters small (no more
than one million)
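One possible way of attaching these tools in Keras is sketched below. The
regularization strength and dropout rates are arbitrary placeholders, and
SpatialDropout2D is assumed to be the intended dropout variant for the
convolutional layers.

    from tensorflow import keras

    # Sketch: attaching the over-fitting tools (rates/strengths are placeholders)
    l2 = keras.regularizers.l2(1e-4)

    model = keras.Sequential([
        keras.Input(shape=(128, 128, 3)),
        keras.layers.Conv2D(32, (3, 3), padding='same', activation='elu',
                            kernel_regularizer=l2),   # L2 penalty on kernel weights
        keras.layers.SpatialDropout2D(0.2),           # spatial dropout: conv layers only
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.GlobalMaxPooling2D(),
        keras.layers.Dense(64, activation='elu', kernel_regularizer=l2),
        keras.layers.Dropout(0.5),                    # standard dropout: dense layers only
        keras.layers.Dense(4, activation='softmax'),
    ])

    # model.summary() reports the trainable parameter count; keep it under ~1M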
Experiments
- The primary objective is to get your model building code
working properly and to execute some simple experiments.
- Spend a little time informally narrowing down the
details of your two architectures, including the
hyper-parameters (layer sizes, dropout, regularization).
Don't spend a lot of time on this step
- Once you have made your choice of architecture for each,
you will perform five rotations for each model (so, a total of
10 independent runs)
- Figures 1 and 2: Learning curves (validation accuracy and
loss as a function of epoch) for the shallow and deep models.
Put all five curves on a single plot.
- Figure 3: Generate a histogram of test set accuracy (a
total of 10 samples). The shallow and deep samples should have
different colors (an alpha of 0.5 will give the histogram
values some transparency).
- Figure 4: For a small sample of test set images, show
each image and the output probability distribution from your
two models.
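As a starting point for Figure 3, a short matplotlib sketch is given below.
The accuracy lists are arbitrary placeholder numbers (not real results),
standing in for the 5 shallow and 5 deep test set accuracies that you will
actually compute.

    import matplotlib.pyplot as plt

    # Placeholder values: replace with the test set accuracies from your 5+5 rotations
    shallow_acc = [0.71, 0.74, 0.69, 0.72, 0.73]
    deep_acc = [0.78, 0.80, 0.76, 0.79, 0.81]

    # Overlapping histograms; alpha=0.5 keeps both sets of bars visible
    plt.hist(shallow_acc, bins=10, alpha=0.5, label='shallow')
    plt.hist(deep_acc, bins=10, alpha=0.5, label='deep')
    plt.xlabel('Test set accuracy')
    plt.ylabel('Count')
    plt.legend()
    plt.savefig('figure3_test_accuracy_hist.png')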
Hints / Notes
- Start small: get the pipeline working first on a small network.
- We use a general function for creating networks that takes as
input a set of parameters that define the configuration of the
convolutional layers and dense layers. By changing these
parameters, we can even change the number of layers. This makes
it much easier to try a variety of things without having to
re-implement or copy a lot of code (one possible sketch of this
pattern follows these hints).
- Remember to check your model summary to make sure that it
matches your expectations
- Before executing on the supercomputer, look carefully at your
memory usage (our big model requires almost 10GB of memory)
- CPUS_PER_TASK in the batch file and at the command line should
be about 10
- Our default batch size is 32 (it works on moderate-sized
laptops). On the supercomputer, you may be able to go up to
128 or 256
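Following the hint above, one possible (hypothetical) structure for a
parameter-driven builder is sketched below. The parameter format chosen here
(a list of dicts for the convolutional layers and a list of sizes for the
dense layers) is just one option among many and need not match the
command-line parameter format expected by hw3_base.py.

    from tensorflow import keras

    def create_cnn(image_size, n_classes, conv_layers, dense_layers,
                   activation='elu', lrate=1e-3):
        """
        Hypothetical parameter-driven builder.
        conv_layers: list of dicts, e.g. [{'filters': 16, 'kernel': 3, 'pool': 2}, ...]
        dense_layers: list of ints, e.g. [128, 64]
        """
        model = keras.Sequential([keras.Input(shape=image_size)])

        # Convolutional stage: one Conv2D (+ optional MaxPooling) per spec dict
        for spec in conv_layers:
            model.add(keras.layers.Conv2D(spec['filters'], spec['kernel'],
                                          padding='same', activation=activation))
            if spec.get('pool'):
                model.add(keras.layers.MaxPooling2D(spec['pool']))

        model.add(keras.layers.GlobalMaxPooling2D())

        # Dense stage, then the softmax output layer
        for units in dense_layers:
            model.add(keras.layers.Dense(units, activation=activation))
        model.add(keras.layers.Dense(n_classes, activation='softmax'))

        model.compile(optimizer=keras.optimizers.Adam(learning_rate=lrate),
                      loss='sparse_categorical_crossentropy',
                      metrics=['sparse_categorical_accuracy'])
        return model

    # Example: a shallow and a deep configuration from the same builder
    # shallow = create_cnn((128, 128, 3), 4,
    #                      [{'filters': 16, 'kernel': 3, 'pool': 4}], [32])
    # deep = create_cnn((128, 128, 3), 4,
    #                   [{'filters': 16, 'kernel': 3, 'pool': 2},
    #                    {'filters': 32, 'kernel': 3, 'pool': 2},
    #                    {'filters': 64, 'kernel': 3, 'pool': 2}], [128, 64])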
What to Hand In
A single zip file that contains:
- All of your python code, including your network building code
- If your visualization code is a Jupyter Notebook, then export
a pdf of your notebook and include it
- File/Save and Export Notebook As/PDF
- Figures 1-4
- Your batch file(s)
- One sample stdout file
- A written reflection that answers the following questions:
- How many parameters were needed by your shallow and deep
networks?
- What can you conclude from the validation accuracy learning
curves for each of the shallow and deep networks? How
confident are you that you have created models that you
can trust?
- Did your shallow or deep network perform better with
respect to the test set? (no need for a statistical
argument here)
Include this reflection as a separate file or at the end of
your Jupyter notebook
Grading
- 20 pts: Clean code for model building (including documentation)
- 15 pts: Figure 1: Shallow/deep loss learning curves
- 15 pts: Figure 2: Shallow/deep accuracy learning curves
- 15 pts: Figure 3: Test set histograms
- 15 pts: Figure 4: Images + probability distributions
- 20 pts: Reflection
andrewhfagg -- gmail.com
Last modified: Mon Mar 6 23:42:33 2023