CS 5043 HW4: Complex Convolutional Neural Networks
Objectives
- Implement a branching structure for a CNN
Assignment Notes
- Deadline: Thursday, March 23rd @11:59pm.
- Hand-in procedure: submit a zip file to the HW4 dropbox on
Gradescope (details below)
- This work is to be done largely on your own. As with HW3,
you may share solution-specific code snippets in the open on Slack, but
not full solutions. In addition, downloading
solution-specific code is not allowed.
Data Set
We are using the same Core50 data set as in
HW 3.
Provided Code
We are providing the following code posted on the main course web page:
- An updated core50.py file (corrected caching implementation)
- A new implementation of load_data_set()
Make sure to provide a correct value for the objects argument.
Prediction Problem
We will use the same prediction problem as in HW 3, but we will use all 10 classes.
Architectures
You will create two convolutional neural networks to distinguish
between the ten classes: one will be a relatively shallow network and
the other
will be a deep network. A couple of possible network architectures
include:
- Multiple, parallel CNN networks that then merge at a
Concatenate layer, followed by multiple Dense layers.
- A sequence of Convolutional modules, followed by a sequence of Inception-type modules, followed by multiple Dense layers.
- Some combination of the two.
Additional details:
- The network will have one output layer with ten units (one
for each class). The activation for this layer should be softmax.
- Loss: sparse categorical cross-entropy
- Additional metric: sparse categorical accuracy
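As a concrete starting point, here is a minimal sketch of a branching network built with the Keras functional API. The input shape, filter counts, and hidden-layer sizes are illustrative placeholders only (not values you should use); the ten-unit softmax output, loss, and metric follow the specification above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_branch(x, n_filters, kernel_size):
    """One parallel convolutional branch: Conv -> Pool -> Flatten."""
    x = layers.Conv2D(n_filters, kernel_size, padding='same',
                      activation='elu')(x)
    x = layers.MaxPooling2D(2)(x)
    return layers.Flatten()(x)

# Placeholder input shape; substitute the actual Core50 image dimensions
inputs = layers.Input(shape=(32, 32, 3))

# Two parallel branches with different kernel sizes, merged at a Concatenate
branch_a = conv_branch(inputs, 8, 3)
branch_b = conv_branch(inputs, 8, 5)
merged = layers.Concatenate()([branch_a, branch_b])
hidden = layers.Dense(32, activation='elu')(merged)

# Ten-unit softmax output: one unit per class
outputs = layers.Dense(10, activation='softmax')(hidden)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
```

Wrapping each branch in a function, as the hints below suggest, makes it easy to vary the number of branches or stack further modules when moving from the shallow to the deep architecture.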
Experiments
- Spend a reasonable amount of time informally narrowing down the
details of your two architectures, including the
hyper-parameters (layer sizes, dropout, regularization). Given
the nature of the datasets, I suggest that you focus on
rotations 0 and 1 to begin with.
- Choose your favorite model structure/hyper-parameters for each
type (shallow and deep) based on the validation set
performance.
- Figures 1 and 2: Graph-based representation of each model
type (you have a command line parameter to do this for you
already).
- For each type, perform five rotations for each model (so, a total of
10 independent runs).
- Figures 3 and 4: Learning curves for the shallow and deep
models: validation loss (Figure 3) and validation accuracy
(Figure 4) as a function of epoch. In each figure, put all
ten curves (five shallow and five deep) on a single plot; it
should be very clear which curves are the shallow and deep
cases (e.g., consider using a different color or line style).
- Figure 5: Generate a scatter plot of test set accuracy
for the paired shallow and deep models. Include a dashed line
along y = x.
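One way to assemble the learning-curve and scatter figures with matplotlib; all metric values below are fabricated placeholders standing in for the results of your ten runs, and the file names are arbitrary:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, suitable for batch jobs
import matplotlib.pyplot as plt

# Fabricated per-epoch validation accuracies: five runs per model type
shallow_val_acc = [[0.30, 0.45, 0.55], [0.32, 0.48, 0.57],
                   [0.28, 0.44, 0.54], [0.31, 0.46, 0.56],
                   [0.29, 0.47, 0.55]]
deep_val_acc = [[0.25, 0.50, 0.65], [0.27, 0.52, 0.66],
                [0.24, 0.49, 0.63], [0.26, 0.51, 0.64],
                [0.25, 0.50, 0.62]]

# Figure-4-style plot: all ten curves, line style marking the model type
fig1, ax1 = plt.subplots()
for i, h in enumerate(shallow_val_acc):
    ax1.plot(h, 'b--', label='shallow' if i == 0 else None)
for i, h in enumerate(deep_val_acc):
    ax1.plot(h, 'r-', label='deep' if i == 0 else None)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Validation accuracy')
ax1.legend()
fig1.savefig('figure4_val_accuracy.png')

# Figure-5-style plot: paired test accuracies with a dashed y = x line
shallow_test = [0.61, 0.58, 0.63, 0.60, 0.59]
deep_test = [0.70, 0.66, 0.72, 0.69, 0.68]
fig2, ax2 = plt.subplots()
ax2.scatter(shallow_test, deep_test)
ax2.plot([0.5, 0.8], [0.5, 0.8], 'k--')  # points above the line favor deep
ax2.set_xlabel('Shallow model test accuracy')
ax2.set_ylabel('Deep model test accuracy')
fig2.savefig('figure5_test_scatter.png')
```

The same loss-curve figure follows the first pattern with the validation loss histories substituted in.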
Hints / Notes
- Create functions to build parts of your architecture.
- Get things working on your local machine before porting your
code to the supercomputer.
- Remember to check your model summary and the graph-based
representation of your model to make sure that it
matches your expectations.
- Watch your RAM and thread utilization. Log in to the compute
node and use the 'top' command to examine these.
- CPUS_PER_TASK in the batch file and at the command line should
be equal to or higher than your thread utilization.
- Generally, the larger you can get your batch size, the better
(as long as you are staying within your memory constraints).
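For reference, a minimal SLURM batch-file sketch showing where the CPU request fits. The partition name, resource values, script name (hw4_base.py), and its --cpus_per_task flag are all placeholders for your own setup, not required names:

```shell
#!/bin/bash
#SBATCH --partition=normal         # placeholder partition name
#SBATCH --cpus-per-task=16         # at or above your observed thread utilization
#SBATCH --mem=24G                  # stay within your memory constraints
#SBATCH --output=hw4_%j_stdout.txt

# SLURM_CPUS_PER_TASK is set by SLURM from the request above
python hw4_base.py --cpus_per_task $SLURM_CPUS_PER_TASK
```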
What to Hand In
A single zip file that contains:
- All of your python code, including your network building code
- If your visualization code is a Jupyter Notebook, then export
a pdf of your notebook and include it
- File/Save and Export Notebook As/PDF
- Make sure that your code is not cut-off in the PDF
- Figures 1-5
- Your batch file(s)
- One sample stdout file
- A written reflection that answers the following questions:
- What is the outline of your network architecture?
- Which model (shallow or deep) turned out to work better?
Did you have to adjust hyper-parameters between the two,
other than network structure?
- What can you conclude from the validation accuracy learning
curves for each of the shallow and deep networks? How
confident are you that you have created models that you
can trust?
- Did your shallow or deep network perform better with
respect to the test set? (no need for a statistical
argument here)
Include this reflection as a separate file or at the end of
your Jupyter notebook.
Grading
- 20 pts: Clean code for model building (including in-code documentation)
- 15 pts: Figures 1 and 2: Network architecture
- 15 pts: Figure 3: Shallow/deep loss learning curves
- 15 pts: Figure 4: Shallow/deep accuracy learning curves
- 15 pts: Figure 5: Test set scatter plot
- 20 pts: Reflection
andrewhfagg -- gmail.com
Last modified: Thu Mar 23 19:30:30 2023