CS 5043 HW4: Complex Convolutional Neural Networks
Objectives
- Implement a branching structure for a CNN
Assignment Notes
- Deadline: Thursday, March 23rd @11:59pm.
- Hand-in procedure: submit a zip file to the HW4 dropbox on
Gradescope (details below)
- This work is to be done largely on your own. As with HW3,
you may share solution-specific code snippets in the open on Slack, but
not full solutions. In addition, downloading
solution-specific code is not allowed.
Data Set
We are using the same Core50 data set as in
HW 3.
Provided Code
We are providing the following code posted on the main course web page:
- An updated core50.py file (corrected caching implementation)
- A new implementation of load_data_set()
Make sure to provide a correct value for the objects argument.
Prediction Problem
We will use the same prediction problem as in HW 3, but we will use all 10 classes.
Architectures
You will create two convolutional neural networks to distinguish
between the ten classes: one will be a relatively shallow network and
the other
will be a deep network. A couple of possible network architectures
include:
- Multiple, parallel CNN networks that then merge at a
Concatenate layer, followed by multiple Dense layers.
- A sequence of Convolutional modules, followed by a sequence of Inception-type modules, followed by multiple Dense layers.
- Some combination of the two.
Additional details:
- The network will have one output layer with ten units (one
for each class). The activation for this layer should be softmax.
- Loss: sparse categorical cross-entropy
- Additional metric: sparse categorical accuracy
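As a concrete starting point, here is a minimal sketch of a branching network built with the Keras functional API. The input shape, filter counts, and hidden-layer sizes are illustrative placeholders only (not values you should use); the ten-unit softmax output, loss, and metric follow the specification above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_branch(x, n_filters, kernel_size):
    """One parallel convolutional branch: Conv -> Pool -> Flatten."""
    x = layers.Conv2D(n_filters, kernel_size, padding='same',
                      activation='elu')(x)
    x = layers.MaxPooling2D(2)(x)
    return layers.Flatten()(x)

# Placeholder input shape; substitute the actual Core50 image dimensions
inputs = layers.Input(shape=(32, 32, 3))

# Two parallel branches with different kernel sizes, merged at a Concatenate
branch_a = conv_branch(inputs, 8, 3)
branch_b = conv_branch(inputs, 8, 5)
merged = layers.Concatenate()([branch_a, branch_b])
hidden = layers.Dense(32, activation='elu')(merged)

# Ten-unit softmax output: one unit per class
outputs = layers.Dense(10, activation='softmax')(hidden)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
```

Wrapping each branch in a function, as the hints below suggest, makes it easy to vary the number of branches or stack further modules when moving from the shallow to the deep architecture.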
Experiments
- Spend a reasonable amount of time informally narrowing down the
details of your two architectures, including the
hyper-parameters (layer sizes, dropout, regularization). Given
the nature of the datasets, I suggest that you focus on
rotations 0 and 1 to begin with.
- Choose your favorite model structure/hyper-parameters for each
type (shallow and deep) based on the validation set
performance.
- Figures 1 and 2: Graph-based representation of each model
type (you have a command line parameter to do this for you
already).
- For each type, perform five rotations for each model (so, a total of
10 independent runs).
- Figures 3 and 4: Learning curves for the shallow and deep
models: validation loss (Figure 3) and validation accuracy
(Figure 4) as a function of epoch. In each figure, put all
ten curves (five shallow and five deep) on a single plot; it
should be very clear which curves are the shallow and deep
cases (e.g., consider using a different color or line style).
- Figure 5: Generate a scatter plot of test set accuracy
for the paired shallow and deep models. Include a dashed line
along y = x.
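One way to assemble the learning-curve and scatter figures with matplotlib; all metric values below are fabricated placeholders standing in for the results of your ten runs, and the file names are arbitrary:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, suitable for batch jobs
import matplotlib.pyplot as plt

# Fabricated per-epoch validation accuracies: five runs per model type
shallow_val_acc = [[0.30, 0.45, 0.55], [0.32, 0.48, 0.57],
                   [0.28, 0.44, 0.54], [0.31, 0.46, 0.56],
                   [0.29, 0.47, 0.55]]
deep_val_acc = [[0.25, 0.50, 0.65], [0.27, 0.52, 0.66],
                [0.24, 0.49, 0.63], [0.26, 0.51, 0.64],
                [0.25, 0.50, 0.62]]

# Figure-4-style plot: all ten curves, line style marking the model type
fig1, ax1 = plt.subplots()
for i, h in enumerate(shallow_val_acc):
    ax1.plot(h, 'b--', label='shallow' if i == 0 else None)
for i, h in enumerate(deep_val_acc):
    ax1.plot(h, 'r-', label='deep' if i == 0 else None)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Validation accuracy')
ax1.legend()
fig1.savefig('figure4_val_accuracy.png')

# Figure-5-style plot: paired test accuracies with a dashed y = x line
shallow_test = [0.61, 0.58, 0.63, 0.60, 0.59]
deep_test = [0.70, 0.66, 0.72, 0.69, 0.68]
fig2, ax2 = plt.subplots()
ax2.scatter(shallow_test, deep_test)
ax2.plot([0.5, 0.8], [0.5, 0.8], 'k--')  # points above the line favor deep
ax2.set_xlabel('Shallow model test accuracy')
ax2.set_ylabel('Deep model test accuracy')
fig2.savefig('figure5_test_scatter.png')
```

The same loss-curve figure follows the first pattern with the validation loss histories substituted in.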
Hints / Notes
- Create functions to build parts of your architecture.
- Get things working on your local machine before porting your
code to the supercomputer.
- Remember to check your model summary and the graph-based
representation of your model to make sure that it
matches your expectations.
- Watch your RAM and thread utilization. Log in to the compute
node and use the 'top' command to examine these.
- CPUS_PER_TASK in the batch file and at the command line should
be equal to or higher than your thread utilization.
- Generally, the larger you can get your batch size, the better
(as long as you are staying within your memory constraints).
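For reference, a minimal SLURM batch-file sketch showing where the CPU request fits. The partition name, resource values, script name (hw4_base.py), and its --cpus_per_task flag are all placeholders for your own setup, not required names:

```shell
#!/bin/bash
#SBATCH --partition=normal         # placeholder partition name
#SBATCH --cpus-per-task=16         # at or above your observed thread utilization
#SBATCH --mem=24G                  # stay within your memory constraints
#SBATCH --output=hw4_%j_stdout.txt

# SLURM_CPUS_PER_TASK is set by SLURM from the request above
python hw4_base.py --cpus_per_task $SLURM_CPUS_PER_TASK
```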
What to Hand In
A single zip file that contains:
- All of your python code, including your network building code
- If your visualization code is a Jupyter Notebook, then export
a pdf of your notebook and include it
- File/Save and Export Notebook As/PDF
- Make sure that your code is not cut-off in the PDF
- Figures 1-5
- Your batch file(s)
- One sample stdout file
- A written reflection that answers the following questions:
- What is the outline of your network architecture?
- Which model (shallow or deep) turned out to work better?
Did you have to adjust hyper-parameters between the two,
other than network structure?
- What can you conclude from the validation accuracy learning
curves for each of the shallow and deep networks? How
confident are you that you have created models that you
can trust?
- Did your shallow or deep network perform better with
respect to the test set? (no need for a statistical
argument here)
Include this reflection as a separate file or at the end of
your Jupyter notebook.
Grading
- 20 pts: Clean code for model building (including in-code documentation)
- 15 pts: Figures 1 and 2: Network architecture
- 15 pts: Figure 3: Shallow/deep loss learning curves
- 15 pts: Figure 4: Shallow/deep accuracy learning curves
- 15 pts: Figure 5: Test set scatter plot
- 20 pts: Reflection
andrewhfagg -- gmail.com
Last modified: Thu Mar 23 19:30:30 2023