CS 5043 HW2: Deeper Networks
Objectives
- Implement network architectures that use different forms of
regularization
- Execute experiments in which we vary multiple
hyper-parameters
- Practice Holistic Cross-Validation for evaluating models
- Use SLURM to perform a large batch of experiments
- Implement code that brings the results from a large batch of
experiments into a single analysis
Assignment Notes
- Deadline: Thursday, February 23rd @11:59pm.
- Hand-in procedure: submit Zip file to the HW2 dropbox on Gradescope.
- This work is to be done on your own. While general discussion
about Python, TensorFlow and Keras is okay, sharing solution-specific code is inappropriate.
Likewise, you may not download code solutions to this problem
from the Internet.
Data Set
We will use the same data set as in HW1. However, we will focus on
predicting elbow acceleration (ddtheta[1]).
Provided Code
You should modify your code from HW1 to complete this assignment.
Part 1: No Regularization
(this is essentially your HW 1)
- Implement a network with hidden layer sizes as follows: 400, 200, 100, 50, 25, 12 (a sketch is given after this list)
- Do not add regularization
- Execute a 2-dimensional experiment: rotation x training set
size (just as we did in HW 1). Use training set sizes 1, 2, 3, 5, 9, 13, 18
- Write code that aggregates the results together (in fact, you
should be able to use your code from HW 1).
- FIGURE 1: Generate a plot that shows FVAF as a function of training set
size. There should be two curves: mean FVAF for each of
training and validation sets.
Don't forget to label your axes and to provide a legend.
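Below is a minimal sketch of such a network, assuming the tf.keras setup from HW 1 (the function name, activation choice, and learning rate are illustrative placeholders, not requirements):

    # Sketch of the Part 1 network: fully-connected, no regularization.
    # n_inputs/n_outputs depend on your HW 1 data pipeline.
    import tensorflow as tf

    def build_model(n_inputs, n_outputs,
                    hidden=(400, 200, 100, 50, 25, 12),
                    activation='elu', lrate=1e-3):
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.InputLayer(input_shape=(n_inputs,)))
        for units in hidden:
            model.add(tf.keras.layers.Dense(units, activation=activation))
        # Linear output for predicting ddtheta[1]
        model.add(tf.keras.layers.Dense(n_outputs, activation=None))
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lrate),
                      loss='mse')
        return model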
Part 2: Dropout
- Implement the same network as above.
- Add dropout to your Input layer and each of your Hidden
layers (see the sketch after this list). There should be no other regularization.
- Execute a 3-dimensional experiment: rotation x training set size x
dropout rate. Attempt a reasonable range of dropout
probabilities (use at least 4 choices). Probability of dropout should
be between 0 and 1 (non-inclusive).
- Write code that aggregates the results together.
- FIGURE 2: Show mean FVAF for the validation set as a function
of training set size. There
should be one curve for each of your dropout probability
choices.
- Don't forget to label your axes and to provide a legend.
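A sketch of the Part 2 variant, assuming the same setup as above; the dropout argument is the drop probability (0 < dropout < 1) and the remaining names are placeholders:

    # Sketch of the Part 2 network: dropout after the input and each hidden layer.
    import tensorflow as tf

    def build_dropout_model(n_inputs, n_outputs, dropout,
                            hidden=(400, 200, 100, 50, 25, 12),
                            activation='elu'):
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.InputLayer(input_shape=(n_inputs,)))
        model.add(tf.keras.layers.Dropout(dropout))      # dropout on the inputs
        for units in hidden:
            model.add(tf.keras.layers.Dense(units, activation=activation))
            model.add(tf.keras.layers.Dropout(dropout))  # dropout on each hidden layer
        model.add(tf.keras.layers.Dense(n_outputs))      # no dropout on the output
        model.compile(optimizer='adam', loss='mse')
        return model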
Part 3: Lx Regularization
- Perform the same form of experiment as Part 2, except choose a
range of values for either L1 or L2 regularization (a sketch is
given after this list). Choose at least four different
regularization parameters (factors of 10 are good choices). Do
not include dropout.
- FIGURE 3: Create a plot analogous to Figure 2, with one curve
for each regularization parameter value.
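A sketch of the Part 3 variant, assuming L2 regularization on the Dense kernels (swap in tf.keras.regularizers.l1() for L1; lx_param and the other names are placeholders):

    # Sketch of the Part 3 network: Lx kernel regularization, no dropout.
    import tensorflow as tf

    def build_lx_model(n_inputs, n_outputs, lx_param,
                       hidden=(400, 200, 100, 50, 25, 12),
                       activation='elu'):
        reg = tf.keras.regularizers.l2(lx_param)  # or tf.keras.regularizers.l1(lx_param)
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.InputLayer(input_shape=(n_inputs,)))
        for units in hidden:
            model.add(tf.keras.layers.Dense(units, activation=activation,
                                            kernel_regularizer=reg))
        model.add(tf.keras.layers.Dense(n_outputs))
        model.compile(optimizer='adam', loss='mse')
        return model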
Part 4: Bake-Off
The three parts above correspond to three different model forms that
we would like to compare.
- For both sets of 3D runs (part 2 and part 3), you have already
computed mean FVAF across the rotations. The result for each
part is a matrix indexed by hyper-parameter value and training
set size.
- For each of the two parts (Dropout and Lx), and for each
training set size, identify the hyper-parameter value that
maximizes mean validation performance. Each training set size
then has one "best" hyper-parameter value with respect to the
mean validation performance (a sketch of this selection step is
given after this list).
- For each of the two parts and each training set size, extract
the mean test set performance for that best hyper-parameter
value. The result is a vector indexed by training set size.
- FIGURE 4: Plot the mean test set performance (FVAF) as a
function of
training set size for each of parts 1, 2 and 3. Note that
there is exactly one curve for each part.
- Using the distribution of test set performance measures for
training set size 1, perform a T-Test for each pair of model
types (three tests in total; there will be N=20 samples for
each model type; the model types are no regularization, Dropout
and Lx). Report the corresponding p-values and the differences
in means.
- Perform the same three T-Tests for training set size 18.
Report the corresponding p-values and the differences in means.
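The selection step above amounts to a couple of lines of numpy. A sketch, assuming you have already averaged FVAF across rotations into matrices of shape (n_hyper, n_train_sizes) (the variable names are placeholders):

    # val_mean[h, s]:  mean validation FVAF for hyper-parameter h, training size s
    # test_mean[h, s]: corresponding mean test FVAF
    import numpy as np

    def best_test_curve(val_mean, test_mean):
        # Best hyper-parameter index for each training set size (argmax down axis 0)
        best_h = np.argmax(val_mean, axis=0)   # shape: (n_train_sizes,)
        # Test performance of the winning hyper-parameter at each size
        return test_mean[best_h, np.arange(test_mean.shape[1])]

The returned vector is one of the curves in Figure 4.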
Hints
- np.argmax() will find the index of the maximum value in
a vector. If presented with a matrix, it will find the index
of the maximum value along the specified axis (dimension)
- scipy.stats.ttest_ind() will perform a 2-sample t-test (an
example call is given below)
- scipy.stats.ttest_rel() will perform a paired t-test
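For example, the comparison for one pair of model types might look like this (the arrays here are placeholder data standing in for your 20 per-rotation test FVAF values):

    import numpy as np
    from scipy.stats import ttest_ind

    # Placeholder data: in practice, these are your per-rotation test FVAFs
    fvaf_dropout = np.random.default_rng(0).normal(0.80, 0.05, size=20)
    fvaf_lx = np.random.default_rng(1).normal(0.78, 0.05, size=20)

    t_stat, p_value = ttest_ind(fvaf_dropout, fvaf_lx)
    print('p = %.4f; difference in means = %.4f'
          % (p_value, fvaf_dropout.mean() - fvaf_lx.mean()))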
What to Hand-In
A single zip file that contains:
- All of your python code, including your network building code
- If your visualization code is a Jupyter Notebook, then export
a pdf of your notebook and include it
- File/Save and Export Notebook As/PDF
- Figures 1-4
- Your batch file(s)
- One sample stdout file
- A written reflection that answers the following questions (use
raw text or a pdf file for this):
- For training set size 1, which model type do you prefer?
Justify your choice.
- For training set size 18, which model type do you prefer?
Justify your choice.
- Looking at the test performance curves, can you conclude
anything about which model approach is most appropriate
in general?
Include this reflection as a separate file or at the end of
your Jupyter notebook.
Grading
- 20 pts: Clean code for parts 1-3 (including
documentation)
- 10 pts: Figure 1: No regularization (validation perf)
- 15 pts: Figure 2: Varying Dropout rate (validation perf)
- 15 pts: Figure 3: Varying Lx regularization parameter value (validation perf)
- 10 pts: Clean code for selecting the best model
- 15 pts: Figure 4: Test set performance for the best
hyper-parameter choices for each model
- 15 pts: Reflection
andrewhfagg -- gmail.com
Last modified: Fri Feb 17 01:12:04 2023