CS 5043 HW2: Deep Network with Hyper-parameter Search
Objectives
- Implement network architectures that use different forms of
regularization
- Execute experiments in which we vary multiple
  hyper-parameters across many experimental rotations
- Practice Holistic Cross-Validation for evaluating models
- Use SLURM to perform a large batch of experiments
- Implement code that brings the results from a large batch of
experiments into a single analysis
Assignment Notes
- Deadline: Thursday, February 27th @11:59pm.
- Hand-in procedure: submit Zip file to the HW2 dropbox on Gradescope.
- This work is to be done on your own. While general discussion
about Python, TensorFlow and Keras is okay, sharing
solution-specific code is inappropriate. Likewise, you may not
download code solutions to this problem from the network or an LLM.
- All code that we release for the class may be used.
Data Set
We will use the same data set as in HW1.
Code Changes
You should modify your code from HW1 to complete this assignment.
Here is a summary of the changes that you will need to make:
- Provided code: hw2.tar
- main(): An example of how you can use aggregate_stats()
- Add a new --meta Boolean switch to your arguments
- Configure your code so you can turn off EarlyStopping at the
command line
- Create a number of JobIterator profiles that describe the key
Cartesian product experiments (detailed below)
- Create code that will generate the required results figures.
- Use the tools from analysis_support to extract the data
that you need.
- You will need to write some code that identifies the
best hyper-parameter value for each Ntraining size.
- You will also need to provide visualization and
hypothesis testing code.
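For the command-line changes, a minimal argparse sketch is below. The flag and variable names here are illustrative; adapt them to your existing HW1 parser.

```python
# Sketch of the argument-parsing additions (hypothetical flag names;
# fold these into your existing HW1 parser).
import argparse

def create_parser():
    parser = argparse.ArgumentParser(description='HW2 experiment runner')
    # New Boolean switch: args.meta is True only when --meta is given
    parser.add_argument('--meta', action='store_true',
                        help='Enable meta/aggregation mode')
    # Early stopping must be controllable from the command line
    parser.add_argument('--early_stopping', action='store_true',
                        help='Enable the EarlyStopping callback')
    parser.add_argument('--patience', type=int, default=100,
                        help='EarlyStopping patience (epochs)')
    return parser
```

Because both switches use `action='store_true'`, omitting them from the command line leaves the corresponding behavior off, which matches the no-regularization default.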
Network Configuration
Your default network/experiment will be configured as follows:
- Six hidden layers with sizes: 200, 100, 50, 25, 12, 6
- Learning rate: 10^-3
- Output type: theta
- Prediction dimension: 1
- Epochs: 1000
- Training set sizes: 1, 2, 3, 4, 6, 8, 11, 14, 18
- Rotations: all even rotations (10 in total)
- No form of regularization
- No verbosity (i.e., suppress per-epoch training output so your
  stdout files do not contain thousands of lines)
- Turn off wandb for your big runs
Note that all of the different network and experiment configurations
must be specified at the command line (no editing your code between
Cartesian experiments).
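A minimal Keras sketch of a network builder that covers this configuration is below. The function name, argument names, and activation choice are my own assumptions, not the course code; dropout and L2 default to off, matching the no-regularization baseline.

```python
# Sketch of a configurable network builder (names and activation are
# assumptions). Dropout and L2 are both off unless explicitly requested.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model(n_inputs, hidden=(200, 100, 50, 25, 12, 6),
                dropout=None, l2=None, lrate=1e-3):
    reg = regularizers.l2(l2) if l2 is not None else None
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_inputs,)))
    for n in hidden:
        model.add(layers.Dense(n, activation='elu',
                               kernel_regularizer=reg))
        if dropout is not None:
            model.add(layers.Dropout(dropout))
    # Single output dimension (theta), linear activation
    model.add(layers.Dense(1))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lrate),
                  loss='mse')
    return model
```

Passing `dropout` and `l2` in from the command-line arguments keeps all five experiment configurations reachable without editing code.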
Part 1: No Regularization
- Use the network / experiment configuration as defined above
- Total experiments: 9 x 10 = 90
Part 2: Early Stopping
- Add early stopping to your configuration, with a patience of 100
- Total experiments: 9 x 10 = 90
- Figure 1: Show mean test set FVAF as a function of training
set size for both the no regularization and early stopping
cases (2 curves)
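One way to make early stopping switchable from the command line is to build the callback list conditionally. A sketch (the flag name is illustrative, and `restore_best_weights` is my assumption, not a course requirement):

```python
# Sketch: construct the EarlyStopping callback only when the
# command-line switch asks for it.
import tensorflow as tf

def make_callbacks(early_stopping: bool, patience: int = 100):
    callbacks = []
    if early_stopping:
        callbacks.append(tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=patience,
            restore_best_weights=True))
    return callbacks
```

The returned list can be passed directly as the `callbacks` argument to `model.fit()`; an empty list reproduces the Part 1 behavior.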
Part 3: Dropout
- No early stopping
- Use dropout probabilities of: 0.015625, 0.03125, 0.0625, 0.125,
  0.25, 0.5
- Total experiments: 9 x 10 x 6 = 540
- Figure 2: Show mean validation set FVAF as a function of
  training set size. Use one curve per dropout probability
Part 4: Dropout + Early Stopping
- Use early stopping with patience = 100
- Use dropout probabilities of: 0.015625, 0.03125, 0.0625, 0.125,
  0.25, 0.5
- Total experiments: 9 x 10 x 6 = 540
- Figure 3: Show mean validation set FVAF as a function of
training set size. Use one curve per dropout probability
Part 5: L2 Regularization
- No early stopping
- Use L2 regularization values of: 0.1, 0.01, 0.001, 0.0001, 0.00001
- Total experiments: 9 x 10 x 5 = 450
- Figure 4: Show mean validation set FVAF as a function of
training set size. Use one curve per L2 value
Part 6: Bake-Off
Parts 3-5 above correspond to three different model forms with
tunable hyper-parameters; we would like to compare these against the
cases from parts 1 and 2.
- Compute the best performing hyper-parameter values for parts 3,4,5
- Compute the test set performance for these best hyper-parameter
choices for parts 3,4,5
- Figure 5: Show mean test set performance as a function of training
set size for all five cases (parts 1-5)
- Using the distribution of test set performance measures for
training size 1 and training size 18, compare the no
regularization case against the other four cases (so, four
comparisons for each of training size 1 and 18). Report the
corresponding p-values and the differences in means. Do not
worry about the multiple comparisons problem at this time.
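A sketch of the selection and comparison steps, assuming your aggregated results sit in a NumPy array of shape (n_hyper, n_sizes, n_rotations); this layout and the function names are my assumptions, not the course code:

```python
# Sketch: pick the best hyper-parameter per training-set size from
# validation FVAF, then compare test FVAF distributions with a paired
# t-test. Assumed layout: val_fvaf[h, s, r] = validation FVAF for
# hyper-param h, training size s, rotation r.
import numpy as np
from scipy import stats

def best_hyper_per_size(val_fvaf):
    # Mean over rotations (axis 2), then argmax over hyper-params (axis 0)
    mean_val = val_fvaf.mean(axis=2)      # shape (n_hyper, n_sizes)
    return np.argmax(mean_val, axis=0)    # one index per training size

def compare_to_baseline(baseline_test, other_test):
    # Paired t-test across rotations for a single training-set size
    t, p = stats.ttest_rel(other_test, baseline_test)
    return other_test.mean() - baseline_test.mean(), p
```

`best_hyper_per_size` gives you the row index to use when pulling the corresponding test-set FVAF values for Figure 5, and `compare_to_baseline` produces the difference in means and p-value for each of the eight comparisons.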
Hints
- np.argmax() will find the index of the maximum value in
a vector. If presented with a matrix, it will find the index
of the maximum value for the specified axis (dimension)
- scipy.stats.ttest_rel() will perform a paired t-test
- All figures must include axis labels, reasonable axis tick
labels, and appropriate legends
- I found it really helpful to have a separate results directory
for each of the five parts
- I did the work for part 6 on my own laptop after downloading
the aggregate pkl files
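For the figures, a minimal matplotlib sketch that satisfies the axis-label and legend requirements (variable names and the output filename are illustrative):

```python
# Sketch of a results figure: one curve per case, with axis labels,
# tick labels at the actual training-set sizes, and a legend.
import matplotlib
matplotlib.use('Agg')          # headless rendering for batch machines
import matplotlib.pyplot as plt

def plot_fvaf_curves(sizes, curves, fname='figure1.png'):
    # curves: dict mapping a legend label -> mean FVAF per training size
    fig, ax = plt.subplots()
    for label, fvaf in curves.items():
        ax.plot(sizes, fvaf, marker='o', label=label)
    ax.set_xlabel('Training set size (folds)')
    ax.set_ylabel('Mean test FVAF')
    ax.set_xticks(sizes)
    ax.legend()
    fig.savefig(fname)
    plt.close(fig)
    return fname
```

The same function works for Figures 1-5 by swapping the curve dictionary and the y-axis label (validation vs. test FVAF).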
What to Hand-In
A single zip file that contains:
- All of your python code, including your network building code
- Your batch files
- Your text files that contain experiment / network configuration
command line arguments
- One sample stdout and stderr file
- If your visualization code is a Jupyter Notebook, then submit
that notebook directly. Otherwise, submit a pdf. This
document should include:
- Figures 1-5
- A written reflection that answers the following questions (use
raw text in Jupyter or a pdf file for this):
- Explain what you conclude from Figure 5.
- For training set size 1, which model type do you
prefer relative to the no regularization case?
Justify your choice.
- For training set size 18, which model type do you
prefer relative to the no regularization case?
Justify your choice.
Grading
- 15 pts: Clean code for network and experiment configuration
(including documentation)
- 10 pts: Figure 1: No regularization vs early stopping (test FVAF)
- 10 pts: Figure 2: Varying Dropout rate (validation FVAF)
- 10 pts: Figure 3: Varying Dropout rate + early stopping
(validation FVAF)
- 10 pts: Figure 4: Varying L2 regularization parameter value
(validation FVAF)
- 15 pts: Clean code for selecting the best hyper-parameter
values for parts 3-5
- 15 pts: Figure 5: Test set performance for the best
hyper-parameter choices for cases 1-5
- 15 pts: Reflection
andrewhfagg -- gmail.com
Last modified: Tue Feb 18 11:58:36 2025