CS 5043 HW2: Deep Network with Hyper-parameter Search
Objectives
- Implement network architectures that use different forms of
regularization
- Execute experiments in which we vary multiple
hyper-parameters
- Practice Holistic Cross-Validation for evaluating models
- Use SLURM to perform a large batch of experiments
- Implement code that brings the results from a large batch of
experiments into a single analysis
Assignment Notes
Data Set
We will use the same data set as in HW 1.
Provided Code
You should modify your code from HW1 to complete this assignment.
Part 1: No Regularization
(this is essentially your HW 1)
- Implement a network with hidden layer sizes as follows: 500, 250,
  125, 75, 36, 17 (see the sketch at the end of this part)
- Do not add regularization
- Execute a 2-dimensional experiment: rotation x training set
  size. Use only the even rotations, and use training set sizes:
  1, 2, 4, 6, 9, 13, 18
- Write code that aggregates the results together (in fact, you
  should be able to reuse your code from HW 1).
- FIGURE 1: Generate a plot that shows FVAF as a function of training set
  size. There should be two curves: the mean FVAF for the training
  set and the mean FVAF for the validation set.
Don't forget to label your axes and to provide a legend.
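Below is a minimal construction sketch for this network, assuming the
TensorFlow/Keras setup from HW 1. The input/output sizes, the 'elu'
activation, the Adam optimizer, the learning rate, and the 'mse' loss
are placeholder assumptions; substitute your own HW 1 choices.

    from tensorflow import keras

    def build_model(n_inputs, n_outputs,
                    hidden=(500, 250, 125, 75, 36, 17)):
        # Fully-connected network with the required hidden layer
        # sizes and no regularization.
        model = keras.Sequential()
        model.add(keras.Input(shape=(n_inputs,)))
        for units in hidden:
            model.add(keras.layers.Dense(units, activation='elu'))
        model.add(keras.layers.Dense(n_outputs, activation='linear'))
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                      loss='mse')
        return model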
Part 2: Dropout
- Implement the same network as above.
- Add dropout to your input layer and each of your hidden
  layers (see the sketch at the end of this part). There should be
  no other regularization.
- Execute a 3-dimensional experiment: rotation x training set size x
  dropout rate. Try a reasonable range of dropout
  probabilities (use at least 5 choices). The dropout probability
  should be strictly between 0 and 1.
- Write code that aggregates the results together.
- FIGURE 2: Show mean FVAF for the validation set as a function
of training set size. There
should be one curve for each of your dropout probability
choices.
- Don't forget to label your axes and to provide a legend.
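A minimal sketch of the dropout variant, under the same
TensorFlow/Keras placeholder assumptions as the Part 1 sketch; the
Dropout layers on the input and after each hidden layer are the only
change.

    from tensorflow import keras

    def build_dropout_model(n_inputs, n_outputs, dropout,
                            hidden=(500, 250, 125, 75, 36, 17)):
        # Same architecture as Part 1, with dropout applied to the
        # inputs and to each hidden layer; no other regularization.
        model = keras.Sequential()
        model.add(keras.Input(shape=(n_inputs,)))
        model.add(keras.layers.Dropout(dropout))      # input dropout
        for units in hidden:
            model.add(keras.layers.Dense(units, activation='elu'))
            model.add(keras.layers.Dropout(dropout))  # hidden dropout
        model.add(keras.layers.Dense(n_outputs, activation='linear'))
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                      loss='mse')
        return model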
Part 3: Lx Regularization
- Perform the same form of experiment as Part 2, except choose a
  range of values for either L1 or L2 regularization (see the
  sketch at the end of this part). Choose at least five different
  regularization parameter values (factors of 10 apart are good
  choices). Do not include dropout.
- FIGURE 3: Create a similar plot as with Figure 2.
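A minimal sketch of the L2 variant, under the same placeholder
assumptions as the earlier sketches; swap in keras.regularizers.l1
if you choose L1 instead.

    from tensorflow import keras

    def build_l2_model(n_inputs, n_outputs, l2_weight,
                       hidden=(500, 250, 125, 75, 36, 17)):
        # Same architecture as Part 1, with an L2 penalty on each
        # hidden layer's weights; no dropout.
        model = keras.Sequential()
        model.add(keras.Input(shape=(n_inputs,)))
        for units in hidden:
            model.add(keras.layers.Dense(
                units, activation='elu',
                kernel_regularizer=keras.regularizers.l2(l2_weight)))
        model.add(keras.layers.Dense(n_outputs, activation='linear'))
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                      loss='mse')
        return model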
Part 4: Bake-Off
The three parts above correspond to three different model forms that
we would like to compare.
- For both sets of 3D runs (part 2 and part 3), you have already
computed mean FVAF across the rotations. The result for each
is a matrix indexed by hyper-parameter value and training
set size.
- For each of the two parts (Dropout and Lx): for each training
  set size, identify the hyper-parameter value that maximizes the
  mean validation performance. Each training set size will thus
  have one "best" hyper-parameter value.
- For each of the two parts and each training set size, extract
  the mean test set performance for that best hyper-parameter
  value (see the sketch after this list). The result is a vector
  indexed by training set size.
- FIGURE 4: Plot the mean test set performance (FVAF) as a
function of
training set size for each of parts 1, 2 and 3. Note that
there is exactly one curve for each part.
- Using the distribution of test set performance measures, perform
  three T-Tests, one for each pair of model types, for training set
  size 1 (there will be N=10 samples for each model type; the
  model types are no regularization, Dropout, and Lx).
  Report the corresponding p-values and the differences in means.
- Likewise, perform three T-Tests, one for each pair of model
  types, for training set size 18. Report the corresponding
  p-values and the differences in means.
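A minimal sketch of the selection step, assuming your aggregation
code has produced numpy arrays val_fvaf (hyper-parameter value x
training set size, the rotation-mean validation FVAF) and test_fvaf
(hyper-parameter value x training set size x rotation); these names
and shapes are assumptions about your own code, not requirements.

    import numpy as np

    def best_test_performance(val_fvaf, test_fvaf):
        # val_fvaf:  (n_hyper, n_sizes) mean validation FVAF
        # test_fvaf: (n_hyper, n_sizes, n_rotations) test FVAF
        # For each training set size, find the hyper-parameter index
        # that maximizes mean validation FVAF ...
        best = np.argmax(val_fvaf, axis=0)          # shape (n_sizes,)
        # ... then pull out the per-rotation test FVAF for the winners.
        n_sizes = val_fvaf.shape[1]
        best_test = test_fvaf[best, np.arange(n_sizes), :]
        # Rotation means give one Figure 4 curve; best_test itself
        # supplies the paired samples for the t-tests.
        return best_test.mean(axis=1), best_test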
Hints
- np.argmax() will find the index of the maximum value in
  a vector. If presented with a matrix, it will find the indices
  of the maximum values along the specified axis (dimension).
- scipy.stats.ttest_rel() will perform a paired t-test.
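For example, a paired comparison of two model types at a single
training set size might look like the following sketch; the argument
arrays are hypothetical stand-ins for the N=10 per-rotation test FVAF
values of each model type, ordered by rotation.

    import numpy as np
    from scipy import stats

    def paired_compare(fvaf_a, fvaf_b):
        # fvaf_a, fvaf_b: per-rotation test FVAF (N=10 values each)
        # for two model types at one training set size, in the same
        # rotation order (pairing is by rotation).
        t, p = stats.ttest_rel(fvaf_a, fvaf_b)
        return p, np.mean(fvaf_a) - np.mean(fvaf_b)  # p-value, mean diff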
What to Hand-In
A single zip file that contains:
- All of your python code, including your network building code
- Your batch file(s)
- One sample stdout file
- If your visualization code is a Jupyter Notebook, then submit
that notebook directly. Otherwise, submit a pdf. This
document should include:
- Figures 1-4
- A written reflection that answers the following questions (use
raw text or a pdf file for this):
- For training set size 1, which model type do you prefer?
Justify your choice.
- For training set size 18, which model type do you prefer?
Justify your choice.
- Looking at the test performance curves, can you conclude
anything about which model approach is most appropriate
in general?
Grading
- 20 pts: Clean code for all three parts (including
documentation)
- 10 pts: Figure 1: No regularization (validation perf)
- 15 pts: Figure 2: Varying Dropout rate (validation perf)
- 15 pts: Figure 3: Varying Lx regularization parameter value (validation perf)
- 10 pts: Clean code for selecting the best model
- 15 pts: Figure 4: Test set performance for the best
hyper-parameter choices for each model
- 15 pts: Reflection
andrewhfagg -- gmail.com
Last modified: Thu Feb 29 23:18:53 2024