CS 5043 HW2: Deep Network with Hyper-parameter Search
Objectives
- Implement network architectures that use different forms of
regularization
- Execute experiments in which we vary multiple
hyper-parameters
- Practice Holistic Cross-Validation for evaluating models
- Use SLURM to perform a large batch of experiments
- Implement code that brings the results from a large batch of
experiments into a single analysis
Assignment Notes
Data Set
We will use the same data set as in HW 1.
Provided Code
You should modify your code from HW1 to complete this assignment.
Part 1: No Regularization
(this is essentially your HW 1)
- Implement a network with hidden layer sizes as follows: 500, 250,
  125, 75, 36, 17 (see the sketch at the end of this part)
- Do not add regularization
- Execute a 2-dimensional experiment: rotation x training set
  size. Use only the even rotations, and use training set sizes:
  1, 2, 4, 6, 9, 13, 18
- Write code that aggregates the results together (in fact, you
  should be able to reuse your code from HW 1).
- FIGURE 1: Generate a plot that shows FVAF as a function of training set
  size. There should be two curves: the mean FVAF for the training
  set and the mean FVAF for the validation set.
Don't forget to label your axes and to provide a legend.
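Below is a minimal construction sketch for this network, assuming the
TensorFlow/Keras setup from HW 1. The input/output sizes, the 'elu'
activation, the Adam optimizer, the learning rate, and the 'mse' loss
are placeholder assumptions; substitute your own HW 1 choices.

    from tensorflow import keras

    def build_model(n_inputs, n_outputs,
                    hidden=(500, 250, 125, 75, 36, 17)):
        # Fully-connected network with the required hidden layer
        # sizes and no regularization.
        model = keras.Sequential()
        model.add(keras.Input(shape=(n_inputs,)))
        for units in hidden:
            model.add(keras.layers.Dense(units, activation='elu'))
        model.add(keras.layers.Dense(n_outputs, activation='linear'))
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                      loss='mse')
        return model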
Part 2: Dropout
- Implement the same network as above.
- Add dropout to your input layer and each of your hidden
  layers (see the sketch at the end of this part). There should be
  no other regularization.
- Execute a 3-dimensional experiment: rotation x training set size x
  dropout rate. Try a reasonable range of dropout
  probabilities (use at least 5 choices). The dropout probability
  should be strictly between 0 and 1.
- Write code that aggregates the results together.
- FIGURE 2: Show mean FVAF for the validation set as a function
of training set size. There
should be one curve for each of your dropout probability
choices.
- Don't forget to label your axes and to provide a legend.
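A minimal sketch of the dropout variant, under the same
TensorFlow/Keras placeholder assumptions as the Part 1 sketch; the
Dropout layers on the input and after each hidden layer are the only
change.

    from tensorflow import keras

    def build_dropout_model(n_inputs, n_outputs, dropout,
                            hidden=(500, 250, 125, 75, 36, 17)):
        # Same architecture as Part 1, with dropout applied to the
        # inputs and to each hidden layer; no other regularization.
        model = keras.Sequential()
        model.add(keras.Input(shape=(n_inputs,)))
        model.add(keras.layers.Dropout(dropout))      # input dropout
        for units in hidden:
            model.add(keras.layers.Dense(units, activation='elu'))
            model.add(keras.layers.Dropout(dropout))  # hidden dropout
        model.add(keras.layers.Dense(n_outputs, activation='linear'))
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                      loss='mse')
        return model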
Part 3: Lx Regularization
- Perform the same form of experiment as Part 2, except choose a
  range of values for either L1 or L2 regularization (see the
  sketch at the end of this part). Choose at least five different
  regularization parameter values (factors of 10 apart are good
  choices). Do not include dropout.
- FIGURE 3: Create a similar plot as with Figure 2.
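A minimal sketch of the L2 variant, under the same placeholder
assumptions as the earlier sketches; swap in keras.regularizers.l1
if you choose L1 instead.

    from tensorflow import keras

    def build_l2_model(n_inputs, n_outputs, l2_weight,
                       hidden=(500, 250, 125, 75, 36, 17)):
        # Same architecture as Part 1, with an L2 penalty on each
        # hidden layer's weights; no dropout.
        model = keras.Sequential()
        model.add(keras.Input(shape=(n_inputs,)))
        for units in hidden:
            model.add(keras.layers.Dense(
                units, activation='elu',
                kernel_regularizer=keras.regularizers.l2(l2_weight)))
        model.add(keras.layers.Dense(n_outputs, activation='linear'))
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                      loss='mse')
        return model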
Part 4: Bake-Off
The three parts above correspond to three different model forms that
we would like to compare.
- For both sets of 3D runs (part 2 and part 3), you have already
computed mean FVAF across the rotations. The result for each
is a matrix indexed by hyper-parameter value and training
set size.
- For each of the two parts (Dropout and Lx): for each training
  set size, identify the hyper-parameter value that maximizes the
  mean validation performance. Each training set size will thus
  have one "best" hyper-parameter value.
- For each of the two parts and each training set size, extract
  the mean test set performance for that best hyper-parameter
  value (see the sketch after this list). The result is a vector
  indexed by training set size.
- FIGURE 4: Plot the mean test set performance (FVAF) as a
function of
training set size for each of parts 1, 2 and 3. Note that
there is exactly one curve for each part.
- Using the distribution of test set performance measures, perform
  three T-Tests, one for each pair of model types, for training set
  size 1 (there will be N=10 samples for each model type; the
  model types are no regularization, Dropout, and Lx).
  Report the corresponding p-values and the differences in means.
- Likewise, perform three T-Tests, one for each pair of model
  types, for training set size 18. Report the corresponding
  p-values and the differences in means.
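A minimal sketch of the selection step, assuming your aggregation
code has produced numpy arrays val_fvaf (hyper-parameter value x
training set size, the rotation-mean validation FVAF) and test_fvaf
(hyper-parameter value x training set size x rotation); these names
and shapes are assumptions about your own code, not requirements.

    import numpy as np

    def best_test_performance(val_fvaf, test_fvaf):
        # val_fvaf:  (n_hyper, n_sizes) mean validation FVAF
        # test_fvaf: (n_hyper, n_sizes, n_rotations) test FVAF
        # For each training set size, find the hyper-parameter index
        # that maximizes mean validation FVAF ...
        best = np.argmax(val_fvaf, axis=0)          # shape (n_sizes,)
        # ... then pull out the per-rotation test FVAF for the winners.
        n_sizes = val_fvaf.shape[1]
        best_test = test_fvaf[best, np.arange(n_sizes), :]
        # Rotation means give one Figure 4 curve; best_test itself
        # supplies the paired samples for the t-tests.
        return best_test.mean(axis=1), best_test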
Hints
- np.argmax() will find the index of the maximum value in
  a vector. If presented with a matrix, it will find the indices
  of the maximum values along the specified axis (dimension).
- scipy.stats.ttest_rel() will perform a paired t-test.
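For example, a paired comparison of two model types at a single
training set size might look like the following sketch; the argument
arrays are hypothetical stand-ins for the N=10 per-rotation test FVAF
values of each model type, ordered by rotation.

    import numpy as np
    from scipy import stats

    def paired_compare(fvaf_a, fvaf_b):
        # fvaf_a, fvaf_b: per-rotation test FVAF (N=10 values each)
        # for two model types at one training set size, in the same
        # rotation order (pairing is by rotation).
        t, p = stats.ttest_rel(fvaf_a, fvaf_b)
        return p, np.mean(fvaf_a) - np.mean(fvaf_b)  # p-value, mean diff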
What to Hand-In
A single zip file that contains:
- All of your python code, including your network building code
- Your batch file(s)
- One sample stdout file
- If your visualization code is a Jupyter Notebook, then submit
that notebook directly. Otherwise, submit a pdf. This
document should include:
- Figures 1-4
- A written reflection that answers the following questions (use
raw text or a pdf file for this):
- For training set size 1, which model type do you prefer?
Justify your choice.
- For training set size 18, which model type do you prefer?
Justify your choice.
- Looking at the test performance curves, can you conclude
anything about which model approach is most appropriate
in general?
Grading
- 20 pts: Clean code for all three parts (including
documentation)
- 10 pts: Figure 1: No regularization (validation perf)
- 15 pts: Figure 2: Varying Dropout rate (validation perf)
- 15 pts: Figure 3: Varying Lx regularization parameter value (validation perf)
- 10 pts: Clean code for selecting the best model
- 15 pts: Figure 4: Test set performance for the best
hyper-parameter choices for each model
- 15 pts: Reflection
andrewhfagg -- gmail.com
Last modified: Thu Feb 29 23:18:53 2024