CS 5043 HW1: Training Shallow Models and Cross-Validation
Objectives
- Implement a simple, shallow neural network that solves a
brain-machine interface prediction problem
- Incorporate Keras and custom performance metrics into the model
evaluation process
- Use SLURM to perform a large
batch of experiments. This batch will cover the Cartesian
product of rotations through the folds and training set sizes.
- Implement code that brings the results from a large batch of
experiments into a single analysis
Assignment Notes
- Deadline: Thursday, February 16th @11:59pm.
- Hand-in procedure: submit a zip file to the HW1 dropbox on Gradescope.
- This work is to be done on your own. While general discussion
about Python, TensorFlow and Keras is okay, sharing solution-specific code is inappropriate.
Likewise, you may not download code solutions to this problem from the network. All code that we release may be used.
Data Set
We are using the BMI data set that we discussed in class. It is
available on the supercomputer at:
/home/fagg/datasets/bmi/bmi_dataset2.pkl
This is a 143MB file - please do not make local copies of the file
on the supercomputer (you don't need to). You are welcome to copy the
file to other machines, if you wish. Two requirements:
- Large data transfers to/from OSCER should be done through the
host: dtn2.oscer.ou.edu
- Please do not share this data set with others, and please
delete copies after the semester is over
The data set contains both neural and arm movement data (the latter
being theta, dtheta, ddtheta and torque). In addition, there is a
"time" channel that gives a time stamp for each sample. Arm movements
are two degrees of freedom, corresponding to the shoulder and elbow,
respectively. Each sample of the neural data already contains the
action potential history of each neuron over multiple recent time slices.
The data are already partitioned into 20 folds for us. Each fold
contains multiple blocks of contiguous-time samples. So, if you were
to plot theta as a function of time, you would see the motion of the
arm over time (with gaps).
Across the folds, it is safe to assume that the data are independent
of one another.
Provided Code
We are providing the following code (tar, zip):
- hw1_base.py. This is the heart of the
BMI model implementation. As such, it has a lot of features and
is rather complicated. You may use it in its entirety, adapt
it to your needs, or ignore it altogether.
Features include:
- Accepts many different arguments for conducting an
experiment, including accepting many hyper-parameters.
Some hyper-parameters can be set by SLURM, allowing us to
perform a large number of experiments with a single
invocation of sbatch
- From the BMI data set file, it will extract
training/validation/testing data sets for a specified
rotation
- High-level code for conducting an individual experiment
and saving the results
- symbiotic_metrics.py: proper
Keras implementation of the fraction-of-variance-accounted-for
metric (FVAF). This implementation computes FVAF for each
dimension independently.
- FVAF = 1 - MSE/VAR, where MSE is the mean squared prediction
error and VAR is the variance of the quantity being predicted
- FVAF is related to the R-squared metric, but does not
suffer from the over-fitting problem that R-squared has
- -infinity < FVAF <= 1
- 1 corresponds to perfect prediction
- 0 corresponds to no predictive power
- less than zero means that predictions have more variance than exists in the data
In general, an FVAF between 0.5 and 1 is a reasonable
result (but this is very problem-dependent). A minimal
illustration of the FVAF formula appears after this list.
- job_control.py: This
program makes it easy to perform one experiment for each combination of
the hyper-parameter values.
- The constructor for the class takes as input a
dictionary.
- Each key in the dictionary corresponds to
a command line parameter.
- The value for a key contains a Python list of
possible values for that hyper-parameter.
- We can think of this dictionary as describing the
Cartesian product of all possible
values for all hyper-parameters.
- An integer value (0, 1, ...) can then be used to select
one of these hyper-parameter sets.
The selected hyper-parameter set is then used to update
the parameters stored inside an ArgumentParser object.
(A small illustration of this Cartesian-product selection
also appears after this list.)
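As an aside, the FVAF formula above maps directly onto a few lines of
TensorFlow. The provided symbiotic_metrics.py is the implementation
you should actually use; the sketch below (with our own function name)
is only meant to illustrate the per-dimension computation:

    import tensorflow as tf

    def fvaf_sketch(y_true, y_pred):
        '''Illustrative per-dimension FVAF = 1 - MSE/VAR (use the
        provided symbiotic_metrics.py in your actual experiments).'''
        # Mean squared prediction error, one value per output dimension
        mse = tf.reduce_mean(tf.square(y_true - y_pred), axis=0)
        # Variance of the true signal, also per output dimension
        var = tf.math.reduce_variance(y_true, axis=0)
        return 1.0 - mse / var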
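Likewise, the Cartesian-product selection that job_control.py performs
can be illustrated with plain itertools. This is a hypothetical
stand-in, not the API of the provided class; the grid values below
match the batch described in Part 2:

    import itertools
    import os

    # Hypothetical grid: 8 training set sizes x 20 rotations = 160 jobs
    params = {'Ntraining': [1, 2, 3, 4, 5, 9, 13, 18],
              'rotation': list(range(20))}

    # Cartesian product of all hyper-parameter values, in a fixed key order
    keys = sorted(params.keys())
    combos = list(itertools.product(*(params[k] for k in keys)))

    # A single integer selects one combination; with a SLURM job array
    # (e.g., sbatch --array=0-159), SLURM_ARRAY_TASK_ID supplies that integer
    index = int(os.environ.get('SLURM_ARRAY_TASK_ID', 0))
    selected = dict(zip(keys, combos[index]))
    print(selected)   # index 5 -> {'Ntraining': 1, 'rotation': 5}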
Part 1: Network
- Implement a function that constructs a neural network that can
predict arm state as a function of neural state. The network
should be relatively shallow (you don't need a lot of
non-linearities to do reasonably well here). Note, however, that
quantities such as torque or velocity can be positive or
negative, so think carefully about what the output
nonlinearity should be (a sketch appears after this list)
- Do not take steps to address over-fitting (we will work on this
in the next homework)
- Train a network to predict shoulder acceleration as a
function of the neural state. Useful FVAFs are: 0 < FVAF <=
1. Use rotation zero for this case.
- FIGURE 1: Produce a figure that shows both the true acceleration and the
predicted acceleration as a function of the timestamp for the
test fold. Make sure to zoom in enough on the
horizontal axis so we can see the fine temporal details.
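Here is a minimal sketch of what such a network-building function
could look like. The layer sizes, activation choices, and names are
our own assumptions, not requirements; the output activation is linear
precisely because torque and velocity can be negative:

    import tensorflow as tf
    from tensorflow import keras

    def build_shallow_model(n_inputs, n_hidden, n_outputs, lrate=0.001):
        '''Sketch of a shallow regressor for predicting arm state
        from neural state.'''
        model = keras.Sequential([
            keras.layers.InputLayer(input_shape=(n_inputs,)),
            keras.layers.Dense(n_hidden, activation='elu'),
            # Linear output: the predicted quantities can be negative
            keras.layers.Dense(n_outputs, activation='linear')])
        # MSE is a natural loss here; attach the FVAF metric from
        # symbiotic_metrics.py as well so it is reported during training
        model.compile(loss='mse',
                      optimizer=keras.optimizers.Adam(learning_rate=lrate))
        return model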
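And for Figure 1, something along the following lines suffices. This
is a sketch: model, ins_test, outs_test, time_test, t0, and t1 are
placeholders for your own variables and chosen zoom window:

    import matplotlib.pyplot as plt

    # Predicted shoulder acceleration for the test fold
    preds = model.predict(ins_test)

    fig = plt.figure()
    plt.plot(time_test, outs_test, label='true')
    plt.plot(time_test, preds, label='predicted')
    plt.xlabel('Time stamp')
    plt.ylabel('Shoulder acceleration')
    plt.legend()
    # Zoom the horizontal axis so the fine temporal details are visible
    plt.xlim(t0, t1)
    fig.savefig('figure1.png')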
Part 2: Multiple Runs
- Use your code base to execute a batch of experiments on OSCER
- The batch is a 2-dimensional grid: rotation (0...19) and number
of training folds (1,2,3,4,5,9,13,18). Although single experiments
will have a short running time (a few minutes), each one needs to
be executed as a separate job on OSCER (we are preparing
ourselves for more extensive experiments)
- Implement a second program that executes on your local machine
and:
- Loads all of the stored results (rotations x training
set sizes)
- Computes average training/validation/testing FVAF for
each training fold size (average across the rotations)
- FIGURE 2: Generates a plot with three curves (one for each data
set type): FVAF as a function of training set size (a sketch of
this aggregation step appears after the Reading Results Files
section below)
Hints
- There are lots of hints embedded in the provided code. Take
the time to understand what is going on there.
- List comprehension is your friend
- Don't underestimate the amount of time it will take to complete
all of your experimental runs. Don't wait until the last
minute to queue things up on OSCER
Reading Results Files
import fnmatch
import os
import pickle

def read_all_rotations(dirname, filebase):
    '''Read results from dirname from files matching filebase'''
    # The set of files in the directory
    files = fnmatch.filter(os.listdir(dirname), filebase)
    files.sort()
    results = []
    # Loop over matching files
    for f in files:
        with open("%s/%s" % (dirname, f), "rb") as fp:
            results.append(pickle.load(fp))
    return results
Example:
filebase = "bmi_torque_0_hidden_30_drop_0.50_ntrain_%02d_rot_*_results.pkl"
results = read_all_rotations("results", filebase % ntrain)
will find all files for a given training set size ntrain that match
this string (* is a wildcard here; %02d is filled in with the value
of ntrain).
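Building on read_all_rotations(), the aggregation step for Figure 2
might look roughly like the sketch below. The file-name pattern is the
example above; the keys under which your results dictionaries store
FVAF values depend on your own saving code, so treat them as
placeholders:

    import numpy as np
    import matplotlib.pyplot as plt

    sizes = [1, 2, 3, 4, 5, 9, 13, 18]
    mean_fvaf = {'training': [], 'validation': [], 'testing': []}

    for ntrain in sizes:
        # One results file per rotation for this training set size
        filebase = "bmi_torque_0_hidden_30_drop_0.50_ntrain_%02d_rot_*_results.pkl" % ntrain
        results = read_all_rotations("results", filebase)
        for split in mean_fvaf:
            # Placeholder key: substitute whatever your results dicts use
            vals = [r['%s_fvaf' % split] for r in results]
            mean_fvaf[split].append(np.mean(vals))

    fig = plt.figure()
    for split, vals in mean_fvaf.items():
        plt.plot(sizes, vals, label=split)
    plt.xlabel('Number of training folds')
    plt.ylabel('Mean FVAF across rotations')
    plt.legend()
    fig.savefig('figure2.png')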
Expectations
Think about what the curve shapes should look like before you
generate them.
Looking Forward
For the next homework, we will be experimenting with deeper networks and with
varying hyper-parameter choices. As you write your code, think about
how to structure it (and your results data structures) so that you can
handle variations in other hyper-parameters.
What to Hand-In
A single zip file that contains:
- All of your python code, including your network building code
- If your visualization code is a Jupyter Notebook, then export
a pdf of your notebook and include it
- File/Save and Export Notebook As/PDF
- Figures 1 and 2
- Your batch file
- One sample stdout file
Grading
- 20 pts: Clean code for executing a single experiment (including
documentation)
- 20 pts: True and predicted shoulder acceleration as a function of time (Figure
1)
- 20 pts: Executing on OSCER
- 20 pts: Clean code for bringing individual results files
together
- 20 pts: Figure of FVAF as a function of training set size
(Figure 2)
andrewhfagg -- gmail.com
Last modified: Fri Feb 10 01:21:56 2023