CS 5043 HW1: Training Shallow Models
Objectives
- Implement a simple, shallow neural network that solves a
brain-machine interface prediction problem
- Incorporate Keras and custom performance metrics into the model
evaluation process
- Use SLURM to perform a large
batch of experiments. This batch will cover the Cartesian
product of rotations through the folds and training set sizes.
- Use Weights and Biases to track a full set of experiments
- Implement code that brings the results from a large batch of
experiments into a single analysis
Assignment Notes
Data Set
The BMI (Brain Machine Interface) data set is stored in a single pickle
file, available on the supercomputer at:
/home/fagg/datasets/bmi/bmi_dataset.pkl
This is a 200MB file - please do not make copies of the file on the supercomputer
(you don't need to). You are welcome to copy the file to other
machines, if you wish. Two requirements:
- Large data transfers to/from OSCER should be done through the
host: dtn2.oscer.ou.edu
- Please do not share this data set with others, and please
delete copies after the semester is over
Within this file, there is one dictionary that contains all of
the data. The keys are: 'MI', 'theta', 'dtheta', 'ddtheta', 'torque',
and 'time'. Each of these objects is a Python list of 20 numpy
matrices; each matrix contains an independent fold of data, with rows
representing different samples and columns representing different
features. The samples are organized contiguously (one sample every
50ms), but there are gaps in the data.
The different fields are defined as follows:
- MI contains the data for 48 neurons. Each row encodes
the number of action potentials for each neuron at each of 20
different time bins (so, 48 x 20 = 960 columns).
- theta contains the angular position of the shoulder (in
column 0) and the elbow (in column 1) for each sample.
- dtheta contains the angular velocity of the shoulder
(in column 0) and the elbow (in column 1) for each sample.
- ddtheta contains the angular acceleration of the shoulder
(in column 0) and the elbow (in column 1) for each sample.
- torque contains the torque of the shoulder (in column 0)
and the elbow (in column 1) for each sample.
- time contains the actual time stamp of each sample.
The data are temporally partitioned into 20 folds for us. Each fold
contains multiple blocks of contiguous-time samples. So, if you were
to plot theta as a function of time, you would see the motion of the
arm over time (with gaps).
Across the folds, it is safe to assume that the data are independent
of one another.
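As a minimal sketch, the file can be loaded with the standard pickle module (the helper name `load_bmi` is just for illustration):

```python
import pickle

def load_bmi(path):
    """Load the BMI dictionary from its pickle file."""
    with open(path, 'rb') as fp:
        return pickle.load(fp)

# On the supercomputer:
#   bmi = load_bmi('/home/fagg/datasets/bmi/bmi_dataset.pkl')
# bmi['MI'] is a list of 20 matrices (one per fold);
# each has 48 x 20 = 960 columns, one per neuron/time-bin pair.
```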
Provided Code
We are providing the following code (tar):
- hw1_base.py. This is the heart of the
BMI model implementation. As such, it has a lot of features and
is rather complicated. You may use it in its entirety, adapt
it to your needs, or ignore it altogether.
Features include:
- Accepts many different arguments for conducting an
experiment, including accepting many hyper-parameters.
Subsets of hyper-parameters can be set using the SLURM
INDEX ARRAY, allowing us to perform a large number of
experiments with a single invocation of sbatch
- Data wrangling:
extract_data()
provides the functionality to translate the BMI data set
into ML-ready training, validation, and test data sets.
- High-level code for conducting an individual experiment
and saving the results
- Metrics: as we have talked about, our loss metric (the
metric that our gradient descent process is attempting to
optimize) is becoming one that represents a weighted sum
of multiple metrics (MSE, L1/L2 regularization terms,
etc.). This means that loss itself becomes less
interpretable. To address this, we can attach other
metrics to the model using the metrics
argument to model.compile(). Metrics can be specified using
standard strings (e.g., 'mse') or specific Metric
objects. Here, we use both root mean squared error
(RMSE) and Fraction of Variance Accounted For (FVAF).
These metrics will appear in the epoch-by-epoch report
that model.fit() produces, and their values are reported
by model.evaluate().
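In code, attaching these metrics might look like the sketch below. The provided symbiotic_metrics.py supplies a proper per-dimension FVAF Metric class; the simple function here is only a stand-in to show the mechanics, and the model/layer sizes are made up:

```python
import tensorflow as tf

def fvaf(y_true, y_pred):
    """Stand-in FVAF metric: 1 - MSE/VAR over the batch.
    Use the FVAF implementation from symbiotic_metrics.py in real code."""
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    var = tf.math.reduce_variance(y_true)
    return 1.0 - mse / var

# Hypothetical model; metrics are attached at compile time:
model = tf.keras.Sequential([tf.keras.layers.Input(shape=(4,)),
                             tf.keras.layers.Dense(1)])
model.compile(loss='mse', optimizer='adam',
              metrics=[tf.keras.metrics.RootMeanSquaredError(), fvaf])
```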
- symbiotic_metrics.py: proper
Keras implementation of the fraction-of-variance-accounted-for
metric (FVAF). This implementation computes FVAF for each
dimension independently. Note that the FractionOfVarianceAccountedForSingle class is more friendly to WandB, so I suggest using it.
- FVAF = 1 - MSE/VAR
MSE is the mean squared prediction error
VAR is the variance of the quantity being predicted
- FVAF is related to the R-squared metric, but does not
suffer from the over-fitting problem that R-squared has
- -infinity < FVAF <= 1
- 1 corresponds to perfect prediction
- 0 corresponds to no predictive power
- less than zero means that the squared prediction error exceeds the variance of the data (i.e., worse than just predicting the mean)
In general, a FVAF between 0.5 and 1 is a reasonable
result (but this is very problem dependent)
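The per-dimension FVAF computation can be sketched in plain NumPy (the provided symbiotic_metrics.py does the equivalent inside Keras):

```python
import numpy as np

def fvaf(y_true, y_pred):
    """FVAF = 1 - MSE/VAR, computed independently per output column."""
    mse = np.mean((y_true - y_pred) ** 2, axis=0)
    var = np.var(y_true, axis=0)
    return 1.0 - mse / var

# Perfect prediction gives FVAF = 1; predicting the mean gives FVAF = 0:
y = np.array([[0.0], [2.0], [4.0]])
fvaf(y, y)                          # -> [1.]
fvaf(y, np.full_like(y, y.mean()))  # -> [0.]
```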
- job_control.py: This
code makes it easy to perform one experiment for each combination of
the hyper-parameter values.
- The constructor for the class takes as input a
dictionary.
- Each key in the dictionary corresponds to
a command line argument. The name must match the corresponding name in the parsed arguments.
- The value for a key contains a Python list of
possible values for that hyper-parameter.
- We can think of this dictionary as describing the
Cartesian product of all possible
values for all hyper-parameters.
- An integer value (0, 1, ...) can then be used to select
one of these hyper-parameter sets.
The selected hyper-parameter set is then used to override
the specified arguments stored
inside of an ArgumentParser object.
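The index-to-hyper-parameter mapping can be illustrated in principle as follows (the provided job_control.py class may differ in detail, and the key names below are only examples):

```python
from itertools import product

def select_combination(grid, index):
    """Map an integer (e.g., the SLURM array task ID) to one
    combination from the Cartesian product of hyper-parameter values."""
    keys = sorted(grid)
    combos = list(product(*(grid[k] for k in keys)))
    return dict(zip(keys, combos[index]))

# Example grid: key names must match the parsed command-line arguments
grid = {'Ntraining': [1, 2, 3, 4, 6, 8, 11, 14, 18],
        'rotation': [15]}
select_combination(grid, 0)   # -> {'Ntraining': 1, 'rotation': 15}
```

Each SLURM array task passes its own index, so one sbatch invocation covers the full grid.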
Part 1: Network
- Implement a function that constructs a neural network that can
predict arm state as a function of neural state. The network
should be relatively shallow (you don't need a lot of
non-linearities to do reasonably well, here). But, note that
quantities such as torque or velocity can be positive or
negative. So, think carefully about what the output
nonlinearity should be
- Do not take steps to address over-fitting (we will work on this
in the next assignment)
- Use rotation 5 for this test. You just need to provide
this value at the command line.
- Train a network to predict shoulder velocity as a
function of the neural state. Useful FVAFs fall in the range
0 < FVAF <= 1.
- FIGURE 1: Produce a figure that shows both the true velocity and the
predicted velocity as a function of the timestamp for the
test fold. Make sure to zoom in enough on the
horizontal axis so we can see the fine temporal details.
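A network-building function might look like the sketch below. The layer sizes, input/output dimensions, and optimizer are assumptions for illustration; the key point is the linear output activation, since velocities (and torques) can be negative:

```python
import tensorflow as tf

def build_network(n_inputs=960, n_hidden=200, n_outputs=1):
    """Shallow network: one hidden nonlinearity, linear output.
    A linear output activation lets predictions take negative values.
    Layer sizes here are illustrative, not tuned."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(n_hidden, activation='elu'),
        tf.keras.layers.Dense(n_outputs, activation='linear'),
    ])
    model.compile(loss='mse', optimizer='adam',
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model
```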
Part 2: Multiple Runs
- Use your code base to execute a batch of experiments on OSCER
- The batch is a 1-dimensional "grid": the number
of training folds (1, 2, 3, 4, 6, 8, 11, 14, 18), with rotation 15. Although single experiments
will have a short running time (a few minutes), each one needs to
be executed as a separate job on OSCER (we are preparing
ourselves for more extensive experiments)
- Implement a second program that:
- Loads all of the stored results (all training
set sizes)
- Extracts training/validation/testing RMSE and FVAF for
each training fold size
- FIGUREs 2a,b: Generates a plot with three curves (one for each data
set type): FVAF as a function of training set size, and
RMSE as a function of training set size
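The aggregation program might be structured as in the sketch below. The results-file naming pattern and the dictionary keys ('Ntraining', 'test_fvaf', etc.) are assumptions; match them to whatever your experiment code actually saves:

```python
import pickle
from pathlib import Path
import matplotlib.pyplot as plt

def load_results(result_dir, pattern='*_results.pkl'):
    """Load every saved results pickle (file naming is an assumption)."""
    results = []
    for path in sorted(Path(result_dir).glob(pattern)):
        with open(path, 'rb') as fp:
            results.append(pickle.load(fp))
    return results

def plot_metric(results, metric='fvaf'):
    """One curve per data-set type, as a function of training set size.
    Assumes keys like 'Ntraining' and 'test_fvaf' in each results dict."""
    results = sorted(results, key=lambda r: r['Ntraining'])
    sizes = [r['Ntraining'] for r in results]
    for split in ('train', 'validation', 'test'):
        plt.plot(sizes, [r['%s_%s' % (split, metric)] for r in results],
                 label=split)
    plt.xlabel('Number of training folds')
    plt.ylabel(metric.upper())
    plt.legend()
```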
Hints
- There are lots of hints embedded in the provided code. Take
the time to understand what is going on there.
- List comprehension is your friend
- Don't underestimate the amount of time it will take to complete
all of your experimental runs. Don't wait until the last
minute to queue things up on OSCER
- Your figures must have appropriate axis labels and legends
Expectations
Think about what the curve shapes should look like before you
generate them.
Looking Forward
For the next homework, we will be experimenting with deeper networks and with
varying hyper-parameter choices. As you write your code, think about
how to structure it (and your results data structures) so that you can
handle variations in other hyper-parameters.
What to Hand-In
A single zip file that contains:
- All of your python code, including your network building code
- Figures 1 and 2a,b
- If your visualization code is a Jupyter Notebook, then include the notebook
- If you used your favorite editor to generate
your report, then submit a pdf of its output.
- Your batch file
- One sample stdout and stderr file
Grading
- 20 pts: clean code for executing a single experiment (including
documentation)
- 20 pts: True and predicted shoulder velocity as a function of time (Figure
1)
- 20 pts: Executing on Schooner
- 20 pts: clean code for bringing individual results files
together
- 20 pts: Figure of FVAF and RMSE as a function of training set size
(Figure 2a,b)
andrewhfagg -- gmail.com
Last modified: Mon Feb 10 14:54:28 2025