CS 5043: HW6: Advanced RNNs and Attention
Assignment notes:
- Deadline: Tuesday, April 13th @11:59pm.
- Hand-in procedure: submit a zip file to Gradescope
- This work is to be done on your own. As with HW3-5, you may share
solution-specific code snippets in the open on Slack (only!), but not
full solutions. In addition, downloading solution-specific code is not
allowed.
- Do not submit MSWord documents.
The Problem
We are using the same problem as in the previous homework
assignment. However, we will be using advanced RNN-style architectures.
Data Set
The data are the same as in HW 5.
Deep Learning Experiment
Objective: Create an advanced RNN model and an Attention-based model to
perform the amino acid family classification. You will implement two
architectures; each will have the form:
- Embedding layer
- Optional pre-processing layer that can involve convolution or
pooling with striding
- Recurrent layers
- One or more Dense layers, with the output using a softmax
non-linearity
The two architectures:
- Stack of one or more GRU/LSTM layers (a minimal sketch follows this
list)
- Stack of Attention layers; I recommend investigating
tf.keras.layers.MultiHeadAttention (a sketch is given in the Hints
section)
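A minimal sketch of the GRU-stack variant follows. The names and
hyper-parameters here (n_tokens, len_max, n_classes, the strided Conv1D
pre-processing step, and the layer sizes) are illustrative assumptions
only, not required choices:

import tensorflow as tf
from tensorflow.keras import layers

def build_gru_model(n_tokens, len_max, n_classes,
                    n_embedding=16, n_filters=32, gru_units=(64, 32)):
    # Integer-coded amino-acid sequence of length len_max
    inputs = tf.keras.Input(shape=(len_max,))

    # Embedding layer: token indices -> dense vectors
    x = layers.Embedding(input_dim=n_tokens, output_dim=n_embedding)(inputs)

    # Optional pre-processing: a strided convolution shortens the sequence
    x = layers.Conv1D(n_filters, kernel_size=3, strides=2,
                      padding='same', activation='elu')(x)

    # Stack of recurrent layers; all but the last return full sequences
    for units in gru_units[:-1]:
        x = layers.GRU(units, return_sequences=True)(x)
    x = layers.GRU(gru_units[-1], return_sequences=False)(x)

    # Dense layer(s), with the output using a softmax non-linearity
    x = layers.Dense(64, activation='elu')(x)
    outputs = layers.Dense(n_classes, activation='softmax')(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)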
Performance Reporting
Once you have selected a reasonable architecture and set of
hyper-parameters, produce the following figures:
- Figure 0a,b: Network architectures from plot_model() (see the
plot_model() sketch after this list)
- Figure 1: Training set Accuracy as a function of epoch for each of
the five rotations.
- Figure 2: Validation set Accuracy as a function of epoch for
each of the rotations.
- Figure 3: Scatter plot of Test Accuracy for the GRU and
Attention models.
- Figure 4: Scatter plot of training epochs for the GRU and
Attention models.
- Reflection: answer the following questions:
- For your Multi-Headed Attention implementation, explain
how you translated your last layer into an output
probability distribution.
- Is there a difference in performance between the two
model types?
- How much computation did you need to train each model type?
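For Figure 0a,b, the diagrams can be produced with Keras' plot_model().
A short sketch (model_gru and model_attn are assumed names for your two
built models; plot_model() needs pydot and graphviz installed):

from tensorflow.keras.utils import plot_model

# Figure 0a: the GRU-stack model; Figure 0b: the Attention model
plot_model(model_gru, to_file='figure_0a_gru.png',
           show_shapes=True, show_layer_names=True)
plot_model(model_attn, to_file='figure_0b_attention.png',
           show_shapes=True, show_layer_names=True)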
What to Hand In
Turn in a single zip file that contains:
- All of your python code (.py) and any notebook file (.ipynb)
[Gradescope can render notebook files directly - no need to
convert to pdf!]
- Figures 0-4
Grading
- 20 pts: Clean, general code for model building (including
in-code documentation)
- 10 pts: Figures 0a,b
- 10 pts: Figure 1
- 10 pts: Figure 2
- 10 pts: Figure 3
- 10 pts: Figure 4
- 15 pts: Reasonable test set performance for all rotations
- 15 pts: Reflection
References
- Full Data Set: J. Mistry, S. Chuguransky, L. Williams, M. Qureshi,
G.A. Salazar, E.L.L. Sonnhammer, S.C.E. Tosatto, L. Paladin, S. Raj,
L.J. Richardson, R.D. Finn, A. Bateman. Pfam: The protein families
database in 2021. Nucleic Acids Research (2020). doi: 10.1093/nar/gkaa913
- Keras Multi-headed Attention Layer
Hints
- The MultiHeadAttention class requires two tensor inputs (K/V
and Q). Since we are doing self-attention, these two tensors
can be set to be the same.
- With MultiHeadAttention, you will have to solve the problem of
how to translate from a 2D tensor (LEN x #keys) to a 1D tensor
from which a prediction must be made (see the sketch after this list).
- Applying a Dense() layer with d units to a tensor of shape (a,
b, c) will yield a tensor of shape (a, b, d).
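A minimal sketch that addresses both MultiHeadAttention hints above:
the same tensor is passed as query and value (self-attention), and the
(LEN x #keys) tensor is pooled down to a 1D vector before the output
layers. All names and sizes are illustrative assumptions; the choice of
pooling (GlobalMaxPooling1D, GlobalAveragePooling1D, Flatten, ...) is
exactly the design decision to explain in your Reflection.

import tensorflow as tf
from tensorflow.keras import layers

def build_attention_model(n_tokens, len_max, n_classes,
                          n_embedding=32, n_heads=4, key_dim=16,
                          n_blocks=2):
    inputs = tf.keras.Input(shape=(len_max,))
    x = layers.Embedding(input_dim=n_tokens, output_dim=n_embedding)(inputs)

    # Self-attention: the same tensor is the query and the value
    # (key defaults to the value), so K/V and Q are identical
    for _ in range(n_blocks):
        attn = layers.MultiHeadAttention(num_heads=n_heads,
                                         key_dim=key_dim)(x, x)
        x = layers.LayerNormalization()(x + attn)  # residual connection

    # Collapse (LEN x features) to a single feature vector per example
    x = layers.GlobalMaxPooling1D()(x)

    # Dense layer(s), ending in a softmax over the families
    x = layers.Dense(64, activation='elu')(x)
    outputs = layers.Dense(n_classes, activation='softmax')(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)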
andrewhfagg -- gmail.com
Last modified: Sun Apr 2 16:26:51 2023