CS 5043: HW6: Advanced RNNs and Attention

Assignment notes:

The Problem

We are using the same problem type as in the previous homework assignment, but with a more complicated data set. Because of the scale of the problem, we are going to use advanced techniques for classifying these strings. Specifically, we will use:

Deep Learning Experiment

You are to implement two different network architectures:

GRU network:

Multi-Headed Attention network (remember that you need to use the Model API for this). NOTE: the dnn environment has a library problem that shows up when we try to use the GRU and Attention layers. Please use our older environment instead:

conda activate tf
module load cuDNN/8.9.2.26-CUDA-12.2.0
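The two architectures can be sketched as follows. This is a minimal illustration, not the required design: the vocabulary size, sequence length, embedding dimension, layer widths, and number of classes below are placeholder assumptions, and your hyper-parameters will come from your own experiments. Note that the attention model uses the functional Model API, as required, and its head (pooling + softmax Dense) is one possible way to turn the last MHA layer into an output probability distribution.

```python
# Hedged sketch of the two required architectures.
# All sizes below are placeholder assumptions, not assignment-specified values.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, SEQ_LEN, EMBED, N_CLASSES = 1000, 100, 32, 10  # assumed sizes

def build_gru_model():
    inputs = layers.Input(shape=(SEQ_LEN,))
    x = layers.Embedding(VOCAB, EMBED)(inputs)
    x = layers.GRU(64)(x)                     # final hidden state only
    outputs = layers.Dense(N_CLASSES, activation='softmax')(x)
    return Model(inputs, outputs, name='gru_model')

def build_mha_model():
    # Multi-Headed Attention requires the functional Model API.
    inputs = layers.Input(shape=(SEQ_LEN,))
    x = layers.Embedding(VOCAB, EMBED)(inputs)
    x = layers.MultiHeadAttention(num_heads=4, key_dim=EMBED)(x, x)  # self-attention
    x = layers.GlobalAveragePooling1D()(x)    # collapse the time axis
    outputs = layers.Dense(N_CLASSES, activation='softmax')(x)
    return Model(inputs, outputs, name='mha_model')
```

Both models end in a softmax Dense layer, so each can be compiled with a categorical cross-entropy loss and an accuracy metric for the experiments below.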

Performance Reporting

Once you have selected a reasonable architecture and set of hyper-parameters, produce the following figures:
  1. Figure 0a,b: Network architectures from plot_model()

  2. Figure 1: Training set Accuracy as a function of epoch for each of the five rotations for both models (all on one figure).

  3. Figure 2: Validation set Accuracy as a function of epoch for each of the rotations (both models on one figure)

  4. Figure 3: Scatter plot of Test Accuracy for the GRU vs Attention models.

  5. Figure 4: Scatter plot of the number of training epochs for the GRU and Attention models.

  6. Reflection: answer the following questions:
    1. For your Multi-Headed Attention implementation, explain how you translated your last MHA layer into an output probability distribution.

    2. Is there a difference in performance between the two model types?

    3. How much computation did you need for the training for each model type in terms of the number of epochs and time?
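For Figures 1 and 2, one approach is to overlay the per-rotation accuracy curves from both models on a single axis. The sketch below is a hedged illustration: the `histories` structure (model name mapped to one Keras-History-style dictionary per rotation) and the function and file names are hypothetical, not part of the assignment.

```python
# Hedged sketch of Figures 1/2: accuracy vs. epoch for every rotation,
# both models on one figure. `histories` maps a model name to a list of
# per-rotation dicts with an accuracy curve (a hypothetical structure).
import matplotlib
matplotlib.use('Agg')          # headless backend for batch jobs
import matplotlib.pyplot as plt

def plot_accuracy(histories, key='accuracy', fname='figure1.png'):
    fig, ax = plt.subplots()
    for model_name, runs in histories.items():
        for rot, hist in enumerate(runs):
            ax.plot(hist[key], label=f'{model_name} rotation {rot}')
    ax.set_xlabel('Epoch')
    ax.set_ylabel(key)
    ax.legend()
    fig.savefig(fname)
    plt.close(fig)
```

Calling it with `key='val_accuracy'` would produce the validation-set version (Figure 2); the scatter plots of Figures 3 and 4 pair the five per-rotation test accuracies (or epoch counts) from one model against the other.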


What to Hand In

Turn in a single zip file that contains:

Grading

References

Hints


andrewhfagg -- gmail.com

Last modified: Thu Apr 11 17:29:17 2024