CS 5043: HW6: String Classification

Assignment notes:

The Problem

Proteins are chains of amino acids that perform many different biological functions, depending on the specific sequence of amino acids. Families of amino acid chains exhibit similarities in their structure and function. For a new chain, one problem we would like to solve is that of predicting the family that it most likely belongs to. In this assignment, we will be classifying amino acid chains as one of 46 different families.

Data Set

Deep Learning Experiment

Objective: Compare the performance of three different neural network model types that predict the family of a given amino acid chain. Each model type will be composed of stacks of different recurrent layers:

For my implementation, I use a single network-building function that can generate all three of these model types (I suggest you do the same). Your overall architecture will look like this:

The Multi-Headed-Attention network is a more involved architecture. The precise definition of these models is up to you, but you should stay within these classes of solutions. You should also adjust the hyper-parameters for each model type so that it performs as well as possible (with respect to the validation set) without changing the model architecture. That said, you should expect some performance differences between these model types.
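A single builder function of this kind might be sketched as follows. The layer choices, sizes, and model-type names here are placeholders for illustration, not the required architecture:

```python
from tensorflow import keras

def build_model(model_type, seq_len, n_tokens, n_embeddings, n_classes):
    """One builder that generates all three model types.

    model_type names ('rnn', 'gru', 'attention') and all layer sizes
    below are hypothetical -- substitute your own choices.
    """
    inputs = keras.Input(shape=(seq_len,))
    # Shared embedding front end: (batch, len) -> (batch, len, n_embeddings)
    x = keras.layers.Embedding(n_tokens, n_embeddings)(inputs)

    if model_type == 'rnn':
        x = keras.layers.SimpleRNN(32)(x)
    elif model_type == 'gru':
        x = keras.layers.GRU(32)(x)
    elif model_type == 'attention':
        # Self-attention: query = value = x; shape stays (batch, len, emb)
        x = keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)(x, x)
        # Collapse the sequence dimension before classification
        x = keras.layers.GlobalAveragePooling1D()(x)

    outputs = keras.layers.Dense(n_classes, activation='softmax')(x)
    return keras.Model(inputs=inputs, outputs=outputs)

# Example: one call per model type, differing only in the type argument
model = build_model('gru', seq_len=10, n_tokens=25, n_classes=46,
                    n_embeddings=8)
```

Keeping the embedding front end and the classification head shared across the branches makes the later timing/accuracy comparisons about the recurrent/attention stack itself, not about incidental differences.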

Model Notes

API Notes

Loading Data

    dat = load_rotation(basedir=args.dataset, rotation=args.rotation, version=args.version)

Convert to TF Dataset

create_tf_datasets() translates the dat structure returned by load_rotation() into a set of three TF Datasets:
    dataset_train, dataset_valid, dataset_test = create_tf_datasets(dat,
                                                                    batch=batch,
                                                                    prefetch=args.prefetch,
                                                                    shuffle=args.shuffle,
                                                                    repeat=(args.steps_per_epoch is not None))
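The options passed to create_tf_datasets() correspond to standard tf.data pipeline stages. A minimal sketch with synthetic stand-in data (the shapes and the (tokens, label) element structure are assumptions, not the provided function's internals):

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the token strings and family labels in dat
tokens = np.random.randint(0, 25, size=(100, 10))   # (n_examples, len)
labels = np.random.randint(0, 46, size=(100,))      # family index per example

ds = tf.data.Dataset.from_tensor_slices((tokens, labels))
ds = ds.shuffle(100)     # shuffle buffer (the 'shuffle' argument)
ds = ds.batch(16)        # group examples into batches (the 'batch' argument)
ds = ds.prefetch(2)      # overlap data prep with training (the 'prefetch' argument)

x, y = next(iter(ds))    # one batch of (tokens, labels)
```

Note that repeat() (enabled when steps_per_epoch is set) makes the dataset loop indefinitely, so Keras relies on steps_per_epoch to define an epoch boundary.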

Embedding Layer

An embedding layer translates a sequence of integers of some length into a sequence of token embeddings. Each integer value corresponds to one of the unique tokens. The length of the strings and the number of unique tokens are given by the data set that you load. The size of each embedding vector (n_embeddings) should be smaller than the number of unique tokens.
    tensor = keras.Input(shape=(seq_len,))
    input_tensor = tensor
    tensor = keras.layers.Embedding(n_tokens,
                                    n_embeddings,
                                    input_length=seq_len)(tensor)
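The shape transformation can be checked directly by calling the layer on a batch of integer sequences (the sizes below are arbitrary examples):

```python
import numpy as np
from tensorflow import keras

n_tokens = 25        # hypothetical number of unique tokens
n_embeddings = 8     # embedding dimension (smaller than n_tokens)
seq_len = 10

emb = keras.layers.Embedding(n_tokens, n_embeddings)

# A batch of 4 integer-encoded strings: shape (batch, len)
batch = np.random.randint(0, n_tokens, size=(4, seq_len))
out = emb(batch)     # shape (batch, len, n_embeddings)
```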

Multi-Headed Attention

    tensor = keras.layers.MultiHeadAttention(num_heads=num_heads,
                                             key_dim=head_size)(tensor, tensor)
The input and output tensor shapes are both: (batch, len, embeddings)
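Because the layer is called with the same tensor as both query and value (self-attention), the output shape matches the input shape, which can be verified with a quick check (sizes here are arbitrary):

```python
import tensorflow as tf
from tensorflow import keras

x = tf.random.normal((4, 10, 8))   # (batch, len, embeddings)

mha = keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
y = mha(x, x)                      # self-attention: query = value = x
```

The output dimension defaults to the query's last dimension, so y has shape (4, 10, 8) regardless of key_dim.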

Performance Reporting

Once you have selected a reasonable architecture and set of hyper-parameters, perform five rotations of experiments. Be careful in your choice of GPU for each model type. Produce the following figures/results:
  1. Figures 0a,b,c: Network architecture from plot_model(). Please include tensor shapes.

  2. Figures 1a,b: Training and validation set accuracy as a function of epoch for each rotation (each figure has fifteen curves).

  3. Figure 2: Validation set accuracy for the GRU model vs validation set accuracy for the Attention model. There will be one curve per rotation; each point along the curve represents the performance of the two models for a specific epoch of training. Because these two models will require different numbers of training epochs, pad the shorter of the two with its final value so that both vectors are the same length.

  4. Figure 3: Bar graph showing test accuracy for the three different model types (one group of bars for each rotation). Report these accuracies in text form, too.

  5. Figure 4: Combining the test data across the five folds, show the contingency table. Show counts (not percentages).

  6. Reflection:
    1. Describe the specific choices you made in the model architectures.

    2. Discuss in detail how consistent your model performance is across the different rotations.

    3. Discuss the relative performance of the three model types.

    4. Compare and contrast the different models with respect to the required number of training epochs and the amount of time required to complete each epoch. Why does the third model type require so much compute time for each epoch?
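The padding step for Figure 2 (extending the shorter accuracy curve with its final value) can be sketched with numpy's edge padding; the accuracy values below are made up for illustration:

```python
import numpy as np

# Hypothetical validation-accuracy curves for one rotation
gru_acc = np.array([0.4, 0.6, 0.7])                  # stopped after 3 epochs
attn_acc = np.array([0.3, 0.5, 0.65, 0.72, 0.74])    # stopped after 5 epochs

n = max(len(gru_acc), len(attn_acc))
# mode='edge' repeats the final value out to the target length
gru_acc = np.pad(gru_acc, (0, n - len(gru_acc)), mode='edge')
attn_acc = np.pad(attn_acc, (0, n - len(attn_acc)), mode='edge')
```

After this, the two vectors are the same length and can be plotted against each other, one point per epoch.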


Provided Code

In code for class:


Hints


Network / Training Notes

Frequent updates here...


What to Hand In

Turn in a single zip file that contains:

Do not turn in pickle files.

Grading

References


andrewhfagg -- gmail.com

Last modified: Mon Apr 14 14:54:42 2025