CS 5043: HW4: Semantic Labeling

Assignment notes:

Data Set

The Chesapeake Watershed data set is derived from satellite imagery over all of the US states that are part of the Chesapeake Bay watershed system. We are using the patches part of the data set. Each patch is a 256 x 256 image with 26 channels, in which each pixel corresponds to a 1m x 1m area of space. Some of these channels are visible light channels (RGB), while others encode surface reflectivity at different frequencies. In addition, each pixel is labeled as being one of:

Here is an example of the RGB image of one patch and the corresponding pixel labels:

Notes:

Data Organization

All of the data are located on the supercomputer in: /home/fagg/datasets/radiant_earth/pa. Within this directory, there are both train and valid directories. Each of these contains directories F0 ... F9 (folds 0 to 9). Each training fold is composed of 5000 patches. Because of the size of the folds, we have provided code that produces a TF Dataset that dynamically loads the data as you need it. We will draw our training and validation sets from the train directory, and our testing set from the valid directory.

Local testing: the file chesapeake_small.zip contains the data for folds 0 and 9 (it is 6GB compressed).

Data Access

chesapeake_loader4.py is provided. The key function call (shown here with its default argument values) is:
ds_train, ds_valid, ds_test, num_classes = create_datasets(base_dir='/home/fagg/datasets/radiant_earth/pa',
                                                           fold=0,
                                                           train_filt='*[012345678]',
                                                           cache_dir=None,
                                                           repeat_train=False,
                                                           shuffle_train=None,
                                                           batch_size=8,
                                                           prefetch=2,
                                                           num_parallel_calls=4)

where the parameters control the data location, the fold selection, and the Dataset behavior. Note: we strongly suggest that the values of the key parameters be set from the command line (a sketch follows below).

The returned Datasets will generate batches of the specified size of input/output tuples.
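
For example, here is a minimal sketch of setting the key parameters from the command line and constructing the Datasets. The argparse flag names are illustrative choices, not part of the provided loader:

import argparse
from chesapeake_loader4 import create_datasets

# Hypothetical command-line flags; use names that match your own experiment driver
parser = argparse.ArgumentParser(description='HW4 semantic labeling')
parser.add_argument('--dataset', type=str, default='/home/fagg/datasets/radiant_earth/pa')
parser.add_argument('--fold', type=int, default=0)
parser.add_argument('--train_filt', type=str, default='*[012345678]')
parser.add_argument('--batch_size', type=int, default=8)
parser.add_argument('--prefetch', type=int, default=2)
parser.add_argument('--num_parallel_calls', type=int, default=4)
args = parser.parse_args()

# Construct the training/validation/testing Datasets for this fold
ds_train, ds_valid, ds_test, num_classes = create_datasets(base_dir=args.dataset,
                                                           fold=args.fold,
                                                           train_filt=args.train_filt,
                                                           batch_size=args.batch_size,
                                                           prefetch=args.prefetch,
                                                           num_parallel_calls=args.num_parallel_calls)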

The Problem

Create an image-to-image translator that does semantic labeling of the images on a pixel-by-pixel basis.

Details:

Deep Learning Experiment

Create two different models:
  1. A shallow model (could even have no skip connections; a minimal sketch appears after this list)
  2. A deep model
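
As a point of reference, here is a minimal sketch of a shallow, fully convolutional pixel labeler. It assumes 256 x 256 x 26 inputs and the num_classes value returned by create_datasets(); the specific layer sizes, activations, and use of sparse categorical cross-entropy are illustrative assumptions, not requirements:

import tensorflow as tf
from tensorflow.keras import layers, models

def build_shallow_model(num_classes, image_size=(256, 256, 26), filters=(16, 32)):
    inputs = layers.Input(shape=image_size)
    x = inputs
    # Same-padded convolutions keep the output at the input resolution
    for f in filters:
        x = layers.Conv2D(f, (3, 3), padding='same', activation='elu')(x)
    # One softmax per pixel: output shape is (256, 256, num_classes)
    outputs = layers.Conv2D(num_classes, (1, 1), padding='same', activation='softmax')(x)
    model = models.Model(inputs=inputs, outputs=outputs)
    # Integer per-pixel labels pair with a sparse categorical cross-entropy loss
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])
    return model
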
For each model type, perform 5 different experiments:

Reporting

  1. Figures 1a,b: model architectures from plot_model() (one for each of your shallow and deep networks).

  2. Figures 2a,b: Validation accuracy as a function of training epoch (5 curves per model).

  3. Figures 3a,b: for each model type, combine the test data across all 5 folds and generate a confusion matrix (a sketch for accumulating this matrix appears after this list).

  4. Figure 4: scatter plot comparing test set accuracy for each model type (so, each point corresponds to one fold). Include a dashed y=x line on your figure (a plotting sketch appears after this list).

  5. Figures 5a,b: for both models, show ten interesting examples (one per row). Each row includes three columns: Satellite image (channels 0,1,2); true labels; predicted labels (a plotting sketch appears after this list).

    plt.imshow() can be useful here. For the satellite image, pass in the tensor for the image (r x c x 3). For label images, pass in a tensor of shape (r x c) and set vmax=6. This will force the label-to-color mapping to be the same across all images (imshow will pick colors for you).

    To convert your model output into a set of class labels, use np.argmax() along the class (last) axis.

  6. Reflection
    1. What regularization choices did you make for your shallow and deep networks? Why?

    2. How do the training times compare between the two model types?

    3. Describe the relative test set performance of the two model types. Include the mean, min, and max test set accuracy for both model types (reported separately). Also report the test set accuracy for each individual model.

    4. Describe any qualitative differences between the outputs of the two model types. What types of errors do your models tend to make?
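
For Figures 3a,b, one way to combine the test data across folds is to accumulate a single pixel-level confusion matrix over all five test sets. A minimal sketch, assuming one trained model and one ds_test per fold (the models/test_sets containers are placeholders for your own bookkeeping):

import numpy as np
import tensorflow as tf

def accumulate_confusion(models, test_sets, num_classes):
    # Running pixel-level confusion matrix over all folds
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for model, ds_test in zip(models, test_sets):
        for ins, outs in ds_test:
            # Convert per-pixel class probabilities to class labels
            preds = np.argmax(model.predict(ins, verbose=0), axis=-1)
            cm += tf.math.confusion_matrix(tf.reshape(outs, [-1]),
                                           preds.reshape(-1),
                                           num_classes=num_classes).numpy()
    return cm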

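For Figure 4, a minimal plotting sketch; the two accuracy lists are placeholders for your own per-fold test results (one entry per fold, in the same fold order):

import matplotlib.pyplot as plt

def plot_accuracy_scatter(shallow_acc, deep_acc):
    fig, ax = plt.subplots()
    ax.scatter(shallow_acc, deep_acc)
    # Dashed y=x reference line spanning the plotted range
    lo = min(min(shallow_acc), min(deep_acc))
    hi = max(max(shallow_acc), max(deep_acc))
    ax.plot([lo, hi], [lo, hi], 'k--')
    ax.set_xlabel('Shallow model test accuracy')
    ax.set_ylabel('Deep model test accuracy')
    return fig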

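For Figures 5a,b, a minimal sketch of one such figure (ten rows, three columns), following the imshow()/argmax() notes above; how you select "interesting" examples is up to you:

import numpy as np
import matplotlib.pyplot as plt

def plot_examples(model, ds_test, n_rows=10):
    fig, axes = plt.subplots(n_rows, 3, figsize=(9, 3 * n_rows))
    for row, (ins, outs) in enumerate(ds_test.unbatch().take(n_rows)):
        # Per-pixel predicted labels for this single patch
        pred = np.argmax(model.predict(np.expand_dims(ins, axis=0), verbose=0)[0], axis=-1)
        axes[row, 0].imshow(ins.numpy()[:, :, :3])             # satellite image: channels 0,1,2
        axes[row, 1].imshow(np.squeeze(outs.numpy()), vmax=6)  # true labels
        axes[row, 2].imshow(pred, vmax=6)                      # predicted labels
    return fig
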
What to Hand In

Grades

References

Hints


andrewhfagg -- gmail.com

Last modified: Thu Mar 27 23:06:01 2025