Lab: Seq2seq Model with Attention Mechanisms

Aim

In this lab, you will gain a better understanding of attention mechanisms in NMT by implementing them yourself. This lab is an extension of Assignment 4, in which you implemented a seq2seq model without an attention mechanism.

Practicalities

The lab work can be performed individually or in pairs. You do not have to work in the same pairs for all assignments. Note that both students in a pair should be able to discuss the findings independently!

Data

Create a new directory for this lab and copy all the files into this new work directory:

mkdir lab/
cd lab/
cp -rf /local/kurs/mt/lab/* .

File descriptions:

Set up the working environment:
conda activate mt21

Implementation

1. Seq2seq model with attention mechanisms

In this section, the decoder uses a dynamic context vector, computed by the attention mechanism, when it generates predictions/translations.

Compared to the seq2seq model without an attention mechanism in Assignment 4, the main differences are:

  1. You need to create a new decoder class with an attention mechanism, i.e., AttnDecoderRNN in the file (a minimal sketch of such a decoder is given after this list).
  2. You need all the encoder outputs to compute the attention weights.
  3. The code for training and inference needs to be updated.
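
To make these differences concrete, here is a minimal sketch of what one decoder step with attention might look like. This is not the provided code: the class name AttnDecoderRNNSketch, the tensor shapes, and the assumption that the attention module returns a (context, weights) pair are all illustrative, and the actual AttnDecoderRNN in the lab file may organise things differently.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnDecoderRNNSketch(nn.Module):
    """Illustrative decoder step with attention (hypothetical interface)."""
    def __init__(self, hidden_size, output_size, attention):
        super().__init__()
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.attention = attention                       # e.g. an AttentionDot-style module (assumed interface)
        self.gru = nn.GRU(2 * hidden_size, hidden_size)  # input = [embedding ; context]
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, input_token, hidden, encoder_outputs):
        # input_token: (1, batch), hidden: (1, batch, hidden), encoder_outputs: (src_len, batch, hidden)
        embedded = self.embedding(input_token)                            # (1, batch, hidden)
        context, attn_weights = self.attention(hidden, encoder_outputs)   # assumed to return (context, weights)
        rnn_input = torch.cat((embedded, context), dim=2)                 # (1, batch, 2*hidden)
        output, hidden = self.gru(rnn_input, hidden)
        logits = self.out(output.squeeze(0))                              # (batch, vocab)
        return F.log_softmax(logits, dim=1), hidden, attn_weights

Note how, unlike in Assignment 4, the forward step takes all encoder outputs as an argument and feeds the resulting context vector into the RNN together with the embedded input token.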

The code already provides an AttentionDot class, which uses the dot product to compute the attention scores.
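
For reference, dot-product attention roughly corresponds to the following sketch. The function name dot_attention and the exact tensor shapes are assumptions for illustration, not the provided implementation:

import torch
import torch.nn.functional as F

def dot_attention(decoder_hidden, encoder_outputs):
    # decoder_hidden:  (1, batch, hidden)       -- current decoder state h_t
    # encoder_outputs: (src_len, batch, hidden) -- all encoder states h_s
    # score(h_t, h_s) = h_t . h_s for every source position s
    scores = torch.sum(decoder_hidden * encoder_outputs, dim=2)    # (src_len, batch)
    attn_weights = F.softmax(scores, dim=0)                        # normalise over source positions
    # context vector = attention-weighted sum of the encoder outputs
    context = torch.sum(attn_weights.unsqueeze(2) * encoder_outputs, dim=0, keepdim=True)  # (1, batch, hidden)
    return context, attn_weights

The softmax turns the raw scores into a distribution over source positions, so the context vector is simply a weighted average of the encoder states.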

You need to implement the marked TODOs (TODO 1-6) in the Python file. Once you have finished the implementation, you can test your code by running:
python seq2seq_attention.py

In addition, you need to write comments for the important pieces of code to show that you have understood the model and know what each line does. PS: writing comments is always a good programming habit; it makes the code easier to read for others and for yourself (as you might have forgotten what the code does when you come back to it later).

2. Attention Mechanisms

In this section, you need to implement the conventional attention mechanism with two other score computation methods, as well as multi-head attention with scaled dot-product scoring. The illustrations and equations for these functions are given on Slides 10-13 of Lecture 7: advanced NMT.

You can follow the provided code in the AttentionDot class to implement these three classes, i.e., AttentionGeneral, AttentionConcat, and AttentionMultihead.
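
As a starting point, here are hedged sketches of the two conventional score functions (Luong-style "general" and "concat"). The class names GeneralScoreSketch and ConcatScoreSketch, the tensor shapes, and the choice to return only the unnormalised scores are assumptions; the actual classes in the lab file should also apply the softmax and build the context vector, as AttentionDot does.

import torch
import torch.nn as nn

class GeneralScoreSketch(nn.Module):
    # score(h_t, h_s) = h_t^T W_a h_s   (Luong "general")
    def __init__(self, hidden_size):
        super().__init__()
        self.W_a = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden: (1, batch, hidden), encoder_outputs: (src_len, batch, hidden)
        transformed = self.W_a(encoder_outputs)                 # (src_len, batch, hidden)
        return torch.sum(decoder_hidden * transformed, dim=2)   # unnormalised scores, (src_len, batch)

class ConcatScoreSketch(nn.Module):
    # score(h_t, h_s) = v_a^T tanh(W_a [h_t ; h_s])   (Luong "concat")
    def __init__(self, hidden_size):
        super().__init__()
        self.W_a = nn.Linear(2 * hidden_size, hidden_size, bias=False)
        self.v_a = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, decoder_hidden, encoder_outputs):
        src_len = encoder_outputs.size(0)
        expanded = decoder_hidden.expand(src_len, -1, -1)       # repeat h_t for every source position
        energy = torch.tanh(self.W_a(torch.cat((expanded, encoder_outputs), dim=2)))
        return self.v_a(energy).squeeze(2)                      # unnormalised scores, (src_len, batch)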

Here is the list of code that you need to implement:

  1. TODO 7: code in Class AttentionGeneral
  2. TODO 8: code in Class AttentionConcat
  3. TODO 9: code in Class AttentionMultihead (see the multi-head sketch right after this list)
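
For the multi-head case, the following is a minimal sketch of multi-head attention with scaled dot-product scoring, treating the decoder state as the query and the encoder outputs as keys and values. The class name MultiheadScoreSketch, the projection layers W_q/W_k/W_v/W_o, and the tensor shapes are illustrative assumptions and may not match the interface expected by the lab code.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiheadScoreSketch(nn.Module):
    """Multi-head attention with scaled dot-product scoring (sketch only)."""
    def __init__(self, hidden_size, num_heads=2):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.d_k = hidden_size // num_heads
        self.W_q = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_k = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_v = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_o = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden:  (1, batch, hidden)       -> query
        # encoder_outputs: (src_len, batch, hidden) -> keys and values
        batch = decoder_hidden.size(1)
        src_len = encoder_outputs.size(0)

        # project and split into heads: (batch, heads, len, d_k)
        q = self.W_q(decoder_hidden).view(1, batch, self.num_heads, self.d_k).permute(1, 2, 0, 3)
        k = self.W_k(encoder_outputs).view(src_len, batch, self.num_heads, self.d_k).permute(1, 2, 0, 3)
        v = self.W_v(encoder_outputs).view(src_len, batch, self.num_heads, self.d_k).permute(1, 2, 0, 3)

        # scaled dot-product: softmax(q k^T / sqrt(d_k)) v
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)   # (batch, heads, 1, src_len)
        attn = F.softmax(scores, dim=-1)
        context = torch.matmul(attn, v)                                        # (batch, heads, 1, d_k)

        # concatenate the heads again and project back: (1, batch, hidden)
        context = context.permute(2, 0, 1, 3).reshape(1, batch, -1)
        return self.W_o(context), attn

The scaling by sqrt(d_k) keeps the dot products in a range where the softmax does not saturate when the per-head dimension grows.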

You can test each class by passing the --attn_type parameter to the script. Here are the specific commands for testing each class:

  1. python seq2seq_attention.py --attn_type general
  2. python seq2seq_attention.py --attn_type concat
  3. python seq2seq_attention.py --attn_type multihead --attention_head 2 --initial_learning_rate 0.0001
You can try different settings for your model by adding/changing parameters to the training script.

Discussion

Reporting

You do not need to finish this lab during the lab session. In addition to the Python file with the implemented code, each group also needs to submit a written report. You should write some thoughts about your implementation and answer the questions in the Discussion section. If you work in pairs, you can submit the same Python file, but the reports should differ somewhat. (Both of you need to submit the code and the report, and please specify your group as well.)

You should hand in these files in Studium. The final deadline for this lab is October 29.