Lab 1: Attention Mechanisms

Aim

In this lab, you will implement an attention mechanism on top of the simple encoder-decoder code from the last assignment.

Practicalities

This lab is intended to be done in pairs, so team up with another student to solve it. You do not need to keep the same partner for all assignments.

Preliminary

We assume you have installed PyTorch, NumPy, matplotlib, and Jupyter Notebook during the previous assignments, and that you have learned to use PyTorch in the Machine Learning course.

Data

Create a new directory for this assignment and copy all the files into this new work directory:

mkdir lab1/
cd lab1/
cp /local/kurs/mt/lab1/* .

File descriptions:

Implementation

The attention computation mainly takes place in the decoder. At each decoding step, a context vector is passed to the decoder. The context vector is a weighted sum of the encoder hidden states, where the weights are computed by scoring the query (the current decoder hidden state) against the keys (all the encoder hidden states).
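As a minimal sketch of this computation (all names here are ours, not from the provided code), assuming encoder_states has shape (src_len, hidden) and scores holds one score per source position:

import torch
import torch.nn.functional as F

def context_vector(scores, encoder_states):
    # scores: (src_len,), encoder_states: (src_len, hidden)
    # Normalize the scores into attention weights that sum to 1.
    weights = F.softmax(scores, dim=0)                                  # (src_len,)
    # Context vector: weighted sum of the encoder hidden states.
    context = torch.sum(weights.unsqueeze(1) * encoder_states, dim=0)  # (hidden,)
    return context, weights

Returning the weights alongside the context vector is convenient later, for example for plotting the soft alignments with matplotlib.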

We assume you have understood the code for the basic encoder-decoder architecture from the last assignment, which is the basis of this lab session. In this session, you need to implement an attention model based on Bahdanau's paper, Neural Machine Translation by Jointly Learning to Align and Translate. Section 3 of the paper and the attention lecture slides (pages 3-11) are the most important materials for the implementation.
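For reference, Bahdanau's scoring function is an MLP: score(s, h_j) = v^T tanh(W_s s + W_h h_j), where s is the decoder state and h_j an encoder state. One way this could look as a PyTorch module (the class and parameter names below are our own, not those used in the lab code):

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    # Bahdanau-style MLP scoring: score(s, h_j) = v^T tanh(W_s s + W_h h_j)
    def __init__(self, hidden_size):
        super().__init__()
        self.W_s = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_h = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, query, keys):
        # query: (hidden,) decoder state; keys: (src_len, hidden) encoder states.
        # Broadcasting adds the transformed query to every transformed key.
        scores = self.v(torch.tanh(self.W_s(query) + self.W_h(keys)))  # (src_len, 1)
        return scores.squeeze(1)                                       # (src_len,)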

The parts you need to implement are marked with "TODO" comments.

When you have finished your implementation, replace the MLP scoring function with a simpler dot product for computing the attention scores, and compare it with the original MLP method.
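In the dot-product variant, the score for each source position is just the inner product between the decoder state and the corresponding encoder state, so no extra parameters are needed. A sketch under the same assumed shapes as above:

import torch

def dot_product_scores(query, keys):
    # query: (hidden,) decoder state, keys: (src_len, hidden) encoder states.
    # One inner product per encoder hidden state.
    return keys @ query  # (src_len,)

Note that the dot product only works directly when the decoder and encoder hidden states have the same dimensionality; the MLP scoring has no such restriction.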

Reporting

In addition to the .ipynb file with the implemented code, each group also needs to submit a written report comparing 1) the encoder-decoder model without attention, 2) the encoder-decoder model with MLP attention, and 3) the encoder-decoder model with dot-product attention.

Hand in these files via Studentportalen. The deadline for this lab is October 11.