Assignment 4 - Sequence to sequence models


In this assignment session you will gain a better understanding of sequence-to-sequence models, the encoder, the decoder, and attention, by completing code. You will learn how to write an encoder-decoder model.


The assignment work can be performed individually or in pairs. It is not necessary to work in the same pairs during all assignments. Note that both students in the pair should be able to independently discuss their findings! Take notes when performing the assignment. During the examination session, students will get a chance to talk about their experiments, what they did, what surprised them, what they learned, etc.

This assignment is examined in class, on September 22, 14:00 (sharp) - 15:00. Before this you have to solve the tasks described in this document on your own. You can get help through the discussion forum in Studentportalen. Note that you have to attend the full examination session and be active in order to pass the assignment. If you miss this session, you will instead have to compensate for this; see the bottom of this page.


Create a new directory for this assignment and copy all the files into this new work directory:

mkdir assignment4/
cd assignment4/
cp -rf /local/kurs/mt/assignment4/* .

Files description:

Since most of you work remotely, you can check the available computers from here. If, say, "prefix" on Chomsky is free, you can log in via SSH: ssh -Y .

Set the working environment:
source ~/envNMT/bin/activate
pip install dill tqdm torchtext==0.4.0
Your prompt will change like this: name@server$ --> (envNMT) name@server$ . This activates a virtual environment which gives you access to all the needed packages. If you want to exit the environment, just run deactivate

1 - Seq2seq without attention mechanisms

The Python files implement a simple sequence-to-sequence (encoder-decoder) model for NMT, with or without attention mechanisms. I have removed some code; in this task, you need to complete the "TODO" code in the Python file.

We are using PyTorch; please refer to the official docs.

When you finish TODO 1 and TODO 2, you should not get any exceptions when you run python

TODO 1: complete the forward() function in class RNNDecNoatt.

RNNDecNoatt is an RNN decoder without attention mechanisms. The forward() function performs one step of forward computation in the neural network. Given the input and the hidden states, return the predictions and the hidden states.
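To make the expected shape of the computation concrete, here is a minimal sketch of such a decoder step. The attribute names (embedding, rnn, out) and the use of a GRU are assumptions for illustration; match them to the definitions actually given in the assignment file.

```python
import torch
import torch.nn as nn

class RNNDecNoatt(nn.Module):
    """Sketch of an RNN decoder without attention (names are illustrative)."""

    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, inputs, hidden):
        # inputs: (batch, 1) token ids for a single decoding step
        embedded = self.embedding(inputs)            # (batch, 1, emb_size)
        output, hidden = self.rnn(embedded, hidden)  # one RNN step
        prediction = self.out(output.squeeze(1))     # (batch, vocab_size)
        return prediction, hidden
```

The decoder consumes one target token per call and returns both the prediction over the vocabulary and the updated hidden state, which is fed back in at the next step.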

TODO 2: complete the forward() function in class Seq2seq.

The forward() function contains all the steps of training a seq2seq model. Complete the code (the computation step of the entire seq2seq model). Hints: 1. computation in the encoder; 2. pass the last hidden state to the decoder; 3. step by step computation in the decoder.
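The three hints can be sketched as follows. All class and attribute names here are assumptions chosen for illustration, not the assignment's actual code; the point is the flow: encode, hand over the last hidden state, then decode step by step with teacher forcing.

```python
import torch
import torch.nn as nn

class RNNEnc(nn.Module):
    """Illustrative encoder: returns all hidden states and the last one."""

    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.GRU(emb_size, hidden_size, batch_first=True)

    def forward(self, src):
        return self.rnn(self.embedding(src))

class RNNDecNoatt(nn.Module):
    """Illustrative decoder: one step per call."""

    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, inputs, hidden):
        output, hidden = self.rnn(self.embedding(inputs), hidden)
        return self.out(output.squeeze(1)), hidden

class Seq2seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, trg):
        # 1. computation in the encoder
        enc_states, hidden = self.encoder(src)
        # 2. the last encoder hidden state initialises the decoder
        # 3. step-by-step computation in the decoder (teacher forcing:
        #    the gold token trg[:, t] is the input at step t)
        outputs = []
        for t in range(trg.size(1) - 1):
            prediction, hidden = self.decoder(trg[:, t].unsqueeze(1), hidden)
            outputs.append(prediction)
        return torch.stack(outputs, dim=1)  # (batch, trg_len - 1, vocab)
```

During training the gold target token is fed in at every step; at inference time the model's own prediction is fed back instead (see the inference section below).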

2 - Seq2seq with attention mechanisms

When you finish TODOs 3, 4, and 5, you should not get any exceptions when you run python --attention

TODO 3: complete the code in class Attention.

The Attention class provides attention weights over the encoder hidden states. The inputs are the decoder state and the encoder hidden states. There are different score functions for computing the weights; here you need to complete the code of three of them from Luong's PhD thesis: "dot", "general", and "concat". Then you write the code that computes the attention weights, which are the output of the forward() function.
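As a reference point, here is a minimal sketch of the three Luong-style score functions followed by a softmax over source positions. The parameter names (W, v) and tensor shapes are assumptions for illustration; adapt them to the assignment's class skeleton.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Sketch of Luong-style attention with three score functions."""

    def __init__(self, hidden_size, method="dot"):
        super().__init__()
        self.method = method
        if method == "general":
            self.W = nn.Linear(hidden_size, hidden_size, bias=False)
        elif method == "concat":
            self.W = nn.Linear(2 * hidden_size, hidden_size, bias=False)
            self.v = nn.Linear(hidden_size, 1, bias=False)

    def score(self, dec_state, enc_states):
        # dec_state: (batch, 1, hidden); enc_states: (batch, src_len, hidden)
        if self.method == "dot":
            # score(s, h) = s . h
            return torch.bmm(dec_state, enc_states.transpose(1, 2))
        if self.method == "general":
            # score(s, h) = s . (W h)
            return torch.bmm(dec_state, self.W(enc_states).transpose(1, 2))
        # concat: score(s, h) = v . tanh(W [s; h])
        src_len = enc_states.size(1)
        dec_exp = dec_state.expand(-1, src_len, -1)
        energy = torch.tanh(self.W(torch.cat([dec_exp, enc_states], dim=2)))
        return self.v(energy).transpose(1, 2)        # (batch, 1, src_len)

    def forward(self, dec_state, enc_states):
        scores = self.score(dec_state, enc_states)
        return F.softmax(scores, dim=-1)             # attention weights
```

Whatever the score function, the softmax normalises the scores into a distribution over source positions, so the weights for each decoder step sum to one.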

TODO 4: complete the forward() function in class RNNDec.

RNNDec is an RNN decoder with attention mechanisms. The forward() function performs one step of forward computation in the neural network. The decoder first generates a decoder hidden state, given the input and the previous hidden state. Then the decoder hidden state and the encoder hidden states are fed into the attention model to get the attention weights. Then you compute the context vector. Lastly, the decoder returns the prediction and the hidden state. Hint: the information flow is rnn -> attention -> rnn -> output layer
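The information flow above can be sketched in one place. For brevity this sketch inlines dot-product attention instead of calling a separate Attention class; the names and the choice of a GRU are assumptions, not the assignment's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNDec(nn.Module):
    """Sketch of an attentional decoder step (names are illustrative)."""

    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.out = nn.Linear(2 * hidden_size, vocab_size)

    def forward(self, inputs, hidden, enc_states):
        # 1. rnn: one decoder step from the input and previous hidden state
        embedded = self.embedding(inputs)                # (batch, 1, emb)
        dec_out, hidden = self.rnn(embedded, hidden)     # (batch, 1, hidden)
        # 2. attention: dot-product scores over the encoder states
        scores = torch.bmm(dec_out, enc_states.transpose(1, 2))
        weights = F.softmax(scores, dim=-1)              # (batch, 1, src_len)
        # 3. context vector: attention-weighted sum of encoder states
        context = torch.bmm(weights, enc_states)         # (batch, 1, hidden)
        # 4. output layer over [decoder state; context]
        combined = torch.cat([dec_out, context], dim=2).squeeze(1)
        prediction = self.out(combined)                  # (batch, vocab)
        return prediction, hidden
```

Note that, compared with RNNDecNoatt, the output layer now sees both the decoder state and the context vector, so its input dimension doubles.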

TODO 5: complete the code in class Seq2seqAttn.

It is almost the same as the Seq2seq class; the only difference is the definition of the decoder.

3 - Inference

Read the code used for generating translations, and answer the following questions:
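As an orientation aid for reading that code: at inference time there is no gold target, so a typical approach is greedy search, feeding the model's own best prediction back in at each step. The sketch below is a hypothetical stand-alone version (the function name and decoder signature are assumptions); the assignment's file may differ in detail.

```python
import torch

def greedy_decode(decoder, hidden, sos_id, eos_id, max_len=50):
    """Greedy search with a step-wise decoder of the no-attention kind:
    decoder(inputs, hidden) -> (prediction, hidden)."""
    inputs = torch.full((1, 1), sos_id, dtype=torch.long)
    tokens = []
    for _ in range(max_len):
        prediction, hidden = decoder(inputs, hidden)
        next_id = prediction.argmax(dim=-1).item()  # pick the best token
        if next_id == eos_id:
            break                                   # stop at end-of-sentence
        tokens.append(next_id)
        inputs = torch.tensor([[next_id]])          # feed prediction back in
    return tokens
```

This is the key difference from the training loop in Seq2seq.forward(): there the input at each step is the gold target token, while here it is the previous prediction.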

Wrapping up

Towards the end of the assignment session, everyone will be expected to share their findings and discuss the assignment with the full class. You should all be prepared to report your main findings, discuss the questions asked, and bring up any other interesting issues that came up during the assignment.


The assignment is supposed to be examined in class, on September 22, 14:00-15:00. You need to be present and active during the whole session.

If you fail to attend the oral session, you instead have to do the assignment on your own and report it afterwards. You can then either discuss the assignment during one of the project supervision sessions (given that the teacher has time), or write a report (around 1-2 A4 pages) where you discuss your findings. The deadline for the compensation is October 25.