Assignment 4 - Sequence to sequence models

Aim

In this assignment you will gain a better understanding of sequence-to-sequence models, and of the encoder and the decoder in particular, by completing code. You will learn how to write an encoder-decoder model.

Practicalities

The assignment work can be performed individually or in pairs (recommended). It is not necessary to work in the same pairs during all assignments. Note that both students in a pair should be able to discuss the findings independently! Take notes when performing the assignment. During the examination session, students will get a chance to talk about their experiments: what they did, what surprised them, what they learned, etc.

This assignment is examined in class, on September 16, 9-12. Note that you have to attend the full session and be active in order to pass the assignment. If you miss this session or are inactive, you will have to compensate for this; see the bottom of this page.

Preparation

Create a new directory for this assignment and copy all the files into this new work directory:

mkdir assignment4/
cd assignment4/
cp -rf /local/kurs/mt/assignment4/* .

Files description:

Set the working environment:
conda init
conda activate mt21
conda install nltk
Your prompt should change from name@server$ to (mt21) name@server$. This activates a virtual environment which gives you access to all the needed packages. If you want to exit the environment, just run conda deactivate.

1 - Basic seq2seq model without attention mechanisms

The Python file implements a simple sequence-to-sequence (encoder-decoder) model for NMT, without attention mechanisms. I have removed some code, and you need to complete the "TODO" parts in the Python file. Here are two figures describing the encoder and the decoder modules:

Encoder and decoder networks.

We are using PyTorch; please refer to the official documentation for details on its functions and modules.

TODO 0: initialize the encoder and the decoder instances.

TODO 1: define networks and variables in the encoder module.

TODO 2: complete the forward computation in the encoder module.

TODO 3: define networks and variables in the decoder module.

TODO 4: complete the forward computation in the decoder module.

TODO 5: complete the code to train a seq2seq model.

Once you have completed these TODOs correctly, you can train your seq2seq model by running python seq2seq.py. When your model starts training, you can leave it running and move on, as training takes some time on CPUs.
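To make the overall shape of TODOs 0-5 concrete, here is a minimal, generic sketch of an encoder, a decoder, and one training update in PyTorch. It is not the code in seq2seq.py: the class names EncoderSketch and DecoderSketch, the train_step helper, the choice of a single-layer GRU, and all sizes are assumptions made for illustration, and the assignment file may be organised differently.

```python
import torch
import torch.nn as nn


class EncoderSketch(nn.Module):
    """Hypothetical encoder: embeds source tokens and runs a GRU over them."""

    def __init__(self, src_vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(src_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len)
        embedded = self.embedding(src_ids)      # (batch, src_len, emb_dim)
        outputs, hidden = self.rnn(embedded)    # hidden: (1, batch, hidden_dim)
        return outputs, hidden


class DecoderSketch(nn.Module):
    """Hypothetical decoder: predicts the next token from the previous one."""

    def __init__(self, tgt_vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(tgt_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab_size)

    def forward(self, prev_ids, hidden):
        # prev_ids: (batch, 1), one target token per sentence
        embedded = self.embedding(prev_ids)           # (batch, 1, emb_dim)
        output, hidden = self.rnn(embedded, hidden)   # (batch, 1, hidden_dim)
        logits = self.out(output.squeeze(1))          # (batch, tgt_vocab_size)
        return logits, hidden


def train_step(encoder, decoder, optimizer, src_ids, tgt_ids):
    """One update with teacher forcing.

    Assumes tgt_ids has at least two positions (start symbol first).
    """
    optimizer.zero_grad()
    _, hidden = encoder(src_ids)   # final encoder state initialises the decoder
    loss_fn = nn.CrossEntropyLoss()
    steps = tgt_ids.size(1) - 1
    loss = 0.0
    for t in range(steps):
        # Feed the gold token at position t, predict the token at t + 1.
        logits, hidden = decoder(tgt_ids[:, t:t + 1], hidden)
        loss = loss + loss_fn(logits, tgt_ids[:, t + 1])
    loss.backward()
    optimizer.step()
    return loss.item() / steps


# Toy usage with random token ids (sizes chosen arbitrarily).
torch.manual_seed(0)
encoder = EncoderSketch(src_vocab_size=20, emb_dim=8, hidden_dim=16)
decoder = DecoderSketch(tgt_vocab_size=25, emb_dim=8, hidden_dim=16)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=0.01)
src = torch.randint(0, 20, (4, 7))   # 4 source sentences of length 7
tgt = torch.randint(0, 25, (4, 6))   # 4 target sentences of length 6
print(train_step(encoder, decoder, optimizer, src, tgt))
```

The decoder is called one time step at a time so that the same forward function can be reused at inference, where the next input is not known in advance.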

2 - Inference

Inference, or translating, means using a trained model to generate translations for given inputs. It is different from training, where the model knows the entire target sentence: at inference time the decoder only has its own previous predictions to continue from. We usually write different functions for training and inference.
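The difference can be illustrated with a small pure-Python sketch. The lookup table TOY_NEXT_TOKEN is a hypothetical stand-in for a trained decoder step (a real decoder would also carry a hidden state), and the function names are made up for this example. Greedy decoding, as used here, is only one possible inference strategy.

```python
BOS, EOS = "<s>", "</s>"

# Hypothetical stand-in for a trained decoder: previous token -> next token.
TOY_NEXT_TOKEN = {BOS: "ich", "ich": "bin", "bin": "hier", "hier": EOS}


def toy_decoder_step(prev_token):
    """Return the toy model's prediction given the previous token."""
    return TOY_NEXT_TOKEN.get(prev_token, EOS)


def teacher_forced_inputs(gold_target):
    """Training: decoder inputs are the gold tokens shifted right by one."""
    return [BOS] + gold_target[:-1]


def greedy_decode(step_fn, max_len=10):
    """Inference: each decoder input is the model's own previous output."""
    output = []
    prev = BOS
    for _ in range(max_len):   # max_len guards against never producing EOS
        token = step_fn(prev)
        if token == EOS:
            break
        output.append(token)
        prev = token
    return output


gold = ["ich", "bin", "hier", EOS]
print(teacher_forced_inputs(gold))      # ['<s>', 'ich', 'bin', 'hier']
print(greedy_decode(toy_decoder_step))  # ['ich', 'bin', 'hier']
```

Note the two stopping conditions in greedy_decode: the end-of-sentence symbol and a maximum length. Without the second one, a model that never predicts the end symbol would loop forever.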

TODO 6: complete the code to generate translations.

This is almost the same as TODO 5. To test your code, run python seq2seq.py --inference --load_checkpoint /local/kurs/mt/seq2seq4assign4.pt. You can get a BLEU score of 0.39 on the dev set. Note: if your model is different from the provided pretrained model, you might get errors. In that case, try using the model you trained yourself.
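If you want to sanity-check a BLEU score by hand, corpus-level BLEU can be computed with NLTK, which is installed in the environment above. This is only a sketch: whether seq2seq.py computes its score in exactly this way (tokenisation, smoothing) is an assumption, and the sentences below are made-up examples. NLTK reports BLEU on a scale from 0 to 1.

```python
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

# One list of reference translations per sentence (here: one reference each).
references = [
    [["the", "cat", "sat", "on", "the", "mat"]],
    [["there", "is", "a", "dog", "in", "the", "garden"]],
]

# Hypotheses identical to the references give the maximum score.
perfect_hypotheses = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["there", "is", "a", "dog", "in", "the", "garden"],
]
perfect = corpus_bleu(references, perfect_hypotheses)

# Imperfect hypotheses; smoothing avoids a zero score when some
# higher-order n-grams have no match at all.
noisy_hypotheses = [
    ["the", "cat", "is", "on", "the", "mat"],
    ["a", "dog", "is", "in", "the", "garden"],
]
smoothed = corpus_bleu(
    references,
    noisy_hypotheses,
    smoothing_function=SmoothingFunction().method1,
)

print(perfect, smoothed)
```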

Check the translate(*) function and answer the following questions:

Wrapping up

Towards the end of the assignment session, everyone will be expected to share their findings and discuss the assignment with the full class. You should all be prepared to report your main findings and to discuss the questions asked, as well as any other interesting issues that came up during the assignment.

Reporting

The assignment is supposed to be examined in class, on September 16, 9:00-12:00. You need to be present and active during the whole session.

If you fail to attend the oral session, you instead have to do the assignment on your own and report it afterwards. You can then either discuss the assignment during one of the project supervision sessions (provided that the teacher has time), or write a report in which you discuss your findings (around 1-2 A4 pages). The deadline for the compensation is October 22.