Assignment 4 - Sequence to sequence models

Aim

In this assignment you will use a simple encoder-decoder model to train NMT models on a small English-Swedish data set. (You do not need GPUs for this assignment.) The model generates some example translations after every epoch, and you can follow the learning progress and the perplexity on the validation set during training. You will learn how to write an encoder-decoder model and use it to translate text.
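The example translations are produced by running the decoder one token at a time. The sketch below shows greedy decoding, where the decoder always picks the highest-scoring next token until it emits an end-of-sentence token. It is an illustration under assumptions: the `step` interface, the BOS/EOS ids and `max_len` are hypothetical, and the notebook may decode differently (for example with sampling), so check its code.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical decoder interface: given the previous token id and the current
# decoder state, return a score for every target-vocabulary id plus the new state.
StepFn = Callable[[int, object], Tuple[Sequence[float], object]]


def greedy_decode(step: StepFn, init_state: object, bos_id: int, eos_id: int,
                  max_len: int = 50) -> List[int]:
    """Generate target ids one at a time, always taking the highest-scoring id."""
    output: List[int] = []
    prev, state = bos_id, init_state
    for _ in range(max_len):
        scores, state = step(prev, state)
        # Argmax over the target vocabulary.
        best = max(range(len(scores)), key=lambda i: scores[i])
        if best == eos_id:
            break  # stop as soon as the decoder predicts end-of-sentence
        output.append(best)
        prev = best  # feed the prediction back in as the next input
    return output


# Toy "decoder" over a 5-word vocabulary (0=BOS, 1=EOS) that emits 3, 4, then EOS.
def toy_step(prev: int, state: int) -> Tuple[List[float], int]:
    script = [3, 4, 1]
    scores = [0.0] * 5
    scores[script[state]] = 1.0
    return scores, state + 1


print(greedy_decode(toy_step, init_state=0, bos_id=0, eos_id=1))  # [3, 4]
```

The `max_len` cap matters early in training, when an untrained decoder may never predict EOS.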

This assignment is examined in class, on September 20, 9-12. Note that you have to attend the full session and be active in order to pass the assignment. If you miss this session or are inactive, you will have to compensate for it; see the bottom of this page.

Practicalities

This assignment is intended to be performed in pairs. Team up with another student in order to solve it. It is not necessary to work in the same pair during all assignments.

Take notes during the assignment. During the last half hour of the session, groups will get a chance to talk about their experiments, what they did, what surprised them, what they learned, etc. In addition, the teachers will talk to students during the session, to discuss their findings.

Data

Create a new directory for this assignment and copy all the files into this new work directory:

mkdir assignment4/
cd assignment4/
cp /local/kurs/mt/assignment4/* .

File descriptions:

Preliminaries

In this assignment, we will use Jupyter Notebook. It is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. We also need Matplotlib to plot the learning curve on the validation set.

Installation:
Install Jupyter: python3 -m pip install jupyter --user
Install Matplotlib: python3 -m pip install matplotlib --user
Install torchtext: python3 -m pip install torchtext --user
Run Jupyter: jupyter notebook
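To confirm that the installation worked for the same `python3` you will use, a small check like the one below can help. It is a sketch that assumes the importable module names are `notebook` (installed by the `jupyter` package), `matplotlib`, `torchtext` and `torch` (a dependency of torchtext); it only tests whether each module can be found, not whether it works.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules are importable, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed module names; adjust if your setup differs.
    for name, ok in check_packages(["notebook", "matplotlib", "torchtext", "torch"]).items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```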

Training

Once you have opened "seq2seq.ipynb" in Jupyter Notebook, you can start training an NMT model by clicking "Cell" --> "Run All". This runs all cells in order, starting from the first one. The output of the model appears in the last cell.

It may take more than 10 minutes to train an NMT model, depending on your settings. While the model is training, go back to the code and study how the model works.

Once training has finished, check the perplexity on the validation set (shown as a figure) and the example translations. You can then train new models by varying the model parameters, for example the initial learning rate, the number of encoder/decoder layers, the embedding size, the number of hidden units, and the number of training epochs. (You can change these parameters in the last cell.) Then check how the changes affect the model.
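Validation perplexity is usually defined as the exponential of the average per-token negative log-likelihood of the reference translations; lower is better, and a perplexity of k means the model is, on average, as uncertain as a uniform choice among k words. The sketch below assumes natural-log token probabilities; the notebook may compute perplexity from its loss slightly differently (for example, averaging per batch rather than per token), so check its code before comparing numbers.

```python
import math
from typing import Iterable


def perplexity(token_log_probs: Iterable[float]) -> float:
    """Perplexity from the natural-log probabilities assigned to each reference token."""
    log_probs = list(token_log_probs)
    if not log_probs:
        raise ValueError("need at least one token")
    # Mean negative log-likelihood, i.e. per-token cross-entropy in nats.
    avg_nll = -sum(log_probs) / len(log_probs)
    return math.exp(avg_nll)


# A model that gives every reference token probability 1/4 has perplexity 4.
print(perplexity([math.log(0.25)] * 10))  # approximately 4.0
```

If the training loop already reports the mean cross-entropy per token, the perplexity is simply `math.exp` of that value.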

Here are the default settings: embedding_size=128, hidden_size=128, number_layers=2, dropout_rate=0.05, number_epochs=20, initial_learning_rate=0.01. Try to train better models by varying these parameters, and explore their effects.
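One way to keep the experiments interpretable is to change one parameter at a time from the defaults, so that any change in validation perplexity can be attributed to that parameter. The sketch below builds such a list of configurations. The dictionary keys copy the names above but may not match the variables in the notebook's last cell, the `one_at_a_time` helper is hypothetical, and the grid values are only illustrations, not recommended settings.

```python
from typing import Dict, List

# Default settings from the assignment text (key names may differ in the notebook).
DEFAULTS = {
    "embedding_size": 128,
    "hidden_size": 128,
    "number_layers": 2,
    "dropout_rate": 0.05,
    "number_epochs": 20,
    "initial_learning_rate": 0.01,
}


def one_at_a_time(defaults: Dict[str, float], grid: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Return the baseline config followed by configs that each change one parameter."""
    configs = [dict(defaults)]  # baseline run first, for comparison
    for name, values in grid.items():
        for value in values:
            if value != defaults[name]:  # skip values identical to the baseline
                configs.append({**defaults, name: value})
    return configs


# Illustrative values only.
grid = {
    "hidden_size": [64, 256],
    "initial_learning_rate": [0.001, 0.01, 0.05],
}
for cfg in one_at_a_time(DEFAULTS, grid):
    print(cfg)
```

Changing several parameters at once can find a good model faster, but it makes it hard to say which change caused the improvement.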

Code

Reading code is a good way to understand the details of a model. Go through the code to get a better understanding of the encoder-decoder architecture, and relate it to the sequence-to-sequence slides.
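As a reference point while reading, the core idea of the encoder can be written in a few lines: a recurrent cell reads the source embeddings one by one and folds them into a hidden state, and the final state serves as the context that initialises the decoder (another recurrent network). The sketch below uses a plain tanh (Elman) RNN in pure Python, h_t = tanh(W_x x_t + W_h h_{t-1} + b), as a stand-in; the notebook most likely uses a GRU or LSTM layer in PyTorch with several layers, so treat this as an assumption-laden illustration, not its implementation. The helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_encode(embeddings: Sequence[Sequence[float]], w_x: Matrix, w_h: Matrix,
               b: Sequence[float]) -> Vector:
    """Run a plain tanh RNN over the source embeddings and return the last hidden state.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.
    """
    h = [0.0] * len(b)
    for x in embeddings:
        pre = [a + c + d for a, c, d in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(z) for z in pre]
    return h  # context vector used to initialise the decoder's hidden state


if __name__ == "__main__":
    # Toy source sentence: three 2-dimensional embeddings, hidden size 2.
    src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w_x = [[0.5, -0.5], [0.3, 0.8]]
    w_h = [[0.1, 0.0], [0.0, 0.1]]
    b = [0.0, 0.0]
    print(rnn_encode(src, w_x, w_h, b))        # context for the original word order
    print(rnn_encode(src[::-1], w_x, w_h, b))  # reversed order gives a different context
```

Because the whole source sentence must squeeze through this single vector, long sentences are hard for a simple encoder-decoder; keep this in mind when you look at the example translations.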

Wrapping up

Towards the end of the assignment session, everyone will be expected to share their findings and discuss the assignment in the full class. You should all be prepared to report your main findings, discuss the questions asked, and bring up any other interesting issues that came up during the assignment.

Reporting

The assignment is examined in class, on September 20, 9-12. You need to be present and active during the whole session.

If you fail to attend the oral session, you instead have to do the assignment on your own and report on it afterwards. You can then either discuss the assignment during one of the project supervision sessions (provided that the teacher has time), or write a report of around 1-2 A4 pages where you discuss your findings. The deadline for the compensation is October 25.