Assignment 5: Transformer models and the OpenNMT toolkit

Aim

In this assignment you will gain a better understanding of Transformer models by completing their code. You will also learn to use OpenNMT to train NMT models.

Practicalities

The assignment work can be performed individually or in pairs. It is not necessary to work in the same pairs during all assignments. Note that both students in a pair should be able to discuss your findings independently! Take notes while performing the assignment. During the examination session, students will get a chance to talk about their experiments: what they did, what surprised them, what they learned, etc.

This assignment is examined in class, on September 29, 11:00 (sharp) to 12:00. Before then, you have to solve the tasks described in this document on your own. You can get help through the discussion forum in Studentportalen. Note that you have to attend the full examination session and be active in order to pass the assignment. If you miss this session, you will have to compensate for it instead; see the bottom of this page.

OpenNMT

In this section, you will learn to train NMT models using OpenNMT on the Snowy GPU cluster. You have access to this resource for your NMT project, other course projects, and your thesis project as well, so it is good to become familiar with the cluster.

Note: If you have any problems logging in to Snowy (rackham), please do the OpenNMT-local section below instead; you will then conduct the experiments on a local Chomsky machine. Training a model on CPUs takes much longer. (The example takes only 30 minutes on CPUs, and 10 minutes on GPUs. In real tasks, the gap is much bigger.)

Update: There seems to be a long queue in the system, which means that you may have to wait for GPU resources. So once you have submitted your job, you can move on to the next section first and come back later.

Preparation

You should have an account by now. If not, please apply for one following the instructions here: Applying accounts. Please request membership in our course project g2020017 and the group project UPPMAX 2020/2-2 (the group project gives you priority).
More information about Snowy: User Guide, Slurm_jobs.

When you have logged in via ssh -AX username@rackham.uppmax.uu.se, create a new directory for this assignment and copy all the files into this new work directory:

mkdir assignment5/
cd assignment5/
cp -rf /proj/g2020017/assignment5/* .

Files description:

Set the working environment:
source /proj/g2020017/nmt/bin/activate

Train NMT models

The trainNMT.sh file is used to train a word-level model; the settings are specified in the file. You will get the translation (pred.txt) in your model directory. Please try to revise the settings and train a different model, and compare how different settings affect, for example, the perplexity on the validation set, the loss on the training set, and the translation performance. A rough example of the commands involved is sketched below.
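For orientation, an OpenNMT-py 1.x run typically boils down to a pair of commands roughly like the following. The data prefix, model name, and hyperparameter values here are placeholders, not the course settings; the actual values live in trainNMT.sh:

onmt_train -data data/demo -save_model models/demo -layers 2 -rnn_size 512 -train_steps 10000 -world_size 1 -gpu_ranks 0
onmt_translate -model models/demo_step_10000.pt -src data/src-test.txt -output pred.txt -gpu 0

Changing options such as -layers, -rnn_size, or -train_steps (and the -save_model path, so you do not overwrite the old model) gives you a different model to compare.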

help.train and help.translate give the detailed parameters for training and translating. You can also refer to the OpenNMT documentation. When you have revised the settings, you can submit your job by running sbatch trainNMT.sh, and you will get a job ID. (Remember to change the directory for new models, otherwise the old one might be overwritten.) The job will wait in a queue for its turn. The log file is written to the current directory with a name like slurm-jobid.out.
Here are some basic commands in Slurm (placeholders such as username and jobid should be replaced with your own values):
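sbatch trainNMT.sh (submit a job script; prints the job ID)
squeue -u username (list your queued and running jobs)
scancel jobid (cancel a queued or running job)
These are standard Slurm commands; see the Slurm_jobs guide linked above for more.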

When you have submitted the jobs, it will take some time to get the results. You can move to the next section and come back later.

Note: for each group, please submit at most 2 jobs at a time, otherwise it will be too crowded for others.

Note: if you want to install some toolkits or packages by yourself (in the future), you can follow the method from assignment 3 and create a virtual environment:
python3 -m venv the_path_to_directory
source the_path_to_directory/bin/activate
(Change the_path_to_directory to a directory of your choice.) Then you can install anything you need.

OpenNMT-local

In this section, you will learn to train NMT models using OpenNMT locally. This section is only for those who cannot log in to the cluster!

Preparation

First set the working environment:
source ~/envNMT/bin/activate
You need to install OpenNMT yourself:
pip install OpenNMT-py==1.1.1

Download files:

mkdir assignment5-opennmt/
cd assignment5-opennmt/
cp -rf /local/kurs/mt/assignment5-opennmt/* .

Files description:

Train NMT models

The trainNMT.sh file is used to train a word-level model; the settings are specified in the file. You will get the translation (pred.txt) in your model directory. Please try to revise the settings and train a different model, and compare how different settings affect, for example, the perplexity on the validation set, the loss on the training set, and the translation performance (see the example invocation sketched in the OpenNMT section above).

help.train and help.translate give the detailed parameters for training and translating. You can also refer to the OpenNMT documentation. When you have revised the settings, you can start training models by running ./trainNMT.sh. (Remember to change the directory for new models, otherwise the old one might be overwritten.)

Transformer

Now open a new terminal and connect to one of the computers in the lab room. (Not the UPPMAX Snowy cluster.)

Preparation

Create a new directory for this assignment and copy all the files into this new work directory:

mkdir assignment5/
cd assignment5/
cp -rf /local/kurs/mt/assignment5/* .

Files description:

Set the working environment:
source ~/envNMT/bin/activate

Since most of you work remotely, you can check the available computers from here. Say the machine "prefix" in the Chomsky lab is free; then you can log in via ssh -Y username@prefix.lingfil.uu.se.

Complete the forward() function in class Transformer in transformer.py

The Python file provides the code for multi-head attention, the encoder, and the decoder; please complete the forward pass of the Transformer model. Please refer to the Transformer architecture given in the 3rd NMT lecture. Here are some tips (a sketch of these steps follows the list):

  1. Encode source tokens
  2. Embed target tokens: tokens to embeddings
  3. Add position embeddings to the target embeddings (same scheme as in the encoder)
  4. Computation in the decoder, layer by layer
  5. Final layer normalization
  6. Output projection (into the vocabulary size)
  7. Apply the log_softmax() function
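To make the seven steps concrete, here is a small self-contained sketch in the same spirit. It is not the course's transformer.py: every class and attribute name below is an assumption, it uses PyTorch's built-in layers rather than the provided ones, and it needs a reasonably recent PyTorch (for batch_first=True):

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchTransformer(nn.Module):
    # Toy stand-in for the assignment's Transformer class; all names are illustrative.
    def __init__(self, vocab_size=1000, d_model=64, nhead=4, num_layers=2, max_len=256):
        super().__init__()
        self.d_model = d_model
        self.src_embed = nn.Embedding(vocab_size, d_model)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)  # learned position embeddings
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.decoder_layers = nn.ModuleList(
            [nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
             for _ in range(num_layers)])
        self.final_norm = nn.LayerNorm(d_model)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def positions(self, tokens):
        # Position indices 0..T-1, shaped (1, T) so they broadcast over the batch
        return torch.arange(tokens.size(1), device=tokens.device).unsqueeze(0)

    def forward(self, src_tokens, tgt_tokens):
        # 1. Encode source tokens (embed, add positions, run the encoder stack)
        src = self.src_embed(src_tokens) * math.sqrt(self.d_model)
        memory = self.encoder(src + self.pos_embed(self.positions(src_tokens)))
        # 2. Embed target tokens: tokens to embeddings
        x = self.tgt_embed(tgt_tokens) * math.sqrt(self.d_model)
        # 3. Add position embeddings, same scheme as on the encoder side
        x = x + self.pos_embed(self.positions(tgt_tokens))
        # 4. Computation in the decoder, layer by layer; the causal mask
        #    (-inf above the diagonal) hides future target positions
        t = tgt_tokens.size(1)
        causal = torch.triu(torch.full((t, t), float('-inf'), device=x.device), diagonal=1)
        for layer in self.decoder_layers:
            x = layer(x, memory, tgt_mask=causal)
        # 5. Final layer normalization
        x = self.final_norm(x)
        # 6. Output projection into the vocabulary size
        logits = self.out_proj(x)
        # 7. Apply log_softmax over the vocabulary dimension
        return F.log_softmax(logits, dim=-1)

A quick shape check: SketchTransformer()(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5))).shape is torch.Size([2, 5, 1000]), i.e. a log-probability distribution over the vocabulary for every target position.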
When you finish the code, you can test it by running:
python train.py --model-file model.pt --validate-only
It will compute the perplexity of the validation set. If the code is correct, you will get a number around 5.6. (Perplexity is the exponential of the average per-token cross-entropy, so lower is better. It takes about one minute to get the result.)

In addition to the Transformer class, please go through the MultiHeadAttention, EncoderLayer, and DecoderLayer classes as well. The core of MultiHeadAttention is scaled dot-product attention, sketched below.
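For reference, scaled dot-product attention computes softmax(QK^T / sqrt(d_k)) V. A minimal sketch follows; the tensor shapes and the boolean mask convention are assumptions, and the course code may organize this differently:

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (B, H, Tq, Tk)
    if mask is not None:
        scores = scores.masked_fill(mask, float('-inf'))  # hide masked positions
    weights = F.softmax(scores, dim=-1)  # attention distribution over key positions
    return weights @ v  # weighted sum of the values

Each head runs this on its own slice of the projected queries, keys, and values; the heads' outputs are then concatenated and projected back to the model dimension.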

Wrapping up

Towards the end of the assignment session, everyone will be expected to share their findings and discuss the assignment with the full class. You should all be prepared to report your main findings, discuss the questions asked, and bring up any other interesting issues that came up during the assignment.

Reporting

The assignment is examined in class, on September 29, 11:00-12:00. You need to be present and active during the whole session.

If you fail to attend the oral session, you instead have to do the assignment on your own and report it afterwards. You can then either discuss the assignment during one of the project supervision sessions (given that the teacher has time), or write a report where you discuss your findings (around 1-2 A4 pages). The deadline for this compensation is October 23.