Assignment 5, Training NMT Models

Aim

In this assignment you will use the Fairseq toolkit to train NMT models at different granularities and with different model architectures. 1) You are expected to become familiar with NMT settings and with using this toolkit. 2) You should learn the advantages of subword-level models compared to word-level models. 3) You will learn to submit jobs to a GPU cluster.

Practicalities

The assignment work can be performed individually or in pairs (recommended). It is not necessary to work in the same pairs during all assignments. Note that both students in a pair must be able to independently discuss the findings! Take notes while performing the assignment. During the examination session, students will get a chance to talk about their experiments: what they did, what surprised them, what they learned, etc.

This assignment is examined in class, on September 29, 13-16. Note that you have to attend the full session and be active in order to pass the assignment. If you miss this session or are inactive, you will have to compensate for this, see the bottom of this page.

1. Subword-level Models vs. Word-level Models

In this section, you will learn to train NMT models with the Fairseq toolkit on the Snowy GPU cluster. You will train NMT models at different granularities, i.e., word-level and subword-level models.

You will also need the cluster for the group project, other course projects, and possibly your thesis project, so it is good to become familiar with using it.

Preparation

Information about Snowy (Uppmax cluster): User Guide, Slurm_jobs.

When you have logged in via ssh -AX username@rackham.uppmax.uu.se, create a new directory for this assignment and copy all the files into this new work directory:

mkdir assignment5/
cd assignment5/
cp -rf /proj/uppmax2021-2-14/assignment5/* .

Files description:

Set the working environment:
module load python/3.6.8
source /proj/uppmax2021-2-14/mt21/bin/activate

Train NMT models on the Snowy cluster

The train_word_model.sh and train_bpe_model.sh files are used to train word-level and subword-level models, respectively, using the Transformer architecture; the settings are given in each file. The training/dev/test data has already been processed, so you can use it directly. In practice, you would need to preprocess the data yourself before training the final NMT model, including tokenization, cleaning, true-casing, generating subwords, etc.
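For reference, generating subwords with BPE might look roughly like the sketch below. This is not part of the provided scripts; the file names, language code, and number of merge operations are illustrative, and it assumes the subword-nmt package is installed.

# learn a BPE model from the tokenized training data (10k merges as an example)
subword-nmt learn-bpe -s 10000 < train.tok.en > bpe.codes.en
# apply the learned merge operations to every data split
subword-nmt apply-bpe -c bpe.codes.en < train.tok.en > train.bpe.en
subword-nmt apply-bpe -c bpe.codes.en < valid.tok.en > valid.bpe.en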

The head of the shell script specifies the job parameters. When you submit a job, these parameters are first passed to the cluster, and Slurm (the system that manages submitted jobs) schedules the job based on them, e.g., resource type (GPU/CPU), running time, accounting project (our course project), job name, etc.
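As a rough sketch, such a header consists of #SBATCH lines like the ones below. The exact options and values in the provided scripts may differ; treat this only as an illustration of the format.

#!/bin/bash
#SBATCH -A uppmax2021-2-14     # accounting project
#SBATCH -M snowy               # cluster to run on
#SBATCH -p node                # partition (may differ in the actual script)
#SBATCH --gres=gpu:1           # request one GPU
#SBATCH -t 1:00:00             # maximum running time
#SBATCH -J train_bpe_model     # job name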

Before you submit your job, it helps to know a few basic Slurm commands (some common ones are sketched below, after the submission commands). Now you can submit your job to the cluster by running one of the following (if you work in pairs, each of you should run one of these two jobs to save time and resources):
sbatch train_word_model.sh
sbatch train_bpe_model.sh
Then you can check the status of your job by running squeue -M snowy -u #your_username#. Each job will take less than 15 minutes. The log output will be stored in the current directory under a name like slurm-#job_id#.out, and you can inspect it once the job starts running.
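A few commonly used Slurm commands are sketched below; the job id is a placeholder, and tail is simply a convenient way to follow a log file, not a Slurm command.

squeue -M snowy -u #your_username#   # list your queued and running jobs on Snowy
scancel -M snowy #job_id#            # cancel a submitted job
tail -f slurm-#job_id#.out           # follow the log of a running job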

NOTE:

While the jobs are running, you can move on to the next section and come back here once they finish.

Questions:

2. Improving the Subword-level Model

In this section, you are asked to improve the BLEU score of the subword-level model. Based on what you have learned from the lectures, first list some settings that could affect model performance, then choose some of them and update the settings in the bash file (I suggest you copy the original file to a new one, e.g. cp train_bpe_model.sh #a_new_name.sh#). NOTE: You also need to comment out the preprocessing lines (lines 36-38, the fairseq-preprocess command), because this was already done in the first session; otherwise it will cause errors. Then submit your job by running sbatch #the_new_name.sh#. You can train several different models with different settings.
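As an illustration, the relevant part of your copied script might end up looking roughly like the sketch below. The data directory, save directory, and hyperparameter values are made up for the example; keep the other arguments from the original script as they are.

# the fairseq-preprocess block (lines 36-38 of the script) is commented out,
# since the binarized data already exist from the first run:
# fairseq-preprocess ...
# example hyperparameter changes in the fairseq-train call (values illustrative):
fairseq-train data-bin \
    --arch transformer --optimizer adam --lr 0.0005 \
    --dropout 0.3 --max-epoch 20 \
    --save-dir checkpoints/bpe_new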

Here you can find the arguments/settings for training: doc for fairseq-train, lstm settings, and transformer settings.

When training finishes, compare the log files from the different settings: the perplexity (ppl) on the validation set, the loss on the training set, the translation performance (BLEU), and so on.
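One quick way to pull these numbers out of the Slurm logs is with grep, sketched below; the exact log format depends on the Fairseq version, and the job ids are placeholders.

grep "valid" slurm-#job_id#.out   # validation loss and perplexity per epoch
grep "BLEU" slurm-#job_id#.out    # translation score reported by fairseq-generate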

Questions:

Wrapping up

Towards the end of the assignment session, everyone will be expected to share their findings and discuss the assignment with the full class. You should all be prepared to report your main findings, discuss the questions asked, and bring up any other interesting issues that came up during the assignment.

Reporting

The assignment is supposed to be examined in class, on September 29, 13:00-16:00. You need to be present and active during the whole session.

If you fail to attend the oral session, you instead have to do the assignment on your own and report it afterwards. You can then either discuss the assignment during one of the project supervision sessions (provided that the teacher has time), or write a report where you discuss your findings (around 1-2 A4 pages). The deadline for this compensation is October 22.