Assignment 2 - Moses and PBSMT


In this assignment, you will use the Moses statistical machine translation system to train a phrase-based SMT system. You will tune the system with MERT, and compare performance on new sentences before and after tuning. You will also explore the dynamic-programming beam search algorithm by varying some of the key parameters of the decoder. Many of you will use Moses in your projects later on, and this assignment should give you the experience needed for that.

This assignment is examined in class, on September 11, 13-16. Note that you have to attend the full session and be active in order to pass the assignment. If you miss this session, you will instead have to write a report, see the bottom of this page.

Take notes during the assignment. During the last hour of the session, groups will get a chance to talk about their experiments: what they did, what surprised them, what they learned, etc. In addition, the teachers will talk to students during the session to discuss their findings.


This assignment is intended to be performed in pairs. Team up with another student in order to solve it. It is not necessary to work in the same pairs during all assignments.

In this assignment, everyone is expected to complete at least subtasks 1 and 2, and to work on some part of subtask 3. You might not be able to finish all of subtask 3, though.

1 - Model training

First, you will train a complete phrase-based SMT system to familiarise yourself with the Moses training pipeline. Copy all the files from /local/kurs/mt/assignment2/data to your home space. This data is a small subset of the Swedish-English section of the Europarl corpus.

Data preparation

Before training a phrase-based SMT system, we often need to perform tokenization, casing normalization and corpus cleaning to obtain optimal results. Moses provides various tools for these operations. Have a look in /local/kurs/mt/mosesdecoder/scripts/, particularly the tokenizer and training folders to get a feel for some of the available scripts. Tokenize the training data as follows:

/local/kurs/mt/mosesdecoder/scripts/tokenizer/tokenizer.perl -l lang_id < europarl.train.lang_id > output-file

In the above command, you should replace lang_id with the language codes sv and en for Swedish and English respectively, and output-file with a name of your choice. Note that you do not have to follow the file-naming conventions described here, but you must give your output file a different name from the input file. You should then lowercase the tokenized files as follows:

/local/kurs/mt/mosesdecoder/scripts/tokenizer/lowercase.perl < input-file > output-file

Make sure that your input file here is the output from the tokenizer. Alternatives to lowercasing such as truecasing are also available in Moses; we will stick with lowercasing here for simplicity.
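To get a feel for what these two preprocessing steps do, here is a toy stand-in built from standard Unix tools (the real Moses scripts handle many more cases, such as abbreviations and non-Latin punctuation, which is why we use them on the actual corpus):

```shell
# crude illustration only: split punctuation off words (tokenization),
# then map everything to lowercase (lowercasing)
echo 'Hello, world!' \
  | sed -e 's/\([,.!?]\)/ \1/g' \
  | tr '[:upper:]' '[:lower:]'
# prints: hello , world !
```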

Finally, we will clean the corpus by removing sentence pairs containing over 40 words. We have to be careful here: as our data is parallel, we must make sure to remove both the Swedish and the English sentence, even if only one of them is too long. Luckily, the following command does this for us:

/local/kurs/mt/mosesdecoder/scripts/training/clean-corpus-n.perl corpus-stem sv en clean-stem 1 40

Here corpus-stem is the common prefix of your tokenized, lowercased corpus files (the script appends .sv and .en itself), clean-stem is the prefix to use for the cleaned output files, and 1 and 40 are the minimum and maximum sentence lengths.
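The logic of the cleaning step can be illustrated on a toy parallel corpus with standard tools (using a word limit of 5 instead of 40): a sentence pair is dropped as soon as either side is too long.

```shell
# toy parallel corpus: the second Swedish sentence has 6 words
printf 'a b c\na b c d e f\n' > toy.sv
printf 'x y\nx y z\n' > toy.en

# keep only pairs where both sides have at most 5 words;
# split() returns the number of whitespace-separated words
paste toy.sv toy.en \
  | awk -F'\t' 'split($1,a," ") <= 5 && split($2,b," ") <= 5'
# prints only the surviving pair: a b c	x y
```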

Language model

One of the key components of a phrase-based SMT system is a language model. For this lab, a 5-gram English language model trained on Europarl data is available for you to use at /local/kurs/mt/lab-moses/ . The LM is an n-gram model trained using the SRILM toolkit. You do not have to copy this model to your home space in order to use it.


Training

The Moses training pipeline consists of nine steps, all of which can be executed using the train-model.perl script, which you will find at /local/kurs/mt/mosesdecoder/scripts/training/. You can read more about the training pipeline on the Moses website (look at the 'Training' sub-menu on the left-hand side of the page). The full command to train your model should look like this:

/local/kurs/mt/mosesdecoder/scripts/training/train-model.perl --corpus corpus --f sv --e en --root-dir outdir --lm 0:5:lm-file --external-bin-dir /local/kurs/mt/bin64 >logfile 2>&1

You should replace the placeholders corpus, outdir, and lm-file with the prefix of the cleaned corpus files (without the .sv/.en extension), an output directory (which will be created to store all the output files), and the path to the language model file.

If everything goes well, training should take around one minute on our Linux system. If you experience any unexpected error messages, make sure that your output directory is empty before training, and give full paths to all files and directories.

(If you're wondering what the 2>&1 part of the command does, it redirects standard error to standard output. Standard output and standard error are then both written to the same file by the >logfile part. Note that the order here is important. This trick is not specific to Moses and can be used with any command-line application.)
Tip: If you want to see standard output/error in the terminal as they are produced, but also have them saved to file, use the tee command. In the above example you would replace '>logfile 2>&1' with '2>&1 | tee logfile'. This can be particularly useful when executing complex commands that take more than a few seconds to run, as is the case here.
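The order dependence is easy to verify with a toy command that writes one line to each stream:

```shell
# one line to stdout, one to stderr
{ echo 'to stdout'; echo 'to stderr' >&2; } > log 2>&1
cat log    # both lines: stdout went to the file first, then stderr followed it

{ echo 'to stdout'; echo 'to stderr' >&2; } 2>&1 > log2
cat log2   # only 'to stdout': stderr was pointed at the terminal
           # before stdout was redirected to the file
```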


Examine the files generated by the training process. Try to figure out what information they contain by looking at them. You may consult the Moses webpage to read about the training process. The training log may help you understand what goes on during training. Try to relate the training log output to the training pipeline on lecture slides.

Phrase table

The phrase table contains five fields separated by " ||| " markers: source phrase, target phrase, feature values, word alignments, and counts from the training corpus. Some of the feature values are probabilities that sum to 1 over a certain set of alternatives; some are not.
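As an illustration, a single phrase-table line can be pulled apart with awk, using " ||| " as the field separator. The entry below is made up; real entries are found under your output directory:

```shell
# a made-up phrase-table entry in the five-field format
line='det är ||| it is ||| 0.8 0.5 0.6 0.4 ||| 0-0 1-1 ||| 10 8 6'
printf '%s\n' "$line" | awk -F' \\|\\|\\| ' '{
  print "source phrase:  " $1
  print "target phrase:  " $2
  print "feature values: " $3
  print "alignments:     " $4
  print "counts:         " $5
}'
```

The alignment field records which source word links to which target word; 0-0 means the first source word is aligned to the first target word.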

Decoder configuration file

The Moses configuration file (moses.ini) contains different sections. The [feature] section contains pointers to the phrase table file and the language model file. The configuration file also contains the feature weights. Note that the phrase table has four weights, one for each feature contained in the phrase table. Take a good look at this file and familiarise yourself with the main parameters defining a phrase-based SMT model.
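For orientation, the relevant sections of a moses.ini typically look something like the excerpt below. All paths and weight values here are made up, and the exact feature names depend on the Moses version and on which LM toolkit it was built with (e.g. SRILM or KenLM):

```ini
[feature]
PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/path/to/phrase-table.gz input-factor=0 output-factor=0
SRILM name=LM0 factor=0 order=5 path=/path/to/lm-file
Distortion
WordPenalty
UnknownWordPenalty

[weight]
TranslationModel0= 0.20 0.20 0.20 0.20
LM0= 0.50
Distortion0= 0.30
WordPenalty0= -1.00
UnknownWordPenalty0= 1.00
```

Note how TranslationModel0 gets four weights, matching the four feature values in each phrase-table entry.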

Testing your model

Try translating some test data (which has been pre-tokenized and lowercased for you) with the Moses model you just trained:

/local/kurs/mt/mosesdecoder/bin/moses -f config-file < test-file > out.en

Replace config-file with the path to moses.ini, and test-file with the test data. Have a look at the translations in out.en. Remember that the model was trained on a small amount of data (~2000 sentences), so you will likely see many untranslated words in the output. Use the BLEU metric to assess the quality of this translation:

/local/kurs/mt/mosesdecoder/scripts/generic/multi-bleu.perl reference-file < out.en

Replace reference-file with the English reference translation of the test data. What score do you get?
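For reference, multi-bleu.perl prints a single summary line; the numbers below are made up, but the format looks like this:

```
BLEU = 9.41, 38.2/13.0/6.1/3.2 (BP=0.972, ratio=0.972, hyp_len=2412, ref_len=2481)
```

The first number is the overall BLEU score, followed by the 1- to 4-gram precisions, the brevity penalty (BP), and the hypothesis/reference length ratio.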

2 - Tuning your system

The translations in the previous section were obtained by running Moses with default weights for each feature of the linear model (you can see these in moses.ini). To obtain better performance, we can tune these weights to maximise translation performance (measured with the BLEU score) on a separate development data set. Here we will use minimum error-rate training (MERT) for this task. You can run MERT as follows:

/local/kurs/mt/mosesdecoder/scripts/training/mert-moses.pl dev-src dev-ref /local/kurs/mt/mosesdecoder/bin/moses config-file --working-dir outdir --mertdir /local/kurs/mt/mosesdecoder/bin >logfile.mert 2>&1

You should replace dev-src and dev-ref with the source and reference sides of the development set, config-file with the path to moses.ini, and outdir with a new directory where you want MERT to store its output files. Make sure once again to use full paths to all these files and directories. As tuning normally requires Moses to translate the whole dev set (100 sentences) multiple times, this process may take a few minutes to run. If you want, you can set the --maximum-iterations option to 5 to cap the number of iterations.

Look at the output files produced by Mert. You will notice that it produces a new configuration file for each iteration, and records the BLEU score on the development set at the top of each of these files. The configuration file from the final iteration is copied to moses.ini.

Have a look at the output from each iteration. Can you see any changes, and if so, are there also improvements? Also look at the weights for the different feature functions after each iteration, to see how they change.

We have now (hopefully) seen that running MERT improves performance on the development data. But what we are really hoping for is that it also increases the BLEU score on the test data. Re-run Moses on the test data using the new configuration file produced by MERT, and calculate the BLEU score on the output. How does this compare to the BLEU score Moses achieved earlier, before tuning?

Note that tuning is not a deterministic process. If you run it again (with a different outdir), you will get a different result. If you wish, you may try this, and compare the BLEU scores and results you get from the two different MERT runs.

3 - Exploring the search algorithm

For the rest of the assignment, we're going to use a larger pre-trained Swedish-English model (also trained on Europarl data). You can find the model in /local/kurs/mt/lab-moses/. Copy the ready-made moses.ini into your directory. Note that this configuration file was made with an earlier version of Moses, so it probably looks a bit different from the one you created in the previous section.

Note that you will need to tokenize and lowercase any Swedish text before translating with this model. You can use the script in the model directory to preprocess your test sentences.

Start the decoder with this model by entering:

/local/kurs/mt/mosesdecoder/bin/moses -f moses.ini
Try entering a few sentences and look at the translations you get. You can make up your own sentences or copy some sentences from a newspaper website such as DN or Svenska Dagbladet. You may also use sentences from the Europarl test set used above, or from the data sets in assignment 1. You can quit the decoder by pressing Control-D. Look at the BEST TRANSLATION line to see the scores. The decoder outputs the total score as well as the vector of the individual core feature scores.
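To give an idea of what to look for, a BEST TRANSLATION line looks roughly like the made-up example below (the exact layout varies between Moses versions):

```
BEST TRANSLATION: this is true [111]  [total=-6.304] core=(0.000,-4.000,3.000,-16.425,-9.876,-2.334,-4.112)
```

The bit vector in square brackets shows which source words were covered, total is the overall model score, and the core vector lists the individual feature scores.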

You can increase the decoder's verbosity level to see what it does. If you run the decoder with the -v 2 option, it will tell you how many hypotheses were expanded, recombined, etc. With the -v 3 option, the decoder will dump information about all the hypotheses it expands to standard error. Make sure that you use short input sentences with -v 3! It is also a good idea to redirect the output to a file, as in the training commands above. The -report-segmentation option will show you how the input sentence was segmented into phrases.

Another way to gather information about how decoder parameters affect the output is by looking at n-best lists containing the n best hypotheses in the part of the search space explored by the decoder. To generate n-best output, start the decoder with the -n-best-list file size option (for instance, -n-best-list nbest.txt 100). This will output n-best lists of the given size to the file you specify. Use an n-best size of around 100 to obtain a fair impression of the best output hypotheses.
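Each line of an n-best list has the form sentence-id ||| hypothesis ||| feature scores ||| total score. The snippet below builds a tiny made-up list and extracts the hypotheses and total scores for the first sentence:

```shell
# a made-up two-entry n-best list for sentence 0
cat > nbest.txt <<'EOF'
0 ||| that is true ||| LM0= -4.1 TranslationModel0= -2.0 -1.5 -2.2 -1.8 ||| -6.3
0 ||| it is true ||| LM0= -4.5 TranslationModel0= -2.1 -1.6 -2.4 -1.9 ||| -6.9
EOF

# print hypothesis and total score for sentence 0
awk -F' \\|\\|\\| ' '$1 == 0 {print $2 " (total " $4 ")"}' nbest.txt
```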

Here are some options you can use to influence the search:
-stack S sets the stack size S for histogram pruning (default: 100)
-beam-threshold eta sets the beam threshold eta for threshold pruning (default: 0.00001, which effectively disables threshold pruning in most cases!)
-max-phrase-length p segments the input into phrases of length at most p (default: 10, which is more than the maximum phrase length in our phrase table!)
-distortion-limit d sets the distortion limit (maximum jump) to d (default: 6; 0 means no reordering permitted, -1 means unlimited)

You can also change the ttable-limit directly in moses.ini - this affects how many translation options are loaded for each span.
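A systematic way to explore these options is to translate the same test set under several settings and compare the BLEU scores. The sketch below varies the stack size; test.sv and test.en are placeholder names for the source and reference sides of your test set:

```shell
# sketch: sweep the stack size and score each output with BLEU
# (moses.ini, test.sv and test.en are placeholders for your own files)
for s in 1 10 100 1000; do
  /local/kurs/mt/mosesdecoder/bin/moses -f moses.ini -stack $s \
    < test.sv > out.stack$s
  /local/kurs/mt/mosesdecoder/scripts/generic/multi-bleu.perl test.en \
    < out.stack$s
done
```

Timing each run (e.g. with the time command) also shows the speed/quality trade-off of pruning.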

You can now experiment with different settings and options, for example by varying the pruning and reordering parameters listed above and observing how the translations, scores and decoding speed change.

Wrapping up

Towards the end of the assignment session, everyone will be expected to share their findings and discuss issues from the assignment with the full class. You should all be prepared to report your main findings and discuss interesting issues that came up during the assignment.

By now you should also have formed some impressions of phrase-based SMT, and you should be prepared to discuss them during the wrap-up.


The assignment is supposed to be examined in class, on September 11, 13-16. You need to be present and active during the whole session in order to pass.

If you failed to attend the oral session, you have to make up for this, either individually or in a pair of students. You should do subtasks 1 and 2, and at least part of subtask 3, spending around 2.5 hours of work on the assignment. Then either discuss the assignment with your teacher during one of the project supervision sessions (provided that your teacher has time), or write a report about your experiences with Moses, including a discussion of the questions outlined in the assignment. Your report should be 1-2 A4 pages and should be handed in via the student portal as a PDF. The deadline is October 25, 2019.