In this assignment session you will gain hands-on experience with real machine translation services. You will compare the quality of different systems using both manual and automatic evaluation methods. Finally, you will assess the pros and cons of machine-translated texts.
This assignment is examined in class on September 3, 15-16. Before this session you will need to spend around two hours doing the work described in this document. You can get help in the studentportalen course forum if needed. If you miss the examination session or are inactive, you will have to compensate for this; see the bottom of this page.
Take notes while performing the assignment work. During the 1-hour examination session, you will get a chance to talk about your experiments: what you did, what surprised you, what you learned, etc.
When you have decided on your language pair, find an article written in your source language. A good choice is a Wikipedia article, or an article from a news service in that language (e.g. BBC for English). Copy the main text from the article and paste it into Google Translate to translate it. Do NOT spend a long time choosing the article.
Look at the translated text. Note your first impression of the quality, then analyze the quality and errors in more detail, identifying which types of errors occur and which types are the most frequent. Consider the following questions (and feel free to make any additional observations!):
In this task we will carry out some experiments with text from two very different domains: course plans (from Luleå University) and movie subtitles. You will try methods for automatic and manual evaluation.
If you need help logging in to our computer system from home, look here.
Create a new work directory for this assignment and copy the following six files into it:
cp /local/kurs/mt/assignment1/data/* .
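For example, assuming you call the directory assignment1 (the name is up to you), the full sequence would be:

mkdir assignment1
cd assignment1
cp /local/kurs/mt/assignment1/data/* .
ls

The ls at the end lets you check that all six files are in place.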
Translate the sentences in the Swedish source files into English, using Google Translate or Bing Translator. Save the translation results in separate text files in your work directory.
perl /local/kurs/mt/mosesdecoder/scripts/tokenizer/tokenizer.perl -l LANG < raw-translation.txt > tokenized-translation.txt
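Replace LANG with the language code of the text, i.e. en for your English translations. For example, with placeholder file names:

perl /local/kurs/mt/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < google-translation.txt > google-translation.tok.txt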
Use the multi-bleu.perl script to compute BLEU scores for the translated texts against the provided reference translations. Record the scores you obtain for each system.
perl /local/kurs/mt/mosesdecoder/scripts/generic/multi-bleu.perl tokenized-reference.txt < tokenized-translation.txt
The multi-bleu output will look like this:
BLEU = 31.38, 69.1/47.7/29.9/19.9 (BP=0.838, ratio=0.850, hyp_len=488, ref_len=574)
The first score is the overall BLEU score, which is the one we are mainly interested in. The following four scores are the 1-gram to 4-gram precisions. The numbers in parentheses give the brevity penalty and information about the lengths of the hypothesis and the reference.
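These numbers fit together as follows: BLEU is the geometric mean of the four n-gram precisions multiplied by the brevity penalty, where BP = exp(1 - ref_len/hyp_len) when the hypothesis is shorter than the reference (and 1 otherwise). You can verify this against the example output above:

BP = exp(1 - 574/488) ≈ 0.838
BLEU = 0.838 × (0.691 × 0.477 × 0.299 × 0.199)^(1/4) ≈ 0.838 × 0.374 ≈ 0.3138, reported as 31.38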
Look at the English translations and manually evaluate at least 20 text lines (more if you have time) of either the movie subtitles or the syllabi, for both the Convertus and the Google/Bing translations, using the following subjective assessment scale:
Compute the average score of your manual evaluations for each translation file. Record the scores obtained.
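If you record your scores in a plain text file with one score per line, a quick way to compute the average is (the file name is just an example):

awk '{ sum += $1; n++ } END { print sum / n }' scores-google.txt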
During the examination session, everyone will be expected to share their findings and discuss the assignment, in smaller groups and in the full class. Be prepared to report your main findings, discuss the questions asked, and bring up any other interesting issues that came up during the assignment.
If you are unable to attend the oral session, you will instead have to report your assignment results later. You can either discuss the assignment during one of the project supervision sessions (provided the teacher has time), or write a report (around 1-2 A4 pages) where you discuss your findings, including the scores for the different evaluations in task 2. The deadline for the compensation is October 23.
Chapter 8 in the course textbook