In this assignment session you will gain hands-on experience with real machine translation services. You will compare the quality of different systems using both manual and automatic evaluation methods. Finally, you will assess the pros and cons of machine-translated text.
This assignment is examined in class on September 6, 9-12. Note that you have to attend the full session and be active in order to pass the assignment. If you miss this session or are inactive, you will have to compensate for this; see the instructions at the bottom of this page.
Take notes during the assignment. During the last hour of the session, groups will get a chance to talk about their experiments: what they did, what surprised them, what they learned, etc. In addition, the teachers will talk to students during the session to discuss their findings.
Once you have decided on your language pair, find an article written in your source language. A good choice is a Wikipedia article or an article from a news service in that language (e.g. BBC for English). Copy the main text of the article and paste it into Google Translate to translate it. Do NOT spend a long time choosing the article.
Read the translated text and first note your overall impression of its quality. Then analyze the quality and the errors in more detail: identify which types of errors occur and which types are the most frequent. Try to think about the following questions (and feel free to make additional observations!):
In this task you will carry out experiments with text from two very different domains: course syllabi (from Luleå University) and movie subtitles. You will try both automatic and manual evaluation methods.
Start by copying the data files into your work directory:
cp /local/kurs/mt/assignment1/data/* .
Translate the sentences in the Swedish source files into English, using Google Translate or Bing Translator. Save the translation results in separate text files in your work directory.
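Tokenize the translations with the Moses tokenizer before computing BLEU scores, replacing LANG with the language code of the text (en for English). The reference translations need to be tokenized in the same way, so that the two files can be compared token by token:
perl /local/kurs/mt/mosesdecoder/scripts/tokenizer/tokenizer.perl -l LANG < raw-translation.txt > tokenized-translation.txt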
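Tokenization matters because multi-bleu.perl splits each line on whitespace, so a token such as "English." never matches "English" followed by ".". The minimal Python sketch below illustrates the effect with a hypothetical crude_tokenize function, a deliberately simplified regular-expression stand-in for tokenizer.perl (the real Moses tokenizer handles many more cases, such as abbreviations, numbers and apostrophes). It counts clipped unigram matches against a made-up, already tokenized reference line.

```python
import re
from collections import Counter


def crude_tokenize(line: str) -> list[str]:
    """Split a line into word tokens and single punctuation tokens.

    A deliberately simple, hypothetical stand-in for tokenizer.perl.
    """
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", line)


def clipped_unigram_matches(hyp: list[str], ref: list[str]) -> int:
    """Count hypothesis tokens found in the reference, clipped by reference counts."""
    ref_counts = Counter(ref)
    return sum(min(count, ref_counts[tok]) for tok, count in Counter(hyp).items())


hyp = "The course is given in English."
ref = "The course is taught in English ."  # made-up, already tokenized reference

print(clipped_unigram_matches(hyp.split(), ref.split()))          # 4: "English." != "English"
print(clipped_unigram_matches(crude_tokenize(hyp), ref.split()))  # 6: "English" and "." now match
```

Only after tokenization do the word "English" and the final period count as matches, which is why BLEU scores on untokenized output can be misleadingly low.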
Use the multi-bleu.perl script from the Moses installation to compute BLEU scores for the translated texts against the provided reference translations. Record the score you obtain for each system.
perl /local/kurs/mt/mosesdecoder/scripts/generic/multi-bleu.perl tokenized-reference.txt < tokenized-translation.txt
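To make the score less of a black box, here is a minimal Python sketch of corpus-level BLEU with a single reference, following the same recipe as multi-bleu.perl: n-gram matches are clipped by their counts in the reference and summed over the whole file, the 1- to 4-gram precisions are combined by a geometric mean, and a brevity penalty is applied when the hypothesis is shorter than the reference. The function names are hypothetical, and the sketch assumes both inputs are already tokenized (tokens separated by whitespace). It is meant for understanding, not as a replacement for the provided script.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyp_lines: list[str], ref_lines: list[str], max_n: int = 4) -> float:
    """Corpus-level BLEU (0-100) with one reference per line.

    Assumes both sides are already tokenized, with tokens separated by whitespace.
    """
    if len(hyp_lines) != len(ref_lines):
        raise ValueError("hypothesis and reference must have the same number of lines")
    correct = [0] * (max_n + 1)  # clipped n-gram matches, indexed by n
    total = [0] * (max_n + 1)    # n-grams in the hypothesis, indexed by n
    hyp_len = ref_len = 0
    for hyp_line, ref_line in zip(hyp_lines, ref_lines):
        hyp, ref = hyp_line.split(), ref_line.split()
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
            # An n-gram only counts as correct as often as it occurs in the reference
            correct[n] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            total[n] += sum(hyp_ngrams.values())
    precisions = [correct[n] / total[n] if total[n] else 0.0 for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is zero if any n-gram precision is zero
    # Brevity penalty: below 1 only when the hypothesis is shorter than the reference
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return 100 * bp * math.exp(log_mean)
```

Called on the lines of tokenized-translation.txt and tokenized-reference.txt, this sketch should give a value close to the first number printed by multi-bleu.perl for the same files. Note how a single short line with no 4-gram matches cannot bring the corpus score to zero, because counts are pooled over the whole file before the precisions are computed.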
The multi-bleu output will look like this:
BLEU = 31.38, 69.1/47.7/29.9/19.9 (BP=0.838, ratio=0.850, hyp_len=488, ref_len=574)
The first number is the BLEU score, which is the one we are mainly interested in. The next four numbers are the 1-gram to 4-gram precisions. The values in parentheses give the brevity penalty (BP), the ratio of hypothesis length to reference length, and the lengths in tokens of the hypothesis (hyp_len) and the reference (ref_len).
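As a worked check of how the numbers in the example output fit together, let c be the hypothesis length (hyp_len), r the reference length (ref_len), and p_1 to p_4 the four n-gram precisions. BLEU is the brevity penalty times the geometric mean of the precisions:

$$
\mathrm{BP} = \begin{cases} 1 & \text{if } c \ge r \\ e^{\,1 - r/c} & \text{if } c < r \end{cases}
\qquad
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big(\frac{1}{4}\sum_{n=1}^{4} \ln p_n\Big)
$$

$$
\mathrm{BP} = e^{\,1 - 574/488} \approx e^{-0.176} \approx 0.8384,
\qquad
\exp\!\Big(\tfrac{1}{4}\big(\ln 0.691 + \ln 0.477 + \ln 0.299 + \ln 0.199\big)\Big) \approx e^{-0.983} \approx 0.3742
$$

$$
\mathrm{BLEU} \approx 0.8384 \times 0.3742 \approx 0.3137
$$

This matches the reported 31.38 up to the rounding of the printed precisions. Since ratio = 488/574 ≈ 0.850 is below 1, the translation is shorter than the reference and is penalized by a BP below 1.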
Look at the English translations and evaluate around 20 lines (or as many as you have time for) from either the movie subtitles or the syllabi, for both the Convertus and the Google/Bing translations, using the following subjective assessment scale:
Compute the average score of your manual evaluations for each translation file, and record the averages.
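If you keep your manual scores in a simple text file per translation, for example one score per line (a hypothetical format; a spreadsheet works just as well), the averages can be computed with a few lines of Python. The function names, file labels and example scores below are made up for illustration.

```python
def read_scores(path: str) -> list[int]:
    """Read manual scores from a file with one integer score per line.

    The one-score-per-line format is an assumption; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as f:
        return [int(line) for line in f if line.strip()]


def average_scores(scores_by_file: dict[str, list[int]]) -> dict[str, float]:
    """Return the mean manual score for each translation file.

    Files without any scores are left out to avoid dividing by zero.
    """
    return {
        name: sum(scores) / len(scores)
        for name, scores in scores_by_file.items()
        if scores
    }


# Made-up scores for four subtitle lines, for illustration only
example = {
    "convertus-subtitles": [3, 4, 2, 4],
    "google-subtitles": [4, 4, 3, 5],
}
for name, avg in average_scores(example).items():
    print(f"{name}: {avg:.2f}")
```

With only around 20 scored lines per file, a difference of a few tenths between systems may well be noise, so it is worth comparing the averages with your BLEU scores and with your notes on individual lines.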
Towards the end of the session, everyone is expected to share their findings and discuss the assignment with the full class. Be prepared to report your main findings, to discuss the questions asked, and to bring up any other interesting issues that came up during the assignment.
If you did not attend the oral session, you instead have to do the assignment on your own and report on it afterwards. Spend around 45 minutes on task 1, and do task 2. You can then either discuss the assignment during one of the project supervision sessions (provided the teacher has time), or write a report of around 1-2 A4 pages where you discuss your findings, including the scores for the different evaluations in task 2. The deadline for the compensation is October 25.
Chapter 8 in the course textbook