Assignment 3 - Neural Language Models


In this assignment you will gain a better understanding of neural language modeling by completing code, inspecting generated text, and exploring the ability of neural language models to generate text.


The assignment can be performed individually or in pairs (recommended). It is not necessary to work in the same pairs for all assignments. Note that both students in a pair should be able to discuss their findings independently! Take notes while working on the assignment. During the examination session, students will get a chance to talk about their experiments: what they did, what surprised them, what they learned, etc.

This assignment is examined in class, on September 15, 9-12. Note that you have to attend the full session and be active in order to pass the assignment. If you miss this session or are inactive, you will have to compensate for this, see the bottom of this page.


Create a new directory for this assignment and copy all the files into this new work directory:

mkdir assignment3/
cd assignment3/
cp -rf /local/kurs/mt/assignment3/ .

Files description:

Set up the working environment (this creates an environment for our course and takes some time):
close this terminal and open a new terminal
conda init
conda create -n mt21 python=3.9
conda activate mt21
conda install pytorch torchvision torchaudio cpuonly -c pytorch
If you want to exit the environment, just run conda deactivate

1 - Code Completion

The Python files are for language modeling, and I have removed some code from them. In this task, you need to complete the "TODO" code in the Python files. When you have completed the code correctly, you should not get any exceptions when you run python --dry_run

We are using PyTorch; please refer to the official docs for details on functions and modules.

TODO 1: complete the forward() function in

The forward() function performs the forward computation of the neural network. Given the input and the hidden states, it returns the softmax results and the updated hidden states. Hint: the information flows as rnn -> dropout -> output layer -> softmax.
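The information flow in the hint can be sketched with a small stand-alone module. This is only an illustration, not the course code: the class name TinyRNNLM, the attribute names (encoder, rnn, drop, decoder), and the embedding lookup before the RNN are assumptions. It returns log-probabilities via log_softmax, which is a common choice when the loss is a negative log-likelihood; use F.softmax instead if the assignment code expects plain probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRNNLM(nn.Module):
    """Hypothetical stand-in for the course model; all names are assumptions."""

    def __init__(self, vocab_size, emb_size, hidden_size, num_layers, dropout=0.5):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.LSTM(emb_size, hidden_size, num_layers, dropout=dropout)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, hidden):
        # tokens: (seq_len, batch) tensor of word indices
        emb = self.drop(self.encoder(tokens))
        output, hidden = self.rnn(emb, hidden)       # rnn
        output = self.drop(output)                   # dropout
        logits = self.decoder(output)                # output layer
        log_probs = F.log_softmax(logits, dim=-1)    # softmax (in log space)
        return log_probs, hidden

    def init_hidden(self, batch_size):
        # An LSTM needs two zero tensors: the hidden state and the cell state.
        weight = next(self.parameters())
        shape = (self.rnn.num_layers, batch_size, self.rnn.hidden_size)
        return (weight.new_zeros(shape), weight.new_zeros(shape))


model = TinyRNNLM(vocab_size=20, emb_size=8, hidden_size=16, num_layers=2)
model.eval()  # disable dropout so the check below is deterministic
tokens = torch.randint(0, 20, (5, 3))  # seq_len=5, batch=3
log_probs, hidden = model(tokens, model.init_hidden(3))
print(log_probs.shape)  # (seq_len, batch, vocab_size)
```

Exponentiating the output gives one probability distribution over the vocabulary per position, so each row should sum to 1.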

TODO 2: complete the train() function in

The train() function contains all the steps of training a neural language model. We train the model in mini-batch style, i.e., the model is updated after every batch. What you need to do is write the code inside the iteration. Given the data, we need to compute the gradients to update the model. There are three general steps in training a neural network: 1) forward computation, 2) computing the loss, 3) backpropagation -> gradients -> updating the model (parameters).
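The three steps can be sketched on a toy problem as follows. Everything in this sketch is an assumption made for illustration: the tiny embedding-plus-linear model, the made-up data, the learning rate, and the use of torch.optim.SGD. The course code may update the parameters differently (for example manually, or with gradient clipping), so treat this as a pattern rather than the expected solution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, emb_size = 10, 8
# Hypothetical toy model: predicts the next token from the current one.
model = nn.Sequential(
    nn.Embedding(vocab_size, emb_size),
    nn.Linear(emb_size, vocab_size),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

# Made-up data: the "next word" is always (current word + 1) mod vocab_size.
inputs = torch.arange(vocab_size).repeat(8)
targets = (inputs + 1) % vocab_size
batch_size = 16

losses = []
for epoch in range(20):
    for start in range(0, len(inputs), batch_size):
        x = inputs[start:start + batch_size]
        y = targets[start:start + batch_size]
        optimizer.zero_grad()        # clear gradients from the previous batch
        logits = model(x)            # 1) forward computation
        loss = criterion(logits, y)  # 2) compute the loss
        loss.backward()              # 3) backpropagation -> gradients
        optimizer.step()             #    update the model parameters
        losses.append(loss.item())

print(f"first batch loss: {losses[0]:.3f}, last batch loss: {losses[-1]:.3f}")
```

Forgetting to clear the gradients before each batch is a common mistake, because PyTorch accumulates gradients across backward() calls by default.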

2 - Generation

In this task, you will use pretrained language models to generate text. Check the generated text and talk about your impression of it. What surprised you, and what do you think causes the errors?

A Simple LM

Here you will use a language model trained with the code from the first task to generate text (you do not need to train models yourself). The perplexity on the test set is 118. Look at the training data and the settings for the language model, and give your hypotheses on what causes the errors/bugs. Then think about how to improve the language model.
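To get a feeling for what a perplexity of 118 means, the following sketch assumes the common definition: the exponential of the average negative log-likelihood per word. The source does not say exactly how the 118 was computed, and the probabilities below are made up. Under this definition, a model that spreads its probability uniformly over 118 words at every step has a perplexity of 118.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to the correct words."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


# A model that is always as unsure as a uniform choice among 118 words.
uniform_ppl = perplexity([1 / 118] * 50)

# A model that is confident on some words and unsure on others (made-up numbers).
mixed_ppl = perplexity([0.5, 0.01, 0.2, 0.001])

print(round(uniform_ppl, 2), round(mixed_ppl, 2))
```

Lower is better: a perplexity of 1 would mean the model assigns probability 1 to every correct word.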

Settings for the pretrained model:
Embedding size: 650          RNN type: LSTM
Training epochs: 50          Layers: 2
Initial learning rate: 1     RNN units: 650

You can run python --seed 100 --outf generation.sampling.txt to generate text. The script generates text word by word. The generated text (generation.sampling.txt) is written to the current directory. Sentences are separated by < eos >. Note that you can replace "100" with a different number to get a different text, because the seed sets the random numbers that initialize the generation.

Greedy (1-best) Generation

By default, the generation script selects each prediction randomly by sampling. Please edit the code to enable greedy selection, i.e., select the most probable prediction at each step.
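The difference between the two selection strategies can be illustrated without any model. The sketch below uses a made-up next-word distribution and plain Python; the function names are hypothetical and not taken from the course script.

```python
import random


def sample_next(probs, rng):
    """Sampling: draw an index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_next(probs):
    """Greedy (1-best): always take the index with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


vocab = ["the", "cat", "sat", "."]
probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-word distribution

rng = random.Random(100)  # the seed only affects the sampling strategy
sampled = [vocab[sample_next(probs, rng)] for _ in range(10)]
greedy = [vocab[greedy_next(probs)] for _ in range(10)]

print("sampling:", sampled)
print("greedy:  ", greedy)
```

In PyTorch, the corresponding operations are typically torch.multinomial for sampling and torch.argmax for greedy selection. Note that greedy selection ignores the seed once the starting point is fixed, since it involves no randomness.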

Then run python --seed 100 --outf generation.greedy.txt to get the generated text, and compare it with the sampling-based generation. What are the differences? Which generation is better?

Pre-trained Neural LMs

Pre-trained language models are currently a very popular topic, and more and more of them have been proposed, such as ELMo, BERT, GPT-2, XLNet, RoBERTa, GPT-3, etc. You can find more language models and the relations between them here. These models are generally based on RNNs or Transformers. (The Transformer is a more advanced model than RNNs in NLP; we will learn about it later.) Choose one of the models and figure out how it works by reading the paper or blog posts on the web.

Here we explore the LMs' ability to generate text. Write With Transformer provides some LMs for generating sentences. Test the following models and identify at least two main weaknesses (bugs) of these models.

GPT-2        domain: general
XLNet        domain: general
arxiv-nlp    domain: NLP papers

Wrapping up

Towards the end of the assignment session, everyone will be expected to share their findings and discuss the assignment, in the full class. You should all be prepared to report your main findings, and discuss the questions asked, and any other interesting issues that came up during the assignment.


The assignment is supposed to be examined in class, on September 15, 9:00-12:00. You need to be present and active during the whole session.

If you fail to attend the oral session, you instead have to do the assignment on your own and report it afterwards. You can then either discuss the assignment during one of the project supervision sessions (given that the teacher has time), or write a report where you discuss your findings (around 1-2 A4 pages). The deadline for the compensation is October 22.

Further Reading

Open-ended generation is challenging. Here is a paper discussing language generation and its evaluation: The Curious Case of Neural Text Degeneration.