Assignment 3 - Neural Language Models

Aim

In this assignment session you will gain a better understanding of neural language modeling by completing code, inspecting generated text, and exploring the ability of neural language models to generate text.

Practicalities

The assignment work can be performed individually or in pairs. It is not necessary to work in the same pairs for all assignments. Note that both students in a pair should be able to discuss the findings independently! Take notes while performing the assignment. During the examination session, students will get a chance to talk about their experiments: what they did, what surprised them, what they learned, etc.

This assignment is examined in class, on September 17, 15:00 (sharp) to 16:00. Before this you have to solve the tasks described in this document on your own. You can get help through the discussion forum in Studentportalen. Note that you have to attend the full examination session and be active in order to pass the assignment. If you miss this session, you will instead have to compensate for it; see the bottom of this page.

Preparation

Create a new directory for this assignment and copy all the files into this new work directory:

mkdir assignment3/
cd assignment3/
cp -rf /local/kurs/mt/assignment3/* .

Files description: the tasks below refer to nlm.py (training script), model.py (model definition) and generate.py (generation script).

Since most of you work remotely, you can check the available computers from here. If, say, "prefix" in Chomsky is free, you can log in via ssh: ssh -Y username@prefix.lingfil.uu.se

Set up the working environment (this creates a virtual environment which gives you access to all the needed packages):
python3 -m venv ~/envNMT
source ~/envNMT/bin/activate
pip install -U pip
pip install numpy
pip install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
If you want to exit the environment, just run deactivate

1 - Code Completion

The Python files implement a language model, and some code has been removed from nlm.py and model.py. In this task, you need to complete the code marked "TODO" in these files. When you have completed the code correctly, you should not get any exceptions when you run python nlm.py --dry-run

We are using PyTorch; please refer to the official docs.

TODO 1: complete the forward() function in model.py.

The forward() function performs the forward computation of the network. Given the input and the hidden states, it should return the softmax output and the new hidden states. Hint: the information flow is rnn -> dropout -> output layer -> softmax. A sketch of this flow is given below.
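For reference, a minimal sketch of forward() follows. It assumes the model is laid out like the standard PyTorch word-level language model (an embedding layer self.encoder, an LSTM self.rnn, a dropout layer self.drop and a linear output layer self.decoder); the attribute names in model.py may differ, so treat this as a sketch rather than the exact solution.

import torch.nn as nn
import torch.nn.functional as F

class RNNModel(nn.Module):
    # Hypothetical skeleton; the real model.py defines its own layers.
    def __init__(self, vocab_size, emb_size=200, hidden_size=200, nlayers=2, dropout=0.2):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.LSTM(emb_size, hidden_size, nlayers, dropout=dropout)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))          # embed the input tokens
        output, hidden = self.rnn(emb, hidden)        # rnn
        output = self.drop(output)                    # -> dropout
        decoded = self.decoder(output)                # -> output layer (vocabulary scores)
        decoded = decoded.view(-1, decoded.size(-1))  # flatten to (seq_len * batch, vocab)
        return F.log_softmax(decoded, dim=1), hidden  # -> softmax (log form, pairs with NLLLoss)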

TODO 2: complete the train() function in nlm.py.

The train() function contains all the steps of training a neural language model. We train models in mini-batch style, i.e., the model is updated after every batch. What you need to do is write the code inside the iteration: given the data, compute the gradients and update the model. There are three general steps in training a neural network: 1) forward computation, 2) computing the loss, 3) backpropagation -> gradients -> updating the model (parameters). A sketch of one such iteration follows this paragraph.
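As a rough guide, one pass over the training data could look like the sketch below. It assumes a criterion such as nn.NLLLoss(), an SGD optimizer, and helper functions get_batch() and repackage_hidden() in the style of the standard PyTorch word-LM example; these names are assumptions, and nlm.py may structure the loop differently.

import torch

criterion = torch.nn.NLLLoss()                       # pairs with the log-softmax output above
optimizer = torch.optim.SGD(model.parameters(), lr=20)

model.train()
hidden = model.init_hidden(batch_size)               # assumed helper on the model
for i in range(0, train_data.size(0) - 1, bptt):
    data, targets = get_batch(train_data, i)         # assumed helper
    hidden = repackage_hidden(hidden)                # detach history so gradients stay in the batch
    optimizer.zero_grad()
    output, hidden = model(data, hidden)             # 1) forward computation
    loss = criterion(output, targets)                # 2) compute the loss
    loss.backward()                                  # 3) backpropagation -> gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25)  # keep RNN gradients stable
    optimizer.step()                                 #    -> update the model (parameters)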

2 - Generation

In this task, you will use some trained language models to generate text. Inspect the generated text and discuss your impressions of it: what surprised you, and what do you think causes the errors?

A Simple LM

Here you will use a language model trained with the code from the first task to generate text. Look at the training data and settings for the language model, and form hypotheses about what causes the errors/bugs. Then think about how the language model could be improved.

You can run python generate.py --seed 100 to generate text. The generated text (generated.txt) is written to the current directory. Sentences are separated by < eos >. Note that if you replace "100" with a different number you will get a different text, because the seed initializes the random number generator that drives generation.
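The effect of the seed can be illustrated with a few generic lines of PyTorch (this is just an illustration, not the code in generate.py): fixing the seed makes every subsequent random draw, and hence the sampled text, identical across runs.

import torch

torch.manual_seed(100)                               # same seed -> same sequence of random draws
probs = torch.softmax(torch.randn(5), dim=0)         # a toy next-word distribution
word_idx = torch.multinomial(probs, num_samples=1)   # sample a "word", as generation scripts do
print(word_idx.item())                               # identical on every run with seed 100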

Settings for training:
Embedding size: 200
RNN type: LSTM
Training epochs: 50
Layers: 2
Learning rate: 20
RNN units: 200

Pre-trained Neural LMs

Pre-trained language models are currently a very popular topic, and more and more of them have been proposed, such as ELMo, BERT, GPT-2, XLNet, RoBERTa, etc. You can find more language models and the relations between them here. These models are generally based on RNNs or Transformers. (The Transformer is a more advanced model than RNNs in NLP; we will cover it later.) Choose one of the models and figure out how it works by reading the paper or blog posts from the web.

Here we explore the LMs' ability to generate text. Write With Transformer provides several LMs for generating sentences. Test the following models and identify at least two main weaknesses (bugs) of these models.

GPT-2 (domain: general)
XLNet (domain: general)
arxiv-nlp (domain: NLP papers)

Wrapping up

Towards the end of the assignment session, everyone will be expected to share their findings and discuss the assignment with the full class. You should all be prepared to report your main findings, discuss the questions asked, and bring up any other interesting issues that came up during the assignment.

Reporting

The assignment is supposed to be examined in class, on September 17, 15:00-16:00. You need to be present and active during the whole session.

If you fail to attend the oral session, you instead have to do the assignment on your own and report it afterwards. You can then either discuss the assignment during one of the project supervision sessions (given that the teacher has time), or write a report (around 1-2 A4 pages) where you discuss your findings. The deadline for the compensation is October 23.