Corpora

Some of the corpora available on the system are in /corpora. Each folder there should have a readme file that says more about what the folder contains, what license the material has, etc. (If a folder has no readme, check who owns the folder and its files. You may be able to ask them about the details.)
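As a rough illustration of the check described above, the following Python sketch lists each folder under /corpora, reports its readme file if one exists, and otherwise reports the folder's owner so you know whom to ask. The function names are hypothetical, and it assumes that readme files have names starting with "readme" in any letter case; actual naming on the system may differ.

```python
from pathlib import Path


def find_readme(folder: Path):
    """Return the first file whose name starts with 'readme' (any case), or None.

    The 'readme*' naming pattern is an assumption, not a documented convention.
    """
    for entry in sorted(folder.iterdir()):
        if entry.is_file() and entry.name.lower().startswith("readme"):
            return entry
    return None


def owner_of(path: Path) -> str:
    """Return the owner's user name, falling back to the numeric uid."""
    try:
        return path.owner()
    except (KeyError, NotImplementedError, OSError):
        # No passwd entry for the uid, or owner lookup unsupported here.
        return str(path.stat().st_uid)


def summarize_corpora(root: str = "/corpora"):
    """List each folder under root with its readme name (or None) and its owner."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    rows = []
    for folder in sorted(root_path.iterdir()):
        if not folder.is_dir():
            continue
        try:
            readme = find_readme(folder)
        except PermissionError:
            readme = None  # folder is not readable by the current user
        rows.append(
            {
                "folder": folder.name,
                "readme": readme.name if readme else None,
                "owner": owner_of(folder),
            }
        )
    return rows
```

Calling `summarize_corpora()` returns one dictionary per folder; entries whose `readme` value is `None` are the ones where the `owner` field tells you whom to contact.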

LDC

In particular, /corpora/LDC holds some corpora from the Linguistic Data Consortium (LDC), which we may use for linguistic education and/or non-commercial research purposes.

You can download other corpora from the LDC yourself by creating your own user account there and asking to have it acknowledged as belonging to us. Register with your .uu.se address, or you will not be acknowledged!

You can also ask Per to download a particular corpus and put it under /corpora/LDC here. That is of course better than having copies of big corpora stored under several different user accounts.