Parallel Corpora for Less Explored Languages
The Swedish Research Council and the Faculty of Languages at Uppsala University
The main goal of the project is to promote research and teaching in less explored languages by building language resources for language pairs that differ in language structure. We build parallel corpora, consisting of original texts and their translations, with a focus on contrastive studies. The corpora are built semi-automatically, using a common module for formatting and markup together with a basic language resource kit (BLARK) for each of the languages involved.
BLARKs often include carefully compiled corpora of collected texts and a set of tools for the automatic analysis of the languages, such as a sentence splitter, tokenizer, part-of-speech tagger, chunker, and shallow parser. These tools are used in the automatic alignment phase to improve alignment accuracy.
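As a rough illustration of how such tools can feed the alignment phase, the following Python sketch splits two texts into sentences, tokenizes them, and pairs the sentences by position. It is not the project's own tooling: the function names, the regex-based splitter and tokenizer, the positional 1:1 pairing, and the example sentences are all assumptions made for illustration. Real aligners handle insertions, deletions, and many-to-one correspondences.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Naive tokenizer (assumption): runs of word characters, or single
    # punctuation marks. \w is Unicode-aware for Python 3 strings.
    return re.findall(r"\w+|[^\w\s]", sentence)

def align_one_to_one(source_text, target_text):
    # Pair sentences by position and attach the source/target token-count
    # ratio as a crude plausibility signal for each pair.
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    pairs = []
    for s, t in zip(src, tgt):
        ratio = len(tokenize(s)) / max(len(tokenize(t)), 1)
        pairs.append((s, t, round(ratio, 2)))
    return pairs

# Hypothetical example texts, not taken from the project's corpora.
english = "The house is red. It is old."
swedish = "Huset är rött. Det är gammalt."
pairs = align_one_to_one(english, swedish)
for s, t, r in pairs:
    print(f"{s} <-> {t} (token ratio {r})")
```

A token ratio far from the typical value for a language pair can flag a pair for manual checking, which is one way linguistic preprocessing can support alignment accuracy.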
The parallel corpus is intended to be used in research, teaching and applications such as machine translation.
English-Hindi-Swedish Parallel Corpus
English-Swedish-Turkish Parallel Corpus
Éva Á. Csató Johanson
Beáta B. Megyesi
Anna Sågvall Hein