Uppsala University * Dept. of Linguistics and Philology * Computational Linguistics * Beata Megyesi

The English-Swedish-Turkish Corpus

Financed by
The Swedish Research Council and the Faculty of Languages at Uppsala University


Project description

The main goal of the project is to promote research and teaching in the Turkish language. More specifically, the aim is to build a language resource for Turkish, Swedish and English that enables contrastive studies of these languages. The resource consists of linguistically analyzed parallel texts in the three languages that are linked to each other.

The corpus consists of original texts and their translations from Turkish to Swedish and English, and from Swedish and English to Turkish. It is organized as a parallel corpus in which texts, paragraphs, sentences and words are linked to each other. The corpus is built semi-automatically using a basic language resource kit (BLARK) for each language. The texts are annotated with part-of-speech tags, morphological features and dependency structures.
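As a rough illustration of what a linked and annotated sentence pair could look like in memory, here is a minimal Python sketch. It is not the project's actual data format: the class names, field names, tag labels and the Swedish-Turkish example sentence are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Token:
    """One annotated word (hypothetical fields, not the corpus schema)."""
    form: str    # surface word form
    pos: str     # part-of-speech tag (illustrative label)
    feats: str   # morphological features (illustrative label)
    head: int    # 1-based position of the syntactic head, 0 for the root
    deprel: str  # dependency relation to the head (illustrative label)


@dataclass
class AlignedSentence:
    """A source sentence, its translation, and word-level links."""
    source: List[Token]
    target: List[Token]
    # Each link is (source position, target position), both 1-based.
    links: List[Tuple[int, int]] = field(default_factory=list)


def aligned_forms(pair: AlignedSentence) -> List[Tuple[str, str]]:
    """Return the linked word forms as (source form, target form) pairs."""
    return [(pair.source[s - 1].form, pair.target[t - 1].form)
            for s, t in pair.links]


# Hypothetical example: Swedish "Jag läser boken" / Turkish "Kitabı okuyorum".
pair = AlignedSentence(
    source=[
        Token("Jag", "PRON", "Case=Nom", 2, "subj"),
        Token("läser", "VERB", "Tense=Pres", 0, "root"),
        Token("boken", "NOUN", "Definite=Def", 2, "obj"),
    ],
    target=[
        Token("Kitabı", "NOUN", "Case=Acc", 2, "obj"),
        Token("okuyorum", "VERB", "Person=1|Tense=Pres", 0, "root"),
    ],
    # The Swedish subject pronoun is linked to the Turkish verb,
    # which carries the person marking.
    links=[(1, 2), (2, 2), (3, 1)],
)

for src, tgt in aligned_forms(pair):
    print(src, "<->", tgt)
```

Keeping the word links separate from the token annotations means a many-to-one correspondence, such as a pronoun and a verb both mapping to a single inflected verb, can be stored without duplicating tokens.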

The parallel corpus is intended to be used in research, teaching and applications such as machine translation.




Beáta B. Megyesi
Éva Á Csató Johanson
Bengt Dahlqvist
Joakim Nivre
Eva Pettersson


Megyesi, B., Dahlqvist, B., Csato, E., Nivre, J. 2010. The English-Swedish-Turkish Parallel Treebank. In Proceedings of Language Resources and Evaluation (LREC 2010) [.pdf]

Saxena, A., Megyesi, B., Csato Johanson, E., Dahlqvist, B. 2009. Using Parallel Corpora in Teaching and Research: The Swedish-Hindi-English and Swedish-Turkish-English Parallel Corpora. In Proceedings of Swedish Linguistic Conference (SLC 2008) [.pdf]

Megyesi, B., Csato Johanson, E., Dahlqvist, B., Gustafson-Capkova, S., Nivre, J., Pettersson, E., Sågvall Hein, A. 2008. Supporting Research Environment for Swedish and Turkish. Project Report. Department of Linguistics and Philology, Uppsala University [.pdf]

Megyesi, B., Dahlqvist, B., Pettersson, E. and Nivre, J. 2008. Swedish-Turkish Parallel Treebank. In Proceedings of Language Resources and Evaluation Conference, LREC 2008. [.pdf]

Megyesi, B. and Dahlqvist, B. 2007. The Swedish-Turkish Parallel Corpus and Tools for its Creation. In Proceedings of NoDaLida 2007. May 24-26 2007, Tartu, Estonia [.pdf]

Bandmann Megyesi, B., Sågvall Hein, A., Csató Johanson, E. 2006. Building a Swedish-Turkish Parallel Corpus. In Proceedings of Language Resources and Evaluation Conference. May 22-28, 2006. Genoa, Italy [.pdf]

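Dadasheva, S. 2005. Den turkiska indirektiva kategorin. En undersökning av återgivningen av den turkiska indirektiva kategorin i ryska och svenska autentiska översättningar [The Turkish indirective category: a study of how the Turkish indirective category is rendered in authentic Russian and Swedish translations]. Bachelor's thesis. Turkic Languages, Department of Linguistics and Philology, Uppsala University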


Csato Johanson, E., Dahlqvist, B., Megyesi, B., Nivre, J., Saxena, A. 2009. "The English-Hindi-Swedish-Turkish Parallel Treebank" presented at SALT workshop on Corpus Linguistics: Ways forward.
October 8, 2009.

Poster presentation shown at the exhibition in honour of Orhan Pamuk, 2006 Nobel laureate in literature.
13 December 2006 [.pdf]

Invited guest lecture by Prof. Kemal Oflazer, Sabancı University, Turkey: The design and implementation of a pronunciation lexicon for Turkish
5 May 2006

Minisymposium on "Computational aspects of building an annotated Swedish-Turkish parallel corpus"
4 May 2006 10:15-16:00

Invited guest lecture by Dr. G.J. van Schaaik, Department of Arabic, Persian and Turkish Languages and Cultures (TCMO), Leiden University: Information Technology & Teaching Turkish
31 March 2006 10:15-12:00