Annotated sets of compounds

Annotated compounds in German and Swedish. The zipped file contains four compound data sets, two in German and two in Swedish. Each set consists of running text from Europarl, and compounds are annotated in two ways:

  • 1-to-1: compounds are annotated only if their parts are in 1-to-1 correspondence with the English Europarl translation (see Koehn and Knight, EACL 2003).
  • No suffix: compounds are annotated based on linguistic intuition.
Nouns, adjectives, verbs and adverbs have been annotated as compounds when relevant.
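
As a rough illustration of the 1-to-1 criterion, the sketch below checks whether each part of a split compound can be matched to its own word in the English sentence. It is not the procedure used to build these data sets: the toy lexicon, the function name is_one_to_one and the example sentences are all hypothetical and serve only to show the idea.

```python
# Hypothetical toy lexicon: compound part -> possible English translations.
# It is NOT part of the data sets; it exists only for this illustration.
TOY_LEXICON = {
    "aktion": {"action"},
    "plan": {"plan"},
    "grund": {"basic", "reason", "ground"},
    "rechte": {"rights"},
}


def is_one_to_one(parts, english_tokens, lexicon):
    """Return True if every compound part can be paired with a distinct
    English token that the lexicon lists as a translation of that part."""
    english = [tok.lower() for tok in english_tokens]

    def match(i, used):
        # Assign part i to an unused English token, backtracking on failure.
        if i == len(parts):
            return True
        candidates = lexicon.get(parts[i].lower(), set())
        for j, tok in enumerate(english):
            if j not in used and tok in candidates:
                if match(i + 1, used | {j}):
                    return True
        return False

    return match(0, frozenset())


# "Aktionsplan" split into two parts, against the English "an action plan".
print(is_one_to_one(["Aktion", "plan"], ["an", "action", "plan"], TOY_LEXICON))
# "Grundrechte" against "fundamental rights": "Grund" has no match here.
print(is_one_to_one(["Grund", "rechte"], ["fundamental", "rights"], TOY_LEXICON))
```

Under this assumed criterion the first compound would be annotated in a 1-to-1 set and the second would not, although it could still be annotated in a set based on linguistic intuition.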

For references see:

  • Swedish: Sara Stymne, Maria Holmqvist and Lars Ahrenberg. Effects of Morphological Analysis in Translation between German and English. In Proceedings of the ACL 2008 Third Workshop on Statistical Machine Translation. Pages 135-138. June 19, 2008. Columbus, Ohio.
  • German 1-to-1: Sara Stymne. German Compounds in Factored Statistical Machine Translation. In Proceedings of GoTAL, 6th International Conference on Natural Language Processing, eds. A. Ranta and B. Nordström, Springer LNCS/LNAI Volume 5221. Pages 464-475. August 25-27, 2008. Gothenburg, Sweden.