PLUG - Parallel Corpora in Linköping, Uppsala, and Göteborg
PLUG is a cooperative project aimed at the development, evaluation and
application of programs for alignment and data generation from parallel
corpora with Swedish as either source or target language. Applications
include machine translation, computer-aided translation, translation
databases, multilingual web dictionaries and translator training. The
participating departments are the Department of Swedish Language at
Göteborg University, the Department of Computer and Information Science at
Linköping University, and the Department of Linguistics at Uppsala
University.
Project in Progress: The Work Packages
- Preliminary study
- Work package 1: The common project corpus
- Work package 2: Evaluation methods
- Merkel, M. & Ahrenberg, L., 1999,
Evaluating Word Alignment Systems. PLUG report, Linköping
University.
[ps, 343kB]
[gzipped ps, 102kB]
- Merkel, M., 1999a, Annotation Style Guide for the PLUG Link Annotator.
PLUG report, Linköping University.
[ps, 840kB]
[gzipped ps, 222kB]
- Work package 3: Evaluation of results, System design and
implementation
- Ahrenberg, L., Merkel, M., Sågvall Hein, A. & Tiedemann, J., 1999,
Evaluating LWA and UWA. PLUG deliverable 3A.1.
[ps, 1620kB]
[gzipped ps, 472kB]
[pdf, 822kB]
- the PLUG Word Aligner - PWA
- Work package 4: Data generation and organisation
- a demo version for a
web-based search interface for multi-lingual word alignments
(password required)
- a demo version for a
web-based search interface for word alignment instances from the
declarations of the
Swedish government
(Swedish/English bitext).
- a listing of all token links that involve multi-word units for the same
text.
- Work package 5: Data formalisation
- Ahrenberg, L., Superlinks: A new approach to constraining
transfer in machine translation. PLUG report.
[ps, 769 kB]
[pdf, 882 kB]
- The final PLUG report
[ps]
[pdf]
Publications
- Ahrenberg, Lars, Merkel, Magnus, Sågvall Hein, A. & Tiedemann, J., 2000.
- Evaluation of Word Alignment Systems. In Proceedings of LREC 2000,
Athens, Greece.
[pdf, 406kB]
[ps, 757kB]
[gzipped ps, 236kB]
- Ahrenberg, Lars, Andersson, Mikael & Merkel, Magnus, 1998.
- A Simple Hybrid Aligner for Generating Lexical Correspondences in
Parallel Texts. In Proceedings of COLING '98/ACL '98.
[postscript, 513kB]
[gzipped postscript, 91kB]
- Ahrenberg, L., Andersson, M. & Merkel, M., forthcoming.
- A knowledge-lite approach to word alignment.
In J. Veronis (ed.), Parallel Text Processing, Kluwer Academic
Publishers.
- Danielsson, Pernilla & Mühlenbock, Katarina, 1998.
- When Stålhandske becomes Steelglove. A Corpus Based Study of Names
in Parallel Texts. In Proceedings of AMTA'98, Langhorne,
Pennsylvania, USA: Lecture Notes in Computer Science, Springer-Verlag,
Heidelberg.
- Danielsson, P. & Mühlenbock, K., 1998,
- Retrieval of Name Translations in Parallel Corpora.
In: Proceedings of TALC98, Seacourt Press, Oxford, pp. 58-64.
- Lindvall, Lars & Ridings, Daniel, 1998.
- Länkade texter och kontrastiv lingvistik, in
Kungl. Vitterhets Historie och Antikvitets Akademiens Årsbok
1998, pp. 154-173.
- Merkel, Magnus & Andersson, Mikael, 2000,
- Knowledge-lite extraction of
multi-word units with language filters and entropy thresholds. In
Proceedings of RIAO'2000, Collège de France, Paris, France,
April 12-14, 2000, Volume 1, pp. 737-746.
[pdf, 39kB]
[ps, 276kB]
[gzipped ps, 79kB]
- Merkel, M., Andersson, M. & Ahrenberg, L., forthcoming,
- The PLUG Link Annotator - Interactive Construction of Data from
Parallel Corpora.
In L. Borin (ed.) Parallel
Corpora, Parallel Worlds, Proceedings of Parallel Corpus Symposium,
Uppsala, April 22-23, 1999, Uppsala University.
- Merkel, M., 1999b,
- Understanding and enhancing translation by parallel text
processing.
Linköping Studies in Science and Technology. Dissertation No. 607. Linköping
University. Dept. of Computer and Information Science.
- Mühlenbock, Katarina, forthcoming.
- Kan ett namn bäras över språkgränsen? Något
om fynden i en svensk-italiensk parallellkorpus.
- Ridings, D., 1998.
- PEDANT. Parallel texts in Göteborg. LEXIKOS 8
(Afrilex-reeks/series 8: 1998), pp. 1-26.
- Sågvall Hein, A., forthcoming.
- The PLUG-project: Parallel Corpora in Linköping, Uppsala,
Göteborg. Aims and achievements.
In L. Borin (ed.) Parallel Corpora,
Parallel Worlds, Proceedings of Parallel Corpus Symposium,
Uppsala, April 22-23, 1999, Uppsala University.
- Tiedemann, J., 2000,
- Extracting Phrasal Terms using Bitext.
In Proceedings of the Workshop on Terminology Resources and
Computation, held in conjunction with LREC 2000, Athens/Greece,
May 2000.
[pdf, 167 kB]
[ps, 146 kB]
[gzipped ps, 60 kB]
- Tiedemann, J., 1999,
- Word Alignment Step by Step.
In Proceedings of the 12th Nordic Conference on Computational
Linguistics, 1999, Technical University of Trondheim. Department of
Linguistics.
[pdf, 442 kB]
[ps, 683 kB]
[gzipped ps, 208 kB]
[slides - html]
[slides - ps]
- Tiedemann, J., forthcoming,
- Uplug - a modular corpus tool for parallel corpora.
In L. Borin (ed.) Parallel Corpora, Parallel Worlds. Proceedings of
Parallel Corpus Symposium, Uppsala, April 22-23, 1999, Uppsala
University. Department of Linguistics.
[abstract]
[ps, 765 kB]
[gzipped ps, 226 kB]
[pdf, 320 kB]
- Tiedemann, Jörg, 1999.
- Automatic Construction of Weighted String Similarity Measures.
In Proceedings of the Joint SIGDAT Conference on
Empirical Methods in Natural Language Processing and Very Large
Corpora (EMNLP/VLC-99), University of Maryland, MD, USA,
1999.
[postscript]
[compressed postscript]
[pdf]
- Tiedemann, Jörg, 1998a.
- Extraction of Translation Equivalents from Parallel Corpora. In
Proceedings of the 11th Nordic Conference on Computational
Linguistics, Center for Sprogteknologi, Copenhagen, 1998.
[postscript, 326kB]
[compressed postscript, 43kB]
[html]
- Tiedemann, Jörg, 1997.
- Automatical Lexicon Extraction from Aligned Bilingual
Corpora. Diploma thesis, University of Magdeburg, 1997.
[abstract]
[postscript, 2.8Mb]
[compressed postscript, 269kB]
[html]
Links
Linköping
Göteborg
Uppsala
- a query interface for searching the Declarations from the Swedish Government
- a corpus query interface (password required)
- a query interface for the PLUG corpus (password required)
- a query interface for the Scania corpus (password required)
- the Scania Concordancer (password required)
- home page of the Language Engineering Group at
Uppsala University.
- the SCANIA corpus project
- XLexEx - a user interface for
the extraction of translation equivalents from aligned bilingual
corpora
last update: 05/24/2000
comments to
joerg@stp.ling.uu.se