The PLUG Word Aligner (PWA) is a collection of tools for the
automatic alignment of word correspondences in bilingual parallel
texts. The system integrates a set of modules for knowledge-lite
approaches to word alignment, with various possibilities to change
configuration and to adapt the system to other language pairs and
text types. The system requires sentence-aligned bitexts as its input
and produces a list of word and phrase correspondences found in the text
(link instances) and an additional bilingual lexicon derived from these
instances (type links).
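The relation between link instances and type links can be illustrated
with a minimal Python sketch. It is not PWA code (PWA itself is written
in Perl), and the record layout, the names LinkInstance and
build_type_links, and the lowercasing step are all assumptions made for
illustration only; the sample word pairs are invented.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LinkInstance(NamedTuple):
    """One aligned word or phrase pair in one sentence pair (hypothetical layout)."""
    sentence_id: int
    source: str
    target: str


def build_type_links(instances: Iterable[LinkInstance]) -> Counter:
    """Collapse link instances into type links: (source, target) -> frequency.

    Lowercasing is an assumption of this sketch, not a documented PWA step.
    """
    return Counter((link.source.lower(), link.target.lower()) for link in instances)


# Invented Swedish/English link instances, for illustration only.
instances = [
    LinkInstance(1, "regeringen", "the government"),
    LinkInstance(2, "Regeringen", "The Government"),
    LinkInstance(2, "arbete", "work"),
]

lexicon = build_type_links(instances)
for (source, target), count in lexicon.most_common():
    print(f"{source}\t{target}\t{count}")
```

Each distinct source/target pair becomes one lexicon entry, and the
count records how many link instances support it.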
PWA comprises two word alignment systems, the Linköping Word Aligner (LWA)
and the Uppsala Word Aligner (UWA). Both systems were developed within
the co-operative project on parallel text, PLUG, which was carried out
between November 1997 and March 2000. The systems were developed at the
Department of Computer and Information Science at Linköping University,
Linköping, Sweden, and the Department of Linguistics at Uppsala
University, Uppsala, Sweden. PWA integrates both systems in the modular
corpus toolbox Uplug and includes additional tools for the automatic
generation of monolingual word collocations (phrases) and for the
automated evaluation of alignment results (the PLUG Scorer - PLS).
PWA is available to the research community
according to
this licence.
PWA is available as a binary distribution for the following
operating systems
Have a look at the list of F.A.Q.
(under construction) or at the
online documentation!
The documentation includes sections about installing and running PWA.
PWA is implemented in Perl and Perl/Tk and requires quite a lot of
resources from your system. Make sure you have at least 20 MB of free
space on your hard disk before installing PWA-demo.
The complete installation requires about
12 MB (on both Windows and Linux). The system requires a minimum
of 32 MB of internal memory (RAM) for running the alignment. However,
64 MB of RAM or more is strongly recommended.
PWA documentation is available in several formats:
A Troubleshooting Guide and a list of F.A.Q.s are planned. Please be
patient and check this page again!
Example links were produced from the declarations of the
Swedish government
(Swedish/English bitext).
- the main window
- the Linköping Word Aligner - LWA
- the Uppsala Word Aligner - UWA
- the PLUG Link Scorer - PLS
- the Uplug logo
- Ahrenberg, Lars, Merkel, Magnus, Sågvall Hein, A., Tiedemann, J.,
2000.
- Evaluation of Word Alignment Systems. In Proceedings of LREC 2000,
Athens/Greece.
[pdf, 406kB]
[ps, 757kB]
[gzipped ps, 236kB]
- Ahrenberg, Lars, Merkel, Magnus, Sågvall Hein, A., Tiedemann, J.,
1999.
- Evaluation of LWA and UWA.
Report from the PLUG project, available as WP 15 in
Working Papers
in Computational Linguistics and
Language Engineering, Department of Linguistics, Uppsala
University, 1999.
[ps, 1620kB]
[gzipped ps, 472kB]
- Ahrenberg, Lars, Andersson, Mikael & Merkel, Magnus, 1998.
- A Simple Hybrid Aligner for Generating Lexical Correspondences in
Parallel Texts. Proceedings of COLING '98/ACL '98.
[postscript, 513kB]
[gzipped postscript, 91kB]
- Ahrenberg, L., Andersson, M. & Merkel, M., forthcoming.
- A knowledge-lite approach to word alignment.
In J. Véronis (ed.), Parallel Text Processing, Kluwer Academic
Publishers.
- Merkel, Magnus & Andersson, Mikael, 2000,
- Knowledge-lite extraction of
multi-word units with language filters and entropy thresholds. In
Proceedings of RIAO'2000, Collège de France, Paris, France,
April 12-14, 2000, Volume 1, pp. 737-746.
[pdf, 39kB]
[ps, 276kB]
[gzipped ps, 79kB]
- Merkel, M., 1999b,
- Understanding and enhancing translation by parallel text
processing.
Linköping Studies in Science and Technology. Dissertation No. 607. Linköping
University. Dept. of Computer and Information Science.
- Sågvall Hein, A., forthcoming.
- The PLUG-project: Parallel Corpora in Linköping, Uppsala,
Göteborg. Aims and achievements.
In L. Borin (ed.) Parallel Corpora,
Parallel Worlds, Proceedings of Parallel Corpus Symposium,
Uppsala, April 22-23, 1999, Uppsala University.
- Tiedemann, J., 2000,
- Extracting Phrasal Terms using Bitext.
In Proceedings of the Workshop on Terminology Resources and
Computation, held in conjunction with LREC 2000, Athens/Greece,
May 2000.
[pdf, 167 kB]
[ps, 146 kB]
[gzipped ps, 60 kB]
- Tiedemann, J., 1999,
- Word Alignment Step by Step.
In Proceedings of the 12th Nordic Conference on Computational
Linguistics, 1999, Technical University of Trondheim. Department of
Linguistics.
[pdf, 442 kB]
[ps, 683 kB]
[gzipped ps, 208 kB]
- Tiedemann, J., forthcoming,
- Uplug - a modular corpus tool for parallel corpora.
In L. Borin (ed.) Parallel Corpora, Parallel Worlds. Proceedings of
Parallel Corpus Symposium, Uppsala, April 22-23, 1999, Uppsala
University. Department of Linguistics.
[abstract]
[ps, 765 kB]
[gzipped ps, 226 kB]
[pdf, 320 kB]
- Tiedemann, Jörg, 1998a.
- Extraction of Translation Equivalents from Parallel Corpora. In
Proceedings of the 11th Nordic Conference on Computational
Linguistics, Center for Sprogteknologi, Copenhagen, 1998.
[postscript, 326kB]
[compressed postscript, 43kB]
[html]