PWA

The PLUG Word Aligner - PWA

Contact: pwa AT stp.lingfil.uu.se

About PWA

The PLUG Word Aligner (PWA) is a collection of tools for automatically aligning word correspondences in bilingual parallel texts. The system integrates a set of modules for knowledge-lite approaches to word alignment, and its configuration can be changed to adapt it to other language pairs and text types. It requires sentence-aligned bitexts as input and produces a list of the word and phrase correspondences found in the text (link instances), together with a bilingual lexicon derived from these instances (type links).
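The relation between link instances and type links can be illustrated with a minimal Python sketch. It is not PWA code (PWA itself is written in Perl); the tuple layout, the sample Swedish/English pairs, and the function name build_type_links are assumptions made only for illustration. The sketch collapses individual link instances into lexicon entries and counts how often each pair was linked.

```python
from collections import Counter

# Hypothetical link instances: (sentence_id, source_phrase, target_phrase).
# The format and the sample pairs are illustrative, not PWA's actual output.
link_instances = [
    (1, "regeringen", "the government"),
    (2, "regeringen", "the government"),
    (2, "arbete", "work"),
    (3, "arbete", "employment"),
    (4, "Regeringen", "the Government"),
]


def build_type_links(instances, lowercase=True):
    """Collapse link instances into type links with frequency counts."""
    counts = Counter()
    for _sentence_id, source, target in instances:
        if lowercase:
            # Treat case variants of the same pair as one lexicon entry.
            source, target = source.lower(), target.lower()
        counts[(source, target)] += 1
    return counts


type_links = build_type_links(link_instances)

# Print the derived lexicon, most frequent pairs first.
for (source, target), freq in type_links.most_common():
    print(f"{source}\t{target}\t{freq}")
```

Whether case variants should be merged is a design choice that depends on the language pair and text type; the lowercase flag only marks where such a decision would be made.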

PWA comprises two word alignment systems, the Linköping Word Aligner (LWA) and the Uppsala Word Aligner (UWA). Both systems were developed within PLUG, the co-operative project on parallel text that was carried out between November 1997 and March 2000. They were developed at the Department of Computer and Information Science at Linköping University, Linköping/Sweden, and the Department of Linguistics at Uppsala University, Uppsala/Sweden. PWA integrates both systems in the modular corpus toolbox Uplug and includes additional tools for the automatic generation of monolingual word collocations (phrases) and for the automated evaluation of alignment results (the PLUG Scorer - PLS).

Download

PWA is available to the research community according to this licence.
PWA is available as a binary distribution. Have a look at the list of F.A.Q. (under construction) or at the online documentation, which includes sections on installing and running PWA.

PWA is implemented in Perl and Perl/Tk and requires quite a lot of system resources. Make sure you have at least 20 MB of free hard disk space before installing PWA-demo. The complete installation requires about 12 MB (on both Windows and Linux). Running the alignment requires a minimum of 32 MB of internal memory (RAM); 64 MB or more is strongly recommended.

Documentation

PWA documentation is available in several formats. A Troubleshooting Guide and a list of F.A.Q.s are planned. Please be patient and check this page again!

Example alignments

Example links were produced from the declarations of the Swedish government (Swedish/English bitext).
Table columns: source, target, type, system.

Publications

Ahrenberg, Lars, Merkel, Magnus, Sågvall Hein, A., Tiedemann, J., 2000.
Evaluation of Word Alignment Systems. In Proceedings of LREC 2000, Athens/Greece.
Ahrenberg, Lars, Merkel, Magnus, Sågvall Hein, A., Tiedemann, J., 1999.
Evaluation of LWA and UWA. Report from the PLUG project, available as WP 15 in Working Papers in Computational Linguistics and Language Engineering, Department of Linguistics, Uppsala University, 1999.
Ahrenberg, Lars, Andersson, Mikael & Merkel, Magnus, 1998.
A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts. In Proceedings of COLING '98/ACL '98.
Ahrenberg, L., Andersson, M. & Merkel, M., forthcoming.
A knowledge-lite approach to word alignment. J. Veronis (ed.), Parallel Text Processing, Kluwer Academic Press.
Magnus Merkel & Mikael Andersson, 2000,
Knowledge-lite extraction of multi-word units with language filters and entropy thresholds. In Proceedings of RIAO'2000, Collège de France, Paris, France, April 12-14, 2000, Volume 1, pp. 737-746.
Merkel, M., 1999b,
Understanding and enhancing translation by parallel text processing. Linköping Studies in Science and Technology. Dissertation No. 607. Linköping University. Dept. of Computer and Information Science.
Sågvall Hein, A., forthcoming.
The PLUG-project: Parallel Corpora in Linköping, Uppsala, Göteborg. Aims and achievements. In L. Borin (ed.) Parallel Corpora, Parallel Worlds, Proceedings of Parallel Corpus Symposium, Uppsala, April 22-23, 1999, Uppsala University.
Tiedemann, J., 2000,
Extracting Phrasal Terms using Bitext. In Proceedings of the Workshop on Terminology Resources and Computation, held in conjunction with LREC 2000, Athens/Greece, May 2000.
Tiedemann, J., 1999,
Word Alignment Step by Step. In Proceedings of the 12th Nordic Conference on Computational Linguistics, 1999, Technical University of Trondheim. Department of Linguistics.
Tiedemann, J., forthcoming,
Uplug - a modular corpus tool for parallel corpora. In L. Borin (ed.) Parallel Corpora, Parallel Worlds. Proceedings of Parallel Corpus Symposium, Uppsala, April 22-23, 1999, Uppsala University. Department of Linguistics.
Tiedemann, Jörg, 1998a.
Extraction of Translation Equivalents from Parallel Corpora. In Proceedings of the 11th Nordic Conference on Computational Linguistics, Center for Sprogteknologi, Copenhagen, 1998.