Download Penn2Malt 0.2
NB: Penn2Malt has been superseded by the more sophisticated
LTH converter (previously known as pennconverter), which we strongly
recommend. Penn2Malt is maintained only so that old results can be reproduced.
[Download Penn2Malt.jar]
Click on this link to download Penn2Malt 0.2.
The software can be used freely. It comes with no warranty, but we welcome all comments, bug reports,
and suggestions for improvements.
User Guide for Penn2Malt 0.2
To run Penn2Malt, you need a Java virtual machine (tested with JRE 1.4.1).
Usage: java -jar Penn2Malt.jar <penn file> <head rules> <deprel> <punctuation> <penn|chtb> [tagging file]
where
- <penn file> is the Penn treebank file to be converted
- <head rules> is a file containing head rules à la Magerman and Collins
- <deprel> is a flag that determines which dependency labels will be used;
there are currently three options:
- 1 = Penn labels: phrase label + head label + dependent label (à la Collins)
- 2 = Penn labels: dependent label only
- 3 = Malt: hard-coded mapping to dependency labels (SBJ, OBJ, PRD, NMOD, VMOD, etc.)
When Penn labels are used (options 1 and 2), function tags are dropped and only the bare phrase
labels are used, except that -SBJ and -PRD are retained on NPs, and -OBJ is added to NPs under
VP that lack an adverbial function tag.
- <punctuation> is a flag that determines whether punctuation is removed;
there are two options:
- 1 = Remove punctuation according to Collins (1999)
- 2 = Retain all punctuation
- <penn|chtb> invokes the specific rules for the Penn Treebank (penn) or the
Penn Chinese Treebank (chtb) that are used with <deprel> option 3
- [tagging file] is an optional file containing a part-of-speech-tagged
version of the data to be converted; if this file is supplied, the program
inserts the tags from this file in place of the gold-standard tags.
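The label reduction described above for Penn labels can be sketched in Python as follows. This is a hypothetical reconstruction from the description, not Penn2Malt's own code: split_label and penn_label are made-up names, the handling of co-indices such as -1 and =2 is an assumption, and so is the set of adverbial function tags (taken from the Penn Treebank bracketing guidelines).

```python
# Hypothetical sketch of the Penn-label reduction for <deprel> options 1 and 2.
# ADVERBIAL_TAGS is an assumption based on the Penn Treebank bracketing
# guidelines; Penn2Malt's actual list may differ.
ADVERBIAL_TAGS = {"ADV", "BNF", "DIR", "EXT", "LOC", "MNR", "PRP", "TMP"}


def split_label(label):
    """Split a label such as 'NP-SBJ-1' or 'PP-LOC=2' into (category, function tags)."""
    if label.startswith("-"):          # atomic labels such as -NONE-, -LRB-
        return label, []
    base = label.split("=")[0]         # drop gap indices like '=2'
    category, *rest = base.split("-")
    tags = [t for t in rest if t and not t.isdigit()]  # drop co-indices like '-1'
    return category, tags


def penn_label(label, parent_label):
    """Return the reduced Penn label for a constituent, given its parent's label."""
    category, tags = split_label(label)
    parent_category, _ = split_label(parent_label)
    if category == "NP":
        for kept in ("SBJ", "PRD"):    # -SBJ and -PRD survive on NPs
            if kept in tags:
                return "NP-" + kept
        # NPs directly under VP without an adverbial tag are marked as objects
        if parent_category == "VP" and not ADVERBIAL_TAGS.intersection(tags):
            return "NP-OBJ"
    return category                    # all other function tags are dropped


for label, parent in [("NP-SBJ-1", "S"), ("NP", "VP"), ("NP-TMP", "VP"), ("PP-LOC=2", "VP")]:
    print(label, "under", parent, "->", penn_label(label, parent))
```

With these sample labels the loop prints NP-SBJ, NP-OBJ, NP and PP respectively. A label such as NP-PRD under VP keeps -PRD instead of receiving -OBJ, because -SBJ and -PRD are checked first.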
The following is an example of how the program can be used:
java -jar Penn2Malt.jar penn.23.1-2 headrules 3 2 penn penn.23.1-2.tagged
Running this command converts the data in penn.23.1-2
using the head rules in headrules (the rules
used by Yamada and Matsumoto 2003), inserts dependency labels with the hard-coded
Malt mapping (option 3), retains all punctuation (option 2),
and replaces the gold-standard tags with the tags taken from the file
penn.23.1-2.tagged. The flag penn is
needed to ensure that dependencies are inferred using the rules for English
rather than those for Chinese. (Head rules for converting the Penn Chinese Treebank,
compiled by Yuan Ding at Penn for machine translation, can be found in
chn_headrules. Using this file together with
<deprel> option 3 and the flag chtb converts
data from the Penn Chinese Treebank to dependency structures with the same
dependency labels as for English.)
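Head rules files such as headrules and chn_headrules drive a head-percolation procedure in the style of Magerman and Collins: for each phrase, the rule for its category gives a search direction and a priority list of child categories, and the first match becomes the head child; the lexical head of every other child then depends on the lexical head of the head child. The Python sketch below illustrates that idea. The tiny HEAD_RULES table and the tuple-based tree format are hypothetical and reflect neither the contents nor the syntax of the headrules file; traces, empty elements, punctuation and labelling are left out.

```python
# Hypothetical head table: category -> (search direction, priority list).
# Illustrative only; this is neither the content nor the syntax of the
# headrules or chn_headrules files.
HEAD_RULES = {
    "S":  ("left",  ["VP", "S"]),
    "VP": ("left",  ["VBD", "VBZ", "VBP", "VB", "MD", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left",  ["IN", "TO"]),
}


def category(label):
    """Strip function tags: 'NP-SBJ' -> 'NP' (atomic labels like -NONE- are kept)."""
    return label if label.startswith("-") else label.split("-")[0]


def head_child(label, children):
    """Index of the head child: try each priority category in turn, scanning the
    children in the rule's direction; fall back to the first child in that direction."""
    direction, priorities = HEAD_RULES.get(category(label), ("left", []))
    order = list(range(len(children)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:
        for i in order:
            if category(children[i][0]) == wanted:
                return i
    return order[0]


def to_dependencies(tree):
    """Convert a tree of (label, children) / (tag, word) tuples to dependencies.

    Returns (tokens, heads): tokens are (word, tag) pairs and heads[k] is the
    1-based index of the head of token k + 1, with 0 for the root.
    """
    tokens, heads = [], []

    def visit(node):
        label, children = node
        if isinstance(children, str):        # preterminal: (tag, word)
            tokens.append((children, label))
            heads.append(0)
            return len(tokens)               # a word is its own lexical head
        lexical_heads = [visit(child) for child in children]
        h = head_child(label, children)
        for i, dependent in enumerate(lexical_heads):
            if i != h:                       # non-head children attach to the head
                heads[dependent - 1] = lexical_heads[h]
        return lexical_heads[h]

    visit(tree)
    return tokens, heads


tree = ("S", [("NP-SBJ", [("NNP", "John")]),
              ("VP", [("VBD", "saw"),
                      ("NP", [("DT", "the"), ("NN", "dog")])])])
tokens, heads = to_dependencies(tree)
print(heads)  # [2, 0, 4, 2]: John -> saw, saw is the root, the -> dog, dog -> saw
```

Swapping in a different table changes which child heads each phrase. This is why the choice of head rules file (for example, the Yamada and Matsumoto rules versus chn_headrules) determines the resulting dependency structure.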
The program produces three output files:
- Extension .tab = The converted data
- Extension .pos = A list of the part-of-speech tags occurring in the data
- Extension .dep = A list of the dependency labels occurring in the data
The format of the output is Malt-TAB, and the filenames
are automatically constructed from the input file name and the chosen options.
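Malt-TAB output can be read back with a few lines of Python, for example to check a conversion or to rebuild the tag and label inventories that the .pos and .dep files list. This sketch assumes the usual Malt-TAB layout: one token per line with four tab-separated columns (word, part-of-speech tag, head index with 0 for the root, dependency label) and a blank line between sentences. The names read_malt_tab, check_heads and inventories are hypothetical, the input filename in the usage comment is a placeholder, and the sorted order is the sketch's choice, not necessarily the order used in the .pos and .dep files.

```python
def read_malt_tab(lines):
    """Yield sentences as lists of (word, tag, head, deprel) tuples.

    Assumes four tab-separated columns per token and blank lines between sentences.
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():           # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        word, tag, head, deprel = line.split("\t")
        sentence.append((word, tag, int(head), deprel))
    if sentence:                       # the file may not end with a blank line
        yield sentence


def check_heads(sentence):
    """Raise ValueError if a head index is out of range or points to the token itself."""
    for i, (word, _, head, _) in enumerate(sentence, start=1):
        if not 0 <= head <= len(sentence) or head == i:
            raise ValueError(f"token {i} ({word!r}) has invalid head {head}")


def inventories(sentences):
    """Collect the sorted sets of tags and labels, like those listed in .pos and .dep."""
    tags, labels = set(), set()
    for sentence in sentences:
        check_heads(sentence)
        for _, tag, _, deprel in sentence:
            tags.add(tag)
            labels.add(deprel)
    return sorted(tags), sorted(labels)


# Usage with a placeholder filename (real names depend on the input and options):
# with open("converted.tab", encoding="utf-8") as f:
#     tags, labels = inventories(read_malt_tab(f))
```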