MaltParser 0.1 uses libTimbl, part of TiMBL (Tilburg Memory-Based Learner), version 5.0, in order to learn parsing models from treebanks, and we gratefully acknowledge the use of this excellent software package. However, MaltParser 0.1 is a standalone application, so there is no need to install TiMBL separately.
The parser is started from the command line as follows:
> ./malt -f file
where file is the name of an option file that specifies all the required parameters. The parser can be run in two basic modes: learning (inducing a parsing model from a treebank) and parsing (using the parsing model to parse new data). In the current version of the parser, new data must be tokenized, part-of-speech tagged and supplied in the Malt-TAB format. The option file, which also specifies the parser mode, is described in detail below.
In addition, the option file may contain comment lines starting with "--". The following table lists all the available parameters with their permissible values. Default values are marked with "*". Parameters that lack a default value must be specified in the option file (if they are required by the particular configuration of modules invoked). An example option file can be found here.
|Parameter||Description||Values||Comment|
|INFILE||Input file||Filename||The input (for both learning and parsing) must be in the Malt-TAB format. During learning the four columns form, postag, head, deprel are required; during parsing only the first two (form, postag) are required. An example input file can be found here.|
|OUTFORMAT||Output data format||TAB|
|VERBOSE||Output to terminal||YES*|
|POSSET||Part-of-speech tagset||Filename||The part-of-speech tagset must be specified in a text file with one tag per line (and no blank lines). An example file can be found here.|
|DEPSET||Dependency type tagset||Filename||The dependency type tagset must be specified in a text file with one tag per line (and no blank lines). The first tag must be the tag assigned to root nodes. An example file can be found here.|
|PARSERMODE||Parser mode (learning or parsing)||PARSE*||Parsing (using a memory-based model); learning (building an instance base for memory-based learning)|
|MODELTYPE||Model type (feature set) for memory-based learning, described in detail below.||MBL2||CoNLL 2004: Non-lexical; CoNLL 2004: Lexical; Coling 2004: Model 2; Coling 2004: Model 1|
|PROJECTIVE||Enforce projectivity or not||YES||Under the NO condition, no check is made to ensure that REDUCE actions are legal. Under the YES condition, illegal REDUCE actions are replaced by SHIFT actions.|
|MODELFILE||Model file||Filename||The model file contains the instance base for memory-based learning. This is an output file during learning and a required input file during parsing. A (small) example file can be found here.|
|COMMAND||TiMBL command||String||These are the command-line options sent to the TiMBL server for memory-based learning. The default value is "-m M -k 5 -d ID -L 3" (see TiMBL Reference Guide).|
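As a rough illustration of the input requirements listed in the table, the following Python sketch reads Malt-TAB data and a tagset file. It is not part of MaltParser: the names Token, load_tagset and read_malt_tab are hypothetical, and the sketch assumes one token per line, tab-separated columns, blank lines between sentences and an integer head column.

```python
from typing import Iterable, List, NamedTuple, Optional


class Token(NamedTuple):
    form: str
    postag: str
    head: Optional[int]    # None when the column is absent (parsing input)
    deprel: Optional[str]  # None when the column is absent (parsing input)


def load_tagset(lines: Iterable[str]) -> List[str]:
    """Return one tag per line; blank lines are rejected."""
    tags = [line.rstrip("\n") for line in lines]
    if any(not tag.strip() for tag in tags):
        raise ValueError("tagset files must not contain blank lines")
    return tags


def read_malt_tab(lines: Iterable[str], learning: bool) -> List[List[Token]]:
    """Group token lines into sentences (assumption: blank line = boundary)."""
    needed = 4 if learning else 2  # form, postag (, head, deprel)
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")  # assumption: columns are tab-separated
        if len(cols) < needed:
            raise ValueError(f"expected at least {needed} columns: {line!r}")
        head = int(cols[2]) if len(cols) > 2 else None
        deprel = cols[3] if len(cols) > 3 else None
        current.append(Token(cols[0], cols[1], head, deprel))
    if current:
        sentences.append(current)
    return sentences
```

For a DEPSET file, the first element of the list returned by load_tagset would be the tag assigned to root nodes.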
The dependency arcs represent dependencies that may or may not be present at decision time. TL and TR represent (the parts of speech of) the leftmost and rightmost dependents of Top, and NL represents the leftmost dependent of Next (in case there are several dependents).
Red features are lexical features (word forms); blue features are part-of-speech features (PoS tags); and green features are dependency features (dependency types). Note that for the words Top and Next, there are both lexical and part-of-speech features.
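The dependent nodes TL, TR and NL can be read off the partially built dependency graph. The Python sketch below shows one way to do this. It is only an illustration, not MaltParser's implementation: the function names and the heads dictionary (mapping each already attached token index to its head index) are hypothetical, and "leftmost" and "rightmost" are taken over all dependents attached so far.

```python
from typing import Dict, Optional


def leftmost_dependent(heads: Dict[int, int], node: int) -> Optional[int]:
    """Smallest token index attached to node so far, or None if there is none."""
    deps = [dep for dep, head in heads.items() if head == node]
    return min(deps) if deps else None


def rightmost_dependent(heads: Dict[int, int], node: int) -> Optional[int]:
    """Largest token index attached to node so far, or None if there is none."""
    deps = [dep for dep, head in heads.items() if head == node]
    return max(deps) if deps else None


def dependent_nodes(heads: Dict[int, int], top: int, nxt: int) -> Dict[str, Optional[int]]:
    """Return the token indices playing the roles TL, TR and NL."""
    return {
        "TL": leftmost_dependent(heads, top),
        "TR": rightmost_dependent(heads, top),
        "NL": leftmost_dependent(heads, nxt),
    }
```

A feature value is then obtained by looking up the part-of-speech tag or dependency type of the returned token; a value of None corresponds to a dependency that is not present at decision time.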
The following table shows which features are used in the different parsing models.
NB: The "missing" models MBL1 and MBL5 are only of historical interest and are not available in the current release.