(This page is only available in English.)
Only a small subset of installed software is mentioned here.
Granska Tagger, an efficient Hidden Markov Model part-of-speech tagger for Swedish, has its own page.
Others: [Brill] [HTK] [NLTK] [SRILM] [svannotate] [TnT] [UUParser]
Brill
Eric Brill has written a tagger that does not seem to have a more specific name than Rule Based Tagger.
See what it says about copyright in /local/ling/brill/RBT/COPYRIGHT.
It assumes that a specific directory is your current directory when you run it, so you can do the following:
- cd /local/ling/brill/RBT/Bin_and_Data
- ./tagger parameters
The parameters are LEXICON YOUR-CORPUS BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE.
So it is convenient to give parameters that are files in that directory, for example LEXICON.BROWN as the lexicon, but for other files, such as your own corpus, you need to give a full path.
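For instance, tagging a corpus of your own could look like the line below. LEXICON.BROWN is mentioned above; BIGRAMS, LEXICALRULEFILE and CONTEXTUALRULEFILE stand in for whatever the corresponding files in Bin_and_Data are actually called, and /home/you/mycorpus is a made-up path:
- ./tagger LEXICON.BROWN /home/you/mycorpus BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE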
There is more information in the folder /local/ling/brill/RBT/Docs/.
HTK
HTK (Hidden Markov Model Toolkit) is used primarily for speech recognition.
There is documentation locally in the folder /local/share/doc/htk/.
The programs that are part of this package are LSubset, LMerge, LNewMap, LNorm, LPlex, LGList, LGPrep, LLink, LBuild, LFoF, LGCopy, HLMCopy, LAdapt, Cluster, HSmooth, HVite, HResults, HSGen, HParse, HQuant, HRest, HLRescore, HLStats, HMMIRest, HInit, HLEd, HList, HDMan, HERest, HHEd, HBuild, HCompV, HCopy, HSLab.
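As a rough sketch of a typical first step, HCopy converts audio into parameterised feature files, driven by a configuration file. The file names here are made up, and config.cfg must define the source and target formats; see the documentation in the folder above:
- HCopy -C config.cfg audio.wav audio.mfc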
NLTK
NLTK (Natural Language Toolkit) is a suite of Python modules and corpora for research and education in NLP. See more on the Python page.
SRILM
The SRI Language Modeling Toolkit has several programs. They are in /local/ling/srilm/bin and are documented with man pages.
There are also some special man pages that cover more than one program; see for example man training-scripts.
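For example, ngram-count can estimate a trigram model from a text file, and ngram can then report its perplexity on held-out text. The file names are made up; see man ngram-count and man ngram for the full option lists:
- ngram-count -text train.txt -order 3 -lm train.lm
- ngram -lm train.lm -ppl test.txt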
svannotate
svannotate tokenizes, segments, tags and parses Swedish text files.
Run svannotate --help to get some help. Read /local/ling/svannotate/README for more.
In the same folder are suc.hun and talbanken-default+splitmorph2.mco, which the program uses unless you tell it otherwise.
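Assuming it follows the usual convention of taking a text file as its argument and writing the annotation to standard output (an assumption; check --help and the README), a run might look like this:
- svannotate mytext.txt > mytext.ann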
Trigrams’n’Tags (TnT)
TnT is presented as a very efficient statistical part-of-speech tagger that is trainable on different languages and virtually any tagset.
We have a license (for the whole department) for non-commercial use. There are four programs: tnt, tnt-diff, tnt-para and tnt-wc.
See the folder /local/ling/tnt with subfolders for documentation and license.
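As a sketch of typical use, tnt takes a model name and a corpus file with one token per line, and tnt-para trains a new model from a tagged corpus. The names mymodel, mytext and mytagged-corpus below are made up; check the documentation mentioned above for the exact file formats and the models installed here:
- tnt mymodel mytext
- tnt-para mytagged-corpus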
UUParser
UUParser, a transition-based dependency parser for Universal Dependencies, is developed at the department. An example:
uuparser --out /tmp/out --datadir /corpora/ud/ud-treebanks-v2.5/ --include "mr_ufal"