(This page is only available in English.)
Only a small subset of installed software is mentioned here.
Granska Tagger, an efficient Hidden Markov Model part-of-speech tagger for Swedish, has its own page.
Eric Brill has written a
tagger that doesn’t seem to have a more specific name than
Rule Based Tagger.
See what it says about copyright in
It must be run with a particular directory as the current working directory,
so use it like this:
- cd /local/ling/brill/RBT/Bin_and_Data
- ./tagger parameters
The parameters are, in order, LEXICON YOUR-CORPUS BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE.
Files in that directory can be given by name alone, for example
LEXICON.BROWN as the lexicon, but any other file, such as your own
corpus, must be given with its full path.
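The two steps above (change to the tagger's directory, give your own corpus as a full path) can be wrapped in a small shell function. This is a minimal sketch, not part of the tagger distribution: the function names brill_tag and abs_path are hypothetical, and only the directory and the parameter order come from this page. The corpus path is made absolute before changing directory, and the tagger runs in a subshell, so your own current directory is left unchanged.

```shell
# Hypothetical helper, not part of the Brill distribution.
# Source this file, then call brill_tag from any directory.

# Directory that the tagger expects to be the current directory.
BRILL_DIR=/local/ling/brill/RBT/Bin_and_Data

# abs_path FILE: print FILE as an absolute path, resolved against the
# caller's current directory. Paths starting with / are printed unchanged.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$(pwd)" "$1" ;;
    esac
}

# brill_tag LEXICON CORPUS BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE
# CORPUS is made absolute before the cd, so a relative path still works.
# The other four arguments are passed through unchanged: bare names such as
# LEXICON.BROWN refer to files in BRILL_DIR; your own files in those
# positions still need a full path.
# The cd happens in a subshell, so the caller's directory is not changed.
brill_tag() {
    if [ "$#" -ne 5 ]; then
        echo "usage: brill_tag LEXICON CORPUS BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE" >&2
        return 2
    fi
    (
        corpus=$(abs_path "$2") &&
        cd "$BRILL_DIR" &&
        ./tagger "$1" "$corpus" "$3" "$4" "$5"
    )
}
```

For example, after sourcing the file you could run brill_tag LEXICON.BROWN mytext.txt BIGRAMS LEXICALRULEFILE CONTEXTUALRULEFILE > mytext.tagged, where mytext.txt is a hypothetical corpus and the last three names are placeholders for the corresponding files in Bin_and_Data; check that directory for their actual names.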
There is more information in the folder
HTK (Hidden Markov Model
Toolkit) is used primarily for speech recognition.
There is documentation locally in the folder
The programs that are part of this package are
LSubset, LMerge, LNewMap,
LNorm, LPlex, LGList, LGPrep, LLink, LBuild, LFoF, LGCopy, HLMCopy,
LAdapt, Cluster, HSmooth, HVite, HResults, HSGen, HParse, HQuant,
HRest, HLRescore, HLStats, HMMIRest, HInit, HLEd, HList, HDMan,
HERest, HHEd, HBuild, HCompV, HCopy, HSLab.
NLTK (Natural Language Toolkit) is a series of modules and corpora for research and education in NLP in Python. See more on the Python page.
The SRI Language Modeling Toolkit (SRILM) consists of several programs.
They are in
/local/ling/srilm/bin and are documented
with man pages.
There are also some man pages that cover more
than one program; see for example
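As an illustration of two of the SRILM programs, ngram-count (which estimates a model) and ngram (which applies one), here is a minimal sketch. The function names srilm_train and srilm_ppl are hypothetical; the options used (-order, -text, -lm, -ppl) are standard SRILM options, but the man pages are the authority on their exact behaviour.

```shell
# Hypothetical helpers around two SRILM programs.
# Source this file, then call the functions.

SRILM_BIN=/local/ling/srilm/bin

# srilm_train TEXT LM [ORDER]: estimate an N-gram model from the plain text
# file TEXT (one sentence per line) and write it to LM in ARPA format.
# ORDER defaults to 3, i.e. a trigram model.
srilm_train() {
    "$SRILM_BIN/ngram-count" -order "${3:-3}" -text "$1" -lm "$2"
}

# srilm_ppl LM TEXT [ORDER]: report the perplexity of TEXT under the model LM.
srilm_ppl() {
    "$SRILM_BIN/ngram" -order "${3:-3}" -lm "$1" -ppl "$2"
}
```

A typical sequence would be srilm_train train.txt train.lm followed by srilm_ppl train.lm test.txt (file names hypothetical). Without further options ngram-count uses its default discounting; the ngram-count man page describes alternatives such as Kneser-Ney smoothing.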
svannotate tokenizes, segments, tags and parses Swedish text files.
Run svannotate --help to get some help.
See /local/ling/svannotate/README for more.
The same folder contains
talbanken-default+splitmorph2.mco, which the program
uses unless you tell it otherwise.
TnT (Trigrams'n'Tags) is presented as
a very efficient statistical part-of-speech
tagger that is trainable on different languages and virtually any tagset.
We have a department-wide license for non-commercial
use. There are four programs, among them
tnt, tnt-diff and tnt-para.
See the folder
/local/ling/tnt, which has subfolders for
documentation and the license.
UUParser, a transition-based dependency parser for Universal Dependencies, was developed at the department. An example:
uuparser --out /tmp/out --datadir /corpora/ud/ud-treebanks-v2.5/ --include "mr_ufal"