MOR Tagger
Annotator that tags words with morphosyntactic codes according to MOR, a tool developed for the TalkBank project.
MOR tags each word token with multiple complex annotations, which are then disambiguated with the POST tool, for example:
| Word | MOR |
| I'll | pro:sub|I~mod|will |
| sing | v|sing |
| and | coord|and |
| talk | v|talk |
| about | prep|about |
| my | det:poss|my |
| blogging | n:gerund|blog-PRESP |
| lazily | adv|laze&dadj-Y-LY |
This annotator can simply tag each token with its single complex tag.
As can be seen above, this tag can include tag groups, which represent different alternative analyses of the word (e.g. talk can be analysed as a noun n|talk or a verb v|talk), and also word groups, which represent different grammatical words within the orthographic word (e.g. I'll is made up of a pronoun pro:sub|I and a modal verb mod|will).
This annotator also supports teasing apart tag groups and word groups, so that each token is tagged with possibly more than one distinct morphosyntactic annotation, for example:
| Word | MOR |
| I'll | pro:sub|I |
| | mod|will |
| sing | v|sing |
| and | coord|and |
| talk | v|talk |
| about | prep|about |
| my | det:poss|my |
| blogging | n:gerund|blog-PRESP |
| lazily | adv|laze&dadj-Y-LY |
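The splitting step described above can be sketched in a few lines of Python. This is an illustration only, not the annotator's own implementation: the function names are hypothetical, and the sketch assumes that `~` separates the grammatical words of a word group (as in the I'll example) and that `^` separates the alternatives of a tag group, which is the convention used in undisambiguated MOR output. The `n|talk^v|talk` input is a made-up example of such output.

```python
def split_mor_annotation(annotation):
    """Split one complex MOR annotation into its parts.

    Returns a list of alternative analyses (tag group members); each
    alternative is a list of grammatical-word tags (word group members).
    Assumes '^' separates alternatives and '~' separates grammatical words.
    """
    return [alternative.split("~") for alternative in annotation.split("^")]


def distinct_tags(annotation):
    """Flatten the alternatives into a single list of distinct tags, in order."""
    tags = []
    for alternative in split_mor_annotation(annotation):
        for tag in alternative:
            if tag not in tags:
                tags.append(tag)
    return tags


print(distinct_tags("pro:sub|I~mod|will"))  # ['pro:sub|I', 'mod|will']
print(distinct_tags("n|talk^v|talk"))       # ['n|talk', 'v|talk']
print(distinct_tags("v|sing"))              # ['v|sing']
```

Each element of the returned list corresponds to one annotation on the token, so I'll receives two annotations while sing receives one.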
Furthermore, given several destination layers, the annotator can parse the different parts of each morphosyntactic tag, in order to annotate separately:
- prefixes
- part of speech
- part of speech subcategories
- stem
- fusional suffixes
- suffixes
- gloss
For example:
| Word | POS | POS Subcategory | Stem | Fusional Suffix | Suffix |
| I'll | pro | sub | I | | |
| | mod | | will | | |
| sing | v | | sing | | |
| and | coord | | and | | |
| talk | v | | talk | | |
| about | prep | | about | | |
| my | det | poss | my | | |
| blogging | n | gerund | blog | | PRESP |
| lazily | adv | | laze | dadj | Y |
| | | | | | LY |
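How a single tag breaks down into these parts can be sketched as follows. This is a minimal illustration, not the annotator's actual parser: `parse_mor_tag` and `MorTag` are hypothetical names, and the delimiters are assumptions based on the examples above and on the usual MOR conventions (`#` after each prefix, `:` before each subcategory, `|` before the stem, `&` before a fusional suffix, `-` before a suffix, `=` before the gloss).

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class MorTag:
    prefixes: List[str]
    pos: str
    subcategories: List[str]
    stem: str
    fusional_suffixes: List[str]
    suffixes: List[str]
    gloss: str


def parse_mor_tag(tag: str) -> MorTag:
    """Parse one simple MOR tag (no '~' or '^') into its component parts."""
    # Anything after '=' is treated as the gloss (assumed convention).
    body, _, gloss = tag.partition("=")
    # The category (prefixes, POS, subcategories) precedes the first '|'.
    category, _, rest = body.partition("|")
    # Prefixes, if any, each end with '#' and come before the POS.
    *prefixes, pos_part = category.split("#")
    # Subcategories follow the POS, each introduced by ':'.
    pos, *subcategories = pos_part.split(":")
    # The stem runs up to the first '&' or '-'; the markers are kept by the split.
    pieces = re.split(r"([&-])", rest)
    stem = pieces[0]
    fusional_suffixes, suffixes = [], []
    for marker, value in zip(pieces[1::2], pieces[2::2]):
        if marker == "&":
            fusional_suffixes.append(value)
        else:
            suffixes.append(value)
    return MorTag(prefixes, pos, subcategories, stem,
                  fusional_suffixes, suffixes, gloss)


for example in ["pro:sub|I", "n:gerund|blog-PRESP", "adv|laze&dadj-Y-LY"]:
    print(example, "->", parse_mor_tag(example))
```

Each field of the result would map onto one destination layer; list-valued fields such as the suffixes yield one annotation per element, which is why lazily has two suffix rows in the table above.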
By default, installing this annotator downloads the UnixCLAN source code, which includes MOR and POST, from https://dali.talkbank.org/clan/. This requires internet access and a working g++ compiler.
If no other grammar is supplied, it also downloads and installs the English (eng) grammar from https://talkbank.org/0info/mor/.
NB: Currently the MOR Tagger only works on Unix-like systems. When the annotator is first installed, it immediately tries to download and compile MOR and POST, which requires the g++ compiler to be available on the server. To ensure the compiler is installed, run:
- apt install g++ gcc-multilib g++-multilib (on Ubuntu)
- yum install gcc-c++ glibc-devel glibc-devel.i686 glibc-devel.x86_64 lsof (on RHEL/CentOS)
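Before installing the annotator, the prerequisites above can be checked with a short script. The following is a sketch only and is not part of the annotator: the function names are hypothetical, and it checks just the two conditions stated above (a Unix-like system and g++ on the PATH), not the additional multilib or glibc packages.

```python
import os
import shutil


def is_unix_like():
    """True when Python reports a POSIX platform (Linux, macOS, BSD, ...)."""
    return os.name == "posix"


def missing_tools(tools=("g++",)):
    """Return the names of the given tools that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if not is_unix_like():
    print("The MOR Tagger currently only works on Unix-like systems.")
missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("g++ was found on the PATH.")
```

Note that the script would need to run as the same user, and with the same PATH, as the server process that installs the annotator for the result to be meaningful.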
nzilbb.annotator.mor