MOR Tagger

Annotator that tags words with morphosyntactic codes according to MOR, a tool developed for the TalkBank project.

MOR tags each word token with multiple complex annotations, which are then disambiguated with the POST tool, for example:

Word      MOR
I'll      pro:sub|I~mod|will
sing      v|sing
and       coord|and
talk      v|talk
about     prep|about
my        det:poss|my
blogging  n:gerund|blog-PRESP
lazily    adv|laze&dadj-Y-LY

This annotator can simply tag each token with its single complex tag.

A complex tag can include tag groups, which represent alternative analyses of the word (e.g. talk can be analysed as a noun, n|talk, or as a verb, v|talk), and word groups, which represent the different grammatical words within one orthographic word (e.g. I'll is made up of the pronoun pro:sub|I and the modal verb mod|will).

This annotator also supports teasing apart tag groups and word groups, so that each token is tagged with possibly more than one distinct morphosyntactic annotation, for example:

Word      MOR
I'll      pro:sub|I
          mod|will
sing      v|sing
and       coord|and
talk      v|talk
about     prep|about
my        det:poss|my
blogging  n:gerund|blog-PRESP
lazily    adv|laze&dadj-Y-LY
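The splitting described above can be sketched in Python as follows. This is a minimal illustration, not the annotator's own implementation: the function name split_mor is hypothetical, and it assumes the CHAT %mor conventions that "~" joins a post-clitic (as in I'll above), "$" joins a pre-clitic, and "^" separates alternative analyses. Only "~" appears in the examples here, so treat the other two delimiters as assumptions.

```python
import re
from typing import List


def split_mor(complex_tag: str) -> List[str]:
    """Split one MOR complex tag into distinct annotations (hypothetical helper).

    Assumed CHAT delimiters: "^" between alternative analyses (tag groups),
    "~" before a post-clitic and "$" after a pre-clitic (word groups).
    """
    annotations: List[str] = []
    # Each "^"-separated alternative is a complete analysis of the token.
    for alternative in complex_tag.split("^"):
        # Break the alternative into its grammatical words.
        for word in re.split(r"[~$]", alternative):
            # Skip empty pieces and words already produced by another alternative.
            if word and word not in annotations:
                annotations.append(word)
    return annotations


print(split_mor("pro:sub|I~mod|will"))  # ['pro:sub|I', 'mod|will']
print(split_mor("n|talk^v|talk"))       # ['n|talk', 'v|talk']
```

Dropping duplicates means a grammatical word shared by several alternatives yields a single annotation; whether the real annotator behaves this way is not stated here.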

Furthermore, given several destination layers, the annotator can parse each morphosyntactic tag into its component parts and annotate each of the following separately:

  • prefixes
  • part of speech
  • part of speech subcategories
  • stem
  • fusional suffixes
  • suffixes
  • gloss

For example:

Word      POS    POS Subcategory  Stem   Fusional Suffix  Suffix
I'll      pro    sub              I
          mod                     will
sing      v                       sing
and       coord                   and
talk      v                       talk
about     prep                    about
my        det    poss             my
blogging  n      gerund           blog                    PRESP
lazily    adv                     laze   dadj             Y
                                                          LY
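A parser for a single grammatical word might look like the following sketch. The names MorParts and parse_mor_word are hypothetical, and the layout it assumes, prefix#pos:subcategory|stem&fusional-suffix=gloss, follows CHAT %mor conventions; only ":", "|", "&" and "-" appear in the examples above, so the "#" prefix and "=" gloss markers are assumptions. Compound stems are not handled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MorParts:
    """Component parts of one MOR grammatical word (hypothetical structure)."""
    prefixes: List[str] = field(default_factory=list)
    pos: str = ""
    subcategories: List[str] = field(default_factory=list)
    stem: str = ""
    fusional_suffixes: List[str] = field(default_factory=list)
    suffixes: List[str] = field(default_factory=list)
    gloss: Optional[str] = None


def parse_mor_word(tag: str) -> MorParts:
    parts = MorParts()
    # The category and the morphology are separated by "|".
    category, _, morphology = tag.partition("|")

    # Assumed: prefixes precede the category, each terminated by "#".
    pieces = category.split("#")
    parts.prefixes = pieces[:-1]
    # Subcategories follow the part of speech, each introduced by ":".
    pos, *subcategories = pieces[-1].split(":")
    parts.pos = pos
    parts.subcategories = subcategories

    # Assumed: a gloss is introduced by "=" and runs until the next "&" or "-".
    gloss = re.search(r"=([^&\-]+)", morphology)
    if gloss:
        parts.gloss = gloss.group(1)
        morphology = morphology[: gloss.start()] + morphology[gloss.end():]

    # The stem comes first; "&" introduces a fusional suffix, "-" a suffix.
    tokens = re.split(r"([&-])", morphology)
    parts.stem = tokens[0]
    for marker, value in zip(tokens[1::2], tokens[2::2]):
        if marker == "&":
            parts.fusional_suffixes.append(value)
        else:
            parts.suffixes.append(value)
    return parts


word = parse_mor_word("adv|laze&dadj-Y-LY")
print(word.pos, word.stem, word.fusional_suffixes, word.suffixes)
# adv laze ['dadj'] ['Y', 'LY']
```

Each field corresponds to one of the destination layers listed above; the two-line lazily row in the table corresponds to the two entries in suffixes.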

By default, when this annotator is installed it downloads the UnixCLAN source code (which includes MOR and POST) from https://dali.talkbank.org/clan/ and compiles it. This requires internet access and a working g++ compiler.

It also downloads and installs the English (eng) grammar from https://talkbank.org/0info/mor/ if no other grammar is supplied.

NB: The MOR Tagger currently works only on Unix-like systems. When the annotator is first installed, the first thing it tries to do is download and compile MOR and POST, which requires the g++ compiler to be available on the server. To ensure that the compiler is installed, run:

  • apt install g++ gcc-multilib g++-multilib (on Ubuntu)
  • yum install gcc-c++ glibc-devel glibc-devel.i686 glibc-devel.x86_64 lsof (on RHEL/CentOS)
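Before installing the annotator, a quick check along these lines can confirm that g++ is on the server's PATH. This is a hypothetical helper, not part of the annotator, and it only checks that g++ --version runs; it does not verify that the multilib packages above are installed.

```python
import shutil
import subprocess


def gpp_available() -> bool:
    """Return True if a g++ executable is on PATH and runs successfully."""
    gpp = shutil.which("g++")
    if gpp is None:
        return False
    try:
        result = subprocess.run(
            [gpp, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("g++ found" if gpp_available() else "g++ missing: install it as shown above")
```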