nzilbb.annotator.mor

MOR Tagger

Annotator that tags words with morphosyntactic codes according to MOR, a tool developed for the TalkBank project.

MOR tags each word token with multiple complex annotations, which are then disambiguated with the POST tool. For example:

Word		MOR
I'll		pro:sub|I~mod|will
sing		v|sing
and		coord|and
talk		v|talk
about		prep|about
my		det:poss|my
blogging		n:gerund|blog-PRESP
lazily		adv|laze&dadj-Y-LY

This annotator can simply tag each token with its single complex tag.

This tag can include tag groups, which represent alternative analyses of the word (e.g. talk can be analysed as a noun n|talk or a verb v|talk), and word groups, which represent the different grammatical words within one orthographic word (e.g. in the example above, I'll is made up of a pronoun pro:sub|I and a modal verb mod|will, joined by ~).

This annotator also supports teasing apart tag groups and word groups, so that each token is tagged with possibly more than one distinct morphosyntactic annotation, for example:

Word		MOR
I'll		pro:sub|I
I'll		mod|will
sing		v|sing
and		coord|and
talk		v|talk
about		prep|about
my		det:poss|my
blogging		n:gerund|blog-PRESP
lazily		adv|laze&dadj-Y-LY
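
The splitting shown in the table above can be sketched in a few lines of Python. This is an illustration only, not the annotator's own code: the function name split_mor_tag is hypothetical, the ~ separator for word groups is taken from the I'll example, and the ^ separator for alternative analyses is an assumption based on the usual MOR output convention.

```python
def split_mor_tag(tag):
    """Split one complex MOR tag into its distinct analyses.

    Assumptions (not confirmed by this document):
      '^' separates alternative analyses (tag groups)
      '~' separates grammatical words within one orthographic word (word groups)
    """
    analyses = []
    for alternative in tag.split("^"):
        for word in alternative.split("~"):
            # Skip empty pieces and keep each analysis only once.
            if word and word not in analyses:
                analyses.append(word)
    return analyses


# Print one row per distinct annotation, as in the table above.
for word, tag in [("I'll", "pro:sub|I~mod|will"), ("talk", "n|talk^v|talk")]:
    for analysis in split_mor_tag(tag):
        print(word, analysis, sep="\t")
```

A token whose tag has no groups, such as v|sing, comes back as a single-element list, so every token can be handled the same way.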

Furthermore, given several destination layers, the annotator can parse the different parts of each morphosyntactic tag, in order to annotate separately:

prefixes
part of speech
part of speech subcategories
stem
fusional suffixes
suffixes
gloss

For example:

Word	POS	POS Subcategory	Stem	Fusional Suffix	Suffix
I'll	pro	sub	I
I'll	mod		will
sing	v		sing
and	coord		and
talk	v		talk
about	prep		about
my	det	poss	my
blogging	n	gerund	blog		PRESP
lazily	adv		laze	dadj	Y
lazily	adv		laze	dadj	LY
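
As a rough sketch of how one simple tag might be broken into these parts, the Python below works from the delimiters visible in the examples (: before a subcategory, | before the stem, & before a fusional suffix, - before a suffix). The # delimiter for prefixes and the = delimiter for the gloss do not appear in the examples and are assumptions based on MOR conventions. The name parse_mor_tag and the dictionary keys are hypothetical, and the annotator's real parsing may differ.

```python
import re


def parse_mor_tag(tag):
    """Break one simple MOR tag (no '^' or '~' groups) into its parts."""
    # Assumed: each prefix precedes the part of speech and ends with '#'.
    pieces = tag.split("#")
    prefixes, rest = pieces[:-1], pieces[-1]
    # '|' separates the category from the stem and its affixes.
    category, _, morphology = rest.partition("|")
    # ':' separates the part of speech from its subcategories.
    pos, *subcategories = category.split(":")
    # Assumed: '=' introduces the gloss at the end of the tag.
    morphology, _, gloss = morphology.partition("=")
    # The stem runs up to the first '&' or '-'.
    stem = re.match(r"[^&-]*", morphology).group()
    # Each later chunk is a fusional suffix ('&') or a suffix ('-').
    affixes = re.findall(r"([&-])([^&-]+)", morphology)
    return {
        "prefixes": prefixes,
        "pos": pos,
        "subcategories": subcategories,
        "stem": stem,
        "fusional_suffixes": [text for mark, text in affixes if mark == "&"],
        "suffixes": [text for mark, text in affixes if mark == "-"],
        "gloss": gloss,
    }


print(parse_mor_tag("adv|laze&dadj-Y-LY"))
```

The sketch returns lists for parts that can occur more than once, whereas the table above shows one row per suffix (lazily appears twice, once for Y and once for LY).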

When this annotator is installed, by default it downloads the UnixCLAN source code, which includes MOR and POST, from https://dali.talkbank.org/clan/ and compiles it. This requires internet access and a working g++ compiler.

It also downloads and installs the English (eng) grammar from https://talkbank.org/0info/mor/ if no other grammar is supplied.

NB: Currently the MOR Tagger only works on Unix-like systems. The first thing the annotator does when it is installed is download and compile MOR and POST, which requires the g++ compiler to be available on the server. To ensure the compiler is installed, run:

apt install g++ gcc-multilib g++-multilib (on Ubuntu)
yum install gcc-c++ glibc-devel glibc-devel.i686 glibc-devel.x86_64 lsof (on RHEL/CentOS)
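
Before installing the annotator, it may help to confirm that the compiler is actually on the server's PATH. The following is a minimal sketch using only the Python standard library. The function name missing_build_tools is hypothetical, and the check covers only g++, not the other packages listed above.

```python
import shutil


def missing_build_tools(tools=("g++",)):
    """Return the names of the given tools that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_build_tools()
if missing:
    print("Missing build tools:", ", ".join(missing))
else:
    print("g++ found on the PATH.")
```

Finding g++ on the PATH does not guarantee that compilation will succeed, but its absence is a clear sign that the install commands above still need to be run.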