nzilbb.annotator.patterntagger

Pattern Tagger

The Pattern Tagger generates new annotations by matching a list of regular expressions against annotations on a selected layer; the first pattern that matches is used to tag the annotation.

For example, it could be used to identify filled pauses that might be variously transcribed as um, ummm, ahm, ah, aah, er, erm, etc.

To do this, you would configure an annotator task with the following regular expressions applied to the orthography layer to create new annotations on a new filled pauses word-tag layer:

Regular Expression		Label
`[ua]+h*m+`	→	um
`e+r*m+`	→	um
`a+h+`	→	ah
`e+r+`	→	er

The result would be that every word like um, ummm, ahm, ah, aah, er, erm, etc. would be tagged with um, ah or er, depending on which pattern in the list matches it first.
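
The first-match rule can be illustrated with a short Python sketch. This is not the annotator's own code: the function name tag_word is hypothetical, Python's re module stands in for whatever regular expression engine the annotator uses, and the sketch assumes that a pattern must match the whole word rather than part of it.

```python
import re

# Patterns and labels from the table above, in priority order.
PATTERNS = [
    (r"[ua]+h*m+", "um"),
    (r"e+r*m+", "um"),
    (r"a+h+", "ah"),
    (r"e+r+", "er"),
]


def tag_word(word, patterns=PATTERNS):
    """Return the label of the first pattern that matches, or None.

    Hypothetical helper for illustration only.
    """
    for regex, label in patterns:
        # Assumption: the pattern has to match the entire word.
        if re.fullmatch(regex, word):
            return label
    return None  # no pattern matched, so the word gets no tag


words = ["um", "ummm", "ahm", "ah", "aah", "er", "erm", "the"]
tags = {word: tag_word(word) for word in words}
```

Under these assumptions, ahm and erm are both labelled um because an um pattern appears in the list before the ah and er patterns, and a word such as the is left untagged.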

The Pattern Tagger can also be used to tag combinations of words. For example, you might want to identify all instances of the expression kind of thing.

To do this, you would configure an annotator task with the following regular expression applied to the orthography layer to create new annotations on a new ellipsis phrase layer:

Regular Expression		Label
`kind(a|( of)) thing`	→	...

The result would be that every instance of the sequence of words kind of thing or kinda thing would be tagged with the label `...`.
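
One way to picture matching across several words is sketched below in Python. How the annotator actually spans multiple annotations is not described here, so everything in the sketch is an assumption: the word labels are joined with single spaces, the pattern is searched for in the joined text, and only matches that start and end on word boundaries are kept. The function name tag_phrases is hypothetical.

```python
import re

# The phrase pattern from the table above; the alternation is a plain | here.
PHRASE = re.compile(r"kind(a|( of)) thing")
LABEL = "..."


def tag_phrases(words, pattern=PHRASE, label=LABEL):
    """Return (first_word_index, last_word_index, label) for each match.

    Hypothetical helper for illustration only.
    """
    # Character offset at which each word starts in the joined text.
    starts, position = [], 0
    for word in words:
        starts.append(position)
        position += len(word) + 1  # +1 for the joining space
    text = " ".join(words)

    start_to_index = {start: i for i, start in enumerate(starts)}
    end_to_index = {
        start + len(word): i
        for i, (start, word) in enumerate(zip(starts, words))
    }

    spans = []
    for match in pattern.finditer(text):
        # Keep only matches that begin and end exactly on word edges.
        if match.start() in start_to_index and match.end() in end_to_index:
            spans.append(
                (start_to_index[match.start()], end_to_index[match.end()], label)
            )
    return spans


utterance = "it was a kind of thing or a kinda thing really".split()
spans = tag_phrases(utterance)
```

Each returned tuple gives the indices of the first and last word covered by a match, which corresponds to a single phrase annotation spanning those words.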