Pattern Tagger
The Pattern Tagger generates new annotations by matching a list of regular expressions against annotations on a selected layer; the first pattern that matches is used to tag the annotation.
For example, it could be used to identify filled pauses that might be variously transcribed as um, ummm, ahm, ah, aah, er, erm, etc.
To do this, you would configure an annotator task with the following regular expressions applied to the orthography layer to create new annotations on a new filled pauses word-tag layer:
Regular Expression | | Label |
---|---|---|
[ua]+h*m+ | → | um |
e+r*m+ | → | um |
a+h+ | → | ah |
e+r+ | → | er |
The result would be that every instance of words like um, ummm, ahm, ah, aah, er, erm, etc. would be tagged with either um, ah, or er.
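The first-match-wins behaviour described above can be illustrated with a minimal Python sketch. It is not the Pattern Tagger's own implementation: the names `PATTERNS` and `tag_word` are hypothetical, and the sketch assumes that a pattern must match the whole word label rather than just part of it.

```python
import re

# (pattern, label) pairs in priority order, copied from the table above.
PATTERNS = [
    (r"[ua]+h*m+", "um"),
    (r"e+r*m+", "um"),
    (r"a+h+", "ah"),
    (r"e+r+", "er"),
]


def tag_word(word, patterns=PATTERNS):
    """Return the label of the first pattern matching the whole word, or None."""
    for pattern, label in patterns:
        # Assumption: the entire word must match, hence fullmatch.
        if re.fullmatch(pattern, word):
            return label
    return None  # no pattern matched, so the word gets no tag


words = ["so", "ummm", "i", "ahm", "went", "aah", "er", "erm"]
tags = [tag_word(w) for w in words]
# tags == [None, "um", None, "um", None, "ah", "er", "um"]
```

Under this assumption, erm is tagged um rather than er because the second pattern matches it before the fourth is tried, and words that match no pattern are left untagged.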
The Pattern Tagger can also be used to tag combinations of words. For example, you might want to identify all instances of the expression "kind of thing".
To do this, you would configure an annotator task with the following regular expression applied to the orthography layer to create new annotations on a new ellipsis phrase layer:
Regular Expression | | Label |
---|---|---|
kind(a\|( of)) thing | → | ... |
The result would be that every instance of the sequence of words "kind of thing" or "kinda thing" would be tagged with "...".
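One way to picture multi-word matching is the Python sketch below. How the Pattern Tagger actually assembles word sequences is not described here, so this rests on assumptions: consecutive word labels are joined with single spaces, and a match counts only if it starts and ends on word boundaries. The names `PHRASE_PATTERN`, `PHRASE_LABEL`, and `tag_phrases` are hypothetical.

```python
import re

# The pattern and label from the table above.
PHRASE_PATTERN = re.compile(r"kind(a|( of)) thing")
PHRASE_LABEL = "..."


def tag_phrases(words, pattern=PHRASE_PATTERN, label=PHRASE_LABEL):
    """Return (first_word_index, last_word_index, label) for each phrase match."""
    # Record the character offset at which each word starts in the joined text.
    starts, pos = [], 0
    for w in words:
        starts.append(pos)
        pos += len(w) + 1  # +1 for the joining space
    text = " ".join(words)

    # Map character offsets back to word indices.
    start_index = {s: i for i, s in enumerate(starts)}
    end_index = {s + len(w): i for i, (s, w) in enumerate(zip(starts, words))}

    spans = []
    for m in pattern.finditer(text):
        # Keep only matches that begin and end exactly on word boundaries.
        if m.start() in start_index and m.end() in end_index:
            spans.append((start_index[m.start()], end_index[m.end()], label))
    return spans


print(tag_phrases("it was that kind of thing you know".split()))
# prints [(3, 5, '...')]
print(tag_phrases("a kinda thing".split()))
# prints [(1, 2, '...')]
```

The word-boundary check prevents a sequence such as "mankind of things" from being tagged, since the match there would begin and end in the middle of a word.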