Package nzilbb.labbcat.server.search

Classes that manage the definition, serialization, and deserialization of search matrices.

A search matrix is generally encoded as a JSON object, with the following structure:

columns
    An array of JSON objects, each representing a column (a word token) in the search matrix UI. Each object has the following structure:
    layers
        A JSON object whose keys are layer IDs and whose values are either a JSON object or an array of JSON objects, each with the following structure:
        pattern
            A regular expression to match the label.
        not
            Whether pattern matching is negated.
        min
            Inclusive numeric minimum value for the label.
        max
            Exclusive numeric maximum value for the label.
        anchorStart
            Whether this annotation starts with the word.
        anchorEnd
            Whether this annotation ends with the word.
        target
            Whether this is the target annotation of the search.
    adj
        The maximum distance, in word tokens, from this token to the token that matches the next column.
participantQuery
    An optional query to identify participants whose utterances should be searched.
transcriptQuery
    An optional query to identify transcripts whose utterances should be searched.
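The attributes above can be illustrated with a minimal Java sketch of how a single layer condition might be evaluated against one annotation label. The class name `LayerCondition` is hypothetical and is not the API of this package. The sketch assumes that `pattern` must match the whole label (which is how the later example uses `"k.*"` to mean "starts with k"), that `not` negates only the pattern test, and that a label that cannot be parsed as a number fails any `min`/`max` test.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of one layer match condition (pattern, not, min, max).
 * Assumptions: the pattern must match the whole label, and "not" negates only
 * the pattern test, not the numeric range test.
 */
class LayerCondition {
  final String pattern; // regular expression, or null if absent
  final boolean not;    // whether the pattern test is negated
  final Double min;     // inclusive minimum, or null if absent
  final Double max;     // exclusive maximum, or null if absent

  LayerCondition(String pattern, boolean not, Double min, Double max) {
    this.pattern = pattern;
    this.not = not;
    this.min = min;
    this.max = max;
  }

  /** Returns true if the given annotation label satisfies this condition. */
  boolean matches(String label) {
    if (pattern != null) {
      // Pattern.matches requires the regex to match the entire label
      boolean matched = Pattern.matches(pattern, label);
      // with not=false the label must match; with not=true it must not
      if (matched == not) return false;
    }
    if (min != null || max != null) {
      double value;
      try {
        value = Double.parseDouble(label);
      } catch (NumberFormatException e) {
        return false; // assumption: a non-numeric label fails a numeric range
      }
      if (min != null && value < min) return false;  // min is inclusive
      if (max != null && value >= max) return false; // max is exclusive
    }
    return true;
  }
}
```

For example, a condition with `min` 1 and `max` 2 accepts the label `1` but rejects `2`, reflecting the inclusive minimum and exclusive maximum described above.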

A layer condition in a column need not contain any constraints, i.e. it may have no pattern, min, or max. In this case, the condition is ignored.
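The rule above can be sketched as a filter that discards conditions with no `pattern`, `min`, or `max` before a search is built. Here conditions are modelled as plain `Map`s of JSON attribute names to values to avoid depending on a JSON library; the class `EmptyConditionFilter` is hypothetical, and treating an empty-string pattern as absent is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: drop layer conditions that specify no pattern, min, or max,
 * since such conditions are ignored. Conditions are plain maps of JSON attribute
 * names to values (an assumption made to keep the example library-free).
 */
class EmptyConditionFilter {

  /** Assumption: a missing attribute and an empty string both count as absent. */
  private static boolean absent(Object value) {
    return value == null || value.toString().isEmpty();
  }

  /** True if the condition has none of the constraining attributes. */
  static boolean isEmpty(Map<String, Object> condition) {
    return absent(condition.get("pattern"))
        && absent(condition.get("min"))
        && absent(condition.get("max"));
  }

  /** Returns only the layers whose condition actually constrains the match, in order. */
  static Map<String, Map<String, Object>> effectiveLayers(
      Map<String, Map<String, Object>> layers) {
    Map<String, Map<String, Object>> result = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, Object>> entry : layers.entrySet()) {
      if (!isEmpty(entry.getValue())) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }
}
```

Attributes such as `id` or `target` do not count as constraints in this sketch, so a layer object holding only those is dropped.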

e.g. the following JSON-encoded search matrix identifies all tokens of the word "the" at the beginning of a topic, followed within three words by a token whose pronunciation starts with a vowel:

{
   "columns":[
     {
       "layers":{
         "orthography":{
           "id":"orthography",
           "not":false,
           "pattern":"and",
           "target":true},
         "phonemes":{
           "id":"phonemes",
           "target":false},
         "topic":{
           "id":"topic",
           "target":false,
           "anchorStart":true}
       },
       "adj":3
     },
     {
       "layers":{
         "orthography":{
           "id":"orthography",
           "target":false},
         "phonemes":{
           "id":"phonemes",
           "target":false,
           "pattern":"[aeiou].*"},
         "topic":{
           "id":"topic",
           "target":false
         }
       }
     }]
 }
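To make the `adj` attribute concrete, the following hypothetical sketch (`AdjacentPairs` is not part of this package) pairs tokens whose word matches one pattern with later tokens, at most `adj` positions away, whose pronunciation matches another. It assumes tokens are indexed by position within an utterance, that `adj` = 1 means the immediately following token, and that patterns match whole labels; it ignores the topic anchor of the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a two-column search with an adj constraint.
 * Assumptions: tokens are indexed by position in the utterance, each pattern
 * must match the whole label, and adj = n lets the second token be 1 to n
 * positions after the first (adj = 1 meaning the immediately following token).
 */
class AdjacentPairs {

  /** True if nextPosition is between 1 and adj tokens after thisPosition. */
  static boolean withinAdj(int thisPosition, int nextPosition, int adj) {
    int distance = nextPosition - thisPosition;
    return distance >= 1 && distance <= adj;
  }

  /**
   * Returns {first, second} token positions where the first token's word matches
   * firstPattern and the second token, within adj tokens, has a pronunciation
   * matching secondPattern.
   */
  static List<int[]> findPairs(List<String> words, List<String> pronunciations,
                               String firstPattern, String secondPattern, int adj) {
    List<int[]> pairs = new ArrayList<>();
    for (int i = 0; i < words.size(); i++) {
      if (!Pattern.matches(firstPattern, words.get(i))) continue;
      // look ahead only as far as adj allows, and not past the end of the utterance
      for (int j = i + 1; j < words.size() && withinAdj(i, j, adj); j++) {
        if (Pattern.matches(secondPattern, pronunciations.get(j))) {
          pairs.add(new int[] {i, j});
        }
      }
    }
    return pairs;
  }
}
```

For instance, with the words `the big apple and the end`, made-up vowel-initial pronunciations for `apple`, `and`, and `end`, and `adj` of 3, the first "the" pairs with "apple" and "and", and the second "the" pairs with "end"; nothing four or more tokens later is considered.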

If a layer in the 'layers' object is given an array of objects with multiple elements, the layer expression is assumed to match multiple contiguous sub-word annotations. This allows segments to be matched in context.

e.g. the following matrix identifies tokens whose spelling starts with "k" and whose segments include /n/ followed by a vowel, with the vowel as the target:

{
   "columns":[
     {
       "layers":{
         "orthography":{
           "id":"orthography",
           "pattern":"k.*"},
         "segment":[
           {
             "pattern":"n",
           },{
             "pattern":"[aeiou]",
             "target":true
           }]
       }
     }]
 }
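The contiguous matching of an array of sub-word conditions, as in the `segment` layer of the example above, might be sketched as a sliding window over the segments of one word token. `SegmentSequence` and `findTarget` are hypothetical names; the sketch assumes each pattern must match a whole segment label, that consecutive array elements match consecutive segments, and that the sequence may start at any segment of the word. The spelling condition (`"k.*"`) would be checked separately, like any single-object layer condition.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of matching an array of sub-word conditions against the
 * segments of one word token. Assumptions: each pattern must match a whole
 * segment label, consecutive array elements must match consecutive segments,
 * and the match may start at any segment (no anchoring).
 */
class SegmentSequence {

  /**
   * Returns the index of the segment matched by the target element in the
   * first contiguous match, or -1 if the sequence does not occur in the word.
   */
  static int findTarget(List<String> segments, List<String> patterns, int targetElement) {
    // slide a window the length of the pattern array across the segments
    for (int start = 0; start + patterns.size() <= segments.size(); start++) {
      boolean allMatch = true;
      for (int k = 0; k < patterns.size(); k++) {
        if (!Pattern.matches(patterns.get(k), segments.get(start + k))) {
          allMatch = false;
          break;
        }
      }
      if (allMatch) return start + targetElement;
    }
    return -1;
  }
}
```

With toy segment labels `n`, `i` and the patterns `n` and `[aeiou]`, with the second element as the target, `findTarget` returns 1, the index of the vowel.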