nzilbb.labbcat.server.search (LaBB-CAT backend 1.0.7 API)

Classes that manage the definition, serialization, and deserialization of search matrices.

A search matrix is generally encoded as a JSON object, with the following structure:

columns

An array of JSON objects, each representing a column (a word token) in the search matrix UI. Each object has the following structure:

layers

A JSON object where keys are layer IDs, and values are either a JSON object, or an array of JSON objects, with the following structure:

pattern: A regular expression to match the label.
not: Whether pattern matching is negated or not.
min: Inclusive numeric minimum value for the label.
max: Exclusive numeric maximum value for the label.
anchorStart: Whether this annotation starts with the word.
anchorEnd: Whether this annotation ends with the word.
target: Whether this is the target annotation of the search.

adj

The maximum distance, in word tokens, between this token and the token that matches the next column.

participantQuery

An optional query to identify participants whose utterances should be searched.

transcriptQuery

An optional query to identify transcripts whose utterances should be searched.

A match condition in a column may be empty, i.e. specify no pattern, min, or max. In this case, the condition is ignored.
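The structure described above can be assembled programmatically. The following Python sketch is illustrative only and is not part of this package: the helper names `layer_match`, `column`, and `matrix` are hypothetical, and the only assumption made is that unset attributes can simply be left out of the JSON object.

```python
import json

def layer_match(pattern=None, negate=None, minimum=None, maximum=None,
                anchor_start=None, anchor_end=None, target=None):
    """Build one match object, leaving out every attribute that was not set."""
    fields = {
        "pattern": pattern, "not": negate, "min": minimum, "max": maximum,
        "anchorStart": anchor_start, "anchorEnd": anchor_end, "target": target,
    }
    return {key: value for key, value in fields.items() if value is not None}

def column(layers, adj=None):
    """Build one column; 'layers' maps layer IDs to match objects (or lists of them)."""
    col = {"layers": layers}
    if adj is not None:
        col["adj"] = adj
    return col

def matrix(columns, participant_query=None, transcript_query=None):
    """Build the top-level matrix object; both queries are optional."""
    m = {"columns": columns}
    if participant_query is not None:
        m["participantQuery"] = participant_query
    if transcript_query is not None:
        m["transcriptQuery"] = transcript_query
    return m

# Two columns: the first must be followed within two tokens by the second.
m = matrix([
    column({"orthography": layer_match(pattern="the", target=True)}, adj=2),
    column({"orthography": layer_match(pattern="end")}),
])
encoded = json.dumps(m)
```

Because unset attributes are omitted, calling `layer_match()` with no arguments yields an empty object, which corresponds to a match with no conditions.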

e.g. the following JSON-encoded search matrix identifies all tokens of the word "the" at the beginning of a topic, followed within three words by a token that starts with a vowel in the pronunciation:

{
   "columns":[
     {
       "layers":{
         "orthography":{
           "id":"orthography",
           "not":false,
           "pattern":"the",
           "target":true},
         "phonemes":{
           "id":"phonemes",
           "target":false},
         "topic":{
           "id":"topic",
           "target":false,
           "anchorStart":true}
       },
       "adj":3
     },
     {
       "layers":{
         "orthography":{
           "id":"orthography",
           "target":false},
         "phonemes":{
           "id":"phonemes",
           "target":false,
           "pattern":"[aeiou].*"},
         "topic":{
           "id":"topic",
           "target":false
         }
       }
     }]
 }
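A client that receives a matrix like this may want to inspect it before submitting a search. The following Python sketch parses a cut-down matrix of the same shape and reports which column and layer hold the target. The function names `iter_layer_matches` and `find_target` are hypothetical and do not correspond to methods of the classes in this package.

```python
import json

def iter_layer_matches(matrix):
    """Yield (column_index, layer_id, match) for every match object in the matrix."""
    for index, col in enumerate(matrix.get("columns", [])):
        for layer_id, value in col.get("layers", {}).items():
            # A layer value may be a single object or an array of objects.
            matches = value if isinstance(value, list) else [value]
            for match in matches:
                yield index, layer_id, match

def find_target(matrix):
    """Return (column_index, layer_id) of the first target match, or None."""
    for index, layer_id, match in iter_layer_matches(matrix):
        if match.get("target"):
            return index, layer_id
    return None

encoded = """
{"columns":[
  {"layers":{"orthography":{"pattern":"the","target":true},
             "topic":{"anchorStart":true}},
   "adj":3},
  {"layers":{"phonemes":{"pattern":"[aeiou].*"}}}]}
"""
matrix = json.loads(encoded)
target = find_target(matrix)
adjacency = [col.get("adj") for col in matrix["columns"]]
```

Here `target` is `(0, "orthography")` and `adjacency` is `[3, None]`, since only the first column constrains the distance to the next one.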

If a layer in the 'layers' object contains an array of objects with multiple elements, then the layer expression is assumed to match multiple contiguous sub-word annotations. This allows matching of segments within context.

e.g. the following matrix identifies tokens where the spelling starts with "k", and segments start with /n/ followed by a vowel, which is the target:

{
   "columns":[
     {
       "layers":{
         "orthography":{
           "id":"orthography",
           "pattern":"k.*"},
         "segment":[
           {
             "pattern":"n"
           },{
             "pattern":"[aeiou]",
             "target":true
           }]
       }
     }]
 }
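To make the idea of contiguous sub-word matching concrete, the following Python sketch checks a list of segment labels against the patterns of a multi-element layer. It is a simplified illustration, not the way the server evaluates a matrix. It assumes each pattern must match the whole label (Python's `re.fullmatch`), it ignores anchoring, and the function name `find_contiguous_matches` and the segment labels are hypothetical.

```python
import re

def find_contiguous_matches(labels, patterns):
    """Return each start index where consecutive labels match the patterns in order."""
    width = len(patterns)
    starts = []
    for i in range(len(labels) - width + 1):
        if all(re.fullmatch(p, labels[i + j]) for j, p in enumerate(patterns)):
            starts.append(i)
    return starts

# The "segment" layer from the matrix above: /n/ followed by a vowel (the target).
segment_layer = [{"pattern": "n"}, {"pattern": "[aeiou]", "target": True}]
patterns = [match["pattern"] for match in segment_layer]
target_offset = next(i for i, match in enumerate(segment_layer) if match.get("target"))

labels = ["n", "i", "t"]  # hypothetical segment labels for a word such as "knit"
starts = find_contiguous_matches(labels, patterns)
targets = [start + target_offset for start in starts]
```

With these labels, `starts` is `[0]` and `targets` is `[1]`, i.e. the vowel that follows /n/ is the annotation identified as the target.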

Interface Summary
Interface	Description
SearchResults	Represents an iterable collection of results.

Class Summary
Class	Description
ArraySearchResults	Search results constructed from an array of selected MatchId strings.
Column	One column in a search matrix, containing patterns matching one word token.
LayerMatch	One cell in a search matrix, containing a pattern to match on one layer.
Matrix	Complete search matrix.
SearchTask	Base class for search implementations, which return a set of search results.
