Package nzilbb.labbcat.server.search
A search matrix is generally encoded as a JSON object, with the following structure:
- columns
- An array of JSON objects, each representing a column (a word token) in the search matrix UI. Each object has the following structure:
  - layers
  - A JSON object where keys are layer IDs, and values are either a JSON object, or an array of JSON objects, with the following structure:
    - pattern
    - A regular expression to match the label.
    - not
    - Whether pattern matching is negated or not.
    - min
    - Inclusive numeric minimum value for the label.
    - max
    - Exclusive numeric maximum value for the label.
    - anchorStart
    - Whether this annotation starts with the word.
    - anchorEnd
    - Whether this annotation ends with the word.
    - target
    - Whether this is the target annotation of the search.
  - adj
  - The maximum distance, in word tokens, between this token and the token that matches the next column.
- participantQuery
- An optional query to identify participants whose utterances should be searched.
- transcriptQuery
- An optional query to identify transcripts whose utterances should be searched.
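The structure above can be assembled by hand when no JSON library is available. The following is a minimal sketch only: `MatrixJsonSketch`, `quote` and `singleColumnMatrix` are hypothetical names that are not part of this package, and the escaping handles only backslashes and double quotes, which is assumed to be enough for simple layer IDs and patterns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of nzilbb.labbcat) that builds a one-column search matrix as JSON text. */
final class MatrixJsonSketch {

  /** Quotes a string as a JSON string literal, escaping backslashes and double quotes only. */
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /**
   * Serializes a map of layer ID to pattern as a single column inside a search matrix.
   * The layer whose ID equals targetLayerId is flagged as the target.
   */
  static String singleColumnMatrix(Map<String, String> patterns, String targetLayerId) {
    StringBuilder layers = new StringBuilder();
    for (Map.Entry<String, String> e : patterns.entrySet()) {
      if (layers.length() > 0) {
        layers.append(",");
      }
      layers.append(quote(e.getKey())).append(":{")
          .append("\"pattern\":").append(quote(e.getValue()))
          .append(",\"target\":").append(e.getKey().equals(targetLayerId))
          .append("}");
    }
    return "{\"columns\":[{\"layers\":{" + layers + "}}]}";
  }

  public static void main(String[] args) {
    Map<String, String> patterns = new LinkedHashMap<>();
    patterns.put("orthography", "k.*");
    // prints {"columns":[{"layers":{"orthography":{"pattern":"k.*","target":true}}}]}
    System.out.println(singleColumnMatrix(patterns, "orthography"));
  }
}
```

A `LinkedHashMap` is used so that layers are emitted in insertion order, which keeps the output predictable.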
A layer match in a column may specify no conditions, i.e. no pattern, min, or max. In this case, the layer match is ignored.
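One way to read the rule above, together with the pattern, not, min and max attributes, is sketched below. This is an illustration under stated assumptions rather than the package's implementation: `LayerConditionSketch` is a hypothetical class, the pattern is assumed to have to match the whole label, `not` is assumed to negate only the pattern, and a non-numeric label is assumed to fail any min/max check.

```java
import java.util.regex.Pattern;

/** Hypothetical model of one layer condition; not the package's LayerMatch class. */
final class LayerConditionSketch {
  final String pattern; // regular expression, or null if unset
  final boolean not;    // whether the pattern match is negated
  final Double min;     // inclusive minimum, or null if unset
  final Double max;     // exclusive maximum, or null if unset

  LayerConditionSketch(String pattern, boolean not, Double min, Double max) {
    this.pattern = pattern;
    this.not = not;
    this.min = min;
    this.max = max;
  }

  /** True if there is no pattern, min, or max, in which case the condition is ignored. */
  boolean isEmpty() {
    return pattern == null && min == null && max == null;
  }

  /** Tests a (non-null) label against this condition. */
  boolean accepts(String label) {
    if (isEmpty()) {
      return true; // an empty condition places no constraint on the label
    }
    if (pattern != null) {
      boolean matched = Pattern.matches(pattern, label); // assumption: whole-label match
      if (matched == not) {
        return false;
      }
    }
    if (min != null || max != null) {
      double value;
      try {
        value = Double.parseDouble(label);
      } catch (NumberFormatException e) {
        return false; // assumption: non-numeric labels fail numeric conditions
      }
      if (min != null && value < min) {
        return false; // min is inclusive
      }
      if (max != null && value >= max) {
        return false; // max is exclusive
      }
    }
    return true;
  }
}
```

For example, under these assumptions a condition with min 1.0 and max 3.0 accepts the label "1" but rejects "3", because the maximum is exclusive.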
e.g. the following JSON-encoded search matrix identifies all tokens of the word "and" at the beginning of a topic, followed within three words by a token that starts with a vowel in the pronunciation:
{
  "columns":[
    {
      "layers":{
        "orthography":{ "id":"orthography", "not":false, "pattern":"and", "target":true },
        "phonemes":{ "id":"phonemes", "target":false },
        "topic":{ "id":"topic", "target":false, "anchorStart":true }
      },
      "adj":3
    },
    {
      "layers":{
        "orthography":{ "id":"orthography", "target":false },
        "phonemes":{ "id":"phonemes", "target":false, "pattern":"[aeiou].*" },
        "topic":{ "id":"topic", "target":false }
      }
    }
  ]
}
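The effect of "adj" in a two-column matrix like the one above can be sketched over plain arrays. This is an illustration only: `AdjacencySketch` and `findMatches` are hypothetical names, the topic anchoring is left out, patterns are assumed to match whole labels, and the phoneme labels in the usage note are made-up transcriptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of two-column matching with an adjacency limit. */
final class AdjacencySketch {

  /**
   * Returns the indices i where orthography[i] matches firstPattern and some later token j,
   * with i < j <= i + adj, has phonemes[j] matching nextPattern.
   * Both arrays are assumed to hold one label per word token, in the same order.
   */
  static List<Integer> findMatches(String[] orthography, String[] phonemes,
      String firstPattern, String nextPattern, int adj) {
    Pattern first = Pattern.compile(firstPattern);
    Pattern next = Pattern.compile(nextPattern);
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < orthography.length; i++) {
      if (!first.matcher(orthography[i]).matches()) {
        continue;
      }
      // Look at most adj tokens ahead, without running past the end of the utterance.
      int last = Math.min(i + adj, phonemes.length - 1);
      for (int j = i + 1; j <= last; j++) {
        if (next.matcher(phonemes[j]).matches()) {
          hits.add(i);
          break;
        }
      }
    }
    return hits;
  }
}
```

With orthography `{"and","so","we","ate","and","then","the","big"}` and made-up phoneme labels `{"and","so","wi","eit","and","dhen","dha","big"}`, calling `findMatches(orthography, phonemes, "and", "[aeiou].*", 3)` returns only index 0: the third token after the first "and" starts with a vowel, while none of the three tokens after the second "and" does.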
If a layer in the 'layers' object contains an array of objects with more than one element, then the layer expression is assumed to match multiple contiguous sub-word annotations. This allows matching of segments within context.
e.g. the following matrix identifies tokens where the spelling starts with "k", and segments start with /n/ followed by a vowel, which is the target:
{
  "columns":[
    {
      "layers":{
        "orthography":{ "id":"orthography", "pattern":"k.*" },
        "segment":[
          { "pattern":"n" },
          { "pattern":"[aeiou]", "target":true }
        ]
      }
    }
  ]
}
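Matching an array of patterns against contiguous sub-word annotations can be pictured as a sliding-window search. The sketch below assumes the segments of one word are available as an ordered array of labels and that each pattern must match a whole label; `SegmentRunSketch` and `findRun` are hypothetical names, and anchoring to the start or end of the word is not modelled.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of matching several patterns against contiguous segments. */
final class SegmentRunSketch {

  /**
   * Returns the index of the first segment of the first contiguous run in which
   * segments[start + k] matches patterns[k] for every k, or -1 if there is no such run.
   */
  static int findRun(String[] segments, String[] patterns) {
    for (int start = 0; start + patterns.length <= segments.length; start++) {
      boolean all = true;
      for (int k = 0; k < patterns.length; k++) {
        if (!Pattern.matches(patterns[k], segments[start + k])) {
          all = false;
          break;
        }
      }
      if (all) {
        return start;
      }
    }
    return -1;
  }
}
```

If the target is the second pattern, as in the matrix above, the target segment is at the returned index plus one. For the made-up segment labels `{"k","e","n","e","l"}` and patterns `{"n","[aeiou]"}`, `findRun` returns 2, so the target vowel is at index 3.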
Interface Summary
- SearchResults
- Represents an iterable collection of results.
Class Summary
- ArraySearchResults
- Search results constructed from an array of selected MatchId strings.
- Column
- One column in a search matrix, containing patterns matching one word token.
- LayerMatch
- One cell in a search matrix, containing a pattern to match on one layer.
- Matrix
- Complete search matrix.
- SearchTask
- Base class for search implementations, which return a set of search results.