Package nzilbb.labbcat.server.search
A search matrix is generally encoded as a JSON object, with the following structure:
- columns
- An array of JSON objects, each representing a column (a word token) in the
search matrix UI. Each object has the following structure:
- layers
- A JSON object where keys are layer IDs, and values are either a JSON
object, or an array of JSON objects, with the following structure:
- pattern
- A regular expression to match the label.
- not
- Whether pattern matching is negated or not.
- min
- Inclusive numeric minimum value for the label.
- max
- Exclusive numeric maximum value for the label.
- anchorStart
- Whether this annotation starts with the word.
- anchorEnd
- Whether this annotation ends with the word.
- target
- Whether this is the target annotation of the search.
- adj
- The maximum distance, in word tokens, between this token and the token that matches the next column.
- participantQuery
- An optional query to identify participants whose utterances should be searched.
- transcriptQuery
- An optional query to identify transcripts whose utterances should be searched.
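The per-layer fields above can be illustrated with a minimal Java sketch. LabelCondition is a hypothetical class written for this illustration, not the package's LayerMatch. It assumes that the pattern must match the whole label, that 'not' negates only the pattern, and that min/max apply to labels that parse as numbers.

```java
import java.util.regex.Pattern;

/** Hypothetical model of one layer condition (illustration only). */
final class LabelCondition {
    final String pattern; // regular expression, or null if absent
    final boolean not;    // whether the pattern match is negated
    final Double min;     // inclusive minimum, or null if absent
    final Double max;     // exclusive maximum, or null if absent

    LabelCondition(String pattern, boolean not, Double min, Double max) {
        this.pattern = pattern;
        this.not = not;
        this.min = min;
        this.max = max;
    }

    /** Tests a label; assumes the pattern must match the entire label. */
    boolean matches(String label) {
        if (pattern != null) {
            boolean hit = Pattern.matches(pattern, label);
            // A plain pattern needs a hit; a negated pattern needs a miss.
            if (hit == not) return false;
        }
        if (min != null || max != null) {
            double value;
            try {
                value = Double.parseDouble(label);
            } catch (NumberFormatException e) {
                return false; // assumption: non-numeric labels fail a numeric range
            }
            if (min != null && value < min) return false;  // min is inclusive
            if (max != null && value >= max) return false; // max is exclusive
        }
        return true;
    }
}
```

With min 2 and max 5, for example, the label "2" is accepted and the label "5" is rejected.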
A layer entry in a column may specify no conditions, i.e. no pattern, min, or max. In this case, the entry is ignored.
e.g. the following JSON-encoded search matrix identifies all tokens of the word "the" at the beginning of a topic, followed within three words by a token that starts with a vowel in the pronunciation:
{
"columns":[
{
"layers":{
"orthography":{
"id":"orthography",
"not":false,
"pattern":"the",
"target":true},
"phonemes":{
"id":"phonemes",
"target":false},
"topic":{
"id":"topic",
"target":false,
"anchorStart":true}
},
"adj":3
},
{
"layers":{
"orthography":{
"id":"orthography",
"target":false},
"phonemes":{
"id":"phonemes",
"target":false,
"pattern":"[aeiou].*"},
"topic":{
"id":"topic",
"target":false
}
}
}]
}
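The effect of 'adj' in a two-column matrix can be sketched in Java as follows. AdjacencySketch, Token and findPairs are hypothetical names for this illustration and are not part of this package. The sketch assumes that adj counts positions to the right of the first token (so adj 1 means the immediately following word), that patterns must match whole labels, and that the first column is tested on an orthography label and the second on a phonemes label. Topic anchoring is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of matching two adjacent columns (illustration only). */
final class AdjacencySketch {
    /** A word token with one label on each of two layers. */
    static final class Token {
        final String orthography;
        final String phonemes;
        Token(String orthography, String phonemes) {
            this.orthography = orthography;
            this.phonemes = phonemes;
        }
    }

    /**
     * Returns {first, second} index pairs where the first token's orthography
     * matches firstPattern and the nearest following token within adj words
     * has phonemes matching secondPattern.
     */
    static List<int[]> findPairs(List<Token> tokens, String firstPattern,
                                 String secondPattern, int adj) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!Pattern.matches(firstPattern, tokens.get(i).orthography)) continue;
            int last = Math.min(tokens.size() - 1, i + adj);
            for (int j = i + 1; j <= last; j++) {
                if (Pattern.matches(secondPattern, tokens.get(j).phonemes)) {
                    pairs.add(new int[] {i, j});
                    break; // keep only the nearest match for this first token
                }
            }
        }
        return pairs;
    }
}
```

A real search would evaluate every layer condition in each column; this sketch keeps one condition per column to isolate the distance check.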
If a layer in the 'layers' object is an array of objects with multiple elements, then the layer expression is assumed to match multiple contiguous sub-word annotations. This allows segments to be matched in context.
e.g. the following matrix identifies tokens where the spelling starts with "k", and segments start with /n/ followed by a vowel, which is the target:
{
"columns":[
{
"layers":{
"orthography":{
"id":"orthography",
"pattern":"k.*"},
"segment":[
{
"pattern":"n"
},{
"pattern":"[aeiou]",
"target":true
}]
}
}]
}
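Matching an array of patterns against contiguous sub-word annotations can be sketched in Java as below. SegmentSequenceSketch and indexOfRun are hypothetical names for this illustration, not part of this package. The sketch assumes that each pattern must match one whole segment label and that the run may begin at any position in the word, since no anchoring is specified.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of matching contiguous segments (illustration only). */
final class SegmentSequenceSketch {
    /**
     * Returns the index of the first segment of the first contiguous run in
     * which segment k of the run matches pattern k, or -1 if there is no such
     * run. The position of a target pattern is the returned index plus that
     * pattern's position in the array.
     */
    static int indexOfRun(List<String> segments, List<String> patterns) {
        if (patterns.isEmpty()) return -1;
        for (int start = 0; start + patterns.size() <= segments.size(); start++) {
            boolean all = true;
            for (int k = 0; k < patterns.size(); k++) {
                if (!Pattern.matches(patterns.get(k), segments.get(start + k))) {
                    all = false;
                    break;
                }
            }
            if (all) return start;
        }
        return -1;
    }
}
```

For a word whose segments are "n" then "i", the patterns "n" and "[aeiou]" match at index 0, so the target vowel is the segment at index 1.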
Interface Summary
- SearchResults
- Represents an iterable collection of results.
Class Summary
- ArraySearchResults
- Search results constructed from an array of selected MatchId strings.
- Column
- One column in a search matrix, containing patterns matching one word token.
- CsvResults
- Search results constructed from a CSV file.
- LayerMatch
- One cell in a search matrix, containing a pattern to match on one layer.
- Matrix
- Complete search matrix.
- SearchTask
- Base class for search implementations, which return a set of search results.