nzilbb-labbcat module¶
LabbcatView class¶
- class labbcat.LabbcatView(labbcatUrl, username=None, password=None)¶
API for querying a LaBB-CAT annotation graph store, a database of linguistic transcripts represented using Annotation Graphs.
This interface provides only read-only operations, i.e. those that can be performed by users with “view” permission.
Constructor arguments:
- Parameters:
labbcatUrl (str) – The ‘home’ URL of the LaBB-CAT server.
username (str or None) – The username for logging in to the server, if necessary.
password (str or None) – The password for logging in to the server, if necessary.
- Attributes:
language: The language code for server message localization, e.g. “es-AR”
Example:
import labbcat
# create annotation store client
corpus = labbcat.LabbcatView("https://labbcat.canterbury.ac.nz", "demo", "demo")
# show some basic information
print("Information about LaBB-CAT at " + corpus.getId())
layerIds = corpus.getLayerIds()
for layerId in layerIds:
    print("layer: " + layerId)
corpora = corpus.getCorpusIds()
for c in corpora:
    print("transcripts in: " + c)
    for transcript in corpus.getTranscriptIdsInCorpus(c):
        print(" " + transcript)
- allUtterances(participantIds, transcriptTypes=None, mainParticipant=True)¶
Identifies all utterances by the given participants.
A threadId is returned. To get the actual utterances, which are represented the same way as search results, call getMatches().
- Parameters:
participantIds – A list of participant IDs to identify the utterances of.
transcriptTypes – An optional list of transcript types to limit the results to. If None, all transcript types will be searched.
mainParticipant – True to search only main-participant utterances, False to search all utterances.
- Returns:
The threadId of the resulting task, which can be passed in to getMatches(), taskStatus(), waitForTask(), releaseTask(), etc.
- Return type:
str
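The task lifecycle described above (start a task, fetch its matches, release it) can be sketched as follows. This is only an illustration: collect_utterances and FakeStore are hypothetical names, FakeStore stands in for a LabbcatView object so that the sketch runs without a server, and the participant ID and match content are made up.

```python
def collect_utterances(store, participant_ids):
    """Start an allUtterances() task, fetch its matches, and always release it."""
    thread_id = store.allUtterances(participant_ids)
    try:
        # when given a threadId, getMatches() waits for the task to finish
        return store.getMatches(thread_id)
    finally:
        # free server resources even if fetching the matches failed
        store.releaseTask(thread_id)


class FakeStore:
    """Hypothetical stand-in for labbcat.LabbcatView, for illustration only."""

    def __init__(self):
        self.released = []

    def allUtterances(self, participantIds, transcriptTypes=None, mainParticipant=True):
        return "task-1"

    def getMatches(self, search, wordsContext=0, pageLength=None, pageNumber=None):
        return [{"Participant": "participant-1", "Text": "hello"}]

    def releaseTask(self, threadId):
        self.released.append(threadId)


store = FakeStore()
utterances = collect_utterances(store, ["participant-1"])
```

With a real server, store would be the LabbcatView object itself; the try/finally ensures releaseTask() is called whether or not getMatches() succeeds.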
- cancelTask(threadId)¶
Cancels (but does not release) a running task.
- Parameters:
threadId (str) – The ID of the task.
- countAnnotations(id, layerId, maxOrdinal=None)¶
Gets the number of annotations on the given layer of the given transcript.
- Parameters:
id (str) – The ID of the transcript.
layerId (str) – The ID of the layer.
maxOrdinal (int or None) – The maximum ordinal for the counted annotations. e.g. a maxOrdinal of 1 will ensure that only the first annotation for each parent is counted. If maxOrdinal is None, then all annotations are counted, regardless of their ordinal.
- Returns:
The number of annotations on the given layer of the given transcript.
- Return type:
int
- countMatchingAnnotations(expression)¶
Counts the number of annotations that match a particular pattern.
The expression language is loosely based on JavaScript; expressions such as the following can be used:
id == 'ew_0_456'
!/th[aeiou].*/.test(label)
first('participant').label == 'Robert' && first('utterances').start.offset == 12.345
graph.id == 'AdaAicheson-01.trs' && layer.id == 'orthography' && start.offset < 10.5
previous.id == 'ew_0_456'
NB all expressions must match by either id or layer.id.
- Parameters:
expression (str) – An expression that determines which annotations match.
- Returns:
The number of matching annotations.
- Return type:
int
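Annotation expressions like those above are plain strings, so they can be assembled programmatically. The sketch below is not part of the labbcat module: annotation_expression and quote are hypothetical helpers, and the backslash escaping of embedded quotes is an assumption based on the expression language being JavaScript-like.

```python
def quote(value):
    """Wrap a value in single quotes (escaping rule is an assumption)."""
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def annotation_expression(layer_id, transcript_id=None, before_offset=None):
    """Build an expression that matches by layer.id, optionally narrowed
    to one transcript and to annotations starting before a given offset."""
    clauses = []
    if transcript_id is not None:
        clauses.append("graph.id == " + quote(transcript_id))
    clauses.append("layer.id == " + quote(layer_id))
    if before_offset is not None:
        clauses.append("start.offset < " + str(before_offset))
    return " && ".join(clauses)


expression = annotation_expression("orthography", "AdaAicheson-01.trs", 10.5)
```

The resulting string could then be passed to countMatchingAnnotations(expression). Because the expression always includes layer.id, it satisfies the requirement that expressions match by either id or layer.id.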
- countMatchingParticipantIds(expression)¶
Counts the number of participants that match a particular pattern.
The expression language is loosely based on JavaScript; expressions such as the following can be used:
/Ada.+/.test(id)
labels('corpus').includes('CC')
labels('participant_languages').includes('en')
labels('transcript_language').includes('en')
!/Ada.+/.test(id) && first('corpus').label == 'CC'
all('transcript_rating').length < 2
all('participant_rating').length == 0
!annotators('transcript_rating').includes('labbcat')
first('participant_gender').label == 'NA'
Helper functions such as labbcat.expressionFromCorpora() can be used to generate expressions of common types.
Example:
numQbParticipants = corpus.countMatchingParticipantIds( labbcat.expressionFromCorpora("QB"))
- Parameters:
expression (str) – An expression that determines which participants match.
- Returns:
The number of matching participants.
- Return type:
int
- countMatchingTranscriptIds(expression)¶
Counts the number of transcripts that match a particular pattern.
The expression language is loosely based on JavaScript; expressions such as the following can be used:
/Ada.+/.test(id)
labels('participant').includes('Robert')
['CC', 'IA', 'MU'].includes(first('corpus').label)
first('episode').label == 'Ada Aitcheson'
first('transcript_scribe').label == 'Robert'
first('participant_languages').label == 'en'
first('noise').label == 'bell'
labels('transcript_languages').includes('en')
labels('participant_languages').includes('en')
labels('noise').includes('bell')
all('transcript_languages').length > 1
all('participant_languages').length > 1
all('word').length > 100
annotators('transcript_rating').includes('Robert')
!/Ada.+/.test(id) && first('corpus').label == 'CC' && labels('participant').includes('Robert')
Helper functions such as labbcat.expressionFromAttributeValue() can be used to generate expressions of common types.
Example:
numQuakeFaceTranscripts = corpus.countMatchingTranscriptIds( labbcat.expressionFromAttributeValue("transcript_quakeface", "1"))
- Parameters:
expression (str) – An expression that determines which transcripts match.
- Returns:
The number of matching transcripts.
- Return type:
int
- formatTranscript(id, layerIds, mimeType, dir=None)¶
Get transcript in a specified format.
- Parameters:
id (str) – The ID of the transcript to export.
layerIds (list of str) – A list of IDs of annotation layers to include in the transcript.
mimeType (str) – The desired format, for example “text/praat-textgrid” for Praat TextGrids, “text/plain” for plain text, etc.
dir (str) – A directory in which the file(s) should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
- Returns:
A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location. NB Although many formats will generate exactly one file for each transcript, this is not guaranteed; some formats generate multiple files per transcript.
- Return type:
list of str
- getAnchors(id, anchorIds)¶
Gets the given anchors in the given transcript.
- Parameters:
id (str) – The ID of the transcript.
anchorIds (list of str) – A list of anchor IDs.
- Returns:
A (possibly empty) list of anchors.
- Return type:
list of dictionaries
- getAnnotations(id, layerId, maxOrdinal=None, pageLength=None, pageNumber=None)¶
Gets the annotations on the given layer of the given transcript.
- Parameters:
id (str) – The ID of the transcript.
layerId – The ID of the layer.
maxOrdinal (int or None) – The maximum ordinal for the returned annotations. e.g. a maxOrdinal of 1 will ensure that only the first annotation for each parent is returned. If maxOrdinal is None, then all annotations are returned, regardless of their ordinal.
pageLength (int or None) – The maximum number of annotations to return, or None to return all.
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
- Returns:
A (possibly empty) list of annotations.
- Return type:
list of dictionaries
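For long transcripts, pageLength and pageNumber allow annotations to be retrieved one page at a time. A minimal sketch of such a loop follows, assuming that a page shorter than pageLength (or an empty page) marks the end of the results. iter_annotations and FakeStore are hypothetical names; FakeStore only imitates getAnnotations() so that the sketch runs without a server.

```python
def iter_annotations(store, transcript_id, layer_id, page_length=100):
    """Yield annotations page by page, starting at the zero-based page 0."""
    page_number = 0
    while True:
        page = store.getAnnotations(
            transcript_id, layer_id,
            pageLength=page_length, pageNumber=page_number)
        if not page:
            break
        yield from page
        if len(page) < page_length:
            break  # a short page is assumed to be the last one
        page_number += 1


class FakeStore:
    """Hypothetical stand-in for labbcat.LabbcatView, for illustration only."""

    def __init__(self, labels):
        self.annotations = [{"label": label} for label in labels]

    def getAnnotations(self, id, layerId, maxOrdinal=None,
                       pageLength=None, pageNumber=None):
        if pageLength is None:
            return self.annotations
        start = (pageNumber or 0) * pageLength
        return self.annotations[start:start + pageLength]


store = FakeStore(["a", "b", "c", "d", "e"])
labels = [annotation["label"] for annotation
          in iter_annotations(store, "example.trs", "orthography", page_length=2)]
```

Using a generator means the caller can stop early without requesting the remaining pages.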
- getAvailableMedia(id)¶
List the media available for the given transcript.
- Parameters:
id (str) – The transcript ID.
- Returns:
List of media files available for the given transcript.
- Return type:
list of dictionaries
- getCorpusIds()¶
Gets a list of corpus IDs.
- Returns:
A list of corpus IDs.
- Return type:
list
- getDeserializerDescriptors()¶
Lists the descriptors of all registered deserializers.
Deserializers are modules that import annotation structures from a specific file format, e.g. Praat TextGrid, plain text, etc.
- Returns:
A list of the descriptors of all registered deserializers.
- Return type:
list of dictionaries
- getDictionaries()¶
List the dictionaries available.
- Returns:
A dictionary of lists, where keys are layer manager IDs, each of which contains a list of IDs for dictionaries that the layer manager makes available.
- Return type:
dict of lists
- getDictionaryEntries(managerId, dictionaryId, keys)¶
Lookup entries in a dictionary.
- Parameters:
managerId (str) – The layer manager ID of the dictionary, as returned by getDictionaries().
dictionaryId (str) – The ID of the dictionary, as returned by getDictionaries().
keys (list of str or list of dict) – A list of keys (words) identifying entries to look up.
- Returns:
A dictionary of lists, where keys are the given keys, each of which contains a list of entries. Keys with no corresponding entry in the given dictionary will be present in the returned result, but will have no entries.
- Return type:
dict of lists
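Because keys without a dictionary entry are still present in the result, callers may want to separate them from the keys that were found. The sketch below shows one way to do that; split_found_and_missing is a hypothetical helper, the sample entries are made up, and it treats both an empty list and None as "no entries", since the exact representation is not specified here.

```python
def split_found_and_missing(entries):
    """Split a getDictionaryEntries()-style result into found and missing keys."""
    found = {key: values for key, values in entries.items() if values}
    missing = [key for key, values in entries.items() if not values]
    return found, missing


# made-up result for the keys "the" and "xyzzy"
entries = {"the": ["D@", "Di"], "xyzzy": []}
found, missing = split_found_and_missing(entries)
```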
- getEpisodeDocuments(id)¶
Get a list of documents associated with the episode of the given transcript.
- Parameters:
id (str) – The transcript ID.
- Returns:
List of URLs to documents.
- Return type:
list of str
- getFragments(transcriptIds, layerIds, mimeType, dir=None, startOffsets=None, endOffsets=None, prefixNames=True)¶
Get transcript fragments in a specified format.
The intervals to extract can be defined in two possible ways:
transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats
transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None
- Parameters:
transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).
startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.
endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.
layerIds (list of str) – A list of IDs of annotation layers to include in the fragment.
mimeType (str) – The desired format, for example “text/praat-textgrid” for Praat TextGrids, “text/plain” for plain text, etc.
dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
prefixNames (boolean) – Whether to prefix fragment names with a numeric serial number or not.
- Returns:
A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location. NB Although many formats will generate exactly one file for each interval, this is not guaranteed; some formats generate a single file or a fixed collection of files regardless of how many fragments there are.
- Return type:
list of str
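When the intervals are given explicitly, transcriptIds, startOffsets and endOffsets are three parallel lists with one element per fragment. The sketch below builds them from a list of (transcript, start, end) tuples so they cannot get out of step. fragment_arguments is a hypothetical helper and the offsets are made-up values.

```python
def fragment_arguments(intervals):
    """Turn (transcript_id, start, end) tuples into three parallel lists."""
    transcript_ids, start_offsets, end_offsets = [], [], []
    for transcript_id, start, end in intervals:
        if end <= start:
            raise ValueError(f"empty interval for {transcript_id}: {start}-{end}")
        transcript_ids.append(transcript_id)
        start_offsets.append(float(start))
        end_offsets.append(float(end))
    return transcript_ids, start_offsets, end_offsets


ids, starts, ends = fragment_arguments([
    ("AP2505_Nelson.eaf", 10.0, 15.5),
    ("AP2512_MattBlack.eaf", 0, 3.25),
])
# These could then be passed on, for example:
# corpus.getFragments(ids, ["orthography"], "text/praat-textgrid",
#                     startOffsets=starts, endOffsets=ends)
```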
- getFragmentsAsync(transcriptIds, layerIds, mimeType, startOffsets=None, endOffsets=None, prefixNames=True)¶
Starts a server task for getting transcript fragments in a specified format.
The intervals to extract can be defined in two possible ways:
transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats
transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None
- Parameters:
transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).
startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.
endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.
layerIds (list of str) – A list of IDs of annotation layers to include in the fragment.
mimeType (str) – The desired format, for example “text/praat-textgrid” for Praat TextGrids, “text/plain” for plain text, etc.
prefixNames (boolean) – Whether to prefix fragment names with a numeric serial number or not.
- Returns:
The threadId of the resulting task, which can be passed in to taskStatus(), waitForTask(), taskResults(), releaseTask(), etc.
- Return type:
str
- getId()¶
Gets the store’s ID.
- Returns:
The annotation store’s ID.
- Return type:
str
- getLayer(id)¶
Gets a layer definition.
- Parameters:
id (str) – ID of the layer to get the definition for.
- Returns:
The definition of the given layer.
- Return type:
dictionary
- getLayerIds()¶
Gets a list of layer IDs (annotation ‘types’).
- Returns:
A list of layer IDs.
- Return type:
list
- getLayers()¶
Gets a list of layer definitions.
- Returns:
A list of layer definitions.
- Return type:
list of dictionaries
- getMatchAnnotations(matchIds, layerIds, targetOffset=0, annotationsPerLayer=1)¶
Gets annotations on selected layers related to search results returned by a previous call to getMatches(threadId).
The returned list of lists contains dictionaries that represent individual annotations, with the following entries:
“id” : The annotation’s unique ID
“layerId” : The layer the annotation comes from
“label” : The annotation’s label or value
“startId” : The ID of the annotation’s start anchor
“endId” : The ID of the annotation’s end anchor
“parentId” : The annotation’s parent annotation ID
“ordinal” : The annotation’s position amongst its peers
“confidence” : A rating of confidence in the label accuracy, from 0 (no confidence) to 100 (absolute confidence / manually annotated)
- Parameters:
matchIds (list of str or list of dict) – A list of MatchId strings, or a list of match dictionaries
layerIds (list of str) – A list of layer IDs.
targetOffset (int) – The distance from the original target of the match, e.g.
0 : find annotations of the match target itself
1 : find annotations of the token immediately after the match target
-1 : find annotations of the token immediately before the match target
annotationsPerLayer (int) – The number of annotations on the given layer to retrieve. In most cases, there’s only one annotation available. However, tokens may, for example, be annotated with ‘all possible phonemic transcriptions’, in which case using a value of greater than 1 for this parameter provides other phonemic transcriptions, for tokens that have more than one.
- Returns:
An array of arrays of Annotations, of dimensions len(matchIds) x (len(layerIds) x annotationsPerLayer). The first index matches the corresponding index in matchIds.
- Return type:
list of list of dictionary
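Each row of the result corresponds to one match and mixes annotations from all requested layers. Since every annotation dictionary carries a “layerId” entry, a row can be regrouped by layer without relying on column positions. labels_by_layer is a hypothetical helper, the sample row is made up, and the use of None for cells with no annotation is an assumption.

```python
def labels_by_layer(row):
    """Group the labels in one getMatchAnnotations() row by their layerId."""
    grouped = {}
    for annotation in row:
        if annotation is None:
            continue  # assumed placeholder for "no annotation available"
        grouped.setdefault(annotation["layerId"], []).append(annotation["label"])
    return grouped


# made-up row for layerIds ["phonemes", "frequency"] with annotationsPerLayer=2
row = [
    {"layerId": "phonemes", "label": "D@"},
    {"layerId": "phonemes", "label": "Di"},
    {"layerId": "frequency", "label": "1000"},
    None,
]
grouped = labels_by_layer(row)
```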
- getMatches(search, wordsContext=0, pageLength=None, pageNumber=None)¶
Gets a list of tokens that were matched by search(pattern).
The search parameter can be either
a threadId returned from a previous call to search() or
a dict representing a pattern to search for.
If it is a threadId, and the task is still running, then this function will wait for it to finish.
If it is a pattern dict, then search() is called for the given pattern, the matches are retrieved, and releaseTask() is called to free the search resources. Some example patterns are shown below; for more detailed information, see search().
Example:
## a single dictionary representing a 'one column' search,
## and string values, representing regular expression pattern matching
pattern = { "orthography" : "ps.*" }
## a list containing the columns (adj defaults to 1, so matching tokens are contiguous)...
pattern = [
    { "orthography" : "the" },
    { "phonemes" : { "not" : True, "pattern" : "[cCEFHiIPqQuUV0123456789~#\$@].*" },
      "frequency" : { "max" : "2" } } ]
This function returns a list of match dictionaries, where each item has the following entries:
“Title” : The title of the LaBB-CAT instance
“Version” : The current version of the LaBB-CAT instance
“MatchId” : An ID which encodes which token in which utterance by which participant of which transcript matched.
“URL” : URL that opens the corresponding transcript page at the first matching word.
“Transcript” : The name of the transcript document that the match is from.
“Participant” : The name of the participant who uttered the match.
“Corpus” : The corpus the match comes from.
“Line” : The start time of the utterance.
“LineEnd” : The end time of the utterance.
“BeforeMatch” : The context before the match.
“Text” : The match text.
“AfterMatch” : The context after the match.
- Parameters:
search (str or dict) –
This can be either a threadId returned from a previous call to search() or a dict representing a pattern to search for.
wordsContext (int) – Number of words of context to include in the “BeforeMatch” and “AfterMatch” entries of the results.
pageLength (int or None) – The maximum number of matches to return, or None to return all.
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
- Returns:
A list of match dictionaries identifying the utterances/tokens that were matched by search(pattern), or None if the task was cancelled.
- Return type:
list of dict
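The match dictionaries can be processed with ordinary dictionary access. The sketch below formats each match as a concordance line and groups the lines by transcript, using only the “Transcript”, “BeforeMatch”, “Text” and “AfterMatch” entries listed above. The helper names and the sample matches are made up for illustration.

```python
from collections import defaultdict


def concordance_line(match):
    """Join context and match text, bracketing the matched text itself."""
    parts = [match.get("BeforeMatch", ""),
             "[" + match["Text"] + "]",
             match.get("AfterMatch", "")]
    return " ".join(part for part in parts if part)


def lines_by_transcript(matches):
    """Group concordance lines by the transcript each match comes from."""
    grouped = defaultdict(list)
    for match in matches:
        grouped[match["Transcript"]].append(concordance_line(match))
    return dict(grouped)


# made-up matches in the shape returned by getMatches(..., wordsContext=1)
matches = [
    {"Transcript": "a.trs", "BeforeMatch": "over", "Text": "the", "AfterMatch": "hill"},
    {"Transcript": "a.trs", "BeforeMatch": "", "Text": "the", "AfterMatch": "end"},
    {"Transcript": "b.trs", "BeforeMatch": "by", "Text": "the", "AfterMatch": ""},
]
lines = lines_by_transcript(matches)
```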
- getMatchingAnnotations(expression, pageLength=None, pageNumber=None)¶
Gets a list of annotations that match a particular pattern.
The expression language is loosely based on JavaScript; expressions such as the following can be used:
id == 'ew_0_456'
!/th[aeiou].*/.test(label)
first('participant').label == 'Robert' && first('utterances').start.offset == 12.345
graph.id == 'AdaAicheson-01.trs' && layer.id == 'orthography' && start.offset < 10.5
previous.id == 'ew_0_456'
NB all expressions must match by either id or layer.id.
- Parameters:
expression (str) – An expression that determines which annotations match.
pageLength (int or None) – The maximum number of annotations to return, or None to return all.
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
- Returns:
A list of matching Annotations.
- Return type:
list of dictionaries
- getMatchingParticipantIds(expression, pageLength=None, pageNumber=None)¶
Gets a list of IDs of participants that match a particular pattern.
The expression language is loosely based on JavaScript; expressions such as the following can be used:
/Ada.+/.test(id)
labels('corpus').includes('CC')
labels('participant_languages').includes('en')
labels('transcript_language').includes('en')
!/Ada.+/.test(id) && first('corpus').label == 'CC'
all('transcript_rating').length < 2
all('participant_rating').length == 0
!annotators('transcript_rating').includes('labbcat')
first('participant_gender').label == 'NA'
Helper functions such as labbcat.expressionFromCorpora() can be used to generate expressions of common types.
Example:
qbParticipants = corpus.getMatchingParticipantIds( labbcat.expressionFromCorpora("QB"))
- Parameters:
expression (str) – An expression that determines which participants match.
pageLength (int or None) – The maximum number of IDs to return, or None to return all.
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
- Returns:
A list of participant IDs.
- Return type:
list
- getMatchingTranscriptIds(expression, pageLength=None, pageNumber=None, order=None)¶
Gets a list of IDs of transcripts that match a particular pattern.
The results can be exhaustive, by omitting pageLength and pageNumber, or they can be a subset (a ‘page’) of results, by giving pageLength and pageNumber values.
The order of the list can be specified. If omitted, the transcripts are listed in ID order.
The expression language is loosely based on JavaScript; expressions such as the following can be used:
/Ada.+/.test(id)
labels('participant').includes('Robert')
['CC', 'IA', 'MU'].includes(first('corpus').label)
first('episode').label == 'Ada Aitcheson'
first('transcript_scribe').label == 'Robert'
first('participant_languages').label == 'en'
first('noise').label == 'bell'
labels('transcript_languages').includes('en')
labels('participant_languages').includes('en')
labels('noise').includes('bell')
all('transcript_languages').length > 1
all('participant_languages').length > 1
all('word').length > 100
annotators('transcript_rating').includes('Robert')
!/Ada.+/.test(id) && first('corpus').label == 'CC' && labels('participant').includes('Robert')
Helper functions such as labbcat.expressionFromAttributeValue() can be used to generate expressions of common types.
Example:
quakeFaceTranscripts = corpus.getMatchingTranscriptIds( labbcat.expressionFromAttributeValue("transcript_quakeface", "1"))
- Parameters:
expression (str) – An expression that determines which transcripts match.
pageLength (int or None) – The maximum number of IDs to return, or None to return all.
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
order (str) – The ordering for the list of IDs, a string containing a comma-separated list of expressions, each of which may be followed by “ ASC” or “ DESC”, or None for transcript ID order.
- Returns:
A list of transcript IDs.
- Return type:
list of str
- getMedia(id, trackSuffix, mimeType, startOffset=None, endOffset=None, dir=None)¶
Downloads a given media track for a given transcript.
- Parameters:
id (str) – The transcript ID.
trackSuffix (str) – The track suffix of the media.
mimeType (str) – The MIME type of the media, which may include parameters for type conversion, e.g. ‘audio/wav; samplerate=16000’
startOffset (float or None) – The start offset of the media sample, or None for the start of the whole recording.
endOffset (float or None) – The end offset of the media sample, or None for the end of the whole recording.
dir (str) – A directory in which the file should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
- Returns:
The file name of the resulting file. If dir is None, this file will be stored under the system’s temporary directory, so once processing is finished, it should be deleted by the caller, or moved to a more permanent location.
- Return type:
str
- getMediaTracks()¶
List the predefined media tracks available for transcripts.
- Returns:
An ordered list of media track definitions.
- Return type:
list of dictionaries
- getMediaUrl(id, trackSuffix, mimeType, startOffset=None, endOffset=None)¶
Gets a given media track URL for a given transcript.
- Parameters:
id (str) – The transcript ID.
trackSuffix (str) – The track suffix of the media.
mimeType (str) – The MIME type of the media, which may include parameters for type conversion, e.g. ‘audio/wav; samplerate=16000’
startOffset (float or None) – The start offset of the media sample, or None for the start of the whole recording.
endOffset (float or None) – The end offset of the media sample, or None for the end of the whole recording.
- Returns:
A URL to the given media for the given transcript, or None if the given media doesn’t exist.
- Return type:
str
- getParticipant(id)¶
Gets the participant record specified by the given identifier.
- Parameters:
id (str) – The ID of the participant, which could be their name or their database annotation ID.
- Returns:
An annotation representing the participant, or None if the participant was not found.
- Return type:
dictionary
- getParticipantAttributes(participantIds, layerIds)¶
Gets participant attribute values.
Retrieves participant attribute values for given participant IDs, saves them to a CSV file, and returns the name of the file.
In general, participant attributes are layers whose ID is prefixed ‘participant_’; formally, however, a participant attribute is any layer where layer.parentId == ‘participant’ and layer.alignment == 0.
The resulting file is the responsibility of the caller to delete when finished.
- Parameters:
participantIds (list of str) – A list of participant IDs.
layerIds (list of str) – A list of layer IDs corresponding to participant attributes.
- Returns:
The name of a CSV file with one row per participant, and one column per attribute.
- Return type:
str
- getParticipantIds()¶
Gets a list of participant IDs.
- Returns:
A list of participant IDs.
- Return type:
list
- getSerializerDescriptors()¶
Lists the descriptors of all registered serializers.
Serializers are modules that export annotation structures as a specific file format, e.g. Praat TextGrid, plain text, etc., so the mimeType of descriptors reflects what mimeTypes can be specified for getFragments().
- Returns:
A list of the descriptors of all registered serializers.
- Return type:
list of dictionaries
- getSoundFragments(transcriptIds, startOffsets=None, endOffsets=None, sampleRate=None, dir=None, prefixNames=True)¶
Downloads WAV sound fragments.
The intervals to extract can be defined in two possible ways:
transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats
transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None
- Parameters:
transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).
startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.
endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.
sampleRate (int or None) – The desired sample rate, or None for no preference.
dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
prefixNames (boolean) – Whether to prefix fragment names with a numeric serial number or not.
- Returns:
A list of WAV files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location.
- Return type:
list of str
- getSystemAttribute(attribute)¶
Gets the value of the given system attribute.
- Parameters:
attribute (str) – Name of the attribute.
- Returns:
The value of the given attribute, or None if the attribute doesn’t exist.
- Return type:
str
- getTasks()¶
Gets a list of all tasks on the server.
- Returns:
A list of all task statuses.
- Return type:
list of dictionaries
- getTranscript(id, layerIds=None)¶
Gets a transcript given its ID.
The returned object defines the annotation graph structure, and is a dictionary whose entries include:
“id” : the transcript ID
“schema” : a representation of the layer structure of the graph
“anchors” : a dictionary of temporal anchors that represent the start and/or end time of an annotation (keyed by anchor ID)
“participant” : a list of participants in the transcript. Each participant is represented by a dictionary that includes a “turn” entry which is a list of speaker turns, each turn having an “utterance” entry containing utterance boundary annotations, and a “word” entry containing a list of word tokens.
entries for ‘spanning’ layers that are not assigned to a specific participant.
Annotations are represented by dictionaries that have the following entries:
“id” : the unique identifier for the annotation
“label” : the annotation’s label
“startId” and “endId” : the start and end anchors, which correspond to an entry in the “anchors” dictionary
“confidence” : label confidence rating, where 100 means it was labelled by a human, and 50 means it was labelled by an automated process.
- Parameters:
id (str) – The given transcript ID.
layerIds (list of str) – The IDs of the layers to load, or None if only transcript data is required.
- Returns:
The identified transcript.
- Return type:
dictionary
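Since annotations refer to anchors by ID, resolving an annotation’s timing means looking up its “startId” and “endId” in the transcript’s “anchors” dictionary. The sketch below does this for a made-up transcript. annotation_duration is a hypothetical helper, and the assumption that each anchor dictionary has an “offset” entry is based on the start.offset syntax of the expression language rather than on anything stated here.

```python
def annotation_duration(transcript, annotation):
    """Return end offset minus start offset for one annotation."""
    anchors = transcript["anchors"]
    start = anchors[annotation["startId"]]["offset"]  # "offset" is assumed
    end = anchors[annotation["endId"]]["offset"]
    return end - start


# made-up fragment of a getTranscript() result
transcript = {
    "id": "example.trs",
    "anchors": {"n_1": {"offset": 1.5}, "n_2": {"offset": 2.0}},
}
word = {"id": "ew_0_1", "label": "hello", "startId": "n_1", "endId": "n_2"}
duration = annotation_duration(transcript, word)
```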
- getTranscriptAttributes(expression, layerIds, csvFileName=None)¶
Get transcript attribute values.
Retrieves transcript attribute values for a given transcript expression, saves them to a CSV file, and returns the name of the file.
The expression parameter can be an explicit list of transcript IDs, or a string query expression that identifies which transcripts to return.
The expression language is loosely based on JavaScript; expressions such as the following can be used:
/Ada.+/.test(id)
labels('participant').includes('Robert')
('CC', 'IA', 'MU').includes(first('corpus').label)
first('episode').label == 'Ada Aitcheson'
first('transcript_scribe').label == 'Robert'
first('participant_languages').label == 'en'
first('noise').label == 'bell'
labels('transcript_languages').includes('en')
labels('participant_languages').includes('en')
labels('noise').includes('bell')
all('transcript_languages').length > 1
all('participant_languages').length > 1
all('word').length > 100
annotators('transcript_rating').includes('Robert')
!/Ada.+/.test(id) && first('corpus').label == 'CC' && labels('participant').includes('Robert')
Helper functions such as labbcat.expressionFromCorpora() and labbcat.expressionFromTranscriptTypes() can be used to generate expressions of common types.
In general, transcript attributes are layers whose ID is prefixed ‘transcript_’; formally, however, a transcript attribute is any layer where layer.parentId == ‘graph’ and layer.alignment == 0, which includes ‘corpus’ as well as transcript attribute layers.
The resulting file is the responsibility of the caller to delete when finished.
Example:
import os
# duration/word count of QB corpus transcripts
qbAttributesCsv = corpus.getTranscriptAttributes(
    labbcat.expressionFromCorpora("QB"),
    ["transcript_duration", "transcript_word count"])
# speech rate for spontaneous speech recordings
spontaneousSpeechRateCsv = corpus.getTranscriptAttributes(
    labbcat.expressionFromTranscriptTypes(["monologue", "interview"]),
    ["transcript_syllables per minute"])
# language for targeted transcripts
languageCsv = corpus.getTranscriptAttributes(
    ["AP2505_Nelson.eaf", "AP2512_MattBlack.eaf"],
    ["transcript_language"])
# tidily delete CSV files (os.remove() takes one path at a time)
for csvFile in [qbAttributesCsv, spontaneousSpeechRateCsv, languageCsv]:
    os.remove(csvFile)
- Parameters:
expression (str or list of str) – An expression that determines which transcripts match, or an explicit list of transcript IDs.
layerIds (list of str) – A list of layer IDs corresponding to transcript attributes.
csvFileName (str) – The file to save the resulting CSV rows to.
- Returns:
The name of a CSV file with one row per transcript, and one column per attribute.
- Return type:
str
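Because the function returns a file name rather than the data itself, a common next step is to load the CSV rows and then delete the file. The sketch below uses only the standard library. read_attributes_csv is a hypothetical helper, and the temporary file with its column names and UTF-8 encoding is a made-up stand-in for a file returned by getTranscriptAttributes().

```python
import csv
import os
import tempfile


def read_attributes_csv(file_name, delete=True):
    """Load a CSV of attribute values as a list of dicts, then delete it."""
    with open(file_name, newline="", encoding="utf-8") as csv_file:
        rows = list(csv.DictReader(csv_file))
    if delete:
        os.remove(file_name)  # the caller is responsible for cleaning up
    return rows


# stand-in for the file name returned by getTranscriptAttributes()
with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline="", encoding="utf-8") as f:
    f.write("transcript,transcript_duration\nexample.trs,61.5\n")
    csv_name = f.name

rows = read_attributes_csv(csv_name)
```

All values are read as strings, so numeric attributes need converting (for example with float()) before use.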
- getTranscriptIds()¶
Gets a list of transcript IDs.
- Returns:
A list of transcript IDs.
- Return type:
list
- getTranscriptIdsInCorpus(id)¶
Gets a list of transcript IDs in the given corpus.
- Parameters:
id (str) – A corpus ID.
- Returns:
A list of transcript IDs.
- Return type:
list
- getTranscriptIdsWithParticipant(id)¶
Gets a list of IDs of transcripts that include the given participant.
- Parameters:
id (str) – A participant ID.
- Returns:
A list of transcript IDs.
- Return type:
list of str
- getUserInfo()¶
Gets information about the current user, including the roles or groups they are in.
- Returns:
The user record, including a “user” entry with the user ID, and a “roles” entry which is a list of str.
- Return type:
dict
- releaseTask(threadId)¶
Release a finished task, to free up server resources.
- Parameters:
threadId (str) – The ID of the task.
- search(pattern, participantIds=None, transcriptTypes=None, mainParticipant=True, aligned=False, matchesPerTranscript=None, overlapThreshold=None)¶
Searches for tokens that match the given pattern.
Example:
pattern = {"columns":[{"layers":{"orthography":{"pattern":"the"}}}]}
Strictly speaking, pattern should be a dictionary that matches the structure of the search matrix in the browser interface of LaBB-CAT; i.e. a dictionary with one entry called “columns”, which is a list of dictionaries.
Each element in the “columns” list contains a dictionary with an entry named “layers”, whose value is a dictionary for patterns to match on each layer, and optionally an element named “adj”, whose value is a number representing the maximum distance, in tokens, between this column and the next column - if “adj” is not specified, the value defaults to 1, so tokens are contiguous.
Each element in the “layers” dictionary is named after the layer it matches, and the value is a dictionary with the following possible entries:
“pattern” : A regular expression to match against the label
“min” : An inclusive minimum numeric value for the label
“max” : An exclusive maximum numeric value for the label
“not” : True to negate the match
“anchorStart” : True to anchor to the start of the annotation on this layer (i.e. the matching word token will be the first at/after the start of the matching annotation on this layer)
“anchorEnd” : True to anchor to the end of the annotation on this layer (i.e. the matching word token will be the last before/at the end of the matching annotation on this layer)
“target” : True to make this layer the target of the search; the results will contain one row for each match on the target layer
Some examples of valid pattern objects are shown below.
Example:
## words starting with 'ps...'
pattern = {"columns":[{"layers":{"orthography":{"pattern":"ps.*"}}}]}

## the word 'the' followed immediately or with one intervening word by
## a hapax legomenon (word with a frequency of 1) that doesn't start with a vowel
pattern = {
    "columns" : [
        {
            "layers" : {
                "orthography" : { "pattern" : "the" }
            },
            "adj" : 2
        },
        {
            "layers" : {
                "phonemes" : {
                    "not" : True,
                    "pattern" : "[cCEFHiIPqQuUV0123456789~#\\$@].*"
                },
                "frequency" : { "max" : "2" }
            }
        }
    ]
}
For ease of use, the function will also accept the following abbreviated forms; some examples are shown below.
Example:
## a single dictionary representing a 'one column' search,
## and string values, representing regular expression pattern matching
pattern = { "orthography" : "ps.*" }

## a list containing the columns (adj defaults to 1, so matching tokens are contiguous)...
pattern = [
    { "orthography" : "the" },
    {
        "phonemes" : { "not" : True, "pattern" : "[cCEFHiIPqQuUV0123456789~#\\$@].*" },
        "frequency" : { "max" : "2" }
    }
]
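The relationship between the abbreviated forms and the full “columns” structure can be sketched as a small normalisation step. This is an illustration only: expand_pattern is a hypothetical helper, not part of the labbcat module, and the conversion the library performs internally may differ in detail.

```python
def expand_pattern(pattern):
    """Expand an abbreviated search pattern into the full 'columns' form.

    Hypothetical helper for illustration; not part of the labbcat module.
    """
    # Already in full form: leave it untouched.
    if isinstance(pattern, dict) and "columns" in pattern:
        return pattern
    # A single dict is a one-column search; a list holds one dict per column.
    columns = [pattern] if isinstance(pattern, dict) else list(pattern)
    expanded = []
    for column in columns:
        layers = {}
        for layer_id, condition in column.items():
            # A bare string is shorthand for a regular-expression match.
            if isinstance(condition, str):
                condition = {"pattern": condition}
            layers[layer_id] = condition
        # "adj" is omitted, so the documented default of 1 applies.
        expanded.append({"layers": layers})
    return {"columns": expanded}


full = expand_pattern([{"orthography": "the"}, {"frequency": {"max": "2"}}])
print(full)
```

Either form can be passed to search(); the sketch only makes the correspondence between them explicit.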
- Parameters:
pattern – A dict representing the pattern to search for, which mirrors the Search Matrix in the browser interface.
participantIds – An optional list of participant IDs to search the utterances of. If None, all utterances in the corpus will be searched.
transcriptTypes – An optional list of transcript types to limit the results to. If None, all transcript types will be searched.
mainParticipant – True to search only main-participant utterances, False to search all utterances.
aligned – True to include only words that are aligned (i.e. have anchor confidence ≥ 50), False to include un-aligned words as well.
matchesPerTranscript – Optional maximum number of matches per transcript to return. None means all matches.
overlapThreshold – Optional percentage overlap with other utterances before simultaneous speech is excluded. None means include all overlapping utterances.
- Returns:
The threadId of the resulting task, which can be passed in to getMatches(), taskStatus(), waitForTask(), releaseTask(), etc.
- Return type:
str
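A typical search workflow chains the returned threadId through waitForTask(), getMatches() and releaseTask(). The following is a minimal sketch of that sequence; search_and_collect is a hypothetical helper name, and client is assumed to be a labbcat.LabbcatView instance.

```python
def search_and_collect(client, pattern, participant_ids=None):
    """Run a search, wait for it to finish, and return its matches.

    Hypothetical helper; `client` is assumed to be a labbcat.LabbcatView.
    """
    thread_id = client.search(pattern, participant_ids)
    try:
        # maxSeconds defaults to 0, i.e. wait until the task finishes.
        client.waitForTask(thread_id)
        return client.getMatches(thread_id)
    finally:
        # Free server resources even if fetching the matches failed.
        client.releaseTask(thread_id)


# Usage against a real server (needs the nzilbb-labbcat package and network access):
# import labbcat
# corpus = labbcat.LabbcatView("https://labbcat.canterbury.ac.nz", "demo", "demo")
# matches = search_and_collect(corpus, {"orthography": "ps.*"})
```

Releasing the task in a finally clause keeps finished searches from accumulating on the server when an exception interrupts the workflow.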
- taskResults(threadId, dir=None)¶
Gets the results of the given task, as a file or list of files.
Some tasks produce a file for download when they’re finished (e.g. getFragmentsAsync()) so this function provides access to this results file. If the results are compressed into a zip file, this function automatically unpacks the contained files.
- Parameters:
threadId (str) – The ID of the task.
dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
- Returns:
A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location. If the task has no results (yet) this function returns None.
- Return type:
list of str
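When dir is None, the caller is responsible for the temporary files. One way to handle that is to move them somewhere permanent as soon as they are returned; the sketch below assumes a hypothetical helper name (keep_task_files) and uses only the standard library.

```python
import os
import shutil


def keep_task_files(files, dest_dir):
    """Move files returned by taskResults() into dest_dir; return the new paths.

    Hypothetical helper; `files` is the list (or None) returned by taskResults().
    """
    if files is None:  # the task has no results (yet)
        return []
    os.makedirs(dest_dir, exist_ok=True)
    kept = []
    for path in files:
        target = os.path.join(dest_dir, os.path.basename(path))
        shutil.move(path, target)
        kept.append(target)
    return kept


# Usage (assumes `corpus` is a LabbcatView and `thread_id` a finished task):
# saved = keep_task_files(corpus.taskResults(thread_id), "fragments")
```

Passing a dir argument to taskResults() directly achieves the same result; the helper is mainly useful when the files were already fetched into the temporary folder.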
- taskStatus(threadId)¶
Gets the current state of the given task.
- Parameters:
threadId (str) – The ID of the task.
- Returns:
The status of the task.
- Return type:
dictionary
- versionInfo()¶
Gets version information of all components of LaBB-CAT.
Version information includes versions of all components and modules installed on the LaBB-CAT server, including format converters and annotator modules.
- Returns:
A dictionary of sections, each section a dictionary of modules indicating the version of that module.
- Return type:
dict
- waitForTask(threadId, maxSeconds=0)¶
Wait for the given task to finish.
- Parameters:
threadId (str) – The task ID.
maxSeconds (int) – The maximum time to wait for the task, or 0 for forever.
- Returns:
The final task status. To determine whether the task finished or waiting timed out, check result.running, which will be false if the task finished.
- Return type:
dict
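The timeout check described above can be combined with cancelTask() so that a task that overruns is stopped rather than left running. This is a sketch only: wait_or_cancel is a hypothetical helper, and it assumes the returned status is a dict with a boolean “running” entry.

```python
def wait_or_cancel(client, thread_id, max_seconds):
    """Wait up to max_seconds; cancel the task if it is still running.

    Returns True if the task finished, False if it was cancelled.
    Hypothetical helper; `client` is assumed to be a labbcat.LabbcatView.
    """
    status = client.waitForTask(thread_id, max_seconds)
    # Assumption: the status is a dict with a boolean "running" entry.
    if status.get("running"):
        client.cancelTask(thread_id)
        return False
    return True
```

Because cancelTask() does not release the task, the caller would still call releaseTask() afterwards in either case.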
LabbcatEdit class¶
- class labbcat.LabbcatEdit(labbcatUrl, username=None, password=None)¶
API for querying and updating a LaBB-CAT annotation graph store; a database of linguistic transcripts represented using Annotation Graphs
This class inherits the read-only operations of LabbcatView and adds some write operations for updating data, i.e. those that can be performed by users with “edit” permission.
Constructor arguments:
- Parameters:
labbcatUrl (str) – The ‘home’ URL of the LaBB-CAT server.
username (str or None) – The username for logging in to the server, if necessary.
password (str or None) – The password for logging in to the server, if necessary.
- addDictionaryEntry(managerId, dictionaryId, key, entry)¶
Adds an entry to a dictionary.
This function adds a new entry to the given dictionary. Words can have multiple entries.
- Parameters:
managerId (str) –
The layer manager ID of the dictionary, as returned by getDictionaries()
dictionaryId (str) –
The ID of the dictionary, as returned by getDictionaries().
key (str) – The key (word) in the dictionary to add an entry for.
entry (str) – The value (definition) for the given key.
- Returns:
None if the entry was added, or an error message if not.
- Return type:
str or None
- addLayerDictionaryEntry(layerId, key, entry)¶
Adds an entry to a layer dictionary.
This function adds a new entry to the dictionary that manages a given layer, and updates all affected tokens in the corpus. Words can have multiple entries.
- Parameters:
layerId (str) – The ID of the layer with a dictionary configured to manage it.
key (str) – The key (word) in the dictionary to add an entry for.
entry (str) – The value (definition) for the given key.
- Returns:
None if the entry was added, or an error message if not.
- Return type:
str or None
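Because these functions signal failure by returning an error message rather than raising an exception, callers need to check each return value. A minimal sketch of adding several entries and collecting the failures follows; add_layer_entries is a hypothetical helper name, and client is assumed to be a labbcat.LabbcatEdit instance.

```python
def add_layer_entries(client, layer_id, entries):
    """Add (key, entry) pairs to a layer dictionary and report failures.

    Returns a dict mapping each failed (key, entry) pair to its error message.
    Hypothetical helper; `client` is assumed to be a labbcat.LabbcatEdit.
    """
    failures = {}
    for key, entry in entries:
        error = client.addLayerDictionaryEntry(layer_id, key, entry)
        if error is not None:  # None means the entry was added
            failures[(key, entry)] = error
    return failures
```

An empty result means every entry was added.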
- annotatorExt(annotatorId, resource, parameters=None)¶
Retrieve annotator’s “ext” resource.
Retrieve a given resource from an annotator’s “ext” web app. Annotators are modules that perform different annotation tasks, and can optionally implement functionality for providing extra data or extending functionality in an annotator-specific way. If the annotator implements an “ext” web app, it can provide resources and implement a mechanism for interrogating the annotator. This function provides a mechanism for accessing these resources via Python.
Details about the resources available for a given annotator are available by calling getAnnotatorDescriptor(), checking the “hasExtWebapp” attribute to ensure an ‘ext’ web app is implemented, and checking the details in the “extApiInfo” attribute.
- Parameters:
annotatorId (str) – ID of the annotator to interrogate.
resource (str) – The name of the file to retrieve or instance method (function) to invoke. Possible values for this depend on the specific annotator being interrogated.
parameters (str) – Optional list of ordered parameters for the instance method (function).
- Returns:
The resource requested.
- Return type:
str
- deleteMedia(id, fileName)¶
Delete a given media or episode document file.
- Parameters:
id (str) – The ID of the transcript whose media will be deleted.
fileName (str) – The media file name, e.g. mediaFile[‘name’].
- deleteParticipant(id)¶
Deletes the given participant, and all associated meta-data.
- Parameters:
id (str) – The ID of the participant to delete.
- deleteTranscript(id)¶
Deletes the given transcript, and all associated files.
- Parameters:
id (str) – The ID of the transcript to delete.
- generateLayerUtterances(matchIds, layerId, collectionName=None)¶
Generates a layer for a given set of utterances.
This function generates annotations on a given layer for a given set of utterances, e.g. force-align selected utterances of a participant.
- Parameters:
matchIds – A list of annotation IDs, e.g. the MatchId column, or the URL column, of a results set.
layerId (str) – The ID of the layer to generate.
- Returns:
The taskId of the resulting annotation layer generation task. The task status can be updated using taskStatus().
- Return type:
str
- getAnnotatorDescriptor(annotatorId)¶
Gets annotator information.
Retrieve information about an annotator. Annotators are modules that perform different annotation tasks. This function provides information about a given annotator, for example the currently installed version of the module, what configuration parameters it requires, etc.
The returned dictionary contains the following entries:
“annotatorId” - The annotator’s unique ID
“version” - The currently installed version of the annotator.
“info” - HTML-encoded description of the function of the annotator.
“infoText” - A plain text version of “info” (converted automatically).
“hasConfigWebapp” - Determines whether the annotator includes a web-app for installation or general configuration.
“configParameterInfo” - An HTML-encoded definition of the installation config parameters, including a list of all parameters, and the encoding of the parameter string.
“hasTaskWebapp” - Determines whether the annotator includes a web-app for task parameter configuration.
“taskParameterInfo” - An HTML-encoded definition of the task parameters, including a list of all parameters, and the encoding of the parameter string.
“hasExtWebapp” - Determines whether the annotator includes an extras web-app which implements functionality for providing extra data or extending functionality in an annotator-specific way.
“extApiInfo” - An HTML-encoded document containing information about what endpoints are published by the ext web-app.
- Parameters:
annotatorId (str) – ID of the annotator module.
- Returns:
The annotator info.
- Return type:
dictionary of str
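The descriptor can be used to guard calls to annotatorExt(), as the documentation for that function suggests. The sketch below assumes a hypothetical helper name (fetch_ext_resource) and, since the exact type of the “hasExtWebapp” value is not stated here, accepts either a boolean or the string “true”.

```python
def fetch_ext_resource(client, annotator_id, resource, parameters=None):
    """Fetch an 'ext' resource only if the annotator has an 'ext' web app.

    Returns the resource, or None if no 'ext' web app is implemented.
    Hypothetical helper; `client` is assumed to be a labbcat.LabbcatEdit.
    """
    descriptor = client.getAnnotatorDescriptor(annotator_id)
    # Assumption: the flag may be a boolean or a string such as "true".
    has_ext = str(descriptor.get("hasExtWebapp", "")).lower() == "true"
    if not has_ext:
        return None
    return client.annotatorExt(annotator_id, resource, parameters)
```

The “extApiInfo” entry of the same descriptor documents which resource names the annotator accepts.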
- newTranscript(transcript, media, trackSuffix, transcriptType, corpus, episode)¶
Uploads a new transcript.
- Parameters:
transcript (str) – The path to the transcript to upload.
media (str) – The path to media to upload, if any.
trackSuffix (str) – The track suffix for the media, which can be None.
transcriptType (str) – The transcript type.
corpus (str) – The corpus for the transcript.
episode (str) – The episode the transcript belongs to.
- Returns:
A dictionary of transcript IDs (transcript names) to task threadIds. The task status can be updated using taskStatus().
- Return type:
dictionary of str
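Since the upload returns one task per transcript, a caller will usually wait for each task and then release it. A minimal sketch of that loop follows; upload_and_wait is a hypothetical helper name, and client is assumed to be a labbcat.LabbcatEdit instance.

```python
def upload_and_wait(client, transcript, media, track_suffix,
                    transcript_type, corpus, episode):
    """Upload a transcript, then wait for and release each resulting task.

    Returns a dict mapping each transcript ID to its final task status.
    Hypothetical helper; `client` is assumed to be a labbcat.LabbcatEdit.
    """
    tasks = client.newTranscript(
        transcript, media, track_suffix, transcript_type, corpus, episode)
    statuses = {}
    for transcript_id, thread_id in tasks.items():
        # maxSeconds defaults to 0, i.e. wait until the task finishes.
        statuses[transcript_id] = client.waitForTask(thread_id)
        client.releaseTask(thread_id)
    return statuses
```

The same loop applies to the dictionary returned by updateTranscript().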
- removeDictionaryEntry(managerId, dictionaryId, key, entry=None)¶
Removes an entry from a dictionary.
This function removes an existing entry from the given dictionary. Words can have multiple entries.
- Parameters:
managerId (str) – The layer manager ID of the dictionary, as returned by getDictionaries().
dictionaryId (str) –
The ID of the dictionary, as returned by getDictionaries().
key (str) – The key (word) in the dictionary to remove an entry for.
entry (str) – The value (definition) to remove, or None to remove all the entries for key.
- Returns:
None if the entry was removed, or an error message if not.
- Return type:
str or None
- removeLayerDictionaryEntry(layerId, key, entry=None)¶
Removes an entry from a layer dictionary.
This function removes an existing entry from the dictionary that manages a given layer, and updates all affected tokens in the corpus. Words can have multiple entries.
- Parameters:
layerId (str) – The ID of the layer with a dictionary configured to manage it.
key (str) – The key (word) in the dictionary to remove an entry for.
entry (str) – The value (definition) to remove, or None to remove all the entries for key.
- Returns:
None if the entry was removed, or an error message if not.
- Return type:
str or None
- saveEpisodeDocument(id, document)¶
Saves the given document for the episode of the given transcript.
- Parameters:
id (str) – The transcript ID.
document (str) – The path to the document to upload.
- Returns:
A dictionary of attributes of the document file (name, url, etc.).
- Return type:
dictionary of str
- saveMedia(id, media, trackSuffix)¶
Saves the given media for the given transcript.
- Parameters:
id (str) – The transcript ID.
media (str) – The path to media to upload.
trackSuffix (str) – The track suffix for the media.
- Returns:
A dictionary of attributes of the media file (name, url, etc.).
- Return type:
dictionary of str
- saveParticipant(id, label, attributes)¶
Saves a participant, and all its tags, to the graph store.
To change the ID of an existing participant, pass the old/current ID as the id, and pass the new ID as the label. If the participant ID does not already exist in the database, a new participant record is created.
- Parameters:
id (str) – The ID of the participant to update.
label (str) – The new ID (name) for the participant.
attributes (dictionary of str) – Participant attribute values - the names are the participant attribute layer IDs, and the values are the corresponding new attribute values. The pass phrase for participant access can also be set by specifying a “_password” attribute.
- Returns:
True if the participant was updated, False if there were no changes to update.
- Return type:
boolean
- updateFragment(fragment)¶
Update a transcript fragment.
This function uploads a file (e.g. Praat TextGrid) representing a fragment of a transcript, with annotations or alignments to update in LaBB-CAT’s version of the transcript.
- Parameters:
fragment (str) – The path to the fragment to upload.
- Returns:
A dictionary with information about the fragment that was updated, including URL, start_time, and end_time
- Return type:
dictionary of str
- updateTranscript(transcript, suppressGeneration=False)¶
Uploads a new version of an existing transcript.
- Parameters:
transcript (str) – The path to the transcript to upload.
suppressGeneration (boolean) – False (the default) to run automatic layer generation, True to suppress automatic layer generation.
- Returns:
A dictionary of transcript IDs (transcript names) to task threadIds. The task status can be updated using taskStatus().
- Return type:
dictionary of str
The LabbcatEdit class inherits from the LabbcatView class.
LabbcatAdmin class¶
- class labbcat.LabbcatAdmin(labbcatUrl, username=None, password=None)¶
API for querying, updating, and administering a LaBB-CAT annotation graph store; a database of linguistic transcripts represented using Annotation Graphs
This class inherits the read-write operations of LabbcatEdit and adds some administration operations, including definition of layers, registration of converters, etc., i.e. those that can be performed by users with “admin” permission.
Constructor arguments:
- Parameters:
labbcatUrl (str) – The ‘home’ URL of the LaBB-CAT server.
username (str or None) – The username for logging in to the server, if necessary.
password (str or None) – The password for logging in to the server, if necessary.
- createCategory(class_id, category, description, display_order)¶
Creates a new category record.
The dictionary returned has the following entries:
“class_id” : What kind of attributes are categorised - “transcript” or “speaker”.
“category” : The name/id of the category.
“description” : The description of the category.
“display_order” : Where the category appears among other categories.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.
category (str) – The name/id of the category.
description (str) – The description of the category.
display_order (number) – Where the category appears among other categories.
- Returns:
A copy of the category record
- Return type:
dict
- createCorpus(corpus_name, corpus_language, corpus_description)¶
Creates a new corpus record.
The dictionary returned has the following entries:
“corpus_id” : The database key for the record.
“corpus_name” : The name/id of the corpus.
“corpus_language” : The ISO 639-1 code for the default language.
“corpus_description” : The description of the corpus.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
corpus_name (str) – The name/id of the corpus.
corpus_language (str) – The ISO 639-1 code for the default language.
corpus_description (str) – The description of the corpus.
- Returns:
A copy of the corpus record
- Return type:
dict
- createMediaTrack(suffix, description, display_order)¶
Creates a new media track record.
The dictionary returned has the following entries:
“suffix” : The suffix associated with the media track.
“description” : The description of the media track.
“display_order” : The position of the track amongst other tracks.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
suffix (str) – The suffix associated with the media track.
description (str) – The description of the media track.
display_order (str) – The position of the track amongst other tracks.
- Returns:
A copy of the media track record
- Return type:
dict
- createProject(project, description)¶
Deprecated as ‘projects’ are now categories with classId = ‘layer’ - use createCategory instead.
- Parameters:
project (str) – The name/id of the project.
description (str) – The description of the project.
- Returns:
A copy of the project record
- Return type:
dict
- createRole(role_id, description)¶
Creates a new role record.
The dictionary returned has the following entries:
“role_id” : The name/id of the role.
“description” : The description of the role.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
role_id (str) – The name/id of the role.
description (str) – The description of the role.
- Returns:
A copy of the role record
- Return type:
dict
- createRolePermission(role_id, entity, layer, value_pattern)¶
Creates a new role permission record.
The dictionary returned has the following entries:
“role_id” : The ID of the role this permission applies to.
“entity” : The media entity this permission applies to - a string made up of “t” (transcript), “a” (audio), “v” (video), or “i” (image).
“layer” : ID of the layer for which the label determines access. This is either a valid transcript attribute layer ID, or “corpus”.
“value_pattern” : Regular expression for matching against the layerId label. If the regular expression matches the label, access is allowed.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
role_id (str) – The ID of the role this permission applies to.
entity (str) – The media entity this permission applies to.
layer (str) – ID of the layer for which the label determines access.
value_pattern (str) – Regular expression for matching against.
- Returns:
A copy of the role permission record
- Return type:
dict
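The “entity” value is a string assembled from the single-letter codes listed above. A small sketch of building it from boolean flags follows; entity_code is a hypothetical helper, and the letter order it produces (t, a, v, i) is an assumption, since the documentation does not say whether order matters.

```python
def entity_code(transcript=False, audio=False, video=False, image=False):
    """Build the "entity" string for a role permission from boolean flags.

    Hypothetical helper; the t/a/v/i ordering is an assumption.
    """
    flags = (("t", transcript), ("a", audio), ("v", video), ("i", image))
    return "".join(letter for letter, enabled in flags if enabled)


print(entity_code(transcript=True, audio=True))  # prints "ta"
```

The result could then be passed as the entity argument of createRolePermission().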
- createUser(user, email, resetPassword, roles)¶
Creates a new user record.
The dictionary returned has the following entries:
“user” : The id of the user.
“email” : The email address of the user.
“resetPassword” : Whether the user must reset their password when they next log in.
“roles” : Roles or groups the user belongs to.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
user (str) – The ID of the user.
email (str) – The email address of the user.
resetPassword (boolean) – Whether the user must reset their password when they next log in.
roles (list of str) – Roles or groups the user belongs to.
- Returns:
A copy of the user record
- Return type:
dict
- deleteCategory(class_id, category)¶
Deletes an existing category record.
- Parameters:
class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.
category (str) – The name/id of the category.
- deleteCorpus(corpus_name)¶
Deletes an existing corpus record.
- Parameters:
corpus_name (str) – The name/id of the corpus.
- deleteLayer(id)¶
Deletes a layer.
- Parameters:
id (str) – The layer ID
- deleteLexicon(lexicon)¶
Delete a previously loaded lexicon.
By default LaBB-CAT includes a layer manager called the Flat Lexicon Tagger, which can be configured to annotate words with data from a dictionary loaded from a plain text file (e.g. a CSV file).
- Parameters:
lexicon (str) – The name of the lexicon to delete. e.g. ‘cmudict’
- Returns:
None if the deletion was successful, or an error message if not.
- Return type:
str or None
- deleteMediaTrack(suffix)¶
Deletes an existing media track record.
- Parameters:
suffix (str) – The suffix associated with the media track.
- deleteProject(project)¶
Deletes an existing project record.
- Parameters:
project (str) – The name/id of the project.
- deleteRole(role_id)¶
Deletes an existing role record.
- Parameters:
role_id (str) – The name/id of the role.
- deleteRolePermission(role_id, entity)¶
Deletes an existing role permission record.
- Parameters:
role_id (str) – The ID of the role this permission applies to.
entity (str) – The media entity this permission applies to.
- deleteUser(user)¶
Deletes an existing user record.
- Parameters:
user (str) – The ID of the user.
- generateLayer(layerId)¶
Generates a layer.
This function generates annotations on a given layer for all transcripts in the corpus.
- Parameters:
layerId (str) – The ID of the layer to generate.
- Returns:
The taskId of the resulting annotation layer generation task. The task status can be updated using taskStatus().
- Return type:
str
- loadLexicon(file, lexicon, fieldDelimiter, fieldNames, quote=None, comment=None, skipFirstLine=False)¶
Upload a flat lexicon file for lexical tagging.
By default LaBB-CAT includes a layer manager called the Flat Lexicon Tagger, which can be configured to annotate words with data from a dictionary loaded from a plain text file (e.g. a CSV file). The file must have a ‘flat’ structure in the sense that it’s a simple list of dictionary entries with a fixed number of columns/fields, rather than having a complex structure.
- Parameters:
file – The full path name of the lexicon file.
lexicon (str) – The name for the resulting lexicon, e.g. ‘cmudict’. If the named lexicon already exists, it will be completely replaced with the contents of the file (i.e. all existing entries will be deleted before adding new entries from the file).
fieldDelimiter (str) – The character used to delimit fields in the file, e.g. ‘,’ for Comma Separated Values (CSV) files. If this is “ - ”, rows are split on only the first space, in line with common dictionary formats.
fieldNames (str) – A list of field names, delimited by fieldDelimiter, e.g. ‘Word,Pronunciation’.
quote (str) – The character used to quote field values (if any), e.g. ‘”’.
comment (str) – The character used to indicate a line is a comment (not an entry) (if any) e.g. ‘#’.
skipFirstLine (boolean) – Whether to ignore the first line of the file (because it contains field names).
- Returns:
None if the upload was successful, or an error message if not.
- Return type:
str or None
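As an illustration of the ‘flat’ file layout, the sketch below writes a two-column CSV file with a header line and uploads it. The helper name (upload_word_list) and the field names are hypothetical, client is assumed to be a labbcat.LabbcatAdmin instance, and the sketch assumes the upload has completed by the time loadLexicon() returns, so the temporary file can be removed.

```python
import csv
import os
import tempfile


def upload_word_list(client, lexicon, rows):
    """Write (word, pronunciation) rows to a CSV file and load it as a lexicon.

    Returns None on success, or the error message from loadLexicon().
    Hypothetical helper; `client` is assumed to be a labbcat.LabbcatAdmin.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["Word", "Pronunciation"])  # header line
            writer.writerows(rows)
        # skipFirstLine=True because the first line holds the field names.
        return client.loadLexicon(
            path, lexicon, ",", "Word,Pronunciation",
            quote='"', skipFirstLine=True)
    finally:
        # Assumption: the upload is complete once loadLexicon() returns.
        os.remove(path)
```

Note that an existing lexicon with the same name is replaced entirely, so the rows passed in should be the complete word list.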
- newLayer(id, parentId, description, alignment, peers, peersOverlap, parentIncludes, saturated, type, validLabels={}, category=None, annotatorId=None, annotatorTaskParameters=None)¶
Creates a new layer.
- Parameters:
id (str) – The layer ID
parentId (str) – The layer’s parent layer id.
description (str) – The description of the layer.
alignment (number) – The layer’s alignment - 0 for none, 1 for point alignment, 2 for interval alignment.
peers (boolean) – Whether children on this layer have peers or not.
peersOverlap (boolean) – Whether child peers on this layer can overlap or not.
parentIncludes (boolean) – Whether the parent temporally includes the child.
saturated (boolean) – Whether children must temporally fill the entire parent duration (true) or not (false).
type (str) – The type for labels on this layer, e.g. string, number, boolean, ipa.
validLabels (dict) – A dictionary of valid label values for this layer, or an empty dictionary if the layer values are not restricted. Each key is a possible label value, and is associated with a description of the value (e.g. for displaying to users).
category (str) – Category for the layer, if any.
annotatorId (str) – The ID of the layer manager that automatically fills in annotations on the layer, if any
annotatorTaskParameters (str) – The configuration the layer manager should use when filling the layer with annotations. This is a string whose format is specific to each layer manager.
- Returns:
The resulting layer definition.
- Return type:
dict
- readCategories(class_id, pageNumber=None, pageLength=None)¶
Reads a list of category records.
The dictionaries in the returned list have the following entries:
“class_id” : What kind of attributes are categorised - “transcript” or “speaker”.
“category” : The name/id of the category.
“description” : The description of the category.
“display_order” : Where the category appears among other categories.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
pageLength (int or None) – The maximum number of records to return, or None to return all.
- Returns:
A list of category records.
- Return type:
list of dict
- readCorpora(pageNumber=None, pageLength=None)¶
Reads a list of corpus records.
The dictionaries in the returned list have the following entries:
“corpus_id” : The database key for the record.
“corpus_name” : The name/id of the corpus.
“corpus_language” : The ISO 639-1 code for the default language.
“corpus_description” : The description of the corpus.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
pageLength (int or None) – The maximum number of records to return, or None to return all.
- Returns:
A list of corpus records.
- Return type:
list of dict
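The pageNumber and pageLength parameters shared by the read functions allow large lists to be fetched in chunks. A minimal sketch of such a loop follows; read_all is a hypothetical helper, and it assumes that a page shorter than pageLength marks the end of the list.

```python
def read_all(read_page, page_length=20):
    """Collect every record from a paged read function.

    `read_page` is any callable taking (pageNumber, pageLength), e.g. the
    readCorpora method of a LabbcatAdmin instance. Hypothetical helper.
    """
    records = []
    page_number = 0  # page numbers are zero-based
    while True:
        page = read_page(page_number, page_length)
        records.extend(page)
        # Assumption: a page shorter than page_length is the last one.
        if len(page) < page_length:
            return records
        page_number += 1


# Usage (assumes `admin` is a labbcat.LabbcatAdmin instance):
# corpora = read_all(admin.readCorpora)
# categories = read_all(lambda n, length: admin.readCategories("transcript", n, length))
```

For small lists, calling the read function with no paging arguments returns everything in one request.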
- readMediaTracks(pageNumber=None, pageLength=None)¶
Reads a list of media track records.
The dictionaries in the returned list have the following entries:
“suffix” : The suffix associated with the media track.
“description” : The description of the media track.
“display_order” : The position of the track amongst other tracks.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
pageLength (int or None) – The maximum number of records to return, or None to return all.
- Returns:
A list of media track records.
- Return type:
list of dict
- readProjects(pageNumber=None, pageLength=None)¶
Deprecated as ‘projects’ are now categories with classId = ‘layer’ - use readCategories(‘layer’) instead.
- Parameters:
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
pageLength (int or None) – The maximum number of records to return, or None to return all.
- Returns:
A list of project records.
- Return type:
list of dict
- readRolePermissions(role_id, pageNumber=None, pageLength=None)¶
Reads a list of role permission records.
The dictionaries in the returned list have the following entries:
“role_id” : The ID of the role this permission applies to.
“entity” : The media entity this permission applies to - a string made up of “t” (transcript), “a” (audio), “v” (video), or “i” (image).
“layer” : ID of the layer for which the label determines access. This is either a valid transcript attribute layer ID, or “corpus”.
“value_pattern” : Regular expression for matching against the layerId label. If the regular expression matches the label, access is allowed.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
role_id (str) – The ID of the role this permission applies to.
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
pageLength (int or None) – The maximum number of records to return, or None to return all.
- Returns:
A list of role permission records.
- Return type:
list of dict
- readRoles(pageNumber=None, pageLength=None)¶
Reads a list of role records.
The dictionaries in the returned list have the following entries:
“role_id” : The name/id of the role.
“description” : The description of the role.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
pageLength (int or None) – The maximum number of records to return, or None to return all.
- Returns:
A list of role records.
- Return type:
list of dict
- readSystemAttributes()¶
Reads a list of system attribute records.
The dictionaries in the returned list have the following entries:
“attribute” : ID of the attribute.
“type” : The type of the attribute - “string”, “boolean”, “select”, etc.
“style” : UI style, which depends on “type”.
“label” : User-facing label for the attribute.
“description” : User-facing (long) description for the attribute.
“options” : If “type” == “select”, this is a dict defining possible values.
“value” : The value of the attribute.
- Returns:
A list of system attribute records.
- Return type:
list of dict
- readUsers(pageNumber=None, pageLength=None)¶
Reads a list of user records.
The dictionaries in the returned list have the following entries:
“user” : The id of the user.
“email” : The email address of the user.
“resetPassword” : Whether the user must reset their password when they next log in.
“roles” : Roles or groups the user belongs to.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
pageLength (int or None) – The maximum number of records to return, or None to return all.
- Returns:
A list of user records.
- Return type:
list of dict
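The “_cantDelete” entry makes it possible to tell in advance which records the server would refuse to delete. A small sketch of filtering on it follows; deletable is a hypothetical helper, and it assumes the entry is simply absent (or empty) for records that can be deleted.

```python
def deletable(records):
    """Return the records that carry no "_cantDelete" reason.

    Hypothetical helper; works on any list of record dicts returned by
    the read functions.
    """
    return [record for record in records if not record.get("_cantDelete")]


# Usage (assumes `admin` is a labbcat.LabbcatAdmin instance):
# for user in deletable(admin.readUsers()):
#     print(user["user"])
```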
- saveLayer(id, parentId, description, alignment, peers, peersOverlap, parentIncludes, saturated, type, validLabels, category)¶
Saves changes to a layer.
- Parameters:
id (str) – The layer ID
parentId (str) – The layer’s parent layer id.
description (str) – The description of the layer.
alignment (number) – The layer’s alignment - 0 for none, 1 for point alignment, 2 for interval alignment.
peers (boolean) – Whether children on this layer have peers or not.
peersOverlap (boolean) – Whether child peers on this layer can overlap or not.
parentIncludes (boolean) – Whether the parent temporally includes the child.
saturated (boolean) – Whether children must temporally fill the entire parent duration (true) or not (false).
type (str) – The type for labels on this layer, e.g. string, number, boolean, ipa.
validLabels (dict) – A dictionary of valid label values for this layer, or None if the layer values are not restricted. Each key is a possible label value, and each key is associated with a description of the value (e.g. for displaying to users).
category (str) – Category for the layer, if any.
- Returns:
The resulting layer definition.
- Return type:
dict
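As an illustration of how the parameters fit together, here is a sketch of a saveLayer call that restricts a layer to a fixed set of labels. The layer ID "pos", the parent "word", the category "syntax" and the label set are all hypothetical values chosen for the example, and client is assumed to be a connected LabbcatAdmin object.

```python
def save_part_of_speech_layer(client):
    """Save a (hypothetical) 'pos' layer whose labels are restricted."""
    # keys are the permitted label values, values are user-facing descriptions
    valid_labels = {"N": "Noun", "V": "Verb", "ADJ": "Adjective"}
    return client.saveLayer(
        "pos",             # id (hypothetical)
        "word",            # parentId (hypothetical)
        "Part of speech",  # description
        2,                 # alignment: 2 = interval alignment
        True,              # peers
        False,             # peersOverlap
        True,              # parentIncludes
        False,             # saturated
        "string",          # type
        valid_labels,      # validLabels
        "syntax")          # category (hypothetical)
```

The return value is the resulting layer definition as a dict, as documented above.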
- setPassword(user, password, resetPassword)¶
Sets a given user’s password.
- Parameters:
user (str) – The ID of the user.
password – The new password.
resetPassword (boolean) – Whether the user must reset their password when they next log in.
- updateCategory(class_id, category, description, display_order)¶
Updates an existing category record.
The dictionary returned has the following entries:
“class_id” : What kind of attributes are categorised - “transcript” or “speaker”.
“category” : The name/id of the category.
“description” : The description of the category.
“display_order” : Where the category appears among other categories.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.
category (str) – The name/id of the category.
description (str) – The description of the category.
display_order (number) – Where the category appears among other categories.
- Returns:
A copy of the category record
- Return type:
dict
- updateCorpus(corpus_name, corpus_language, corpus_description)¶
Updates an existing corpus record.
The dictionary returned has the following entries:
“corpus_id” : The database key for the record.
“corpus_name” : The name/id of the corpus.
“corpus_language” : The ISO 639-1 code for the default language.
“corpus_description” : The description of the corpus.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
corpus_name (str) – The name/id of the corpus.
corpus_language (str) – The ISO 639-1 code for the default language.
corpus_description (str) – The description of the corpus.
- Returns:
A copy of the corpus record
- Return type:
dict
- updateMediaTrack(suffix, description, display_order)¶
Updates an existing media track record.
The dictionary returned has the following entries:
“suffix” : The suffix associated with the media track.
“description” : The description of the media track.
“display_order” : The position of the track amongst other tracks.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
suffix (str) – The suffix associated with the media track.
description (str) – The description of the media track.
display_order (str) – The position of the track amongst other tracks.
- Returns:
A copy of the media track record
- Return type:
dict
- updateProject(project, description)¶
Deprecated as ‘projects’ are now categories with classId = ‘layer’ - use updateCategory instead.
- Parameters:
project (str) – The name/id of the project.
description (str) – The description of the project.
- Returns:
A copy of the project record
- Return type:
dict
- updateRole(role_id, description)¶
Updates an existing role record.
The dictionary returned has the following entries:
“role_id” : The name/id of the role.
“description” : The description of the role.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
role_id (str) – The name/id of the role.
description (str) – The description of the role.
- Returns:
A copy of the role record
- Return type:
dict
- updateRolePermission(role_id, entity, layer, value_pattern)¶
Updates an existing role permission record.
The dictionary returned has the following entries:
“role_id” : The ID of the role this permission applies to.
“entity” : The media entity this permission applies to - a string made up of “t” (transcript), “a” (audio), “v” (video), or “i” (image).
“layer” : ID of the layer for which the label determines access. This is either a valid transcript attribute layer ID, or “corpus”.
“value_pattern” : Regular expression for matching against the layerId label. If the regular expression matches the label, access is allowed.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
role_id (str) – The ID of the role this permission applies to.
entity (str) – The media entity this permission applies to.
layer (str) – ID of the layer for which the label determines access.
value_pattern (str) – Regular expression for matching against.
- Returns:
A copy of the role permission record
- Return type:
dict
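The entity and value_pattern values can be prepared and sanity-checked locally before they are sent to the server. The sketch below is only an approximation: entity_string and label_allowed are hypothetical helper names, Python's re module may differ in dialect from the server's regular expression engine, and whole-label matching is an assumption rather than documented behaviour.

```python
import re


def entity_string(transcript=False, audio=False, video=False, image=False):
    """Build the 'entity' value from the letters t, a, v and i."""
    flags = (("t", transcript), ("a", audio), ("v", video), ("i", image))
    return "".join(letter for letter, wanted in flags if wanted)


def label_allowed(value_pattern, label):
    """Approximate the access check locally.

    Assumption: the pattern must match the whole label; the server may
    apply the expression differently.
    """
    return re.fullmatch(value_pattern, label) is not None
```

For example, entity_string(transcript=True, audio=True) gives "ta", which could be passed as the entity argument of updateRolePermission together with a pattern that has been tried out with label_allowed.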
- updateSystemAttribute(attribute, value)¶
Updates the value of an existing system attribute record.
The dictionary returned has the following entries:
“attribute” : ID of the attribute.
“type” : The type of the attribute - “string”, “boolean”, “select”, etc.
“style” : UI style, which depends on “type”.
“label” : User-facing label for the attribute.
“description” : User-facing (long) description for the attribute.
“options” : If “type” == “select”, this is a dict defining possible values.
“value” : The value of the attribute.
- Parameters:
attribute – ID of the attribute.
value (str) – The new value for the attribute.
- Returns:
A copy of the systemAttribute record
- Return type:
dict
- updateUser(user, email, resetPassword, roles)¶
Updates an existing user record.
The dictionary returned has the following entries:
“user” : The id of the user.
“email” : The email address of the user.
“resetPassword” : Whether the user must reset their password when they next log in.
“roles” : Roles or groups the user belongs to.
“_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
- Parameters:
user (str) – The ID of the user.
email (str) – The email address of the user.
resetPassword (boolean) – Whether the user must reset their password when they next log in.
roles (list of str) – Roles or groups the user belongs to.
- Returns:
A copy of the user record
- Return type:
dict
The LabbcatAdmin class also inherits the LabbcatEdit class.
Query Language Generation Functions¶
- labbcat.expressionFromAttributeValue(attribute, values, negate=False)¶
Generates a query expression for matching a transcript/participant attribute.
This function generates a query expression fragment which can be passed as the expression parameter of getMatchingTranscriptIds or getMatchingParticipantIds etc. using a list of possible values for a given transcript/participant attribute.
The attribute defined by ‘attribute’ is expected to have exactly one value. If it may have multiple values, use expressionFromAttributeValues() instead.
- Parameters:
attribute (str) – The transcript/participant attribute to filter by.
values (list or str) – A list of possible values for attribute, or a single value.
negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
- Returns:
A query expression which can be passed as the expression parameter of countMatchingParticipantIds(), getMatchingParticipantIds(), countMatchingTranscriptIds(), getMatchingTranscriptIds() or getTranscriptAttributes()
- Return type:
str
- labbcat.expressionFromAttributeValues(attribute, values, negate=False)¶
Generates a query expression for matching a transcript/participant attribute.
This function generates a query expression fragment which can be passed as the expression parameter of getMatchingTranscriptIds or getMatchingParticipantIds etc. using a list of possible values for a given transcript/participant attribute.
The attribute defined by ‘attribute’ is expected to have possibly more than one value. If it can only have one value, use expressionFromAttributeValue() instead.
- Parameters:
attribute (str) – The transcript/participant attribute to filter by.
values (list or str) – A list of possible values for attribute, or a single value.
negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
- Returns:
A query expression which can be passed as the expression parameter of countMatchingParticipantIds(), getMatchingParticipantIds(), countMatchingTranscriptIds(), getMatchingTranscriptIds() or getTranscriptAttributes()
- Return type:
str
- labbcat.expressionFromIds(ids, negate=False)¶
Generates a query expression for matching transcripts or participants by ID.
This function generates a query expression fragment which can be passed as the expression parameter of getTranscriptAttributes etc. using a list of IDs.
- Parameters:
ids (list or str) – A list of IDs, or a single value.
negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
- Returns:
A query expression which can be passed as the expression parameter of countMatchingParticipantIds(), getMatchingParticipantIds(), countMatchingTranscriptIds(), getMatchingTranscriptIds() or getTranscriptAttributes()
- Return type:
str
- labbcat.expressionFromTranscriptTypes(transcriptTypes, negate=False)¶
Generates a transcript query expression for matching transcripts by type.
This function generates a query expression fragment which can be passed as the expression parameter of getTranscriptAttributes or getMatchingTranscriptIds etc. using a list of transcript types.
- Parameters:
transcriptTypes (list or str) – A list of transcript types, or a single transcript type.
negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
- Returns:
A query expression which can be passed as the expression parameter of getMatchingTranscriptIds() or getTranscriptAttributes()
- Return type:
str
- labbcat.expressionFromCorpora(corpora, negate=False)¶
Generates a transcript query expression for matching transcripts/participants by corpus.
This function generates a query expression fragment which can be passed as the expression parameter of getTranscriptAttributes or getMatchingTranscriptIds etc. using a list of corpora.
- Parameters:
corpora (list or str) – A list of corpus names, or a single corpus name.
negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
- Returns:
A query expression which can be passed as the expression parameter of getMatchingTranscriptIds() or getTranscriptAttributes() etc.
- Return type:
str
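The generated expression is a plain string, so it can be handed straight to a query method. A minimal sketch, assuming client is a connected LabbcatView (or subclass) object; transcript_ids_in_corpora is a hypothetical helper name, and getMatchingTranscriptIds is called here with only the expression argument.

```python
def transcript_ids_in_corpora(client, corpora, negate=False):
    """Return the IDs of transcripts in (or, if negate, outside) the corpora."""
    # imported here so that the sketch can be loaded without the package installed
    import labbcat

    expression = labbcat.expressionFromCorpora(corpora, negate)
    return client.getMatchingTranscriptIds(expression)
```

For example, transcript_ids_in_corpora(client, ["CC", "QB"]) would list the transcripts in two corpora, where "CC" and "QB" stand for whatever corpus names exist on the server. The other expressionFrom... functions can be used in the same way.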
ResponseException class¶
- class labbcat.ResponseException(response)¶
Any method that creates a server request can raise this exception if an error occurs.
This has one attribute, response, which is a Response object representing the full response from the server, from which error messages etc. can be obtained.
- class labbcat.Response(resp, verbose=False)¶
Standard LaBB-CAT response object.
- Attributes:
model - The model or result returned if any.
httpStatus - The HTTP status code, or -1 if not known.
title - The title returned by the server.
version - The server version.
code - The numeric request code (0 or 1 means no error).
errors - Errors returned.
messages - Messages returned.
text - The full plain text of the HTTP response.
- checkForErrors()¶
Convenience method for checking whether the response contains any errors.
If so, a corresponding ResponseException will be raised.
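When a ResponseException is caught, the attached Response object can be turned into a readable report. The helper below is a sketch only: describe_failure is a hypothetical name, and it assumes that errors is a list of strings (or None).

```python
def describe_failure(exception):
    """Summarise the server response attached to a ResponseException."""
    response = exception.response
    lines = ["HTTP status: %s, request code: %s"
             % (response.httpStatus, response.code)]
    # assumption: 'errors' is a list of message strings, possibly None
    for error in response.errors or []:
        lines.append("error: %s" % error)
    return "\n".join(lines)
```

A typical use would be to wrap a request in try/except, catch labbcat.ResponseException, and print the result of describe_failure for the caught exception.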