nzilbb-labbcat module

LabbcatView class

class labbcat.LabbcatView(labbcatUrl, username=None, password=None)

API for querying a LaBB-CAT annotation graph store, a database of linguistic transcripts represented using Annotation Graphs.

This interface provides only read-only operations, i.e. those that can be performed by users with “view” permission.

Constructor arguments:

Parameters:
  • labbcatUrl (str) – The ‘home’ URL of the LaBB-CAT server.
  • username (str or None) – The username for logging in to the server, if necessary.
  • password (str or None) – The password for logging in to the server, if necessary.
Attributes:
language: The language code for server message localization, e.g. “es-AR”

Example:

import labbcat

# create annotation store client
corpus = labbcat.LabbcatView("https://labbcat.canterbury.ac.nz", "demo", "demo")

# show some basic information

print("Information about LaBB-CAT at " + corpus.getId())

layerIds = corpus.getLayerIds()
for layerId in layerIds: 
    print("layer: " + layerId) 

corpora = corpus.getCorpusIds()
for c in corpora:
    print("transcripts in: " + c)
    for transcript in corpus.getTranscriptIdsInCorpus(c):
        print(" " + transcript)
allUtterances(participantIds, transcriptTypes=None, mainParticipant=True)

Identifies all utterances by the given participants.

A threadId is returned. To get the actual utterances, which are represented the same way as search results, call getMatches().

Parameters:
  • participantIds – A list of participant IDs to identify the utterances of.
  • transcriptTypes – An optional list of transcript types to limit the results to. If None, all transcript types will be searched.
  • mainParticipant – True to search only main-participant utterances, False to search all utterances.
Returns:

The threadId of the resulting task, which can be passed in to getMatches(), taskStatus(), waitForTask(), releaseTask(), etc.

Return type:

str
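
The start/fetch/release sequence described above can be wrapped in one helper. This is a minimal sketch rather than part of the library: utterances_by is a hypothetical name, and corpus is assumed to be a LabbcatView instance (or any object offering allUtterances(), getMatches() and releaseTask() as documented here).

```python
def utterances_by(corpus, participantIds, transcriptTypes=None, mainParticipant=True):
    """Return the utterance matches for the given participants, then free the task."""
    # start the server task; only its threadId comes back at this point
    threadId = corpus.allUtterances(participantIds, transcriptTypes, mainParticipant)
    try:
        # getMatches() is documented as waiting for a running task to finish
        return corpus.getMatches(threadId)
    finally:
        # release server resources whether or not getMatches() succeeded
        corpus.releaseTask(threadId)
```

Releasing in a finally clause means an exception while fetching the matches does not leave the task occupying server resources.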

cancelTask(threadId)

Cancels (but does not release) a running task.

Parameters:threadId (str) – The ID of the task.
countAnnotations(id, layerId, maxOrdinal=None)

Gets the number of annotations on the given layer of the given transcript.

Parameters:
  • id (str) – The ID of the transcript.
  • layerId (str) – The ID of the layer.
  • maxOrdinal (int or None) – The maximum ordinal for the counted annotations. e.g. a maxOrdinal of 1 will ensure that only the first annotation for each parent is returned. If maxOrdinal is None, then all annotations are counted, regardless of their ordinal.
Returns:

The number of annotations on the given layer of the given transcript.

Return type:

int

countMatchingAnnotations(expression)

Counts the number of annotations that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • id == 'ew_0_456'
  • !/th[aeiou].//.test(label)
  • first('participant').label == 'Robert' && first('utterances').start.offset == 12.345
  • graph.id == 'AdaAicheson-01.trs' && layer.id == 'orthography' && start.offset < 10.5
  • previous.id == 'ew_0_456'

NB all expressions must match by either id or layer.id.

Parameters:expression (str) – An expression that determines which annotations match.
Returns:The number of matching annotations.
Return type:int
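
Expressions of the graph.id/layer.id form shown above can be assembled from Python values. The following is a sketch only: annotation_expression is a hypothetical helper, not a function of the labbcat module, and because the quoting rules of the expression language are not described here, it simply refuses IDs that contain a single quote.

```python
def annotation_expression(transcriptId, layerId, before=None):
    """Build an expression matching annotations on one layer of one transcript."""
    for value in (transcriptId, layerId):
        if "'" in value:
            # escaping rules are not documented here, so do not guess at them
            raise ValueError("IDs containing single quotes are not handled: " + value)
    expression = "graph.id == '%s' && layer.id == '%s'" % (transcriptId, layerId)
    if before is not None:
        # optionally restrict to annotations starting before the given offset
        expression += " && start.offset < %s" % before
    return expression
```

The result always names a layer.id, in line with the note that expressions must match by either id or layer.id, and can be passed to countMatchingAnnotations().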
countMatchingParticipantIds(expression)

Counts the number of participants that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)
  • labels('corpus').includes('CC')
  • labels('participant_languages').includes('en')
  • labels('transcript_language').includes('en')
  • !/Ada.+/.test(id) && first('corpus').label == 'CC'
  • all('transcript_rating').length < 2
  • all('participant_rating').length == 0
  • !annotators('transcript_rating').includes('labbcat')
  • first('participant_gender').label == 'NA'

The following functions can be used to generate an expression of common types:

Example:

numQbParticipants = corpus.countMatchingParticipantIds(
    labbcat.expressionFromCorpora("QB"))            
Parameters:expression (str) – An expression that determines which participants match.
Returns:The number of matching participants.
Return type:int
countMatchingTranscriptIds(expression)

Counts the number of transcripts that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)
  • labels('participant').includes('Robert')
  • ('CC', 'IA', 'MU').includes(first('corpus').label)
  • first('episode').label == 'Ada Aitcheson'
  • first('transcript_scribe').label == 'Robert'
  • first('participant_languages').label == 'en'
  • first('noise').label == 'bell'
  • labels('transcript_languages').includes('en')
  • labels('participant_languages').includes('en')
  • labels('noise').includes('bell')
  • all('transcript_languages').length > 1
  • all('participant_languages').length > 1
  • all('transcript').length > 100
  • annotators('transcript_rating').includes('Robert')
  • !/Ada.+/.test(id) && first('corpus').label == 'CC' && labels('participant').includes('Robert')

The following functions can be used to generate an expression of common types:

Example:

numQuakeFaceTranscripts = corpus.countMatchingTranscriptIds(
    labbcat.expressionFromAttributeValue("transcript_quakeface", "1"))            
Parameters:expression (str) – An expression that determines which transcripts match.
Returns:The number of matching transcripts.
Return type:int
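
Several of the example expressions above combine conditions with &&. A small sketch of doing that from Python follows; all_of is a hypothetical helper name, and it assumes each condition is a complete expression on its own (conditions that themselves contain || may not combine as intended, since operator precedence is not described here).

```python
def all_of(*conditions):
    """Join expression strings with &&, ignoring empty ones."""
    parts = [c.strip() for c in conditions if c and c.strip()]
    if not parts:
        raise ValueError("at least one condition is required")
    return " && ".join(parts)

expression = all_of(
    "!/Ada.+/.test(id)",
    "first('corpus').label == 'CC'",
    "labels('participant').includes('Robert')")
```

The resulting string is identical to the last example in the list above and could be passed to countMatchingTranscriptIds().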
formatTranscript(id, layerIds, mimeType, dir=None)

Get transcript in a specified format.

Parameters:
  • id (str) – The ID of the transcript to export.
  • layerIds (list of str) – A list of IDs of annotation layers to include in the transcript.
  • mimeType (str) – The desired format, for example “text/praat-textgrid” for Praat TextGrids, “text/plain” for plain text, etc.
  • dir (str) – A directory in which the file(s) should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
Returns:

A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location. NB Although many formats will generate exactly one file for each transcript, this is not guaranteed; some formats generate multiple files per transcript.

Return type:

list of str
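
Since the caller is responsible for deleting files written to the temporary directory, a context manager is one way to make the clean-up automatic. This is a sketch using only the Python standard library; deleted_afterwards is a hypothetical name and is not part of the labbcat module.

```python
import contextlib
import os

@contextlib.contextmanager
def deleted_afterwards(fileNames):
    """Yield the given file names, then delete whichever of them still exist."""
    try:
        yield fileNames
    finally:
        for fileName in fileNames:
            # a file may already have been moved somewhere permanent
            if os.path.exists(fileName):
                os.remove(fileName)
```

It could wrap the list returned by formatTranscript(), getFragments() or getSoundFragments(), for example "with deleted_afterwards(corpus.formatTranscript(id, layerIds, mimeType)) as files:", where corpus is assumed to be a LabbcatView instance.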

getAnchors(id, anchorIds)

Gets the given anchors in the given transcript.

Parameters:
  • id (str) – The ID of the transcript.
  • anchorIds (list of str) – A list of anchor IDs.
Returns:

A (possibly empty) list of anchors.

Return type:

list of dictionaries

getAnnotations(id, layerId, maxOrdinal=None, pageLength=None, pageNumber=None)

Gets the annotations on the given layer of the given transcript.

Parameters:
  • id (str) – The ID of the transcript.
  • layerId – The ID of the layer.
  • maxOrdinal (int or None) – The maximum ordinal for the returned annotations. e.g. a maxOrdinal of 1 will ensure that only the first annotation for each parent is returned. If maxOrdinal is None, then all annotations are returned, regardless of their ordinal.
  • pageLength (int or None) – The maximum number of annotations to return, or None to return all.
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
Returns:

A (possibly empty) list of annotations.

Return type:

list of dictionaries
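
For large layers, pageLength and pageNumber can be combined with countAnnotations() to fetch the annotations in batches. The helper below is a sketch under stated assumptions: all_annotations is a hypothetical name, and corpus is any object offering countAnnotations() and getAnnotations() with the signatures documented here.

```python
import math

def all_annotations(corpus, id, layerId, pageLength=100):
    """Fetch every annotation on a layer of a transcript, one page at a time."""
    if pageLength <= 0:
        raise ValueError("pageLength must be positive")
    total = corpus.countAnnotations(id, layerId)
    pageCount = math.ceil(total / pageLength)
    annotations = []
    for pageNumber in range(pageCount):  # page numbers are zero-based
        annotations.extend(corpus.getAnnotations(
            id, layerId, pageLength=pageLength, pageNumber=pageNumber))
    return annotations
```

Counting first gives a known number of requests; the trade-off is one extra call to the server before any annotations are returned.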

getAvailableMedia(id)

List the media available for the given transcript.

Parameters:id (str) – The transcript ID.
Returns:List of media files available for the given transcript.
Return type:list of dictionaries
getCorpusIds()

Gets a list of corpus IDs.

Returns:A list of corpus IDs.
Return type:list
getDeserializerDescriptors()

Lists the descriptors of all registered deserializers.

Deserializers are modules that import annotation structures from a specific file format, e.g. Praat TextGrid, plain text, etc.

Returns:A list of the descriptors of all registered deserializers.
Return type:list of dictionaries
getDictionaries()

List the dictionaries available.

Returns:A dictionary of lists, where the keys are layer manager IDs, each of which maps to a list of IDs for the dictionaries that the layer manager makes available.
Return type:dict of lists
getDictionaryEntries(managerId, dictionaryId, keys)

Lookup entries in a dictionary.

Parameters:
  • managerId (str) – The layer manager ID of the dictionary, as returned by getDictionaries().
  • dictionaryId (str) – The ID of the dictionary, as returned by getDictionaries().

  • keys (list of str or list of dict) – A list of keys (words) identifying entries to look up.
Returns:

A dictionary of lists, where the keys are the given keys, each of which maps to a list of entries. Keys with no corresponding entry in the given dictionary will be present in the returned result, but will have no entries.

Return type:

dict of lists

getEpisodeDocuments(id)

Get a list of documents associated with the episode of the given transcript.

Parameters:id (str) – The transcript ID.
Returns:List of URLs to documents.
Return type:list of str
getFragments(transcriptIds, layerIds, mimeType, dir=None, startOffsets=None, endOffsets=None, prefixNames=True)

Get transcript fragments in a specified format.

The intervals to extract can be defined in two possible ways:

  1. transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats
  2. transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None
Parameters:
  • transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).
  • startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.
  • endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.
  • layerIds (list of str) – A list of IDs of annotation layers to include in the fragment.
  • mimeType (str) – The desired format, for example “text/praat-textgrid” for Praat TextGrids, “text/plain” for plain text, etc.
  • dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
  • prefixNames (boolean) – Whether to prefix fragment names with a numeric serial number or not.
Returns:

A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location. NB Although many formats will generate exactly one file for each interval, this is not guaranteed; some formats generate a single file or a fixed collection of files regardless of how many fragments there are.

Return type:

list of str
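
The two ways of defining intervals described above can be checked before calling the server. The function below is a sketch, not the library's own validation: fragment_argument_form is a hypothetical name, and the exact checks the library performs are not documented here.

```python
def fragment_argument_form(transcriptIds, startOffsets=None, endOffsets=None):
    """Return "matches" or "ids" according to which calling form is in use."""
    if all(isinstance(t, dict) for t in transcriptIds):
        # form 2: match dictionaries, so offsets must not be given
        if startOffsets is not None or endOffsets is not None:
            raise ValueError("offsets must be None when passing match dictionaries")
        return "matches"
    if all(isinstance(t, str) for t in transcriptIds):
        # form 1: transcript IDs with one start and one end offset each
        if startOffsets is None or endOffsets is None:
            raise ValueError("startOffsets and endOffsets are required with transcript IDs")
        if not (len(transcriptIds) == len(startOffsets) == len(endOffsets)):
            raise ValueError("offset lists must have one element per transcript ID")
        return "ids"
    raise ValueError("transcriptIds must be all str or all dict")
```

The same check applies to getFragmentsAsync() and getSoundFragments(), which describe their intervals in the same two ways.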

getFragmentsAsync(transcriptIds, layerIds, mimeType, startOffsets=None, endOffsets=None, prefixNames=True)

Starts a server task for getting transcript fragments in a specified format.

The intervals to extract can be defined in two possible ways:

  1. transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats
  2. transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None
Parameters:
  • transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).
  • startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.
  • endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.
  • layerIds (list of str) – A list of IDs of annotation layers to include in the fragment.
  • mimeType (str) – The desired format, for example “text/praat-textgrid” for Praat TextGrids, “text/plain” for plain text, etc.
  • prefixNames (boolean) – Whether to prefix fragment names with a numeric serial number or not.
Returns:

The threadId of the resulting task, which can be passed in to taskStatus(), waitForTask(), taskResults(), releaseTask(), etc.

Return type:

str

getId()

Gets the store’s ID.

Returns:The annotation store’s ID.
Return type:str
getLayer(id)

Gets a layer definition.

Parameters:id (str) – ID of the layer to get the definition for.
Returns:The definition of the given layer.
Return type:dictionary
getLayerIds()

Gets a list of layer IDs (annotation ‘types’).

Returns:A list of layer IDs.
Return type:list
getLayers()

Gets a list of layer definitions.

Returns:A list of layer definitions.
Return type:list of dictionaries
getMatchAnnotations(matchIds, layerIds, targetOffset=0, annotationsPerLayer=1)

Gets annotations on selected layers related to search results returned by a previous call to getMatches(threadId).

The returned list of lists contains dictionaries that represent individual annotations, with the following entries:

  • “id” : The annotation’s unique ID
  • “layerId” : The layer the annotation comes from
  • “label” : The annotation’s label or value
  • “startId” : The ID of the annotation’s start anchor
  • “endId” : The ID of the annotation’s end anchor
  • “parentId” : The annotation’s parent annotation ID
  • “ordinal” : The annotation’s position amongst its peers
  • “confidence” : A rating of confidence in the label accuracy, from 0 (no
    confidence) to 100 (absolute confidence / manually annotated)
Parameters:
  • matchIds (list of str or list of dict) – A list of MatchId strings, or a list of match dictionaries
  • layerIds (list of str) – A list of layer IDs.
  • targetOffset (int) – The distance from the original target of the match, e.g. 0 to find annotations of the match target itself, 1 to find annotations of the token immediately after the match target, or -1 to find annotations of the token immediately before the match target.
  • annotationsPerLayer (int) – The number of annotations on the given layer to retrieve. In most cases, there’s only one annotation available. However, tokens may, for example, be annotated with ‘all possible phonemic transcriptions’, in which case using a value of greater than 1 for this parameter provides other phonemic transcriptions, for tokens that have more than one.
Returns:

An array of arrays of Annotations, of dimensions len(matchIds) x (len(layerIds) x annotationsPerLayer). The first index matches the corresponding index in matchIds.

Return type:

list of list of dictionary
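
Because each returned row mixes annotations from all requested layers, it can be convenient to regroup a row by the “layerId” entry of each annotation. This is a sketch only: labels_by_layer is a hypothetical helper, and it assumes that a position with no annotation is represented by None, which is not stated above.

```python
def labels_by_layer(row):
    """Group the labels in one row of getMatchAnnotations() results by layer ID."""
    grouped = {}
    for annotation in row:
        if annotation is None:
            # assumption: a missing annotation is returned as None
            continue
        grouped.setdefault(annotation["layerId"], []).append(annotation["label"])
    return grouped
```

Applied to each row in turn, this gives one dictionary per match, in the same order as matchIds.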

getMatches(search, wordsContext=0, pageLength=None, pageNumber=None)

Gets a list of tokens that were matched by search(pattern).

The search parameter can be either

  • a threadId returned from a previous call to search() or
  • a dict representing a pattern to search for.

If it is a threadId, and the task is still running, then this function will wait for it to finish.

If it is a pattern dict, then search() is called for the given pattern, the matches are retrieved, and releaseTask() is called to free the search resources. Some example patterns are shown below; for more detailed information, see search().

Example:

## a single dictionary representing a 'one column' search,
## with string values representing regular expression pattern matching
pattern = { "orthography" : "ps.*" }

## a list containing the columns (adj defaults to 1, so matching tokens are contiguous)...
pattern = [
  { "orthography" : "the" },
  { "phonemes" : { "not" : True, "pattern" : "[cCEFHiIPqQuUV0123456789~#\\$@].*" },
    "frequency" : { "max" : "2" } } ]

This function returns a list of match dictionaries, where each item has the following entries:

  • “MatchId” : An ID which encodes which token in which utterance by which
    participant of which transcript matched.
  • “Transcript” : The name of the transcript document that the match is from.
  • “Participant” : The name of the participant who uttered the match.
  • “Corpus” : The corpus the match comes from.
  • “Line” : The start time of the utterance.
  • “LineEnd” : The end time of the utterance.
  • “BeforeMatch” : The context before the match.
  • “Text” : The match text.
  • “AfterMatch” : The context after the match.
Parameters:
  • search (str or dict) –

    This can be either a threadId returned from a previous call to search() or a dict representing a pattern to search for.

  • wordsContext (int) – Number of words of context to include in the “BeforeMatch” and “AfterMatch” entries in the results.
  • pageLength (int or None) – The maximum number of matches to return, or None to return all.
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
Returns:

A list of IDs that can be used to identify utterances/tokens that were matched by search(pattern), or None if the task was cancelled.

Return type:

list of dict
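
The match dictionaries described above are plain Python dictionaries, so they can be summarised with the standard library. The two helpers below are a sketch with hypothetical names; they rely only on the “Participant”, “BeforeMatch”, “Text” and “AfterMatch” entries listed above, and assume the context entries may be empty or absent.

```python
from collections import Counter

def matches_per_participant(matches):
    """Count how many matches each participant contributed."""
    return Counter(match["Participant"] for match in matches)

def concordance_line(match):
    """Join the context and the match text into a single line."""
    parts = [match.get("BeforeMatch"), match["Text"], match.get("AfterMatch")]
    # context entries are assumed to be empty or missing when wordsContext is 0
    return " ".join(part for part in parts if part)
```

Both functions leave the match list untouched, so the same list can still be passed to getMatchAnnotations() or getFragments().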

getMatchingAnnotations(expression, pageLength=None, pageNumber=None)

Gets a list of annotations that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • id == 'ew_0_456'
  • !/th[aeiou].//.test(label)
  • first('participant').label == 'Robert' && first('utterances').start.offset == 12.345
  • graph.id == 'AdaAicheson-01.trs' && layer.id == 'orthography' && start.offset < 10.5
  • previous.id == 'ew_0_456'

NB all expressions must match by either id or layer.id.

Parameters:
  • expression (str) – An expression that determines which annotations match.
  • pageLength (int or None) – The maximum number of annotations to return, or None to return all.
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
Returns:

A list of matching Annotations.

Return type:

list of dictionaries

getMatchingParticipantIds(expression, pageLength=None, pageNumber=None)

Gets a list of IDs of participants that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)
  • labels('corpus').includes('CC')
  • labels('participant_languages').includes('en')
  • labels('transcript_language').includes('en')
  • !/Ada.+/.test(id) && first('corpus').label == 'CC'
  • all('transcript_rating').length < 2
  • all('participant_rating').length == 0
  • !annotators('transcript_rating').includes('labbcat')
  • first('participant_gender').label == 'NA'

The following functions can be used to generate an expression of common types:

Example:

qbParticipants = corpus.getMatchingParticipantIds(
    labbcat.expressionFromCorpora("QB"))            
Parameters:
  • expression (str) – An expression that determines which participants match.
  • pageLength (int or None) – The maximum number of IDs to return, or None to return all.
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
Returns:

A list of participant IDs.

Return type:

list

getMatchingTranscriptIds(expression, pageLength=None, pageNumber=None, order=None)

Gets a list of IDs of transcripts that match a particular pattern.

The results can be exhaustive, by omitting pageLength and pageNumber, or they can be a subset (a ‘page’) of results, by giving pageLength and pageNumber values.

The order of the list can be specified. If omitted, the transcripts are listed in ID order.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)
  • labels('participant').includes('Robert')
  • ('CC', 'IA', 'MU').includes(first('corpus').label)
  • first('episode').label == 'Ada Aitcheson'
  • first('transcript_scribe').label == 'Robert'
  • first('participant_languages').label == 'en'
  • first('noise').label == 'bell'
  • labels('transcript_languages').includes('en')
  • labels('participant_languages').includes('en')
  • labels('noise').includes('bell')
  • all('transcript_languages').length > 1
  • all('participant_languages').length > 1
  • all('transcript').length > 100
  • annotators('transcript_rating').includes('Robert')
  • !/Ada.+/.test(id) && first('corpus').label == 'CC' && labels('participant').includes('Robert')

The following functions can be used to generate an expression of common types:

Example:

quakeFaceTranscripts = corpus.getMatchingTranscriptIds(
    labbcat.expressionFromAttributeValue("transcript_quakeface", "1"))            
Parameters:
  • expression (str) – An expression that determines which transcripts match.
  • pageLength (int or None) – The maximum number of IDs to return, or None to return all.
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
  • order (str) – The ordering for the list of IDs, a string containing a comma-separated list of expressions, each of which may be followed by “ ASC” or “ DESC”, or None for transcript ID order.
Returns:

A list of transcript IDs.

Return type:

list of str

getMedia(id, trackSuffix, mimeType, startOffset=None, endOffset=None, dir=None)

Downloads the given media track for a given transcript.

Parameters:
  • id (str) – The transcript ID.
  • trackSuffix (str) – The track suffix of the media.
  • mimeType (str) – The MIME type of the media, which may include parameters for type conversion, e.g. ‘audio/wav; samplerate=16000’
  • startOffset (float or None) – The start offset of the media sample, or None for the start of the whole recording.
  • endOffset (float or None) – The end offset of the media sample, or None for the end of the whole recording.
  • dir (str) – A directory in which the file should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
Returns:

The file name of the resulting file. If dir is None, this file will be stored under the system’s temporary directory, so once processing is finished, it should be deleted by the caller, or moved to a more permanent location.

Return type:

str

getMediaTracks()

List the predefined media tracks available for transcripts.

Returns:An ordered list of media track definitions.
Return type:list of dictionaries
getMediaUrl(id, trackSuffix, mimeType, startOffset=None, endOffset=None)

Gets a given media track URL for a given transcript.

Parameters:
  • id (str) – The transcript ID.
  • trackSuffix (str) – The track suffix of the media.
  • mimeType (str) – The MIME type of the media, which may include parameters for type conversion, e.g. ‘audio/wav; samplerate=16000’
  • startOffset (float or None) – The start offset of the media sample, or None for the start of the whole recording.
  • endOffset (float or None) – The end offset of the media sample, or None for the end of the whole recording.
Returns:

A URL to the given media for the given transcript, or None if the given media doesn’t exist.

Return type:

str

getParticipant(id)

Gets the participant record specified by the given identifier.

Parameters:id (str) – The ID of the participant, which could be their name or their database annotation ID.
Returns:An annotation representing the participant, or None if the participant was not found.
Return type:dictionary
getParticipantAttributes(participantIds, layerIds)

Gets participant attribute values.

Retrieves participant attribute values for given participant IDs, saves them to a CSV file, and returns the name of the file.

In general, participant attributes are layers whose ID is prefixed ‘participant’, however formally it’s any layer where layer.parentId == ‘participant’ and layer.alignment == 0.

The resulting file is the responsibility of the caller to delete when finished.

Parameters:
  • participantIds (list of str) – A list of participant IDs.
  • layerIds (list of str) – A list of layer IDs corresponding to participant attributes.
Returns:

The name of a CSV file with one row per participant, and one column per attribute.

Return type:

str
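
The returned CSV file can be loaded with the standard csv module and removed in the same step. This is a sketch under assumptions that are not stated above: that the first row of the file holds column names, that the file is comma-separated, and that it is UTF-8 encoded. read_attribute_csv is a hypothetical helper name.

```python
import csv
import os

def read_attribute_csv(csvFileName, delete=True):
    """Load an attribute CSV file into a list of dictionaries, one per row."""
    # assumption: the first row holds the column names
    with open(csvFileName, newline="", encoding="utf-8") as csvFile:
        rows = list(csv.DictReader(csvFile))
    if delete:
        # the caller is responsible for deleting the file when finished
        os.remove(csvFileName)
    return rows
```

The same helper would apply to the file returned by getTranscriptAttributes(), which has one row per transcript instead.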

getParticipantIds()

Gets a list of participant IDs.

Returns:A list of participant IDs.
Return type:list
getSerializerDescriptors()

Lists the descriptors of all registered serializers.

Serializers are modules that export annotation structures as a specific file format, e.g. Praat TextGrid, plain text, etc., so the mimeType of descriptors reflects what mimeTypes can be specified for getFragments()

Returns:A list of the descriptors of all registered serializers.
Return type:list of dictionaries
getSoundFragments(transcriptIds, startOffsets=None, endOffsets=None, sampleRate=None, dir=None, prefixNames=True)

Downloads WAV sound fragments.

The intervals to extract can be defined in two possible ways:

  1. transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats
  2. transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None
Parameters:
  • transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).
  • startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.
  • endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.
  • sampleRate (int or None) – The desired sample rate, or None for no preference.
  • dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
  • prefixNames (boolean) – Whether to prefix fragment names with a numeric serial number or not.
Returns:

A list of WAV files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location.

Return type:

list of str

getSystemAttribute(attribute)

Gets the value of the given system attribute.

Parameters:attribute (str) – Name of the attribute.
Returns:The value of the given attribute, or None if the attribute doesn’t exist.
Return type:str
getTasks()

Gets a list of all tasks on the server.

Returns:A list of all task statuses.
Return type:list of dictionaries
getTranscript(id, layerIds=None)

Gets a transcript given its ID.

The returned object defines the annotation graph structure, and is a dictionary whose entries include:

  • “id” : the transcript ID
  • “schema” : a representation of the layer structure of the graph
  • “anchors” : a dictionary of temporal anchors that represent the start and/or end
    time of an annotation (keyed by anchor ID)
  • “participant” : a list of participants in the transcript. Each participant is
    represented by a dictionary that includes a “turn” entry which is a list of speaker turns, each turn having an “utterance” entry containing utterance boundary annotations, and a “word” entry containing a list of word tokens.
  • entries for ‘spanning’ layers that are not assigned to a specific participant.

Annotations are presented by dictionaries that have the following entries:

  • “id” : the unique identifier for the annotation
  • “label” : the annotation’s label
  • “startId” and “endId” : the start and end anchors, which correspond to an entry
    in the “anchors” dictionary
  • “confidence” : label confidence rating, where 100 means it was labelled by a
    human, and 50 means it was labelled by an automated process.
Parameters:
  • id (str) – The given transcript ID.
  • layerIds (list of str) – The IDs of the layers to load, or None if only transcript data is required.
Returns:

The identified transcript.

Return type:

dictionary
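
The nesting described above (participants, then turns, then word tokens) can be traversed with plain loops. The function below is a sketch based only on that description; word_labels is a hypothetical name, and it assumes that entries which were not loaded are simply absent from the dictionary.

```python
def word_labels(transcript):
    """Collect the labels of all word tokens, participant by participant."""
    labels = []
    for participant in transcript.get("participant", []):
        for turn in participant.get("turn", []):
            # each turn holds a "word" list of token annotations
            for word in turn.get("word", []):
                labels.append(word["label"])
    return labels
```

Note that the words come out grouped by participant rather than in temporal order; ordering by time would need the start anchors looked up in the “anchors” dictionary.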

getTranscriptAttributes(expression, layerIds, csvFileName=None)

Get transcript attribute values.

Retrieves transcript attribute values for a given transcript expression, saves them to a CSV file, and returns the name of the file.

The expression parameter can be an explicit list of transcript IDs, or a string query expression that identifies which transcripts to return.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)
  • labels('participant').includes('Robert')
  • ('CC', 'IA', 'MU').includes(first('corpus').label)
  • first('episode').label == 'Ada Aitcheson'
  • first('transcript_scribe').label == 'Robert'
  • first('participant_languages').label == 'en'
  • first('noise').label == 'bell'
  • labels('transcript_languages').includes('en')
  • labels('participant_languages').includes('en')
  • labels('noise').includes('bell')
  • all('transcript_languages').length > 1
  • all('participant_languages').length > 1
  • all('word').length > 100
  • annotators('transcript_rating').includes('Robert')
  • !/Ada.+/.test(id) && first('corpus').label == 'CC' && labels('participant').includes('Robert')

The following functions can be used to generate an expression of common types:

In general, transcript attributes are layers whose ID is prefixed ‘transcript’, however formally it’s any layer where layer.parentId == ‘graph’ and layer.alignment == 0, which includes ‘corpus’ as well as transcript attribute layers.

The resulting file is the responsibility of the caller to delete when finished.

Example:

# duration/word count of QB corpus transcripts
qbAttributesCsv = corpus.getTranscriptAttributes(
    labbcat.expressionFromCorpora("QB"),
    ["transcript_duration", "transcript_word count"])            

# speech rate for spontaneous speech recordings
spontaneousSpeechRateCsv = corpus.getTranscriptAttributes(
    labbcat.expressionFromTranscriptTypes(["monologue", "interview"]),
    ["transcript_syllables per minute"])

# language for targeted transcripts
languageCsv = corpus.getTranscriptAttributes(
    ["AP2505_Nelson.eaf", "AP2512_MattBlack.eaf"],
    ["transcript_language"])

# tidily delete CSV files (os.remove takes one path at a time)
import os
for csvFile in [qbAttributesCsv, spontaneousSpeechRateCsv, languageCsv]:
    os.remove(csvFile)
Parameters:
  • expression (str or list of str) – An expression that determines which transcripts match, or an explicit list of transcript IDs.
  • layerIds (list of str) – A list of layer IDs corresponding to transcript attributes.
  • csvFileName (str) – The file to save the resulting CSV rows to.
Returns:

The name of a CSV file with one row per transcript, and one column per attribute.

Return type:

str

getTranscriptIds()

Gets a list of transcript IDs.

Returns:A list of transcript IDs.
Return type:list
getTranscriptIdsInCorpus(id)

Gets a list of transcript IDs in the given corpus.

Parameters:id (str) – A corpus ID.
Returns:A list of transcript IDs.
Return type:list
getTranscriptIdsWithParticipant(id)

Gets a list of IDs of transcripts that include the given participant.

Parameters:id (str) – A participant ID.
Returns:A list of transcript IDs.
Return type:list of str
getUserInfo()

Gets information about the current user, including the roles or groups they are in.

Returns:The user record, including a “user” entry with the user ID, and a “roles” entry which is a list of str.
Return type:dict
releaseTask(threadId)

Release a finished task, to free up server resources.

Parameters:threadId (str) – The ID of the task.
search(pattern, participantIds=None, transcriptTypes=None, mainParticipant=True, aligned=False, matchesPerTranscript=None, overlapThreshold=None)

Searches for tokens that match the given pattern.

Example:

pattern = {"columns":[{"layers":{"orthography":{"pattern":"the"}}}]}

Strictly speaking, pattern should be a dictionary that matches the structure of the search matrix in the browser interface of LaBB-CAT; i.e. a dictionary with one entry called “columns”, which is a list of dictionaries.

Each element in the “columns” list contains a dictionary with an entry named “layers”, whose value is a dictionary for patterns to match on each layer, and optionally an element named “adj”, whose value is a number representing the maximum distance, in tokens, between this column and the next column - if “adj” is not specified, the value defaults to 1, so tokens are contiguous.

Each element in the “layers” dictionary is named after the layer it matches, and the value is a dictionary with the following possible entries:

  • “pattern” : A regular expression to match against the label
  • “min” : An inclusive minimum numeric value for the label
  • “max” : An exclusive maximum numeric value for the label
  • “not” : True to negate the match
  • “anchorStart” : True to anchor to the start of the annotation on this layer
    (i.e. the matching word token will be the first at/after the start of the matching annotation on this layer)
  • “anchorEnd” : True to anchor to the end of the annotation on this layer
    (i.e. the matching word token will be the last before/at the end of the matching annotation on this layer)
  • “target” : True to make this layer the target of the search; the results will
    contain one row for each match on the target layer

Some examples of valid pattern objects are shown below.

Example:

## words starting with 'ps...'
pattern = {"columns":[{"layers":{"orthography":{"pattern":"ps.*"}}}]}

## the word 'the' followed immediately or with one intervening word by
## a hapax legomenon (word with a frequency of 1) that doesn't start with a vowel
pattern = { "columns" : [
  { "layers" : {
      "orthography" : { "pattern" : "the" } },
    "adj" : 2 },
  { "layers" : {
      "phonemes" : { "not" : True, "pattern" : "[cCEFHiIPqQuUV0123456789~#\$@].*" },
      "frequency" : { "max" : "2" } } } ] }

For ease of use, the function will also accept the following abbreviated forms; some examples are shown below.

Example:

## a single dictionary representing a 'one column' search,
## and string values, representing regular expression pattern matching
pattern = { "orthography" : "ps.*" }

## a list containing the columns (adj defaults to 1, so matching tokens are contiguous)...
pattern = [
  { "orthography" : "the" },
  { "phonemes" : { "not" : True, "pattern" : "[cCEFHiIPqQuUV0123456789~#\$@].*" },
    "frequency" : { "max" : "2" } } ]
Parameters:
  • pattern – A dict representing the pattern to search for, which mirrors the Search Matrix in the browser interface.
  • participantIds – An optional list of participant IDs to search the utterances of. If None, all utterances in the corpus will be searched.
  • transcriptTypes – An optional list of transcript types to limit the results to. If None, all transcript types will be searched.
  • mainParticipant – True to search only main-participant utterances, False to search all utterances.
  • aligned – True to include only words that are aligned (i.e. have anchor confidence ≥ 50), False to include un-aligned words as well.
  • matchesPerTranscript – Optional maximum number of matches per transcript to return. None means all matches.
  • overlapThreshold – Optional percentage overlap with other utterances before simultaneous speech is excluded. None means include all overlapping utterances.
Returns:

The threadId of the resulting task, which can be passed in to getMatches(), taskStatus(), waitForTask(), releaseTask(), etc.

Return type:

str
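
To make the relationship between the abbreviated and full pattern forms concrete, the sketch below expands an abbreviated pattern into the “columns” structure described above. It is an illustration only, not the library's own implementation; the helper name expand_pattern is hypothetical, and it assumes abbreviated columns carry no “adj” entry.

```python
def expand_pattern(pattern):
    """Expand an abbreviated search pattern into the full "columns" form."""
    def expand_layers(layers):
        # a bare string is shorthand for a regular-expression "pattern" entry
        return {
            layer: ({"pattern": cond} if isinstance(cond, str) else dict(cond))
            for layer, cond in layers.items()
        }

    if isinstance(pattern, dict) and "columns" in pattern:
        return pattern  # already in full form
    if isinstance(pattern, dict):
        pattern = [pattern]  # a single dict is a one-column search
    return {"columns": [{"layers": expand_layers(column)} for column in pattern]}
```

For example, expand_pattern({"orthography": "ps.*"}) yields the same dictionary as the first full-form example above.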

taskResults(threadId, dir=None)

Gets the results of the given task, as a file or list of files.

Some tasks produce a file for download when they’re finished (e.g. getFragmentsAsync()) so this function provides access to this results file. If the results are compressed into a zip file, this function automatically unpacks the contained files.

Parameters:
  • threadId (str.) – The ID of the task.
  • dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.
Returns:

A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location. If the task has no results (yet) this function returns None.

Return type:

list of str
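
Because files saved under the temporary directory are the caller's responsibility, one possible pattern is to process them and then delete them. The sketch below assumes store is a connected LabbcatView instance; the helper name with_task_results and the process callback are hypothetical.

```python
import os

def with_task_results(store, thread_id, process):
    """Fetch a task's result files, process each one, then delete them."""
    files = store.taskResults(thread_id)  # dir=None: files go to a temp folder
    if files is None:
        return []  # the task has no results (yet)
    try:
        return [process(path) for path in files]
    finally:
        # clean up the temporary files once processing is finished
        for path in files:
            if os.path.exists(path):
                os.remove(path)
```

If the files should be kept instead, passing a dir argument to taskResults() avoids the need for cleanup.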

taskStatus(threadId)

Gets the current state of the given task.

Parameters:threadId (str.) – The ID of the task.
Returns:The status of the task.
Return type:dictionary
versionInfo()

Gets version information of all components of LaBB-CAT.

Version information includes versions of all components and modules installed on the LaBB-CAT server, including format converters and annotator modules.

Returns:A dictionary of sections, each section a dictionary of modules indicating the version of that module.
Return type:dict
waitForTask(threadId, maxSeconds=0)

Wait for the given task to finish.

Parameters:
  • threadId (str) – The task ID.
  • maxSeconds (int) – The maximum time to wait for the task, or 0 for forever.
Returns:

The final task status. To determine whether the task finished or waiting timed out, check the “running” entry of the returned dictionary, which will be False if the task finished.

Return type:

dict
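
The task functions are typically used together. The following sketch shows one way to run a search, wait for it with a timeout, and release the task afterwards. It assumes store is a connected LabbcatView instance and that getMatches() accepts the threadId as its only required argument; the helper name run_search is hypothetical.

```python
def run_search(store, pattern, max_seconds=30):
    """Run a search, wait for it, and return the matches (None on timeout)."""
    thread_id = store.search(pattern)
    try:
        status = store.waitForTask(thread_id, max_seconds)
        if status.get("running"):
            # waiting timed out; cancel the task rather than leave it running
            store.cancelTask(thread_id)
            return None
        return store.getMatches(thread_id)
    finally:
        # free server resources whether or not the search finished
        store.releaseTask(thread_id)
```

The try/finally is a design choice so that the task is released even if fetching the matches raises an error.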

LabbcatEdit class

class labbcat.LabbcatEdit(labbcatUrl, username=None, password=None)

API for querying and updating a LaBB-CAT annotation graph store; a database of linguistic transcripts represented using Annotation Graphs

This class inherits the read-only operations of LabbcatView and adds some write operations for updating data, i.e. those that can be performed by users with “edit” permission.

Constructor arguments:

Parameters:
  • labbcatUrl (str) – The ‘home’ URL of the LaBB-CAT server.
  • username (str or None) – The username for logging in to the server, if necessary.
  • password (str or None) – The password for logging in to the server, if necessary.
addDictionaryEntry(managerId, dictionaryId, key, entry)

Adds an entry to a dictionary.

This function adds a new entry to the given dictionary. Words can have multiple entries.

Parameters:
  • managerId (str) –

    The layer manager ID of the dictionary, as returned by getDictionaries()

  • dictionaryId (str) –

    The ID of the dictionary, as returned by getDictionaries().

  • key (str) – The key (word) in the dictionary to add an entry for.
  • entry (str) – The value (definition) for the given key.
Returns:

None if the entry was added, or an error message if not.

Return type:

str or None

addLayerDictionaryEntry(layerId, key, entry)

Adds an entry to a layer dictionary.

This function adds a new entry to the dictionary that manages a given layer, and updates all affected tokens in the corpus. Words can have multiple entries.

Parameters:
  • layerId (str) – The ID of the layer with a dictionary configured to manage it.
  • key (str) – The key (word) in the dictionary to add an entry for.
  • entry (str) – The value (definition) for the given key.
Returns:

None if the entry was added, or an error message if not.

Return type:

str or None
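
Since the function returns None on success and an error message otherwise, several entries can be added in a loop while collecting any failures. This is a sketch only; store is assumed to be a connected LabbcatEdit instance and the helper name add_layer_entries is hypothetical.

```python
def add_layer_entries(store, layer_id, entries):
    """Add (key, entry) pairs to a layer dictionary; return the failures."""
    failures = {}
    for key, entry in entries:
        error = store.addLayerDictionaryEntry(layer_id, key, entry)
        if error is not None:  # None means the entry was added
            failures[(key, entry)] = error
    return failures
```

An empty dictionary returned by this helper means every entry was added.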

annotatorExt(annotatorId, resource, parameters=None)

Retrieve annotator’s “ext” resource.

Retrieve a given resource from an annotator’s “ext” web app. Annotators are modules that perform different annotation tasks, and can optionally implement functionality for providing extra data or extending functionality in an annotator-specific way. If the annotator implements an “ext” web app, it can provide resources and implement a mechanism for interrogating the annotator. This function provides a mechanism for accessing these resources via Python.

Details about the resources available for a given annotator are available by calling getAnnotatorDescriptor(), checking the “hasExtWebapp” attribute to ensure an ‘ext’ webapp is implemented, and checking the details in the “extApiInfo” attribute.

Parameters:
  • annotatorId (str) – ID of the annotator to interrogate.
  • resource (str) – The name of the file to retrieve or instance method (function) to invoke. Possible values for this depend on the specific annotator being interrogated.
  • parameters (str) – Optional list of ordered parameters for the instance method (function).
Returns:

The resource requested.

Return type:

str

deleteParticipant(id)

Deletes the given participant, and all associated meta-data.

Parameters:id (str) – The ID of the participant to delete.
deleteTranscript(id)

Deletes the given transcript, and all associated files.

Parameters:id (str) – The ID of the transcript to delete.
generateLayerUtterances(matchIds, layerId, collectionName=None)

Generates a layer for a given set of utterances.

This function generates annotations on a given layer for a given set of utterances, e.g. force-align selected utterances of a participant.

Parameters:
  • matchIds – A list of annotation IDs, e.g. the MatchId column, or the URL column, of a results set.
  • layerId (str) – The ID of the layer to generate.
Returns:

The taskId of the resulting annotation layer generation task. The task status can be updated using taskStatus().

Return type:

str

getAnnotatorDescriptor(annotatorId)

Gets annotator information.

Retrieve information about an annotator. Annotators are modules that perform different annotation tasks. This function provides information about a given annotator, for example the currently installed version of the module, what configuration parameters it requires, etc.

The returned dictionary contains the following entries:

  • “annotatorId” - The annotator’s unique ID
  • “version” - The currently installed version of the annotator.
  • “info” - HTML-encoded description of the function of the annotator.
  • “infoText” - A plain text version of “info” (converted automatically).
  • “hasConfigWebapp” - Determines whether the annotator includes a web-app for installation or general configuration.
  • “configParameterInfo” - An HTML-encoded definition of the installation config parameters, including a list of all parameters, and the encoding of the parameter string.
  • “hasTaskWebapp” - Determines whether the annotator includes a web-app for task parameter configuration.
  • “taskParameterInfo” - An HTML-encoded definition of the task parameters, including a list of all parameters, and the encoding of the parameter string.
  • “hasExtWebapp” - Determines whether the annotator includes an extras web-app which implements functionality for providing extra data or extending functionality in an annotator-specific way.
  • “extApiInfo” - An HTML-encoded document containing information about what endpoints are published by the ext web-app.
Parameters:annotatorId (str) – ID of the annotator module.
Returns:The annotator info.
Return type:dictionary of str
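
The descriptor can be used to guard calls to annotatorExt(). The sketch below only requests a resource when “hasExtWebapp” is set. It assumes store is a connected LabbcatEdit instance; the helper name is hypothetical, and because this documentation does not say whether the flag arrives as a boolean or as a string, both are tolerated.

```python
def annotator_ext_if_available(store, annotator_id, resource, parameters=None):
    """Call annotatorExt() only if the annotator has an 'ext' web app."""
    descriptor = store.getAnnotatorDescriptor(annotator_id)
    # accept either a boolean True or the string "true" (an assumption)
    has_ext = str(descriptor.get("hasExtWebapp")).lower() == "true"
    if not has_ext:
        return None
    return store.annotatorExt(annotator_id, resource, parameters)
```

The valid resource names depend on the annotator; the “extApiInfo” entry of the descriptor documents them.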
newTranscript(transcript, media, mediaSuffix, transcriptType, corpus, episode)

Uploads a new transcript.

Parameters:
  • transcript (str) – The path to the transcript to upload.
  • media (str) – The path to media to upload, if any.
  • mediaSuffix (str) – The media suffix for the media.
  • transcriptType (str) – The transcript type.
  • corpus (str) – The corpus for the transcript.
  • episode (str) – The episode the transcript belongs to.
Returns:

A dictionary of transcript IDs (transcript names) to task threadIds. The task status can be updated using taskStatus().

Return type:

dictionary of str

removeDictionaryEntry(managerId, dictionaryId, key, entry=None)

Removes an entry from a dictionary.

This function removes an existing entry from the given dictionary. Words can have multiple entries.

Parameters:
  • managerId (str) – The layer manager ID of the dictionary, as returned by getDictionaries().
  • dictionaryId (str) –

    The ID of the dictionary, as returned by getDictionaries().

  • key (str) – The key (word) in the dictionary to remove an entry for.
  • entry (str) – The value (definition) to remove, or None to remove all the entries for key.
Returns:

None if the entry was removed, or an error message if not.

Return type:

str or None

removeLayerDictionaryEntry(layerId, key, entry=None)

Removes an entry from a layer dictionary.

This function removes an existing entry from the dictionary that manages a given layer, and updates all affected tokens in the corpus. Words can have multiple entries.

Parameters:
  • layerId (str) – The ID of the layer with a dictionary configured to manage it.
  • key (str) – The key (word) in the dictionary to remove an entry for.
  • entry (str) – The value (definition) to remove, or None to remove all the entries for key.
Returns:

None if the entry was removed, or an error message if not.

Return type:

str or None

saveParticipant(id, label, attributes)

Saves a participant, and all its tags, to the graph store.

To change the ID of an existing participant, pass the old/current ID as the id, and pass the new ID as the label. If the participant ID does not already exist in the database, a new participant record is created.

Parameters:
  • id (str) – The current ID of the participant to save.
  • label (str) – The new ID (name) for the participant.
  • attributes (dictionary of str) – Participant attribute values - the names are the participant attribute layer IDs, and the values are the corresponding new attribute values. The pass phrase for participant access can also be set by specifying a “_password” attribute.
Returns:

True if the participant was updated, False if there were no changes to update.

Return type:

boolean

updateFragment(fragment)

Update a transcript fragment.

This function uploads a file (e.g. Praat TextGrid) representing a fragment of a transcript, with annotations or alignments to update in LaBB-CAT’s version of the transcript.

Parameters:fragment (str) – The path to the fragment to upload.
Returns:A dictionary with information about the fragment that was updated, including URL, start_time, and end_time
Return type:dictionary of str
updateTranscript(transcript, suppressGeneration=False)

Uploads a new version of an existing transcript.

Parameters:
  • transcript (str) – The path to the transcript to upload.
  • suppressGeneration (boolean) – False (the default) to run automatic layer generation, True to suppress automatic layer generation.
Returns:

A dictionary of transcript IDs (transcript names) to task threadIds. The task status can be updated using taskStatus().

Return type:

dictionary of str
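
Since the upload returns one task per transcript, a caller may want to wait for each of them. The sketch below does this and releases the tasks that have finished. It assumes store is a connected LabbcatEdit instance; the helper name update_and_wait is hypothetical.

```python
def update_and_wait(store, transcript_path, max_seconds=0):
    """Upload a new transcript version and wait for its generation tasks."""
    tasks = store.updateTranscript(transcript_path)
    statuses = {}
    for transcript_id, thread_id in tasks.items():
        status = store.waitForTask(thread_id, max_seconds)
        if not status.get("running"):
            store.releaseTask(thread_id)  # only finished tasks are released
        statuses[transcript_id] = status
    return statuses
```

The returned dictionary maps each transcript ID to its final task status, so tasks still marked as running can be checked again later with taskStatus().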

The LabbcatEdit class inherits from the LabbcatView class.

LabbcatAdmin class

class labbcat.LabbcatAdmin(labbcatUrl, username=None, password=None)

API for querying, updating, and administering a LaBB-CAT annotation graph store; a database of linguistic transcripts represented using Annotation Graphs

This class inherits the read-write operations of LabbcatEdit and adds some administration operations, including definition of layers, registration of converters, etc., i.e. those that can be performed by users with “admin” permission.

Constructor arguments:

Parameters:
  • labbcatUrl (str) – The ‘home’ URL of the LaBB-CAT server.
  • username (str or None) – The username for logging in to the server, if necessary.
  • password (str or None) – The password for logging in to the server, if necessary.
createCategory(class_id, category, description, display_order)

Creates a new category record.

The dictionary returned has the following entries:

  • “class_id” : What kind of attributes are categorised - “transcript” or “speaker”.
  • “category” : The name/id of the category.
  • “description” : The description of the category.
  • “display_order” : Where the category appears among other categories.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.
  • category (str) – The name/id of the category.
  • description (str) – The description of the category.
  • display_order (number) – Where the category appears among other categories.
Returns:

A copy of the category record

Return type:

dict

createCorpus(corpus_name, corpus_language, corpus_description)

Creates a new corpus record.

The dictionary returned has the following entries:

  • “corpus_id” : The database key for the record.
  • “corpus_name” : The name/id of the corpus.
  • “corpus_language” : The ISO 639-1 code for the default language.
  • “corpus_description” : The description of the corpus.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • corpus_name (str) – The name/id of the corpus.
  • corpus_language (str) – The ISO 639-1 code for the default language.
  • corpus_description (str) – The description of the corpus.
Returns:

A copy of the corpus record

Return type:

dict

createMediaTrack(suffix, description, display_order)

Creates a new media track record.

The dictionary returned has the following entries:

  • “suffix” : The suffix associated with the media track.
  • “description” : The description of the media track.
  • “display_order” : The position of the track amongst other tracks.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • suffix (str) – The suffix associated with the media track.
  • description (str) – The description of the media track.
  • display_order (str) – The position of the track amongst other tracks.
Returns:

A copy of the media track record

Return type:

dict

createProject(project, description)

Creates a new project record.

The dictionary returned has the following entries:

  • “project_id” : The database key for the record.
  • “project” : The name/id of the project.
  • “description” : The description of the project.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • project (str) – The name/id of the project.
  • description (str) – The description of the project.
Returns:

A copy of the project record

Return type:

dict

createRole(role_id, description)

Creates a new role record.

The dictionary returned has the following entries:

  • “role_id” : The name/id of the role.
  • “description” : The description of the role.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • role_id (str) – The name/id of the role.
  • description (str) – The description of the role.
Returns:

A copy of the role record

Return type:

dict

createRolePermission(role_id, entity, layer, value_pattern)

Creates a new role permission record.

The dictionary returned has the following entries:

  • “role_id” : The ID of the role this permission applies to.
  • “entity” : The media entity this permission applies to - a string made up of “t” (transcript), “a” (audio), “v” (video), or “i” (image).
  • “layer” : ID of the layer for which the label determines access. This is either a valid transcript attribute layer ID, or “corpus”.
  • “value_pattern” : Regular expression for matching against the layerId label. If
    the regular expression matches the label, access is allowed.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • role_id (str) – The ID of the role this permission applies to.
  • entity (str) – The media entity this permission applies to.
  • layer (str) – ID of the layer for which the label determines access.
  • value_pattern (str) – Regular expression for matching against.
Returns:

A copy of the role permission record

Return type:

dict
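
The “entity” value is a string made up of the letters listed above. The following sketch builds such a string from keyword flags. The helper name make_entity is hypothetical, and the letter order used here is an assumption, since this documentation does not say whether order matters.

```python
def make_entity(transcript=False, audio=False, video=False, image=False):
    """Build an entity string from "t", "a", "v" and "i" flags."""
    flags = (("t", transcript), ("a", audio), ("v", video), ("i", image))
    entity = "".join(code for code, enabled in flags if enabled)
    if not entity:
        raise ValueError("at least one media type must be selected")
    return entity
```

The result could then be passed as the entity argument, e.g. createRolePermission(role_id, make_entity(transcript=True, audio=True), layer, value_pattern).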

createUser(user, email, resetPassword, roles)

Creates a new user record.

The dictionary returned has the following entries:

  • “user” : The id of the user.
  • “email” : The email address of the user.
  • “resetPassword” : Whether the user must reset their password when they next log in.
  • “roles” : Roles or groups the user belongs to.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • user (str) – The ID of the user.
  • email (str) – The email address of the user.
  • resetPassword (boolean) – Whether the user must reset their password when they next log in.
  • roles (list of str) – Roles or groups the user belongs to.
Returns:

A copy of the user record

Return type:

dict

deleteCategory(class_id, category)

Deletes an existing category record.

Parameters:
  • class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.
  • category (str) – The name/id of the category.
deleteCorpus(corpus_name)

Deletes an existing corpus record.

Parameters:corpus_name (str) – The name/id of the corpus.
deleteLayer(id)

Deletes a layer.

Parameters:id (str) – The layer ID
deleteLexicon(lexicon)

Delete a previously loaded lexicon.

By default LaBB-CAT includes a layer manager called the Flat Lexicon Tagger, which can be configured to annotate words with data from a dictionary loaded from a plain text file (e.g. a CSV file).

Parameters:lexicon (str) – The name of the lexicon to delete, e.g. ‘cmudict’.
Returns:None if the deletion was successful, or an error message if not.
Return type:str or None
deleteMediaTrack(suffix)

Deletes an existing media track record.

Parameters:suffix (str) – The suffix associated with the media track.
deleteProject(project)

Deletes an existing project record.

Parameters:project (str) – The name/id of the project.
deleteRole(role_id)

Deletes an existing role record.

Parameters:role_id (str) – The name/id of the role.
deleteRolePermission(role_id, entity)

Deletes an existing role permission record.

Parameters:
  • role_id (str) – The ID of the role this permission applies to.
  • entity (str) – The media entity this permission applies to.
deleteUser(user)

Deletes an existing user record.

Parameters:user (str) – The ID of the user.
generateLayer(layerId)

Generates a layer.

This function generates annotations on a given layer for all transcripts in the corpus.

Parameters:layerId (str) – The ID of the layer to generate.
Returns:The taskId of the resulting annotation layer generation task. The task status can be updated using taskStatus().
Return type:str
loadLexicon(file, lexicon, fieldDelimiter, fieldNames, quote=None, comment=None, skipFirstLine=False)

Upload a flat lexicon file for lexical tagging.

By default LaBB-CAT includes a layer manager called the Flat Lexicon Tagger, which can be configured to annotate words with data from a dictionary loaded from a plain text file (e.g. a CSV file). The file must have a ‘flat’ structure in the sense that it’s a simple list of dictionary entries with a fixed number of columns/fields, rather than having a complex structure.

Parameters:
  • file – The full path name of the lexicon file.
  • lexicon (str) – The name for the resulting lexicon, e.g. ‘cmudict’. If the named lexicon already exists, it will be completely replaced with the contents of the file (i.e. all existing entries will be deleted before adding new entries from the file).
  • fieldDelimiter (str) – The character used to delimit fields in the file, e.g. ‘,’ for Comma Separated Values (CSV) files. If this is “ - ”, rows are split on only the first space, in line with common dictionary formats.
  • fieldNames (str) – A list of field names, delimited by fieldDelimiter, e.g. ‘Word,Pronunciation’.
  • quote (str) – The character used to quote field values (if any), e.g. ‘”’.
  • comment (str) – The character used to indicate a line is a comment (not an entry) (if any) e.g. ‘#’.
  • skipFirstLine (boolean) – Whether to ignore the first line of the file (because it contains field names).
Returns:

None if the upload was successful, or an error message if not.

Return type:

str or None
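
To illustrate how the parameters relate to the file contents, the sketch below writes a small ‘flat’ CSV lexicon and returns keyword arguments that describe it. The helper name write_flat_lexicon is hypothetical, and the choice of comma delimiter, double-quote character and header line is an assumption made for the example.

```python
import csv

def write_flat_lexicon(path, field_names, rows):
    """Write rows to a CSV file and return matching loadLexicon() arguments."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=",", quotechar='"')
        writer.writerow(field_names)  # header line, to be skipped on upload
        writer.writerows(rows)
    return {
        "file": path,
        "fieldDelimiter": ",",
        "fieldNames": ",".join(field_names),
        "quote": '"',
        "skipFirstLine": True,  # the first line holds the field names
    }
```

Given a connected LabbcatAdmin instance named admin, the result might be used as admin.loadLexicon(lexicon="my-lexicon", **args), where “my-lexicon” is a placeholder name.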

newLayer(id, parentId, description, alignment, peers, peersOverlap, parentIncludes, saturated, type, validLabels={}, category=None, annotatorId=None, annotatorTaskParameters=None)

Adds a new layer.

Parameters:
  • id (str) – The layer ID
  • parentId (str) – The layer’s parent layer id.
  • description (str) – The description of the layer.
  • alignment (number) – The layer’s alignment - 0 for none, 1 for point alignment, 2 for interval alignment.
  • peers (boolean) – Whether children on this layer have peers or not.
  • peersOverlap (boolean) – Whether child peers on this layer can overlap or not.
  • parentIncludes (boolean) – Whether the parent temporally includes the child.
  • saturated (boolean) – Whether children must temporally fill the entire parent duration (true) or not (false).
  • type (str) – The type for labels on this layer, e.g. string, number, boolean, ipa.
  • validLabels (dict) – A dictionary of valid label values for this layer, or an empty dictionary if the layer values are not restricted. The ‘key’ is the possible label value, and each key is associated with a description of the value (e.g. for displaying to users).
  • category (str) – Category for the layer, if any.
  • annotatorId (str) – The ID of the layer manager that automatically fills in annotations on the layer, if any
  • annotatorTaskParameters (str) – The configuration the layer manager should use when filling the layer with annotations. This is a string whose format is specific to each layer manager.
Returns:

The resulting layer definition.

Return type:

dict

readCategories(class_id, pageNumber=None, pageLength=None)

Reads a list of category records.

The dictionaries in the returned list have the following entries:

  • “class_id” : What kind of attributes are categorised - “transcript” or “speaker”.
  • “category” : The name/id of the category.
  • “description” : The description of the category.
  • “display_order” : Where the category appears among other categories.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
  • pageLength (int or None) – The maximum number of records to return, or None to return all.
Returns:

A list of category records.

Return type:

list of dict

readCorpora(pageNumber=None, pageLength=None)

Reads a list of corpus records.

The dictionaries in the returned list have the following entries:

  • “corpus_id” : The database key for the record.
  • “corpus_name” : The name/id of the corpus.
  • “corpus_language” : The ISO 639-1 code for the default language.
  • “corpus_description” : The description of the corpus.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
  • pageLength (int or None) – The maximum number of records to return, or None to return all.
Returns:

A list of corpus records.

Return type:

list of dict

readMediaTracks(pageNumber=None, pageLength=None)

Reads a list of media track records.

The dictionaries in the returned list have the following entries:

  • “suffix” : The suffix associated with the media track.
  • “description” : The description of the media track.
  • “display_order” : The position of the track amongst other tracks.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
  • pageLength (int or None) – The maximum number of records to return, or None to return all.
Returns:

A list of media track records.

Return type:

list of dict

readProjects(pageNumber=None, pageLength=None)

Reads a list of project records.

The dictionaries in the returned list have the following entries:

  • “project_id” : The database key for the record.
  • “project” : The name/id of the project.
  • “description” : The description of the project.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
  • pageLength (int or None) – The maximum number of records to return, or None to return all.
Returns:

A list of project records.

Return type:

list of dict

readRolePermissions(role_id, pageNumber=None, pageLength=None)

Reads a list of role permission records.

The dictionaries in the returned list have the following entries:

  • “role_id” : The ID of the role this permission applies to.
  • “entity” : The media entity this permission applies to - a string made up of “t” (transcript), “a” (audio), “v” (video), or “i” (image).
  • “layer” : ID of the layer for which the label determines access. This is either a valid transcript attribute layer ID, or “corpus”.
  • “value_pattern” : Regular expression for matching against the layerId label. If
    the regular expression matches the label, access is allowed.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • role_id (str) – The ID of the role this permission applies to.
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
  • pageLength (int or None) – The maximum number of records to return, or None to return all.
Returns:

A list of role permission records.

Return type:

list of dict

readRoles(pageNumber=None, pageLength=None)

Reads a list of role records.

The dictionaries in the returned list have the following entries:

  • “role_id” : The name/id of the role.
  • “description” : The description of the role.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
  • pageLength (int or None) – The maximum number of records to return, or None to return all.
Returns:

A list of role records.

Return type:

list of dict

readSystemAttributes()

Reads a list of system attribute records.

The dictionaries in the returned list have the following entries:

  • “attribute” : ID of the attribute.
  • “type” : The type of the attribute - “string”, “boolean”, “select”, etc.
  • “style” : UI style, which depends on “type”.
  • “label” : User-facing label for the attribute.
  • “description” : User-facing (long) description for the attribute.
  • “options” : If “type” == “select”, this is a dict defining possible values.
  • “value” : The value of the attribute.
Returns:A list of system attribute records.
Return type:list of dict
readUsers(pageNumber=None, pageLength=None)

Reads a list of user records.

The dictionaries in the returned list have the following entries:

  • “user” : The id of the user.
  • “email” : The email address of the user.
  • “resetPassword” : Whether the user must reset their password when they next log in.
  • “roles” : Roles or groups the user belongs to.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.
  • pageLength (int or None) – The maximum number of records to return, or None to return all.
Returns:

A list of user records.

Return type:

list of dict
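
The “roles” entry of each record makes it straightforward to see who belongs to which role. The sketch below groups user IDs by role, assuming “roles” is a list of strings (as the roles parameter of updateUser suggests). The helper users_by_role and the sample records are hypothetical.

```python
from collections import defaultdict


def users_by_role(users):
    """Group user IDs by the roles listed in each user record."""
    grouped = defaultdict(list)
    for record in users:
        for role in record.get("roles", []):
            grouped[role].append(record["user"])
    return dict(grouped)


# Hypothetical records shaped like the readUsers() result.
sample_users = [
    {"user": "alice", "email": "alice@example.com",
     "resetPassword": False, "roles": ["view", "edit"]},
    {"user": "bob", "email": "bob@example.com",
     "resetPassword": True, "roles": ["view"]},
]

membership = users_by_role(sample_users)
```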

saveLayer(id, parentId, description, alignment, peers, peersOverlap, parentIncludes, saturated, type, validLabels, category)

Saves changes to a layer.

Parameters:
  • id (str) – The layer ID
  • parentId (str) – The layer’s parent layer id.
  • description (str) – The description of the layer.
  • alignment (number) – The layer’s alignment - 0 for none, 1 for point alignment, 2 for interval alignment.
  • peers (boolean) – Whether children on this layer have peers or not.
  • peersOverlap (boolean) – Whether child peers on this layer can overlap or not.
  • parentIncludes (boolean) – Whether the parent temporally includes the child.
  • saturated (boolean) – Whether children must temporally fill the entire parent duration (True) or not (False).
  • type (str) – The type for labels on this layer, e.g. string, number, boolean, ipa.
  • validLabels (dict) – A dict of valid label values for this layer, or None if the layer values are not restricted. Each key is a possible label value, and is associated with a description of that value (e.g. for displaying to users).
  • category (str) – Category for the layer, if any.
Returns:

The resulting layer definition.

Return type:

dict
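
With eleven positional parameters, it is easy to pass a value in the wrong slot. One way to guard against that is to bundle the arguments by name and check the alignment code first. This is only a sketch: layer_arguments is a hypothetical helper, and the “pos” layer, its parent “word”, and all the values shown are illustrative assumptions rather than settings from a real server.

```python
def layer_arguments(id, parentId, description, alignment, peers, peersOverlap,
                    parentIncludes, saturated, type, validLabels, category):
    """Check the alignment code and bundle the saveLayer arguments by name."""
    if alignment not in (0, 1, 2):
        raise ValueError("alignment must be 0 (none), 1 (point) or 2 (interval)")
    return {
        "id": id, "parentId": parentId, "description": description,
        "alignment": alignment, "peers": peers, "peersOverlap": peersOverlap,
        "parentIncludes": parentIncludes, "saturated": saturated,
        "type": type, "validLabels": validLabels, "category": category,
    }


# Hypothetical part-of-speech layer with a restricted set of labels.
pos_layer = layer_arguments(
    id="pos", parentId="word", description="Part of speech",
    alignment=0, peers=False, peersOverlap=False, parentIncludes=True,
    saturated=True, type="string",
    validLabels={"N": "Noun", "V": "Verb"}, category="grammar")
```

The resulting dict could then be passed on as admin.saveLayer(**pos_layer), assuming the method accepts its documented parameter names as keywords.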

setPassword(user, password, resetPassword)

Sets a given user’s password.

Parameters:
  • user (str) – The ID of the user.
  • password – The new password.
  • resetPassword (boolean) – Whether the user must reset their password when they next log in.
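
A common use of resetPassword is issuing a temporary password that the user must change at their next login. A minimal sketch of that workflow follows, using Python's standard secrets module to generate the password. The helper issue_temporary_password and the RecordingAdmin stand-in are hypothetical; the stand-in only records calls so the sketch runs without a server.

```python
import secrets


def issue_temporary_password(admin, user, nbytes=12):
    """Set a random password and require the user to change it on next login."""
    password = secrets.token_urlsafe(nbytes)
    admin.setPassword(user, password, True)
    return password


class RecordingAdmin:
    """Stand-in for a LabbcatAdmin connection that just records calls."""

    def __init__(self):
        self.calls = []

    def setPassword(self, user, password, resetPassword):
        self.calls.append((user, password, resetPassword))


stub_admin = RecordingAdmin()
temporary = issue_temporary_password(stub_admin, "alice")
```
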
updateCategory(class_id, category, description, display_order)

Updates an existing category record.

The dictionary returned has the following entries:

  • “class_id” : What kind of attributes are categorised - “transcript” or “speaker”.
  • “category” : The name/id of the category.
  • “description” : The description of the category.
  • “display_order” : Where the category appears among other categories.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.
  • category (str) – The name/id of the category.
  • description (str) – The description of the category.
  • display_order (number) – Where the category appears among other categories.
Returns:

A copy of the category record

Return type:

dict

updateCorpus(corpus_name, corpus_language, corpus_description)

Updates an existing corpus record.

The dictionary returned has the following entries:

  • “corpus_id” : The database key for the record.
  • “corpus_name” : The name/id of the corpus.
  • “corpus_language” : The ISO 639-1 code for the default language.
  • “corpus_description” : The description of the corpus.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • corpus_name (str) – The name/id of the corpus.
  • corpus_language (str) – The ISO 639-1 code for the default language.
  • corpus_description (str) – The description of the corpus.
Returns:

A copy of the corpus record

Return type:

dict

updateMediaTrack(suffix, description, display_order)

Updates an existing media track record.

The dictionary returned has the following entries:

  • “suffix” : The suffix associated with the media track.
  • “description” : The description of the media track.
  • “display_order” : The position of the track amongst other tracks.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • suffix (str) – The suffix associated with the media track.
  • description (str) – The description of the media track.
  • display_order (str) – The position of the track amongst other tracks.
Returns:

A copy of the media track record

Return type:

dict

updateProject(project, description)

Updates an existing project record.

The dictionary returned has the following entries:

  • “project_id” : The database key for the record.
  • “project” : The name/id of the project.
  • “description” : The description of the project.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • project (str) – The name/id of the project.
  • description (str) – The description of the project.
Returns:

A copy of the project record

Return type:

dict

updateRole(role_id, description)

Updates an existing role record.

The dictionary returned has the following entries:

  • “role_id” : The name/id of the role.
  • “description” : The description of the role.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • role_id (str) – The name/id of the role.
  • description (str) – The description of the role.
Returns:

A copy of the role record

Return type:

dict

updateRolePermission(role_id, entity, layer, value_pattern)

Updates an existing role permission record.

The dictionary returned has the following entries:

  • “role_id” : The ID of the role this permission applies to.
  • “entity” : The media entity this permission applies to - a string made up of “t” (transcript), “a” (audio), “v” (video), or “i” (image).
  • “layer” : ID of the layer for which the label determines access. This is either a valid transcript attribute layer ID, or “corpus”.
  • “value_pattern” : Regular expression for matching against the layerId label. If the regular expression matches the label, access is allowed.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • role_id (str) – The ID of the role this permission applies to.
  • entity (str) – The media entity this permission applies to.
  • layer (str) – ID of the layer for which the label determines access.
  • value_pattern (str) – Regular expression for matching against the layer label. If the regular expression matches the label, access is allowed.
Returns:

A copy of the role permission record

Return type:

dict
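
To make the meaning of “entity” and “value_pattern” concrete, the sketch below evaluates a permission record locally. It is an illustration of the rule described above, not the server's implementation. Two assumptions are made: the pattern must match the whole label (Python's re.fullmatch), and Python's regular expression syntax is close enough to the server's for the pattern used. The helper permission_applies and the sample record are hypothetical.

```python
import re

# One-letter codes that make up the "entity" string.
ENTITY_CODES = {"transcript": "t", "audio": "a", "video": "v", "image": "i"}


def permission_applies(record, media_kind, label):
    """Return True if the record covers media_kind and its pattern matches label.

    Assumption: the pattern must match the entire label.
    """
    if ENTITY_CODES[media_kind] not in record["entity"]:
        return False
    return re.fullmatch(record["value_pattern"], label) is not None


# Hypothetical record: transcripts and audio of corpora whose name starts with "demo".
sample_permission = {
    "role_id": "students", "entity": "ta",
    "layer": "corpus", "value_pattern": "demo.*",
}

audio_allowed = permission_applies(sample_permission, "audio", "demo-2020")
video_allowed = permission_applies(sample_permission, "video", "demo-2020")
```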

updateSystemAttribute(attribute, value)

Updates the value of an existing system attribute record.

The dictionary returned has the following entries:

  • “attribute” : ID of the attribute.
  • “type” : The type of the attribute - “string”, “boolean”, “select”, etc.
  • “style” : UI style, which depends on “type”.
  • “label” : User-facing label for the attribute.
  • “description” : User-facing (long) description for the attribute.
  • “options” : If “type” == “select”, this is a dict defining possible values.
  • “value” : The value of the attribute.
Parameters:
  • attribute – ID of the attribute.
  • value (str) – The new value for the attribute.
Returns:

A copy of the system attribute record

Return type:

dict

updateUser(user, email, resetPassword, roles)

Updates an existing user record.

The dictionary returned has the following entries:

  • “user” : The id of the user.
  • “email” : The email address of the user.
  • “resetPassword” : Whether the user must reset their password when they next log in.
  • “roles” : Roles or groups the user belongs to.
  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.
Parameters:
  • user (str) – The ID of the user.
  • email (str) – The email address of the user.
  • resetPassword (boolean) – Whether the user must reset their password when they next log in.
  • roles (list of str) – Roles or groups the user belongs to.
Returns:

A copy of the user record

Return type:

dict
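
Since updateUser takes the complete set of fields, changing one field means passing the others through unchanged. The sketch below adds a role to a user by reading the existing record first, assuming “roles” is a list of strings. The helper add_role and the StubAdmin stand-in are hypothetical; the stand-in mimics readUsers and updateUser in memory so the sketch runs without a server.

```python
def add_role(admin, user_id, role):
    """Add a role to a user, leaving the other fields of the record as they are."""
    for record in admin.readUsers():
        if record["user"] == user_id:
            roles = list(record["roles"])
            if role not in roles:
                roles.append(role)
            return admin.updateUser(
                user_id, record["email"], record["resetPassword"], roles)
    raise KeyError(user_id)


class StubAdmin:
    """In-memory stand-in for a LabbcatAdmin connection."""

    def __init__(self, users):
        self.users = users

    def readUsers(self, pageNumber=None, pageLength=None):
        return list(self.users)

    def updateUser(self, user, email, resetPassword, roles):
        updated = {"user": user, "email": email,
                   "resetPassword": resetPassword, "roles": roles}
        self.users = [updated if u["user"] == user else u for u in self.users]
        return dict(updated)


stub = StubAdmin([{"user": "alice", "email": "alice@example.com",
                   "resetPassword": False, "roles": ["view"]}])
result = add_role(stub, "alice", "edit")
```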

The LabbcatAdmin class also inherits the LabbcatEdit class.

Query Language Generation Functions

labbcat.expressionFromAttributeValue(attribute, values, negate=False)

Generates a query expression for matching a transcript/participant attribute.

This function generates a query expression fragment which can be passed as the expression parameter of getMatchingTranscriptIds or getMatchingParticipantIds etc. using a list of possible values for a given transcript/participant attribute.

The attribute defined by ‘attribute’ is expected to have exactly one value. If it may have multiple values, use expressionFromAttributeValues() instead.

Parameters:
  • attribute (str) – The transcript/participant attribute to filter by.
  • values (list or str) – A list of possible values for attribute, or a single value.
  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
Returns:

A query expression which can be passed as the expression parameter of countMatchingParticipantIds(), getMatchingParticipantIds(), countMatchingTranscriptIds(), getMatchingTranscriptIds() or getTranscriptAttributes()

Return type:

str

labbcat.expressionFromAttributeValues(attribute, values, negate=False)

Generates a query expression for matching a transcript/participant attribute.

This function generates a query expression fragment which can be passed as the expression parameter of getMatchingTranscriptIds or getMatchingParticipantIds etc. using a list of possible values for a given transcript/participant attribute.

The attribute defined by ‘attribute’ may have more than one value. If it can only have one value, use expressionFromAttributeValue() instead.

Parameters:
  • attribute (str) – The transcript/participant attribute to filter by.
  • values (list or str) – A list of possible values for attribute, or a single value.
  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
Returns:

A query expression which can be passed as the expression parameter of countMatchingParticipantIds(), getMatchingParticipantIds(), countMatchingTranscriptIds(), getMatchingTranscriptIds() or getTranscriptAttributes()

Return type:

str

labbcat.expressionFromIds(ids, negate=False)

Generates a query expression for matching transcripts or participants by ID.

This function generates a query expression fragment which can be passed as the expression parameter of getTranscriptAttributes etc. using a list of IDs.

Parameters:
  • ids (list or str) – A list of IDs, or a single value.
  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
Returns:

A query expression which can be passed as the expression parameter of countMatchingParticipantIds(), getMatchingParticipantIds(), countMatchingTranscriptIds(), getMatchingTranscriptIds() or getTranscriptAttributes()

Return type:

str

labbcat.expressionFromTranscriptTypes(transcriptTypes, negate=False)

Generates a transcript query expression for matching transcripts by type.

This function generates a query expression fragment which can be passed as the expression parameter of getTranscriptAttributes or getMatchingTranscriptIds etc. using a list of transcript types.

Parameters:
  • transcriptTypes (list or str) – A list of transcript types, or a single transcript type.
  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
Returns:

A query expression which can be passed as the expression parameter of getMatchingTranscriptIds() or getTranscriptAttributes()

Return type:

str

labbcat.expressionFromCorpora(corpora, negate=False)

Generates a transcript query expression for matching transcripts/participants by corpus.

This function generates a query expression fragment which can be passed as the expression parameter of getTranscriptAttributes or getMatchingTranscriptIds etc. using a list of corpus names.

Parameters:
  • corpora (list or str) – A list of corpus names, or a single corpus name.
  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).
Returns:

A query expression which can be passed as the expression parameter of getMatchingTranscriptIds() or getTranscriptAttributes() etc.

Return type:

str
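
Because these functions return expression fragments, several of them can be combined into one query. The sketch below joins fragments with “ && ”, which is an assumption about the query language rather than something stated above. The helper matching_transcript_ids and the RecordingStore stand-in are hypothetical, and the fragment strings are placeholders, not the text the functions actually generate.

```python
def matching_transcript_ids(store, *fragments):
    """Combine expression fragments and pass them to getMatchingTranscriptIds.

    Assumption: fragments can be combined with the && operator.
    """
    expression = " && ".join(fragments)
    return store.getMatchingTranscriptIds(expression)


class RecordingStore:
    """Stand-in for a LabbcatView connection that records the expression."""

    def __init__(self):
        self.last_expression = None

    def getMatchingTranscriptIds(self, expression):
        self.last_expression = expression
        return []


store = RecordingStore()
# Placeholders standing in for generated fragments.
ids = matching_transcript_ids(store, "<corpus fragment>", "<type fragment>")
```

In real use, the fragments would come from calls such as labbcat.expressionFromCorpora(corpora) and labbcat.expressionFromTranscriptTypes(transcriptTypes), and store would be a LabbcatView instance.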

ResponseException class

class labbcat.ResponseException(response)

Any method that creates a server request can raise this exception if an error occurs.

This has one attribute, response, which is a Response object representing the full response from the server, from which error messages etc. can be obtained.

class labbcat.Response(resp, verbose=False)

Standard LaBB-CAT response object.

Attributes:
  • model - The model or result returned if any.
  • httpStatus - The HTTP status code, or -1 if not known.
  • title - The title returned by the server.
  • version - The server version.
  • code - The numeric request code (0 or 1 means no error).
  • errors - Errors returned.
  • messages - Messages returned.
  • text - The full plain text of the HTTP response.
checkForErrors()

Convenience method for checking whether the response contains any errors.

If so, a corresponding ResponseException will be raised.
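
When a request fails, the response attribute of the exception gives access to the HTTP status and the error messages. The sketch below formats them into a single line, assuming errors is a list of strings (or None when there are none). The helper error_summary is hypothetical, and the Fake classes are stand-ins for labbcat.Response and labbcat.ResponseException so the sketch runs without the module or a server.

```python
def error_summary(exception):
    """Build a one-line summary from an exception's response attribute."""
    response = exception.response
    errors = response.errors or []
    return "HTTP %s: %s" % (response.httpStatus, "; ".join(errors))


class FakeResponse:
    """Stand-in carrying only the attributes the summary needs."""

    def __init__(self, httpStatus, errors):
        self.httpStatus = httpStatus
        self.errors = errors


class FakeResponseException(Exception):
    """Stand-in exception with a response attribute."""

    def __init__(self, response):
        super().__init__("request failed")
        self.response = response


try:
    raise FakeResponseException(FakeResponse(404, ["Transcript not found"]))
except FakeResponseException as x:
    summary = error_summary(x)
```

In real code, the except clause would name labbcat.ResponseException instead of the stand-in.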