nzilbb-labbcat module

LabbcatView class

class labbcat.LabbcatView(labbcatUrl, username=None, password=None)

API for querying a LaBB-CAT annotation graph store, a database of linguistic transcripts represented using Annotation Graphs.

This interface provides only read-only operations, i.e. those that can be performed by users with “view” permission.

Constructor arguments:

Parameters:
  • labbcatUrl (str) – The ‘home’ URL of the LaBB-CAT server.

  • username (str or None) – The username for logging in to the server, if necessary.

  • password (str or None) – The password for logging in to the server, if necessary.

Attributes:

language: The language code for server message localization, e.g. “es-AR”

Example:

import labbcat

# create annotation store client
corpus = labbcat.LabbcatView("https://labbcat.canterbury.ac.nz", "demo", "demo")

# show some basic information

print("Information about LaBB-CAT at " + corpus.getId())

layerIds = corpus.getLayerIds()
for layerId in layerIds: 
    print("layer: " + layerId) 

corpora = corpus.getCorpusIds()
for c in corpora:
    print("transcripts in: " + c)
    for transcript in corpus.getTranscriptIdsInCorpus(c):
        print(" " + transcript)

allUtterances(participantIds, transcriptTypes=None, mainParticipant=True)

Identifies all utterances by the given participants.

A threadId is returned. To get the actual utterances, which are represented the same way as search results, call getMatches().

Parameters:
  • participantIds – A list of participant IDs to identify the utterances of.

  • transcriptTypes – An optional list of transcript types to limit the results to. If None, all transcript types will be searched.

  • mainParticipant – True to search only main-participant utterances, False to search all utterances.

Returns:

The threadId of the resulting task, which can be passed in to getMatches(), taskStatus(), waitForTask(), releaseTask(), etc.

Return type:

str
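
The task life cycle described above can be sketched as follows. This is only an illustration: the helper name utterances_by is hypothetical, and store is assumed to be a connected LabbcatView (or any object exposing the same methods). It relies on getMatches() accepting a threadId and waiting for the task to finish.

```python
def utterances_by(store, participant_ids, transcript_types=None):
    """Return the utterance matches for the given participants.

    `store` is assumed to be a labbcat.LabbcatView (or a compatible object).
    """
    # Start the server task; only the task's threadId comes back.
    thread_id = store.allUtterances(participant_ids, transcript_types, True)
    try:
        # getMatches() accepts a threadId and waits for the task to finish.
        return store.getMatches(thread_id)
    finally:
        # Free server resources whether or not the retrieval succeeded.
        store.releaseTask(thread_id)
```

Releasing the task in a finally clause means the server resources are freed even if retrieving the matches raises an exception.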

cancelTask(threadId)

Cancels (but does not release) a running task.

Parameters:

threadId (str) – The ID of the task.

countAnnotations(id, layerId, maxOrdinal=None)

Gets the number of annotations on the given layer of the given transcript.

Parameters:
  • id (str) – The ID of the transcript.

  • layerId (str) – The ID of the layer.

  • maxOrdinal (int or None) – The maximum ordinal for the counted annotations. e.g. a maxOrdinal of 1 will ensure that only the first annotation for each parent is counted. If maxOrdinal is None, then all annotations are counted, regardless of their ordinal.

Returns:

The number of annotations on the given layer of the given transcript.

Return type:

int

countMatchingAnnotations(expression)

Counts the number of annotations that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • id == 'ew_0_456'

  • !/th[aeiou].//.test(label)

  • first('participant').label == 'Robert' && first('utterances').start.offset == 12.345

  • graph.id == 'AdaAicheson-01.trs' && layer.id == 'orthography' && start.offset < 10.5

  • previous.id == 'ew_0_456'

NB all expressions must match by either id or layer.id.

Parameters:

expression (str) – An expression that determines which annotations match.

Returns:

The number of matching annotations.

Return type:

int

countMatchingParticipantIds(expression)

Counts the number of participants that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)

  • labels('corpus').includes('CC')

  • labels('participant_languages').includes('en')

  • labels('transcript_language').includes('en')

  • !/Ada.+/.test(id) && first('corpus').label == 'CC'

  • all('transcript_rating').length < 2

  • all('participant_rating').length == 0

  • !annotators('transcript_rating').includes('labbcat')

  • first('participant_gender').label == 'NA'

Functions in the labbcat module, such as expressionFromCorpora(), can be used to generate expressions of common types.

Example:

numQbParticipants = corpus.countMatchingParticipantIds(
    labbcat.expressionFromCorpora("QB"))

Parameters:

expression (str) – An expression that determines which participants match.

Returns:

The number of matching participants.

Return type:

int

countMatchingTranscriptIds(expression)

Counts the number of transcripts that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)

  • labels('participant').includes('Robert')

  • ['CC', 'IA', 'MU'].includes(first('corpus').label)

  • first('episode').label == 'Ada Aitcheson'

  • first('transcript_scribe').label == 'Robert'

  • first('participant_languages').label == 'en'

  • first('noise').label == 'bell'

  • labels('transcript_languages').includes('en')

  • labels('participant_languages').includes('en')

  • labels('noise').includes('bell')

  • all('transcript_languages').length > 1

  • all('participant_languages').length > 1

  • all('transcript').length > 100

  • annotators('transcript_rating').includes('Robert')

  • !/Ada.+/.test(id) && first('corpus').label == 'CC' && labels('participant').includes('Robert')

Functions in the labbcat module, such as expressionFromAttributeValue(), can be used to generate expressions of common types.

Example:

numQuakeFaceTranscripts = corpus.countMatchingTranscriptIds(
    labbcat.expressionFromAttributeValue("transcript_quakeface", "1"))

Parameters:

expression (str) – An expression that determines which transcripts match.

Returns:

The number of matching transcripts.

Return type:

int
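
Expressions like those listed above are plain strings, so they can also be assembled in Python. The helpers below are hypothetical (they are not part of the labbcat module), and the quoting rule is an assumption: it supposes that the expression language accepts JavaScript-style backslash escapes inside single-quoted strings.

```python
def quote(value):
    """Wrap a value in single quotes, escaping backslashes and quotes.

    Assumption: the expression language accepts JavaScript-style escapes.
    """
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"

def first_label_is(layer_id, value):
    """Build a clause like: first('corpus').label == 'CC'"""
    return "first(" + quote(layer_id) + ").label == " + quote(value)

def labels_include(layer_id, value):
    """Build a clause like: labels('participant').includes('Robert')"""
    return "labels(" + quote(layer_id) + ").includes(" + quote(value) + ")"

def all_of(*clauses):
    """Join clauses with &&, so that every clause must match."""
    return " && ".join(clauses)

expression = all_of(
    first_label_is("corpus", "CC"),
    labels_include("participant", "Robert"))
print(expression)
# first('corpus').label == 'CC' && labels('participant').includes('Robert')
```

The resulting string could then be passed as the expression argument of countMatchingTranscriptIds().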

formatTranscript(id, layerIds, mimeType, dir=None)

Get transcript in a specified format.

Parameters:
  • id (str) – The ID of the transcript to export.

  • layerIds (list of str) – A list of IDs of annotation layers to include in the transcript.

  • mimeType (str) – The desired format, for example “text/praat-textgrid” for Praat TextGrids, “text/plain” for plain text, etc.

  • dir (str) – A directory in which the file(s) should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.

Returns:

A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location. NB Although many formats will generate exactly one file for each transcript, this is not guaranteed; some formats generate multiple files per transcript.

Return type:

list of str
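
Because the caller is responsible for deleting files written to the temporary directory, a small context manager can help. The sketch below is not part of the labbcat module; temporary_files is a hypothetical helper, and the layer IDs in the commented usage are assumptions.

```python
import contextlib
import os

@contextlib.contextmanager
def temporary_files(paths):
    """Yield the given file paths, then delete whichever still exist."""
    try:
        yield paths
    finally:
        for path in paths:
            # Files that were already moved or deleted are skipped silently.
            with contextlib.suppress(FileNotFoundError):
                os.remove(path)

# Hypothetical usage, with `corpus` being a connected LabbcatView:
# with temporary_files(corpus.formatTranscript(
#         "AdaAicheson-01.trs", ["utterance", "word"],
#         "text/praat-textgrid")) as files:
#     for path in files:
#         print(path, os.path.getsize(path))
```

Files that should be kept can be moved elsewhere inside the with block; only files still present at their original paths are removed on exit.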

getAnchors(id, anchorIds)

Gets the given anchors in the given transcript.

Parameters:
  • id (str) – The ID of the transcript.

  • anchorIds (list of str) – A list of anchor IDs.

Returns:

A (possibly empty) list of anchors.

Return type:

list of dictionaries

getAnnotations(id, layerId, maxOrdinal=None, pageLength=None, pageNumber=None)

Gets the annotations on the given layer of the given transcript.

Parameters:
  • id (str) – The ID of the transcript.

  • layerId (str) – The ID of the layer.

  • maxOrdinal (int or None) – The maximum ordinal for the returned annotations. e.g. a maxOrdinal of 1 will ensure that only the first annotation for each parent is returned. If maxOrdinal is None, then all annotations are returned, regardless of their ordinal.

  • pageLength (int or None) – The maximum number of annotations to return, or None to return all.

  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

Returns:

A (possibly empty) list of annotations.

Return type:

list of dictionaries

getAvailableMedia(id)

List the media available for the given transcript.

Parameters:

id (str) – The transcript ID.

Returns:

List of media files available for the given transcript.

Return type:

list of dictionaries

getCorpusIds()

Gets a list of corpus IDs.

Returns:

A list of corpus IDs.

Return type:

list

getDeserializerDescriptors()

Lists the descriptors of all registered deserializers.

Deserializers are modules that import annotation structures from a specific file format, e.g. Praat TextGrid, plain text, etc.

Returns:

A list of the descriptors of all registered deserializers.

Return type:

list of dictionaries

getDictionaries()

List the dictionaries available.

Returns:

A dictionary of lists, where the keys are layer manager IDs, each of which maps to a list of IDs of the dictionaries that the layer manager makes available.

Return type:

dict of lists

getDictionaryEntries(managerId, dictionaryId, keys)

Lookup entries in a dictionary.

Parameters:
  • managerId (str) – The layer manager ID of the dictionary, as returned by getDictionaries().

  • dictionaryId (str) – The ID of the dictionary, as returned by getDictionaries().

  • keys (list of str or list of dict) – A list of keys (words) identifying entries to look up.

Returns:

A dictionary of lists, where the keys are the given keys, each of which maps to a list of entries. Keys with no corresponding entry in the given dictionary will be present in the returned result, but will have no entries.

Return type:

dict of lists
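
Since keys without entries are still present in the result, finding the words missing from a dictionary is a matter of checking for empty lists. A minimal sketch, assuming store is a connected LabbcatView; the helper name is hypothetical.

```python
def words_without_entries(store, manager_id, dictionary_id, words):
    """Return the words that have no entry in the given dictionary."""
    entries = store.getDictionaryEntries(manager_id, dictionary_id, words)
    # Every requested key is expected in the result; unknown words map to an
    # empty list, so an empty (or absent) value means "not in the dictionary".
    return [word for word in words if not entries.get(word)]
```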

getEpisodeDocuments(id)

Get a list of documents associated with the episode of the given transcript.

Parameters:

id (str) – The transcript ID.

Returns:

List of URLs to documents.

Return type:

list of str

getFragmentAnnotationData(layerId, transcriptIds, startOffsets=None, endOffsets=None, dir=None)

Gets binary annotation data in fragments.

In some annotation layers, the annotations have not only a textual label, but also binary data associated with it; e.g. an image or a data file. In these cases, the ‘type’ of the layer is a MIME type, e.g. ‘image/png’.

This function gets annotations between given start/end times on the given MIME-typed layer, and retrieves the binary data as files, whose names are returned by the function.

The intervals to extract from can be defined in two possible ways:

  1. transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats

  2. transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None, in which case the starts/ends are the boundaries of the utterance that matched.

Parameters:
  • layerId (str) – The ID of the layer with a MIME type, from which annotation files will be extracted.

  • transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).

  • startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.

  • endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.

  • dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.

Returns:

A list of files (e.g. PNG images). If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location.

Return type:

list of str

getFragments(transcriptIds, layerIds, mimeType, dir=None, startOffsets=None, endOffsets=None, prefixNames=True)

Get transcript fragments in a specified format.

The intervals to extract can be defined in two possible ways:

  1. transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats

  2. transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None

Parameters:
  • transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).

  • startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.

  • endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.

  • layerIds (list of str) – A list of IDs of annotation layers to include in the fragment.

  • mimeType (str) – The desired format, for example “text/praat-textgrid” for Praat TextGrids, “text/plain” for plain text, etc.

  • dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.

  • prefixNames (boolean) – Whether to prefix fragment names with a numeric serial number or not.

Returns:

A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location. NB Although many formats will generate exactly one file for each interval, this is not guaranteed; some formats generate a single file or a fixed collection of files regardless of how many fragments there are.

Return type:

list of str
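
The first way of defining intervals needs three parallel lists of equal length. The sketch below builds them from (transcript ID, start, end) tuples, which may be a more convenient shape to work with. The helper name is hypothetical, and store is assumed to be a connected LabbcatView.

```python
def fragments_for_intervals(store, intervals, layer_ids, mime_type, dir=None):
    """Export one fragment per (transcript_id, start, end) tuple."""
    for transcript_id, start, end in intervals:
        if start >= end:
            raise ValueError(
                f"empty interval in {transcript_id}: {start}-{end}")
    # Split the tuples into three parallel lists, one element per interval.
    transcript_ids = [t for t, _, _ in intervals]
    start_offsets = [float(s) for _, s, _ in intervals]
    end_offsets = [float(e) for _, _, e in intervals]
    return store.getFragments(
        transcript_ids, layer_ids, mime_type,
        dir=dir, startOffsets=start_offsets, endOffsets=end_offsets)
```

Keeping each interval in one tuple makes it harder for the three lists to drift out of step with each other.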

getFragmentsAsync(transcriptIds, layerIds, mimeType, startOffsets=None, endOffsets=None, prefixNames=True)

Starts a server task for getting transcript fragments in a specified format. The task continues running after this function returns, and can be monitored with taskStatus(), cancelled with cancelTask(), and the final results retrieved with taskResults(). The caller should eventually call releaseTask() to free server resources after the task is cancelled or finished.

The intervals to extract can be defined in two possible ways:

  1. transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats

  2. transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None

Parameters:
  • transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).

  • startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.

  • endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.

  • layerIds (list of str) – A list of IDs of annotation layers to include in the fragment.

  • mimeType (str) – The desired format, for example “text/praat-textgrid” for Praat TextGrids, “text/plain” for plain text, etc.

  • prefixNames (boolean) – Whether to prefix fragment names with a numeric serial number or not.

Returns:

The threadId of the resulting task, which can be passed in to taskStatus(), waitForTask(), taskResults(), releaseTask(), etc.

Return type:

str
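
The start, wait, fetch, release sequence described above might be wrapped as follows. This is a sketch under assumptions: waitForTask() and taskResults() are assumed to accept just the threadId (their full signatures are not shown in this section), the helper name is hypothetical, and store is assumed to be a connected LabbcatView.

```python
def run_fragment_task(store, transcript_ids, layer_ids, mime_type,
                      start_offsets=None, end_offsets=None):
    """Run a fragment-export task to completion and return its results."""
    thread_id = store.getFragmentsAsync(
        transcript_ids, layer_ids, mime_type,
        startOffsets=start_offsets, endOffsets=end_offsets)
    try:
        # Assumed to block until the server task has finished.
        store.waitForTask(thread_id)
        return store.taskResults(thread_id)
    finally:
        # Always free the server resources, even if waiting or fetching fails.
        store.releaseTask(thread_id)
```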

getId()

Gets the store’s ID.

Returns:

The annotation store’s ID.

Return type:

str

getLayer(id)

Gets a layer definition.

Parameters:

id (str) – ID of the layer to get the definition for.

Returns:

The definition of the given layer.

Return type:

dictionary

getLayerIds()

Gets a list of layer IDs (annotation ‘types’).

Returns:

A list of layer IDs.

Return type:

list

getLayers()

Gets a list of layer definitions.

Returns:

A list of layer definitions.

Return type:

list of dictionaries

getMatchAnnotations(matchIds, layerIds, targetOffset=0, annotationsPerLayer=1, offsetThreshold=None)

Gets annotations on selected layers related to search results returned by a previous call to getMatches(threadId).

The returned list of lists contains dictionaries that represent individual annotations, with the following entries:

  • “id” : The annotation’s unique ID

  • “layerId” : The layer the annotation comes from

  • “label” : The annotation’s label or value

  • “startId” : The ID of the annotation’s start anchor

  • “endId” : The ID of the annotation’s end anchor

  • “parentId” : The annotation’s parent annotation ID

  • “ordinal” : The annotation’s position amongst its peers

  • “confidence” : A rating of confidence in the label accuracy, from 0 (no confidence) to 100 (absolute confidence / manually annotated)

If offsetThreshold is a value between 0 and 100, the annotations may also include a “start” entry and an “end” entry, representing the start/end anchors of the annotation which define the position of the annotation in time. These values are dictionaries with the following entries:

  • “id” : The anchor’s unique ID

  • “offset” : The time, in seconds since the start of the recording; if the transcript is textual rather than speech, the number of characters from the beginning of the document

  • “confidence” : A rating of confidence in the alignment accuracy, from 0 (no confidence) to 100 (absolute confidence / manually specified)

Parameters:
  • matchIds (list of str or list of dict) – A list of MatchId strings, or a list of match dictionaries of the kind returned by getMatches().

  • layerIds (list of str) – A list of layer IDs.

  • targetOffset (int) – The distance from the original target of the match, e.g.

      • 0 : find annotations of the match target itself

      • 1 : find annotations of the token immediately after the match target

      • -1 : find annotations of the token immediately before the match target

  • annotationsPerLayer (int) – The number of annotations on the given layer to retrieve. In most cases, there’s only one annotation available. However, tokens may, for example, be annotated with ‘all possible phonemic transcriptions’, in which case using a value of greater than 1 for this parameter provides other phonemic transcriptions, for tokens that have more than one.

  • offsetThreshold (int or None) – The minimum confidence for alignments, e.g.

      • None : do not return alignments

      • 0 : return all alignments, regardless of confidence

      • 50 : return only alignments that have been at least automatically aligned

      • 100 : return only manually-set alignments

Returns:

If annotationsPerLayer == 1 and only one layer is specified in layerIds, a one-dimensional list of annotations, of length len(matchIds), is returned. Otherwise, the return value is a list of lists of annotations, of dimensions len(matchIds) x (len(layerIds) x annotationsPerLayer). Either way, the first index matches the corresponding index in matchIds.

Return type:

list of list of dictionary
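
Because the result can be either one-dimensional or two-dimensional, calling code may find it easier to normalise it first. The helpers below are hypothetical, and they assume that a match with no annotation on a layer is represented by None; the sample data is made up.

```python
def as_rows(result):
    """Return getMatchAnnotations() output as a list of lists.

    A bare annotation dictionary (or None, assumed here to mark a match with
    no annotation) is wrapped in a one-element list; lists pass through.
    """
    return [row if isinstance(row, list) else [row] for row in result]

def labels_per_match(result):
    """Collect the labels of the annotations found for each match."""
    return [[a["label"] for a in row if a is not None]
            for row in as_rows(result)]

# Made-up results in the two shapes described above.
flat = [{"label": "the"}, None]
nested = [[{"label": "D@"}, {"label": "Di:"}], [None, None]]
print(labels_per_match(flat))    # [['the'], []]
print(labels_per_match(nested))  # [['D@', 'Di:'], []]
```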

getMatches(search, wordsContext=0, pageLength=None, pageNumber=None)

Gets a list of tokens that were matched by search(pattern).

The search parameter can be either

  • a threadId returned from a previous call to search() or

  • a dict representing a pattern to search for.

If it is a threadId, and the task is still running, then this function will wait for it to finish.

If it is a pattern dict, then search() is called for the given pattern, the matches are retrieved, and releaseTask() is called to free the search resources. Some example patterns are shown below; for more detailed information, see search().

Example:

# a single dict representing a 'one column' search,
# with a string value representing a regular expression to match
pattern = { "orthography" : "ps.*" }

# a list containing the columns (adj defaults to 1, so matching tokens are contiguous)...
pattern = [
  { "orthography" : "the" },
  # the backslash is doubled so that Python passes it through to the regular expression
  { "phonemes" : { "not" : True, "pattern" : "[cCEFHiIPqQuUV0123456789~#\\$@].*" },
    "frequency" : { "max" : "2" } } ]

This function returns a list of match dictionaries, where each item has the following entries:

  • “Title” : The title of the LaBB-CAT instance

  • “Version” : The current version of the LaBB-CAT instance

  • “MatchId” : An ID which encodes which token in which utterance by which participant of which transcript matched.

  • “URL” : URL that opens the corresponding transcript page at the first matching word.

  • “Transcript” : The name of the transcript document that the match is from.

  • “Participant” : The name of the participant who uttered the match.

  • “Corpus” : The corpus the match comes from.

  • “Line” : The start time of the utterance.

  • “LineEnd” : The end time of the utterance.

  • “BeforeMatch” : The context before the match.

  • “Text” : The match text.

  • “AfterMatch” : The context after the match.

Parameters:
  • search (str or dict) – This can be either a threadId returned from a previous call to search() or a dict representing a pattern to search for.

  • wordsContext (int) – Number of words of context to include in the “BeforeMatch” and “AfterMatch” entries in the results.

  • pageLength (int or None) – The maximum number of matches to return, or None to return all.

  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

Returns:

A list of match dictionaries that identify the utterances/tokens that were matched by search(pattern), or None if the task was cancelled.

Return type:

list of dict
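
As an illustration of the match entries listed above, the sketch below turns match dictionaries into keyword-in-context lines. The helper name and the sample match are made up; only the dictionary keys come from the description above.

```python
def concordance_lines(matches):
    """Format match dictionaries as keyword-in-context lines."""
    if matches is None:  # the search task was cancelled
        return []
    lines = []
    for match in matches:
        # Skip empty context, which is expected when wordsContext is 0.
        context = " ".join(
            part for part in (
                match.get("BeforeMatch", ""),
                "[" + match["Text"] + "]",
                match.get("AfterMatch", ""))
            if part)
        lines.append(" | ".join(
            (match["Transcript"], match["Participant"], context)))
    return lines

# A made-up match, for illustration only.
example = [{
    "Transcript": "AdaAicheson-01.trs", "Participant": "Ada",
    "BeforeMatch": "went to", "Text": "the", "AfterMatch": "shops"}]
print(concordance_lines(example)[0])
# AdaAicheson-01.trs | Ada | went to [the] shops
```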

getMatchingAnnotationData(expression, dir=None)

Gets binary data for annotations that match a particular pattern.

In some annotation layers, the annotations have not only a textual label, but also binary data associated with it; e.g. an image or a data file. In these cases, the ‘type’ of the layer is a MIME type, e.g. ‘image/png’.

This function gets annotations that match the given expression on a MIME-typed layer, and retrieves the binary data as files, whose names are returned by the function.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • id == 'ew_0_456'

  • !/th[aeiou].//.test(label)

  • first('participant').label == 'Robert' && first('utterances').start.offset == 12.345

  • graph.id == 'AdaAicheson-01.trs' && layer.id == 'mediapipeFrame' && start.offset < 10.5

  • previous.id == 'ew_0_456'

NB all expressions must match by either id or layer.id.

Parameters:
  • expression (str) – An expression that determines which annotations match.

  • dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.

Returns:

A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location.

Return type:

list of str

getMatchingAnnotations(expression, pageLength=None, pageNumber=None)

Gets a list of annotations that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • id == 'ew_0_456'

  • !/th[aeiou].//.test(label)

  • first('participant').label == 'Robert' && first('utterances').start.offset == 12.345

  • graph.id == 'AdaAicheson-01.trs' && layer.id == 'orthography' && start.offset < 10.5

  • previous.id == 'ew_0_456'

NB all expressions must match by either id or layer.id.

Parameters:
  • expression (str) – An expression that determines which annotations match.

  • pageLength (int or None) – The maximum number of annotations to return, or None to return all.

  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

Returns:

A list of matching Annotations.

Return type:

list of dictionaries

getMatchingParticipantIds(expression, pageLength=None, pageNumber=None)

Gets a list of IDs of participants that match a particular pattern.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)

  • labels('corpus').includes('CC')

  • labels('participant_languages').includes('en')

  • labels('transcript_language').includes('en')

  • !/Ada.+/.test(id) && first('corpus').label == 'CC'

  • all('transcript_rating').length < 2

  • all('participant_rating').length == 0

  • !annotators('transcript_rating').includes('labbcat')

  • first('participant_gender').label == 'NA'

Functions in the labbcat module, such as expressionFromCorpora(), can be used to generate expressions of common types.

Example:

qbParticipants = corpus.getMatchingParticipantIds(
    labbcat.expressionFromCorpora("QB"))

Parameters:
  • expression (str) – An expression that determines which participants match.

  • pageLength (int or None) – The maximum number of IDs to return, or None to return all.

  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

Returns:

A list of participant IDs.

Return type:

list

getMatchingTranscriptIds(expression, pageLength=None, pageNumber=None, order=None)

Gets a list of IDs of transcripts that match a particular pattern.

The results can be exhaustive, by omitting pageLength and pageNumber, or they can be a subset (a ‘page’) of results, by giving pageLength and pageNumber values.

The order of the list can be specified. If omitted, the transcripts are listed in ID order.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)

  • labels('participant').includes('Robert')

  • ['CC', 'IA', 'MU'].includes(first('corpus').label)

  • first('episode').label == 'Ada Aitcheson'

  • first('transcript_scribe').label == 'Robert'

  • first('participant_languages').label == 'en'

  • first('noise').label == 'bell'

  • labels('transcript_languages').includes('en')

  • labels('participant_languages').includes('en')

  • labels('noise').includes('bell')

  • all('transcript_languages').length > 1

  • all('participant_languages').length > 1

  • all('transcript').length > 100

  • annotators('transcript_rating').includes('Robert')

  • !/Ada.+/.test(id) && first('corpus').label == 'CC' && labels('participant').includes('Robert')

Functions in the labbcat module, such as expressionFromAttributeValue(), can be used to generate expressions of common types.

Example:

quakeFaceTranscripts = corpus.getMatchingTranscriptIds(
    labbcat.expressionFromAttributeValue("transcript_quakeface", "1"))

Parameters:
  • expression (str) – An expression that determines which transcripts match.

  • pageLength (int or None) – The maximum number of IDs to return, or None to return all.

  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

  • order (str) – The ordering for the list of IDs, a string containing a comma-separated list of expressions, each of which may be followed by “ ASC” or “ DESC”, or None for transcript ID order.

Returns:

A list of transcript IDs.

Return type:

list of str
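
When the full list of IDs is too large to fetch in one call, the pageLength and pageNumber parameters can be combined in a generator. This is a minimal sketch, assuming store is a connected LabbcatView and that a page shorter than pageLength marks the end of the results; the helper name is hypothetical.

```python
def iter_transcript_ids(store, expression, page_length=100, order=None):
    """Yield matching transcript IDs, fetching one page at a time."""
    page_number = 0  # pages are zero-based
    while True:
        page = store.getMatchingTranscriptIds(
            expression, pageLength=page_length,
            pageNumber=page_number, order=order)
        yield from page
        if len(page) < page_length:
            break  # a short (or empty) page means there are no more results
        page_number += 1
```

Callers can stop iterating early, in which case the remaining pages are never requested.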

getMedia(id, trackSuffix, mimeType, startOffset=None, endOffset=None, dir=None)

Downloads a given media track for a given transcript.

Parameters:
  • id (str) – The transcript ID.

  • trackSuffix (str) – The track suffix of the media.

  • mimeType (str) – The MIME type of the media, which may include parameters for type conversion, e.g. ‘audio/wav; samplerate=16000’

  • startOffset (float or None) – The start offset of the media sample, or None for the start of the whole recording.

  • endOffset (float or None) – The end offset of the media sample, or None for the end of the whole recording.

  • dir (str) – A directory in which the file should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.

Returns:

The file name of the resulting file. If dir is None, this file will be stored under the system’s temporary directory, so once processing is finished, it should be deleted by the caller, or moved to a more permanent location.

Return type:

str

getMediaTracks()

List the predefined media tracks available for transcripts.

Returns:

An ordered list of media track definitions.

Return type:

list of dictionaries

getMediaUrl(id, trackSuffix, mimeType, startOffset=None, endOffset=None)

Gets a given media track URL for a given transcript.

Parameters:
  • id (str) – The transcript ID.

  • trackSuffix (str) – The track suffix of the media.

  • mimeType (str) – The MIME type of the media, which may include parameters for type conversion, e.g. ‘audio/wav; samplerate=16000’

  • startOffset (float or None) – The start offset of the media sample, or None for the start of the whole recording.

  • endOffset (float or None) – The end offset of the media sample, or None for the end of the whole recording.

Returns:

A URL to the given media for the given transcript, or None if the given media doesn’t exist.

Return type:

str

getParticipant(id)

Gets the participant record specified by the given identifier.

Parameters:

id (str) – The ID of the participant, which could be their name or their database annotation ID.

Returns:

An annotation representing the participant, or None if the participant was not found.

Return type:

dictionary

getParticipantAttributes(participantIds, layerIds)

Gets participant attribute values.

Retrieves participant attribute values for given participant IDs, saves them to a CSV file, and returns the name of the file.

In general, participant attributes are layers whose ID is prefixed ‘participant’; formally, however, a participant attribute is any layer where layer.parentId == ‘participant’ and layer.alignment == 0.

The caller is responsible for deleting the resulting file when finished.

Parameters:
  • participantIds (list of str) – A list of participant IDs.

  • layerIds (list of str) – A list of layer IDs corresponding to participant attributes.

Returns:

The name of a CSV file with one row per participant, and one column per attribute.

Return type:

str
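
The returned CSV file can be loaded with the standard csv module and then removed. The sketch below is hypothetical: it assumes that the first row of the file holds the column names and that the file is UTF-8 encoded, neither of which is stated above.

```python
import csv
import os

def read_participant_attributes(csv_path, delete=True):
    """Load the CSV written by getParticipantAttributes() as a list of dicts.

    Assumptions: the first row holds the column names, and the file is
    encoded as UTF-8.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    if delete:
        # Deleting the file is the caller's responsibility.
        os.remove(csv_path)
    return rows
```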

getParticipantIds()

Gets a list of participant IDs.

Returns:

A list of participant IDs.

Return type:

list

getSerializerDescriptors()

Lists the descriptors of all registered serializers.

Serializers are modules that export annotation structures as a specific file format, e.g. Praat TextGrid, plain text, etc., so the mimeType of descriptors reflects what mimeTypes can be specified for getFragments().

Returns:

A list of the descriptors of all registered serializers.

Return type:

list of dictionaries

getSoundFragments(transcriptIds, startOffsets=None, endOffsets=None, sampleRate=None, dir=None, prefixNames=True)

Downloads WAV sound fragments.

The intervals to extract can be defined in two possible ways:

  1. transcriptIds is a list of strings, and startOffsets and endOffsets are lists of floats

  2. transcriptIds is a list of dict objects returned by getMatches(threadId), and startOffsets and endOffsets are None

Parameters:
  • transcriptIds (list of str or list of dict) – A list of transcript IDs (transcript names), or a list of dictionaries returned by getMatches(threadId).

  • startOffsets (list of float or None) – A list of start offsets, with one element for each element in transcriptIds.

  • endOffsets (list of float or None) – A list of end offsets, with one element for each element in transcriptIds.

  • sampleRate (int or None) – The desired sample rate, or None for no preference.

  • dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.

  • prefixNames (boolean) – Whether to prefix fragment names with a numeric serial number or not.

Returns:

A list of WAV files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location.

Return type:

list of str

getSystemAttribute(attribute)

Gets the value of the given system attribute.

Parameters:

attribute (str) – Name of the attribute.

Returns:

The value of the given attribute, or None if the attribute doesn’t exist.

Return type:

str

getTasks()

Gets a list of all tasks on the server.

Returns:

A list of all task statuses.

Return type:

list of dictionaries

getTranscript(id, layerIds=None)

Gets a transcript given its ID.

The returned object defines the annotation graph structure, and is a dictionary whose entries include:

  • “id” : the transcript ID

  • “schema” : a representation of the layer structure of the graph

  • “anchors” : a dictionary of temporal anchors that represent the start and/or end time of an annotation (keyed by anchor ID)

  • “participant” : a list of participants in the transcript. Each participant is represented by a dictionary that includes a “turn” entry which is a list of speaker turns, each turn having an “utterance” entry containing utterance boundary annotations, and a “word” entry containing a list of word tokens.

  • entries for ‘spanning’ layers that are not assigned to a specific participant.

Annotations are represented by dictionaries that have the following entries:

  • “id” : the unique identifier for the annotation

  • “label” : the annotation layer

  • “startId” and “endId” : the start and end anchors, which correspond to an entry in the “anchors” dictionary

  • “confidence” : label confidence rating, where 100 means it was labelled by a human, and 50 means it was labelled by an automated process.

Parameters:
  • id (str) – The given transcript ID.

  • layerIds (list of str) – The IDs of the layers to load, or None if only transcript data is required.

Returns:

The identified transcript.

Return type:

dictionary
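
The structure described above can be walked with ordinary dictionary and list access. The sketch below uses a small hand-built stand-in rather than a real getTranscript() result; the IDs and labels are invented, and the “offset” key on each anchor is an assumption that the description above does not confirm:

```python
# Hand-built stand-in for a getTranscript() result (values are invented;
# the "offset" key on anchors is an assumption).
transcript = {
    "id": "example.eaf",
    "anchors": {
        "a1": {"offset": 0.0},
        "a2": {"offset": 0.4},
        "a3": {"offset": 0.9},
    },
    "participant": [
        {"id": "p1", "label": "speaker",
         "turn": [
             {"id": "t1", "startId": "a1", "endId": "a3",
              "utterance": [{"id": "u1", "startId": "a1", "endId": "a3"}],
              "word": [
                  {"id": "w1", "label": "hello", "startId": "a1", "endId": "a2"},
                  {"id": "w2", "label": "world", "startId": "a2", "endId": "a3"},
              ]}]}],
}

def word_times(transcript):
    """Yield (label, start, end) for each word token, resolving anchor IDs."""
    anchors = transcript["anchors"]
    for participant in transcript["participant"]:
        for turn in participant["turn"]:
            for word in turn["word"]:
                yield (word["label"],
                       anchors[word["startId"]]["offset"],
                       anchors[word["endId"]]["offset"])

words = list(word_times(transcript))
```

Because annotations refer to anchors by ID, two adjacent words can share one anchor, as “a2” does here.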

getTranscriptAttributes(expression, layerIds, csvFileName=None)

Get transcript attribute values.

Retrieves transcript attribute values for a given transcript expression, saves them to a CSV file, and returns the name of the file.

The expression parameter can be an explicit list of transcript IDs, or a string query expression that identifies which transcripts to return.

The expression language is loosely based on JavaScript; expressions such as the following can be used:

  • /Ada.+/.test(id)

  • labels('participant').includes('Robert')

  • ['CC', 'IA', 'MU'].includes(first('corpus').label)

  • first('episode').label == 'Ada Aitcheson'

  • first('transcript_scribe').label == 'Robert'

  • first('participant_languages').label == 'en'

  • first('noise').label == 'bell'

  • labels('transcript_languages').includes('en')

  • labels('participant_languages').includes('en')

  • labels('noise').includes('bell')

  • all('transcript_languages').length > 1

  • all('participant_languages').length > 1

  • all('word').length > 100

  • annotators('transcript_rating').includes('Robert')

  • !/Ada.+/.test(id) && first('corpus').label == 'CC' && labels('participant').includes('Robert')
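
Since expressions are plain strings, they can be assembled in Python before being passed as the expression parameter. The helpers below are hypothetical (they are not part of the labbcat package), and the backslash escaping of quotes is an assumption about what the server accepts:

```python
def quote(value):
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

def label_equals(layerId, label):
    """Expression clause: the first annotation on layerId has the given label."""
    return f"first({quote(layerId)}).label == {quote(label)}"

def labels_include(layerId, label):
    """Expression clause: some annotation on layerId has the given label."""
    return f"labels({quote(layerId)}).includes({quote(label)})"

def all_of(*clauses):
    """Combine clauses so that all of them must hold."""
    return " && ".join(clauses)

expression = all_of(
    label_equals("corpus", "CC"),
    labels_include("participant", "Robert"))
```

Building clauses through small functions keeps quoting in one place, which matters when labels contain apostrophes.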

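Functions such as labbcat.expressionFromCorpora() and labbcat.expressionFromTranscriptTypes(), both used in the example below, can be used to generate expressions of common types.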

In general, transcript attributes are layers whose ID is prefixed ‘transcript’; formally, however, a transcript attribute is any layer where layer.parentId == ‘graph’ and layer.alignment == 0, which includes ‘corpus’ as well as transcript attribute layers.

The caller is responsible for deleting the resulting file when finished.

Example:

# duration/word count of QB corpus transcripts
qbAttributesCsv = corpus.getTranscriptAttributes(
    labbcat.expressionFromCorpora("QB"),
    ["transcript_duration", "transcript_word count"])            

# speech rate for spontaneous speech recordings
spontaneousSpeechRateCsv = corpus.getTranscriptAttributes(
    labbcat.expressionFromTranscriptTypes(["monologue", "interview"]),
    ["transcript_syllables per minute"])

# language for targeted transcripts
languageCsv = corpus.getTranscriptAttributes(
    ["AP2505_Nelson.eaf", "AP2512_MattBlack.eaf"],
    ["transcript_language"])

# tidily delete CSV files (os.remove() takes one path at a time)
import os
for csvFile in [qbAttributesCsv, spontaneousSpeechRateCsv, languageCsv]:
    os.remove(csvFile)
Parameters:
  • expression (str or list of str) – An expression that determines which transcripts match, or an explicit list of transcript IDs.

  • layerIds (list of str) – A list of layer IDs corresponding to transcript attributes.

  • csvFileName (str) – The file to save the resulting CSV rows to.

Returns:

The name of a CSV file with one row per transcript, and one column per attribute.

Return type:

str
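
Because the result is an ordinary CSV file that the caller must delete, one convenient pattern is to load it into memory and remove it in the same step. A minimal sketch using only the standard library; read_attributes is a hypothetical helper, the column names in the stand-in file are invented, and UTF-8 encoding is an assumption:

```python
import csv
import os
import tempfile

def read_attributes(csvFileName):
    """Load the CSV into a list of dicts (one per transcript), then delete it."""
    try:
        with open(csvFileName, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    finally:
        os.remove(csvFileName)

# Stand-in for a file returned by getTranscriptAttributes(); the column
# names here are invented for illustration.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
    f.write("transcript,transcript_language\nAP2505_Nelson.eaf,en\n")

rows = read_attributes(path)
```

With a real server, path would instead be the value returned by corpus.getTranscriptAttributes(...).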

getTranscriptIds()

Gets a list of transcript IDs.

Returns:

A list of transcript IDs.

Return type:

list of str

getTranscriptIdsInCorpus(id)

Gets a list of transcript IDs in the given corpus.

Parameters:

id (str) – A corpus ID.

Returns:

A list of transcript IDs.

Return type:

list of str

getTranscriptIdsWithParticipant(id)

Gets a list of IDs of transcripts that include the given participant.

Parameters:

id (str) – A participant ID.

Returns:

A list of transcript IDs.

Return type:

list of str

getUserInfo()

Gets information about the current user, including the roles or groups they are in.

Returns:

The user record, including a “user” entry with the user ID, and a “roles” entry which is a list of str.

Return type:

dict

processWithPraat(praatScript, windowOffset, matchIds, offsets, endOffsets=None, genderAttribute='participant_gender', attributes=None)

Process a set of intervals with Praat.

This function instructs the LaBB-CAT server to invoke Praat for a set of sound intervals, in order to extract acoustic measures.

The exact measurements to return depend on the praatScript that is invoked. This is a Praat script fragment that will run once for each sound interval specified.

There are functions to allow the generation of a number of pre-defined Praat scripts for common tasks such as formant, pitch, intensity, and centre of gravity.

You can provide your own script, either by building a string with your code, or loading one from a file.

LaBB-CAT prefixes praatScript with code to open a sound file and extract a defined part of it into a Sound object which is then selected.

LaBB-CAT ‘Remove’s this Sound object after the script finishes executing. Any other objects created by the script must be ‘Remove’d before the end of the script (otherwise Praat runs out of memory during very large batches)

LaBB-CAT assumes that all calls to the function ‘print’ correspond to fields for export and each field must be printed on its own line. Specifically it scans for lines of the form:

print 'myOutputVariable' 'newline$'

Variables that can be assumed to be already set in the context of the script are:

  • windowOffset

    – the value used for the Window Offset; how much context to include.

  • windowAbsoluteStart

    – the start time of the window extracted relative to the start of the original audio file.

  • windowAbsoluteEnd

    – the end time of the window extracted relative to the start of the original audio file.

  • windowDuration

    – the duration of the window extracted (including window offset).

  • targetAbsoluteStart

    – the start time of the target interval relative to the start of the original audio file.

  • targetAbsoluteEnd

    – the end time of the target interval relative to the start of the original audio file.

  • targetStart

    – the start time of the target interval relative to the start of the window extracted.

  • targetEnd

    – the end time of the target interval relative to the start of the window extracted.

  • targetDuration

    – the duration of the target interval.

  • sampleNumber

    – the number of the sample within the set of samples being processed.

  • sampleName$

    – the name of the extracted/selected Sound object.
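
As an illustration of how these rules fit together, the sketch below builds a custom praatScript as a Python string. It has not been run against a LaBB-CAT server; the pitch floor and ceiling (75 and 600 Hz) are arbitrary assumptions, and the print line uses straight single quotes, which Praat requires for variable substitution:

```python
# Praat script fragment held in a Python string. Per the description above,
# LaBB-CAT has already selected the extracted Sound when the fragment runs.
praatScript = "\n".join([
    # create a Pitch object from the selected Sound (time step, floor, ceiling)
    "To Pitch: 0, 75, 600",
    # measure only the target interval, not the surrounding window
    'maxPitch = Get maximum: targetStart, targetEnd, "Hertz", "Parabolic"',
    # one output field per print line, in the form LaBB-CAT scans for
    "print 'maxPitch' 'newline$'",
    # remove the Pitch object this script created
    "Remove",
])
```

The resulting string would be passed as the praatScript argument; the Sound itself is left for LaBB-CAT to remove.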

Parameters:
  • praatScript (str) – Script to run on each match.

  • windowOffset (float) – In many circumstances, you will want some context before and after the sample start/end time. For this reason, you can specify a “window offset”: a number of seconds to subtract from the sample start time and add to the sample end time before extracting that part of the audio for processing. For example, if the sample starts at 2.0s and ends at 3.0s, and you set the window offset to 0.5s, then Praat will extract a sample of audio from 1.5s to 3.5s, and do the selected processing on that sample. The best value depends on what the praatScript is doing; if you are getting formants from vowels, including some context ensures that the formants at the edges are more accurate (in LaBB-CAT’s web interface, the default value for this is 0.025), but if you’re getting max pitch or COG during a segment, you most likely want a windowOffset of 0 to ensure neighbouring segments don’t influence the measurement.

  • matchIds (list of str or list of dict) – A list of MatchId strings, or a list of match dictionaries of the kind returned by getMatches().

  • offsets (list of float or list of dict) – Either a list of start offsets (in which case endOffsets must also be specified) or a list of Annotation dict objects of the kind returned by getMatchAnnotations() (in which case endOffsets should be None). Either way, there must be one element for each element in matchIds.

  • endOffsets (list of float or None) – If offsets is a list of start offsets, this must be a list of end offsets, with one element for each element in matchIds. Otherwise, None.

  • genderAttribute (str) – Which participant attribute represents the participant’s gender.

  • attributes (list) – A list of participant attribute names to make available to the script. For example, if you want to use different acoustic parameters depending on what the gender of the speaker is, including the “participant_gender” attribute will make a variable called participant_gender$ available to the praat script, whose value will be the gender of the speaker of that segment.

Returns:

A list of dictionaries of acoustic measurements, one for each matchId.

Return type:

list of dict
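
The relationship between the sample interval, windowOffset, and the script variables listed above can be worked through numerically. This sketch follows the definitions given in this section only; window_times is a hypothetical helper, and it does not model any clipping the server may apply at the edges of the audio file:

```python
def window_times(sampleStart, sampleEnd, windowOffset):
    """Return the script variables implied by a sample interval and window offset."""
    windowAbsoluteStart = sampleStart - windowOffset
    windowAbsoluteEnd = sampleEnd + windowOffset
    return {
        "windowAbsoluteStart": windowAbsoluteStart,
        "windowAbsoluteEnd": windowAbsoluteEnd,
        "windowDuration": windowAbsoluteEnd - windowAbsoluteStart,
        # target times are relative to the start of the extracted window
        "targetStart": sampleStart - windowAbsoluteStart,
        "targetEnd": sampleEnd - windowAbsoluteStart,
        "targetDuration": sampleEnd - sampleStart,
    }

# the example from the windowOffset description: 2.0s to 3.0s with 0.5s context
times = window_times(2.0, 3.0, 0.5)
```

For that example the window runs from 1.5s to 3.5s, and the target interval sits at 0.5s to 1.5s within it.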

processWithPraatAsync(praatScript, windowOffset, matchIds, offsets, endOffsets=None, genderAttribute='participant_gender', attributes=None)

Starts a server task for processing a set of intervals with Praat.

The task continues running after this function returns, and can be monitored with taskStatus(), cancelled with cancelTask(), and the final results retrieved with taskResults(). The caller should eventually call releaseTask() to free server resources after the task is cancelled or finished.

This function instructs the LaBB-CAT server to invoke Praat for a set of sound intervals, in order to extract acoustic measures.

The exact measurements to return depend on the praatScript that is invoked. This is a Praat script fragment that will run once for each sound interval specified.

There are functions to allow the generation of a number of pre-defined Praat scripts for common tasks such as formant, pitch, intensity, and centre of gravity.

You can provide your own script, either by building a string with your code, or loading one from a file.

LaBB-CAT prefixes praatScript with code to open a sound file and extract a defined part of it into a Sound object which is then selected.

LaBB-CAT ‘Remove’s this Sound object after the script finishes executing. Any other objects created by the script must be ‘Remove’d before the end of the script (otherwise Praat runs out of memory during very large batches)

LaBB-CAT assumes that all calls to the function ‘print’ correspond to fields for export and each field must be printed on its own line. Specifically it scans for lines of the form:

print 'myOutputVariable' 'newline$'

Variables that can be assumed to be already set in the context of the script are:

  • windowOffset

    – the value used for the Window Offset; how much context to include.

  • windowAbsoluteStart

    – the start time of the window extracted relative to the start of the original audio file.

  • windowAbsoluteEnd

    – the end time of the window extracted relative to the start of the original audio file.

  • windowDuration

    – the duration of the window extracted (including window offset).

  • targetAbsoluteStart

    – the start time of the target interval relative to the start of the original audio file.

  • targetAbsoluteEnd

    – the end time of the target interval relative to the start of the original audio file.

  • targetStart

    – the start time of the target interval relative to the start of the window extracted.

  • targetEnd

    – the end time of the target interval relative to the start of the window extracted.

  • targetDuration

    – the duration of the target interval.

  • sampleNumber

    – the number of the sample within the set of samples being processed.

  • sampleName$

    – the name of the extracted/selected Sound object.

Parameters:
  • praatScript (str) – Script to run on each match.

  • windowOffset (float) – In many circumstances, you will want some context before and after the sample start/end time. For this reason, you can specify a “window offset”: a number of seconds to subtract from the sample start time and add to the sample end time before extracting that part of the audio for processing. For example, if the sample starts at 2.0s and ends at 3.0s, and you set the window offset to 0.5s, then Praat will extract a sample of audio from 1.5s to 3.5s, and do the selected processing on that sample. The best value depends on what the praatScript is doing; if you are getting formants from vowels, including some context ensures that the formants at the edges are more accurate (in LaBB-CAT’s web interface, the default value for this is 0.025), but if you’re getting max pitch or COG during a segment, you most likely want a windowOffset of 0 to ensure neighbouring segments don’t influence the measurement.

  • matchIds (list of str or list of dict) – A list of MatchId strings, or a list of match dictionaries of the kind returned by getMatches().

  • offsets (list of float or list of dict) – Either a list of start offsets (in which case endOffsets must also be specified) or a list of Annotation dict objects of the kind returned by getMatchAnnotations() (in which case endOffsets should be None). Either way, there must be one element for each element in matchIds.

  • endOffsets (list of float or None) – If offsets is a list of start offsets, this must be a list of end offsets, with one element for each element in matchIds. Otherwise, None.

  • genderAttribute (str) – Which participant attribute represents the participant’s gender.

  • attributes (list) – A list of participant attribute names to make available to the script. For example, if you want to use different acoustic parameters depending on what the gender of the speaker is, including the “participant_gender” attribute will make a variable called participant_gender$ available to the praat script, whose value will be the gender of the speaker of that segment.

Returns:

The threadId of the resulting task, which can be passed in to taskStatus(), waitForTask(), taskResults(), releaseTask(), etc.

Return type:

str
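
The task life cycle described above (start, wait, fetch results, release) can be wrapped so that the task is released even if something fails along the way. A minimal sketch; collect_results is a hypothetical helper, and FakeClient is an offline stand-in used only so the example runs without a server:

```python
def collect_results(client, threadId, directory=None):
    """Wait for a task, fetch its result files, and always release the task."""
    try:
        client.waitForTask(threadId)
        return client.taskResults(threadId, directory)
    finally:
        client.releaseTask(threadId)

class FakeClient:
    """Offline stand-in that records calls; not part of the labbcat package."""
    def __init__(self):
        self.calls = []
    def waitForTask(self, threadId, maxSeconds=0):
        self.calls.append("waitForTask")
        return {"running": False}
    def taskResults(self, threadId, dir=None):
        self.calls.append("taskResults")
        return ["results.csv"]
    def releaseTask(self, threadId):
        self.calls.append("releaseTask")

fake = FakeClient()
files = collect_results(fake, "123")
```

With a real server, client would be a LabbcatView instance and threadId the value returned by processWithPraatAsync().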

releaseTask(threadId)

Release a finished task, to free up server resources.

Parameters:

threadId (str) – The ID of the task.

search(pattern, participantIds=None, transcriptTypes=None, mainParticipant=True, aligned=False, matchesPerTranscript=None, overlapThreshold=None)

Searches for tokens that match the given pattern.

Example:

pattern = {"columns":[{"layers":{"orthography":{"pattern":"the"}}}]}

Strictly speaking, pattern should be a dictionary that matches the structure of the search matrix in the browser interface of LaBB-CAT; i.e. a dictionary with one entry called “columns”, which is a list of dictionaries.

Each element in the “columns” list contains a dictionary with an entry named “layers”, whose value is a dictionary for patterns to match on each layer, and optionally an element named “adj”, whose value is a number representing the maximum distance, in tokens, between this column and the next column - if “adj” is not specified, the value defaults to 1, so tokens are contiguous.

Each element in the “layers” dictionary is named after the layer it matches, and the value is a dictionary with the following possible entries:

  • “pattern” : A regular expression to match against the label

  • “min” : An inclusive minimum numeric value for the label

  • “max” : An exclusive maximum numeric value for the label

  • “not” : True to negate the match

  • “anchorStart” : True to anchor to the start of the annotation on this layer (i.e. the matching word token will be the first at/after the start of the matching annotation on this layer)

  • “anchorEnd” : True to anchor to the end of the annotation on this layer (i.e. the matching word token will be the last before/at the end of the matching annotation on this layer)

  • “target” : True to make this layer the target of the search; the results will contain one row for each match on the target layer

Some examples of valid pattern objects are shown below.

Example:

## words starting with 'ps...'
pattern = {"columns":[{"layers":{"orthography":{"pattern":"ps.*"}}}]}

## the word 'the' followed immediately or with one intervening word by
## a hapax legomenon (word with a frequency of 1) that doesn't start with a vowel
pattern = { "columns" : [
  { "layers" : {
      "orthography" : { "pattern" : "the" } },
    "adj" : 2 },
  { "layers" : {
      "phonemes" : { "not" : True, "pattern" : r"[cCEFHiIPqQuUV0123456789~#\$@].*" },
      "frequency" : { "max" : "2" } } } ] }

For ease of use, the function will also accept the following abbreviated forms; some examples are shown below.

Example:

## a single dict representing a 'one column' search,
## with string values representing regular expression pattern matching
pattern = { "orthography" : "ps.*" }

## a list containing the columns (adj defaults to 1, so matching tokens are contiguous)...
pattern = [
  { "orthography" : "the" },
  { "phonemes" : { "not" : True, "pattern" : r"[cCEFHiIPqQuUV0123456789~#\$@].*" },
    "frequency" : { "max" : "2" } } ]
Parameters:
  • pattern – A dict representing the pattern to search for, which mirrors the Search Matrix in the browser interface.

  • participantIds – An optional list of participant IDs to search the utterances of. If None, all utterances in the corpus will be searched.

  • transcriptTypes – An optional list of transcript types to limit the results to. If None, all transcript types will be searched.

  • mainParticipant – True to search only main-participant utterances, False to search all utterances.

  • aligned – True to include only words that are aligned (i.e. have anchor confidence ≥ 50), False to include un-aligned words as well.

  • matchesPerTranscript – Optional maximum number of matches per transcript to return. None means all matches.

  • overlapThreshold – Optional percentage overlap with other utterances before simultaneous speech is excluded. None means include all overlapping utterances.

Returns:

The threadId of the resulting task, which can be passed in to getMatches(), taskStatus(), waitForTask(), releaseTask(), etc.

Return type:

str
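
To make the relationship between the abbreviated and full pattern forms concrete, the sketch below expands one into the other. It illustrates the mapping described for search() and is not the library's own implementation; expand_pattern is a hypothetical name:

```python
def expand_pattern(pattern):
    """Expand an abbreviated search pattern into the full {"columns": [...]} form."""
    if isinstance(pattern, dict) and "columns" in pattern:
        return pattern  # already in full form
    # a bare dict is a one-column search; a list holds one dict per column
    columns = pattern if isinstance(pattern, list) else [pattern]
    expanded = []
    for column in columns:
        layers = {
            # a plain string stands for a regular expression on the label
            layerId: {"pattern": match} if isinstance(match, str) else match
            for layerId, match in column.items()
        }
        expanded.append({"layers": layers})
    return {"columns": expanded}

single = expand_pattern({"orthography": "ps.*"})
full = expand_pattern([{"orthography": "the"}, {"frequency": {"max": "2"}}])
```

No “adj” entry is added to the expanded columns, so the default of 1 applies and tokens are contiguous.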

taskResults(threadId, dir=None)

Gets the results of the given task, as a file or list of files.

Some tasks produce a file for download when they’re finished (e.g. getFragmentsAsync()), so this function provides access to that results file. If the results are compressed into a zip file, this function automatically unpacks the contained files.

Parameters:
  • threadId (str) – The ID of the task.

  • dir (str) – A directory in which the files should be stored, or None for a temporary folder. If specified, and the directory doesn’t exist, it will be created.

Returns:

A list of files. If dir is None, these files will be stored under the system’s temporary directory, so once processing is finished, they should be deleted by the caller, or moved to a more permanent location. If the task has no results (yet) this function returns None.

Return type:

list of str
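
When dir is None, the caller has to deal with both the None case (no results yet) and the temporary files. One possible way to handle this with the standard library is sketched below; keep_results is a hypothetical helper, and the temporary file stands in for a real result:

```python
import os
import shutil
import tempfile

def keep_results(files, destination):
    """Move result files into destination; return the new paths ([] if no results)."""
    if files is None:  # the task has no results (yet)
        return []
    os.makedirs(destination, exist_ok=True)
    return [shutil.move(f, os.path.join(destination, os.path.basename(f)))
            for f in files]

# Stand-in for a file returned by taskResults(threadId) with dir=None.
fd, temporary = tempfile.mkstemp(suffix=".wav")
os.close(fd)
destination = os.path.join(tempfile.mkdtemp(), "fragments")
kept = keep_results([temporary], destination)
```

Moving the files rather than copying them means nothing is left behind in the temporary directory.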

taskStatus(threadId)

Gets the current state of the given task.

Parameters:

threadId (str) – The ID of the task.

Returns:

The status of the task.

Return type:

dictionary

versionInfo()

Gets version information of all components of LaBB-CAT.

Version information includes versions of all components and modules installed on the LaBB-CAT server, including format converters and annotator modules.

Returns:

A dictionary of sections, each section a dictionary of modules indicating the version of that module.

Return type:

dict

waitForTask(threadId, maxSeconds=0)

Wait for the given task to finish.

Parameters:
  • threadId (str) – The task ID.

  • maxSeconds (int) – The maximum time to wait for the task, or 0 for forever.

Returns:

The final task status. To determine whether the task finished or waiting timed out, check the “running” entry of the returned dict, which will be False if the task finished.

Return type:

dict
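
A timeout can be told apart from normal completion by inspecting the returned status. The sketch below cancels a task that is still running when the wait ends; wait_or_cancel is a hypothetical helper, SlowTask is an offline stand-in, and reading the flag as status["running"] assumes the status is a plain dict as documented:

```python
def wait_or_cancel(client, threadId, maxSeconds):
    """Return True if the task finished; on timeout, cancel it and return False."""
    status = client.waitForTask(threadId, maxSeconds)
    if status["running"]:
        client.cancelTask(threadId)  # cancels, but does not release
        return False
    return True

class SlowTask:
    """Offline stand-in for a client whose task never finishes in time."""
    cancelled = False
    def waitForTask(self, threadId, maxSeconds=0):
        return {"running": True}
    def cancelTask(self, threadId):
        self.cancelled = True

slow = SlowTask()
finished = wait_or_cancel(slow, "123", 30)
```

Since cancelTask() does not release the task, a call to releaseTask() is still needed afterwards.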

LabbcatEdit class

class labbcat.LabbcatEdit(labbcatUrl, username=None, password=None)

API for querying and updating a LaBB-CAT annotation graph store; a database of linguistic transcripts represented using Annotation Graphs

This class inherits the read-only operations of LabbcatView and adds some write operations for updating data, i.e. those that can be performed by users with “edit” permission.

Constructor arguments:

Parameters:
  • labbcatUrl (str) – The ‘home’ URL of the LaBB-CAT server.

  • username (str or None) – The username for logging in to the server, if necessary.

  • password (str or None) – The password for logging in to the server, if necessary.

addDictionaryEntry(managerId, dictionaryId, key, entry)

Adds an entry to a dictionary.

This function adds a new entry to the given dictionary. Words can have multiple entries.

Parameters:
  • managerId (str) – The layer manager ID of the dictionary, as returned by getDictionaries().

  • dictionaryId (str) – The ID of the dictionary, as returned by getDictionaries().

  • key (str) – The key (word) in the dictionary to add an entry for.

  • entry (str) – The value (definition) for the given key.

Returns:

None if the entry was added, or an error message if not.

Return type:

str or None

addLayerDictionaryEntry(layerId, key, entry)

Adds an entry to a layer dictionary.

This function adds a new entry to the dictionary that manages a given layer, and updates all affected tokens in the corpus. Words can have multiple entries.

Parameters:
  • layerId (str) – The ID of the layer with a dictionary configured to manage it.

  • key (str) – The key (word) in the dictionary to add an entry for.

  • entry (str) – The value (definition) for the given key.

Returns:

None if the entry was added, or an error message if not.

Return type:

str or None
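
Because success is signalled by None and failure by a message string, a batch of additions can collect the failures as it goes. A minimal sketch; add_entries is a hypothetical helper, and FakeEditor (including its error text and the sample entries) is invented so the example runs offline:

```python
def add_entries(client, layerId, entries):
    """Add (key, entry) pairs; return a dict of key -> error message for failures."""
    errors = {}
    for key, entry in entries:
        message = client.addLayerDictionaryEntry(layerId, key, entry)
        if message is not None:  # None signals success
            errors[key] = message
    return errors

class FakeEditor:
    """Offline stand-in that rejects empty entries; the message text is invented."""
    def addLayerDictionaryEntry(self, layerId, key, entry):
        return None if entry else "Entry is empty"

errors = add_entries(FakeEditor(), "phonemes", [("the", "D@"), ("a", "")])
```

With a real server, client would be a LabbcatEdit instance; the same pattern applies to addDictionaryEntry() and the remove functions.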

annotatorExt(annotatorId, resource, parameters=None)

Retrieve annotator’s “ext” resource.

Retrieve a given resource from an annotator’s “ext” web app. Annotators are modules that perform different annotation tasks, and can optionally implement functionality for providing extra data or extending functionality in an annotator-specific way. If the annotator implements an “ext” web app, it can provide resources and implement a mechanism for interrogating the annotator. This function provides a mechanism for accessing these resources via Python.

Details about the resources available for a given annotator can be found by calling getAnnotatorDescriptor(), checking the “hasExtWebapp” attribute to ensure an ‘ext’ webapp is implemented, and then checking the details in the “extApiInfo” attribute.

Parameters:
  • annotatorId (str) – ID of the annotator to interrogate.

  • resource (str) – The name of the file to retrieve or instance method (function) to invoke. Possible values for this depend on the specific annotator being interrogated.

  • parameters (list or None) – Optional list of ordered parameters for the instance method (function).

Returns:

The resource requested.

Return type:

str
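
The check described above can be folded into a small wrapper. This is a sketch under assumptions: ext_resource is a hypothetical helper, FakeAnnotators and its annotator IDs are invented for offline use, and “hasExtWebapp” is treated as a boolean (if the server returns it as a string, the test would need to compare against that string instead):

```python
def ext_resource(client, annotatorId, resource, parameters=None):
    """Fetch an "ext" resource, or None if the annotator has no ext web app."""
    descriptor = client.getAnnotatorDescriptor(annotatorId)
    if not descriptor.get("hasExtWebapp"):
        return None
    return client.annotatorExt(annotatorId, resource, parameters)

class FakeAnnotators:
    """Offline stand-in; the annotator IDs and resource name are invented."""
    def getAnnotatorDescriptor(self, annotatorId):
        return {"annotatorId": annotatorId,
                "hasExtWebapp": annotatorId == "WithExt"}
    def annotatorExt(self, annotatorId, resource, parameters=None):
        return "content of " + resource

fake = FakeAnnotators()
found = ext_resource(fake, "WithExt", "index.html")
missing = ext_resource(fake, "NoExt", "index.html")
```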

deleteMedia(id, fileName)

Delete a given media or episode document file.

Parameters:
  • id (str) – The ID of the transcript whose media will be deleted.

  • fileName (str) – The media file name, e.g. mediaFile[‘name’].

deleteParticipant(id)

Deletes the given participant, and all associated meta-data.

Parameters:

id (str) – The ID of the participant to delete.

deleteTranscript(id)

Deletes the given transcript, and all associated files.

Parameters:

id (str) – The ID of the transcript to delete.

generateLayerUtterances(matchIds, layerId, collectionName=None)

Generates a layer for a given set of utterances.

This function generates annotations on a given layer for a given set of utterances, e.g. force-align selected utterances of a participant.

Parameters:
  • matchIds – A list of annotation IDs, e.g. the MatchId column, or the URL column, of a results set.

  • layerId (str) – The ID of the layer to generate.

Returns:

The taskId of the resulting annotation layer generation task. The task status can be checked using taskStatus().

Return type:

str

getAnnotatorDescriptor(annotatorId)

Gets annotator information.

Retrieve information about an annotator. Annotators are modules that perform different annotation tasks. This function provides information about a given annotator, for example the currently installed version of the module, what configuration parameters it requires, etc.

The returned dictionary contains the following entries:

  • “annotatorId” - The annotator’s unique ID

  • “version” - The currently installed version of the annotator.

  • “info” - HTML-encoded description of the function of the annotator.

  • “infoText” - A plain text version of “info” (converted automatically).

  • “hasConfigWebapp” - Determines whether the annotator includes a web-app for installation or general configuration.

  • “configParameterInfo” - An HTML-encoded definition of the installation config parameters, including a list of all parameters, and the encoding of the parameter string.

  • “hasTaskWebapp” - Determines whether the annotator includes a web-app for task parameter configuration.

  • “taskParameterInfo” - An HTML-encoded definition of the task parameters, including a list of all parameters, and the encoding of the parameter string.

  • “hasExtWebapp” - Determines whether the annotator includes an extras web-app which implements functionality for providing extra data or extending functionality in an annotator-specific way.

  • “extApiInfo” - An HTML-encoded document containing information about what endpoints are published by the ext web-app.

Parameters:

annotatorId (str) – ID of the annotator module.

Returns:

The annotator info.

Return type:

dictionary of str

newTranscript(transcript, media, trackSuffix, transcriptType, corpus, episode)

Uploads a new transcript.

Parameters:
  • transcript (str) – The path to the transcript to upload.

  • media (str) – The path to media to upload, if any.

  • trackSuffix (str) – The track suffix for the media, which can be None.

  • transcriptType (str) – The transcript type.

  • corpus (str) – The corpus for the transcript.

  • episode (str) – The episode the transcript belongs to.

Returns:

A dictionary of transcript IDs (transcript names) to task threadIds. The task status can be checked using taskStatus().

Return type:

dictionary of str

removeDictionaryEntry(managerId, dictionaryId, key, entry=None)

Removes an entry from a dictionary.

This function removes an existing entry from the given dictionary. Words can have multiple entries.

Parameters:
  • managerId (str) – The layer manager ID of the dictionary, as returned by getDictionaries().

  • dictionaryId (str) – The ID of the dictionary, as returned by getDictionaries().

  • key (str) – The key (word) in the dictionary to remove an entry for.

  • entry (str) – The value (definition) to remove, or None to remove all the entries for key.

Returns:

None if the entry was removed, or an error message if not.

Return type:

str or None

removeLayerDictionaryEntry(layerId, key, entry=None)

Removes an entry from a layer dictionary.

This function removes an existing entry from the dictionary that manages a given layer, and updates all affected tokens in the corpus. Words can have multiple entries.

Parameters:
  • layerId (str) – The ID of the layer with a dictionary configured to manage it.

  • key (str) – The key (word) in the dictionary to remove an entry for.

  • entry (str) – The value (definition) to remove, or None to remove all the entries for key.

Returns:

None if the entry was removed, or an error message if not.

Return type:

str or None

saveEpisodeDocument(id, document)

Saves the given episode document for the given transcript.

Parameters:
  • id (str) – The transcript ID.

  • document (str) – The path to the document to upload.

Returns:

A dictionary of attributes of the document file (name, url, etc.).

Return type:

dictionary of str

saveMedia(id, media, trackSuffix)

Saves the given media for the given transcript.

Parameters:
  • id (str) – The transcript ID.

  • media (str) – The path to media to upload.

  • trackSuffix (str) – The track suffix for the media.

Returns:

A dictionary of attributes of the media file (name, url, etc.).

Return type:

dictionary of str

saveParticipant(id, label, attributes)

Saves a participant, and all its tags, to the graph store.

To change the ID of an existing participant, pass the old/current ID as the id, and pass the new ID as the label. If the participant ID does not already exist in the database, a new participant record is created.

Parameters:
  • id (str) – The ID of the participant to update.

  • label (str) – The new ID (name) for the participant.

  • attributes (dictionary of str) – Participant attribute values - the names are the participant attribute layer IDs, and the values are the corresponding new attribute values. The pass phrase for participant access can also be set by specifying a “_password” attribute.

Returns:

True if the participant was updated, False if there were no changes to update.

Return type:

boolean

transcriptUpload(transcript, media, merge, trackSuffix=None)

Upload a transcript file and associated media files, as the first stage in adding a transcript to LaBB-CAT or modifying an existing one. The second stage is transcriptUploadParameters().

Parameters:
  • transcript (str) – The path to the transcript to upload.

  • media (str) – The path to media to upload, if any.

  • merge (boolean) – Whether the upload corresponds to updates to an existing transcript (True) or a new transcript (False).

  • trackSuffix (str) – The track suffix for the media, which can be None.

Returns:

A dictionary containing the following entries:

  • “id” - The unique identifier to use for this upload when subsequently calling transcriptUploadParameters()

  • “parameters” - A list of dict representing the parameters that require values to be passed into transcriptUploadParameters(). The parameters returned may include both information required by the format deserializer (e.g. mappings from tiers to LaBB-CAT layers) and also general information required by LaBB-CAT (e.g. the corpus, episode, and type of the transcript).

Return type:

dict

Each parameter returned is a dict that may contain the following attributes:

  • “name” - The name that should be used when specifying the value for the parameter when calling transcriptUploadParameters()

  • “label” - A label for the parameter intended for display to the user.

  • “hint” - A description of the purpose of the parameter, for display to the user.

  • “type” - The type of the parameter, e.g. “String”, “Double”, “Integer”, “Boolean”.

  • “required” - True if the value must be specified, False if it is optional.

  • “value” - A default value for the parameter.

  • “possibleValues” - A list of possible values, if the possibilities are limited to a finite set.

The required parameters may include both information required by the format deserializer (e.g. mappings from tiers to LaBB-CAT layers) and also general information required by LaBB-CAT, such as:

  • “labbcat_corpus” - The corpus the new transcript(s) belong(s) to.

  • “labbcat_episode” - The episode the new transcript(s) belong(s) to.

  • “labbcat_transcript_type” - The transcript type for the new transcript(s).

  • “labbcat_generate” - Whether to regenerate layers of automated annotations or not.
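
The parameter descriptions returned by transcriptUpload() carry enough information to build the values dictionary for the second stage. A minimal sketch of one way to do that, using defaults where present and caller-supplied overrides otherwise; fill_parameters is a hypothetical helper and the sample parameter list is invented:

```python
def fill_parameters(parameters, overrides=None):
    """Build {name: value} from parameter dicts, preferring explicit overrides."""
    overrides = overrides or {}
    values = {}
    for parameter in parameters:
        name = parameter["name"]
        value = overrides.get(name, parameter.get("value"))
        if value is None and parameter.get("required"):
            raise ValueError(f"No value for required parameter {name!r}")
        possible = parameter.get("possibleValues")
        if value is not None and possible and value not in possible:
            raise ValueError(f"{value!r} is not a possible value for {name!r}")
        if value is not None:
            values[name] = value
    return values

# Invented parameter list in the shape described above.
parameters = [
    {"name": "labbcat_corpus", "required": True, "possibleValues": ["CC", "QB"]},
    {"name": "labbcat_generate", "required": False, "value": True},
]
values = fill_parameters(parameters, {"labbcat_corpus": "QB"})
```

Validating against “required” and “possibleValues” locally gives an earlier, clearer error than waiting for the server to reject the upload.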

transcriptUploadDelete(id)

Cancel a transcript upload started by a previous call to transcriptUpload(), deleting any uploaded files from the server.

Parameters:

id (str) – Upload ID returned by the prior call to transcriptUpload().

transcriptUploadParameters(id, parameters)

The second part of a transcript upload process started by a call to transcriptUpload(), which specifies values for the parameters required to save the uploaded transcript to LaBB-CAT’s database.

If the response includes more parameters, then this method should be called again to supply their values.

Parameters:
  • id (str) –

    Upload ID returned by the prior call to transcriptUpload().

  • parameters (dict) –

    A dictionary with an attribute and value for each parameter returned by the prior call to transcriptUpload().

Returns:

A dictionary containing the following entries:

  • “transcripts” - a dictionary for which each key is a transcript name, and its

    value is the threadId of the server task processing the uploaded transcript, which can be passed to taskStatus() to monitor progress.

  • “id” - The unique identifier for this upload, to use if a subsequent call to transcriptUploadParameters() is required.

  • “parameters” - A list of dict representing the parameters that still

    require values to be passed into transcriptUploadParameters() if any.

Return type:

dict
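Because the response may ask for further parameters, the call is naturally written as a loop. The sketch below is one possible way to do that, under stated assumptions: client is any object with a transcriptUploadParameters(id, parameters) method (e.g. a LabbcatEdit instance), while finish_transcript_upload, choose_values and max_rounds are hypothetical names introduced here.

```python
def finish_transcript_upload(client, upload, choose_values, max_rounds=10):
    """Call transcriptUploadParameters() until no more parameters are requested.

    client        -- object providing transcriptUploadParameters(id, parameters)
    upload        -- the dict returned by transcriptUpload()
    choose_values -- hypothetical callback: list of parameter dicts -> dict of values
    """
    upload_id = upload["id"]
    parameters = upload.get("parameters") or []
    for _ in range(max_rounds):
        response = client.transcriptUploadParameters(
            upload_id, choose_values(parameters))
        parameters = response.get("parameters") or []
        if not parameters:
            # nothing further requested: transcript name -> threadId
            return response.get("transcripts") or {}
        upload_id = response.get("id", upload_id)
    raise RuntimeError(
        "Upload still needs parameters after %d rounds" % max_rounds)
```

The returned thread IDs can then be passed to taskStatus() to monitor progress.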

updateFragment(fragment)

Update a transcript fragment.

This function uploads a file (e.g. Praat TextGrid) representing a fragment of a transcript, with annotations or alignments to update in LaBB-CAT’s version of the transcript.

Parameters:

fragment (str) – The path to the fragment to upload.

Returns:

A dictionary with information about the fragment that was updated, including URL, start_time, and end_time

Return type:

dictionary of str

updateTranscript(transcript, suppressGeneration=False)

Uploads a new version of an existing transcript.

Parameters:
  • transcript (str) – The path to the transcript to upload.

  • suppressGeneration (boolean) – False (the default) to run automatic layer generation, True to suppress automatic layer generation.

Returns:

A dictionary of transcript IDs (transcript names) to task threadIds. The task status can be monitored using taskStatus().

Return type:

dictionary of str

The LabbcatEdit class inherits from the LabbcatView class.

LabbcatAdmin class

class labbcat.LabbcatAdmin(labbcatUrl, username=None, password=None)

API for querying, updating, and administering a LaBB-CAT annotation graph store; a database of linguistic transcripts represented using Annotation Graphs

This class inherits the read-write operations of LabbcatEdit and adds some administration operations, including definition of layers, registration of converters, etc., i.e. those that can be performed by users with “admin” permission.

Constructor arguments:

Parameters:
  • labbcatUrl (str) – The ‘home’ URL of the LaBB-CAT server.

  • username (str or None) – The username for logging in to the server, if necessary.

  • password (str or None) – The password for logging in to the server, if necessary.

createCategory(class_id, category, description, display_order)

Creates a new category record.

The dictionary returned has the following entries:

  • “class_id” : What kind of attributes are categorised - “transcript” or “speaker”.

  • “category” : The name/id of the category.

  • “description” : The description of the category.

  • “display_order” : Where the category appears among other categories.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.

  • category (str) – The name/id of the category.

  • description (str) – The description of the category.

  • display_order (number) – Where the category appears among other categories.

Returns:

A copy of the category record

Return type:

dict

createCorpus(corpus_name, corpus_language, corpus_description)

Creates a new corpus record.

The dictionary returned has the following entries:

  • “corpus_id” : The database key for the record.

  • “corpus_name” : The name/id of the corpus.

  • “corpus_language” : The ISO 639-1 code for the default language.

  • “corpus_description” : The description of the corpus.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • corpus_name (str) – The name/id of the corpus.

  • corpus_language (str) – The ISO 639-1 code for the default language.

  • corpus_description (str) – The description of the corpus.

Returns:

A copy of the corpus record

Return type:

dict

createMediaTrack(suffix, description, display_order)

Creates a new media track record.

The dictionary returned has the following entries:

  • “suffix” : The suffix associated with the media track.

  • “description” : The description of the media track.

  • “display_order” : The position of the track amongst other tracks.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • suffix (str) – The suffix associated with the media track.

  • description (str) – The description of the media track.

  • display_order (str) – The position of the track amongst other tracks.

Returns:

A copy of the media track record

Return type:

dict

createProject(project, description)

Deprecated as ‘projects’ are now categories with classId = ‘layer’ - use createCategory instead.

Parameters:
  • project (str) – The name/id of the project.

  • description (str) – The description of the project.

Returns:

A copy of the project record

Return type:

dict

createRole(role_id, description)

Creates a new role record.

The dictionary returned has the following entries:

  • “role_id” : The name/id of the role.

  • “description” : The description of the role.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • role_id (str) – The name/id of the role.

  • description (str) – The description of the role.

Returns:

A copy of the role record

Return type:

dict

createRolePermission(role_id, entity, layer, value_pattern)

Creates a new role permission record.

The dictionary returned has the following entries:

  • “role_id” : The ID of the role this permission applies to.

  • “entity” : The media entity this permission applies to - a string made up of “t” (transcript), “a” (audio), “v” (video), or “i” (image).

  • “layer” : ID of the layer for which the label determines access. This is either a valid transcript attribute layer ID, or “corpus”.

  • “value_pattern” : Regular expression for matching against the layerId label. If the regular expression matches the label, access is allowed.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • role_id (str) – The ID of the role this permission applies to.

  • entity (str) – The media entity this permission applies to.

  • layer (str) – ID of the layer for which the label determines access.

  • value_pattern (str) – Regular expression for matching against the layer label.

Returns:

A copy of the role permission record

Return type:

dict
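Roles and their permissions are usually created together. The following sketch assumes client is a LabbcatAdmin instance (or anything with the same createRole() and createRolePermission() methods); the helper name and its arguments are hypothetical. It grants access to transcripts (entity “t”) whose “corpus” label matches a regular expression.

```python
def create_role_with_transcript_access(client, role_id, description, corpus_pattern):
    """Create a role, then allow it to see transcripts in matching corpora."""
    role = client.createRole(role_id, description)
    permission = client.createRolePermission(
        role_id,
        "t",             # entity: transcripts only
        "corpus",        # layer whose label decides access
        corpus_pattern)  # regular expression matched against that label
    return role, permission
```

Both returned values are the record copies sent back by the server, so the caller can inspect them, for example to check for a “_cantDelete” entry.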

createUser(user, email, resetPassword, roles)

Creates a new user record.

The dictionary returned has the following entries:

  • “user” : The id of the user.

  • “email” : The email address of the user.

  • “resetPassword” : Whether the user must reset their password when they next log in.

  • “roles” : Roles or groups the user belongs to.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • user (str) – The ID of the user.

  • email (str) – The email address of the user.

  • resetPassword (boolean) – Whether the user must reset their password when they next log in.

  • roles (list of str) – Roles or groups the user belongs to.

Returns:

A copy of the user record

Return type:

dict

deleteCategory(class_id, category)

Deletes an existing category record.

Parameters:
  • class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.

  • category (str) – The name/id of the category.

deleteCorpus(corpus_name)

Deletes an existing corpus record.

Parameters:

corpus_name (str) – The name/id of the corpus.

deleteLayer(id)

Deletes a layer.

Parameters:

id (str) – The layer ID

deleteLexicon(lexicon)

Delete a previously loaded lexicon.

By default LaBB-CAT includes a layer manager called the Flat Lexicon Tagger, which can be configured to annotate words with data from a dictionary loaded from a plain text file (e.g. a CSV file).

Parameters:

lexicon (str) – The name of the lexicon to delete. e.g. ‘cmudict’

Returns:

None if the deletion was successful, or an error message if not.

Return type:

str or None

deleteMediaTrack(suffix)

Deletes an existing media track record.

Parameters:

suffix (str) – The suffix associated with the media track.

deleteProject(project)

Deletes an existing project record.

Parameters:

project (str) – The name/id of the project.

deleteRole(role_id)

Deletes an existing role record.

Parameters:

role_id (str) – The name/id of the role.

deleteRolePermission(role_id, entity)

Deletes an existing role permission record.

Parameters:
  • role_id (str) – The ID of the role this permission applies to.

  • entity (str) – The media entity this permission applies to.

deleteUser(user)

Deletes an existing user record.

Parameters:

user (str) – The ID of the user.

generateLayer(layerId)

Generates a layer.

This function generates annotations on a given layer for all transcripts in the corpus.

Parameters:

layerId (str) – The ID of the layer to generate.

Returns:

The taskId of the resulting annotation layer generation task. The task status can be monitored using taskStatus().

Return type:

str
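Layer generation runs as a server task, so a caller typically waits for it and then releases it. This is a minimal sketch, assuming client is a LabbcatAdmin instance and that waitForTask() and releaseTask() (named in the LabbcatView documentation) can each be called with just the task ID; generate_layer_and_wait is a hypothetical helper name.

```python
def generate_layer_and_wait(client, layer_id):
    """Generate a layer for all transcripts and block until the task finishes."""
    task_id = client.generateLayer(layer_id)
    try:
        # whatever waitForTask() returns is passed straight back to the caller
        return client.waitForTask(task_id)
    finally:
        # release the task whether or not waiting succeeded
        client.releaseTask(task_id)
```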

loadLexicon(file, lexicon, fieldDelimiter, fieldNames, quote=None, comment=None, skipFirstLine=False)

Upload a flat lexicon file for lexical tagging.

By default LaBB-CAT includes a layer manager called the Flat Lexicon Tagger, which can be configured to annotate words with data from a dictionary loaded from a plain text file (e.g. a CSV file). The file must have a ‘flat’ structure in the sense that it’s a simple list of dictionary entries with a fixed number of columns/fields, rather than having a complex structure.

Parameters:
  • file – The full path name of the lexicon file.

  • lexicon (str) – The name for the resulting lexicon, e.g. ‘cmudict’. If the named lexicon already exists, it will be completely replaced with the contents of the file (i.e. all existing entries will be deleted before adding new entries from the file).

  • fieldDelimiter (str) – The character used to delimit fields in the file. If this is “ - “, rows are split on only the first space, in line with common dictionary formats. e.g. ‘,’ for Comma Separated Values (CSV) files.

  • fieldNames (str) – A list of field names, delimited by fieldDelimiter, e.g. ‘Word,Pronunciation’.

  • quote (str) – The character used to quote field values (if any), e.g. ‘”’.

  • comment (str) – The character used to indicate a line is a comment (not an entry) (if any) e.g. ‘#’.

  • skipFirstLine (boolean) – Whether to ignore the first line of the file (because it contains field names).

Returns:

None if the upload was successful, or an error message if not.

Return type:

str or None
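As an illustration of the ‘flat’ file structure, the sketch below writes a two-column CSV file with a header line and uploads it. It assumes client is a LabbcatAdmin instance; the helper name, the column names ‘Word’ and ‘Pronunciation’ (taken from the fieldNames example above) and the UTF-8 encoding are illustrative choices, not requirements stated by the library.

```python
import csv


def write_and_load_lexicon(client, path, lexicon, entries):
    """Write (word, pronunciation) pairs to a CSV file and upload it as a lexicon."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Word", "Pronunciation"])  # header, skipped on load
        writer.writerows(entries)
    error = client.loadLexicon(
        path, lexicon, ",", "Word,Pronunciation",
        quote='"', skipFirstLine=True)
    if error is not None:
        # loadLexicon() returns an error message on failure, None on success
        raise RuntimeError("Lexicon upload failed: " + error)
```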

newLayer(id, parentId, description, alignment, peers, peersOverlap, parentIncludes, saturated, type, validLabels={}, category=None, annotatorId=None, annotatorTaskParameters=None)

Adds a new layer.

Parameters:
  • id (str) – The layer ID

  • parentId (str) – The layer’s parent layer id.

  • description (str) – The description of the layer.

  • alignment (number) – The layer’s alignment - 0 for none, 1 for point alignment, 2 for interval alignment.

  • peers (boolean) – Whether children on this layer have peers or not.

  • peersOverlap (boolean) – Whether child peers on this layer can overlap or not.

  • parentIncludes (boolean) – Whether the parent temporally includes the child.

  • saturated (boolean) – Whether children must temporally fill the entire parent duration (True) or not (False).

  • type (str) – The type for labels on this layer, e.g. string, number, boolean, ipa.

  • validLabels (dict) – A dictionary of valid label values for this layer, or an empty dictionary if the layer values are not restricted. Each key is a possible label value, and is associated with a description of the value (e.g. for displaying to users).

  • category (str) – Category for the layer, if any.

  • annotatorId (str) – The ID of the layer manager that automatically fills in annotations on the layer, if any

  • annotatorTaskParameters (str) – The configuration the layer manager should use when filling the layer with annotations. This is a string whose format is specific to each layer manager.

Returns:

The resulting layer definition.

Return type:

dict
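With this many positional parameters, passing them by keyword keeps a call readable. The following is a sketch only, assuming client is a LabbcatAdmin instance; the helper name, the constant names, and the particular structural settings chosen (a single unaligned tag per parent annotation) are illustrative assumptions rather than recommended values.

```python
# Names for the alignment codes listed above (the names are made up here).
ALIGNMENT_NONE, ALIGNMENT_POINT, ALIGNMENT_INTERVAL = 0, 1, 2


def add_tag_layer(client, layer_id, parent_id, description, labels):
    """Define an unaligned string layer holding one tag per parent annotation."""
    return client.newLayer(
        id=layer_id,
        parentId=parent_id,
        description=description,
        alignment=ALIGNMENT_NONE,
        peers=False,          # one annotation per parent
        peersOverlap=False,
        parentIncludes=True,
        saturated=True,
        type="string",
        validLabels=labels)   # label value -> description
```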

readCategories(class_id, pageNumber=None, pageLength=None)

Reads a list of category records.

The dictionaries in the returned list have the following entries:

  • “class_id” : What kind of attributes are categorised - “transcript” or “speaker”.

  • “category” : The name/id of the category.

  • “description” : The description of the category.

  • “display_order” : Where the category appears among other categories.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.

  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

  • pageLength (int or None) – The maximum number of records to return, or None to return all.

Returns:

A list of category records.

Return type:

list of dict
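The pageNumber and pageLength parameters allow a long list to be fetched in pieces. A minimal sketch of that loop follows; read_all_records and read_all_categories are hypothetical helpers, client is assumed to be a LabbcatAdmin instance, and the loop assumes that a page shorter than pageLength means there are no further records.

```python
def read_all_records(read_page, page_length=20):
    """Collect all records from a paged reader.

    read_page -- callable taking (pageNumber, pageLength) and returning a list
    """
    records = []
    page_number = 0  # page numbers are zero-based
    while True:
        page = read_page(page_number, page_length)
        records.extend(page)
        if len(page) < page_length:
            return records
        page_number += 1


def read_all_categories(client, class_id, page_length=20):
    """Fetch every category record for class_id, one page at a time."""
    return read_all_records(
        lambda number, length: client.readCategories(
            class_id, pageNumber=number, pageLength=length),
        page_length)
```

The same read_all_records() helper could wrap the other paged read functions, since they share the pageNumber and pageLength parameters.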

readCorpora(pageNumber=None, pageLength=None)

Reads a list of corpus records.

The dictionaries in the returned list have the following entries:

  • “corpus_id” : The database key for the record.

  • “corpus_name” : The name/id of the corpus.

  • “corpus_language” : The ISO 639-1 code for the default language.

  • “corpus_description” : The description of the corpus.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

  • pageLength (int or None) – The maximum number of records to return, or None to return all.

Returns:

A list of corpus records.

Return type:

list of dict

readMediaTracks(pageNumber=None, pageLength=None)

Reads a list of media track records.

The dictionaries in the returned list have the following entries:

  • “suffix” : The suffix associated with the media track.

  • “description” : The description of the media track.

  • “display_order” : The position of the track amongst other tracks.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

  • pageLength (int or None) – The maximum number of records to return, or None to return all.

Returns:

A list of media track records.

Return type:

list of dict

readProjects(pageNumber=None, pageLength=None)

Deprecated as ‘projects’ are now categories with classId = ‘layer’ - use readCategories(‘layer’) instead.

Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

  • pageLength (int or None) – The maximum number of records to return, or None to return all.

Returns:

A list of project records.

Return type:

list of dict

readRolePermissions(role_id, pageNumber=None, pageLength=None)

Reads a list of role permission records.

The dictionaries in the returned list have the following entries:

  • “role_id” : The ID of the role this permission applies to.

  • “entity” : The media entity this permission applies to - a string made up of “t” (transcript), “a” (audio), “v” (video), or “i” (image).

  • “layer” : ID of the layer for which the label determines access. This is either a valid transcript attribute layer ID, or “corpus”.

  • “value_pattern” : Regular expression for matching against the layerId label. If the regular expression matches the label, access is allowed.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • role_id (str) – The ID of the role this permission applies to.

  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

  • pageLength (int or None) – The maximum number of records to return, or None to return all.

Returns:

A list of role permission records.

Return type:

list of dict

readRoles(pageNumber=None, pageLength=None)

Reads a list of role records.

The dictionaries in the returned list have the following entries:

  • “role_id” : The name/id of the role.

  • “description” : The description of the role.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

  • pageLength (int or None) – The maximum number of records to return, or None to return all.

Returns:

A list of role records.

Return type:

list of dict
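The “_cantDelete” entry can be checked before attempting a deletion. The sketch below shows one way to use it, assuming client is a LabbcatAdmin instance; delete_unused_roles is a hypothetical helper, and it relies only on the readRoles() and deleteRole() functions documented here.

```python
def delete_unused_roles(client, role_ids):
    """Delete the given roles, skipping any the server marks as undeletable.

    Returns a dict of role_id -> reason for each role that was skipped.
    """
    skipped = {}
    for role in client.readRoles():
        role_id = role["role_id"]
        if role_id not in role_ids:
            continue
        if "_cantDelete" in role:
            # the value is a string giving the reason deletion is not possible
            skipped[role_id] = role["_cantDelete"]
        else:
            client.deleteRole(role_id)
    return skipped
```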

readSystemAttributes()

Reads a list of system attribute records.

The dictionaries in the returned list have the following entries:

  • “attribute” : ID of the attribute.

  • “type” : The type of the attribute - “string”, “boolean”, “select”, etc.

  • “style” : UI style, which depends on “type”.

  • “label” : User-facing label for the attribute.

  • “description” : User-facing (long) description for the attribute.

  • “options” : If “type” == “select”, this is a dict defining possible values.

  • “value” : The value of the attribute.

Returns:

A list of system attribute records.

Return type:

list of dict

readUsers(pageNumber=None, pageLength=None)

Reads a list of user records.

The dictionaries in the returned list have the following entries:

  • “user” : The id of the user.

  • “email” : The email address of the user.

  • “resetPassword” : Whether the user must reset their password when they next log in.

  • “roles” : Roles or groups the user belongs to.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • pageNumber (int or None) – The zero-based page number to return, or None to return the first page.

  • pageLength (int or None) – The maximum number of records to return, or None to return all.

Returns:

A list of user records.

Return type:

list of dict

saveLayer(id, parentId, description, alignment, peers, peersOverlap, parentIncludes, saturated, type, validLabels, category)

Saves changes to a layer.

Parameters:
  • id (str) – The layer ID

  • parentId (str) – The layer’s parent layer id.

  • description (str) – The description of the layer.

  • alignment (number) – The layer’s alignment - 0 for none, 1 for point alignment, 2 for interval alignment.

  • peers (boolean) – Whether children on this layer have peers or not.

  • peersOverlap (boolean) – Whether child peers on this layer can overlap or not.

  • parentIncludes (boolean) – Whether the parent temporally includes the child.

  • saturated (boolean) – Whether children must temporally fill the entire parent duration (True) or not (False).

  • type (str) – The type for labels on this layer, e.g. string, number, boolean, ipa.

  • validLabels (dict) – A dictionary of valid label values for this layer, or an empty dictionary if the layer values are not restricted. Each key is a possible label value, and is associated with a description of the value (e.g. for displaying to users).

  • category (str) – Category for the layer, if any.

Returns:

The resulting layer definition.

Return type:

dict

setPassword(user, password, resetPassword)

Sets a given user’s password.

Parameters:
  • user (str) – The ID of the user.

  • password – The new password.

  • resetPassword (boolean) – Whether the user must reset their password when they next log in.

updateCategory(class_id, category, description, display_order)

Updates an existing category record.

The dictionary returned has the following entries:

  • “class_id” : What kind of attributes are categorised - “transcript” or “speaker”.

  • “category” : The name/id of the category.

  • “description” : The description of the category.

  • “display_order” : Where the category appears among other categories.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • class_id (str) – What kind of attributes are categorised - “transcript” or “speaker”.

  • category (str) – The name/id of the category.

  • description (str) – The description of the category.

  • display_order (number) – Where the category appears among other categories.

Returns:

A copy of the category record

Return type:

dict

updateCorpus(corpus_name, corpus_language, corpus_description)

Updates an existing corpus record.

The dictionary returned has the following entries:

  • “corpus_id” : The database key for the record.

  • “corpus_name” : The name/id of the corpus.

  • “corpus_language” : The ISO 639-1 code for the default language.

  • “corpus_description” : The description of the corpus.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • corpus_name (str) – The name/id of the corpus.

  • corpus_language (str) – The ISO 639-1 code for the default language.

  • corpus_description (str) – The description of the corpus.

Returns:

A copy of the corpus record

Return type:

dict

updateMediaTrack(suffix, description, display_order)

Updates an existing media track record.

The dictionary returned has the following entries:

  • “suffix” : The suffix associated with the media track.

  • “description” : The description of the media track.

  • “display_order” : The position of the track amongst other tracks.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • suffix (str) – The suffix associated with the media track.

  • description (str) – The description of the media track.

  • display_order (str) – The position of the track amongst other tracks.

Returns:

A copy of the media track record

Return type:

dict

updateProject(project, description)

Deprecated as ‘projects’ are now categories with classId = ‘layer’ - use updateCategory instead.

Parameters:
  • project (str) – The name/id of the project.

  • description (str) – The description of the project.

Returns:

A copy of the project record

Return type:

dict

updateRole(role_id, description)

Updates an existing role record.

The dictionary returned has the following entries:

  • “role_id” : The name/id of the role.

  • “description” : The description of the role.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • role_id (str) – The name/id of the role.

  • description (str) – The description of the role.

Returns:

A copy of the role record

Return type:

dict

updateRolePermission(role_id, entity, layer, value_pattern)

Updates an existing role permission record.

The dictionary returned has the following entries:

  • “role_id” : The ID of the role this permission applies to.

  • “entity” : The media entity this permission applies to - a string made up of “t” (transcript), “a” (audio), “v” (video), or “i” (image).

  • “layer” : ID of the layer for which the label determines access. This is either a valid transcript attribute layer ID, or “corpus”.

  • “value_pattern” : Regular expression for matching against the layerId label. If the regular expression matches the label, access is allowed.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • role_id (str) – The ID of the role this permission applies to.

  • entity (str) – The media entity this permission applies to.

  • layer (str) – ID of the layer for which the label determines access.

  • value_pattern (str) – Regular expression for matching against the layer label.

Returns:

A copy of the role permission record

Return type:

dict

updateSystemAttribute(attribute, value)

Updates the value of an existing system attribute record.

The dictionary returned has the following entries:

  • “attribute” : ID of the attribute.

  • “type” : The type of the attribute - “string”, “boolean”, “select”, etc.

  • “style” : UI style, which depends on “type”.

  • “label” : User-facing label for the attribute.

  • “description” : User-facing (long) description for the attribute.

  • “options” : If “type” == “select”, this is a dict defining possible values.

  • “value” : The value of the attribute.

Parameters:
  • attribute (str) – ID of the attribute.

  • value (str) – The new value for the attribute.

Returns:

A copy of the systemAttribute record

Return type:

dict
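System attribute records can be reduced to a simple lookup, which makes it easy to avoid unnecessary updates. The following is a sketch under the assumption that client is a LabbcatAdmin instance; both helper names are hypothetical, and only the “attribute” and “value” entries described above are used.

```python
def system_attribute_values(client):
    """Return a dict mapping each system attribute ID to its current value."""
    return {record["attribute"]: record["value"]
            for record in client.readSystemAttributes()}


def set_system_attribute_if_changed(client, attribute, value):
    """Update the attribute only when the new value differs from the current one.

    Returns the updated record, or None if no update was needed.
    """
    if system_attribute_values(client).get(attribute) == value:
        return None
    return client.updateSystemAttribute(attribute, value)
```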

updateUser(user, email, resetPassword, roles)

Updates an existing user record.

The dictionary returned has the following entries:

  • “user” : The id of the user.

  • “email” : The email address of the user.

  • “resetPassword” : Whether the user must reset their password when they next log in.

  • “roles” : Roles or groups the user belongs to.

  • “_cantDelete” : This is not a database field, but rather is present in records returned from the server that can not currently be deleted; a string representing the reason the record can’t be deleted.

Parameters:
  • user (str) – The ID of the user.

  • email (str) – The email address of the user.

  • resetPassword (boolean) – Whether the user must reset their password when they next log in.

  • roles (list of str) – Roles or groups the user belongs to.

Returns:

A copy of the user record

Return type:

dict

The LabbcatAdmin class also inherits from the LabbcatEdit class.

Query Language Generation Functions

labbcat.expressionFromAttributeValue(attribute, values, negate=False)

Generates a query expression for matching a transcript/participant attribute.

This function generates a query expression fragment which can be passed as the expression parameter of getMatchingTranscriptIds or getMatchingParticipantIds etc. using a list of possible values for a given transcript/participant attribute.

The attribute defined by ‘attribute’ is expected to have exactly one value. If it may have multiple values, use expressionFromAttributeValues() instead.

Parameters:
  • attribute (str) – The transcript/participant attribute to filter by.

  • values (list or str) – A list of possible values for attribute, or a single value.

  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).

Returns:

A query expression which can be passed as the expression parameter of countMatchingParticipantIds(), getMatchingParticipantIds(), countMatchingTranscriptIds(), getMatchingTranscriptIds(), or getTranscriptAttributes().

Return type:

str

labbcat.expressionFromAttributeValues(attribute, values, negate=False)

Generates a query expression for matching a transcript/participant attribute.

This function generates a query expression fragment which can be passed as the expression parameter of getMatchingTranscriptIds or getMatchingParticipantIds etc. using a list of possible values for a given transcript/participant attribute.

The attribute defined by ‘attribute’ is expected to have possibly more than one value. If it can only have one value, use expressionFromAttributeValue() instead.

Parameters:
  • attribute (str) – The transcript/participant attribute to filter by.

  • values (list or str) – A list of possible values for attribute, or a single value.

  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).

Returns:

A query expression which can be passed as the expression parameter of countMatchingParticipantIds(), getMatchingParticipantIds(), countMatchingTranscriptIds(), getMatchingTranscriptIds(), or getTranscriptAttributes().

Return type:

str

labbcat.expressionFromIds(ids, negate=False)

Generates a query expression for matching transcripts or participants by ID.

This function generates a query expression fragment which can be passed as the expression parameter of getTranscriptAttributes etc. using a list of IDs.

Parameters:
  • ids (list or str) – A list of IDs, or a single value.

  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).

Returns:

A query expression which can be passed as the expression parameter of countMatchingParticipantIds(), getMatchingParticipantIds(), countMatchingTranscriptIds(), getMatchingTranscriptIds(), or getTranscriptAttributes().

Return type:

str

labbcat.expressionFromTranscriptTypes(transcriptTypes, negate=False)

Generates a transcript query expression for matching transcripts by type.

This function generates a query expression fragment which can be passed as the expression parameter of getTranscriptAttributes or getMatchingTranscriptIds etc. using a list of transcript types.

Parameters:
  • transcriptTypes (list or str) – A list of transcript types, or a single transcript type.

  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).

Returns:

A query expression which can be passed as the expression parameter of getMatchingTranscriptIds() or getTranscriptAttributes()

Return type:

str

labbcat.expressionFromCorpora(corpora, negate=False)

Generates a transcript query expression for matching transcripts/participants by corpus.

This function generates a query expression fragment which can be passed as the expression parameter of getTranscriptAttributes or getMatchingTranscriptIds etc. using a list of corpus names.

Parameters:
  • corpora (list or str) – A list of corpus names, or a single corpus name.

  • negate (boolean) – Whether to match the given values (False), or everything except the given values (True).

Returns:

A query expression which can be passed as the expression parameter of getMatchingTranscriptIds() or getTranscriptAttributes() etc.

Return type:

str
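As a rough illustration of how these expression helpers might be used, the sketch below lists the transcripts that fall outside a set of corpora. The helper name transcripts_outside_corpora and the corpus names are made up for the example, and it assumes getMatchingTranscriptIds() accepts the expression as its first argument:

```python
def transcripts_outside_corpora(store, corpora):
    """Return the IDs of transcripts that are not in any of the given corpora.

    `store` is expected to be a labbcat.LabbcatView (or compatible) instance.
    """
    # imported here so that the sketch can be loaded without the package
    import labbcat

    # negate=True matches everything except the given corpus names
    expression = labbcat.expressionFromCorpora(corpora, negate=True)
    return store.getMatchingTranscriptIds(expression)

# Possible usage (hypothetical corpus names):
# corpus = labbcat.LabbcatView("https://labbcat.canterbury.ac.nz", "demo", "demo")
# print(transcripts_outside_corpora(corpus, ["CC", "IA"]))
```

Leaving negate at its default of False would instead select only the transcripts inside the listed corpora.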

Praat Script Fragment Generation Functions

labbcat.praatScriptFormants(formants=[1, 2], samplePoints=[0.5], timeStep=0.0, maxNumberFormants=5, maxFormant=5500, maxFormantMale=5000, genderAttribute='participant_gender', valueForMale='M', windowLength=0.025, preemphasisFrom=50)

Generates a script for extracting formants, for use with processWithPraat()

This function generates a Praat script fragment which can be passed as the praatScript parameter of processWithPraat(), in order to extract selected formants.

Parameters:
  • formants (list of int) – A list of integers specifying which formants to extract, e.g. [1,2] for the first and second formant.

  • samplePoints (list of float) – A list of numbers (0 <= samplePoints <= 1) specifying multiple points at which to take the measurement. The default is a single point at 0.5 - this means one measurement will be taken halfway through the target interval. If, for example, you wanted eleven measurements evenly spaced throughout the interval, you would specify samplePoints as being [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0].

  • timeStep (float) – Time step in seconds, or 0.0 for ‘auto’.

  • maxNumberFormants (int) – Maximum number of formants.

  • maxFormant (int) – Maximum formant value (Hz) for all speakers, or for female speakers, if maxFormantMale is also specified.

  • maxFormantMale (int) – Maximum formant value (Hz) for male speakers, or None to use the same value as maxFormant.

  • genderAttribute (str) – Name of the LaBB-CAT participant attribute that contains the participant’s gender - normally this is “participant_gender”.

  • valueForMale (str) – The value that the genderAttribute has when the participant is male.

  • windowLength (float) – Window length in seconds.

  • preemphasisFrom (int) – Pre-emphasis from (Hz)

Returns:

A script fragment which can be passed as the praatScript parameter of processWithPraat()

Return type:

str
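The eleven evenly spaced sample points described for samplePoints need not be typed out by hand. A minimal sketch, in which evenly_spaced_points is a hypothetical helper and not part of the labbcat module:

```python
def evenly_spaced_points(count):
    """Return `count` proportions evenly spaced from 0.0 to 1.0 inclusive."""
    if count < 2:
        # a single measurement is taken at the midpoint, like the default
        return [0.5]
    return [i / (count - 1) for i in range(count)]

sample_points = evenly_spaced_points(11)

# With the package installed, the list could then be passed on, e.g.:
# script = labbcat.praatScriptFormants(formants=[1, 2], samplePoints=sample_points)
```

The same list can be used wherever a samplePoints parameter is accepted.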

labbcat.praatScriptFastTrack(formants=[1, 2], samplePoints=[0.5], lowestAnalysisFrequency=5000, lowestAnalysisFrequencyMale=4500, highestAnalysisFrequency=7000, highestAnalysisFrequencyMale=6500, genderAttribute='participant_gender', valueForMale='M', timeStep=0.002, trackingMethod='burg', numberOfFormants=3, maximumF1Frequency=1200, maximumF1Bandwidth=None, maximumF2Bandwidth=None, maximumF3Bandwidth=None, minimumF4Frequency=2900, enableRhoticHeuristic=True, enableF3F4ProximityHeuristic=True, numberOfSteps=20, numberOfCoefficients=5)

Generates a script for extracting formants using FastTrack, for use with processWithPraat()

This function generates a Praat script fragment which can be passed as the praatScript parameter of processWithPraat(), in order to extract selected formants using the FastTrack Praat plugin.

The FastTrack Praat plugin, developed by Santiago Barreda, automatically runs multiple formant analyses on each segment, selects the best (the smoothest, with optional heuristics), and makes the winning formant object available for measurement. For more information, see FastTrack

Parameters:
  • formants (list of int) – A list of integers specifying which formants to extract, e.g. [1,2] for the first and second formant.

  • samplePoints (list of float) – A list of numbers (0 <= samplePoints <= 1) specifying multiple points at which to take the measurement. The default is a single point at 0.5 - this means one measurement will be taken halfway through the target interval. If, for example, you wanted eleven measurements evenly spaced throughout the interval, you would specify samplePoints as being [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0].

  • lowestAnalysisFrequency (int) – Lowest analysis frequency (Hz) by default.

  • lowestAnalysisFrequencyMale (int) – Lowest analysis frequency (Hz) for male speakers, or None to use the same value as lowestAnalysisFrequency.

  • highestAnalysisFrequency (int) – Highest analysis frequency (Hz) by default.

  • highestAnalysisFrequencyMale (int) – Highest analysis frequency (Hz) for male speakers, or None to use the same value as highestAnalysisFrequency.

  • genderAttribute (str) – Name of the LaBB-CAT participant attribute that contains the participant’s gender - normally this is “participant_gender”.

  • valueForMale (str) – The value that the genderAttribute has when the participant is male.

  • timeStep (float) – Time step in seconds

  • trackingMethod (str) – tracking_method parameter for trackAutoselectProcedure; “burg” or “robust”.

  • numberOfFormants (int) – Number of formants to track - 3 or 4.

  • maximumF1Frequency (int) – Specifying a value enables the F1 frequency heuristic: Median F1 frequency should not be higher than this value.

  • maximumF1Bandwidth (int) – Specifying a value (e.g. 500) enables the F1 bandwidth heuristic: Median F1 bandwidth should not be higher than this value.

  • maximumF2Bandwidth (int) – Specifying a value (e.g. 600) enables the F2 bandwidth heuristic: Median F2 bandwidth should not be higher than this value.

  • maximumF3Bandwidth (int) – Specifying a value (e.g. 900) enables the F3 bandwidth heuristic: Median F3 bandwidth should not be higher than this value.

  • minimumF4Frequency (int) – Specifying a value enables the F4 frequency heuristic: Median F4 frequency should not be lower than this value.

  • enableRhoticHeuristic (boolean) – Whether to enable the rhotic heuristic: If F3 < 2000 Hz, F1 and F2 should be at least 500 Hz apart.

  • enableF3F4ProximityHeuristic (boolean) – Whether to enable the F3/F4 proximity heuristic: If (F4 - F3) < 500 Hz, F1 and F2 should be at least 1500 Hz apart.

  • numberOfSteps (int) – Number of analyses between low and high analysis limits. More analysis steps may improve results, but will increase analysis time (50 percent more steps = around 50 percent longer to analyze).

  • numberOfCoefficients (int) – Number of coefficients for formant prediction. More coefficients allow for more sudden, and ‘wiggly’ formant motion.

Returns:

A script fragment which can be passed as the praatScript parameter of processWithPraat()

Return type:

str

labbcat.praatScriptCentreOfGravity(powers=[2], spectrumFast=True)

Generates a script for extracting the CoG, for use with processWithPraat()

This function generates a Praat script fragment which can be passed as the praatScript parameter of processWithPraat(), in order to extract one or more spectral centre of gravity (CoG) measurements.

Parameters:
  • powers (list of float) – A list of numbers specifying which powers to query for, e.g. [1,2].

  • spectrumFast (boolean) – Whether to use the ‘fast’ option when creating the spectrum object to query.

Returns:

A script fragment which can be passed as the praatScript parameter of processWithPraat()

Return type:

str

labbcat.praatScriptIntensity(minimumPitch=100.0, timeStep=0.0, subtractMean=True, getMaximum=True, samplePoints=None, interpolation='cubic', skipErrors=True)

Generates a script for extracting maximum intensity, for use with processWithPraat()

This function generates a Praat script fragment which can be passed as the praatScript parameter of processWithPraat(), in order to extract one or more maximum intensity values.

Parameters:
  • minimumPitch (float) – Minimum pitch (Hz).

  • timeStep (float) – Time step in seconds, or 0.0 for ‘auto’.

  • subtractMean (boolean) – Whether to subtract the mean or not.

  • getMaximum (boolean) – Extract the maximum intensity for the sample.

  • samplePoints (list of float) – A list of numbers (0 <= samplePoints <= 1) specifying multiple points at which to take the measurement. The default is None, meaning no individual measurements will be taken (only the aggregate values identified by getMaximum). A single point at 0.5 means one measurement will be taken halfway through the target interval. If, for example, you wanted eleven measurements evenly spaced throughout the interval, you would specify samplePoints as being [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0].

  • interpolation (str) – If samplePoints are specified, this is the interpolation to use when getting individual values. Possible values are ‘nearest’, ‘linear’, ‘cubic’, ‘sinc70’, or ‘sinc700’.

  • skipErrors (boolean) – Sometimes, for some segments, Praat fails to create an Intensity object. If skipErrors = True, analysis of those segments will be skipped, and corresponding intensity values will be returned as “--undefined--”. If skipErrors = False, the error message from Praat will be returned in the Error field, but no intensity measures will be returned for any segments in the same recording.

Returns:

A script fragment which can be passed as the praatScript parameter of processWithPraat()

Return type:

str

labbcat.praatScriptPitch(getMean=True, getMinimum=False, getMaximum=False, timeStep=0.0, pitchFloor=60, maxNumberOfCandidates=15, veryAccurate=False, silenceThreshold=0.03, voicingThreshold=0.5, octaveCost=0.01, octaveJumpCost=0.35, voicedUnvoicedCost=0.35, pitchCeiling=500, pitchFloorMale=30, voicingThresholdMale=0.4, pitchCeilingMale=250, genderAttribute='participant_gender', valueForMale='M', samplePoints=None, interpolation='linear', skipErrors=True)

Generates a script for extracting pitch, for use with processWithPraat()

This function generates a Praat script fragment which can be passed as the praatScript parameter of processWithPraat(), in order to extract pitch information.

Parameters:
  • getMean (boolean) – Whether to extract the mean pitch for the sample

  • getMinimum (boolean) – Whether to extract the minimum pitch for the sample

  • getMaximum (boolean) – Whether to extract the maximum pitch for the sample

  • timeStep (float) – Time step setting for the Praat command

  • pitchFloor (int) – Minimum pitch (Hz) for all speakers, or for female speakers, if pitchFloorMale is also specified

  • maxNumberOfCandidates (int) – Maximum number of candidates setting for the Praat command

  • veryAccurate (boolean) – Accuracy setting for the Praat command

  • silenceThreshold (float) – Silence threshold setting for the Praat command

  • voicingThreshold (float) – Voicing threshold for all speakers, or for female speakers, if voicingThresholdMale is also specified

  • octaveCost (float) – Octave cost setting for the Praat command

  • octaveJumpCost (float) – Octave jump cost setting for the Praat command

  • voicedUnvoicedCost (float) – Voiced/unvoiced cost setting for the Praat command

  • pitchCeiling (int) – Maximum pitch (Hz) for all speakers, or for female speakers, if pitchCeilingMale is also specified

  • pitchFloorMale (int) – Minimum pitch (Hz) for male speakers

  • voicingThresholdMale (float) – Voicing threshold for male speakers

  • pitchCeilingMale (int) – Maximum pitch (Hz) for male speakers

  • genderAttribute (str) – Name of the LaBB-CAT participant attribute that contains the participant’s gender - normally this is “participant_gender”

  • valueForMale (str) – The value that the genderAttribute has when the participant is male

  • samplePoints (list of float) – A list of numbers (0 <= samplePoints <= 1) specifying multiple points at which to take the measurement. The default is None, meaning no individual measurements will be taken (only the aggregate values identified by getMean, getMinimum, and getMaximum). A single point at 0.5 means one measurement will be taken halfway through the target interval. If, for example, you wanted eleven measurements evenly spaced throughout the interval, you would specify samplePoints as being [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0].

  • interpolation (str) – If samplePoints are specified, this is the interpolation to use when getting individual values. Possible values are ‘nearest’ or ‘linear’.

  • skipErrors (boolean) – Sometimes, for some segments, Praat fails to create a Pitch object. If skipErrors = True, analysis of those segments will be skipped, and corresponding pitch values will be returned as “--undefined--”. If skipErrors = False, the error message from Praat will be returned in the Error field, but no pitch measures will be returned for any segments in the same recording.

Returns:

A script fragment which can be passed as the praatScript parameter of processWithPraat()

Return type:

str
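When skipErrors is True, skipped segments come back with an undefined marker rather than a number, so results usually need a little cleaning before analysis. The sketch below is one way to do that; parse_measure is a hypothetical helper, the sample values are invented, and it assumes each measurement arrives as a string:

```python
def parse_measure(value):
    """Convert one measurement to float, or None if it is missing or undefined."""
    if value is None:
        return None
    text = str(value).strip()
    # Praat reports values it could not compute with an 'undefined' marker
    if text == "" or "undefined" in text:
        return None
    return float(text)

raw_values = ["182.4", "--undefined--", "190.1"]  # made-up values for illustration
pitches = [parse_measure(v) for v in raw_values]
```

Mapping undefined values to None keeps them distinguishable from genuine measurements in later calculations.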

ResponseException class

class labbcat.ResponseException(response)

Any method that creates a server request can raise this exception if an error occurs.

This has one attribute, response, which is a Response object representing the full response from the server, from which error messages etc. can be obtained.

class labbcat.Response(resp, verbose=False)

Standard LaBB-CAT response object.

Attributes:

  • model - The model or result returned if any.

  • httpStatus - The HTTP status code, or -1 if not known.

  • title - The title returned by the server.

  • version - The server version.

  • code - The numeric request code (0 or 1 means no error).

  • errors - Errors returned.

  • messages - Messages returned.

  • text - The full plain text of the HTTP response.

checkForErrors()

Convenience method for checking whether the response contains any errors.

If so, a corresponding ResponseException will be raised.
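A caller might summarise a failed request along the following lines. This is only a sketch: describe_failure is a hypothetical helper, and it assumes that errors is a list of message strings, which the attribute list above does not state explicitly.

```python
def describe_failure(exception):
    """Build a one-line summary from an exception carrying a `response` attribute."""
    response = getattr(exception, "response", None)
    if response is None:
        # not a ResponseException-style error; fall back to its own message
        return str(exception)
    errors = getattr(response, "errors", None) or []
    status = getattr(response, "httpStatus", -1)
    return "HTTP " + str(status) + ": " + "; ".join(str(e) for e in errors)

# Possible usage, given a LabbcatView instance called corpus:
# try:
#     layerIds = corpus.getLayerIds()
# except labbcat.ResponseException as x:
#     print(describe_failure(x))
```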