nzilbb-labbcat Documentation

This is a client library for communicating with LaBB-CAT web application servers.

What is LaBB-CAT?

LaBB-CAT is a web-based linguistic annotation store for audio or video recordings, text transcripts, and other annotations.

Annotations of various types can be automatically generated or manually added.

LaBB-CAT servers usually host password-protected linguistic corpora, which can be accessed manually via a web browser or programmatically using a client library like this one.

What is this library?

The library's methods mirror those of nzilbb.ag.IGraphStoreQuery and related Java interfaces, giving it a standardized API.

nzilbb-labbcat is available in the Python Package Index here.

Detailed Python documentation is available here.

Example

The following example shows how to:

upload a transcript to LaBB-CAT,
wait for the automatic annotation tasks to finish,
extract the annotation labels,
find all /a/ segments in the corpus and get F1/F2 at the midpoint of each using Praat, and
delete the transcript from LaBB-CAT.

import labbcat
# Connect to the LaBB-CAT annotation store (server URL, username, password)
corpus = labbcat.LabbcatEdit("http://localhost:8080/labbcat", "labbcat", "labbcat")

# List the corpora on the server
corpora = corpus.getCorpusIds()

# List the transcript types
transcript_type_layer = corpus.getLayer("transcript_type")
transcript_types = transcript_type_layer["validLabels"]

# Upload a transcript
corpus_id = corpora[0]
transcript_type = next(iter(transcript_types))
taskId = corpus.newTranscript(
    "test/labbcat-py.test.txt",  # transcript file to upload
    None, None,                  # no media file, so no media suffix
    transcript_type, corpus_id,
    "test")                      # episode name

# wait for the annotation generation to finish,
# then release the finished task so the server can free its resources
corpus.waitForTask(taskId)
corpus.releaseTask(taskId)

# get the "POS" layer annotations
annotations = corpus.getAnnotations("labbcat-py.test.txt", "pos")
labels = [annotation["label"] for annotation in annotations]

# find all /a/ segments (phones) in the whole corpus
results = corpus.getMatches({"segment": "a"})

# get the start/end times of the segments
segments = corpus.getMatchAnnotations(results, "segment", offsetThreshold=50)

# get F1/F2 at the midpoint of each /a/ vowel
# (0.025 is the window offset: seconds of context around each segment)
formantsAtMidpoint = corpus.processWithPraat(
    labbcat.praatScriptFormants(), 0.025, results, segments)

# delete the transcript from the corpus
corpus.deleteTranscript("labbcat-py.test.txt")
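In the example, the layer returned by getLayer and the annotations returned by getAnnotations are used as ordinary Python dictionaries: validLabels is iterated for its keys, and each annotation has a "label" entry. Assuming that shape, the results can be summarised with standard-library tools, for instance by counting how often each part-of-speech tag occurs. The helper names first_valid_label and count_labels are hypothetical (they are not part of the labbcat API), and the sample data is made up rather than real server output.

```python
from collections import Counter
from typing import Dict, List


def first_valid_label(layer: Dict) -> str:
    """Pick the first key of a layer's validLabels dictionary, as the
    example does with next(iter(...)); Python dicts keep insertion order."""
    return next(iter(layer["validLabels"]))


def count_labels(annotations: List[Dict]) -> Counter:
    """Count how often each annotation label occurs."""
    return Counter(annotation["label"] for annotation in annotations)


# Hypothetical stand-ins for server responses (made-up data)
transcript_type_layer = {
    "id": "transcript_type",
    "validLabels": {"interview": "interview", "monologue": "monologue"},
}
annotations = [
    {"label": "DT"}, {"label": "NN"}, {"label": "VBZ"}, {"label": "NN"},
]

print(first_valid_label(transcript_type_layer))
print(count_labels(annotations).most_common(1))
```

With real data, the annotations list from the example can be passed to count_labels unchanged, provided each item carries a "label" key.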
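The second argument to processWithPraat (0.025) is a window offset in seconds. Assuming it is a margin of context that is subtracted from each segment's start time and added to its end time before the audio is extracted for Praat, the arithmetic for a single segment can be sketched as below. The functions extraction_window and midpoint, and the segment times, are hypothetical and only illustrate the calculation.

```python
from typing import Tuple


def extraction_window(start: float, end: float, window_offset: float) -> Tuple[float, float]:
    """Return the (start, end) time in seconds of the audio to extract,
    assuming window_offset seconds of context are added on each side."""
    return (start - window_offset, end + window_offset)


def midpoint(start: float, end: float) -> float:
    """Return the time in seconds halfway between start and end."""
    return start + (end - start) / 2


# Hypothetical /a/ segment from 2.0 s to 2.5 s, with the example's 0.025 s offset
seg_start, seg_end = 2.0, 2.5
window = extraction_window(seg_start, seg_end, 0.025)
mid = midpoint(seg_start, seg_end)
print(f"extract {window[0]:.3f}-{window[1]:.3f} s, segment midpoint {mid:.3f} s")
```

Because the offset is added equally on both sides, the midpoint of the extract equals the segment's own midpoint (up to floating-point rounding), so the extra context gives the formant tracker more signal without moving the measurement point.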