nzilbb.formatter.doccano (0.1.1)

JSONL format for export/import with doccano

Serializer/deserializer for JSONL files compatible with doccano

Doccano supports text-only (i.e. character offset) annotation, but also supports importing arbitrary meta-data, which is passed through from import to export. When the graph to serialize has temporal offsets (i.e. Graph.getOffsetUnits() == “s”), this meta-data is used to retain a mapping of character offsets to seconds.
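
As an illustration of how such a mapping can be used, the following Python sketch converts a character span back to seconds at utterance resolution. It is not the formatter's own code: the function names and sample data are hypothetical, and it assumes only that there is one (start, end) pair of times per line of text. The real deserializer may resolve offsets more finely.

```python
from bisect import bisect_right


def line_starts(text):
    """Return the character offset at which each line of `text` begins."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def span_to_seconds(text, utterance_anchors, char_start, char_end):
    """Map the character span [char_start, char_end) to (start_s, end_s).

    Hypothetical helper: the result is only as precise as the utterance
    boundaries, because it returns the start of the first line touched
    and the end of the last line touched.
    """
    starts = line_starts(text)
    first = bisect_right(starts, char_start) - 1
    last = bisect_right(starts, max(char_start, char_end - 1)) - 1
    return utterance_anchors[first][0], utterance_anchors[last][1]


# Invented sample data: three utterances, one anchor pair per line.
text = "mary:\thello there\nhow are you\njohn:\tfine thanks"
anchors = [[0.0, 1.5], [1.5, 2.75], [3.0, 4.25]]

hello_times = span_to_seconds(text, anchors, 6, 11)       # "hello", line 1
crossing_times = span_to_seconds(text, anchors, 12, 21)   # spans lines 1-2
```

A span that crosses a line break takes its start time from the first utterance and its end time from the last one.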

The structure used is:

- each text is a transcript
- each line is an utterance
- the first utterance of each turn starts with the participant label, formatted: “${participant}:\t”
- span/phrase layers are tagged
- meta-data:
  - transcript: the graph ID
  - anchors: an object keyed on layer ID. Each value is an array of pairs, one pair per annotation, giving the start time and end time of that annotation. “anchors” always contains at least the “utterance” offsets, with one pair for each line in the “text”.
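
The structure above can be sketched in Python as follows. This is a minimal illustration, not the formatter's actual output: the top-level key names "text", "labels" and "meta" are assumptions (doccano versions differ in how they name these fields), and the function name, transcript ID, layer ID and times are invented for the example.

```python
import json


def build_record(graph_id, turns, labels=(), extra_anchors=None):
    """Build one JSONL record for a transcript (hypothetical helper).

    turns: list of (participant, [(utterance_text, start_s, end_s), ...])
    labels: iterable of (char_start, char_end, label) spans
    extra_anchors: optional {layer_id: [[start_s, end_s], ...]}
    """
    lines = []
    utterance_anchors = []
    for participant, utterances in turns:
        for i, (utterance, start, end) in enumerate(utterances):
            # Only the first utterance of a turn carries the participant label.
            prefix = f"{participant}:\t" if i == 0 else ""
            lines.append(prefix + utterance)
            utterance_anchors.append([start, end])
    anchors = {"utterance": utterance_anchors}
    anchors.update(extra_anchors or {})
    return {
        "text": "\n".join(lines),                  # assumed key name
        "labels": [list(span) for span in labels],  # assumed key name
        "meta": {"transcript": graph_id, "anchors": anchors},
    }


turns = [
    ("mary", [("hello there", 0.0, 1.5), ("how are you", 1.5, 2.75)]),
    ("john", [("fine thanks", 3.0, 4.25)]),
]
record = build_record(
    "interview.trs",
    turns,
    labels=[(6, 11, "greeting")],
    extra_anchors={"greeting": [[0.0, 0.6]]},
)
jsonl_line = json.dumps(record)  # one transcript per line of the JSONL file
```

Because json.dumps escapes the newlines inside "text", each transcript still occupies exactly one physical line of the JSONL file.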

NB: If a graph is serialized with annotations, edited in Doccano, and then exported for deserialization, the annotations that were included in the serialization are ignored during deserialization.

This is because serialized annotations have their original anchor offsets included, but we can't guarantee that these annotations haven't been edited in Doccano, so the saved offsets may no longer be valid.

For this reason, Doccano can currently only be used to add annotations on new layers, not for editing existing layers.