nzilbb.formatter.doccano (0.1.1)
JSONL format for export/import with doccano
Serializer/deserializer for JSONL files compatible with doccano
Doccano supports text-only (i.e. character offset) annotation, but also supports importing arbitrary meta-data which is passed through from import to export, so when the graph to serialize has temporal offsets (i.e. Graph.getOffsetUnits() == “s”) this meta-data is used to retain a mapping of character offsets to seconds.
The structure used is:
- each text is a transcript
- each line is an utterance
- the first utterance of each turn starts with the participant label, formatted: “${participant}:\t”
- span/phrase layers are tagged
- meta-data:
- transcript - graph ID
- anchors - object keyed on layer ID, each value being array of couples, one element for each annotation, each couple being the start time and end time of the annotation. “anchors” will contain at least the “utterance” offsets, one annotation for each line in the “text”
NB If a graph is serialized with annotations, edited with Doccano, and then exported for deserialization, the annotations included in the serialization will be ignored during deserialization.
This is because serialized annotations have their original anchor offsets included, but we can't guarantee that these annotations haven't been edited in Doccano, and so the saved offsets may be no longer valid.
For this reason, Doccano can currently only be used to add annotations on new layers, not for editing existing layers.