nzilbb.formatter.clan (1.4.2)
Serialization for CHAT files produced by CLAN.
NB the current implementation is not exhaustive; it only covers:
- Time synchronization codes, including mid-line synchronization.
- Overlapping utterances in the same speaker turn, which are handled as follows:
  - If overlap is partial, the start of the second utterance is set to the end of the first.
  - If overlap is total, the two utterances are chained together with a non-aligned anchor between them.
- Disfluency marking with &+ - e.g. "so &+sund Sunday"
- Non-standard form expansion - e.g. "gonna [: going to]"
- Incomplete word completion - e.g. "dinner doin(g) all"
- Acronym/proper name joining with _ - e.g. "no T_V in my room"
- Retracing - e.g. "<some friends and I> [//] uh" or "and sit [//] sets him"
- Repetition/stuttered false starts - e.g. "the <picnic> [/] picnic" or "the Saturday [/] in the morning"
- Errors - e.g. "they've <work up a hunger> [* s:r]" or "they got [* m] to"
- Pauses - untimed (e.g. (.), (...)) or timed (e.g. (0.15), (2.), (1:05.15))
- %mor line annotations (or %pos line annotations, if present)
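As a rough illustration of time synchronization, the sketch below assumes that each CLAN media bullet is stored as a start and end time in milliseconds, joined by _ and delimited by the control character U+0015 (the form CLAN-produced .cha files usually contain, although the list above does not spell it out). BulletParser, Chunk, parse and bullet are hypothetical names, not part of nzilbb.formatter.clan. The point is only that a mid-line bullet splits one utterance into several timed chunks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a CHAT main-tier line at its CLAN time bullets. */
class BulletParser {
  // Assumed bullet form: U+0015, start ms, '_', end ms, U+0015 (written \x15 in the regex)
  private static final Pattern BULLET = Pattern.compile("\\x15(\\d+)_(\\d+)\\x15");

  /** A stretch of text and the time range, in seconds, of the bullet that closes it. */
  static final class Chunk {
    final String text;
    final double start;
    final double end;

    Chunk(String text, double start, double end) {
      this.text = text;
      this.start = start;
      this.end = end;
    }
  }

  /** Builds a bullet string, for examples and tests. */
  static String bullet(long startMs, long endMs) {
    char mark = 0x15;
    return mark + (startMs + "_" + endMs) + mark;
  }

  /** Returns one chunk per bullet, so a mid-line bullet yields more than one chunk. */
  static List<Chunk> parse(String line) {
    List<Chunk> chunks = new ArrayList<>();
    Matcher m = BULLET.matcher(line);
    int textStart = 0;
    while (m.find()) {
      String text = line.substring(textStart, m.start()).trim();
      double start = Long.parseLong(m.group(1)) / 1000.0;
      double end = Long.parseLong(m.group(2)) / 1000.0;
      chunks.add(new Chunk(text, start, end));
      textStart = m.end();
    }
    return chunks;
  }

  public static void main(String[] args) {
    String line = "I went to " + bullet(1200, 2150) + " the shops . " + bullet(2150, 3000);
    for (Chunk c : parse(line)) {
      System.out.println(c.text + " [" + c.start + " - " + c.end + "]");
    }
  }
}
```

Any text after the last bullet is ignored here; a real parser would also have to decide what to do with unbulleted lines.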
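The two overlap rules can be sketched as follows, assuming each utterance has start and end anchors with offsets in seconds and that a non-aligned anchor is modelled as an anchor whose offset is null. OverlapResolver, Anchor, Utterance and utt are hypothetical names for illustration, not the nzilbb annotation-graph classes, and what happens to the first utterance's original end time under total overlap is an assumption (here it is simply dropped).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the overlap rules for consecutive utterances in one speaker turn. */
class OverlapResolver {
  /** A point in time; a null offset stands for a non-aligned anchor. */
  static final class Anchor {
    final Double offset;

    Anchor(Double offset) {
      this.offset = offset;
    }
  }

  static final class Utterance {
    final String text;
    Anchor start;
    Anchor end;

    Utterance(String text, Anchor start, Anchor end) {
      this.text = text;
      this.start = start;
      this.end = end;
    }
  }

  static Utterance utt(String text, double start, double end) {
    return new Utterance(text, new Anchor(start), new Anchor(end));
  }

  /** Applies the rules to each pair of neighbouring utterances, in order. */
  static void resolve(List<Utterance> turn) {
    for (int i = 0; i + 1 < turn.size(); i++) {
      Utterance first = turn.get(i);
      Utterance second = turn.get(i + 1);
      Double firstEnd = first.end.offset;
      Double secondStart = second.start.offset;
      Double secondEnd = second.end.offset;
      if (firstEnd == null || secondStart == null || secondEnd == null) continue; // nothing to compare
      if (secondStart >= firstEnd) continue; // no overlap
      if (secondEnd > firstEnd) {
        // partial overlap: the second utterance now starts at the first one's end anchor
        second.start = first.end;
      } else {
        // total overlap: chain the two through a shared anchor with no offset
        // (assumption: the first utterance's original end time is dropped)
        Anchor middle = new Anchor(null);
        first.end = middle;
        second.start = middle;
      }
    }
  }

  public static void main(String[] args) {
    List<Utterance> turn = new ArrayList<>();
    turn.add(utt("and then we went home", 0.0, 2.0));
    turn.add(utt("it was late", 1.5, 3.0)); // partially overlaps the previous utterance
    turn.add(utt("very late", 2.2, 2.8)); // lies wholly inside the previous utterance
    resolve(turn);
    for (Utterance u : turn) {
      System.out.println(u.text + ": " + u.start.offset + " - " + u.end.offset);
    }
  }
}
```

Sharing the same Anchor object between two utterances, rather than copying its offset, is what keeps them joined if the anchor is later moved.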
Layers to add to fully capture supported tags
Word layers
(All Type=Text, Alignment=None)
- completion: Incomplete word completion - e.g. "dinner doin(g) all"
- disfluency: Disfluency marking with &+ - e.g. "so &+sund Sunday"
- expansion: Non-standard form expansion - e.g. "gonna [: going to]"
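A minimal sketch of how one CHAT word might map onto these three word layers, under simplified assumptions: a leading &+ marks a disfluency (labelled here with the marker itself), parenthesised letters were not pronounced (so the completion restores them), and a following [: ...] code gives the expansion. WordMarkup and annotate are hypothetical names, and the label each layer actually receives in the formatter may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: derives word-layer labels from one CHAT word and an optional [: ...] code. */
class WordMarkup {
  // "gonna [: going to]" -> word "gonna", expansion "going to"
  private static final Pattern EXPANSION = Pattern.compile("(\\S+)\\s+\\[:\\s*([^\\]]+)\\]");
  // "doin(g)" -> the parenthesised letters were not pronounced
  private static final Pattern OMITTED = Pattern.compile("\\(([^)]*)\\)");

  /** Returns layer name -> label; the "word" entry holds the form as spoken. */
  static Map<String, String> annotate(String token) {
    Map<String, String> layers = new LinkedHashMap<>();
    String word = token.trim();
    Matcher expansion = EXPANSION.matcher(word);
    if (expansion.matches()) {
      word = expansion.group(1);
      layers.put("expansion", expansion.group(2).trim());
    }
    if (word.startsWith("&+")) {
      // assumption: the disfluency layer is labelled with the marker itself
      word = word.substring(2);
      layers.put("disfluency", "&+");
    }
    Matcher omitted = OMITTED.matcher(word);
    if (omitted.find()) {
      layers.put("completion", omitted.replaceAll("$1")); // e.g. doin(g) -> doing
      word = omitted.replaceAll(""); // e.g. doin(g) -> doin
    }
    layers.put("word", word);
    return layers;
  }

  public static void main(String[] args) {
    for (String token : new String[] {"&+sund", "gonna [: going to]", "doin(g)"}) {
      System.out.println(token + " -> " + annotate(token));
    }
  }
}
```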
If there's a %mor layer in the transcripts:
(All Type=Text, Alignment=Intervals)
- mor: Complete %mor tag(s) for the word token - there can be multiple tags, e.g. for contractions and cliticizations. These tags are then split into the following parts:
  - morFusionalSuffix
  - morGloss
  - morPOS
  - morPOSSubcategory
  - morPrefix
  - morStem
  - morSuffix
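To show how one %mor entry could be split into these parts, here is a sketch that assumes the usual MOR tag shape prefix#pos:subcategory|stem&fusional-suffix=gloss, with ~ joining clitics (so "it's" gives two tags). MorSplitter, split and splitTag are hypothetical names; compounds (+) and pre-clitics ($) are deliberately not handled, and the formatter's own splitting rules may differ in detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: splits a %mor word into the parts listed above.
 * Assumed tag grammar: prefix#pos:subcategory|stem&fusional-suffix=gloss,
 * with '~' between clitics; compounds (+) and pre-clitics ($) are not handled.
 */
class MorSplitter {
  /** One map per tag, so "pro|it~v|be&3S" (it's) yields two. */
  static List<Map<String, String>> split(String morWord) {
    List<Map<String, String>> tags = new ArrayList<>();
    for (String tag : morWord.split("~")) {
      tags.add(splitTag(tag));
    }
    return tags;
  }

  static Map<String, String> splitTag(String tag) {
    Map<String, String> parts = new LinkedHashMap<>();
    parts.put("mor", tag);
    String rest = tag;
    int eq = rest.indexOf('=');
    if (eq >= 0) { // gloss
      parts.put("morGloss", rest.substring(eq + 1));
      rest = rest.substring(0, eq);
    }
    int hash = rest.lastIndexOf('#');
    if (hash >= 0) { // one or more prefixes
      parts.put("morPrefix", rest.substring(0, hash));
      rest = rest.substring(hash + 1);
    }
    int bar = rest.indexOf('|');
    String pos = bar >= 0 ? rest.substring(0, bar) : "";
    String stemAndSuffixes = bar >= 0 ? rest.substring(bar + 1) : rest;
    int colon = pos.indexOf(':');
    parts.put("morPOS", colon >= 0 ? pos.substring(0, colon) : pos);
    if (colon >= 0) parts.put("morPOSSubcategory", pos.substring(colon + 1));

    // the stem ends at the first fusional (&) or regular (-) suffix marker
    List<String> fusional = new ArrayList<>();
    List<String> suffixes = new ArrayList<>();
    int cut = indexOfMarker(stemAndSuffixes, 0);
    parts.put("morStem", cut >= 0 ? stemAndSuffixes.substring(0, cut) : stemAndSuffixes);
    while (cut >= 0) {
      int next = indexOfMarker(stemAndSuffixes, cut + 1);
      String value = stemAndSuffixes.substring(cut + 1, next >= 0 ? next : stemAndSuffixes.length());
      (stemAndSuffixes.charAt(cut) == '&' ? fusional : suffixes).add(value);
      cut = next;
    }
    if (!fusional.isEmpty()) parts.put("morFusionalSuffix", String.join("&", fusional));
    if (!suffixes.isEmpty()) parts.put("morSuffix", String.join("-", suffixes));
    return parts;
  }

  /** Index of the next '&' or '-' at or after from, or -1 if there is none. */
  private static int indexOfMarker(String s, int from) {
    for (int i = from; i < s.length(); i++) {
      if (s.charAt(i) == '&' || s.charAt(i) == '-') return i;
    }
    return -1;
  }

  public static void main(String[] args) {
    for (String word : new String[] {"pro:sub|I", "v|go&PAST", "n|dog-PL", "un#adj|happy", "pro|it~v|be&3S"}) {
      System.out.println(word + " -> " + split(word));
    }
  }
}
```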
If there's also a %gra layer in the transcripts:
- gra (Type=Text, Alignment=Intervals): Complete %gra tag, one for each %mor annotation. (These tags mark grammatical relations between a dependent and a head; the dependent is tagged, but it is not currently linked formally to its head.)
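Each %gra item is conventionally written index|head|RELATION, where index and head are 1-based positions among the %mor tags and head 0 marks the root. The sketch below parses a tier in that form and then resolves each head position against the %mor tags, which shows what the dependent-to-head link mentioned above would look like if it were recorded. GraParser and Relation are hypothetical names, and the example tiers are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: parses a %gra tier of "index|head|RELATION" items. */
class GraParser {
  static final class Relation {
    final int index; // 1-based position of the dependent among the %mor tags
    final int head; // position of the head tag; 0 marks the root
    final String label;

    Relation(int index, int head, String label) {
      this.index = index;
      this.head = head;
      this.label = label;
    }
  }

  static List<Relation> parse(String tier) {
    List<Relation> relations = new ArrayList<>();
    for (String item : tier.trim().split("\\s+")) {
      String[] f = item.split("\\|");
      if (f.length != 3) throw new IllegalArgumentException("Malformed %gra item: " + item);
      relations.add(new Relation(Integer.parseInt(f[0]), Integer.parseInt(f[1]), f[2]));
    }
    return relations;
  }

  public static void main(String[] args) {
    // one %mor tag per %gra item, in the same order (the final "." is the terminator)
    String[] mor = {"pro:sub|I", "v|want", "n|cookie-PL", "."};
    for (Relation r : parse("1|2|SUBJ 2|0|ROOT 3|2|OBJ 4|2|PUNCT")) {
      String head = r.head == 0 ? "(root)" : mor[r.head - 1];
      System.out.println(mor[r.index - 1] + " --" + r.label + "--> " + head);
    }
  }
}
```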
Phrase layers
(All Type=Text, Alignment=Intervals)
- cunit: The grammatical unit for each utterance, labelled with the utterance terminator
- error: Errors - e.g. "they've <work up a hunger> [* s:r]" or "they got [* m] to"
- linkage: Multiple words in a name joined by _ - e.g. "Winnie_the_Pooh"
- pause: Pauses - untimed (e.g. (.), (...)) or timed (e.g. (0.15), (2.), (1:05.15))
- repetition: Repetition/stuttered false starts - e.g. "the <picnic> [/] picnic" or "the Saturday [/] in the morning"
- retrace: Retracing - e.g. "<some friends and I> [//] uh" or "and sit [//] sets him"
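Timed pauses give minutes (optionally), then seconds with an optional fraction, so (2.) is 2 seconds and (1:05.15) is 65.15 seconds. The following sketch converts a pause code to seconds and returns null for the untimed forms (.), (..) and (...). PauseParser and seconds are hypothetical names; how the pause layer actually labels or times these spans is not specified above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: converts a CHAT pause code to a duration in seconds. */
class PauseParser {
  // (.), (..) or (...): untimed pauses of increasing length
  private static final Pattern UNTIMED = Pattern.compile("\\(\\.{1,3}\\)");
  // timed pauses: optional minutes, then seconds.fraction, e.g. (0.15), (2.), (1:05.15)
  private static final Pattern TIMED = Pattern.compile("\\((?:(\\d+):)?(\\d*)\\.(\\d*)\\)");

  /** Returns the duration in seconds, or null for an untimed pause. */
  static Double seconds(String code) {
    if (UNTIMED.matcher(code).matches()) return null; // checked first: TIMED would also match "(.)"
    Matcher m = TIMED.matcher(code);
    if (!m.matches()) throw new IllegalArgumentException("Not a pause code: " + code);
    int minutes = m.group(1) == null ? 0 : Integer.parseInt(m.group(1));
    String whole = m.group(2).isEmpty() ? "0" : m.group(2);
    String fraction = m.group(3).isEmpty() ? "0" : m.group(3);
    return minutes * 60 + Double.parseDouble(whole + "." + fraction);
  }

  public static void main(String[] args) {
    for (String code : new String[] {"(.)", "(...)", "(0.15)", "(2.)", "(1:05.15)"}) {
      System.out.println(code + " -> " + seconds(code));
    }
  }
}
```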
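The error, repetition and retrace layers all depend on the scope of a bracketed code: in CHAT a code such as [/], [//] or [* m] applies to the <...> group immediately before it, or else to the single preceding word. This sketch finds those scopes with a simple tokenizer. ScopedCodes, Span and find are hypothetical names, and using the error code (e.g. s:r) as the span label is an assumption about how the layers are labelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: finds the words covered by [/], [//] and [* ...] codes.
 * Assumption: a code covers the <...> group just before it, or else the single word before it.
 */
class ScopedCodes {
  // a <...> group, a [...] code, or a plain word
  private static final Pattern TOKEN = Pattern.compile("<[^>]*>|\\[[^\\]]*\\]|\\S+");

  static final class Span {
    final String layer;
    final String label;
    final String words;

    Span(String layer, String label, String words) {
      this.layer = layer;
      this.label = label;
      this.words = words;
    }

    @Override
    public String toString() {
      return layer + " [" + label + "]: " + words;
    }
  }

  static List<Span> find(String utterance) {
    List<Span> spans = new ArrayList<>();
    Matcher m = TOKEN.matcher(utterance);
    String scope = null; // the most recent word or <...> group
    while (m.find()) {
      String token = m.group();
      if (token.startsWith("[")) {
        String code = token.substring(1, token.length() - 1).trim();
        String layer = code.equals("/") ? "repetition"
            : code.equals("//") ? "retrace"
            : code.startsWith("*") ? "error"
            : null; // other codes are ignored here
        if (layer != null && scope != null) {
          String label = layer.equals("error") ? code.substring(1).trim() : code;
          spans.add(new Span(layer, label, scope));
        }
      } else {
        scope = token.startsWith("<") ? token.substring(1, token.length() - 1).trim() : token;
      }
    }
    return spans;
  }

  public static void main(String[] args) {
    String[] examples = {
      "<some friends and I> [//] uh", "and sit [//] sets him",
      "the <picnic> [/] picnic", "they've <work up a hunger> [* s:r]", "they got [* m] to"
    };
    for (String utterance : examples) {
      System.out.println(utterance + " -> " + find(utterance));
    }
  }
}
```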
Span layers
- gem (Type=Text, Alignment=Intervals): Parts of the transcript marked for separate analysis.
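In CHAT, gems are usually delimited by @Bg and @Eg header lines carrying the same label. The sketch below pairs them up by label and reports the line range each gem covers, as a simplified stand-in for a span over utterances. GemSpans, Gem and find are hypothetical names, and lazy @G gems are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: pairs @Bg/@Eg headers into gem spans over transcript lines. */
class GemSpans {
  static final class Gem {
    final String label;
    final int firstLine; // 1-based line of the @Bg header
    final int lastLine; // 1-based line of the matching @Eg header

    Gem(String label, int firstLine, int lastLine) {
      this.label = label;
      this.firstLine = firstLine;
      this.lastLine = lastLine;
    }
  }

  static List<Gem> find(List<String> lines) {
    List<Gem> gems = new ArrayList<>();
    Map<String, Integer> open = new HashMap<>(); // label -> line of its @Bg
    for (int i = 0; i < lines.size(); i++) {
      String line = lines.get(i);
      if (line.startsWith("@Bg")) {
        open.put(label(line), i + 1);
      } else if (line.startsWith("@Eg")) {
        Integer start = open.remove(label(line));
        if (start != null) gems.add(new Gem(label(line), start, i + 1));
      }
    }
    return gems;
  }

  /** The text after the colon, or "" for an unlabelled gem. */
  private static String label(String header) {
    int colon = header.indexOf(':');
    return colon >= 0 ? header.substring(colon + 1).trim() : "";
  }

  public static void main(String[] args) {
    List<String> lines = List.of(
        "@Begin", "@Bg:\tpicnic", "*CHI:\twe had a picnic .", "@Eg:\tpicnic", "@End");
    for (Gem g : find(lines)) {
      System.out.println(g.label + ": lines " + g.firstLine + "-" + g.lastLine);
    }
  }
}
```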
Participant Attributes
- language: The language the participant speaks.
- corpus: A one-word label for the corpus in lowercase.
- age: The age of the speaker, in the form years;months.days - e.g. "2;11.17" for 2 years, 11 months, and 17 days.
- sex: The sex of the speaker (gender can also be used as the attribute name).
- group: Any single-word label.
- SES: Socio-economic status (e.g. WC for working class, UC for upper class, MC for middle class, LI for limited income)
- role: The speaker's (standardized) role - e.g. Target_Child, Target_Adult, Child, Mother, Father, Participant, Investigator, Adult, Friend, Unidentified, etc.
- education: Educational level of the speaker - e.g. Elem, HS, UG, Grad, Doc
- custom: Any additional information needed for a given project.
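These attributes line up with the fields of the CHAT @ID header, conventionally language|corpus|code|age|sex|group|SES|role|education|custom|, where code is the speaker code such as CHI. Assuming that field order, the sketch below maps an @ID line onto attribute names and splits an age into years, months and days. IdHeader, parse and ageParts are hypothetical names, and the example header is made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: maps the fields of a CHAT @ID header onto participant attributes. */
class IdHeader {
  // assumed field order: @ID: language|corpus|code|age|sex|group|SES|role|education|custom|
  private static final String[] FIELDS = {
    "language", "corpus", "code", "age", "sex", "group", "SES", "role", "education", "custom"
  };

  /** Returns attribute -> value, skipping empty fields. */
  static Map<String, String> parse(String header) {
    String body = header.substring(header.indexOf(':') + 1).trim();
    String[] values = body.split("\\|", -1); // -1 keeps empty fields in place
    Map<String, String> attributes = new LinkedHashMap<>();
    for (int i = 0; i < FIELDS.length && i < values.length; i++) {
      if (!values[i].isEmpty()) attributes.put(FIELDS[i], values[i]);
    }
    return attributes;
  }

  /** Splits an age in the form years;months.days into {years, months, days}; missing parts are 0. */
  static int[] ageParts(String age) {
    String[] yearsAndRest = age.split(";", -1);
    int years = Integer.parseInt(yearsAndRest[0]);
    String rest = yearsAndRest.length > 1 ? yearsAndRest[1] : "";
    String[] monthsAndDays = rest.split("\\.", -1);
    int months = monthsAndDays[0].isEmpty() ? 0 : Integer.parseInt(monthsAndDays[0]);
    int days = monthsAndDays.length > 1 && !monthsAndDays[1].isEmpty()
        ? Integer.parseInt(monthsAndDays[1]) : 0;
    return new int[] {years, months, days};
  }

  public static void main(String[] args) {
    Map<String, String> participant = parse("@ID:\teng|macwhinney|CHI|2;11.17|female|||Target_Child|||");
    System.out.println(participant);
    System.out.println(Arrays.toString(ageParts(participant.get("age"))));
  }
}
```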
Transcript Attributes
- scribe: The person who transcribed the transcript.
- language: The language(s) of the speech in the recording.
- recordingdate: The date of the recording.
- location: Location of the recording.
- recordingquality: Recording quality.
- roomlayout: Room layout.
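The transcript attributes presumably come from CHAT header lines. The sketch below assumes the correspondence @Transcriber -> scribe, @Languages -> language, @Date -> recordingdate, @Location -> location, @Recording Quality -> recordingquality and @Room Layout -> roomlayout. Both that mapping and the TranscriptHeaders class are illustrative assumptions rather than the formatter's documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: collects transcript attributes from CHAT header lines. */
class TranscriptHeaders {
  // assumed correspondence between CHAT headers and the attribute names listed above
  private static final Map<String, String> ATTRIBUTE_FOR_HEADER = Map.of(
      "Transcriber", "scribe",
      "Languages", "language",
      "Date", "recordingdate",
      "Location", "location",
      "Recording Quality", "recordingquality",
      "Room Layout", "roomlayout");

  static Map<String, String> parse(Iterable<String> lines) {
    Map<String, String> attributes = new LinkedHashMap<>();
    for (String line : lines) {
      if (!line.startsWith("@")) continue; // only header lines are of interest
      int colon = line.indexOf(':');
      if (colon < 0) continue; // e.g. @Begin, @End
      String header = line.substring(1, colon).trim();
      String attribute = ATTRIBUTE_FOR_HEADER.get(header);
      if (attribute != null) attributes.put(attribute, line.substring(colon + 1).trim());
    }
    return attributes;
  }

  public static void main(String[] args) {
    List<String> lines = List.of(
        "@Begin", "@Languages:\teng", "@Transcriber:\tJane", "@Room Layout:\tkitchen", "@End");
    System.out.println(parse(lines));
  }
}
```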