ChaToTextGrid

Converts CLAN CHAT transcripts to Praat TextGrids

Praat doesn't support any meta-data like @Date, @Location, etc. so all CHAT header meta-data is lost when converting to .TextGrid.

This conversion will only work well for CHAT transcripts that are fully aligned; i.e. all lines include time alignment bullets.

The CLAN parser is not exhaustive; it one parses:

Disfluency marking with &+ - e.g. so &+sund Sunday
Non-standard form expansion - e.g. gonna [: going to]
Incomplete word completion - e.g. dinner doin(g) all
Acronym/proper name joining with _ - e.g. no T_V in my room
Retracing - e.g. <some friends and I> [//] uh or and sit [//] sets him
Repetition/stuttered false starts - e.g. the <picnic> [/] picnic or the Saturday [/] in the morning
Errors - e.g. they've <work up a hunger> [* s:r] or they got [* m] to

Deserializing from “CLAN CHAT transcript” text/x-chat

Command-line configuration parameters for deserialization:


`--cUnitLayer=`Layer	Layer for marking c-units
`--tokenLayer=`Layer	Output word tokens come from this layer
`--disfluencyLayer=`Layer	Layer for disfluency annotations
`--nonWordLayer=`Layer	Layer for non-word noises
`--expansionLayer=`Layer	Layer for expansion annotations
`--errorsLayer=`Layer	Layer for error annotations
`--linkageLayer=`Layer	Layer for linkage annotations
`--repetitionsLayer=`Layer	Layer for repetition annotations
`--retracingLayer=`Layer	Layer for retracing annotations
`--pauseLayer=`Layer	Layer for marking unfilled pauses
`--completionLayer=`Layer	Layer for completion annotations
`--morLayer=`Layer	Layer for morphosyntactic tags
`--morPrefixLayer=`Layer	Layer for prefixes in MOR tags
`--morPartOfSpeechLayer=`Layer	Layer for parts of speech in MOR tags
`--morPartOfSpeechSubcategoryLayer=`Layer	Layer for subcategories of parts of speech in MOR tags
`--morStemLayer=`Layer	Layer for stems in MOR tags
`--morFusionalSuffixLayer=`Layer	Layer for fusional suffixes in MOR tags
`--morSuffixLayer=`Layer	Layer for (non-fusional) suffixes in MOR tags
`--morGlossLayer=`Layer	Layer for English glosses in MOR tags
`--gemLayer=`Layer	Layer for gems
`--transcriberLayer=`Layer	Layer for transcriber name
`--languagesLayer=`Layer	Layer for transcriber language
`--dateLayer=`Layer	Layer for date of the interaction
`--locationLayer=`Layer	Layer for location of the interaction
`--recordingQualityLayer=`Layer	Layer for recording quality
`--roomLayoutLayer=`Layer	Layer for room layout
`--tapeLocationLayer=`Layer	Layer for tape and location on the tape covered by the transcription
`--targetParticipantLayer=`Layer	Layer for identifying target participants
`--SESLayer=`Layer	Layer for SES
`--roleLayer=`Layer	Layer for role
`--educationLayer=`Layer	Layer for education
`--sexLayer=`Layer	Layer for sex
`--customLayer=`Layer	Layer for custom
`--corpusLayer=`Layer	Layer for corpus
`--languageLayer=`Layer	Layer for language
`--ageLayer=`Layer	Layer for age
`--groupLayer=`Layer	Layer for group
`--includeTimeCodes=`Boolean	Include utterance sychronization information when exporting transcripts
`--splitMorTagGroups=`Boolean	Split alternative MOR taggings into separate annotations
`--splitMorWordGroups=`Boolean	Split MOR word morphemes (clitics, components of compounds ) into separate annotations. This is only supported when Split MOR Tag Groups is also enabled.

Serializing to “Praat TextGrid” text/praat-textgrid

Command-line configuration parameters for serialization:


`--commentLayer=`Layer	Commentary
`--noiseLayer=`Layer	Noise annotations
`--lexicalLayer=`Layer	Lexical tags
`--pronounceLayer=`Layer	Manual pronunciation tags
`--renameShortNumericSpeakers=`Boolean	Short speaker names like ‘S1’ should be prefixed with the transcript name during import
`--useConventions=`Boolean	Whether to use text conventions for comment, noise, lexical, and pronounce annotations