TextToEaf

Converts time-aligned plain text .txt transcripts to ELAN .eaf files

The plain text transcript must include synchronisation information

  • i.e. time codes - and must end in a timecode, indicating the end time

of the last utterance.

Check the –timestampFormat setting matches your time codes.

This setting uses Java SimpleDateFormat format:

https://docs.oracle.com/javase/8/docs/api/index.html?java/text/SimpleDateFormat.html

Deserializing from “Plain Text Document” text/plain

Command-line configuration parameters for deserialization:

--commentLayer=Layer Commentary
--noiseLayer=Layer Background noises
--lexicalLayer=Layer Lexical tags
--pronounceLayer=Layer Non-standard pronunciation tags
--orthographyLayer=Layer Orthography
--useConventions=Boolean Whether to use text conventions for comment, noise, lexical, and pronounce annotations
--maxParticipantLength=Integer The maximum length of a participant name
--maxHeaderLines=Integer The maximum number of lines in a meta-data header
--participantFormat=String Format for marking a change of turn within the transcript body - e.g. {0}:, where {0} is a place-holder for the participant ID/name
--metaDataFormat=String Format for a meta-data line in the header - e.g. {0}={1}, where {0} is a place-holder for the attribute name or key, and {1} is a place-holder for the attribute value
--timestampFormat=String Format for a time stamp - e.g. HH:mm:ss.SSS

Serializing to “ELAN EAF Transcript” text/x-eaf+xml

Command-line configuration parameters for serialization:

--commentLayer=Layer Commentary
--noiseLayer=Layer Noise annotations
--lexicalLayer=Layer Lexical tags
--pronounceLayer=Layer Manual pronunciation tags
--authorLayer=Layer Name of transcriber
--dateLayer=Layer Document date
--languageLayer=Layer The language of the whole transcript
--phraseLanguageLayer=Layer For tagging individual phrases with a language
--useConventions=Boolean Whether to use text conventions for comment, noise, lexical, and pronounce annotations
--ignoreBlankAnnotations=Boolean Whether to skip annotations with no label, or process them
--minimumTurnPauseLength=Double Minimum amount of time between two turns by the same speaker, with no intervening speaker, for which the inter-turn pause counts as a turn change boundary. If the pause is shorter than this, the turns are merged into one.