Converters
These are standalone programs that convert transcripts from one tool's format to another. The formats covered are:
- trs - Transcriber transcripts
- eaf - ELAN files
- vtt - web subtitles (Web VTT)
- slt - SALT transcripts
- cha - CLAN CHAT transcripts
- textgrid - Praat TextGrids
- pdf - PDF files
- tex - LaTeX files
- txt - plain text files
- kaldi - input files for the Kaldi automatic speech recognition training system
to↓ from→ | trs | eaf | vtt | slt | cha | textgrid | txt |
---|---|---|---|---|---|---|---|
trs | | eaf-to-trs | vtt-to-trs | slt-to-trs | cha-to-trs | textgrid-to-trs | |
eaf | trs-to-eaf | | vtt-to-eaf | slt-to-eaf | cha-to-eaf | textgrid-to-eaf | txt-to-eaf |
vtt | trs-to-vtt | eaf-to-vtt | | slt-to-vtt | cha-to-vtt | textgrid-to-vtt | |
slt | trs-to-slt | eaf-to-slt | | | | | |
cha | trs-to-cha | eaf-to-cha | vtt-to-cha | | | | |
textgrid | trs-to-textgrid | eaf-to-textgrid | vtt-to-textgrid | slt-to-textgrid | cha-to-textgrid | | |
txt | trs-to-txt | | | | | | |
pdf | trs-to-pdf | eaf-to-pdf | vtt-to-pdf | slt-to-pdf | cha-to-pdf | textgrid-to-pdf | |
tex | trs-to-tex | eaf-to-tex | vtt-to-tex | slt-to-tex | | textgrid-to-tex | |
kaldi | trs-to-kaldi | eaf-to-kaldi | | | | textgrid-to-kaldi | |
To use a particular converter, you need to have Java installed on your system. Download the converter's .jar file and double-click it to run.
If double-clicking doesn't work, you can run the converter from the command line, for example:
java -jar vtt-to-textgrid.jar
By default, converters display a window onto which you can drag and drop files for conversion. However, they can also be run in ‘batch mode’, which lets you convert a list of files automatically from the command line, e.g.
java -jar trs-to-textgrid.jar --batchmode *.trs
Some conversions have configurable output, e.g.
java -jar trs-to-txt.jar *.trs
…will include annotations and participant names in the output text files, but:
java -jar trs-to-txt.jar --textonly *.trs
…produces text files that exclude all annotations and participant names.
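When conversions need to be scripted, the command lines above can be assembled programmatically. The following is a minimal Python sketch, not part of the converters themselves. It assumes the downloaded .jar sits in the current directory and follows the `<from>-to-<to>.jar` naming seen above; the function names `converter_command` and `run_converter` are invented for illustration, and the only switches used are the ones already mentioned in this document.

```python
import glob
import subprocess
from typing import Iterable, List


def converter_command(src: str, dst: str, files: Iterable[str],
                      switches: Iterable[str] = ()) -> List[str]:
    """Build the argument list for one converter run.

    Assumption: the jar is named '<src>-to-<dst>.jar' and is in the
    current working directory.
    """
    jar = f"{src}-to-{dst}.jar"
    return ["java", "-jar", jar, *switches, *files]


def run_converter(src: str, dst: str, pattern: str,
                  switches: Iterable[str] = ()) -> subprocess.CompletedProcess:
    """Expand a wildcard pattern and run the converter on the matches.

    subprocess.run() is called without a shell, so a pattern such as
    '*.trs' is not expanded for us; glob does that here instead.
    """
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    command = converter_command(src, dst, files, switches)
    # check=True raises CalledProcessError if java exits with an error.
    return subprocess.run(command, check=True)


# Example (requires Java and trs-to-textgrid.jar to be present):
# run_converter("trs", "textgrid", "*.trs", ["--batchmode"])
```

Building the command as a list rather than a single string avoids quoting problems with file names that contain spaces.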
The `--usage` command-line switch prints information about command-line options.
As many formats do not support the meta-data, annotation granularity or ontology of other formats, many of these conversions necessarily entail loss of data. However, mappings are made from one format to another wherever possible.
For notes about specific correspondences or data losses, use the `--help` command-line switch, or use the Help|Information menu option of the conversion utility concerned.
These use annotator serializers/deserializers to read a file in one format, convert it to an annotation graph, and then write that graph out as a file in another format. As pointed out by Cochran et al. (2007 - Report from TILR Working Group 1 : Tools interoperability and input/output formats), this avoids needing on the order of n² explicit conversion algorithms between n formats; only 2n format conversions are required (and as some of the formats above are output-only, it's actually fewer than 2n).
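The read, in-memory graph, write pattern described above can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the `tab` and `bracket` formats are invented and are not real transcript formats, a plain list of annotations stands in for a real annotation graph, and none of the names correspond to the converters' actual serializer/deserializer API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Annotation:
    start: float
    end: float
    label: str


# Toy stand-in for an annotation graph: an in-memory list of annotations.
Graph = List[Annotation]


def read_tab(text: str) -> Graph:
    """Deserialize the invented 'tab' format: start<TAB>end<TAB>label."""
    graph = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, end, label = line.split("\t", 2)
        graph.append(Annotation(float(start), float(end), label))
    return graph


def write_tab(graph: Graph) -> str:
    """Serialize to the invented 'tab' format."""
    return "\n".join(f"{a.start}\t{a.end}\t{a.label}" for a in graph)


def read_bracket(text: str) -> Graph:
    """Deserialize the invented 'bracket' format: [start,end] label."""
    graph = []
    for line in text.splitlines():
        if not line.strip():
            continue
        times, label = line[1:].split("] ", 1)
        start, end = times.split(",")
        graph.append(Annotation(float(start), float(end), label))
    return graph


def write_bracket(graph: Graph) -> str:
    """Serialize to the invented 'bracket' format."""
    return "\n".join(f"[{a.start},{a.end}] {a.label}" for a in graph)


# One reader and/or one writer per format, instead of one converter per pair.
READERS: Dict[str, Callable[[str], Graph]] = {"tab": read_tab, "bracket": read_bracket}
WRITERS: Dict[str, Callable[[Graph], str]] = {"tab": write_tab, "bracket": write_bracket}


def convert(text: str, src: str, dst: str) -> str:
    """Convert via the in-memory graph, which is discarded afterwards."""
    graph = READERS[src](text)
    return WRITERS[dst](graph)


def pairwise_converters(n: int) -> int:
    """Direct converters needed for every ordered pair of n formats."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Readers plus writers needed when every format goes via the graph."""
    return 2 * n
```

For the ten formats listed above, `pairwise_converters(10)` gives 90 while `hub_converters(10)` gives 20, and the real figure is lower still because output-only formats need no reader.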
This exemplifies an approach to linguistic data interoperability that Witt et al. (2009) call the interlingua philosophy on interoperability. It uses annotation graphs as an ‘interlingua’, similar to work by Schmidt et al. (2008), except that rather than using a third file format as a persistent intermediary, the annotation graph models of the linguistic data are ephemeral, existing in memory only for the duration of the conversion.