Identifying Unaligned Utterances

Under some circumstances, forced alignment can fail to produce alignments for some utterances; i.e. the utterance has no phone annotations created, the words are not aligned, and no htk annotation is created. This can happen because of the following factors:

Not enough data (if you’re using the ‘train-and-align’ approach)
Poor quality recording, background noises, etc.
Simultaneous speech (ignored by default)
Inaccurate transcripts
Inaccurate utterance alignment
Lack of pause marking in the transcripts
Mismatched phonology between dictionary and speech
e.g. using a rhotic dictionary to align non-rhotic speech

You can identify the utterances for which alignment has failed using LaBB-CAT’s search and export functionality:

Click search and select the speaker(s) you aligned.
The search should be “the first word of each utterance that doesn’t have an htk annotation” - i.e.:
- orthography layer: matches .+
- utterance layer: tick the left-hand checkbox that anchors the word to the beginning of the utterance
- htk¹ layer: doesn’t match .+.
When the results are listed, click CSV Export

The resulting file has the start and end time of each utterance in the Line and LineEnd columns. If you want to know the total duration of the unaligned utterances, use Excel or R to calculate the difference between LineEnd and Line to get the line duration, and then sum these durations to get the total, which is in seconds.

Footnotes

htk or whatever the phrase tag layer is in the forced alignment configuration↩︎

Reuse

CC BY-SA 4.0

Other Formats

Identifying Unaligned Utterances

Footnotes

Reuse

Copyright