Forced Alignment
Forced alignment is the automatic processing of recordings of utterance and their orthographic transcripts in order order to determing the start and end times of the individual words, and the phones within the words.
There are several ways that forced alignment can be achieved in LaBB-CAT:
- WebMAUS with BAS Web Services
- HTK using the Penn Aligner (P2FA) pre-trained acoustic models
- HTK by training your own acoustic models for alignment (‘train-and-align’)
- MFA using pre-trained acoustic models
- MFA by trining your own acoustic models for alignment (‘train-and-align’)
Comparing Forced Alignment Methods
There are several tools and methods listed above for force-aligning your recordings, and each works well or badly depending on different factors. It can be difficult to know which method to use.
Alignment Accuracy
Being an unsupervised automatic process, the alignments are not always optimal. Various factors can degrade the quality of alignments:
- Not enough data (if you’re using the ‘train-and-align’ approach)
- Poor quality recording, background noises, etc.
- Simultaneous speech (ignored by default)
- Inaccurate transcripts
- Inaccurate utterance alignment
- Lack of pause marking in the transcripts
- Mismatched phonology between dictionary and speech. e.g. using a rhotic dictionary to align non-rhotic speech
Because of this, you should manually inspect and possibly correct at least some of your data.
Sometimes the above factors can cause alignment failure for some utterances; i.e. the utterance has no phone annotations created, the words are not aligned.
You can use LaBB-CAT’s search/export functionality to identify utterances that were not aligned.
Checking/Correcting Alignments
There are two ways you can check/correct alignments:
- LaBB-CAT integrates with Praat
- LaBB-CAT integrates with the EMU-webApp
After Alignment
Once your data has been force-aligned, you will have start/end times for phones within words, which opens many possibilities for analysis and further annotation, for example.
- Batch processing of targeted tokens with Praat
- Reconstruction of syllables