nzilbb.formatter.whisper (1.3.0)

Parser for transcriptions output by the Whisper ASR system.

The parser accepts either the standard output (stdout) of the whisper command, or the .json file that whisper produces.

If the .json file is specified, the formatter can include inter-word pause markers. That is, if the pause between one word and the next exceeds one of three thresholds, the pause is labelled as a short, medium, or long pause.
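
As a rough illustration of where inter-word pauses come from, the following Python sketch computes the gap between each word and the next. It is not the formatter's own code. It assumes the .json file was produced with word-level timestamps enabled, so that each segment carries a "words" list whose entries have "word", "start" and "end" fields; the function name inter_word_pauses and the sample data are hypothetical.

```python
import json

def inter_word_pauses(whisper_json):
    """Return (word, pause_after_in_seconds) pairs from Whisper-style JSON text.

    Assumption: the JSON has segments[].words[] entries with "word", "start"
    and "end" keys. Segments without a "words" list contribute nothing.
    """
    data = json.loads(whisper_json)
    # Flatten the words of all segments into one sequence.
    words = [w for seg in data.get("segments", []) for w in seg.get("words", [])]
    pauses = []
    for current, following in zip(words, words[1:]):
        # The pause is the time between the end of one word and the start of the next.
        gap = max(0.0, following["start"] - current["end"])
        pauses.append((current["word"].strip(), round(gap, 3)))
    return pauses

# Hypothetical sample data, for illustration only.
sample = json.dumps({"segments": [{"words": [
    {"word": " so", "start": 0.0, "end": 0.2},
    {"word": " anyway", "start": 0.7, "end": 1.1},
    {"word": " yes", "start": 1.15, "end": 1.4},
]}]})

print(inter_word_pauses(sample))
```

The last word has no following word, so it has no pause entry.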

Configuration

The following parameters can be specified for the formatter:

languageLayer
The optional ID of the annotation layer for storing the language of the transcript.
minShortPauseLength
The minimum inter-word pause length, in seconds, before a pause counts as a ‘short pause’. The default value is 0.35s.
minMediumPauseLength
The minimum inter-word pause length, in seconds, before a pause counts as a ‘medium pause’. The default value is 0.7s.
minLongPauseLength
The minimum inter-word pause length, in seconds, before a pause counts as a ‘long pause’. The default value is 1.4s.
shortPauseLabel
If an inter-word pause has a duration between minShortPauseLength and minMediumPauseLength, then the word before the pause will have this string appended to its label (after a space). The default value of (.) corresponds to the CA/CHAT convention for short pauses.
mediumPauseLabel
If an inter-word pause has a duration between minMediumPauseLength and minLongPauseLength, then the word before the pause will have this string appended to its label (after a space). The default value of (..) corresponds to the CA/CHAT convention for medium pauses.
longPauseLabel
If an inter-word pause has a duration of minLongPauseLength or longer, then the word before the pause will have this string appended to its label (after a space). The default value of (...) corresponds to the CA/CHAT convention for long pauses.
maxUtteranceDuration
Maximum utterance duration to target in seconds. Utterances with a duration longer than this will be split on longer inter-word pauses (the longest pause first). The default value is 15.0s.
utterancePadding
Maximum number of seconds to subtract from the start time and add to the end time of each utterance, to allow for alignment errors in the first and last word of each segment.
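
The way the threshold and label parameters interact can be sketched in Python as follows. This is an illustration under stated assumptions, not the formatter's implementation: the function names are hypothetical, each lower bound is assumed to be inclusive, and split_utterance is only one possible reading of "the longest pause first" (cut at the longest pause, then repeat on any part that is still too long).

```python
def pause_label(pause, min_short=0.35, min_medium=0.7, min_long=1.4,
                short_label="(.)", medium_label="(..)", long_label="(...)"):
    """Return the label for a pause in seconds, or None if it is too short.

    Defaults mirror the documented defaults. Assumption: a pause exactly
    equal to a threshold falls into the longer category.
    """
    if pause >= min_long:
        return long_label
    if pause >= min_medium:
        return medium_label
    if pause >= min_short:
        return short_label
    return None

def label_word(word, pause_after):
    """Append the pause label to the word before the pause, after a space."""
    label = pause_label(pause_after)
    return word if label is None else f"{word} {label}"

def split_utterance(words, max_duration=15.0):
    """Split a list of (label, start, end) words at the longest pauses.

    Assumption: an over-long utterance is cut at its longest inter-word
    pause, and each part is split again while it exceeds max_duration.
    """
    if len(words) < 2:
        return [words]
    duration = words[-1][2] - words[0][1]
    if duration <= max_duration:
        return [words]
    gaps = [words[i + 1][1] - words[i][2] for i in range(len(words) - 1)]
    cut = gaps.index(max(gaps)) + 1  # index of the first word after the longest pause
    return (split_utterance(words[:cut], max_duration)
            + split_utterance(words[cut:], max_duration))

print(label_word("so", 0.5))
print(label_word("anyway", 0.05))
```

With the default thresholds, a 0.5 second pause falls in the short range and a 0.05 second pause is left unlabelled.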

The pause label definitions (shortPauseLabel, mediumPauseLabel and longPauseLabel) may specify that the duration of the pause is included in the label, by including a decimal format definition enclosed in curly braces. For example, if you want long pauses to use the CA/CHAT conventions of a duration within parentheses, you can specify ({0.000}) as the value for longPauseLabel.
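
To make the effect of such a label definition concrete, here is a minimal Python sketch that substitutes a pause duration into a label template. It is an approximation for illustration only: the function render_pause_label is hypothetical, and it handles only simple patterns made of a single 0 with an optional run of 0 digits after the decimal point (such as 0.000), not every pattern the formatter may accept.

```python
import re

def render_pause_label(template, pause_seconds):
    """Replace a {0.000}-style placeholder with the formatted pause duration.

    Assumption: the number of zeros after the decimal point gives the number
    of decimal places. Templates without a placeholder are returned unchanged.
    """
    def substitute(match):
        pattern = match.group(1)
        # Everything after the "." determines the number of decimal places.
        _, _, fraction = pattern.partition(".")
        return f"{pause_seconds:.{len(fraction)}f}"
    return re.sub(r"\{(0(?:\.0+)?)\}", substitute, template)

print(render_pause_label("({0.000})", 1.5))
print(render_pause_label("(...)", 1.5))
```

The first call yields a label with the duration to three decimal places inside the parentheses; the second shows that a fixed label such as the default (...) passes through untouched.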