nzilbb.formatter.whisper (1.3.0)
Parser for transcriptions output by the Whisper ASR system.
The parser accepts either the text that the whisper command writes to stdout,
or the .json file that whisper produces.
If the .json file is specified, the formatter can include inter-word pause markers. That is, if the pause between one word and the next is at least as long as one of three thresholds, the gap between the words is labelled as a short, medium, or long pause.
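As a rough illustration of where the pause durations come from, the following Python sketch computes the gaps between consecutive words in Whisper-style JSON. It assumes the file was produced with word-level timestamps, so that each segment has a "words" list whose entries carry "word", "start" and "end" (in seconds). The function name inter_word_pauses is hypothetical and is not part of the formatter, whose own implementation may differ.

```python
import json

def inter_word_pauses(whisper_json):
    """Return (word, pause_after_in_seconds) pairs for consecutive words."""
    data = json.loads(whisper_json)
    # Flatten the word lists of all segments into one sequence.
    words = [w for seg in data.get("segments", []) for w in seg.get("words", [])]
    pauses = []
    for current, following in zip(words, words[1:]):
        # Overlapping timestamps are treated as no pause at all.
        gap = max(0.0, following["start"] - current["end"])
        pauses.append((current["word"].strip(), round(gap, 3)))
    return pauses

# Made-up sample data, shaped like whisper's word-timestamp output.
sample = json.dumps({
    "language": "en",
    "segments": [
        {"start": 0.0, "end": 2.4, "words": [
            {"word": " so", "start": 0.0, "end": 0.3},
            {"word": " anyway", "start": 0.8, "end": 1.3},
            {"word": " yes", "start": 2.1, "end": 2.4},
        ]},
    ],
})
print(inter_word_pauses(sample))  # [('so', 0.5), ('anyway', 0.8)]
```

Each gap is then compared against the pause thresholds described under Configuration.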
Configuration
The following parameters can be specified for the formatter:
- languageLayer
- The optional ID of the annotation layer for storing the language of the transcript.
- minShortPauseLength
- The minimum inter-word pause length, in seconds, before a pause
counts as a ‘short pause’. The default value is 0.35s.
- minMediumPauseLength
- The minimum inter-word pause length, in seconds, before a pause
counts as a ‘medium pause’. The default value is 0.7s.
- minLongPauseLength
- The minimum inter-word pause length, in seconds, before a pause
counts as a ‘long pause’. The default value is 1.4s.
- shortPauseLabel
- If an inter-word pause has a duration between minShortPauseLength
and minMediumPauseLength, then the word before the pause will have
this string appended to its label (after a space). The default value
of (.) corresponds to the CA/CHAT convention for short pauses.
- mediumPauseLabel
- If an inter-word pause has a duration between minMediumPauseLength
and minLongPauseLength, then the word before the pause will have
this string appended to its label (after a space). The default value
of (..) corresponds to the CA/CHAT convention for medium pauses.
- longPauseLabel
- If an inter-word pause has a duration of minLongPauseLength
or longer, then the word before the pause will have
this string appended to its label (after a space). The default value
of (...) corresponds to the CA/CHAT convention for long pauses.
- maxUtteranceDuration
- Maximum utterance duration to target, in seconds. Utterances with a
duration longer than this will be split on longer inter-word pauses
(the longest pause first). The default value is 15.0s.
- utterancePadding
- Maximum number of seconds to subtract from the start time and add to the end time of each utterance, to allow for alignment errors of first/last word in each segment.
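The splitting behaviour described for maxUtteranceDuration can be sketched roughly as follows. This is a minimal Python illustration, not the formatter's actual code: the function name split_utterance and the representation of words as (label, start, end) tuples are assumptions made for the example. It expects a non-empty word list.

```python
def split_utterance(words, max_duration=15.0):
    """Recursively split a list of (label, start, end) words at the longest pause."""
    duration = words[-1][2] - words[0][1]
    if duration <= max_duration or len(words) < 2:
        return [words]
    # gaps[i] is the pause between words[i] and words[i + 1].
    gaps = [words[i + 1][1] - words[i][2] for i in range(len(words) - 1)]
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    left, right = words[: widest + 1], words[widest + 1 :]
    # Each half may itself still be too long, so split again if needed.
    return split_utterance(left, max_duration) + split_utterance(right, max_duration)

# Made-up timings: a 20-second utterance whose longest pause follows "two".
words = [
    ("one", 0.0, 0.5),
    ("two", 0.6, 1.0),
    ("three", 9.0, 9.5),
    ("four", 12.0, 12.4),
    ("five", 19.5, 20.0),
]
for utterance in split_utterance(words):
    print([label for label, _, _ in utterance])
```

Here the 20-second utterance is split once, at the 8-second pause after "two". Both halves are then under the 15-second target, so no further splitting takes place.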
The pause label definitions (shortPauseLabel, mediumPauseLabel and
longPauseLabel) may specify that the duration of the pause is
included in the label, by including a decimal format definition
enclosed in curly braces. For example, if you want long pauses to use
the CA/CHAT conventions of a duration within parentheses, you can
specify ({0.000}) as the value for longPauseLabel.
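To make the interaction between thresholds, labels and the duration format concrete, here is a small Python sketch. It is only an approximation under stated assumptions: the function name pause_label is hypothetical, thresholds are treated as inclusive, and only simple patterns made of zeros, such as 0.000, are handled. The formatter itself may interpret the decimal format more fully.

```python
import re

def pause_label(duration, thresholds=(0.35, 0.7, 1.4),
                labels=("(.)", "(..)", "(...)")):
    """Return the label for a pause of the given duration, or None if too short."""
    chosen = None
    for minimum, label in zip(thresholds, labels):
        if duration >= minimum:  # assumption: thresholds are inclusive
            chosen = label
    if chosen is None:
        return None

    def fill(match):
        # Count the decimal places in a pattern such as 0.000.
        _, _, fraction = match.group(1).partition(".")
        return f"{duration:.{len(fraction)}f}"

    # Replace any {pattern} in the label with the formatted duration.
    return re.sub(r"\{([^}]*)\}", fill, chosen)

print(pause_label(0.5))  # (.)
print(pause_label(0.9))  # (..)
print(pause_label(1.5, labels=("(.)", "(..)", "({0.000})")))  # (1.500)
```

With longPauseLabel set to ({0.000}), a word followed by a 1.5 second pause would end up with (1.500) appended to its label after a space.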