Package nzilbb.encoding
Class ARPAbet2DISC
- java.lang.Object
  - nzilbb.encoding.PhonemeTranslator
    - nzilbb.encoding.ARPAbet2DISC
- All Implemented Interfaces: Function<String,String>, UnaryOperator<String>
public class ARPAbet2DISC extends PhonemeTranslator
Translates ARPAbet-encoded phonemic transcriptions like T R AE2 N S K R IH1 P SH AX0 N to CELEX-DISC-encoded transcriptions like tr{nskrIpS@n.
There are differences between the DISC2CMU translation and this one. ARPAbet includes phonemes not in the CMU set:
- 'L' - "DX" - flap - this is an extension to DISC
- '^' - "NX" - nasal flap - doesn't exist in DISC, we make it /n/
- '?' - "TQ" - glottal stop - this is an extension to DISC
Also, CMU2DISC strictly uses only phonemes in the CMU dictionary set, whereas the "ARPAbet" translation may also contain phonemes corresponding to those that exist in DISC but not in ARPAbet:
- 'F' - "EM" - e.g. idealism
- 'H' - "EN" - e.g. burden
- 'P' - "EL" - e.g. dangle
- 'C' - "UN" - e.g. bacon
- '0' - "VN" - e.g. lingerie
- '~' - "ON" - e.g. bouillon
- 'c' - "IM" - e.g. timbre
- 'q' - "IN" - e.g. detente
... and any other phones encountered that are in neither set are passed through unchanged.
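The behaviour described above (table lookup, with unrecognised phones passed through unchanged) can be illustrated with a minimal stand-alone sketch. This is not the nzilbb implementation: the class name ArpabetSketch, the method translate and the small SUBSET map are hypothetical, and the handling of stress digits is an assumption inferred from the example transcription, where AE2 becomes { with no digit in the output.

```java
import java.util.Map;

/** Hypothetical stand-alone sketch; not the nzilbb implementation. */
final class ArpabetSketch {
  // A small subset of the documented ARPAbet -> DISC mapping, for illustration only.
  static final Map<String, String> SUBSET = Map.ofEntries(
      Map.entry("AE", "{"), Map.entry("AX", "@"), Map.entry("IH", "I"),
      Map.entry("K", "k"), Map.entry("N", "n"), Map.entry("P", "p"),
      Map.entry("R", "r"), Map.entry("S", "s"), Map.entry("SH", "S"),
      Map.entry("T", "t"), Map.entry("DX", "L"), Map.entry("TQ", "?"));

  /** Translates space-separated ARPAbet phones using the subset map. */
  static String translate(String arpabet) {
    StringBuilder disc = new StringBuilder();
    for (String phone : arpabet.trim().split("\\s+")) {
      if (phone.isEmpty()) continue; // empty input yields one empty token
      // Assumption: trailing stress digits (e.g. the 2 in AE2) are dropped before lookup.
      String label = phone.replaceAll("[0-9]+$", "");
      // Phones that are not in the map are appended exactly as they were given.
      disc.append(SUBSET.getOrDefault(label, phone));
    }
    return disc.toString();
  }

  public static void main(String[] args) {
    System.out.println(translate("T R AE2 N S K R IH1 P SH AX0 N"));
  }
}
```

With the library itself on the classpath, the equivalent call would use the documented constructor and method: new ARPAbet2DISC().apply("T R AE2 N S K R IH1 P SH AX0 N").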
Mapping

ARPAbet → DISC   Example

Vowels
AA → #   START odd/father
AE → {   TRAP at/fast
AH → V   STRUT hut/but
AO → $   THOUGHT ought/fall - one-to-two - this could also be Q!
AO → Q   LOT ought/off
AW → 6   MOUTH cow/how
AX → @   schwa discuss
AY → 2   PRICE hide/my
EH → E   DRESS Ed/red
ER → 3   NURSE hurt/her
EY → 1   FACE ate/say
IH → I   KIT it/big
IY → i   FLEECE eat/bee
OW → 5   GOAT oat/show
OY → 4   CHOICE toy/boy
UH → U   FOOT hood/should
UW → u   GOOSE two/you

Syllabics not in CMU set but exist in DISC
EM → F   idealism
EN → H   burden
EL → P   dangle
UN → C   bacon
VN → 0   lingerie
ON → ~   bouillon
IM → c   timbre
IN → q   detente

Consonants
B → b
CH → J
D → d
DH → D
F → f
G → g
HH → h
JH → _
K → k
L → l
M → m
N → n
NG → N
P → p
R → r
S → s
SH → S
T → t
TH → T
V → v
W → w
Y → j
Z → z
ZH → Z
DX → L   flap. Not in the CMU set but exist in Buckeye corpus. This is an extension to DISC (although it clashes with a German vowel)
NX → ^   nasal flap. Doesn't exist in DISC, we make it /n/
TQ → ?   glottal stop. This is an extension to DISC

- Author: Robert Fromont robert@fromont.net.nz
- See Also: DISC2ARPAbet
Constructor Summary
Constructor      Description
ARPAbet2DISC()   Default constructor.
Method Summary
Modifier and Type   Method                 Description
String              apply(String source)   Translates a phonemic transcription from the source encoding to the destination encoding.
Methods inherited from class nzilbb.encoding.PhonemeTranslator
getDestinationEncoding, getSourceEncoding
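Because the class is listed as implementing Function<String,String> and UnaryOperator<String>, an instance can be passed anywhere one of those interfaces is accepted, for example to Stream.map. The sketch below shows that pattern with a stand-in lambda so that it runs without the library. The names TranslatorPipelineSketch and translateAll are hypothetical, and the stand-in only removes spaces; it does not perform any ARPAbet-to-DISC translation.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Hypothetical sketch of using a String-to-String translator in a stream. */
final class TranslatorPipelineSketch {
  /** Applies the given translator to each transcription, preserving order. */
  static List<String> translateAll(List<String> transcriptions,
                                   UnaryOperator<String> translator) {
    return transcriptions.stream().map(translator).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Stand-in translator (an assumption for this sketch): it just strips spaces.
    // With the library available, new ARPAbet2DISC() could be passed here instead.
    UnaryOperator<String> standIn = s -> s.replace(" ", "");
    System.out.println(translateAll(List.of("K AE1 T", "D AO1 G"), standIn));
  }
}
```

Accepting the interface rather than the concrete class means the same helper would also work with other translators in the package, such as the DISC2ARPAbet class listed under See Also, provided they implement the same interface.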