Package nzilbb.encoding
Class DISC2ARPAbet
- java.lang.Object
  - nzilbb.encoding.PhonemeTranslator
    - nzilbb.encoding.DISC2ARPAbet
- All Implemented Interfaces:
Function<String,String>, UnaryOperator<String>
public class DISC2ARPAbet extends PhonemeTranslator
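Because DISC2ARPAbet implements Function<String,String> and UnaryOperator<String>, an instance should be usable anywhere those interfaces are accepted. Two examples are passing it to Stream.map and composing it with andThen.

The sketch below illustrates this with a hypothetical stand-in called ComposeDemo, which is not the real class. Its three-entry table (k, {, t) covers only the word cat. The space-separated output format is assumed from the example in the class description.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical stand-in for DISC2ARPAbet, used only to show how a
// UnaryOperator<String> translator composes with standard Java APIs.
class ComposeDemo {
    // Tiny illustrative table: just enough for "cat" (DISC k{t).
    static final Map<Character, String> TABLE = Map.of('k', "K", '{', "AE", 't', "T");

    // Translates character by character; unknown characters pass through.
    static final UnaryOperator<String> TO_ARPABET = disc -> {
        StringBuilder out = new StringBuilder();
        for (char c : disc.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    };

    // Since the translator is a Function<String,String>, andThen composes it
    // with a second step, here counting the space-separated phonemes.
    static final Function<String, Integer> PHONEME_COUNT =
        TO_ARPABET.andThen(arpabet -> arpabet.isEmpty() ? 0 : arpabet.split(" ").length);

    // Applies the translator to a whole word list via Stream.map.
    static List<String> translateAll(List<String> words) {
        return words.stream().map(TO_ARPABET).collect(Collectors.toList());
    }
}
```

For example, ComposeDemo.translateAll(List.of("k{t", "t{k")) yields ["K AE T", "T AE K"].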
Translates CELEX-DISC-encoded transcriptions like tr{nskrIpSVn into ARPAbet-encoded phonemic transcriptions like T R AE N S K R IH P SH AH N.

This translation differs from the CMU2DISC translation. ARPAbet includes phonemes that are not in the CMU set:
- 'L' → "DX": flap (this is an extension to DISC)
- '^' → "NX": nasal flap (doesn't exist in DISC; we make it /n/)
- '?' → "TQ": glottal stop (this is an extension to DISC)
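The three extension symbols listed above can be handled by an ordinary per-character lookup. A minimal sketch follows.

The class BuckeyeExtensions and its translate method are hypothetical and are not part of nzilbb.encoding. The table holds only a few ordinary entries besides the extensions. That is enough to translate a flapped "batter" (b{L3) and a glottalized "bat" (b{?).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the nzilbb implementation: a per-character lookup
// that includes the three extension symbols.
class BuckeyeExtensions {
    static final Map<Character, String> TABLE = new HashMap<>();

    static {
        // Extensions to DISC that map to ARPAbet phonemes outside the CMU set.
        TABLE.put('L', "DX"); // flap
        TABLE.put('^', "NX"); // nasal flap
        TABLE.put('?', "TQ"); // glottal stop
        // A few ordinary entries so that whole words can be translated.
        TABLE.put('b', "B");
        TABLE.put('{', "AE");
        TABLE.put('3', "ER");
    }

    // Looks up each DISC character and joins the ARPAbet labels with spaces;
    // characters without an entry are passed through unchanged.
    static String translate(String disc) {
        StringBuilder out = new StringBuilder();
        for (char c : disc.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }
}
```

With this table, translate("b{L3") gives "B AE DX ER" and translate("b{?") gives "B AE TQ".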
Also, CMU2DISC strictly uses only phonemes in the CMU dictionary set, whereas this "ARPAbet" translation may also contain phonemes corresponding to those that exist in DISC but not in ARPAbet:
- 'F' → "EM", e.g. idealism
- 'H' → "EN", e.g. burden
- 'P' → "EL", e.g. dangle
- 'C' → "UN", e.g. bacon
- '0' → "VN", e.g. lingerie
- '~' → "ON", e.g. bouillon
- 'c' → "IM", e.g. timbre
- 'q' → "IN", e.g. detente
... and any other phones encountered that are in neither set are passed through unchanged.
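The DISC-only symbols above are single characters that map to multi-letter ARPAbet labels. Any character without an entry is passed through unchanged.

A hypothetical sketch of both behaviours follows; DiscOnlyPhonemes is illustrative and not part of nzilbb.encoding. It translates "burden" (b3dH) and a transcription containing x, which has no entry in this table.

```java
import java.util.Map;

// Hypothetical sketch: DISC-only symbols become multi-letter ARPAbet labels,
// and characters with no entry are passed through unchanged.
class DiscOnlyPhonemes {
    static final Map<Character, String> TABLE = Map.of(
        'H', "EN",  // e.g. burden
        'P', "EL",  // e.g. dangle
        'b', "B", '3', "ER", 'd', "D", 'l', "L", 'Q', "AO");

    static String translate(String disc) {
        StringBuilder out = new StringBuilder();
        for (char c : disc.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // getOrDefault implements the pass-through rule for unknown phones.
            out.append(TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }
}
```

Here translate("b3dH") gives "B ER D EN". translate("lQx") gives "L AO x": the unmapped x survives as-is.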
Mapping (DISC → ARPAbet, with lexical set or description, and example words):

Vowels
- # → AA: START, odd/father
- { → AE: TRAP, at/fast
- V → AH: STRUT, hut/but
- $ → AO: THOUGHT, ought/fall (two-to-one, shared with Q)
- Q → AO: LOT, ought/off (two-to-one, shared with $)
- 6 → AW: MOUTH, cow/how
- @ → AX: schwa, discuss
- 2 → AY: PRICE, hide/my
- E → EH: DRESS, Ed/red
- 3 → ER: NURSE, hurt/her
- 1 → EY: FACE, ate/say
- I → IH: KIT, it/big
- i → IY: FLEECE, eat/bee
- 5 → OW: GOAT, oat/show
- 4 → OY: CHOICE, toy/boy
- U → UH: FOOT, hood/should
- u → UW: GOOSE, two/you

Consonants
- b → B
- J → CH
- d → D
- D → DH
- f → F
- g → G
- h → HH
- _ → JH
- k → K
- l → L
- m → M
- n → N
- N → NG
- p → P
- r → R
- R → R: possible linking R (pretty definitely R)
- s → S
- S → SH
- t → T
- T → TH
- v → V
- w → W
- j → Y
- z → Z
- Z → ZH

Not in the CMU set but in the Buckeye corpus
- L → DX: flap (this is an extension to DISC)
- ^ → NX: nasal flap (doesn't exist in DISC; we make it /n/)
- ? → TQ: glottal stop (this is an extension to DISC)

Not in the CMU set but in DISC
- 7 → IY R: NEAR
- 8 → EH R: SQUARE
- 9 → UH R: CURE
- F → EM: idealism
- H → EN: burden
- P → EL: dangle
- C → UN: bacon
- 0 → VN: lingerie
- ~ → ON: bouillon
- c → IM: timbre
- q → IN: detente

- Author:
- Robert Fromont robert@fromont.net.nz
- See Also:
ARPAbet2DISC, CMU2DISC
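The mapping table can be sketched as a single lookup from DISC characters to ARPAbet labels. This is an illustrative reconstruction from the table, not the library's source.

The class name DiscArpabetTable is an assumption, as is the space-joined output format, which is inferred from the example in the class description.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the DISC-to-ARPAbet mapping table as a lookup.
// This is NOT the nzilbb implementation; the class name is hypothetical.
class DiscArpabetTable {
    // Each pair is {DISC symbol, ARPAbet label(s)}, in the order of the table.
    private static final String[][] PAIRS = {
        // Vowels
        {"#", "AA"}, {"{", "AE"}, {"V", "AH"}, {"$", "AO"}, {"Q", "AO"},
        {"6", "AW"}, {"@", "AX"}, {"2", "AY"}, {"E", "EH"}, {"3", "ER"},
        {"1", "EY"}, {"I", "IH"}, {"i", "IY"}, {"5", "OW"}, {"4", "OY"},
        {"U", "UH"}, {"u", "UW"},
        // Consonants
        {"b", "B"}, {"J", "CH"}, {"d", "D"}, {"D", "DH"}, {"f", "F"},
        {"g", "G"}, {"h", "HH"}, {"_", "JH"}, {"k", "K"}, {"l", "L"},
        {"m", "M"}, {"n", "N"}, {"N", "NG"}, {"p", "P"}, {"r", "R"},
        {"R", "R"}, {"s", "S"}, {"S", "SH"}, {"t", "T"}, {"T", "TH"},
        {"v", "V"}, {"w", "W"}, {"j", "Y"}, {"z", "Z"}, {"Z", "ZH"},
        // Not in the CMU set but in the Buckeye corpus
        {"L", "DX"}, {"^", "NX"}, {"?", "TQ"},
        // Not in the CMU set but in DISC (7, 8 and 9 expand to two phonemes)
        {"7", "IY R"}, {"8", "EH R"}, {"9", "UH R"}, {"F", "EM"},
        {"H", "EN"}, {"P", "EL"}, {"C", "UN"}, {"0", "VN"}, {"~", "ON"},
        {"c", "IM"}, {"q", "IN"},
    };

    private static final Map<Character, String> TABLE = new HashMap<>();

    static {
        for (String[] pair : PAIRS) {
            TABLE.put(pair[0].charAt(0), pair[1]);
        }
    }

    // Translates each DISC character and joins the results with spaces;
    // characters missing from the table are passed through unchanged.
    static String translate(String disc) {
        StringBuilder out = new StringBuilder();
        for (char c : disc.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }
}
```

Under this sketch, translate("tr{nskrIpSVn") gives "T R AE N S K R IH P SH AH N" and translate("n7") gives "N IY R".

Because $ and Q both produce AO, the table is not invertible. Given AO alone, a reverse translation cannot tell which DISC vowel it came from.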
Constructor Summary
- DISC2ARPAbet(): Default constructor.
Method Summary
- String apply(String source): Translates a phonemic transcription from the source encoding to the destination encoding.
Methods inherited from class nzilbb.encoding.PhonemeTranslator
getDestinationEncoding, getSourceEncoding
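The inherited getSourceEncoding and getDestinationEncoding methods suggest that PhonemeTranslator records which encodings a translator converts between. The sketch below shows one hypothetical way to structure that.

PhonemeTranslatorSketch and DiscToArpabetSketch are assumptions, not the real API. So are the String return types and the labels "DISC" and "ARPAbet".

```java
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical base class; the real nzilbb.encoding.PhonemeTranslator may
// store and expose its encodings differently.
abstract class PhonemeTranslatorSketch implements UnaryOperator<String> {
    private final String sourceEncoding;
    private final String destinationEncoding;

    protected PhonemeTranslatorSketch(String sourceEncoding, String destinationEncoding) {
        this.sourceEncoding = sourceEncoding;
        this.destinationEncoding = destinationEncoding;
    }

    public String getSourceEncoding() {
        return sourceEncoding;
    }

    public String getDestinationEncoding() {
        return destinationEncoding;
    }
}

// Hypothetical subclass; "DISC" and "ARPAbet" are assumed labels, and the
// three-entry table is only enough for the word "cat" (DISC k{t).
class DiscToArpabetSketch extends PhonemeTranslatorSketch {
    private static final Map<Character, String> TABLE =
        Map.of('k', "K", '{', "AE", 't', "T");

    DiscToArpabetSketch() {
        super("DISC", "ARPAbet");
    }

    @Override
    public String apply(String source) {
        // Map each character to its ARPAbet label (unknown characters pass
        // through unchanged) and join the labels with spaces.
        return source.chars()
            .mapToObj(c -> TABLE.getOrDefault((char) c, String.valueOf((char) c)))
            .collect(Collectors.joining(" "));
    }
}
```

Keeping the encoding names in the base class lets every translator subclass report its direction without repeating accessor code.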