Class ARPAbet2DISC

  • All Implemented Interfaces:
    Function<String,String>, UnaryOperator<String>

    public class ARPAbet2DISC
    extends PhonemeTranslator
    Translates ARPAbet-encoded phonemic transcriptions like T R AE2 N S K R IH1 P SH AX0 N to CELEX-DISC-encoded transcriptions like tr{nskrIpS@n.

    There are differences between the CMU2DISC translation and this one:

    ARPAbet includes phonemes not in the CMU set:

    'L'
        "DX" - flap - this is an extension to DISC
    '^'
        "NX" - nasal flap - doesn't exist in DISC, we make it /n/
    '?'
        "TQ" - glottal stop - this is an extension to DISC

    Also, CMU2DISC uses strictly the phonemes in the CMU dictionary set, whereas the "ARPAbet" translation may also contain labels for phonemes that exist in DISC but not in ARPAbet:

    'F'
        "EM" - e.g. idealism
    'H'
        "EN" - e.g. burden
    'P'
        "EL" - e.g. dangle
    'C'
        "UN" - e.g. bacon
    '0'
        "VN" - e.g. lingerie
    '~'
        "ON" - e.g. bouillon
    'c'
        "IM" - e.g. timbre
    'q'
        "IN" - e.g. detente

    Any other phones encountered that are in neither set are passed through unchanged.

    Mapping
    ARPAbet DISC Example
    Vowels
    AA # START odd/father
    AE { TRAP at/fast
    AH V STRUT hut/but
    AO $ THOUGHT ought/fall (one-to-two mapping: AO could also be Q)
    AO Q LOT ought/off
    AW 6 MOUTH cow/how
    AX @ schwa discuss
    AY 2 PRICE hide/my
    EH E DRESS Ed/red
    ER 3 NURSE hurt/her
    EY 1 FACE ate/say
    IH I KIT it/big
    IY i FLEECE eat/bee
    OW 5 GOAT oat/show
    OY 4 CHOICE toy/boy
    UH U FOOT hood/should
    UW u GOOSE two/you
    Syllabics not in CMU set but exist in DISC
    EM F idealism
    EN H burden
    EL P dangle
    UN C bacon
    VN 0 lingerie
    ON ~ bouillon
    IM c timbre
    IN q detente
    Consonants
    B b
    CH J
    D d
    DH D
    F f
    G g
    HH h
    JH _
    K k
    L l
    M m
    N n
    NG N
    P p
    R r
    S s
    SH S
    T t
    TH T
    V v
    W w
    Y j
    Z z
    ZH Z
    DX L flap; not in the CMU set but exists in the Buckeye corpus. This is an extension to DISC (although it clashes with a German vowel).
    NX ^ nasal flap; doesn't exist in DISC, we make it /n/
    TQ ? glottal stop; this is an extension to DISC
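
The mapping table above can be read as a simple lookup. The following is a minimal sketch only, not the ARPAbet2DISC source: the class name `ArpabetSketch` and its `translate` method are hypothetical, only a subset of the table is included, and it assumes the input is whitespace-separated labels with at most one trailing stress digit (as in AE2). For AO it arbitrarily picks $, although Q is equally possible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the ARPAbet2DISC source code.
class ArpabetSketch {
  // A subset of the mapping table (ARPAbet label -> DISC symbol).
  private static final Map<String, String> TABLE = new HashMap<>();
  static {
    String[][] pairs = {
      {"AE", "{"}, {"AX", "@"}, {"IH", "I"}, {"ER", "3"},
      {"AO", "$"}, // assumption: the sketch picks $ although Q is also possible
      {"B", "b"}, {"D", "d"}, {"K", "k"}, {"N", "n"}, {"P", "p"},
      {"R", "r"}, {"S", "s"}, {"SH", "S"}, {"T", "t"},
      {"EN", "H"}, // syllabic, e.g. burden
      {"DX", "L"}, {"TQ", "?"} // extensions to DISC
    };
    for (String[] pair : pairs) {
      TABLE.put(pair[0], pair[1]);
    }
  }

  // Translates space-separated ARPAbet labels to a DISC string.
  static String translate(String source) {
    StringBuilder disc = new StringBuilder();
    for (String phone : source.trim().split("\\s+")) {
      if (phone.isEmpty()) continue; // empty input yields empty output
      // Assumption: stress is marked by one trailing digit, as in AE2.
      String label = phone.replaceAll("[0-9]$", "");
      // Labels missing from the table pass through unchanged.
      disc.append(TABLE.getOrDefault(label, phone));
    }
    return disc.toString();
  }

  public static void main(String[] args) {
    // prints tr{nskrIpS@n
    System.out.println(translate("T R AE2 N S K R IH1 P SH AX0 N"));
  }
}
```

Because DISC uses one character per phoneme, the output needs no separators, whereas the ARPAbet input relies on spaces to delimit its multi-letter labels.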
    Author:
    Robert Fromont robert@fromont.net.nz
    See Also:
    DISC2ARPAbet
    • Constructor Detail

      • ARPAbet2DISC

        public ARPAbet2DISC()
        Default constructor.
    • Method Detail

      • apply

        public String apply(String source)
        Translates a phonemic transcription from the source encoding to the destination encoding.
        Specified by:
        apply in interface Function<String,String>
        Overrides:
        apply in class PhonemeTranslator
        Parameters:
        source - Phonemic transcription in the source encoding.
        Returns:
        Phonemic transcription in the destination encoding.
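
Since the class implements UnaryOperator<String>, an instance can be passed wherever a String-to-String function is expected, for example to Stream.map. The sketch below illustrates that pattern under stated assumptions: `TranslatorUsageSketch` and `translateAll` are hypothetical names, and a lower-casing lambda stands in for the translator so that the example runs without the library on the classpath. With the library available, `new ARPAbet2DISC()` would presumably be passed instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical usage sketch; not part of the documented API.
class TranslatorUsageSketch {
  // Applies any String-to-String translator to each transcription in turn.
  static List<String> translateAll(UnaryOperator<String> translator,
                                   List<String> transcriptions) {
    return transcriptions.stream()
        .map(translator) // equivalent to calling translator.apply(...) per element
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Stand-in for the real translator (assumption made for this sketch).
    UnaryOperator<String> standIn = s -> s.toLowerCase();
    List<String> input = Arrays.asList("T R AE2 N", "B ER1 D EN");
    System.out.println(translateAll(standIn, input));
  }
}
```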