Class DISC2ARPAbet

  • All Implemented Interfaces:
    Function<String,String>, UnaryOperator<String>

    public class DISC2ARPAbet
    extends PhonemeTranslator
    Translates CELEX-DISC-encoded transcriptions like tr{nskrIpSVn to ARPAbet-encoded phonemic transcriptions like T R AE N S K R IH P SH AX N.

    There are differences between the CMU2DISC translation and this one:

    ARPAbet includes phonemes not in the CMU set:

    'L': "DX" - flap - this is an extension to DISC
    '^': "NX" - nasal flap - doesn't exist in DISC, we make it /n/
    '?': "TQ" - glottal stop - this is an extension to DISC

    Also, CMU2DISC strictly uses only phonemes in the CMU dictionary set, whereas this "ARPAbet" translation may also produce symbols for phonemes that exist in DISC but not in ARPAbet:

    'F': "EM" - e.g. idealism
    'H': "EN" - e.g. burden
    'P': "EL" - e.g. dangle
    'C': "UN" - e.g. bacon
    '0': "VN" - e.g. lingerie
    '~': "ON" - e.g. bouillon
    'c': "IM" - e.g. timbre
    'q': "IN" - e.g. detente

    ... and any other phones encountered that are in neither set are passed through unchanged.
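The pass-through behaviour described above can be illustrated with a minimal sketch. It is not the library's implementation: the class name DiscToArpabetSketch, the method translate, and the choice to separate output symbols with single spaces (including around unmapped symbols) are assumptions made for this example, and only a small subset of the mapping is included. The input tr{nskrIpS@n is chosen for illustration.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration class; not part of the documented library.
class DiscToArpabetSketch {
  // A small subset of the DISC-to-ARPAbet mapping, enough for the examples below.
  static final Map<Character, String> SUBSET = Map.ofEntries(
      Map.entry('t', "T"),
      Map.entry('r', "R"),
      Map.entry('{', "AE"),
      Map.entry('n', "N"),
      Map.entry('s', "S"),
      Map.entry('k', "K"),
      Map.entry('I', "IH"),
      Map.entry('p', "P"),
      Map.entry('S', "SH"),
      Map.entry('@', "AX"),
      Map.entry('b', "B"));

  // Translates one DISC symbol at a time; symbols with no mapping are
  // emitted unchanged (the pass-through case).
  static String translate(String disc) {
    StringJoiner arpabet = new StringJoiner(" ");
    for (char symbol : disc.toCharArray()) {
      arpabet.add(SUBSET.getOrDefault(symbol, String.valueOf(symbol)));
    }
    return arpabet.toString();
  }

  public static void main(String[] args) {
    System.out.println(translate("tr{nskrIpS@n")); // T R AE N S K R IH P SH AX N
    System.out.println(translate("b%t"));          // B % T  ('%' is in neither set)
  }
}
```

How the real class delimits a passed-through symbol is not stated in this documentation; the space-separated form here is only one plausible choice.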

    Mapping
    DISC ARPAbet Example
    Vowels
    # AA START odd/father
    { AE TRAP at/fast
    V AH STRUT hut/but
    $ AO THOUGHT ought/fall - two-to-one
    Q AO LOT ought/off - two-to-one
    6 AW MOUTH cow/how
    @ AX schwa discuss
    2 AY PRICE hide/my
    E EH DRESS Ed/red
    3 ER NURSE hurt/her
    1 EY FACE ate/say
    I IH KIT it/big
    i IY FLEECE eat/bee
    5 OW GOAT oat/show
    4 OY CHOICE toy/boy
    U UH FOOT hood/should
    u UW GOOSE two/you
    Consonants
    b B
    J CH
    d D
    D DH
    f F
    g G
    h HH
    _ JH
    k K
    l L
    m M
    n N
    N NG
    p P
    r R
    R R (possible linking R, translated as a plain R)
    s S
    S SH
    t T
    T TH
    v V
    w W
    j Y
    z Z
    Z ZH
    Not in the CMU set but present in the Buckeye corpus
    L DX flap - this is an extension to DISC
    ^ NX nasal flap - doesn't exist in DISC, we make it /n/
    ? TQ glottal stop - this is an extension to DISC
    Not in the CMU set but present in DISC
    7 IY R NEAR
    8 EH R SQUARE
    9 UH R CURE
    F EM idealism
    H EN burden
    P EL dangle
    C UN bacon
    0 VN lingerie
    ~ ON bouillon
    c IM timbre
    q IN detente
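The table contains two-to-one rows ($ and Q both become AO) and rows where one DISC symbol becomes two ARPAbet symbols (7 becomes IY R), so the mapping cannot be reversed unambiguously. The following sketch groups a few rows by their ARPAbet output to make the collisions visible. The class MappingCollisionSketch and the method invert are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the documented library.
class MappingCollisionSketch {
  // A handful of rows copied from the mapping table.
  static final Map<Character, String> ROWS = Map.of(
      '#', "AA",
      '$', "AO",
      'Q', "AO",
      '7', "IY R",
      'i', "IY",
      'r', "R");

  // Groups DISC symbols by the ARPAbet output they produce.
  // Sorted maps are used so the result has a stable order.
  static Map<String, List<Character>> invert(Map<Character, String> forward) {
    Map<String, List<Character>> inverse = new TreeMap<>();
    for (Map.Entry<Character, String> row : new TreeMap<>(forward).entrySet()) {
      inverse.computeIfAbsent(row.getValue(), k -> new ArrayList<>()).add(row.getKey());
    }
    return inverse;
  }

  public static void main(String[] args) {
    // AO is reachable from two DISC symbols, so a reverse translation
    // has to pick one of them.
    System.out.println(invert(ROWS).get("AO")); // [$, Q]
  }
}
```

In the same way, the output IY R could have come from the single symbol 7 or from the sequence i r, which is why a round trip through ARPAbet need not return the original DISC string.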
    Author:
    Robert Fromont robert@fromont.net.nz
    See Also:
    ARPAbet2DISC, CMU2DISC
    • Constructor Detail

      • DISC2ARPAbet

        public DISC2ARPAbet()
        Default constructor.
    • Method Detail

      • apply

        public String apply(String source)
        Translates a phonemic transcription from the source encoding to the destination encoding.
        Specified by:
        apply in interface Function<String,String>
        Overrides:
        apply in class PhonemeTranslator
        Parameters:
        source - Phonemic transcription in the source encoding.
        Returns:
        Phonemic transcription in the destination encoding.
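Because the class implements UnaryOperator<String>, an instance can be passed anywhere a Function<String,String> is expected, for example to Stream.map. The sketch below shows that pattern with a stand-in translator so that it compiles without the library. The names TranslatorUsageSketch, standIn and translateAll are hypothetical, and the stand-in handles only four symbols; with the real class, new DISC2ARPAbet() would presumably be passed in place of the stand-in.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the documented library.
class TranslatorUsageSketch {
  // Stand-in for a translator: maps four DISC symbols and passes the rest through.
  static String standIn(String disc) {
    StringBuilder out = new StringBuilder();
    for (char symbol : disc.toCharArray()) {
      if (out.length() > 0) out.append(' ');
      switch (symbol) {
        case 'k': out.append("K"); break;
        case 'h': out.append("HH"); break;
        case '{': out.append("AE"); break;
        case 't': out.append("T"); break;
        default: out.append(symbol);
      }
    }
    return out.toString();
  }

  // Applies any String-to-String translator to each transcription in a list.
  static List<String> translateAll(List<String> transcriptions,
                                   UnaryOperator<String> translator) {
    return transcriptions.stream().map(translator).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> disc = List.of("k{t", "h{t");
    System.out.println(translateAll(disc, TranslatorUsageSketch::standIn));
    // [K AE T, HH AE T]
  }
}
```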