Class DISC2CMU

  • All Implemented Interfaces:
    Function<String,String>, UnaryOperator<String>

    public class DISC2CMU
    extends PhonemeTranslator
    Translates CELEX-DISC-encoded transcriptions like tr{nskrIpSVn to CMU-encoded phonemic transcriptions like T R AE N S K R IH P SH AH N.

    The CMU encoding is assumed to use only the phonemes of the CMU Pronouncing Dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
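    To illustrate that assumption, the following sketch checks whether a space-separated transcription uses only the 39 phonemes of the CMU Pronouncing Dictionary. The class CmuPhoneSet and its method are hypothetical and not part of the library documented here. Trailing stress digits (0, 1, 2), as used in the CMU dictionary, are ignored when checking.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper (not a library class): validates CMU-encoded output. */
class CmuPhoneSet {
  // The 39 CMU Pronouncing Dictionary phonemes, without stress digits.
  static final Set<String> PHONES = Set.of(
      "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
      "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
      "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
      "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH");

  /**
   * Returns the symbols of a space-separated transcription that are not in
   * the CMU phone set. A trailing stress digit is stripped before checking,
   * so the returned symbols have any stress digit removed.
   */
  static List<String> unknownPhones(String transcription) {
    return Arrays.stream(transcription.trim().split("\\s+"))
        .filter(p -> !p.isEmpty())
        .map(p -> p.replaceAll("[012]$", ""))
        .filter(p -> !PHONES.contains(p))
        .collect(Collectors.toList());
  }
}
```

    For example, unknownPhones("T R AE1 N S K R IH0 P SH AH0 N") returns an empty list, while a transcription containing the ARPAbet schwa AX reports AX as unknown.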

    Thanks to Stefanie Jannedy for this mapping.

    This translation differs from ARPAbet2DISC mainly in being strict: phonemes that are not explicitly present in the phone set are dropped, whereas ARPAbet2DISC includes extra phonemes and some extensions to ARPAbet and DISC, and passes unknown phonemes through unchanged.
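    The difference between the two policies for unknown symbols can be sketched as follows. UnknownSymbolPolicy is a hypothetical class, not the library's implementation, and it maps only the ten DISC symbols needed for the example word tr{nskrIpSVn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not a library class) contrasting strict and pass-through handling. */
class UnknownSymbolPolicy {
  // Assumption: only a small subset of the DISC-to-CMU mapping, enough for one word.
  static final Map<Character, String> SUBSET = Map.of(
      't', "T", 'r', "R", '{', "AE", 'n', "N", 's', "S",
      'k', "K", 'I', "IH", 'p', "P", 'S', "SH", 'V', "AH");

  /**
   * strict == true drops unmapped symbols (the behaviour described for DISC2CMU);
   * strict == false passes them through unchanged (the behaviour described for ARPAbet2DISC).
   */
  static String translate(String disc, boolean strict) {
    List<String> out = new ArrayList<>();
    for (char c : disc.toCharArray()) {
      String mapped = SUBSET.get(c);
      if (mapped != null) {
        out.add(mapped);
      } else if (!strict) {
        out.add(String.valueOf(c)); // unknown symbol kept as-is
      }
      // strict: the unknown symbol is silently dropped
    }
    return String.join(" ", out);
  }
}
```

    With strict set to true, translate("tr{n*", true) gives T R AE N, because * is not in the map; with strict set to false, the * appears unchanged at the end of the output.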

    Mapping
    Source  Destination  Example
    Vowels
    #       AA           START odd/father
    {       AE           TRAP at/fast
    V       AH           STRUT hut/but
    $       AO           THOUGHT ought/fall (two-to-one)
    Q       AO           LOT ought/off (two-to-one)
    6       AW           MOUTH cow/how
    @       IH           schwa discuss (schwa does not exist in CMU)
    2       AY           PRICE hide/my
    E       EH           DRESS Ed/red
    3       ER           NURSE hurt/her
    1       EY           FACE ate/say
    I       IH           KIT it/big
    i       IY           FLEECE eat/bee
    5       OW           GOAT oat/show
    4       OY           CHOICE toy/boy
    U       UH           FOOT hood/should
    u       UW           GOOSE two/you
    Consonants
    b       B
    J       CH
    d       D
    D       DH
    f       F
    g       G
    h       HH
    _       JH
    k       K
    l       L
    m       M
    n       N
    N       NG
    p       P
    r       R
    R       R            possible linking R, almost certainly R
    s       S
    S       SH
    t       T
    T       TH
    v       V
    w       W
    j       Y
    z       Z
    Z       ZH
    Not in the CMU set, but present in the Buckeye corpus
    L       D            flap (an extension to DISC)
    ?       K            glottal stop (an extension to DISC)
    Not in the CMU set, but present in DISC
    7       IY R         NEAR
    8       EH R         SQUARE
    9       UH R         CURE
    F       IH M         idealism
    H       IH N         burden
    P       IH L         dangle
    C       IH NG        bacon
    0       AO N         lingerie
    ~       AO N         bouillon
    c       AO M         timbre
    q       AO N         detente
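    The mapping table can be expressed as a lookup from each DISC symbol to one or more CMU phonemes. This is a hypothetical sketch (DiscToCmuTable is not a library class); it assumes every DISC symbol is a single character and, mirroring the strict behaviour described above, drops any character not in the table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not a library class): the mapping table as a lookup. */
class DiscToCmuTable {
  private static final String[][] TABLE = {
    // Vowels ($ and Q both map to AO: two-to-one)
    {"#", "AA"}, {"{", "AE"}, {"V", "AH"}, {"$", "AO"}, {"Q", "AO"},
    {"6", "AW"}, {"@", "IH"}, {"2", "AY"}, {"E", "EH"}, {"3", "ER"},
    {"1", "EY"}, {"I", "IH"}, {"i", "IY"}, {"5", "OW"}, {"4", "OY"},
    {"U", "UH"}, {"u", "UW"},
    // Consonants
    {"b", "B"}, {"J", "CH"}, {"d", "D"}, {"D", "DH"}, {"f", "F"},
    {"g", "G"}, {"h", "HH"}, {"_", "JH"}, {"k", "K"}, {"l", "L"},
    {"m", "M"}, {"n", "N"}, {"N", "NG"}, {"p", "P"}, {"r", "R"},
    {"R", "R"}, {"s", "S"}, {"S", "SH"}, {"t", "T"}, {"T", "TH"},
    {"v", "V"}, {"w", "W"}, {"j", "Y"}, {"z", "Z"}, {"Z", "ZH"},
    // Buckeye extensions to DISC
    {"L", "D"}, {"?", "K"},
    // DISC-only symbols that expand to several CMU phonemes
    {"7", "IY R"}, {"8", "EH R"}, {"9", "UH R"}, {"F", "IH M"},
    {"H", "IH N"}, {"P", "IH L"}, {"C", "IH NG"}, {"0", "AO N"},
    {"~", "AO N"}, {"c", "AO M"}, {"q", "AO N"}
  };

  static final Map<Character, String> MAP = new HashMap<>();
  static {
    for (String[] row : TABLE) {
      MAP.put(row[0].charAt(0), row[1]);
    }
  }

  /** Strict translation: any character not in the table is dropped. */
  static String translate(String disc) {
    List<String> out = new ArrayList<>();
    for (char c : disc.toCharArray()) {
      String cmu = MAP.get(c);
      if (cmu != null) {
        out.add(cmu); // may itself contain several space-separated phonemes
      }
    }
    return String.join(" ", out);
  }
}
```

    For example, translate("tr{nskrIpSVn") yields T R AE N S K R IH P SH AH N, and translate("n7") yields N IY R. Because $ and Q both produce AO, the mapping is lossy: a translation in the other direction cannot tell which of the two was meant.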
    Author:
    Robert Fromont robert@fromont.net.nz
    See Also:
    CMU2DISC, DISC2ARPAbet
    • Constructor Detail

      • DISC2CMU

        public DISC2CMU()
        Default constructor.
    • Method Detail

      • getDefaultStress

        public String getDefaultStress()
        Getter for defaultStress: Default stress value to append to vowels.
        Returns:
        Default stress value to append to vowels.
      • setDefaultStress

        public DISC2CMU setDefaultStress(String newDefaultStress)
        Setter for defaultStress: Default stress value to append to vowels.
        Parameters:
        newDefaultStress - Default stress value to append to vowels.
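        This page does not say exactly how the stress value is applied. The sketch below assumes it is appended directly to each CMU vowel symbol, in the CMU dictionary's style (AE0, AE1), and that a null or empty value adds nothing. StressAppender is a hypothetical helper, not the library's code.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper (not a library class): appends a stress value to CMU vowels. */
class StressAppender {
  // The 15 CMU vowel phonemes; every other CMU symbol is a consonant.
  static final Set<String> VOWELS = Set.of(
      "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
      "EY", "IH", "IY", "OW", "OY", "UH", "UW");

  /** Assumption: the stress value is concatenated directly onto each vowel symbol. */
  static String withDefaultStress(String cmu, String defaultStress) {
    if (defaultStress == null || defaultStress.isEmpty()) {
      return cmu; // no default stress configured
    }
    return Arrays.stream(cmu.trim().split("\\s+"))
        .map(p -> VOWELS.contains(p) ? p + defaultStress : p)
        .collect(Collectors.joining(" "));
  }
}
```

        Applied to T R AE N S K R IH P SH AH N with "0", this gives T R AE0 N S K R IH0 P SH AH0 N.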
      • apply

        public String apply(String source)
        Translates a phonemic transcription from the source encoding to the destination encoding.
        Specified by:
        apply in interface Function<String,String>
        Overrides:
        apply in class PhonemeTranslator
        Parameters:
        source - Phonemic transcription in the source encoding.
        Returns:
        Phonemic transcription in the destination encoding.
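        Because DISC2CMU implements UnaryOperator<String>, an instance can be passed anywhere a UnaryOperator<String> or Function<String,String> is expected, for example to Stream.map. The helper below, BatchTranslate, is hypothetical and accepts any such translator.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Hypothetical helper (not a library class): applies a translator to many transcriptions. */
class BatchTranslate {
  static List<String> translateAll(UnaryOperator<String> translator, List<String> transcriptions) {
    return transcriptions.stream()
        .map(translator) // invokes translator.apply(s) on each element
        .collect(Collectors.toList());
  }
}
```

        With the library on the classpath, a call might look like BatchTranslate.translateAll(new DISC2CMU(), List.of("tr{nskrIpSVn")); any lambda of type UnaryOperator<String> can stand in when the library is not available.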