Class DISC2HTK

  • All Implemented Interfaces:
    Function<String,String>, UnaryOperator<String>

    public class DISC2HTK
    extends PhonemeTranslator
    Translates CELEX-DISC-encoded transcriptions like tr{nskrIpS@n into a form that would work for a Hidden Markov Model Toolkit (HTK) .dict file, like: t r _{ n s k r I p S _@ n.

    The phonemes are space-delimited, and labels that don't start with an ASCII letter (as the examples show, {, @ and IPA symbols such as ʌ all get the prefix) are prefixed with '_' to avoid HTK processing errors.
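The labelling rule above can be sketched in plain Java. This is not the library's implementation: the class DiscToHtkSketch and its methods are hypothetical, the sketch assumes every DISC phoneme is exactly one character, and it treats only ASCII letters as "alphabetic", which is what the examples imply.

```java
import java.util.StringJoiner;

/**
 * Hypothetical sketch of the DISC-to-HTK labelling rule.
 * Assumes one character per DISC phoneme; not the library's code.
 */
public class DiscToHtkSketch {

    /** True if the label starts with something other than an ASCII letter. */
    static boolean needsPrefix(String label) {
        char first = label.charAt(0);
        boolean asciiLetter = (first >= 'a' && first <= 'z') || (first >= 'A' && first <= 'Z');
        return !asciiLetter;
    }

    /** Prefixes '_' where HTK would otherwise choke on the label. */
    static String toHtkLabel(String label) {
        return needsPrefix(label) ? "_" + label : label;
    }

    /** Emits one space-delimited HTK label per DISC character. */
    static String translate(String disc) {
        StringJoiner labels = new StringJoiner(" ");
        for (char c : disc.toCharArray()) {
            labels.add(toHtkLabel(String.valueOf(c)));
        }
        return labels.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("tr{nskrIpS@n")); // t r _{ n s k r I p S _@ n
    }
}
```

Because DISC uses a single character per phoneme, no segmentation step is needed here; IPA input is harder because one phoneme can span several characters.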

    This translator also handles IPA-encoded input, e.g. ˈmʌt.n̩.ˌt͡ʃɔps, translating it into a form that would work for an HTK .dict file, like: m _ʌ t n̩ t͡ʃ _ɔ p s.
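A hypothetical sketch of how IPA input might be segmented to produce output like the example above. The class IpaToHtkSketch and its rules are assumptions inferred from that example, not the library's code: stress marks and syllable boundaries are dropped, Unicode combining marks attach to the preceding segment, and the tie bar (U+0361) joins the characters on either side into one segment. Modifier letters such as ʰ or ː are not attached here, although IsIPADiacritic lists them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: segments IPA input and formats it as HTK labels.
 * Rules are inferred from the documented example, not taken from the library.
 */
public class IpaToHtkSketch {

    static final char PRIMARY_STRESS = '\u02C8';
    static final char SECONDARY_STRESS = '\u02CC';
    static final char SYLLABLE_BOUNDARY = '.';
    static final char TIE_BAR = '\u0361'; // combining double inverted breve

    static boolean isSuprasegmental(char c) {
        return c == PRIMARY_STRESS || c == SECONDARY_STRESS || c == SYLLABLE_BOUNDARY;
    }

    /** Unicode combining marks (category Mn or Me) are treated as diacritics. */
    static boolean isCombining(char c) {
        int type = Character.getType(c);
        return type == Character.NON_SPACING_MARK || type == Character.ENCLOSING_MARK;
    }

    /** Splits a transcription into segments, dropping suprasegmentals. */
    static List<String> segments(String ipa) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ipa.length(); i++) {
            char c = ipa.charAt(i);
            int last = result.size() - 1;
            if (isSuprasegmental(c)) {
                continue; // stress and syllable marks get no HTK label
            } else if (c == TIE_BAR && last >= 0 && i + 1 < ipa.length()) {
                // join previous segment, tie bar and next character (an affricate)
                result.set(last, result.get(last) + c + ipa.charAt(i + 1));
                i++; // the next character has been consumed
            } else if (isCombining(c) && last >= 0) {
                result.set(last, result.get(last) + c); // attach diacritic
            } else {
                result.add(String.valueOf(c));
            }
        }
        return result;
    }

    /** Prefixes labels that don't start with an ASCII letter, then joins with spaces. */
    static String translate(String ipa) {
        List<String> labels = new ArrayList<>();
        for (String segment : segments(ipa)) {
            char first = segment.charAt(0);
            boolean asciiLetter = (first >= 'a' && first <= 'z') || (first >= 'A' && first <= 'Z');
            labels.add(asciiLetter ? segment : "_" + segment);
        }
        return String.join(" ", labels);
    }

    public static void main(String[] args) {
        System.out.println(translate("\u02C8m\u028Ct.n\u0329.\u02CCt\u0361\u0283\u0254ps")); // m _ʌ t n̩ t͡ʃ _ɔ p s
    }
}
```

Treating every Unicode combining mark as a diacritic is broader than the explicit list under IsIPADiacritic; a closer match would look each character up in exactly that list.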

    Mapping
    Source | Destination | Example
    Author:
    Robert Fromont robert@fromont.net.nz
    See Also:
    HTK2DISC
    • Constructor Detail

      • DISC2HTK

        public DISC2HTK()
        Default constructor.
    • Method Detail

      • apply

        public String apply(String source)
        Translates a phonemic transcription from the source encoding to the destination encoding.
        Specified by:
        apply in interface Function<String,String>
        Overrides:
        apply in class PhonemeTranslator
        Parameters:
        source - Phonemic transcription in the source encoding.
        Returns:
        Phonemic transcription in the destination encoding.
      • IsIPADiacritic

        public static boolean IsIPADiacritic(char c)
        Determines whether the given character is a known diacritic in IPA.

        Diacritics include:

        ̩   syllabic
        ̍   syllabic
        ̯   non-syllabic
        ̑   non-syllabic
        ʰ   aspirated
            nasal release
        ̚   no audible release
        ˡ   lateral release
        ᶿ   voiceless dental fricative release
        ˣ   voiceless velar fricative release
            mid central vowel release
        ̥   voiceless
        ̊   voiceless
        ̤   breathy voiced
        ̬   voiced
        ̰   creaky voiced
        ̪   dental
        ̼   linguolabial
        ̻   laminal
        ̺   apical
        ̟   advanced
        ˖   advanced
        ̠   retracted
        ˗   retracted
        ̽   mid-centralized
        ̝   raised
        ˔   raised
        ̞   lowered
        ˕   lowered
        ̹   more rounded
        ̜   less rounded
        ʷ   labialized
        ʲ   palatalized
            labio-palatalized
            labialized without protrusion of the lips or velarization
        ˠ   velarized
        ˤ   pharyngealized
        ̘   advanced tongue root
        ̙   retracted tongue root
        ̃   nasalized
        ː   long
        ˑ   half-long
        ̆   extra-short
        Parameters:
        c - The character.
        Returns:
        true if c is a diacritic, false otherwise
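A minimal sketch of this check, assuming it is a plain membership test over the listed symbols. IpaDiacriticSketch and isIpaDiacritic are hypothetical names, and the escaped code points are an assumed reading of the symbols in the list above; the four entries with no symbol shown are left out.

```java
/**
 * Hypothetical sketch of a diacritic lookup like IsIPADiacritic.
 * Code points are an assumed reading of the documented list.
 */
public class IpaDiacriticSketch {

    private static final String DIACRITICS =
            "\u0329\u030D"                       // syllabic
            + "\u032F\u0311"                     // non-syllabic
            + "\u02B0"                           // aspirated
            + "\u031A\u02E1\u1DBF\u02E3"         // release types
            + "\u0325\u030A\u0324\u032C\u0330"   // voicing and phonation
            + "\u032A\u033C\u033B\u033A"         // dental, linguolabial, laminal, apical
            + "\u031F\u02D6\u0320\u02D7\u033D"   // advanced, retracted, mid-centralized
            + "\u031D\u02D4\u031E\u02D5"         // raised, lowered
            + "\u0339\u031C"                     // more rounded, less rounded
            + "\u02B7\u02B2\u02E0\u02E4"         // labialized, palatalized, velarized, pharyngealized
            + "\u0318\u0319\u0303"               // tongue root, nasalized
            + "\u02D0\u02D1\u0306";              // long, half-long, extra-short

    /** Returns true if c is one of the listed diacritics. */
    public static boolean isIpaDiacritic(char c) {
        return DIACRITICS.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isIpaDiacritic('\u02B0')); // true
        System.out.println(isIpaDiacritic('t'));      // false
    }
}
```

A String.indexOf scan is adequate for about 40 characters; a Set<Character> or a switch statement would work equally well.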
      • IsIPASuprasegmental

        public static boolean IsIPASuprasegmental(char c)
        Determines whether the given character is a known suprasegmental in IPA.

        Suprasegmentals include:

        ˈ   primary stress
        ˌ   secondary stress
            linking
        .   syllable boundary
        Parameters:
        c - The character.
        Returns:
        true if c is a suprasegmental, false otherwise
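A minimal sketch of the suprasegmental check, with a helper that strips these marks from a transcription. IpaSuprasegmentalSketch, isIpaSuprasegmental and strip are hypothetical names; the sketch covers primary stress, secondary stress and the syllable boundary, and leaves out the "linking" entry because its symbol is missing from the list above.

```java
/**
 * Hypothetical sketch of a suprasegmental lookup like IsIPASuprasegmental.
 * Linking is omitted because its symbol is not shown in the documented list.
 */
public class IpaSuprasegmentalSketch {

    // primary stress, secondary stress, syllable boundary
    private static final String SUPRASEGMENTALS = "\u02C8\u02CC.";

    /** Returns true if c is one of the listed suprasegmentals. */
    public static boolean isIpaSuprasegmental(char c) {
        return SUPRASEGMENTALS.indexOf(c) >= 0;
    }

    /** Removes suprasegmentals, e.g. before splitting a transcription into HTK labels. */
    static String strip(String ipa) {
        StringBuilder out = new StringBuilder();
        for (char c : ipa.toCharArray()) {
            if (!isIpaSuprasegmental(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("\u02C8m\u028Ct.n\u0329"));
    }
}
```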