Package nzilbb.encoding
Class DISC2HTK
- java.lang.Object
-
- nzilbb.encoding.PhonemeTranslator
-
- nzilbb.encoding.DISC2HTK
-
- All Implemented Interfaces:
Function<String,String>, UnaryOperator<String>
public class DISC2HTK extends PhonemeTranslator
Translates CELEX-DISC-encoded transcriptions like tr{nskrIpS@n to a form that would work for a Hidden Markov Model Toolkit (HTK) .dict file, like: t r _{ n s k r I p S _@ n. The phonemes are space-delimited, and labels that don't start with an alphabetic character are prefixed with '_' to avoid HTK processing errors.
This translator also handles IPA-encoded input, e.g. ˈmʌt.n̩.ˌt͡ʃɔps, converting it to a form that would work for an HTK .dict file, like: m _ʌ t n̩ t͡ʃ _ɔ p s.
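The DISC rule described above (one label per character, space-delimited, with '_' prefixed to labels that do not start with a letter) can be sketched as follows. This is a minimal illustration, not the library's code: the class DiscToHtkSketch and its methods are hypothetical, it assumes "alphabetic" means an ASCII letter, and it does not attempt the IPA case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the nzilbb implementation.
class DiscToHtkSketch {

  // Assumption: "alphabetic" means an ASCII letter (a-z or A-Z).
  static boolean isAsciiLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  }

  // Treats every character of the DISC string as one phoneme label,
  // prefixing '_' to labels that are not ASCII letters.
  static String translate(String disc) {
    List<String> labels = new ArrayList<>();
    for (char c : disc.toCharArray()) {
      labels.add(isAsciiLetter(c) ? String.valueOf(c) : "_" + c);
    }
    return String.join(" ", labels);
  }

  public static void main(String[] args) {
    // prints "t r _{ n s k r I p S _@ n"
    System.out.println(translate("tr{nskrIpS@n"));
  }
}
```

With the nzilbb library on the classpath, the corresponding call would presumably be new DISC2HTK().apply("tr{nskrIpS@n"), using the default constructor and the apply method documented on this page.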
- Author:
- Robert Fromont robert@fromont.net.nz
- See Also:
HTK2DISC
-
-
Constructor Summary
DISC2HTK()
Default constructor.
-
Method Summary
Modifier and Type, Method, Description:
String apply(String source)
Translates a phonemic transcription from the source encoding to the destination encoding.
static boolean IsIPADiacritic(char c)
Determines whether the given character is a known diacritic in IPA.
static boolean IsIPASuprasegmental(char c)
Determines whether the given character is a known suprasegmental in IPA.
Methods inherited from class nzilbb.encoding.PhonemeTranslator
getDestinationEncoding, getSourceEncoding
-
Method Detail
-
apply
public String apply(String source)
Translates a phonemic transcription from the source encoding to the destination encoding.
-
IsIPADiacritic
public static boolean IsIPADiacritic(char c)
Determines whether the given character is a known diacritic in IPA.
Diacritics include:
- ̩
- syllabic
- ̍
- syllabic
- ̯
- non-syllabic
- ̑
- non-syllabic
- ʰ
- aspirated
- ⁿ
- nasal release
- ̚
- no audible release
- ˡ
- lateral release
- ᶿ
- voiceless dental fricative release
- ˣ
- voiceless velar fricative release
- ᵊ
- mid central vowel release
- ̥
- voiceless
- ̊
- voiceless
- ̤
- breathy voiced
- ̬
- voiced
- ̰
- creaky-voiced
- ̪
- dental
- ̼
- linguolabial
- ̻
- laminal
- ̺
- apical
- ̟
- advanced
- ˖
- advanced
- ̠
- retracted
- ˗
- retracted
- ̽
- mid-centralized
- ̝
- raised
- ˔
- raised
- ̞
- lowered
- ˕
- lowered
- ̹
- more rounded
- ̜
- less rounded
- ʷ
- labialized
- ʲ
- palatalized
- ᶣ
- labio-palatalized
- ᶹ
- labialized without protrusion of the lips or velarization
- ˠ
- velarized
- ˤ
- pharyngealized
- ̘
- advanced tongue root
- ̙
- retracted tongue root
- ̃
- nasalized
- ː
- long
- ˑ
- half-long
- ̆
- extra-short
- Parameters:
c - The character.
- Returns:
- true if c is a diacritic, false otherwise
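A membership test of this kind can be sketched as a lookup against a fixed set of characters. The sketch below is only an illustration under stated assumptions: IpaDiacriticSketch and its methods are hypothetical names, the set covers just five of the diacritics listed above, and stripping diacritics is one possible use, not necessarily what DISC2HTK does with the result.

```java
// Hypothetical stand-in for illustration; not the nzilbb implementation.
class IpaDiacriticSketch {

  // Assumption: only a handful of the listed diacritics are covered here.
  static final String DIACRITICS =
      "\u0329"   // syllabic
    + "\u032F"   // non-syllabic
    + "\u02B0"   // aspirated
    + "\u0303"   // nasalized
    + "\u02D0";  // long

  // True if c is one of the characters in the small set above.
  static boolean isDiacritic(char c) {
    return DIACRITICS.indexOf(c) >= 0;
  }

  // Removes the covered diacritics, leaving only the base symbols.
  static String stripDiacritics(String ipa) {
    StringBuilder base = new StringBuilder();
    for (char c : ipa.toCharArray()) {
      if (!isDiacritic(c)) {
        base.append(c);
      }
    }
    return base.toString();
  }

  public static void main(String[] args) {
    // Aspirated t followed by long i; prints "ti"
    System.out.println(stripDiacritics("t\u02B0i\u02D0"));
  }
}
```

The real method is static, so with the library available it would be called as DISC2HTK.IsIPADiacritic(c) without constructing a translator.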
-
IsIPASuprasegmental
public static boolean IsIPASuprasegmental(char c)
Determines whether the given character is a known suprasegmental in IPA.
Suprasegmentals include:
- ˈ
- primary stress
- ˌ
- secondary stress
- ‿
- linking
- .
- syllable boundary
- Parameters:
c - The character.
- Returns:
- true if c is a suprasegmental, false otherwise
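The four suprasegmentals listed above are all absent from the IPA example output in the class description (m _ʌ t n̩ t͡ʃ _ɔ p s), which suggests they are dropped during translation. A minimal sketch of such a filter, assuming exactly those four characters and using the hypothetical names IpaSuprasegmentalSketch, isSuprasegmental and stripSuprasegmentals:

```java
// Hypothetical stand-in for illustration; not the nzilbb implementation.
class IpaSuprasegmentalSketch {

  // Covers the four suprasegmentals named in the list above.
  static boolean isSuprasegmental(char c) {
    switch (c) {
      case '\u02C8': // primary stress
      case '\u02CC': // secondary stress
      case '\u203F': // linking
      case '.':      // syllable boundary
        return true;
      default:
        return false;
    }
  }

  // Drops suprasegmentals, keeping phoneme symbols and their diacritics.
  static String stripSuprasegmentals(String ipa) {
    StringBuilder kept = new StringBuilder();
    for (char c : ipa.toCharArray()) {
      if (!isSuprasegmental(c)) {
        kept.append(c);
      }
    }
    return kept.toString();
  }

  public static void main(String[] args) {
    // The IPA example from the class description, written with escapes;
    // the stress marks and syllable boundaries are removed from the result.
    String ipa = "\u02C8m\u028Ct.n\u0329.\u02CCt\u0361\u0283\u0254ps";
    System.out.println(stripSuprasegmentals(ipa));
  }
}
```

Splitting the remaining characters into phoneme labels (for example keeping t͡ʃ together) is a separate step that this sketch does not attempt.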