Package nzilbb.encoding
Phonemic and phonetic transcriptions may be expressed using a number of systems, for example:
- Unicode IPA
  - One or more Unicode characters per phoneme, possibly including diacritics, e.g. there'll → ðɛəɹl̩
- CELEX DISC
  - Exactly one ASCII character per phoneme, e.g. there'll → D8r@l
- ARPAbet
  - Phonemes are one or two uppercase ASCII characters, possibly suffixed with a digit indicating stress, e.g. there'll → DH EH1 R AX0 L
- CMU
  - A subset of ARPAbet, which excludes certain phonemes, including AX (schwa), e.g. there'll → DH EH1 R AH0 L
This package includes classes for translating from one phoneme encoding to another.
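As an illustration of the simplest kind of translation, the relationship between ARPAbet and CMU described above can be sketched in a few lines. This is a hypothetical, self-contained example and not the code of any class in this package: the class name `ArpabetToCmuSketch` and its `translate` method are invented for illustration, and it assumes that replacing AX with AH (keeping any stress digit) is the only change needed, which covers the `there'll` example but not every phoneme that CMU excludes.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch only; not part of nzilbb.encoding. */
final class ArpabetToCmuSketch {

  // A phoneme label: uppercase letters, optionally followed by one stress digit.
  private static final Pattern LABEL = Pattern.compile("([A-Z]+)([0-9]?)");

  /** Replaces AX (schwa) with AH, preserving any stress digit. */
  static String translate(String arpabet) {
    StringJoiner cmu = new StringJoiner(" ");
    for (String token : arpabet.trim().split("\\s+")) {
      Matcher m = LABEL.matcher(token);
      if (m.matches() && m.group(1).equals("AX")) {
        cmu.add("AH" + m.group(2)); // AX0 becomes AH0, AX becomes AH
      } else {
        cmu.add(token); // assumption: every other label is left unchanged
      }
    }
    return cmu.toString();
  }

  public static void main(String[] args) {
    System.out.println(translate("DH EH1 R AX0 L")); // prints "DH EH1 R AH0 L"
  }
}
```

Labels are compared whole rather than by prefix, so a label such as AXR is not mistaken for AX.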
The following table presents some common encodings and equivalences or near-equivalences between phonemes.¹
Example | IPA | SAM-PA | DISC² | CPA³ | Kirshenbaum⁴ | ARPAbet | CMU Dict |
---|---|---|---|---|---|---|---|
Vowels | |||||||
kit | ɪ | I | I | I | I | IH | IH |
dress | ɛ | E | E | E | E | EH | EH |
trap | æ | { | { | ^/ | & | AE | AE |
strut | ʌ | V | V | ^ | V | AH | AH |
foot | ʊ | U | U | U | U | UH | UH |
another | ə | @ | @ | @ | @ | AX | |
fleece | iː | i: | i | i: | i: | IY | IY |
bath | ɑː | A: | # | A: | A: | AA | AA |
lot | ɒ | Q | Q | Q | A. | AO | AO |
thought | ɔː | O: | $ | O: | O: | ||
goose | uː | u: | u | u: | u: | UW | UW |
nurse | ɜː | 3: | 3 | @: | V" | ER | ER |
face | eɪ | eI | 1 | e/ | eI | EY | EY |
price | aɪ | aI | 2 | a/ | aI | AY | AY |
choice | ɔɪ | OI | 4 | o/ | OI | OY | OY |
goat | əʊ | @U | 5 | O/ | @U | OW | OW |
mouth | aʊ | aU | 6 | A/ | aU | AW | AW |
near | ɪə | I@ | 7 | I/ | I@ | IY R | IY R |
square | ɛə | E@ | 8 | E/ | E@ | EH R | EH R |
cure | ʊə | U@ | 9 | U/ | U@ | UH R | UH R |
timbre | æ̃ | {~ | c | ^/~ | &~ | ||
détente | ɑ̃ː | A~: | q | A~: | A~: | ||
lingerie | æ̃ː | {~: | 0 | ^/~: | &~: | ||
bouillon | ɒ̃ː | O~: | ~ | O~: | A.~: | ||
Consonants | |||||||
pat | p | p | p | p | p | P | P |
bad | b | b | b | b | b | B | B |
tack | t | t | t | t | t | T | T |
dad | d | d | d | d | d | D | D |
cad | k | k | k | k | k | K | K |
game | g | g | g | g | g | G | G |
bang | ŋ | N | N | N | N | NG | NG |
mad | m | m | m | m | m | M | M |
nat | n | n | n | n | n | N | N |
lad | l | l | l | l | l | L | L |
rat | ɹ | r | r | r | r | R | R |
fat | f | f | f | f | f | F | F |
vat | v | v | v | v | v | V | V |
thin | θ | T | T | T | T | TH | TH |
then | ð | D | D | D | D | DH | DH |
sap | s | s | s | s | s | S | S |
zap | z | z | z | z | z | Z | Z |
sheep | ʃ | S | S | S | S | SH | SH |
measure | ʒ | Z | Z | Z | Z | ZH | ZH |
yank | j | j | j | j | j | Y | Y |
had | h | h | h | h | h | HH | HH |
wet | w | w | w | w | w | W | W |
cheap | ʧ | tS | J | T/ | tS | CH | CH |
jeep | ʤ | dZ | _ | J/ | dZ | JH | JH |
loch | x | x | x | x | x | ||
bacon | ŋ̩ | N, | C | N, | N- | ||
idealism | m̩ | m, | F | m, | m- | ||
burden | n̩ | n, | H | n, | n- | ||
dangle | l̩ | l, | P | l, | l- | ||
car alarm | * | r* | R | r* | |||
uh-oh | ʔ | ? | ? | Q | |||
father | ɚ | | | | | AXR | |
wetter | ɾ | | | | | DX | |
¹ In the table, some phoneme representations are highlighted in bold; this indicates representations that are unpredictable in some way, either because they differ substantially from IPA or from English orthographic convention, or because they differ from the corresponding representation in an otherwise similar set. Others are highlighted in italics; these are representations that use a combination of two phonemes where other sets use only one.
² SAM-PA and DISC phonemes taken from CELEX English Guide (1995) § 2.4.1 pp. 31-32, Tables 3 & 4.
³ The Computer Phonetic Alphabet (CPA) was developed for seven European languages, based on the IPA (Kugler-Kruse, 1987).
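Because DISC uses exactly one ASCII character per phoneme, the DISC and IPA columns of the table can be read as a simple character lookup. The following is a minimal sketch under that assumption, not the package's own DISC2IPA implementation: the class name `DiscToIpaSketch` and its `translate` method are hypothetical, the map covers only the table rows where DISC and IPA differ, and any other character (including the consonants that are identical in both systems) is passed through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not the package's DISC2IPA class. */
final class DiscToIpaSketch {

  // Rows of the table where the DISC symbol differs from the IPA symbol.
  private static final Map<Character, String> DISC_TO_IPA = new HashMap<>();
  static {
    // Vowels and diphthongs
    DISC_TO_IPA.put('I', "ɪ");  DISC_TO_IPA.put('E', "ɛ");
    DISC_TO_IPA.put('{', "æ");  DISC_TO_IPA.put('V', "ʌ");
    DISC_TO_IPA.put('U', "ʊ");  DISC_TO_IPA.put('@', "ə");
    DISC_TO_IPA.put('i', "iː"); DISC_TO_IPA.put('#', "ɑː");
    DISC_TO_IPA.put('Q', "ɒ");  DISC_TO_IPA.put('$', "ɔː");
    DISC_TO_IPA.put('u', "uː"); DISC_TO_IPA.put('3', "ɜː");
    DISC_TO_IPA.put('1', "eɪ"); DISC_TO_IPA.put('2', "aɪ");
    DISC_TO_IPA.put('4', "ɔɪ"); DISC_TO_IPA.put('5', "əʊ");
    DISC_TO_IPA.put('6', "aʊ"); DISC_TO_IPA.put('7', "ɪə");
    DISC_TO_IPA.put('8', "ɛə"); DISC_TO_IPA.put('9', "ʊə");
    // Consonants
    DISC_TO_IPA.put('N', "ŋ");  DISC_TO_IPA.put('r', "ɹ");
    DISC_TO_IPA.put('T', "θ");  DISC_TO_IPA.put('D', "ð");
    DISC_TO_IPA.put('S', "ʃ");  DISC_TO_IPA.put('Z', "ʒ");
    DISC_TO_IPA.put('J', "ʧ");  DISC_TO_IPA.put('_', "ʤ");
  }

  /** Translates one DISC character at a time; unmapped characters pass through. */
  static String translate(String disc) {
    StringBuilder ipa = new StringBuilder();
    for (char c : disc.toCharArray()) {
      String phoneme = DISC_TO_IPA.get(c);
      ipa.append(phoneme != null ? phoneme : String.valueOf(c));
    }
    return ipa.toString();
  }

  public static void main(String[] args) {
    System.out.println(translate("tr{nskrIpS@n")); // prints "tɹænskɹɪpʃən"
  }
}
```

The reverse direction is harder than this sketch suggests, because one IPA phoneme may span several Unicode characters (for example a vowel plus a length mark), so a per-character lookup is not enough there.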
- Author: Robert Fromont robert@fromont.net.nz
Class Summary
Class | Description |
---|---|
ARPAbet2DISC | Translates ARPAbet-encoded phonemic transcriptions like T R AE2 N S K R IH1 P SH AX0 N to CELEX-DISC-encoded transcriptions like tr{nskrIpS@n. |
CMU2DISC | Translates CMU-encoded phonemic transcriptions like T R AE2 N S K R IH1 P SH AH0 N to CELEX-DISC-encoded transcriptions like tr{nskrIpSVn. |
DISC2ARPAbet | Translates CELEX-DISC-encoded transcriptions like tr{nskrIpSVn to ARPAbet-encoded phonemic transcriptions like T R AE N S K R IH P SH AX N. |
DISC2CMU | Translates CELEX-DISC-encoded transcriptions like tr{nskrIpSVn to CMU-encoded phonemic transcriptions like T R AE N S K R IH P SH AH N. |
DISC2HTK | Translates CELEX-DISC-encoded transcriptions like tr{nskrIpS@n to a form that would work for a Hidden Markov Model Toolkit (HTK) .dict file, like: t r _{ n s k r I p S _@ n. |
DISC2IPA | Translates CELEX-DISC-encoded transcriptions like tr{nskrIpS@n to IPA-encoded phonemic transcriptions using Unicode characters like tɹænskɹɪpʃən. |
DISC2Kirshenbaum | Translates CELEX-DISC-encoded transcriptions like tr{nskrIpS@n to Kirshenbaum-encoded phonemic transcriptions like tr&nskrIpS@n. |
DISC2SAMPA | Translates CELEX-DISC-encoded transcriptions like str1n_ to SAMPA-encoded phonemic transcriptions like streIndZ. |
DISC2Unisyn | Translates CELEX-DISC-encoded transcriptions like "tr{n-'skrIp-S@n to Unisyn-encoded phonemic transcriptions like ~ t r a n . * s k r i p . sh @ n. |
DISC2XSAMPA | Translates CELEX-DISC-encoded transcriptions like str1n_ to X-SAMPA-encoded phonemic transcriptions like str\eIndZ. |
HTK2DISC | Translates Hidden Markov Model Toolkit (HTK) dictionary pronunciations like t r _{ n s k r I p S _@ n. |
IPA2DISC | Translates IPA-encoded phonemic transcriptions using Unicode characters like tɹænskɹɪpʃən to CELEX-DISC-encoded transcriptions like tr{nskrIpS@n. |
Kirshenbaum2DISC | Translates Kirshenbaum-encoded phonemic transcriptions like tr&nskrIpS@n to CELEX-DISC-encoded transcriptions like tr{nskrIpS@n. |
PhonemeTranslator | Base class for Functions that convert phonemic transcriptions from one encoding to another. |
SAMPA2DISC | Translates SAMPA-encoded phonemic transcriptions like streIndZ to CELEX-DISC-encoded transcriptions like str1n_. |
Unisyn2DISC | Translates Unisyn-encoded phonemic transcriptions like ~ t r a n . * s k r i p . sh @ n to CELEX-DISC-encoded transcriptions like "tr{n-'skrIp-S@n. |
ValidLabelsDefinitions | Utility functions for generating predefined valid-label definitions using different encodings. |
XSAMPA2DISC | Translates X-SAMPA-encoded phonemic transcriptions like str\eIndZ to CELEX-DISC-encoded transcriptions like str1n_. |
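The ARPAbet and CMU examples in the class summary show two features that a translator into DISC has to handle: labels are separated by spaces, and a trailing stress digit has no single-character counterpart in the plain DISC examples. A minimal sketch of that idea follows. It is not the package's ARPAbet2DISC class: the name `ArpabetToDiscSketch`, the `translate` method, and the choice to simply discard stress digits and to reject unknown labels are all assumptions made for illustration, and the mapping is limited to single-label rows of the phoneme table.

```java
import java.util.Map;

/** Hypothetical sketch only; not the package's ARPAbet2DISC class. */
final class ArpabetToDiscSketch {

  // Single-label ARPAbet/CMU rows of the phoneme table, mapped to DISC.
  private static final Map<String, String> ARPABET_TO_DISC = Map.ofEntries(
      Map.entry("IH", "I"), Map.entry("EH", "E"), Map.entry("AE", "{"),
      Map.entry("AH", "V"), Map.entry("UH", "U"), Map.entry("AX", "@"),
      Map.entry("IY", "i"), Map.entry("AA", "#"), Map.entry("AO", "Q"),
      Map.entry("UW", "u"), Map.entry("ER", "3"), Map.entry("EY", "1"),
      Map.entry("AY", "2"), Map.entry("OY", "4"), Map.entry("OW", "5"),
      Map.entry("AW", "6"),
      Map.entry("P", "p"), Map.entry("B", "b"), Map.entry("T", "t"),
      Map.entry("D", "d"), Map.entry("K", "k"), Map.entry("G", "g"),
      Map.entry("NG", "N"), Map.entry("M", "m"), Map.entry("N", "n"),
      Map.entry("L", "l"), Map.entry("R", "r"), Map.entry("F", "f"),
      Map.entry("V", "v"), Map.entry("TH", "T"), Map.entry("DH", "D"),
      Map.entry("S", "s"), Map.entry("Z", "z"), Map.entry("SH", "S"),
      Map.entry("ZH", "Z"), Map.entry("Y", "j"), Map.entry("HH", "h"),
      Map.entry("W", "w"), Map.entry("CH", "J"), Map.entry("JH", "_"));

  /** Splits on whitespace, drops any trailing stress digit, then looks up each label. */
  static String translate(String arpabet) {
    StringBuilder disc = new StringBuilder();
    for (String token : arpabet.trim().split("\\s+")) {
      if (token.isEmpty()) continue; // empty input yields an empty result
      String label = token.replaceAll("[0-9]$", ""); // assumption: stress is discarded
      String symbol = ARPABET_TO_DISC.get(label);
      if (symbol == null) {
        throw new IllegalArgumentException("Unmapped ARPAbet label: " + token);
      }
      disc.append(symbol);
    }
    return disc.toString();
  }

  public static void main(String[] args) {
    // prints "tr{nskrIpS@n"
    System.out.println(translate("T R AE2 N S K R IH1 P SH AX0 N"));
  }
}
```

With the same table, the CMU form `T R AE2 N S K R IH1 P SH AH0 N` yields `tr{nskrIpSVn`, since CMU uses AH where ARPAbet uses AX.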