Package nzilbb.encoding

Handling for translation between different phoneme encodings.

Phonemic and phonetic transcriptions may be expressed using a number of systems, for example:

Unicode IPA
One or more Unicode characters per phoneme, possibly including diacritics, e.g. there'll → ðɛəɹl̩
CELEX DISC
Exactly one ASCII character per phoneme, e.g. there'll → D8r@l
ARPAbet
Phonemes are one or two uppercase ASCII characters, optionally suffixed with a digit indicating stress, e.g. there'll → DH EH1 R AX0 L
CMU
A subset of ARPAbet that excludes certain phonemes, including AX (schwa), e.g. there'll → DH EH1 R AH0 L
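
The relationship between the ARPAbet and CMU examples above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the package's own implementation: the class name ArpabetToCmuSketch and the method toCmu are hypothetical, and the only substitution handled is the one named above (AX becomes AH, with any stress digit preserved). Other phonemes that CMU excludes are passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of nzilbb.encoding. */
class ArpabetToCmuSketch {

  /** Converts a space-separated ARPAbet transcription to the CMU subset. */
  static String toCmu(String arpabet) {
    List<String> out = new ArrayList<>();
    for (String token : arpabet.trim().split("\\s+")) {
      // Separate the optional trailing stress digit from the phoneme label.
      String stress = "";
      String label = token;
      if (!token.isEmpty() && Character.isDigit(token.charAt(token.length() - 1))) {
        stress = token.substring(token.length() - 1);
        label = token.substring(0, token.length() - 1);
      }
      // Assumption: AX (schwa) is the only excluded phoneme handled here;
      // it is replaced with AH, as in the there'll example.
      if (label.equals("AX")) {
        label = "AH";
      }
      out.add(label + stress);
    }
    return String.join(" ", out);
  }

  public static void main(String[] args) {
    System.out.println(toCmu("DH EH1 R AX0 L")); // prints "DH EH1 R AH0 L"
  }
}
```

Splitting the stress digit from the label first means the substitution does not need a separate rule for each stress level.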

This package includes classes for translating from one phoneme encoding to another.

The following table presents some common encodings and equivalences or near-equivalences between phonemes.¹

Example IPA SAM-PA DISC² CPA³ Kirshenbaum⁴ ARPAbet CMU Dict
Vowels
kit ɪ I I I I IH IH
dress ɛ E E E E EH EH
trap æ { { ^/ & AE AE
strut ʌ V V ^ V AH AH
foot ʊ U U U U UH UH
another ə @ @ @ @ AX  
fleece i: i i: i: IY IY
bath ɑː A: # A: A: AA AA
lot ɒ Q Q Q A. AO AO
thought ɔː O: $ O: O:
goose u: u u: u: UW UW
nurse ɜː 3 @: V" ER ER
face eI 1 e/ eI EY EY
price aI 2 a/ aI AY AY
choice ɔɪ OI 4 o/ OI OY OY
goat əʊ @U 5 O/ @U OW OW
mouth aU 6 A/ aU AW AW
near ɪə I@ 7 I/ I@ IY R IY R
square ɛə E@ 8 E/ E@ EH R EH R
cure ʊə U@ 9 U/ U@ UH R UH R
timbre æ {~ c ^/~ &~    
détente ɑ̃ː A~: q A~: A~:    
lingerie æ̃ː {~: 0 ^/~: &~:    
bouillon ɒ̃ː O~: ~ O~: A.~:    
Consonants
pat p p p p p P P
bad b b b b b B B
tack t t t t t T T
dad d d d d d D D
cad k k k k k K K
game g g g g g G G
bang ŋ N N N N NG NG
mad m m m m m M M
nat n n n n n N N
lad l l l l l L L
rat ɹ r r r r R R
fat f f f f f F F
vat v v v v v V V
thin θ T T T T TH TH
then ð D D D D DH DH
sap s s s s s S S
zap z z z z z Z Z
sheep ʃ S S S S SH SH
measure ʒ Z Z Z Z ZH ZH
yank j j j j j Y Y
had h h h h h HH HH
wet w w w w w W W
cheap ʧ tS J T/ tS CH CH
jeep ʤ dZ _ J/ dZ JH JH
loch x x x x x    
bacon ŋ̩ N, C N, N-    
idealism m, F m, m-    
burden n, H n, n-    
dangle l, P l, l-    
car alarm * r* R r*      
uh-oh ʔ ?     ? Q  
father ɚ         AXR  
wetter ɾ         DX  
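
As a rough illustration of how the table can drive a translation, the sketch below maps DISC to ARPAbet for a handful of rows (kit, dress, trap, pat, bad, tack, cad, bang, nat, thin, then). It relies on the fact that DISC uses exactly one ASCII character per phoneme, so the input can be read one character at a time. The class name DiscToArpabetSketch and the method translate are hypothetical and are not the package's API; the mapping is deliberately partial, and stress digits are not produced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of nzilbb.encoding. */
class DiscToArpabetSketch {

  // Partial mapping copied from the DISC and ARPAbet columns of the table.
  private static final Map<Character, String> DISC_TO_ARPABET = new HashMap<>();
  static {
    DISC_TO_ARPABET.put('I', "IH"); // kit
    DISC_TO_ARPABET.put('E', "EH"); // dress
    DISC_TO_ARPABET.put('{', "AE"); // trap
    DISC_TO_ARPABET.put('p', "P");  // pat
    DISC_TO_ARPABET.put('b', "B");  // bad
    DISC_TO_ARPABET.put('t', "T");  // tack
    DISC_TO_ARPABET.put('k', "K");  // cad
    DISC_TO_ARPABET.put('N', "NG"); // bang
    DISC_TO_ARPABET.put('n', "N");  // nat
    DISC_TO_ARPABET.put('T', "TH"); // thin
    DISC_TO_ARPABET.put('D', "DH"); // then
  }

  /** Translates a DISC string to space-separated ARPAbet labels (no stress). */
  static String translate(String disc) {
    List<String> out = new ArrayList<>();
    // DISC is one character per phoneme, so each char is a complete symbol.
    for (char c : disc.toCharArray()) {
      String arpabet = DISC_TO_ARPABET.get(c);
      if (arpabet == null) {
        throw new IllegalArgumentException("No mapping for DISC symbol: " + c);
      }
      out.add(arpabet);
    }
    return String.join(" ", out);
  }

  public static void main(String[] args) {
    System.out.println(translate("b{N")); // prints "B AE NG"
    System.out.println(translate("kIt")); // prints "K IH T"
  }
}
```

Rejecting unmapped symbols, rather than silently dropping them, makes gaps in a partial mapping visible. A fuller version would also need a policy for DISC symbols that have no ARPAbet equivalent in the table.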

1 In the table, some phoneme representations are highlighted in bold; these are representations that are unpredictable in some way, either because they differ substantially from IPA or from English orthographic convention, or because they differ from the corresponding representation in an otherwise similar set of representations. Others are highlighted in italics; these are representations that use a combination of two phonemes where other sets use only one.

2 SAM-PA and DISC phonemes are taken from the CELEX English Guide (1995) § 2.4.1, pp. 31-32, Tables 3 & 4.

3 The Computer Phonetic Alphabet (CPA) was developed for seven European languages, based on the IPA (Kugler-Kruse, 1987).

Author:
Robert Fromont robert@fromont.net.nz