nzilbb.annotator.phonemetranscoder

Phoneme Transcoder

The Phoneme Transcoder translates word pronunciations from one phoneme encoding system to another.

Phonemic and phonetic transcriptions may be expressed using a number of systems, for example:

Unicode IPA: One or more Unicode characters per phoneme, possibly including diacritics, e.g. there'll → ðɛəɹl̩
CELEX DISC: Exactly one ASCII character per phoneme, e.g. there'll → D8r@l
ARPAbet: One or two uppercase ASCII characters per phoneme, possibly suffixed with a digit indicating stress, e.g. there'll → DH EH1 R AX0 L
CMU: A subset of ARPAbet that excludes certain phonemes, including AX (schwa), e.g. there'll → DH EH1 R AH0 L
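
The difference between these systems is largely one of tokenisation: DISC labels split into phonemes character by character, while ARPAbet and CMU labels are space-delimited symbols with an optional stress digit. The following is a minimal sketch of that idea, not the annotator's own code. The names arpabet_to_cmu and disc_phonemes are hypothetical, and the only substitution applied is the AX → AH one shown in the examples above.

```python
import re

# Assumption: only the AX -> AH substitution from the examples above is applied;
# other ARPAbet symbols that CMU lacks are passed through unchanged.
ARPABET_TO_CMU = {"AX": "AH"}


def arpabet_to_cmu(transcription: str) -> str:
    """Convert a space-delimited ARPAbet string to CMU labels, keeping stress digits."""
    converted = []
    for label in transcription.split():
        # Separate the phoneme symbol from an optional trailing stress digit.
        match = re.fullmatch(r"([A-Z]+)(\d?)", label)
        if match is None:
            raise ValueError(f"Not an ARPAbet label: {label!r}")
        symbol, stress = match.groups()
        converted.append(ARPABET_TO_CMU.get(symbol, symbol) + stress)
    return " ".join(converted)


def disc_phonemes(transcription: str) -> list[str]:
    """Split a DISC string into phonemes: exactly one character each."""
    return list(transcription)


print(arpabet_to_cmu("DH EH1 R AX0 L"))  # DH EH1 R AH0 L
print(disc_phonemes("D8r@l"))            # ['D', '8', 'r', '@', 'l']
```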

The annotator supports a number of predetermined conversions between known encodings. Alternatively, a custom mapping between label characters can be specified, e.g. for languages where the orthography maps transparently to the phonology.
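
A custom character mapping of this kind can be pictured as a simple per-character lookup. The sketch below is illustrative only: the mapping is invented for an imagined language with one-to-one spelling, the function name transcode_characters is hypothetical, and passing unmapped characters through unchanged is an assumption rather than documented annotator behaviour.

```python
# Hypothetical mapping from orthographic characters to DISC labels;
# it is not one of the mappings provided by the annotator.
CUSTOM_MAPPING = {"a": "#", "e": "E", "i": "i", "o": "$", "u": "u", "c": "k"}


def transcode_characters(label: str, mapping: dict[str, str]) -> str:
    """Replace each character using the mapping.

    Assumption: characters with no entry are copied through unchanged.
    """
    return "".join(mapping.get(ch, ch) for ch in label)


print(transcode_characters("cami", CUSTOM_MAPPING))  # k#mi
```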

The following table presents some common encodings and equivalences or near-equivalences between phonemes. ¹

Example	IPA	SAM-PA	DISC²	CPA³	Kirshenbaum⁴	ARPAbet	CMU Dict
Vowels
kit	ɪ	I	I	I	I	IH	IH
dress	ɛ	E	E	E	E	EH	EH
trap	æ	{	{	^/	&	AE	AE
strut	ʌ	V	V	^	V	AH	AH
foot	ʊ	U	U	U	U	UH	UH
another	ə	@	@	@	@	AX
fleece	iː	i:	i	i:	i:	IY	IY
bath	ɑː	A:	#	A:	A:	AA	AA
lot	ɒ	Q	Q	Q	A.	AO	AO
thought	ɔː	O:	$	O:	O:	AO	AO
goose	uː	u:	u	u:	u:	UW	UW
nurse	ɜː	3:	3	@:	V"	ER	ER
face	eɪ	eI	1	e/	eI	EY	EY
price	aɪ	aI	2	a/	aI	AY	AY
choice	ɔɪ	OI	4	o/	OI	OY	OY
goat	əʊ	@U	5	O/	@U	OW	OW
mouth	aʊ	aU	6	A/	aU	AW	AW
near	ɪə	I@	7	I/	I@	IY R	IY R
square	ɛə	E@	8	E/	E@	EH R	EH R
cure	ʊə	U@	9	U/	U@	UH R	UH R
timbre	æ̃	{~	c	^/~	&~
détente	ɑ̃ː	A~:	q	A~:	A~:
lingerie	æ̃ː	{~:	0	^/~:	&~:
bouillon	ɒ̃ː	O~:	~	O~:	A.~:
Consonants
pat	p	p	p	p	p	P	P
bad	b	b	b	b	b	B	B
tack	t	t	t	t	t	T	T
dad	d	d	d	d	d	D	D
cad	k	k	k	k	k	K	K
game	g	g	g	g	g	G	G
bang	ŋ	N	N	N	N	NG	NG
mad	m	m	m	m	m	M	M
nat	n	n	n	n	n	N	N
lad	l	l	l	l	l	L	L
rat	r	r	r	r	r	R	R
fat	f	f	f	f	f	F	F
vat	v	v	v	v	v	V	V
thin	θ	T	T	T	T	TH	TH
then	ð	D	D	D	D	DH	DH
sap	s	s	s	s	s	S	S
zap	z	z	z	z	z	Z	Z
sheep	ʃ	S	S	S	S	SH	SH
measure	ʒ	Z	Z	Z	Z	ZH	ZH
yank	j	j	j	j	j	Y	Y
had	h	h	h	h	h	HH	HH
wet	w	w	w	w	w	W	W
cheap	ʧ	tS	J	T/	tS	CH	CH
jeep	ʤ	dZ	_	J/	dZ	JH	JH
loch	x	x	x	x	x
bacon	ŋ̩	N,	C	N,	N-
idealism	m̩	m,	F	m,	m-
burden	n̩	n,	H	n,	n-
dangle	l̩	l,	P	l,	l-
car alarm	*	r*	R	r*
uh-oh	ʔ	?			?	Q
father	ɚ					AXR
wetter	ɾ					DX
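
Because DISC uses exactly one character per phoneme, a conversion from DISC to ARPAbet can be sketched as a table lookup per character. The example below copies only a handful of rows from the table above and produces no stress digits; the name disc_to_arpabet is hypothetical, and this is an illustration of the lookup idea, not the annotator's implementation.

```python
# A few DISC -> ARPAbet pairs copied from the table above (not the full set).
DISC_TO_ARPABET = {
    "I": "IH", "E": "EH", "{": "AE", "V": "AH", "@": "AX",
    "i": "IY", "u": "UW", "1": "EY", "2": "AY", "8": "EH R",
    "p": "P", "t": "T", "k": "K", "n": "N", "l": "L", "r": "R",
    "T": "TH", "D": "DH", "S": "SH", "J": "CH", "_": "JH",
}


def disc_to_arpabet(transcription: str) -> str:
    """Look up each DISC character and join the ARPAbet labels with spaces."""
    labels = []
    for ch in transcription:
        if ch not in DISC_TO_ARPABET:
            # This sketch covers only part of the table, so fail loudly.
            raise KeyError(f"No ARPAbet equivalent listed for DISC {ch!r}")
        labels.append(DISC_TO_ARPABET[ch])
    return " ".join(labels)


print(disc_to_arpabet("Jip"))  # CH IY P  (cheap)
print(disc_to_arpabet("TIn"))  # TH IH N  (thin)
```

Diphthongs such as 8 expand to two ARPAbet labels (EH R), so this naive lookup turns D8r@l into DH EH R R AX L, with a doubled R. Handling such cases, and assigning stress digits, is left out of the sketch.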

¹ In the table, a bold typeface marks representations that are unpredictable in some way: they differ substantially from IPA or from English orthographical convention, or from the corresponding representation in an otherwise similar set of representations. An italic typeface marks representations that combine two phonemes where other sets use a single phoneme.

² SAM-PA and DISC phonemes are taken from the CELEX English Guide (1995), § 2.4.1, pp. 31-32, Tables 3 & 4.

³ The Computer Phonetic Alphabet (CPA) was developed for seven European languages, based on the IPA; see Kugler-Kruse (1987).

⁴ http://en.wikipedia.org/wiki/Kirshenbaum