P is for Phonotactics

29 10 2017


Why is baseball called be-su-bu-ro in Japanese? Why do most learners say clothiz and not clothes? Why am I called Escott by Spanish speakers and Arabic speakers alike? Why can we say /gz/ when it is in the middle of a word (exam) and at the end of a word (dogs) but not at the beginning? (Check a dictionary if you are in any doubt.) Why are clash and crash recognizably English words but cnash is not? Is it because it’s hard to say? Well, not if you can say knish, which – if you live in New York, and like to eat them – you regularly do. It’s not that we can’t say cnash, cfash or cpash – we just don’t.

Why? The answer, of course, is to be found in phonotactics, i.e. the study of the sound combinations that are permissible in any given language. (Important note: we are talking about sound combinations – not letter combinations – this is not about spelling.) In Japanese, syllables are limited to a single consonant plus vowel construction (CV), with strong constraints on whether another consonant can be added (CVC). Hence be-su-bu-ro for baseball. And bat-to for bat, and su-to-rai-ku for strike (Zsiga 2006). As for Escott: Spanish does not allow words to begin with /s/ plus another consonant – hence the insertion of word-initial /ɛ/, which gives *Escott (like escuela, estado, etc) – a process called epenthesis. (Epenthesis accounts for the extra vowel English speakers insert in certain regular past tense combinations: liked, loved, but wanted.)
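The Spanish pattern just described can be written as a simple repair rule. What follows is a minimal sketch, not a full model of Spanish phonotactics: the function name `repair_initial_s_cluster`, the simplified vowel set and the rough phoneme lists are all assumptions made for illustration.

```python
# Toy model of word-initial epenthesis: a language that bans /s/ + consonant
# at the start of a word repairs the cluster by inserting a vowel before it.
# The vowel set and the transcriptions are simplified assumptions.

VOWELS = {"a", "e", "i", "o", "u", "ɛ", "ɒ", "ə"}


def repair_initial_s_cluster(phonemes, epenthetic_vowel="ɛ"):
    """Return a copy of `phonemes`, with a vowel prepended if the word
    begins with /s/ followed by another consonant."""
    starts_with_s_cluster = (
        len(phonemes) >= 2
        and phonemes[0] == "s"
        and phonemes[1] not in VOWELS
    )
    if starts_with_s_cluster:
        return [epenthetic_vowel] + list(phonemes)
    return list(phonemes)


print(repair_initial_s_cluster(["s", "k", "ɒ", "t"]))  # Scott: cluster repaired
print(repair_initial_s_cluster(["s", "ɒ", "k"]))       # sock: left unchanged
```

Note that the rule operates on sounds, not letters, which is why the input is a list of phoneme symbols rather than a spelled word.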


Shmuck with knish

English allows for many more consonant clusters than, say, Japanese or Hawaiian (with only 13 phonemes in all), but nothing like as many as some languages, such as Russian. According to O’Connor (1973, p. 231) ‘there are 289 initial consonant clusters in Russian as compared with 50 in English.’ English almost makes up for this by allowing many more word-final clusters (think, for example, of sixth and glimpsed – CVCCC and CCVCCCC, respectively) but Russian still has the edge (142 to 130). Of course, these figures don’t exhaust the possibilities that are available in each language: there are 24 consonant sounds in English, so, theoretically, there are 24² (576) two-consonant combinations, and 24³ (13,824) three-consonant combinations. But we use only a tiny fraction of them. And some combinations are only found in borrowings from other languages, like knish and shmuck. (Theoretically, as O’Connor points out, ‘it is possible to imagine two different languages with the same inventory of phonemes but whose phonemes combine together in quite different ways’ [p. 229]. In which case, a phonemic chart on the classroom wall would be of much less use than a chart of all the combinations.)
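The arithmetic behind those theoretical totals is easy to check by brute force. The sketch below assumes the 24-consonant inventory usually given for English (descriptions differ slightly) and simply counts every ordered sequence of two and of three consonants, repeats included; the names `CONSONANTS` and `count_sequences` are illustrative only.

```python
from itertools import product

# The 24 consonant phonemes usually listed for English (an assumption:
# inventories differ slightly from one description to another).
CONSONANTS = [
    "p", "b", "t", "d", "k", "g", "tʃ", "dʒ",
    "f", "v", "θ", "ð", "s", "z", "ʃ", "ʒ",
    "h", "m", "n", "ŋ", "l", "r", "j", "w",
]


def count_sequences(inventory, length):
    """Count every ordered sequence of `length` items, repeats allowed."""
    return sum(1 for _ in product(inventory, repeat=length))


two_consonant = count_sequences(CONSONANTS, 2)    # same as 24 ** 2
three_consonant = count_sequences(CONSONANTS, 3)  # same as 24 ** 3
print(two_consonant, three_consonant)
```

Set against totals of this size, O’Connor’s figure of 50 attested initial clusters is indeed a tiny fraction.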

Likewise, there is no theoretical limit as to which consonants can appear at the beginning of a syllable or at the end of it. But, ‘whereas in English all but the consonants /h, ŋ, j and w/ may occur both initially and finally in CVC syllables, i.e. 20 out of the total 24, in Cantonese only 6 out of a total of 20 occur in both positions, since only /p, t, k, m, n, ŋ/ occur in final position, the remainder being confined to initial position’ (O’Connor, p. 232).
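O’Connor’s ‘20 out of 24’ for English can be reproduced with a couple of set operations. This is only a sketch: the quotation says that /h, ŋ, j, w/ are excluded, and the split below into sounds that occur only initially and sounds that occur only finally is an added assumption, as are the variable names.

```python
# The 24 English consonants, as a set (inventory assumed, as in most textbooks).
ALL_CONSONANTS = {
    "p", "b", "t", "d", "k", "g", "tʃ", "dʒ",
    "f", "v", "θ", "ð", "s", "z", "ʃ", "ʒ",
    "h", "m", "n", "ŋ", "l", "r", "j", "w",
}

# Assumed positional restrictions for CVC syllables.
INITIAL_ONLY = {"h", "j", "w"}  # never syllable-final
FINAL_ONLY = {"ŋ"}              # never syllable-initial

can_be_initial = ALL_CONSONANTS - FINAL_ONLY
can_be_final = ALL_CONSONANTS - INITIAL_ONLY
both_positions = can_be_initial & can_be_final

print(len(both_positions), "of", len(ALL_CONSONANTS))
```

The same comparison for Cantonese would need its full consonant inventory, which the quotation does not list, so it is not attempted here.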

It’s this kind of information that is often missing from comparisons of different languages. This was driven home recently as I reviewed a case study assignment that my MA students have been doing, in which they were asked to analyze the pronunciation difficulties of a learner of their choice. What often puzzles them is that the learner might produce a sound correctly in one word, but not in another – in some cases, even leaving it out completely. The answer, of course, is not in phonemics, but in phonotactics: it’s all about where the sound is, and in what combinations. And it is perhaps just as significant a cause of L1 interference as are phonemic differences. Yet, apart from mentions of consonant clusters, there are few if any references to phonotactics in the pedagogical literature. (In The New A-Z of ELT, phonotactics gets a mention in the entry on consonant clusters, but – note to self! – phonotactics is not just about consonants: it also deals with vowel sequences, and which vowels habitually follow which consonants.)

Phonotactics is also of interest to researchers into language acquisition, since our sensitivity to what sound sequences are permissible in our first language seems to become entrenched at a very early age.  Ellis (2002, p. 149), for example, quotes research that showed ‘that 8-month-old infants exposed for only 2 minutes to unbroken strings of nonsense syllables (e.g., bidakupado) are able to detect the difference between three-syllable sequences that appeared as a unit and sequences that also appeared in their learning set but in random order. These infants achieved this learning on the basis of statistical analysis of phonotactic sequence data, right at the age when their caregivers start to notice systematic evidence of their recognising words.’
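One common way of modelling the kind of ‘statistical analysis’ mentioned in that quotation is with transitional probabilities: how likely is syllable B, given that syllable A has just been heard? The sketch below is a toy illustration under stated assumptions, not a reconstruction of the original experiment. The three nonsense words, the way the stream is built and the function names are all hypothetical.

```python
from collections import Counter
from itertools import permutations

# Three made-up trisyllabic "words" (hypothetical stimuli).
WORDS = [("bi", "da", "ku"), ("pa", "do", "ti"), ("go", "la", "bu")]


def build_stream(words):
    """Concatenate the words in every possible order, giving one unbroken
    syllable stream in which word order varies but word-internal order
    never does."""
    stream = []
    for order in permutations(words):
        for word in order:
            stream.extend(word)
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {
        (first, second): count / first_counts[first]
        for (first, second), count in pair_counts.items()
    }


stream = build_stream(WORDS)
tps = transitional_probabilities(stream)

print("within a word   bi -> da:", tps[("bi", "da")])
print("across a border ku -> pa:", tps[("ku", "pa")])
```

Inside a word each syllable predicts the next one perfectly, whereas at a word boundary the next syllable is much less predictable. A learner tracking these probabilities has, in principle, enough information to find the ‘units’ in an unbroken stream.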



Such findings lend support to usage-based theories of language acquisition (e.g. Christiansen and Chater 2016), where sequence processing and learning – not just of sounds but also of lexical and grammatical items – may be the mechanism that drives acquisition. It seems we are genetically programmed to recognize and internalize complex sequences: there is neurobiological evidence, for example, that shows considerable overlap of the mechanisms involved in language learning and the learning of other kinds of sequences, such as musical tunes.  As Ellis (op.cit.), summarizing the evidence, concludes, ‘much of language learning is the gradual strengthening of associations between co-occurring elements of the language and… fluent language performance is the exploitation of this probabilistic knowledge’ (p.173). What starts as phonotactics ends up as collocation, morphology and syntax.


Christiansen, M.H. & Chater, N. (2016) Creating language: integrating evolution, acquisition, and processing. Cambridge, Mass.: MIT Press.

Ellis, N.C. (2002) ‘Frequency effects in language processing: a review with implications for theories of implicit and explicit language acquisition.’ Studies in SLA, 24/2.

O’Connor, J.D. (1973) Phonetics. Harmondsworth: Penguin.

Zsiga, E. (2006) ‘The sounds of language,’ in Fasold, R.W. & Connor-Linton, J. (eds) An introduction to language and linguistics. Cambridge: Cambridge University Press.


M is for Minimal pairs

8 10 2017

The story of the Australian pig farmer whose livestock were decimated by floods has been circulating on the Internet recently. A reporter misheard him say that ‘Thirty thousand pigs were floating down the river’, and reported it as such. In fact, what he had said was: ‘Thirty sows and pigs…’.  A nice example of how a minimal pair mistake can cause problems even among native speakers.

Just to remind you, here’s how minimal pairs are defined in The New A-Z of ELT:

A minimal pair is a pair of words which differ in meaning when only one sound (one phoneme) is changed. Pair and bear are minimal pairs, since their difference in meaning depends on the different pronunciation of their first sound: p versus b. However, pair and pear are not minimal pairs, since, although they differ in meaning, they are pronounced the same. Minimal pairs are widely used in pronunciation teaching to help learners discriminate between sound contrasts, particularly those that don’t exist in their L1, for the purposes of both recognition and production.
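The definition can be turned into a small test. The sketch below assumes words are given as lists of phoneme symbols (the RP-style transcriptions are my own), and it takes the narrow reading of ‘one sound is changed’: same number of sounds, exactly one of them different. The function name `is_minimal_pair` is hypothetical.

```python
def is_minimal_pair(word_a, word_b):
    """True if the two phoneme sequences have the same length and differ
    in exactly one position."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1


# Simplified RP-style transcriptions (assumed for illustration).
PAIR = ["p", "eə"]
BEAR = ["b", "eə"]
PEAR = ["p", "eə"]
SHIP = ["ʃ", "ɪ", "p"]
SHEEP = ["ʃ", "iː", "p"]

print(is_minimal_pair(PAIR, BEAR))   # differ only in the first sound
print(is_minimal_pair(PAIR, PEAR))   # homophones: no difference in sound
print(is_minimal_pair(SHIP, SHEEP))  # differ only in the vowel
```

Because the comparison is made on sounds, pair and pear come out as identical, however differently they are spelled.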

On the MA course I teach for The New School, I set the students a task in which they describe how they might exploit this kind of minimal pairs activity (from Baker 2006):

ship or sheep 2006

Here’s my feedback on the task:

As I suggest, such activities may have limited usefulness. Indeed, does anyone still do them?


Baker, A. (2006) Ship or sheep? (2nd edn). Cambridge: Cambridge University Press.


A is for Accent

1 10 2017

‘Living is easy with eyes closed’, David Trueba’s 2013 movie, which I watched again on TV this week, is interwoven with references to language and language teaching. It is based on the true story of a high-school English teacher in Spain who, in 1966, manages to infiltrate himself on to the set of ‘How I won the War’, which was being filmed in a remote part of Almería, and persuade John Lennon to include the printed lyrics of songs in subsequent Beatles albums.

Apart from the teacher’s inspired use of Beatles lyrics to imbue his students with a feel for English, the film touches on other language issues too. At one point the teacher comments on the broadness of the accent of an elderly villager, who retorts, ‘No, I don’t have an accent. It’s them from Málaga and Cádiz who have the really broad accents.’

The perception that only other people have accents is, of course, a common one. So, too, is the view that some accents are ‘neutral’ or ‘slight’ or ‘faint’ – whereas others are ‘thick’ or ‘broad’ or ‘strong’. What this really means is that any given speaker’s pronunciation displays features that are either nearer to, or further from, the accent that the interlocutor is most familiar with. This could be the local one (as in the case of the man from Almería), or, more typically these days, the ‘standard’, where ‘standard’ is defined as ‘the variety that is normally spoken by educated people and used in news broadcasts and other similar situations’ (Carter, 1995, p. 145).

Significantly, the adjectives that most commonly co-occur with accent (according to the Corpus of Contemporary American English [Davies 2008-], and excluding for the moment names of languages – like French, Russian etc) are: thick, heavy, foreign, slight, strong, soft, faint, fake, lilting, native, clipped, funny, strange, different, good, charming and sexy.  Notice how value-laden many of these adjectives are. This fact serves to remind us that – for the ‘person in the street’ at least – there is no such thing as a ‘neutral’ accent, in the sense of an accent that is value-free.

This was driven home this week by the appearance of a video on the BBC website in which a young Polish woman living in the UK is reduced, literally, to tears by the negative reaction her accent supposedly evokes among Britons – an accent that is hardly thick, heavy or funny, incidentally. Accordingly, she enlists the services of an elocution teacher, who promises to rid her of her accent once and for all. (The teacher’s exaggerated RP vowels and her manner of drilling them are reminiscent of Professor Henry Higgins in Shaw’s Pygmalion, and the way he successfully erases the Cockney accent of Eliza Doolittle, and, in so doing, effectively erases her identity.)


Rex Harrison as Henry Higgins, Audrey Hepburn as Eliza Doolittle in the film of the musical ‘My Fair Lady’


What the Polish woman is seeking is what is marketed as ‘accent reduction’, which, as Jennifer Jenkins (2000, p. 208) points out, is predicated on a misunderstanding of what second language acquisition means, i.e. not subtraction, but addition: ‘An L2 accent is not an accent reduced but an accent gained: a facility which increases learners’ choices by expanding their phonological repertoires.’ And she adds, ‘Interestingly, we never hear references to “grammar reduction” or “vocabulary reduction”. No writer of L2 pedagogic grammars or vocabulary courses would entertain the notion that learners need to reduce their L1 grammar or vocabulary in order to acquire the L2.’

Of course, such arguments will probably not appease the Polish woman who desperately wants to achieve a kind of social invisibility. Nevertheless, they serve to remind us that our choices – as teachers, curriculum designers and materials writers – have a strong ethical component, as Bill Johnston (2003, pp 39-40) argues:

It is commonly known in our field that the English language includes a bewildering diversity of varieties, especially accents… The problem in the field of ELT is to know which of these varieties to teach. My contention that this decision is moral in nature– that is, that it is grounded in values — stems from the fact that… language varieties themselves are not value neutral. Quite the opposite, in fact is true: the different varieties of English are highly value laden. Accents are closely linked to the identities of  individuals and groups of people; to value one accent over another is, rather directly, to value one group of people over another.

Accent and identity are inextricably interconnected. I wonder if ‘accent reduction’ courses would be quite as popular if they were re-branded as ‘identity reduction’ courses?


Carter, R. (1995) Key Words in Language and Literacy. London: Routledge.

Davies, Mark. (2008-) The Corpus of Contemporary American English (COCA): 520 million words, 1990-present. Available online at https://corpus.byu.edu/coca/

Jenkins, J. (2000) The Phonology of English as an International Language. Oxford: Oxford University Press.

Johnston, B. (2003) Values in English Language Teaching. Mahwah, NJ: Lawrence Erlbaum.


P is for Phoneme

17 03 2013

Is the phoneme dead?

We’ve been doing a unit on phonology, and my doubts about the phoneme are partly a reflection of my students’ own difficulties with the concept.  Not surprisingly, I’ve been having to tease out the difference between phonemic symbols and phonetic symbols, and even between phonology and phonics.

But all the time I’ve been dreading the day when someone challenges this definition (from An A to Z):

‘A phoneme is one of the distinctive sounds of a particular language. That is to say, it is not any sound, but it is a sound that, to speakers of the language, cannot be replaced with another sound without causing a change in meaning’.

The definition has an authoritative ring to it, not least because it simply re-states what by many is considered a founding principle of all linguistics. Listen to Jakobson (1990: 230) who practically bellows the fact: ‘The linguistic value … of any phoneme in any language whatever, is only its power to distinguish the word containing this phoneme from any words which, similar in all other respects, contain some other phoneme’ (emphasis in original).

How is it, then, that we regularly teach that the ‘s’ at the end of cats is a different phoneme than the ‘s’ at the end of dogs? If different phonemes flag different meanings, what change of meaning is represented in the difference between /s/ and /z/? Or, for that matter, between final /t/ and final /d/, as in chased and killed? If there is no difference in meaning (since /s/ and /z/ both index plurality, and /t/ and /d/ both index past tense), aren’t they simply different ways of pronouncing the same phoneme?

Phonemes, after all, are not phones, i.e. sounds. Acoustically speaking there are many different ways – even for a single speaker – of realizing a specific phoneme. This is why Daniel Jones (1950: 7) defined phonemes as ‘small families of sounds, each family consisting of an important sound of the language together with other related sounds’ (my emphasis). These related sounds are the different allophones of the phoneme.

Hence the analogy with chess pieces: the way individual chess pieces are designed will vary from set to set, but they will always bear certain family resemblances, bishops all having mitres, and knights having horse heads, etc. More important than their form (and one reason that this analogy seems to work so well),  is the relationship that they have with one another, including the ‘rules’ that constrain the way that they may behave. Bishops can’t do what knights do, nor go where knights go, and vice versa.

Phonemes – like chess pieces – are defined in relation to one another. As Bloomfield (1935: 81) put it, ‘the phoneme is kept distinct from all other phonemes of its language. Thus, we speak the vowel of a word like pen in a great many ways, but not in any way that belongs to the vowel of pin, and not in any way that belongs to the vowel of pan: the three types are kept rigidly apart.’

In fact, a purely structuralist argument would say it’s not actually about meaning at all, it’s about ‘complementary distribution’, or, as Jones (1950: 132) puts it (also bellowing): ‘NO ONE MEMBER EVER OCCURS IN A WORD IN THE SAME PHONETIC CONTEXT AS ANY OTHER MEMBER’. That is to say, the /s/ at the end of cats and the /z/ at the end of dogs never occur where the other occurs, and vice versa. But is this true? What happens to the /z/ at the end of dogs in the sentence: The dogs seem restless? Hasn’t it become /s/?

Ah, yes, you say – but sounds in connected speech are influenced by their environment, blending with or accommodating to the sounds around them. The true test for a phoneme is if it distinguishes isolated words, like pin and pen – those infamous minimal pairs. But when are words ever isolated? When does the phonetic environment not have an effect?  And isn’t the voiced /z/ at the end of dogs, and the unvoiced /s/ at the end of cats also an effect of the phonetic environment? That is to say, where does connected speech start becoming connected if not at the juxtaposition of two sounds?
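The point that the choice between /s/ and /z/ is itself an effect of the phonetic environment can be made concrete. Here is a minimal sketch of the textbook rule for the regular plural ending, which looks only at the last sound of the noun. The sound classes are simplified assumptions and the function name `plural_suffix` is hypothetical.

```python
# Simplified sound classes (assumed for illustration).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}


def plural_suffix(final_sound):
    """Choose the regular plural ending from the noun's final sound:
    an extra vowel after sibilants, /s/ after other voiceless sounds,
    /z/ everywhere else (voiced consonants and vowels)."""
    if final_sound in SIBILANTS:
        return "ɪz"
    if final_sound in VOICELESS_NON_SIBILANTS:
        return "s"
    return "z"


for noun, final_sound in [("cat", "t"), ("dog", "g"), ("bus", "s"), ("bee", "iː")]:
    print(noun, "->", plural_suffix(final_sound))
```

Nothing in the rule refers to meaning: the ending is fully predictable from the sound that precedes it, which is exactly why it is awkward to treat the /s/ of cats and the /z/ of dogs as a meaning-bearing contrast.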

It gets even trickier when we consider weak forms. There are at least two different ways of saying can, as in I can dance: I /kæn/ dance, or I /kən/ dance. Both are possible, even where the stress remains on dance. The latter is simply more reduced. But the meaning is unchanged. [kæn] and [kən] are not minimal pairs. They are different phonetic realizations of the same word (hence the square brackets). Phonetic. Not phonemic. Shouldn’t, therefore, they both be transcribed as /kæn/?

In researching this, I’ve encountered a lot of debate as to whether the concept of the phoneme has any currency at all any more. As one scholar puts it, ‘the phoneme, to all appearances, no longer holds a central place in phonological theory’ (Dresher 2011: 241). The problem seems to boil down to one of identification: is the phoneme a physical thing that can be objectively described, or is it psychological – a mental representation independent of the nature of the acoustic signal?

The answer to the first question (is it physical?) seems to be no: there are no ‘distinctive features’ or family resemblances (such as voicing or lip-rounding) that unequivocally categorize sounds as belonging to one phoneme family and not another.

On the other hand, there is some evidence, including neurological, that the phoneme does have a psychological reality, and that speakers of languages that share the same sounds will perceive these sounds differently, according to whether they flag meaning differences or not. (This is analogous to the idea that if your language does not distinguish between blue and green, you will see both blue and green as being shades of the same colour).  This, in turn, is consistent with Jakobson’s claim that ‘if we compare any two particular languages, we will see that from an acoustic and motor point of view their sounds could be identical, while the way they are grouped into phonemes is different’ (p. 223).

It’s not for nothing, therefore, that the concept of the phoneme has given us the very valuable distinction between emic and etic, i.e. the perspective of the insider vs that of the outsider. Phonemes capture something that we, the insiders, intuit about language, even if their objective reality is elusive. We know that pronunciation impacts on meaning, even if we don’t quite know how.

Perhaps Jakobson (op. cit. 230) had good reason to claim, therefore, that ‘the phoneme functions, ergo it exists’.


Bloomfield, L. (1935) Language, London: George Allen & Unwin.

Dresher, E. (2011) ‘The Phoneme’, in van Oostendorp, M., Ewen, C.J., Hume, E., & Rice, K. (eds) The Blackwell Companion to Phonology, Oxford: Blackwell.

Jakobson, R. (1990) On Language, edited by Waugh, L.R. & Monville-Burston, M., Cambridge, Mass: Harvard University Press.

Jones, D. (1950) The Phoneme: Its nature and use, Cambridge: W. Heffer & Sons.

Illustrations from the very clever phonemic chart that comes with English File (Oxenden, C. and Seligson, P., 1996, Oxford University Press).

A is for Accommodation

6 01 2013

You may well have seen this YouTube clip a month or so ago: British footballer Joey Barton is interviewed in France not long after having debuted for the Marseille football club.  Much commented upon – and mocked – was his thick French accent, despite his being a native speaker of English and speaking little or no French. The Daily Mail, for example, described it as ‘an embarrassing display’ and ‘a comedy French accent’. Judge for yourself…

What Barton of course was doing (although neither he nor the Daily Mail named it as such) was accommodating his accent to that of his audience. Accommodation, as Robin Walker (2010: 97) reminds us, is ‘the ability to adjust your speech and aspects of spoken communication so that they become more (or less) like that of your interlocutors’.  David Crystal (2003: 6) adds that, ‘among the reasons why people converge towards the speech pattern of their listener are the desires to identify more closely with the listener, to win social approval, or simply to increase the communicative efficiency of the interaction’.

Winning social approval may well have motivated Barton, a newcomer to the region, to assume a French accent. But more important still was the need to be intelligible: in his defence he had said that ‘it is very difficult to do a press conference in Scouse for a room full of French journalists. The alternative is to speak like a ‘Allo Allo!’ character’.

Whatever the reason, Barton’s much-publicized accommodation is a good, if extreme, example of what most of us tend to do naturally and instinctively, and not just at the level of accent.  Jenny Jenkins (2000: 169) identifies a wide range of linguistic and prosodic features that are subject to convergence between speakers, ‘such as speech rate, pauses, utterance length, pronunciation and… non-vocal features such as smiling and gaze’.

And, as Richardson et al. (2008: 75) note, ‘conversational partners do not limit their behavioural coordination to speech. They spontaneously move in synchrony with each other’s speech rhythms’, a finding which is likened to the ‘synchrony, swing, and coordination’ displayed by members of a jazz band. The researchers tracked the posture and gaze position of conversants to show that this coordination is not simply a byproduct of the interaction, but the physical embodiment of the speakers’ cognitive alignment – ‘an intimate temporal coupling between conversants’ (p. 88) or (in T.S. Eliot’s words) ‘the whole consort dancing together’.

Arguably, accommodation occurs not only at the paralinguistic level, but at the linguistic one too. As we speak, for example, we are continuously monitoring our interlocutor’s degree of understanding, and adjusting our message accordingly. This is especially obvious in the way we talk to children and non-native speakers, forms of talk called  ‘caretaker talk’ and ‘foreigner talk’, respectively. Both varieties are characterized by considerable simplification, although there are significant differences. Caretaker talk is often pitched higher and is slower than talk used with adults, but, while simpler, is nearly always grammatically well-formed. Foreigner talk, on the other hand, tolerates greater use of non-grammatical, pidgin-like forms, as in ‘me wait you here’, or ‘you like drink much, no?’

Various theories have been proposed as to how speakers modify their talk like this. One is that they ‘regress’ to an early stage in their own language development. Another is that they negotiate a mutually-intelligible degree of communication. A third (and this is really a form of accommodation) is that they simply match their language to that of their interlocutor, imitating its simplifications, including its lack of grammatical accuracy. Rod Ellis (1994: 265), however, thinks that this explanation is unlikely, as ‘it is probably asking too much of learners’ interlocutors to measure simultaneously the learners’ phonology, lexicon, syntax, and discourse with sufficient accuracy to adjust their own language output’.

However, this was written before the discovery of ‘mirror neurons’, and their key role in enabling imitative behavior. As Iacoboni (2008: 91-92) observes, ‘the fact that the major language area of the human brain is also a critical area for imitation and contains mirror neurons offers a new view of language and cognition in general’. According to Iacoboni, it is because of these mirror neurons that ‘during conversations we imitate each other’s expressions, even each other’s syntactic constructions… If one person engaged in a dialogue uses the word “sofa” rather than the word “couch,” the other person engaged in the dialogue will do the same’ (op. cit. 97-98).

It seems, then, that as humans we are hard-wired to imitate one another.

So, what are the implications for language teaching? In the interests both of intelligibility and establishing ‘comity’, Joey Barton’s adaptive accent strategy may be the way to go. For learners of English, whose interlocutors may not themselves be native speakers, this may mean learning to adapt to other non-native speaker accents. As Jenkins (2007: 238) argues, ‘in international communication, the ability to accommodate to interlocutors with other first languages than one’s own… is a far more important skill than the ability to imitate the English of a native speaker.’

So, in the interests of mutual intelligibility, rather than teaching pronunciation per se, maybe we should be teaching accommodation skills. The question, of course, is how?


Crystal, D. (2003) A Dictionary of Linguistics and Phonetics (5th edition) Oxford: Blackwell.

Ellis, R. (1994) The Study of Second Language Acquisition, Oxford: Oxford University Press.

Iacoboni, M. (2008) Mirroring People: The New Science of How We Connect with Others, New York: Farrar, Straus and Giroux.

Jenkins, J. (2000) The Phonology of English as an International Language, Oxford: Oxford University Press.

Jenkins, J. (2007) English as a Lingua Franca: Attitude and Identity, Oxford: Oxford University Press.

Richardson, D.C., Dale, R., & Shockley, K. (2008) ‘Synchrony and swaying in conversation: coordination, temporal dynamics, and communication,’ in Wachsmuth, I., Lenzen, M., & Knoblich, G. (eds) Embodied Communication in Humans and Machines, Oxford: Oxford University Press.

Walker, R. (2010) Teaching the Pronunciation of English as a Lingua Franca, Oxford: Oxford University Press.

Illustrations from Ogden, C.K. (ed.) (n.d.) The Basic Way to English, London: Evans Brothers.

V is for Voice setting

18 12 2011

A correspondent has reminded me of an article I wrote – ages ago – on voice setting (Thornbury 1993):

I have just read your article ‘Having a good jaw: voice setting phonology’, and having noted the year in which it was published, I am interested to find out if you or anyone else, has conducted any studies on the exercises you suggested?

Never mind the mouth, check out the tash!

Just to remind you, voice setting – or ‘bases of articulation’ –  is the general term for those “general differences in tension, in tongue shape, in pressure of the articulators, in lip and cheek and jaw posture and movement, which run through the whole articulatory process” (O’Connor 1973:289).  It’s argued that voice settings vary from language to language, e.g.

“In English the lips and jaw move little, in French they move much more, with vigorous lip-rounding and spreading: the cheeks are relaxed in English but tensed in French: the tongue-tip is tenser in English and more used than in French, where the blade is dominant, and so on.” (O’Connor op.cit.)

Over the years I’ve collected  a number of non-specialist descriptions – from novels and poems, principally – that nicely capture voice setting characteristics. Here’s a selection:

“His voice rang like a metal clipper hitting a bucket and he spoke English. Proper English … he sprinkled ers and even errers in his sentences as liberally as he gave out his twisted-mouth smiles. His lips pulled not down… but to the side, and his head lay on one side or the other, but never straight on the end of his neck”. (Maya Angelou: I Know Why the Caged Bird Sings)

When you hear it languishing

and hooing and cooing and sidling through the front teeth,

the oxford voice

or worse still

the would-be oxford voice

you don’t even laugh anymore, you can’t …

(D.H.Lawrence: “The Oxford Voice”)

“Watching him twisting his mouth into that intelligently ironical shape that is necessary for the production of Dutch noises, I was reminded of how much I liked the semi-gargling sound Netherlanders make, brewing each word up at the back of their throats and then having to unpick it with their teeth.” (Howard Jacobson: In the Land of Oz)

What I was arguing (in the aforementioned article) was that accurate pronunciation at the segmental level (i.e. of individual sounds) is at least partly contingent on adjusting to the specific vocal setting for the language you’re trying to speak. That is to say, accent is as much an effect of top-down features as it is of bottom-up ones. Hence, it might repay teachers of pronunciation to start working on these top-down features first, in advance of fine-tuning for  phonemic distinctions.

To that end, I suggested an activity sequence that included awareness-raising activities such as watching videos of speakers with the sound off, in order to try and guess what language they are speaking, or role play activities where learners attempt to speak their own language with a marked English (RP or GA) accent, in the way that – for example – Brits or ‘gringos’ are portrayed locally in the movies. This might lead to some discussion as to what is actually happening – physically – when you ‘speak with an English accent’.

Read my lips

But, to answer my correspondent’s question, I don’t know of any follow-up to these suggestions, or, for that matter, of any research into the pedagogical applications of voice setting theory at all. Besides, I’m wondering whether – in this era of English as a Lingua Franca – it is really all that necessary to take such drastic steps to ‘nativise’ learners’ accents.


O’Connor, J.D. 1973. Phonetics. Harmondsworth: Penguin.

Thornbury, S. 1993. Having a good jaw: voice-setting phonology. ELT Journal, 47/2, 126-31.

Illustrations from Jones, D. 1932. An Outline of English Phonetics (3rd edn.) Leipzig: Teubner.

P is for Phonemic Chart

8 08 2010

(That’s phonEMIC, not phonETIC, by the way. There’s a big difference!)

Ever since I’ve been teaching in the US I’ve been challenged by the need to devise a chart of the phonemes of American English (General American or GA) that can be used in the same way as the original British English (RP) chart, both as a training and a teaching tool. (Incidentally, it’s an often overlooked fact that the layout of the original RP chart – along with lots of ways of exploiting it in class – is due to the work of Adrian Underhill).

Adrian Underhill’s ‘Sound Foundations’ Chart (Macmillan)

In fact, the search for a GA equivalent goes back even earlier, to 1995, when I was assessing a CELTA course here in New York and was surprised to find that the language analysis trainer was trying to knock the round peg of GA sounds into the square hole of the RP chart. Fifteen years later I discover that not much has changed: another large training organisation here is using an “Americanized” version of the original RP chart, but one which not only includes five more vowel sounds than GA is normally credited with having, but adds two diphthongs (/ʌɪ/ and /ɔʊ/) that, as far as I know, belong to no known variety of English!

Of course, the problem of devising a GA chart is complicated by the fact that – unlike the case of RP – there is no single, agreed upon, system of transcribing American vowels. (Compare any two American learners’ dictionaries, for instance). This is probably due to the fact that, while there is less accent variation across North America than there is within the British Isles, there is no single variety that can (or is allowed to) claim the prestigious status that RP enjoys.

In 2007, while teaching at SIT in Brattleboro, Vermont, I came up with a chart that was based closely on the description in Celce-Murcia et al. (1996) – see inset below.

GA chart (after Celce-Murcia et al., 1996)

The layout of the chart attempts to reflect the elegance of Adrian’s RP chart, with the consonants ranged from front-of-mouth to back-of-mouth obstruction, and the vowels roughly mapped on to the classic (Daniel Jones?) vowel quadrant. In terms of the symbols, the consonants were not a problem: the only change involved changing the symbol /j/ for a /y/. The vowels were another story.

First of all, the layout had to be reconfigured to accommodate the fewer vowel sounds of GA (16 vs 20 in RP). While the three ‘heterogeneous’ diphthongs are separated out and colour-coded, no attempt was made to distinguish the simple vowels from the vowels with an adjacent glide (/iy/, /ey/, /ow/) since the latter, technically, are not diphthongs. Nor were combinations with /r/ (such as /ɪr/ and /or/) included, since, technically, these are not individual phonemes but are attempts to represent the way certain vowel sounds are “colored” by the consonants that follow them (which may be /r/, /l/ or /rl/). The only exception I made was the case of /ɜr/ which, as Celce-Murcia et al. point out, is used “to capture a significant difference in quality between the /ʌ/ in bud and the /ɜr/ in bird” (p. 105) and which they include as their “15th phoneme” of North American English (the 16th being the schwa). Finally, an optional superscript /r/ was added to the schwa, because the combination of schwa and post-vocalic /r/ is often distinguished from schwa, phonetically, by being transcribed with a different symbol (ɚ). This represents the (phonemic) difference in GA between the final vowels in cheetah and cheater, for example. Note also that both /ɔ/ and /ɑ/ are represented in the chart, in deference to those varieties of GA that do distinguish between caught and cot.

This chart has served OK over the years, but I’ve not been entirely happy with it – not least because of the use of the consonant symbols /y/ and /w/ to flag lengthening and lip rounding, as well as the clumsy superscript [r]s. So I revisited the literature, and came up with a new one, based on the description in Roca and Johnson (1999). The consonants remain as they were. The main difference in the vowels is that I’ve abandoned the /y/ and /w/ add-ons, substituting symbols that more accurately realise the phonetic qualities of the homogeneous (adjacent glide) and heterogeneous (non-adjacent glide) diphthongs, colour-coding these respectively, as well as substituting the symbol ɚ for the r-coloured schwa alternative, and /ɝ/ for the r-coloured vowel in bird. I’ve also re-positioned /ʌ/ so that its central and back quality is more accurately represented, and turned the division between /ɔ/ and /ɑ/ into a dotted line to flag that, in some varieties, these two sounds are not distinguished.

All comments will be gratefully received and acknowledged.


Celce-Murcia, M., Brinton, D.M., and Goodwin, J.M. (1996) Teaching Pronunciation. Cambridge University Press.

Roca, I., and Johnson, W. (1999) A Course in Phonology. Oxford: Blackwell.


Click here ( US phonemic chart ) to see a pdf version of Adrian Underhill’s GA Chart – mentioned in his comments below. (Thanks, Adrian!)