P is for Phonotactics

29 10 2017
knish 2015


Why is baseball called be-su-bu-ro in Japanese? Why do most learners say clothiz and not clothes? Why am I called Escott by Spanish speakers and Arabic speakers alike? Why can we say /gz/ when it is in the middle of a word (exam) and at the end of a word (dogs) but not at the beginning? (Check a dictionary if you are in any doubt). Why are clash and crash recognizably English words but cnash is not? Is it because it’s hard to say? Well, not if you can say knish, which – if you live in New York, and like to eat them – you regularly do. It’s not that we can’t say cnash, cfash or cpash – we just don’t.

Why? The answer is, of course, to be found in phonotactics, i.e. the study of the sound combinations that are permissible in any given language. (Important note: we are talking about sound combinations – not letter combinations – this is not about spelling). In Japanese, syllables are limited to a single consonant plus vowel construction (CV), with strong constraints on whether another consonant can be added (CVC). Hence be-su-bu-ro for baseball. And bat-to for bat, and su-to-rai-ku for strike (Zsiga 2006). As for Escott: Spanish does not allow words to begin with /s/ plus another consonant – hence the insertion of word-initial /ɛ/, which gives *Escott (like escuela, estado, etc) – a process called epenthesis. (Epenthesis accounts for the extra vowel English speakers insert in certain regular past tense combinations: liked, loved, but wanted.)
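For readers who like to see a rule spelled out, the Spanish-style repair just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: words are given as lists of phoneme symbols (not spellings), the vowel set is a rough and incomplete one chosen for the example, and the function name add_initial_vowel is made up here.

```python
# Rough, incomplete vowel set used only for this illustration (an assumption).
VOWELS = {"a", "e", "i", "o", "u", "ɛ", "ɪ", "ɒ", "ə", "æ", "ʌ"}

def add_initial_vowel(phonemes, vowel="ɛ"):
    """Prefix a vowel when a word begins with /s/ plus another consonant."""
    starts_with_s_cluster = (
        len(phonemes) >= 2
        and phonemes[0] == "s"
        and phonemes[1] not in VOWELS
    )
    # Words that do not begin with an s-cluster are returned unchanged.
    return [vowel, *phonemes] if starts_with_s_cluster else list(phonemes)

print(add_initial_vowel(["s", "k", "ɒ", "t"]))  # Scott: begins with /sk/
print(add_initial_vowel(["s", "ɒ", "k"]))       # sock: /s/ plus vowel
```

The point of the sketch is that the rule looks only at where the sounds occur (word-initially, /s/ followed by a consonant), not at which sounds the language has.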


Shmuck with knish

English allows for many more consonant clusters than, say, Japanese or Hawaiian (with only 13 phonemes in all), but nothing like some languages, like Russian. According to O’Connor (1973, p. 231) ‘there are 289 initial consonant clusters in Russian as compared with 50 in English.’ English almost makes up for this by allowing many more word-final clusters (think, for example, of sixth and glimpsed – CVCCC and CCVCCCC, respectively) but Russian still has the edge (142 to 130). Of course, these figures don’t exhaust the possibilities that are available in each language: there are 24 consonant sounds in English, so, theoretically, there are 24² (576) two-consonant combinations, and 24³ (13,824) three-consonant combinations. But we use only a tiny fraction of them. And some combinations are only found in borrowings from other languages, like knish and shmuck. (Theoretically, as O’Connor points out, ‘it is possible to imagine two different languages with the same inventory of phonemes but whose phonemes combine together in quite different ways’ [p. 229]. In which case, a phonemic chart on the classroom wall would be of much less use than a chart of all the combinations).
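The arithmetic behind those theoretical totals (24 squared and 24 cubed) is easy to check. The sketch below assumes, as the paragraph does, an inventory of 24 consonants in which any consonant may follow any other, repeats included; the comparison with O’Connor’s figure of 50 initial clusters is only a rough indication of how small the attested fraction is.

```python
# Assumption: 24 consonant phonemes, any of which may follow any other
# (repeats allowed), as in the theoretical count in the text.
CONSONANTS = 24

def theoretical_clusters(length, inventory=CONSONANTS):
    """Number of ordered consonant sequences of the given length."""
    return inventory ** length

two = theoretical_clusters(2)
three = theoretical_clusters(3)

# O'Connor's figure for initial clusters actually found in English.
ATTESTED_INITIAL = 50
share = ATTESTED_INITIAL / (two + three)

print(two, three)        # 576 13824
print(f"{share:.2%}")    # 0.35%
```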

Likewise, there is no theoretical limit as to which consonants can appear at the beginning of a syllable or at the end of it. But, ‘whereas in English all but the consonants /h, ŋ, j and w/ may occur both initially and finally in CVC syllables, i.e. 20 out of the total 24, in Cantonese only 6 out of a total of 20 occur in both positions, since only /p, t, k, m, n, ŋ/ occur in final position, the remainder being confined to initial position’ (O’Connor, p. 232).

It’s this kind of information that is often missing from comparisons of different languages. This was driven home recently as I reviewed a case study assignment that my MA students have been doing, in which they were asked to analyze the pronunciation difficulties of a learner of their choice. What often puzzles them is that the learner might produce a sound correctly in one word, but not in another – in some cases, even leaving it out completely. The answer, of course, is not in phonemics, but in phonotactics: it’s all about where the sound is, and in what combinations. And it is perhaps just as significant a cause of L1 interference as are phonemic differences. Yet, apart from mentions of consonant clusters, there are few if any references to phonotactics in the pedagogical literature. (In The New A-Z of ELT, phonotactics gets a mention in the entry on consonant clusters, but – note to self! – phonotactics is not just about consonants: it also deals with vowel sequences, and which vowels habitually follow which consonants.)

Phonotactics is also of interest to researchers into language acquisition, since our sensitivity to what sound sequences are permissible in our first language seems to become entrenched at a very early age.  Ellis (2002, p. 149), for example, quotes research that showed ‘that 8-month-old infants exposed for only 2 minutes to unbroken strings of nonsense syllables (e.g., bidakupado) are able to detect the difference between three-syllable sequences that appeared as a unit and sequences that also appeared in their learning set but in random order. These infants achieved this learning on the basis of statistical analysis of phonotactic sequence data, right at the age when their caregivers start to notice systematic evidence of their recognising words.’
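One common way of modelling this kind of statistical learning is with transitional probabilities: how likely one syllable is to follow another. The sketch below is an illustration only, not a reconstruction of the study Ellis cites. The four nonsense words, the stream length and the function names are all assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical three-syllable nonsense "words" (illustrative only).
WORDS = [("bi", "da", "ku"), ("pa", "do", "ti"),
         ("go", "la", "bu"), ("tu", "pi", "ro")]

def make_stream(n_words, seed=0):
    """Join randomly ordered words into one unbroken syllable stream,
    never using the same word twice in a row."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(word)
        previous = word
    return stream

def transitional_probabilities(stream):
    """P(next | current) = count(current, next) / count(current)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]]
            for pair, n in pair_counts.items()}

stream = make_stream(300)
tp = transitional_probabilities(stream)

print(tp.get(("bi", "da")))        # inside a word
print(tp.get(("ku", "pa"), 0.0))   # across a word boundary
```

Inside a word each syllable is always followed by the same syllable, so the probability is 1.0; across a word boundary it should be much lower (roughly one in three here, since any of the three other words may follow). A learner tracking nothing but these dips could, in principle, locate the word boundaries.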

piet and knishery


Such findings lend support to usage-based theories of language acquisition (e.g. Christiansen and Chater 2016), where sequence processing and learning – not just of sounds but also of lexical and grammatical items – may be the mechanism that drives acquisition. It seems we are genetically programmed to recognize and internalize complex sequences: there is neurobiological evidence, for example, that shows considerable overlap of the mechanisms involved in language learning and the learning of other kinds of sequences, such as musical tunes.  As Ellis (op.cit.), summarizing the evidence, concludes, ‘much of language learning is the gradual strengthening of associations between co-occurring elements of the language and… fluent language performance is the exploitation of this probabilistic knowledge’ (p.173). What starts as phonotactics ends up as collocation, morphology and syntax.


Christiansen, M.H. & Chater, N. (2016) Creating language: integrating evolution, acquisition, and processing. Cambridge, Mass.: MIT Press.

Ellis, N.C. (2002) ‘Frequency effects in language processing: a review with implications for theories of implicit and explicit language acquisition.’ Studies in Second Language Acquisition, 24/2.

O’Connor, J.D. (1973) Phonetics. Harmondsworth: Penguin.

Zsiga, E. (2006) ‘The sounds of language,’ in Fasold, R.W. & Connor-Linton, J. (eds) An introduction to language and linguistics. Cambridge: Cambridge University Press.




23 responses

29 10 2017
Richard Wilson (@ritchartwinson)

This is something we studied at university but I haven’t seen mentioned again since I started English language teaching. Unfortunately, pronunciation still equates to phonemic script and drilling for many teachers. When discussing the range of possible speech sounds in English and what problems these might pose for Chinese learners in a training session, I was accused of making triphthongs up by the trainer!

30 10 2017
Scott Thornbury

Thanks, Richard. Checking some materials for teaching pronunciation, I have found very little that might be labelled ‘phonotactics’. But, to his credit, Mark Hancock has a section in English Pronunciation in Use (Intermediate) (Cambridge) called ‘Combining sounds’, which has exercises on consonant groups at the beginnings and ends of words, among other things.

29 10 2017

“These infants achieved this learning on the basis of statistical analysis of phonotactic sequence data”.

“What starts as phonotactics ends up as collocation, morphology and syntax”.

Says you, Scott. An interesting discussion is spoiled by your ignoring a basic requirement of any serious scholarship, viz.: to clearly distinguish between assertions and facts. Your claims about language learning aren’t facts, but rather assertions, and they’re assertions which rest on a young, incomplete and widely criticised view of second language learning.

Even the most strident advocates of the theory you seem to have so uncritically adopted, including Nick Ellis, recognise that attempts to explain language learning in terms of power law, connectionist models of language learning, etc., still have a very long way to go before they answer the myriad objections to them; see, for example, the replies to the Ellis (2002) article you cite in the same issue of the SSLA journal. Lots more interesting work has been done since, of course (not that your chosen source, Christiansen & Chater (2016), does a good job of arguing the case – I note that even Geoffrey Sampson gives it the thumbs down), and we should welcome the work going on to explore the extent and limits of stats-based, connectionist-style learning mechanisms. But let’s remember that what they’re doing is testing a hypothesis in the laboratory; it’s a bit early to act as if their theory is now the established explanation of language learning.

Gregg & Eubank conclude their reply to Ellis by quoting Glymour (2000, p.194): “Although among animals and very young humans, causal relations must somehow be discovered from associations, associations are not causes. Nothing good can come of confusing the two ideas. Nothing good has.” That’s a big point about the limitations many see in associationist approaches. A smaller point about attempts to understand complicated phenomena is that nothing good can come from confusing claims stemming from theoretical hypotheses with facts.

Gregg, K.R. and Eubank, L. (2002) News Flash-Hume Still Dead. Studies in second language acquisition, 24, 2, 237-248.

30 10 2017
Scott Thornbury

Thanks for your comment, Geoff… which I sensed might be not long in coming, I have to say 😉

One man’s facts are another man’s assertions, I suppose. You (Geoff) believe that the poverty of the stimulus argument is a fact; I believe it is an assertion. Your belief resides not in any evidence you have personally gathered (e.g. through the study of large corpora of caretaker talk) but because it is a view shared by scholars you align with (a diminishing population, I have to add) and that, to you, it has a certain plausibility – perhaps because it accords with certain fundamental belief’s you hold about the separation of the mind from its body and ultimately from its environment.

I find it equally plausible that the acute sensitivity that we have to the phonotactics of our L1 (easily tested by asking a large number of people to separate a list of nonsense words into two categories – could be English vs couldn’t be English: grook, ptop, stross, sbuw, etc) comes from an innate ability to perceive and recognize frequently occurring patterns or sequences in the input we are exposed to from an early age. No other explanation – e.g. that we are somehow hard-wired with this knowledge – seems plausible (to little me).

It’s not a huge step from there to argue that our sensitivity to collocations (e.g. that it’s ‘a high probability’ not ‘a big probability’, and it’s ‘a good chance’ but not ‘a high chance’, etc etc) stems from the same innate but non-language-specific sensitivity to pattern frequencies – a priming effect, in other words.

Nor is it a huge step to argue that morphology might also be an effect of multiple primings: the classic ‘wug’ test on kids seems to have demonstrated this: ‘This is a wug. And here are two….’ ‘Wugs’. The much-touted morpheme sequences, so beloved by early proponents of a natural order of acquisition, seem to be – at least in part – an effect of frequency in the input.

The final step – into syntax – is perhaps a leap of faith, but having got this far, I’m prepared to take it, and thereby side with Michael Hoey, i.e:

‘What we think of as grammar is the product of the accumulation of all the lexical primings of an individual’s lifetime. As we collect and associate collocational primings, we create semantic associations and colligations … These nest and combine and give rise to an incomplete, inconsistent and leaky, but nevertheless workable, grammatical system.’

What starts as phonotactics ends up as collocation, morphology and syntax. QED.

30 10 2017
Scott Thornbury

A footnote to my previous comment on frequency effects with regard to collocations and other multi-word expressions (MWEs): in an article called ‘The idiom principle revisited’, (Applied Linguistics 2015:36/5) Siyanova-Chanturia and Martinez review a number of studies into the processing of multi-word expressions in both first and second language acquisition and argue (assert?) that these studies support usage-based and exemplar-based approaches to language acquisition, processing, and use. ‘According to the proponents of these theories, frequency effects are present in smaller (morphemes, words), as well as larger units (compounds, compositional phrases, idioms), and, thus, all linguistic material should be represented and processed in a similar way. As Bod (2006) argues, the allocation of representations to linguistic exemplars is accomplished purely on the basis of statistics (i.e. the frequency of occurrence) and, thus, language should be viewed not as a set of specific grammar rules (and neither as two distinct and disjointed entities – the lexicon and grammar), but as a statistical accumulation of linguistic experiences that changes every time a particular word, phrase, or sentence is encountered’ (p.560).

(And they specifically reference ‘growing evidence’ for Hoey’s theory of ‘lexical priming’ in the acquisition of MWEs. It’s comforting to know that I am not the only one who is wrong!)

5 12 2017

I found this post interesting following some time I had spent in Hungary in a city called Szeged. The very first thing I learned about Hungary prior to my arrival was that in Hungarian SZ is pronounced as [s], S is pronounced as “sh” and ZS is pronounced like the “s” in the word “pleasure”.
Of course, there are plenty more phonotactics in the Hungarian language. Since that experience, I always find other languages interesting in that aspect.

29 10 2017
Heidi A. Karow

In one class, I have students from 4 different continents, including an Algerian who lived in Montreal for a stretch. I sometimes use Swan & Smith’s “Learner English”. It would be great to have a condensed chart for a busy teacher to reference for phonotactics.
I knew a Romanian named Aurel. Apparently all my attempts to pronounce the 3 syllables correctly missed the mark.

30 10 2017
Scott Thornbury

Thanks, Heidi. Learner English (Swan & Smith) does in fact include considerable information about phonotactics, although labelled ‘consonant clusters’. For example, on Arabic, there is a list of initial two-segment clusters occurring in English but not in Arabic (pr, pl, gr, gl etc). They also note that ‘the range of final clusters is also much smaller in Arabic. Of the 78 three-segment clusters and 14 four-segment clusters occurring finally in English, none occurs in Arabic’. Hence ‘monthiz’ for ‘months’ and ‘neckist’ for ‘next’.

29 10 2017

Thanks for this post Scott. Useful if you are working in a monolingual context especially. Richard, I think we need more discussions like that in teacher training courses… So much more to do than “listen and repeat” if learners are going to improve effectively.

30 10 2017

Hi Scott,

Thanks for replying.

I do NOT believe that the poverty of the stimulus argument is a fact, and it’s very telling that you should ascribe such a belief to me. Chomsky doesn’t believe that the poverty of the stimulus argument is a fact either, because, being a rationalist who rejects a relativist epistemology, he doesn’t suppose that one man’s facts are another man’s assertions; indeed, he makes a categorical distinction between the two.

Having ascribed a belief to me that I don’t hold, you then explain why I hold it! You suggest that I believe that the poverty of the stimulus argument is a fact not because of any first hand evidence, but because it’s a belief shared by scholars I align with and because “it accords with certain fundamental belief’s (sic) you hold about the separation of the mind from its body and ultimately from its environment”. This is a mangled reference to mind-body dualism, and no doubt you intend it to be a pot-shot against “positivists”, those deluded mugs suffering from science envy who just don’t “get” the splendid, chaotic, complex unity that reigns over the universe. Well, since the mind is a theoretical construct, I fail to see how it can have a body to be separated from, but in any case, I leave readers to judge the quality of your explanation. Let me offer my own.
I take the poverty of the stimulus argument to be just that – an argument (hence, not a fact), and one that forms part of a coherent theory of language and language learning. Chomsky’s UG is a theory of language and of language learning that is supported by empirical evidence from studies that are reported in respectable academic journals and which lead to replication studies, which in turn lead to further discussion and refinement of the theory. The theory could well turn out to be wrong, but so far, it strikes many as the best description of language and the best theory of language learning to appear so far. If continued development in computer-based statistical language learning models and usage-based theories of language learning one day result in a powerful explanatory theory that stands up to empirical tests, then I’ll happily accept that UG is wrong. But such an event seems a long way off, and given the way things are today, there is no justification for your confidently telling people, some of them paying to do an MA course in SLA with you, that infants achieve their knowledge of phonotactics on the basis of statistical analysis of phonotactic sequence data as if it were a fact; it is no such thing.

One strength of UG theory is that it offers what many see as the best explanation so far of the knowledge young children have of language. Your alternative explanation is that the knowledge comes from “an innate ability to perceive and recognize frequently occurring patterns or sequences in the input we are exposed to from an early age”. You state the case in such a way as to make it a reductionist argument that amounts to no explanation at all, as you’d appreciate if you understood how logic and scientific theory construction work. Furthermore, more articulate versions of statistical learning fail to explain the knowledge children have of aspects of language, including phonotactics, when there is zero frequency in the input.

The last bit of your QED argument is “the final step into syntax”, a “leap of faith” involving embracing Michael Hoey’s view of grammar. Hoey states that there are no rules of grammar, no English outside a description of the patterns we observe among those who use it. Grammar and semantics are post-hoc effects of the way lexical items have been primed, so there is no right or wrong in language, no sense in talk of something being ungrammatical. Everybody’s language is truly unique, in that all our lexical items are primed differently as a result of different encounters. I wonder how many of your readers will follow you as you genuflect to Hoey and leap off the cliff after him. How many will happily abandon any interest in the grammar rules which guide the construction of novel, well formed sentences? Hoey and you notwithstanding, people speaking English (including learners of English as an L2) invent millions of novel utterances every day, and they do so by making use of, among other things, grammatical knowledge.

You’re free to adopt a controversial view of language and a young, incomplete, disputed theory of language learning and to argue your case. Good luck to you, and it’s always good to hear you on these matters. What you shouldn’t do, I suggest, is to confuse assumptions or theoretical constructs with facts, or theories which offer tentative explanations of phenomena with the truth, whatever that turns out to be.

30 10 2017
Scott Thornbury

Hi Geoff… thanks for your comment. I’m sorry if I ascribed to you beliefs you don’t have. I also accept your point that usage-based theories are just that – theories. But they are not, perhaps, any more controversial than UG, even if younger. In their youth may be their vitality. After all, even if de Bot (2014) is half correct, UG has had its day: ‘That the universal grammar (UG) paradigm is in decline in AL [applied linguistics] is beyond doubt. It can be seen from the virtual disappearance of UG-based presentations at major conferences like the American Association of Applied Linguistics, International symposia on bilingualism, and the Boston conference on language development.’ Far from being ‘complete’ or ‘coherent’, it simply has not delivered (to me and one or two others) a credible explanation of first or second language acquisition. For example, de Bot quotes Andrea Tyler: ‘The current big gap between linguistics generally and language teaching was partly caused by the fact that UG could not explain many aspects of SLA and many people gave up linguistics due to the UG failure.’ Usage-based theories might not (yet) explain all aspects of SLA either, but at least they don’t ask you to accept that (as Chris Knight wrote, in an exchange of letters between himself and Chomsky in the London Review of Books recently) ‘the biological capacity underlying language didn’t gradually evolve, that it had no precursors but instead sprang up, perfectly formed, via a single mutation, or that it wasn’t designed for communication but remained inactive in speechless individuals for millennia following its installation. These notions are so asocial, apolitical and devoid of practical application that I can only assume Chomsky favoured them to keep his conscience clear: he needed them to ensure that his militarily funded linguistics couldn’t possibly have any military use.’ Yikes.

7 11 2017

Hi Scott, not sure if this has been referenced elsewhere on your site, but there’s an interesting, brief, readable, and recent paper by Ewa Dabrowska entitled “What exactly is Universal Grammar, and has anyone seen it?”, which provides a good overview of many of the issues with this topic.

8 11 2017
Scott Thornbury

Thanks, Patrick – yes, in fact I did post a link to this paper in the comments on the post P is for Poverty of the stimulus. For what it’s worth, I’ll re-post it:


30 10 2017

Hi again,

It’s certainly open season for Chomsky bashing, and I think it’s a good sign, even if much of the bashing seems to me to be so ill-informed and badly argued. To say that fewer people take UG seriously or that it’s put language teachers off (applied) linguistics is to say nothing that challenges the theory itself, of course.

I haven’t read the Chris Knight piece in the LRB, but the bit you quote is more amusing than serious, surely. To start with, I don’t suppose Knight is daft enough to suppose that the sole purpose of language is communication, ignoring its rather important role in thinking. Whether or not the biological capacity underlying language is asocial or apolitical doesn’t seem critical to me, and nothing he says justifies the claim that it’s devoid of practical application. Still, I must say the last bit is quite witty.

30 10 2017
Scott Thornbury

If I remember, Knight was the author of a book that accuses MIT (and Chomsky by association) of benefitting from military funding during the cold war. As you say, that makes not a jot of difference to the probity or not of his theory, unless Knight is serious in suggesting that it was contrived so as to be useless to the Soviets!

31 10 2017
J.J. Almagro

Where do phonesthemes fit into the process of L2 acquisition/ learning? Do phonesthemes prime phonotactics? Meaning making over articulatory ability?

1 11 2017
Scott Thornbury

Thanks, JJ. The subject of phonesthemes (aka sound symbolism) is fascinating. For those who are not familiar with the term, a phonestheme is a sound or sound sequence that suggests a particular meaning. For example, the sn- in words like snore, snort, sniff, sniffle, snuff, snot, snarl, snicker, etc has a root meaning associated with the nose (itself a partial anagram of -sn-). Likewise, the gl- in gleam, glitter, glow, glisten, glare etc relates to sight. And think of the -mble words: mumble, grumble, tremble, bumble, stumble, and their associations with lack of precision or control. But, in answer to your question, I don’t know of any research into the priming capacity of such combinations. It stands to reason, though, that an English speaker, if offered a nonsense word with a phonestheme in it (e.g. snarm, snope etc) plus some possible definitions, would opt for the one that had associations with ‘nose’. And that is priming.

2 11 2017
J.J. Almagro

Thanks, Scott.
Kind of wondering whether, if given a definition and asked for a made-up verb, English speakers would come up with verbs with the same phonestheme? One-syllable or two-syllable verbs?

25 11 2017
J.C. Laderas

Absolutely! It’s interesting and an amazingly smart question… But, if you allow me…; in a word, it doesn’t matter, the real point is ‘what’ those ‘one/two/three-syllable verbs, truly bring about as a ‘theme/topic indeed, though, maybe, that would depend on the ‘abstraction weigh’, the resurgent issue ponders, i.e: ‘more resounding – more syllables, it could be…

1 11 2017

A great topic!
This topic sounds interesting to me because it reminds me of a course in contrastive analysis of Persian and English sounds that I took while doing my BA at university. Like Arabic and Spanish speakers, Iranian learners tend to add ʔe to words starting with sp, st, sc, sk, sm, sn, and sl. Thus “Scott” is rendered as ʔescott. This is probably because the CCV cluster common in English is non-existent in Persian, especially in initial position. Instead, CV, which is common in Persian initially, is substituted for CCV in English. In addition, Iranians tend to aspirate “p”, “k” and “t” in all positions, so the “k” after “s” in “Scott” is aspirated, whereas in English p, k and t are unaspirated after s. The point I want to make is that these differences cause non-native speakers to speak English with an accent, though some of these differences are conspicuous, like the ʔe in Scott, and some trivial, like aspiration.

1 11 2017
Scott Thornbury

Thanks for those examples of contrastive phonotactics, Shahram. It seems that Spanish is not alone in applying epenthesis to initial s+ consonant combinations.

5 11 2017
Nyr Indictor

Thanks for this interesting post. Two very minor comments:
1. “Why is baseball called be-su-bu-ro in Japanese?” It’s not. It’s usually called “yakyū” (野球 in kanji, やきゅう in hiragana). To be sure, most Japanese yakyū enthusiasts know and use the English word, which, in Japanese, is pronounced bēsubōru (ベースボール in katakana). Bēsubōru is perceived as 6 syllables (properly morae), with the long vowels each counting as two units. The long ē is a rendering of the diphthong /eɪ/, but I’m not sure why the /ɔ/ becomes a long ō (perhaps vowel plus continuant /l/ sounds like a long vowel to Japanese speakers?).
2. While many New Yorkers pronounce “knish” /knɪʃ/, quite a few insist on /kə’nɪʃ/, thus avoiding the /kn/ cluster. People who say /nɪʃ/ are generally understood to be from out of town.

5 11 2017
Scott Thornbury

Thanks, Nyr, for the correction. Not being a speaker of Japanese, I was relying on my source (Zsiga 2006, mentioned in the references), but I should know always to check even the seemingly most authoritative sources!
