5 11 2017

“Lexis is the core or heart of language”, wrote Michael Lewis (Lewis, 1993, p. 89). Yes, but which lexis? Given the hundreds of thousands of words that there are, which ones should we be teaching soonest? Is there a ‘core’ vocabulary? If so, where can we find it? If it is a list, how is it organized? And on what principles of selection is it based?

These questions were prompted by a student on my MA TESOL who asked if the measure of an item’s ‘core-ness’ was simply its frequency. I suspected that there might be more to it than this, and this impelled me to look at the literature on word lists.

The most famous of these, of course, is Michael West’s General Service List (GSL), first published in 1936 and then revised and enlarged in 1953. I am the proud owner of not just one but two copies of West, one of which clearly once belonged to a writer (see pic), who used it to keep within the 3000 word limit imposed by his or her publishers.

Compiled before the days of digitalized corpora, the GSL was based on a print corpus of up to 5 million words, diligently trawled through by a small army of researchers (‘of high intelligence and especially trained for the task’) for the purposes of establishing frequency counts – not just of individual words but of their different meanings.

But frequency was not the only criterion for inclusion in the GSL. West and his collaborators also assessed whether a word was relatively infrequent but necessary, because it lacked a viable equivalent – ‘vessel’ being one: ‘container’ doesn’t work for ‘blood vessels’, for example.  Conversely, some words may be frequent but unnecessary, because there are adequate non-idiomatic alternatives, i.e. they have cover. Finally, informal and highly emotive words were excluded, on the grounds that they would not be a priority for learners.

In the end the GSL comprised around 2000 word families (but over 4000 different lemmas, i.e. words that have the same stem and are the same part of speech: dance, danced, dancing, but not dancer) and even today, despite its age, the GSL gives a coverage of nearly 85% of the running words in any corpus of non-specialist texts (according to Brezina & Gablasova, 2015 – see below).
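To make ‘coverage’ concrete: it is simply the proportion of running words (tokens) in a text that appear on a given list. The following is a minimal sketch in Python, not the procedure used by West or by Brezina & Gablasova. The function name, the crude tokenizer and the toy word list are all assumptions for illustration, and a real check would first map inflected forms (danced, dancing) to their headwords.

```python
import re

def coverage(text, word_list):
    """Return the share of running words in `text` that are in `word_list`.

    Assumption: tokens are lower-cased runs of letters and apostrophes;
    no lemmatization is attempted.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in word_list)
    return known / len(tokens)

# Hypothetical three-word 'list', purely for illustration
toy_list = {"the", "cat", "on"}
print(round(coverage("The cat sat on the mat", toy_list), 2))
```

Four of the six running words in the toy sentence are on the toy list, so the coverage is about two thirds. Note that this counts tokens, not types: ‘the’ is counted twice.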

Subsequently, Carter (1998) has elaborated on the criteria for what constitutes ‘core-ness’. One is a core word’s capacity to define other words. Hence the words chosen by lexicographers for dictionary definitions are a reliable source of core vocabulary. One such is the Longman Defining Vocabulary (LDV): you can find it at the back of the Longman Dictionary of Contemporary English (my edition is that of 2003) or at a number of websites, including this one.

The publishers comment, ‘The words in the Defining Vocabulary have been carefully chosen to ensure that the definitions are clear and easy to understand, and that the words used in explanations are easier than the words being defined.’

Entry for ‘laugh’ in the GSL

Another test of coreness is superordinateness: ‘Core words have generic rather than specific properties’ (Carter 1998, p. 40). Hence, flower is more core than rose;  tool more core than hammer. For this reason, perhaps, core words are the words writers tend to use when they are writing summaries.


Core words are also more likely to have opposites than non-core words: fat vs. thin, laugh vs. cry. But what is the opposite of corpulent, say? Or giggle?

Core words also tend to have a greater range of collocates – compare start vs commence (start work/an argument/a career/a rumour/a conversation etc.). And they have high word-building potential, i.e. they combine productively with other morphemes: startup, headstart, starter, starting line, etc. Core words are also neutral: they do not have strong emotional associations; they do not index particular cultures (dress vs sari, for example), nor are they specific to certain discourse fields: compare galley, starboard, and below deck with kitchen, right, and downstairs (i.e. a nautical discourse vs. a less marked one).

On this last aspect, an important test of a word’s coreness is not just its overall frequency but its frequency in a wide range of contexts and genres – its dispersion. In a recent attempt to update the GSL, and to eliminate the subjectivity of West’s criteria, Brezina & Gablasova (2015) tested for the ‘average reduced frequency’ (ARF): ‘ARF is a measure that takes into account both the absolute frequency of a lexical item and its distribution in the corpus… Thus if a word occurs with a relatively high absolute frequency only in a small number of texts, the ARF will be small’ (op. cit., p. 8). Brezina and Gablasova also drew on not just one corpus but a range of corpora, including the 12-billion word EnTenTen12 corpus, to produce a New General Service List which, while much trimmer than West’s original (2500 vs. 4000 lemmas), and therefore perhaps more ‘learnable’, still gives a comparable coverage of corpus-based text – around 80%. (The full text of the article, along with the word list itself, can be found here).
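For readers curious about how ARF behaves, here is a minimal sketch in Python. It assumes the formulation usually attributed to Savický and Hlaváčová, which is not spelled out in the passage quoted above: divide the corpus size by the word’s frequency to get an ‘expected’ gap between occurrences, then count each actual gap only up to that expected size. The function name and the toy positions are hypothetical.

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF of a word, given the token positions at which it occurs.

    Assumed formula: ARF = (1 / v) * sum(min(d_i, v)), where v is
    corpus_size / frequency and d_i are the gaps between successive
    occurrences, with the corpus treated as a loop so that the first
    gap wraps around from the last occurrence.
    """
    frequency = len(positions)
    if frequency == 0:
        return 0.0
    positions = sorted(positions)
    expected_gap = corpus_size / frequency
    # Wrap-around gap first, then the gaps between neighbours
    gaps = [positions[0] + corpus_size - positions[-1]]
    gaps += [later - earlier for earlier, later in zip(positions, positions[1:])]
    return sum(min(gap, expected_gap) for gap in gaps) / expected_gap

# Same raw frequency (4) in a 100-token corpus, different dispersion
evenly_spread = average_reduced_frequency([0, 25, 50, 75], 100)
clumped = average_reduced_frequency([0, 1, 2, 3], 100)
print(evenly_spread, clumped)
```

Under these assumptions the evenly spread word keeps its full frequency (an ARF of 4), while the word whose four occurrences are bunched together scores only a little over 1, which is the effect the quotation describes.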

More impressive still, and also called a New General Service List, is the one compiled by Browne et al. (2013) which, with 2800 lemmas, claims to provide more than 90% coverage of the kinds of general English texts learners are likely to read.

Other potentially useful word lists include The Oxford 3000: ‘a list of the 3000 most important words to learn in English’ – accessible here. Again, dispersion – not just frequency – has been an important criterion in choosing these: ‘We include as keywords only those words which are frequent across a range of different types of text. In other words, keywords are both frequent and used in a variety of contexts.’ And the publishers add:

In addition, the list includes some very important words which happen not to be used frequently, even though they are very familiar to most users of English. These include, for example, words for parts of the body, words used in travel, and words which are useful for explaining what you mean when you do not know the exact word for something. These words were identified by consulting a panel of over seventy experts in the fields of teaching and language study.

Inevitably, there is a lot of overlap in these lists (they would hardly be ‘core vocabulary’ lists if there were not) but the differences, more than the similarities, are intriguing – and suggestive, not only of the corpora from which the lists were derived, but also of the criteria for selection, including their intended audience and purpose. To give you a flavor:

Words in West’s GSL not in LDV: plaster, jealous, gay, inch, widow, elephant, cushion, cork, chimney, pupil, quart.

Words in LDV not in GSL: traffic, sexual, oxygen, nasty, infectious, piano, computer, prince.

Words in Oxford 3000 not in either GSL or LDV: fridge, gamble, garbage, grandchild, sleeve, software, vocabulary… Note also that the Oxford 3000 includes phrasal verbs, which are not systematically included in the other lists, e.g. pull apart/ down/ off/ in/ over/ through/ up + pull yourself together.

Of course, the key question is: what do you actually do with these lists? Are they simply guidelines for materials writers and curriculum planners? Or should learners be encouraged to memorize them? In which case, how?



Brezina, V. and Gablasova, D. (2015) ‘Is there a core general vocabulary? Introducing the New General Service List,’ Applied Linguistics, 36/1. See also this website: http://corpora.lancs.ac.uk/vocab/index.php

Browne, C., Culligan, B. & Phillips, J. (2013) ‘New General Service List’ http://www.newgeneralservicelist.org/

Carter, R. (1998) Vocabulary: Applied linguistic perspectives (2nd edition) London: Routledge.

Lewis, M. (1993) The lexical approach. Hove: LTP.

West, M. (1953) A general service list of English words. London: Longman.



29 responses

5 11 2017

Thanks for another fascinating post, Scott, chock-full of links to valuable resources. May I add another? My colleagues use the Academic Word List (https://www.victoria.ac.nz/lals/resources/academicwordlist) when introducing students to academic writing, particularly in the fields of engineering, IT and the natural sciences.

6 11 2017
Scott Thornbury

Thanks, James. Yes, for lack of space I omitted mention of the AWL, although its ‘design principle’ – like that of the (new) GSL(s) – also balances frequency with dispersion (or what Coxhead refers to as ‘range’): the 500+ word families in the AWL are frequent across four academic sub-corpora: Arts, Commerce, Law and Science. A very useful resource.

17 11 2017

The AWL was also updated, resulting in the NAWL. It can be found at the NGSL web site.

5 11 2017
Heidi A. Karow

I have used those lists to put the vastness of the English language into perspective, making a distinction between basic words, so-called academic words, and finally those specialized terms used by specific groups primarily or even exclusively (e.g. surgeons, taxi drivers, linguists, etc.). I did this with international students at a Canadian college.

My teaching focus is more practical currently, community-based with lower level students. Those lists still can serve as a reference to check if a given word is the best one to highlight or not. However, my daily schedule is so busy these days, I would do this rarely.

6 11 2017
Scott Thornbury

Thanks, Heidi. And a great tool that gives you instant information on the ‘vocabulary profile’ of any given text – i.e. which words in the text are in the top 1000 frequency band, and which are in the AWL (see James’ comment above) – is the VocabProfile function at Lexical Tutor: https://www.lextutor.ca/vp/

5 11 2017
David Deubelbeiss

I have to disagree with James. Not valuable resources at all but I do appreciate the historical and qualitative review of these large “lists”.

The use of word lists and especially large lists of words (core, general, toeic, academic and so on …) as teaching and study “material” is in my opinion the primary reason so many schools fail at teaching the language. Think of all the years students spend learning a language at schools only to miserably be able to say “hello”, “goodbye”. It’s an indictment of the profession. Where lists are emphasized, as in Japan and Turkey, you see the unfortunate results. There is considerable evidence that memorization (which word lists implicitly promote) impedes learning and especially fluency. It’s comparable to the ineffective rote memorization methodology of grammar translation. Vocabulary is “learned” incidentally – through meaningful exposure to the words in a purposeful form. Let’s keep it real. Lists are intellectual fabrications. Students don’t need to “know” language (unless preparing for those absolutely useless tests). Students need to “grow” language. The creation of lists, just a ploy for X professor or researcher to promote their name and keep their job.

I’ll add that the allure of “lists” comes from the teacher, not the student. From the curriculum designer not the occupant living there. It gives the appearance of structure, rigor, method but other than for absolute beginners, is all smoke and mirrors. Let’s teach organically not structurally.

6 11 2017
Scott Thornbury

“There is considerable evidence that memorization (which word lists implicitly promote) impedes learning and especially fluency.” This is a bold claim, David! I’d agree that the memorizing of lists of decontextualized, randomly selected (and often low-frequency) words, which is a major part of a number of educational cultures, is probably not time well spent compared, say, to putting the same time into extensive reading. Nevertheless, to get to the point that extensive reading is fluent, learners need a critical mass of (recognition) vocabulary (or productive vocabulary, if the goal is spoken fluency), and – all things being equal – the kinds of lists I have been discussing offer the means to achieving this threshold, if used in conjunction with strategies that encourage expanded rehearsal, i.e. increasing the length of time between testing and review, through the use, for example, of word cards (either analogue or digital). To quote Nation and Waring:

The suggestion that learners should directly learn vocabulary from cards, to a large degree out of context, may be seen by some teachers as a step back to outdated methods of learning and not in agreement with a communicative approach to language learning. This may be so, but the research evidence supporting the use of such an approach as one part of a vocabulary learning program is strong.

1. There is a very large number of studies showing the effectiveness of such learning in terms of amount and speed of learning. See Nation (1982), Paivio and Desrochers (1981) and Pressley et al. (1982) for a review of these studies.

2. Research on learning from context shows that such learning does occur but that it requires learners to engage in large amounts of reading and listening because the learning is small and cumulative (Nagy, Herman and Anderson, 1985). This should not be seen as an argument that learning from context is not worthwhile. It is by far the most important vocabulary learning strategy and an essential part of any vocabulary learning program. For fast vocabulary expansion, however, it is not sufficient by itself. There is no research that shows that learning from context provides better results than learning from word cards (Nation, 1982).


7 11 2017
David Deubelbeiss

I’m not against vocabulary study per se. The devil is always in the delivery. I think there is a place for basic vocabulary study on X topic. But we take it way too far …. I’ll make no apologies for advocating “messy” vocabulary acquisition through “meaningful learning” and my own experience tells me that students learn best without memorization and retrieval practice but just plain practice using the language and connecting new to prior knowledge meaningfully. There must be a problem to solve in order for a learner to put something into long term memory. Memorizing lists in whatever fashion is devoid of a problem – all the information is there. In using lists, we want to put one order into that which order remains multiple and individual. Ok – Lewis Carroll said it better.

“There is no research that shows that learning from context provides better results than learning from word cards (Nation, 1982).”

I’d pull my Frank Smith out of my hat (2004) – “There is no shortage of research demonstrating the powerful facilitator effect of meaningful context on word and meaning identification. Indeed there is no evidence to the contrary.” Paul Nation has done a lot but I’ll take Frank Smith any day. The problem with SLA is it is so insular, so focused on the pieces, it doesn’t see the whole. We don’t learn language piece by piece ….

I’ll suggest further about this problem – there is a strong and false paradigm that governs so many researchers’ and teachers’ beliefs about language learning. A belief that the brain is a storage and retrieval system and that to learn language one must fill this box and also know how to store things in the box and then retrieve them (the banking system). I totally disagree with this and it is a wrong belief that misdrives research and practice. We know the brain does not work like this except for short term memory (which is soon dissolved). Long term memory needs to make meaningful connections …. Further, learners need to know both what a word is and more importantly what it is not – lists don’t provide that – the world of experience does.

But if lists and flashcards are so good – what is responsible for the horrid results of school systems where students learn lists of words each day or week? It’s a lack of reality, a world of words divorced from purposeful use. I taught grade 4 ESL for years and it was standard for lang. arts class to use word lists. Dolch, Fry, others … Each week a chapter of students doing fill in the blanks, crosswords, flashcards, manipulating words in workbooks. The teacher assessing students through questioning, testing. My own experience shows the next year – the same students knew a pitiful percentage of said words. What kind of education is this? Further – what crime did I participate in?

Students can become good at word and meaning identification but this is far from what language is about. It’s just something that in the short term can be recalled, tested and provide the illusion that the teacher and system is successful. That’s how 32 million Americans have graduated from grade school but yet remain illiterate, unable to read and write. Fortunately, they didn’t learn to speak and listen in school.

– We can argue in circles but I prefer to judge things based on my own experience as both a language teacher and learner – not by the many horrid research studies (some exceptions) that do exist about language acquisition and the black box that is the mind (small samples, impossible controls [how do you know what a student is thinking], conclusions that aren’t statistically valid, the problem of “meaning”). I think of how for thousands of years people have learned language without lists, flashcards and rote memorization. I think of my own learning and what I’ve tried. I think of the distinction between “knowing language” (words, expressions etc …) and “being language” not just able to use the language but also flow and feel the language. I think of the times in my second language, many – where words just came to me. Never studied them on a list but picked them up through the environment. I think of how many times I didn’t know the words but knew the meaning and through exposure in context, those words became my own (a much more efficient way to learn).

Sorry for the long reply, typing fast this morning …. Learning IS messy and by default teaching too.

8 11 2017
Scott Thornbury

Thanks, David, for your impassioned response! I take your point that language items that are acquired from contexts of use are probably more durable than those memorized randomly off a list. In fact – as an advocate of usage-based approaches – it would be hypocritical to state otherwise. However, I also recognize that not all learners have unfettered exposure to real language in use (bearing in mind that you need multiple encounters with a word in order to acquire it from incidental exposure), nor will all learners necessarily have the attentional or even motivational resources to make use of the language they are exposed to, however rich. Most writers on vocabulary acquisition would seem to concur with Nation (2001) that, although incidental vocabulary learning from, for example, extensive reading is possible, ‘learning rates can be increased considerably by some deliberate attention to vocabulary’. Attention-drawing can include pre-teaching and ‘seeing a list before reading’ and ‘having a list while reading’ (p.252).

There may also be priming effects from having first learned a word from a list, and then encountering it ‘in the wild’. But that’s my view, not Nation’s, and comes from my reading of Hoey.

In the end, as Nyr suggests below, it’s not about lists per se, it’s what you do with them. The kinds of fairly mechanical activities you describe would not seem to be conducive to any sort of learning, let alone learning the words of an L2.

On the other hand, Folse (in Vocabulary Myths, Michigan, 2004) has a whole chapter dedicated to the myth that ‘using word lists to learn second language vocabulary is unproductive.’ After reviewing the research (some of it his own) he concludes that ‘lists are not the evil that they have been portrayed to be. Research to support this claim of evil simply does not exist. In fact, many learners like learning from lists and actually ask for them. Therefore, it is important that teachers be aware of the various professionally developed lists that may (or may not) be appropriate for the particular students’ (p.44).

5 11 2017
Luiz Otávio Barros

Thanks for this fascinating post, Scott. Three things occurred to me as I read it.
1. I sometimes find myself choosing lexis on the basis not only of inherent “coreness” but also on the topic at hand. There’s sometimes a bit of a trade-off between the kinds of words students will need to complete a certain task/discuss a certain topic and the kinds of words they should know because, well, it’s a frequent/useful/usable item in its own right.
2. It seems that different degrees of “coreness” within one single lexical set can be tricky, too. So if we, say, choose to teach arm, hand and shoulder, some students will inevitably ask, “And what’s the word for [elbow]? How about [wrist]?” So, personally, in those cases I find the boundaries even fuzzier.
3. Because these lists try to be as universal as possible, they ignore what seems to be an important variable: the students’ mother tongues. There are lots and lots of “core” words that speakers of Romance languages will recognize as they encounter them (formally and incidentally), and it seems like a waste of time to present and clarify meaning in most cases. So, even if we focus mainly on pronunciation, collocation and the “grammar” of the word (to me, the most sensible thing to do), do those words count as “new” words? Do they make it to the whatever list we choose to use?

6 11 2017
Scott Thornbury

Thanks, Luiz, for your insightful comments. I’d totally agree that there are other factors, apart from ‘coreness’, that will determine the items to include in a lexical syllabus, and that knowledge of the 2000 or 3000 most frequent words may still not be sufficient to engage with even many non-specialist texts. In fact, it is often the non-frequent words in a text that carry the burden of the text’s meaning. This is where a keyword tool is helpful, i.e. a program that identifies the words that are significantly frequent in a text, compared to their frequency in a reference corpus. Again, Lexical Tutor has just such a tool: https://www.lextutor.ca/key/
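The idea behind a keyword tool can be sketched in a few lines of Python. This is an illustration only, and it should not be taken as how Lexical Tutor works: it assumes the log-likelihood statistic that is widely used for keyness in corpus linguistics, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness score for a word occurring a times in a text of c tokens
    and b times in a reference corpus of d tokens."""
    expected_text = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_text)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score

def keywords(text_tokens, reference_tokens, top_n=10):
    """Words relatively more frequent in the text than in the reference,
    ranked by log-likelihood (highest first)."""
    c, d = len(text_tokens), len(reference_tokens)
    if c == 0 or d == 0:
        return []
    text_counts = Counter(text_tokens)
    ref_counts = Counter(reference_tokens)
    scored = []
    for word, a in text_counts.items():
        b = ref_counts.get(word, 0)
        if a / c > b / d:  # keep only words over-used in the text
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]

# Toy data, purely for illustration
text = ["perry", "the", "perry", "pears", "the", "perry"]
reference = ["the", "of", "the", "and", "the", "cider", "a", "the"]
print(keywords(text, reference))
```

In the toy example, ‘the’ is frequent in the text but even more frequent in the reference corpus, so it is filtered out; ‘perry’ and ‘pears’, rare overall but prominent here, come out as the keywords.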

Your second point – about varying frequency within lexical sets – is also well made. It’s a fact, apparently, that the least frequently mentioned day of the week is Thursday and the most frequently mentioned one is Sunday (I may be wrong, but you get my drift). It would be supremely counter-intuitive to teach only those days of the week that are mentioned often, leaving Thursday out. This is what I call the ‘Wardrobe argument’ – the idea that low frequency items like ‘wardrobe’ should not be taught even when presenting other items of furniture (‘Don’t mention the wardrobe!’). It might be better to think – not so much in terms of frequent words – but in terms of frequent lexical sets.

And yes: cognates offer (many) learners a useful bridgehead into the target language lexicon – and should be exploited as much as possible (is there a list of Spanish/English cognates, I wonder, that are also core words?).

5 11 2017

David highlights the problem with wordlists, the temptation to believe that teaching the 3,000 words as a list is the panacea we have been waiting for.

Instead, though, the list is surely a reference that comes after the communicative aim has been established? Teachers should be asking themselves how this list can be used to supply input for students that need to talk about thoughts and feelings about their next job interview, or how many new collocations of ‘have’, ‘take’ and ‘make’ can be made and used by students to think about improvements to their immediate surroundings. It is how things like this are used. Personally, I want to know which words collocate most with others – a reason I have a copy of your Natural Grammar, Scott, and use websites such as Just the Word – because these are what students are more likely to encounter. Likewise, a greater ability to use the 3,000 words flexibly with all their collocations, colligations, etc. is what really separates proficient from non-proficient users of a language.

6 11 2017
Scott Thornbury

Thanks for the comment. With regard to memorizing word lists, using, for example, word cards, see my response to David above. Of course, words on their own – as you suggest – are not enough: it’s the way that they combine that is the key to fluency. But maybe memorizing (well designed) word lists is a way of kick-starting the process.

7 11 2017
Josh Kurzweil

Hi Scott,
Thanks for a great post and for starting such an interesting discussion. The idea of well-designed lists really strikes home for me. In my own teaching and learning, I have been exploring the idea of vocabulary lists that relate to topics and tasks. For example, what words do people need to talk about jobs or hobbies (at basic levels) and topics like crime, the healthcare system, and business at higher levels?

As a learner, I like having lists that I can study so that I can prepare to talk about topics. With lists, I can also benefit from study strategies such as spaced repetition and retrieval practice. Talking about a challenging topic/task makes me ‘hungry’ to learn those words and want to try having that conversation again. In the past, I often found it frustrating to learn a bunch of words through a conversation but then not have an opportunity to use them again because we moved to a new topic in the next lesson. By recycling speaking tasks/topics with well-designed lists, students can feel a sense of progress. This approach fits nicely with the idea of recycling speaking tasks as discussed in “Teaching Speaking” by Goh and Burns.

What are some well-designed lists that you think serve communicative language teaching? I’d love to see what you and other folks have done.

p.s. If anyone is curious to see lists I have created, you can find some on my Quizlet site.

8 11 2017
Scott Thornbury

Thanks, Josh, for your endorsement of the lists concept, and for the link. Quizlet is a great tool for designing digitalized word cards for the kind of spaced repetition you mention.

5 11 2017

hi all

the comprehensible input people e.g. tprs (teaching proficiency through reading & storytelling) have things like the Super Seven which are seven verbs covering various semantic domains such as Location, Existence, Possession etc [http://tprsforchinese.blogspot.fr/2013/07/super-seven.html];

one could do something similar using the semantically tagged new-GSL list that Scott discusses here over at this site – [http://corpora.lancs.ac.uk/vocab]

so for example in the “psychological actions and processes” category if we were working with intermediate students then verbs such as “experience”, “observe”, “approach”, which rank 1012, 1116 and 1141 respectively could be used as a basis for a class (note i am assuming here for illustration purposes that intermediate students would need vocabulary beyond the first 1000 words)

btw for history geeks here is a partial timeline of wordlists – http://timemapper.okfnlabs.org/muranava/a-brief-history-of-wordlists


6 11 2017
Scott Thornbury

Thanks, Mura. And thanks for the links to the new-GSL – which I’ve incorporated into the original post. The idea of using semantic categories as a guide to both including ‘necessary’ words and weeding out unnecessary ones is intriguing – and sort of returns us to the idea of the notional syllabus first mooted by Wilkins in 1976, where ‘the linguistic content is planned according to the semantic demands of the learner’ (p. 19). Thanks to corpus data, we are now in a better position to identify that linguistic content, especially at the level of lexis and phraseology.
And being a history (of ELT) geek, I loved the time-line!

6 11 2017

Pinning the kalligrammatid to the wall, the prehistoric lepidopterist smiled in recognition that the pin brought birth to her immortality and death to prehistory. “Now people will understand that these flying insects came from the earliest times known to humanity.” Years later Chuang Tzu woke up and thought, “What the…heck…is going on here?” Meanwhile the weather over here has taken a turn for the worse and Ria and Ben are going through a difficult time.

Of course, all time is difficult and no pin can hold a butterfly to a cave wall. If Chuang Tzu really did wake up, his answer would have been clear and perhaps the flapping of the wings was in response to the early winds of the nascent tornado. One notes, with raised eyebrows, that the difficulties experienced by Ria and Ben helped pay for the mortgage and brought happiness to many.

Most of the words we use belong to no list at all. Yet.

Cuss dis.

6 11 2017
Scott Thornbury

Chortling audibly.

6 11 2017

Richards (2001) argues that frequency is not a good criterion for vocabulary selection. Knowing 85 percent of the words in a text does not mean that one understands 85 percent of the text. Comprehension mainly depends on the subject or the topic, the writer’s approach, and learners’ background knowledge.

In addition, frequency is different from usefulness. Most frequent words may not be useful. On these grounds, frequency and dispersion (or range) are believed to be the most useful criteria. However, words with the highest frequency and widest range are also problematic because they may not be the most teachable ones in an introductory language course.

Besides frequency and dispersion, Richards lists the following criteria for vocabulary selection: teachability (ease of teaching words, e.g. with realia), similarity (cognates), coverage (hyponymy), availability (words in their immediate contexts), and defining power (words used to define others).

8 11 2017
Scott Thornbury

Thanks, Shahram, and I would agree with Richards that familiarity with the individual words in a text does not necessarily correlate with understanding the text. I like to use this text (from the Encyclopedia Britannica) as an example: ‘In the General Theory of Relativity, space and time are first fused together into a continuum called space-time. The geometrical properties of this space-time determine the evolution of the physical processes in space and time. The geometrical properties of the space-time continuum in turn are determined by the masses (and physical processes) present in space and time. Thus in the latter theory we do not have anything external or uninfluenced, and the series of producers of mechanical phenomena is closed.’

On the other hand, lack of familiarity with the words in a text can be a serious impediment to understanding – easily demonstrated by using a text in which every seventh word is a nonsense word:

Traditional perry making is broadly similar to chernaguyoo cider making, in that the fruit is drabbled, crushed, and pressed to extract the juice, chock is then fermented using the wild mimmies found on the fruit’s skin. The lobly differences between perry and cider are broot pears must be left for a sobby period to mature after picking, and yim pomace must be left to stand blorrel initial crushing to lose tannins, a taddomy analogous to wine maceration. After initial jammy, the drink undergoes a secondary borgavish fermentation while maturing.

8 11 2017
Nyr Indictor

Thanks for a remarkable overview of the history of core vocabulary lists. There is an enduring sense (we see it in the records of European explorers to other lands) that the basic knowledge of a language can be encapsulated in a wordlist. Perhaps one of the most impressive feats of core vocabulary compilation is G. A. Grierson’s massive Linguistic Survey of India, 11 volumes in 19 parts, which attempted to record the same vocabulary items for every language in that nation. A different kind of core wordlist is the Swadesh list (and many others like it) used by historical comparative linguists to assess the degree of divergence between languages.

In the context of language teaching, I have mixed feelings about the concept of a “core vocabulary.” I see nothing wrong with establishing a list of important-words-that-students-really-need-to-know, and do not feel, as David does, that doing so will necessarily encourage rote memorization. Most beginning textbooks I have encountered contain wordlists (or vocabulary indices) at the end, which are de facto core wordlists, and I don’t get the impression that many students try to sit down and memorize these lists. I also disagree that “The use of word lists … is … the primary reason so many schools fail at teaching the language.” Personally, I love word lists, and find them motivating when I learn languages. (As an occasional interpreter, wordlists have been critical resources for me.) I think that students are unsuccessful at learning languages for many reasons (and it is not always a failure on the part of the school, though of course it can be). I think it can be very helpful to have wordlists because they help the teacher and student focus on attainable goals.

But I do have problems with the core vocabulary concept, mostly on a sociopolitical level.

The intriguing inscription in Scott’s copy of West’s GSL is a fascinating record of how such a list may be used, but it also makes me cringe: here we see the list taking on a life of its own, with a publisher hiring a writer to work within the parameters of the list. I don’t feel comfortable with institutions (governments, companies, advertisers, publishers) dictating language use. Imagine writing a whole book without being able to use words like “giggle” and “corpulent”. Not only is a book liable to be a bit soulless when written under such constraints (it reminds me of lipogrammatic novels written without a particular letter of the alphabet; see the Wikipedia article on constrained writing for other such fancies), it has the effect of presenting only those words that are easy for students to learn.

Here’s what I mean. Take the word “chair,” for instance. Unless your students come from a culture where chairs are not used, the chances are they will have a word that is a near-semantic equivalent in their L1. Learning this word will be easy enough. But words like “giggle” or “straddle” or “curdle”? These words (all frequentative verbs) are unlikely to have close L1 counterparts; any translation will be a paraphrase. In my opinion, the frequentatives are part of what gives English its flavor; you don’t need to know all of them (there are over 100 ending in -le and -er, some of them rare or dialectal), but knowing a few will give you a taste of what English is. Such words challenge our students (and us as well) to look at the universe in new ways. “Corpulent” is rarer, and belongs to a smaller, but equally flavorful group of about 30-40 words (e.g., truculent, virulent, fraudulent, opulent, succulent), some of which are obsolete. Onions (1966) provides information about these suffixes (see references, below).

If we limit ourselves to teaching a core vocabulary, we are limiting ourselves (and our students) to a subset of English, one that may be useful for international business, but one that minimizes those aspects of English that are interesting, unique and which open the mind to new ways of expression. By all means use textbooks that focus on a core vocabulary, but don’t forget to subvert the constraints of the textbook by introducing other language as well.

One area where core vocabularies are vital is in certain ESP contexts. I would not want to fly in a plane piloted by someone who hadn’t learned core aviation vocabulary well. Similarly, multilingual teams working on an oil rig, in an operating room, or in a mine need to rely on a core language (not just lexis) that they all know well, receptively and productively.

Finally, no discussion of core vocabulary would be complete without a reference to the most lucrative core vocabulary list ever created, namely the 400-word list devised by Phyllis Cerf for Theodore Geisel (Dr. Seuss). The two co-founded the Beginner Books imprint of Random House (see the Wikipedia entry for Beginner Books, and also Morgan and Morgan (1995)), created in 1957. Geisel kept to the list, and in some cases used far fewer words, notably Green Eggs and Ham, which has only 50 different words. The success of the Dr. Seuss books changed the world of early readers for children, many of which are still written to conform to core vocabulary lists. Personally, I am not a fan of these works, but I feel one must acknowledge the impact such books have had, not only on Children’s Literature, but on the English language itself.

Grierson, G. A., ed. and comp. (1904-28) Linguistic Survey of India. Calcutta: Office of the Superintendent of Government Printing, India. Reissued in the 1960s by Motilal Banarsidass, New Delhi. There is also a searchable online database (currently down) at http://www.joao-roiz.jp/LSI/.

Morgan, Judith and Neil (1995) Dr. Seuss & Mr. Geisel: A Biography. New York: Random House.

Onions, C. T., ed. (1966) Oxford Dictionary of English Etymology. Oxford at the University Press. See especially the entries for the frequentative suffixes “-er4” (p. 323) and “-le3” (p. 520), as well as the suffix “-ulent” (p. 954).

10 11 2017
Scott Thornbury

Thanks, Nyr, for your comment, and your fascinating insights into the history of lists. Frequentative suffixes seem to straddle the border between phonology and lexis – or between form and meaning, even – and resonate with the comment about phonaesthemes made by J.J. Almagro in response to my post about phonotactics.

10 11 2017
Lexical Leo

Thank you for this interesting overview, Scott. It’s reassuring to know that frequency was not the only factor taken into consideration when the original GSL was compiled (by people ‘of high intelligence’). Generally, I also have mixed feelings about vocabulary lists but not because of rote learning that they seem to encourage.

My main problem with such lists (be it GSL or the one found on LexTutor) is their organising principle – around word families. It assumes that all the words within a family are of equal frequency, let alone equal utility. “Farther” and “farthest” are included under the same entry as “far”, but learners encounter “far” relatively early on, and the other two later. Or take “excellent” for example, which is much more frequent than its root “excel”, and much more useful for learners right from the very beginning. This ‘unfair’ treatment* of all words in the same family is what I want to address in my IATEFL talk in April (if it gets accepted!). The new-GSL (Brezina & Gablasova 2015) partially redresses this – it’s organised by lemmas/headwords rather than word families.

Another fallacy associated with such lists is the assumption that all senses of a word are equally frequent and important. But the process of vocabulary learning is incremental, as many of the compilers of such lists acknowledge themselves (e.g. Paul Nation); you can’t possibly learn ALL senses of a word at once. Intermediate learners can learn the word “disposal” as part of the chunk “I’m at your disposal” and only later encounter its additional senses (e.g. waste disposal). Therefore it would be helpful if “disposal” and any other polysemous word entry (and most English words are polysemous) were accompanied by glosses, usage notes or guidelines as to what sense to focus on.

As to your question, I think such lists should mainly be used by course planners and balanced with such considerations as relevance to the learner in a given context – I echo one of Luiz’s points here. “Church” might be a highly useful word in some contexts/cultures but “mosque” or “synagogue” would be more relevant in others – I don’t know if the more superordinate word “temple” is a solution here. (“Every Sunday my family goes to a temple” ? ).

All things considered, I’m not against such lists – quite the opposite, but other things should be taken into account for these lists to be pedagogically useful. These include relevance, possibly cultural factors, dispersion (which you mention) and what Widdowson refers to as “valency” – the potential of a word to generate further learning. Another noteworthy criterion for inclusion is what is known in the Continental linguistic tradition as disponibilité (lexical availability).

On a different note, it’s interesting that you chose to start the post with Michael Lewis’s quote, because something tells me that Lewis would ultimately be against any such list.


* Now that I’ve written the comment I realise that “unfair treatment” is a bit of a misnomer because, in fact, the words are treated fairly/equally.

11 11 2017
Scott Thornbury

Thanks, Leo – you make such good points about the selection of items for ‘core vocabulary’ status that your comment should really be the post and my post the comment! Regarding word families vs. lemmas, I seem to remember that Averil’s Academic Word List was criticised on similar grounds. Another thought – should a core vocabulary be restricted to whole words only? What about bound morphemes (un-, re-, -wise, -ly, etc.)? Wouldn’t the inclusion of these obviate the loss of words that might result from a lemma-based vs word family-based approach? Don’t affixes have valency??

And you’re right about Michael Lewis – he would be apoplectic if he knew his name was associated with a list – especially a list of single-word items chosen primarily in terms of their frequency! He was VERY dismissive of my Natural Grammar on these grounds.

17 11 2017
Lexical Leo

Thank you, Scott. I’m flattered!
Surely affixes have valency, and Norbert Schmitt – or Paul Nation (or maybe both) supports the view that they should be taught explicitly. Should they be added to core lists then? Hmmm… They are not words after all – they are ‘bound’ and cannot stand on their own. But if the whole organising principle were to be revised, then why not?

Talking of Averil Coxhead, she notes the following while describing her new AWL, although it can equally apply to GSL or any other list:

“An academic word list should play a crucial role in setting vocabulary goals for language courses, guiding learners in their independent study, and informing course and material designers in selecting texts and developing learning activities” (Coxhead 2000: 214)

So, teachers seem to be ‘off the hook’? 🙂

17 11 2017

1. I missed any reference to the Dolch list of so-called sight words (http://www.dolchword.net/dolch-word-list.html).

2. Where can I find the Dr. Seuss 400/220 words?

3. While I use the NGSL and the Dolch list to guide me when teaching little kids, there are many content words (especially, elephant, monkey, doll, and body parts) that are invaluable in helping them learn English.

20 11 2017
Scott Thornbury

Thanks, Oneota – yes, both the Dolch list and (I think) the Dr Seuss list get a mention in Mura’s fantastic timeline of word lists here: http://timemapper.okfnlabs.org/muranava/a-brief-history-of-wordlists#8

With regard to words like elephant, monkey etc – yes, Harold Palmer (1917) would justify these on the grounds of what he called ‘expedience’. They may not be frequent or even useful, but they are easily illustrated, and, for children, intrinsically interesting – great for teaching reading.

17 11 2017

I tend to see core vocabularies as statistical and semantic realities waiting to be uncovered rather than academic constructs designed to satisfy linguists, teachers and students. Naturally they vary with time, culture, age and different learner goals.

All children and adults learn via a core and expanding lexicogrammar – it is impossible to learn a language otherwise.

In the developing world the main aim of studying English is often to pass school tests, rather than to communicate (frequently there is almost no communicative opportunity anyway). In this situation a well-defined core vocabulary syllabus is essential, but often lacking, a luxury that we really shouldn’t disdain or take for granted!

