C is for Core vocabulary

5 11 2017

“Lexis is the core or heart of language”, wrote Michael Lewis (Lewis, 1993, p. 89). Yes, but which lexis? Given the hundreds of thousands of words that there are, which ones should we be teaching soonest? Is there a ‘core’ vocabulary? If so, where can we find it? If it is a list, how is it organized? And on what principles of selection is it based?

These questions were prompted by a student on my MA TESOL who asked if the measure of an item’s ‘core-ness’ was simply its frequency. I suspected that there might be more to it than this, and this impelled me to look at the literature on word lists.

The most famous of these, of course, is Michael West’s General Service List (GSL), first published in 1936 and then revised and enlarged in 1953. I am the proud owner of not just one but two copies of West, one of which clearly once belonged to a writer (see pic), who used it to keep within the 3000 word limit imposed by his or her publishers.

Compiled before the days of digitalized corpora, the GSL was based on a print corpus of up to 5 million words, diligently trawled through by a small army of researchers (‘of high intelligence and especially trained for the task’) for the purposes of establishing frequency counts – not just of individual words but of their different meanings.

But frequency was not the only criterion for inclusion in the GSL. West and his collaborators also assessed whether a word was relatively infrequent but necessary, because it lacked a viable equivalent – ‘vessel’ being one: ‘container’ doesn’t work for ‘blood vessels’, for example. Conversely, some words may be frequent but unnecessary, because there are adequate non-idiomatic alternatives that ‘cover’ their meaning. Finally, informal and highly emotive words were excluded, on the grounds that they would not be a priority for learners.

In the end the GSL comprised around 2000 word families (but over 4000 different lemmas, i.e. words that have the same stem and are the same part of speech: dance, danced, dancing, but not dancer) and even today, despite its age, the GSL gives a coverage of nearly 85% of the running words in any corpus of non-specialist texts (according to Brezina & Gablasova, 2015 – see below).
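
A coverage figure of this kind is straightforward to compute: it is simply the proportion of a text’s running words (tokens) that appear on the list. Here is a minimal sketch in Python – note that it matches raw word forms only, whereas real coverage studies like Brezina and Gablasova’s match lemmas or word families, so the toy figure understates true coverage:

```python
def coverage(tokens, word_list):
    """Proportion of running words (tokens) accounted for by a word list.

    `tokens` is a list of lower-cased running words from a text;
    `word_list` is a set of list items. This sketch matches raw forms
    only; real coverage studies group inflected forms under lemmas or
    word families before counting.
    """
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in word_list)
    return known / len(tokens)

# A toy text and a toy 'core' list:
text = "the cat sat on the mat and the dog sat too".split()
core = {"the", "and", "on", "a", "sat", "cat", "dog"}
print(f"{coverage(text, core):.0%}")  # 9 of 11 tokens are on the list
```

Repeated tokens count each time they occur – which is why a couple of thousand word families can cover 85% of running text: the commonest words recur constantly.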

Subsequently, Carter (1998) has elaborated on the criteria for what constitutes ‘core-ness’. One is a core word’s capacity to define other words. Hence the words chosen by lexicographers for dictionary definitions are a reliable source of core vocabulary. One such is the Longman Defining Vocabulary (LDV): you can find it at the back of the Longman Dictionary of Contemporary English (my edition is that of 2003) or at a number of websites, including this one.

The publishers comment, ‘The words in the Defining Vocabulary have been carefully chosen to ensure that the definitions are clear and easy to understand, and that the words used in explanations are easier than the words being defined.’

Entry for ‘laugh’ in the GSL

Another test of coreness is superordinateness: ‘Core words have generic rather than specific properties’ (Carter 1998, p. 40). Hence, flower is more core than rose;  tool more core than hammer. For this reason, perhaps, core words are the words writers tend to use when they are writing summaries.


Core words are also more likely to have opposites than non-core words: fat vs. thin, laugh vs. cry. But what is the opposite of corpulent, say? Or giggle?

Core words also tend to have a greater range of collocates – compare start vs commence (start work/an argument/a career/a rumour/a conversation etc.). And they have high word-building potential, i.e. they combine productively with other morphemes: startup, headstart, starter, starting line, etc. Core words are also neutral: they do not have strong emotional associations; they do not index particular cultures (dress vs sari, for example), nor are they specific to certain discourse fields: compare galley, starboard, and below deck with kitchen, left, and downstairs (i.e. a nautical discourse vs. a less marked one).

On this last aspect, an important test of a word’s coreness is not just its overall frequency but its frequency in a wide range of contexts and genres – its dispersion. In a recent attempt to update the GSL, and to eliminate the subjectivity of West’s criteria, Brezina & Gablasova (2015) tested for the ‘average reduced frequency’ (ARF): ‘ARF is a measure that takes into account both the absolute frequency of a lexical item and its distribution in the corpus… Thus if a word occurs with a relatively high absolute frequency only in a small number of texts, the ARF will be small’ (op. cit., p. 8). Brezina and Gablasova also drew on – not just one corpus – but a range of corpora, including the 12-billion-word EnTenTen12 corpus, to produce a New General Service List which, while much trimmer than West’s original (2500 vs. 4000 lemmas), and therefore perhaps more ‘learnable’, still gives a comparable coverage of corpus-based text – around 80%. (The full text of the article, along with the word list itself, can be found here.)
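
To see the intuition behind dispersion-sensitive measures, here is a deliberately simplified sketch. The real ARF formula works from the distances between a word’s occurrences across the whole corpus; the toy proxy below just weights a word’s absolute frequency by the proportion of texts it occurs in, which captures the same basic idea – a word piled up in one text scores lower than one spread evenly across many:

```python
def dispersion_weighted_freq(texts, word):
    """Toy dispersion-adjusted frequency: absolute frequency weighted
    by the proportion of texts the word appears in. NOT the actual ARF
    formula (which uses inter-occurrence distances) -- just an
    illustration of why 'galley' ranks below 'kitchen' even if both
    are frequent somewhere.
    """
    total = sum(t.count(word) for t in texts)       # absolute frequency
    in_texts = sum(1 for t in texts if word in t)   # how many texts it occurs in
    return total * in_texts / len(texts)

# Three tiny 'texts', tokenized:
corpus = [
    "the ship sailed on".split(),
    "the galley and the starboard deck".split(),
    "the dog and the cat".split(),
]
# 'the' is frequent everywhere; 'galley' occurs only in the nautical text.
print(dispersion_weighted_freq(corpus, "the"))
print(dispersion_weighted_freq(corpus, "galley"))
```

Under this proxy, ‘the’ keeps its full frequency (it occurs in every text) while ‘galley’ is discounted to a third of its raw count – the kind of re-ranking that separates a core word from a register-bound one.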

More impressive still, and also called a New General Service List, is the one compiled by Browne et al. (2013) which, with 2800 lemmas, claims to provide more than 90% coverage of the kinds of general English texts learners are likely to read.

Other potentially useful word lists include The Oxford 3000: ‘a list of the 3000 most important words to learn in English’ – accessible here. Again, dispersion – not just frequency – has been an important criterion in choosing these: ‘We include as keywords only those words which are frequent across a range of different types of text. In other words, keywords are both frequent and used in a variety of contexts.’ And the publishers add:

In addition, the list includes some very important words which happen not to be used frequently, even though they are very familiar to most users of English. These include, for example, words for parts of the body, words used in travel, and words which are useful for explaining what you mean when you do not know the exact word for something. These words were identified by consulting a panel of over seventy experts in the fields of teaching and language study.

Inevitably, there is a lot of overlap in these lists (they would hardly be ‘core vocabulary’ lists if there were not) but the differences, more than the similarities, are intriguing – and suggestive, not only of the corpora from which the lists were derived, but also of the criteria for selection, including their intended audience and purpose. To give you a flavour:

Words in West’s GSL not in LDV: plaster, jealous, gay, inch, widow, elephant, cushion, cork, chimney, pupil, quart.

Words in LDV not in GSL: traffic, sexual, oxygen, nasty, infectious, piano, computer, prince.

Words in Oxford 3000 not in either GSL or LDV: fridge, gamble, garbage, grandchild, sleeve, software, vocabulary… Note also that the Oxford 3000 includes phrasal verbs, which are not systematically included in the other lists, e.g. pull apart/down/off/in/over/through/up + pull yourself together.

Of course, the key question is: what do you actually do with these lists? Are they simply guidelines for materials writers and curriculum planners? Or should learners be encouraged to memorize them? In which case, how?

Discuss!

References

Brezina, V. and Gablasova, D. (2015) ‘Is there a core general vocabulary? Introducing the New General Service List,’ Applied Linguistics, 36/1. See also this website: http://corpora.lancs.ac.uk/vocab/index.php

Browne, C., Culligan, B. & Phillips, J. (2013) ‘New General Service List’ http://www.newgeneralservicelist.org/

Carter, R. (1998) Vocabulary: Applied linguistic perspectives (2nd edition) London: Routledge.

Lewis, M. (1993) The lexical approach. Hove: LTP.

West, M. (1953) A general service list of English words. London: Longman.





V is for Vocabulary teaching

2 06 2013

A teacher educator in Norway reports on how she has used ideas from my book How to Teach Vocabulary (2002) on an in-service course for local primary and lower secondary school teachers. Mona Flognfeldt writes: ‘I have shared with my students a lot of input that I have learnt from you, and a lot of our students have put their new insights to immediate practical use in their classrooms. … As a part of their course, these students have also learnt to make their own blogs.’ These blogs have become the vehicles whereby they report on how they ‘have tried out various activities and types of tasks in their attempts to help their students enhance their vocabulary in English’.

Reading the blogs I am struck by the way these teachers have implemented, in their own classes, a reflective task cycle as part of their ongoing professional development. This has involved background reading and discussion, classroom experimentation, reflection and – by means of the blogs – sharing with their colleagues the insights that they have gained.

To give you a flavour, here is a sample of the kinds of activities these teachers tried. I have grouped them according to five guiding principles of vocabulary acquisition. (Apologies in advance to those whose blog posts I haven’t included, but readers who are interested can find them at the link below).

1. The Principle of Cognitive Depth: “The more one manipulates, thinks about, and uses mental information, the more likely it is that one will retain that information. In the case of vocabulary, the more one engages with a word (deeper processing), the more likely the word will be remembered for later use” (Schmitt 2000: 120).

I picked out 8 words from the text that I wanted my pupils to learn. Then I had my pupils identifying the words in the text. Task 2 was a selecting task where the pupils had to underline the words that were typical for India. They shared their work with a partner, explaining their choices. As task 3 they were matching the words with an English description from a dictionary. They also found antonyms and synonyms. Task 4 was a sorting activity where the pupils had to decide whether the words were nouns, verbs, adjectives or adverbs. Finally, as a ranking and sequencing activity I had my pupils rank the words according to preference, to decide how important they thought knowing each word was. They discussed their ranking with a partner. (Mette B.)

2. The Principle of Retrieval: “The act of successfully recalling an item increases the chance that the item will be remembered. It appears that the retrieval route to that item is in some way strengthened by being successfully used” (Baddeley 1997: 112).

My Vocabulary activity was “Categories” … The students worked in groups of four or five. They were handed out a piece of paper where five columns were drawn up. Each column was labelled with the name of a lexical set: Food, transport, clothes, animals and sport. I called out a letter of the alphabet (e.g. B!). The students wrote down as many words as they knew beginning with that letter within a time limit of around 2-3 minutes. The group with the most words won. (I did not demand that the words were spelled correctly.) (Gunn)

There is also pictionary, where you divide the class into two groups, and one member of each team goes to the SmartBoard. The teacher flashes them a card with a word, phrase or expression and the pupils have one minute to make their team say the word on the basis of their drawing on the SmartBoard; no other clues are allowed. (Vanessa)

3. The Principle of Associations: “The human lexicon is believed to be a network of associations, a web-like structure of interconnected links. When students are asked to manipulate words, relate them to other words and to their own experiences, and then to justify their choices, these word associations are reinforced” (Sökmen 1997: 241-2).

Make true and false sentences about yourself using eight of these words.

I believe this is a good activity for deeper processing of words, because the learners have to relate to the words and phrases personally. I have tried it out in class and found it a motivating activity both for me and for my pupils. We all got to know each other better by sorting out the activities they liked more and liked less. This was a concrete task, easy for them to relate to and to make up sentences from a given pattern. The activity guessing what is false and true is fun and easy to understand. They have to use what they already know about each other to decide whether the statements are true or false. (Anne Katrine)

 4. The Principle of Re-contextualization: “When words are met in reading and listening or used in speaking and writing, the generativeness of the context will influence learning. That is, if the words occur in new sentence contexts in the reading text, learning will be helped. Similarly, having to use the word to say new things will add to learning”  (Nation 2001: 80).

I showed them the list of words on the projector and introduced the task to them. Their first task was to translate the words and write them in Norwegian. … When the pupils had finished this, they were asked to use at least five words/expressions from each column to write a paragraph on US politics. The task had to be finished before the lesson the week after. This sentence or text creation task required the pupils to create the context for the given words and phrases. In addition to the meaning of the words, the pupils also needed to think about word tense, grammatical behaviour and so on. (Sturla)

5. The Principle of Multiple Encounters: “Due to the incremental nature of vocabulary acquisition, repeated exposures are necessary to consolidate a new word in the learner’s mind” (Schmitt & Carter 2000: 4).

The class was supposed to work with reading comprehension, but before starting the reading, the pupils were given a pre-reading task related to vocabulary in the text. … After a while, the teacher went through the task with the class, asking for the matching words and the definitions. The teacher repeated the answers to model the correct pronunciation.

Then the class was instructed to read the article and use the worksheet on vocabulary while reading and after reading when they were asked to answer questions from the article. This way the vocabulary was met several times.  (Anette)

Finally, the last word goes to Mette B.: ‘I have also had the pleasure of practising Thornbury’s ways of putting words to work this year. What amazes me the most is how positively even the pupils with elementary skills respond to these types of activities.’

Music to my ears!

Again, heartfelt thanks to Mona and her trainee teachers.

References:

Baddeley, A. (1997)  Human Memory: Theory and Practice (Revised edition), Hove: Psychology Press.

Nation, I.S.P. (2001) Learning Vocabulary in Another Language, Cambridge: Cambridge University Press.

Schmitt, N. (2000) Vocabulary in Language Teaching, Cambridge: Cambridge University Press.

Schmitt, N. & Carter, R. (2000) ‘The lexical advantages of narrow reading for second language learners’, TESOL Journal, 9/1, 4-9.

Sökmen, A.J. (1997) ‘Current trends in teaching second language vocabulary,’ in Schmitt, N. and McCarthy, M. (Eds.) Vocabulary: Description, Acquisition and Pedagogy. Cambridge: Cambridge University Press.

Thornbury, S. (2002) How to Teach Vocabulary, Harlow: Pearson.

Illustrations from Grad, A. (1958) Vasela Angleščina, Ljubljana: DZS.

Mona’s blog, with access to her trainee teachers’ blogs, can be found here: http://monaflognfeldt.wordpress.com/2012/11/05/vocabulary-acquisition-and-development/





V is for Vocabulary size

3 10 2010

Paul Meara, of Swansea University, in Barcelona

How many words do you know? How many words do your students know? How do you count them? Is it important?

These and similar questions came up during a fascinating series of lectures given this week by Paul Meara (“the world’s leading researcher in modelling vocabulary knowledge” according to Paul Nation), at the Pompeu Fabra University here in Barcelona.

Paul Nation at the MASH Equinox Event in Tokyo, last month (Photo: David Chapman)

Traditionally, estimates of vocabulary size have been based on the number of words that subjects could define on a list taken at random from a dictionary: if the list represented 10% of the total words in the dictionary, the number of known words would then be multiplied by ten to give the total. But the method is fraught with problems, not least ‘the big dictionary’ effect: “The bigger the dictionary used, the more words people are found to know” (Aitchison 1987, p.6).
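
The arithmetic of the traditional method is simple enough to sketch, and the sketch also makes the ‘big dictionary’ effect visible: the same proportion of known words, scaled up against a bigger dictionary, yields a bigger estimate (the figures below are invented for illustration):

```python
def estimate_vocab_size(sample_known, sample_size, dictionary_size):
    """Classic dictionary-sampling estimate: draw a random sample of
    `sample_size` headwords from a dictionary of `dictionary_size`
    entries, count how many the subject can define (`sample_known`),
    and scale up proportionally."""
    proportion_known = sample_known / sample_size
    return round(proportion_known * dictionary_size)

# Knowing 120 of a 300-word sample from a 30,000-entry dictionary:
print(estimate_vocab_size(120, 300, 30_000))  # -> 12000

# The 'big dictionary' effect: the same 40% hit rate against a
# 60,000-entry dictionary doubles the estimate.
print(estimate_vocab_size(120, 300, 60_000))  # -> 24000
```

(In practice the effect arises partly because bigger dictionaries contain more derived and inflected headwords that a subject ‘knows’ for free, so the proportion known doesn’t fall as fast as the dictionary grows.)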

More sophisticated, and more sensitive, tests have since been designed, including Paul Nation’s widely used and very reliable Vocabulary Levels Test (described in Nation 1990), which targets five levels of word frequency (including a university word list) and involves matching words with simple definitions.

Meara himself has devised a number of vocabulary size tests, including the EVST (originally commissioned as a placement test by Eurocentres). Elegantly simple and very easy to administer, this checklist-type test requires takers simply to say which words they recognise in a sequence of frequency-based lists. But, as a way of controlling for wild guessing – or shameless lying! – the lists also include ‘pseudo words’, such as obsolation and mudge.
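
The point of the pseudowords is that they let the scorer estimate a test-taker’s guessing rate and discount it. A standard signal-detection-style correction (a hedged sketch – the scoring the EVST actually uses may differ) treats ‘yes’ responses to pseudowords as the false-alarm rate and adjusts the hit rate on real words accordingly:

```python
def corrected_score(real_yes, real_total, pseudo_yes, pseudo_total):
    """Correct a yes/no checklist score for guessing.

    h = hit rate on real words, f = false-alarm rate on pseudowords
    (e.g. 'obsolation', 'mudge'). The standard correction
    h_adj = (h - f) / (1 - f) discounts the proportion of 'yes'
    answers attributable to guessing. Illustrative only -- not
    necessarily the EVST's actual scoring formula.
    """
    h = real_yes / real_total
    f = pseudo_yes / pseudo_total
    if f >= 1:          # said 'yes' to every pseudoword: score is worthless
        return 0.0
    return max((h - f) / (1 - f), 0.0)

# 80% 'yes' on real words, but also 4 of 20 pseudowords claimed as known:
print(corrected_score(80, 100, 4, 20))
```

With a 20% false-alarm rate, the nominal 80% score is discounted to 0.75 – and a shameless liar who ticks every pseudoword scores zero.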

All the above tests are tests of receptive vocabulary knowledge. Testing a user’s productive vocabulary is more problematic. One approach is the aptly-named ‘spew test’, where test-takers are asked to produce as many words as they can that share a common feature, e.g. that start with the letter B. Taking a somewhat different tack, Meara reported on some intriguing research he has done, matching frequency profiles of learner texts with statistical models of different vocabulary sizes. A student writes a text and a profile is generated in terms of the relative frequency of its words; the program then searches for a best match (a bit like the way that fingerprints are matched up), which in turn yields a fairly exact estimate of the learner’s vocabulary size. Magic! (You can check the program out for yourself at Paul’s _lognostics website. It’s called V-size.)
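
The fingerprint-matching idea can be caricatured in a few lines. Suppose each model vocabulary size predicts what proportion of a learner’s tokens will fall into each frequency band; then the best match is simply the model whose predicted profile lies closest to the observed one. Everything below – the bands, the model figures, the distance measure – is invented for illustration, not V-size’s actual internals:

```python
def best_match(profile, models):
    """Return the vocabulary size whose model profile is closest to
    the observed profile (least sum of squared differences). A toy
    analogue of profile matching; bands and distance measure are
    illustrative only."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(models, key=lambda size: dist(profile, models[size]))

# Proportion of a text's tokens in each of four frequency bands
# (top 1k, 2k, 3k, beyond), for three hypothetical vocabulary sizes:
models = {
    2000: (0.85, 0.10, 0.04, 0.01),
    4000: (0.75, 0.13, 0.07, 0.05),
    6000: (0.65, 0.15, 0.10, 0.10),
}
learner_profile = (0.76, 0.12, 0.07, 0.05)
print(best_match(learner_profile, models))  # closest model wins
```

The observed profile sits nearest the 4000-word model, so that becomes the estimate; a real system would use many more bands and statistically derived models, but the matching step is the same in spirit.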

But what does vocabulary size mean? And does size matter? Certainly, it seems that having a big vocabulary is a prerequisite for reading (and presumably listening) ability. As Batia Laufer (1997) puts it, “By far the greatest lexical obstacle to good reading is insufficient number of words in the learner’s lexicon. [In research studies] lexis was found to be the best predictor of success in reading, better than syntax or general reading ability” (p. 31).

Paul Meara in action

More than that, vocabulary size may be a reliable predictor, not just of reading success, but of overall linguistic competence. Certainly, in first language acquisition, the processes of vocabulary development and grammar development are closely intertwined, with the former possibly driving the latter. Tomasello (2003), for example, cites research that shows that “only after children have vocabularies of several hundred words [do] they begin to produce in earnest grammatical speech”, which suggests to Tomasello “that learning words and learning grammatical constructions are both part of the same overall process” (p. 93).

If this is the case in first language acquisition, does it not also suggest that – for second language learning – the learner needs to assemble as big a lexicon as possible, and as soon as possible – even if this means putting other areas of language learning ‘on hold’?

References:
Aitchison, J. 1987. Words in the Mind: An introduction to the mental lexicon. Oxford: Blackwell.
Laufer, B. 1997. ‘The lexical plight in second language reading’, in Coady, J. and Huckin, T. (eds.) Second Language Vocabulary Acquisition: A Rationale for Pedagogy. Cambridge: Cambridge University Press.
Nation, I.S.P. 1990. Teaching and Learning Vocabulary. Boston, MA: Heinle and Heinle.
Tomasello, M. 2003. Constructing a Language. Cambridge, MA: Harvard University Press.





L is for (Michael) Lewis

5 09 2010

(Continuing an occasional series of the type ‘Where are they now?’)

Michael Lewis and me: University of Saarbrücken

A reference in last week’s post (P is for Phrasal Verb) to the fuzziness of the vocabulary-grammar interface naturally led to thoughts of Michael Lewis. It was Michael Lewis who was the first to popularize the view that “language consists of grammaticalized lexis, not lexicalized grammar” (1993, p. 34). This claim is a cornerstone of what rapidly came to be known as the Lexical Approach – rapidly because Lewis himself wrote a book called The Lexical Approach (1993), but also because, at the time, corpus linguistics was fueling a major paradigm shift in applied linguistics (under the visionary custodianship of John Sinclair and his brainchild, the COBUILD project) which, for want of a better term, might best be described as ‘lexical’. Lewis was one of the first to popularize this ‘lexical turn’ in applied linguistics, and he did so energetically, if, at times, contentiously.

So, what happened to the Lexical Approach – and to Lewis, its primum mobile?

Well, for a start (as I argued in an article in 1998), the Lexical Approach never was an approach: it offered little guidance as to how to specify syllabus objectives, and even its methodology was not much more than an eclectic mix of procedures aimed mainly at raising learners’ awareness about the ubiquity of ‘chunks’. Moreover, Lewis seemed to be dismissive – or perhaps unaware – of the argument that premature lexicalization might cause fossilization. To him, perhaps, this was a small price to pay for the fluency and idiomaticity that accrue from having an extensive lexicon. But wasn’t there a risk (I argued) that such an approach to language learning might result in a condition of “all chunks, no pineapple” i.e. lots of retrievable lexis but no generative grammar?

In the end, as Richards and Rodgers (2001) note, the Lexical Approach “is still an idea in search of an approach and a methodology” (p. 138). Nevertheless, as I said in 1998, “by challenging the hegemony of the traditional grammar syllabus, Lewis… deserves our gratitude.”

Michael responded graciously to these criticisms, acknowledging them – although not really addressing them – in a subsequent book, Teaching Collocation (2000). There the matter rested. Until 2004, when I published a ‘lexical grammar’ – that is, a grammar based entirely on the most frequent words in English – and, in the introduction, paid tribute to my ‘lexical’ precursors, specifically Michael Lewis, and Jane and Dave Willis.

Michael was not pleased. When I next ran into him, at an IATEFL Conference a year or two later, he was still fuming. Apparently, to suggest that his version of the Lexical Approach had anything in common with the Willises’, or that my book in any way reflected it, was a gross misrepresentation. The sticking point was what Michael calls ‘the frequency fallacy’, that is, the mistaken belief that word frequency equates with utility. Much more useful than a handful of high-frequency words, he argued, was a rich diet of collocations and other species of formulaic language. I, by contrast, shared with the Willises the view that (as Sinclair so succinctly expressed it) ‘learners would do well to learn the common words of the language very thoroughly, because they carry the main patterns of the language’ (1991, p. 72). To Michael, ‘patterns of the language’ sounded too much like conventional grammar.

When we met again, a year or two later, at a conference at the University of Saarbrücken, we found that we had more in common than at first seemed. For a start, we sort of agreed that the chunks associated with high frequency words were themselves likely to be high frequency, and therefore good candidates for pedagogical treatment. And Michael was working on the idea that there was a highly productive seam of collocationally powerful ‘mid-frequency’ lexis that was ripe for investigation.

A few months later, at a conference in Barcelona, we had even started talking about some kind of collaborative project. I was keen to interest Michael in developments in usage-based theories of acquisition, premised on the view that massive exposure to formulaic language (his ‘chunks’) nourishes processes of grammar emergence – a view that, I felt, vindicated a re-appraisal of the Lexical Approach.

But Michael is enjoying a well-earned retirement, and I suspect that he’s satisfied in the knowledge that the Lexical Approach, his Lexical Approach, whatever exactly it is, is well-established in the EFL canon, and that his name is stamped all over it.

So, then, what’s the Lexical Approach to you?

References:

Lewis, M. 1993. The Lexical Approach. Hove: LTP.
Lewis, M. 2000. Teaching Collocation. Hove: LTP.
Richards, J., and Rodgers, T. 2001. Approaches and Methods in Language Teaching (2nd edition). Cambridge: Cambridge University Press.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.