D is for Dictionary

3 12 2017

(This post was timed to coincide with the most recent update of the Macmillan English Dictionary, and first appeared on the Macmillan Dictionary blog.)

I love dictionaries almost as much as I love old coursebooks. I have a two-volume Spanish-English dictionary published in Cádiz in MDCCCLXIII – which I think is 1863. I picked it up for a song in the flea market in Barcelona and it’s in great shape. In the English volume it includes words like duskishly, porterage, and crupper, so I’m not betting on it being of that much use in 2017. Presciently (or perhaps duskishly?) the writer – one Don Mariano Velasquez de la Cadena – comments in the preface:

Language, like dress, is subject to continual change; and many phrases which were deemed elegant two centuries ago are almost unintelligible at the present day, in consequence of being displaced by other [sic] which were then unknown.

This is as true in our own field – applied linguistics and language education – as it is in other specialized fields. It was driven home to me just this week (thanks to a blog post by Richard Smith) as I read a book published exactly 100 years ago, called The Scientific Study and Teaching of Languages, by Harold E. Palmer.


Harold Palmer, circa 1920 (from Smith 1999)

 

Palmer, in case you didn’t know, taught and trained extensively in Japan, and ‘did more than any other single individual to establish English language teaching (ELT) as an autonomous branch of language education in the first half of the 20th century and to give it the “applied linguistic” direction to which it has remained loyal ever since’ (Smith 1999, p. vii). Reading Palmer, though, is not always easy, as he uses a number of terms which ‘are almost unintelligible at the present day’ (to quote Don Mariano). He refers frequently to ergons, for example, and the science of ergonics. And to morphons, polylogs and catenizing. Fortunately, Palmer supplies a glossary, which explains that an ergon is ‘any speech unit considered from the point of view of its function or powers of combining with other units’. Morphons are what we might now call morphemes; polylogs are multi-word items; and catenizing is ‘learning to pronounce accurately and rapidly a given succession of sounds’. He uses this last term a lot, since it is an integral part of his methodology, but I am not sure if we have a contemporary equivalent.

Having just completed the second edition of An A – Z of ELT (now The New A to Z of ELT), I am particularly interested in the way terminology shifts, evolves and morphs like this. Over ten years have elapsed between the two editions, and it’s been salutary to see how rapidly some terms lose their currency while new ones are enlisted in response to developments in language description, methodology and second language acquisition theory.

An obvious area of rapid change is in educational technology: even the term educational technology didn’t get an entry in the first edition, where computer assisted language learning (CALL) was made to serve for virtually the whole field. Now there are separate entries for mobile learning, adaptive learning, blended learning, and the flipped classroom –  all new arrivals since 2006.

Another growth area has been in what I loosely call the ‘neoliberal turn’ – that is, the way the discourses of economic neoliberalism have been co-opted to serve the discourses of education, such that words like accountability, outcomes, competencies, granularity and life-skills (or twenty-first century skills) now regularly feature in ELT conference programs. In the entry on life skills, I manage to sneak in the suggestion that there might be something a little bit faddish about this development:

Concepts like communication, learner training and (inter-) cultural awareness have all been central to language teaching methodology for several decades now. The renewed interest in such skills may be an effect of the way education is being shaped to serve the needs of the new, globalized economy, with English playing a central role.

Indeed, by the time the third edition comes out, will granularity seem as dated then as ergons are to us now?

Reference

Smith, R.C. (1999) The writings of Harold E. Palmer: An overview. Tokyo: Hon-no-Tomosha.

Postscript: This is the 200th post on this blog (see Index) and it’s appropriate that it’s about dictionaries since it was a kind of dictionary (The A – Z of ELT) that was the impetus behind it. At the year’s end, it also seems like a good time to take a break, comfortable in the knowledge that the blog is still very much visited, even during rest periods – if the graphic below, showing average views per day per month, is any guide. It’s also good to know that the website for The New A-Z of ELT is up and running (click on the book cover graphic top right) so if you need your weekly dose of An A – Z, you can always buy the book 😉 See you some time in 2018!

 

 





C is for Core vocabulary

5 11 2017

“Lexis is the core or heart of language”, wrote Michael Lewis (Lewis, 1993, p. 89). Yes, but which lexis? Given the hundreds of thousands of words that there are, which ones should we be teaching soonest? Is there a ‘core’ vocabulary? If so, where can we find it? If it is a list, how is it organized? And on what principles of selection is it based?

These questions were prompted by a student on my MA TESOL who asked if the measure of an item’s ‘core-ness’ was simply its frequency. I suspected that there might be more to it than this, and this impelled me to look at the literature on word lists.

The most famous of these, of course, is Michael West’s General Service List (GSL), first published in 1936 and then revised and enlarged in 1953. I am the proud owner of not just one but two copies of West, one of which clearly once belonged to a writer (see pic), who used it to keep within the 3000-word limit imposed by his or her publishers.

Compiled before the days of digitalized corpora, the GSL was based on a print corpus of up to 5 million words, diligently trawled through by a small army of researchers (‘of high intelligence and especially trained for the task’) for the purposes of establishing frequency counts – not just of individual words but of their different meanings.

But frequency was not the only criterion for inclusion in the GSL. West and his collaborators also assessed whether a word was relatively infrequent but necessary, because it lacked a viable equivalent – ‘vessel’ being one: ‘container’ doesn’t work for ‘blood vessels’, for example.  Conversely, some words may be frequent but unnecessary, because there are adequate non-idiomatic alternatives, i.e. they have cover. Finally, informal and highly emotive words were excluded, on the grounds that they would not be a priority for learners.

In the end the GSL comprised around 2000 word families (but over 4000 different lemmas, i.e. words that have the same stem and are the same part of speech: dance, danced, dancing, but not dancer) and even today, despite its age, the GSL gives a coverage of nearly 85% of the running words in any corpus of non-specialist texts (according to Brezina & Gablasova, 2015 – see below).
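To make ‘coverage’ concrete: it is simply the proportion of running words (tokens) in a text that appear on the list. Here is a minimal Python sketch, assuming a crude regular-expression tokenizer and matching on surface forms only. The real GSL figures are based on word families and lemmas, so reproducing them would need a lemmatizer and the full list. The function name `coverage` and the toy data are illustrative and not taken from any of the studies cited.

```python
import re


def coverage(text, word_list):
    """Return the share of running words in `text` that are on `word_list`."""
    # Crude tokenizer: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in word_list)
    return known / len(tokens)


# Toy example: a three-word "list" and a six-word text.
toy_list = {"the", "cat", "on"}
print(round(coverage("The cat sat on the mat", toy_list), 2))  # 0.67
```

Four of the six running words (the, cat, on, the) are on the toy list, hence roughly 67% coverage. Note that repeated words count every time they occur, which is why a short list of very frequent words can cover so much text.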

Subsequently, Carter (1998) has elaborated on the criteria for what constitutes ‘core-ness’. One is a core word’s capacity to define other words. Hence the words chosen by lexicographers for dictionary definitions are a reliable source of core vocabulary. One such is the Longman Defining Vocabulary (LDV): you can find it at the back of the Longman Dictionary of Contemporary English (my edition is that of 2003) or at a number of websites, including this one.

The publishers comment, ‘The words in the Defining Vocabulary have been carefully chosen to ensure that the definitions are clear and easy to understand, and that the words used in explanations are easier than the words being defined.’


Entry for ‘laugh’ in the GSL

Another test of coreness is superordinateness: ‘Core words have generic rather than specific properties’ (Carter 1998, p. 40). Hence, flower is more core than rose;  tool more core than hammer. For this reason, perhaps, core words are the words writers tend to use when they are writing summaries.

 

Core words are also more likely to have opposites than non-core words: fat vs. thin, laugh vs. cry. But what is the opposite of corpulent, say? Or giggle?

Core words also tend to have a greater range of collocates – compare start vs commence (start work/an argument/a career/a rumour/a conversation etc.). And they have high word-building potential, i.e. they combine productively with other morphemes: startup, headstart, starter, starting line, etc. Core words are also neutral: they do not have strong emotional associations; they do not index particular cultures (dress vs sari, for example), nor are they specific to certain discourse fields: compare galley, starboard, and below deck with kitchen, right, and downstairs (i.e. a nautical discourse vs. a less marked one).

On this last aspect, an important test of a word’s coreness is not just its overall frequency but its frequency in a wide range of contexts and genres – its dispersion. In a recent attempt to update the GSL, and to eliminate the subjectivity of West’s criteria, Brezina & Gablasova (2015) tested for the ‘average reduced frequency’ (ARF): ‘ARF is a measure that takes into account both the absolute frequency of a lexical item and its distribution in the corpus… Thus if a word occurs with a relatively high absolute frequency only in a small number of texts, the ARF will be small’ (op. cit., p. 8). Brezina and Gablasova also drew on not just one corpus but a range of corpora, including the 12-billion word EnTenTen12 corpus, to produce a New General Service List which, while much trimmer than West’s original (2500 vs. 4000 lemmas), and therefore perhaps more ‘learnable’, still gives a comparable coverage of corpus-based text – around 80%. (The full text of the article, along with the word list itself, can be found here).
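The quoted passage describes ARF only informally, so what follows is a sketch of one commonly given formulation, not necessarily the exact procedure used in the article. Suppose a word occurs f times in a corpus of N tokens. If its occurrences were perfectly evenly spread, the gap between them would be v = N/f. ARF adds up the actual gaps between successive occurrences (treating the corpus as a loop), caps each gap at v, and divides the total by v. The function name and the token positions below are made up for illustration.

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF of a word found at the given token positions in a corpus.

    Assumed formulation: ARF = (1/v) * sum(min(d_i, v)), with v = N / f
    and d_i the cyclic distances between successive occurrences.
    """
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f  # the gap we would see if the word were evenly spread
    ordered = sorted(positions)
    # First distance wraps around from the last occurrence to the first.
    distances = [ordered[0] + corpus_size - ordered[-1]]
    distances += [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(min(d, v) for d in distances) / v


# Same raw frequency (4) in a 100-token corpus, very different dispersion.
evenly_spread = average_reduced_frequency([0, 25, 50, 75], 100)
clustered = average_reduced_frequency([0, 1, 2, 3], 100)
print(evenly_spread, round(clustered, 2))
```

For the evenly spread word the ARF equals its raw frequency of 4, whereas the word bunched into one spot scores only about 1.12: four occurrences in a single burst count for little more than one.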

More impressive still, and also called a New General Service List, is the one compiled by Browne et al. (2013) which, with 2800 lemmas, claims to provide more than 90% coverage of the kinds of general English texts learners are likely to read.

Other potentially useful word lists include The Oxford 3000: ‘a list of the 3000 most important words to learn in English’ – accessible here. Again, dispersion – not just frequency – has been an important criterion in choosing these: ‘We include as keywords only those words which are frequent across a range of different types of text. In other words, keywords are both frequent and used in a variety of contexts.’ And the publishers add:

In addition, the list includes some very important words which happen not to be used frequently, even though they are very familiar to most users of English. These include, for example, words for parts of the body, words used in travel, and words which are useful for explaining what you mean when you do not know the exact word for something. These words were identified by consulting a panel of over seventy experts in the fields of teaching and language study.

Inevitably, there is a lot of overlap in these lists (they would hardly be ‘core vocabulary’ lists if there were not) but the differences, more than the similarities, are intriguing – and suggestive, not only of the corpora from which the lists were derived, but also of the criteria for selection, including their intended audience and purpose. To give you a flavor:

Words in West’s GSL not in LDV: plaster, jealous, gay, inch, widow, elephant, cushion, cork, chimney, pupil, quart.

Words in LDV not in GSL: traffic, sexual, oxygen, nasty, infectious, piano, computer, prince.

Words in Oxford 3000 not in either GSL or LDV: fridge, gamble, garbage, grandchild, sleeve, software, vocabulary… Note also that the Oxford 3000 includes phrasal verbs, which are not systematically included in the other lists, e.g. pull apart/ down/ off/ in/ over/ through/ up + pull yourself together.

Of course, the key question is: what do you actually do with these lists? Are they simply guidelines for materials writers and curriculum planners? Or should learners be encouraged to memorize them? In which case, how?

Discuss!

References

Brezina, V. and Gablasova, D. (2015) ‘Is there a core general vocabulary? Introducing the New General Service List,’ Applied Linguistics, 36/1. See also this website: http://corpora.lancs.ac.uk/vocab/index.php

Browne, C., Culligan, B. & Phillips, J. (2013) ‘New General Service List’ http://www.newgeneralservicelist.org/

Carter, R. (1998) Vocabulary: Applied linguistic perspectives (2nd edition) London: Routledge.

Lewis, M. (1993) The lexical approach. Hove: LTP.

West, M. (1953) A general service list of English words. London: Longman.