Medieval grammarians were obsessed with etymology because (according to a recent review in the London Review of Books[1]) the study of word origins ‘perfectly expresses the medieval conviction that language is a comprehensive, fully rational system, in which any part may be logically derived from the whole – just as “logic” itself derives from Logos, the all-creating word’.
The notion that ‘everything is in everything’ was a core precept of the Jacotot Method, also called ‘universal teaching’ (Rancière 1991 – I’ve blogged about it here), in which a single text (in Jacotot’s case it was a bilingual version of the 18th-century French novel Télémaque) served not only as the tool that revealed the secrets of the French language, but also as the key that opened the intelligence of the learner. As Rancière (1991: 26) understands it: ‘That is what “everything is in everything” means: … All the power of languages is in the totality of the book. All knowledge of oneself as an intelligence is in the mastery of a book, a chapter, a sentence, a word’.
Thus every word, phrase, sentence, chapter is subject to intense scrutiny – but always as a microcosm of the whole. ‘This is the first principle of universal teaching: one must learn something and relate everything else to it … The student must see everything for himself, compare and compare, and always respond to a three-part question: what do you see? what do you think about it? what do you make of it? And so on, to infinity’ (ibid. 22-23). In fact, as Rancière (ibid. 27) points out, ‘the procedures used matter very little in themselves. It could be Télémaque, or it could be something else. One begins with a text and not with grammar, with entire words and not with syllables…’
Cut to the 21st century, and the notion that ‘any part may be derived from the whole’ is a fractal one: ‘A fractal is a geometric figure that is self-similar at different levels of scale’ (Larsen-Freeman 1997: 146). Language, like other complex systems, is fractal in nature: patterns at one level of delicacy are reproduced at every other level. It takes only a very short text to display many of the basic design principles built into language, such as text organisation, sentence structure, word formation, as well as vocabulary distribution and frequency. In William Blake’s words, the text is ‘a world in a grain of sand’.
Take this one, chosen more or less at random from a joke book for children[2]:
Two elephants went on holiday and sat down on the beach. It was a very hot day and they fancied having a swim in the sea. Unfortunately they couldn’t: they only had one pair of trunks!
In just three sentences the text displays a classically generic story structure, involving actors (two elephants), circumstantial details (on the beach, a hot day), a sequence of past tense actions, and a complicating event. It also has a basic joke structure, consisting of a narrative and a punch-line, which here takes the form of a play on words.
The 37 words further divide up into function words (also called grammar words) and content words (also called lexical words). The former include such common (and typically short) words as a, on, of, the, and was. The latter are the ones that carry the main informational load of the text, such as elephants, beach, hot, and unfortunately. In the elephant text, the relative proportion of these two types of words is roughly 50:50, and this closely reflects the ratio of function words to content words in English texts generally. Moreover, the proportion of common to relatively uncommon words in the text closely reflects the proportions found in much larger collections of text: 30 of the 37 words (i.e. roughly 80%) are in the top 1000 words in English. Not only that, but of the ten most frequent words in English, six are present in this text, some of them (a, and, the) occurring more than once.
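To make the counting concrete, here is a minimal Python sketch that tokenises the elephant joke and splits it into function and content words. The function-word list is my own assumption (articles, pronouns, prepositions, a conjunction, auxiliaries and the negator n't), not a standard inventory, and splitting couldn’t into could + n't is just one way of arriving at a 37-word total; other lists and tokenisation choices will give slightly different figures.

```python
import re

# The elephant joke, with a straight apostrophe in couldn't.
JOKE = ("Two elephants went on holiday and sat down on the beach. "
        "It was a very hot day and they fancied having a swim in the sea. "
        "Unfortunately they couldn't: they only had one pair of trunks!")

# Assumed closed-class list, just large enough for this text:
# articles, pronouns, prepositions, a conjunction, auxiliaries, the negator.
FUNCTION_WORDS = {"a", "the", "it", "they", "on", "in", "of",
                  "and", "was", "could", "n't"}

def tokenize(text):
    """Lower-case the text, split off the n't clitic and return word tokens."""
    text = text.lower().replace("n't", " n't")
    return re.findall(r"n't|[a-z]+", text)

def split_function_content(tokens, function_words=FUNCTION_WORDS):
    """Partition tokens into (function words, content words), keeping order."""
    function = [t for t in tokens if t in function_words]
    content = [t for t in tokens if t not in function_words]
    return function, content

tokens = tokenize(JOKE)
function, content = split_function_content(tokens)
print(len(tokens), len(function), len(content))  # 37 17 20
```

With this particular list the split is 17 function words to 20 content words, which is in line with the ‘roughly 50:50’ figure above; a more generous list (counting very, only, down or the numerals as function words) would tip the balance the other way.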
The fact that this tiny text is a microcosm of all text is consistent with what is known as Zipf’s Law (Zipf 1935, 1965). In its best-known form, this law states that a word’s frequency is roughly inversely proportional to its rank in a frequency list: the second most frequent word occurs about half as often as the first, the third about a third as often, and so on. Applied to texts, this suggests that a word that is nth in frequency in a given language is likely to occupy roughly the same ranking in any single text in that language. So, the most frequent words in the language are likely to be the most frequent words in any text in that language, and their order of frequency will also be roughly the same. Zipf also showed that there is an inverse correlation between the length of a word and its frequency: short words occur often. Again, this is evident in our short text.
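The rank–frequency relationship can be sketched in a few lines of Python. This is an illustration under the idealised assumption that frequency is exactly proportional to 1/rank (an exponent of 1); real corpora only approximate this, and the top count of 60,000 used below is a made-up round number, not a corpus figure.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent first.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]

def zipf_prediction(top_count, rank, exponent=1.0):
    """Idealised Zipf count for the word at a given rank: top_count / rank**exponent."""
    return top_count / rank ** exponent

# A toy token list, ranked by frequency.
toy = "the cat and the dog and the bird".split()
for rank, word, count in rank_frequency(toy):
    print(rank, word, count)

# Hypothetical: if the top-ranked word occurred 60,000 times,
# an ideal Zipf distribution predicts these counts for ranks 1, 2, 3 and 10.
print([zipf_prediction(60_000, r) for r in (1, 2, 3, 10)])
# [60000.0, 30000.0, 20000.0, 6000.0]
```

Ranking a real text with rank_frequency and comparing its counts against zipf_prediction is one simple way of seeing how closely a given text follows the idealised curve.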
Coursebook texts are generally rather long, in the belief (possibly mistaken) that learners need to be taught how to read, when what they actually need is the language knowledge (lexical, grammatical, and textual) to enable them to transfer their reading skills from their first language into their second. Long texts have the disadvantage that they take quite a long time to process, leaving little classroom time for the kind of detailed language work that exploits the text’s linguistic properties. In fact, as I’ve attempted to demonstrate, even a very short text, such as the elephant joke, is packed with pedagogical potential. What’s more, Zipf’s Law relieves us of the worry that short texts might not be sufficiently representative.
As with the Jacotot Method, the choice of text is immaterial. ‘The problem is to reveal an intelligence to itself. Anything can be used. Télémaque. Or a prayer or song the child or the ignorant one knows by heart. There is always something that the ignorant one knows that can be used as a point of comparison, something to which a new thing to be learned can be related’ (Rancière 1991: 28).
Everything is in everything.
References:
Larsen-Freeman, D. (1997) ‘Chaos/Complexity science and second language acquisition’, Applied Linguistics, 18, 2.
Rancière, J. (1991) The Ignorant Schoolmaster: Five Lessons in Intellectual Emancipation. Stanford: Stanford University Press.
Zipf, G.K. (1935, 1965) The Psycho-biology of Language: An Introduction to Dynamic Philology, Cambridge, MA: MIT Press.
(Parts of this article were first published in the Guardian Weekly, March 18th, 2005.)
[1] Newman, B. (2012) ‘Ailments of the Tongue’, London Review of Books, 34, 6.
[2] The Great Puffin Joke Directory, by Brough Girling, Puffin Books, 1990.
Fascinating.
Whenever I feel guilty about not reading ELT books as widely as I once did, I come across a post like this and the guilt subsides. Thank you.
When the first Headway came along, grammar made a huge comeback – we all know that. But, in retrospect, there’s more to it than that, I think. I remember gasping in horror at the very first unit of Headway Intermediate first edition (it was called “Simple Present”), wondering what on earth had become of the functional orthodoxy established by the Strategies series and, to some extent, improved upon by its American peers (such as the now defunct Spectrum series or even Longman’s New Wave).
But I also remember gaping in (somewhat restrained) amazement at the signposting of the skills sections: the student book, if I remember correctly, made use of words like “skimming” and “scanning”, which, up until then, most coursebooks wouldn’t have touched with a bargepole. Also, there were clearly labeled pre-listening and pre-reading tasks, which, in hindsight, the Soars were brave to use. So I do think they deserve some credit for that.
Innovative as it was, though, Headway’s more systematic approach to skills development created / reinforced (not sure) another orthodoxy, which lasted throughout much of the 90s and early 2000s:
Thank God the pendulum is swinging back and recent titles now include shorter and more manageable, explorable texts (though perhaps not nearly as much emphasis on bottom-up processing as I’d like to see).
Now why am I saying all this?
Before reading this post, I would have argued that this is the right thing to do because, well, it sounds plausible. But now, assuming that Zipf’s law is indeed accurate, it’s easier to explain WHY giving students very long texts to plod through is often (though not always – I’m thinking of FCE preparation, for example) a waste of precious classroom time. Great stuff, thank you, Scott.
Thanks, Luiz. I ought to add that short texts will only encapsulate the features of longer texts if they are authentic. Simplifying or mutilating or contriving texts for teaching purposes inevitably skews their ‘self-similarity’ properties – making them similar to one another but not to authentic texts in general. (Which is not to say that simplified, mutilated or contrived texts have no pedagogical function).
“… the elephant joke, is packed with pedagogical potential.”
Yes, absolutely. Whoever decided that long, lexically dense texts were more useful or challenging than shorter ones? Arguably, the only thing that longer texts require over shorter ones is a longer concentration-span. There is, as you rightly say, a whole universe of language potential in the elephant joke, which might profitably be used to teach beginners and CPE students alike.
In my experience, many teachers are unaware of this potential, or at least unsure of how to exploit it. What is frustrating is the speed at which many teachers cruise through materials on a course, doing the multiple-choice tasks, grammar exercises etc. and then ignoring the rest of the language in the text. What I feel tends to happen in such a scenario is that learners end up doing lots and lots of things not very well, and simply forget the Grammar McNugget (that they only ever thinly understood to begin with) from two weeks ago.
I feel that teacher training courses should begin to train teachers how to slow down, and how to work with and take language further. By slowing down, I don’t mean making the pace of the lesson boring and monotonous, but exploiting the language that arises at a much deeper level than the coursebook traditionally affords. Conceivably, a whole lesson could be based on a single lexeme, or a whole course could be based on one text (jointly agreed on by the learners), and still be thoroughly interesting and motivating – at least in the hands of a skillful teacher. Moreover, I think most of us feel a sense of attendant satisfaction and accomplishment when we finally understand something in its totality (I’m thinking of your thread on Z is for Zero Uncertainty here).
On a down note, I wonder if arguing “that every word, phrase, sentence, chapter is subject to intense scrutiny – but always as a microcosm of the whole” might not inadvertently give fuel to creators of structural syllabuses, who precisely believe in breaking down bits of grammar into discrete chunks in order to understand the whole?
Thanks, Wez…
Regarding the pedagogical potential of the elephant joke, I’ll cut and paste a little more from the original article I wrote for the Guardian Weekly ELT section in 2005:
As for your final point, yes, this is a danger, i.e. that a micro-analysis of text might play into the hands of proponents of the ‘accumulated entities’ view of language syllabusing. This is why the French pedagogy called ‘explication de texte’ has had such a bad press, especially when it’s the teacher who’s doing all the ‘explication’. On the other hand, if you adopt a text-based (or genre-based) syllabus, you – theoretically – provide a buffer against an exclusively microscopic view. Theoretically.
Thank you for this fantastically interesting post, taking the theory into practice.
Unfortunately, I would like to pick up on a point of detail as Zipf comes round again.
Mandelbrot (my hero) actually set out to disprove Zipf’s theory – and discovered more than he had bargained for – though he does note in his book “The Fractal Geometry of Nature” that “It had originally been hoped that Zipf’s law would contribute to the field of linguistics, but my explanation shows that this law is linguistically very shallow”.
Being my hero, he does not claim to have done more than name “fractals”, and he explains in beautiful detail why he chose that word (in the section “Fractal” and other neologisms, which can be seen at http://www.cut-the-knot.org/books/mandelbrot/intro.shtml ). As a Frenchman living in the US, he shows how aware he was of language difficulties.
At the end of his book “The Fractal Geometry of Nature” he has a section entitled “Biographies”, in which he writes about the people whose work influenced him, and in this section he describes a book by Zipf as “one of those books in which (…) flashes of genius projected in many directions are nearly overwhelmed by a gangue of wild notions and extravagance. On the one hand it deals with the shape of sexual organs and justifies the Anschluss of Austria into Germany because it improves the fit of a mathematical formula. On the other hand it is filled with figures and tables that hammer away ceaselessly at the empirical law that, in social science statistics, the best combination of mathematical convenience and empirical fit is often given by a scaling probability distribution.”
Sorry – I had to render unto Caesar ….
Thanks, Elizabeth, for that corrective. I’m not really qualified to argue the case for or against Zipf (even if I do have a copy of his book!), and my understanding of his importance (or lack thereof) comes mainly second-hand. Whether or not his maths were correct, I think the conclusion that, for example, Larsen-Freeman and Cameron (Complex Systems and Applied Linguistics, 2008) draw from his ‘law’, particularly in the way it applies to other languages and not just English, is that “this should not be surprising in that all information systems need to be fractal in shape in order to make them compressible and thus shareable” (2008: 110). Moreover (they go on), complex systems derive their richness “in part from a strategy of organising smaller units into larger ones, and these in turn into still larger ones, and so on… this type of organisation facilitates learning” (op. cit. 111). But they add the caveat that “questions about the extent to which there are fractals in language use remain” (ibid.).
Scott,
I call your attention to a somewhat longer, but perhaps more germane, text for the evaluation of the significance of the F method of teaching language.
Elephants are also (ostensibly) the topic of The Blind Men and the Elephant, but it is also a cautionary tale about the foibles of Highly Educated Experts.
A good version is on the wordinfo.info website.
This text has 230 words, and my statistics give the following counts: small words (four letters or fewer) 83% (192 words); words selected for poetic reasons 5% (about 10); and obsolete or obsolescent words 5% (about 12). Unlike Jacotot, I discern no particular significance in any of this. But I find the moral of the text quite enlightening about the similar expoundings of English Linguists, Psychologists, Educationalists, and others of a philosophic inclination about the modern Elephant in the Room, illiteracy: what causes it, and consequently what we ought to be doing about it.
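Jim’s short-word statistic is easy to reproduce. The Python sketch below counts the share of words with four letters or fewer; the tokenising regular expression is my own simplifying assumption, and it treats a contraction such as couldn't as a single word, which is why it finds 36 words in the elephant joke rather than the 37 given in the post (which presumably counts the contraction as two).

```python
import re

JOKE = ("Two elephants went on holiday and sat down on the beach. "
        "It was a very hot day and they fancied having a swim in the sea. "
        "Unfortunately they couldn't: they only had one pair of trunks!")

def words(text):
    """Split text into words, keeping contractions such as couldn't whole."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)

def letter_count(word):
    """Number of letters in a word, ignoring apostrophes."""
    return sum(ch.isalpha() for ch in word)

def short_word_share(tokens, max_letters=4):
    """Return (short, total, share) for words of at most max_letters letters."""
    short = sum(1 for t in tokens if letter_count(t) <= max_letters)
    return short, len(tokens), short / len(tokens)

short, total, share = short_word_share(words(JOKE))
print(short, total, round(100 * share))   # 28 36 78

# Jim's figures for The Blind Men and the Elephant: 192 short words out of 230.
print(round(100 * 192 / 230, 1))          # 83.5
```

Both texts, then, are dominated by short words, which is what Zipf’s observation about length and frequency would lead us to expect.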
Perhaps we need to have a more wholistic view of what constitutes the Inglistic language we speak and understand (usually) to “see” more clearly how the “elephant” (illiteracy) can be banished.
Jim Kanzelmeyer
Thank you, Jim. I agree with your point about adopting a holistic perspective on language and language use, without – I guess – losing sight of the parts that comprise it. As for the ‘F method’ – presumably F is for Fractal – I’m not proposing a method, just endorsing another way of approaching text. How this might impact on illiteracy I have no idea.
“There is always something that the ignorant one knows that can be used as a point of comparison, something to which a new thing to be learned can be related”.
This, Scott, reminds me of Vygotsky’s ZPD: perhaps the “point of comparison” is, within an ecological paradigm, a riparian area, where the bank of “the ignorant one’s” solid grasp of language (the linguistic terra firma upon which she stands to make sense of, and meaning with, what she knows and can use) meets a steady stream of text (aural, written, etc.). Drawing a sample from the flowing river allows her to examine it within the confines of her own two hands, and to consider the particular life forms there, representative of the larger body of water rushing by too quickly to comprehend this way.
“If something is in me which can be called religious then it is the unbounded admiration for the structure of the world so far as our science can reveal it.”
– Albert Einstein in a 1954 letter to philosopher Eric Gutkind
Thanks for keeping it interesting to visit here, Scott.
Rob
Thanks, Rob. I have to admit I hadn’t seen that possible correlation between Jacotot’s system of mapping the new onto the known, and Vygotsky’s notion of the zone of proximal development, but it makes perfect sense.
Hi Scott, Interesting that you quote William Blake. For him ‘the whole Bible is filled with imagination and visions from end to end and not with moral virtues’; it is ‘not allegory, but eternal vision or imagination of all that exists’ (The Complete Writings of William Blake, ed. Geoffrey Keynes, 1966, pp. 774, 604). Blake’s recurring message is that we should read more of the imaginations of the bard and less of the mechanical limitedness of convention.
Hmmm, interesting comment, Gareth. Not sure how to respond except to say that in the way that we use language there is a constant tension between creativity (and by extension, imagination) and ‘the mechanical limitedness of convention’.
Ha. Sorry I wasn’t fractal. If fractals are geometric patterns in nature, I think it plausible that the geometry of Shakespeare’s sonnets, children’s role-plays or jokes demonstrates the fractals of language better than a legal document or a textbook can.
Although I’m a big fan of fractal geometry, I haven’t been able to see any in language. As regards the references to Zipf in Larsen-Freeman & Cameron, although amusing, they are not really essential to the volume as a whole, or to its argument on complex systems. I may be saying that because I didn’t get the point, fair enough, but I’ve read this book over and over again, it’s possibly my favorite book in applied linguistics, and still… I can’t see fractals in language.
As you know, I’m a big complexity enthusiast, but I’ve never seen any mention of fractals in the field of language and education that goes beyond ‘interesting’ to something actually functional or enlightening, or whatever; something that could potentially change theory and practice.
Nonetheless, I totally recommend this old documentary:
Fractals: the colors of infinity
http://authenticteaching.wordpress.com/2010/11/05/the-colors-of-infinity/
Thanks, Willy.
Maybe the term ‘fractal’ is being used a little freely in the literature – perhaps figuratively rather than literally – in its application to language (although I think Larsen-Freeman and Cameron might take issue with your suggestion that it is simply ‘amusing’!).
But (for a start) – if we take Zipf’s Law as being generalizable to other languages (and it seems to be) – there does seem to be a universal feature of language that finely balances the need for communicative efficiency with the need for ‘least effort’, and this seems to explain the fact that certain (polysemous) words, such as so, that, well, like, are used a great deal (least effort), while other, less frequent (and typically longer) words are needed as a buffer against ambiguity (communicative efficiency). From the least-effort point of view, it would be wonderful if there were only one word in a language (such as well) and you just repeated it endlessly, allowing pragmatics to do the communicative work. (In fact, Bakhtin does a thought experiment with the idea that well is the only word in the language, doesn’t he?) Clearly this is a nonsense, but nevertheless we still burden words like well and like with an (almost, but not quite) disproportionate communicative load, especially in spoken language. Fortunately, we keep in reserve a whole lot of other, much less frequent words, in order to fine-tune our communicative needs and to disambiguate the overuse of the high-frequency ones. But the proportion between the ‘worker words’ and the ‘drone words’ remains more or less constant, across languages and across texts. The fact that this pattern reverberates through language and languages, from sentence level to paragraph level to text level, is what has prompted the ‘fractal’ metaphor. That is to say, language is self-similar at various levels of delicacy, and, importantly, this is not accidental: it is an effect of compression and processability, and of the trade-off between least effort and communication.
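The idea that a handful of ‘worker words’ carries a disproportionate share of the load can be given a rough number. Under the idealised assumption that frequency is exactly proportional to 1/rank, the share of running words covered by the k most frequent words in a vocabulary of V words is the harmonic number H_k divided by H_V. The sketch below computes this in Python; the vocabulary size of 10,000 is an arbitrary illustrative choice, and real texts will deviate from these figures.

```python
def harmonic(n):
    """The nth harmonic number, 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / r for r in range(1, n + 1))

def zipf_coverage(k, vocabulary_size):
    """Share of running words covered by the k most frequent words,
    assuming an ideal Zipf distribution (frequency proportional to 1/rank)."""
    return harmonic(k) / harmonic(vocabulary_size)

V = 10_000  # illustrative vocabulary size, not a corpus figure
for k in (10, 100, 1_000):
    print(k, round(zipf_coverage(k, V), 2))
```

Under these assumptions the top 10, 100 and 1,000 words cover roughly 30%, 53% and 76% of running words respectively, so one word type in a hundred does about half the work, which is the kind of imbalance described above.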
Practical application? Pay attention to the small words. And teach high-frequency prototypes. Why?
What this means is that there are some very high frequency verbs that ‘seed’ the acquisition of particular syntactic patterns. Now whether this is ‘fractal’ or not, the fact is it is a repetitive pattern, common to both first and second language learners, and does have applications in terms of teaching, I think.
Hi Scott,
Diggin on Zipf’s law, but wondering how an individual learner’s exposure to a language influences how they experience a text. I’m reminded of Gillian Claridge’s (2005) “Simplification in Graded Readers: Measuring the Authenticity of Graded Texts,” in Reading in a Foreign Language. She argues that the word frequency distribution in relation to a learner’s level and previous exposure can result in what would normally be considered inauthentic texts (graded readers) being experienced in the same way as a native reader would experience an authentic text. Do we need to simply provide students with authentic texts that are representative of the language as a whole, or does the manner in which the learner experiences a text play a role in text selection?
Thanks again for a great read.
Kevin
Great question, Kevin. If the Claridge article is to be believed – and I have to admit the argument sounds plausible – then maybe it doesn’t matter if texts are simplified – so long as, I guess, they are motivating to read – and for some learners this will be affected by their degree of ‘verisimilitude’. It has, of course, always been a contention of Widdowson’s that texts are either genuine or not, but their authenticity is in the eyes of the reader – it is the learner who ‘authenticates’ a text, whether genuine or not. Ergo, simplified texts can be authentic for some learners.
Anyway, thanks for the comment.
Hi Scott,
Like Willy I’m confused by the use of the fractal analogy. Your argument that language is similar throughout, so a snippet will represent the whole, is not the same as it being similar from near and far, which is how fractal similarity works. What would be fractal is a notion that language at the level of the phoneme, or morpheme, somehow shared properties with language further up at the level, say, of the sentence, paragraph, or text. Chomsky’s syntactic trees, for example, where the principles defining the shape of words also define the structure of noun phrases and also define the structure of the clause and sentence might possibly be described as fractal.
Dan
It’s suggestive, I think, that Chomsky’s branching trees operate at different levels of scale: phonemic, phonotactic, morphological and syntactic. Likewise, Halliday’s systemic functional grammar is described in terms of a network of binary distinctions at different levels, or scales: phonemes, morphemes and lexico-grammar. “The notion of the network of systems obviously indicates that there are interrelations between the various systems. So choices from within one system may co-occur with choices from within other systems…” (Malmkjaer, ed. 1991. The Linguistics Encyclopedia p. 447).
Is that fractal?
No idea, to be honest, Scott, you’ve lost me. I had a look at Halliday but he did my head in, and now I don’t know whether I’m perhaps a bit out of my depth. Could you provide an example of how he describes language in terms of these binary distinctions?
While I agree totally with the idea that small texts are better than long ones in presenting grammar and that coursebooks can really muddle up a perfectly decent grammar lesson by sticking a flipping great text in the middle, that doesn’t mean that ‘everything is in everything’; your elephant joke might work brilliantly at helping understand how we express the past but I don’t know how you can extrapolate from it anything about auxiliary verbs, continuous forms, the word ‘get’, etc.
With fractals, you can: one simple equation answers the question how all this complexity came to be.
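The point about complexity springing from one simple equation can be illustrated with the iteration behind the Mandelbrot set (named after the Mandelbrot mentioned in an earlier comment): start from z = 0, repeatedly apply z → z² + c, and check whether z stays within a distance of 2 from the origin. The Python sketch below is a standard escape-time test; the iteration limit and the plotting window are arbitrary choices of mine.

```python
def escape_time(c, max_iter=100):
    """Iterate z -> z*z + c from z = 0.

    Return the number of steps after which |z| first exceeds 2 (the point
    escapes), or max_iter if it never does within the limit, in which case
    the point is treated as belonging to the set.
    """
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n + 1
    return max_iter

def ascii_mandelbrot(width=60, height=21, max_iter=50):
    """Render the set as text: '#' for points that do not escape, '.' otherwise."""
    rows = []
    for row in range(height):
        y = 1.2 - 2.4 * row / (height - 1)
        line = ""
        for col in range(width):
            x = -2.1 + 2.7 * col / (width - 1)
            line += "#" if escape_time(complex(x, y), max_iter) == max_iter else "."
        rows.append(line)
    return rows

for line in ascii_mandelbrot():
    print(line)
```

Zooming in on the boundary (by narrowing the x and y ranges) reveals ever more detail, all generated by the same two-term formula.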
One way that the elephant joke is fractal (perhaps) is that each element of the text is so freighted with associations with other, prior texts, that the text is – to a large extent – predictable on the basis of these associations. That is to say, as Hoey puts it:
If you reveal the text phrase by phrase or even word by word (and I’ve done this using PowerPoint), proficient users of English can predict with an amazing degree of accuracy what the next word or phrase is likely to be:
Predictive text tools, of the type that Google provides in its Blogger software (and I wish I could find a link to Steve Neufield’s – I think – blog post about this), will do something similar, drawing on the enormous corpus of ‘used language’ that they have access to, reminding us, yet again, that (as Bakhtin put it) all language use is ‘appropriation’ and that all texts carry the traces or echoes of the texts that preceded them. The elephant text is an exemplar of a genre of jokes about elephants, of jokes using puns, of jokes for children, and jokes themselves are a sub-genre of narrative texts. So, it’s more than just the past tense: it’s the macrostructure of the narrative (orientation, complication, etc.), the sequencing devices, the use of articles and pronouns to index shared knowledge (the beach; they), and so on. All of these features are represented in the elephant text; or, put another way, the elephant text is a distillation of these generic features, as well as a distillation of all text, and of all language use.
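As a toy illustration of prediction from ‘used language’, here is a minimal bigram model in Python: it counts which word follows which in a small corpus and predicts the most frequent follower. The five-sentence corpus is invented for the example, and this is not a description of how Google’s or any other real predictive-text tool works; such tools draw on vastly larger data and more sophisticated models.

```python
from collections import Counter, defaultdict

# A tiny, invented corpus standing in for 'used language'.
CORPUS = [
    "two elephants went on holiday",
    "they went on holiday to the beach",
    "we sat down on the beach",
    "it was a very hot day",
    "what a hot day",
]

def train_bigrams(sentences):
    """Map each word to a Counter of the words observed immediately after it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent follower of word (ties broken alphabetically),
    or None if the word was never seen with a follower."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return min(followers.items(), key=lambda item: (-item[1], item[0]))[0]

model = train_bigrams(CORPUS)
for word in ("went", "on", "hot", "trunks"):
    print(word, "->", predict_next(model, word))
```

Even this crude model ‘knows’ that on is more often followed by holiday than by the in its tiny corpus, and it has nothing to say about trunks, because it has never seen what follows it: prediction is only as good as the prior texts it draws on.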
Thanks Scott, it’s finally sunk in ;o)
I can see that this intertextual interpretation of a text means that we approach one small text by drawing on a whole world of other texts in order to endow it with meaning, so in that sense everything is in everything.
For me though, the beauty of the fractal comes from the incredible complexity that springs from simplicity. Proficient users of English (such as Google!) are able to predict/generate your text because they draw on impressive banks of knowledge about the way language works (bottom-up) and the way the world works (top-down; Google less so!). Not so much complexity from simplicity, it seems to me, as the other way round.
Thanks again for an extremely thought-provoking post. I had been looking forward to this one because of your cryptic reply to my comment last week, and it didn’t disappoint. I’ve been thinking about this post all week, trying to figure out how to apply it in practice. Which leads to my question: how could I apply this in practice?
Teaching a once-a-week class for about an hour, with a lot of students closer to beginner level, I’ve been trying to figure out the best use of time. I like the idea of being able to mine short texts for language as it cuts down on “processing breadth” and makes space for “processing depth”, but I’m not clear in my mind on what potential class processes would look like. I just want to make sure that the students are getting a deeper exposure to the language while avoiding the pitfall of the exercise degenerating into me getting wrapped up in linguistic nuance.
Dear Scott,
thank you for this fascinating post on ‘fractal’. In my opinion, Zipf’s law can be applied to every other sphere of how this world is organized: from biology (the evolution of species, where every living cell, either on its own or as part of a more complex body, shares a lot of features with other cells) to the way our universe and galaxy are organized compared with the structure of an atom. You have to agree that Zipf’s law is at work here too. If we take language as just another phenomenon of nature, then it should be subject to the same organizational principle: every small part of it is a true representative of the whole, provided it is the result of natural development and not an artificial model that doesn’t exist in the real world (like some teacher-written texts in some coursebooks, which in my opinion have very little educational value).
Another point is how Zipf’s law can be applied to teaching practice: it is a solid argument for selecting texts for classroom exploitation (i.e. detailed/intense reading) which are, first of all, age-appropriate and age-relevant, i.e. in a genre familiar to the student, containing notions, concepts and functions within the scope of the student’s cognitive development and maturity, and therefore interesting for the student because they allow a flow of associations with the topic of the text. As you are no doubt well aware, coursebook writers are not always governed by this principle when choosing texts. They strive for something useful and beneficial linguistically and educationally, making the process of reading, i.e. making sense of the text, absolutely impossible, and even blocking the development of the reading skill. Therefore, Zipf’s law has strong pedagogical potential, as it allows us to choose proper texts for our students: it is the same quality language whatever you choose, provided it is authentic (serves a real-world task), age- and interest-relevant, and a proper, well-formed text.
Svetlana, could you explain why we should take language as just another phenomenon of nature?
What are the other options? Is it possible to doubt that language is just another phenomenon of nature? Then what is it? God’s creation? “In the beginning was the Word” is a well-known quotation from the Bible. But what is meant by the Word? If you ask me, though I am not religious, I would consider it sacrilegious to state that the Bible is wrong. I would rather go for the following explanation of “the Word”: it stands here for “consciousness”, a unique human ability to control one’s behaviour and therefore resist the temptations of the devil, i.e. the seven deadly sins. Hence “the Word” is a tool to fight the “devil” with, to overcome the instincts, etc. If we admit that language is God’s creation, we are back in medieval times.
Here is just another example of language as a phenomenon of nature: I was once told that it is not yet certain how we should say the year 2021, either twenty twenty-one or two thousand and twenty-one. We will have to live to that year, or near it, and wait until these two ways of saying the year go through a process of natural selection.
Zipf’s law provides clear evidence of the naturalistic origin of language which abides by the same rules governing the whole of nature.
Can I ask you: what is your explanation of Zipf’s law? Or don’t you believe it works?