M is for Machine translation

2 07 2017

(Or: How soon will translation apps make us all redundant?)


An applied linguist collecting data

In a book published on the first day of the new millennium, futurologist Ray Kurzweil (2000) predicted that spoken language translation would be common by the year 2019 and that computers would reach human levels of translation by 2029. It would seem that we are well on track. Maybe even a few years ahead.

Google Translate, for example, was launched in 2006, and now supports over 100 languages, although, since it draws on an enormous corpus of already translated texts, it is more reliable with ‘big’ languages, such as English, Spanish, and French.

A fair amount of scorn has been heaped on Google Translate but, in the languages I mostly deal with, I have always found it fairly accurate. Here for example is the first paragraph of this blog translated into Spanish and then back again:

En un libro publicado el primer día del nuevo milenio, el futurólogo Ray Kurzweil (2000) predijo que la traducción hablada sería común para el año 2019 y que las computadoras llegarían a niveles humanos de traducción para 2029. Parecería que estamos bien en el camino. Tal vez incluso unos años por delante.

In a book published on the first day of the new millennium, futurist Ray Kurzweil (2000) predicted that the spoken translation would be common for 2019 and that computers would reach human translation levels by 2029. It would seem we are well on the road. Maybe even a few years ahead.
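For anyone who wants to put a rough number on 'fairly accurate', here is a minimal sketch that compares the original paragraph with its round-trip version, word by word. It uses Python's standard difflib module; the function names are my own, and a word-overlap ratio is only a crude proxy for translation quality, not an established metric.

```python
import re
from difflib import SequenceMatcher

original = ("In a book published on the first day of the new millennium, "
            "futurologist Ray Kurzweil (2000) predicted that spoken language "
            "translation would be common by the year 2019 and that computers "
            "would reach human levels of translation by 2029. It would seem "
            "that we are well on track. Maybe even a few years ahead.")
round_trip = ("In a book published on the first day of the new millennium, "
              "futurist Ray Kurzweil (2000) predicted that the spoken "
              "translation would be common for 2019 and that computers would "
              "reach human translation levels by 2029. It would seem we are "
              "well on the road. Maybe even a few years ahead.")


def words(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def round_trip_similarity(a, b):
    """Return a 0-1 ratio of how much of the word sequence survives."""
    return SequenceMatcher(None, words(a), words(b)).ratio()


score = round_trip_similarity(original, round_trip)
print(f"word-level similarity: {score:.2f}")
```

A score of 1.0 would mean the text came back unchanged; the differences here ('futurist' for 'futurologist', 'on the road' for 'on track') pull it below that.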

Initially text-to-text based, Google Translate has more recently been experimenting with a ‘conversation mode’, i.e. speech-to-speech translation, the ultimate goal of machine translation – and memorably foreshadowed by the ‘Babel fish’ of Douglas Adams (1995): ‘If you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language.’

The boffins at Microsoft and Skype have been beavering away towards the same goal: to produce a reliable speech-to-speech translator in a wide range of languages. For a road test of Skype’s English-Mandarin product, see here: https://qz.com/526019/how-good-is-skypes-instant-translation-we-put-it-to-the-chinese-stress-test/

The verdict (two years ago) was less than impressive, but the reviewers concede that Skype Translator will ‘only get better’ – a view echoed by The Economist last month:

Translation software will go on getting better. Not only will engineers keep tweaking their statistical models and neural networks, but users themselves will make improvements to their own systems.

Mention of statistical models and neural networks reminds us that machine translation has evolved through at least three key stages since its inception in the 1950s. First was the ‘slot-filling stage’, whereby individual words were translated and plugged into syntactic structures selected from a built-in grammar. This less-than-successful model was eventually supplanted by statistical models, dependent on enormous databases of already translated text, which were rapidly scanned using algorithms that sought out the best possible phrase-length fit for a given word. Statistical Machine Translation (SMT) was the model on which Google Translate was initially based. It has been successful up to a point, but – since it handles only short sequences of words at a time – it tends to be less reliable dealing with longer stretches of text.
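To make the 'phrase-length fit' idea concrete, here is a toy sketch of phrase-based lookup. The phrase table is invented for illustration, and the greedy longest-match strategy is a simplification: real SMT systems weigh many competing phrase pairs by probability and consult a language model, none of which is shown here.

```python
# Hypothetical English-to-Spanish phrase table (invented for this example).
PHRASE_TABLE = {
    ("the", "first", "day"): "el primer día",
    ("of", "the"): "del",
    ("new", "millennium"): "nuevo milenio",
    ("the",): "el",
    ("first",): "primer",
    ("day",): "día",
    ("new",): "nuevo",
    ("millennium",): "milenio",
}
MAX_PHRASE_LEN = 3  # the system only ever "sees" this many words at once


def translate(sentence):
    """Greedily replace the longest known phrase at each position."""
    tokens = sentence.lower().split()
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, then progressively shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASE_TABLE:
                output.append(PHRASE_TABLE[phrase])
                i += n
                break
        else:
            output.append(tokens[i])  # unknown word: pass it through as-is
            i += 1
    return " ".join(output)


print(translate("the first day of the new millennium"))
```

Because nothing outside the three-word window can influence a choice, the sketch also shows why this kind of model struggles with longer stretches of text.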


An early translation app

More recently still, so-called neural machine translation (NMT), modelled on neural networks, attempts to replicate mental processes of text interpretation and production. As Microsoft describes it, NMT works in two stages:

  • A first stage models the word that needs to be translated based on the context of this word (and its possible translations) within the full sentence, whether the sentence is 5 words or 20 words long.
  • A second stage then translates this word model (not the word itself but the model the neural network has built of it), within the context of the sentence, into the other language.
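The two stages can be caricatured in a few lines of code. This is only a sketch under heavy assumptions: the two-dimensional vectors (roughly 'finance' and 'nature') are hand-made, whereas a real NMT system learns representations with hundreds of dimensions, and the function names are hypothetical. What it does show is that the same word, 'bank', ends up with a different model, and hence a different translation, depending on the sentence around it.

```python
# Hand-made word vectors: (finance, nature). Purely illustrative.
EN_VECTORS = {"bank": (0.5, 0.5), "money": (1.0, 0.0), "river": (0.0, 1.0)}
# Candidate Spanish translations for each ambiguous English word.
ES_CANDIDATES = {"bank": {"banco": (1.0, 0.0), "orilla": (0.0, 1.0)}}


def word_model(word, sentence):
    """Stage 1: model the word in the context of the full sentence."""
    tokens = sentence.lower().split()
    context = [EN_VECTORS.get(t, (0.0, 0.0)) for t in tokens if t != word]
    base = EN_VECTORS[word]
    if not context:
        return base
    mean = tuple(sum(v[d] for v in context) / len(context) for d in range(2))
    return tuple(base[d] + mean[d] for d in range(2))


def translate_word(word, sentence):
    """Stage 2: translate the word model, not the bare word."""
    model = word_model(word, sentence)
    candidates = ES_CANDIDATES[word]

    def score(es_word):
        return sum(m * c for m, c in zip(model, candidates[es_word]))

    return max(candidates, key=score)


print(translate_word("bank", "money in the bank"))  # financial sense
print(translate_word("bank", "the river bank"))     # riverside sense
```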

Because NMT systems learn on the job and have a predictive capability, they are able to make good guesses as to when to start translating and when to wait for more input, and thereby reduce the lag between input and output.  Combined with developments in voice recognition software, NMT provides the nearest thing so far to simultaneous speech-to-speech translation, and has generated a flurry of new apps. See for example:



One caveat to balance against the often rapturous claims made by their promoters is that many of these apps are trialled using fairly routine exchanges of the type ‘Do you know a good sushi restaurant near here?’ They need to be able to prove their worth in a much wider variety of registers, both formal and informal. Nevertheless, Kurzweil’s prediction that speech-to-speech translation will be commonplace in two years’ time looks closer to being realized. What, I wonder, will it do to the language teaching industry?

As a footnote, is it not significant that developments in machine translation seem to have mirrored developments in language acquisition theory in general, and specifically the shift from a focus primarily on syntactic processing to one that favours exemplar-based learning? Viewed from this perspective, acquisition – and translation – is less the activation of a pre-specified grammar, and more the cumulative effect of exposure to masses of data and the probabilistic abstraction of the regularities therein. Perhaps the reason that a child – or a good translator – never produces sentences of the order of ‘Is the man who tall is in the room?’ or ‘John seems to the men to like each other’ (Chomsky 2007) is not because these sentences violate structure-dependent rules, but because the child/translator has never encountered instances of anything like them.
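The exemplar-based view can be sketched, very crudely, as a check on whether every two-word sequence in a sentence has been met before. The three-sentence 'corpus' below is invented, and bigrams are a blunt stand-in for whatever regularities a learner actually abstracts, so this illustrates the idea rather than settling the argument.

```python
# A tiny invented corpus standing in for a learner's prior exposure.
CORPUS = [
    "is the man who is tall in the room",
    "the man who is tall is in the room",
    "is the man in the room",
]


def bigrams(sentence):
    """Return the list of adjacent word pairs in a sentence."""
    tokens = sentence.lower().split()
    return list(zip(tokens, tokens[1:]))


# Every word pair the learner has encountered so far.
SEEN = {pair for sentence in CORPUS for pair in bigrams(sentence)}


def unseen_bigrams(sentence):
    """Return the word pairs that never occur in the corpus."""
    return [pair for pair in bigrams(sentence) if pair not in SEEN]


print(unseen_bigrams("is the man who tall is in the room"))
print(unseen_bigrams("is the man who is tall in the room"))
```

On this toy data the ill-formed question is flagged only because ‘who tall’ has never been encountered; no structure-dependent rule is consulted.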


Adams, D. (1995) The Hitchhiker’s Guide to the Galaxy. London: Heinemann.
Chomsky, N. (2007) On Language. New York: The New Press.
Kurzweil, R. (2000) The Age of Spiritual Machines: When Computers Exceed Human Intelligence. Penguin.


C is for Contrastive analysis

27 01 2013

An article in the latest Applied Linguistics (Scheffler 2012) makes a robust defence of some discredited classroom practices, including the use of translation. While lamenting the lack of research into the effectiveness of translation, Scheffler reports a couple of studies that suggest that learners exposed to cross-linguistic comparison (also called contrastive metalinguistic input) out-perform those who have had grammar presented to them solely in the target language. The author concludes that ‘teachers who resisted the ban on [translation] in the classroom may have known what they were doing’ (p. 606). In this wise, Scheffler echoes the thrust of Guy Cook’s (2010) book, discussed in this blog here.

Interestingly, neither Scheffler nor Cook reference the work of the ‘Prague School’ of linguistics, and especially of its founder, Vilém Mathesius, whose application of cross-linguistic comparison to the teaching of foreign languages seems to have been a methodological staple in (then) Czechoslovakia until at least the late 1960s.

Just a bit of background: the Prague School flourished in the 1920s and 1930s, and was distinguished by at least three major breaks with tradition:

1. In contrast to the then predominant preoccupation with historical linguistics (and what the past might tell us about ‘correct’ language use), Prague School linguists were more concerned with language as it is currently used, and hence had a less prescriptive, more relaxed approach to acceptable language use (and one which perhaps foreshadowed the development of corpus linguistics);

2. In viewing language as an integrated, interdependent system, in which all its elements stand in some relationship with one another, such that no single element can be viewed in isolation, the Prague School was able to show how changes in one element might affect changes across the system, thereby modelling systematic language change and variability, and, incidentally, helping to establish linguistics as a discipline in its own right, rather than as a branch of psychology or philosophy; and

3. (perhaps most importantly), Prague School linguists shifted the prevailing focus on linguistic structures to a focus on the communicative functions of language, thereby paving the way for the kind of functional linguistics associated with Michael Halliday, and, by extension, communicative language teaching.

The conjunction of both a descriptive and a functional perspective prompted an interest in comparative linguistics, and, specifically, in the way that different languages express the same functions. Languages, for example, divide between those (like English) that express movement using constructions where the manner is encoded in the verb and the direction in a particle, e.g. Juan ran in (the house); she limps out (of the kitchen), and those (like Spanish or French) where the direction is expressed in the verb, while the manner is expressed in some kind of non-finite construction: Juan entró (en la casa) corriendo; elle sort (de la cuisine) en boitant. Hence English is particularly well endowed with manner of movement phrasal verbs: saunter off, stride about, scurry away, slide down, etc.

Prague school linguists argued that ‘confronting’ such differences should form the basis of language course design and classroom practice. As Vachek (1972: 24) puts it,

‘In language teaching, the instructor using the contrastive method makes a point of stressing, in the taught foreign language, not only those of its features which are identical or parallel in it with the corresponding features of the pupil’s mother tongue, but also, and particularly, those features in which the two languages are found to differ.’

Likewise, Fried (1968: 45), another Prague School associate, advocated an approach in which ‘the student is systematically guided and made to realize the functional differences that exist between the foreign language…and his native tongue’, and he adds: ‘Two-way translation may not be excluded here’.

How was this realized in practice? Tantalizingly, in a footnote Fried refers to a series of textbooks, called Nová cesta k jazykum: Co není v učebních (‘A New Approach to Languages: What cannot be found in textbooks’), one of which, written by Mathesius himself and published in 1936, was called Nebojte se angličtiny (‘Don’t be afraid of English’). (Can my dear readers in the Czech Republic or Slovakia keep an eye out for this – the title alone is worth the price of the book!)

Short of knowing how Mathesius went about it, I’m assuming that one way of realizing a ‘confrontational approach’ might be to take the deductive route, in which the rules of the target language are compared and contrasted with those of the students’ mother tongue.

A more inductive approach, however, seems better attuned to current methodology. Here, for example, is my attempt to contrast a feature of English which is not shared to anything like the same extent with Spanish:

1. Read the text in English and Spanish


Coffee is made from the beans of the coffee plant. Coffee bushes grow best in warm, wet highland areas, such as in Brazil and Kenya. Inside each red berry are one or two beans. At harvest time, the beans are removed and dried in the sun. Then they are roasted until they are brown, and sold, either ground or whole. Coffee is exported all over the world.


El café se hace con los granos de la planta del café. El arbusto de café crece mejor en tierras altas, cálidas, y húmedas, como en Brasil y en Kenia. Dentro de cada baya roja hay uno o dos granos. Después de la recolección, se extraen los granos y se secan al sol. Entonces se tuestan hasta que adquieren un color marrón, y se venden, o molidos o en grano. El café se exporta a todo el mundo.

2. How do you say…

1. El café se hace con los granos de la planta del café.
2. Se extraen los granos
3. Se secan al sol.
4. Se tuestan
5. Se venden
6. El café se exporta.

3. Can you work out the rule for these sentences? How does it differ from Spanish?

Incidentally, has anyone done ‘running translations’? I.e., as in running dictations, a text is pinned up at some distance from where the students, working in groups, have to translate it, one student acting as the ‘runner’. Is there any mileage in it, do you think? 😉


Cook, G. (2010) Translation in Language Teaching, Oxford: Oxford University Press.

Fried, V. (1968) ‘Comparative linguistic analysis in language teaching’, in Jalling, H. (ed.) Modern Language Teaching, London: Oxford University Press.

Scheffler, P. (2012) ‘Theories pass. Learners and teachers remain,’ Applied Linguistics, 33, 5: 603-607.

Vachek, J. (1972) ‘The linguistic theory of the Prague School’, in Fried, V. (ed.) The Prague School of Linguistics and Language Teaching, London: Oxford University Press.

Many thanks to Jeremy Taylor for his lovely photos of Prague.

T is for text-based curriculum

4 12 2011

Nigel Davies, who runs a school in El Prat de Llobregat, near Barcelona, wrote to me last week:

I’m doing an experimental kind of class here at the school, which, if you have time I would like to hear your thoughts on.

It’s a post-CAE class, a mixed bag of wannabe one-day proficiencies and other advanced students. I didn’t want to do an exam-based course, and couldn’t find a suitable high-level general textbook, so someone suggested doing some Engl Lit, maybe one of the classics, which was a possibility, but not for a whole course, so I settled on one of Malcolm Gladwell’s books. Do you know his work? I chose ‘Outliers’, a study of how people become successful, as it has lots of stories of different people in different situations to back up his central thesis, and there was lots of extra material on the internet, both spoken and written.

What we do is varied (I hope). We do lots of vocab work on the text, some grammar, various approaches to text comprehension, and compare clips of or about the various people involved with the written text. The students have to read sections of the book ahead of time, so that the material is fresh for discussion, and for closer textual work on gram or voc, I have them use the text in class to find examples. […]

They’re finding the material very interesting, and are managing to keep up with the reading load.  Still, as there’s no external ‘help’, I have to create all the activities and do a lot of extra research, which is very time consuming, if at times personally rewarding!!  […]

It would be interesting to know if you’ve ever run a course like this or what your thoughts are on using this kind of authentic material over a long period of time…

A number of thoughts were triggered by Nigel’s account:

Years ago I had a DELTA trainee who was in a similar situation, with a  group of women who had completed the Cambridge FCE the year before and wanted a break from exam-driven classes. They decided they would all subscribe to a women’s magazine, the choice being agreed mutually, and that this would provide the course content, in much the way that Gladwell’s book does for Nigel’s class. The experiment was rated a great success.

The idea of basing a second language curriculum on a single text has a long history. I’m currently reading Jacques Rancière’s (1991) account of how, in 1818, the French schoolteacher Joseph Jacotot developed an innovative method of teaching Flemish (of which he spoke not a word) by basing the whole course on one (bilingual) text, Fénelon’s Télémaque (1699), although, as the translator notes (p. 2), ‘In terms of Jacotot’s adventure, the book could have been Télémaque or any other’. For Jacotot, ‘all the power of language is in the totality of a book’ (p. 26).


In similar style, I own an 1872 edition of a textbook by a certain T. Robertson that is based entirely on the study of a single text, spread over 20 units. The first unit of the first course starts with the first sentence of the text (apparently a story from the Arabian Nights).

The text is first translated, word by word, and phrase by phrase, and this forms the basis of exercises that involve translating the text back and forth.  The course continues, a sentence at a time, through the complete story.

What are the pros and cons of basing a course on a single text?

Obviously, one disadvantage would be the possible boredom that might set in, as learners tire of the same text. This, of course, could be off-set if the text were one that had been mutually chosen, and/or one that was relevant to their lives, study or work, and/or one where there was built-in variety (as in the case of the women’s magazine).

Another problem might be the relatively narrow lexical focus. What kind of word coverage do you get from a novel, for example? At the same time, this could be seen as an advantage, in that ‘narrow reading’ allows a greater degree of turnover of the same vocabulary items, optimising the chances of these items being learned. Coursebooks, which jump from topic to topic, are notoriously poor at providing the number of repeated word encounters that are considered necessary for incidental learning to occur. A course based on a single text might lose out on lexical range but score highly in terms of lexical retention.
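The question of repeated encounters is easy to investigate for any text you have in electronic form. Here is a minimal sketch that counts how often each word turns up and lists the words met at least a given number of times. The sample passage and the threshold of four are my own assumptions for the sake of the example; how many encounters are actually needed is a matter for the research literature, not for this script.

```python
import re
from collections import Counter

# A short sample passage; in practice you would load the whole course text.
SAMPLE = (
    "Coffee is made from the beans of the coffee plant. Coffee bushes grow "
    "best in warm, wet highland areas, such as in Brazil and Kenya. Inside "
    "each red berry are one or two beans. At harvest time, the beans are "
    "removed and dried in the sun. Then they are roasted until they are "
    "brown, and sold, either ground or whole. Coffee is exported all over "
    "the world."
)


def encounter_counts(text):
    """Count how many times each word form occurs in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def words_met_at_least(text, threshold):
    """List, alphabetically, the words encountered `threshold` times or more."""
    counts = encounter_counts(text)
    return sorted(word for word, n in counts.items() if n >= threshold)


print(words_met_at_least(SAMPLE, 4))
```

Run over a whole book rather than a paragraph, and with function words filtered out, the same count would give a rough picture of which content words a single-text course recycles often enough to stand a chance of sticking.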

To me, a real advantage of such an approach is that it is essentially meaning-driven, and that the language that the learners have to engage with, in order to understand the text, has not been pre-selected and pre-graded, and hence is more representative of language in the real world. Moreover, by virtue of its being both self-selected and authentic, such a text may offer a more engaging stimulus (than coursebook texts customarily do) for other, ancillary activities, such as discussion and writing.

Has anyone else out there tried this kind of approach to course design?


Rancière, J. (1991) The Ignorant Schoolmaster: Five lessons in intellectual emancipation. Stanford: Stanford University Press.

G is for Grammar-Translation

15 10 2010

A new book by Guy Cook, called Translation in Language Teaching (Oxford University Press, 2010), dropped into my letterbox last week, and makes compelling reading. Without giving too much away, this extract, from the very last page, captures not only his thesis, but something of the passion with which it is argued:

A great deal remains to be done before TILT [Translation in Language Teaching] can be rehabilitated and developed in the way that it deserves.  The insidious association of TILT with dull and authoritarian Grammar Translation, combined with the insinuation that Grammar Translation had nothing good in it at all, has lodged itself so deeply in the collective consciousness of the language-teaching profession, that it is difficult to prise it out at all, and it has hardly moved for a hundred years.  The result has been an arid period in the use and development of TILT, and serious detriment to language teaching as a whole. (p.  156)

This prompted me to resurrect from my files a coursebook proposal (yes, I know, I know) that I drafted over ten years ago, aimed at rehabilitating GT within the umbrella of a communicative approach. This is an edited extract from the Rationale:

Grammar-Translation (GT) has come to be seen as the antithesis of good teaching practice, and much scorn is customarily heaped upon it. This bad reputation is not entirely undeserved: GT is associated with a very grammar driven approach to learning, with an emphasis on accuracy rather than fluency, and on the written form rather than the spoken form. Moreover, most exercise types in traditional GT courses work at the sentence level or below: there is no such thing as authentic text, for example, in a standard GT course. In fact, inauthenticity is a hallmark of GT courses, and lends itself to endless ridicule.

None of the features commonly associated with GT, however – neither its accuracy-driven, sentence-level grammar focus nor its inauthenticity – is necessarily intrinsic to it. They are simply excess baggage that GT accreted in its passage through the nineteenth century.

Old-fashioned GT course

The notions of fluency, skills work, and whole texts are not in the least incompatible with a translation-mediated approach to the presentation and practice of grammar and vocabulary.

The fact is that a vast number of teachers, both native speakers and teachers who are speakers of languages other than English, use translation on a regular basis in their teaching of English. They do this for common-sense practical reasons, but without necessarily compromising their adherence to a communicative philosophy.

…An approach that uses translation as a vehicle for teaching the meaning and use of the second language “code” respects the universal tendency to build from the known to the unknown, and, at the same time, does not insult the intelligence and preferred learning styles of most learners.

From the affective point of view, L1 reference provides the support that many beginning learners are desperately in need of. Moreover, by recognising the validity and relevance of the learners’ mother tongue in learning a second language, a GT approach does not devalue the learner’s culture, background and experience to the extent that an “English only” approach might seem to.

…Finally, there are sound practical reasons for rehabilitating translation in the classroom.  The current reaction away from communicative syllabuses, and the resultant resurgence of grammar has meant that grammar teaching occupies more classroom time than ever – at the expense of opportunities for authentic language use. The economy and efficiency of translation as a means of grammar presentation – as opposed to such direct method techniques as demonstration and situationalization – is an argument for its reinstatement: if nothing else, it saves time.

To summarise, then, it is my belief that EFL materials need to catch up with EFL practice. The rehabilitation of translation-mediated learning through GT-style materials is an idea whose time has come. In the absence of a global initiative, local publishers will soon rush in to fill the vacuum. This could be the biggest EFL publishing breakthrough since the advent of the functional syllabus.

I sent the proposal, including some sample units (one with Spanish as the mediating language, another with German), to a leading ELT publisher. Receipt was politely acknowledged. That was the last I heard from them.

Now that Guy’s book is out, would I stand a better chance, I wonder?

T is for Translation

21 04 2010

During a talk on grammar teaching techniques, last week in Turkey, one participant queried my suggestion that translation could be a useful technique for raising awareness of similarities and differences between the students’ L1 and the target language. I went so far as to suggest that, with some structures (such as the future perfect), it could be the most economical way of presenting them. However, the participant felt (strongly) that encouraging learners to translate L1 forms into the L2 would cause negative transfer.

This led to an interesting discussion with other trainers and teachers, after the session, as to the current status of translation – specifically as a means of presenting grammar – on methodology courses, and prompted me to re-visit the entry in An A-Z of ELT. There I don’t exactly come out in favour of translation, but, in weighing up the pros and cons, I definitely give translation the last word. To quote:

Apart from being a skill in its own right, translation is also an aid to teaching and learning a second language. In this sense, translation has been central to some teaching methods, such as grammar translation, and frowned upon by others, such as the direct method. The reasons for not using translation in teaching include the following:

  • translation encourages a dependence on the L1, at the expense of the learner constructing an independent L2 system
  • translation encourages the notion of equivalence between languages, yet no two languages are exactly alike (although languages from the same language family may be similar in lots of respects)
  • the L1 system interferes with the development of the L2 system
  • translation is the “easy” approach to conveying meaning, and is therefore less memorable than approaches that require more mental effort, such as working out meaning from context
  • the “natural” way of acquiring a language is through direct experience and exposure, not through translation
  • translation is simply not feasible in classes of mixed nationalities, or where the teacher does not speak the learners’ L1.

On the other hand, the arguments for using translation in the classroom include:

  • new knowledge (e.g. of the L2) is constructed on the basis of existing knowledge (e.g. of the L1), and to ignore that is to deny learners a valuable resource
  • languages have more similarities than differences, and translation encourages the positive transfer of the similarities, as well as alerting learners to significant differences
  • translation is a time-efficient means of conveying meaning, compared, say, to demonstration, explanation, or working out meaning from context
  • learners will use translation, even if covertly, as a strategy for making sense of the L2, so it may as well be used as an overt tool
  • the skill of translation is an integral part of being a proficient L2 user, and contributes to overall plurilingualism
  • translation is a natural way of exploiting the inherent bilingualism of language classes, especially where the teacher is herself bilingual

The question is, do the pros outweigh the cons – or should I have emphasised the negative factors more strongly?