O is for Outcomes

10 02 2013

New campus, American University, Cairo

I have to say at the outset that I have an almost pathological horror of testing and assessment. All my worst teaching and teacher training experiences relate directly to issues of assessment. I don’t mean assessment of me (although negative assessments of my capacity to teach may well have resulted from my incapacity to ‘do’ assessment effectively). I mean my assessment of my students. Things can be going along just swimmingly until the day of the test, or the day when I’m required to post a grade. Then all hell breaks loose. The cozy relationship I had built up with my class or with individual students is shattered irreparably. Often this has to do with failing a student, but just as often it has to do with a student not getting the A grade they had always got in the past. Or, worse still, not getting the one percentage point that will make the difference between continued funding or having to leave the program for good.

I don’t deny that testing – like death and taxes – is unavoidable. As Johnston (2003: 77) puts it, testing is a necessary evil. It is necessary because learners and other stakeholders need feedback on progress. (Or, arguably, they have a right to feedback). And this is what testing does: it provides feedback, in accordance with principles of validity, reliability and fairness.

But, at the same time, testing is evil. Why? Because it assigns a value to the learner, and, since the value is almost always short of perfection, it essentially de-values the learner. Worse, testing typically involves measuring students one against the other, thereby destroying at a blow the dynamic of equality that the teacher might have judiciously nurtured up until this point.

Testing is evil because it is stressful for all concerned, and because the conditions under which testing is conducted (separated desks, no mobile phones, etc) imply a basic lack of trust in the learners.

It is evil because it pretends to be objective but in fact it is inherently subjective. Why is it subjective? Because, as Johnston (op. cit: 76-77) points out, ‘the selection of what to test, how it will be tested, and how scores are to be interpreted are all acts that require human judgment; that is, they are subjective acts’. Ultimately, it is the tester – not the test-taker – who decides what counts as knowledge, and how you count knowledge.

And, finally, it is evil because the kind of knowledge implicated in language learning is uncountable. More on that later.

For all these reasons, I avoid, as much as I can, having to talk about testing, and have refused more than one conference invitation because the theme was in some way connected to assessment.

Dr Deena Boraie, Nile TESOL 2013

At the same time, I am fascinated by – and a little envious of – those conference presenters who seem happily to embrace the topic of testing – such as the indomitable, and utterly charming, Dr Deena Boraie of the American University in Cairo, who was one of the plenary speakers at the Nile TESOL conference last week (on the wonderful new AUC campus – see pics).

Deena presented a lucid, non-technical rationale for the need for ‘assessment literacy’ on the part of teachers and other stakeholders. This included some straightforward tips on how to achieve validity, reliability and fairness in teacher-designed classroom tests. With regard to test validity, Deena’s recommendation is that tests should be judged in terms of how faithfully they reflect curriculum goals, typically encoded as learning outcomes. If the desired outcome is vocabulary knowledge, this should be reflected in the test. If it is reading ability, ditto.

While this makes perfect sense, it does rather sidestep the fact that the very notion of outcomes is not an entirely unproblematic one. For a start, and as I suggested earlier, language learning does not lend itself to easily quantifiable outcomes. Johnston again: ‘Neither language nor competence in language is naturally measurable’ (op. cit: 83). (He might also have added that teaching is not naturally measurable either – a conundrum for those of us who have to grade teachers). He continues: ‘The fundamental immeasurability of language competence lends a further moral dimension to our work in language assessment; the decisions we are forced to make about how competence will be assessed are always subjective and thus can only be rooted in our beliefs about what is right and good, beliefs which, we must always acknowledge, could be mistaken’ (ibid. emphasis added).

That’s not the only problem with outcomes-driven testing. An obsession with pegging learning to preselected and minutely-detailed outcomes now pervades every aspect of education (as I am discovering at the moment at my own place of work). Where does this love affair with outcomes come from?

Some would argue that it comes from the world of business, from what has been dubbed the ‘marketization of education’. As Gray and Block (2012: 121) gloss it, ‘In such an educational climate, students are increasingly seen as customers seeking a service and schools and teachers are, as a consequence, seen as service providers. As this metaphorical frame has been imposed… the semantic stretching of keywords from the world of business… has become commonplace. Thus terms such as “outcomes”, “value added”, “knowledge transfer”, “the knowledge economy” and above all “accountability” have become part of the day-to-day vocabulary of education’.

In an invigorating swipe at the culture of accountability, Frank Furedi, a sociology professor in the UK, condemns outcomes-driven education as ‘a technique through which a utilitarian ethos to academic life serves to diminish what would otherwise be an open-ended experience for student and teacher alike.’ And he adds, ‘Its focus on the end product devalues the actual experience of education. When the end acquires such significance, the means become subordinated to it’.

The means become subordinated to the ends. Isn’t this, finally, the real problem of testing?

References:

Gray, J., & Block, D. (2012) ‘The marketisation of language teacher education and neoliberalism: Characteristics, consequences and future prospects’, in Block, D., Gray, J., & Holborow, M. (eds) Neoliberalism and Applied Linguistics, London: Routledge.

Furedi, F. (2012) ‘The unhappiness principle’, The Times Higher Education Supplement, http://www.timeshighereducation.co.uk/story.asp?storycode=421958

Johnston, B. (2003) Values in English Language Teaching, Mahwah, NJ: Lawrence Erlbaum.

Date : February 10, 2013
Tags: assessment, evaluation, neoliberalism, outcomes, test validity, testing
Categories : English language teaching

97 responses

10 02 2013: Cecilia Lemos (09:17:14) :

Yes it is… simple as that, Scott. As someone who teaches students preparing for tests, that’s the sad truth. I don’t agree with it, I think it’s unfair – I’ve seen extremely communicative but test-nervous students fail – but so is life. And so far, I haven’t seen any “test” able to measure students’ abilities without looking – and feeling – like a test.

However, you have only approached (correct me if I am wrong, please) summative testing… I see great value in formative testing. I do it.

Food for thought, as usual, nonetheless.

Cheers,

Ceci

11 02 2013: Scott Thornbury (09:22:31) :

Thanks, Ceci, for being the first cab off the rank!

“However, you have only approached (correct me if I am wrong, please) summative testing… I see great value in formative testing. I do it.”

Yes, and this is a theme that is picked up later in the comments. The distinction between formative and summative assessment is crucial – although it’s a distinction that is not always clear or honored.

Here’s Bill Johnston (2003) again: “Grades or marks that are meant to have a formative role take on certain summative qualities. For example, many kinds of evaluation, such as quizzes and midterms, are intended to let students know how they are doing. Yet often scores from these sources also factor into final grades; thus, the evaluation is also summative in that it forms part of the summative grade. Moreover, even when this is not the case, formative evaluations look like summative ones; they often come in the form of scores on tests in which there is little in the way of feedback. I suggest that this resemblance leads learners to see the teacher as a judge rather than as a teacher, once again affecting the teacher-student relation” (79-80).

10 02 2013: Tom (10:31:35) :

Morning Scott,
Very interesting post this morning. As I work for a testing organisation based in a boggy part of the east of England, I feel compelled to throw my cap into the ring. To my mind this is a Marmite question: you either like tests or you loathe them. (I have a colleague where I teach who at the mere mention of the word becomes disturbingly animated.)
The arguments against are difficult to answer, in some cases impossible. But, at the same time, what I’ve found with my students is that many want them: they are convinced of their utility, especially for work, and the tests are motivating. Do they condition the contents of a course? Yes. But is it always a bad thing?

11 02 2013: Scott Thornbury (09:24:20) :

It’s true, Tom, that I sometimes wish I liked testing more: it would make my job easier! But, like Marmite, perhaps it’s an acquired taste.

10 02 2013: Candy van Olst (10:47:15) :

Hear hear, Scott. I thank the powers-that-be on a daily basis that I work in a place where we don’t have to test.

10 02 2013: Hatice Asvaroglu (10:54:31) :

Could not agree more! As educators, we are all well aware of this problem!

What about the solutions? I would rather focus on possible solutions.
In particular, the devastating effect of assessment on the relationship between a teacher and the learners is obvious! For example, last week I was unconsciously feeling miserable about posting the final grades to my ELT students and telling them they had failed, and especially about damaging their motivation to learn.

What do you all think we, as teachers, should do about this? Put more value on the process of teaching than on assessment?

10 02 2013: Rose Bard (16:17:19) :

Hatice, good point!

But change doesn’t come easily once a culture has been established, I’m afraid.

I’m lucky, though, that the language school I work for cares more about students’ progress than about numbers (making money). It takes a lot of time and patience to deal with this cultural aspect and these expectations, but it is worth working on the students’ (and their families’) perspective on learning, for instance. Our school doesn’t have to worry so much about making money, as it is a huge institute that offers the community a range of opportunities for development: last year, 800 students got full scholarships in the regular school, another 900 received some kind of scholarship to study with us, and money is invested in all of us. But even here, parents still have their own way of viewing success, and that is through numbers (grades). How we assess and present those grades is what makes the difference. However, when it comes to standardized exams, it is another journey for teachers and students to endure. For me, the two journeys are separate roads.

11 02 2013: Scott Thornbury (09:25:48) :

Good questions, Hatice. I’m going to work through the rest of the comments before I commit to an answer – maybe the questions will have been answered by others more qualified to address them!

10 02 2013: Chaz Pugliese (11:19:56) :

Thanks, Scott. I agree in spades. The problem, as I see it, is that we’ve managed to install an ethos of achievement to the detriment of learning, and this industrial culture that emphasizes grades, scores and tests has done significant damage to our schools: we’ve produced a generation of stressed-out students. For these students, achievement, not learning, is the most important thing. We’ve allowed the emphasis to shift from discovery to conformity, from creativity to standardization. And schools have greatly exaggerated the importance of extrinsic rewards. As a result, students don’t study for intrinsic reasons anymore. In a way, they don’t even behave as students anymore and perhaps we should call them point collectors instead (as Al Kohn suggests). This is happening precisely because we play down the fact that engagement in school should provide intrinsic satisfaction.

One of the most well-established findings in the field of motivational psychology is that the more people are rewarded for doing something, the more they tend to lose interest in whatever they had to do to get the reward. Study after study has found that students – from elementary school to graduate school, and across cultures – demonstrate less interest in learning as a result of being graded (Benware and Deci, 1984). One other thing: grades don’t encourage students to be autonomous. Fostering autonomy is an uphill battle because modern society doesn’t encourage it: the philosopher Charles Taylor has referred to this as the malaise of instrumental reason. Everything gets to be evaluated in terms of its bottom-line yield. The cost-benefit ratio, if you will. What a horrible metaphor for education. We need to change this!

11 02 2013: Scott Thornbury (09:32:08) :

Thanks, Chaz – I can’t improve on your argument nor better your passion! Thanks for the reference to those motivational studies – it seems to me that research into the motivational effects of testing needs to be made more widely known. Does the expectation of grading motivate learners? If so, which learners, and to what ends? (I expect that there are only certain types of learners – high-achievers, perhaps – who actually respond positively to being tested).

10 02 2013: k. liz (11:30:45) :

Thank you for this post. I have really enjoyed it, and a lot of your statements resonate with me. I am teaching in Turkey which is very assessment oriented, as the students have been working towards exams for most of their lives. I am actually about to start an Action Research project to explore whether or not more frequent assessment increases student motivation because of their background.

I’d love to see your own conclusion on how to fix the problems you mention in this post. What can we practically do? If students need the language for school, how can we assess whether or not they are ready without actually placing so much emphasis on the assessments?

It is quite the roundabout issue in ELT, and perhaps there will never be an easy answer. Then again, I guess a lot of education experiences problems like this.

Thanks!

11 02 2013: Scott Thornbury (09:33:17) :

Research! That’s fantastic – see my previous comment to Chaz. Please let us know how you get on!

10 02 2013: Dan (11:54:40) :

Perhaps a distinction between formal assessment and informal and day-to-day testing would be useful here. To put a more positive spin on tests, they can be motivating when used judiciously. The mere word ‘test’ is a great way to get attention, wake students up and force concentration. The instruction: ‘Listen carefully, because I’m going to test you afterwards,’ when used sparingly usually works, and students invariably remember the salient points in a comprehension afterwards. During a lesson we test outcomes, or at least should do, every time we ask a question, conduct feedback, or organise a communicative activity. Much of the thrust of the current move for greater rigour in ELT is to do with this call for teachers to challenge their students to show that they know. Testing outcomes, in other words.

10 02 2013: Natalia Guerreiro (20:57:33) :

Couldn’t agree more! We’re constantly assessing sts even when we don’t have the word ‘test’ in our minds.

11 02 2013: Scott Thornbury (09:39:15) :

Thanks, Dan. The case for informal testing of the formative type is compelling, but I’m not so sure it’s always done in the right spirit – or perceived as such. So often even informal assessment seems to focus on what the learners can’t do – i.e. their distance from the target – rather than what they can do – their potential, as it were. And then there’s the question that – while some students rise to the challenge of knowing they are being assessed – others under-perform, with the result that expectations of success are reduced, and this can initiate a downward spiral of further failure and yet lower expectations.

14 02 2013: Jo (15:49:05) :

Yes, Dan… we often use this in the classroom to make students pay attention. It proves they can regurgitate information within a short time frame. In my opinion it does nothing to show the real educational value of such techniques, and completely disregards and, in most cases, replaces the need for meaningful input.
But, I am a testophobe. Despite performing very well in exams at school, as an educator I do not believe that the formal type of testing that prevails (especially summative assessment) at the moment benefits the learner.
I agree with Chaz and will add that if we can somehow encourage intrinsic motivation in learners then we do not need the argument for testing as motivation.

14 02 2013: Jo (15:56:27) :

PS. (sorry) I also agree with you Dan when you say that we need to encourage students to show what they know and this way we can assess them without a formal test.

10 02 2013: Scott C (12:31:06) :

Yes!! It’s all marketing’s fault 🙂 It’s interesting how tests are often justified as students need to be assessed for further studies, but as a native speaker, I still failed uni subjects! Nothing to do with language ability. I did a CELTA course recently and only did input sessions. It was wonderful!!

10 02 2013: Gareth Knight (12:38:39) :

Is education context or content? Is the desired outcome experience or product? People are more than their grades.

10 02 2013: Karenne Sylvester (12:55:11) :

I agree completely, as both a teacher and a learner currently biting nails waiting for results on an assignment, that assessment can be tedious. However, I do think that the search for an outcome is a human one – not just something borrowed from the business world – in that we always want to know what the fruit of our labour will be, in whatever endeavour we embark on… to learn is to learn “something”. There is a purpose to study, and it seems to me, contrary to Furedi, silly to learn just to learn just to learn… life is too short, and I would add too important, for that riddle!

Although we say things like it’s the journey that counts, not the destination, the fact is that without a destination, what is a journey? Even if that destination changes.

Anyway, actually this week I had really wanted to get back to my own blog to share buzz about an experiment I did, an experience of assessment I carried out this week. But as I have no real time to blog anymore, and it fits to this topic, I’ll just share here, and I hope you don’t mind…

I currently have a class of 12 Intermediate students who are nearing the end of the book. The next step, post-assessment, would be to move up to Upper Intermediate; however, not all of them are “ready to.” Like students in many classrooms around the world, they have different spikes of ability and different difficult troughs of non-ability… sometimes related to them not “getting” something, sometimes because they weren’t there for a lesson, and sometimes because they’ve simply forgotten how to do something… so how could I teach to all of their different strengths and weaknesses within the class framework and still get them ready for the end of the course?

A couple of weeks ago I told the students we would do a test that I would not look at. They would use it instead to discover their own problem areas. The book I’m using (SpeakOut) comes with different types of tests on its DVD-ROM: quick tests of each unit (which I usually give them weekly), Progress Tests (which I don’t do), Mid-Course Tests and End of Course Tests. A lot of tests. Last week I took the Progress Tests for units 1 to 8 and then, in Word, amalgamated them into blocks: all the listening together, all the vocab and grammar, all the reading, all the speaking, all the writing. Over a two-day period (approx. 6 hrs) the students took these tests. Because they all knew that there would be no repercussions, they did so without complaint. At the end of the second day, I gave them a sheet with the answers and got them to mark their own answers.

On the morning of the third day, I gave them a self-evaluation sheet I had made, which invited them to think personally about the last few months of classes (units 1 – 8) and to look over the monster test they had just taken. I also handed over their complete files (which included results of other tests, their introductory test and other bits of paperwork, including their written essays). Additionally, they were invited to think about what kind of learners they are, and the strategies that help them to learn best. Based on their own understanding of where they were strongest and where they were weakest, they could then choose three areas they wanted to spend the next three days working on.

I stacked a box of grammar, vocabulary books on the table, a box of headsets, provided a sheet with links to websites that could help them practice speaking/listening/reading/writing/grammar/vocab/pronunciation and then told them that for the next three days they were in charge of their own learning – they could study whatever and wherever they liked; in groups, in pairs, on their own; in the classroom, on the sofas in the hallways, in the computer room, in the library, at Costa; and could bring in their own tablets/netbooks/laptops/smartphones – whatever and wherever they wanted to – I said over and over “you’re the boss of your learning.”

Although I probably shouldn’t be surprised, I was, because it was a complete success. The students showed so much enthusiasm for doing their own assessment, because they were in complete control (I did not take copies of their work or look at the results – some showed me where they had gone wrong but it wasn’t “my business” to look or comment, it was “their business”) and because they had a clearly defined outcome in mind. I circulated around the school to help individual learners, and was in the classroom to answer specific questions as and when they arose.

Overall, I found that they were very realistic about their problem areas, (each morning when I took attendance they told me what they were working on and why they had chosen that area) and I had zero issues with student discipline. Motivation was through the roof.

So the moral of my rather long comment: it isn’t the outcome that’s the problem, it’s the question of who is in control of said outcome…
Karenne

10 02 2013: Karenne Sylvester (15:43:52) :

just to add quickly for clarification, there are ten units to the book, so this experiment at the end of u8, was done as a way of getting them thinking about their individual core issues, prior to the real “end-of-course” assessment – so not teaching “to” the book’s outcome but teaching “with” it in mind…if that makes any sense.

11 02 2013: Scott Thornbury (09:50:43) :

Thanks for sharing that with us, Karenne. You make the point beautifully that ‘it isn’t the outcome that’s the problem, it’s the question of who is in control of said outcome,’ and this is a good example of how self-assessment can empower learners.

Nevertheless, I am still a little nervous of the validity of the testing instrument itself. If, as you suggest, the tests included testing of skills, such as speaking and writing, and at the same time there is a scoring sheet that the students themselves could use, what exactly was being tested? For example, were they speaking, recording themselves, and then scoring themselves according to a rubric? If so, how skilled were they at doing this? Or was the test – as I suspect – essentially a written test with lots of discrete-item questions, possibly of the multiple-choice format – easy to set and to score, i.e. a test of knowledge rather than of the ability to apply that knowledge? In which case, how useful were the results in terms of effecting improvements in their performance? This is not to invalidate your excellent experiment, but just to show how difficult it is to identify and assess the outcomes that really matter.

10 02 2013: John Hanes (13:26:04) :

Why not reinvent testing so that it is more accurate and less threatening? Instead of one or two huge, threatening, all-evaluative assessments, try mini-assessments for each learning point. You get a dynamic picture of what the student actually learns at different stages of the course. The accuracy of scoring is directly tied to concrete lessons rather than to abstract expectations of outcomes. The mental stress of testing is distributed and thereby reduced as the students become habituated to a shorter, more frequent format. There is plenty of educational psych research to support the efficacy of quizzes as a strong motivator for frequent short-term recall for a wide variety of content. I say scrap the big scary final exams and do sharp, focused, frequent mini-quizzes whose scores you aggregate into an overall long-term average; that’s more informative than the traditional final exam.

11 02 2013: Scott Thornbury (09:56:16) :

I agree, John, that the ‘big scary final exams’ could usefully be binned – but to replace them with lots of little tests might not necessarily be the best remedy, in that it could instil a mindset of constant assessment. Also, I’m not sure what – in language learning terms – a ‘learning point’ is. If language use is a skill rather than an accumulation of discrete items of knowledge, how do you measure it if not in terms of performance?

11 02 2013: John Hanes (10:23:19) :

Thanks Scott for taking the time to reply. I very much appreciate your blog and insights.

By learning point, I meant lesson aims. Don’t get me wrong, I’ve always hated tests and I empathize with you and our students. We can give them smaller quizzes (call it a review with recorded data) that they don’t need to study for. It would check what they just did in class and they’d only need to participate, instead of study, to do well. After a few attempts they’ll desensitize and even be motivated by it, their confidence will increase, as the test ought to be cumulative.

You can even have the learners help construct the review and decide what they think is appropriate. That should change their attitude if they create it. In fact, just scrap the whole quiz/testing construct and just think of it as systematic collection of useful data, like baseball statistics, and use the data as feedback for improvement without being overly evaluative. Feedback is crucial for learning anything well, as you well know, and this data could have far more dimensions that can be used to evaluate progress and achievement or just give constructive feedback. You could unlock so much about what makes our students learn with appropriate application of data collection. We see it in sports medicine, clinical psychology, and other areas. I’m not trying to seem anti-social constructivist. I come from a social psych background and strongly emphasize the social aspect of learning, particularly a language. But quality data in the right hands can help us see more clearly. Happy Teaching!

John Hanes

EnglishWithJohn.net

13 02 2013: Scott Thornbury (20:06:18) :

Thanks, John. “Quality data in the right hands can help us see more clearly” – agreed. I guess it depends a bit on the ‘quality’ and whose hands are ‘right’!
As for your point, “By learning point, I meant lesson aims”, I have to say I’m a bit nervous about the idea of learning aims, too – especially when they are pre-selected in such a way as to pre-empt emergence – see A is for Aims in this series!

10 02 2013: philchappell (14:01:51) :

What a great introduction to your post, Scott. Uncannily similar to my own experiences with those dreaded grades. Outcomes have always been antithetical to responsive language teaching in my mind. But I do think that assessment is an integral part of learning/teaching activity – especially when the students and the teacher have a shared understanding and commitment to what the desirable “outcomes” of the activity are. Maybe it’s the “O” word that is off-putting?

Cheers

Phil

11 02 2013: Scott Thornbury (09:58:12) :

Thanks, Phil — yes, there is a responsibility on the part of teachers to provide feedback on progress, and maybe the problem is in defining progress in terms of pre-selected outcomes, rather than negotiating the outcomes during the progress, if you understand what I mean.

17 02 2013: stevebrown70 (11:44:34) :

Hi Scott,
I must say I’m baffled that so many people commenting here seem to be so anti-assessment. Where I work, assessment (formal and informal) is just an integral part of teaching. Teachers who worry about telling students their test results, or who blame tests for messing up their relationships with their students, need to just grow a pair and accept that sometimes their job is to tell students things they don’t want to hear.
OK, it’s easy to be provocative, but to a large extent I agree with myself. Yesterday, before I read this blog, I wrote a blog post of my own suggesting that English language teachers are so concerned with being nice to their students that it can have a negative impact on learning – http://stevebrown70.wordpress.com/2013/02/16/are-we-too-nice-for-their-own-good/
There are plenty of bad tests around, and this has probably contributed to the negativity surrounding them. But a course without outcomes has no direction. All stakeholders need to have some kind of idea of where the course is going and what the students should be expected to do by the end, especially if the course exists within a framework of levels. The language classroom is not a bubble.
I agree with Jeremy Harmer’s comment and would suggest that, rather than trying to avoid assessment altogether, we look for ways of creating outcomes that promote learning and achievement rather than stifle it. There are plenty of interesting ideas up here already, which is encouraging.

18 02 2013: Scott Thornbury (13:34:07) :

Thanks, Steve – and thanks for the link to your blog (about being over-nice). I like to think that – as a teacher and teacher trainer – I am both nice and rigorous (‘tough love’ perhaps is what it’s called). At the same time, I am forced sometimes to be very ‘un-nice’ (when it comes to grading) and yet I often suspect that my testing procedures lack both rigour and fairness. My fault, perhaps, and not the fault of assessment per se – yet I also suspect that I’m not alone and that assessment is more often done badly than well.

10 02 2013: Isabela Villas Boas (14:37:07) :

Hi, Scott
What an inspiring, thought-provoking post, as usual.
I’m particularly interested in the topic of assessment right now. True, I’d rather avoid assessment altogether, but I don’t have this option. We are in the process of redesigning some of the assessments in my ELT Institute, especially with adults. The dreaded word “tests” really scares them and sometimes demotivates them, too. I believe the most negative characteristic of big tests, usually administered at the end of a program, is that they measure what students learned long ago and accumulate content. Students should be assessed right after instruction or, ideally, instructional strategies and assessment should overlap – you teach, and you assess how well students have attained what you taught, which informs your next steps. Our plan is to design short assignments given throughout the module that can then be added up for a final grade, a requirement in the institution (and a culturally-driven expectation). They can also redo assignments that they didn’t do well in, as the main goal is that they learn. However, I don’t see right now (and please help me if there is indeed another way) how we can do this without defining what our learning outcomes, or objectives, are. I mean, how do you decide what types of assignments to use and what to include in them without knowing what you want to achieve with them? Do you think that the problem may lie in the way learning outcomes are typically defined – too narrow, too detailed? I understand your point that teaching to the outcomes is like teaching to the test and that there’s much more to gain from the rich interaction that goes on in the classroom than the mere attainment of language outcomes. Perhaps, then, we need to think of broader outcomes that reflect all this and also leave space for outcomes that were not defined at the outset. Also, ideally, outcomes should always be open to revision – this is what I expected students to achieve, but this is what I now see they are able to realistically achieve.
In sum, don’t you think, then, that the problem is not necessarily in using outcomes (or assessments), but rather, in the way we use them?

Reply
11 02 2013: Scott Thornbury (10:13:37) :

Thanks, Isabela, for this very thoughtful post. With regard to your question as to how you can provide ongoing assignment without pre-selecting your outcomes, I think you go a long way to answering it:

Do you think that the problem may lie in the way learning outcomes are typically defined – too narrow, too detailed? I understand your point that teaching to the outcomes is like teaching to the test and that there’s much more to gain from the rich interaction that goes on in the classroom than the mere attainment of language outcomes. Perhaps, then, we need to think of broader outcomes that reflect all this and also leave space for outcomes that were not defined at the outset. Also, ideally, outcomes should always be open to revision – this is what I expected students to achieve, but this is what I now see they are able to realistically achieve. In sum, don’t you think, then, that the problem is not necessarily in using outcomes (or assessments), but rather, in the way we use them?

Yes, I do. But I think that there is an even more fundamental issue, and that is that outcome descriptors seldom include some estimation of potential: that is, they focus on achievement in the past and not the potential for achievement in the future. Vygotsky’s sociocultural theory of learning suggests that what the child can do now with the help of a better other is a strong indicator of their potential, which suggests that maybe assessment should incorporate an interactive dynamic. Accordingly, the literature on sociocultural learning refers to ‘dynamic assessment’, in which, for example, ‘the examiner… functions as a mediator who reacts to the learners reciprocity and is more concerned with cognitive transformation than with performance efficiency’ (Lantolf & Thorne 2006:337). I’ll leave you to think about how this might work in practice!

Reply
10 02 2013: Hada Litim (14:45:39) :

Hi Scott,
Thanks for another generous bite of your insight. I too attended Deena’s plenary, but I left feeling as uncertain as I always do when it comes to assessing, and more specifically ‘formal testing’. Like you, I experience similar setbacks to those described in your post with my students, and it seems I’m never prepared for the blow. Saying this, when teaching academic classes, I can just about justify subjecting students to assessments (formative and summative), for all the obvious reasons. What I still find difficult to rationalise is giving formal tests to ‘general English’ students, especially on short courses. And this becomes even worse when the tests are standardised, which to me dramatically reduces the usefulness of carrying out the needs analysis of each particular group, so needless to say your quote of Frank Furedi particularly resonated with me. I remember Deena expressly pointing out the misuse, by some, of the term ‘validity’. I wonder why validity doesn’t refer to the needs of the students – wouldn’t that be more appropriate in a learner-centred classroom? Recently, and like K. Liz, I’ve been considering carrying out an Action Research which hopefully will shed some light on it all. Although, it seems our journey is one of process rather than outcome, so a light may just reveal, another 😉

Reply
11 02 2013: Scott Thornbury (10:17:49) :

Thank you, Hada (I think we met at the conference!). That’s a very good point about defining validity in terms of the needs of specific learners. And maybe, with a view to involving learners more in the process (as Karenne attempted to do above) the learners could be encouraged to articulate their own desired outcomes. Easier said than done, of course, in an educational climate that expects – indeed, requires – that all learners march to the beat of the same drum.

Reply
11 02 2013: Hada Litim (14:50:00) :

Yes, we did meet, a few times! It was great having you and some of the other speakers in this region – hopefully nearer next time. Jeddah, perhaps?

Reply
10 02 2013: Stephanie (16:04:54) :

Once again you’ve raised some very thoughtful points. I am, however, intrigued by your choice of words in saying ‘Because it [testing] assigns a value to the learner, and, since the value is almost always short of perfection, it essentially de-values the learner’. The use of ‘learner’ is very “person-al”, in the proper sense of that word and given that, as Gareth Knight points out above, ‘People are more than their grades’, surely the better word here would be “learning”. I was wondering if your choice of “learner” was deliberate, and if so, why. Does this reflect your own perception of being tested?

Reply
11 02 2013: Scott Thornbury (10:23:15) :

Good question, Stephanie. Yes, the choice of learner rather than learning was deliberate, and follows from my reading of Johnston (Values in English Language Teaching, 2003) in which he says ‘the process of assigning and grading a test or other work by a student — that is, the process of evaluation, is precisely that: a process of placing a value on the head of each individual student. If this is not a moral act, nothing is’ (p.77). Inadvertently, perhaps this is often reflected in how we talk about learners: ‘She’s a grade A student’, ‘He’s a potential fail’, etc.

Reply
10 02 2013: Elka Todeva (17:55:53) :

DITTO, DITTO to the first part of your posting, Scott. To what you have said in one of the paragraphs there, “Testing is evil because it is stressful for all concerned, and because the conditions under which testing is conducted (separated desks, no mobile phones, etc) imply a basic lack of trust in the learners.”, I would add that the testing conditions you have described also do not reflect how we function in today’s world with the premium we rightly put on collaboration, on more than the usual triangulation of sources, etc. Very interesting posting by Karenne Sylvester that in a way addresses this point.
Others already commented on the important distinction between formative and summative assessment.
Thanks for another nice Sunday morning treat, Elka

Reply
11 02 2013: Scott Thornbury (10:26:13) :

Thank you, Elka, especially for reminding us that the learning model we encourage is one of collaboration and joint-construction – but this is not at all reflected in our assessment procedures.

Reply
10 02 2013: David (19:06:38) :

The only summative outcome I will ever pay attention to as a teacher is that which shows what a student can do in a real situation (be that speaking/listening or reading/writing). That is the only real benchmark of learning and score that will count. What I rue is how so much of ELT has truly strayed from the communicative emphasis when it comes to testing.

I once had an end of course exam where the students had to record themselves ordering a pizza. Most of the students who were very successful would have scored very low on the typical speak 2 minutes to me about “x” and then take this 50 question multiple choice test. Even if I had designed an in-class test where I roleplayed this situation with the students – it would be far from the same in kind and result.

We have so much “reality” around us as language teachers, especially now given technology. Why do we prefer the dark, artificial laboratory of language to that of the authentic, real world? Perhaps it is because at base – teachers want to be in control.

Scott, to add to your point of Deena’s about tests reflecting the objectives – I’d say whose objectives? How are those objectives relevant to the real world? Have the students had any say in those objectives? How many teachers really do a proper needs assessment of students and align their tests/outcomes to the actual real purposes of the students? For example, a teacher might have an objective of “the ability to use the past simple.” But no student I know cares about that – they want the ability to tell their English girlfriend what happened in class.

Alas, the failure of tests is a failure to realize we are all an experiment of one – not many. A failure to differentiate and allow learners to rate/test themselves. I wonder why no program I know of, allows a student to give themselves a mark?

Reply
11 02 2013: Scott Thornbury (10:29:56) :

Thanks David – your comment is worthy of a blog-post of its own! As for your last challenging comment (“I wonder why no program I know of, allows a student to give themselves a mark?”), self-assessment was all the rage in the 90s I seem to remember, and was a spin-off from the learner autonomy movement. What happened to it?

Reply
14 02 2013: David Deubelbeiss (03:55:38) :

Yep, I remember it too and it was all the rage in the 70s (I’m told). The pendulum keeps swinging. I think it went the way society has headed – the business paradigm (and the ethos of competition) governing even our most well intentioned activities. But I think that learner autonomy is due for a comeback given the affordances technology offers – let’s see. I’m keeping my fingers crossed.

Forgot to mention I too am astounded by how students clamor and demand marks, even decimal places. I gave a student a 96 in my course this year and she demanded to know where she lost the extra 4 marks. I simply told her I don’t believe in perfect unless it’s more than perfect. She then wrote the dean saying she did everything perfect so should be given “perfect”. I laughed so hard I almost died but there was a part of me really angry and it’s that part I struggle with, and we all struggle with, when it comes to testing.

Reply
14 02 2013: David Deubelbeiss (04:02:02) :

Forgot to mention – with my students we watch a lot of Mr. D scenes (Canadian comedy show about a teacher) and discuss the implications for schools/education. Here is a classic one I use to discuss assessment – must watch. http://www.youtube.com/watch?v=0fn_vAhu_Lw
10 02 2013: Miguel (19:07:57) :

Hello everyone!
Exciting conversation! I also find assessment excruciating. My problem with it is that we sometimes draw wrong inferences (making the testing instrument invalid; see Cizek, 2012, p. 35: “Validity is the degree to which scores on an appropriately administered instrument support inferences about variation in the characteristic that the instrument was developed to measure”). If a student fails, say, a B2 writing exam, we may infer that we are awful teachers, that the student is dumb, etc. All we can say, however, is that the student failed to produce a piece of writing that is characteristic of a B2 level in English. I think this latter conclusion is not very threatening or unjust but it usually gets mixed with the others (that might still be true but were not really assessed by the instrument).
Best,
Miguel

Reply
11 02 2013: Scott Thornbury (10:32:38) :

Very perceptive comment, Miguel. Worth repeating your point that the inferences we draw from tests may not be the correct ones. This relates to the issue of ‘fairness’ which Deena also addressed in her plenary, and it’s reassuring to know that fairness has joined reliability and validity as one of the key criteria for assessing the value of a test. If unfair conclusions are being drawn from a test, the test is indeed ‘evil’!

Reply
10 02 2013: Sarah Emsden-Bonfanti (20:28:21) :

Thought-provoking post Scott. As someone who spends about half of her working life with one foot in the (summative) language assessment camp and the other in language teaching, I believe there is much to be gained from further analysis of the issues you raise.

– The washback effect in the classroom:

As you state, testing is unavoidable because language tests are required by educational, professional and governmental institutions. Therefore, students invest/sacrifice time and money to learn how to pass ‘the test’. As a consequence, teachers have a responsibility to prepare them so that they have the best chance of getting the required grade so that they can move on with their lives. This leads to two possible washback effects: either the teacher tailors their lessons and teaches to the test, or from a more holistic, pedagogically sound position, they teach their students based on their linguistic long-term needs (i.e. skills that they are much more likely to require in their educational/working contexts). Having recently done both, I can say that neither is without its issues: Teach to the test and the student may pass, but then is likely to lack the skills they will later need. However, if teaching to the test is avoided, then the student is unlikely to do as well as they would have been able to with training. The result of this is unhappy students and, more often than not, pressure from management as to why the students are not passing with the required grade. (Let’s not forget that as teachers we are often, if not always, appraised on our own ability to meet pass-rate benchmarks.)

– Issues of validity and reliability:

This brings me to my second point. You seem to conflate issues of reliability and validity, and thereby take it as given that all tests are by their nature both reliable and valid, when I fear the reality is often far from this. Although the need for R&V is widely recognised, they are often seen as opposing forces: ‘no test can ever be wholly valid or wholly reliable. Indeed, a completely reliable test would measure nothing; and a completely valid test would not measure’ (Davies 1990:50). But what if the test covered a more holistic representation of the construct in question (i.e. listening, reading etc.), one in which the learner were practising skills that they will need in their future, rather than tests that appear to have situational authenticity but lack the type of cognitive behaviour that candidates will need? Would this not then go some way to redressing the balance? This could mean that training the student for the test would then be the same as preparing them for their future contexts.

Of course, even this would require the student being assigned a subjective value, because as you rightly point out, trying to objectively define what language is is highly problematic and often far from successful. But let us not forget that failure is an important learning curve. It may not be what we want or were expecting to hear, but it is sometimes what we need to face.

I would love to live in a world where education were more Socratic and less outcomes-driven in nature, but given current reality I believe that more cognitively valid tests are attainable and take us some way towards putting right many of the wrongs associated with summative testing.

Reply
11 02 2013: Scott Thornbury (12:48:05) :

Thanks, Sarah, for your very helpful comments – and from someone who has engaged with these issues more deeply than I have, perhaps!

But am I really conflating issues of reliability and validity? I thought I had avoided issues of reliability (if by reliability we mean the extent to which the test yields the same results with matched test-takers, test-scorers and other variables, or the same results with the same test-takers on two separate but matched occasions). This is a whole other can of worms!

I agree, though, that if test-tasks closely match learning-tasks, which in turn closely match predicted real-world language use, then the degree of congruence is increased hugely.

Reply
10 02 2013: Natalia Guerreiro (20:35:55) :

I certainly relate to the uneasiness with bad tests, bad uses of tests and also the word “outcome”, since it seems to imply a factory-like view of education. But if you have a good valid test and you don’t misuse it (e.g.: by using it to label your students as a B student, a C student, etc.), how’s it any different really from any kind of feedback?

When teacher training books talk about “accountability”, doesn’t it presuppose daily assessment of your students’ performance? When Dogme and TBLT tell us to work on what the learner has produced, doesn’t it imply assessing their performance each and every time they do something with the language? And in those 3 cases, wouldn’t my next step as a teacher be to intervene so as to reaffirm the objective I had in mind (either as a pre-conceived lesson aim or as some idea of a target performance I see as their next or final step)?

As a teacher who gives feedback and plans tasks and lessons that will hopefully help sts move beyond their present stage, all my actions will depend on my beliefs and values, i.e. my construct of what it is to know a language, what it is to perform ‘better’ than they perform now, etc. So both my daily assessment of their performance and the institutionalized “assessment day” depend on a construct, on how we understand what we want sts to achieve. And while the latter is a notorious punch bag, the former may be even more pernicious if we don’t think long and hard about it, since it controls the daily image we build of our students and even their in-class learning opportunities.

Reply
11 02 2013: Scott Thornbury (12:56:24) :

“When Dogme and TBLT tell us to work on what the learner has produced, doesn’t it imply assessing their performance each and every time they do something with the language?”

Good question, Natalia. Clearly, learning, language use, and the assessment of learning and language use are all inextricably bound. I have no problem with that. But the assessment of task-performance in a TBL or Dogme-type lesson is (or should be) a far remove from the kind of assessment associated with more formal and standardized testing. For a start, the task is likely to be collaborative and assessment based on more holistic criteria of intelligibility and task-achievement, perhaps. Also, no grades will be given, or comparisons made with other learners. And feedback will be supportive with – ideally – a chance to attempt the task again and improve on it. Finally the learners’ own criteria of success might also be taken into account so that, all in all, the feedback cycle optimises learning opportunities rather than evaluating individuals.

Reply
10 02 2013: Rob (21:14:59) :

Brilliant Scott! Very pertinent and thoughtful comments as well. Interesting to read how the words ‘testing’ and ‘assessment’ are used throughout the thread.

Hodge (2007), in an article on competency-based learning (CBL), cites an observation by Harris et al. (1995) that the Cold War caused the USA to ‘undertake some deep soul searching with respect to its education and training system’. Alderson and Martin (2007) note that “Outcomes and outcomes based education (OBE) have almost become terms of abuse in some Western Australian education circles…” The authors go on to examine the origins and role of outcomes-based education (OBE), particularly in Australia.

Reading the articles above can provide one with the sense that OBE is indeed a move to turn schools into training grounds for skilled laborers, in order to bolster economic competitiveness. The ensuing bandwagon seems to be one that many technocrats have gleefully jumped on to, or even hijacked, when we examine the push to place flickering lights in front of teachers and learners alike.

The only thing more disturbing than the ubiquity of standardized tests and summative assessment is our deference to them. Like some who have commented, I’m thankful to have the freedom to use any form of assessment I deem appropriate, which usually means formative assessment by qualitative means (eg, one-to-one conversations with students throughout a term/semester). However, like others who’ve shared their thoughts here, I know how many students value testing and grades as feedback even though they may have been trained to do so.

At least a couple of your readers have called for solutions. As Rose (Bard) points out, this is a cultural phenomenon that won’t easily budge.

Given the economic interests at stake, and the neoliberal trends in a global economy, it’s going to take a major shift that might only occur – as is so often the case – once a catastrophic collapse of the systems in place happens. We see signs of it, but the slow and painful process might be further drawn out by powerful interests.

Then again, here we are recognizing the problem, exchanging ideas about it. Imagine where we can go from here:

-Let’s look for models of formative assessment that work well and share them, try them out, and improve upon them.

-Let’s attend local meetings and join in on public discussions about education, even if it’s a handful of teachers in the break room.

-Let’s experiment, as Karenne has done, with assessment that breaks down the traditional hierarchy in education, and share our conclusions.

-Contact local government officials and other stakeholders about this issue; let them know there are alternatives.

Rome didn’t fall in a day. 🙂

Rob

Alderson, A. & Martin, M. (2007). Outcomes based education: Where has it come from and where is it going? Issues In Educational Research, 17(2), 161-182.

http://www.iier.org.au/iier17/alderson.html

Harris, R., Guthrie, H., Hobart, B. & Lundberg, D. (1995). Competency-based education and training: between a rock and a whirlpool. South Melbourne: Macmillan Publishers Australia Pty. Ltd.

Hodge, S. (2007). The origins of competency-based learning. Australian Journal of Adult Learning, 47(2), 179-209.

Click to access EJ797578.pdf

Reply
11 02 2013: Scott Thornbury (16:07:24) :

Thank you, Rob, for suggesting practical actions teachers might take to chip away at the edifice of OBE! I knew someone would come along with an answer to Hatice’s question way at the beginning of this thread!

Reply
11 02 2013: Kevin Giddens (01:00:38) :

Love this idea, “The means become subordinated to the ends.” Outcomes in education (ēducātiō = A breeding, a bringing up, a rearing.) reduces school to a process of “bringing up” good citizens and employees – cogs in a wheel. How about learning as a chaotic, anarchistic process that leaves room for playful discourse. I say we leave the “breeding, bringing up and rearing” to families and keep schools (schola = leisure time given to learning) as leisurely spaces for exploring ideas. A former professor of mine, Sam Scoville’s blog inspired this post/response – http://samscoville.blogspot.com/2013/01/rationalizing-slack-to-rigor-ratios.html

Reply
11 02 2013: doggy (10:31:54) :

A few years ago, while sitting in a bar, preparing a lesson, I came to the realisation that I wasn’t a teacher but a ‘trainer’, enabling ss to use their existing L1 vocab transformed into English to express themselves. Teachers feed in new knowledge. Trainers tease out existing knowledge into a new language.

Rather like climbing the tallest mountain in the world, this allowed me to see other things which before had been invisible: my relationship with my 3-year-old, my community and also the state.

If my lessons began taking on a new format with the student as the driver, then my role was not to be authoritarian or didactic, but rather collaborative and willing to negotiate individual student needs and desires, putting myself in a position of being equal.

I began negotiating with my son rather than telling him what to do; advising, explaining and patience led to wonderful results.

The state. Politics is always a thorny issue but until we realise that coercing students into learning is counterproductive and counter motivational, we will be met with hordes of young faces with their arms crossed, giving us those looks of, “…go on, then. Teach me.”

Do exams or tests ever evaluate collaboration? It seems like we test without taking into consideration what we do in the real world…

The responsibility for learning lies firmly within the braincase of the learner.

Reply
13 02 2013: Scott Thornbury (20:07:22) :

Thanks, Kevin. And a nice blog post of your own on testing!

Reply
11 02 2013: Tyson Seburn (@seburnt) (03:59:52) :

I’ve always hated the preamble to units both in my teaching and of my learning that begin with “You should be able to X, Y, & Z.” Although succinct, those outcomes are always met to varying degrees, dependent on the style of teaching, the relevance of the content and the journey of the learning (and all their interactions between teacher and learner in between). Reminds me of a previous post of yours–one of my favourites ever–and something so impressed upon me by colleagues.

Reply
11 02 2013: Scott Thornbury (16:02:33) :

Thank you, Tyson. I also touched on a similar theme (to Outcomes) in A is for Aims, which was more about lesson planning than testing per se.

Reply
11 02 2013: Josh Kurzweil (08:21:23) :

Hi Scott,
You’ve gotten a great discussion going here and have brought up some really important points about the effects of testing on teaching and learning. I think that much of the anguish around testing relates to your idea that “it assigns a value to the learner.” As several posters have mentioned, it is often the grades and their insidious effects on a student’s identity that can be so damaging to them and the learning/teaching process. In one of the above comments, Scott C. talks about the joy of doing a CELTA that was just input sessions without grades, and I completely understand that feeling as both a trainer and trainee. When grades are present, the focus becomes not on learning but on whether one is an “A” student or a “C” student.

In his book ‘How We Think,’ John Dewey explains that often in education the student’s “…chief concern is to accommodate himself to what the teacher expects of him, rather than to devote himself energetically to the problems of the subject matter.” In teacher education, this pattern often manifests itself with trainees struggling to get the ‘grade,’ and focusing on ‘correct’ lesson shapes, techniques, and classroom set up without learning how to reflect on underlying principles and student learning. This can result in a kind of mechanical teaching with trainees trying to show techniques in their lessons, rather than paying attention to what their students actually need in their learning.

In my own English language teaching, I noticed a huge shift in my students’ attitude toward writing when I stopped giving grades on their essays. Instead I break down the different parts of their writing by paragraph and give them feedback on things like topic sentences and support. They need to ‘pass’ each area and have opportunities to rewrite. What struck me about this shift was that when students talked to me about their essays, rather than squabbling over grades, they asked specific questions about how to improve their writing. The funny thing was that I never had students ask for a grade. Granted, I work in an intensive English program at a language school and not in college or public school. However, it made me think about how it is sometimes possible to just stop giving grades and focus students on, as Dewey says, ‘the problems of the subject matter.’

I’d also like to add to Sarah Emsden-Bonfanti’s comment above about the power of the washback effect. Sarah explains two possible and contrasting reactions to testing: a teacher ‘tailors their lessons and teaches to the test’ or they ‘teach their students based on their linguistic long-term needs.’ While I agree that this type of dichotomy is often the case, I would also offer an example of how I’ve seen a test have a positive washback effect on teachers and students. Since the IELTS has gained popularity in the U.S., I have seen in my own program and others a greater focus on developing students’ fluency in speaking. Because the test has a section on speaking that includes sustained bursts of speaking from students, some students and teachers have started to place a new emphasis on fluency. Moreover, because IELTS posts sample videos of speaking tests along with rubrics, the washback effect has included, at our school, a discussion of what it is to work on fluency and help students develop their speaking.

Students and teachers do care about tests so that, as Sarah says, students ‘can move on with their lives.’ But perhaps tests can also have the function of increasing awareness and fostering positive changes among both students and teachers. Ultimately, I suppose it’s going to depend on the test and how the results are delivered.

Reply
11 02 2013: Di (10:55:52) :

Hi Josh,
I’m glad someone here is saying that a test can have a positive effect on potential test-takers, as in your example from the IELTS Speaking test.

I assume Scott’s original post was aiming to provoke, otherwise he would not have said: “Testing is evil. Why? Because it assigns a value to the learner, and, since the value is almost always short of perfection, it essentially de-values the learner.”

I disagree with Scott here. An IELTS test assigns value not to the learner as a person, but to the learner’s linguistic competence. What is evil about that?
Scott also says that it is invidious to measure students against each other. I would maintain that an external test such as IELTS does not do this, either. It measures each student individually against an objective benchmark.

Thirdly, the issue of “lack of trust” referred to by Scott when his students have to take a test under exam conditions (sitting alone etc.) is not relevant. The exam conditions are necessary in order to ensure validity, reliability and fairness. Surely most students and educators understand this and don’t think it has anything to do with whether the teacher/invigilator would trust the person on an individual basis.

Reply
13 02 2013: Scott Thornbury (20:12:02) :

Thanks, Di, for your comment. Indeed, I was being a tad provocative, I admit – but partly as a response to my negative experiences of grading – not testing, so much as simply assigning grades. Maybe the grading would have been less painful if it had been supported by reliable and valid tests, but – as I pointed out above – testing language proficiency is elusive, and testing teaching skills seems to be just about impossible.

Reply
13 02 2013: hadalitim (20:18:33) :

Hi Scott,
Now I’m just about to start my final module in DELTA (mod2) and your last line only makes me feel even more nervous.
14 02 2013: Scott Thornbury (11:16:03) :

Oops! Don’t be! Nevertheless, if anyone has invented a fair and foolproof rubric for evaluating teaching practice, let me have it!
13 02 2013: Scott Thornbury (20:16:19) :

Thanks, Josh, for your considered comment. Your point about noticing “a huge shift in my students’ attitude toward writing when I stopped giving grades on their essays” chimes with an experiment on my Masters course at the moment whereby the students are allowed two ‘strikes’ at their written assignment, with copious feedback between strikes one and two. The turnaround time is not great (only a week) and they still get grades on their second attempt (a requirement of the system) – but at least they get constructive feedback and a reasonable chance to demonstrate improvement – a move in the direction of greater fairness, I hope.

Reply
11 02 2013: Luan (12:05:59) :

Hi Scott, I believe the sentiment is somewhat misplaced. Ok, testing is not perfect but you have to fight fuzziness and mediocrity with quality and empiricism, because the world of business, efficiency and the bottom line is not a metaphor, but a reality. After all, why do people learn and study English in the first place? It’s not primarily for academic ends, but economic ones.

Reply
14 02 2013: Scott Thornbury (11:20:34) :

Even if the ends are economic, do the means have to replicate a business mentality, e.g. one of accountability, efficiency, quantification, profit and loss etc? After all, it seems to me that the neoliberal agenda has got us into the (economic) mess we’re presently in!

Reply
15 02 2013: Luan Hanratty (@TEFL101) (15:03:51) :

I guess I’m torn between the two arguments because I definitely see the drawbacks of testing, but I definitely see the advantages of a business-like approach. On the whole, I feel neo-liberalism is the least nefarious of political constructs to be working and living within, especially in terms of learner self-actualisation. I don’t see why these ideas should have negative connotations, except for ‘testing’ – which is often counterproductive when it’s not done right; i.e. most of the time in language teaching. But you must have the element there, because aiming for accountability goes a long way to equalling the reality of the learner’s situation. And so I believe that to get the most out of testing in language learning, to make it more palatable, less futile, and to do the most to encourage learner progress in this respect, testing should be four things: regular, informal, personal and communicative.

11 02 2013: Roger Hunt (14:11:35) :

If achievement were the result of teaching, every student in the class would get the same result. I have to assess teachers in training on CELTA and DELTA courses; they all get the same ‘input’ but some end up as better teachers than others – I think it’s what they already have, what they bring to the course that makes the difference. Consequently I could probably give them a grade when they arrive on the first day of the course and cut out all the stress and strain they experience wondering whether or not they will pass, get a ‘Distinction’ or a ‘Pass B’. I’ve frequently heard colleagues say such things as: ‘She gave a pretty poor lesson today, but she’s still a ‘B’ (ie: very good) teacher’.
Roger

14 02 2013: Scott Thornbury (11:22:24) :

Thanks, Roger. Yes, the reduction of individuals to a mere grade would seem to be one of the more de-humanizing effects of standardized testing.

11 02 2013: eurominuteman (14:17:28) :

ERIC Database provides 25,121 hits for Accountability http://ow.ly/hAjOR
ERIC Database provides 369 hits for Outcome-based Education http://ow.ly/hndKH

11 02 2013: eurominuteman (14:32:28) :

In the words of Transparency International…
Lack of Transparency + Integrity + Accountability = Education Corruption

How about defeating Education Corruption http://transparency.org/topic/detail/education

Check the forthcoming Global Corruption Report 2013 on Education from Transparency International http://www.transparency.org/files/content/activity/Brochure_GlobalCorruptionReportEducation_EN.pdf

11 02 2013: Kathy (18:16:00) :

Reading a page or two of those accountability abstracts is toxic … toxic.

My teaching situation is funded by public money. We are mandated to use an approved pre- and post test. While we are trained that test results are to be used in a formative way (just one bit of information among many to help us identify learning needs), the reality is that these tests are being used to summarize the performance *of the teachers*. Recipients of public funds (teachers who are paid with the money) must be held accountable for effectiveness, which is defined (partly) by gains. If the assessment were truly formative, there would be no such thing as a post test. A learner might be assessed several times, but each assessment would stand alone (what is the learner needing today?). Even if a teacher compared two tests, it would not be a matter of comparing scores. The “gains calculation” only cares if a learner has made a transition to another level, which is of no practical use in the classroom.

Love this topic, my thanks to Scott and all contributors.
Kathy

11 02 2013: eurominuteman (14:23:43) :

Causa Finalis has been around since the days of Aristotle
http://en.wikipedia.org/wiki/Four_causes

11 02 2013: eurominuteman (14:42:47) :

Causa finalis indeed ranks higher than Causa Efficiens – for a long time already and in many areas… pedaling in circles ranks lower than crossing a finish line by using a roadmap and holding your handlebars

11 02 2013: eurominuteman (14:30:28) :

Youth Jobs + Return-on-Investment > The International Labour Organization ILO makes a bleaker statement for Youth Unemployment > 40 per cent of the jobless worldwide are young people. There will be nearly 75 million unemployed youth aged 15 to 24 in 2012, an increase of nearly 4 million since 2007. The youth unemployment crisis can be beaten but only if job creation for young people becomes a key priority in policy-making http://www.ilo.org/global/lang--en/index.htm#a1 How much worse do things have to get?

11 02 2013: eurominuteman (14:51:23) :

Shooting with popguns on Jobless Youth is not wanted any more…

Especially since these Black Box Fallacies have become scientifically hard data evident…
Google Trends http://ow.ly/gXQnx > Two datasets over 8 years
EU Survey http://ow.ly/gXQny

11 02 2013: eurominuteman (14:59:14) :

Better link
http://www.ilo.org/global/topics/youth-employment/lang--en/index.htm

11 02 2013: eurominuteman (15:05:51) :

Teacher Preparation is near Impossible as Currently Conceived by Celta and Delta
http://ow.ly/hAfza

We don’t need certificate mills without outcome accountability for the reduction of the youth jobless rate, the severity of this youth jobless situation outweighs any other pedagogic considerations.

11 02 2013: eurominuteman (15:13:04) :

Teacher certification and staff development are seriously flawed. There is no Free Market place in proven ideas, in some ways Teacher Education is unregulated, a Mock Market controlled by vested interests and locked in place by industry blindness

It is more of a mishmash of competing whims and crystallized but untested practices with no continuity, critical review or coherence across the profession.

11 02 2013: eurominuteman (16:06:47) :

Do Celta and Delta possess this ICT Capability Maturity Level?

FITS – Framework for ICT Technical Support in Education – What people are saying

IT Infrastructure Library ITIL style of service support and service delivery for education, best-described in FITS from BECTA and The FITS Foundation
http://www.thefitsfoundation.org/
http://www.e-ictsupport.org/fits/Sec/fits-introduction/index.html

11 02 2013: eurominuteman (17:43:57) :

How to get back on Track > ERIC Database provides 6,854 hits for Youth Employment
http://ow.ly/hBStT

11 02 2013: Bruno Leys (@BrunoLeys) (18:29:33) :

A very interesting post and a series of useful comments as well. It set me to blogging on the dual role of teachers as educator and evaluator.
For those interested:
http://blog.associatie.kuleuven.be/brunoleys/evaluating-learning/
Feel free to comment!

11 02 2013: jeremyharmer (19:51:14) :

Well, Scott, this is quite a difficult one. You may remember that in the talk you came to in New York, testing was one of the topics that I raised in that ‘6 key questions’ session. I hope you don’t mind my repeating a précis-ed version of what I said then:

Testing, summative and formative, permeates every single aspect of language teaching, whether it’s placement testing, progress testing, end-of-semester, public proficiency tests etc. It’s there all the time, every day. And if that’s so we have 3 choices, it seems to me:

1 Change the world so that it goes away
2 Do the ostrich and ignore it
3 Engage with testing. Make it better, more responsive, work out ways to help people get more benefit from it etc etc.

I know all the arguments AGAINST testing, but here is the other side: for example, the grade system in music in the UK (grades 1-8) keeps kids practising and learning and MOTIVATED so to do. Language tests make classes of thoroughly engaged and purposeful students. A good proficiency test gives a proper reading of someone’s language skills (something quite important for e.g. pilots, air traffic controllers). A proper system of assessment gives us an idea of how well we are helping students to learn and whether or not we are going about it the right way.

And here’s something else: much of the research that we read and you quote is results-based, measured. Just like testing.

Am I a fan of outcomes-based testing? Er, well, queasily I can see its uses even while I acknowledge its dangers. But option 2 above just isn’t an answer I think, even though it is one, to my shame, that I have sometimes chosen.

Jeremy

14 02 2013: Scott Thornbury (11:39:07) :

Thanks, Jeremy, and apologies for this belated response. I totally agree that the most pragmatic response is to engage with testing, and try to make it work FOR rather than AGAINST education – and there have been some insightful ways already mentioned in the comments above whereby this might be done.

Returning to Bill Johnston again, this is how he concludes his chapter on the ethics of assessment:

Is there any way out of it? Well, at one level there is not;… the paradox of testing simply represents a constant factor in our work. It is better seen as a dynamic rather than a problem; that is, it is simply a permanent characteristic of what we do, rather than some obstacle that will eventually be overcome.

On another level, however, I believe there are ways forward. First of all, I suggest it is incumbent on each of us as teachers to continually reflect on our own values and continue to question whether these values are accurately reflected in our assessment procedures … The learning process is a highly individual one, and the teacher-student relation is similarly unrepeatable. If assessment is to be an integral part of teaching – which, I have argued, it needs to be – then it must be included in that relation… This does not exclude the use of externally written and scored standardised tests, but I believe that, in essence, assessment in the classroom must be brought within the bounds of the unique relation between teacher and student and that in order for this to happen, we need the flexibility that comes from a deep knowledge of our students and their circumstances.

12 02 2013: Divya (00:57:38) :

Your Marmite comment made me chuckle Scott, my love for it has incurred some social stigma here in France 🙂

I abhor testing, always have but I have had a recent positive experience with it which left me feeling like it’s something I can actually get my head around.

Very briefly, my students wrote their own assessment rubrics and peer-assessed for an oral exam, my input only counted for a quarter of the final grade and everyone seemed pretty happy with the experience,

and…one person failed!

I had assumed that the grades would end up being fairly homogeneous and that my students would politely indulge my ‘creative’ practices but they did in fact make a heartfelt attempt at assessing their peers and decided that an underprepared peer just didn’t make the cut.

I was also interested by the fact that their rubrics had very little by way of correct use of grammar and pronunciation and they showed an interest in the more profound level of communication that was taking place (one criterion was for instance the ability to describe the “voice” of the text, which I wouldn’t have come up with on my own).

(I’ve blogged the details: http://tinyurl.com/aoo5pob)

I’m going to start doing this with my other classes and keep a journal of how transferable this is, maybe I just have a particularly easy vibe with this group of people…. the same class is designing their own English language textbook, which will hopefully teach me how to use one better 🙂

12 02 2013: Rob (05:43:17) :

A local radio program aired a discussion about testing (assessment) during my commute today. Here’s the intro:

“Members of the Portland Public Schools (PPS) Student Union and the Portland Student Union are urging their fellow students to opt out of the annual Oregon Assessment of Knowledge and Skills (OAKS) test. The students have tapped in to a heated debate going on in many cities, including Seattle, about the merits of standardized tests. The PPS students say the OAKS test doesn’t accurately reflect what they’ve learned.”

If you’d like to listen: http://www.opb.org/thinkoutloud/shows/students-protest-standardized-test/

Rob

12 02 2013: Kurt Kohn (14:34:47) :

An excellent post, Scott. And a great discussion!

Are tests “evil”? Well, although I share your negative feelings, I hesitate to say that tests are evil – not intrinsically, and not necessarily. It is rather the way in which tests are being implemented into our educational culture that tends to put them in conflict with successful learning and teaching.

The way we think and talk about language influences the way we think and talk about language learning, teaching and testing. To understand the feelings of uneasiness shared by so many learners and teachers, tests and testing need to be seen in this wider context.

The LEARNING perspective: What are our benchmark requirements for successful language learning and teaching? Do we agree on a social constructivist understanding according to which learning a language (for communication purposes) involves creating it in one’s mind, heart and behavior based on who we are, where we come from, and where we want to be – and always in communicative and social interaction with others? In this case, key requirements for successful language learning and teaching are learner autonomy (agency), authenticity/authentication, and collaboration. Is language testing in sync with these requirements? The more it is, “a consummation devoutly to be wished”:), the better testing can be pedagogically integrated – to the learner’s perceived and real advantage.

The TEACHING perspective: I am sure we all agree that learning comes before teaching. The purpose of good teaching is to facilitate good learning and, what is more, learning is possible without teaching and indeed significantly surpasses what can be taught. With the post-modern emphasis in education on test-driven efficiency, transparency, comparability, and excellence rankings, teaching is more and more shaped by what can (or “needs” to) be tested. The more teaching is degraded to service testing, the less it is able to facilitate good learning. The obvious conclusion is NOT to bar testing from our classrooms. Why not turn the tables and make testing service learning for a change? Testing for learning – wonderful! But that would require a major readjustment on the level of educational policies as well.

The EDUCATION perspective: Yes, many “actors” in the education sector have an interest in test outcomes. This includes the learners themselves, their teachers, educational evaluators, the job market – i.e. “stakeholders” of ALL kinds. However, all these stakeholders are interested in testing and test outcomes for a different reason! Which is why things go wrong: very different interests are projected onto testing – with streamlining effects that are in harmony with administrative “quality” assurance, but alas, not with successful language learning and teaching.

14 02 2013: Scott Thornbury (19:25:03) :

Thanks, Kurt – delighted to have you on board. And thanks for that very lucid breakdown of how testing is situated in the educational context, and the way it serves different – although not necessarily opposed – ends. I think what has emerged in this discussion is a strong sense that testing should not be used as a stick to beat the students with, but as a means of supporting learning, as you say. I think that this is what Dr Deena Boraie was getting at in her plenary, i.e. the notion of developing ‘assessment literacy’ in teachers and other stakeholders, so that there is a greater degree of congruence in the way testing is integrated into the curriculum – although I would also like to see a greater commitment to ‘assessment ethics’ perhaps.

12 02 2013: Joan (21:51:56) :

As an ESL instructor, we assess students on a yearly (or biannual) basis. However, at the school I am in, we do weekly quizzing (every Friday) followed by a “Fun Friday” where kids earn fun time to stay motivated throughout all of these quizzes. Then we have term assessments (6 this year) for two to three half days (math and reading). On top of this, students have the required MCA and NWEA assessments which open at the same time as the ESL testing. So, our school is just gearing up to do the tests that begin next week after the holiday and will continue for 3 solid weeks to ensure all students K-8 are assessed to see progress.
While I like this process of weekly tests, I think the length of the tests could be adjusted. The weekly quizzes are often two to three pages; term comps are about 6 pages each Math and Writing. Our CEO once said recently that, “if we do not look at the data for the purpose of understanding student progress, then there is no point to doing the assessment.”
I completely agree with all of the rationale that testing should guide teaching — how to understand data and how to help students succeed academically. As for ESL testing, we use this data constantly to monitor areas of strengths and weaknesses to select which students are in which group.
Our school is based on the extended school day and extended school year – from August 13, 2012 to July 1, 2013; 7:40 a.m. to 4:00 p.m. daily.

13 02 2013: Simon Williams (07:49:05) :

I’m so glad this discussion thread came along, because it is what I am kind of looking at for my MA TESOL dissertation. I always thought tests were unfair and tended to shy away from them. However, after getting into project-based learning using co-operative and collaborative learning on a Wiki, I quickly found that a simple summative assessment was unfair. When a group of 4 students get together, how do we make sure each person is assessed accurately and fairly?

Anyway, over the last 8 months I have been researching and using various methods to assess group work, and looking into methods such as peer-, self- and group assessment mixed with a formative and summative approach has given me a newfound respect for assessment and how useful it can be.

I think one of the best ways to reduce subjectivity and fear of assessment is to get our students involved. When we decide on a project, the class and I decide on what should be assessed together. By doing this I think they have a better idea of what is expected from them. I also think that rubrics and scoring systems should be written in a way that the student can understand.

I think assessment is scary, because both teacher and student don’t know enough about it. We are always scared of what we don’t know, right? Anyway, instead of pushing assessment into one corner, let’s bring it to the centre of the class.

I think just by doing these things we can make assessment less daunting.

– Create assessments together as a class and teacher
– Write rubrics in simple English for everyone to understand
– Get students to practice using the rubrics with self-assessments
– Let the students have a say in their final grade with a peer or self assessment.

I think if students know what is expected of them, there is less confusion and misunderstanding when it comes to assessment time, whether it is a simple quiz or exam.

Thanks!

14 02 2013: Joan (02:32:30) :

I agree with your post about having students be involved in the grading! Also, I have come across a text, Alternative Assessments for ELLs, that has many similar suggestions. I also think that assessments should connect with the standards we are teaching as well – many teachers have great activities and projects but don’t connect these learning activities to the standards. I hate to be so standards-driven, but thus is the way of education at the moment. Thanks for your insights – good luck with your dissertation!

14 02 2013: kevchanwow (05:11:13) :

Hi Scott,

Thanks for raising the issue of testing in the language classroom and to all the participants in this comment thread. I’m going to throw in my two cents as a father first and a language teacher a distant second.

My daughter belongs to both a swimming and a gymnastic school. In both schools, they have monthly achievement tests. The criterion for passing the tests is clearly stated to the students and there is no way to pass the test without meeting the explicit criterion. For gymnastics, the students must jump rope 50 times in order to pass the test. 48 times results in a collective sigh of disappointment and some tears. 50 leads to a collective cheer and many high-fives. Similarly, for swimming, the glide kick test of 25 meters with perfect form (head half way in the water, arms straight) is met or not met.

So what does this have to do with testing? I realize that not all testing can be so concrete as a jump rope test. But I do think that having a criterion-referenced assessment with little wiggle room aimed at mastery can provide students with a moment of positive feedback which shouldn’t be undervalued. I also think that failing seen in a purely negative light is problematic as well. My daughter has said to me after failing a test that it simply means she needs to work harder the next time to improve.

When I evaluate students for their ability to, say, write a concise summary of a listening or reading passage, formative assessments with ample feedback help them see the steps they need to take to develop the skills necessary to complete the task. A summary assessment (or series of assessments if time allows) provides students with a crystallized moment where they can put an exclamation mark on their learning.

Do I think testing is perfect? Not in the least. But I also feel that a well-prepared test can provide students with the opportunity to clearly recognize their own mastery of clearly stated objectives. Perhaps the problem lies not in the test, or even in giving a score or grades, but in the fact that classes in the “real world” come stamped with an end date of the last day of class, making a final grade for some students much more final than it should truly be to encourage continued effort and growth.

Kevin

15 02 2013: Andrew Walkley (03:31:22) :

The O is for outcomes, not testing per se, and I would like to defend giving more thought to outcomes, especially in the context of EFL / ESOL. I understand Furedi’s frustration with outcomes in a university context, where we might wish for a more open ended search for truth. Anyone who has had an interesting discussion with a student, only for it to be followed by a question about whether it will be in the assessment, can appreciate that. However, it seems to me that in English language classes of all kinds we have been fixated in both teaching and materials on narrow aims – teach these words or the meaning / form of this structure – and undervalued thinking about the outcome of what students might actually need and want to say and more broadly how language is used. Contexts and speaking tasks in coursebooks still tend to be there to serve a grammar syllabus rather than the other way round, and lexical sets remain dominated by nouns and adjectives, free of the verbs and phrases that can put them to use. As a result, at low levels in particular we must describe what people look like and what they are wearing rather than ask what they think, what they feel and why, and whole swathes of language are left untaught and unloved. At the same time, I think the dislike of testing is often transferred to a dislike of correction or, as I prefer to call it, helping the student say things in better ways beyond what they actually communicated. Teacher training often compounds the problem because of its own focus on teachers meeting aims and a philosophy of restricted teacher talking time. Are these outcomes driven problems? Maybe, but it seems to me more a problem of what Furedi says about external bodies imposing certain language and outcomes which are inappropriate. Changing our outcomes to “at the end of this course trainees will be able to ….. engage with students as people and language learners / …. explain and expand on language in both materials and correcting what students say and write / …teach something useful” may require a rather different kind of training and assessment than much that is on offer.

15 02 2013: Scott Thornbury (11:20:09) :

Thanks for your comment, Andrew (and I appreciate that you would be defensive of Outcomes 😉 ). Your point is well made, i.e. that (communicative) goalposts help direct teachers and students away from a narrow obsession with grammar mcnuggets etc (I also explored this theme in C is for Core Inventory). And I agree that it is externally-imposed outcomes – in whatever form they take – that sit uncomfortably with the more emergent ecology of the (ideal?) language classroom.

Combining holistic outcomes with a more negotiated curriculum doesn’t seem impossible in some contexts at least. ‘Let’s agree on a set of desirable outcomes (perhaps articulated along the lines of CEF descriptors), let’s see how well we can achieve them now, let’s plan tasks and activities that will build on that achievement, and let’s test progress periodically by (self-)assessing these tasks during and at the end of the course’. Does that sound utopian?

15 02 2013: Andrew Walkley (14:21:13) :

Hey, you know – a parent always defends their babies! I don’t think it is too utopian, no. But just to be contrary and coming back to assessment, I’m teaching an FCE class at the moment, and find myself teaching things I don’t typically do – and things you certainly don’t see in many general English coursebooks, such as building noun phrases or examples of the pattern no + noun (prep) + verb: there’s no need to do anything / They had no intention of causing any trouble etc. because they come up in the exam. Sometimes an externally imposed outcome – whether from coursebook or exam or governing body (or Natural Grammar books!) – can push us into new and interesting areas. …sometimes!

5 03 2013: Candy Duarte (06:17:14) :

Hello Professor Thornbury,

Very interesting and controversial article 😀

I’m in my senior year of English teaching and I have taught for a couple of years, just a novice teacher.

As a student and as a teacher I do agree with you that the problem of testing is when the means (the educational experience) become subordinated to the ends (the test), and all the importance is given to it.

I have noticed that some students do not pay attention in class or participate, but when the test is coming they get worried and try to study to pass it, so they are just concerned about not failing the test rather than learning. That makes me think that tests, as you said, are inevitable, and the washback effect they have is really significant. Therefore I see testing in a good way: I don’t see it completely as an evil; I see testing as an evil and an angel at the same time.

An evil in the sense that it freaks out students and gives a value to them, some students get disappointed and just quit, moreover, I feel really identified with you since I have felt the struggle of tests when I give bad scores to my students, I do feel terrible giving failed tests to them; but also as an angel since it motivates students to practice and make an effort to pass, I would say it pushes lazy learners who are not aware of their learning process.

In addition, the assessment of students should be holistic, not just taking the scores, but also observing and taking into account their effort and performance in the whole educational process.

9 03 2013: duffyjordan (04:39:18) :

You say, Scott: Testing is evil. Why? Because it assigns a value to the learner, and, since the value is almost always short of perfection, it essentially de-values the learner. Worse, testing typically involves measuring students one against the other, thereby destroying at a blow the dynamic of equality that the teacher might have judiciously nurtured up until this point.

Testing is evil because it is stressful for all concerned, and because the conditions under which testing is conducted (separated desks, no mobile phones, etc) imply a basic lack of trust in the learners.

It is evil because it pretends to be objective but in fact it is inherently subjective.

I’ve rarely seen such drivel.

1. Assigning a value to the learner which is almost always short of perfection does NOT “essentially” de-value the learner. This is rhetoric at its very worst. What is the word “essential” doing here? We don’t have to be experts in critical discourse analysis to see its distorting influence. As it happens, Scott, no learner is perfect, and giving a learner a score in a test can have a myriad of consequences, not all of them necessarily leading to a “devaluation” of the person concerned. Duh.

2. Testing, by measuring students one against the other, does not necessarily “destroy the dynamic of equality that the teacher might have judiciously nurtured up until this point”. Again, you use silly language to support a silly argument. Real differences exist between students’ abilities, and no good teacher need feel that test results destroy any realistic dynamic of equality, if by that you mean that everybody in the group is equally respected.

3. Testing is stressful, and so is being asked to do a lot of things in class. The teacher’s job is to minimise stress and support those who feel it, not to strive for some daft zen-like calm throughout. And there is absolutely no reason to suppose that testing implies a basic lack of trust in the learners. None.

4. Testing is only “in fact inherently subjective” if you define it so. If the test consists of 10 multiple choice questions, carefully chosen to test what we expect students to have learned, without any deliberate bias or favour, then it is only subjective because some daft post-modernist twisting of words makes it so.

Testing is an extremely interesting part of our job. The key construct in testing is validity. Messick (1996) views validity as a unified concept which places a heavier emphasis on how a test is used. Six distinguishable aspects of validity are highlighted as a means of addressing central issues implicit in the notion of validity as a unified concept. In effect, these six aspects conjointly function as general validity criteria or standards for all educational and psychological measurement. These six aspects must be viewed as interdependent and complementary forms of validity evidence and not viewed as separate and substitutable validity types.

* Content A key issue for the content aspect of validity is determining the knowledge, skills, and other attributes to be revealed by the assessment tasks. Content standards themselves should be relevant and representative of the construct domain. Increasing achievement levels or performance standards should reflect increases in complexity of the construct under scrutiny and not increasing sources of construct-irrelevant difficulty (Messick, 1996a).

* Substantive The substantive aspect of validity emphasizes the verification of the domain processes to be revealed in assessment tasks. These can be identified through the use of substantive theories and process modeling (Messick 1989). When determining the substantiveness of a test, one should consider two points. First, the assessment tasks must have the ability to provide an appropriate sampling of domain processes in addition to traditional coverage of domain content. Also, the engagement of these sampled processes in the assessment tasks must be confirmed by the accumulation of empirical evidence.

* Structure Scoring models should be rationally consistent with what is known about the structural relations inherent in behavioral manifestations of the construct in question. The manner in which the execution of tasks is assessed and scored should be based on how the implicit processes of the respondent’s actions combine dynamically to produce effects. Thus, the internal structure of the assessment should be consistent with what is known about the internal structure of the construct domain (Messick, 1989).

* Generalizability Assessments should provide representative coverage of the content and processes of the construct domain. This allows score interpretations to be broadly generalizable within the specified construct. Evidence of such generalizability depends on the tasks’ degree of correlation with other tasks that also represent the construct or aspects of the construct.

* External Factors The external aspect of validity refers to the extent to which the assessment scores’ relationships with other measures and nonassessment behaviors reflect the expected high, low, and interactive relations implicit in the specified construct. Thus, the score interpretation is substantiated externally by appraising the degree to which empirical relationships are consistent with that meaning.

* Consequential Aspects of Validity It is important to accrue evidence of such positive consequences as well as evidence that adverse consequences are minimal. The consequential aspect of validity includes evidence and rationales for evaluating the intended and unintended consequences of score interpretation and use. This type of investigation is especially important when it concerns adverse consequences for individuals and groups that are associated with bias in scoring and interpretation.

These six aspects of validity apply to all educational and psychological measurement; most score-based interpretations and action inferences either invoke these properties or assume them, explicitly or tacitly. The challenge in test validation, then, is to link these inferences to convergent evidence that supports them as well as to discriminant evidence that discounts plausible rival inferences.

I suggest you pick holes in modern views of testing rather than blather on about how evil they are. Shame on you!

Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

Messick, S. (1996). Validity of performance assessment. In G. Phillips (Ed.), Technical issues in large-scale performance assessment. Washington, DC: National Center for Education Statistics.

9 03 2013: Sandra (19:34:45) :

Hello,

I want to say something about your final question. I think ‘the means are subordinated to the ends’ is just a tendency, an economic policy or perspective. However, a professional teacher like you understands and critiques these scenarios and acts differently. Unfortunately, systems dominate and affect teaching in some way, but the most important principles are not affected at all. In any case, teachers can think and make wise decisions in the classroom, which is the place where testing really takes place. Teachers decide how to test, using whatever means they consider appropriate, including their conception of testing, which, as you said and as is well known, is a flexible, subjective process based not only on outcomes but on what the language is. Since language learning is a personal process and problem, students’ and teachers’ attitudes towards testing should go beyond grading and towards cooperating enthusiastically, instead of thinking of comparisons among classmates: language development varies a lot. So, for me, the main problem of testing is the problem of students’ and teachers’ attitudes. Nevertheless, general conceptions of testing should improve not only in academic but also in commercial settings.

29 09 2017: Jeff Buck (00:53:46) :

Thanks for another great post, Scott. My main complaints with testing are that the tests at my school are not made by the individual teachers and that we don’t see them until the last minute. If a school doesn’t trust teachers to make their own syllabuses and tests, then why did they hire them in the first place?



An A-Z of ELT