Testing and Assessment Part One

Fall 2005

By Stan Izen

Recently, Grant Wiggins and I had an online conversation on the topic "Testing and Assessment." Grant is a former teacher and heads Authentic Education, an education consulting company. Grant is a nationally known, highly respected expert on education. He has published many articles and books, and speaks frequently to teachers and administrators in schools across the country. Part 1 of our slightly edited conversation is published here. Part 2 appears in the Spring 2006 issue.


SI: Testing and assessment generally are certainly two of the most difficult jobs a teacher has. Is this test too long, is this quiz too easy, is this question too much of a stretch from the homework, do I give everyone in the group the same grade, is this project worth the time, do I grade homework for right and wrong or only for effort, how do I reward creative thinking that doesn't lead anywhere? After decades of teaching, I have fewer answers and more questions. Added to the difficulties of choosing a testing instrument and designing it properly is the pressure for good grades that students and parents, especially independent school students and parents, feel. (More about this later, I hope.)

One thing is clear: as the definition of good teaching keeps changing, the way one tests learning must keep changing. Since you visit many schools, have you seen some examples of good assessment? What made them good? You talk a lot about "transfer of knowledge." It seems to me that when testing "transfer of knowledge" it is very easy to expect too much. How does one test for this at the right level?

GW: I think there are at least four different and difficult questions here! There is also an assumption I question. More on that, later.

The questions you raise:

  1. What is appropriate assessment? (What makes a good assessment good?)
  2. What are good examples of assessments I have seen?
  3. How does one assess for transfer? How does one not under- or over-challenge when doing so?
  4. What should I grade, and how?

The assumption: the definition of good teaching keeps changing. I don't agree - I think it remains the same as it has always been: to cause lasting learning of important things.

SI: Let's try to answer questions one and three now and tackle the others later.

GW: Initial thoughts on good assessment: Good assessment is assessment that:

  1. aligns with the goals, so that what is assessed is a valid measure of the goal and not merely what is easy or comfortable to score.
  2. provides students with useful feedback (and opportunities within the syllabus/curriculum to use the feedback)
  3. provides teachers with feedback not merely about whether the student learned what was taught but whether they met the learning goals that provide the rationale for the lessons in the first place.

Let me say a little about each idea:

  1. Good assessment is valid, appropriate. It provides a credible measure of the goals. In a very real sense, the assessments are implied in the goals, and derivable by logical analysis from the goals. So, if you are teaching the Civil War, and your goals are for students to know the key events leading up to, during, and after it, and to have a causal understanding connecting those events, then the assessments must provide evidence of mastery of the key facts and of causal reasoning about the Civil War. You cannot just start assessing for other goals at the 11th hour.

    More importantly, since the 2nd goal is UNDERSTANDING, you have to ask yourself: what is an understanding of the causes and effects? What would such understanding look like? It cannot merely be a regurgitation of what the teacher or textbook said about those causes and effects, since accurate recall is not a solid indicator of understanding. What, then, IS evidence of understanding? The ability to make the causal analysis on one's own, the ability to extend or transfer one's understanding to new events or wars.

    In Understanding by Design we note that understanding reflects itself in various kinds of transfer tasks: a generalization or theory of one's own, a new interpretation of facts, an application of one's ideas and facts to a new situation, perspective on the facts, empathy, self-understanding about possible bias (the six facets of understanding).

  2. We could have the greatest assessment in the world but it might still be instructionally useless because the student was ill-prepared for it and the feedback from it was useless. So in the broader context of assessment as part of instruction and curriculum, we need to make sure that students get feedback that they can use and profit from, opportunities to use that feedback, and a FAIR grading system that does not penalize them for initial attempts at understanding that are just part of learning - we do not grade the actor's dress rehearsal or the soccer player's scrimmage; we practice performance before demanding and evaluating it. And we certainly should not have an assessment that relies on tricks, secrets, and other dysfunctional approaches. In the real world of fairness, you have complete understanding of what the assessment will demand of you before you do it. In short, assessment should be honest, helpful, fair - not just a grade-generating exercise for teachers.

  3. Feedback for teachers against their goals is also the point - that's not something we typically design for. We are often so intent on just finding out if they learned what was taught in the previous few lessons that we lose sight of our long-term goals. That's why it is important to pre/post test against goals, and do some ongoing assessment of end-of-year goals throughout the year.

(Ed. Due to the length of the next email, Grant's replies are interspersed with my responses.)

SI: Your analysis seems to me to be absolutely correct but I would like to clarify some points. Since I am a math teacher, my remarks tend to be in terms of the math classroom.

First, you mention the need for "opportunities to use that feedback." This seems crucial to me and I would like to clarify how this happens. Obviously, feedback on daily work can be used to improve the next day's work, the next quiz, test, etc. But on a chapter test would "test corrections" or re-tests be examples of using the feedback from the corrected test? And won't there be a point after which the feedback will no longer be useful within that course, but perhaps in the next course or "life?"

GW: Careful here: the hidden assumption is that the point of these tests is unique content from which we want to move on to new content. That's not the feedback we need. I don't need to have a lengthy account of what I did right and what I did wrong on each unique test question. Rather, what I need is feedback that can help me do better next time. What does "next time" mean if we define assessment in terms of unique topics and content and test items? This is one way in which I think many teachers are confused about their goals. And that's not what we mean by feedback in the wider world - we mean feedback on the essential skills, ideas, habits, qualities, and performances that can and do recur. In other words, where so many math (and history) teachers err is that they presume the feedback should be related primarily to content because the course is all unique content. But that's not the point of the course. The point of the course is transfer and mastery of transferable skills and ideas, so the tests must test for those and the feedback related to those is therefore always timely, always useful.

Simple example in math. I wrote a number of years back about a study in which a math teacher and assisting math professor developed an error typology for use in giving feedback - in other words, what kinds of errors is a student making - computational? conceptual? logical? (there were 2 others I can't recall). NOW the feedback is far more useful - including on the chapter test. Interestingly, some kids get the same test scores but for vastly different reasons - much more helpful as feedback.

More complicated example: the students should then be asked: state for me exactly how you prepared for this test, in a timelog. What did you do and how did you do it? Then, let's look for correlations and give feedback about that. Too often teachers absolve themselves from being coaches of learning like this. I know it is politically incorrect, but what I am saying, bluntly, is that teachers are far too quick to blame students for test results and far too rarely examine their own limits as coaches.

I think re-tests are fine - but I don't think it is fair to use the highest grade without reporting that it is a retest. Similarly, I like rubrics that say, when referring to projects and complex tasks/performances:

Could do with no teacher help
Could do with very minor teacher help
Could do only with significant teacher help
Could not do, even with significant teacher help

SI: Second, how are "feedback" and "assessment" different? Doesn't feedback always include (or imply) indications of right or wrong, or complete or incomplete, etc.?

GW: Feedback is neutral information, implying no value judgment on my or your part. I simply observe and report back what I see. In my previous examples, I merely report that almost all your errors are due to computational errors, not conceptual misunderstanding. I further note that you turned in your test paper five minutes sooner than many other people who got higher scores. You also do not have a typical test-preparation profile of the high-achievers. That is feedback, not praise/blame. Nor is it advice/guidance. That comes next.

SI: If feedback is useful doesn't it have to contain indications for improvement? Can't feedback be both positive and negative?

GW: No, don't confuse the feedback with the suggested ways to improve - advice. I advise you, based on the feedback, to CAREFULLY review your computations in future tests and use every available minute in the test period to do so. But I first need you to really understand the diagnosis before I start giving advice. Positive and negative have no place if it's true feedback and guidance. Praise and criticism are separate functions of teaching, wisely or unwisely used in addition to feedback and guidance (even if they sometimes get all jumbled up in our speech patterns they remain very distinct ideas). In my experience, teachers do too much of both - praise and blame - as a substitute for feedback and useful advice. The next time you hear yourself saying: Good Job! - immediately explain what you mean by "good" in descriptive, concrete terms about which actions were done well, etc. And try VERY hard to rid your vocabulary of words like good/bad when you are giving feedback. You'll find, if nothing else, that kids pay more attention to what you say.

SI: Third, you say that we "should not penalize [students] for initial attempts at understanding that are just part of learning." I agree. In your view does this mean not grading homework for correctness?

GW: I am very uneasy about grading the content of homework - not to be confused with the diligence of doing it conscientiously. I can tell you as a parent of 3 school-age children that most homework is an utter waste of everyone's time because they often flounder on their own; they cannot get any useful teaching and feedback as they try to learn new material unless parents actively intervene (which brings its own problems and inequities). Why in the world would we expect the novice to learn new material in quiet isolation, without coaching and feedback? It would make far more sense to do what athletic coaches do: watch people try out new skills and give them feedback and advice. Implication? Do not give people challenging brand new material and grade it. That's really dumb and unfair. Would you like me to grade your first attempts to use Understanding by Design after only hearing me once and having no further access to me as you try it out? Of course not. This is as hypocritical as failing to listen to a kid explain a wrong answer they gave with a great reason for their answer - sorry, test is over....c'mon! You would be outraged if you could not explain your work.

Homework should provide useful practice to gain facility, provide useful background knowledge, and opportunities to explore implications and raise questions. Again, we can grade your having done it, but to grade the accuracy of first attempts on its own is immoral. That does not mean do not note that 4 out of 10 were wrong - that is feedback that is useful. But to put a 4 out of 10 in your gradebook is misguided.

SI: What about quizzes, are they "initial attempts at understanding?"

GW: Only fair and appropriate if I have had plenty of chances to check out my grasp of stuff in question first.

SI: Finally, I assume that you agree with me that assessment is not the same thing as "grades." Teachers should be "assessing" their students' work continuously as students do problems in class, answer questions in class, etc. Almost everything a student does (or doesn't do) in class helps a teacher understand how much that student has learned.

GW: Not only do I agree with you, I would go further: we evaluate far more than we should and we assess far less than we should.

SI: The ability of a student to transfer learning from one situation to another is one of the most important results of good teaching as you see it. How does a teacher assess for transfer?

GW: If we regularly see students learning lessons but stymied when asked to meaningfully use them, the problem is ours, not theirs. We have not made clear that the goal is transfer, as reflected in assessments, assignments, teaching, and feedback. Learning how to use knowledge and skill wisely and effectively is the goal of a course. Mastering school coursework is not the goal. Transfer is successfully drawing upon one's previous learning for new tasks, new meaning, new learning. That goal has to be reflected in the assessments.

When I was a Varsity soccer coach, the co-captain, in the middle of the game, made this point about the need to learn how to transfer exquisitely clear. I was yelling: "Use the 2-on-1 we worked on in practice!!" but my co-captain yelled back: "I can't SEE it now, coach; the other team won't line up for me like in the drill!"

Therein lies the problem: The sum of the drills is never fluent and flexible performance - in soccer, or any other technical discipline where the aim is effective performance. It is never enough to practice all the discrete skills in scaffolded drills in soccer or persuasive writing: you have to practice "playing the game" in all its fluid and unpredictable messiness. That means making sure we assess for transfer, the game; then teach for transfer.

In other words, we must design "backward" from the ultimate transfer we seek - the performances and habits of mind at the heart of "doing" each subject. As on the basketball court or in the art studio, the student must see academic knowledge and skill as a repertoire to be used, to be wisely called on, by being confronted by assessments and instruction that make this clear. Too often, though, teachers merely teach, then ask: did you learn the lesson? That makes no sense, if the point is to learn to use lessons, which it is.

There is nothing new in this line of argument. Recall how Bloom and colleagues defined application and synthesis:

"If the situations...are to involve application as we are defining it here, then they must either be situations new to the student or situations containing new elements as compared to the situation in which the abstraction was learned... Ideally we are seeking a problem which will test the extent to which an individual has learned to apply the abstraction in a practical way."

They made clear that "application" means "transfer" - new, complex problems:

"Ideally we are seeking a problem which will test the extent to which an individual has learned to apply the abstraction in a practical way... Problems which themselves contain clues as to how they should be solved would not test application."

It is never enough to learn, practice, and master your skills in isolation, which is what almost all tests require, especially math tests. You have to practice using your skill intelligently. And what that means in real-world settings is that you have to practice making sound judgments about what to do where, when, and why. The soccer story provides a clear example of the challenge. Now, you are on your own, out on the field; coach is on the sideline. Now, the scaffolding and simplification of the drill fades away; there are 22 players trying to determine what they should do next, on their own. You not only have to judge what to do, you have to judge which tactic is called for, and thus which skills are called for, given your entire repertoire. This takes lots of coached practice and learning from feedback in the assessment, the real performance "test," about what works and what doesn't work in the heat of the game.

When teachers bemoan the absence of "critical thinking" in students what they are really lamenting, then, is likely the failure of their own assessment and instruction to coach students in how to critically judge situations and respond accordingly. That requires not only instruction in how to make judgments about which learning to call upon in a given situation but it requires assessment tasks (and "scrimmages" - i.e. practice in such assessments) that have minimal cues, prompts, and scaffolds. Otherwise, there is no transfer, just following directions and using recall.

Here is a lovely math story from 70 years ago that you, as a math teacher, Stan, will like, and that makes the point in a slightly different way:

from
The Teaching of Arithmetic I: The Story of an experiment
L. P. Benezet
Superintendent of Schools, Manchester, New Hampshire
Originally published in the Journal of the National Education Association, Volume 24, Number 8, November 1935, pp. 241-244

"I have recently tried, in several parts of the city, a test involving five simple problems. Here it is:

  1. Two boys start out together to race from Manchester to West Concord, a distance of 20 miles. One makes 4 miles an hour and the other 5 miles an hour. How long will it be before both have reached West Concord?

  2. A man can row 4 miles an hour in still water. How long will it take him to row from Hill to Concord, 24 miles one way, and back, if the river flows south at the rate of 2 miles an hour?

  3. The same man again starts rowing from Hill to Concord in the spring when the water is high and the current is twice as swift as it was before. How long will it now take him to make the round trip?

  4. Joe can eat a whole watermelon in 10 minutes. Sue in 12. I suggest a race between them, giving each half of a melon. How long will it be before the melon is entirely gone?

  5. The distance from Boston to Portland by water is 120 miles. Three ships leave Boston, simultaneously, for Portland. One makes the trip in 10 hours, one in 12, and one in 15. How long will it be before all 3 reach Portland?

It looks easy enough, but I advise you to try it. I will guarantee that high school seniors, preparing for College Entrance Board Examinations in Mathematics, will not average 70 percent. I had some rather ridiculous results. [A] ninth-grade class in arithmetic, which had been taught under the old arithmetical curriculum, made a sorry showing. Out of twenty-nine in the class only six gave me the correct answer to problem five."
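For readers who want to check their own attempts, here is a minimal sketch of the arithmetic behind the five problems. It is an illustration only: the quoted passage does not give Benezet's answer key, so the readings below are assumptions (that "both" or "all" have arrived when the slowest arrives, and that problem 3 cannot be completed because the current equals the rowing speed). The helper names `time_needed` and `round_trip` are hypothetical, chosen for this sketch.

```python
from fractions import Fraction


def time_needed(distance, speed):
    """Time to cover `distance` at `speed`; None if no progress is possible."""
    if speed <= 0:
        return None
    return Fraction(distance) / Fraction(speed)


def round_trip(distance, still_water_speed, current):
    """Total rowing time downstream plus upstream; None if upstream is impossible."""
    downstream = time_needed(distance, still_water_speed + current)
    upstream = time_needed(distance, still_water_speed - current)
    if downstream is None or upstream is None:
        return None
    return downstream + upstream


# Problem 1: both boys have arrived only when the slower one arrives.
p1 = max(time_needed(20, 4), time_needed(20, 5))   # 5 hours

# Problem 2: 24 miles with the current (6 mph) and 24 miles against it (2 mph).
p2 = round_trip(24, 4, 2)                          # 4 + 12 = 16 hours

# Problem 3: current doubled to 4 mph, equal to the rowing speed,
# so the upstream leg makes no headway (assumed reading: no finite answer).
p3 = round_trip(24, 4, 4)                          # None

# Problem 4: each eats half a melon; the melon is gone when the slower eater finishes.
p4 = max(Fraction(10, 2), Fraction(12, 2))         # 6 minutes

# Problem 5: all three ships have arrived when the slowest arrives.
p5 = max(10, 12, 15)                               # 15 hours

for number, answer in enumerate([p1, p2, p3, p4, p5], start=1):
    print(f"Problem {number}: {answer}")
```

Under these readings, none of the problems calls for more than one or two divisions. The difficulty lies in judging which operation the situation calls for: taking the slowest time rather than averaging or adding, and noticing when a trip cannot be made at all.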

SI: I absolutely agree that the ability to transfer knowledge is crucial and I try, with varying degrees of success, to make transfer a goal in all of my classes. The main difficulty I run into is judging how much or how little transfer to expect. How does a teacher guard against over-challenging and under-challenging his students?

GW: We neither want to over-challenge nor under-challenge students - therein lies the art of teaching. You have to know the kids and you have to know the subject and you have to know how to design lessons and assessments that are fair tests.


Sometimes the only way to find out what people understand is to really challenge them.

Here's another way to put it: our job is neither to frustrate kids and make them feel dumb nor make them have 100% success on everything they do. Our job is to educate them for future learning and successful transfer. Sometimes that means challenging them and sometimes that means making sure they feel enough confidence/competence to continue. As you might suspect, I therefore think the problem is not in the assessment per se but in our dysfunctional grading system that is neither fair nor honest about what kids can and cannot do, and what are appropriate expectations in a diverse class. In my ideal world there would only be a learner profile along rubrics for all key dimensions - your ability to speak, listen, study, argue, investigate - and we would simply report your level of performance on each dimension. Then we could give you hard transfer tasks but not feel the need to equate an incomplete or inappropriate response with a poor grade. Why people grade every student attempt, even the first tries, is beyond me - very unfair and not real-world. But that's another day's discussion.

Read Part 2 of this conversation in the Spring 2006 issue.
 


Stan Izen is the editor of Independent Teacher Magazine.