My response to NYCDOE teacher evaluation formulas, with some thoughts on undergraduate teacher evals, too. Originally published 3/11/11, here.
Page A15 of the New York Times on March 7th looked suspiciously like a story from The Onion about the tangled mess that is teacher evaluation in New York City public schools. Under what may be the most understated headline of the year, “Evaluating New York Teachers, Perhaps the Numbers Do Lie,” Michael Winerip tells the (predictably?) sad story of Stacey Isaacson, a 7th grade English and Social Studies teacher at the Lab School, described as “very dedicated,” “wonderful,” and “one of a kind” by teachers, students, and principals alike.
So why, then, is poor Ms. Isaacson ranked in the 7th percentile of city teachers when it comes to student academic progress?
Because of this formula, designed to calculate a teacher’s value-added score by the Department of Education’s “accountability experts” (satirists, start your engines):
As someone who once taught for the NYC Department of Education and is also a product of it, I wasn’t really surprised that they had gotten it all wrong. I wasn’t even surprised to imagine that they would think such a formula could be an accurate method for tenure evaluation. They did, however, outdo themselves in the category of overall incoherence; not only did this tool strike me as wrong-headed, but it was also completely unintelligible. The formula is so unbelievably unhelpful (ready-made for critique by visualization genius Edward Tufte) that no teacher could be expected to look at it and see her work (or her true challenges) reflected within it. Matrix-like in its complexity and opaque in its reasoning, it is incapable of communicating what it is measuring or how a teacher might improve her practices based upon it. And from what I can tell, the variables are wonky, too.
It is not until the 16th paragraph of the article that Winerip summons the courage to try to explain the thing:
According to her “data report,” Isaacson’s students had a prior proficiency score of 3.57. “Her students were predicted to get a 3.69– based on the scores of comparable students around the city. Her students actually scored a 3.63. So Ms. Isaacson’s value added is 3.63-3.69.” Simple enough, right? Wrong. The author– who knows he’s hit pay dirt with this one– goes on:
“These are not averages. For example, the department defines Ms. Isaacson’s 3.57 prior proficiency as ‘the average prior year proficiency rating of the students who contribute to a teacher’s value added score.’”
Eh? And the calculation for her predicted score is based on 32 variables, which are plugged into a statistical model– the one that made me feel like I was, surely, reading The Onion.
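For what it is worth, the one piece of arithmetic the data report does spell out is a single subtraction: actual score minus predicted score. Here is a minimal Python sketch of just that step, using the figures quoted above. The function name is hypothetical, and the 32-variable model that produces the predicted score is not reproduced here because the article does not give it.

```python
def value_added(actual_score, predicted_score):
    """Value added as the article describes it: actual minus predicted.

    Hypothetical helper for illustration only. The predicted score is
    taken as given; the statistical model behind it is not modeled here.
    """
    # Round to two decimals to match the precision of the reported scores.
    return round(actual_score - predicted_score, 2)


# Figures quoted from Ms. Isaacson's data report.
print(value_added(3.63, 3.69))  # -0.06
```

In other words, a shortfall of six hundredths of a point against the prediction is what sits behind the 7th percentile ranking.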
Anyone reading this case study of Ms. Isaacson will naturally wonder a few things, like, “Wouldn’t it be fun to calculate what percentage of Joel Klein’s contract at Fox News Corporation represents Ms. Isaacson’s salary?” or, “Wouldn’t it be interesting to invite these statisticians to actually teach us this formula and how it works?” I frequently work on assessment at the Schwartz Institute, and it is also a built-in aspect of every course I teach. So I know that evaluating teaching and learning is a tricky thing indeed, a hall of mirrors in which you think you see the student reflected but often, you don’t.
I decided, then, to concoct my own formula, with my own variables, to evaluate the teaching that I do at Baruch in my capacity as a Fellow and an instructor of Communication Studies. What variables get in the way of student progress that cannot be accounted for after you have observed my class, read my syllabus, and tested my students for their proficiency level?
What if you really tried to articulate the variables that come into play when facing a group of students and a set of learning objectives?
Winerip explains that teachers are eligible for tenure based upon three categories: instructional practices (including observations), contribution to the school community, and student achievement (which is where the formula comes in). Now, I’ve never been much of a whiz at statistics, but maybe that’s okay. After all, if the communications people made the formulas, and the formula people made the communications, perhaps we’d all start getting somewhere?
So please, in the spirit of collaborative learning, improve upon my draft and post your own visuals and/or variables in the comments section.