A response to Gary Rubinstein’s post on value-added.
* * *
I can see why you would be worried that policymakers and the media have overstated the usefulness of one-year VAM scores in teacher evaluations. But it’s still not clear to me why you don’t find the Chetty results, which suggest that accurate value-added measurement is possible, compelling in and of themselves.
The study (available here) shows a pretty tight correlation between teacher value-added and actual student test scores (see Figure 1a). There are also positive correlations between teacher value-added and longer-term outcomes (college attendance at age 20, college quality at age 20, earnings at age 28, teenage births, and neighborhood quality), although these correlations are not as strong (see Figures 5a, 5b, 6, 8a, and 8b). I don’t see any correlation coefficients published, and I don’t have the statistical chops to analyze the data myself, so I’m just looking at the graphs; let me know if there’s something wrong with my conclusions.
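For what it’s worth, the statistic I’d want is just a Pearson correlation over the paired values. Here’s a generic sketch with hypothetical numbers – not the study’s data – and note that a correlation eyeballed from binned scatter plots like these will look stronger than the underlying student-level one:

```python
# Generic Pearson correlation; xs and ys are hypothetical paired values
# (e.g., teacher value-added vs. a mean outcome), not the study's data.
def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print(pearson_r([0.1, 0.2, 0.3, 0.4], [1.0, 1.1, 1.4, 1.5]))  # ~0.98
```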
It makes sense that the long-term outcomes are less tightly correlated; it’s unrealistic to expect any one teacher to completely change all of his or her students’ lives forever. The Counterpunch article you link makes it seem like the study’s conclusions are bogus because there is no statistically significant link between a teacher’s value-added and a student’s income at age 30. But the signal between these two things is going to be very, very weak – you’re going to need a lot of observations to be confident it’s there. There were 368,427 observations at age 28 and only 61,639 at age 30. That’s still a large number, but it’s a big decrease. According to the authors, the effect of teacher value-added on earnings is in fact higher at age 30 than at age 28 – “The correlation between test scores and earnings is roughly 20% higher at age 30 than at age 28” – but the standard error is also much greater, partly because the sample shrank so much.
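For a rough sense of scale: if a standard error shrinks like 1/√n (a back-of-the-envelope approximation that ignores the paper’s clustering and controls), then dropping from 368,427 to 61,639 observations inflates it by roughly a factor of 2.4 all by itself:

```python
import math

n_age_28 = 368_427  # observations at age 28 (figures from the paper)
n_age_30 = 61_639   # observations at age 30

# Back-of-the-envelope: standard errors scale roughly like 1/sqrt(n),
# so the smaller age-30 sample alone inflates the standard error by:
print(f"{math.sqrt(n_age_28 / n_age_30):.2f}x")  # ~2.44x
```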
In my opinion, the Chetty study convincingly demonstrates the validity of value-added measurements. Even Diane Ravitch initially wrote, “[the] problems of the study are not technical, but educational” (though she has since changed her mind). Matthew Di Carlo of the Albert Shanker Institute wrote, “[there] is some strong, useful signal there.” Like commenter Paul Bruno, I haven’t seen a convincing scholarly repudiation of the study itself.
Now, you can definitely argue that value-added measurements are being incorrectly used in teacher evaluations, although I wouldn’t necessarily agree with you. The Chetty study had a mean of 8.08 years of data per teacher (although the standard deviation is almost as high, at 7.72 years), and the researchers generated each teacher’s value-added score by combining all those years together. As you saw in your analysis of the NYC data, there can be HUGE fluctuations in a teacher’s students’ test scores from year to year and even from classroom to classroom. However, Chetty demonstrates pretty convincingly that value-added is not ‘completely inaccurate’ – it’s just wildly imprecise in any single year. So how many years of data do you need to make a confident assessment of a teacher’s value-added? I don’t know, and it will vary from teacher to teacher, but we shouldn’t discard the measure just because it isn’t perfect.
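To make the ‘noisy but not meaningless’ point concrete, here’s a toy simulation – not the study’s actual estimation procedure, and with a noise level I made up – showing how averaging more single-year scores tightens the estimate of a teacher’s underlying effect:

```python
import random

random.seed(0)

TRUE_EFFECT = 0.10  # hypothetical teacher's true value-added (in test-score SDs)
YEAR_NOISE = 0.25   # hypothetical year-to-year noise (class composition, test error)

def estimated_vam(n_years):
    """Average n_years of noisy single-year scores for one teacher."""
    yearly = [random.gauss(TRUE_EFFECT, YEAR_NOISE) for _ in range(n_years)]
    return sum(yearly) / n_years

for years in (1, 3, 8):
    estimates = [estimated_vam(years) for _ in range(10_000)]
    mean = sum(estimates) / len(estimates)
    spread = (sum((e - mean) ** 2 for e in estimates) / len(estimates)) ** 0.5
    print(f"{years} year(s) of data: spread of estimates (SD) ~ {spread:.3f}")

# The spread falls roughly as 1/sqrt(years): a single year is wildly
# imprecise, while eight years (the study's mean) is far tighter.
```

The particular numbers don’t matter; the point is that the imprecision is a sample-size problem, not evidence that the underlying signal is fake.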
I do think that we should use value-added, perhaps weighting it lightly at the beginning of a teacher’s career and more heavily later on, as more years of data accumulate. I’m not aware of anyone who wants to hire and fire teachers based on test scores alone. In Chicago, for example, the current proposal is that standardized tests be worth 15% of classroom teachers’ scores (see p. 46 of the proposal) and be thrown out if their confidence interval is too large (p. 38). A teacher with the minimum VAM score could still be rated ‘Excellent’ if he or she did well on all of the other measures.
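As a sketch of how a rule like that plays out arithmetically (the 15% weight is from the proposal; the 0-to-1 score scale and the CI-width threshold below are my own made-up stand-ins, not CPS’s actual numbers):

```python
def composite_score(other_measures, vam=None, vam_ci_width=None,
                    max_ci_width=0.5):
    """Blend a 0-1 'other measures' score with a 0-1 VAM score at 15% weight,
    throwing VAM out when its confidence interval is too wide.
    The scale and the 0.5 threshold are hypothetical stand-ins."""
    if vam is None or vam_ci_width is None or vam_ci_width > max_ci_width:
        return other_measures  # VAM excluded; other measures carry full weight
    return 0.15 * vam + 0.85 * other_measures

# A teacher with the minimum possible VAM but strong other measures
# still ends up near the top of the scale:
print(composite_score(other_measures=0.95, vam=0.0, vam_ci_width=0.2))  # 0.8075
```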
I know that CPS wants the weighting of VAM to increase over time, and I can see why you would be worried about that, too. Teachers, anxious about their evaluations, could narrow their focus to test prep. I think that’s an incredibly misguided strategy, but that’s beside the point – it will happen anyway. The best way to combat it is with (1) high-quality coaching to help teachers increase their VAM by employing better teaching techniques, and (2) frequent evaluations by trained observers. If every observer who comes in says, “Teacher X gave her kids a test-prep packet and told them to spend the period working on it,” then red flags should appear elsewhere in that teacher’s evaluation, providing a disincentive to go test-prep crazy. I don’t think it makes sense to say, “tying VAM to evaluations will completely distort VAM measurements, so we shouldn’t do it at all.”
Now, I’ve said all this, but I realize we may still have a fundamental disagreement. You say that value-added is not ‘student learning’, ‘student achievement’, or ‘student growth’. Does that mean you think standardized tests are completely worthless as proxies for these things? If so, how do you think we should measure them? Or do you not think we should try to measure them at all? If that’s the case, how are teachers supposed to figure out what they’re doing wrong and get better?
Thanks for your post. I enjoyed reading it and taking the opportunity to learn more about VAM.