Arne Duncan Should Beware Of Standardized Testing’s Shortfalls

Students despise them for making learning dull. Parents despise them for making teachers teach to the test. Educators despise them for putting their jobs on the line. But despite widespread criticism, standardized testing only seems to be gaining steam in the U.S. Is it finally time to dump high-stakes measurements of student performance? The answer is more complex than a simple yes or no.

Over the past two decades, 31 states have implemented teacher evaluations that tie a teacher's rating to the growth in his or her students' test scores. This trend toward supposedly objective measurement is soon expected to expand to the federal level as well. On Friday, Secretary of Education Arne Duncan is expected to announce new regulations tying financial aid for teacher education programs to how well their graduates improve students' test scores.

Such a shift is surprising considering how unpopular standardized tests have been in the public's eye since the turn of the century. Most infamously, President Bush's No Child Left Behind Act (NCLB) required states to set annual testing benchmarks for their schools in pursuit of the impossible goal of 100 percent student proficiency in reading and math by the 2013-14 school year. If a school repeatedly failed to make Adequate Yearly Progress (AYP), as the benchmarks were called, NCLB required local education agencies to impose corrective action such as staff replacement, a state administrative takeover, or even a complete shutdown.

Instead of improving educational outcomes, NCLB's draconian punishments for inadequate test scores perversely incentivized 15 states to lower their education standards, according to one Department of Education study. As a result, student achievement on standardized tests largely stagnated throughout the Bush administration, as measured by the National Assessment of Educational Progress. By the time President Obama took office, NCLB was widely regarded as a failure. The Department of Education has since granted 42 states and the District of Columbia waivers from AYP in exchange for implementing its education policy priorities, most notably Common Core.

Since proficiency benchmarks proved unattainable, states have shifted their priorities toward using standardized tests to measure academic progress instead. The most common method is the value-added model (VAM), in which a teacher's performance is measured by how much his or her students' standardized test scores improve over an academic year.

Like Adequate Yearly Progress benchmarks under NCLB, the value-added model has been a lightning rod for controversy, with educators and analysts hotly debating whether it can accurately measure teacher performance. But despite the metric's questionable validity, the Department of Education is expected to use it when evaluating which teacher education programs should receive financial aid. So, can this measurement be trusted, or will the Obama administration's new policy push usher in another era of NCLB-like abuse of standardized tests?

While it may be too early to tell how well the value-added model has worked in state-level teacher evaluations, the statistical evidence is too mixed to suggest that using the metric at the federal level is a good idea. Indeed, the American Statistical Association issued a statement last month urging governments to use caution with the measurement, noting that "[m]ost VAM studies find that teachers account for about 1 percent to 14 percent of the variability in test scores." In other words, factors other than the teacher account for the great majority of the fluctuation in students' test scores.

For instance, the test a class takes at the end of an academic year could contain more difficult questions than the one it took at the beginning. External factors, such as a dog barking outside the classroom, could also distract students and skew results. For reasons like these, an OECD literature review cautioned that VAM "should not be considered as the only source of indicators for making high-stakes decisions" such as teacher salaries, promotions, or layoffs.

If VAM can't be trusted as the sole factor in important decisions at the state and local level, it certainly shouldn't be used to determine which teacher education programs are worthy of receiving a portion of the federal government's $239 billion in student aid. As with NCLB, this attempt to hold teacher education programs accountable will likely have unintended consequences, this time by creating a barrier to entry for new programs.

As Bloomberg View’s editors put it this week, “states are still figuring out the best ways to measure the impact that teachers have on student achievement … and questions abound about whether such measurements, which are based on standardized tests, can be made accurately.”

That is not to say standardized tests should be tossed out altogether. Although they will never be able to present a full picture of a student's knowledge, tests provide the best available snapshot. Furthermore, VAM's focus on measuring progress, rather than fixed proficiency benchmarks, is a methodological improvement over past mistakes like NCLB.

However, standardized tests should be only one of several techniques used to evaluate a teacher's performance. Classroom observations and student surveys are additional options that add a human element to VAM's statistical approach. That is why the Department of Education's plan to use the metric in evaluating teacher education programs is concerning for future generations of American teachers seeking financial aid.

Casey Given is an editor at Young Voices, a project aimed at promoting Millennials’ policy voice in the media. He is a frequent writer for State Sight, a website focused on state policy.