Standardized tests have been a part of American education since the mid-1800s. Their use skyrocketed after 2002’s No Child Left Behind Act (NCLB) mandated annual testing in all 50 states. Between 2000 and 2015, US students slipped in world rankings from 18th to 40th in math, from 14th to 25th in science, and from 15th to 24th in reading. Failures in the education system have been blamed on rising poverty levels, teacher quality, tenure policies, and, increasingly, on the pervasive use of standardized tests.
Proponents argue that standardized tests offer an objective measurement of education and a good metric to gauge areas for improvement, provide meaningful data to help students in marginalized groups, and yield scores that are good indicators of college and job success. They also argue that standardized tests are useful metrics for teacher evaluations.
Opponents argue that standardized tests only determine which students are good at taking tests, offer no meaningful measure of progress, and have not improved student performance, and that the tests are racist, classist, and sexist, with scores that are not predictors of future success. They also argue that standardized tests are unfair metrics for teacher evaluations.
Pro & Con Arguments
Standardized tests offer an objective measurement of education and a good metric to gauge areas for improvement.
Teachers’ grading practices are naturally uneven and subjective. An A in one class may be a C in another. Teachers also have conscious and unconscious biases for a favorite student or against a rowdy student, for example. Standardized tests offer students across the country a unified measure of their knowledge.
Aaron Churchill, Ohio Research Director for the Thomas B. Fordham Institute, stated, “At their core, standardized exams are designed to be objective measures. They assess students based on a similar set of questions, are given under nearly identical testing conditions, and are graded by a machine or blind reviewer. They are intended to provide an accurate, unfiltered measure of what a student knows.”
Frequently, states or local jurisdictions employ psychometricians to ensure tests are fair across populations of students. Mark Moulon, PhD, Chief Executive Officer at Pythias Consulting and psychometrician, offered an example: “If you find that your question on skateboarding is one that boys find to be an easy question, but girls find to be a hard question, that’ll pop up as a statistic. Differential item functioning will flag that question as problematic.”
Moulon continued, explaining, “What’s cool about psychometrics is that it will flag stuff that a human would never be able to notice. I remember a science test that had been developed in California and it asked about earthquakes. But the question was later used in a test that was administered in New England. When you try to analyze the New England kids with the California kids, you would get a differential item functioning flag because the California kids were all over the subject of earthquakes, and the kids in Vermont had no idea about earthquakes.”
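To make the idea of a differential item functioning flag concrete, here is a minimal sketch of one common approach, the Mantel-Haenszel odds ratio. Moulon does not say which statistic his team uses, so the choice of method is an assumption, and the function names and response counts below are hypothetical. Test takers are first matched on total test score, and the sketch then asks whether, among students of similar overall ability, one group still answers the item correctly more often than the other.

```python
from collections import defaultdict


def make_records(counts):
    """Expand per-stratum counts into (group, total_score, correct) records.

    counts maps a total-score stratum to a tuple of
    (ref_right, ref_wrong, focal_right, focal_wrong). All numbers are invented.
    """
    records = []
    for score, (ref_right, ref_wrong, focal_right, focal_wrong) in counts.items():
        records += [("reference", score, True)] * ref_right
        records += [("reference", score, False)] * ref_wrong
        records += [("focal", score, True)] * focal_right
        records += [("focal", score, False)] * focal_wrong
    return records


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one item across total-score strata.

    A value near 1 means matched students in both groups do about equally
    well on the item; a value far from 1 suggests the item deserves review.
    """
    # Per stratum: [ref_right, ref_wrong, focal_right, focal_wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        index = (0 if group == "reference" else 2) + (0 if correct else 1)
        strata[score][index] += 1

    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata.values():
        total = ref_right + ref_wrong + focal_right + focal_wrong
        numerator += ref_right * focal_wrong / total
        denominator += ref_wrong * focal_right / total
    return numerator / denominator if denominator else float("inf")


# Hypothetical "skateboarding" item: within each score band, the reference
# group (boys, in Moulon's example) answers correctly more often.
skateboard_item = make_records({10: (8, 2, 4, 6), 20: (9, 1, 6, 4)})
# Hypothetical fair item: both groups perform identically within each band.
fair_item = make_records({10: (8, 2, 8, 2), 20: (9, 1, 9, 1)})

print(mantel_haenszel_odds_ratio(skateboard_item))  # about 6: flag for review
print(mantel_haenszel_odds_ratio(fair_item))        # 1.0: no sign of DIF
```

Matching on total score is what separates this from simply comparing pass rates: a group that scores lower overall does not by itself trigger a flag, only an item that behaves differently for students of comparable ability does.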
With problematic questions removed, or adapted for different populations of students, standardized tests offer the best objective measure of what students have learned. Taking that information, schools can determine areas for improvement. As Bryan Nixon, former Head of School at private school Whitby, noted, “When we receive standardized test data at Whitby, we use it to evaluate the effectiveness of our education program. We view standardized testing data as not only another set of data points to assess student performance, but also as a means to help us reflect on our curriculum. When we look at Whitby’s assessment data, we can compare our students to their peers at other schools to determine what we’re doing well within our educational continuum and where we need to invest more time and resources.”
Standardized tests offer meaningful data to help students in marginalized groups.
Keri Rodrigues, Co-founder of the National Parents Union, explained, “If I don’t have testing data to make sure my child’s on the right track, I’m not able to intervene and say there is a problem and my child needs more. And the community can’t say this school is doing well, this teacher needs help to improve, or this system needs new leadership… It’s really important to have a statewide test because of the income disparity that exists in our society. Black and Brown excellence is real, but just because a kid lives in Dorchester [Massachusetts] does not make his or her life less valuable than a child that lives in Wellesley [Massachusetts]. And it is unfair to say that just by luck of birth that a child born in Wellesley is somehow entitled to a higher-quality education… Testing is a tool for us to hold the system accountable to make sure our kids have what they need.”
Sheryl Lazarus, PhD, Director of the National Center on Educational Outcomes at the University of Minnesota, stated, “a real plus of these assessments is that they’ve really shone a light on the differences across sub-groups. And they have led to improvements in access to instruction for students with disabilities and English learners… Inclusion of students with disabilities and English learners in summative tests used for accountability allows us to measure how well the system is doing for these students, and then it is possible to fill in gaps in instructional opportunity.”
Advocates for marginalized groups of students, whether by race, learning disability, or other difference, can use testing data to prove a problem exists and to help solve the problem via more funding, development of programs, or other solutions. Civil rights education lawsuits, in which a group sues a local or state government for better education, almost always use testing data.
Chris Stewart, CEO of brightbeam, summarizes, “We only know that there’s a difference between White students and Black students and other students of color because we have the data. We only know about that because we have assessments.”
A letter signed by 12 civil rights organizations, including the NAACP and the American Association of University Women, explained, “Data obtained through some standardized tests are particularly important to the civil rights community because they are the only available, consistent, and objective source of data about disparities in educational outcomes, even while vigilance is always required to ensure tests are not misused. These data are used to advocate for greater resource equity in schools and more fair treatment for students of color, low-income students, students with disabilities, and English learners… [W]e cannot fix what we cannot measure. And abolishing the tests or sabotaging the validity of their results only makes it harder to identify and fix the deep-seated problems in our schools.”
Standardized tests are useful metrics for teacher evaluations.
While grades and other measures are useful for teacher evaluations, standardized tests provide a consistent measure across classrooms and schools. Individual school administrators, school districts, and the state can compare teachers using test scores to show how each teacher has helped students master core concepts.
Timothy Hilton, a high school social studies teacher in South Central Los Angeles, stated, “No self-respecting teacher would use a single student grade on a single assignment as a final grade for the entirety of a course, so why would we rely on one source of information in the determination of a teacher’s overall quality? The more data that can be provided, the more accurate the teacher evaluation decisions will end up being. Teacher evaluations should incorporate as many pieces of data as possible. Administration observation, student surveys, student test scores, professional portfolios, and on and on. The more data that is used, the more accurate the picture it will paint.”
Standardized tests only determine which students are good at taking tests, offer no meaningful measure of progress, and have not improved student performance.
Standardized test scores are easily influenced by outside factors: stress, hunger, tiredness, and prior teacher or parent comments about the difficulty of the test, among other factors. In short, the tests only show which students are best at preparing for and taking the tests, not what knowledge students might exhibit if their stomachs weren’t empty. External stereotypes also play a part in scores: “research indicates that being targeted by well-known stereotypes (‘blacks are unintelligent,’ ‘Latinos perform poorly on tests,’ ‘girls can’t do math’ and so on) can be threatening to students in profound ways, a predicament they call ‘stereotype threat.’”
Students are tested on grade-appropriate material, but they are not re-tested to determine if they have learned information they tested poorly on the year before. Instead, as Steve Martinez, EdD, Superintendent of Twin Rivers Unified in California, and Rick Miller, Executive Director of CORE Districts, note, each “state currently reports yearly change, by comparing the scores of this year’s students against the scores of last year’s students who were in the same grade. Even though educators, parents and policymakers might think change signals impact, it says much more about the change in who the students are because it is not measuring the growth of the same student from one year to the next.”
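The distinction Martinez and Miller draw can be shown with a small sketch. Every student name and score below is invented, and the sketch assumes scores from different grades sit on a comparable scale, which real state tests do not always guarantee. It contrasts the year-over-year change a state typically reports, which compares different children in the same grade, with the growth of the same children followed from one year to the next.

```python
from statistics import mean

# Hypothetical scores keyed by student. The 2023 and 2024 fourth-grade
# classes are different children; the 2024 fifth graders are the same
# children who sat the fourth-grade test in 2023.
grade4_2023 = {"ana": 300, "ben": 320, "cy": 340}
grade4_2024 = {"dee": 290, "eli": 310, "fay": 330}
grade5_2024 = {"ana": 325, "ben": 345, "cy": 365}


def cohort_change(last_year, this_year):
    """Reported 'yearly change': this year's grade average minus last year's."""
    return mean(this_year.values()) - mean(last_year.values())


def matched_growth(before, after):
    """Average gain for students who appear in both years."""
    shared = before.keys() & after.keys()
    return mean(after[student] - before[student] for student in shared)


print(cohort_change(grade4_2023, grade4_2024))   # -10: the grade 'declined'
print(matched_growth(grade4_2023, grade5_2024))  # 25: the same students gained
```

In this invented case the fourth-grade average falls by 10 points even though every student tracked across years gained 25, because the drop reflects who was enrolled rather than what anyone learned. Note that matched_growth raises an error if no student appears in both years.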
Further, because each state develops its own tests, standardized tests are not necessarily comparable across state lines, leaving nationwide statistics shaky at best.
Brandon Busteed, Executive Director, Education & Workforce Development at Gallup at the time of the quote, stated, “Despite an increased focus on standardized testing, U.S. results in international comparisons show we have made no significant improvement over the past 20 years, according to the Program for International Student Assessment (PISA). The U.S. most recently ranked 23rd, 39th and 25th in reading, math and science, respectively. The last time Americans celebrated being 23rd, 39th and 25th in anything was … well, never. Our focus on standardized testing hasn’t helped us improve our results!”
Busteed asks, “What if our overreliance on standardized testing has actually inhibited our ability to help students succeed and achieve in a multitude of other dimensions? For example, how effective are schools at identifying and educating students with high entrepreneurial talent? Or at training students to apply creative thinking to solve messy and complex issues with no easy answers?”
Standardized tests are racist, classist, and sexist.
American standardized tests originate with the tests created by psychologist Carl Brigham, PhD, for the Army during World War I, which were later adapted to become the SAT. The Army tests were created specifically to segregate soldiers by race, because at the time science inaccurately linked intelligence and race.
Racial bias has not been stripped from standardized tests. W. James Popham, PhD, Professor Emeritus at the University of California at Los Angeles and former test maker, explains how discrimination is purposefully built into standardized tests: “Traditionally constructed standardized achievements, the kinds that we’ve used in this country for a long while, are intended chiefly to discriminate among students … to say that someone was in the 83rd percentile and someone is at 43rd percentile. And the reason you do that is so you can make judgments among these kids. But in order to do so, you have to make sure that the test has in fact a spread of scores. One of the ways to have that test create a spread of scores is to limit items in the test to socioeconomic variables, because socioeconomic status is a nicely spread out distribution, and that distribution does in fact spread kids’ scores out on a test.”
As Young Whan Choi, Manager of Performance Assessments at Oakland Unified School District in Oakland, California, explains, “Too often, test designers rely on questions which assume background knowledge more often held by White, middle-class students. It’s not just that the designers have unconscious racial bias; the standardized testing industry depends on these kinds of biased questions in order to create a wide range of scores.” Choi offers an example from his own 10th grade class: “a student called me over with a question. With a puzzled look, she pointed to the prompt asking students to write about the qualities of someone who would deserve a ‘key to the city.’ Many of my students, nearly all of whom qualified for free and reduced lunch, were not familiar with the idea of a ‘key to the city.’”
Wealthy kids, who would be more familiar with a “key to the city,” tend to have higher standardized test scores due to differences in brain development caused by factors such as “access to enriching educational resources, and… exposure to spoken language and vocabulary early in life.” Plus, as Eloy Ortiz Oakley, MBA, Chancellor of California Community Colleges, points out, “Many well-resourced students have far greater access to test preparation, tutoring and taking the test multiple times, opportunities not afforded the less affluent… [T]hese admissions tests are a better measure of students’ family background and economic status than of their ability to succeed.”
Journalist and teacher Carly Berwick explains, “All students do not do equally well on multiple choice tests, however. Girls tend to do less well than boys and perform better on questions with open-ended answers, according to a 2018 study by Stanford University’s Sean Reardon, which found that test format alone accounts for 25 percent of the gender difference in performance in both reading and math. Researchers hypothesize that one explanation for the gender difference on high-stakes tests is risk aversion, meaning girls tend to guess less.”
Standardized tests are unfair metrics for teacher evaluations.
Sixteen states and DC have stopped using standardized tests in teacher evaluations. As W. James Popham, PhD, noted, “standardized achievement tests should not be used to determine the effectiveness of a state, a district, a school, or a teacher. There’s almost certain to be a significant mismatch between what’s taught and what’s tested.”
Margaret Pastor, PhD, Principal of Stedwick Elementary School in Maryland, stated: “[A]n assistant superintendent… pointed out that in one of my four kindergarten classes, the student scores were noticeably lower, while in another, the students were outperforming the other three classes. He recommended that I have the teacher whose class had scored much lower work directly with the teacher who seemed to know how to get higher scores from her students. Seems reasonable, right? But here was the problem: The ‘underperforming’ kindergarten teacher and the ‘high-performing’ teacher were one and the same person.”