Contemporary Educational Psychology/Chapter 11: Standardized and Other Formal Assessments/High Stakes Testing by States

While many states had standardized testing programs prior to 2000, the number of state-wide tests has grown enormously since then because NCLB required that all states test students in reading and mathematics annually in grades 3-8 and at least once in high school by 2005-6. Twenty-three states expanded their testing programs during 2005-6, and additional tests are being added because testing in science is required by 2007-8. Students with disabilities and English language learners must be included in the testing and provided with a variety of accommodations, so the majority of staff in school districts are involved in testing in some way (Olson, 2005). In this section we focus on these tests and their implications for teachers and students.

Academic Content Standards
NCLB mandates that states develop academic content standards that specify what students are expected to know or be able to do at each grade level. These content standards used to be called goals and objectives, and it is not clear why the labels have changed (Popham, 2004). Content standards are not easy to develop: if they are too broad and not related to grade level, teachers cannot hope to prepare students to meet them. For example, a broad standard in reading is:


 * Students should be able to construct meaning through experiences with literature, cultural events and philosophical discussion (no grade level indicated) (American Federation of Teachers, 2006, p. 6).

Standards that are too narrow can result in a restricted curriculum. An example of a narrow standard might be this:
 * Students can define, compare and contrast, and provide a variety of examples of synonyms and antonyms.

A stronger standard is this:
 * Students should apply knowledge of word origins, derivations, synonyms, antonyms, and idioms to determine the meaning of words (grade 4) (American Federation of Teachers, 2006, p. 6).

The American Federation of Teachers conducted a study in 2005-6 and reported that in 32 states some of the standards in reading, math and science were weak. States had the strongest standards in science, followed by mathematics. Standards in reading were particularly problematic: one fifth of all reading standards were redundant across the grades, i.e., repeated word for word across grade levels at least 50% of the time (American Federation of Teachers, 2006). Even if the standards are strong, there are often so many of them that it is hard for teachers to address them all in a school year. Content standards are developed by curriculum specialists who believe in the importance of their subject area, so they tend to develop large numbers of standards for each subject area and grade level. At first glance, it may appear that there are only a few broad standards, but under each standard there are subcategories called goals, benchmarks, indicators or objectives (Popham, 2004). For example, Idaho’s first grade mathematics standards consist of five broad standards, which include 10 goals and a total of 29 objectives (Idaho Department of Education, 2005-6).

Alignment of Standards, Testing and Classroom Curriculum
The state tests must be aligned with strong content standards in order to provide useful feedback about student learning. If there is a mismatch between the academic content standards and the content that is assessed, then the test results cannot provide information about students’ proficiency on the academic standards. A mismatch not only frustrates the students taking the test, teachers, and administrators; it also undermines the concept of accountability and the “theory of action” that underlies NCLB. Unfortunately, the 2006 American Federation of Teachers study indicated that in only 11 states were all the tests aligned with state standards.

State standards and their alignment with state assessments should be widely available, preferably posted on the states’ Web sites so they can be accessed by school personnel and the public. A number of states have been slow to do this. Table 11-1 summarizes which states had strong content standards, tests that were aligned with state standards, and adequate documents online. Only 11 states were judged to meet all three criteria in 2006.

Sampling content
When numerous standards have been developed, it is impossible for tests to assess all of the standards every year, so the tests sample the content, i.e., measure some but not all of the standards every year. Content standards cannot be reliably assessed with only one or two items, so the decision to assess one content standard often requires not assessing another. This means that if there are too many content standards, a significant proportion of them are not measured each year. In this situation, teachers try to guess which content standards will be assessed that year and align their teaching with those specific standards. Of course, if these guesses are incorrect, students will have studied content not on the test and not studied content that is on the test. Some argue that this is a very serious problem with current state testing, and Popham, an expert on testing, even said, “What a muddleheaded way to run a testing program” (2004, p. 79).
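The trade-off described above can be made concrete with a little arithmetic. The following is only a rough sketch: the 29 objectives echo the Idaho example earlier in this chapter, while the test length (40 items) and the number of items needed to assess one standard reliably (4) are hypothetical values chosen for illustration, not figures from any state program.

```python
def standards_coverage(total_standards, test_items, items_per_standard):
    """Return (assessed, unassessed) counts of standards for one test.

    Assumes every assessed standard receives the same number of items,
    which is a simplification of how real test blueprints work.
    """
    assessed = min(total_standards, test_items // items_per_standard)
    return assessed, total_standards - assessed


# Hypothetical test: 40 items, 4 items per standard, 29 objectives in all.
assessed, unassessed = standards_coverage(
    total_standards=29, test_items=40, items_per_standard=4
)
print(f"Assessed this year: {assessed}; not assessed: {unassessed}")
```

Under these assumed numbers, roughly two thirds of the objectives would go unmeasured in a given year, which is the situation that leads teachers to guess at what will be tested.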

Adequate Yearly Progress (AYP)
Under NCLB, three levels of achievement (basic, proficient and advanced) must be specified for each grade level in each content area by each state. States were required to set a timetable from 2002 that ensured an increasing percentage of students reach the proficient level, such that by 2013-14 every child is performing at or above the proficient level. Schools and school districts that meet this timetable are said to make adequate yearly progress (AYP).

Because every child must reach proficiency by 2013-14, greater increases are required for those schools that had larger percentages of initially lower performing students. Figure 11-1 illustrates the progress needed in three hypothetical schools. School A, initially the lowest performing school, has to increase the number of students reaching proficiency by an average of 6% each year; the required increase is 3% for School B and only 1% for School C. Also, the checkpoint targets in the timetables are determined by the lower performing schools. This is illustrated on the figure by the arrow: School A has to make significant improvements by 2007-8, but School C does not have to improve at all by 2007-8. This means that schools that are initially lower performing are much more likely to fail to make AYP during the initial implementation years of NCLB.
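The arithmetic behind these different rates of improvement can be sketched as follows. This is a minimal illustration, not the calculation used by any state. The starting percentages (28%, 64% and 88%) and the 12-year span are assumptions chosen so that the results match the rates quoted for Schools A, B and C; the actual values behind Figure 11-1 are not given in the text.

```python
def required_annual_gain(start_pct, years):
    """Average yearly gain, in percentage points, needed to reach 100% proficient."""
    return (100.0 - start_pct) / years


YEARS = 12  # assumed number of school years from the baseline to 2013-14

# Hypothetical starting percentages of proficient students.
starting_pct = {"School A": 28.0, "School B": 64.0, "School C": 88.0}

for school, start in starting_pct.items():
    gain = required_annual_gain(start, YEARS)
    print(f"{school}: starts at {start:.0f}%, needs about {gain:.0f} points per year")
```

The sketch assumes steady, equal gains each year. Actual state timetables could set uneven checkpoints, so it shows only why a lower starting point implies a steeper required climb.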

Subgroups and AYP
For a school to achieve AYP, not only must the required overall percentage of students reach proficiency, but each subgroup must also reach proficiency, which requires reporting results separately for each subgroup in a process called disaggregation. Prior to NCLB, state accountability systems typically focused on overall student performance, but this did not provide incentives for schools to focus on the neediest students, e.g., those children living below the poverty line (Hess & Petrilli, 2006). Under NCLB, the percentages for each racial/ethnic group in the school (White, African American, Latino, Native American, etc.), low income students, students with limited English proficiency, and students with disabilities are all calculated if there are enough students in the subgroup. A school may fail to make AYP if even one group, e.g., English language learners, does not make adequate progress. This means that it is more difficult for large, diverse schools (typically urban schools) that have many subgroups to meet the demands of AYP than for smaller schools with a homogeneous student body (Novak & Fuller, 2003). Schools can also fail to make AYP if too few students take the exam. The drafters of the law were concerned that some schools might encourage low-performing students to stay home on the days of testing in order to artificially inflate the scores. So on average at least 95% of any subgroup must take the exams each year or the school may fail to make AYP (Hess & Petrilli, 2006).
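The subgroup rules above amount to a checklist that every sufficiently large group must pass. The sketch below illustrates that logic under stated assumptions: the minimum subgroup size of 30 is a hypothetical value (the text says only that there must be "enough students"), participation is checked for a single year rather than as an average, and all names and numbers in the example school are invented.

```python
from dataclasses import dataclass

MIN_SUBGROUP_SIZE = 30    # hypothetical threshold, not taken from the text
MIN_PARTICIPATION = 0.95  # the 95% participation rule described above


@dataclass
class Subgroup:
    name: str
    enrolled: int
    tested: int
    proficient: int


def failing_subgroups(subgroups, target_pct):
    """Return (name, reason) pairs for subgroups that would cause a school to miss AYP."""
    failures = []
    for group in subgroups:
        if group.enrolled < MIN_SUBGROUP_SIZE:
            continue  # too few students for the subgroup to be counted
        if group.tested / group.enrolled < MIN_PARTICIPATION:
            failures.append((group.name, "participation"))
        elif 100.0 * group.proficient / group.tested < target_pct:
            failures.append((group.name, "proficiency"))
    return failures


def makes_ayp(subgroups, target_pct):
    """A school makes AYP only if no counted subgroup fails."""
    return not failing_subgroups(subgroups, target_pct)


# Invented example school with a state target of 60% proficient.
school = [
    Subgroup("All students", enrolled=400, tested=392, proficient=300),
    Subgroup("English language learners", enrolled=40, tested=39, proficient=15),
    Subgroup("Students with disabilities", enrolled=12, tested=12, proficient=2),
]
print(makes_ayp(school, target_pct=60.0))
print(failing_subgroups(school, target_pct=60.0))
```

In this invented school the overall results are well above the target, yet the school misses AYP because one counted subgroup falls short. The small subgroup of students with disabilities is not counted at all. The more subgroups a school has that are large enough to count, the more separate checks it must pass.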

Sanctions and AYP
Schools failing to meet AYP for consecutive years experience a series of increasing sanctions. If a school fails to make AYP for two years in a row, it is labeled “in need of improvement” and school personnel must come up with a school improvement plan that is based on “scientifically based research.” In addition, students must be offered the option of transferring to a better performing public school within the district. If the school fails for three consecutive years, free tutoring must be provided to needy students. A fourth year of failure requires “corrective actions,” which may include staffing changes, curriculum reforms or extensions of the school day or year. If the school fails to meet AYP for five consecutive years, the district must “restructure,” which involves major actions such as replacing the majority of the staff, hiring an educational management company, or turning the school over to the state.
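The escalating sequence above can be summarized as a simple lookup. The sketch below is a paraphrase of the paragraph, not statutory language. It returns only the newest consequence triggered at each stage and does not model whether earlier consequences remain in effect. Treating six or more years the same as five is an assumption made for illustration.

```python
# Newest consequence triggered by each count of consecutive years missing AYP,
# summarized from the paragraph above.
SANCTIONS = {
    2: "in need of improvement: improvement plan and transfer option",
    3: "free tutoring for needy students",
    4: "corrective actions",
    5: "restructuring",
}


def newest_sanction(consecutive_years_missed):
    """Return the newest sanction for a school, or None if fewer than two years."""
    if consecutive_years_missed < 2:
        return None
    # Assumption: any count above five is treated like the fifth year.
    return SANCTIONS[min(consecutive_years_missed, 5)]


for years in range(1, 6):
    print(years, newest_sanction(years))
```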
