MAP Reading Assessment: A Critique

In empirical research and in educational settings, testing plays an essential role in obtaining meaningful results. Two factors determine whether an assessment tool is appropriate for its intended purpose. The first is reliability: the ability of the instrument to measure consistently each time it is used, across different sample populations and test conditions. The second is validity: the ability of the tool to measure the variable it is intended to measure. Reliable and valid testing instruments allow an educator to measure accurately students' retention and understanding of the given content. This, in turn, allows the educator to assess each student's strengths and weaknesses and to plan future teaching strategies and interventions. This analysis examines the Measures of Academic Progress (MAP) for Primary Grades and its ability to assess the reading abilities of elementary school learners.

Overview of MAP for Reading

MAP was chosen as the assessment tool for this study because it is a Common Core-aligned test that is widely used across the nation. It was developed by the Northwest Evaluation Association (NWEA) in 2000 and is used in all fifty states for grades 3 through 9. The test is divided into reading and mathematics sections; this analysis examines only the reading portion. The reading portion of the MAP has 40 questions, most of which are multiple-choice items with four answer options.

Even though the MAP is widely used, its reliability and validity have been questioned. This analysis explores the MAP, as well as ways in which it could be improved to provide a more valuable and reliable means of assessment. The MAP is administered by computer and asks the learner to click on the correct answer or drag it into a box. The test covers grammar, vocabulary, and comprehension skills. The following sections examine the reliability and validity of this measure.

Reliability of MAP for Reading

The Northwest Evaluation Association (NWEA) is an organization dedicated to analyzing the effectiveness of evaluation tools in the academic field, and it sets rigorous standards for the use of assessment tools. Reliability is often expressed as test-retest reliability, or the temporal stability of scores. There are two measures of reliability. The first involves the consistency of the overall test score across administrations. The second involves the consistency across test items, which is referred to as internal consistency.
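Internal consistency is commonly summarized with Cronbach's alpha, which compares the variance of individual items with the variance of students' total scores. The source does not say which statistic NWEA uses, so the following is only a minimal sketch of the general idea, written in Python with made-up 0/1 item responses rather than MAP data; the function name `cronbach_alpha` is hypothetical. Test-retest reliability, by contrast, is usually estimated as the correlation between scores from two administrations of the same test.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Estimate internal consistency (Cronbach's alpha).

    item_scores: one list per student, holding that student's score on
    every item (here 1 = correct, 0 = incorrect). Hypothetical helper.
    """
    k = len(item_scores[0])  # number of items on the test
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across all students
    item_vars = [variance([student[i] for student in item_scores]) for i in range(k)]
    # Sample variance of the students' total scores
    total_var = variance([sum(student) for student in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 students x 4 items (not MAP data)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))
```

With these hypothetical responses the function returns 0.8. Values closer to 1 suggest that the items tend to rank students consistently.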

Klingbeil, McComas, Burns, and Helman compared the validity of three universal screening measures of reading ability: Oral Reading Fluency (ORF), Fountas and Pinnell Benchmark Assessment System (BAS) scores, and Measures of Academic Progress for reading (MAP). The study found that the ORF and BAS did not meet the reliability criteria required for the comparative analysis; only the MAP met the criteria to be considered a reliable measure of reading ability. The researchers intended to examine the construct validity of the measures, but they could not perform this analysis because two of the three instruments were not reliable enough for comparison.

Validity of the MAP for Reading

Validity refers to whether a test actually measures the factor it was intended to measure. Evidence for validity can come from a variety of sources. The first is the test's content; in the educational field, this means selecting test items that match specific content areas of the curriculum. For NWEA tests such as MAP Reading, the most common form of validity evidence is concurrent validity. This method compares the new test with an established test of the same content: both tests are administered to the same students to determine how closely scores on the new test match scores on the established one (NWEA, 2004). Concurrent validity is therefore a comparative method of validity testing.
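Concurrent validity is usually reported as a correlation coefficient between the two sets of scores. As a minimal sketch, assuming Pearson's r is the statistic used (the source does not specify), the following Python computes the correlation for a handful of hypothetical students; the scores and the function name `pearson_r` are illustrative assumptions, not MAP results.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Hypothetical helper: x holds each student's score on the new test,
    y the same student's score on the established test.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations (unnormalized covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical scores for the same five students on both tests
new_test = [190, 200, 210, 220, 230]
established = [20, 40, 50, 40, 50]
print(round(pearson_r(new_test, established), 2))
```

For these made-up scores r is about 0.77; a value near 1 would indicate that the new test ranks students much as the established test does.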

Wang, McCall, Jiao, and Harris examined the construct validity of the MAP Reading and Mathematics computerized adaptive tests (CAT) using confirmatory factor analysis (CFA). The study analyzed MAP scores collected over two school years, with two test sessions per year, using data from ten states: Colorado, Illinois, Indiana, Kansas, Kentucky, Michigan, Minnesota, South Carolina, Washington, and Wisconsin. The sample comprised approximately 20% of the total population of students who took the MAP tests. This large validation study found sufficient statistical evidence that the MAP consistently measures the intended content constructs. Because of the large sample, the results are considered generalizable across all 50 states, although the data came from only ten of them.

The American Institutes for Research rates the reliability and validity of the MAP as only mediocre. Nonetheless, the MAP has several strengths. It measures the reading factors it was intended to measure. It is easy to administer by computer and takes approximately 60 minutes to complete. Instructors must complete approximately 4 hours of training before administering the test, but once trained they can administer it without repeating the training. The test can be administered individually or in a group setting without affecting the results, and it is computer scored, which is convenient.

One weakness of the test is its sensitivity, meaning the proportion of students with genuine reading difficulties whom the test correctly identifies as at risk. Sensitivity differed among demographic groups, which affected the reliability of the MAP: it was 0.53 for white students but 0.72 for Hispanic students. Significant differences were also found between the scores of girls and boys. This weakness limits the usefulness of the MAP for planning academic programs, because it may not be a reliable measure for certain demographic groups, and it should be examined in future research.
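Sensitivity is a simple ratio: students correctly flagged as at risk divided by all students who are truly at risk. The Python sketch below computes it per group; the counts and group labels are hypothetical, chosen only so the ratios equal the 0.53 and 0.72 reported in this section, and the function name `sensitivity` is illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """Share of truly at-risk students whom the screener flags as at risk.

    true_positives: at-risk students the test flagged
    false_negatives: at-risk students the test missed
    """
    at_risk = true_positives + false_negatives
    if at_risk == 0:
        raise ValueError("no at-risk students in this group")
    return true_positives / at_risk


# Hypothetical counts per group (not data from the study)
groups = {
    "Group A": {"true_positives": 53, "false_negatives": 47},
    "Group B": {"true_positives": 72, "false_negatives": 28},
}
for name, counts in groups.items():
    print(f"{name}: sensitivity = {sensitivity(**counts):.2f}")
```

A sensitivity of 0.53 means the screener misses nearly half of the at-risk students in that group, which is why a gap between groups matters for program planning.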

The MAP could be improved by exploring why results differ among demographic groups, which will require further academic study. The wording of certain questions, the concepts they test, or their answer choices may account for the differences, and specific items may be responsible for the decreased reliability across different test populations. Extended testing of the individual items and components of the MAP may reveal which changes are needed to improve its reliability.
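One crude first step toward locating problem items is to compare, item by item, the proportion of each group answering correctly. The Python sketch below flags items whose proportion-correct differs across groups by more than a chosen threshold; the function name `item_gaps`, the 0.2 threshold, and the responses are all hypothetical. This is only a screen: formal differential item functioning methods, such as the Mantel-Haenszel procedure, first match students on overall ability, which this sketch does not do, so a flagged item may simply reflect a real difference in reading skill between groups.

```python
def item_gaps(responses_by_group, threshold=0.2):
    """Flag items whose proportion-correct differs across groups by more than threshold.

    responses_by_group maps a group label to a list of students, each
    student being a list of 0/1 item scores. Returns (item_index, gap)
    pairs, with item indices counted from zero.
    """
    # Proportion of students in each group answering each item correctly
    p_correct = {}
    for group, students in responses_by_group.items():
        n_items = len(students[0])
        p_correct[group] = [
            sum(s[i] for s in students) / len(students) for i in range(n_items)
        ]

    flagged = []
    n_items = len(next(iter(p_correct.values())))
    for i in range(n_items):
        values = [p[i] for p in p_correct.values()]
        gap = max(values) - min(values)  # largest between-group difference
        if gap > threshold:
            flagged.append((i, round(gap, 2)))
    return flagged


# Hypothetical responses: 4 students per group x 3 items
data = {
    "Group A": [[1, 1, 0], [1, 1, 1], [1, 0, 1], [1, 1, 0]],
    "Group B": [[1, 0, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0]],
}
print(item_gaps(data))
```

With the sample data, only item 1 (counting from zero) is flagged, with a gap of 0.5; items flagged this way would be candidates for the closer review of wording and answer choices described above.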

This measure has been in use for over a decade, yet it has flaws that produce unreliable results across different classifications of students: it does not produce uniform results across the entire student population. These differences could affect group reporting and test averages for schools with large populations of minority students or other demographic groups. The challenge will be to improve the reliability of the MAP without compromising its construct validity.