Creating a K-12 Adaptive Test: Examining the Stability of Item Parameter Estimates and Measurement Scales


  • Northwest Evaluation Association, United States


Developing adaptive tests for K-12 settings requires stable measurement scales, both to measure the growth of individual students from one grade to the next and to measure change in groups from one year to the next. Accountability systems such as No Child Left Behind also require stable measurement scales so that accountability has meaning across time. This study examined the stability of the measurement scales used with the Measures of Academic Progress. Difficulty estimates for test questions from the reading and mathematics scales were examined over periods ranging from 7 to 22 years. Results showed high correlations between the item difficulty estimates from the original calibration and those from the current calibration. The average drift in item difficulty estimates was less than 0.01 standard deviations. The average impact of change in item difficulty estimates was less than the smallest reported difference on the score scale for two actual tests. The findings indicate that an IRT scale can be stable enough to allow consistent measurement of student achievement.
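
The two summary statistics named in the abstract, the correlation between original and current difficulty estimates and the average drift in standard deviation units, can be illustrated with a minimal sketch. This is not the study's procedure: the function name `drift_summary`, the input lists, and the choice to standardize drift by the standard deviation of the original estimates are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def drift_summary(original, current):
    """Compare two calibrations of the same items (hypothetical helper).

    original, current: item difficulty estimates in the same item order.
    Returns the Pearson correlation, the mean signed drift, and the mean
    drift divided by the SD of the original estimates (an assumed
    standardization, not necessarily the one used in the study).
    """
    if len(original) != len(current) or len(original) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")

    n = len(original)
    mean_o, mean_c = mean(original), mean(current)

    # Population covariance and standard deviations for Pearson's r.
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(original, current)) / n
    sd_o, sd_c = pstdev(original), pstdev(current)
    if sd_o == 0 or sd_c == 0:
        raise ValueError("difficulty estimates must vary across items")

    # Signed drift: positive values mean items now look harder.
    mean_drift = mean(c - o for o, c in zip(original, current))

    return {
        "correlation": cov / (sd_o * sd_c),
        "mean_drift": mean_drift,
        "mean_drift_sd": mean_drift / sd_o,
    }


# Made-up difficulty values, used only to exercise the function.
example = drift_summary([-1.0, 0.0, 1.0, 2.0], [-0.5, 0.5, 1.5, 2.5])
```

A uniform shift like the one in the example leaves the correlation near 1 while producing a nonzero mean drift, which is why the abstract reports both quantities rather than the correlation alone.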

