
Applying Multidimensional Item Response Theory Models in Validating Test Dimensionality: An Example of K–12 Large-scale Science Assessment


  • American Institutes for Research, Washington D.C., United States
  • University of Maryland, College Park, MD, United States


This study investigated the application of multidimensional item response theory (IRT) models to validating test structure and dimensionality. Large-scale achievement tests often comprise multiple content areas or domains within a single subject. Such areas or domains may introduce multidimensionality or local item dependence, both of which violate the assumptions of the unidimensional IRT models currently used in many statewide large-scale assessments. An empirical K–12 science assessment served as an example of dimensionality validation with multidimensional IRT models; the unidimensional IRT model was also included, as the model most commonly used in current practice. The procedures illustrated in this real-data example can be applied to validate test dimensionality for any testing program once item response data are collected.
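The validation procedure described above starts from collected item response data. As a minimal, hypothetical sketch (not the study's actual analysis), one common first screen for multidimensionality is the eigenvalue-greater-than-one rule applied to the inter-item correlation matrix; the simulated two-trait data, item counts, and discrimination values below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate dichotomous responses: 1,000 examinees, 20 items,
# two correlated latent traits (10 items load on each trait).
n_persons, n_items = 1000, 20
traits = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n_persons)

loadings = np.zeros((n_items, 2))
loadings[:10, 0] = 2.0          # items 1-10 measure trait 1
loadings[10:, 1] = 2.0          # items 11-20 measure trait 2
difficulty = rng.normal(0.0, 0.5, n_items)

# 2PL-style response model: P(correct) = logistic(a*theta - b)
logits = traits @ loadings.T - difficulty
prob = 1.0 / (1.0 + np.exp(-logits))
responses = (rng.random((n_persons, n_items)) < prob).astype(int)

# Eigenvalue-greater-than-one rule on the inter-item correlation matrix:
# the count of eigenvalues above 1 suggests the number of dimensions.
corr = np.corrcoef(responses, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_dims = int(np.sum(eigenvalues > 1.0))
print("suggested number of dimensions:", n_dims)
```

In operational practice this eigenvalue screen is only a starting point; tetrachoric rather than Pearson correlations are usually preferred for dichotomous items, and the fuller validation compares the fit of unidimensional and multidimensional IRT models on the observed response data.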


Test Validity, Test Dimensionality, Item Response Theory (IRT), Multidimensional IRT Models, Large-scale Assessment



