
The Contribution of Constructed Response Items to Large Scale Assessment: Measuring and Understanding their Impact


  • University of Maryland, College Park, MD, United States
  • Educational Testing Service


This article investigates several questions regarding the impact of item format on measurement characteristics. Constructed response (CR) items and multiple choice (MC) items obviously differ in format and in the resources needed to score them, and they have therefore been the subject of considerable discussion regarding the consequences of using, or ceasing to use, one format or the other in an assessment. In particular, this study examines differences in the constructs measured across domains, changes in test reliability and test characteristic curves, and interactions of item format with race and gender. The data come from the Maryland High School Assessments, high-stakes state examinations that students must pass to earn a high school diploma. Our results indicate subtle differences in the impact of CR and MC items, demonstrated in dimensionality, particularly for English and Government, and in differential performance by ethnicity and gender across the two item types.
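The reliability comparison the abstract describes can be illustrated with a small sketch: Cronbach's alpha computed for an MC-only item set versus a mixed MC+CR set. The data below are synthetic and purely illustrative (a single latent ability driving dichotomous MC items and polytomous 0-3 CR items); they are an assumption for demonstration, not the study's actual Maryland assessment data or its operational psychometric method.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item sample variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Illustrative synthetic data: 500 examinees, one latent ability driving
# 20 dichotomous MC items and 4 polytomous (0-3 point) CR items.
rng = np.random.default_rng(0)
theta = rng.normal(size=(500, 1))
mc = (theta + rng.normal(size=(500, 20)) > 0).astype(int)
cr = np.clip(np.round(1.5 + theta + rng.normal(size=(500, 4))), 0, 3)

alpha_mc = cronbach_alpha(mc)
alpha_mixed = cronbach_alpha(np.hstack([mc, cr]))
print(f"alpha (MC only): {alpha_mc:.3f}")
print(f"alpha (MC + CR): {alpha_mixed:.3f}")
```

Comparing the two alpha values on real response data is one simple way to gauge whether dropping an item format would change a test's internal-consistency reliability, though operational programs typically supplement this with IRT-based test information analyses.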


Keywords: Constructed Response Items, Multiple Choice Items, Large-scale Testing




