• Esin Yılmaz Koğar, Faculty of Education, Ömer Halisdemir University, Niğde, Turkey
• Hakan Koğar, Faculty of Education, Akdeniz University, Antalya, Turkey




Multiple choice items, constructed response items, mixed format tests, multidimensionality, latent dimension score


This study examines the item types used to measure mathematics achievement in terms of dimensionality and latent trait scores. The data collection instruments were the student questionnaires and the mathematics achievement tests developed to measure 4th and 8th grade students’ mathematics achievement in TIMSS 2015. The study assumes two dimensions: the multiple-choice (MC) and constructed-response (CR) items together form “math ability”, and the CR items alone form “CR ability”. To determine the dimensionality of the latent trait of math ability, three IRT models were fitted: unidimensional, within-item multidimensional, and between-item multidimensional. The within-item model fit better than the unidimensional model, and it also showed better fit according to AIC and BIC. The ability parameter estimates from the unidimensional and within-item models were similar. Sense of school belonging and students’ confidence in mathematics had significant effects on the primary trait; home resources for learning also had a significant effect in the 8th grade.
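As a rough illustration of the AIC and BIC comparison mentioned in the abstract, the sketch below computes both criteria from a model's deviance (-2 log-likelihood) and its number of estimated parameters. The deviance values, parameter counts, and sample size are hypothetical placeholders chosen for the example, not results from the study, and the function names are illustrative only.

```python
import math


def aic(deviance, n_params):
    """Akaike information criterion: deviance + 2k."""
    return deviance + 2 * n_params


def bic(deviance, n_params, n_obs):
    """Bayesian information criterion: deviance + k * ln(N)."""
    return deviance + n_params * math.log(n_obs)


# Hypothetical fit summaries (NOT taken from the study).
models = {
    "unidimensional": {"deviance": 51200.0, "n_params": 40},
    "within-item": {"deviance": 50900.0, "n_params": 43},
}
n_obs = 6000  # hypothetical number of examinees


def compare(models, n_obs):
    """Return {model name: (AIC, BIC)} for each fitted model."""
    return {
        name: (
            aic(m["deviance"], m["n_params"]),
            bic(m["deviance"], m["n_params"], n_obs),
        )
        for name, m in models.items()
    }


results = compare(models, n_obs)
# Lower values indicate the preferred model under each criterion.
best_by_aic = min(results, key=lambda name: results[name][0])
best_by_bic = min(results, key=lambda name: results[name][1])
```

Both criteria penalize extra parameters, so a multidimensional model is preferred only when its drop in deviance outweighs the cost of the additional variance and covariance terms.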






How to Cite

Yılmaz Koğar, E., & Koğar, H. (2018). EXAMINATION OF DIMENSIONALITY AND LATENT TRAIT SCORES ON MIXED-FORMAT TESTS. PUPIL: International Journal of Teaching, Education and Learning, 2(1), 29–49. https://doi.org/10.20319/pijtel.2018.21.2949