EXAMINATION OF DIMENSIONALITY AND LATENT TRAIT SCORES ON MIXED-FORMAT TESTS
DOI: https://doi.org/10.20319/pijtel.2018.41.165185
Keywords: Multiple Choice Items, Constructed Response Items, Mixed Format Tests, Multidimensionality, Latent Dimension Score
Abstract
The aim of the present study is to examine various item types used to measure mathematics achievement in terms of dimensionality and latent trait scores. The data collection instruments were the student questionnaires and the mathematics achievement tests developed to measure 4th and 8th grade students' mathematics achievement in TIMSS 2015. The study assumes that two dimensions are formed: the combination of multiple-choice (MC) and constructed-response (CR) items forms a "math ability" dimension, and the CR items alone form a "CR ability" dimension. To determine the dimensionality of the latent math ability trait, three IRT models were compared: a unidimensional model, a within-item multidimensional model, and a between-item multidimensional model. The within-item model fit the data better than the unidimensional model, and it was also preferred according to AIC and BIC. Ability parameter estimates from the unidimensional and within-item models were similar. The effects of the sense of school belonging and students' confidence in mathematics variables on the primary trait were significant, and at the 8th grade the home resources for learning variable also had a significant effect.
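As an illustration of the model comparison described in the abstract, the short Python sketch below shows how a unidimensional and a within-item two-dimensional calibration of the same responses could be compared with AIC and BIC once each model's maximized log-likelihood and parameter count are known. This is not the authors' code or analysis output; it assumes only the standard AIC and BIC formulas, and all numeric values (log-likelihoods, parameter counts, sample size) are hypothetical placeholders.

# Minimal illustrative sketch (not from the study): comparing two IRT
# calibrations of the same responses by AIC and BIC.
import math

def aic(log_lik, n_params):
    # AIC = 2k - 2*ln(L); smaller values indicate better relative fit.
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    # BIC = k*ln(n) - 2*ln(L); penalizes extra parameters more strongly as n grows.
    return n_params * math.log(n_obs) - 2 * log_lik

# Hypothetical results of two calibrations:
#   unidimensional: every item (MC and CR) loads on one "math ability" dimension
#   within-item 2D: MC items load on "math ability"; CR items load on both
#                   "math ability" and an additional "CR ability" dimension
fits = {
    "unidimensional": {"log_lik": -45210.3, "n_params": 40},
    "within-item 2D": {"log_lik": -45102.8, "n_params": 43},
}
n_students = 6000  # hypothetical sample size

for name, fit in fits.items():
    print(name,
          "AIC = %.1f" % aic(fit["log_lik"], fit["n_params"]),
          "BIC = %.1f" % bic(fit["log_lik"], fit["n_params"], n_students))
# The calibration with the smaller AIC/BIC is preferred; the study reports that
# the within-item model was favored over the unidimensional model on both criteria.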
License
Copyright (c) 2018 Esin Yılmaz Koğar, Hakan Koğar
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.