Equivalence in multilanguage mathematics assessment
DOI: https://doi.org/10.7146/nomad.v28i1-2.149250
Abstract
When mathematics tasks are used in multilanguage assessments, the task versions in the different languages must be equivalent. The purpose of this study is to deepen knowledge of different aspects of equivalence for mathematics tasks in multilanguage assessment. We analyze mathematics tasks from PISA 2012 given to students in English, German, and Swedish. To measure formal equivalence, we examine three linguistic features of the task texts and compare them across the language versions. To measure functional equivalence, we conduct a differential item functioning (DIF) analysis. In addition, we statistically examine whether there is a relation between DIF and the differences in linguistic features. The results show both DIF and differences in the linguistic features between language versions for several PISA tasks. However, we found no statistical relation between the two phenomena.
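The abstract describes the DIF analysis only at a high level and does not specify an implementation. Purely as an illustration, the sketch below shows one standard DIF procedure, logistic-regression DIF with a total-score matching variable, run on simulated data; the variable names and the data are hypothetical and not taken from the study.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Simulated data (hypothetical, for illustration only):
# two language groups, a latent ability, a noisy total score as
# matching variable, and one item with a built-in group effect.
rng = np.random.default_rng(0)
n = 400
group = rng.integers(0, 2, n)            # 0 = reference language, 1 = focal language
ability = rng.normal(0.0, 1.0, n)
total = ability + rng.normal(0.0, 0.5, n)  # matching variable (total-score proxy)
logit = 1.2 * ability - 0.4 * group        # item disadvantages the focal group
item = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Three nested logistic models:
#   m1: matching variable only
#   m2: + group main effect (tests uniform DIF)
#   m3: + group x matching interaction (tests non-uniform DIF)
X1 = sm.add_constant(np.column_stack([total]))
X2 = sm.add_constant(np.column_stack([total, group]))
X3 = sm.add_constant(np.column_stack([total, group, total * group]))
m1 = sm.Logit(item, X1).fit(disp=0)
m2 = sm.Logit(item, X2).fit(disp=0)
m3 = sm.Logit(item, X3).fit(disp=0)

# Likelihood-ratio chi-square tests between nested models (df = 1 each)
lr_uniform = 2 * (m2.llf - m1.llf)
lr_nonuniform = 2 * (m3.llf - m2.llf)
print("uniform DIF p =", stats.chi2.sf(lr_uniform, 1))
print("non-uniform DIF p =", stats.chi2.sf(lr_nonuniform, 1))
```

In this setup, a significant group term flags uniform DIF (a constant advantage for one language group at every ability level), while a significant interaction flags non-uniform DIF (an advantage that changes with ability).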
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.