Interrater reliability in a national assessment of oral mathematical communication
DOI: https://doi.org/10.7146/nomad.v13i2.148118

Abstract
Mathematical communication, oral and written, is generally regarded as an important aspect of mathematics and mathematics education. This implies that oral mathematical communication should also play a part in various kinds of assessments. But oral assessments of subject matter knowledge or communication abilities, in education and elsewhere, often display reliability problems, which complicate their use. In mathematics education, research on the reliability of oral assessments is comparatively rare, and this lack of research is particularly striking when it comes to the assessment of mathematical communication abilities. This study analyses the interrater reliability of the assessment of oral mathematical communication in a Swedish national test for upper secondary level. The results show that the assessment does suffer from interrater reliability problems. In addition, the difficulty of assessing this construct reliably does not seem to stem mainly from the communication aspect itself, but from shortcomings in the model employed to assess the construct.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.