Assessing authentic tasks: alternatives to mark-schemes

Authors

  • Dylan Wiliam

DOI:

https://doi.org/10.7146/nomad.v2i1.145005

Abstract

The kinds of authentic tasks that have been used in national assessments in England and Wales over the last thirty years (typically open-ended, 'pure' investigative tasks) are described, and the marking schemes used to assess them are classified as either task-specific or generic. Generic schemes are further classified according to whether they give most emphasis to the 'degree of difficulty' of the task or to the 'extent of progress' through it. A view of validation is presented that requires consideration of the value implications and social consequences of implementing assessment procedures, and it is argued that both task-specific and generic schemes will have the effect of stereotyping students' approaches to these tasks. An alternative paradigm to norm-referenced and criterion-referenced interpretations of assessments, termed 'construct-referenced' assessment, is proposed as more consistent with the rationale behind such authentic assessments. Suggestions are made for implementing such a system, and indices derived from signal-detection theory are proposed as appropriate measures for evaluating the accuracy of such assessments.
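The abstract does not say which signal-detection indices it has in mind, so the following is only an illustrative sketch of two standard ones, the sensitivity index d' and the area under the ROC curve, applied to assessor judgements. The framing is an assumption, not the paper's own procedure: each piece of student work either does or does not truly meet a standard, and an assessor either awards the standard or withholds it. A "hit" is an award to work that meets the standard, and a "false alarm" is an award to work that does not. The function name `sdt_indices` and the example counts are hypothetical.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, auc) for a 2x2 table of assessor decisions.

    Hypothetical framing: 'hits' and 'misses' count work that truly meets
    the standard (awarded / not awarded); 'false_alarms' and
    'correct_rejections' count work that does not (awarded / not awarded).
    """
    counts = (hits, misses, false_alarms, correct_rejections)
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")

    # Log-linear correction: add 0.5 to each cell so that neither rate is
    # exactly 0 or 1, where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    std_normal = NormalDist()  # mean 0, standard deviation 1
    # d' is the separation, in standard-deviation units, between the
    # 'meets standard' and 'does not meet standard' distributions.
    d_prime = std_normal.inv_cdf(hit_rate) - std_normal.inv_cdf(fa_rate)

    # Under the equal-variance Gaussian model, the area under the ROC curve
    # implied by a single operating point is Phi(d' / sqrt(2)).
    auc = std_normal.cdf(d_prime / 2 ** 0.5)
    return d_prime, auc


# Hypothetical example: 50 scripts that meet the standard (40 awarded) and
# 50 that do not (8 wrongly awarded).
d_prime, auc = sdt_indices(40, 10, 8, 42)
print(f"d' = {d_prime:.2f}, ROC area = {auc:.3f}")
```

An appeal of indices like these in a construct-referenced setting is that they separate how well assessors discriminate (d', ROC area) from how lenient or strict they are. A simple percentage of agreement would mix the two.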

Published

1994-03-01

How to Cite

Wiliam, D. (1994). Assessing authentic tasks: alternatives to mark-schemes. NOMAD Nordic Studies in Mathematics Education, 2(1), 48–68. https://doi.org/10.7146/nomad.v2i1.145005

Section

Articles