Morten Tannert, Aarhus University



computational linguistics, digital methods, writing research, learner corpora, L1 writing, education


With the rapid increase in the number of digital texts available in schools, new methodological approaches to studying writing development in education are now emerging. However, new methodological approaches bring new epistemological challenges. In this article, I examine some of these challenges and discuss how they affect the role of computational linguistics within the field of educational writing research. The article is structured around three main sections. First, I position computational linguistics within the wider field of educational writing research, with particular focus on L1 writing and K-12 education. Second, I discuss the extent to which methods from computational linguistics can provide new insights into different aspects of educational writing. Third, I discuss the potential of the concept of affordance to bridge technology-centered and human-centered methodological approaches, and I relate this idea to recent theoretical developments in the digital humanities. Based on this discussion, I conclude the article with suggestions for possible directions in future writing research.





