Large Language Models and Biblical Hebrew: Limitations, pitfalls, opportunities


  • Camil Staps, Leiden University / Radboud University Nijmegen



Large Language Models, machine learning, methodology, Biblical Hebrew


Researchers have long relied on computational methods to study Biblical Hebrew. The recent improvements to, and easy availability of, Large Language Models (LLMs) such as GPT raise the question of whether these models can be useful for our work as well. This paper tempers expectations, showing that a critical analysis of earlier work exposes fundamental issues with methods involving GPT. However, depending on the task at hand, a way forward with machine learning methods is possible once we are aware of the limitations.


Amgoud, Leila. 2023. ‘Explaining black-box classifiers: Properties and functions’. International Journal of Approximate Reasoning 155:40–65.

Assael, Yannis, Thea Sommerschield, Brendan Shillingford, Mahyar Bordbar, John Pavlopoulos, Marita Chatzipanagiotou, Ion Androutsopoulos, Jonathan Prag & Nando de Freitas. 2022. ‘Restoring and attributing ancient texts using deep neural networks’. Nature 603(7900):280–283.

Bender, Emily M., Timnit Gebru, Angelina McMillan-Major & Shmargaret Shmitchell. 2021. ‘On the dangers of stochastic parrots: Can language models be too big? 🦜’. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. ACM.

Bird, Steven & Haejoong Lee. 2007. ‘Graphical query for linguistic treebanks’. In Proceedings, PACLING 2007 – 10th Conference of the Pacific Association for Computational Linguistics. Melbourne.

Elrod, A. G. 2023. ‘Nothing new under the sun? The study of Biblical Hebrew in the era of generative pre-trained AI’. HIPHIL Novum 8(2):1–32.

Elrod, A. G. 2024. ‘Uncovering theological and ethical biases in LLMs: An integrated hermeneutical approach employing texts from the Hebrew Bible’. HIPHIL Novum 9(1):2–45.

Naaijer, Martijn, Constantijn Sikkel, Mathias Coeckelbergs, Jisk Attema & Willem Th. van Peursen. 2023. ‘A Transformer-based parser for Syriac morphology’. In Proceedings of the Ancient Language Processing Workshop associated with RANLP-2023, 23–29.

OpenAI. 2023. ‘GPT-4 technical report’.

Rillig, Matthias C., Marlene Ågerstrand, Mohan Bi, Kenneth A. Gould & Uli Sauerland. 2023. ‘Risks and benefits of Large Language Models for the environment’. Environmental Science & Technology 57(9):3464–3466.

Roorda, Dirk. 2017–2022. Annotation/text-fabric. Zenodo.

Roorda, Dirk, Christiaan Erwich, Cody Kingham & SeHoon Park. 2017–2023. ETCBC/bhsa. Zenodo.

Rudin, Cynthia. 2019. ‘Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead’. Nature Machine Intelligence 1(5):206–215.

Strubell, Emma, Ananya Ganesh & Andrew McCallum. 2019. ‘Energy and policy considerations for deep learning in NLP’. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3645–3650. Florence: Association for Computational Linguistics.

Tunstall, Lewis, Leandro von Werra & Thomas Wolf. 2022. Natural Language Processing with Transformers: Building language applications with Hugging Face. Beijing: O’Reilly.

Van de Bijl, Etienne P., Cody Kingham, Wido van Peursen & Sandjai Bhulai. 2019. ‘A probabilistic approach to syntactic variation in Biblical Hebrew’.

Van der Schans, Yanniek, David Ruhe, Wido van Peursen & Sandjai Bhulai. 2020. ‘Clustering biblical texts using recurrent neural networks’.

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser & Illia Polosukhin. 2017. ‘Attention is all you need’. Advances in Neural Information Processing Systems 30 (NIPS 2017).

Wilson-Wright, Aren. 2023. ‘COHeN’, version 82ff154, retrieved March 18, 2024.

Young, Ian & Robert Rezetko. 2008. Linguistic dating of biblical texts. London: Equinox.




How to Cite

Staps, C. (2024). Large Language Models and Biblical Hebrew: Limitations, pitfalls, opportunities. HIPHIL Novum, 9(1), 46–55.