Large Language Models and Biblical Hebrew: Limitations, pitfalls, opportunities
DOI: https://doi.org/10.7146/hn.v9i1.144177
Keywords: Large Language Models, machine learning, methodology, Biblical Hebrew
Abstract
Researchers have long relied on computational methods to study Biblical Hebrew. The recent improvements to, and easy availability of, Large Language Models (LLMs) such as GPT raise the question of whether these models can be useful for our work as well. This paper tempers expectations: a critical analysis of earlier work exposes fundamental issues with methods involving GPT. However, depending on the task at hand, a way forward with machine learning methods is possible, once we are aware of the limitations.
License
Copyright (c) 2024 Camil Staps
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
From volume 9 (2024) onward, articles published in HIPHIL Novum are licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0). The editorial board may accept other Creative Commons licenses for individual articles if required by funding bodies, e.g. the European Research Council. As of volume 9, authors retain copyright to their articles and grant HIPHIL Novum the right of first publication. Authors also retain copyright to earlier versions of their articles, such as the submitted and the accepted manuscript. Authors and readers may use, reuse, and build upon the published work, including for text or data mining or any other lawful purpose, as long as appropriate attribution is maintained.
Articles in volumes 1–8 are not licensed under Creative Commons; in these volumes, all rights are reserved to the respective authors of the articles. This means that readers may download, read, and link to the articles, but may not republish them. Authors may post the published version of their article to their personal website, institutional repository, or a repository required by their funding agency as part of a green open access policy.