Large Language Models and Biblical Hebrew: Limitations, pitfalls, opportunities
DOI: https://doi.org/10.7146/hn.v9i1.144177
Keywords: Large Language Models, machine learning, methodology, Biblical Hebrew
Abstract
Researchers have long relied on computational methods to study Biblical Hebrew. The recent improvements to, and easy availability of, Large Language Models (LLMs) such as GPT raise the question of whether these models can be useful for our work as well. This paper tempers expectations: a critical analysis of earlier work exposes fundamental issues with methods involving GPT. However, depending on the task at hand, a way forward with machine learning methods is possible, once we are aware of the limitations.
License
Copyright (c) 2024 Camil Staps
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
From volume 9 (2024) onward, articles published in HIPHIL Novum are licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0). The editorial board may accept other Creative Commons licenses for individual articles if required by funding bodies, e.g. the European Research Council. As of volume 9, authors retain copyright to their articles and grant HIPHIL Novum the right of first publication. Authors also retain copyright to earlier versions of their articles, such as the submitted and the accepted manuscript. Authors and readers may use, reuse, and build upon the published work, including for text or data mining or any other lawful purpose, as long as appropriate attribution is maintained.
Articles in volumes 1–8 are not licensed under Creative Commons; in these volumes, all rights are reserved to the respective authors of the articles. This means that readers may download, read, and link to the articles, but may not republish them. Authors may post the published version of their article to their personal website, institutional repository, or a repository required by their funding agency as part of a green open access policy.