Representativeness and biases in Icelandic corpora

Authors

  • Einar Freyr Sigurðsson
  • Steinþór Steingrímsson

DOI:

https://doi.org/10.7146/ln.v1i31.151291

References

Dictionaries and corpora

Cambridge Dictionary. dictionary.cambridge.org (April 2024).

Collins English Dictionary. www.collinsdictionary.com (April 2024).

Icelandic Gigaword Corpus = Barkarson, Starkaður, Steinþór Steingrímsson, Þórdís Dröfn Andrésdóttir, Hildur Hafsteinsdóttir, Finnur Ágúst Ingimundarson & Árni Davíð Magnússon (2022): Icelandic Gigaword Corpus (IGC-2022) – annotated version. CLARIN-IS. hdl.handle.net/20.500.12537/254 (April 2024).

Merriam-Webster.com. merriam-webster.com (April 2024).

Tímarit.is. Landsbókasafn Íslands – Háskólabókasafn. timarit.is (February 2024).

Other references

Barkarson, Starkaður, Steinþór Steingrímsson & Hildur Hafsteinsdóttir (2022): Evolving large text corpora: Four versions of the Icelandic Gigaword Corpus. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille: European Language Resources Association. 2371–2381. aclanthology.org/2022.lrec-1.254.

Beelen, Kaspar, Jon Lawrence, Daniel C.S. Wilson & David Beavan (2022): Bias and representativeness in digitized newspaper collections: Introducing the environmental scan. In: Digital Scholarship in the Humanities 38, 1–22. https://doi.org/10.1093/llc/fqac037.

Bender, Emily M., Timnit Gebru, Angelina McMillan-Major & Shmargaret Shmitchell (2021): On the dangers of stochastic parrots: Can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21. New York, NY: Association for Computing Machinery. 610–623. https://doi.org/10.1145/3442188.3445922.

Blodgett, Su Lin, Solon Barocas, Hal Daumé III & Hanna Wallach (2020): Language (technology) is power: A critical survey of “bias” in NLP. In: Dan Jurafsky, Joyce Chai, Natalie Schluter & Joel Tetreault (eds.): Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics. 5454–5476. aclanthology.org/2020.acl-main.485.

Böðvarsson, Árni (1992): Íslenskt málfar. Reykjavík: Almenna bókafélagið.

Friðriksdóttir, Steinunn Rut & Hafsteinn Einarsson (2024): Gendered Grammar or Ingrained Bias? Exploring Gender Bias in Icelandic Language Models. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, LREC-COLING 2024. Torino: ELRA and ICCL. 7596–7610. aclanthology.org/2024.lrec-main.671.

Garg, Nikhil, Londa Schiebinger, Dan Jurafsky & James Zou (2018): Word embeddings quantify 100 years of gender and ethnic stereotypes. In: Proceedings of the National Academy of Sciences 115, E3635–E3644. www.pnas.org/doi/abs/10.1073/pnas.1720347115.

Hunston, Susan (2008): Collection strategies and design decisions. In: Anke Lüdeling & Merja Kytö (eds.): Corpus Linguistics: An International Handbook, Volume 1. Berlin: De Gruyter. 154–168.

Mikolov, Tomas, Kai Chen, Gregory S. Corrado & Jeffrey Dean (2013): Efficient estimation of word representations in vector space. In: 1st International Conference on Learning Representations. arxiv.org/abs/1301.3781.

Nefnd um konur og fjölmiðla: álit og tillögur (2001). Reykjavík: Menntamálaráðuneytið. rafhladan.is/handle/10802/6120.

Rögnvaldsson, Eiríkur (2022): Alls konar íslenska. Hundrað þættir um íslenskt mál á 21. öld. Reykjavík: Mál og menning.

Stefánsdóttir, Lilja Björk & Anton Karl Ingason (2018): A high definition study of syntactic lifespan change. In: University of Pennsylvania Working Papers in Linguistics 24, 169–178.

Stefánsdóttir, Lilja Björk & Anton Karl Ingason (2022): Einstaklingsbundin lífsleiðarbreyting: Þróun stílfærslu í þingræðum Steingríms J. Sigfússonar. In: Íslenskt mál og almenn málfræði 44, 151–178.

Steingrímsson, Steinþór, Sigrún Helgadóttir, Eiríkur Rögnvaldsson, Starkaður Barkarson & Jón Guðnason (2018): Risamálheild: A Very Large Icelandic Text Corpus. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Miyazaki: European Language Resources Association. 4361–4366. aclanthology.org/L18-1690.

Sólmundsdóttir, Agnes, Dagbjört Guðmundsdóttir, Lilja Björk Stefánsdóttir & Anton Karl Ingason (2021): Vondar vélþýðingar: Um kynjahalla í íslenskum þýðingum Google Translate. In: Ritið 3/21, 177–200. https://doi.org/10.33112/ritid.21.3.7.

Sólmundsdóttir, Agnes, Dagbjört Guðmundsdóttir, Lilja Björk Stefánsdóttir & Anton Karl Ingason (2022): Mean machine translations: On gender bias in Icelandic machine translations. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille: European Language Resources Association. 3113–3121. aclanthology.org/2022.lrec-1.333.

Vanmassenhove, Eva (2024): Gender bias in machine translation and the era of large language models. In: arXiv. arxiv.org/html/2401.10016v1.

Published

2024-12-05

How to Cite

Sigurðsson, E. F., & Steingrímsson, S. (2024). Representativeness and biases in Icelandic corpora. LexicoNordica, 1(31). https://doi.org/10.7146/ln.v1i31.151291

Section

Thematic contributions (Tematiske bidrag)