Dansk betydningsinventar i et datalingvistisk perspektiv

Authors

  • Bolette Sandford Pedersen
  • Sanni Nimb
  • Sussi Olsen

DOI:

https://doi.org/10.7146/danskestudier.vi2021.134544

Keywords:

Computational linguistics

Abstract

In this paper we investigate the Danish sense inventory from both a paradigmatic and a syntagmatic perspective, and we present a collection of related lexical-semantic resources developed in collaboration between the Society for Danish Language and Literature and the University of Copenhagen. The resources comprise a Danish wordnet (DanNet), the Danish FrameNet Lexicon, and the Danish Sentiment Lexicon. All three are designed to enable semantic processing in digital humanities research and, more broadly, in language-centric technology development. Finally, to illustrate how the resources are used when processing running text, we provide annotation examples for each resource.

Published

2022-11-08

Citation/Export

Pedersen, B. S., Nimb, S., & Olsen, S. (2022). Dansk betydningsinventar i et datalingvistisk perspektiv. Danske Studier, (2021), 72–106. https://doi.org/10.7146/danskestudier.vi2021.134544

Section

Articles