EMMA: Danish Natural-Language Processing of Emotion in Text

The new State-of-the-Art in Danish Sentiment Analysis and a Multidimensional Emotional Sentiment Validation Dataset

  • Esben Kran Aarhus University
  • Søren Orm Aarhus University
Keywords: Sentiment Analysis, Danish NLP, Computational Linguistics, Dataset, Open Science

Abstract

Sentiment analysis (SA) is the field concerned with computationally analysing emotion in text. For example, SA can be used to track the sentiment of a company’s mentions on Twitter or to analyse how positive a book is. In this paper, we contribute to this field in two ways. First, we further develop the existing tool Sentida (Lauridsen et al., 2019), which was originally developed to score valence in text. Valence is the amount of positivity in a text, e.g. a review. Our new version is more aware of punctuation and syntax than the earlier version and classifies valence significantly better than it in three different validation datasets (p < 0.01). Second, we develop Emma (Emotional Multidimensional Analysis), a test dataset that future developers of SA tools can use. In Emma, we supplement valence with three further emotional dimensions, intensity, dominance, and utility, in a dataset of sentences scored by human coders on all four dimensions. The emotional dimensions are based on work in cognitive psychology over the last 65 years.

With Emma, we present both a more reliable validation dataset and the possibility of further improving Danish SA by using the dataset to train a neural network to analyse more complex emotions in text. The current standard is a one-dimensional classification of positivity in text; classifying along the four dimensions of the Emma dataset instead reveals much more complex emotions in texts. To allow others to work with Sentida and Emma, we help update the currently available version of Sentida optimized for Python and publish Emma on GitHub.
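
As an illustration of what a sentence scored by human coders on four dimensions could look like in code, the following is a minimal sketch only. The `Rating` class, the `aggregate` helper, the numeric scale, and the example values are all hypothetical and are not taken from the published Emma dataset or from Sentida; averaging across coders is likewise an assumption about how such ratings might be combined.

```python
from dataclasses import dataclass
from statistics import mean

# The four emotional dimensions named in the abstract.
DIMENSIONS = ("valence", "intensity", "dominance", "utility")


@dataclass(frozen=True)
class Rating:
    """One coder's scores for one sentence (hypothetical layout and scale)."""
    valence: float
    intensity: float
    dominance: float
    utility: float


def aggregate(ratings):
    """Average several coders' ratings into one score per dimension.

    Assumption: a plain mean across coders; the paper may combine
    ratings differently.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    return {dim: mean(getattr(r, dim) for r in ratings) for dim in DIMENSIONS}


# Made-up ratings from two hypothetical coders for a single sentence.
example = [
    Rating(valence=0.5, intensity=0.25, dominance=0.0, utility=0.75),
    Rating(valence=1.0, intensity=0.75, dominance=0.5, utility=0.25),
]
scores = aggregate(example)
print(scores)
```

A one-dimensional valence classifier would only be compared against the `valence` entry of such a record, whereas a four-dimensional model could be evaluated against all four entries.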

Author Biographies

Esben Kran, Aarhus University

Esben Kran is affiliated with the Aarhus Cognitive Science program and is interested in utilizing machine learning and neuroscience technologies.

Søren Orm, Aarhus University

Søren Orm is affiliated with the Aarhus Cognitive Science program and is interested in cognitive neuroscience and human behaviour.

References

Alsawaier, R. S. (2018). The effect of gamification on motivation and engagement. The International Journal of Information and Learning Technology, 35(1), 56–79. https://doi.org/10.1108/IJILT-02-2017-0009
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching Word Vectors with Subword Information. ArXiv:1607.04606 [Cs]. http://arxiv.org/abs/1607.04606
Bradley, M. M., & Lang, P. J. (1999). Affective norms for English words (ANEW): Stimuli, instruction manual and affective ratings (Technical Report No. C-1). Gainesville, FL: NIMH Center for Research in Psychophysiology, University of Florida.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv:1810.04805 [Cs]. http://arxiv.org/abs/1810.04805
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3–4), 169–200. https://doi.org/10.1080/02699939208411068
Enevoldsen, K. C., & Hansen, L. (2017). Analysing Political Biases in Danish Newspapers Using Sentiment Analysis.
Gamer, M., Lemon, J., & Singh, I. F. P. (2019). irr: Various Coefficients of Interrater Reliability and Agreement (0.84.1) [Computer software]. https://CRAN.R-project.org/package=irr
Goldberg, Y., & Levy, O. (2014). word2vec Explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. ArXiv:1402.3722 [Cs, Stat]. http://arxiv.org/abs/1402.3722
Grave, E., Bojanowski, P., Gupta, P., Joulin, A., & Mikolov, T. (2017). Learning Word Vectors for 157 Languages.
Guscode. (2019). Guscode/Sentida [R]. https://github.com/Guscode/Sentida (Original work published 2019)
Hasson, U., Nastase, S. A., & Goldstein, A. (2020). Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks. Neuron, 105(3), 416–434. https://doi.org/10.1016/j.neuron.2019.12.002
Heck, R., Vuculescu, O., Sørensen, J. J., Zoller, J., Andreasen, M. G., Bason, M. G., Ejlertsen, P., Elíasson, O., Haikka, P., Laustsen, J. S., Nielsen, L. L., Mao, A., Müller, R., Napolitano, M., Pedersen, M. K., Thorsen, A. R., Bergenholtz, C., Calarco, T., Montangero, S., & Sherson, J. F. (2018). Remote optimization of an ultracold atoms experiment by experts and citizen scientists. Proceedings of the National Academy of Sciences, 115(48), E11231–E11237. https://doi.org/10.1073/pnas.1716869115
Hepach, R., Kliemann, D., Grüneisen, S., Heekeren, H. R., & Dziobek, I. (2011). Conceptualizing Emotions Along the Dimensions of Valence, Arousal, and Communicative Frequency – Implications for Social-Cognitive Tests and Training Tools. Frontiers in Psychology, 2. https://doi.org/10.3389/fpsyg.2011.00266
Hoang, M., Bihorac, O. A., & Rouces, J. (2019, September 30). Aspect-Based Sentiment Analysis using BERT. Proceedings of the 22nd Nordic Conference on Computational Linguistics. https://www.aclweb.org/anthology/W19-6120.pdf
Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 328–339. https://doi.org/10.18653/v1/P18-1031
Hutto, C. J. (2019). Cjhutto/vaderSentiment [Python]. https://github.com/cjhutto/vaderSentiment (Original work published 2014)
Hutto, C. J., & Gilbert, E. (2014, May 16). VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Eighth International AAAI Conference on Weblogs and Social Media. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8109
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of Tricks for Efficient Text Classification. ArXiv:1607.01759 [Cs]. http://arxiv.org/abs/1607.01759
Krippendorff, K. (2004). Content analysis: An introduction to its methodology. Thousand Oaks, Calif.: Sage.
Lauridsen, G. A., Dalsgaard, J. A., & Svendsen, L. K. B. (2019). SENTIDA: A New Tool for Sentiment Analysis in Danish. Journal of Language Works - Sprogvidenskabeligt Studentertidsskrift, 4(1), 38–53.
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167. https://doi.org/10.2200/S00416ED1V01Y201204HLT016
Liu, N., Shen, B., Zhang, Z., Zhang, Z., & Mi, K. (2019). Attention-based Sentiment Reasoner for aspect-based sentiment analysis. Human-Centric Computing and Information Sciences, 9(1), 35. https://doi.org/10.1186/s13673-019-0196-3
Maas, A. L., Ng, A. Y., & Potts, C. (2012). Multi-Dimensional Sentiment Analysis with Learned Representations.
Mäntylä, M. V., Graziotin, D., & Kuutila, M. (2018). The evolution of sentiment analysis—A review of research topics, venues, and top cited papers. Computer Science Review, 27, 16–32. https://doi.org/10.1016/j.cosrev.2017.10.002
Mehrabian, A. (1980). Basic dimensions for a general psychological theory: Implications for personality, social, environmental, and developmental studies. Cambridge: Oelgeschlager, Gunn & Hain. http://archive.org/details/basicdimensionsf0000mehr
Mekler, E. D., Brühlmann, F., Tuch, A. N., & Opwis, K. (2017). Towards understanding the effects of individual gamification elements on intrinsic motivation and performance. Computers in Human Behavior, 71, 525–534. https://doi.org/10.1016/j.chb.2015.08.048
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. ArXiv:1301.3781 [Cs]. http://arxiv.org/abs/1301.3781
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 3111–3119). Curran Associates, Inc. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Munikar, M., Shakya, S., & Shrestha, A. (2019). Fine-grained Sentiment Classification using BERT.
Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. ArXiv:1103.2903 [Cs]. http://arxiv.org/abs/1103.2903
Nielsen, F. Å. (2017, April 28). AFINN. AFINN. http://www2.compute.dtu.dk/pubdb/views-/edoc_download.php/6975/pdf/imm6975.pdf
Nielsen, F. Å. (2019). Danish resources. https://www2.imm.dtu.dk/pubdb/views-/edoc_download.php/6956/pdf/imm6956.pdf
Nielsen, F. Å. (2019). Fnielsen/afinn [Jupyter Notebook]. https://github.com/fnielsen/afinn (Original work published 2015)
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. University of Illinois Press.
Pedersen, M. K., Rasmussen, N. R., Sherson, J. F., & Basaiawmoit, R. V. (2017). Leaderboard Effects on Player Performance in a Citizen Science Game.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. ArXiv:1802.05365 [Cs]. http://arxiv.org/abs/1802.05365
Plutchik, R. (2001). The Nature of Emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist, 89(4), 344–350.
R Core Team. (2013). R: A language and environment for statistical computing [R]. R Foundation for Statistical Computing. http://www.R-project.org/
Rana, T. A., & Cheah, Y.-N. (2016). Aspect extraction in sentiment analysis: Comparative analysis and survey. Artificial Intelligence Review, 46(4), 459–483. https://doi.org/10.1007/s10462-016-9472-z
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. https://doi.org/10.1037/h0077714
Shafie, A. S., Sharef, N. M., Azmi Murad, M. A., & Azman, A. (2018). Aspect Extraction Performance with POS Tag Pattern of Dependency Relation in Aspect-based Sentiment Analysis. 2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP), 1–6. https://doi.org/10.1109/INFRKM.2018.8464692
Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.
Strømberg-Derczynski, L., Baglini, R., Christiansen, M. H., Ciosici, M. R., Dalsgaard, J. A., Fusaroli, R., Henrichsen, P. J., Hvingelby, R., Kirkedal, A., Kjeldsen, A. S., Ladefoged, C., Nielsen, F. Å., Petersen, M. L., Rystrøm, J. H., & Varab, D. (2020). The Danish Gigaword Project. ArXiv:2005.03521 [Cs]. http://arxiv.org/abs/2005.03521
Trnka, R., Lačev, A., Balcar, K., Kuška, M., & Tavel, P. (2016). Modeling Semantic Emotion Space Using a 3D Hypercube-Projection: An Innovative Analytical Approach for the Psychology of Emotions. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.00522
Wang, J., Yu, L.-C., Lai, K. R., & Zhang, X. (2016). Dimensional Sentiment Analysis Using a Regional CNN-LSTM Model. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 225–230. https://doi.org/10.18653/v1/P16-2037
Watson, D., & Tellegen, A. (1985). Toward a Consensual Structure of Mood. Psychological Bulletin, 98(2), 219–235. https://doi.org/10.1037/0033-2909.98.2.219
Published
2020-07-03
How to Cite
Kran, E., & Orm, S. (2020). EMMA: Danish Natural-Language Processing of Emotion in Text. Journal of Language Works - Sprogvidenskabeligt Studentertidsskrift, 5(1), 92–110. Retrieved from https://tidsskrift.dk/lwo/article/view/121221