MULTILINGUAL SENTIMENT NORMALIZATION FOR SCANDINAVIAN LANGUAGES
DOI: https://doi.org/10.7146/sss.v12i1.130068
Keywords: multilingual sentiment, sentiment lexicon, normalization

Abstract
In this paper, we address the challenge of multilingual sentiment analysis using a traditional lexicon- and rule-based sentiment instrument that is tailored to capture sentiment patterns in a particular language. Focusing on a case study of three closely related Scandinavian languages (Danish, Norwegian, and Swedish) and using three tailored versions of VADER, we measure the relative degree of variation in valence using the OPUS corpus. We find that scores for Swedish are systematically skewed lower than Danish scores for translational pairs, and that scores for Norwegian are skewed higher than those of both other languages. We use a neural network to optimize the fit between Danish, as the reference (target) language, and Norwegian and Swedish respectively.
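The normalization step described above, learning a mapping that pulls one language's valence scores onto the Danish reference scores for translation pairs, can be illustrated with a minimal numpy sketch. Everything here is an assumption for illustration: the scores are synthetic (the slope and offset merely mimic the reported skew of Swedish below Danish), and the single-hidden-layer network is a toy stand-in, not the paper's actual model or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "compound" scores for translation pairs. Slope and offset are
# illustrative stand-ins for the finding that Swedish scores sit
# systematically below the Danish reference scores.
danish = rng.uniform(-1.0, 1.0, size=(500, 1))
swedish = 0.8 * danish - 0.15 + rng.normal(0.0, 0.05, size=danish.shape)

# A single-hidden-layer network mapping Swedish scores onto Danish ones.
W1 = rng.normal(0.0, 0.5, (1, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, (8, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.1
for _ in range(2000):
    h, pred = forward(swedish)
    err = pred - danish                     # gradient of 0.5 * MSE
    dh = (err @ W2.T) * (1.0 - h ** 2)      # backprop through tanh
    W2 -= lr * (h.T @ err) / len(err); b2 -= lr * err.mean(axis=0)
    W1 -= lr * (swedish.T @ dh) / len(err); b1 -= lr * dh.mean(axis=0)

raw_mse = float(np.mean((swedish - danish) ** 2))
fit_mse = float(np.mean((forward(swedish)[1] - danish) ** 2))
print(f"MSE vs. Danish before normalization: {raw_mse:.4f}")
print(f"MSE vs. Danish after normalization:  {fit_mse:.4f}")
```

After training, the normalized Swedish scores track the Danish targets far more closely than the raw scores do, which is the shape of the result the paper reports, albeit on real lexicon output rather than synthetic data.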
References
Biewald, Lukas (2020). Experiment Tracking with Weights and Biases. URL: https://www.wandb.com/
Bollen, J., H. Mao, and X. Zeng (2011). Twitter mood predicts the stock market. In: Journal of Computational Science 2.1, pp. 1–8. DOI: https://doi.org/10.1016/j.jocs.2010.12.007
Chew, C. and G. Eysenbach (2010). Pandemics in the age of Twitter: Content analysis of tweets during the 2009 H1N1 outbreak. In: PLoS ONE 5.11. DOI: https://doi.org/10.1371/journal.pone.0014118
Devlin, Jacob et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805v2 [cs.CL]
Heise, David R. (2014). Cultural variations in sentiments. In: SpringerPlus 3.1, pp. 1–11. DOI: https://doi.org/10.1186/2193-1801-3-170
Hu, Minqing and Bing Liu (2004). Mining and Summarizing Customer Reviews. In: KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 168–177. DOI: https://doi.org/10.1145/1014052.1014073
Hutto, C., & Gilbert, E. (2014). VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. In: Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216-225. Retrieved from https://ojs.aaai.org/index.php/ICWSM/article/view/14550
Jackson, Joshua Conrad, et al. (2019). Emotion semantics show both cultural variation and universal structure. In: Science 366.6472, pp. 1517–1522. DOI: https://doi.org/10.1126/science.aaw8160
Jin, Haifeng, Qingquan Song, and Xia Hu (2019). Auto-Keras: An Efficient Neural Architecture Search System. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, pp. 1946–1956.
Lauridsen, G. A., Dalsgaard, J. A., & Svendsen, L. K. B. (2019). SENTIDA: A New Tool for Sentiment Analysis in Danish. Journal of Language Works - Sprogvidenskabeligt Studentertidsskrift, 4(1), 38–53. Retrieved from https://tidsskrift.dk/lwo/article/view/115711
Le, Q. and T. Mikolov (2014). Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2), pp.1188-1196.
Lison, Pierre and Jörg Tiedemann (2016). OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 923–929.
Mohammad, Saif M., Mohammad Salameh, and Svetlana Kiritchenko (2016). How Translation Alters Sentiment. In: The Journal of Artificial Intelligence Research 55, pp. 95–130. ISSN: 1076-9757.
Nielsen, Finn Årup (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In: arXiv preprint arXiv:1103.2903. (Visited on 01/06/2017).
O’Connor, B. et al. (2010). From tweets to polls: Linking text sentiment to public opinion time series. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, pp. 122–129.
Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan (2002). Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 conference on Empirical methods in natural language processing - EMNLP'02. Vol. 10. Association for Computational Linguistics, pp. 79–86. DOI: https://doi.org/10.3115/1118693.1118704 (Visited on 04/30/2021).
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic inquiry and word count: LIWC 2001. Mahway: Lawrence Erlbaum Associates, 71(2001).
Picard, Rosalind W. (1997). Affective Computing. English. Second Edition 1998. Cambridge, Mass: The MIT Press. ISBN: 978-0-262-16170-1.
Reagan, A. et al. (2015). Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs. In: arXiv preprint arXiv:1512.00531.
Rouces, Jacobo et al. (2018). SenSALDO: Creating a Sentiment Lexicon for Swedish. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), p. 7.
Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., & Manning, C. D. (2011). Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the 2011 conference on empirical methods in natural language processing (pp. 151-161).
Thelwall, M., K. Buckley, and G. Paltoglou (2011). Sentiment in Twitter events. In: Journal of the American Society for Information Science and Technology 62(2), pp. 406–418. DOI: https://doi.org/10.1002/asi.21462
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2011). Sentiment strength detection in short informal text (erratum to vol. 61, p. 2544, 2010). In: Journal of the American Society for Information Science and Technology 62(2), pp. 419–419. DOI: https://doi.org/10.1002/asi.21416
Tiedemann, J. (2012). Parallel Data, Tools and Interfaces in OPUS. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). Istanbul, Turkey: European Language Resources Association (ELRA), pp. 2214–2218.
Tumasjan, A., Sprenger, T., Sandner, P., & Welpe, I. (2010, May). Predicting elections with twitter: What 140 characters reveal about political sentiment. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 4, No. 1).
Turc, Iulia et al. (2019). Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. In: arXiv:1908.08962 [cs]. arXiv: 1908.08962. (Visited on 04/30/2021).
Wang, A. et al. (2019). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: arXiv:1804.07461 [cs]. arXiv: 1804.07461.