SENTIDA: A New Tool for Sentiment Analysis in Danish

Authors

  • Gustav Aarup Lauridsen, Aarhus Universitet
  • Jacob Aarup Dalsgaard, Aarhus Universitet
  • Lars Kjartan Bacher Svendsen, Aarhus Universitet

Keywords:

Sentiment analysis, Danish, automated text analysis, Data Linguistics, SENTIDA

Abstract

In the midst of the Era of Big Data, tools for analysing and processing unstructured data are needed more than ever. Sentiment analysis is one such tool, and it has seen both a substantial rise in popularity and major technical progress. However, sentiment analysis tools for Danish have not developed as rapidly as those for English, for example. Few Danish tools exist, and the ones available are often either ineffective or outdated. Moreover, authoritative validation tests for low-resource languages are missing, which is why little can be deduced about the competence of current Danish models. We present SENTIDA, a simple and effective model for general sentiment analysis in Danish, and compare its competence to AFINN, the current benchmark within the field of Danish sentiment analysis. We construct SENTIDA by combining a lexical approach with several incorporated functions, and categorise it as a domain-independent sentiment analysis tool focusing on polarity strength. Subsequently, we run different validation tests, including a binary classification test of Trustpilot reviews and a correlation test based on manually rated texts from different domains. The results show that SENTIDA excels across all tests, predicting reviews with an accuracy above 80% in all trials and providing significant correlations with manually annotated texts.
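The general idea of a lexical approach and a binary classification test can be illustrated with a minimal Python sketch. It is not SENTIDA's implementation: the word list `TOY_LEXICON`, the negation cues in `NEGATIONS`, and the helpers `polarity_score` and `binary_accuracy` are all hypothetical, and the negation rule is an assumed stand-in for the "incorporated functions" the abstract mentions without detailing.

```python
import re

# Hypothetical toy lexicon (Danish word -> polarity strength); not SENTIDA's word list.
TOY_LEXICON = {"god": 2.0, "fantastisk": 4.0, "dårlig": -2.0, "elendig": -4.0}
# Assumed negation cues, for illustration only.
NEGATIONS = {"ikke", "aldrig"}


def polarity_score(text):
    """Mean polarity of the lexicon words in text.

    A negation cue flips the sign of the next word found in the lexicon.
    Returns 0.0 when no lexicon word occurs.
    """
    tokens = re.findall(r"\w+", text.lower())
    scores = []
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        if token in TOY_LEXICON:
            value = TOY_LEXICON[token]
            scores.append(-value if negate else value)
            negate = False
    return sum(scores) / len(scores) if scores else 0.0


def binary_accuracy(texts, labels):
    """Share of texts whose predicted polarity (score > 0) matches the label."""
    predictions = [polarity_score(text) > 0 for text in texts]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Under these assumptions, `binary_accuracy(reviews, labels)` takes a list of review texts and a matching list of booleans (True for a positive review) and returns the fraction classified correctly, which is the kind of figure a binary classification test reports.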

Published

2019-09-02

Citation/Export

Lauridsen, G. A., Dalsgaard, J. A., & Svendsen, L. K. B. (2019). SENTIDA: A New Tool for Sentiment Analysis in Danish. Journal of Language Works - Sprogvidenskabeligt Studentertidsskrift, 4(1), 38–53. Retrieved from https://tidsskrift.dk/lwo/article/view/115711