SENTIDA: A New Tool for Sentiment Analysis in Danish


  • Gustav Aarup Lauridsen, Aarhus University
  • Jacob Aarup Dalsgaard, Aarhus University
  • Lars Kjartan Bacher Svendsen, Aarhus University


Sentiment analysis, Danish, automated text analysis, Data Linguistics, SENTIDA


In the midst of the Era of Big Data, tools for analysing and processing unstructured data are needed more than ever. Among these, sentiment analysis has seen both a substantial rise in popularity and major developmental progress. However, sentiment analysis tools for Danish have not developed as rapidly as those for, e.g., English. Few Danish tools exist, and those available are often either ineffective or outdated. Moreover, authoritative validation tests for low-resource languages are missing, which is why little can be deduced about the competence of current Danish models. We present SENTIDA, a simple and effective model for general sentiment analysis in Danish, and compare its competence to AFINN, the current benchmark within the field of Danish sentiment analysis. Combining a lexical approach with several incorporated functions, we construct SENTIDA and categorise it as a domain-independent sentiment analysis tool focusing on polarity strength. Subsequently, we run different validation tests, including a binary classification test of Trustpilot reviews and a correlation test based on manually rated texts from different domains. The results show that SENTIDA excels across all tests, predicting reviews with an accuracy above 80% in all trials and correlating significantly with manually annotated texts.
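To make the general idea of a lexical approach with added functions concrete, the following is a minimal Python sketch of lexicon-based polarity scoring followed by a binary classification check. It is not SENTIDA's implementation: the word list, the scores, the negation and intensifier rules, the two-token context window, and the example reviews are all hypothetical assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicon: word -> polarity strength.
# Entries and scores are illustrative assumptions, not SENTIDA's word list.
TOY_LEXICON = {"god": 3.0, "fantastisk": 4.0, "dårlig": -3.0, "elendig": -4.0}
NEGATORS = {"ikke", "aldrig"}                  # assumed negation words
INTENSIFIERS = {"meget": 1.5, "rigtig": 1.3}   # assumed multipliers


def polarity(text):
    """Return the mean polarity of the lexicon words found in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    scores = []
    for i, token in enumerate(tokens):
        if token not in TOY_LEXICON:
            continue
        score = TOY_LEXICON[token]
        # Look at the two preceding tokens (an assumed window size).
        context = tokens[max(0, i - 2):i]
        for word in context:
            if word in INTENSIFIERS:
                score *= INTENSIFIERS[word]
        if any(word in NEGATORS for word in context):
            score = -score
        scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def classify(text):
    """Binary classification by the sign of the polarity score."""
    return "positive" if polarity(text) > 0 else "negative"


def accuracy(labelled_reviews):
    """Share of reviews whose predicted label matches the given label."""
    hits = sum(classify(text) == label for text, label in labelled_reviews)
    return hits / len(labelled_reviews)


# Made-up example reviews, not data from the study.
toy_reviews = [
    ("Fantastisk service", "positive"),
    ("Elendig kvalitet", "negative"),
    ("Ikke en god oplevelse", "negative"),
    ("Rigtig god pris", "positive"),
]
print(accuracy(toy_reviews))
```

Averaging over matched words keeps the score comparable across texts of different lengths, which is one simple way to express polarity strength rather than only polarity direction.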






How to Cite

Lauridsen, G. A., Dalsgaard, J. A., & Svendsen, L. K. B. (2019). SENTIDA: A New Tool for Sentiment Analysis in Danish. Journal of Language Works - Sprogvidenskabeligt Studentertidsskrift, 4(1), 38–53.