Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches

Munnes, Stefan; Harsch, Corinna; Knobloch, Marcel; Vogel, Johannes S.; Hipp, Lena; Schilling, Erik

doi:10.3389/fdata.2022.886362

Can we rely on computational methods to accurately analyze complex texts? To answer this question, we compared different dictionary and scaling methods used in predicting the sentiment of German literature reviews to the "gold standard" of human-coded sentiments. Literature reviews constitute a challenging text corpus for computational analysis, as they not only contain different text levels (for example, a summary of the work and the reviewer's appraisal) but are also characterized by subtle and ambiguous language elements. To take the nuanced sentiments of literature reviews into account, we worked with a metric rather than a dichotomous scale for sentiment analysis. The results of our analyses show that the predicted sentiments of prefabricated dictionaries, which are computationally efficient and require minimal adaptation, have a low to medium correlation with the human-coded sentiments (r between 0.32 and 0.39).
The accuracy of self-created dictionaries using word embeddings (both pre-trained and self-trained) was considerably lower (r between 0.10 and 0.28). Given the high coding intensity and contingency on seed selection as well as the degree of data pre-processing of word embeddings that we found with our data, we would not recommend them for complex texts without further adaptation. While fully automated approaches do not appear to predict text sentiment accurately for complex texts such as ours, we found relatively high correlations with a semiautomated approach (r of around 0.6), which, however, requires intensive human coding efforts for the training dataset. In addition to illustrating the benefits and limits of computational approaches in analyzing complex text corpora and the potential of metric rather than binary scales of text sentiment, we also provide a practical guide for researchers to select an appropriate method and degree of pre-processing when working with complex texts.
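
As an illustration of the kind of comparison the abstract describes, the following minimal sketch scores texts on a metric scale with a word dictionary and correlates the scores with human codes using Pearson's r. It is not the authors' implementation: the toy dictionary, its weights, the example reviews, the human codes, and the function names `dictionary_score` and `pearson_r` are all hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical toy dictionary (word -> polarity weight); not taken from the article.
TOY_DICTIONARY = {
    "brilliant": 1.0,
    "moving": 0.5,
    "tedious": -0.5,
    "disappointing": -1.0,
}


def dictionary_score(text, dictionary):
    """Return the mean polarity of dictionary words found in text (0.0 if none)."""
    tokens = re.findall(r"\w+", text.lower())
    hits = [dictionary[token] for token in tokens if token in dictionary]
    return sum(hits) / len(hits) if hits else 0.0


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented example reviews and invented human codes on a metric scale.
    reviews = [
        "A brilliant and moving novel.",
        "Moving in places, but often tedious.",
        "A tedious and disappointing debut.",
    ]
    human_codes = [0.9, 0.1, -0.8]
    predicted = [dictionary_score(r, TOY_DICTIONARY) for r in reviews]
    print("predicted:", predicted)
    print("r =", round(pearson_r(predicted, human_codes), 2))
```

Averaging the weights of matched words yields a continuous score rather than a positive/negative label, which mirrors the metric scale the abstract argues for. A real application would use a published German sentiment dictionary and a far larger set of coded texts.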

Author details:	Stefan Munnes, Corinna Harsch, Marcel Knobloch, Johannes S. Vogel, Lena Hipp, Erik Schilling
DOI:	https://doi.org/10.3389/fdata.2022.886362
ISSN:	2624-909X
Pubmed ID:	https://pubmed.ncbi.nlm.nih.gov/35600329
Title of parent work (English):	Frontiers in Big Data
Publisher:	Frontiers Media
Place of publishing:	Lausanne
Publication type:	Article
Language:	English
Date of first publication:	2022/05/04
Publication year:	2022
Release date:	2023/12/14
Tag:	German literature; automated text analysis; computer-assisted text analysis; dictionary; scaling method; sentiment analysis; word embeddings
Volume:	5
Article number:	886362
Number of pages:	16
Funding institution:	Junge Akademie; Leibniz Association
Organizational units:	Wirtschafts- und Sozialwissenschaftliche Fakultät / Sozialwissenschaften
DDC classification:	3 Social sciences / 30 Social sciences, sociology / 300 Social sciences
Peer review:	Refereed
Publishing method:	Open Access / Gold Open-Access
	Listed in DOAJ
License:	CC BY 4.0 (Attribution 4.0 International)
