Classifying news versus opinions in newspapers
- Newspaper text can be broadly divided into the classes ‘opinion’ (editorials, commentary, letters to the editor) and ‘neutral’ (reports). We describe a classification system for performing this separation, which uses a set of linguistically motivated features. Working with various English newspaper corpora, we demonstrate that it significantly outperforms bag-of-lemma and PoS-tag models. We conclude that the linguistic features constitute the best method for achieving robustness against a change of newspaper or domain.
Author details: | K. R. Krüger, A. Lukowiak, J. Sonntag, Saskia Warzecha, Manfred Stede |
---|---|
DOI: | https://doi.org/10.1017/S1351324917000043 |
ISSN (print): | 1351-3249 |
ISSN (online): | 1469-8110 |
Title of parent work (English): | Natural language engineering |
Subtitle (English): | linguistic features for domain independence |
Publisher: | Cambridge Univ. Press |
Place of publishing: | Cambridge |
Publication type: | Article |
Language: | English |
Date of first publication: | 2017/02/21 |
Publication year: | 2017 |
Release date: | 2022/04/11 |
Volume: | 23 |
Number of pages: | 21 |
First page: | 687 |
Last Page: | 707 |
Funding institution: | German Federal Ministry of Education and Research (BMBF) [01UG1234] |
Organizational units: | Humanwissenschaftliche Fakultät / Strukturbereich Kognitionswissenschaften / Department Linguistik |
DDC classification: | 4 Language / 41 Linguistics / 410 Linguistics |
Peer review: | Peer-reviewed |
Institution name at the time of the publication: | Humanwissenschaftliche Fakultät / Exzellenzbereich Kognitionswissenschaften |