publish.UP Search

Comparing decoding mechanisms for parsing argumentative structures (2018)

Afantenos, Stergos ; Peldszus, Andreas ; Stede, Manfred

Parsing of argumentative structures has become a very active line of research in recent years. Like discourse parsing or any other natural language task that requires prediction of linguistic structures, most approaches choose to learn a local model and then perform global decoding over the local probability distributions, often imposing constraints that are specific to the task at hand. Specifically for argumentation parsing, two decoding approaches have been recently proposed: Minimum Spanning Trees (MST) and Integer Linear Programming (ILP), following similar trends in discourse parsing. In contrast to discourse parsing though, where trees are not always used as underlying annotation schemes, argumentation structures so far have always been represented with trees. Using the 'argumentative microtext corpus' [in: Argumentation and Reasoned Action: Proceedings of the 1st European Conference on Argumentation, Lisbon 2015 / Vol. 2, College Publications, London, 2016, pp. 801-815] as underlying data and replicating three different decoding mechanisms, in this paper we propose a novel ILP decoder and an extension to our earlier MST work, and then thoroughly compare the approaches. The result is that our new decoder outperforms related work in important respects, and that in general, ILP and MST yield very similar performance.

Anaphoric distance in oral and written language (2022)

Aktas, Berfin ; Stede, Manfred

We investigate the variation between oral and written language in terms of anaphoric distance (i.e., the textual distance between anaphors and their antecedents), expanding corpus-based research with experimental evidence. Contrastive corpus studies demonstrate that oral genres exhibit longer average anaphoric distances than written genres when distance is measured in clauses (Fox, 1987; Aktas & Stede, 2020). We designed an experiment to examine the contrast between the oral and written media within the same genre. Our aim is to gain more insight into the impact of the medium in a situation where both media convey a similar level of spontaneity, informality and interactivity. We designed a story continuation study in which participants were recruited via crowdsourcing. To our knowledge, this is the first study of its kind in which anaphoric distance is manipulated systematically in a language production experiment in order to examine medium distinctions. We observed that participants use more pronouns in the oral medium than in the written medium when the anaphoric distance is long. This result is in line with the implications of the earlier corpus-based research. In addition, our results indicate that anaphoric distance has a larger effect on referential choice in the written medium.

A flexible framework for integrating annotations from different tools and tag sets (2008)

Chiarcos, Christian ; Dipper, Stefanie ; Götze, Michael ; Leser, Ulf ; Lüdeling, Anke ; Ritz, Julia ; Stede, Manfred

We present a general framework for integrating annotations from different tools and tag sets. When annotating corpora at multiple linguistic levels, annotators may use different expert tools for different phenomena or types of annotation. These tools employ different data models and accompanying approaches to visualization, and they produce different output formats. To process these outputs uniformly, we developed a pivot format called PAULA, along with converters to and from tool formats. Different annotations are integrated not only at the level of data format but also at the level of conceptual representation. For this purpose, we introduce OLiA, an ontology of linguistic annotations that mediates between alternative tag sets covering the same class of linguistic phenomena. All components are integrated in the linguistic information system ANNIS: annotation tool output is converted to the pivot format PAULA and read into a database where the data can be visualized, queried, and evaluated across multiple layers. For cross-tag-set querying and statistical evaluation, ANNIS uses the ontology of linguistic annotations. Finally, ANNIS is also tied to a machine learning component for semiautomatic annotation.

By all these lovely tokens... Merging conflicting tokenizations (2012)

Chiarcos, Christian ; Ritz, Julia ; Stede, Manfred

Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text is a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem for the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.

Discourse connectives and their arguments (2022)

Clausen, Yulia ; Stede, Manfred

Adverbial connectives like therefore, which link a preceding 'external' argument to an 'internal' argument, can be regarded as anaphoric: the external argument is selected by an interpretation process akin to that of an event anaphor, and intervening material can appear between the two arguments. We report on a crowdsourcing experiment on the German connectives trotzdem and dennoch that studies the factors that lead readers to assume such long-distance arguments: the semantic plausibility of intervening material, 'subjective' versus 'objective' content, and the presence of an anaphoric morpheme in the connective. We find that the type and content of the intervening material play an important role in argument choice.

Bei: intraclausal coherence relations illustrated with a German preposition (2006)

Grabski, Michael ; Stede, Manfred

Coherence relations are typically taken to link two clauses or larger units and to be signaled at the text surface by conjunctions and certain adverbials. Relations, however, can also hold within clauses, indicated by prepositions like despite, due to, or in case of, when these have an internal argument denoting an eventuality. While some of these prepositions act as reliable cues to a specific relation, others are lexically more neutral. We investigated this situation for the German preposition bei, which turns out to be highly ambiguous. We demonstrate the range of its readings in a corpus study, proposing 6 more specific prepositions as a comprehensive substitution set. All these uses of bei share a common kernel meaning, which is missed by the standard accounts that assume lexical polysemy. We examine the range of coherence relations that can be signaled by bei and identify some factors that support the disambiguation task within a framework of discourse interpretation.

Classifying news versus opinions in newspapers (2017)

Krüger, K. R. ; Lukowiak, A. ; Sonntag, J. ; Warzecha, Saskia ; Stede, Manfred

Newspaper text can be broadly divided into the classes ‘opinion’ (editorials, commentary, letters to the editor) and ‘neutral’ (reports). We describe a classification system for performing this separation, which uses a set of linguistically motivated features. Working with various English newspaper corpora, we demonstrate that it significantly outperforms bag-of-lemma and PoS-tag models. We conclude that linguistic features constitute the best method for achieving robustness against a change of newspaper or domain.

Corpus Linguistics and Information Structure Research (2016)

Lüdeling, Anke ; Ritz, Julia ; Stede, Manfred ; Zeldes, Amir

Of Trees and Birds (2019)

Gisbert Fanselow’s work has been invaluable and inspiring to many researchers working on syntax, morphology, and information structure, both from a theoretical and from an experimental perspective. This volume comprises a collection of articles dedicated to Gisbert on the occasion of his 60th birthday, covering a range of topics from these areas and beyond. The contributions have in common that in a broad sense they have to do with language structures (and thus trees), and that in a more specific sense they have to do with birds. They thus cover two of Gisbert’s major interests in- and outside of the linguistic world (and perhaps even at the interface).

31 search hits