ANNIS
(2004)
In this paper, we discuss the design and implementation of our first version of the database "ANNIS" ("ANNotation of Information Structure"). For research based on empirical data, ANNIS provides a uniform environment for storing this data together with its linguistic annotations. A central database promotes standardized annotation, which facilitates interpretation and comparison of the data. ANNIS is used through a standard web browser and offers tier-based visualization of data and annotations, as well as search facilities that allow for cross-level and cross-sentential queries. The paper motivates the design of the system, characterizes its user interface, and provides an initial technical evaluation of ANNIS with respect to data size and query processing.
Coherence relations are typically taken to link two clauses or larger units and to be signaled at the text surface by conjunctions and certain adverbials. However, relations can also hold within clauses, indicated by prepositions like despite, due to, or in case of, when these have an internal argument denoting an eventuality. While these prepositions act as reliable cues for a specific relation, others are lexically more neutral. We investigated this situation for the German preposition bei, which turns out to be highly ambiguous. We demonstrate the range of readings in a corpus study, proposing six more specific prepositions as a comprehensive substitution set. All these uses of bei share a common kernel meaning, which is missed by the standard accounts that assume lexical polysemy. We examine the range of coherence relations that can be signaled by bei and identify some factors that support the disambiguation task in a framework of discourse interpretation.
Empirical studies of text coherence often use tree-like structures in the spirit of Rhetorical Structure Theory (RST) as a representational device. This paper identifies several sources of ambiguity in RST-inspired trees and argues that such structures are therefore not as explanatory as a text representation should be. As an alternative, an approach toward multi-level annotation (MLA) of texts is proposed, which separates the information into distinct levels of representation, in particular: referential structure, thematic structure, conjunctive relations, and intentional structure. Levels are conceptually built upon each other, and human annotators can produce them using a dedicated software environment. We argue that the resulting multi-level corpora are descriptively more adequate and, as a resource, more useful than RST-style treebanks.
We present a general framework for integrating annotations from different tools and tag sets. When annotating corpora at multiple linguistic levels, annotators may use different expert tools for different phenomena or types of annotation. These tools employ different data models and accompanying approaches to visualization, and they produce different output formats. To process these outputs uniformly, we developed a pivot format called PAULA, along with converters to and from tool formats. Different annotations are not only integrated at the level of data format, but are also joined at the level of conceptual representation. For this purpose, we introduce OLiA, an ontology of linguistic annotations that mediates between alternative tag sets that cover the same class of linguistic phenomena. All components are integrated in the linguistic information system ANNIS: annotation tool output is converted to the pivot format PAULA and read into a database where the data can be visualized, queried, and evaluated across multiple layers. For cross-tag set querying and statistical evaluation, ANNIS uses the ontology of linguistic annotations. Finally, ANNIS is also tied to a machine learning component for semiautomatic annotation.
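The idea of a pivot representation combined with ontology-mediated querying can be illustrated with a minimal Python sketch. It does not reproduce the actual PAULA format, the OLiA ontology, or any ANNIS interface: the `Span` class, the `to_pivot` and `find_concept` helpers, and the concept names `CommonNoun` and `ProperNoun` are hypothetical names chosen for illustration. Only the part-of-speech tags themselves are taken from the STTS and Penn Treebank tag sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One standoff annotation in a hypothetical pivot representation."""
    tagset: str  # tag set the label comes from, e.g. "stts" or "ptb"
    start: int   # index of the first token covered
    end: int     # index one past the last token covered
    label: str   # tag as produced by the original tool


# Hypothetical concept mapping (an assumption, not the real OLiA ontology):
# tags from different tag sets point to one shared concept.
TAG_TO_CONCEPT = {
    ("stts", "NN"): "CommonNoun",
    ("stts", "NE"): "ProperNoun",
    ("ptb", "NN"): "CommonNoun",
    ("ptb", "NNS"): "CommonNoun",
    ("ptb", "NNP"): "ProperNoun",
}


def to_pivot(tagset, tags):
    """Convert one tool's per-token tags into standoff spans."""
    return [Span(tagset, i, i + 1, tag) for i, tag in enumerate(tags)]


def find_concept(spans, concept):
    """Return all spans whose tag maps to the given shared concept."""
    return [
        s for s in spans
        if TAG_TO_CONCEPT.get((s.tagset, s.label)) == concept
    ]


# Two tools annotate the same three tokens with different tag sets.
spans = to_pivot("stts", ["NE", "VVFIN", "NN"]) + to_pivot("ptb", ["NNP", "VBZ", "NNS"])

# A single concept-level query retrieves matches from both layers.
common_nouns = find_concept(spans, "CommonNoun")
```

In this sketch, the query is phrased in terms of a shared concept rather than a tool-specific tag, so the same request matches `NN` in one layer and `NNS` in the other.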