Of Trees and Birds
(2019)
Gisbert Fanselow’s work has been invaluable and inspiring to many researchers working on syntax, morphology, and information structure, from both a theoretical and an experimental perspective. This volume comprises a collection of articles dedicated to Gisbert on the occasion of his 60th birthday, covering a range of topics from these areas and beyond. What the contributions have in common is that, in a broad sense, they have to do with language structures (and thus trees), and that, in a more specific sense, they have to do with birds. They thus cover two of Gisbert’s major interests inside and outside the linguistic world (and perhaps even at the interface).
We present a lexicon-based approach to extracting sentiment from text. The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. SO-CAL is applied to the polarity classification task, the process of assigning a positive or negative label to a text that captures the text's opinion towards its main subject matter. We show that SO-CAL's performance is consistent across domains and on completely unseen data. Additionally, we describe the process of dictionary creation, and our use of Mechanical Turk to check dictionaries for consistency and reliability.
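The general idea of lexicon-based scoring with intensification and negation can be illustrated with a minimal sketch. It is not SO-CAL itself: the toy lexicon, the intensifier percentages, the shift value used for negation, and the function names `score_text` and `classify` are all assumptions made for illustration.

```python
# Toy lexicon: word -> semantic orientation (sign = polarity, magnitude = strength).
# All entries and values are illustrative assumptions, not SO-CAL's dictionaries.
LEXICON = {"good": 3, "excellent": 5, "bad": -3, "terrible": -5}

# Intensifiers scale the following word by a percentage (assumed values).
INTENSIFIERS = {"very": 0.25, "slightly": -0.5}

# Negation is modelled here as a shift toward the opposite polarity
# by a fixed amount, rather than as a sign flip (assumed design choice).
NEGATORS = {"not", "never"}
NEGATION_SHIFT = 4


def score_text(text):
    """Sum the adjusted semantic orientation of every lexicon word in text."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = float(LEXICON[token])
        j = i - 1
        # Apply an intensifier that directly precedes the word.
        if j >= 0 and tokens[j] in INTENSIFIERS:
            value *= 1.0 + INTENSIFIERS[tokens[j]]
            j -= 1
        # Apply a negator that precedes the (possibly intensified) word.
        if j >= 0 and tokens[j] in NEGATORS:
            value += NEGATION_SHIFT if value < 0 else -NEGATION_SHIFT
        total += value
    return total


def classify(text):
    """Binary polarity label; a score of exactly 0 is arbitrarily called negative."""
    return "positive" if score_text(text) > 0 else "negative"
```

With these toy values, `score_text("the plot was very good")` yields 3.75, while `classify("not bad")` returns `"positive"` because the shift moves -3 to +1.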
The meaning of linguistic connectives has often been characterized in terms of their position in a bipartite (semantic, pragmatic) or a tripartite (content, epistemic, speech act) structure of domains, depending on what kinds of entities are being connected (largely: propositions or speech acts). This paper argues that a more fine-grained analysis can be achieved by directing some more attention to the characterization of the entities being related. We propose an inventory of categories of illocutionary status for labelling the spans that are being connected. On this basis, the distinction between the content and the epistemic domain, in particular, can be made more explicit. Focusing on the group of causal connectives in German, we conducted a corpus annotation study from which we derived distinct pragmatic 'usage profiles' of the most frequent causal connectives. Finally, we offer some suggestions on the role of illocutions in relation-based accounts of discourse structure.
Annotating linguistic data has become a major field of interest, both for supplying the necessary data for machine learning approaches to NLP applications, and as a research issue in its own right. This comprises issues of technical formats, tools, and methodologies of annotation. We provide a brief overview of these notions and then introduce the papers assembled in this special issue.
Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text is a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem for the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format and discusses the consequences from a corpus-linguistic perspective.
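One way to picture how concurrent tokenizations can be reconciled in a standoff setting is to express every token as a character span over the shared text and to derive minimal segments from the union of all token boundaries. The sketch below follows that idea under stated assumptions: it uses plain character offsets instead of XML, and the helper names `char_spans`, `minimal_segments`, and `align` are hypothetical, not taken from the paper.

```python
def char_spans(text, tokens):
    """Locate each token in text, returning (start, end) character offsets."""
    spans, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)  # raises ValueError if a token is missing
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


def minimal_segments(*layers):
    """Split the text at every boundary used by any tokenization layer.

    Only segments covered by at least one token are kept, so whitespace
    between tokens does not become a segment.
    """
    boundaries = sorted({o for spans in layers for span in spans for o in span})
    candidates = zip(boundaries, boundaries[1:])
    return [
        (s, e)
        for s, e in candidates
        if any(ts <= s and e <= te for spans in layers for ts, te in spans)
    ]


def align(spans, segments):
    """Map each token span to the indices of the minimal segments it covers."""
    return [
        [i for i, (s, e) in enumerate(segments) if ts <= s and e <= te]
        for ts, te in spans
    ]


text = "don't stop"
layer_a = char_spans(text, ["don't", "stop"])      # whitespace tokenizer
layer_b = char_spans(text, ["do", "n't", "stop"])  # clitic-splitting tokenizer
segments = minimal_segments(layer_a, layer_b)
```

Here `align(layer_a, segments)` gives `[[0, 1], [2]]` and `align(layer_b, segments)` gives `[[0], [1], [2]]`, so annotations attached to either tokenization can be compared over the same segment indices.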
We present a general framework for integrating annotations from different tools and tag sets. When annotating corpora at multiple linguistic levels, annotators may use different expert tools for different phenomena or types of annotation. These tools employ different data models and accompanying approaches to visualization, and they produce different output formats. For the purposes of uniformly processing these outputs, we developed a pivot format called PAULA, along with converters to and from tool formats. Different annotations are not only integrated at the level of data format, but are also joined at the level of conceptual representation. For this purpose, we introduce OLiA, an ontology of linguistic annotations that mediates between alternative tag sets that cover the same class of linguistic phenomena. All components are integrated in the linguistic information system ANNIS: annotation tool output is converted to the pivot format PAULA and read into a database where the data can be visualized, queried, and evaluated across multiple layers. For cross-tag set querying and statistical evaluation, ANNIS uses the ontology of linguistic annotations. Finally, ANNIS is also tied to a machine learning component for semiautomatic annotation.
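The role of a mediating ontology in cross-tag set querying can be sketched in a few lines. This is an illustration only, not the OLiA or ANNIS implementation: the concept labels, the partial tag mappings for the STTS and Penn Treebank tag sets, and the function `find_by_concept` are assumptions.

```python
# Partial mappings from tool-specific tags to shared concept labels.
# The concept labels are illustrative stand-ins for ontology classes.
TAG_TO_CONCEPT = {
    "stts": {"NN": "CommonNoun", "NE": "ProperNoun", "VVFIN": "FiniteVerb"},
    "ptb": {
        "NN": "CommonNoun",
        "NNS": "CommonNoun",
        "NNP": "ProperNoun",
        "VBZ": "FiniteVerb",
    },
}


def find_by_concept(corpus, concept):
    """Return (tagset, word) pairs whose tag maps to the requested concept.

    corpus maps a tag set name to a list of (word, tag) pairs.
    """
    hits = []
    for tagset, tokens in corpus.items():
        mapping = TAG_TO_CONCEPT[tagset]
        for word, tag in tokens:
            if mapping.get(tag) == concept:
                hits.append((tagset, word))
    return hits


corpus = {
    "stts": [("Hund", "NN"), ("Berlin", "NE"), ("bellt", "VVFIN")],
    "ptb": [("dogs", "NNS"), ("London", "NNP"), ("barks", "VBZ")],
}
```

A single query such as `find_by_concept(corpus, "CommonNoun")` then retrieves `Hund` and `dogs` even though the two corpora use different tags for them.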