Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text is a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem for the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.
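To make the problem of concurrent tokenizations concrete, here is a minimal sketch, not the paper's actual format or algorithm. It assumes that each tokenization can be expressed as character offsets into the same unchanged text (the basic idea behind standoff annotation) and that two tokenizations can be reconciled by cutting the text at the union of all token boundaries. The function names `token_spans`, `merge_boundaries`, and `align`, and the example sentence, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def token_spans(text: str, tokens: List[str]) -> List[Span]:
    """Locate each token in the text, left to right, and return its span."""
    spans = []
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)  # raises ValueError if a token is missing
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


def merge_boundaries(*tokenizations: List[Span]) -> List[Span]:
    """Build minimal segments from the union of all token boundaries."""
    covered = sorted({span for spans in tokenizations for span in spans})
    cuts = sorted({offset for span in covered for offset in span})
    segments = []
    for a, b in zip(cuts, cuts[1:]):
        # Keep a segment only if some token covers it (drops whitespace gaps).
        if any(s <= a and b <= e for s, e in covered):
            segments.append((a, b))
    return segments


def align(spans: List[Span], segments: List[Span]) -> List[List[int]]:
    """Map each token to the indices of the minimal segments it covers."""
    return [
        [i for i, (a, b) in enumerate(segments) if s <= a and b <= e]
        for s, e in spans
    ]


# Two hypothetical tools disagree on how to split the same sentence.
text = "He didn't go."
spans_a = token_spans(text, ["He", "didn't", "go", "."])
spans_b = token_spans(text, ["He", "did", "n't", "go."])

segments = merge_boundaries(spans_a, spans_b)
alignment_a = align(spans_a, segments)
alignment_b = align(spans_b, segments)

print([text[a:b] for a, b in segments])
print(alignment_a)
print(alignment_b)
```

In this sketch, every token from either tokenization is described as a sequence of shared minimal segments, so annotations attached to tokens of one layer can be related to tokens of the other without altering the primary text. A standoff XML serialization would store the segment offsets and the token-to-segment references in separate layers; the exact element names are left open here because the abstract does not specify them.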
Annotating linguistic data has become a major field of interest, both for supplying the necessary data for machine learning approaches to NLP applications, and as a research issue in its own right. This comprises issues of technical formats, tools, and methodologies of annotation. We provide a brief overview of these notions and then introduce the papers assembled in this special issue.