Full text available
- no (1)
Year of publication
- 2012 (1)
Language
- English (1)
Part of the bibliography
- yes (1)
Keywords
- Conflicting tokenizations (1)
Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text is a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem for the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.
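To make the problem concrete, the following is a minimal sketch of how two conflicting tokenizations of the same text might be reconciled through character offsets, which is the basic idea behind standoff annotation. It is an illustration only and not necessarily the format or algorithm the paper proposes. The function names `token_spans` and `merge_boundaries` and the sample sentence are assumptions made for this example.

```python
def token_spans(text, tokens):
    """Locate each token in text, left to right, as (start, end) character offsets."""
    spans = []
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)  # raises ValueError if a token does not occur
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


def merge_boundaries(*span_lists):
    """Return the smallest segments on which all given tokenizations agree.

    Every token boundary from every tokenization becomes a cut point.
    Stretches not covered by any token (e.g. whitespace) are dropped.
    """
    all_spans = [span for spans in span_lists for span in spans]
    cuts = sorted({offset for span in all_spans for offset in span})
    segments = []
    for start, end in zip(cuts, cuts[1:]):
        if any(s <= start and end <= e for s, e in all_spans):
            segments.append((start, end))
    return segments


# Hypothetical example: two tools disagree on clitics and punctuation.
text = "Don't stop."
spans_a = token_spans(text, ["Do", "n't", "stop", "."])
spans_b = token_spans(text, ["Don't", "stop."])
segments = merge_boundaries(spans_a, spans_b)
```

Because both tokenizations are expressed as offsets into the same unmodified text, each token can be described as a sequence of the shared segments, so annotations attached to either tokenization can be related to each other without altering the source.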