Year of publication
- 2012
Document Type
- Article
Language
- English
Is part of the Bibliography
- yes
Keywords
- Conflicting tokenizations
Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, concurrent tokenizations of the same text are a pervasive problem in everyday NLP practice and pose a non-trivial theoretical problem for the integration of linguistic annotations and for their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format and discusses the consequences from a corpus-linguistic perspective.
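
The general idea of reconciling conflicting tokenizations in a standoff layout can be illustrated with a minimal sketch. It is not the format or algorithm of the paper, which the abstract does not specify. It assumes that each tokenization can be aligned to character offsets in the same primary text, cuts the text at every boundary that any tokenization uses, and lets each token point to the resulting minimal segments. The function names and the element and attribute names (`standoff`, `segments`, `seg`, `layer`, `tok`, `segs`) are hypothetical and chosen only for illustration.

```python
from xml.etree import ElementTree as ET


def spans(text, tokens):
    """Locate each token in the text, left to right, as (start, end) offsets."""
    out, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)  # raises ValueError if a token is missing
        end = start + len(tok)
        out.append((start, end))
        pos = end
    return out


def merge(text, tokenizations):
    """Cut the text at every token boundary used by any tokenization."""
    all_spans = {name: spans(text, toks) for name, toks in tokenizations.items()}
    cuts = sorted({off for sp in all_spans.values() for span in sp for off in span})
    # Keep only stretches covered by at least one token (drops whitespace gaps).
    segments = [
        (a, b)
        for a, b in zip(cuts, cuts[1:])
        if any(s <= a and b <= e for sp in all_spans.values() for s, e in sp)
    ]
    return all_spans, segments


def to_standoff(text, tokenizations):
    """Build a hypothetical standoff XML tree: shared segments plus one layer each."""
    all_spans, segments = merge(text, tokenizations)
    root = ET.Element("standoff")
    base = ET.SubElement(root, "segments")
    for i, (a, b) in enumerate(segments):
        ET.SubElement(base, "seg", id=f"s{i}", start=str(a), end=str(b))
    for name, sp in all_spans.items():
        layer = ET.SubElement(root, "layer", name=name)
        for s, e in sp:
            # A token refers to every minimal segment that lies inside its span.
            ids = [f"s{i}" for i, (a, b) in enumerate(segments) if s <= a and b <= e]
            ET.SubElement(layer, "tok", segs=" ".join(ids))
    return root


example = to_standoff(
    "Don't panic.",
    {"A": ["Do", "n't", "panic", "."], "B": ["Don't", "panic."]},
)
print(ET.tostring(example, encoding="unicode"))
```

In this example both layers refer to the same four segments, so an annotation attached to the token `Don't` in layer B can be related to the tokens `Do` and `n't` in layer A through their shared segment identifiers, without either tokenization being altered.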