TY  - JOUR
A1  - Chiarcos, Christian
A1  - Ritz, Julia
A1  - Stede, Manfred
T1  - By all these lovely tokens... Merging conflicting tokenizations
JF  - Language resources and evaluation
N2  - Given the contemporary trend to modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday's NLP practice and poses a non-trivial theoretical problem to the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.
KW  - Linguistic annotation
KW  - Multi-layer annotation
KW  - Conflicting tokenizations
KW  - Tokenization alignment
KW  - Corpus linguistics
Y1  - 2012
U6  - https://doi.org/10.1007/s10579-011-9161-0
SN  - 1574-020X
VL  - 46
IS  - 1
SP  - 53
EP  - 74
PB  - Springer
CY  - Dordrecht
ER  -