TY  - JOUR
A1  - Chiarcos, Christian
A1  - Ritz, Julia
A1  - Stede, Manfred
T1  - By all these lovely tokens... Merging conflicting tokenizations
JF  - Language resources and evaluation
N2  - Given the contemporary trend to modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem to the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.
KW  - Linguistic annotation
KW  - Multi-layer annotation
KW  - Conflicting tokenizations
KW  - Tokenization alignment
KW  - Corpus linguistics
Y1  - 2012
U6  - https://doi.org/10.1007/s10579-011-9161-0
SN  - 1574-020X
VL  - 46
IS  - 1
SP  - 53
EP  - 74
PB  - Springer
CY  - Dordrecht
ER  -

TY  - JOUR
A1  - Stede, Manfred
A1  - Huang, Chu-Ren
T1  - Inter-operability and reusability: the science of annotation
JF  - Language resources and evaluation
N2  - Annotating linguistic data has become a major field of interest, both for supplying the necessary data for machine learning approaches to NLP applications, and as a research issue in its own right. This comprises issues of technical formats, tools, and methodologies of annotation. We provide a brief overview of these notions and then introduce the papers assembled in this special issue.
KW  - Linguistic annotation
KW  - Annotation tools
KW  - Inter-operability
Y1  - 2012
U6  - https://doi.org/10.1007/s10579-011-9164-x
SN  - 1574-020X
VL  - 46
IS  - 1
SP  - 91
EP  - 94
PB  - Springer
CY  - Dordrecht
ER  -