publish.UP Search

2 search hits

By all these lovely tokens... Merging conflicting tokenizations (2012)

Chiarcos, Christian ; Ritz, Julia ; Stede, Manfred

Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text is a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem for the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.
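
To make the problem concrete, here is a minimal sketch of one way conflicting tokenizations can be reconciled: anchor every token to character offsets in the shared text and split at each boundary that any tokenization proposes. This is an illustrative assumption, not the standoff XML format or the algorithm described in the paper; the function names `token_spans` and `merge_tokenizations` are hypothetical.

```python
def token_spans(text, tokens):
    """Locate each token in text, left to right, as (start, end) character offsets."""
    spans, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)  # raises ValueError if a token is not found
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


def merge_tokenizations(text, *tokenizations):
    """Return minimal segments cut at every boundary any tokenization proposes."""
    all_spans = [token_spans(text, toks) for toks in tokenizations]
    boundaries = sorted({b for spans in all_spans for span in spans for b in span})
    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        # Keep a segment only if some token covers it; this drops whitespace gaps.
        if any(s <= start and end <= e for spans in all_spans for s, e in spans):
            segments.append((start, end))
    return segments


text = "Don't stop."
tokens_a = ["Don't", "stop", "."]   # hypothetical tokenizer A
tokens_b = ["Do", "n't", "stop."]   # hypothetical tokenizer B
segments = merge_tokenizations(text, tokens_a, tokens_b)
pieces = [text[s:e] for s, e in segments]
print(pieces)
```

Because each original token is a contiguous run of these minimal segments, annotations attached to either tokenization can in principle be re-expressed as spans over the merged units.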

Inter-operability and reusability: the science of annotation (2012)

Stede, Manfred ; Huang, Chu-Ren

Annotating linguistic data has become a major field of interest, both for supplying the necessary data for machine learning approaches to NLP applications, and as a research issue in its own right. This comprises issues of technical formats, tools, and methodologies of annotation. We provide a brief overview of these notions and then introduce the papers assembled in this special issue.
