By all these lovely tokens... Merging conflicting tokenizations

Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem for the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.
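The abstract does not spell out the merging procedure. As a rough illustration only, here is a minimal Python sketch of one common standoff technique, assuming each tokenization is given as (start, end) character offsets into the same primary text. All token boundaries are pooled to form the finest common segmentation ("terminals"), and every original token is then re-expressed as the list of terminals it covers, so no layer's token boundaries are lost. The function name `merge_tokenizations`, the layer names, and the data layout are hypothetical; XML serialization is omitted, and this is not claimed to be the authors' algorithm.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def merge_tokenizations(
    text: str, layers: Dict[str, List[Span]]
) -> Tuple[List[Span], Dict[str, List[List[int]]]]:
    """Hypothetical sketch: build a common terminal segmentation from
    several standoff tokenizations and map each token onto it."""
    # Pool every token boundary from every tokenization layer.
    cuts = {0, len(text)}
    for spans in layers.values():
        for start, end in spans:
            cuts.add(start)
            cuts.add(end)
    ordered = sorted(cuts)

    # Terminals: segments between consecutive boundaries,
    # skipping segments that consist only of whitespace.
    terminals: List[Span] = [
        (a, b) for a, b in zip(ordered, ordered[1:]) if text[a:b].strip()
    ]

    # Each original token becomes the list of terminal indices it covers.
    mapping: Dict[str, List[List[int]]] = {}
    for name, spans in layers.items():
        mapping[name] = [
            [i for i, (a, b) in enumerate(terminals) if start <= a and b <= end]
            for start, end in spans
        ]
    return terminals, mapping


# Two conflicting tokenizations of the same sentence (hypothetical layers).
text = "I don't know."
layers = {
    "whitespace": [(0, 1), (2, 7), (8, 12), (12, 13)],
    "ptb": [(0, 1), (2, 4), (4, 7), (8, 12), (12, 13)],
}
terminals, mapping = merge_tokenizations(text, layers)
print([text[a:b] for a, b in terminals])  # ['I', 'do', "n't", 'know', '.']
print(mapping["whitespace"])  # [[0], [1, 2], [3], [4]]
```

Here the whitespace token "don't" spans two terminals, while the PTB-style tokens "do" and "n't" each map to exactly one. This is the kind of many-to-one relation a merged standoff representation has to preserve.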

Author: Christian Chiarcos, Julia Ritz, Manfred Stede
Parent Title (English): Language Resources and Evaluation
Place of publication: Dordrecht
Document Type: Article
Year of first Publication: 2012
Year of Completion: 2012
Release Date: 2017/03/26
Tag: Conflicting tokenizations; Corpus linguistics; Linguistic annotation; Multi-layer annotation; Tokenization alignment
First Page: 53
Last Page: 74
Funder: Deutsche Forschungsgemeinschaft (DFG), SFB 632
Organizational units: Humanwissenschaftliche Fakultät / Institut für Linguistik / Allgemeine Sprachwissenschaft
Peer Review: Refereed