TY  - JOUR
A1  - Lüdeling, Anke
A1  - Ritz, Julia
A1  - Stede, Manfred
A1  - Zeldes, Amir
T1  - Corpus Linguistics and Information Structure Research
JF  - The Oxford handbook of information structure
Y1  - 2016
SN  - 978-0-19-964267-0
SP  - 599
EP  - 617
PB  - Oxford University Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Chiarcos, Christian
A1  - Ritz, Julia
A1  - Stede, Manfred
T1  - By all these lovely tokens... Merging conflicting tokenizations
JF  - Language resources and evaluation
N2  - Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem to the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.
KW  - Linguistic annotation
KW  - Multi-layer annotation
KW  - Conflicting tokenizations
KW  - Tokenization alignment
KW  - Corpus linguistics
Y1  - 2012
U6  - https://doi.org/10.1007/s10579-011-9161-0
SN  - 1574-020X
VL  - 46
IS  - 1
SP  - 53
EP  - 74
PB  - Springer
CY  - Dordrecht
ER  - 
TY  - JOUR
A1  - Chiarcos, Christian
A1  - Fiedler, Ines
A1  - Grubic, Mira
A1  - Hartmann, Katharina
A1  - Ritz, Julia
A1  - Schwarz, Anne
A1  - Zeldes, Amir
A1  - Zimmermann, Malte
T1  - Information structure in African languages: corpora and tools
JF  - Language resources and evaluation
N2  - In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre 632 "Information Structure". These include deeply annotated data collections of 25 sub-Saharan languages that are described together with their annotation scheme, as well as the corpus tool ANNIS, which provides unified access to a broad variety of annotations created with a range of different tools. With the application of ANNIS to several African data collections, we illustrate its suitability for the purpose of language documentation, distributed access, and the creation of data archives.
KW  - African language resources
KW  - Pragmatics
KW  - Corpus search infrastructure
Y1  - 2011
U6  - https://doi.org/10.1007/s10579-011-9153-0
SN  - 1574-020X
VL  - 45
IS  - 3
SP  - 361
EP  - 374
PB  - Springer
CY  - Dordrecht
ER  - 
TY  - JOUR
A1  - Chiarcos, Christian
A1  - Dipper, Stefanie
A1  - Götze, Michael
A1  - Leser, Ulf
A1  - Lüdeling, Anke
A1  - Ritz, Julia
A1  - Stede, Manfred
T1  - A flexible framework for integrating annotations from different tools and tag sets
N2  - We present a general framework for integrating annotations from different tools and tag sets. When annotating corpora at multiple linguistic levels, annotators may use different expert tools for different phenomena or types of annotation. These tools employ different data models and accompanying approaches to visualization, and they produce different output formats. For the purposes of uniformly processing these outputs, we developed a pivot format called PAULA, along with converters to and from tool formats. Different annotations are not only integrated at the level of data format, but are also joined on the level of conceptual representation. For this purpose, we introduce OLiA, an ontology of linguistic annotations that mediates between alternative tag sets that cover the same class of linguistic phenomena. All components are integrated in the linguistic information system ANNIS: annotation tool output is converted to the pivot format PAULA and read into a database where the data can be visualized, queried, and evaluated across multiple layers. For cross-tag set querying and statistical evaluation, ANNIS uses the ontology of linguistic annotations. Finally, ANNIS is also tied to a machine learning component for semiautomatic annotation.
Y1  - 2008
UR  - http://www.atala.org/A-Flexible-Framework-for
SN  - 1248-9433
ER  -