TY - BOOK
A1 - Stede, Manfred
A1 - Chiarcos, Christian
A1 - Grabski, Michael
A1 - Lagerwerf, Luuk
T1 - Salience in discourse : multidisciplinary approaches to discourse 2005
T3 - Uitgaven Stichting Neerlandistiek VU
Y1 - 2005
SN - 3-89323-749-6
VL - 49
PB - Nodus-Publ; Stichting Neerlandistiek VU
CY - Münster; Amsterdam
ER -

TY - JOUR
A1 - Chiarcos, Christian
A1 - Dipper, Stefanie
A1 - Götze, Michael
A1 - Leser, Ulf
A1 - Lüdeling, Anke
A1 - Ritz, Julia
A1 - Stede, Manfred
T1 - A flexible framework for integrating annotations from different tools and tag sets
N2 - We present a general framework for integrating annotations from different tools and tag sets. When annotating corpora at multiple linguistic levels, annotators may use different expert tools for different phenomena or types of annotation. These tools employ different data models and accompanying approaches to visualization, and they produce different output formats. For the purposes of uniformly processing these outputs, we developed a pivot format called PAULA, along with converters to and from tool formats. Different annotations are not only integrated at the level of data format, but are also joined on the level of conceptual representation. For this purpose, we introduce OLiA, an ontology of linguistic annotations that mediates between alternative tag sets that cover the same class of linguistic phenomena. All components are integrated in the linguistic information system ANNIS: Annotation tool output is converted to the pivot format PAULA and read into a database where the data can be visualized, queried, and evaluated across multiple layers. For cross-tag set querying and statistical evaluation, ANNIS uses the ontology of linguistic annotations. Finally, ANNIS is also tied to a machine learning component for semiautomatic annotation.
Y1 - 2008
UR - http://www.atala.org/A-Flexible-Framework-for
SN - 1248-9433
ER -

TY - THES
A1 - Chiarcos, Christian
T1 - Mental salience and grammatical form : toward a framework for salience metrics in natural language generation
Y1 - 2009
CY - Potsdam
ER -

TY - JOUR
A1 - Chiarcos, Christian
A1 - Ritz, Julia
A1 - Stede, Manfred
T1 - By all these lovely tokens... Merging conflicting tokenizations
JF - Language resources and evaluation
N2 - Given the contemporary trend to modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday's NLP practice and poses a non-trivial theoretical problem to the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences from a corpus-linguistic perspective.
KW - Linguistic annotation
KW - Multi-layer annotation
KW - Conflicting tokenizations
KW - Tokenization alignment
KW - Corpus linguistics
Y1 - 2012
U6 - https://doi.org/10.1007/s10579-011-9161-0
SN - 1574-020X
VL - 46
IS - 1
SP - 53
EP - 74
PB - Springer
CY - Dordrecht
ER -

TY - JOUR
A1 - Chiarcos, Christian
A1 - Fiedler, Ines
A1 - Grubic, Mira
A1 - Hartmann, Katharina
A1 - Ritz, Julia
A1 - Schwarz, Anne
A1 - Zeldes, Amir
A1 - Zimmermann, Malte
T1 - Information structure in African languages corpora and tools
JF - Language resources and evaluation
N2 - In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre 632 "Information Structure". These include deeply annotated data collections of 25 sub-Saharan languages that are described together with their annotation scheme, as well as the corpus tool ANNIS, which provides unified access to a broad variety of annotations created with a range of different tools.
With the application of ANNIS to several African data collections, we illustrate its suitability for the purpose of language documentation, distributed access, and the creation of data archives.
KW - African language resources
KW - Pragmatics
KW - Corpus search infrastructure
Y1 - 2011
U6 - https://doi.org/10.1007/s10579-011-9153-0
SN - 1574-020X
VL - 45
IS - 3
SP - 361
EP - 374
PB - Springer
CY - Dordrecht
ER -

TY - BOOK
A1 - Stede, Manfred
A1 - Mamprin, Sara
A1 - Peldszus, Andreas
A1 - Herzog, André
A1 - Kaupat, David
A1 - Chiarcos, Christian
A1 - Warzecha, Saskia
ED - Stede, Manfred
T1 - Handbuch Textannotation
T1 - Handbook text annotation
BT - Potsdamer Kommentarkorpus 2.0
BT - Potsdam commentary corpus 2.0
N2 - Das Potsdamer Kommentarkorpus ist eine Sammlung von Zeitungstexten, die dem Genre ‘Kommentar’ zuzuordnen sind. Der öffentlich verfügbare Teil besteht aus 175 Texten aus der Märkischen Allgemeinen Zeitung, die hinsichtlich Syntax, Koreferenz, Konnektoren und Rhetorische Struktur manuell annotiert wurden. Weitere Ebenen werden bei zukünftigen Korpusversionen hinzukommen. Dieses Buch enthält die Annotationsrichtlinien, die der Bearbeitung des öffentlichen Teils des Korpus zugrunde lagen, sowie auch anderer Teile, bei denen mit weiteren Annotationsebenen experimentiert wurde. Die meisten der Richtlinien werden auch für ähnliche Text-Genres und für andere Sprachen verwendbar sein.
N2 - The Potsdam Commentary Corpus is a collection of newspaper texts belonging to the ‘commentary’ genre. The public part consists of 175 texts from Märkische Allgemeine Zeitung that have been manually annotated for syntax, coreference, connectives, and rhetorical structure. Further layers will be added to future releases of the corpus. This book assembles the annotation guidelines that have been used for that public part, as well as for other portions, where other layers of annotation have been experimented with.
Most of the guidelines will be applicable to similar genres, and also to other languages.
T3 - Potsdam Cognitive Science Series - 8
KW - linguistische Annotation
KW - linguistisches Korpus
KW - Textstruktur
KW - Zeitungskommentare
KW - linguistic annotation
KW - linguistic corpus
KW - text structure
KW - newspaper commentary
Y1 - 2015
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-82761
SN - 978-3-86956-343-5
ER -