TY  - JOUR
A1  - Wagner, Andreas
T1  - Unity in diversity
BT  - integrating different linguistic data in TUSNELDA
JF  - Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632
N2  - This paper describes the creation and preparation of TUSNELDA, a collection of corpus data built for linguistic research. This collection contains a number of linguistically annotated corpora that differ in various respects, such as language, text types / data types, encoded annotation levels, and the linguistic theories underlying the annotation. The paper focuses on this variation on the one hand and on how these heterogeneous data are integrated into one resource on the other.
Y1  - 2005
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-8625
SN  - 1866-4725
SN  - 1614-4708
IS  - 2
SP  - 1
EP  - 20
ER  - 
TY  - JOUR
A1  - Witt, Andreas
T1  - Multiple hierarchies
BT  - new aspects of an old solution
JF  - Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632
N2  - In this paper, we present the Multiple Annotation approach, which solves two problems: annotating overlapping structures, and annotating documents according to different, possibly heterogeneous tag sets. This approach has many advantages: it is based on XML, alternative annotations can be modeled, each level can be viewed separately, and new levels can be added at any time. The files can be regarded as an interrelated unit, with the text serving as the implicit link. Two representations of the information contained in the multiple files (one in Prolog and one in XML) are described. These representations serve as a basis for several applications.
Y1  - 2005
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-8657
SN  - 1866-4725
SN  - 1614-4708
IS  - 2
SP  - 55
EP  - 85
ER  - 
TY  - JOUR
A1  - Meyer, Roland
T1  - VP-fronting in Czech and Polish
BT  - a case study in corpus-oriented grammar research
JF  - Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632
N2  - Fronting of a non-finite VP across a finite main verb, akin to German "VP-topicalization", can also be found in Czech and Polish. The paper discusses evidence from large corpora for this process and some of its properties, both syntactic and information-structural. Based on this case, criteria for more user-friendly searching and retrieval of corpus data in syntactic research are developed.
Y1  - 2005
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-8662
SN  - 1866-4725
SN  - 1614-4708
IS  - 2
SP  - 87
EP  - 115
ER  - 
TY  - JOUR
A1  - Teich, Elke
A1  - Fankhauser, Peter
T1  - Exploring lexical patterns in text
BT  - lexical cohesion analysis with WordNet
JF  - Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632
N2  - We present a system for the linguistic exploration and analysis of lexical cohesion in English texts. Using an electronic thesaurus-like resource, Princeton WordNet, and the Brown Corpus of English, we have implemented a process of annotating text with lexical chains and a graphical user interface for inspection of the annotated text. We describe the system and report on some sample linguistic analyses carried out using the combined thesaurus-corpus resource.
Y1  - 2005
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-8685
SN  - 1866-4725
SN  - 1614-4708
IS  - 2
SP  - 129
EP  - 145
ER  - 
TY  - JOUR
A1  - Schmidt, Thomas
T1  - EXMARaLDA und Datenbank "Mehrsprachigkeit"
BT  - Konzepte und praktische Erfahrungen
JF  - Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632
N2  - This paper presents some concepts and principles used in the development of a database of multilingual spoken discourse at the University of Hamburg. The first part focuses on general considerations for handling heterogeneous data sets. After showing that diversity in transcription data is partly conceptually and partly technologically motivated, it argues that the processing of transcription corpora should be approached via a three-level architecture which separates form (application) from content (data) on the one hand, and logical from physical data structures on the other. Such an architecture not only paves the way for modern text-technological approaches to linguistic data processing; it can also help to decide where and how standardization in the work with heterogeneous data is possible and desirable, and where it would run counter to the needs of the research community. It is further argued that, in order to ensure user acceptance, new solutions developed in this approach must take care not to abandon established concepts too quickly. The second part focuses on practical experiences with users and technologies gained in four years of project work. Concerning the practical development work, the value of open standards such as XML and Unicode is emphasized, and some limitations of the “platform-independent” Java technology are pointed out. With respect to users of the EXMARaLDA system, a predominantly conservative attitude towards technological innovations in transcription corpus work can be observed: individual users tend to stick to familiar functionality and are reluctant to adapt to the new possibilities. Furthermore, active commitment to cooperative corpus work still seems to be the exception rather than the rule. It is concluded that technological innovations can contribute their share to progress in the work with heterogeneous linguistic data, but that in the long run they will have to be supplemented by adequate methodological reflection and the creation of an appropriate infrastructure.
Y1  - 2005
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-8636
SN  - 1866-4725
SN  - 1614-4708
IS  - 2
SP  - 21
EP  - 42
ER  - 
TY  - JOUR
A1  - Smith, George
T1  - Refining queries on a treebank with XSLT filters
BT  - approaching the universal quantifier
JF  - Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632
N2  - This paper discusses the use of XSLT stylesheets as a filtering mechanism for refining the results of user queries on treebanks. The discussion is within the context of the TIGER treebank, the associated search engine and query language, but the general ideas can apply to any search engine for XML-encoded treebanks. It will be shown that important classes of linguistic phenomena can be accessed by applying relatively simple XSLT templates to the output of a query, effectively simulating the universal quantifier for a subset of the query language.
Y1  - 2005
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-8678
SN  - 1866-4725
SN  - 1614-4708
IS  - 2
SP  - 117
EP  - 128
ER  - 
TY  - JOUR
A1  - Lüdeling, Anke
T1  - Heterogeneity and standardization in data, use, and annotation
BT  - a diachronic corpus of German
JF  - Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632
N2  - This paper describes the standardization problems that arise in a diachronic corpus: it has to cope with differing standards with regard to diplomaticity, annotation, and header information. Such highly heterogeneous texts must be standardized to allow for comparative research without (too much) loss of information.
Y1  - 2005
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-8643
SN  - 1866-4725
SN  - 1614-4708
IS  - 2
SP  - 43
EP  - 54
ER  - 