EXMARaLDA und Datenbank "Mehrsprachigkeit"

  • This paper presents some concepts and principles used in the development of a database of multilingual spoken discourse at the University of Hamburg. The emphasis of the first part is on general considerations for the handling of heterogeneous data sets: after showing that diversity in transcription data is partly conceptually and partly technologically motivated, it is argued that the processing of transcription corpora should be approached via a three-level architecture which separates form (application) from content (data) on the one hand, and logical from physical data structures on the other. Such an architecture not only paves the way for modern text-technological approaches to linguistic data processing; it can also help to decide where and how standardization in the work with heterogeneous data is possible and desirable, and where it would run counter to the needs of the research community. It is further argued that, in order to ensure user acceptance, new solutions developed in this approach must take care not to abandon established concepts too quickly. The focus of the second part is on some practical experiences with users and technologies gained in four years of project work. Concerning the practical development work, the value of open standards like XML and Unicode is emphasized and some limitations of the “platform-independent” Java technology are indicated. With respect to users of the EXMARaLDA system, a predominantly conservative attitude towards technological innovations in transcription corpus work can be observed: individual users tend to stick to known functionalities and are reluctant to adapt to the new possibilities. Furthermore, an active commitment to cooperative corpus work still seems to be the exception rather than the rule. It is concluded that technological innovations can contribute their share to progress in the work with heterogeneous linguistic data, but that they will have to be supplemented, in the long run, with adequate methodological reflection and the creation of an appropriate infrastructure.

Metadata
Author: Thomas Schmidt
URN: urn:nbn:de:kobv:517-opus-8636
ISSN: 1866-4725 (online)
ISSN: 1614-4708 (print)
Parent Title (German): Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632
Subtitle (German): Konzepte und praktische Erfahrungen
Document Type: Article
Language: German
Year of Completion: 2005
Publishing Institution: Universität Potsdam
Contributing Corporation: Sonderforschungsbereich 538 Mehrsprachigkeit <Hamburg>
Release Date: 2006/09/14
Issue: 2
First Page: 21
Last Page: 42
Source: Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632. - Vol. 2
RVK - Regensburg Classification: ER 300
Organizational units: External / External
Dewey Decimal Classification: 4 Language / 40 Language / 400 Language
Collections: Universität Potsdam / Publication series / Interdisciplinary studies on information structure : ISIS ; working papers of the SFB 632, ISSN 1866-4725 / ISIS (2005) 02
Licence (German): No usage license granted; German copyright law applies
External notes: published in:
Heterogeneity in focus: Creating and using linguistic databases / Stefanie Dipper ; Michael Götze ; Manfred Stede (eds.). - Potsdam : Univ.-Verl., 2005. - 145 pp.
(Interdisciplinary studies on information structure ; 2)
ISBN 3-937786-48-1
URN: urn:nbn:de:kobv:517-opus-8244

The print edition can be ordered from Universitätsverlag Potsdam.