
POS-Tagging Historical Corpora: The Case of Early New High German

A key problem in the automatic annotation of historical corpora is inconsistent spelling. Because the spelling of a word form can differ between texts, a language model trained on already annotated treebanks may fail to recognize known word forms when they are spelled differently. In the present work, we explore the feasibility of an unsupervised method of spelling adjustment for the purpose of improving part-of-speech (POS) tagging. To this end, we present a method for spelling normalization based on weighted edit distances, which exploits within-text spelling variation. We then evaluate the improvement in tagging accuracy resulting from between-text spelling normalization in two tagging experiments on several Early New High German (ENHG) texts.
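The general idea of normalizing spelling with a weighted edit distance can be illustrated with a minimal sketch. This is not the authors' method. The abstract states that their weights exploit within-text spelling variation but does not say how they are derived, so the cost table below (cheap u/v, i/j, i/y, s/ß substitutions) is a hand-picked, hypothetical placeholder. The `lexicon` is likewise an assumption: it stands in for word forms known from an annotated treebank. The functions `weighted_edit_distance` and `normalize` and the `max_distance` threshold are illustrative names.

```python
# Hypothetical substitution costs for typical ENHG spelling variants.
# These are placeholders, NOT weights from the paper.
CHEAP_SUBSTITUTIONS = {
    frozenset({"u", "v"}): 0.2,
    frozenset({"i", "j"}): 0.2,
    frozenset({"i", "y"}): 0.3,
    frozenset({"s", "ß"}): 0.3,
}


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting character a with b (0 if identical)."""
    if a == b:
        return 0.0
    return CHEAP_SUBSTITUTIONS.get(frozenset({a, b}), 1.0)


def weighted_edit_distance(source: str, target: str, indel_cost: float = 1.0) -> float:
    """Levenshtein distance with weighted substitutions (dynamic programming)."""
    n, m = len(source), len(target)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel_cost  # delete all of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j * indel_cost  # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,  # deletion
                dist[i][j - 1] + indel_cost,  # insertion
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),  # substitution
            )
    return dist[n][m]


def normalize(word: str, lexicon, max_distance: float = 0.5) -> str:
    """Map an unknown spelling to the closest known form within max_distance.

    Known words and words with no close enough candidate are returned unchanged.
    """
    if word in lexicon:
        return word
    best, best_dist = word, max_distance
    for candidate in sorted(lexicon):  # sorted for deterministic tie-breaking
        d = weighted_edit_distance(word, candidate)
        if d <= best_dist:
            best, best_dist = candidate, d
    return best


if __name__ == "__main__":
    known_forms = {"und", "sein", "sehr", "in"}
    for w in ["vnd", "seyn", "jn", "xyz"]:
        print(w, "->", normalize(w, known_forms))
```

With this toy lexicon, "vnd" maps to "und" and "seyn" to "sein", while "xyz" is left unchanged because no known form lies within the threshold. Normalized forms could then be passed to a tagger trained on the annotated data.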
Metadata
Author details: Ulrike Demske, Pavel Logacev, Katrin Goldschmidt
Title of parent work (English): Proceedings of the thirteenth workshop on treebanks and linguistic theories (TLT 13)
Publisher: TALAR - Tübingen Archive of Language Resources
Place of publishing: Tübingen
Publication type: Conference Proceeding
Language: English
Date of first publication: 2014/12/13
Publication year: 2014
Publishing institution: Universität Potsdam
Release date: 2020/02/04
Volume: 2014
Number of pages: 10
First page: 103
Last page: 112
Organizational units: Philosophische Fakultät / Institut für Germanistik
DDC classification: 4 Language / 41 Linguistics / 415 Grammar