Segmentation in super-chunks with a finite-state approach
Since Harris’ parser in the late 1950s, multiword units have been progressively integrated into parsers. Nevertheless, for the most part, they are still restricted to compound words, which are more stable and less numerous. In fact, language is full of semi-fixed expressions that also form basic semantic units, such as semi-fixed adverbial expressions (e.g. time expressions) and collocations. As with compounds, identifying these structures limits the combinatorial complexity induced by lexical ambiguity. In this paper, we detail an experiment that extensively integrates these notions into a finite-state procedure for segmenting text into super-chunks, as a preliminary step to parsing. We show that the chunker, developed for French, reaches 92.9% precision and 98.7% recall. Moreover, multiword units account for 36.6% of the attachments within nominal and prepositional phrases.
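The abstract does not spell out the segmentation procedure. As a rough illustration only, the Python sketch below shows the general idea of recognizing multiword units with a finite-state automaton before chunking. It compiles a token-level trie (a deterministic acyclic automaton) and matches it left to right with a leftmost-longest policy. Several elements are hypothetical and do not come from the paper: the lexicon, the tag sets, the function names (`build_automaton`, `segment`, `count_tag_sequences`), and the longest-match policy itself. The sketch also handles only fully fixed sequences. Semi-fixed expressions with variable slots would need transitions over word classes rather than literal tokens.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon of French multiword units (not the paper's resources).
MWU_LEXICON: Dict[Tuple[str, ...], str] = {
    ("pomme", "de", "terre"): "N",              # compound noun ("potato")
    ("à", "partir", "de"): "PREP",              # compound preposition ("from")
    ("au", "fur", "et", "à", "mesure"): "ADV",  # fixed adverbial ("gradually")
}

# Hypothetical per-token tag sets used only to illustrate lexical ambiguity.
AMBIGUITY: Dict[str, Set[str]] = {
    "il": {"PRO"}, "mange": {"V"}, "une": {"DET", "PRO"}, "pomme": {"N"},
    "de": {"PREP", "DET"}, "terre": {"N"}, "à": {"PREP"}, "partir": {"V"},
    "midi": {"N"},
}


class State:
    """A state of a deterministic acyclic automaton whose transitions are tokens."""

    def __init__(self) -> None:
        self.transitions: Dict[str, "State"] = {}
        self.category: Optional[str] = None  # non-None only for final states


def build_automaton(lexicon: Dict[Tuple[str, ...], str]) -> State:
    """Compile the lexicon into a token trie (one path per multiword unit)."""
    root = State()
    for tokens, category in lexicon.items():
        state = root
        for token in tokens:
            state = state.transitions.setdefault(token, State())
        state.category = category
    return root


def segment(tokens: List[str], root: State) -> List[Tuple[str, Optional[str]]]:
    """Leftmost-longest segmentation: a multiword unit if one starts here, else one token."""
    result: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        state, j = root, i
        best_end: Optional[int] = None
        best_cat: Optional[str] = None
        # Follow transitions as far as the input allows, remembering the last final state.
        while j < len(tokens) and tokens[j] in state.transitions:
            state = state.transitions[tokens[j]]
            j += 1
            if state.category is not None:
                best_end, best_cat = j, state.category
        if best_end is None:
            result.append((tokens[i], None))  # no multiword unit starts at position i
            i += 1
        else:
            result.append((" ".join(tokens[i:best_end]), best_cat))
            i = best_end
    return result


def count_tag_sequences(segments: List[Tuple[str, Optional[str]]],
                        ambiguity: Dict[str, Set[str]]) -> int:
    """Number of tag sequences if each recognized unit keeps only its lexicon category."""
    total = 1
    for form, category in segments:
        total *= 1 if category is not None else len(ambiguity.get(form, {"UNK"}))
    return total


automaton = build_automaton(MWU_LEXICON)
sentence = "il mange une pomme de terre à partir de midi".split()
segments = segment(sentence, automaton)
print(segments)
# [('il', None), ('mange', None), ('une', None), ('pomme de terre', 'N'), ('à partir de', 'PREP'), ('midi', None)]
print(count_tag_sequences([(t, None) for t in sentence], AMBIGUITY))  # 8 (token by token)
print(count_tag_sequences(segments, AMBIGUITY))                       # 2 (with multiword units)
```

With these toy tag sets, treating the two recognized units as single unambiguous segments cuts the candidate tag sequences from 8 to 2. This is the kind of reduction in combinatorial complexity the abstract describes. Always committing to the multiword reading is a simplification. In "prêt à partir de bonne heure" ("ready to leave early"), the same three tokens are not the compound preposition, so a real system may need to keep both analyses.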
Authors: | Olivier Blanc, Matthieu Constant, Patrick Watrin |
---|---|
URN: | urn:nbn:de:kobv:517-opus-27133 |
Publication type: | Conference publication |
Language: | English |
Year of publication: | 2008 |
Publishing institution: | Universität Potsdam |
Date made available online: | 11 December 2008 |
Organizational units: | External / External |
DDC classification: | 4 Language / 40 Language / 400 Language |
Collection(s): | Universität Potsdam / Conference proceedings (non-serial) / Finite-state methods and natural language processing : 6th International Workshop, FSMNLP 2007 / II Regular Papers |
License: | No public license: protected by copyright |
Externe Anmerkung: | The complete edition of the proceedings "Finite-state methods and natural language processing : 6th International Workshop, FSMNLP 2007 ; Revised Papers" is available: URN urn:nbn:de:kobv:517-opus-23812 |