Asymmetric term alignment with selective contiguity constraints by multi-tape automata
- This article describes an HMM-based word-alignment method that can selectively enforce a contiguity constraint. The method applies directly to extracting a bilingual terminological lexicon from a parallel corpus, and it can also serve as a preliminary step for extracting phrase pairs in a Phrase-Based Statistical Machine Translation system. Contiguous source words that make up a term are aligned to contiguous target-language words. The HMM is transformed into a Weighted Finite State Transducer (WFST), and contiguity constraints are enforced by specific multi-tape WFSTs. The method is especially suited to settings where basic linguistic resources (morphological analyzers, part-of-speech taggers and term extractors) are available for the source language only.
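The effect of a contiguity constraint on HMM decoding can be illustrated with a small sketch. It is not the paper's multi-tape WFST construction. Instead, it hand-codes the product of a standard HMM Viterbi search with a three-state constraint automaton that tracks whether the target words aligned to one source term are still "before", "inside" or "after" a single contiguous block. All names here (`constrained_viterbi`, `emit`, `trans`, `init`, `term`) are hypothetical. The sketch also assumes that hidden states are source positions, that each target word is aligned to exactly one source word, and that only one term is constrained per sentence pair.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")
State = Tuple[int, int]  # (source position, constraint-automaton flag)


def constrained_viterbi(
    emit: Sequence[Sequence[float]],   # emit[j][i] = log p(target word j | source word i)
    trans: Sequence[Sequence[float]],  # trans[k][i] = log p(a_j = i | a_{j-1} = k)
    init: Sequence[float],             # init[i] = log p(a_1 = i)
    term: Tuple[int, int],             # hypothetical source term span [start, end)
) -> Tuple[float, List[int]]:
    """Best alignment in which target words aligned to the term form one contiguous block.

    Assumes at least one target word. An empty span such as (0, 0) disables the constraint.
    """

    def in_term(i: int) -> bool:
        return term[0] <= i < term[1]

    def next_flag(flag: int, i: int) -> Optional[int]:
        # Flag 0: term not yet used; 1: inside the term block; 2: block closed.
        if in_term(i):
            return None if flag == 2 else 1  # re-entering a closed block is forbidden
        return 2 if flag == 1 else flag

    num_src = len(init)
    # delta[(i, flag)] = best log score of a target prefix ending in that state
    delta: Dict[State, float] = {}
    for i in range(num_src):
        delta[(i, next_flag(0, i))] = init[i] + emit[0][i]  # flag 0 never yields None

    backptrs: List[Dict[State, State]] = [{}]  # placeholder for target position 0
    for j in range(1, len(emit)):
        new: Dict[State, float] = {}
        bp: Dict[State, State] = {}
        for (k, fk), score in delta.items():
            for i in range(num_src):
                f = next_flag(fk, i)
                if f is None:
                    continue  # transition violates the contiguity constraint
                cand = score + trans[k][i] + emit[j][i]
                if cand > new.get((i, f), NEG_INF):
                    new[(i, f)] = cand
                    bp[(i, f)] = (k, fk)
        delta = new
        backptrs.append(bp)

    # Trace back from the best final state.
    state = max(delta, key=delta.get)
    best = delta[state]
    path = [state]
    for j in range(len(emit) - 1, 0, -1):
        path.append(backptrs[j][path[-1]])
    path.reverse()
    return best, [i for i, _ in path]


if __name__ == "__main__":
    u = math.log(1 / 3)  # uniform initial and transition probabilities
    emit = [[0.0, -5.0, -5.0], [-5.0, -5.0, 0.0], [-5.0, 0.0, -1.0]]
    trans = [[u] * 3 for _ in range(3)]
    init = [u] * 3
    print(constrained_viterbi(emit, trans, init, term=(0, 0))[1])  # → [0, 2, 1]
    print(constrained_viterbi(emit, trans, init, term=(0, 2))[1])  # → [0, 2, 2]
```

In the toy example, the unconstrained alignment sends target words 0 and 2 to the term made of source words 0 and 1, which splits the term across a gap. With the constraint active, the best alignment keeps the term's target words contiguous. Tracking the constraint as a small automaton state, rather than filtering complete alignments afterwards, matches the spirit of composing the HMM with a constraint automaton. The search remains a single dynamic program, with the number of states multiplied by three.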
Authors: | Mădălina Barbaiani, Nicola Cancedda, Chris Dance, Szilárd Fazekas, Tamás Gaál, Éric Gaussier |
---|---|
URN: | urn:nbn:de:kobv:517-opus-27115 |
Publication type: | Conference publication |
Language: | English |
Year of publication: | 2008 |
Publishing institution: | Universität Potsdam |
Release date: | 11 December 2008 |
Organizational units: | External / External |
DDC classification: | 4 Language / 40 Language / 400 Language |
Collection(s): | Universität Potsdam / Conference proceedings (non-serial) / Finite-state methods and natural language processing : 6th International Workshop, FSMNLP 2007 / II Regular Papers |
License: | No public license; protected by copyright |
External note: | The complete edition of the proceedings "Finite-state methods and natural language processing : 6th International Workshop, FSMNLP 2007 ; Revised Papers" is available: URN urn:nbn:de:kobv:517-opus-23812 |