publish.UP Extern

Asymmetric term alignment with selective contiguity constraints by multi-tape automata (2008)

Barbaiani, Mădălina ; Cancedda, Nicola ; Dance, Chris ; Fazekas, Szilárd ; Gaál, Tamás ; Gaussier, Éric

This article describes a HMM-based word-alignment method that can selectively enforce a contiguity constraint. This method has a direct application in the extraction of a bilingual terminological lexicon from a parallel corpus, but can also be used as a preliminary step for the extraction of phrase pairs in a Phrase-Based Statistical Machine Translation system. Contiguous source words composing terms are aligned to contiguous target language words. The HMM is transformed into a Weighted Finite State Transducer (WFST) and contiguity constraints are enforced by specific multi-tape WFSTs. The proposed method is especially suited when basic linguistic resources (morphological analyzer, part-of-speech taggers and term extractors) are available for the source language only.

Finite-state compilation of feature structures for two-level morphology (2008)

Barthélemy, François

This paper describes a two-level formalism where feature structures are used in contextual rules. Whereas usual two-level grammars describe rational sets over symbol pairs, this new formalism uses tree structured regular expressions. They allow an explicit and precise definition of the scope of feature structures. A given surface form may be described using several feature structures. Feature unification is expressed in contextual rules using variables, like in a unification grammar. Grammars are compiled in finite state multi-tape transducers.

Segmentation in super-chunks with a finite-state approach (2008)

Blanc, Olivier ; Constant, Matthieu ; Watrin, Patrick

Since Harris’ parser in the late 50s, multiword units have been progressively integrated in parsers. Nevertheless, in the most part, they are still restricted to compound words, that are more stable and less numerous. Actually, language is full of semi-fixed expressions that also form basic semantic units: semi-fixed adverbial expressions (e.g. time), collocations. Like compounds, the identification of these structures limits the combinatorial complexity induced by lexical ambiguity. In this paper, we detail an experiment that largely integrates these notions in a finite-state procedure of segmentation into super-chunks, preliminary to a parser.We show that the chunker, developped for French, reaches 92.9% precision and 98.7% recall. Moreover, multiword units realize 36.6% of the attachments within nominal and prepositional phrases.

Intersection optimization is NP-complete (2008)

Bonfante, Guillaume ; Le Roux, Joseph

Finite state methods for natural language processing often require the construction and the intersection of several automata. In this paper, we investigate the question of determining the best order in which these intersections should be performed. We take as an example lexical disambiguation in polarity grammars. We show that there is no efficient way to minimize the state complexity of these intersections.

Developing a finite-state morphological analyzer for Urdu and Hindi (2008)

Bögel, Tina ; Butt, Miriam ; Hautli, Annette ; Sulger, Sebastian

We introduce and discuss a number of issues that arise in the process of building a finite-state morphological analyzer for Urdu, in particular issues with potential ambiguity and non-concatenative morphology. Our approach allows for an underlyingly similar treatment of both Urdu and Hindi via a cascade of finite-state transducers that transliterates the very different scripts into a common ASCII transcription system. As this transliteration system is based on the XFST tools that the Urdu/Hindi common morphological analyzer is also implemented in, no compatibility problems arise.

Perfect hashing tree automata (2008)

Daciuk, Jan

We present an algorithm that computes a function that assigns consecutive integers to trees recognized by a deterministic, acyclic, finite-state, bottom-up tree automaton. Such function is called minimal perfect hashing. It can be used to identify trees recognized by the automaton. Its value may be seen as an index in some other data structures. We also present an algorithm for inverted hashing.

SynCoP : combining syntactic tagging with chunking using weighted finite state transducers (2008)

Didakowski, Jörg

This paper describes the key aspects of the system SynCoP (Syntactic Constraint Parser) developed at the Berlin-Brandenburgische Akademie der Wissenschaften. The parser allows to combine syntactic tagging and chunking by means of constraint grammar using weighted finite state transducers (WFST). Chunks are interpreted as local dependency structures within syntactic tagging. The linguistic theories are formulated by criteria which are formalized by a semiring; these criteria allow structural preferences and gradual grammaticality. The parser is essentially a cascade of WFSTs. To find the most likely syntactic readings a best-path search is used.

Temporal propositions as regular languages (2008)

Fernando, Tim

Temporal propositions are mapped to sets of strings that witness (in a precise sense) the propositions over discrete linear Kripke frames. The strings are collected into regular languages to ensure the decidability of entailments given by inclusions between languages. (Various notions of bounded entailment are shown to be expressible as language inclusions.) The languages unwind computations implicit in the logical (and temporal) connectives via a system of finite-state constraints adapted from finite-state morphology. Applications to Hybrid Logic and non-monotonic inertial reasoning are briefly considered.

Phrase-based finite state models (2008)

González, Jorge ; Casacuberta, Francisco

In the last years, statistical machine translation has already demonstrated its usefulness within a wide variety of translation applications. In this line, phrase-based alignment models have become the reference to follow in order to build competitive systems. Finite state models are always an interesting framework because there are well-known efficient algorithms for their representation and manipulation. This document is a contribution to the evolution of finite state models towards a phrase-based approach. The inference of stochastic transducers that are based on bilingual phrases is carefully analysed from a finite state point of view. Indeed, the algorithmic phenomena that have to be taken into account in order to deal with such phrase-based finite state models when in decoding time are also in-depth detailed.

Syntactic error detection and correction in date expressions using finite-state transducers (2008)

Ilarraza, Arantza Díaz de ; Gojenola, Koldo ; Oronoz, Maite ; Otaegi, Maialen ; Alegria, Iñaki

This paper presents a system for the detection and correction of syntactic errors. It combines a robust morphosyntactic analyser and two groups of finite-state transducers specified using the Xerox Finite State Tool (xfst). One of the groups is used for the description of syntactic error patterns while the second one is used for the correction of the detected errors. The system has been tested on a corpus of real texts, containing both correct and incorrect sentences, with good results.

Extern

Refine

Has Fulltext

Author

Year of publication

Document Type

Language

Is part of the Bibliography

Institute

18 search hits