TY  - CHAP
A1  - Piskorski, Jakub
T1  - ExPRESS : extraction pattern recognition engine and specification suite
N2  - The emergence of information extraction (IE) oriented pattern engines has been observed during the last decade. Most of them rely heavily on finite-state devices. This paper introduces ExPRESS – a new extraction pattern engine, whose rules are regular expressions over flat feature structures. The underlying pattern language is a blend of two previously introduced IE oriented pattern formalisms, namely, JAPE, used in the widely known GATE system, and the unification-based XTDL formalism used in SProUT. A brief and technical overview of ExPRESS, its pattern language and the pool of its native linguistic components is given. Furthermore, the implementation of the grammar interpreter is addressed.
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27227
ER  - 
TY  - CHAP
A1  - Watson, Bruce W.
T1  - Advances in automata implementation techniques (Abstract)
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27094
ER  - 
TY  - CHAP
A1  - Barthélemy, François
T1  - Finite-state compilation of feature structures for two-level morphology
N2  - This paper describes a two-level formalism where feature structures are used in contextual rules. Whereas usual two-level grammars describe rational sets over symbol pairs, this new formalism uses tree-structured regular expressions. They allow an explicit and precise definition of the scope of feature structures. A given surface form may be described using several feature structures. Feature unification is expressed in contextual rules using variables, as in a unification grammar. Grammars are compiled into finite-state multi-tape transducers.
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27120
ER  - 
TY  - CHAP
A1  - Blanc, Olivier
A1  - Constant, Matthieu
A1  - Watrin, Patrick
T1  - Segmentation in super-chunks with a finite-state approach
N2  - Since Harris’ parser in the late 50s, multiword units have been progressively integrated in parsers. Nevertheless, for the most part, they are still restricted to compound words, which are more stable and less numerous. Actually, language is full of semi-fixed expressions that also form basic semantic units: semi-fixed adverbial expressions (e.g. time), collocations. Like compounds, the identification of these structures limits the combinatorial complexity induced by lexical ambiguity. In this paper, we detail an experiment that largely integrates these notions in a finite-state procedure of segmentation into super-chunks, preliminary to a parser. We show that the chunker, developed for French, reaches 92.9% precision and 98.7% recall. Moreover, multiword units realize 36.6% of the attachments within nominal and prepositional phrases.
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27133
ER  - 
TY  - CHAP
A1  - Bonfante, Guillaume
A1  - Le Roux, Joseph
T1  - Intersection optimization is NP-complete
N2  - Finite state methods for natural language processing often require the construction and the intersection of several automata. In this paper, we investigate the question of determining the best order in which these intersections should be performed. We take as an example lexical disambiguation in polarity grammars. We show that there is no efficient way to minimize the state complexity of these intersections.
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27146
ER  - 
TY  - CHAP
A1  - Daciuk, Jan
T1  - Perfect hashing tree automata
N2  - We present an algorithm that computes a function that assigns consecutive integers to trees recognized by a deterministic, acyclic, finite-state, bottom-up tree automaton. Such a function is called minimal perfect hashing. It can be used to identify trees recognized by the automaton. Its value may be seen as an index in some other data structures. We also present an algorithm for inverted hashing.
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27163
ER  - 
TY  - CHAP
A1  - Padró, Muntsa
A1  - Padró, Lluís
T1  - ME-CSSR : an extension of CSSR using maximum entropy models
N2  - In this work an extension of the CSSR algorithm using Maximum Entropy Models is introduced. Preliminary experiments to perform Named Entity Recognition with this new system are presented.
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27210
ER  - 
TY  - CHAP
A1  - Bögel, Tina
A1  - Butt, Miriam
A1  - Hautli, Annette
A1  - Sulger, Sebastian
T1  - Developing a finite-state morphological analyzer for Urdu and Hindi
N2  - We introduce and discuss a number of issues that arise in the process of building a finite-state morphological analyzer for Urdu, in particular issues with potential ambiguity and non-concatenative morphology. Our approach allows for an underlyingly similar treatment of both Urdu and Hindi via a cascade of finite-state transducers that transliterates the very different scripts into a common ASCII transcription system. As this transliteration system is based on the XFST tools that the Urdu/Hindi common morphological analyzer is also implemented in, no compatibility problems arise.
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27155
ER  - 
TY  - CHAP
A1  - Yli-Jyrä, Anssi
T1  - Applications of diamonded double negation
N2  - Nested complementation plays an important role in expressing counter-free, i.e. star-free and first-order definable, languages and their hierarchies. In addition, methods that compile phonological rules into finite-state networks use double-nested complementation or “double negation”. This paper reviews how the double-nested complementation extends to a relatively new operation, generalized restriction (GR), coined by the author (Yli-Jyrä and Koskenniemi 2004). This operation encapsulates a double-nested complementation and elimination of a concatenation marker, diamond, whose finite occurrences align concatenations in the arguments of the operation. The paper demonstrates that the GR operation has an interesting potential in expressing regular languages, various kinds of grammars, bimorphisms and relations. This motivates a further study of optimized implementation of the operator.
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27109
ER  - 
TY  - CHAP
A1  - Barbaiani, Mădălina
A1  - Cancedda, Nicola
A1  - Dance, Chris
A1  - Fazekas, Szilárd
A1  - Gaál, Tamás
A1  - Gaussier, Éric
T1  - Asymmetric term alignment with selective contiguity constraints by multi-tape automata
N2  - This article describes an HMM-based word-alignment method that can selectively enforce a contiguity constraint. This method has a direct application in the extraction of a bilingual terminological lexicon from a parallel corpus, but can also be used as a preliminary step for the extraction of phrase pairs in a Phrase-Based Statistical Machine Translation system. Contiguous source words composing terms are aligned to contiguous target language words. The HMM is transformed into a Weighted Finite State Transducer (WFST) and contiguity constraints are enforced by specific multi-tape WFSTs. The proposed method is especially suited when basic linguistic resources (morphological analyzer, part-of-speech taggers and term extractors) are available for the source language only.
Y1  - 2008
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-27115
ER  -