Extern
Refine
Has Fulltext
- yes (55)
Year of publication
- 2008 (55) (remove)
Document Type
- Postprint (23)
- Conference Proceeding (18)
- Article (7)
- Doctoral Thesis (6)
- Preprint (1)
Keywords
- Demokratie (2)
- EU (2)
- Russland (2)
- Türkei (2)
- Abrüstung (1)
- Antibiotic alternatives (1)
- Antibiotic resistance (1)
- Antibiotikaersatz (1)
- Antibiotikaresistenz (1)
- Arbeit (1)
Institute
- Extern (55) (remove)
The boundary paradigm (Rayner, 1975) with a novel preview manipulation was used to examine the extent of parafoveal processing of words to the right of fixation. Words n+1 and n+2 had either correct or incorrect previews prior to fixation (prior to crossing the boundary location). In addition, the manipulation utilized either a high or low frequency word in word n+1 location on the assumption that it would be more likely that n+2 preview effects could be obtained when word n+1 was high frequency. The primary findings were that there was no evidence for a preview benefit for word n+2 and no evidence for parafoveal-on-foveal effects when word n+1 is at least four letters long. We discuss implications for models of eye-movement control in reading.
This article describes a HMM-based word-alignment method that can selectively enforce a contiguity constraint. This method has a direct application in the extraction of a bilingual terminological lexicon from a parallel corpus, but can also be used as a preliminary step for the extraction of phrase pairs in a Phrase-Based Statistical Machine Translation system. Contiguous source words composing terms are aligned to contiguous target language words. The HMM is transformed into a Weighted Finite State Transducer (WFST) and contiguity constraints are enforced by specific multi-tape WFSTs. The proposed method is especially suited when basic linguistic resources (morphological analyzer, part-of-speech taggers and term extractors) are available for the source language only.
This paper describes a two-level formalism where feature structures are used in contextual rules. Whereas usual two-level grammars describe rational sets over symbol pairs, this new formalism uses tree structured regular expressions. They allow an explicit and precise definition of the scope of feature structures. A given surface form may be described using several feature structures. Feature unification is expressed in contextual rules using variables, like in a unification grammar. Grammars are compiled in finite state multi-tape transducers.
Since Harris’ parser in the late 50s, multiword units have been progressively integrated in parsers. Nevertheless, in the most part, they are still restricted to compound words, that are more stable and less numerous. Actually, language is full of semi-fixed expressions that also form basic semantic units: semi-fixed adverbial expressions (e.g. time), collocations. Like compounds, the identification of these structures limits the combinatorial complexity induced by lexical ambiguity. In this paper, we detail an experiment that largely integrates these notions in a finite-state procedure of segmentation into super-chunks, preliminary to a parser.We show that the chunker, developped for French, reaches 92.9% precision and 98.7% recall. Moreover, multiword units realize 36.6% of the attachments within nominal and prepositional phrases.
Finite state methods for natural language processing often require the construction and the intersection of several automata. In this paper, we investigate the question of determining the best order in which these intersections should be performed. We take as an example lexical disambiguation in polarity grammars. We show that there is no efficient way to minimize the state complexity of these intersections.
Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus
(2008)
The surprisal of a word on a probabilistic grammar constitutes a promising complexity metric for human sentence comprehension difficulty. Using two different grammar types, surprisal is shown to have an effect on fixation durations and regression probabilities in a sample of German readers’ eye movements, the Potsdam Sentence Corpus. A linear mixed-effects model was used to quantify the effect of surprisal while taking into account unigram and bigram frequency, word length, and empirically-derived word predictability; the so-called “early” and “late” measures of processing difficulty both showed an effect of surprisal. Surprisal is also shown to have a small but statistically non-significant effect on empirically-derived predictability itself. This work thus demonstrates the importance of including parsing costs as a predictor of comprehension difficulty in models of reading, and suggests that a simple identification of syntactic parsing costs with early measures and late measures with durations of post-syntactic events may be difficult to uphold.
We introduce and discuss a number of issues that arise in the process of building a finite-state morphological analyzer for Urdu, in particular issues with potential ambiguity and non-concatenative morphology. Our approach allows for an underlyingly similar treatment of both Urdu and Hindi via a cascade of finite-state transducers that transliterates the very different scripts into a common ASCII transcription system. As this transliteration system is based on the XFST tools that the Urdu/Hindi common morphological analyzer is also implemented in, no compatibility problems arise.
A multitype Dawson-Watanabe process is conditioned, in subcritical and critical cases, on non-extinction in the remote future. On every finite time interval, its distribution is absolutely continuous with respect to the law of the unconditioned process. A martingale problem characterization is also given. Several results on the long time behavior of the conditioned mass process - the conditioned multitype Feller branching diffusion - are then proved. The general case is first considered, where the mutation matrix which models the interaction between the types, is irreducible. Several two-type models with decomposable mutation matrices are analyzed too .
Kaukasische Verwicklungen
(2008)
We present an algorithm that computes a function that assigns consecutive integers to trees recognized by a deterministic, acyclic, finite-state, bottom-up tree automaton. Such function is called minimal perfect hashing. It can be used to identify trees recognized by the automaton. Its value may be seen as an index in some other data structures. We also present an algorithm for inverted hashing.