publish.UP Extern

Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus (2008)

Boston, Marisa Ferrara ; Hale, John ; Kliegl, Reinhold ; Patil, Umesh ; Vasishth, Shravan

The surprisal of a word on a probabilistic grammar constitutes a promising complexity metric for human sentence comprehension difficulty. Using two different grammar types, surprisal is shown to have an effect on fixation durations and regression probabilities in a sample of German readers’ eye movements, the Potsdam Sentence Corpus. A linear mixed-effects model was used to quantify the effect of surprisal while taking into account unigram and bigram frequency, word length, and empirically-derived word predictability; the so-called “early” and “late” measures of processing difficulty both showed an effect of surprisal. Surprisal is also shown to have a small but statistically non-significant effect on empirically-derived predictability itself. This work thus demonstrates the importance of including parsing costs as a predictor of comprehension difficulty in models of reading, and suggests that a simple identification of syntactic parsing costs with early measures and late measures with durations of post-syntactic events may be difficult to uphold.

Parafoveal processing in reading: Manipulating n+1 and n+2 previews simultaneously (2008)

Angele, Bernhard ; Slattery, Timothy J. ; Yang, Jinmian ; Kliegl, Reinhold ; Rayner, Keith

The boundary paradigm (Rayner, 1975) with a novel preview manipulation was used to examine the extent of parafoveal processing of words to the right of fixation. Words n+1 and n+2 had either correct or incorrect previews prior to fixation (prior to crossing the boundary location). In addition, the manipulation utilized either a high or low frequency word in word n+1 location on the assumption that it would be more likely that n+2 preview effects could be obtained when word n+1 was high frequency. The primary findings were that there was no evidence for a preview benefit for word n+2 and no evidence for parafoveal-on-foveal effects when word n+1 is at least four letters long. We discuss implications for models of eye-movement control in reading.

Finite-state rule deduction for parsing non-constituent coordination (2008)

Zarrieß, Sina ; Seeker, Wolfgang

In this paper, we present a finite-state approach to constituency and therewith an analysis of coordination phenomena involving so-called non-constituents. We show that non-constituents can be seen as parts of fully-fledged constituents and therefore be coordinated in the same way. We have implemented an algorithm based on finite state automata that generates an LFG grammar assigning valid analyses to non-constituent coordination structures in the German language.

Transducers from parallel replace rules and modes with generalized lenient composition (2008)

Yli-Jyrä, Anssi

Generalized Two-Level Grammar (GTWOL) provides a new method for compilation of parallel replacement rules into transducers. The current paper identifies the role of generalized lenient composition (GLC) in this method. Thanks to the GLC operation, the compilation method becomes bipartite and easily extendible to capture various application modes. In the light of three notions of obligatoriness, a modification to the compilation method is proposed. We argue that the bipartite design makes implementation of parallel obligatoriness, directionality, length and rank based application modes extremely easy, which is the main result of the paper.

On resolving long distance dependencies in Russian verbs (2008)

Saléschus, Dirk

Morphological analyses based on word syntax approaches can encounter difficulties with long distance dependencies. The reason is that in some cases an affix has to have access to the inner structure of the form with which it combines. One solution is the percolation of features from ther inner morphemes to the outer morphemes with some process of feature unification. However, the obstacle of percolation constraints or stipulated features has lead some linguists to argue in favour of other frameworks such as, e.g., realizational morphology or parallel approaches like optimality theory. This paper proposes a linguistic analysis of two long distance dependencies in the morphology of Russian verbs, namely secondary imperfectivization and deverbal nominalization.We show how these processes can be reanalysed as local dependencies. Although finitestate frameworks are not bound by such linguistically motivated considerations, we present an implementation of our analysis as proposed in [1] that does not complicate the grammar or enlarge the network unproportionally.

ExPRESS : extraction pattern recognition engine and specification suite (2008)

Piskorski, Jakub

The emergence of information extraction (IE) oriented pattern engines has been observed during the last decade. Most of them exploit heavily finite-state devices. This paper introduces ExPRESS – a new extraction pattern engine, whose rules are regular expressions over flat feature structures. The underlying pattern language is a blend of two previously introduced IE oriented pattern formalisms, namely, JAPE, used in the widely known GATE system, and the unificationbased XTDL formalism used in SProUT. A brief and technical overview of ExPRESS, its pattern language and the pool of its native linguistic components is given. Furthermore, the implementation of the grammar interpreter is addressed too.

ME-CSSR : an extension of CSSR using maximum entropy models (2008)

Padró, Muntsa ; Padró, Lluís

In this work an extension of CSSR algorithm using Maximum Entropy Models is introduced. Preliminary experiments to perform Named Entity Recognition with this new system are presented.

Phrase-based finite state models (2008)

González, Jorge ; Casacuberta, Francisco

In the last years, statistical machine translation has already demonstrated its usefulness within a wide variety of translation applications. In this line, phrase-based alignment models have become the reference to follow in order to build competitive systems. Finite state models are always an interesting framework because there are well-known efficient algorithms for their representation and manipulation. This document is a contribution to the evolution of finite state models towards a phrase-based approach. The inference of stochastic transducers that are based on bilingual phrases is carefully analysed from a finite state point of view. Indeed, the algorithmic phenomena that have to be taken into account in order to deal with such phrase-based finite state models when in decoding time are also in-depth detailed.

Temporal propositions as regular languages (2008)

Fernando, Tim

Temporal propositions are mapped to sets of strings that witness (in a precise sense) the propositions over discrete linear Kripke frames. The strings are collected into regular languages to ensure the decidability of entailments given by inclusions between languages. (Various notions of bounded entailment are shown to be expressible as language inclusions.) The languages unwind computations implicit in the logical (and temporal) connectives via a system of finite-state constraints adapted from finite-state morphology. Applications to Hybrid Logic and non-monotonic inertial reasoning are briefly considered.

Syntactic error detection and correction in date expressions using finite-state transducers (2008)

Ilarraza, Arantza Díaz de ; Gojenola, Koldo ; Oronoz, Maite ; Otaegi, Maialen ; Alegria, Iñaki

This paper presents a system for the detection and correction of syntactic errors. It combines a robust morphosyntactic analyser and two groups of finite-state transducers specified using the Xerox Finite State Tool (xfst). One of the groups is used for the description of syntactic error patterns while the second one is used for the correction of the detected errors. The system has been tested on a corpus of real texts, containing both correct and incorrect sentences, with good results.

Extern

Refine

Has Fulltext

Author

Year of publication

Document Type

Language

Is part of the Bibliography

Keywords

Institute

47 search hits