publish.UP Extern

Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus (2008)

Boston, Marisa Ferrara ; Hale, John ; Kliegl, Reinhold ; Patil, Umesh ; Vasishth, Shravan

The surprisal of a word on a probabilistic grammar constitutes a promising complexity metric for human sentence comprehension difficulty. Using two different grammar types, surprisal is shown to have an effect on fixation durations and regression probabilities in a sample of German readers’ eye movements, the Potsdam Sentence Corpus. A linear mixed-effects model was used to quantify the effect of surprisal while taking into account unigram and bigram frequency, word length, and empirically-derived word predictability; the so-called “early” and “late” measures of processing difficulty both showed an effect of surprisal. Surprisal is also shown to have a small but statistically non-significant effect on empirically-derived predictability itself. This work thus demonstrates the importance of including parsing costs as a predictor of comprehension difficulty in models of reading, and suggests that a simple identification of syntactic parsing costs with early measures and late measures with durations of post-syntactic events may be difficult to uphold.
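As a minimal illustration of the metric named in this abstract: the surprisal of the k-th word is standardly defined as the negative log probability of that word given the preceding words, which equals the log ratio of consecutive prefix probabilities. The sketch below assumes the prefix probabilities are already available (in the study they would come from a probabilistic grammar); the numbers and the function name are hypothetical and are not taken from the paper.

```python
import math


def surprisal_from_prefix_probs(prefix_probs):
    """Return per-word surprisal in bits.

    prefix_probs[k] is the probability of the first k words of the
    sentence, with prefix_probs[0] == 1.0 for the empty prefix.
    """
    return [
        math.log2(prefix_probs[k - 1] / prefix_probs[k])
        for k in range(1, len(prefix_probs))
    ]


# Hypothetical prefix probabilities for a three-word sentence.
prefix_probs = [1.0, 0.5, 0.125, 0.0625]
surprisals = surprisal_from_prefix_probs(prefix_probs)
print(surprisals)  # [1.0, 2.0, 1.0]
```

The second word is the least expected one here, so it receives the highest surprisal.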

Parafoveal processing in reading: Manipulating n+1 and n+2 previews simultaneously (2008)

Angele, Bernhard ; Slattery, Timothy J. ; Yang, Jinmian ; Kliegl, Reinhold ; Rayner, Keith

The boundary paradigm (Rayner, 1975) with a novel preview manipulation was used to examine the extent of parafoveal processing of words to the right of fixation. Words n+1 and n+2 had either correct or incorrect previews prior to fixation (prior to crossing the boundary location). In addition, the manipulation utilized either a high or low frequency word in word n+1 location on the assumption that it would be more likely that n+2 preview effects could be obtained when word n+1 was high frequency. The primary findings were that there was no evidence for a preview benefit for word n+2 and no evidence for parafoveal-on-foveal effects when word n+1 is at least four letters long. We discuss implications for models of eye-movement control in reading.

The comprehension of figurative language : electrophysiological evidence on the processing of irony (2008)

Regel, Stefanie

This dissertation investigates the comprehension of figurative language, in particular the temporal processing of verbal irony. In six experiments using event-related potentials (ERPs), brain activity during the comprehension of ironic utterances was measured and analyzed in relation to equivalent non-ironic utterances. Moreover, the impact of various language-accompanying cues, e.g., prosody or the use of punctuation marks, as well as non-verbal cues such as pragmatic knowledge, was examined with respect to the processing of irony. On the basis of these findings, different models of figurative language comprehension, i.e., the 'standard pragmatic model', the 'graded salience hypothesis', and the 'direct access view', are discussed.

Supernova-driven turbulence and magnetic field amplification in disk galaxies (2008)

Gressel, Oliver

Supernovae are known to be the dominant energy source for driving turbulence in the interstellar medium. Yet, their effect on magnetic field amplification in spiral galaxies is still poorly understood. Analytical models based on the uncorrelated-ensemble approach predicted that any created field will be expelled from the disk before a significant amplification can occur. By means of direct simulations of supernova-driven turbulence, we demonstrate that this is not the case. Accounting for vertical stratification and galactic differential rotation, we find an exponential amplification of the mean field on timescales of 100 Myr. The self-consistent numerical verification of such a “fast dynamo” is highly beneficial in explaining the observed strong magnetic fields in young galaxies. Furthermore, we highlight the importance of rotation in the generation of helicity by showing that a similar mechanism based on Cartesian shear does not lead to a sustained amplification of the mean magnetic field. This finding confirms the classical picture of a dynamo based on cyclonic turbulence.

Chloroplasts as bioreactors : high-yield production of active bacteriolytic protein antibiotics (2008)

Oey, Melanie

Plants, more precisely their chloroplasts with their bacterial-like expression machinery inherited from their cyanobacterial ancestors, can potentially offer a cheap expression system for proteinaceous pharmaceuticals. This system would be easily scalable and would provide appropriate safety due to the maternal inheritance of chloroplasts. In this work, it was shown that three phage lytic enzymes (Pal, Cpl-1 and PlyGBS) could be successfully expressed at very high levels and with high stability in tobacco chloroplasts. PlyGBS expression reached an amount of foreign protein accumulation (> 70% TSP) that has never been obtained before. Although the high expression levels of PlyGBS caused a pale green phenotype with retarded growth, presumably due to exhaustion of plastid protein synthesis capacity, development and seed production were not impaired under greenhouse conditions. Since Pal and Cpl-1 showed toxic effects when expressed in E. coli, a special plastid transformation vector (pTox) was constructed to allow DNA amplification in bacteria. The pTox transformation vector allows a recombinase-mediated deletion of an E. coli transcription block in the chloroplast, leading to an increase of foreign protein accumulation to up to 40% of TSP for Pal and 20% of TSP for Cpl-1. High dose-dependent bactericidal efficiency was shown for all three plant-derived lytic enzymes using their pathogenic target bacteria S. pyogenes and S. pneumoniae. Confirmation of specificity was obtained for the endotoxic proteins Pal and Cpl-1 by application to E. coli cultures. These results establish tobacco chloroplasts as a new cost-efficient and convenient production platform for phage lytic enzymes and address the greatest obstacle for clinical application. The present study is the first report of lysin production in a non-bacterial system.
The properties of the chloroplast-produced lysins described in this work (their stability, high accumulation rate and biological activity) make them highly attractive candidates for future antibiotics.

Finite-state rule deduction for parsing non-constituent coordination (2008)

Zarrieß, Sina ; Seeker, Wolfgang

In this paper, we present a finite-state approach to constituency and, with it, an analysis of coordination phenomena involving so-called non-constituents. We show that non-constituents can be seen as parts of fully-fledged constituents and can therefore be coordinated in the same way. We have implemented an algorithm based on finite-state automata that generates an LFG grammar assigning valid analyses to non-constituent coordination structures in the German language.

Transducers from parallel replace rules and modes with generalized lenient composition (2008)

Yli-Jyrä, Anssi

Generalized Two-Level Grammar (GTWOL) provides a new method for compilation of parallel replacement rules into transducers. The current paper identifies the role of generalized lenient composition (GLC) in this method. Thanks to the GLC operation, the compilation method becomes bipartite and easily extendible to capture various application modes. In the light of three notions of obligatoriness, a modification to the compilation method is proposed. We argue that the bipartite design makes implementation of parallel obligatoriness, directionality, length and rank based application modes extremely easy, which is the main result of the paper.

On resolving long distance dependencies in Russian verbs (2008)

Saléschus, Dirk

Morphological analyses based on word syntax approaches can encounter difficulties with long distance dependencies. The reason is that in some cases an affix has to have access to the inner structure of the form with which it combines. One solution is the percolation of features from the inner morphemes to the outer morphemes with some process of feature unification. However, the obstacle of percolation constraints or stipulated features has led some linguists to argue in favour of other frameworks such as realizational morphology or parallel approaches like optimality theory. This paper proposes a linguistic analysis of two long distance dependencies in the morphology of Russian verbs, namely secondary imperfectivization and deverbal nominalization. We show how these processes can be reanalysed as local dependencies. Although finite-state frameworks are not bound by such linguistically motivated considerations, we present an implementation of our analysis as proposed in [1] that does not complicate the grammar or enlarge the network disproportionately.

ExPRESS : extraction pattern recognition engine and specification suite (2008)

Piskorski, Jakub

The emergence of information extraction (IE) oriented pattern engines has been observed during the last decade. Most of them rely heavily on finite-state devices. This paper introduces ExPRESS – a new extraction pattern engine, whose rules are regular expressions over flat feature structures. The underlying pattern language is a blend of two previously introduced IE oriented pattern formalisms, namely, JAPE, used in the widely known GATE system, and the unification-based XTDL formalism used in SProUT. A brief technical overview of ExPRESS, its pattern language and the pool of its native linguistic components is given. Furthermore, the implementation of the grammar interpreter is addressed.

ME-CSSR : an extension of CSSR using maximum entropy models (2008)

Padró, Muntsa ; Padró, Lluís

In this work, an extension of the CSSR algorithm using Maximum Entropy Models is introduced. Preliminary experiments on Named Entity Recognition with this new system are presented.

Phrase-based finite state models (2008)

González, Jorge ; Casacuberta, Francisco

In recent years, statistical machine translation has demonstrated its usefulness in a wide variety of translation applications. In this context, phrase-based alignment models have become the reference for building competitive systems. Finite state models are always an interesting framework because there are well-known efficient algorithms for their representation and manipulation. This document is a contribution to the evolution of finite state models towards a phrase-based approach. The inference of stochastic transducers that are based on bilingual phrases is carefully analysed from a finite state point of view. The algorithmic issues that have to be taken into account in order to deal with such phrase-based finite state models at decoding time are also described in detail.

Temporal propositions as regular languages (2008)

Fernando, Tim

Temporal propositions are mapped to sets of strings that witness (in a precise sense) the propositions over discrete linear Kripke frames. The strings are collected into regular languages to ensure the decidability of entailments given by inclusions between languages. (Various notions of bounded entailment are shown to be expressible as language inclusions.) The languages unwind computations implicit in the logical (and temporal) connectives via a system of finite-state constraints adapted from finite-state morphology. Applications to Hybrid Logic and non-monotonic inertial reasoning are briefly considered.

Syntactic error detection and correction in date expressions using finite-state transducers (2008)

Ilarraza, Arantza Díaz de ; Gojenola, Koldo ; Oronoz, Maite ; Otaegi, Maialen ; Alegria, Iñaki

This paper presents a system for the detection and correction of syntactic errors. It combines a robust morphosyntactic analyser and two groups of finite-state transducers specified using the Xerox Finite State Tool (xfst). One of the groups is used for the description of syntactic error patterns while the second one is used for the correction of the detected errors. The system has been tested on a corpus of real texts, containing both correct and incorrect sentences, with good results.

SynCoP : combining syntactic tagging with chunking using weighted finite state transducers (2008)

Didakowski, Jörg

This paper describes the key aspects of the system SynCoP (Syntactic Constraint Parser) developed at the Berlin-Brandenburgische Akademie der Wissenschaften. The parser allows syntactic tagging and chunking to be combined by means of constraint grammar using weighted finite state transducers (WFST). Chunks are interpreted as local dependency structures within syntactic tagging. The linguistic theories are formulated by criteria which are formalized by a semiring; these criteria allow structural preferences and gradual grammaticality. The parser is essentially a cascade of WFSTs. To find the most likely syntactic readings, a best-path search is used.

Perfect hashing tree automata (2008)

Daciuk, Jan

We present an algorithm that computes a function that assigns consecutive integers to trees recognized by a deterministic, acyclic, finite-state, bottom-up tree automaton. Such a function is called minimal perfect hashing. It can be used to identify trees recognized by the automaton. Its value may be seen as an index in some other data structures. We also present an algorithm for inverted hashing.
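The idea of minimal perfect hashing can be illustrated on the simpler and well-known case of an acyclic deterministic automaton over strings: the index of a word is the number of accepted words that precede it in lexicographic order. The sketch below is this string analogue only, not the tree-automaton algorithm of the paper; the toy automaton and all names in it are hypothetical.

```python
# Hypothetical acyclic DFA accepting exactly the words "ab", "ac", "bc".
TRANSITIONS = {
    0: {"a": 1, "b": 2},
    1: {"b": 3, "c": 3},
    2: {"c": 3},
    3: {},
}
START, FINAL = 0, {3}


def count_words(state, memo):
    """Number of words accepted when starting from `state` (memoized)."""
    if state not in memo:
        memo[state] = (1 if state in FINAL else 0) + sum(
            count_words(target, memo) for target in TRANSITIONS[state].values()
        )
    return memo[state]


def word_to_index(word):
    """Hashing: map an accepted word to its rank, or None if rejected."""
    memo, state, index = {}, START, 0
    for symbol in word:
        if symbol not in TRANSITIONS[state]:
            return None
        if state in FINAL:
            index += 1  # the prefix read so far is itself an earlier word
        for other in TRANSITIONS[state]:
            if other < symbol:  # skip all words going through smaller symbols
                index += count_words(TRANSITIONS[state][other], memo)
        state = TRANSITIONS[state][symbol]
    return index if state in FINAL else None


def index_to_word(index):
    """Inverted hashing: map a rank back to the word, or None if out of range."""
    memo = {}
    if not 0 <= index < count_words(START, memo):
        return None
    state, word = START, []
    while True:
        if state in FINAL:
            if index == 0:
                return "".join(word)
            index -= 1
        for symbol in sorted(TRANSITIONS[state]):
            target = TRANSITIONS[state][symbol]
            size = count_words(target, memo)
            if index < size:
                word.append(symbol)
                state = target
                break
            index -= size


print([word_to_index(w) for w in ("ab", "ac", "bc")])  # [0, 1, 2]
print([index_to_word(i) for i in range(3)])  # ['ab', 'ac', 'bc']
```

Because the automaton is acyclic, the word counts are finite and the resulting indices are consecutive, which is what makes the hash both perfect and minimal.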

Developing a finite-state morphological analyzer for Urdu and Hindi (2008)

Bögel, Tina ; Butt, Miriam ; Hautli, Annette ; Sulger, Sebastian

We introduce and discuss a number of issues that arise in the process of building a finite-state morphological analyzer for Urdu, in particular issues with potential ambiguity and non-concatenative morphology. Our approach allows for an underlyingly similar treatment of both Urdu and Hindi via a cascade of finite-state transducers that transliterates the very different scripts into a common ASCII transcription system. As this transliteration system is based on the XFST tools that the Urdu/Hindi common morphological analyzer is also implemented in, no compatibility problems arise.

Intersection optimization is NP-complete (2008)

Bonfante, Guillaume ; Le Roux, Joseph

Finite state methods for natural language processing often require the construction and the intersection of several automata. In this paper, we investigate the question of determining the best order in which these intersections should be performed. We take as an example lexical disambiguation in polarity grammars. We show that there is no efficient way to minimize the state complexity of these intersections.
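To see why the order of intersections matters, consider the standard product construction restricted to reachable states. The sketch below uses three hypothetical toy automata (counters of the letter "a" modulo 2, 3 and 2); it only illustrates that intermediate results can differ in size depending on the order, and does not reproduce the paper's polarity-grammar setting or its complexity proof.

```python
ALPHABET = "ab"


def mod_counter(n):
    """Toy DFA over {a, b} accepting words whose number of 'a's is 0 mod n."""
    delta = {}
    for q in range(n):
        delta[(q, "a")] = (q + 1) % n
        delta[(q, "b")] = q
    return {"start": 0, "delta": delta, "final": {0}, "states": set(range(n))}


def intersect(x, y):
    """Product construction, building only the reachable pair states."""
    start = (x["start"], y["start"])
    delta, final = {}, set()
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        p, q = state
        if p in x["final"] and q in y["final"]:
            final.add(state)
        for symbol in ALPHABET:
            if (p, symbol) in x["delta"] and (q, symbol) in y["delta"]:
                target = (x["delta"][(p, symbol)], y["delta"][(q, symbol)])
                delta[(state, symbol)] = target
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    return {"start": start, "delta": delta, "final": final, "states": seen}


A, B, C = mod_counter(2), mod_counter(3), mod_counter(2)

AB = intersect(A, B)  # intermediate automaton with 6 reachable states
AC = intersect(A, C)  # intermediate automaton with 2 reachable states
print(len(AB["states"]), len(AC["states"]))  # 6 2

# Both orders end with the same number of reachable states.
print(len(intersect(AB, C)["states"]), len(intersect(AC, B)["states"]))  # 6 6
```

Intersecting A with C first keeps the intermediate automaton small because the two counters always move in lockstep, whereas A with B already yields all six combinations.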

Segmentation in super-chunks with a finite-state approach (2008)

Blanc, Olivier ; Constant, Matthieu ; Watrin, Patrick

Since Harris’ parser in the late 50s, multiword units have been progressively integrated into parsers. Nevertheless, for the most part, they are still restricted to compound words, which are more stable and less numerous. In fact, language is full of semi-fixed expressions that also form basic semantic units: semi-fixed adverbial expressions (e.g. time), collocations. As with compounds, the identification of these structures limits the combinatorial complexity induced by lexical ambiguity. In this paper, we detail an experiment that largely integrates these notions in a finite-state procedure of segmentation into super-chunks, preliminary to a parser. We show that the chunker, developed for French, reaches 92.9% precision and 98.7% recall. Moreover, multiword units realize 36.6% of the attachments within nominal and prepositional phrases.

Finite-state compilation of feature structures for two-level morphology (2008)

Barthélemy, François

This paper describes a two-level formalism where feature structures are used in contextual rules. Whereas usual two-level grammars describe rational sets over symbol pairs, this new formalism uses tree-structured regular expressions. They allow an explicit and precise definition of the scope of feature structures. A given surface form may be described using several feature structures. Feature unification is expressed in contextual rules using variables, as in a unification grammar. Grammars are compiled into finite-state multi-tape transducers.

Asymmetric term alignment with selective contiguity constraints by multi-tape automata (2008)

Barbaiani, Mădălina ; Cancedda, Nicola ; Dance, Chris ; Fazekas, Szilárd ; Gaál, Tamás ; Gaussier, Éric

This article describes an HMM-based word-alignment method that can selectively enforce a contiguity constraint. This method has a direct application in the extraction of a bilingual terminological lexicon from a parallel corpus, but can also be used as a preliminary step for the extraction of phrase pairs in a Phrase-Based Statistical Machine Translation system. Contiguous source words composing terms are aligned to contiguous target language words. The HMM is transformed into a Weighted Finite State Transducer (WFST) and contiguity constraints are enforced by specific multi-tape WFSTs. The proposed method is especially suited to cases where basic linguistic resources (morphological analyzers, part-of-speech taggers and term extractors) are available for the source language only.
