An important aspect of aphasia is the observation of behavioral variability between and within individual participants. Our study addresses variability in sentence comprehension in German, by testing 21 individuals with aphasia and a control group and involving (a) several constructions (declarative sentences, relative clauses and control structures with an overt pronoun or PRO), (b) three response tasks (object manipulation, sentence-picture matching with/without self-paced listening), and (c) two test phases (to investigate test-retest performance). With this systematic, large-scale study we gained insights into variability in sentence comprehension. We found that the size of syntactic effects varied both in aphasia and in control participants. Whereas variability in control participants led to systematic changes, variability in individuals with aphasia was unsystematic across test phases or response tasks. The persistent occurrence of canonicity and interference effects across response tasks and test phases, however, shows that the performance is systematically influenced by syntactic complexity.
Among theories of human language comprehension, cue-based memory retrieval has proven to be a useful framework for understanding when and how processing difficulty arises in the resolution of long-distance dependencies. Most previous work in this area has assumed that very general retrieval cues like [+subject] or [+singular] do the work of identifying (and sometimes misidentifying) a retrieval target in order to establish a dependency between words. However, recent work suggests that general, handpicked retrieval cues like these may not be enough to explain illusions of plausibility (Cunnings & Sturt, 2018), which can arise in sentences like The letter next to the porcelain plate shattered. Capturing such retrieval interference effects requires lexically specific features and retrieval cues, but handpicking the features is hard to do in a principled way and greatly increases modeler degrees of freedom. To remedy this, we use well-established word embedding methods for creating distributed lexical feature representations that encode information relevant for retrieval using distributed retrieval cue vectors. We show that the similarity between the feature and cue vectors (a measure of plausibility) predicts total reading times in Cunnings and Sturt's eye-tracking data. The features can easily be plugged into existing parsing models (including cue-based retrieval and self-organized parsing), putting very different models on more equal footing and facilitating future quantitative comparisons.
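A rough sketch of the general idea (not the paper's actual implementation or embeddings): the plausibility score can be thought of as the cosine similarity between a distributed retrieval-cue vector derived from the verb and each candidate noun's distributed feature vector. The tiny vectors below are invented for illustration.

    import numpy as np

    def cosine(u, v):
        """Cosine similarity between two vectors."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Hypothetical low-dimensional embeddings standing in for real word vectors
    # (in practice these would come from a pre-trained embedding model).
    embeddings = {
        "shattered": np.array([0.9, 0.1, 0.3, 0.0]),
        "plate":     np.array([0.8, 0.2, 0.4, 0.1]),
        "letter":    np.array([0.1, 0.9, 0.2, 0.3]),
    }

    # The verb supplies the distributed retrieval cue; each candidate noun
    # supplies its distributed feature vector.
    cue = embeddings["shattered"]
    for candidate in ("plate", "letter"):
        print(candidate, round(cosine(cue, embeddings[candidate]), 2))

On this view, a distractor like plate that is highly similar to the cue vector of shattered can be (mis)retrieved even though it is not the subject, which is one way to think about the illusion of plausibility.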
In 2019 the Journal of Memory and Language instituted an open data and code policy; this policy requires that, as a rule, code and data be released at the latest upon publication. How effective is this policy? We compared 59 papers published before, and 59 papers published after, the policy took effect. After the policy was in place, the rate of data sharing increased by more than 50%. We further looked at whether papers published under the open data policy were reproducible, in the sense that the published results should be possible to regenerate given the data, and given the code, when code was provided. For 8 out of the 59 papers, data sets were inaccessible. The reproducibility rate ranged from 34% to 56%, depending on the reproducibility criteria. The strongest predictor of whether an attempt to reproduce would be successful is the presence of the analysis code: it increases the probability of reproducing reported results by almost 40%. We propose two simple steps that can increase the reproducibility of published papers: share the analysis code, and attempt to reproduce one's own analysis using only the shared materials.
In this paper we examine the effect of uncertainty on readers’ predictions about meaning. In particular, we were interested in how uncertainty might influence the likelihood of committing to a specific sentence meaning. We conducted two event-related potential (ERP) experiments using particle verbs such as turn down and manipulated uncertainty by constraining the context such that readers could be either highly certain about the identity of a distant verb particle, such as turn the bed […] down, or less certain due to competing particles, such as turn the music […] up/down. The study was conducted in German, where verb particles appear clause-finally and may be separated from the verb by a large amount of material. We hypothesised that this separation would encourage readers to predict the particle, and that high certainty would make prediction of a specific particle more likely than lower certainty. If a specific particle was predicted, this would reflect a strong commitment to sentence meaning that should incur a higher processing cost if the prediction is wrong. If a specific particle was less likely to be predicted, commitment should be weaker and the processing cost of a wrong prediction lower. If true, this could suggest that uncertainty discourages predictions via an unacceptable cost-benefit ratio. However, given the clear predictions made by the literature, it was surprisingly unclear whether the uncertainty manipulation affected the two ERP components studied, the N400 and the PNP. Bayes factor analyses showed that evidence for our a priori hypothesised effect sizes was inconclusive, although there was decisive evidence against a priori hypothesised effect sizes larger than 1μV for the N400 and larger than 3μV for the PNP. We attribute the inconclusive finding to the properties of verb-particle dependencies that differ from the verb-noun dependencies in which the N400 and PNP are often studied.
Factorial experiments in research on memory, language, and in other areas are often analyzed using analysis of variance (ANOVA). However, for effects with more than one numerator degree of freedom, e.g., for experimental factors with more than two levels, the ANOVA omnibus F-test is not informative about the source of a main effect or interaction. Because researchers typically have specific hypotheses about which condition means differ from each other, a priori contrasts (i.e., comparisons planned before the sample means are known) between specific conditions or combinations of conditions are the appropriate way to represent such hypotheses in the statistical model. Many researchers have pointed out that contrasts should be "tested instead of, rather than as a supplement to, the ordinary 'omnibus' F test" (Hays, 1973, p. 601). In this tutorial, we explain the mathematics underlying different kinds of contrasts (i.e., treatment, sum, repeated, polynomial, custom, nested, interaction contrasts), discuss their properties, and demonstrate how they are applied in the R System for Statistical Computing (R Core Team, 2018). In this context, we explain the generalized inverse, which is needed to compute the coefficients for contrasts that test hypotheses that are not covered by the default set of contrasts. A detailed understanding of contrast coding is crucial for successful and correct specification in linear models (including linear mixed models). Contrasts defined a priori yield far more useful confirmatory tests of experimental hypotheses than standard omnibus F-tests. Reproducible code is available from https://osf.io/7ukf6/.
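The generalized-inverse step described above can be sketched in a few lines. The tutorial itself works in R; the numpy transliteration below is only an illustration of the same computation for a hypothetical three-level factor with sliding-difference (repeated) hypotheses.

    import numpy as np

    # Hypothesis matrix for a factor with levels a, b, c: each row gives the
    # weights of one hypothesis over the condition means.
    Xh = np.array([
        [-1.0,  1.0,  0.0],   # H1: mu_b - mu_a
        [ 0.0, -1.0,  1.0],   # H2: mu_c - mu_b
    ])

    # The contrast matrix is the generalized (Moore-Penrose) inverse of the
    # hypothesis matrix; each column is the coding assigned to the conditions.
    C = np.linalg.pinv(Xh)
    print(np.round(C, 3))

The two columns of C are the familiar sliding-difference contrast codes, recovered directly from the stated hypotheses rather than looked up in a table.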
In eye-movement control during reading, advanced process-oriented models have been developed to reproduce behavioral data. So far, model complexity and large numbers of model parameters have prevented rigorous statistical inference and modeling of interindividual differences. Here we propose a Bayesian approach to both problems for one representative computational model of sentence reading (SWIFT; Engbert et al., Psychological Review, 112, 2005, pp. 777-813). We used experimental data from 36 subjects who read text in a normal layout and in one of four manipulated text layouts (e.g., mirrored or scrambled letters). The SWIFT model was fitted to subjects and experimental conditions individually to investigate between-subject variability. Based on posterior distributions of model parameters, fixation probabilities and durations are reliably recovered from simulated data and reproduced for withheld empirical data, at both the experimental condition and subject levels. A subsequent statistical analysis of model parameters across reading conditions generates model-driven explanations for observable effects between conditions.
We present a computational evaluation of three hypotheses about sources of deficit in sentence comprehension in aphasia: slowed processing, intermittent deficiency, and resource reduction. The ACT-R based Lewis and Vasishth (2005) model is used to implement these three proposals. Slowed processing is implemented as slowed execution time of parse steps; intermittent deficiency as increased random noise in activation of elements in memory; and resource reduction as reduced spreading activation. As data, we considered subject vs. object relative sentences, presented in a self-paced listening modality to 56 individuals with aphasia (IWA) and 46 matched controls. The participants heard the sentences and carried out a picture verification task to decide on an interpretation of the sentence. These response accuracies are used to identify the best parameters (for each participant) that correspond to the three hypotheses mentioned above. We show that controls have more tightly clustered (less variable) parameter values than IWA; specifically, compared to controls, among IWA there are more individuals with slow parsing times, high noise, and low spreading activation. We find that (a) individual IWA show differential amounts of deficit along the three dimensions of slowed processing, intermittent deficiency, and resource reduction, (b) overall, there is evidence for all three sources of deficit playing a role, and (c) IWA have a more variable range of parameter values than controls. An important implication is that it may be meaningless to talk about sources of deficit with respect to an abstract average IWA; the focus should be on the individual's differential degrees of deficit along different dimensions, and on understanding the causes of variability in deficit between participants.
Research on similarity-based interference has provided extensive evidence that the formation of dependencies between non-adjacent words relies on a cue-based retrieval mechanism. There are two different models that can account for one of the main predictions of interference, i.e., a slowdown at a retrieval site, when several items share a feature associated with a retrieval cue: Lewis and Vasishth’s (2005) activation-based model and McElree’s (2000) direct-access model. Even though these two models have been used almost interchangeably, they are based on different assumptions and predict differences in the relationship between reading times and response accuracy. The activation-based model follows the assumptions of the ACT-R framework, and its retrieval process behaves as a lognormal race between accumulators of evidence with a single variance. Under this model, accuracy of the retrieval is determined by the winner of the race and retrieval time by its rate of accumulation. In contrast, the direct-access model assumes a model of memory where only the probability of retrieval can be affected, while the retrieval time is drawn from the same distribution; in this model, differences in latencies are a by-product of the possibility of backtracking and repairing incorrect retrievals. We implemented both models in a Bayesian hierarchical framework in order to evaluate them and compare them. The data show that correct retrievals take longer than incorrect ones, and this pattern is better fit under the direct-access model than under the activation-based model. This finding does not rule out the possibility that retrieval may be behaving as a race model with assumptions that follow less closely the ones from the ACT-R framework. By introducing a modification of the activation model, i.e., by assuming that the accumulation of evidence for retrieval of incorrect items is not only slower but noisier (i.e., different variances for the correct and incorrect items), the model can provide a fit as good as the one of the direct-access model. This first ever computational evaluation of alternative accounts of retrieval processes in sentence processing opens the way for a broader investigation of theories of dependency completion.
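The contrast between the two latency predictions can be illustrated with a toy lognormal race; the parameter values below are invented for illustration and are not the fitted values from the paper.

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate_race(mu_correct, mu_incorrect, sd_correct, sd_incorrect, n=200_000):
        """Lognormal race between a correct and an incorrect accumulator:
        the faster finishing time wins and determines accuracy."""
        t_correct = rng.lognormal(mu_correct, sd_correct, n)
        t_incorrect = rng.lognormal(mu_incorrect, sd_incorrect, n)
        correct = t_correct < t_incorrect
        rt = np.minimum(t_correct, t_incorrect)
        return correct.mean(), rt[correct].mean(), rt[~correct].mean()

    # Shared variance (standard race assumption): incorrect retrievals come out
    # slower than correct ones.
    print(simulate_race(np.log(400), np.log(600), 0.3, 0.3))

    # Slower *and* noisier incorrect accumulator: incorrect retrievals now tend
    # to be faster than correct ones.
    print(simulate_race(np.log(400), np.log(600), 0.3, 1.0))

With a single shared variance, the incorrect accumulator mostly wins on trials where the correct accumulator happens to be slow, so errors remain relatively slow; giving the incorrect accumulator a larger variance lets it win from its fast tail, producing fast incorrect retrievals, in line with the pattern described above.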
We used Chinese prenominal relative clauses (RCs) to test the predictions of two competing accounts of sentence comprehension difficulty: the experience-based account of Levy (2008) and the Dependency Locality Theory (DLT; Gibson, 2000). Given that in Chinese RCs, a classifier and/or a passive marker BEI can be added to the sentence-initial position, we manipulated the presence/absence of classifiers and the presence/absence of BEI, such that BEI sentences were passivized subject-extracted RCs, and no-BEI sentences were standard object-extracted RCs. We conducted two self-paced reading experiments, using the same critical stimuli but somewhat different filler items. Reading time patterns from both experiments showed facilitative effects of BEI within and beyond RC regions, and delayed facilitative effects of classifiers, suggesting that cues that occur before a clear signal of an upcoming RC can help Chinese comprehenders to anticipate RC structures. The data patterns are not predicted by the DLT, but they are consistent with the predictions of experience-based theories.
Given the replication crisis in cognitive science, it is important to consider what researchers need to do in order to report results that are reliable. We consider three changes in current practice that have the potential to deliver more realistic and robust claims. First, the planned experiment should be divided into two stages, an exploratory stage and a confirmatory stage. This clear separation allows the researcher to check whether any results found in the exploratory stage are robust. The second change is to carry out adequately powered studies. We show that this is imperative if we want to obtain realistic estimates of effects in psycholinguistics. The third change is to use Bayesian data-analytic methods rather than frequentist ones; the Bayesian framework allows us to focus on the best estimates we can obtain of the effect, rather than rejecting a strawman null. As a case study, we investigate number interference effects in German. Number feature interference is predicted by cue-based retrieval models of sentence processing (Van Dyke & Lewis, 2003; Vasishth & Lewis, 2006), but it has shown inconsistent results. We show that by implementing the three changes mentioned, suggestive evidence emerges that is consistent with the predicted number interference effects.
Within quantitative phonetics, it is common practice to draw conclusions based on statistical significance alone. Using incomplete neutralization of final devoicing in German as a case study, we illustrate the problems with this approach. If researchers find a significant acoustic difference between voiceless and devoiced obstruents, they conclude that neutralization is incomplete, and if they find no significant difference, they conclude that neutralization is complete. However, such strong claims regarding the existence or absence of an effect based on significant results alone can be misleading. Instead, the totality of available evidence should be brought to bear on the question. Towards this end, we synthesize the evidence from 14 studies on incomplete neutralization in German using a Bayesian random-effects meta-analysis. Our meta-analysis provides evidence in favor of incomplete neutralization. We conclude with some suggestions for improving the quality of future research on phonetic phenomena: ensure that sample sizes allow for high-precision estimates of the effect; avoid the temptation to deploy researcher degrees of freedom when analyzing data; focus on estimates of the parameter of interest and the uncertainty about that parameter; attempt to replicate effects found; and, whenever possible, make both the data and analysis available publicly.
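The random-effects meta-analysis has the usual two-level form; written generically (the priors shown are placeholders, not the ones used in the paper):

    \begin{aligned}
    y_i &\sim \mathrm{Normal}(\theta_i,\ \mathrm{SE}_i) \\
    \theta_i &\sim \mathrm{Normal}(\mu,\ \tau) \\
    \mu &\sim \mathrm{Normal}(0,\ 100), \qquad \tau \sim \mathrm{Normal}_{+}(0,\ 100)
    \end{aligned}

Here all normal distributions are parameterized by their standard deviation, y_i and SE_i are the observed effect and standard error of study i, theta_i is that study's true effect, mu is the overall incomplete-neutralization effect, and tau is the between-study variability.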
This tutorial analyzes voice onset time (VOT) data from Dongbei (Northeastern) Mandarin Chinese and North American English to demonstrate how Bayesian linear mixed models can be fit using the programming language Stan via the R package brms. Through this case study, we demonstrate some of the advantages of the Bayesian framework: researchers can (i) flexibly define the underlying process that they believe to have generated the data; (ii) obtain direct information regarding the uncertainty about the parameter that relates the data to the theoretical question being studied; and (iii) incorporate prior knowledge into the analysis. Getting started with Bayesian modeling can be challenging, especially when one is trying to model one’s own (often unique) data. It is difficult to see how one can apply general principles described in textbooks to one’s own specific research problem. We address this barrier to using Bayesian methods by providing three detailed examples, with source code to allow easy reproducibility. The examples presented are intended to give the reader a flavor of the process of model-fitting; suggestions for further study are also provided. All data and code are available from: https://osf.io/g4zpv.
It is well-known in statistics (e.g., Gelman & Carlin, 2014) that treating a result as publishable just because the p-value is less than 0.05 leads to overoptimistic expectations of replicability. When power is low, significant results tend to be exaggerated estimates of the true effect; these exaggerated effects get published, leading to an overconfident belief in replicability. We demonstrate the adverse consequences of this statistical significance filter by conducting seven direct replication attempts (268 participants in total) of a recent paper (Levy & Keller, 2013). We show that the published claims are so noisy that even non-significant results are fully compatible with them. We also demonstrate the contrast between such small-sample studies and a larger-sample study; the latter generally yields a less noisy estimate but also a smaller effect magnitude, which looks less compelling but is more realistic. We reiterate several suggestions from the methodology literature for improving current practices.
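The statistical significance filter is easy to demonstrate by simulation; all numbers below are invented (a 10 ms true effect with a low-powered design), not estimates from the replication studies.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)

    true_effect, sd, n, n_sims = 10, 100, 30, 20_000   # ms; low power by design

    estimates, significant = [], []
    for _ in range(n_sims):
        sample = rng.normal(true_effect, sd, n)
        res = stats.ttest_1samp(sample, 0.0)
        estimates.append(sample.mean())
        significant.append(res.pvalue < 0.05)

    estimates, significant = np.array(estimates), np.array(significant)
    print("power:", significant.mean())
    print("mean estimate, all studies:", estimates.mean())
    print("mean estimate, significant studies only:", estimates[significant].mean())

Conditional on significance, the estimated effect is far larger than the true 10 ms effect (a Type M error) and occasionally has the wrong sign (a Type S error); publishing only the significant estimates therefore creates overoptimistic expectations about replicability.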
Sentence comprehension requires that the comprehender work out who did what to whom. This process has been characterized as retrieval from memory. This review summarizes the quantitative predictions and empirical coverage of the two existing computational models of retrieval and shows how the predictive performance of these two competing models can be tested against a benchmark data-set. We also show how computational modeling can help us better understand sources of variability in both unimpaired and impaired sentence comprehension.
We report a comprehensive review of the published reading studies on retrieval interference in reflexive-/reciprocal-antecedent and subject-verb dependencies. We also provide a quantitative random-effects meta-analysis of eyetracking and self-paced reading studies. We show that the empirical evidence is only partly consistent with cue-based retrieval as implemented in the ACT-R-based model of sentence processing by Lewis and Vasishth (2005) (LV05) and that there are important differences between the reviewed dependency types. In non-agreement subject-verb dependencies, there is evidence for inhibitory interference in configurations where the correct dependent fully matches the retrieval cues. This is consistent with the LV05 cue-based retrieval account. By contrast, in subject-verb agreement as well as in reflexive-/reciprocal-antecedent dependencies, no evidence for inhibitory interference is found in configurations with a fully cue-matching subject/antecedent. In configurations with only a partially cue-matching subject or antecedent, the meta-analysis reveals facilitatory interference in subject-verb agreement and inhibitory interference in reflexives/reciprocals. The former is consistent with the LV05 account, but the latter is not. Moreover, the meta-analysis reveals that (i) interference type (proactive versus retroactive) leads to different effects in the reviewed dependency types and (ii) the prominence of the distractor strongly influences the interference effect. In sum, the meta-analysis suggests that the LV05 model needs important modifications to account for the unexplained interference patterns and the differences between the dependency types. More generally, the meta-analysis provides a quantitative empirical basis for comparing the predictions of competing accounts of retrieval processes in sentence comprehension.
Linear mixed-effects models (LMMs) have increasingly replaced mixed-model analyses of variance for statistical inference in factorial psycholinguistic experiments. Although LMMs have many advantages over ANOVA, like ANOVAs, setting them up for data analysis also requires some care. One simple option, when numerically possible, is to fit the full variance-covariance structure of random effects (the maximal model; Barr, Levy, Scheepers & Tily, 2013), presumably to keep Type I error down to the nominal α in the presence of random effects. Although it is true that fitting a model with only random intercepts may lead to higher Type I error, fitting a maximal model also has a cost: it can lead to a significant loss of power. We demonstrate this with simulations and suggest that for typical psychological and psycholinguistic data, higher power is achieved without inflating Type I error rate if a model selection criterion is used to select a random effect structure that is supported by the data.
Argument-head distance and processing complexity: Explaining both locality and antilocality effects (2006)
Although proximity between arguments and verbs (locality) is a relatively robust determinant of sentence-processing difficulty (Hawkins 1998, 2001, Gibson 2000), increasing argument-verb distance can also facilitate processing (Konieczny 2000). We present two self-paced reading (SPR) experiments involving Hindi that provide further evidence of antilocality, and a third SPR experiment which suggests that similarity-based interference can attenuate this distance-based facilitation. A unified explanation of interference, locality, and antilocality effects is proposed via an independently motivated theory of activation decay and retrieval interference (Anderson et al. 2004).
Individuals with agrammatic Broca's aphasia experience difficulty when processing reversible non-canonical sentences. Different accounts have been proposed to explain this phenomenon. The Trace Deletion account (Grodzinsky, 1995, 2000, 2006) attributes this deficit to an impairment in syntactic representations, whereas others (e.g., Caplan, Waters, Dede, Michaud, & Reddy, 2007; Haarmann, Just, & Carpenter, 1997) propose that the underlying structural representations are unimpaired, but sentence comprehension is affected by processing deficits, such as slow lexical activation, reduction in memory resources, slowed processing and/or intermittent deficiency, among others. We test the claims of two processing accounts, slowed processing and intermittent deficiency, and two versions of the Trace Deletion Hypothesis (TDH), in a computational framework for sentence processing (Lewis & Vasishth, 2005) implemented in ACT-R (Anderson, Byrne, Douglass, Lebiere, & Qin, 2004). The assumption of slowed processing is operationalized as slow procedural memory, so that each processing action is performed slower than normal, and intermittent deficiency as extra noise in the procedural memory, so that the parsing steps are more noisy than normal. We operationalize the TDH as an absence of trace information in the parse tree. To test the predictions of the models implementing these theories, we use the data from a German sentence—picture matching study reported in Hanne, Sekerina, Vasishth, Burchert, and De Bleser (2011). The data consist of offline (sentence-picture matching accuracies and response times) and online (eye fixation proportions) measures. From among the models considered, the model assuming that both slowed processing and intermittent deficiency are present emerges as the best model of sentence processing difficulty in aphasia. The modeling of individual differences suggests that, if we assume that patients have both slowed processing and intermittent deficiency, they have them in differing degrees.
With the arrival of the R packages nlme and lme4, linear mixed models (LMMs) have come to be widely used in experimentally-driven areas like psychology, linguistics, and cognitive science. This tutorial provides a practical introduction to fitting LMMs in a Bayesian framework using the probabilistic programming language Stan. We choose Stan (rather than WinBUGS or JAGS) because it provides an elegant and scalable framework for fitting models in most of the standard applications of LMMs. We ease the reader into fitting increasingly complex LMMs, using a two-condition repeated measures self-paced reading study.
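For a two-condition repeated measures design, the most complex ("maximal") model in such a tutorial typically looks as follows; this is a generic sketch in standard notation, not the tutorial's exact specification, and priors on α, β, σ and the covariance matrices are omitted.

    \begin{aligned}
    \log \mathrm{rt}_n &\sim \mathrm{Normal}\big(\alpha + u_{0,\mathrm{subj}[n]} + w_{0,\mathrm{item}[n]} + (\beta + u_{1,\mathrm{subj}[n]} + w_{1,\mathrm{item}[n]})\, x_n,\ \sigma\big) \\
    (u_{0,j}, u_{1,j})^{\top} &\sim \mathrm{Normal}(\mathbf{0},\ \Sigma_u), \qquad (w_{0,k}, w_{1,k})^{\top} \sim \mathrm{Normal}(\mathbf{0},\ \Sigma_w)
    \end{aligned}

Here x_n is a ±0.5 sum-coded condition indicator, u and w are by-subject and by-item varying intercepts and slopes, and Σ_u, Σ_w are their covariance matrices.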
Traxler, Pickering, and Clifton (1998) found that ambiguous sentences are read faster than their unambiguous counterparts. This so-called ambiguity advantage has presented a major challenge to classical theories of human sentence comprehension (parsing) because its most prominent explanation, in the form of the unrestricted race model (URM), assumes that parsing is non-deterministic. Recently, Swets, Desmet, Clifton, and Ferreira (2008) have challenged the URM. They argue that readers strategically underspecify the representation of ambiguous sentences to save time, unless disambiguation is required by task demands. When disambiguation is required, however, readers assign sentences full structure—and Swets et al. provide experimental evidence to this end. On the basis of their findings, they argue against the URM and in favor of a model of task-dependent sentence comprehension. We show through simulations that the Swets et al. data do not constitute evidence for task-dependent parsing because they can be explained by the URM. However, we provide decisive evidence from a German self-paced reading study consistent with Swets et al.'s general claim about task-dependent parsing. Specifically, we show that under certain conditions, ambiguous sentences can be read more slowly than their unambiguous counterparts, suggesting that the parser may create several parses, when required. Finally, we present the first quantitative model of task-driven disambiguation that subsumes the URM, and we show that it can explain both Swets et al.'s results and our findings.
An English double-embedded relative clause from which the middle verb is omitted can often be processed more easily than its grammatical counterpart, a phenomenon known as the grammaticality illusion. This effect has been found to be reversed in German, suggesting that the illusion is language specific rather than a consequence of universal working memory constraints. We present results from three self-paced reading experiments which show that Dutch native speakers also do not show the grammaticality illusion in Dutch, whereas both German and Dutch native speakers do show the illusion when reading English sentences. These findings provide evidence against working memory constraints as an explanation for the observed effect in English. We propose an alternative account based on the statistical patterns of the languages involved. In support of this alternative, a single recurrent neural network model that is trained on both Dutch and English sentences is shown to predict the cross-linguistic difference in the grammaticality effect.
Background: Individuals with aphasia (IWA) show deficits in comprehending object-extracted declaratives while comprehension of subject-extracted structures is relatively preserved. It is a matter of debate whether this subject–object asymmetry also arises for comprehension of wh-questions. Successful comprehension of wh-questions critically entails correct resolution of a filler–gap dependency. Most previous studies have used only offline accuracy measures to investigate wh-question comprehension in aphasia. Online studies exploring syntactic processing in real time are needed in order to draw inferences about gap-filling abilities in IWA and to identify the point of breakdown in sentence comprehension.
Aims: This study aimed at investigating processing of subject and object who-questions in German-speaking IWA and in a group of controls by combining an offline and online method. We further aimed to explore the impact of case-marking cues on processing of wh-questions.
Methods & Procedures: Applying a variant of the visual world eye-tracking paradigm, we measured participants’ eye movements while they performed the same offline task, which is frequently used to assess comprehension of declaratives (sentence–picture matching).
Outcomes & Results: Concerning online processing of who-questions in controls, we found anticipation of the most likely post-verbal theta-role immediately after processing the case-marked wh-pronoun in both subject and object questions. In addition, we observed an unexpected advantage of object over subject questions in terms of processing time. The offline results for IWA revealed that there were three heterogeneous patterns: (a) symmetrical comprehension with equal impairments for both question types, (b) asymmetrical performance with better comprehension of subject than object who-questions, and (c) a reversed asymmetry with better comprehension of object as compared to subject questions. For online processing of both types of who-questions, IWA showed retained abilities in postulating the gap and in associating the filler with this gap, although they were slower as compared to controls. Moreover, similarly to controls, they anticipated the most likely post-verbal theta-role.
Conclusions: For controls, the findings provide evidence for rapid resolution of the filler–gap dependency and incremental processing of case-marking cues, reflected in early prediction of upcoming syntactic structure. We attribute faster processing of object questions to faster alignment of the anticipated element with a semantically more salient character. For IWA, the online data provide evidence for retained predictive abilities in processing of filler–gap dependencies in wh-questions, but prediction was delayed. This is most likely attributed to delayed integration of case-marking cues.
Swets et al. (2008; Underspecification of syntactic ambiguities: Evidence from self-paced reading. Memory and Cognition, 36(1), 201–216) presented evidence that the so-called ambiguity advantage [Traxler, M. J., Pickering, M. J., & Clifton, C. (1998). Adjunct attachment is not a form of lexical ambiguity resolution. Journal of Memory and Language, 39(4), 558–592. doi: 10.1006/jmla.1998.2600], which has been explained in terms of the Unrestricted Race Model, can equally well be explained by assuming underspecification in ambiguous conditions driven by task demands. Specifically, if comprehension questions require that ambiguities be resolved, the parser tends to make an attachment; when questions are about superficial aspects of the target sentence, readers tend to pursue an underspecification strategy. It is reasonable to assume that individual differences in strategy will play a significant role in the application of such strategies, so that studying average behaviour may not be informative. In order to study the predictions of the good-enough processing theory, we implemented two versions of underspecification: the partial specification model (PSM), which is an implementation of the Swets et al. proposal, and a more parsimonious version, the non-specification model (NSM). We evaluate the relative fit of these two kinds of underspecification to Swets et al.’s data; as a baseline, we also fitted three models that assume no underspecification. We find that a model without underspecification provides a somewhat better fit than both underspecification models, while the NSM model provides a better fit than the PSM. We interpret the results as lack of unambiguous evidence in favour of underspecification; however, given that there is considerable existing evidence for good-enough processing in the literature, it is reasonable to assume that some underspecification might occur. Under this assumption, the results can be interpreted as tentative evidence for NSM over PSM. More generally, our work provides a method for choosing between models of real-time processes in sentence comprehension that make qualitative predictions about the relationship between several dependent variables. We believe that sentence processing research will greatly benefit from a wider use of such methods.
It has been proposed that in online sentence comprehension the dependency between a reflexive pronoun such as himself/herself and its antecedent is resolved using exclusively syntactic constraints. Under this strictly syntactic search account, Principle A of the binding theory—which requires that the antecedent c-command the reflexive within the same clause that the reflexive occurs in—constrains the parser's search for an antecedent. The parser thus ignores candidate antecedents that might match agreement features of the reflexive (e.g., gender) but are ineligible as potential antecedents because they are in structurally illicit positions. An alternative possibility accords no special status to structural constraints: in addition to using Principle A, the parser also uses non-structural cues such as gender to access the antecedent. According to cue-based retrieval theories of memory (e.g., Lewis and Vasishth, 2005), the use of non-structural cues should result in increased retrieval times and occasional errors when candidates partially match the cues, even if the candidates are in structurally illicit positions. In this paper, we first show how the retrieval processes that underlie the reflexive binding are naturally realized in the Lewis and Vasishth (2005) model. We present the predictions of the model under the assumption that both structural and non-structural cues are used during retrieval, and provide a critical analysis of previous empirical studies that failed to find evidence for the use of non-structural cues, suggesting that these failures may be Type II errors. We use this analysis and the results of further modeling to motivate a new empirical design that we use in an eye tracking study. The results of this study confirm the key predictions of the model concerning the use of non-structural cues, and are inconsistent with the strictly syntactic search account. These results present a challenge for theories advocating the infallibility of the human parser in the case of reflexive resolution, and provide support for the inclusion of agreement features such as gender in the set of retrieval cues.
We present the fundamental ideas underlying statistical hypothesis testing using the frequentist framework. We start with a simple example that builds up the one-sample t-test from the beginning, explaining important concepts such as the sampling distribution of the sample mean, and the iid assumption. Then, we examine the meaning of the p-value in detail and discuss several important misconceptions about what a p-value does and does not tell us. This leads to a discussion of Type I, II error and power, and Type S and M error. An important conclusion from this discussion is that one should aim to carry out appropriately powered studies. Next, we discuss two common issues that we have encountered in psycholinguistics and linguistics: running experiments until significance is reached and the ‘garden-of-forking-paths’ problem discussed by Gelman and others. The best way to use frequentist methods is to run appropriately powered studies, check model assumptions, clearly separate exploratory data analysis from planned comparisons decided upon before the study was run, and always attempt to replicate results.
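One of the two issues mentioned above, running experiments until significance is reached, is easy to demonstrate by simulation; the sample sizes and stopping rule below are arbitrary choices for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    def optional_stopping(n_start=20, n_max=200, step=10):
        """One simulated 'experiment' under the null: test after every batch of
        participants and stop as soon as p < 0.05 (or once n_max is reached)."""
        data = list(rng.normal(0, 1, n_start))   # the true effect is exactly zero
        while True:
            p = stats.ttest_1samp(data, 0.0).pvalue
            if p < 0.05 or len(data) >= n_max:
                return p < 0.05
            data.extend(rng.normal(0, 1, step))

    false_positive_rate = np.mean([optional_stopping() for _ in range(2000)])
    print("nominal Type I error: 0.05, observed:", false_positive_rate)

Repeated testing with a data-dependent stopping rule inflates the false positive rate well above the nominal 5%.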
SOPARSE predicts so-called local coherence effects: locally plausible but globally impossible parses of substrings can exert a distracting influence during sentence processing. Additionally, it predicts digging-in effects: the longer the parser stays committed to a particular analysis, the harder it becomes to inhibit that analysis. We investigated the interaction of these two predictions using German sentences. Results from a self-paced reading study show that the processing difficulty caused by a local coherence can be reduced by first allowing the globally correct parse to become entrenched, which supports SOPARSE’s assumptions.
We provide an introductory review of Bayesian data analytical methods, with a focus on applications for linguistics, psychology, psycholinguistics, and cognitive science. The empirically oriented researcher will benefit from making Bayesian methods part of their statistical toolkit due to the many advantages of this framework, among them easier interpretation of results relative to research hypotheses and flexible model specification. We present an informal introduction to the foundational ideas behind Bayesian data analysis, using, as an example, a linear mixed models analysis of data from a typical psycholinguistics experiment. We discuss hypothesis testing using the Bayes factor and model selection using cross-validation. We close with some examples illustrating the flexibility of model specification in the Bayesian framework. Suggestions for further reading are also provided.
Understanding a sentence and integrating it into the discourse depends upon the identification of its focus, which, in spoken German, is marked by accentuation. In the case of written language, which lacks explicit cues to accent, readers have to draw on other kinds of information to determine the focus. We study the joint or interactive effects of two kinds of information that have no direct representation in print but have each been shown to be influential in the reader's text comprehension: (i) the (low-level) rhythmic-prosodic structure that is based on the distribution of lexically stressed syllables, and (ii) the (high-level) discourse context that is grounded in the memory of previous linguistic content. Systematically manipulating these factors, we examine the way readers resolve a syntactic ambiguity involving the scopally ambiguous focus operator auch (engl. "too") in both oral (Experiment 1) and silent reading (Experiment 2). The results of both experiments attest that discourse context and local linguistic rhythm conspire to guide the syntactic and, concomitantly, the focus-structural analysis of ambiguous sentences. We argue that reading comprehension requires the (implicit) assignment of accents according to the focus structure and that, by establishing a prominence profile, the implicit prosodic rhythm directly affects accent assignment.
In a self-paced reading study on German sluicing, Paape (2016) found that reading times were shorter at the ellipsis site when the antecedent was a temporarily ambiguous garden-path structure. As a post-hoc explanation of this finding, Paape assumed that the antecedent’s memory representation was reactivated during syntactic reanalysis, making it easier to retrieve. In two eye tracking experiments, we subjected the reactivation hypothesis to further empirical scrutiny. Experiment 1, carried out in French, showed no evidence in favor of the reactivation hypothesis. Instead, results for one out of the three types of garden-path sentences that were tested suggest that subjects sometimes failed to resolve the temporary ambiguity in the antecedent clause, and subsequently failed to resolve the ellipsis. The results of Experiment 2, a conceptual replication of Paape’s (2016) original study carried out in German, are compatible with the reactivation hypothesis, but leave open the possibility that the observed speedup for ambiguous antecedents may be due to occasional retrievals of an incorrect structure.
In two self-paced reading experiments, we investigated the effect of changes in antecedent complexity on processing times for ellipsis. Pointer- or “sharing”-based approaches to ellipsis processing (Frazier & Clifton 2001, 2005; Martin & McElree 2008) predict no effect of antecedent complexity on reading times at the ellipsis site, while other accounts predict increased antecedent complexity to either slow down processing (Murphy 1985) or to speed it up (Hofmeister 2011). Experiment 1 manipulated antecedent complexity and elision, yielding evidence against a speedup at the ellipsis site and in favor of a null effect. In order to investigate possible superficial processing on the part of participants, Experiment 2 manipulated the amount of attention required to correctly respond to end-of-sentence comprehension probes, yielding evidence against a complexity-induced slowdown at the ellipsis site. Overall, our results are compatible with pointer-based approaches while casting doubt on the notion that changes in antecedent complexity lead to measurable differences in ellipsis processing speed.
This is the first attempt at characterizing reading difficulty in Hindi using naturally occurring sentences. We created the Potsdam-Allahabad Hindi Eyetracking Corpus by recording eye-movement data from 30 participants at the University of Allahabad, India. The target stimuli were 153 sentences selected from the beta version of the Hindi-Urdu treebank. We find that word- or low-level predictors (syllable length, unigram and bigram frequency) affect first-pass reading times, regression path duration, total reading time, and outgoing saccade length. An increase in syllable length results in longer fixations, and an increase in word unigram and bigram frequency leads to shorter fixations. Longer syllable length and higher frequency lead to longer outgoing saccades. We also find that two predictors of sentence comprehension difficulty, integration and storage cost, have an effect on reading difficulty. Integration cost (Gibson, 2000) was approximated by calculating the distance (in words) between a dependent and head; and storage cost (Gibson, 2000), which measures difficulty of maintaining predictions, was estimated by counting the number of predicted heads at each point in the sentence. We find that integration cost mainly affects outgoing saccade length, and storage cost affects total reading times and outgoing saccade length. Thus, word-level predictors have an effect in both early and late measures of reading time, while predictors of sentence comprehension difficulty tend to affect later measures. This is, to our knowledge, the first demonstration using eye-tracking that both integration and storage cost influence reading difficulty.
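The integration-cost approximation described above (distance in words between a dependent and its head) is straightforward to compute once head positions are known; the toy sentence and head indices below are invented, and storage cost is not shown because counting predicted heads requires an incremental parse.

    # Each word is paired with the 0-based index of its head; the root points to itself.
    words = ["the", "boy", "saw", "the", "dog"]
    heads = [1, 2, 2, 4, 2]

    # Integration cost at each word: distance in words to its head (0 for the root).
    integration_cost = [abs(i - h) if i != h else 0 for i, h in enumerate(heads)]
    print(list(zip(words, integration_cost)))
    # [('the', 1), ('boy', 1), ('saw', 0), ('the', 1), ('dog', 2)]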
Chinese relative clauses are an important test case for pitting the predictions of expectation-based accounts against those of memory-based theories. The memory-based accounts predict that object relatives are easier to process than subject relatives because, in object relatives, the distance between the relative clause verb and the head noun is shorter. By contrast, expectation-based accounts such as surprisal predict that the less frequent object relative should be harder to process. In previous studies on Chinese relative clause comprehension, local ambiguities may have rendered a comparison between relative clause types uninterpretable. We designed experimental materials in which no local ambiguities confound the comparison. We ran two experiments (self-paced reading and eye-tracking) to compare reading difficulty in subject and object relatives which were placed either in subject or object modifying position. The evidence from our studies is consistent with the predictions of expectation-based accounts but not with those of memory-based theories.
There is a wealth of evidence showing that increasing the distance between an argument and its head leads to more processing effort, namely, locality effects: these are usually associated with constraints in working memory (DLT: Gibson, 2000; activation-based model: Lewis and Vasishth, 2005). In SOV languages, however, the opposite effect has been found: antilocality (see discussion in Levy et al., 2013). Antilocality effects can be explained by the expectation-based approach as proposed by Levy (2008) or by the activation-based model of sentence processing as proposed by Lewis and Vasishth (2005). We report an eye-tracking and a self-paced reading study with sentences in Spanish together with measures of individual differences to examine the distinction between expectation- and memory-based accounts, and within memory-based accounts the further distinction between DLT and the activation-based model. The experiments show that (i) antilocality effects as predicted by the expectation account appear only for high-capacity readers; (ii) increasing dependency length by interposing material that modifies the head of the dependency (the verb) produces stronger facilitation than increasing dependency length with material that does not modify the head; this is in agreement with the activation-based model but not with the expectation account; and (iii) a possible outcome of memory load on low-capacity readers is the increase in regressive saccades (locality effects as predicted by memory-based accounts) or, surprisingly, a speedup in the self-paced reading task; the latter is consistent with good-enough parsing (Ferreira et al., 2002). In sum, the study suggests that individual differences in working memory capacity play a role in dependency resolution, and that some of the aspects of dependency resolution can be best explained with the activation-based model together with a prediction component.
Recent research has shown that brain potentials time-locked to fixations in natural reading can be similar to brain potentials recorded during rapid serial visual presentation (RSVP). We attempted two replications of Hagoort, Hald, Bastiaansen, and Petersson [Hagoort, P., Hald, L., Bastiaansen, M., & Petersson, K. M. Integration of word meaning and world knowledge in language comprehension. Science, 304, 438-441, 2004] to determine whether this correspondence also holds for oscillatory brain responses. Hagoort et al. reported an N400 effect and synchronization in the theta and gamma range following world knowledge violations. Our first experiment (n = 32) used RSVP and replicated both the N400 effect in the ERPs and the power increase in the theta range in the time-frequency domain. In the second experiment (n = 49), participants read the same materials freely while their eye movements and their EEG were monitored. First fixation durations, gaze durations, and regression rates were increased, and the ERP showed an N400 effect. An analysis of time-frequency representations showed synchronization in the delta range (1-3 Hz) and desynchronization in the upper alpha range (11-13 Hz) but no theta or gamma effects. The results suggest that oscillatory EEG changes elicited by world knowledge violations are different in natural reading and RSVP. This may reflect differences in how representations are constructed and retrieved from memory in the two presentation modes.
Comprehension of non-canonical sentences can be difficult for individuals with aphasia (IWA). It is still unclear to which extent morphological cues like case marking or verb inflection may influence IWA's performance or even help to override deficits in sentence comprehension. Until now, studies have mainly used offline methods to draw inferences about syntactic deficits and, so far, only a few studies have looked at online syntactic processing in aphasia. We investigated sentence processing in German-speaking IWA by combining an offline (sentence-picture matching) and an online (eye-tracking in the visual-world paradigm) method. Our goal was to determine whether IWA are capable of using inflectional morphology (number-agreement markers on verbs and case markers in noun phrases) as a cue to sentence interpretation. We report results of two visual-world experiments using German reversible SVO and OVS sentences. In each study, there were eight IWA and 20 age-matched controls. Experiment 1 targeted the role of unambiguous case morphology, while Experiment 2 looked at processing of number-agreement cues at the verb in case-ambiguous sentences. IWA showed deficits in using both types of morphological markers as a cue to non-canonical sentence interpretation and the results indicate that in aphasia, processing of case-marking cues is more vulnerable as compared to verb-agreement morphology. We ascribe this finding to the higher cue reliability of agreement cues, which renders them more resistant against impairments in aphasia. However, the online data revealed that IWA are in principle capable of successfully computing morphological cues, but the integration of morphological information is delayed as compared to age-matched controls. Furthermore, we found striking differences between controls and IWA regarding subject-before-object parsing predictions. While in case-unambiguous sentences IWA showed evidence for early subject-before-object parsing commitments, they exhibited no straightforward subject-first prediction in case-ambiguous sentences, although controls did so for ambiguous structures. IWA delayed their parsing decisions in case-ambiguous sentences until unambiguous morphological information, such as a subject-verb-number-agreement cue, was available. We attribute the results for IWA to deficits in predictive processes based on morphosyntactic cues during sentence comprehension. The results indicate that IWA adopt a wait-and-see strategy and initiate prediction of upcoming syntactic structure only when unambiguous case or agreement cues are available.
We conducted two eye-tracking experiments on the processing of the Mandarin reflexive ziji in order to distinguish structurally constrained accounts from standard cue-based accounts of memory retrieval. In both experiments, we tested whether structurally inaccessible distractors that fulfill the animacy requirement of ziji influence processing times at the reflexive. In Experiment 1, we manipulated the animacy of the antecedent and of a structurally inaccessible distractor intervening between the antecedent and the reflexive. In conditions where the accessible antecedent mismatched the animacy cue, we found inhibitory interference, whereas in antecedent-match conditions no effect of the distractor was observed. In Experiment 2, we tested only antecedent-match configurations and manipulated the locality of the reflexive-antecedent binding (Mandarin allows non-local binding). Participants were asked to hold three distractors (animate vs. inanimate nouns) in memory while reading the target sentence. We found slower reading times when animate distractors were held in memory (inhibitory interference). Moreover, we replicated the locality effect reported in previous studies. These results are incompatible with structure-based accounts. However, the cue-based ACT-R model of Lewis and Vasishth (2005) cannot explain the observed pattern either. We therefore extend the original ACT-R model and show that the extended model not only explains the data presented in this article but also accounts for previously unexplained patterns in the literature on reflexive processing.
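To make the retrieval mechanism concrete, the following is a minimal sketch of the standard ACT-R spreading-activation and latency equations that the Lewis and Vasishth (2005) model builds on, showing how a distractor that shares a retrieval cue with the target dilutes spreading activation and slows retrieval (inhibitory interference). The parameter values and toy feature sets are illustrative assumptions, not those of the published model or its extension.

```python
# Minimal sketch of standard ACT-R retrieval: activation from matching cues,
# latency T = F * exp(-A). Parameter values below are assumed for illustration.
import math

MAX_ASSOC = 1.5   # S_max: maximum associative strength (assumed value)
LATENCY_F = 0.2   # F: latency scaling factor in seconds (assumed value)

def retrieval_latency(chunks, cues, target):
    """Predicted retrieval latency (s) for `target` given a set of retrieval cues.

    chunks: dict mapping chunk name -> set of feature values in memory
    cues:   set of retrieval cues (feature values)
    """
    # fan of a cue = how many chunks in memory carry that feature
    fan = {c: sum(c in feats for feats in chunks.values()) for c in cues}
    weight = 1.0 / len(cues)                       # W_j: cue weighting
    # spreading activation from each cue the target matches: S = S_max - ln(fan)
    activation = sum(weight * (MAX_ASSOC - math.log(fan[c]))
                     for c in cues if c in chunks[target])
    return LATENCY_F * math.exp(-activation)       # ACT-R latency equation

cues = {"animate", "subject"}
no_interference = {"antecedent": {"animate", "subject"}, "distractor": {"inanimate"}}
interference    = {"antecedent": {"animate", "subject"}, "distractor": {"animate"}}

print(retrieval_latency(no_interference, cues, "antecedent"))  # faster retrieval
print(retrieval_latency(interference, cues, "antecedent"))     # slower: cue overlap
```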
Two classes of account have been proposed to explain the memory processes subserving the processing of reflexive-antecedent dependencies. Structure-based accounts assume that retrieval of the antecedent is guided by syntactic tree-configurational information alone, without considering other kinds of information such as gender marking in the case of English reflexives. By contrast, unconstrained cue-based retrieval assumes that all available information is used to retrieve the antecedent. Similarity-based interference effects from structurally illicit distractors that match a non-structural retrieval cue have been interpreted as evidence favoring the unconstrained cue-based retrieval account, because such retrieval interference is incompatible with the structure-based account. However, it has been argued that these effects do not necessarily reflect interference at the moment of retrieval; they might equally well arise from interference at the stage of encoding or of maintaining the antecedent in memory, in which case they cannot be taken as evidence against the structure-based account. We present three experiments (self-paced reading and eye-tracking) on German reflexives and Swedish reflexive and pronominal possessives in which we pit the predictions of encoding interference and cue-based retrieval interference against each other. We found no indication that encoding interference affects the ease of forming the reflexive-antecedent dependency. Thus, there is no evidence that encoding interference explains the interference effects observed in previous work. We therefore conclude that invoking encoding interference may not be a plausible way to reconcile interference effects with a structure-based account of reflexive processing.
Scanpaths have played an important role in classic research on reading behavior. Nevertheless, they have largely been neglected in later research, perhaps because suitable analytical tools were lacking. Recently, von der Malsburg and Vasishth (2011) proposed a new measure for quantifying differences between scanpaths and demonstrated that this measure can recover effects that were missed by traditional eye-tracking measures. However, the sentences used in that study were difficult to process, and the scanpath effects were accordingly strong. The purpose of the present study was to test the validity, sensitivity, and scope of applicability of the scanpath measure, using simple sentences that are typically read from left to right. We derived predictions for the regularity of scanpaths from the literature on oculomotor control, sentence processing, and cognitive aging, and tested these predictions using the scanpath measure and a large database of eye movements. All predictions were confirmed: sentences with short words and syntactically more difficult sentences elicited more irregular scanpaths, and older readers produced more irregular scanpaths than younger readers. In addition, we found an effect that had not been reported earlier: syntax had a smaller influence on the eye movements of older readers than on those of younger readers. We discuss this interaction of syntactic parsing cost with age in terms of shifts in processing strategies and a decline of executive control as readers age. Overall, our results demonstrate the validity and sensitivity of the scanpath measure and establish it as a productive and versatile tool for reading research.
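As an illustration of what an alignment-based scanpath dissimilarity can look like, the sketch below implements a deliberately simplified measure in the spirit of, but not identical to, the one proposed by von der Malsburg and Vasishth (2011): two fixation sequences are aligned by dynamic programming, and mismatches are penalized in proportion to how far apart the fixated words are, scaled by fixation duration. All cost constants and the toy fixation data are assumptions made for illustration only.

```python
# Simplified, illustrative alignment-based scanpath dissimilarity
# (not the von der Malsburg & Vasishth measure itself).

def scanpath_dissimilarity(s, t, skip_cost=1.0):
    """s, t: lists of fixations, each a (word_position, duration_seconds) tuple."""
    def mismatch(a, b):
        # penalize fixating different words, scaled by the two fixation durations
        return abs(a[0] - b[0]) * (a[1] + b[1])

    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + skip_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + skip_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + mismatch(s[i - 1], t[j - 1]),  # align
                          d[i - 1][j] + skip_cost,                        # skip in s
                          d[i][j - 1] + skip_cost)                        # skip in t
    return d[n][m]

# Two toy scanpaths over the same 5-word sentence: a left-to-right reading
# versus a reading with a regression back to word 2.
regular    = [(1, .20), (2, .25), (3, .22), (4, .21), (5, .24)]
regressive = [(1, .20), (2, .25), (4, .30), (2, .28), (5, .24)]
print(scanpath_dissimilarity(regular, regular))     # 0.0: identical scanpaths
print(scanpath_dissimilarity(regular, regressive))  # > 0: regression increases dissimilarity
```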
Expectation-driven facilitation (Hale, 2001; Levy, 2008) and locality-driven retrieval difficulty (Gibson, 1998, 2000; Lewis & Vasishth, 2005) are widely recognized as two critical factors in incremental sentence processing, and there is accumulating evidence that both can influence processing difficulty. However, it is unclear whether and how expectations and memory interact. We first confirm a key prediction of the expectation account: a Hindi self-paced reading study shows that when an expectation for an upcoming part of speech is dashed, building a rarer structure consumes more processing time than building a less rare structure, a strong validation of the expectation-based account. In a second study, we show that when the expectation is strong, i.e., when a particular verb is predicted, strong facilitation effects are seen when the appearance of the verb is delayed; however, when the expectation is weak, i.e., when only the part of speech "verb" is predicted but no particular verb, the facilitation disappears and a tendency towards a locality effect is seen. This interaction between expectation strength and distance shows that strong expectations cancel locality effects, whereas weak expectations allow locality effects to emerge.
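For reference, the expectation-based metric at issue here is surprisal, standardly defined as follows (Hale, 2001; Levy, 2008):

```latex
% Processing difficulty at word w_i is proportional to its surprisal,
% the negative log probability of the word given the preceding context:
\mathrm{surprisal}(w_i) \;=\; -\log_2 P\!\left(w_i \mid w_1 \dots w_{i-1}\right)
```

A word that is highly predictable in context has low surprisal and is expected to be read faster; a dashed expectation corresponds to a high-surprisal continuation.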
In explicit memory recall and recognition tasks, elaboration and contextual isolation both facilitate memory performance. Here, we investigate these effects in the context of sentence processing: targets for retrieval during online processing of English object relative clause constructions differ in the amount of elaboration associated with the target noun phrase, or in the homogeneity of superficial features (text color). Experiment 1 shows that greater elaboration of targets during the encoding phase reduces reading times at retrieval sites, whereas elaboration of non-targets has considerably weaker effects. Experiment 2 shows that processing isolated superficial features of target noun phrases (here, a green word in a sentence whose other words are colored white) does not lead to enhanced memory performance, despite triggering longer encoding times. These results are interpreted in the light of the memory models of Nairne (1990, 2001, 2006), which hold that encoding remnants contribute to the set of retrieval cues that provide the basis for similarity-based interference effects.
Eye fixation durations during normal reading correlate with processing difficulty, but the specific cognitive mechanisms reflected in these measures are not well understood. This study finds support in German readers' eye fixations for two distinct difficulty metrics: surprisal, which reflects the change in probabilities across syntactic analyses as new words are integrated, and retrieval, which quantifies comprehension difficulty in terms of working-memory constraints. We examine the predictions of both metrics using a family of dependency parsers indexed by an upper limit on the number of candidate syntactic analyses they retain at successive words. Surprisal predicts all fixation-duration measures as well as regression probability. By contrast, under strictly serial processing, retrieval predicts none of the measures; only as more candidate analyses are considered in parallel at each word can retrieval account for the same measures as surprisal. This pattern suggests an important role for ranked parallelism in theories of sentence comprehension.
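To make the notion of ranked parallelism concrete, the sketch below shows one generic way to read surprisal off a parser that retains only the k best analyses at each word: surprisal is the negative log of the ratio of summed prefix probabilities before and after integrating the word. The function candidate_analyses is a hypothetical interface standing in for any probabilistic parser that scores analyses of a sentence prefix; this is an illustrative sketch under that assumption, not the dependency parsers used in the study.

```python
# Illustrative sketch: surprisal computed over a ranked-parallel beam of analyses.
# `candidate_analyses(prefix)` is a hypothetical callable returning
# (analysis, probability) pairs for a sentence prefix.
import math

def surprisal_over_beam(candidate_analyses, words, k):
    """Return per-word surprisal (in bits) under a beam of at most k analyses."""
    surprisals = []
    prev_mass = 1.0                        # probability mass before any word
    for i in range(1, len(words) + 1):
        beam = sorted(candidate_analyses(words[:i]),
                      key=lambda ap: ap[1], reverse=True)[:k]
        mass = sum(p for _, p in beam)     # summed prefix probability of the beam
        # Hale-style surprisal: negative log of the prefix-probability ratio
        surprisals.append(-math.log2(mass / prev_mass))
        prev_mass = mass
    return surprisals
```

With k = 1 this reduces to a serial parser that commits to a single analysis; larger k corresponds to the ranked-parallel regime in which the retrieval metric begins to pattern with surprisal.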