TY  - JOUR
A1  - Nicenboim, Bruno
A1  - Roettger, Timo B.
A1  - Vasishth, Shravan
T1  - Using meta-analysis for evidence synthesis
BT  - the case of incomplete neutralization in German
JF  - Journal of phonetics
N2  - Within quantitative phonetics, it is common practice to draw conclusions based on statistical significance alone. Using incomplete neutralization of final devoicing in German as a case study, we illustrate the problems with this approach. If researchers find a significant acoustic difference between voiceless and devoiced obstruents, they conclude that neutralization is incomplete, and if they find no significant difference, they conclude that neutralization is complete. However, such strong claims regarding the existence or absence of an effect based on significant results alone can be misleading. Instead, the totality of available evidence should be brought to bear on the question. Towards this end, we synthesize the evidence from 14 studies on incomplete neutralization in German using a Bayesian random-effects meta-analysis. Our meta-analysis provides evidence in favor of incomplete neutralization. We conclude with some suggestions for improving the quality of future research on phonetic phenomena: ensure that sample sizes allow for high-precision estimates of the effect; avoid the temptation to deploy researcher degrees of freedom when analyzing data; focus on estimates of the parameter of interest and the uncertainty about that parameter; attempt to replicate effects found; and, whenever possible, make both the data and analysis available publicly. (c) 2018 Elsevier Ltd. All rights reserved.
KW  - Meta-analysis
KW  - Incomplete neutralization
KW  - Final devoicing
KW  - German
KW  - Bayesian data analysis
Y1  - 2018
U6  - https://doi.org/10.1016/j.wocn.2018.06.001
SN  - 0095-4470
VL  - 70
SP  - 39
EP  - 55
PB  - Elsevier
CY  - London
ER  - 
TY  - JOUR
A1  - Vasishth, Shravan
A1  - Mertzen, Daniela
A1  - Jaeger, Lena A.
A1  - Gelman, Andrew
T1  - The statistical significance filter leads to overoptimistic expectations of replicability
JF  - Journal of memory and language
N2  - It is well-known in statistics (e.g., Gelman & Carlin, 2014) that treating a result as publishable just because the p-value is less than 0.05 leads to overoptimistic expectations of replicability. Effects that pass this filter get published, leading to an overconfident belief in replicability. We demonstrate the adverse consequences of this statistical significance filter by conducting seven direct replication attempts (268 participants in total) of a recent paper (Levy & Keller, 2013). We show that the published claims are so noisy that even non-significant results are fully compatible with them. We also demonstrate the contrast between such small-sample studies and a larger-sample study; the latter generally yields a less noisy estimate but also a smaller effect magnitude, which looks less compelling but is more realistic. We reiterate several suggestions from the methodology literature for improving current practices.
KW  - Type M error
KW  - Replicability
KW  - Surprisal
KW  - Expectation
KW  - Locality
KW  - Bayesian data analysis
KW  - Parameter estimation
Y1  - 2018
U6  - https://doi.org/10.1016/j.jml.2018.07.004
SN  - 0749-596X
SN  - 1096-0821
VL  - 103
SP  - 151
EP  - 175
PB  - Elsevier
CY  - San Diego
ER  - 
TY  - JOUR
A1  - Paape, Dario L. J. F.
A1  - Hemforth, Barbara
A1  - Vasishth, Shravan
T1  - Processing of ellipsis with garden-path antecedents in French and German
BT  - Evidence from eye tracking
JF  - PLoS ONE
N2  - In a self-paced reading study on German sluicing, Paape (2016) found that reading times were shorter at the ellipsis site when the antecedent was a temporarily ambiguous garden-path structure. As a post-hoc explanation of this finding, Paape assumed that the antecedent’s memory representation was reactivated during syntactic reanalysis, making it easier to retrieve. In two eye tracking experiments, we subjected the reactivation hypothesis to further empirical scrutiny. Experiment 1, carried out in French, showed no evidence in favor of the reactivation hypothesis. Instead, results for one out of the three types of garden-path sentences that were tested suggest that subjects sometimes failed to resolve the temporary ambiguity in the antecedent clause, and subsequently failed to resolve the ellipsis. The results of Experiment 2, a conceptual replication of Paape’s (2016) original study carried out in German, are compatible with the reactivation hypothesis, but leave open the possibility that the observed speedup for ambiguous antecedents may be due to occasional retrievals of an incorrect structure.
KW  - verb-phrase ellipsis
KW  - lingering misinterpretation
KW  - sentence comprehension
KW  - memory
KW  - ambiguities
KW  - activation
KW  - hypothesis
KW  - discourse
KW  - clauses
Y1  - 2018
U6  - https://doi.org/10.1371/journal.pone.0198620
SN  - 1932-6203
VL  - 13
IS  - 6
SP  - 1
EP  - 46
PB  - PLOS
CY  - San Francisco
ER  - 
TY  - JOUR
A1  - Nicenboim, Bruno
A1  - Vasishth, Shravan
T1  - Models of retrieval in sentence comprehension
BT  - a computational evaluation using Bayesian hierarchical modeling
JF  - Journal of memory and language
N2  - Research on similarity-based interference has provided extensive evidence that the formation of dependencies between non-adjacent words relies on a cue-based retrieval mechanism. There are two different models that can account for one of the main predictions of interference, i.e., a slowdown at a retrieval site, when several items share a feature associated with a retrieval cue: Lewis and Vasishth’s (2005) activation-based model and McElree’s (2000) direct-access model. Even though these two models have been used almost interchangeably, they are based on different assumptions and predict differences in the relationship between reading times and response accuracy. The activation-based model follows the assumptions of the ACT-R framework, and its retrieval process behaves as a lognormal race between accumulators of evidence with a single variance. Under this model, accuracy of the retrieval is determined by the winner of the race and retrieval time by its rate of accumulation. In contrast, the direct-access model assumes a model of memory where only the probability of retrieval can be affected, while the retrieval time is drawn from the same distribution; in this model, differences in latencies are a by-product of the possibility of backtracking and repairing incorrect retrievals. We implemented both models in a Bayesian hierarchical framework in order to evaluate and compare them. The data show that correct retrievals take longer than incorrect ones, and this pattern is better fit under the direct-access model than under the activation-based model. This finding does not rule out the possibility that retrieval may be behaving as a race model whose assumptions follow those of the ACT-R framework less closely.
By introducing a modification of the activation model, i.e., by assuming that the accumulation of evidence for retrieval of incorrect items is not only slower but noisier (i.e., different variances for the correct and incorrect items), the model can provide a fit as good as that of the direct-access model. This first-ever computational evaluation of alternative accounts of retrieval processes in sentence processing opens the way for a broader investigation of theories of dependency completion.
KW  - Cognitive modeling
KW  - Sentence processing
KW  - Working memory
KW  - Cue-based retrieval
KW  - Similarity-based interference
KW  - Bayesian hierarchical modeling
Y1  - 2018
U6  - https://doi.org/10.1016/j.jml.2017.08.004
SN  - 0749-596X
SN  - 1096-0821
VL  - 99
SP  - 1
EP  - 34
PB  - Elsevier
CY  - San Diego
ER  - 
TY  - JOUR
A1  - Nicenboim, Bruno
A1  - Vasishth, Shravan
A1  - Engelmann, Felix
A1  - Suckow, Katja
T1  - Exploratory and confirmatory analyses in sentence processing
BT  - a case study of number interference in German
JF  - Cognitive science : a multidisciplinary journal of anthropology, artificial intelligence, education, linguistics, neuroscience, philosophy, psychology ; journal of the Cognitive Science Society
N2  - Given the replication crisis in cognitive science, it is important to consider what researchers need to do in order to report results that are reliable. We consider three changes in current practice that have the potential to deliver more realistic and robust claims. First, the planned experiment should be divided into two stages, an exploratory stage and a confirmatory stage. This clear separation allows the researcher to check whether any results found in the exploratory stage are robust. The second change is to carry out adequately powered studies. We show that this is imperative if we want to obtain realistic estimates of effects in psycholinguistics. The third change is to use Bayesian data-analytic methods rather than frequentist ones; the Bayesian framework allows us to focus on the best estimates we can obtain of the effect, rather than rejecting a strawman null. As a case study, we investigate number interference effects in German. Number feature interference is predicted by cue-based retrieval models of sentence processing (Van Dyke & Lewis, 2003; Vasishth & Lewis, 2006), but empirical findings have been inconsistent. We show that by implementing the three changes mentioned, suggestive evidence emerges that is consistent with the predicted number interference effects.
KW  - Exploratory and confirmatory analyses
KW  - Sentence processing
KW  - Bayesian hierarchical modeling
KW  - Cue-based retrieval
KW  - Working memory
KW  - Similarity-based interference
KW  - Number interference
KW  - German
Y1  - 2018
U6  - https://doi.org/10.1111/cogs.12589
SN  - 0364-0213
SN  - 1551-6709
VL  - 42
SP  - 1075
EP  - 1100
PB  - Wiley
CY  - Hoboken
ER  - 
TY  - JOUR
A1  - Vasishth, Shravan
A1  - Nicenboim, Bruno
A1  - Beckman, Mary E.
A1  - Li, Fangfang
A1  - Kong, Eun Jong
T1  - Bayesian data analysis in the phonetic sciences
BT  - a tutorial introduction
JF  - Journal of phonetics
N2  - This tutorial analyzes voice onset time (VOT) data from Dongbei (Northeastern) Mandarin Chinese and North American English to demonstrate how Bayesian linear mixed models can be fit using the programming language Stan via the R package brms. Through this case study, we demonstrate some of the advantages of the Bayesian framework: researchers can (i) flexibly define the underlying process that they believe to have generated the data; (ii) obtain direct information regarding the uncertainty about the parameter that relates the data to the theoretical question being studied; and (iii) incorporate prior knowledge into the analysis. Getting started with Bayesian modeling can be challenging, especially when one is trying to model one’s own (often unique) data. It is difficult to see how one can apply general principles described in textbooks to one’s own specific research problem. We address this barrier to using Bayesian methods by providing three detailed examples, with source code to allow easy reproducibility. The examples presented are intended to give the reader a flavor of the process of model-fitting; suggestions for further study are also provided. All data and code are available from: https://osf.io/g4zpv.
KW  - Bayesian data analysis
KW  - Linear mixed models
KW  - Voice onset time
KW  - Gender effects
KW  - Vowel duration
Y1  - 2018
U6  - https://doi.org/10.1016/j.wocn.2018.07.008
SN  - 0095-4470
VL  - 71
SP  - 147
EP  - 161
PB  - Elsevier
CY  - London
ER  - 
TY  - JOUR
A1  - Mätzig, Paul
A1  - Vasishth, Shravan
A1  - Engelmann, Felix
A1  - Caplan, David
A1  - Burchert, Frank
T1  - A computational investigation of sources of variability in sentence comprehension difficulty in aphasia
JF  - Topics in cognitive science
N2  - We present a computational evaluation of three hypotheses about sources of deficit in sentence comprehension in aphasia: slowed processing, intermittent deficiency, and resource reduction. The ACT-R-based Lewis and Vasishth (2005) model is used to implement these three proposals. Slowed processing is implemented as slowed execution time of parse steps; intermittent deficiency as increased random noise in activation of elements in memory; and resource reduction as reduced spreading activation. As data, we considered subject vs. object relative sentences, presented in a self-paced listening modality to 56 individuals with aphasia (IWA) and 46 matched controls. The participants heard the sentences and carried out a picture verification task to decide on an interpretation of the sentence. These response accuracies are used to identify the best parameters (for each participant) that correspond to the three hypotheses mentioned above. We show that controls have more tightly clustered (less variable) parameter values than IWA; specifically, compared to controls, among IWA there are more individuals with slow parsing times, high noise, and low spreading activation. We find that (a) individual IWA show differential amounts of deficit along the three dimensions of slowed processing, intermittent deficiency, and resource reduction, (b) overall, there is evidence for all three sources of deficit playing a role, and (c) IWA have a more variable range of parameter values than controls. An important implication is that it may be meaningless to talk about sources of deficit with respect to an abstract average IWA; the focus should be on the individual's differential degrees of deficit along different dimensions, and on understanding the causes of variability in deficit between participants.
KW  - Sentence comprehension
KW  - Aphasia
KW  - Computational modeling
KW  - Cue-based retrieval
Y1  - 2018
U6  - https://doi.org/10.1111/tops.12323
SN  - 1756-8757
SN  - 1756-8765
VL  - 10
IS  - 1
SP  - 161
EP  - 174
PB  - Wiley
CY  - Hoboken
ER  -