TY - JOUR
A1 - Schad, Daniel
A1 - Nicenboim, Bruno
A1 - Bürkner, Paul-Christian
A1 - Betancourt, Michael
A1 - Vasishth, Shravan
T1 - Workflow techniques for the robust use of Bayes factors
JF - Psychological methods
N2 - Inferences about hypotheses are ubiquitous in the cognitive sciences. Bayes factors provide one general way to compare different hypotheses by their compatibility with the observed data. Those quantifications can then also be used to choose between hypotheses. While Bayes factors provide an immediate approach to hypothesis testing, they are highly sensitive to details of the data/model assumptions and it's unclear whether the details of the computational implementation (such as bridge sampling) are unbiased for complex analyses. Here, we study how Bayes factors misbehave under different conditions. This includes a study of errors in the estimation of Bayes factors; the first-ever use of simulation-based calibration to test the accuracy and bias of Bayes factor estimates using bridge sampling; a study of the stability of Bayes factors against different MCMC draws and sampling variation in the data; and a look at the variability of decisions based on Bayes factors using a utility function. We outline a Bayes factor workflow that researchers can use to study whether Bayes factors are robust for their individual analysis. Reproducible code is available from https://osf.io/y354c/.
Translational Abstract
In psychology and related areas, scientific hypotheses are commonly tested by asking questions like "is [some] effect present or absent." Such hypothesis testing is most often carried out using frequentist null hypothesis significance testing (NHST). The NHST procedure is very simple: It usually returns a p-value, which is then used to make binary decisions like "the effect is present/absent." For example, it is common to see studies in the media that draw simplistic conclusions like "coffee causes cancer," or "coffee reduces the chances of getting cancer." However, a powerful and more nuanced alternative approach exists: Bayes factors. Bayes factors have many advantages over NHST. However, for the complex statistical models that are commonly used for data analysis today, computing Bayes factors is not at all a simple matter. In this article, we discuss the main complexities associated with computing Bayes factors. This is the first article to provide a detailed workflow for understanding and computing Bayes factors in complex statistical models. The article provides a statistically more nuanced way to think about hypothesis testing than the overly simplistic tendency to declare effects as being "present" or "absent".
KW - Bayes factors
KW - Bayesian model comparison
KW - prior
KW - posterior
KW - simulation-based calibration
Y1 - 2022
U6 - https://doi.org/10.1037/met0000472
SN - 1082-989X
SN - 1939-1463
VL - 28
IS - 6
SP - 1404
EP - 1426
PB - American Psychological Association
CY - Washington
ER -
TY - JOUR
A1 - Nicenboim, Bruno
A1 - Vasishth, Shravan
A1 - Rösler, Frank
T1 - Are words pre-activated probabilistically during sentence comprehension?
BT - evidence from new data and a Bayesian random-effects meta-analysis using publicly available data
JF - Neuropsychologia : an international journal in behavioural and cognitive neuroscience
N2 - Several studies (e.g., Wicha et al., 2003b; DeLong et al., 2005) have shown that readers use information from the sentential context to predict nouns (or some of their features), and that predictability effects can be inferred from the EEG signal in determiners or adjectives appearing before the predicted noun. While these findings provide evidence for the pre-activation proposal, recent replication attempts together with inconsistencies in the results from the literature cast doubt on the robustness of this phenomenon. Our study presents the first attempt to use the effect of gender on predictability in German to study the pre-activation hypothesis, capitalizing on the fact that all German nouns have a gender and that their preceding determiners can show an unambiguous gender marking when the noun phrase has accusative case. Despite having a relatively large sample size (of 120 subjects), both our preregistered and exploratory analyses failed to yield conclusive evidence for or against an effect of pre-activation. The sign of the effect is, however, in the expected direction: the more unexpected the gender of the determiner, the larger the negativity. The recent, inconclusive replication attempts by Nieuwland et al. (2018) and others also show effects with signs in the expected direction. We conducted a Bayesian random-effects meta-analysis using our data and the publicly available data from these recent replication attempts. Our meta-analysis shows a relatively clear but very small effect that is consistent with the pre-activation account and demonstrates a very important advantage of the Bayesian data analysis methodology: we can incrementally accumulate evidence to obtain increasingly precise estimates of the effect of interest.
KW - ERP
KW - pre-activation
KW - predictions
KW - grammatical gender
KW - Bayesian meta-analysis
Y1 - 2020
U6 - https://doi.org/10.1016/j.neuropsychologia.2020.107427
SN - 0028-3932
SN - 1873-3514
VL - 142
PB - Elsevier Science
CY - Oxford
ER -
TY - JOUR
A1 - Stone, Kate
A1 - Vasishth, Shravan
A1 - von der Malsburg, Titus Raban
T1 - Does entropy modulate the prediction of German long-distance verb particles?
JF - PLOS ONE
N2 - In this paper we examine the effect of uncertainty on readers' predictions about meaning. In particular, we were interested in how uncertainty might influence the likelihood of committing to a specific sentence meaning. We conducted two event-related potential (ERP) experiments using particle verbs such as turn down and manipulated uncertainty by constraining the context such that readers could be either highly certain about the identity of a distant verb particle, such as turn the bed [...] down, or less certain due to competing particles, such as turn the music [...] up/down. The study was conducted in German, where verb particles appear clause-finally and may be separated from the verb by a large amount of material. We hypothesised that this separation would encourage readers to predict the particle, and that high certainty would make prediction of a specific particle more likely than lower certainty. If a specific particle was predicted, this would reflect a strong commitment to sentence meaning that should incur a higher processing cost if the prediction is wrong. If a specific particle was less likely to be predicted, commitment should be weaker and the processing cost of a wrong prediction lower. If true, this could suggest that uncertainty discourages predictions via an unacceptable cost-benefit ratio. However, given the clear predictions made by the literature, it was surprisingly unclear whether the uncertainty manipulation affected the two ERP components studied, the N400 and the PNP. Bayes factor analyses showed that evidence for our a priori hypothesised effect sizes was inconclusive, although there was decisive evidence against a priori hypothesised effect sizes larger than 1 μV for the N400 and larger than 3 μV for the PNP. We attribute the inconclusive finding to the properties of verb-particle dependencies that differ from the verb-noun dependencies in which the N400 and PNP are often studied.
Y1 - 2022
U6 - https://doi.org/10.1371/journal.pone.0267813
SN - 1932-6203
VL - 17
IS - 8
PB - PLOS
CY - San Francisco, California, US
ER -
TY - JOUR
A1 - Schad, Daniel
A1 - Betancourt, Michael
A1 - Vasishth, Shravan
T1 - Toward a principled Bayesian workflow in cognitive science
JF - Psychological methods
N2 - Experiments in research on memory, language, and in other areas of cognitive science are increasingly being analyzed using Bayesian methods. This has been facilitated by the development of probabilistic programming languages such as Stan, and easily accessible front-end packages such as brms. The utility of Bayesian methods, however, ultimately depends on the relevance of the Bayesian model, in particular whether or not it accurately captures the structure of the data and the data analyst's domain expertise. Even with powerful software, the analyst is responsible for verifying the utility of their model. To demonstrate this point, we introduce a principled Bayesian workflow (Betancourt, 2018) to cognitive science. Using a concrete working example, we describe basic questions one should ask about the model: prior predictive checks, computational faithfulness, model sensitivity, and posterior predictive checks. The running example for demonstrating the workflow is data on reading times with a linguistic manipulation of object versus subject relative clause sentences. This principled Bayesian workflow also demonstrates how to use domain knowledge to inform prior distributions. It provides guidelines and checks for valid data analysis, avoiding overfitting complex models to noise, and capturing relevant data structure in a probabilistic model. Given the increasing use of Bayesian methods, we aim to discuss how these methods can be properly employed to obtain robust answers to scientific questions.
KW - workflow
KW - prior predictive checks
KW - posterior predictive checks
KW - model building
KW - Bayesian data analysis
Y1 - 2021
U6 - https://doi.org/10.1037/met0000275
SN - 1082-989X
SN - 1939-1463
VL - 26
IS - 1
SP - 103
EP - 126
PB - American Psychological Association
CY - Washington
ER -
TY - JOUR
A1 - Paape, Dario
A1 - Vasishth, Shravan
T1 - Estimating the true cost of garden pathing
BT - a computational model of latent cognitive processes
JF - Cognitive science
N2 - What is the processing cost of being garden-pathed by a temporary syntactic ambiguity? We argue that comparing average reading times in garden-path versus non-garden-path sentences is not enough to answer this question. Trial-level contaminants such as inattention, the fact that garden pathing may occur non-deterministically in the ambiguous condition, and "triage" (rejecting the sentence without reanalysis; Fodor & Inoue, 2000) lead to systematic underestimates of the true cost of garden pathing. Furthermore, the "pure" garden-path effect due to encountering an unexpected word needs to be separated from the additional cost of syntactic reanalysis. To get more realistic estimates for the individual processing costs of garden pathing and syntactic reanalysis, we implement a novel computational model that includes trial-level contaminants as probabilistically occurring latent cognitive processes. The model shows a good predictive fit to existing reading time and judgment data. Furthermore, the latent-process approach captures differences between noun phrase/zero complement (NP/Z) garden-path sentences and semantically biased reduced relative clause (RRC) garden-path sentences: The NP/Z garden path occurs nearly deterministically but can be mostly eliminated by adding a comma. By contrast, the RRC garden path occurs with a lower probability, but disambiguation via semantic plausibility is not always effective.
KW - garden-path effect
KW - syntactic reanalysis
KW - multinomial processing tree
KW - latent processes
KW - mixture modeling
Y1 - 2022
U6 - https://doi.org/10.1111/cogs.13186
SN - 0364-0213
SN - 1551-6709
VL - 46
IS - 8
PB - Wiley-Blackwell
CY - Malden, Mass.
ER -
TY - JOUR
A1 - Schad, Daniel
A1 - Vasishth, Shravan
T1 - The posterior probability of a null hypothesis given a statistically significant result
JF - The quantitative methods for psychology
N2 - When researchers carry out a null hypothesis significance test, it is tempting to assume that a statistically significant result lowers Prob(H0), the probability of the null hypothesis being true. Technically, such a statement is meaningless for various reasons: e.g., the null hypothesis does not have a probability associated with it. However, it is possible to relax certain assumptions to compute the posterior probability Prob(H0) under repeated sampling. We show in a step-by-step guide that the intuitively appealing belief, that Prob(H0) is low when significant results have been obtained under repeated sampling, is in general incorrect and depends greatly on: (a) the prior probability of the null being true; (b) type-I error rate, (c) type-II error rate, and (d) replication of a result. Through step-by-step simulations using open-source code in the R System of Statistical Computing, we show that uncertainty about the null hypothesis being true often remains high despite a significant result. To help the reader develop intuitions about this common misconception, we provide a Shiny app (https://danielschad.shinyapps.io/probnull/). We expect that this tutorial will help researchers better understand and judge results from null hypothesis significance tests.
KW - Null hypothesis significance testing
KW - Bayesian inference
KW - statistical power
Y1 - 2022
U6 - https://doi.org/10.20982/tqmp.18.2.p011
SN - 1913-4126
SN - 2292-1354
VL - 18
IS - 2
SP - 130
EP - 141
PB - University of Montreal, Department of Psychology
CY - Montreal
ER -
TY - JOUR
A1 - Vasishth, Shravan
A1 - Gelman, Andrew
T1 - How to embrace variation and accept uncertainty in linguistic and psycholinguistic data analysis
JF - Linguistics : an interdisciplinary journal of the language sciences
N2 - The use of statistical inference in linguistics and related areas like psychology typically involves a binary decision: either reject or accept some null hypothesis using statistical significance testing. When statistical power is low, this frequentist data-analytic approach breaks down: null results are uninformative, and effect size estimates associated with significant results are overestimated. Using an example from psycholinguistics, several alternative approaches are demonstrated for reporting inconsistencies between the data and a theoretical prediction. The key here is to focus on committing to a falsifiable prediction, on quantifying uncertainty statistically, and learning to accept the fact that - in almost all practical data analysis situations - we can only draw uncertain conclusions from data, regardless of whether we manage to obtain statistical significance or not. A focus on uncertainty quantification is likely to lead to fewer excessively bold claims that, on closer investigation, may turn out to be not supported by the data.
KW - experimental linguistics
KW - statistical data analysis
KW - statistical inference
KW - uncertainty quantification
Y1 - 2021
U6 - https://doi.org/10.1515/ling-2019-0051
SN - 0024-3949
SN - 1613-396X
VL - 59
IS - 5
SP - 1311
EP - 1342
PB - De Gruyter Mouton
CY - Berlin
ER -
TY - JOUR
A1 - Paape, Dario
A1 - Avetisyan, Serine
A1 - Lago, Sol
A1 - Vasishth, Shravan
T1 - Modeling misretrieval and feature substitution in agreement attraction
BT - a computational evaluation
JF - Cognitive science
N2 - We present computational modeling results based on a self-paced reading study investigating number attraction effects in Eastern Armenian. We implement three novel computational models of agreement attraction in a Bayesian framework and compare their predictive fit to the data using k-fold cross-validation. We find that our data are better accounted for by an encoding-based model of agreement attraction, compared to a retrieval-based model. A novel methodological contribution of our study is the use of comprehension questions with open-ended responses, so that both misinterpretation of the number feature of the subject phrase and misassignment of the thematic subject role of the verb can be investigated at the same time. We find evidence for both types of misinterpretation in our study, sometimes in the same trial. However, the specific error patterns in our data are not fully consistent with any previously proposed model.
KW - Agreement attraction
KW - Eastern Armenian
KW - Self-paced reading
KW - Computational modeling
Y1 - 2021
U6 - https://doi.org/10.1111/cogs.13019
SN - 0364-0213
SN - 1551-6709
VL - 45
IS - 8
PB - Wiley-Blackwell
CY - Malden, Mass.
ER -
TY - JOUR
A1 - Mertzen, Daniela
A1 - Lago, Sol
A1 - Vasishth, Shravan
T1 - The benefits of preregistration for hypothesis-driven bilingualism research
JF - Bilingualism : language and cognition
N2 - Preregistration is an open science practice that requires the specification of research hypotheses and analysis plans before the data are inspected. Here, we discuss the benefits of preregistration for hypothesis-driven, confirmatory bilingualism research. Using examples from psycholinguistics and bilingualism, we illustrate how non-peer reviewed preregistrations can serve to implement a clean distinction between hypothesis testing and data exploration. This distinction helps researchers avoid casting post-hoc hypotheses and analyses as confirmatory ones. We argue that, in keeping with current best practices in the experimental sciences, preregistration, along with sharing data and code, should be an integral part of hypothesis-driven bilingualism research.
KW - preregistration
KW - open science
KW - bilingualism
KW - psycholinguistics
KW - confirmatory analysis
KW - exploratory analysis
Y1 - 2021
U6 - https://doi.org/10.1017/S1366728921000031
SN - 1366-7289
SN - 1469-1841
VL - 24
IS - 5
SP - 807
EP - 812
PB - Cambridge Univ. Press
CY - Cambridge
ER -
TY - JOUR
A1 - Jäger, Lena Ann
A1 - Mertzen, Daniela
A1 - Van Dyke, Julie A.
A1 - Vasishth, Shravan
T1 - Interference patterns in subject-verb agreement and reflexives revisited
BT - a large-sample study
JF - Journal of memory and language
N2 - Cue-based retrieval theories in sentence processing predict two classes of interference effect: (i) Inhibitory interference is predicted when multiple items match a retrieval cue: cue-overloading leads to an overall slowdown in reading time; and (ii) Facilitatory interference arises when a retrieval target as well as a distractor only partially match the retrieval cues; this partial matching leads to an overall speedup in retrieval time. Inhibitory interference effects are widely observed, but facilitatory interference apparently has an exception: reflexives have been claimed to show no facilitatory interference effects. Because the claim is based on underpowered studies, we conducted a large-sample experiment that investigated both facilitatory and inhibitory interference. In contrast to previous studies, we find facilitatory interference effects in reflexives. We also present a quantitative evaluation of the cue-based retrieval model of Engelmann, Jäger, and Vasishth (2019).
KW - Sentence processing
KW - Cue-based retrieval
KW - Similarity-based interference
KW - Reflexives
KW - Agreement
KW - Bayesian data analysis
KW - Replication
Y1 - 2020
U6 - https://doi.org/10.1016/j.jml.2019.104063
SN - 0749-596X
SN - 1096-0821
VL - 111
PB - Elsevier
CY - San Diego
ER -