TY - JOUR A1 - Schad, Daniel A1 - Nicenboim, Bruno A1 - Bürkner, Paul-Christian A1 - Betancourt, Michael A1 - Vasishth, Shravan T1 - Workflow techniques for the robust use of Bayes factors JF - Psychological methods N2 - Inferences about hypotheses are ubiquitous in the cognitive sciences. Bayes factors provide one general way to compare different hypotheses by their compatibility with the observed data. Those quantifications can then also be used to choose between hypotheses. While Bayes factors provide an immediate approach to hypothesis testing, they are highly sensitive to details of the data/model assumptions and it's unclear whether the details of the computational implementation (such as bridge sampling) are unbiased for complex analyses. Here, we study how Bayes factors misbehave under different conditions. This includes a study of errors in the estimation of Bayes factors; the first-ever use of simulation-based calibration to test the accuracy and bias of Bayes factor estimates using bridge sampling; a study of the stability of Bayes factors against different MCMC draws and sampling variation in the data; and a look at the variability of decisions based on Bayes factors using a utility function. We outline a Bayes factor workflow that researchers can use to study whether Bayes factors are robust for their individual analysis. Reproducible code is available from https://osf.io/y354c/.
Translational Abstract
In psychology and related areas, scientific hypotheses are commonly tested by asking questions like "is [some] effect present or absent." Such hypothesis testing is most often carried out using frequentist null hypothesis significance testing (NHST). The NHST procedure is very simple: It usually returns a p-value, which is then used to make binary decisions like "the effect is present/absent." For example, it is common to see studies in the media that draw simplistic conclusions like "coffee causes cancer," or "coffee reduces the chances of getting cancer." However, a powerful and more nuanced alternative approach exists: Bayes factors. Bayes factors have many advantages over NHST. However, for the complex statistical models that are commonly used for data analysis today, computing Bayes factors is not at all a simple matter. In this article, we discuss the main complexities associated with computing Bayes factors. This is the first article to provide a detailed workflow for understanding and computing Bayes factors in complex statistical models. The article provides a statistically more nuanced way to think about hypothesis testing than the overly simplistic tendency to declare effects as being "present" or "absent". KW - Bayes factors KW - Bayesian model comparison KW - prior KW - posterior KW - simulation-based calibration Y1 - 2022 U6 - https://doi.org/10.1037/met0000472 SN - 1082-989X SN - 1939-1463 VL - 28 IS - 6 SP - 1404 EP - 1426 PB - American Psychological Association CY - Washington ER - TY - JOUR A1 - Nicenboim, Bruno A1 - Vasishth, Shravan A1 - Rösler, Frank T1 - Are words pre-activated probabilistically during sentence comprehension? 
BT - evidence from new data and a Bayesian random-effects meta-analysis using publicly available data JF - Neuropsychologia : an international journal in behavioural and cognitive neuroscience N2 - Several studies (e.g., Wicha et al., 2003b; DeLong et al., 2005) have shown that readers use information from the sentential context to predict nouns (or some of their features), and that predictability effects can be inferred from the EEG signal in determiners or adjectives appearing before the predicted noun. While these findings provide evidence for the pre-activation proposal, recent replication attempts together with inconsistencies in the results from the literature cast doubt on the robustness of this phenomenon. Our study presents the first attempt to use the effect of gender on predictability in German to study the pre-activation hypothesis, capitalizing on the fact that all German nouns have a gender and that their preceding determiners can show an unambiguous gender marking when the noun phrase has accusative case. Despite having a relatively large sample size (of 120 subjects), both our preregistered and exploratory analyses failed to yield conclusive evidence for or against an effect of pre-activation. The sign of the effect is, however, in the expected direction: the more unexpected the gender of the determiner, the larger the negativity. The recent, inconclusive replication attempts by Nieuwland et al. (2018) and others also show effects with signs in the expected direction. We conducted a Bayesian random-effects meta-analysis using our data and the publicly available data from these recent replication attempts. Our meta-analysis shows a relatively clear but very small effect that is consistent with the pre-activation account and demonstrates a very important advantage of the Bayesian data analysis methodology: we can incrementally accumulate evidence to obtain increasingly precise estimates of the effect of interest. 
KW - ERP KW - pre-activation KW - predictions KW - grammatical gender KW - Bayesian meta-analysis Y1 - 2020 U6 - https://doi.org/10.1016/j.neuropsychologia.2020.107427 SN - 0028-3932 SN - 1873-3514 VL - 142 PB - Elsevier Science CY - Oxford ER - TY - JOUR A1 - Stone, Kate A1 - Vasishth, Shravan A1 - von der Malsburg, Titus Raban T1 - Does entropy modulate the prediction of German long-distance verb particles? JF - PLOS ONE N2 - In this paper we examine the effect of uncertainty on readers' predictions about meaning. In particular, we were interested in how uncertainty might influence the likelihood of committing to a specific sentence meaning. We conducted two event-related potential (ERP) experiments using particle verbs such as turn down and manipulated uncertainty by constraining the context such that readers could be either highly certain about the identity of a distant verb particle, such as turn the bed [...] down, or less certain due to competing particles, such as turn the music [...] up/down. The study was conducted in German, where verb particles appear clause-finally and may be separated from the verb by a large amount of material. We hypothesised that this separation would encourage readers to predict the particle, and that high certainty would make prediction of a specific particle more likely than lower certainty. If a specific particle was predicted, this would reflect a strong commitment to sentence meaning that should incur a higher processing cost if the prediction is wrong. If a specific particle was less likely to be predicted, commitment should be weaker and the processing cost of a wrong prediction lower. If true, this could suggest that uncertainty discourages predictions via an unacceptable cost-benefit ratio. However, given the clear predictions made by the literature, it was surprisingly unclear whether the uncertainty manipulation affected the two ERP components studied, the N400 and the PNP. 
Bayes factor analyses showed that evidence for our a priori hypothesised effect sizes was inconclusive, although there was decisive evidence against a priori hypothesised effect sizes larger than 1 µV for the N400 and larger than 3 µV for the PNP. We attribute the inconclusive finding to the properties of verb-particle dependencies that differ from the verb-noun dependencies in which the N400 and PNP are often studied. Y1 - 2022 U6 - https://doi.org/10.1371/journal.pone.0267813 SN - 1932-6203 VL - 17 IS - 8 PB - PLOS CY - San Francisco, California, US ER - TY - JOUR A1 - Schad, Daniel A1 - Betancourt, Michael A1 - Vasishth, Shravan T1 - Toward a principled Bayesian workflow in cognitive science JF - Psychological methods N2 - Experiments in research on memory, language, and in other areas of cognitive science are increasingly being analyzed using Bayesian methods. This has been facilitated by the development of probabilistic programming languages such as Stan, and easily accessible front-end packages such as brms. The utility of Bayesian methods, however, ultimately depends on the relevance of the Bayesian model, in particular whether or not it accurately captures the structure of the data and the data analyst's domain expertise. Even with powerful software, the analyst is responsible for verifying the utility of their model. To demonstrate this point, we introduce a principled Bayesian workflow (Betancourt, 2018) to cognitive science. Using a concrete working example, we describe basic questions one should ask about the model: prior predictive checks, computational faithfulness, model sensitivity, and posterior predictive checks. The running example for demonstrating the workflow is data on reading times with a linguistic manipulation of object versus subject relative clause sentences. This principled Bayesian workflow also demonstrates how to use domain knowledge to inform prior distributions. 
It provides guidelines and checks for valid data analysis, avoiding overfitting complex models to noise, and capturing relevant data structure in a probabilistic model. Given the increasing use of Bayesian methods, we aim to discuss how these methods can be properly employed to obtain robust answers to scientific questions. KW - workflow KW - prior predictive checks KW - posterior predictive checks KW - model building KW - Bayesian data analysis Y1 - 2021 U6 - https://doi.org/10.1037/met0000275 SN - 1082-989X SN - 1939-1463 VL - 26 IS - 1 SP - 103 EP - 126 PB - American Psychological Association CY - Washington ER - TY - JOUR A1 - Paape, Dario A1 - Vasishth, Shravan T1 - Estimating the true cost of garden pathing: BT - a computational model of latent cognitive processes JF - Cognitive science N2 - What is the processing cost of being garden-pathed by a temporary syntactic ambiguity? We argue that comparing average reading times in garden-path versus non-garden-path sentences is not enough to answer this question. Trial-level contaminants such as inattention, the fact that garden pathing may occur non-deterministically in the ambiguous condition, and "triage" (rejecting the sentence without reanalysis; Fodor & Inoue, 2000) lead to systematic underestimates of the true cost of garden pathing. Furthermore, the "pure" garden-path effect due to encountering an unexpected word needs to be separated from the additional cost of syntactic reanalysis. To get more realistic estimates for the individual processing costs of garden pathing and syntactic reanalysis, we implement a novel computational model that includes trial-level contaminants as probabilistically occurring latent cognitive processes. The model shows a good predictive fit to existing reading time and judgment data. 
Furthermore, the latent-process approach captures differences between noun phrase/zero complement (NP/Z) garden-path sentences and semantically biased reduced relative clause (RRC) garden-path sentences: The NP/Z garden path occurs nearly deterministically but can be mostly eliminated by adding a comma. By contrast, the RRC garden path occurs with a lower probability, but disambiguation via semantic plausibility is not always effective. KW - garden-path effect KW - syntactic reanalysis KW - multinomial processing tree KW - latent processes KW - mixture modeling Y1 - 2022 U6 - https://doi.org/10.1111/cogs.13186 SN - 0364-0213 SN - 1551-6709 VL - 46 IS - 8 PB - Wiley-Blackwell CY - Malden, Mass. ER - TY - JOUR A1 - Schad, Daniel A1 - Vasishth, Shravan T1 - The posterior probability of a null hypothesis given a statistically significant result JF - The quantitative methods for psychology N2 - When researchers carry out a null hypothesis significance test, it is tempting to assume that a statistically significant result lowers Prob(H0), the probability of the null hypothesis being true. Technically, such a statement is meaningless for various reasons: e.g., the null hypothesis does not have a probability associated with it. However, it is possible to relax certain assumptions to compute the posterior probability Prob(H0) under repeated sampling. We show in a step-by-step guide that the intuitively appealing belief, that Prob(H0) is low when significant results have been obtained under repeated sampling, is in general incorrect and depends greatly on: (a) the prior probability of the null being true; (b) type-I error rate, (c) type-II error rate, and (d) replication of a result. Through step-by-step simulations using open-source code in the R System of Statistical Computing, we show that uncertainty about the null hypothesis being true often remains high despite a significant result. 
To help the reader develop intuitions about this common misconception, we provide a Shiny app (https://danielschad.shinyapps.io/probnull/). We expect that this tutorial will help researchers better understand and judge results from null hypothesis significance tests. KW - Null hypothesis significance testing KW - Bayesian inference KW - statistical power Y1 - 2022 U6 - https://doi.org/10.20982/tqmp.18.2.p011 SN - 1913-4126 SN - 2292-1354 VL - 18 IS - 2 SP - 130 EP - 141 PB - University of Montreal, Department of Psychology CY - Montreal ER - TY - JOUR A1 - Vasishth, Shravan A1 - Gelman, Andrew T1 - How to embrace variation and accept uncertainty in linguistic and psycholinguistic data analysis JF - Linguistics : an interdisciplinary journal of the language sciences N2 - The use of statistical inference in linguistics and related areas like psychology typically involves a binary decision: either reject or accept some null hypothesis using statistical significance testing. When statistical power is low, this frequentist data-analytic approach breaks down: null results are uninformative, and effect size estimates associated with significant results are overestimated. Using an example from psycholinguistics, several alternative approaches are demonstrated for reporting inconsistencies between the data and a theoretical prediction. The key here is to focus on committing to a falsifiable prediction, on quantifying uncertainty statistically, and learning to accept the fact that - in almost all practical data analysis situations - we can only draw uncertain conclusions from data, regardless of whether we manage to obtain statistical significance or not. A focus on uncertainty quantification is likely to lead to fewer excessively bold claims that, on closer investigation, may turn out to be not supported by the data. 
KW - experimental linguistics KW - statistical data analysis KW - statistical inference KW - uncertainty quantification Y1 - 2021 U6 - https://doi.org/10.1515/ling-2019-0051 SN - 0024-3949 SN - 1613-396X VL - 59 IS - 5 SP - 1311 EP - 1342 PB - De Gruyter Mouton CY - Berlin ER - TY - JOUR A1 - Paape, Dario A1 - Avetisyan, Serine A1 - Lago, Sol A1 - Vasishth, Shravan T1 - Modeling misretrieval and feature substitution in agreement attraction BT - a computational evaluation JF - Cognitive science N2 - We present computational modeling results based on a self-paced reading study investigating number attraction effects in Eastern Armenian. We implement three novel computational models of agreement attraction in a Bayesian framework and compare their predictive fit to the data using k-fold cross-validation. We find that our data are better accounted for by an encoding-based model of agreement attraction, compared to a retrieval-based model. A novel methodological contribution of our study is the use of comprehension questions with open-ended responses, so that both misinterpretation of the number feature of the subject phrase and misassignment of the thematic subject role of the verb can be investigated at the same time. We find evidence for both types of misinterpretation in our study, sometimes in the same trial. However, the specific error patterns in our data are not fully consistent with any previously proposed model. KW - Agreement attraction KW - Eastern Armenian KW - Self-paced reading KW - Computational modeling Y1 - 2021 U6 - https://doi.org/10.1111/cogs.13019 SN - 0364-0213 SN - 1551-6709 VL - 45 IS - 8 PB - Wiley-Blackwell CY - Malden, Mass. 
ER - TY - JOUR A1 - Mertzen, Daniela A1 - Lago, Sol A1 - Vasishth, Shravan T1 - The benefits of preregistration for hypothesis-driven bilingualism research JF - Bilingualism : language and cognition N2 - Preregistration is an open science practice that requires the specification of research hypotheses and analysis plans before the data are inspected. Here, we discuss the benefits of preregistration for hypothesis-driven, confirmatory bilingualism research. Using examples from psycholinguistics and bilingualism, we illustrate how non-peer reviewed preregistrations can serve to implement a clean distinction between hypothesis testing and data exploration. This distinction helps researchers avoid casting post-hoc hypotheses and analyses as confirmatory ones. We argue that, in keeping with current best practices in the experimental sciences, preregistration, along with sharing data and code, should be an integral part of hypothesis-driven bilingualism research. KW - preregistration KW - open science KW - bilingualism KW - psycholinguistics KW - confirmatory analysis KW - exploratory analysis Y1 - 2021 U6 - https://doi.org/10.1017/S1366728921000031 SN - 1366-7289 SN - 1469-1841 VL - 24 IS - 5 SP - 807 EP - 812 PB - Cambridge Univ. Press CY - Cambridge ER - TY - JOUR A1 - Jäger, Lena Ann A1 - Mertzen, Daniela A1 - Van Dyke, Julie A. A1 - Vasishth, Shravan T1 - Interference patterns in subject-verb agreement and reflexives revisited BT - a large-sample study JF - Journal of memory and language N2 - Cue-based retrieval theories in sentence processing predict two classes of interference effect: (i) Inhibitory interference is predicted when multiple items match a retrieval cue: cue-overloading leads to an overall slowdown in reading time; and (ii) Facilitatory interference arises when a retrieval target as well as a distractor only partially match the retrieval cues; this partial matching leads to an overall speedup in retrieval time. 
Inhibitory interference effects are widely observed, but facilitatory interference apparently has an exception: reflexives have been claimed to show no facilitatory interference effects. Because the claim is based on underpowered studies, we conducted a large-sample experiment that investigated both facilitatory and inhibitory interference. In contrast to previous studies, we find facilitatory interference effects in reflexives. We also present a quantitative evaluation of the cue-based retrieval model of Engelmann, Jäger, and Vasishth (2019). KW - Sentence processing KW - Cue-based retrieval KW - Similarity-based interference KW - Reflexives KW - Agreement KW - Bayesian data analysis KW - Replication Y1 - 2020 U6 - https://doi.org/10.1016/j.jml.2019.104063 SN - 0749-596X SN - 1096-0821 VL - 111 PB - Elsevier CY - San Diego ER -