TY - JOUR
A1 - Trukenbrod, Hans Arne
A1 - Barthelme, Simon
A1 - Wichmann, Felix A.
A1 - Engbert, Ralf
T1 - Spatial statistics for gaze patterns in scene viewing
BT - effects of repeated viewing
JF - Journal of vision
N2 - Scene viewing is used to study attentional selection in complex but still controlled environments. One of the main observations on eye movements during scene viewing is the inhomogeneous distribution of fixation locations: While some parts of an image are fixated by almost all observers and are inspected repeatedly by the same observer, other image parts remain unfixated by observers even after long exploration intervals. Here, we apply spatial point process methods to investigate the relationship between pairs of fixations. More precisely, we use the pair correlation function, a powerful statistical tool, to evaluate dependencies between fixation locations along individual scanpaths. We demonstrate that aggregation of fixation locations within 4 degrees is stronger than expected from chance. Furthermore, the pair correlation function reveals stronger aggregation of fixations when the same image is presented a second time. We use simulations of a dynamical model to show that a narrower spatial attentional span may explain differences in pair correlations between the first and the second inspection of the same image.
KW - scene viewing
KW - pair correlation function
KW - spatial correlations
Y1 - 2019
U6 - https://doi.org/10.1167/19.6.5
SN - 1534-7362
VL - 19
IS - 6
SP - 1
EP - 19
PB - Association for Research in Vision and Ophthalmology
CY - Rockville
ER -

TY - CHAP
A1 - Trukenbrod, Hans Arne
A1 - Barthelme, Simon
A1 - Wichmann, Felix A.
A1 - Engbert, Ralf
T1 - Color does not guide eye movements: Evidence from a gaze-contingent experiment
T2 - Perception
Y1 - 2012
SN - 0301-0066
SN - 1468-4233
VL - 41
SP - 88
EP - 88
PB - Pion Ltd
CY - London
ER -

TY - JOUR
A1 - Schütt, Heiko Herbert
A1 - Rothkegel, Lars Oliver Martin
A1 - Trukenbrod, Hans Arne
A1 - Reich, Sebastian
A1 - Wichmann, Felix A.
A1 - Engbert, Ralf
T1 - Likelihood-based parameter estimation and comparison of dynamical cognitive models
JF - Psychological Review
N2 - Dynamical models of cognition play an increasingly important role in driving theoretical and experimental research in psychology. Therefore, parameter estimation, model analysis and comparison of dynamical models are of essential importance. In this article, we propose a maximum likelihood approach for model analysis in a fully dynamical framework that includes time-ordered experimental data. Our methods can be applied to dynamical models for the prediction of discrete behavior (e.g., movement onsets); in particular, we use a dynamical model of saccade generation in scene viewing as a case study for our approach. For this model, the likelihood function can be computed directly by numerical simulation, which enables more efficient parameter estimation including Bayesian inference to obtain reliable estimates and corresponding credible intervals. Using hierarchical models, inference is even possible for individual observers. Furthermore, our likelihood approach can be used to compare different models. In our example, the dynamical framework is shown to outperform nondynamical statistical models. Additionally, the likelihood-based evaluation differentiates model variants, which produced indistinguishable predictions on hitherto used statistics. Our results indicate that the likelihood approach is a promising framework for dynamical cognitive models.
KW - likelihood
KW - model fitting
KW - dynamical model
KW - eye movements
KW - model comparison
Y1 - 2017
U6 - https://doi.org/10.1037/rev0000068
SN - 0033-295X
SN - 1939-1471
VL - 124
IS - 4
SP - 505
EP - 524
PB - American Psychological Association
CY - Washington
ER -

TY - JOUR
A1 - Schütt, Heiko Herbert
A1 - Rothkegel, Lars Oliver Martin
A1 - Trukenbrod, Hans Arne
A1 - Engbert, Ralf
A1 - Wichmann, Felix A.
T1 - Disentangling bottom-up versus top-down and low-level versus high-level influences on eye movements over time
JF - Journal of vision
N2 - Bottom-up and top-down as well as low-level and high-level factors influence where we fixate when viewing natural scenes. However, the importance of each of these factors and how they interact remains a matter of debate. Here, we disentangle these factors by analyzing their influence over time. For this purpose, we develop a saliency model that is based on the internal representation of a recent early spatial vision model to measure the low-level, bottom-up factor. To measure the influence of high-level, bottom-up features, we use a recent deep neural network-based saliency model. To account for top-down influences, we evaluate the models on two large data sets with different tasks: first, a memorization task and, second, a search task. Our results lend support to a separation of visual scene exploration into three phases: the first saccade, an initial guided exploration characterized by a gradual broadening of the fixation density, and a steady state that is reached after roughly 10 fixations. Saccade-target selection during the initial exploration and in the steady state is related to similar areas of interest, which are better predicted when including high-level features. In the search data set, fixation locations are determined predominantly by top-down processes. In contrast, the first fixation follows a different fixation density and contains a strong central fixation bias.
Nonetheless, first fixations are guided strongly by image properties, and as early as 200 ms after image onset, fixations are better predicted by high-level information. We conclude that any low-level, bottom-up factors are mainly limited to the generation of the first saccade. All saccades are better explained when high-level features are considered, and later, this high-level, bottom-up control can be overruled by top-down influences.
KW - saliency
KW - fixations
KW - natural scenes
KW - visual search
KW - eye movements
Y1 - 2019
U6 - https://doi.org/10.1167/19.3.1
SN - 1534-7362
VL - 19
IS - 3
PB - Association for Research in Vision and Ophthalmology
CY - Rockville
ER -

TY - GEN
A1 - Schütt, Heiko Herbert
A1 - Rothkegel, Lars Oliver Martin
A1 - Trukenbrod, Hans Arne
A1 - Engbert, Ralf
A1 - Wichmann, Felix A.
T1 - Predicting fixation densities over time from early visual processing
T2 - Perception
N2 - Bottom-up saliency is often cited as a factor driving the choice of fixation locations of human observers, based on the (partial) success of saliency models to predict fixation densities in free viewing. However, these observations are only weak evidence for a causal role of bottom-up saliency in natural viewing behaviour. To test bottom-up saliency more directly, we analyse the performance of a number of saliency models---including our own saliency model based on our recently published model of early visual processing (Schütt & Wichmann, 2017, JoV)---as well as the theoretical limits for predictions over time. On free viewing data our model performs better than classical bottom-up saliency models, but worse than the current deep learning based saliency models incorporating higher-level information like knowledge about objects. However, on search data all saliency models perform worse than the optimal image independent prediction. We observe that the fixation density in free viewing is not stationary over time, but changes over the course of a trial.
It starts with a pronounced central fixation bias on the first chosen fixation, which is nonetheless influenced by image content. Starting with the 2nd to 3rd fixation, the fixation density is already well predicted by later densities, but more concentrated. From there the fixation distribution broadens until it reaches a stationary distribution around the 10th fixation. Taken together these observations argue against bottom-up saliency as a mechanistic explanation for eye movement control after the initial orienting reaction in the first one to two saccades, although we confirm the predictive value of early visual representations for fixation locations. The fixation distribution is, first, not well described by any stationary density, second, is predicted better when including object information and, third, is badly predicted by any saliency model in a search task.
Y1 - 2019
SN - 0301-0066
SN - 1468-4233
VL - 48
SP - 64
EP - 65
PB - Sage Publ.
CY - London
ER -

TY - JOUR
A1 - Schütt, Heiko Herbert
A1 - Harmeling, Stefan
A1 - Macke, Jakob H.
A1 - Wichmann, Felix A.
T1 - Painfree and accurate Bayesian estimation of psychometric functions for (potentially) overdispersed data
JF - Vision research : an international journal for functional aspects of vision.
N2 - The psychometric function describes how an experimental variable, such as stimulus strength, influences the behaviour of an observer. Estimation of psychometric functions from experimental data plays a central role in fields such as psychophysics, experimental psychology and in the behavioural neurosciences. Experimental data may exhibit substantial overdispersion, which may result from non-stationarity in the behaviour of observers. Here we extend the standard binomial model which is typically used for psychometric function estimation to a beta-binomial model.
We show that the use of the beta-binomial model makes it possible to determine accurate credible intervals even in data which exhibit substantial overdispersion. This goes beyond classical measures for overdispersion goodness-of-fit which can detect overdispersion but provide no method to do correct inference for overdispersed data. We use Bayesian inference methods for estimating the posterior distribution of the parameters of the psychometric function. Unlike previous Bayesian psychometric inference methods, our software implementation, psignifit 4, performs numerical integration of the posterior within automatically determined bounds. This avoids the use of Markov chain Monte Carlo (MCMC) methods typically requiring expert knowledge. Extensive numerical tests show the validity of the approach and we discuss implications of overdispersion for experimental design. A comprehensive MATLAB toolbox implementing the method is freely available; a Python implementation providing the basic capabilities is also available. (C) 2016 The Authors. Published by Elsevier Ltd.
KW - Psychometric function
KW - Bayesian inference
KW - Beta-binomial model
KW - Overdispersion
KW - Non-stationarity
KW - Confidence intervals
KW - Credible intervals
KW - Psychophysical methods
Y1 - 2016
U6 - https://doi.org/10.1016/j.visres.2016.02.002
SN - 0042-6989
SN - 1878-5646
VL - 122
SP - 105
EP - 123
PB - Elsevier
CY - Oxford
ER -

TY - JOUR
A1 - Schütt, Heiko Herbert
A1 - Wichmann, Felix A.
T1 - An image-computable psychophysical spatial vision model
JF - Journal of vision
N2 - A large part of classical visual psychophysics was concerned with the fundamental question of how pattern information is initially encoded in the human visual system. From these studies a relatively standard model of early spatial vision emerged, based on spatial frequency and orientation-specific channels followed by an accelerating nonlinearity and divisive normalization: contrast gain-control.
Here we implement such a model in an image-computable way, allowing it to take arbitrary luminance images as input. Testing our implementation on classical psychophysical data, we find that it explains contrast detection data including the ModelFest data, contrast discrimination data, and oblique masking data, using a single set of parameters. Leveraging the advantage of an image-computable model, we test our model against a recent dataset using natural images as masks. We find that the model explains these data reasonably well, too. To explain data obtained at different presentation durations, our model requires different parameters to achieve an acceptable fit. In addition, we show that contrast gain-control with the fitted parameters results in a very sparse encoding of luminance information, in line with notions from efficient coding. Translating the standard early spatial vision model to be image-computable resulted in two further insights: First, the nonlinear processing requires a denser sampling of spatial frequency and orientation than optimal coding suggests. Second, the normalization needs to be fairly local in space to fit the data obtained with natural image masks. Finally, our image-computable model can serve as a tool in future quantitative analyses: It allows optimized stimuli to be used to test the model and variants of it, with potential applications as an image-quality metric. In addition, it may serve as a building block for models of higher level processing.
KW - model
KW - spatial vision
KW - image-computable
KW - psychophysics
Y1 - 2017
U6 - https://doi.org/10.1167/17.12.12
SN - 1534-7362
VL - 17
PB - Association for Research in Vision and Ophthalmology
CY - Rockville
ER -

TY - JOUR
A1 - Rothkegel, Lars Oliver Martin
A1 - Trukenbrod, Hans Arne
A1 - Schütt, Heiko Herbert
A1 - Wichmann, Felix A.
A1 - Engbert, Ralf
T1 - Temporal evolution of the central fixation bias in scene viewing
JF - Journal of vision
N2 - When watching the image of a natural scene on a computer screen, observers initially move their eyes toward the center of the image, a reliable experimental finding termed central fixation bias. This systematic tendency in eye guidance likely masks attentional selection driven by image properties and top-down cognitive processes. Here, we show that the central fixation bias can be reduced by delaying the initial saccade relative to image onset. In four scene-viewing experiments we manipulated observers' initial gaze position and delayed their first saccade by a specific time interval relative to the onset of an image. We analyzed the distance to image center over time and show that the central fixation bias of initial fixations was significantly reduced after delayed saccade onsets. We additionally show that selection of the initial saccade target strongly depended on the first saccade latency. A previously published model of saccade generation was extended with a central activation map on the initial fixation whose influence declined with increasing saccade latency. This extension was sufficient to replicate the central fixation bias from our experiments. Our results suggest that the central fixation bias is generated by default activation as a response to the sudden image onset and that this default activation pattern decreases over time. Thus, it may often be preferable to use a modified version of the scene viewing paradigm that decouples image onset from the start signal for scene exploration to explicitly reduce the central fixation bias.
KW - eye movements
KW - dynamic models
KW - visual scanpath
KW - visual attention
Y1 - 2017
U6 - https://doi.org/10.1167/17.13.3
SN - 1534-7362
VL - 17
SP - 1626
EP - 1638
PB - Association for Research in Vision and Ophthalmology
CY - Rockville
ER -

TY - JOUR
A1 - Rothkegel, Lars Oliver Martin
A1 - Trukenbrod, Hans Arne
A1 - Schütt, Heiko Herbert
A1 - Wichmann, Felix A.
A1 - Engbert, Ralf
T1 - Influence of initial fixation position in scene viewing
JF - Vision research : an international journal for functional aspects of vision.
KW - Visual scanpath
KW - Visual attention
KW - Inhibition of return
KW - Eye movements
KW - Saliency
Y1 - 2016
U6 - https://doi.org/10.1016/j.visres.2016.09.012
SN - 0042-6989
SN - 1878-5646
VL - 129
SP - 33
EP - 49
PB - Elsevier
CY - Oxford
ER -

TY - JOUR
A1 - Rothkegel, Lars Oliver Martin
A1 - Schütt, Heiko Herbert
A1 - Trukenbrod, Hans Arne
A1 - Wichmann, Felix A.
A1 - Engbert, Ralf
T1 - Searchers adjust their eye-movement dynamics to target characteristics in natural scenes
JF - Scientific reports
N2 - When searching for a target in a natural scene, it has been shown that both the target’s visual properties and similarity to the background influence whether and how fast humans are able to find it. So far, it was unclear whether searchers adjust the dynamics of their eye movements (e.g., fixation durations, saccade amplitudes) to the target they search for. In our experiment, participants searched natural scenes for six artificial targets with different spatial frequency content throughout eight consecutive sessions. High-spatial frequency targets led to smaller saccade amplitudes and shorter fixation durations than low-spatial frequency targets if target identity was known. If a saccade was programmed in the same direction as the previous saccade, fixation durations and successive saccade amplitudes were not influenced by target type.
Visual saliency and empirical fixation density at the endpoints of saccades which maintain direction were comparatively low, indicating that these saccades were less selective. Our results suggest that searchers adjust their eye movement dynamics to the search target efficiently, since previous research has shown that low-spatial frequencies are visible farther into the periphery than high-spatial frequencies. We interpret the saccade direction specificity of our effects as an underlying separation into a default scanning mechanism and a selective, target-dependent mechanism.
Y1 - 2019
U6 - https://doi.org/10.1038/s41598-018-37548-w
SN - 2045-2322
VL - 9
PB - Nature Publ. Group
CY - London
ER -

TY - GEN
A1 - Geirhos, Robert
A1 - Temme, Carlos R. Medina
A1 - Rauber, Jonas
A1 - Schütt, Heiko Herbert
A1 - Bethge, Matthias
A1 - Wichmann, Felix A.
T1 - Generalisation in humans and deep neural networks
T2 - Proceedings of the 32nd International Conference on Neural Information Processing Systems
N2 - We compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. First, using three well known DNNs (ResNet-152, VGG-19, GoogLeNet) we find the human visual system to be more robust to nearly all of the tested image manipulations, and we observe progressively diverging classification error-patterns between humans and DNNs when the signal gets weaker. Secondly, we show that DNNs trained directly on distorted images consistently surpass human performance on the exact distortion types they were trained on, yet they display extremely poor generalisation abilities when tested on other distortion types. For example, training on salt-and-pepper noise does not imply robustness on uniform white noise and vice versa.
Thus, changes in the noise distribution between training and testing constitute a crucial challenge to deep learning vision systems that can be systematically addressed in a lifelong machine learning approach. Our new dataset consisting of 83K carefully measured human psychophysical trials provides a useful reference for lifelong robustness against image degradations set by the human visual system.
Y1 - 2018
SN - 1049-5258
VL - 31
SP - 7549
EP - 7561
PB - Curran Associates Inc.
CY - Red Hook
ER -

TY - JOUR
A1 - Engbert, Ralf
A1 - Trukenbrod, Hans Arne
A1 - Barthelme, Simon
A1 - Wichmann, Felix A.
T1 - Spatial statistics and attentional dynamics in scene viewing
JF - Journal of vision
N2 - In humans and in foveated animals visual acuity is highly concentrated at the center of gaze, so that choosing where to look next is an important example of online, rapid decision-making. Computational neuroscientists have developed biologically-inspired models of visual attention, termed saliency maps, which successfully predict where people fixate on average. Using point process theory for spatial statistics, we show that scanpaths contain, however, important statistical structure, such as spatial clustering on top of distributions of gaze positions. Here, we develop a dynamical model of saccadic selection that accurately predicts the distribution of gaze positions as well as spatial clustering along individual scanpaths. Our model relies on activation dynamics via spatially-limited (foveated) access to saliency information, and, second, a leaky memory process controlling the re-inspection of target regions. This theoretical framework models a form of context-dependent decision-making, linking neural dynamics of attention to behavioral gaze data.
KW - scene perception
KW - eye movements
KW - attention
KW - saccades
KW - modeling
KW - spatial statistics
Y1 - 2015
U6 - https://doi.org/10.1167/15.1.14
SN - 1534-7362
VL - 15
IS - 1
PB - Association for Research in Vision and Ophthalmology
CY - Rockville
ER -

TY - JOUR
A1 - Barthelme, Simon
A1 - Trukenbrod, Hans Arne
A1 - Engbert, Ralf
A1 - Wichmann, Felix A.
T1 - Modeling fixation locations using spatial point processes
JF - Journal of vision
N2 - Whenever eye movements are measured, a central part of the analysis has to do with where subjects fixate and why they fixated where they fixated. To a first approximation, a set of fixations can be viewed as a set of points in space; this implies that fixations are spatial data and that the analysis of fixation locations can be beneficially thought of as a spatial statistics problem. We argue that thinking of fixation locations as arising from point processes is a very fruitful framework for eye-movement data, helping turn qualitative questions into quantitative ones. We provide a tutorial introduction to some of the main ideas of the field of spatial statistics, focusing especially on spatial Poisson processes. We show how point processes help relate image properties to fixation locations. In particular we show how point processes naturally express the idea that image features' predictability for fixations may vary from one image to another. We review other methods of analysis used in the literature, show how they relate to point process theory, and argue that thinking in terms of point processes substantially extends the range of analyses that can be performed and clarify their interpretation.
KW - eye movements
KW - fixation locations
KW - saliency
KW - modeling
KW - point process
KW - spatial statistics
Y1 - 2013
U6 - https://doi.org/10.1167/13.12.1
SN - 1534-7362
VL - 13
IS - 12
PB - Association for Research in Vision and Ophthalmology
CY - Rockville
ER -