TY  - JOUR
A1  - Schütt, Heiko Herbert
A1  - Rothkegel, Lars Oliver Martin
A1  - Trukenbrod, Hans Arne
A1  - Reich, Sebastian
A1  - Wichmann, Felix A.
A1  - Engbert, Ralf
T1  - Likelihood-based parameter estimation and comparison of dynamical cognitive models
JF  - Psychological Review
N2  - Dynamical models of cognition play an increasingly important role in driving theoretical and experimental research in psychology. Therefore, parameter estimation, model analysis and comparison of dynamical models are of essential importance. In this article, we propose a maximum likelihood approach for model analysis in a fully dynamical framework that includes time-ordered experimental data. Our methods can be applied to dynamical models for the prediction of discrete behavior (e.g., movement onsets); in particular, we use a dynamical model of saccade generation in scene viewing as a case study for our approach. For this model, the likelihood function can be computed directly by numerical simulation, which enables more efficient parameter estimation including Bayesian inference to obtain reliable estimates and corresponding credible intervals. Using hierarchical models, inference is even possible for individual observers. Furthermore, our likelihood approach can be used to compare different models. In our example, the dynamical framework is shown to outperform nondynamical statistical models. Additionally, the likelihood-based evaluation differentiates model variants, which produced indistinguishable predictions on hitherto used statistics. Our results indicate that the likelihood approach is a promising framework for dynamical cognitive models.
KW  - likelihood
KW  - model fitting
KW  - dynamical model
KW  - eye movements
KW  - model comparison
Y1  - 2017
U6  - https://doi.org/10.1037/rev0000068
SN  - 0033-295X
SN  - 1939-1471
VL  - 124
IS  - 4
SP  - 505
EP  - 524
PB  - American Psychological Association
CY  - Washington
ER  - 
TY  - JOUR
A1  - Schütt, Heiko Herbert
A1  - Rothkegel, Lars Oliver Martin
A1  - Trukenbrod, Hans Arne
A1  - Engbert, Ralf
A1  - Wichmann, Felix A.
T1  - Disentangling bottom-up versus top-down and low-level versus high-level influences on eye movements over time
JF  - Journal of Vision
N2  - Bottom-up and top-down as well as low-level and high-level factors influence where we fixate when viewing natural scenes. However, the importance of each of these factors and how they interact remains a matter of debate. Here, we disentangle these factors by analyzing their influence over time. For this purpose, we develop a saliency model that is based on the internal representation of a recent early spatial vision model to measure the low-level, bottom-up factor. To measure the influence of high-level, bottom-up features, we use a recent deep neural network-based saliency model. To account for top-down influences, we evaluate the models on two large data sets with different tasks: first, a memorization task and, second, a search task. Our results lend support to a separation of visual scene exploration into three phases: the first saccade, an initial guided exploration characterized by a gradual broadening of the fixation density, and a steady state that is reached after roughly 10 fixations. Saccade-target selection during the initial exploration and in the steady state is related to similar areas of interest, which are better predicted when including high-level features. In the search data set, fixation locations are determined predominantly by top-down processes. In contrast, the first fixation follows a different fixation density and contains a strong central fixation bias. Nonetheless, first fixations are guided strongly by image properties, and as early as 200 ms after image onset, fixations are better predicted by high-level information. We conclude that any low-level, bottom-up factors are mainly limited to the generation of the first saccade. All saccades are better explained when high-level features are considered, and later, this high-level, bottom-up control can be overruled by top-down influences.
KW  - saliency
KW  - fixations
KW  - natural scenes
KW  - visual search
KW  - eye movements
Y1  - 2019
U6  - https://doi.org/10.1167/19.3.1
SN  - 1534-7362
VL  - 19
IS  - 3
PB  - Association for Research in Vision and Ophthalmology
CY  - Rockville
ER  - 
TY  - GEN
A1  - Schütt, Heiko Herbert
A1  - Rothkegel, Lars Oliver Martin
A1  - Trukenbrod, Hans Arne
A1  - Engbert, Ralf
A1  - Wichmann, Felix A.
T1  - Predicting fixation densities over time from early visual processing
T2  - Perception
N2  - Bottom-up saliency is often cited as a factor driving the choice of fixation locations of human observers, based on the (partial) success of saliency models to predict fixation densities in free viewing. However, these observations are only weak evidence for a causal role of bottom-up saliency in natural viewing behaviour. To test bottom-up saliency more directly, we analyse the performance of a number of saliency models---including our own saliency model based on our recently published model of early visual processing (Schütt & Wichmann, 2017, JoV)---as well as the theoretical limits for predictions over time. On free viewing data our model performs better than classical bottom-up saliency models, but worse than the current deep learning based saliency models incorporating higher-level information like knowledge about objects. However, on search data all saliency models perform worse than the optimal image independent prediction. We observe that the fixation density in free viewing is not stationary over time, but changes over the course of a trial. It starts with a pronounced central fixation bias on the first chosen fixation, which is nonetheless influenced by image content. Starting with the 2nd to 3rd fixation, the fixation density is already well predicted by later densities, but more concentrated. From there the fixation distribution broadens until it reaches a stationary distribution around the 10th fixation. Taken together these observations argue against bottom-up saliency as a mechanistic explanation for eye movement control after the initial orienting reaction in the first one to two saccades, although we confirm the predictive value of early visual representations for fixation locations. The fixation distribution is, first, not well described by any stationary density, second, is predicted better when including object information and, third, is badly predicted by any saliency model in a search task.
Y1  - 2019
SN  - 0301-0066
SN  - 1468-4233
VL  - 48
SP  - 64
EP  - 65
PB  - Sage Publ.
CY  - London
ER  - 
TY  - JOUR
A1  - Rothkegel, Lars Oliver Martin
A1  - Trukenbrod, Hans Arne
A1  - Schütt, Heiko Herbert
A1  - Wichmann, Felix A.
A1  - Engbert, Ralf
T1  - Influence of initial fixation position in scene viewing
JF  - Vision Research
KW  - Visual scanpath
KW  - Visual attention
KW  - Inhibition of return
KW  - Eye movements
KW  - Saliency
Y1  - 2016
U6  - https://doi.org/10.1016/j.visres.2016.09.012
SN  - 0042-6989
SN  - 1878-5646
VL  - 129
SP  - 33
EP  - 49
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Rothkegel, Lars Oliver Martin
A1  - Schütt, Heiko Herbert
A1  - Trukenbrod, Hans Arne
A1  - Wichmann, Felix A.
A1  - Engbert, Ralf
T1  - Searchers adjust their eye-movement dynamics to target characteristics in natural scenes
JF  - Scientific Reports
N2  - When searching a target in a natural scene, it has been shown that both the target’s visual properties and similarity to the background influence whether and how fast humans are able to find it. So far, it was unclear whether searchers adjust the dynamics of their eye movements (e.g., fixation durations, saccade amplitudes) to the target they search for. In our experiment, participants searched natural scenes for six artificial targets with different spatial frequency content throughout eight consecutive sessions. High-spatial frequency targets led to smaller saccade amplitudes and shorter fixation durations than low-spatial frequency targets if target identity was known. If a saccade was programmed in the same direction as the previous saccade, fixation durations and successive saccade amplitudes were not influenced by target type. Visual saliency and empirical fixation density at the endpoints of saccades which maintain direction were comparatively low, indicating that these saccades were less selective. Our results suggest that searchers adjust their eye movement dynamics to the search target efficiently, since previous research has shown that low-spatial frequencies are visible farther into the periphery than high-spatial frequencies. We interpret the saccade direction specificity of our effects as an underlying separation into a default scanning mechanism and a selective, target-dependent mechanism.
Y1  - 2019
U6  - https://doi.org/10.1038/s41598-018-37548-w
SN  - 2045-2322
VL  - 9
PB  - Nature Publ. Group
CY  - London
ER  - 
TY  - GEN
A1  - Geirhos, Robert
A1  - Temme, Carlos R. Medina
A1  - Rauber, Jonas
A1  - Schütt, Heiko Herbert
A1  - Bethge, Matthias
A1  - Wichmann, Felix A.
T1  - Generalisation in humans and deep neural networks
T2  - Proceedings of the 32nd International Conference on Neural Information Processing Systems
N2  - We compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. First, using three well known DNNs (ResNet-152, VGG-19, GoogLeNet) we find the human visual system to be more robust to nearly all of the tested image manipulations, and we observe progressively diverging classification error-patterns between humans and DNNs when the signal gets weaker. Secondly, we show that DNNs trained directly on distorted images consistently surpass human performance on the exact distortion types they were trained on, yet they display extremely poor generalisation abilities when tested on other distortion types. For example, training on salt-and-pepper noise does not imply robustness on uniform white noise and vice versa. Thus, changes in the noise distribution between training and testing constitute a crucial challenge to deep learning vision systems that can be systematically addressed in a lifelong machine learning approach. Our new dataset consisting of 83K carefully measured human psychophysical trials provides a useful reference for lifelong robustness against image degradations set by the human visual system.
Y1  - 2018
SN  - 1049-5258
VL  - 31
SP  - 7549
EP  - 7561
PB  - Curran Associates Inc.
CY  - Red Hook
ER  -