Whenever eye movements are measured, a central part of the analysis has to do with where subjects fixate and why they fixated where they fixated. To a first approximation, a set of fixations can be viewed as a set of points in space; this implies that fixations are spatial data and that the analysis of fixation locations can be beneficially thought of as a spatial statistics problem. We argue that thinking of fixation locations as arising from point processes is a very fruitful framework for eye-movement data, helping turn qualitative questions into quantitative ones. We provide a tutorial introduction to some of the main ideas of the field of spatial statistics, focusing especially on spatial Poisson processes. We show how point processes help relate image properties to fixation locations. In particular we show how point processes naturally express the idea that image features' predictability for fixations may vary from one image to another. We review other methods of analysis used in the literature, show how they relate to point process theory, and argue that thinking in terms of point processes substantially extends the range of analyses that can be performed and clarify their interpretation.
When studying how people search for objects in scenes, the inhomogeneity of the visual field is often ignored. Due to physiological limitations, peripheral vision is blurred and mainly uses coarse-grained information (i.e., low spatial frequencies) for selecting saccade targets, whereas high-acuity central vision uses fine-grained information (i.e., high spatial frequencies) for analysis of details. Here we investigated how spatial frequencies and color affect object search in real-world scenes. Using gaze-contingent filters, we attenuated high or low frequencies in central or peripheral vision while viewers searched color or grayscale scenes. Results showed that peripheral filters and central high-pass filters hardly affected search accuracy, whereas accuracy dropped drastically with central low-pass filters. Peripheral filtering increased the time to localize the target by decreasing saccade amplitudes and increasing number and duration of fixations. The use of coarse-grained information in the periphery was limited to color scenes. Central filtering increased the time to verify target identity instead, especially with low-pass filters. We conclude that peripheral vision is critical for object localization and central vision is critical for object identification. Visual guidance during peripheral object localization is dominated by low-frequency color information, whereas high-frequency information, relatively independent of color, is most important for object identification in central vision.
In humans and in foveated animals visual acuity is highly concentrated at the center of gaze, so that choosing where to look next is an important example of online, rapid decision-making. Computational neuroscientists have developed biologically-inspired models of visual attention, termed saliency maps, which successfully predict where people fixate on average. Using point process theory for spatial statistics, we show that scanpaths contain, however, important statistical structure, such as spatial clustering on top of distributions of gaze positions. Here, we develop a dynamical model of saccadic selection that accurately predicts the distribution of gaze positions as well as spatial clustering along individual scanpaths. Our model relies on activation dynamics via spatially-limited (foveated) access to saliency information, and, second, a leaky memory process controlling the re-inspection of target regions. This theoretical framework models a form of context-dependent decision-making, linking neural dynamics of attention to behavioral gaze data.
Reading requires the orchestration of visual, attentional, language-related, and oculomotor processing constraints. This study replicates previous effects of frequency, predictability, and length of fixated words on fixation durations in natural reading and demonstrates new effects of these variables related to previous and next words. Results are based on fixation durations recorded from 222 persons, each reading 144 sentences. Such evidence for distributed processing of words across fixation durations challenges psycholinguistic immediacy-of-processing and eye-mind assumptions. Most of the time the mind processes several words in parallel at different perceptual and cognitive levels. Eye movements can help to unravel these processes.
During reading, saccadic eye movements are generated to shift words into the center of the visual field for lexical processing. Recently, Krugel and Engbert (Vision Research 50:1532-1539, 2010) demonstrated that within-word fixation positions are largely shifted to the left after skipped words. However, explanations of the origin of this effect cannot be drawn from normal reading data alone. Here we show that the large effect of skipped words on the distribution of within-word fixation positions is primarily based on rather subtle differences in the low-level visual information acquired before saccades. Using arrangements of "x" letter strings, we reproduced the effect of skipped character strings in a highly controlled single-saccade task. Our results demonstrate that the effect of skipped words in reading is the signature of a general visuomotor phenomenon. Moreover, our findings extend beyond the scope of the widely accepted range-error model, which posits that within-word fixation positions in reading depend solely on the distances of target words. We expect that our results will provide critical boundary conditions for the development of visuomotor models of saccade planning during reading.
During visual fixation, the eye generates microsaccades and slower components of fixational eye movements that are part of the visual processing strategy in humans. Here, we show that ongoing heartbeat is coupled to temporal rate variations in the generation of microsaccades. Using coregistration of eye recording and ECG in humans, we tested the hypothesis that microsaccade onsets are coupled to the relative phase of the R-R intervals in heartbeats. We observed significantly more microsaccades during the early phase after the R peak in the ECG. This form of coupling between heartbeat and eye movements was substantiated by the additional finding of a coupling between heart phase and motion activity in slow fixational eye movements; i.e., retinal image slip caused by physiological drift. Our findings therefore demonstrate a coupling of the oculomotor system and ongoing heartbeat, which provides further evidence for bodily influences on visuomotor functioning.
When watching the image of a natural scene on a computer screen, observers initially move their eyes toward the center of the image—a reliable experimental finding termed central fixation bias. This systematic tendency in eye guidance likely masks attentional selection driven by image properties and top-down cognitive processes. Here, we show that the central fixation bias can be reduced by delaying the initial saccade relative to image onset. In four scene-viewing experiments we manipulated observers' initial gaze position and delayed their first saccade by a specific time interval relative to the onset of an image. We analyzed the distance to image center over time and show that the central fixation bias of initial fixations was significantly reduced after delayed saccade onsets. We additionally show that selection of the initial saccade target strongly depended on the first saccade latency. A previously published model of saccade generation was extended with a central activation map on the initial fixation whose influence declined with increasing saccade latency. This extension was sufficient to replicate the central fixation bias from our experiments. Our results suggest that the central fixation bias is generated by default activation as a response to the sudden image onset and that this default activation pattern decreases over time. Thus, it may often be preferable to use a modified version of the scene viewing paradigm that decouples image onset from the start signal for scene exploration to explicitly reduce the central fixation bias.
Bottom-up and top-down as well as low-level and high-level factors influence where we fixate when viewing natural scenes. However, the importance of each of these factors and how they interact remains a matter of debate. Here, we disentangle these factors by analyzing their influence over time. For this purpose, we develop a saliency model that is based on the internal representation of a recent early spatial vision model to measure the low-level, bottom-up factor. To measure the influence of high-level, bottom-up features, we use a recent deep neural network-based saliency model. To account for top-down influences, we evaluate the models on two large data sets with different tasks: first, a memorization task and, second, a search task. Our results lend support to a separation of visual scene exploration into three phases: the first saccade, an initial guided exploration characterized by a gradual broadening of the fixation density, and a steady state that is reached after roughly 10 fixations. Saccade-target selection during the initial exploration and in the steady state is related to similar areas of interest, which are better predicted when including high-level features. In the search data set, fixation locations are determined predominantly by top-down processes. In contrast, the first fixation follows a different fixation density and contains a strong central fixation bias. Nonetheless, first fixations are guided strongly by image properties, and as early as 200 ms after image onset, fixations are better predicted by high-level information. We conclude that any low-level, bottom-up factors are mainly limited to the generation of the first saccade. All saccades are better explained when high-level features are considered, and later, this high-level, bottom-up control can be overruled by top-down influences.