Refine
Year of publication
Document Type
- Article (26)
- Doctoral Thesis (13)
- Postprint (9)
Language
- English (48)
Is part of the Bibliography
- yes (48) (remove)
Keywords
- eye movements (48) (remove)
Institute
- Department Psychologie (27)
- Strukturbereich Kognitionswissenschaften (7)
- Humanwissenschaftliche Fakultät (4)
- Department Linguistik (2)
- Institut für Physik und Astronomie (2)
- Potsdam Research Institute for Multilingualism (PRIM) (2)
- Extern (1)
- Institut für Informatik und Computational Science (1)
- Institut für Mathematik (1)
- Mathematisch-Naturwissenschaftliche Fakultät (1)
Bottom-up and top-down as well as low-level and high-level factors influence where we fixate when viewing natural scenes. However, the importance of each of these factors and how they interact remains a matter of debate. Here, we disentangle these factors by analyzing their influence over time. For this purpose, we develop a saliency model that is based on the internal representation of a recent early spatial vision model to measure the low-level, bottom-up factor. To measure the influence of high-level, bottom-up features, we use a recent deep neural network-based saliency model. To account for top-down influences, we evaluate the models on two large data sets with different tasks: first, a memorization task and, second, a search task. Our results lend support to a separation of visual scene exploration into three phases: the first saccade, an initial guided exploration characterized by a gradual broadening of the fixation density, and a steady state that is reached after roughly 10 fixations. Saccade-target selection during the initial exploration and in the steady state is related to similar areas of interest, which are better predicted when including high-level features. In the search data set, fixation locations are determined predominantly by top-down processes. In contrast, the first fixation follows a different fixation density and contains a strong central fixation bias. Nonetheless, first fixations are guided strongly by image properties, and as early as 200 ms after image onset, fixations are better predicted by high-level information. We conclude that any low-level, bottom-up factors are mainly limited to the generation of the first saccade. All saccades are better explained when high-level features are considered, and later, this high-level, bottom-up control can be overruled by top-down influences.
Recent studies using the gaze-contingent boundary paradigm reported a reversed preview benefit- shorter fixations on a target word when an unrelated preview was easier to process than the fixated target (Schotter & Leinenger, 2016). This is explained viaforeedfixatiotzs-short fixations on words that would ideally be skipped (because lexical processing has progressed enough) but could not be because saccade planning reached a point of no return. This contrasts with accounts of preview effects via trans-saccadic integration-shorter fixations on a target word when the preview is more similar to it (see Cutter. Drieghe, & Liversedge, 2015). In addition, if the previewed word-not the fixated target-determines subsequent eye movements, is it also this word that enters the linguistic processing stream? We tested these accounts by having 24 subjects read 150 sentences in the boundary paradigm in which both the preview and target were initially plausible but later one, both, or neither became implausible, providing an opportunity to probe which one was linguistically encoded. In an intervening buffer region, both words were plausible, providing an opportunity to investigate trans-saccadic integration. The frequency of the previewed word affected progressive saccades (i.e.. forced fixations) as well as when transsaccadic integration failure increased regressions, but, only the implausibility of the target word affected semantic encoding. These data support a hybrid account of saccadic control (Reingold, Reichle. Glaholt, & Sheridan, 2012) driven by incomplete (often parafoveal) word recognition, which occurs prior to complete (often foveal) word recognition.
Moving arms
(2018)
Embodied cognition postulates a bi-directional link between the human body and its cognitive functions. Whether this holds for higher cognitive functions such as problem solving is unknown. We predicted that arm movement manipulations performed by the participants could affect the problem-solving solutions. We tested this prediction in quantitative reasoning tasks that allowed two solutions to each problem (addition or subtraction). In two studies with healthy adults (N=53 and N=50), we found an effect of problem-congruent movements on problem solutions. Consistent with embodied cognition, sensorimotor information gained via right or left arm movements affects the solution in different types of problem-solving tasks.
A central insight from psychological studies on human eye movements is that eye movement patterns are highly individually characteristic. They can, therefore, be used as a biometric feature, that is, subjects can be identified based on their eye movements. This thesis introduces new machine learning methods to identify subjects based on their eye movements while viewing arbitrary content. The thesis focuses on probabilistic modeling of the problem, which has yielded the best results in the most recent literature. The thesis studies the problem in three phases by proposing a purely probabilistic, probabilistic deep learning, and probabilistic deep metric learning approach. In the first phase, the thesis studies models that rely on psychological concepts about eye movements. Recent literature illustrates that individual-specific distributions of gaze patterns can be used to accurately identify individuals. In these studies, models were based on a simple parametric family of distributions. Such simple parametric models can be robustly estimated from sparse data, but have limited flexibility to capture the differences between individuals. Therefore, this thesis proposes a semiparametric model of gaze patterns that is flexible yet robust for individual identification. These patterns can be understood as domain knowledge derived from psychological literature. Fixations and saccades are examples of simple gaze patterns. The proposed semiparametric densities are drawn under a Gaussian process prior centered at a simple parametric distribution. Thus, the model will stay close to the parametric class of densities if little data is available, but it can also deviate from this class if enough data is available, increasing the flexibility of the model. The proposed method is evaluated on a large-scale dataset, showing significant improvements over the state-of-the-art. Later, the thesis replaces the model based on gaze patterns derived from psychological concepts with a deep neural network that can learn more informative and complex patterns from raw eye movement data. As previous work has shown that the distribution of these patterns across a sequence is informative, a novel statistical aggregation layer called the quantile layer is introduced. It explicitly fits the distribution of deep patterns learned directly from the raw eye movement data. The proposed deep learning approach is end-to-end learnable, such that the deep model learns to extract informative, short local patterns while the quantile layer learns to approximate the distributions of these patterns. Quantile layers are a generic approach that can converge to standard pooling layers or have a more detailed description of the features being pooled, depending on the problem. The proposed model is evaluated in a large-scale study using the eye movements of subjects viewing arbitrary visual input. The model improves upon the standard pooling layers and other statistical aggregation layers proposed in the literature. It also improves upon the state-of-the-art eye movement biometrics by a wide margin. Finally, for the model to identify any subject — not just the set of subjects it is trained on — a metric learning approach is developed. Metric learning learns a distance function over instances. The metric learning model maps the instances into a metric space, where sequences of the same individual are close, and sequences of different individuals are further apart. This thesis introduces a deep metric learning approach with distributional embeddings. The approach represents sequences as a set of continuous distributions in a metric space; to achieve this, a new loss function based on Wasserstein distances is introduced. The proposed method is evaluated on multiple domains besides eye movement biometrics. This approach outperforms the state of the art in deep metric learning in several domains while also outperforming the state of the art in eye movement biometrics.
When watching the image of a natural scene on a computer screen, observers initially move their eyes toward the center of the image—a reliable experimental finding termed central fixation bias. This systematic tendency in eye guidance likely masks attentional selection driven by image properties and top-down cognitive processes. Here, we show that the central fixation bias can be reduced by delaying the initial saccade relative to image onset. In four scene-viewing experiments we manipulated observers' initial gaze position and delayed their first saccade by a specific time interval relative to the onset of an image. We analyzed the distance to image center over time and show that the central fixation bias of initial fixations was significantly reduced after delayed saccade onsets. We additionally show that selection of the initial saccade target strongly depended on the first saccade latency. A previously published model of saccade generation was extended with a central activation map on the initial fixation whose influence declined with increasing saccade latency. This extension was sufficient to replicate the central fixation bias from our experiments. Our results suggest that the central fixation bias is generated by default activation as a response to the sudden image onset and that this default activation pattern decreases over time. Thus, it may often be preferable to use a modified version of the scene viewing paradigm that decouples image onset from the start signal for scene exploration to explicitly reduce the central fixation bias.
During visual fixation, the eye generates microsaccades and slower components of fixational eye movements that are part of the visual processing strategy in humans. Here, we show that ongoing heartbeat is coupled to temporal rate variations in the generation of microsaccades. Using coregistration of eye recording and ECG in humans, we tested the hypothesis that microsaccade onsets are coupled to the relative phase of the R-R intervals in heartbeats. We observed significantly more microsaccades during the early phase after the R peak in the ECG. This form of coupling between heartbeat and eye movements was substantiated by the additional finding of a coupling between heart phase and motion activity in slow fixational eye movements; i.e., retinal image slip caused by physiological drift. Our findings therefore demonstrate a coupling of the oculomotor system and ongoing heartbeat, which provides further evidence for bodily influences on visuomotor functioning.
During reading, saccadic eye movements are generated to shift words into the center of the visual field for lexical processing. Recently, Krugel and Engbert (Vision Research 50:1532-1539, 2010) demonstrated that within-word fixation positions are largely shifted to the left after skipped words. However, explanations of the origin of this effect cannot be drawn from normal reading data alone. Here we show that the large effect of skipped words on the distribution of within-word fixation positions is primarily based on rather subtle differences in the low-level visual information acquired before saccades. Using arrangements of "x" letter strings, we reproduced the effect of skipped character strings in a highly controlled single-saccade task. Our results demonstrate that the effect of skipped words in reading is the signature of a general visuomotor phenomenon. Moreover, our findings extend beyond the scope of the widely accepted range-error model, which posits that within-word fixation positions in reading depend solely on the distances of target words. We expect that our results will provide critical boundary conditions for the development of visuomotor models of saccade planning during reading.
Saccades move objects of interest into the center of the visual field for high-acuity visual analysis. White, Stritzke, and Gegenfurtner (Current Biology, 18, 124–128, 2008) have shown that saccadic latencies in the context of a structured background are much shorter than those with an unstructured background at equal levels of visibility. This effect has been explained by possible preactivation of the saccadic circuitry whenever a structured background acts as a mask for potential saccade targets. Here, we show that background textures modulate rates of microsaccades during visual fixation. First, after a display change, structured backgrounds induce a stronger decrease of microsaccade rates than do uniform backgrounds. Second, we demonstrate that the occurrence of a microsaccade in a critical time window can delay a subsequent saccadic response. Taken together, our findings suggest that microsaccades contribute to the saccadic facilitation effect, due to a modulation of microsaccade rates by properties of the background.
Linked linear mixed models
(2016)
The complexity of eye-movement control during reading allows measurement of many dependent variables, the most prominent ones being fixation durations and their locations in words. In current practice, either variable may serve as dependent variable or covariate for the other in linear mixed models (LMMs) featuring also psycholinguistic covariates of word recognition and sentence comprehension. Rather than analyzing fixation location and duration with separate LMMs, we propose linking the two according to their sequential dependency. Specifically, we include predicted fixation location (estimated in the first LMM from psycholinguistic covariates) and its associated residual fixation location as covariates in the second, fixation-duration LMM. This linked LMM affords a distinction between direct and indirect effects (mediated through fixation location) of psycholinguistic covariates on fixation durations. Results confirm the robustness of distributed processing in the perceptual span. They also offer a resolution of the paradox of the inverted optimal viewing position (IOVP) effect (i.e., longer fixation durations in the center than at the beginning and end of words) although the opposite (i.e., an OVP effect) is predicted from default assumptions of psycholinguistic processing efficiency: The IOVP effect in fixation durations is due to the residual fixation-location covariate, presumably driven primarily by saccadic error, and the OVP effect (at least the left part of it) is uncovered with the predicted fixation-location covariate, capturing the indirect effects of psycholinguistic covariates. We expect that linked LMMs will be useful for the analysis of other dynamically related multiple outcomes, a conundrum of most psychonomic research.
Understanding how humans move their eyes is an important part for understanding the functioning of the visual system. Analyzing eye movements from observations of natural scenes on a computer screen is a step to understand human visual behavior in the real world. When analyzing eye-movement data from scene-viewing experiments, the impor- tant questions are where (fixation locations), how long (fixation durations) and when (ordering of fixations) participants fixate on an image. By answering these questions, computational models can be developed which predict human scanpaths. Models serve as a tool to understand the underlying cognitive processes while observing an image, especially the allocation of visual attention.
The goal of this thesis is to provide new contributions to characterize and model human scanpaths on natural scenes. The results from this thesis will help to understand and describe certain systematic eye-movement tendencies, which are mostly independent of the image. One eye-movement tendency I focus on throughout this thesis is the tendency to fixate more in the center of an image than on the outer parts, called the central fixation bias. Another tendency, which I will investigate thoroughly, is the characteristic distribution of angles between successive eye movements.
The results serve to evaluate and improve a previously published model of scanpath generation from our laboratory, the SceneWalk model. Overall, six experiments were conducted for this thesis which led to the following five core results:
i) A spatial inhibition of return can be found in scene-viewing data. This means that locations which have already been fixated are afterwards avoided for a certain time interval (Chapter 2).
ii) The initial fixation position when observing an image has a long-lasting influence of up to five seconds on further scanpath progression (Chapter 2 & 3).
iii) The often described central fixation bias on images depends strongly on the duration of the initial fixation. Long-lasting initial fixations lead to a weaker central fixation bias than short fixations (Chapter 2 & 3).
iv) Human observers adjust their basic eye-movement parameters, like fixation dura- tions and saccade amplitudes, to the visual properties of a target they look for in visual search (Chapter 4).
v) The angle between two adjacent saccades is an indicator for the selectivity of the upcoming saccade target (Chapter 4).
All results emphasize the importance of systematic behavioral eye-movement tenden- cies and dynamic aspects of human scanpaths in scene viewing.