The evaluation of process-oriented cognitive theories through time-ordered observations is crucial for the advancement of cognitive science. The findings presented herein integrate insights from research on eye-movement control and sentence comprehension during reading, addressing challenges in modeling time-ordered data, statistical inference, and interindividual variability. Using kernel density estimation and a pseudo-marginal likelihood for fixation durations and locations, a likelihood implementation of the SWIFT model of eye-movement control during reading (Engbert et al., Psychological Review, 112, 2005, pp. 777–813) is proposed. Within the broader framework of data assimilation, Bayesian parameter inference with adaptive Markov chain Monte Carlo techniques is facilitated for reliable model fitting. Across the different studies, this framework has been shown to enable reliable parameter recovery from simulated data and prediction of experimental summary statistics. Despite its complexity, SWIFT can be fitted within a principled Bayesian workflow, capturing interindividual differences and modeling experimental effects on reading across different geometrical alterations of text. Based on these advancements, the integrated dynamical model SEAM is proposed, which combines eye-movement control, a traditionally psychological research area, and post-lexical language processing in the form of cue-based memory retrieval (Lewis & Vasishth, Cognitive Science, 29, 2005, pp. 375–419), typically the purview of psycholinguistics. This proof-of-concept integration marks a significant step forward in modeling natural language comprehension during reading and suggests that the presented methodology can be useful to develop complex cognitive dynamical models that integrate processes at levels of perception, higher cognition, and (oculo-)motor control.
These findings collectively advance process-oriented cognitive modeling and highlight the importance of Bayesian inference, individual differences, and interdisciplinary integration for a holistic understanding of reading processes. Implications for theory and methodology, including proposals for model comparison and hierarchical parameter inference, are briefly discussed.
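The kernel-density step mentioned in the abstract above can be illustrated with a small sketch. It shows only the general idea of approximating a likelihood from simulated data: simulate fixation durations under a candidate parameter, smooth them with a Gaussian kernel density estimate, and evaluate the observed durations under that density. Everything here is an assumption made for illustration: the gamma-distributed `simulate_durations` is a hypothetical stand-in for a model simulator (it is not SWIFT), and the function names, the rule-of-thumb bandwidth, and the parameter values do not come from the thesis.

```python
import math
import random


def gaussian_kde_logpdf(x, samples, bandwidth):
    """Log density of a Gaussian kernel density estimate at x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    # Floor the density so that log() never receives zero.
    return math.log(max(norm * total, 1e-300))


def simulate_durations(mean_ms, n, rng):
    """Hypothetical stand-in simulator: gamma-distributed durations (shape 4)."""
    shape = 4.0
    return [rng.gammavariate(shape, mean_ms / shape) for _ in range(n)]


def approx_loglik(observed, mean_ms, n_sim=1000, seed=0):
    """Approximate log-likelihood of observed durations for one parameter value."""
    rng = random.Random(seed)
    sims = simulate_durations(mean_ms, n_sim, rng)
    mu = sum(sims) / n_sim
    sd = math.sqrt(sum((s - mu) ** 2 for s in sims) / (n_sim - 1))
    # Normal-reference rule-of-thumb bandwidth (an assumption of this sketch).
    bandwidth = 1.06 * sd * n_sim ** (-0.2)
    return sum(gaussian_kde_logpdf(x, sims, bandwidth) for x in observed)
```

Because the approximation is itself a noisy estimate that depends on the simulated sample, a sampler built on it would typically treat the value as an estimate of the likelihood rather than as the exact quantity.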
A central insight from psychological studies on human eye movements is that eye movement patterns are highly individually characteristic. They can, therefore, be used as a biometric feature, that is, subjects can be identified based on their eye movements. This thesis introduces new machine learning methods to identify subjects based on their eye movements while viewing arbitrary content. The thesis focuses on probabilistic modeling of the problem, which has yielded the best results in the most recent literature. The thesis studies the problem in three phases by proposing a purely probabilistic, probabilistic deep learning, and probabilistic deep metric learning approach. In the first phase, the thesis studies models that rely on psychological concepts about eye movements. Recent literature illustrates that individual-specific distributions of gaze patterns can be used to accurately identify individuals. In these studies, models were based on a simple parametric family of distributions. Such simple parametric models can be robustly estimated from sparse data, but have limited flexibility to capture the differences between individuals. Therefore, this thesis proposes a semiparametric model of gaze patterns that is flexible yet robust for individual identification. These patterns can be understood as domain knowledge derived from psychological literature. Fixations and saccades are examples of simple gaze patterns. The proposed semiparametric densities are drawn under a Gaussian process prior centered at a simple parametric distribution. Thus, the model will stay close to the parametric class of densities if little data is available, but it can also deviate from this class if enough data is available, increasing the flexibility of the model. The proposed method is evaluated on a large-scale dataset, showing significant improvements over the state-of-the-art. 
Later, the thesis replaces the model based on gaze patterns derived from psychological concepts with a deep neural network that can learn more informative and complex patterns from raw eye movement data. As previous work has shown that the distribution of these patterns across a sequence is informative, a novel statistical aggregation layer called the quantile layer is introduced. It explicitly fits the distribution of deep patterns learned directly from the raw eye movement data. The proposed deep learning approach is end-to-end learnable, such that the deep model learns to extract informative, short local patterns while the quantile layer learns to approximate the distributions of these patterns. Quantile layers are a generic approach that can converge to standard pooling layers or have a more detailed description of the features being pooled, depending on the problem. The proposed model is evaluated in a large-scale study using the eye movements of subjects viewing arbitrary visual input. The model improves upon the standard pooling layers and other statistical aggregation layers proposed in the literature. It also improves upon the state-of-the-art eye movement biometrics by a wide margin. Finally, for the model to identify any subject — not just the set of subjects it is trained on — a metric learning approach is developed. Metric learning learns a distance function over instances. The metric learning model maps the instances into a metric space, where sequences of the same individual are close, and sequences of different individuals are further apart. This thesis introduces a deep metric learning approach with distributional embeddings. The approach represents sequences as a set of continuous distributions in a metric space; to achieve this, a new loss function based on Wasserstein distances is introduced. The proposed method is evaluated on multiple domains besides eye movement biometrics. 
This approach outperforms the state of the art in deep metric learning in several domains while also outperforming the state of the art in eye movement biometrics.
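Two ideas from the abstract above can be sketched in a few lines: summarizing the distribution of a feature across a sequence by its quantiles, and comparing two one-dimensional empirical distributions with a Wasserstein distance. This is a deliberately simplified illustration, not the thesis's method: the quantile levels are fixed rather than learned, no neural network is involved, and the function names are hypothetical.

```python
def quantile_pool(values, levels=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Summarize a sequence of feature values by linearly interpolated quantiles."""
    xs = sorted(values)
    n = len(xs)
    pooled = []
    for q in levels:
        pos = q * (n - 1)          # fractional index into the sorted values
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        pooled.append(xs[lo] * (1 - frac) + xs[hi] * frac)
    return pooled


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes samples of equal size")
    # For equal-sized samples, the optimal coupling matches sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

With `levels=(0.0, 1.0)` the pooling reduces to the minimum and maximum of the sequence, which mirrors the remark that quantile-based aggregation can fall back to standard pooling, while more levels give a finer description of the pooled feature.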
During reading, saccadic eye movements are generated to shift words into the center of the visual field for lexical processing. Recently, Krugel and Engbert (Vision Research 50:1532-1539, 2010) demonstrated that within-word fixation positions are largely shifted to the left after skipped words. However, explanations of the origin of this effect cannot be drawn from normal reading data alone. Here we show that the large effect of skipped words on the distribution of within-word fixation positions is primarily based on rather subtle differences in the low-level visual information acquired before saccades. Using arrangements of "x" letter strings, we reproduced the effect of skipped character strings in a highly controlled single-saccade task. Our results demonstrate that the effect of skipped words in reading is the signature of a general visuomotor phenomenon. Moreover, our findings extend beyond the scope of the widely accepted range-error model, which posits that within-word fixation positions in reading depend solely on the distances of target words. We expect that our results will provide critical boundary conditions for the development of visuomotor models of saccade planning during reading.
Saccades move objects of interest into the center of the visual field for high-acuity visual analysis. White, Stritzke, and Gegenfurtner (Current Biology, 18, 124–128, 2008) have shown that saccadic latencies in the context of a structured background are much shorter than those with an unstructured background at equal levels of visibility. This effect has been explained by possible preactivation of the saccadic circuitry whenever a structured background acts as a mask for potential saccade targets. Here, we show that background textures modulate rates of microsaccades during visual fixation. First, after a display change, structured backgrounds induce a stronger decrease of microsaccade rates than do uniform backgrounds. Second, we demonstrate that the occurrence of a microsaccade in a critical time window can delay a subsequent saccadic response. Taken together, our findings suggest that microsaccades contribute to the saccadic facilitation effect, due to a modulation of microsaccade rates by properties of the background.
Linked linear mixed models
(2016)
The complexity of eye-movement control during reading allows measurement of many dependent variables, the most prominent ones being fixation durations and their locations in words. In current practice, either variable may serve as dependent variable or covariate for the other in linear mixed models (LMMs) featuring also psycholinguistic covariates of word recognition and sentence comprehension. Rather than analyzing fixation location and duration with separate LMMs, we propose linking the two according to their sequential dependency. Specifically, we include predicted fixation location (estimated in the first LMM from psycholinguistic covariates) and its associated residual fixation location as covariates in the second, fixation-duration LMM. This linked LMM affords a distinction between direct and indirect effects (mediated through fixation location) of psycholinguistic covariates on fixation durations. Results confirm the robustness of distributed processing in the perceptual span. They also offer a resolution of the paradox of the inverted optimal viewing position (IOVP) effect (i.e., longer fixation durations in the center than at the beginning and end of words) although the opposite (i.e., an OVP effect) is predicted from default assumptions of psycholinguistic processing efficiency: The IOVP effect in fixation durations is due to the residual fixation-location covariate, presumably driven primarily by saccadic error, and the OVP effect (at least the left part of it) is uncovered with the predicted fixation-location covariate, capturing the indirect effects of psycholinguistic covariates. We expect that linked LMMs will be useful for the analysis of other dynamically related multiple outcomes, a conundrum of most psychonomic research.
Understanding how humans move their eyes is an important part of understanding the functioning of the visual system. Analyzing eye movements from observations of natural scenes on a computer screen is a step toward understanding human visual behavior in the real world. When analyzing eye-movement data from scene-viewing experiments, the important questions are where (fixation locations), how long (fixation durations) and when (ordering of fixations) participants fixate on an image. By answering these questions, computational models can be developed which predict human scanpaths. Models serve as a tool to understand the underlying cognitive processes while observing an image, especially the allocation of visual attention.
The goal of this thesis is to provide new contributions to characterize and model human scanpaths on natural scenes. The results from this thesis will help to understand and describe certain systematic eye-movement tendencies, which are mostly independent of the image. One eye-movement tendency I focus on throughout this thesis is the tendency to fixate more in the center of an image than on the outer parts, called the central fixation bias. Another tendency, which I will investigate thoroughly, is the characteristic distribution of angles between successive eye movements.
The results serve to evaluate and improve a previously published model of scanpath generation from our laboratory, the SceneWalk model. Overall, six experiments were conducted for this thesis which led to the following five core results:
i) A spatial inhibition of return can be found in scene-viewing data. This means that locations which have already been fixated are afterwards avoided for a certain time interval (Chapter 2).
ii) The initial fixation position when observing an image has a long-lasting influence of up to five seconds on further scanpath progression (Chapter 2 & 3).
iii) The often described central fixation bias on images depends strongly on the duration of the initial fixation. Long-lasting initial fixations lead to a weaker central fixation bias than short fixations (Chapter 2 & 3).
iv) Human observers adjust their basic eye-movement parameters, like fixation durations and saccade amplitudes, to the visual properties of a target they look for in visual search (Chapter 4).
v) The angle between two adjacent saccades is an indicator for the selectivity of the upcoming saccade target (Chapter 4).
All results emphasize the importance of systematic behavioral eye-movement tendencies and dynamic aspects of human scanpaths in scene viewing.
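The angle between successive saccades, which this abstract treats as a systematic eye-movement tendency, can be computed from a sequence of fixation positions. The following is a minimal sketch under assumed conventions that are not taken from the thesis: fixations are `(x, y)` pairs, the angle is the change in direction between two consecutive saccades, and it is wrapped to the interval from -180 to 180 degrees.

```python
import math


def saccade_turning_angles(fixations):
    """Change in direction (degrees) between each pair of successive saccades."""
    angles = []
    triples = zip(fixations, fixations[1:], fixations[2:])
    for (x0, y0), (x1, y1), (x2, y2) in triples:
        first = math.atan2(y1 - y0, x1 - x0)    # direction of incoming saccade
        second = math.atan2(y2 - y1, x2 - x1)   # direction of outgoing saccade
        delta = math.degrees(second - first)
        # Wrap into [-180, 180): 0 means the saccade continues straight ahead.
        angles.append((delta + 180.0) % 360.0 - 180.0)
    return angles
```

Under this convention a saccade that continues in the same direction yields 0 degrees and a saccade that returns toward the previous fixation has a magnitude of 180 degrees. In screen coordinates where y grows downward, the sign of the angle is mirrored.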
Moving arms
(2018)
Embodied cognition postulates a bi-directional link between the human body and its cognitive functions. Whether this holds for higher cognitive functions such as problem solving is unknown. We predicted that arm movement manipulations performed by the participants could affect the problem-solving solutions. We tested this prediction in quantitative reasoning tasks that allowed two solutions to each problem (addition or subtraction). In two studies with healthy adults (N=53 and N=50), we found an effect of problem-congruent movements on problem solutions. Consistent with embodied cognition, sensorimotor information gained via right or left arm movements affects the solution in different types of problem-solving tasks.
While the influence of spatial-numerical associations in number categorization tasks has been well established, their role in mental arithmetic is less clear. It has been hypothesized that mental addition leads to rightward and upward shifts of spatial attention (along the "mental number line"), whereas subtraction leads to leftward and downward shifts. We addressed this hypothesis by analyzing spontaneous eye movements during mental arithmetic. Participants solved verbally presented arithmetic problems (e.g., 2 + 7, 8-3) aloud while looking at a blank screen. We found that eye movements reflected spatial biases in the ongoing mental operation: Gaze position shifted more upward when participants solved addition compared to subtraction problems, and the horizontal gaze position was partly determined by the magnitude of the operands. Interestingly, the difference between addition and subtraction trials was driven by the operator (plus vs. minus) but was not influenced by the computational process. Thus, our results do not support the idea of a mental movement toward the solution during arithmetic but indicate a semantic association between operation and space.
Eye movements serve as a window into ongoing visual-cognitive processes and can thus be used to investigate how people perceive real-world scenes. A key issue for understanding eye-movement control during scene viewing concerns the roles of central and peripheral vision, which process information differently and are therefore specialized for different tasks (object identification and peripheral target selection, respectively). Yet, rather little is known about the contributions of central and peripheral processing to gaze control and how they are coordinated within a fixation during scene viewing. Additionally, the factors determining fixation durations have long been neglected, as scene perception research has mainly focused on the factors determining fixation locations. The present thesis aimed to increase knowledge of how central and peripheral vision contribute to spatial and, in particular, to temporal aspects of eye-movement control during scene viewing. In a series of five experiments, we varied processing difficulty in the central or the peripheral visual field by attenuating selective parts of the spatial-frequency spectrum within these regions. Furthermore, we developed a computational model of how foveal and peripheral processing might be coordinated for the control of fixation duration. The thesis provides three main findings. First, the experiments indicate that increasing processing demands in central or peripheral vision do not necessarily prolong fixation durations; instead, stimulus-independent timing is adapted when processing becomes too difficult. Second, peripheral vision seems to play a prominent role in the control of fixation durations, a notion also implemented in the computational model. The model assumes that foveal and peripheral processing proceed largely in parallel and independently during fixation, but can interact to modulate fixation duration.
Thus, we propose that the variation in fixation durations can in part be accounted for by the interaction between central and peripheral processing. Third, the experiments indicate that saccadic behavior largely adapts to processing demands, with a bias of avoiding spatial-frequency filtered scene regions as saccade targets. We demonstrate that the observed saccade amplitude patterns reflect corresponding modulations of visual attention. The present work highlights the individual contributions and the interplay of central and peripheral vision for gaze control during scene viewing, particularly for the control of fixation duration. Our results entail new implications for computational models and for experimental research on scene perception.