Bottom-up and top-down as well as low-level and high-level factors influence where we fixate when viewing natural scenes. However, the importance of each of these factors and how they interact remains a matter of debate. Here, we disentangle these factors by analyzing their influence over time. For this purpose, we develop a saliency model that is based on the internal representation of a recent early spatial vision model to measure the low-level, bottom-up factor. To measure the influence of high-level, bottom-up features, we use a recent deep neural network-based saliency model. To account for top-down influences, we evaluate the models on two large data sets with different tasks: first, a memorization task and, second, a search task. Our results lend support to a separation of visual scene exploration into three phases: the first saccade, an initial guided exploration characterized by a gradual broadening of the fixation density, and a steady state that is reached after roughly 10 fixations. Saccade-target selection during the initial exploration and in the steady state is related to similar areas of interest, which are better predicted when including high-level features. In the search data set, fixation locations are determined predominantly by top-down processes. In contrast, the first fixation follows a different fixation density and contains a strong central fixation bias. Nonetheless, first fixations are guided strongly by image properties, and as early as 200 ms after image onset, fixations are better predicted by high-level information. We conclude that any low-level, bottom-up factors are mainly limited to the generation of the first saccade. All saccades are better explained when high-level features are considered, and later, this high-level, bottom-up control can be overruled by top-down influences.
Recent studies using the gaze-contingent boundary paradigm reported a reversed preview benefit: shorter fixations on a target word when an unrelated preview was easier to process than the fixated target (Schotter & Leinenger, 2016). This is explained via forced fixations: short fixations on words that would ideally be skipped (because lexical processing has progressed enough) but could not be because saccade planning reached a point of no return. This contrasts with accounts of preview effects via trans-saccadic integration, that is, shorter fixations on a target word when the preview is more similar to it (see Cutter, Drieghe, & Liversedge, 2015). In addition, if the previewed word, not the fixated target, determines subsequent eye movements, is it also this word that enters the linguistic processing stream? We tested these accounts by having 24 subjects read 150 sentences in the boundary paradigm in which both the preview and target were initially plausible but later one, both, or neither became implausible, providing an opportunity to probe which one was linguistically encoded. In an intervening buffer region, both words were plausible, providing an opportunity to investigate trans-saccadic integration. The frequency of the previewed word affected progressive saccades (i.e., forced fixations) as well as when trans-saccadic integration failure increased regressions, but only the implausibility of the target word affected semantic encoding. These data support a hybrid account of saccadic control (Reingold, Reichle, Glaholt, & Sheridan, 2012) driven by incomplete (often parafoveal) word recognition, which occurs prior to complete (often foveal) word recognition.
Moving arms
(2018)
Embodied cognition postulates a bi-directional link between the human body and its cognitive functions. Whether this holds for higher cognitive functions such as problem solving is unknown. We predicted that arm movement manipulations performed by the participants could affect the problem-solving solutions. We tested this prediction in quantitative reasoning tasks that allowed two solutions to each problem (addition or subtraction). In two studies with healthy adults (N=53 and N=50), we found an effect of problem-congruent movements on problem solutions. Consistent with embodied cognition, sensorimotor information gained via right or left arm movements affects the solution in different types of problem-solving tasks.
A central insight from psychological studies on human eye movements is that eye movement patterns are highly individually characteristic. They can, therefore, be used as a biometric feature, that is, subjects can be identified based on their eye movements. This thesis introduces new machine learning methods to identify subjects based on their eye movements while viewing arbitrary content. The thesis focuses on probabilistic modeling of the problem, which has yielded the best results in the most recent literature. The thesis studies the problem in three phases by proposing a purely probabilistic, probabilistic deep learning, and probabilistic deep metric learning approach. In the first phase, the thesis studies models that rely on psychological concepts about eye movements. Recent literature illustrates that individual-specific distributions of gaze patterns can be used to accurately identify individuals. In these studies, models were based on a simple parametric family of distributions. Such simple parametric models can be robustly estimated from sparse data, but have limited flexibility to capture the differences between individuals. Therefore, this thesis proposes a semiparametric model of gaze patterns that is flexible yet robust for individual identification. These patterns can be understood as domain knowledge derived from psychological literature. Fixations and saccades are examples of simple gaze patterns. The proposed semiparametric densities are drawn under a Gaussian process prior centered at a simple parametric distribution. Thus, the model will stay close to the parametric class of densities if little data is available, but it can also deviate from this class if enough data is available, increasing the flexibility of the model. The proposed method is evaluated on a large-scale dataset, showing significant improvements over the state-of-the-art. 
Later, the thesis replaces the model based on gaze patterns derived from psychological concepts with a deep neural network that can learn more informative and complex patterns from raw eye movement data. As previous work has shown that the distribution of these patterns across a sequence is informative, a novel statistical aggregation layer called the quantile layer is introduced. It explicitly fits the distribution of deep patterns learned directly from the raw eye movement data. The proposed deep learning approach is end-to-end learnable, such that the deep model learns to extract informative, short local patterns while the quantile layer learns to approximate the distributions of these patterns. Quantile layers are a generic approach that can converge to standard pooling layers or have a more detailed description of the features being pooled, depending on the problem. The proposed model is evaluated in a large-scale study using the eye movements of subjects viewing arbitrary visual input. The model improves upon the standard pooling layers and other statistical aggregation layers proposed in the literature. It also improves upon the state-of-the-art eye movement biometrics by a wide margin. Finally, for the model to identify any subject — not just the set of subjects it is trained on — a metric learning approach is developed. Metric learning learns a distance function over instances. The metric learning model maps the instances into a metric space, where sequences of the same individual are close, and sequences of different individuals are further apart. This thesis introduces a deep metric learning approach with distributional embeddings. The approach represents sequences as a set of continuous distributions in a metric space; to achieve this, a new loss function based on Wasserstein distances is introduced. The proposed method is evaluated on multiple domains besides eye movement biometrics. 
This approach outperforms the state of the art in deep metric learning in several domains while also outperforming the state of the art in eye movement biometrics.
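As a rough illustration of the idea behind quantile-based aggregation, the following Python sketch summarizes one feature's activations over a sequence by a few empirical quantiles. It is a simplification under stated assumptions and not the thesis' implementation: the quantile levels are fixed here rather than learned end-to-end, and the names `quantile` and `quantile_pool` are hypothetical.

```python
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Empirical quantile with linear interpolation; q must lie in [0, 1]."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)        # fractional index into the sorted values
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1.0 - frac) + ordered[upper] * frac


def quantile_pool(feature_over_time: Sequence[float],
                  levels: Sequence[float]) -> List[float]:
    """Describe the distribution of one feature across a sequence.

    Assumption: the levels are fixed by the caller (not learned).
    """
    return [quantile(feature_over_time, q) for q in levels]


# Hypothetical activations of one learned pattern along an eye-movement sequence.
activations = [0.2, 1.5, 0.7, 0.1, 0.9]
summary = quantile_pool(activations, [0.0, 0.5, 1.0])  # minimum, median, maximum
```

With the single level 1.0 this reduces to max pooling, which matches the abstract's remark that quantile-based aggregation can fall back to a standard pooling operation; adding more levels gives a more detailed description of the pooled feature.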
When watching the image of a natural scene on a computer screen, observers initially move their eyes toward the center of the image—a reliable experimental finding termed central fixation bias. This systematic tendency in eye guidance likely masks attentional selection driven by image properties and top-down cognitive processes. Here, we show that the central fixation bias can be reduced by delaying the initial saccade relative to image onset. In four scene-viewing experiments we manipulated observers' initial gaze position and delayed their first saccade by a specific time interval relative to the onset of an image. We analyzed the distance to image center over time and show that the central fixation bias of initial fixations was significantly reduced after delayed saccade onsets. We additionally show that selection of the initial saccade target strongly depended on the first saccade latency. A previously published model of saccade generation was extended with a central activation map on the initial fixation whose influence declined with increasing saccade latency. This extension was sufficient to replicate the central fixation bias from our experiments. Our results suggest that the central fixation bias is generated by default activation as a response to the sudden image onset and that this default activation pattern decreases over time. Thus, it may often be preferable to use a modified version of the scene viewing paradigm that decouples image onset from the start signal for scene exploration to explicitly reduce the central fixation bias.
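The "distance to image center over time" analysis can be sketched in a few lines of Python. This is a minimal sketch, assuming that time is indexed by fixation number and that fixations are given as pixel coordinates; the abstract does not specify either choice, and the function name `mean_distance_to_center` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def mean_distance_to_center(scanpaths: Sequence[Sequence[Point]],
                            image_size: Tuple[float, float]) -> List[float]:
    """Mean Euclidean distance to the image center for each fixation index.

    Assumption: each scanpath is a list of (x, y) fixation positions in pixels,
    ordered in time, and image_size is (width, height).
    """
    width, height = image_size
    cx, cy = width / 2.0, height / 2.0
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for path in scanpaths:
        for index, (x, y) in enumerate(path):
            sums[index] += math.hypot(x - cx, y - cy)
            counts[index] += 1
    # One value per fixation index, averaged over all scanpaths that reach it.
    return [sums[i] / counts[i] for i in sorted(counts)]


# Two toy scanpaths on a 100 x 100 image, both starting at the center.
toy_paths = [[(50.0, 50.0), (80.0, 90.0)], [(50.0, 50.0), (20.0, 10.0)]]
profile = mean_distance_to_center(toy_paths, (100.0, 100.0))
```

A strong central fixation bias would show up as small values at early indices; comparing such profiles between immediate and delayed saccade onsets is one way to express the reduction the abstract reports.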
During visual fixation, the eye generates microsaccades and slower components of fixational eye movements that are part of the visual processing strategy in humans. Here, we show that ongoing heartbeat is coupled to temporal rate variations in the generation of microsaccades. Using coregistration of eye recording and ECG in humans, we tested the hypothesis that microsaccade onsets are coupled to the relative phase of the R-R intervals in heartbeats. We observed significantly more microsaccades during the early phase after the R peak in the ECG. This form of coupling between heartbeat and eye movements was substantiated by the additional finding of a coupling between heart phase and motion activity in slow fixational eye movements; i.e., retinal image slip caused by physiological drift. Our findings therefore demonstrate a coupling of the oculomotor system and ongoing heartbeat, which provides further evidence for bodily influences on visuomotor functioning.
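The relative phase of a microsaccade within the cardiac cycle can be illustrated as follows. This is a sketch under assumptions, not the authors' analysis code: R-peak times and microsaccade onsets are assumed to share one clock (in seconds), and the function name `cardiac_phase` is hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def cardiac_phase(event_time: float, r_peaks: Sequence[float]) -> Optional[float]:
    """Relative phase in [0, 1) of an event within its enclosing R-R interval.

    r_peaks must be sorted in ascending order. Returns None when the event
    falls before the first or at/after the last R peak, because no complete
    R-R interval encloses it.
    """
    i = bisect_right(r_peaks, event_time) - 1   # index of the preceding R peak
    if i < 0 or i + 1 >= len(r_peaks):
        return None
    start, end = r_peaks[i], r_peaks[i + 1]
    return (event_time - start) / (end - start)


# Toy example: three R peaks and two microsaccade onsets.
r_peak_times = [0.0, 1.0, 1.8]
phases = [cardiac_phase(t, r_peak_times) for t in (0.25, 1.4)]
```

A histogram of such phases over many microsaccades would then show whether onsets cluster early after the R peak, as the abstract describes.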
During reading, saccadic eye movements are generated to shift words into the center of the visual field for lexical processing. Recently, Krugel and Engbert (Vision Research 50:1532-1539, 2010) demonstrated that within-word fixation positions are largely shifted to the left after skipped words. However, explanations of the origin of this effect cannot be drawn from normal reading data alone. Here we show that the large effect of skipped words on the distribution of within-word fixation positions is primarily based on rather subtle differences in the low-level visual information acquired before saccades. Using arrangements of "x" letter strings, we reproduced the effect of skipped character strings in a highly controlled single-saccade task. Our results demonstrate that the effect of skipped words in reading is the signature of a general visuomotor phenomenon. Moreover, our findings extend beyond the scope of the widely accepted range-error model, which posits that within-word fixation positions in reading depend solely on the distances of target words. We expect that our results will provide critical boundary conditions for the development of visuomotor models of saccade planning during reading.
Saccades move objects of interest into the center of the visual field for high-acuity visual analysis. White, Stritzke, and Gegenfurtner (Current Biology, 18, 124–128, 2008) have shown that saccadic latencies in the context of a structured background are much shorter than those with an unstructured background at equal levels of visibility. This effect has been explained by possible preactivation of the saccadic circuitry whenever a structured background acts as a mask for potential saccade targets. Here, we show that background textures modulate rates of microsaccades during visual fixation. First, after a display change, structured backgrounds induce a stronger decrease of microsaccade rates than do uniform backgrounds. Second, we demonstrate that the occurrence of a microsaccade in a critical time window can delay a subsequent saccadic response. Taken together, our findings suggest that microsaccades contribute to the saccadic facilitation effect, due to a modulation of microsaccade rates by properties of the background.
Linked linear mixed models
(2016)
The complexity of eye-movement control during reading allows measurement of many dependent variables, the most prominent ones being fixation durations and their locations in words. In current practice, either variable may serve as dependent variable or covariate for the other in linear mixed models (LMMs) featuring also psycholinguistic covariates of word recognition and sentence comprehension. Rather than analyzing fixation location and duration with separate LMMs, we propose linking the two according to their sequential dependency. Specifically, we include predicted fixation location (estimated in the first LMM from psycholinguistic covariates) and its associated residual fixation location as covariates in the second, fixation-duration LMM. This linked LMM affords a distinction between direct and indirect effects (mediated through fixation location) of psycholinguistic covariates on fixation durations. Results confirm the robustness of distributed processing in the perceptual span. They also offer a resolution of the paradox of the inverted optimal viewing position (IOVP) effect (i.e., longer fixation durations in the center than at the beginning and end of words) although the opposite (i.e., an OVP effect) is predicted from default assumptions of psycholinguistic processing efficiency: The IOVP effect in fixation durations is due to the residual fixation-location covariate, presumably driven primarily by saccadic error, and the OVP effect (at least the left part of it) is uncovered with the predicted fixation-location covariate, capturing the indirect effects of psycholinguistic covariates. We expect that linked LMMs will be useful for the analysis of other dynamically related multiple outcomes, a conundrum of most psychonomic research.
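The two-stage linking logic can be illustrated with ordinary least squares. The sketch below is a deliberate simplification and not the authors' model: it has no random effects, uses a single covariate, and the names `ols` and `linked_fit` and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def linked_fit(covariate: Sequence[float],
               location: Sequence[float],
               duration: Sequence[float]) -> Tuple[float, float]:
    """Stage 1: location ~ covariate. Stage 2: duration ~ predicted + residual.

    OLS residuals are uncorrelated with the fitted values, so fitting the two
    stage-2 predictors separately gives the same slopes as fitting them jointly.
    """
    slope, intercept = ols(covariate, location)
    predicted: List[float] = [intercept + slope * c for c in covariate]
    residual: List[float] = [loc - p for loc, p in zip(location, predicted)]
    effect_predicted, _ = ols(predicted, duration)   # indirect (covariate-driven) path
    effect_residual, _ = ols(residual, duration)     # residual fixation-location path
    return effect_predicted, effect_residual


# Toy data (assumed values): word length, landing position, fixation duration in ms.
word_length = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
landing_pos = [2.5, 1.0, 2.5, 3.0, 2.5, 5.0]
fix_duration = [215.0, 255.0, 250.0, 260.0, 285.0, 265.0]
effects = linked_fit(word_length, landing_pos, fix_duration)
```

In this toy data the two stage-2 slopes have opposite signs, which mirrors the abstract's point that predicted and residual fixation location can carry different effects on fixation duration.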
Understanding how humans move their eyes is an important part of understanding the functioning of the visual system. Analyzing eye movements from observations of natural scenes on a computer screen is a step toward understanding human visual behavior in the real world. When analyzing eye-movement data from scene-viewing experiments, the important questions are where (fixation locations), how long (fixation durations), and when (ordering of fixations) participants fixate on an image. By answering these questions, computational models can be developed which predict human scanpaths. Models serve as a tool to understand the cognitive processes underlying image viewing, especially the allocation of visual attention.
The goal of this thesis is to provide new contributions to characterize and model human scanpaths on natural scenes. The results from this thesis will help to understand and describe certain systematic eye-movement tendencies, which are mostly independent of the image. One eye-movement tendency I focus on throughout this thesis is the tendency to fixate more in the center of an image than on the outer parts, called the central fixation bias. Another tendency, which I will investigate thoroughly, is the characteristic distribution of angles between successive eye movements.
The results serve to evaluate and improve a previously published model of scanpath generation from our laboratory, the SceneWalk model. Overall, six experiments were conducted for this thesis which led to the following five core results:
i) A spatial inhibition of return can be found in scene-viewing data. This means that locations which have already been fixated are afterwards avoided for a certain time interval (Chapter 2).
ii) The initial fixation position when observing an image has a long-lasting influence of up to five seconds on further scanpath progression (Chapters 2 & 3).
iii) The often described central fixation bias on images depends strongly on the duration of the initial fixation. Long-lasting initial fixations lead to a weaker central fixation bias than short fixations (Chapters 2 & 3).
iv) Human observers adjust their basic eye-movement parameters, like fixation durations and saccade amplitudes, to the visual properties of a target they look for in visual search (Chapter 4).
v) The angle between two adjacent saccades is an indicator for the selectivity of the upcoming saccade target (Chapter 4).
All results emphasize the importance of systematic behavioral eye-movement tendencies and dynamic aspects of human scanpaths in scene viewing.
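The angle between two adjacent saccades, mentioned in result v), can be computed from three successive fixation positions. The following Python sketch assumes one particular convention that the abstract does not specify: 0 degrees means the second saccade continues in the same direction, and values near ±180 degrees mean a return toward the previous location. The function name `intersaccadic_angles` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def intersaccadic_angles(fixations: Sequence[Point]) -> List[float]:
    """Change in saccade direction, in degrees, for each pair of adjacent saccades.

    Results are wrapped into the interval [-180, 180).
    """
    angles: List[float] = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(fixations, fixations[1:], fixations[2:]):
        first = math.atan2(y1 - y0, x1 - x0)     # direction of the incoming saccade
        second = math.atan2(y2 - y1, x2 - x1)    # direction of the outgoing saccade
        change = math.degrees(second - first)
        angles.append((change + 180.0) % 360.0 - 180.0)
    return angles


# Toy scanpath: rightward saccade, then a 90-degree turn, then a reversal.
toy_scanpath = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
turns = intersaccadic_angles(toy_scanpath)
```

The distribution of these values over many scanpaths is the "characteristic distribution of angles between successive eye movements" that the thesis investigates.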
While the influence of spatial-numerical associations in number categorization tasks has been well established, their role in mental arithmetic is less clear. It has been hypothesized that mental addition leads to rightward and upward shifts of spatial attention (along the "mental number line"), whereas subtraction leads to leftward and downward shifts. We addressed this hypothesis by analyzing spontaneous eye movements during mental arithmetic. Participants solved verbally presented arithmetic problems (e.g., 2 + 7, 8-3) aloud while looking at a blank screen. We found that eye movements reflected spatial biases in the ongoing mental operation: Gaze position shifted more upward when participants solved addition compared to subtraction problems, and the horizontal gaze position was partly determined by the magnitude of the operands. Interestingly, the difference between addition and subtraction trials was driven by the operator (plus vs. minus) but was not influenced by the computational process. Thus, our results do not support the idea of a mental movement toward the solution during arithmetic but indicate a semantic association between operation and space.
Eye movements serve as a window into ongoing visual-cognitive processes and can thus be used to investigate how people perceive real-world scenes. A key issue for understanding eye-movement control during scene viewing is the roles of central and peripheral vision, which process information differently and are therefore specialized for different tasks (object identification and peripheral target selection respectively). Yet, rather little is known about the contributions of central and peripheral processing to gaze control and how they are coordinated within a fixation during scene viewing. Additionally, the factors determining fixation durations have long been neglected, as scene perception research has mainly been focused on the factors determining fixation locations. The present thesis aimed at increasing the knowledge on how central and peripheral vision contribute to spatial and, in particular, to temporal aspects of eye-movement control during scene viewing. In a series of five experiments, we varied processing difficulty in the central or the peripheral visual field by attenuating selective parts of the spatial-frequency spectrum within these regions. Furthermore, we developed a computational model on how foveal and peripheral processing might be coordinated for the control of fixation duration. The thesis provides three main findings. First, the experiments indicate that increasing processing demands in central or peripheral vision do not necessarily prolong fixation durations; instead, stimulus-independent timing is adapted when processing becomes too difficult. Second, peripheral vision seems to play a prominent role in the control of fixation durations, a notion also implemented in the computational model. The model assumes that foveal and peripheral processing proceed largely in parallel and independently during fixation, but can interact to modulate fixation duration. 
Thus, we propose that the variation in fixation durations can in part be accounted for by the interaction between central and peripheral processing. Third, the experiments indicate that saccadic behavior largely adapts to processing demands, with a bias of avoiding spatial-frequency filtered scene regions as saccade targets. We demonstrate that the observed saccade amplitude patterns reflect corresponding modulations of visual attention. The present work highlights the individual contributions and the interplay of central and peripheral vision for gaze control during scene viewing, particularly for the control of fixation duration. Our results entail new implications for computational models and for experimental research on scene perception.
The present study explored the perceptual span (i.e., the physical extent of an area from which useful visual information is extracted during a single fixation) during the reading of Chinese sentences in 2 experiments. In Experiment 1, we tested whether the rightward span can go beyond 3 characters when visually similar masks were used. Results showed that Chinese readers needed at least 4 characters to the right of fixation to maintain a normal reading behavior when visually similar masks were used and when characters were displayed in small fonts, indicating that the span is dynamically influenced by masking materials. In Experiments 2 and 3, we asked whether the perceptual span varies as a function of font size in spaced (German) and unspaced (Chinese) scripts. Results clearly suggest perceptual span depends on font size in Chinese, but we failed to find such evidence for German. We propose that the perceptual span in Chinese is flexible; it is strongly constrained by its language-specific properties such as high information density and lack of word spacing. Implications for saccade-target selection during the reading of Chinese sentences are discussed.
In humans and in foveated animals, visual acuity is highly concentrated at the center of gaze, so that choosing where to look next is an important example of online, rapid decision-making. Computational neuroscientists have developed biologically-inspired models of visual attention, termed saliency maps, which successfully predict where people fixate on average. Using point process theory for spatial statistics, we show, however, that scanpaths contain important statistical structure, such as spatial clustering, on top of distributions of gaze positions. Here, we develop a dynamical model of saccadic selection that accurately predicts the distribution of gaze positions as well as spatial clustering along individual scanpaths. Our model relies, first, on activation dynamics via spatially-limited (foveated) access to saliency information and, second, on a leaky memory process controlling the re-inspection of target regions. This theoretical framework models a form of context-dependent decision-making, linking neural dynamics of attention to behavioral gaze data.
A number of recent studies have investigated how syntactic and non-syntactic constraints combine to cue memory retrieval during anaphora resolution. In this paper we investigate how syntactic constraints and gender congruence interact to guide memory retrieval during the resolution of subject pronouns. Subject pronouns are always technically ambiguous, and the application of syntactic constraints on their interpretation depends on properties of the antecedent that is to be retrieved. While pronouns can freely corefer with non-quantified referential antecedents, linking a pronoun to a quantified antecedent is only possible in certain syntactic configurations via variable binding. We report the results from a judgment task and three online reading comprehension experiments investigating pronoun resolution with quantified and non-quantified antecedents. Results from both the judgment task and participants' eye movements during reading indicate that comprehenders freely allow pronouns to corefer with non-quantified antecedents, but that retrieval of quantified antecedents is restricted to specific syntactic environments. We interpret our findings as indicating that syntactic constraints constitute highly weighted cues to memory retrieval during anaphora resolution.
Although eye movements during reading are modulated by cognitive processing demands, they also reflect visual sampling of the input, and possibly preparation of output for speech or the inner voice. By simultaneously recording eye movements and the voice during reading aloud, we obtained an output measure that constrains the length of time spent on cognitive processing. Here we investigate the dynamics of the eye-voice span (EVS), the distance between eye and voice. We show that the EVS is regulated immediately during fixation of a word by either increasing fixation duration or programming a regressive eye movement against the reading direction. EVS size at the beginning of a fixation was positively correlated with the likelihood of regressions and refixations. Regression probability was further increased if the EVS was still large at the end of a fixation: if adjustment of fixation duration did not sufficiently reduce the EVS during a fixation, then a regression rather than a refixation followed with high probability. We further show that the EVS can help understand cognitive influences on fixation duration during reading: in mixed model analyses, the EVS was a stronger predictor of fixation durations than either word frequency or word length. The EVS modulated the influence of several other predictors on single fixation durations (SFDs). For example, word-N frequency effects were larger with a large EVS, especially when word N-1 frequency was low. Finally, a comparison of SFDs during oral and silent reading showed that reading is governed by similar principles in both reading modes, although EVS maintenance and articulatory processing also cause some differences. In summary, the EVS is regulated by adjusting fixation duration and/or by programming a regressive eye movement when the EVS gets too large. Overall, the EVS appears to be directly related to updating of the working memory buffer during reading.
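To make the eye-voice span concrete, the sketch below computes it at fixation onset from the articulation onsets of the words in a sentence. It rests on assumptions that the abstract does not state: the span is measured in words (not letters), words are indexed from 0, and the function name `eye_voice_span` is hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def eye_voice_span(fixation_onset: float,
                   fixated_word: int,
                   speech_onsets: Sequence[float]) -> Optional[int]:
    """Distance in words between the fixated word and the word being spoken.

    speech_onsets[i] is the time at which articulation of word i begins and
    must be sorted in ascending order. Returns None if no word has been
    articulated yet at fixation onset.
    """
    spoken_word = bisect_right(speech_onsets, fixation_onset) - 1
    if spoken_word < 0:
        return None
    return fixated_word - spoken_word


# Toy trial: the voice starts word 1 at 0.9 s; the eye lands on word 3 at 1.0 s.
onsets = [0.5, 0.9, 1.4]
span = eye_voice_span(1.0, 3, onsets)
```

Evaluating the same function at fixation offset instead of onset would give the end-of-fixation span that the abstract relates to regression probability.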
Word features in parafoveal vision influence eye movements during reading. The question of whether readers extract semantic information from parafoveal words was studied in 3 experiments by using a gaze-contingent display change technique. Subjects read German sentences containing 1 of several preview words that were replaced by a target word during the saccade to the preview (boundary paradigm). In the 1st experiment the preview word was semantically related or unrelated to the target. Fixation durations on the target were shorter for semantically related than unrelated previews, consistent with a semantic preview benefit. In the 2nd experiment, half the sentences were presented following the rules of German spelling (i.e., previews and targets were printed with an initial capital letter), and the other half were presented completely in lowercase. A semantic preview benefit was obtained under both conditions. In the 3rd experiment, we introduced 2 further preview conditions, an identical word and a pronounceable nonword, while also manipulating the text contrast. Whereas the contrast had negligible effects, fixation durations on the target were reliably different for all 4 types of preview. Semantic preview benefits were greater for pretarget fixations closer to the boundary (large preview space) and, although not as consistently, for long pretarget fixation durations (long preview time). The results constrain theoretical proposals about eye movement control in reading.
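The display-change logic of the boundary paradigm can be sketched as a small state machine. This is an illustrative simplification, assuming left-to-right reading, horizontal gaze samples in pixels, and an instantaneous display update; the function name `displayed_words` and the example words are hypothetical and are not taken from the experiments.

```python
from typing import List, Sequence


def displayed_words(gaze_x_samples: Sequence[float],
                    boundary_x: float,
                    preview: str,
                    target: str) -> List[str]:
    """Word shown at the target position for each gaze sample.

    The preview is shown until the gaze first crosses the invisible boundary;
    from then on the target is shown, even if the eyes later move back.
    """
    shown: List[str] = []
    crossed = False
    for x in gaze_x_samples:
        if x > boundary_x:
            crossed = True          # the display change is permanent
        shown.append(target if crossed else preview)
    return shown


# Toy gaze trace that crosses a boundary at x = 100 and then regresses.
trace = [10.0, 40.0, 95.0, 120.0, 80.0]
sequence = displayed_words(trace, 100.0, "preview_word", "target_word")
```

In an actual experiment the change has to complete during the saccade, so timing constraints of the eye tracker and the display matter; those are outside the scope of this sketch.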