TY  - THES
A1  - Abdelwahab Hussein Abdelwahab Elsayed, Ahmed
T1  - Probabilistic, deep, and metric learning for biometric identiﬁcation from eye movements
N2  - A central insight from psychological studies on human eye movements is that eye movement patterns are highly individually characteristic. They can, therefore, be used as a biometric feature, that is, subjects can be identiﬁed based on their eye movements. This thesis introduces new machine learning methods to identify subjects based on their eye movements while viewing arbitrary content. The thesis focuses on probabilistic modeling of the problem, which has yielded the best results in the most recent literature. The thesis studies the problem in three phases by proposing a purely probabilistic, probabilistic deep learning, and probabilistic deep metric learning approach. In the ﬁrst phase, the thesis studies models that rely on psychological concepts about eye movements. Recent literature illustrates that individual-speciﬁc distributions of gaze patterns can be used to accurately identify individuals. In these studies, models were based on a simple parametric family of distributions. Such simple parametric models can be robustly estimated from sparse data, but have limited ﬂexibility to capture the differences between individuals. Therefore, this thesis proposes a semiparametric model of gaze patterns that is ﬂexible yet robust for individual identiﬁcation. These patterns can be understood as domain knowledge derived from psychological literature. Fixations and saccades are examples of simple gaze patterns. The proposed semiparametric densities are drawn under a Gaussian process prior centered at a simple parametric distribution. Thus, the model will stay close to the parametric class of densities if little data is available, but it can also deviate from this class if enough data is available, increasing the ﬂexibility of the model. The proposed method is evaluated on a large-scale dataset, showing signiﬁcant improvements over the state-of-the-art. Later, the thesis replaces the model based on gaze patterns derived from psychological concepts with a deep neural network that can learn more informative and complex patterns from raw eye movement data. As previous work has shown that the distribution of these patterns across a sequence is informative, a novel statistical aggregation layer called the quantile layer is introduced. It explicitly ﬁts the distribution of deep patterns learned directly from the raw eye movement data. The proposed deep learning approach is end-to-end learnable, such that the deep model learns to extract informative, short local patterns while the quantile layer learns to approximate the distributions of these patterns. Quantile layers are a generic approach that can converge to standard pooling layers or have a more detailed description of the features being pooled, depending on the problem. The proposed model is evaluated in a large-scale study using the eye movements of subjects viewing arbitrary visual input. The model improves upon the standard pooling layers and other statistical aggregation layers proposed in the literature. It also improves upon the state-of-the-art eye movement biometrics by a wide margin. Finally, for the model to identify any subject — not just the set of subjects it is trained on — a metric learning approach is developed. Metric learning learns a distance function over instances. The metric learning model maps the instances into a metric space, where sequences of the same individual are close, and sequences of diﬀerent individuals are further apart. This thesis introduces a deep metric learning approach with distributional embeddings. The approach represents sequences as a set of continuous distributions in a metric space; to achieve this, a new loss function based on Wasserstein distances is introduced. The proposed method is evaluated on multiple domains besides eye movement biometrics. This approach outperforms the state of the art in deep metric learning in several domains while also outperforming the state of the art in eye movement biometrics.
N2  - Die Art und Weise, wie wir unsere Augen bewegen, ist individuell charakteristisch. Augenbewegungen können daher zur biometrischen Identifikation verwendet werden. Die Dissertation stellt neuartige Methoden des maschinellen Lernens zur Identifzierung von Probanden anhand ihrer Blickbewegungen während des Betrachtens beliebiger visueller Inhalte vor. Die Arbeit konzentriert sich auf die probabilistische Modellierung des Problems, da dies die besten Ergebnisse in der aktuellsten Literatur liefert. Die Arbeit untersucht das Problem in drei Phasen.
In der ersten Phase stützt sich die Arbeit bei der Entwicklung eines probabilistischen Modells auf Wissen über Blickbewegungen aus der psychologischen Literatur. Existierende Studien haben gezeigt, dass die individuelle Verteilung von Blickbewegungsmustern verwendet werden kann, um Individuen genau zu identifizieren. Existierende probabilistische Modelle verwenden feste Verteilungsfamilien in Form von parametrischen Modellen, um diese Verteilungen zu approximieren. Die Verwendung solcher einfacher Verteilungsfamilien hat den Vorteil, dass sie robuste Verteilungsschätzungen auch auf kleinen Mengen von Beobachtungen ermöglicht. Ihre Flexibilität, Unterschiede zwischen Personen zu erfassen, ist jedoch begrenzt. Die Arbeit schlägt daher eine semiparametrische Modellierung der Blickmuster vor, die flexibel und dennoch robust individuelle Verteilungen von Blickbewegungsmustern schätzen kann. Die modellierten Blickmuster können als Domänenwissen verstanden werden, das aus der psychologischen Literatur abgeleitet ist. Beispielsweise werden Verteilungen über Fixationsdauern und Sprungweiten (Sakkaden) bei bestimmten Vor- und Rücksprüngen innerhalb des Textes modelliert. Das semiparametrische Modell bleibt nahe des parametrischen Modells, wenn nur wenige Daten verfügbar sind, kann jedoch auch vom parametrischen Modell abweichen, wenn genügend Daten verfügbar sind, wodurch die Flexibilität erhöht wird. Die Methode wird auf einem großen Datenbestand evaluiert und zeigt eine signifikante Verbesserung gegenüber dem Stand der Technik der Forschung zur biometrischen Identifikation aus Blickbewegungen.
Später ersetzt die Dissertation die zuvor untersuchten aus der psychologischen Literatur abgeleiteten Blickmuster durch ein auf tiefen neuronalen Netzen basierendes Modell, das aus den Rohdaten der Augenbewegungen informativere komplexe Muster lernen kann. Tiefe neuronale Netze sind eine Technik des maschinellen Lernens, bei der in komplexen, mehrschichtigen Modellen schrittweise abstraktere Merkmale aus Rohdaten extrahiert werden. Da frühere Arbeiten gezeigt haben, dass die Verteilung von Blickbewegungsmustern innerhalb einer Blickbewegungssequenz informativ ist, wird eine neue Aggrgationsschicht für tiefe neuronale Netze eingeführt, die explizit die Verteilung der gelernten Muster schätzt. Die vorgeschlagene Aggregationsschicht für tiefe neuronale Netze ist nicht auf die Modellierung von Blickbewegungen beschränkt, sondern kann als Verallgemeinerung von existierenden einfacheren Aggregationsschichten in beliebigen Anwendungen eingesetzt werden. Das vorgeschlagene Modell wird in einer umfangreichen Studie unter Verwendung von Augenbewegungen von Probanden evaluiert, die Videomaterial unterschiedlichen Inhalts und unterschiedlicher Länge betrachten. Das Modell verbessert die Identifikationsgenauigkeit im Vergleich zu tiefen neuronalen Netzen mit Standardaggregationsschichten und existierenden probabilistischen Modellen zur Identifikation aus Blickbewegungen.
Damit das Modell zum Anwendungszeitpunkt beliebige Probanden identifizieren kann, und nicht nur diejenigen Probanden, mit deren Daten es trainiert wurde, wird ein metrischer Lernansatz entwickelt. Beim metrischen Lernen lernt das Modell eine Funktion, mit der die Ähnlichkeit zwischen Blickbewegungssequenzen geschätzt werden kann. Das metrische Lernen bildet die Instanzen in einen neuen Raum ab, in dem Sequenzen desselben Individuums nahe beieinander liegen und Sequenzen verschiedener Individuen weiter voneinander entfernt sind. Die Dissertation stellt einen neuen metrischen Lernansatz auf Basis tiefer neuronaler Netze vor. Der Ansatz repäsentiert eine Sequenz in einem metrischen Raum durch eine Menge von Verteilungen. Das vorgeschlagene Verfahren ist nicht spezifisch für die Blickbewegungsmodellierung, und wird in unterschiedlichen Anwendungsproblemen empirisch evaluiert. Das Verfahren führt zu genaueren Modellen im Vergleich zu existierenden metrischen Lernverfahren und existierenden Modellen zur Identifikation aus Blickbewegungen.
KW  - probabilistic deep metric learning
KW  - probabilistic deep learning
KW  - biometrics
KW  - eye movements
KW  - biometrische Identifikation
KW  - Augenbewegungen
KW  - probabilistische tiefe neuronale Netze
KW  - probabilistisches tiefes metrisches Lernen
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-467980
ER  - 
TY  - JOUR
A1  - Adam, Maurits
A1  - Elsner, Birgit
T1  - The impact of salient action effects on 6-, 7-, and 11-month-olds’ goal-predictive gaze shifts for a human grasping action
JF  - PLOS ONE
N2  - When infants observe a human grasping action, experience-based accounts predict that all infants familiar with grasping actions should be able to predict the goal regardless of additional agency cues such as an action effect. Cue-based accounts, however, suggest that infants use agency cues to identify and predict action goals when the action or the agent is not familiar. From these accounts, we hypothesized that younger infants would need additional agency cues such as a salient action effect to predict the goal of a human grasping action, whereas older infants should be able to predict the goal regardless of agency cues. In three experiments, we presented 6-, 7-, and 11-month-olds with videos of a manual grasping action presented either with or without an additional salient action effect (Exp. 1 and 2), or we presented 7-month-olds with videos of a mechanical claw performing a grasping action presented with a salient action effect (Exp. 3). The 6-month-olds showed tracking gaze behavior, and the 11-month-olds showed predictive gaze behavior, regardless of the action effect. However, the 7-month-olds showed predictive gaze behavior in the action-effect condition, but tracking gaze behavior in the no-action-effect condition and in the action-effect condition with a mechanical claw. The results therefore support the idea that salient action effects are especially important for infants' goal predictions from 7 months on, and that this facilitating influence of action effects is selective for the observation of human hands.
KW  - attention
KW  - eye movements
KW  - infants perception
KW  - mechanisms
KW  - origins
Y1  - 2020
U6  - https://doi.org/10.1371/journal.pone.0240165
SN  - 1932-6203
VL  - 15
IS  - 10
PB  - Public Library of Science
CY  - San Fransisco
ER  - 
TY  - JOUR
A1  - Barthelme, Simon
A1  - Trukenbrod, Hans Arne
A1  - Engbert, Ralf
A1  - Wichmann, Felix A.
T1  - Modeling fixation locations using spatial point processes
JF  - Journal of vision
N2  - Whenever eye movements are measured, a central part of the analysis has to do with where subjects fixate and why they fixated where they fixated. To a first approximation, a set of fixations can be viewed as a set of points in space; this implies that fixations are spatial data and that the analysis of fixation locations can be beneficially thought of as a spatial statistics problem. We argue that thinking of fixation locations as arising from point processes is a very fruitful framework for eye-movement data, helping turn qualitative questions into quantitative ones. We provide a tutorial introduction to some of the main ideas of the field of spatial statistics, focusing especially on spatial Poisson processes. We show how point processes help relate image properties to fixation locations. In particular we show how point processes naturally express the idea that image features' predictability for fixations may vary from one image to another. We review other methods of analysis used in the literature, show how they relate to point process theory, and argue that thinking in terms of point processes substantially extends the range of analyses that can be performed and clarify their interpretation.
KW  - eye movements
KW  - fixation locations
KW  - saliency
KW  - modeling
KW  - point process
KW  - spatial statistics
Y1  - 2013
U6  - https://doi.org/10.1167/13.12.1
SN  - 1534-7362
VL  - 13
IS  - 12
PB  - Association for Research in Vision and Opthalmology
CY  - Rockville
ER  - 
TY  - THES
A1  - Cajar, Anke
T1  - Eye-movement control during scene viewing
T1  - Blicksteuerung bei der Betrachtung natürlicher Szenen
BT  - the roles of central and peripheral vision
BT  - die Rollen des zentralen und peripheren Sehens
N2  - Eye movements serve as a window into ongoing visual-cognitive processes and can thus be used to investigate how people perceive real-world scenes. A key issue for understanding eye-movement control during scene viewing is the roles of central and peripheral vision, which process information differently and are therefore specialized for different tasks (object identification and peripheral target selection respectively). Yet, rather little is known about the contributions of central and peripheral processing to gaze control and how they are coordinated within a fixation during scene viewing. Additionally, the factors determining fixation durations have long been neglected, as scene perception research has mainly been focused on the factors determining fixation locations. The present thesis aimed at increasing the knowledge on how central and peripheral vision contribute to spatial and, in particular, to temporal aspects of eye-movement control during scene viewing. In a series of five experiments, we varied processing difficulty in the central or the peripheral visual field by attenuating selective parts of the spatial-frequency spectrum within these regions. Furthermore, we developed a computational model on how foveal and peripheral processing might be coordinated for the control of fixation duration. The thesis provides three main findings. First, the experiments indicate that increasing processing demands in central or peripheral vision do not necessarily prolong fixation durations; instead, stimulus-independent timing is adapted when processing becomes too difficult. Second, peripheral vision seems to play a prominent role in the control of fixation durations, a notion also implemented in the computational model. The model assumes that foveal and peripheral processing proceed largely in parallel and independently during fixation, but can interact to modulate fixation duration. Thus, we propose that the variation in fixation durations can in part be accounted for by the interaction between central and peripheral processing. Third, the experiments indicate that saccadic behavior largely adapts to processing demands, with a bias of avoiding spatial-frequency filtered scene regions as saccade targets. We demonstrate that the observed saccade amplitude patterns reflect corresponding modulations of visual attention. The present work highlights the individual contributions and the interplay of central and peripheral vision for gaze control during scene viewing, particularly for the control of fixation duration. Our results entail new implications for computational models and for experimental research on scene perception.
N2  - Blickbewegungen stellen ein Fenster in aktuelle visuell-kognitive Prozesse dar und können genutzt werden um zu untersuchen wie Menschen natürliche Szenen wahrnehmen. Eine zentrale Frage ist, welche Rollen zentrales und peripheres Sehen für die Blicksteuerung in Szenen spielen, da sie Information unterschiedlich verarbeiten und auf verschiedene Aufgaben spezialisiert sind (Objektidentifikation bzw. periphere Zielauswahl). Jedoch ist kaum bekannt, welche Beiträge zentrale und periphere Verarbeitung für die Blicksteuerung in Szenen leisten und wie sie während der Fixation koordiniert werden. Des Weiteren wurden Einflussfaktoren auf Fixationsdauern bisher vernachlässigt, da die Forschung zur Szenenwahrnehmung hauptsächlich auf Einflussfaktoren auf Fixationsorte fokussiert war. Die vorliegende Arbeit hatte zum Ziel, das Wissen über die Beiträge des zentralen und peripheren Sehens zu räumlichen, aber vor allem zu zeitlichen Aspekten der Blicksteuerung in Szenen zu erweitern. In einer Serie von fünf Experimenten haben wir die Verarbeitungsschwierigkeit im zentralen oder peripheren visuellen Feld durch die Abschwächung selektiver Raumfrequenzanteile innerhalb dieser Regionen variiert. Des Weiteren haben wir ein computationales Modell zur Koordination von fovealer und peripherer Verarbeitung für die Kontrolle von Fixationsdauern entwickelt. Die Arbeit liefert drei Hauptbefunde. Erstens zeigen die Experimente, dass erhöhte Verarbeitungsanforderungen im zentralen oder peripheren visuellen Feld nicht zwangsläufig zu längeren Fixationsdauern führen; stattdessen werden Fixationsdauern stimulus-unabhängig gesteuert, wenn die Verarbeitung zu schwierig wird. Zweitens scheint peripheres Sehen eine entscheidene Rolle für die Kontrolle von Fixationsdauern zu spielen, eine Idee, die auch im computationalen Modell umgesetzt wurde. Das Modell nimmt an, dass foveale und periphere Verarbeitung während der Fixation weitgehend parallel und unabhängig ablaufen, aber interagieren können um Fixationsdauern zu modulieren. Wir schlagen somit vor, dass Änderungen in Fixationsdauern zum Teil auf die Interaktion von zentraler und peripherer Verarbeitung zurückgeführt werden können. Drittens zeigen die Experimente, dass räumliches Blickverhalten sich weitgehend an Verarbeitungsanforderungen anpasst und Betrachter Szenenregionen mit Raumfrequenzfilterung als Sakkadenziele vermeiden. Wir zeigen, dass diese Sakkadenamplitudeneffekte entsprechende Modulationen der visuellen Aufmerksamkeit reflektieren. Die vorliegende Arbeit hebt die einzelnen Beiträge und das Zusammenspiel zentralen und peripheren Sehens für die Blicksteuerung in der Szenenwahrnehmung hervor, besonders für die Kontrolle von Fixationsdauern. Unsere Ergebnisse geben neue Implikationen für computationale Modelle und experimentelle Forschung zur Szenenwahrnehmung.
KW  - psychology
KW  - eye movements
KW  - scene perception
KW  - Psychologie
KW  - Blickbewegungen
KW  - Szenenwahrnehmung
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-395536
ER  - 
TY  - GEN
A1  - Cajar, Anke
A1  - Engbert, Ralf
A1  - Laubrock, Jochen
T1  - Potsdam Eye-Movement Corpus for Scene Memorization and Search With Color and Spatial-Frequency Filtering
T2  - Zweitveröffentlichungen der Universität Potsdam : Humanwissenschaftliche Reihe
T3  - Zweitveröffentlichungen der Universität Potsdam : Humanwissenschaftliche Reihe - 788 
KW  - eye movements
KW  - corpus dataset
KW  - scene viewing
KW  - object search
KW  - scene memorization
KW  - spatial frequencies
KW  - color
KW  - central and peripheral vision
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-563184
SN  - 1866-8364
SP  - 1
EP  - 7
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Cajar, Anke
A1  - Engbert, Ralf
A1  - Laubrock, Jochen
T1  - Potsdam Eye-Movement Corpus for Scene Memorization and Search With Color and Spatial-Frequency Filtering
JF  - Frontiers in psychology / Frontiers Research Foundation
KW  - eye movements
KW  - corpus dataset
KW  - scene viewing
KW  - object search
KW  - scene memorization
KW  - spatial frequencies
KW  - color
KW  - central and peripheral vision
Y1  - 2022
U6  - https://doi.org/10.3389/fpsyg.2022.850482
SN  - 1664-1078
VL  - 13
SP  - 1
EP  - 7
PB  - Frontiers Research Foundation
CY  - Lausanne, Schweiz
ER  - 
TY  - JOUR
A1  - Cajar, Anke
A1  - Engbert, Ralf
A1  - Laubrock, Jochen
T1  - How spatial frequencies and color drive object search in real-world scenes
BT  - a new eye-movement corpus
JF  - Journal of vision
N2  - When studying how people search for objects in scenes, the inhomogeneity of the visual field is often ignored. Due to physiological limitations, peripheral vision is blurred and mainly uses coarse-grained information (i.e., low spatial frequencies) for selecting saccade targets, whereas high-acuity central vision uses fine-grained information (i.e., high spatial frequencies) for analysis of details. Here we investigated how spatial frequencies and color affect object search in real-world scenes. Using gaze-contingent filters, we attenuated high or low frequencies in central or peripheral vision while viewers searched color or grayscale scenes. Results showed that peripheral filters and central high-pass filters hardly affected search accuracy, whereas accuracy dropped drastically with central low-pass filters. Peripheral filtering increased the time to localize the target by decreasing saccade amplitudes and increasing number and duration of fixations. The use of coarse-grained information in the periphery was limited to color scenes. Central filtering increased the time to verify target identity instead, especially with low-pass filters. We conclude that peripheral vision is critical for object localization and central vision is critical for object identification. Visual guidance during peripheral object localization is dominated by low-frequency color information, whereas high-frequency information, relatively independent of color, is most important for object identification in central vision.
KW  - scene viewing
KW  - eye movements
KW  - object search
KW  - central and peripheral
KW  - vision
KW  - spatial frequencies
KW  - color
KW  - gaze-contingent displays
Y1  - 2020
U6  - https://doi.org/10.1167/jov.20.7.8
SN  - 1534-7362
VL  - 20
IS  - 7
PB  - Association for Research in Vision and Opthalmology
CY  - Rockville
ER  - 
TY  - JOUR
A1  - Cunnings, Ian
A1  - Patterson, Clare
A1  - Felser, Claudia
T1  - Structural constraints on pronoun binding and coreference: evidence from eye movements during reading
JF  - Frontiers in psychology
N2  - A number of recent studies have investigated how syntactic and non-syntactic constraints combine to cue memory retrieval during anaphora resolution. In this paper we investigate how syntactic constraints and gender congruence interact to guide memory retrieval during the resolution of subject pronouns. Subject pronouns are always technically ambiguous, and the application of syntactic constraints on their interpretation depends on properties of the antecedent that is to be retrieved. While pronouns can freely corefer with non-quantified referential antecedents, linking a pronoun to a quantified antecedent is only possible in certain syntactic configurations via variable binding. We report the results from a judgment task and three online reading comprehension experiments investigating pronoun resolution with quantified and non-quantified antecedents. Results from both the judgment task and participants' eye movements during reading indicate that comprehenders freely allow pronouns to corefer with non-quantified antecedents, but that retrieval of quantified antecedents is restricted to specific syntactic environments. We interpret our findings as indicating that syntactic constraints constitute highly weighted cues to memory retrieval during anaphora resolution.
KW  - pronoun resolution
KW  - memory retrieval
KW  - quantification
KW  - eye movements
KW  - reading
KW  - English
Y1  - 2015
U6  - https://doi.org/10.3389/fpsyg.2015.00840
SN  - 1664-1078
VL  - 6
PB  - Frontiers Research Foundation
CY  - Lausanne
ER  - 
TY  - JOUR
A1  - Engbert, Ralf
A1  - Trukenbrod, Hans Arne
A1  - Barthelme, Simon
A1  - Wichmann, Felix A.
T1  - Spatial statistics and attentional dynamics in scene viewing
JF  - Journal of vision
N2  - In humans and in foveated animals visual acuity is highly concentrated at the center of gaze, so that choosing where to look next is an important example of online, rapid decision-making. Computational neuroscientists have developed biologically-inspired models of visual attention, termed saliency maps, which successfully predict where people fixate on average. Using point process theory for spatial statistics, we show that scanpaths contain, however, important statistical structure, such as spatial clustering on top of distributions of gaze positions. Here, we develop a dynamical model of saccadic selection that accurately predicts the distribution of gaze positions as well as spatial clustering along individual scanpaths. Our model relies on activation dynamics via spatially-limited (foveated) access to saliency information, and, second, a leaky memory process controlling the re-inspection of target regions. This theoretical framework models a form of context-dependent decision-making, linking neural dynamics of attention to behavioral gaze data.
KW  - scene perception
KW  - eye movements
KW  - attention
KW  - saccades
KW  - modeling
KW  - spatial statistics
Y1  - 2015
U6  - https://doi.org/10.1167/15.1.14
SN  - 1534-7362
VL  - 15
IS  - 1
PB  - Association for Research in Vision and Opthalmology
CY  - Rockville
ER  - 
TY  - JOUR
A1  - Felser, Claudia
A1  - Patterson, Clare
A1  - Cunnings, Ian
T1  - Structural constraints on pronoun binding and coreference: Evidence from eye movements during reading
JF  - Frontiers in psychology
N2  - A number of recent studies have investigated how syntactic and non-syntactic constraints combine to cue memory retrieval during anaphora resolution. In this paper we investigate how syntactic constraints and gender congruence interact to guide memory retrieval during the resolution of subject pronouns. Subject pronouns are always technically ambiguous, and the application of syntactic constraints on their interpretation depends on properties of the antecedent that is to be retrieved. While pronouns can freely corefer with non-quantified referential antecedents, linking a pronoun to a quantified antecedent is only possible in certain syntactic configurations via variable binding. We report the results from a judgment task and three online reading comprehension experiments investigating pronoun resolution with quantified and non-quantified antecedents. Results from both the judgment task and participants' eye movements during reading indicate that comprehenders freely allow pronouns to corefer with non-quantified antecedents, but that retrieval of quantified antecedents is restricted to specific syntactic environments. We interpret our findings as indicating that syntactic constraints constitute highly weighted cues to memory retrieval during anaphora resolution.
KW  - pronoun resolution
KW  - memory retrieval
KW  - quantification
KW  - eye movements
KW  - reading
KW  - English
Y1  - 2015
U6  - https://doi.org/10.3389/fpsyg.2015.00840
SN  - 1664-1078
VL  - 6
IS  - 840
PB  - Frontiers Research Foundation
CY  - Lausanne
ER  -