TY  - THES
A1  - Garoufi, Konstantina
T1  - Interactive generation of effective discourse in situated context : a planning-based approach
T1  - Interaktive Generierung von effektivem Diskurs in situiertem Kontext: Ein planungsbasierter Ansatz
N2  - As modern built structures become increasingly complex, carrying out basic tasks such as identifying points or objects of interest in our surroundings can consume considerable time and cognitive resources. In this thesis, we present a computational approach to converting contextual information about a person's physical environment into natural language, with the aim of helping this person identify given task-related entities in their environment. Using efficient methods from automated planning, the field of artificial intelligence concerned with finding courses of action that can achieve a goal, we generate discourse that interactively guides a hearer through completing their task. Our approach addresses the challenges of controlling, adapting to, and monitoring the situated context. To this end, we develop a natural language generation system that plans how to manipulate the non-linguistic context of a scene in order to make it more favorable for references to task-related objects. This strategy distributes a hearer's cognitive load of interpreting a reference over multiple utterances rather than one long referring expression. Further, to optimize the system's linguistic choices in a given context, we learn how to distinguish speaker behavior according to its helpfulness to hearers in a certain situation, and we model the behavior of human speakers that has proven helpful. The resulting system combines symbolic with statistical reasoning, and tackles the problem of making non-trivial referential choices in rich context. Finally, we complement our approach with a mechanism for preventing potential misunderstandings after a reference has been generated. Employing remote eye-tracking technology, we monitor the hearer's gaze and find that it provides a reliable index of online referential understanding, even in dynamically changing scenes. We thus present a system that exploits hearer gaze to generate rapid feedback on a per-utterance basis, further enhancing its effectiveness. Though we evaluate our approach in virtual environments, the efficiency of our planning-based model suggests that this work could be a step towards effective conversational human-computer interaction situated in the real world.
N2  - The growing complexity of modern buildings and infrastructure means that everyday activities, such as identifying sought objects in our surroundings and finding places, can consume considerable time and cognitive resources. This dissertation presents computational methods that help a person identify target objects in their environment. Information about the person's situation and physical environment, the so-called situated context, is converted into natural language. In this way, discourse is generated that interactively guides a hearer towards reaching a goal or completing a task. The approach uses methods from planning, a field of artificial intelligence concerned with computing goal-directed sequences of actions. The methods presented in this thesis address the challenges of controlling, adapting to, and monitoring the situated context. To this end, a natural language generation system is first developed that plans how the non-linguistic context of a scene can be manipulated so that reference to relevant objects becomes easier. This makes it possible to distribute a hearer's cognitive load in interpreting a reference over several utterances. To optimize the system's linguistic choices in a given context, the approach further learns to distinguish speakers' utterances by how helpful they were to hearers in particular situations, and the behavior of human speakers that proved helpful is modeled. The resulting system combines symbolic and statistical reasoning and thus offers an approach to the problem of how non-trivial referential choices can be made in rich context. Finally, a complementary mechanism is presented that can prevent potential misunderstandings about generated references. For this purpose, eye-tracking technology is employed. By monitoring and analyzing the hearer's gaze, inferences can be drawn about how given references are interpreted; this mechanism works reliably even in dynamically changing scenes. A system is thus presented that uses the hearer's gaze to generate rapid feedback, which further improves the effectiveness of the discourse. The presented methods are evaluated in virtual environments. The efficiency of the planning-based model, however, indicates that the proposals made in this work can help realize effective language-based human-computer interaction in the real world as well.
KW  - natural language generation
KW  - human-computer interaction
KW  - situated context
KW  - effective discourse
KW  - automated planning
Y1  - 2013
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-69108
ER  - 
TY  - JOUR
A1  - Garoufi, Konstantina
A1  - Koller, Alexander
T1  - Generation of effective referring expressions in situated context
JF  - Language, cognition and neuroscience
N2  - In task-oriented communication, references often need to be effective in their distinctive function, that is, help the hearer identify the referent correctly and as effortlessly as possible. However, it can be challenging for computational or empirical studies to capture referential effectiveness. Empirical findings indicate that human-produced references are not always optimally effective, and that their effectiveness may depend on different aspects of the situational context that can evolve dynamically over the course of an interaction. On this basis, we propose a computational model of effective reference generation which distinguishes speaker behaviour according to its helpfulness to the hearer in a certain situation, and explicitly aims at modelling highly helpful speaker behaviour rather than all speaker behaviour indiscriminately. Our model, which extends the planning-based paradigm of sentence generation with a statistical account of effectiveness, can adapt to the situational context by making this distinction anew for each new reference. We find that the generated references resemble those of effective human speakers more closely than references of baseline models, and that they are resolved correctly more often than those of other models participating in a shared-task evaluation with human hearers. Finally, we argue that the model could serve as a methodological framework for computational and empirical research on referential effectiveness.
KW  - natural language generation
KW  - reference
KW  - referential effectiveness
Y1  - 2014
U6  - https://doi.org/10.1080/01690965.2013.847190
SN  - 2327-3798
SN  - 2327-3801
VL  - 29
IS  - 8
SP  - 986
EP  - 1001
PB  - Routledge, Taylor & Francis Group
CY  - Abingdon
ER  - 
TY  - JOUR
A1  - Garoufi, Konstantina
A1  - Staudte, Maria
A1  - Koller, Alexander
A1  - Crocker, Matthew W.
T1  - Exploiting Listener Gaze to Improve Situated Communication in Dynamic Virtual Environments
JF  - Cognitive science : a multidisciplinary journal of anthropology, artificial intelligence, education, linguistics, neuroscience, philosophy, psychology ; journal of the Cognitive Science Society
N2  - Beyond the observation that both speakers and listeners rapidly inspect the visual targets of referring expressions, it has been argued that such gaze may constitute part of the communicative signal. In this study, we investigate whether a speaker may, in principle, exploit listener gaze to improve communicative success. In the context of a virtual environment where listeners follow computer-generated instructions, we provide two kinds of support for this claim. First, we show that listener gaze provides a reliable real-time index of understanding even in dynamic and complex environments, and on a per-utterance basis. Second, we show that a language generation system that uses listener gaze to provide rapid feedback improves overall task performance in comparison with two systems that do not use gaze. Aside from demonstrating the utility of listener gaze in situated communication, our findings open the door to new methods for developing and evaluating multi-modal models of situated interaction.
KW  - Listener gaze
KW  - Eye-tracking
KW  - Referential understanding
KW  - Virtual environments
KW  - Situated communication
Y1  - 2016
U6  - https://doi.org/10.1111/cogs.12298
SN  - 0364-0213
SN  - 1551-6709
VL  - 40
SP  - 1671
EP  - 1703
PB  - Wiley-Blackwell
CY  - Hoboken
ER  -