When humans engage in a conversation, many cognitive, linguistic and social forces and constraints are at play simultaneously. In particular, each utterance incrementally enriches the participants' common ground, as new bits of information accumulate into their shared set of knowledge, experiences, suppositions, beliefs and memories. For this to happen, interlocutors must understand each other, keep track of what has been shared, and collaborate to resolve misunderstandings. Modelling this grounding process is evidently very challenging, but it is also vital, both to shed light on how human dialogue works and to build applications that can interact safely with humans. The field of natural language processing has advanced through data-driven, end-to-end deep learning models, but this prevailing paradigm also has conceptual frailties when it comes to processing dialogue phenomena: currently popular encoders are by design not fully incremental, models are not trained with an explicit grounding signal, hidden representations are not directly interpretable, and static datasets abstract away many aspects of interactivity. In this thesis, I propose methods to evaluate the grounding competence of deep learning dialogue models from three perspectives: (i) incremental understanding, with timing and revisions; (ii) making information shared and processing the conversation history while considering the interlocutor's perspective; and (iii) requesting clarification while taking actions and dealing with uncertainty in collaborative settings. With a reflective summary of my publications on these three themes, I argue that cognitively motivated evaluation is an effective and useful approach to appreciating what current dialogue models, including chat-optimised large language models, can do, while standing on firm ground about their limitations and, most importantly, about the ethical concerns raised by their development and use.

