A principled approach to feature selection in models of sentence processing
Among theories of human language comprehension, cue-based memory retrieval has proven to be a useful framework for understanding when and how processing difficulty arises in the resolution of long-distance dependencies. Most previous work in this area has assumed that very general retrieval cues like [+subject] or [+singular] do the work of identifying (and sometimes misidentifying) a retrieval target in order to establish a dependency between words. However, recent work suggests that general, handpicked retrieval cues like these may not be enough to explain illusions of plausibility (Cunnings & Sturt, 2018), which can arise in sentences like "The letter next to the porcelain plate shattered." Capturing such retrieval interference effects requires lexically specific features and retrieval cues, but handpicking the features is hard to do in a principled way and greatly increases modeler degrees of freedom.
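The interference problem described above can be illustrated with a minimal sketch. It assumes a deliberately simplified matching scheme that is not the authors' model: each word in memory is a set of handpicked binary features, the verb supplies a set of retrieval cues, and an item's match score is the fraction of cues it satisfies. The feature [+shatterable] and the helper names `match_score` and `retrieve` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Set


def match_score(features: Set[str], cues: Set[str]) -> float:
    """Fraction of the retrieval cues that an item's features satisfy."""
    if not cues:
        return 0.0
    return len(features & cues) / len(cues)


def retrieve(memory: Dict[str, Set[str]], cues: Set[str]) -> Dict[str, float]:
    """Score every item in memory against the cues (hypothetical helper)."""
    return {word: match_score(feats, cues) for word, feats in memory.items()}


# "The letter next to the porcelain plate shattered."
# Handpicked, hypothetical features: [+shatterable] is exactly the kind of
# lexically specific feature that is hard to choose in a principled way.
memory = {
    "letter": {"+subject", "+singular"},
    "plate": {"+singular", "+shatterable"},
}
# The past-tense verb marks no number, so only two cues are assumed here.
cues_from_verb = {"+subject", "+shatterable"}

scores = retrieve(memory, cues_from_verb)
print(scores)
```

Each noun satisfies one of the two cues, so the cues alone do not single out the grammatical subject "letter"; the distractor "plate" competes for retrieval, which is the kind of interference the plausibility illusion reflects.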
To remedy this, we use well-established word embedding methods for creating distributed lexical feature representations that encode information relevant for retrieval using distributed retrieval cue vectors. We show that the similarity between the feature and cue vectors (a measure of plausibility) predicts total reading times in Cunnings and Sturt's eye-tracking data. The features can easily be plugged into existing parsing models (including cue-based retrieval and self-organized parsing), putting very different models on more equal footing and facilitating future quantitative comparisons.…
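A minimal sketch of the feature/cue similarity idea follows. It assumes cosine similarity as the comparison (a common choice for word embeddings; the abstract does not name the measure) and uses invented three-dimensional vectors in place of real embeddings. The vectors, the cue vector for "shattered", and the function name `cosine` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional feature vectors invented for illustration only;
# real word embeddings typically have hundreds of dimensions.
features: Dict[str, Vector] = {
    "letter": [0.9, 0.1, 0.0],
    "plate": [0.1, 0.8, 0.3],
}
# Hypothetical cue vector standing in for what "shattered" expects of its subject.
cue_shattered: Vector = [0.0, 0.9, 0.4]

# Higher similarity = the noun is a more plausible subject for the verb.
plausibility = {w: cosine(f, cue_shattered) for w, f in features.items()}
for word, sim in sorted(plausibility.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {sim:.3f}")
```

With these toy numbers, "plate" lies much closer to the cue vector than "letter", mirroring the intuition that a plate, not a letter, is a plausible thing to shatter. In the study itself, this kind of similarity was derived from word embeddings and related to total reading times.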
Author details: | Garrett Smith, Shravan Vasishth |
---|---|
DOI: | https://doi.org/10.1111/cogs.12918 |
ISSN (print): | 0364-0213 |
ISSN (online): | 1551-6709 |
PubMed ID: | https://pubmed.ncbi.nlm.nih.gov/33306205 |
Title of parent work (English): | Cognitive science : a multidisciplinary journal of anthropology, artificial intelligence, education, linguistics, neuroscience, philosophy, psychology ; journal of the Cognitive Science Society |
Publisher: | Wiley |
Place of publishing: | Hoboken |
Publication type: | Article |
Language: | English |
Date of first publication: | 2020/12/11 |
Publication year: | 2020 |
Release date: | 2022/11/14 |
Tag: | Cue‐based retrieval; features; linguistic; plausibility; word embeddings |
Volume: | 44 |
Issue: | 12 |
Article number: | e12918 |
Number of pages: | 25 |
Funding institution: | University of Potsdam |
Organizational units: | Faculty of Human Sciences / Cognitive Sciences Area / Department of Linguistics |
DDC classification: | 4 Language / 41 Linguistics / 410 Linguistics |
Peer review: | Refereed |
Publishing method: | Open Access / Hybrid Open Access |
License: | CC BY 4.0 (Creative Commons Attribution 4.0 International) |