Beyond lexical frequencies: using R for text analysis in the digital humanities

This paper presents a combination of R packages (user-contributed toolkits written in a common core programming language) to facilitate the humanistic investigation of digitised, text-based corpora. Our survey of text analysis packages includes those of our own creation (cleanNLP and fasttextM) as well as packages built by other research groups (stringi, readtext, hyphenatr, quanteda, and hunspell). By operating on generic object types, these packages unite research innovations in corpus linguistics, natural language processing, machine learning, statistics, and digital humanities. We begin by extrapolating on the theoretical benefits of R as an elaborate gluing language for bringing together several areas of expertise and compare it to linguistic concordancers and other tool-based approaches to text analysis in the digital humanities. We then showcase the practical benefits of an ecosystem by illustrating how R packages have been integrated into a digital humanities project. Throughout, the focus is on moving beyond the bag-of-words, lexical frequency model by incorporating linguistically-driven analyses in research.

Metadata
Authors: Taylor Arnold, Nicolas Ballier, Paula Lisson, Lauren Tilton
DOI: https://doi.org/10.1007/s10579-019-09456-6
ISSN: 1574-020X
ISSN: 1574-0218
Parent title (English): Language resources and evaluation
Publisher: Springer
Place of publication: Dordrecht
Publication type: Scholarly article
Language: English
Year of first publication: 2019
Year of publication: 2019
Release date: 28 September 2020
Free keyword / tag: Digital humanities; R; Text interoperability; Text mining
Volume: 53
Issue: 4
Number of pages: 27
First page: 707
Last page: 733
Organizational units: Humanwissenschaftliche Fakultät / Strukturbereich Kognitionswissenschaften / Department Linguistik
DDC classification: 4 Language / 41 Linguistics / 410 Linguistics
Peer review: Refereed
Name of institution at time of publication: Humanwissenschaftliche Fakultät / Institut für Linguistik / Allgemeine Sprachwissenschaft