• Treffer 1 von 1
Zurück zur Trefferliste

My Approach = Your Apparatus?

  • Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use cross-collection topic modeling for the exploration, clustering, and comparison of large sets of documents, such as digital libraries. However, topic modeling on documents from different collections is challenging because of domain-specific vocabulary. We present a cross-collection topic model combined with automatic domain term extraction and phrase segmentation. This model distinguishes collection-specific and collection-independent words based on information entropy and reveals commonalities and differences of multiple text collections. We evaluate our model on patents, scientific papers, newspaper articles, forum posts, and Wikipedia articles. In comparison to state-of-the-art cross-collection topic modeling, our model achieves up to 13% higher topic coherence, up to 4% lower perplexity, and up toComparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use cross-collection topic modeling for the exploration, clustering, and comparison of large sets of documents, such as digital libraries. However, topic modeling on documents from different collections is challenging because of domain-specific vocabulary. We present a cross-collection topic model combined with automatic domain term extraction and phrase segmentation. This model distinguishes collection-specific and collection-independent words based on information entropy and reveals commonalities and differences of multiple text collections. We evaluate our model on patents, scientific papers, newspaper articles, forum posts, and Wikipedia articles. In comparison to state-of-the-art cross-collection topic modeling, our model achieves up to 13% higher topic coherence, up to 4% lower perplexity, and up to 31% higher document classification accuracy. More importantly, our approach is the first topic model that ensures disjunct general and specific word distributions, resulting in clear-cut topic representations.zeige mehrzeige weniger

Metadaten exportieren

Weitere Dienste

Suche bei Google Scholar Statistik - Anzahl der Zugriffe auf das Dokument
Metadaten
Verfasserangaben:Julian RischGND, Ralf KrestelORCiDGND
DOI:https://doi.org/10.1145/3197026.3197038
ISBN:978-1-4503-5178-2
ISSN:2575-7865
ISSN:2575-8152
Titel des übergeordneten Werks (Englisch):Libraries
Untertitel (Englisch):Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections
Verlag:Association for Computing Machinery
Verlagsort:New York
Publikationstyp:Sonstiges
Sprache:Englisch
Datum der Erstveröffentlichung:23.05.2018
Erscheinungsjahr:2018
Datum der Freischaltung:24.02.2022
Freies Schlagwort / Tag:Automatic domain term extraction; Entropy; Topic modeling
Seitenanzahl:10
Erste Seite:283
Letzte Seite:292
Organisationseinheiten:Digital Engineering Fakultät / Hasso-Plattner-Institut für Digital Engineering GmbH
DDC-Klassifikation:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 000 Informatik, Informationswissenschaft, allgemeine Werke
Peer Review:Referiert
Publikationsweg:Open Access / Green Open-Access
Verstanden ✔
Diese Webseite verwendet technisch erforderliche Session-Cookies. Durch die weitere Nutzung der Webseite stimmen Sie diesem zu. Unsere Datenschutzerklärung finden Sie hier.