Exploring Change
Data and metadata in datasets experience many different kinds of change. Values are inserted, deleted or updated; rows appear and disappear; columns are added or repurposed, etc. In such a dynamic situation, users may have many questions about changes in the dataset, for instance: Which parts of the data are trustworthy and which are not? Users will wonder: How many changes have there been in the recent minutes, days or years? What kind of changes were made at which points of time? How dirty is the data? Is data cleansing required? The fact that data changed can hint at different hidden processes or agendas: a frequently crowd-updated city name may be controversial; a person whose name has been recently changed may be the target of vandalism; and so on. We show various use cases that benefit from recognizing and exploring such change.
We envision a system and methods to interactively explore such change, addressing the variability dimension of big data challenges. To this end, we propose a model to capture change and the process of exploring dynamic data to identify salient changes. We provide exploration primitives along with motivational examples and measures for the volatility of data. We identify technical challenges that need to be addressed to make our vision a reality, and propose directions of future work for the data management community.
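To make questions like "how many changes in the recent days?" concrete, here is a minimal sketch under stated assumptions. It is not the authors' change model or their volatility measures. The record layout (`Change` with timestamp, entity, attribute, new value), the helper names `changes_in_window` and `volatility`, and the definition of volatility as average changes per changed (entity, attribute) pair are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    # Hypothetical change record: when it happened, which entity and
    # attribute were touched, and the value written (None might denote a deletion).
    timestamp: datetime
    entity: str
    attribute: str
    new_value: object


def changes_in_window(changes, end, window):
    """Count changes per (entity, attribute) with timestamp in [end - window, end]."""
    start = end - window
    return Counter(
        (c.entity, c.attribute) for c in changes if start <= c.timestamp <= end
    )


def volatility(changes, end, window):
    """Hypothetical volatility: mean number of changes per changed
    (entity, attribute) pair in the window; 0.0 if nothing changed."""
    counts = changes_in_window(changes, end, window)
    return sum(counts.values()) / len(counts) if counts else 0.0


log = [
    Change(datetime(2018, 5, 1, 12, 0), "city_17", "name", "Name A"),
    Change(datetime(2018, 5, 1, 12, 5), "city_17", "name", "Name B"),
    Change(datetime(2018, 5, 1, 12, 9), "city_17", "name", "Name A"),
    Change(datetime(2018, 4, 1, 9, 0), "person_42", "name", "J. Doe"),
]
now = datetime(2018, 5, 1, 13, 0)
print(changes_in_window(log, now, timedelta(days=1)))  # Counter({('city_17', 'name'): 3})
print(volatility(log, now, timedelta(days=1)))  # 3.0
```

In this toy log, a city name flipped three times within an hour stands out as a candidate for the "controversial" case, while the single older name change falls outside the one-day window. Widening the window (e.g. to 60 days) brings it back into view, which illustrates why the time granularity of exploration matters.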
Authors: | Tobias Bleifuss, Leon Bornemann, Theodore Johnson, Dmitri Kalashnikov, Felix Naumann, Divesh Srivastava |
---|---|
DOI: | https://doi.org/10.14778/3282495.3282496 |
ISSN: | 2150-8097 |
Title of parent work (English): | Proceedings of the VLDB Endowment |
Subtitle (English): | a new dimension of data analytics |
Publisher: | Association for Computing Machinery |
Place of publication: | New York |
Publication type: | Scientific article |
Language: | English |
Year of first publication: | 2018 |
Year of publication: | 2018 |
Release date: | 2021-09-13 |
Volume: | 12 |
Issue: | 2 |
Number of pages: | 14 |
First page: | 85 |
Last page: | 98 |
Organizational units: | Faculty of Digital Engineering / Hasso-Plattner-Institut für Digital Engineering GmbH |
DDC classification: | 0 Computer science, information & general works / 00 Computer science, knowledge & systems |
Peer review: | Refereed |
Publication route: | Open Access / Green Open Access |