• Treffer 4 von 404
Zurück zur Trefferliste

Efficient distributed discovery of bidirectional order dependencies

  • Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical formBidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; it can, in particular, process much larger datasets.zeige mehrzeige weniger

Metadaten exportieren

Weitere Dienste

Suche bei Google Scholar Statistik - Anzahl der Zugriffe auf das Dokument
Metadaten
Verfasserangaben:Sebastian SchmidlORCiD, Thorsten PapenbrockORCiDGND
DOI:https://doi.org/10.1007/s00778-021-00683-4
ISSN:1066-8888
ISSN:0949-877X
Titel des übergeordneten Werks (Englisch):The VLDB journal
Verlag:Springer
Verlagsort:Berlin ; Heidelberg ; New York
Publikationstyp:Wissenschaftlicher Artikel
Sprache:Englisch
Datum der Erstveröffentlichung:16.08.2021
Erscheinungsjahr:2022
Datum der Freischaltung:31.08.2022
Freies Schlagwort / Tag:Actor; Bidirectional order dependencies; Data profiling; Dependency discovery; Distributed computing; Parallelization; programming
Band:31
Ausgabe:1
Seitenanzahl:26
Erste Seite:49
Letzte Seite:74
Fördernde Institution:Projekt DEAL
Organisationseinheiten:An-Institute / Hasso-Plattner-Institut für Digital Engineering gGmbH
DDC-Klassifikation:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Peer Review:Referiert
Publikationsweg:Open Access / Hybrid Open-Access
Lizenz (Deutsch):License LogoCC-BY - Namensnennung 4.0 International
Verstanden ✔
Diese Webseite verwendet technisch erforderliche Session-Cookies. Durch die weitere Nutzung der Webseite stimmen Sie diesem zu. Unsere Datenschutzerklärung finden Sie hier.