• search hit 4 of 403
Back to Result List

Efficient distributed discovery of bidirectional order dependencies

  • Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical formBidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; it can, in particular, process much larger datasets.show moreshow less

Export metadata

Additional Services

Search Google Scholar Statistics
Metadaten
Author details:Sebastian SchmidlORCiD, Thorsten PapenbrockORCiDGND
DOI:https://doi.org/10.1007/s00778-021-00683-4
ISSN:1066-8888
ISSN:0949-877X
Title of parent work (English):The VLDB journal
Publisher:Springer
Place of publishing:Berlin ; Heidelberg ; New York
Publication type:Article
Language:English
Date of first publication:2021/08/16
Publication year:2022
Release date:2022/08/31
Tag:Actor; Bidirectional order dependencies; Data profiling; Dependency discovery; Distributed computing; Parallelization; programming
Volume:31
Issue:1
Number of pages:26
First page:49
Last Page:74
Funding institution:Projekt DEAL
Organizational units:An-Institute / Hasso-Plattner-Institut für Digital Engineering gGmbH
DDC classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Peer review:Referiert
Publishing method:Open Access / Hybrid Open-Access
License (German):License LogoCC-BY - Namensnennung 4.0 International
Accept ✔
This website uses technically necessary session cookies. By continuing to use the website, you agree to this. You can find our privacy policy here.