TY  - JOUR
A1  - Schmidl, Sebastian
A1  - Papenbrock, Thorsten
T1  - Efficient distributed discovery of bidirectional order dependencies
JF  - The VLDB Journal
N2  - Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; it can, in particular, process much larger datasets.
KW  - Bidirectional order dependencies
KW  - Distributed computing
KW  - Actor programming
KW  - Parallelization
KW  - Data profiling
KW  - Dependency discovery
Y1  - 2021
U6  - https://doi.org/10.1007/s00778-021-00683-4
SN  - 1066-8888
SN  - 0949-877X
VL  - 31
IS  - 1
SP  - 49
EP  - 74
PB  - Springer
CY  - Berlin ; Heidelberg ; New York
ER  - 
TY  - JOUR
A1  - Wenig, Phillip
A1  - Schmidl, Sebastian
A1  - Papenbrock, Thorsten
T1  - TimeEval: a benchmarking toolkit for time series anomaly detection algorithms
JF  - Proceedings of the VLDB Endowment
N2  - Detecting anomalous subsequences in time series is an important task in time series analytics because it serves the identification of special events, such as production faults, delivery bottlenecks, system defects, or heart flicker. Consequently, many algorithms have been developed for the automatic detection of such anomalous patterns. The enormous number of approaches (i.e., more than 158 as of today), the lack of properly labeled test data, and the complexity of time series anomaly benchmarking have, though, led to a situation where choosing the best detection technique for a given anomaly detection task is a difficult challenge. In this demonstration, we present TIMEEVAL, an extensible, scalable and automatic benchmarking toolkit for time series anomaly detection algorithms. TIMEEVAL includes an extensive data generator and supports both interactive and batch evaluation scenarios. With our novel toolkit, we aim to ease the evaluation effort and help the community to provide more meaningful evaluations.
Y1  - 2022
U6  - https://doi.org/10.14778/3554821.3554873
SN  - 2150-8097
VL  - 15
IS  - 12
SP  - 3678
EP  - 3681
PB  - Association for Computing Machinery
CY  - New York, NY
ER  -