TY - JOUR
A1 - Bläsius, Thomas
A1 - Freiberger, Cedric
A1 - Friedrich, Tobias
A1 - Katzmann, Maximilian
A1 - Montenegro-Retana, Felix
A1 - Thieffry, Marianne
T1 - Efficient Shortest Paths in Scale-Free Networks with Underlying Hyperbolic Geometry
JF - ACM Transactions on Algorithms
N2 - A standard approach to accelerating shortest path algorithms on networks is the bidirectional search, which explores the graph from the start and the destination, simultaneously. In practice this strategy performs particularly well on scale-free real-world networks. Such networks typically have a heterogeneous degree distribution (e.g., a power-law distribution) and high clustering (i.e., vertices with a common neighbor are likely to be connected themselves). These two properties can be obtained by assuming an underlying hyperbolic geometry.
To explain the observed behavior of the bidirectional search, we analyze its running time on hyperbolic random graphs and prove that it is (O) over tilde (n(2-1/alpha) + n(1/(2 alpha)) + delta(max)) with high probability, where alpha is an element of (1/2, 1) controls the power-law exponent of the degree distribution, and dmax is the maximum degree. This bound is sublinear, improving the obvious worst-case linear bound. Although our analysis depends on the underlying geometry, the algorithm itself is oblivious to it.
KW - Random graphs
KW - hyperbolic geometry
KW - scale-free networks
KW - bidirectional shortest path
Y1 - 2022
U6 - https://doi.org/10.1145/3516483
SN - 1549-6325
SN - 1549-6333
VL - 18
IS - 2
SP - 1
EP - 32
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Bläsius, Thomas
A1 - Friedrich, Tobias
A1 - Lischeid, Julius
A1 - Meeks, Kitty
A1 - Schirneck, Friedrich Martin
T1 - Efficiently enumerating hitting sets of hypergraphs arising in data profiling
JF - Journal of computer and system sciences : JCSS
N2 - The transversal hypergraph problem asks to enumerate the minimal hitting sets of a hypergraph. If the solutions have bounded size, Eiter and Gottlob [SICOMP'95] gave an algorithm running in output-polynomial time, but whose space requirement also scales with the output. We improve this to polynomial delay and space. Central to our approach is the extension problem, deciding for a set X of vertices whether it is contained in any minimal hitting set. We show that this is one of the first natural problems to be W[3]-complete. We give an algorithm for the extension problem running in time O(m(vertical bar X vertical bar+1) n) and prove a SETH-lower bound showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases.
KW - Data profiling
KW - Enumeration algorithm
KW - Minimal hitting set
KW - Transversal hypergraph
KW - Unique column combination
KW - W[3]-Completeness
Y1 - 2022
U6 - https://doi.org/10.1016/j.jcss.2021.10.002
SN - 0022-0000
SN - 1090-2724
VL - 124
SP - 192
EP - 213
PB - Elsevier
CY - San Diego
ER -
TY - JOUR
A1 - Birnick, Johann
A1 - Bläsius, Thomas
A1 - Friedrich, Tobias
A1 - Naumann, Felix
A1 - Papenbrock, Thorsten
A1 - Schirneck, Friedrich Martin
T1 - Hitting set enumeration with partial information for unique column combination discovery
JF - Proceedings of the VLDB Endowment
N2 - Unique column combinations (UCCs) are a fundamental concept in relational databases. They identify entities in the data and support various data management activities. Still, UCCs are usually not explicitly defined and need to be discovered. State-of-the-art data profiling algorithms are able to efficiently discover UCCs in moderately sized datasets, but they tend to fail on large and, in particular, on wide datasets due to run time and memory limitations.
In this paper, we introduce HPIValid, a novel UCC discovery algorithm that implements a faster and more resource-saving search strategy. HPIValid models the metadata discovery as a hitting set enumeration problem in hypergraphs. In this way, it combines efficient discovery techniques from data profiling research with the most recent theoretical insights into enumeration algorithms. Our evaluation shows that HPIValid is not only orders of magnitude faster than related work, it also has a much smaller memory footprint.
Y1 - 2020
U6 - https://doi.org/10.14778/3407790.3407824
SN - 2150-8097
VL - 13
IS - 11
SP - 2270
EP - 2283
PB - Association for Computing Machinery
CY - [New York, NY]
ER -