TY  - JOUR
A1  - Bläsius, Thomas
A1  - Freiberger, Cedric
A1  - Friedrich, Tobias
A1  - Katzmann, Maximilian
A1  - Montenegro-Retana, Felix
A1  - Thieffry, Marianne
T1  - Efficient Shortest Paths in Scale-Free Networks with Underlying Hyperbolic Geometry
JF  - ACM Transactions on Algorithms
N2  - A standard approach to accelerating shortest path algorithms on networks is the bidirectional search, which explores the graph from the start and the destination, simultaneously. In practice this strategy performs particularly well on scale-free real-world networks. Such networks typically have a heterogeneous degree distribution (e.g., a power-law distribution) and high clustering (i.e., vertices with a common neighbor are likely to be connected themselves). These two properties can be obtained by assuming an underlying hyperbolic geometry. To explain the observed behavior of the bidirectional search, we analyze its running time on hyperbolic random graphs and prove that it is $\tilde{O}(n^{2 - 1/\alpha} + n^{1/(2\alpha)} + \delta_{\max})$ with high probability, where $\alpha \in (1/2, 1)$ controls the power-law exponent of the degree distribution, and $\delta_{\max}$ is the maximum degree. This bound is sublinear, improving the obvious worst-case linear bound. Although our analysis depends on the underlying geometry, the algorithm itself is oblivious to it.
KW  - Random graphs
KW  - hyperbolic geometry
KW  - scale-free networks
KW  - bidirectional shortest path
Y1  - 2022
U6  - https://doi.org/10.1145/3516483
SN  - 1549-6325
SN  - 1549-6333
VL  - 18
IS  - 2
SP  - 1
EP  - 32
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Bläsius, Thomas
A1  - Friedrich, Tobias
A1  - Lischeid, Julius
A1  - Meeks, Kitty
A1  - Schirneck, Friedrich Martin
T1  - Efficiently enumerating hitting sets of hypergraphs arising in data profiling
JF  - Journal of computer and system sciences : JCSS
N2  - The transversal hypergraph problem asks to enumerate the minimal hitting sets of a hypergraph. If the solutions have bounded size, Eiter and Gottlob [SICOMP'95] gave an algorithm running in output-polynomial time, but whose space requirement also scales with the output. We improve this to polynomial delay and space. Central to our approach is the extension problem, deciding for a set X of vertices whether it is contained in any minimal hitting set. We show that this is one of the first natural problems to be W[3]-complete. We give an algorithm for the extension problem running in time $O(m^{|X|+1} n)$ and prove a SETH-lower bound showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases.
KW  - Data profiling
KW  - Enumeration algorithm
KW  - Minimal hitting set
KW  - Transversal hypergraph
KW  - Unique column combination
KW  - W[3]-Completeness
Y1  - 2022
U6  - https://doi.org/10.1016/j.jcss.2021.10.002
SN  - 0022-0000
SN  - 1090-2724
VL  - 124
SP  - 192
EP  - 213
PB  - Elsevier
CY  - San Diego
ER  - 
TY  - JOUR
A1  - Bläsius, Thomas
A1  - Friedrich, Tobias
A1  - Katzmann, Maximilian
A1  - Meyer, Ulrich
A1  - Penschuck, Manuel
A1  - Weyand, Christopher
T1  - Efficiently generating geometric inhomogeneous and hyperbolic random graphs
JF  - Network Science
N2  - Hyperbolic random graphs (HRGs) and geometric inhomogeneous random graphs (GIRGs) are two similar generative network models that were designed to resemble complex real-world networks. In particular, they have a power-law degree distribution with controllable exponent beta and high clustering that can be controlled via the temperature T. We present the first implementation of an efficient GIRG generator running in expected linear time. Besides varying temperatures, it also supports underlying geometries of higher dimensions. It is capable of generating graphs with ten million edges in under a second on commodity hardware. The algorithm can be adapted to HRGs. Our resulting implementation is the fastest sequential HRG generator, despite the fact that we support non-zero temperatures. Though non-zero temperatures are crucial for many applications, most existing generators are restricted to T = 0. We also support parallelization, although this is not the focus of this paper. Moreover, we note that our generators draw from the correct probability distribution, that is, they involve no approximation. Besides the generators themselves, we also provide an efficient algorithm to determine the non-trivial dependency between the average degree of the resulting graph and the input parameters of the GIRG model. This makes it possible to specify the desired expected average degree as input. Moreover, we investigate the differences between HRGs and GIRGs, shedding new light on the nature of the relation between the two models. Although HRGs represent, in a certain sense, a special case of the GIRG model, we find that a straightforward inclusion does not hold in practice. However, the difference is negligible for most use cases.
KW  - hyperbolic random graphs
KW  - geometric inhomogeneous random graph
Y1  - 2022
U6  - https://doi.org/10.1017/nws.2022.32
SN  - 2050-1242
SN  - 2050-1250
VL  - 10
IS  - 4
SP  - 361
EP  - 380
PB  - Cambridge Univ. Press
CY  - New York
ER  - 
TY  - JOUR
A1  - Birnick, Johann
A1  - Bläsius, Thomas
A1  - Friedrich, Tobias
A1  - Naumann, Felix
A1  - Papenbrock, Thorsten
A1  - Schirneck, Friedrich Martin
T1  - Hitting set enumeration with partial information for unique column combination discovery
JF  - Proceedings of the VLDB Endowment
N2  - Unique column combinations (UCCs) are a fundamental concept in relational databases. They identify entities in the data and support various data management activities. Still, UCCs are usually not explicitly defined and need to be discovered. State-of-the-art data profiling algorithms are able to efficiently discover UCCs in moderately sized datasets, but they tend to fail on large and, in particular, on wide datasets due to run time and memory limitations. In this paper, we introduce HPIValid, a novel UCC discovery algorithm that implements a faster and more resource-saving search strategy. HPIValid models the metadata discovery as a hitting set enumeration problem in hypergraphs. In this way, it combines efficient discovery techniques from data profiling research with the most recent theoretical insights into enumeration algorithms. Our evaluation shows that HPIValid is not only orders of magnitude faster than related work, it also has a much smaller memory footprint.
Y1  - 2020
U6  - https://doi.org/10.14778/3407790.3407824
SN  - 2150-8097
VL  - 13
IS  - 11
SP  - 2270
EP  - 2283
PB  - Association for Computing Machinery
CY  - [New York, NY]
ER  - 
TY  - JOUR
A1  - Bläsius, Thomas
A1  - Friedrich, Tobias
A1  - Krejca, Martin S.
A1  - Molitor, Louise
T1  - The impact of geometry on monochrome regions in the flip Schelling process
JF  - Computational geometry
N2  - Schelling's classical segregation model gives a coherent explanation for the wide-spread phenomenon of residential segregation. We introduce an agent-based saturated open-city variant, the Flip Schelling Process (FSP), in which agents, placed on a graph, have one out of two types and, based on the predominant type in their neighborhood, decide whether to change their types; similar to a new agent arriving as soon as another agent leaves the vertex. We investigate the probability that an edge {u,v} is monochrome, i.e., that both vertices u and v have the same type in the FSP, and we provide a general framework for analyzing the influence of the underlying graph topology on residential segregation. In particular, for two adjacent vertices, we show that a highly decisive common neighborhood, i.e., a common neighborhood where the absolute value of the difference between the number of vertices with different types is high, supports segregation and, moreover, that large common neighborhoods are more decisive. As an application, we study the expected behavior of the FSP on two common random graph models with and without geometry: (1) For random geometric graphs, we show that the existence of an edge {u,v} makes a highly decisive common neighborhood for u and v more likely. Based on this, we prove the existence of a constant c>0 such that the expected fraction of monochrome edges after the FSP is at least 1/2+c. (2) For Erdős–Rényi graphs we show that large common neighborhoods are unlikely and that the expected fraction of monochrome edges after the FSP is at most 1/2+o(1). Our results indicate that the cluster structure of the underlying graph has a significant impact on the obtained segregation strength.
KW  - Agent-based model
KW  - Schelling segregation
KW  - Spin system
Y1  - 2022
U6  - https://doi.org/10.1016/j.comgeo.2022.101902
SN  - 0925-7721
SN  - 1879-081X
VL  - 108
PB  - Elsevier
CY  - Amsterdam
ER  -