TY - JOUR
A1 - Birnick, Johann
A1 - Bläsius, Thomas
A1 - Friedrich, Tobias
A1 - Naumann, Felix
A1 - Papenbrock, Thorsten
A1 - Schirneck, Friedrich Martin
T1 - Hitting set enumeration with partial information for unique column combination discovery
JF - Proceedings of the VLDB Endowment
N2 - Unique column combinations (UCCs) are a fundamental concept in relational databases. They identify entities in the data and support various data management activities. Still, UCCs are usually not explicitly defined and need to be discovered. State-of-the-art data profiling algorithms are able to efficiently discover UCCs in moderately sized datasets, but they tend to fail on large and, in particular, on wide datasets due to run time and memory limitations.
In this paper, we introduce HPIValid, a novel UCC discovery algorithm that implements a faster and more resource-saving search strategy. HPIValid models the metadata discovery as a hitting set enumeration problem in hypergraphs. In this way, it combines efficient discovery techniques from data profiling research with the most recent theoretical insights into enumeration algorithms. Our evaluation shows that HPIValid is not only orders of magnitude faster than related work, it also has a much smaller memory footprint.
Y1 - 2020
U6 - https://doi.org/10.14778/3407790.3407824
SN - 2150-8097
VL - 13
IS - 11
SP - 2270
EP - 2283
PB - Association for Computing Machinery
CY - [New York, NY]
ER -
TY - JOUR
A1 - Blaesius, Thomas
A1 - Friedrich, Tobias
A1 - Schirneck, Friedrich Martin
T1 - The complexity of dependency detection and discovery in relational databases
JF - Theoretical computer science
N2 - Multi-column dependencies in relational databases come associated with two different computational tasks. The detection problem is to decide whether a dependency of a certain type and size holds in a given database; the discovery problem asks to enumerate all valid dependencies of that type. We settle the complexity of both of these problems for unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs). We show that the detection of UCCs and FDs is W[2]-complete when parameterized by the solution size. The discovery of inclusion-wise minimal UCCs is proven to be equivalent under parsimonious reductions to the transversal hypergraph problem of enumerating the minimal hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. We further identify the detection of INDs as one of the first natural W[3]-complete problems.
The discovery of maximal INDs is shown to be equivalent to enumerating the maximal satisfying assignments of antimonotone, 3-normalized Boolean formulas.
KW - data profiling
KW - enumeration complexity
KW - functional dependency
KW - inclusion dependency
KW - parameterized complexity
KW - parsimonious reduction
KW - transversal hypergraph
KW - Unique column combination
KW - W[3]-completeness
Y1 - 2021
U6 - https://doi.org/10.1016/j.tcs.2021.11.020
SN - 0304-3975
SN - 1879-2294
VL - 900
SP - 79
EP - 96
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Bläsius, Thomas
A1 - Friedrich, Tobias
A1 - Lischeid, Julius
A1 - Meeks, Kitty
A1 - Schirneck, Friedrich Martin
T1 - Efficiently enumerating hitting sets of hypergraphs arising in data profiling
JF - Journal of computer and system sciences : JCSS
N2 - The transversal hypergraph problem asks to enumerate the minimal hitting sets of a hypergraph. If the solutions have bounded size, Eiter and Gottlob [SICOMP'95] gave an algorithm running in output-polynomial time, but whose space requirement also scales with the output. We improve this to polynomial delay and space. Central to our approach is the extension problem, deciding for a set X of vertices whether it is contained in any minimal hitting set. We show that this is one of the first natural problems to be W[3]-complete. We give an algorithm for the extension problem running in time O(m^(|X|+1) n) and prove a SETH-lower bound showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases.
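The hitting-set view of UCC discovery described in these abstracts can be illustrated with a small brute-force sketch. Assuming a relation given as a list of equal-length tuples, every pair of rows yields a difference set (the columns on which the two rows differ); a column combination is unique exactly when it intersects every difference set, so the minimal UCCs are the minimal hitting sets of the hypergraph of difference sets. The sketch also answers the extension problem naively. All function names are hypothetical, and the exhaustive search is exponential: it is neither HPIValid nor the polynomial-delay algorithm or the O(m^(|X|+1) n) extension algorithm from the papers above.

```python
from itertools import combinations

def difference_sets(rows):
    """Hyperedges of a relation: for each row pair, the set of columns where they differ.

    Only inclusion-minimal edges are kept, since they determine the hitting sets.
    """
    edges = set()
    for r, s in combinations(rows, 2):
        edges.add(frozenset(i for i, (a, b) in enumerate(zip(r, s)) if a != b))
    return [e for e in edges if not any(f < e for f in edges)]

def minimal_hitting_sets(edges, n):
    """Brute-force enumeration of all minimal hitting sets over vertices 0..n-1."""
    found = []
    for k in range(n + 1):  # increasing size, so earlier hits are never supersets
        for cand in combinations(range(n), k):
            c = frozenset(cand)
            if any(h <= c for h in found):
                continue  # contains a smaller hitting set, hence not minimal
            if all(c & e for e in edges):  # intersects every hyperedge
                found.append(c)
    return found

def minimal_uccs(rows):
    """Minimal unique column combinations = minimal hitting sets of the difference sets."""
    return minimal_hitting_sets(difference_sets(rows), len(rows[0]))

def extends_to_minimal(edges, n, x):
    """Extension problem (naive): is the vertex set x contained in some minimal hitting set?"""
    x = frozenset(x)
    return any(x <= h for h in minimal_hitting_sets(edges, n))
```

For the three-row table `[("Alice", "Berlin", 30), ("Bob", "Berlin", 25), ("Alice", "Potsdam", 25)]`, every single column repeats a value, so `minimal_uccs` returns the three column pairs. Two identical rows produce an empty difference set that nothing can hit, so such a relation has no UCC at all.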
KW - Data profiling
KW - Enumeration algorithm
KW - Minimal hitting set
KW - Transversal hypergraph
KW - Unique column combination
KW - W[3]-Completeness
Y1 - 2022
U6 - https://doi.org/10.1016/j.jcss.2021.10.002
SN - 0022-0000
SN - 1090-2724
VL - 124
SP - 192
EP - 213
PB - Elsevier
CY - San Diego
ER -
TY - RPRT
A1 - Döllner, Jürgen Roland Friedrich
A1 - Friedrich, Tobias
A1 - Arnrich, Bert
A1 - Hirschfeld, Robert
A1 - Lippert, Christoph
A1 - Meinel, Christoph
T1 - Abschlussbericht KI-Labor ITSE
T1 - Final report "AI Lab ITSE"
BT - KI-Labor für Methodik, Technik und Ausbildung in der IT-Systemtechnik [AI Lab for Methodology, Technology and Education in IT-Systems Engineering]
N2 - The final report describes the tasks and results of the AI Lab "ITSE". The AI Lab addressed methodology, technology and education in IT-systems engineering for the analysis, planning and construction of complex, AI-based IT systems.
N2 - Final Report on the "AI Lab ITSE" dedicated to Methodology, Technology and Education of AI in IT-Systems Engineering.
KW - Abschlussbericht
KW - KI-Labor
KW - final report
KW - AI Lab
Y1 - 2022
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-578604
ER -
TY - BOOK
A1 - Meinel, Christoph
A1 - Döllner, Jürgen Roland Friedrich
A1 - Weske, Mathias
A1 - Polze, Andreas
A1 - Hirschfeld, Robert
A1 - Naumann, Felix
A1 - Giese, Holger
A1 - Baudisch, Patrick
A1 - Friedrich, Tobias
A1 - Böttinger, Erwin
A1 - Lippert, Christoph
A1 - Dörr, Christian
A1 - Lehmann, Anja
A1 - Renard, Bernhard
A1 - Rabl, Tilmann
A1 - Uebernickel, Falk
A1 - Arnrich, Bert
A1 - Hölzle, Katharina
T1 - Proceedings of the HPI Research School on Service-oriented Systems Engineering 2020 Fall Retreat
N2 - The design and implementation of service-oriented architectures raises a large number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for the design and realization of complex web-based systems.
Both approaches allow for dynamic application adaptation as well as integration of enterprise applications. Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns. The annual Ph.D. Retreat of the Research School provides each member with the opportunity to present the current state of their research and to outline a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
N2 - The design and realization of service-based architectures raise a multitude of research questions from the fields of software engineering, system modeling and analysis, and the adaptability and integration of applications. Component orientation and web services are two approaches for the efficient design and realization of complex web-based systems. They make it possible to react to changing requirements as well as to integrate large, complex software systems. "Service-Oriented Systems Engineering" represents the symbiosis of proven practices from the fields of object orientation, component programming, distributed computing, and business processes, and also takes into account the integration of business concerns and information technology. The retreat of the research school "Service-oriented Systems Engineering" takes place once a year and gives all members the opportunity to present the current state of their research. Owing to the cross-disciplinary structure of the research school, this report covers a broad spectrum of current research topics. These include, among others: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 138
KW - Hasso Plattner Institute
KW - research school
KW - Ph.D. retreat
KW - service-oriented systems engineering
KW - Hasso-Plattner-Institut
KW - Forschungskolleg
KW - Klausurtagung
KW - Service-oriented Systems Engineering
Y1 - 2021
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-504132
SN - 978-3-86956-513-2
SN - 1613-5652
SN - 2191-1665
IS - 138
PB - Universitätsverlag Potsdam
CY - Potsdam
ER -
TY - JOUR
A1 - Shi, Feng
A1 - Schirneck, Friedrich Martin
A1 - Friedrich, Tobias
A1 - Kötzing, Timo
A1 - Neumann, Frank
T1 - Correction to: Reoptimization time analysis of evolutionary algorithms on linear functions under dynamic uniform constraints
JF - Algorithmica : an international journal in computer science
Y1 - 2018
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-605295
SN - 0178-4617
SN - 1432-0541
VL - 82
IS - 10
SP - 3117
EP - 3123
PB - Springer
CY - New York
ER -
TY - JOUR
A1 - Friedrich, Tobias
A1 - Krejca, Martin Stefan
A1 - Rothenberger, Ralf
A1 - Arndt, Tobias
A1 - Hafner, Danijar
A1 - Kellermeier, Thomas
A1 - Krogmann, Simon
A1 - Razmjou, Armin
T1 - Routing for on-street parking search using probabilistic data
JF - AI communications : AICOM ; the European journal on artificial intelligence
N2 - A significant percentage of urban traffic is caused by the search for parking spots. One possible approach to improve this situation is to guide drivers along routes which are likely to have free parking spots. The task of finding such a route can be modeled as a probabilistic graph problem which is NP-complete. Thus, we propose heuristic approaches for solving this problem and evaluate them experimentally. For this, we use probabilities of finding a parking spot, which are based on publicly available empirical data from TomTom International B.V. Additionally, we propose a heuristic that relies exclusively on conventional road attributes. Our experiments show that this algorithm comes close to the baseline by a factor of 1.3 in our cost measure.
Last, we complement our experiments with results from a field study, comparing the success rates of our algorithms against real human drivers.
KW - Parking search
KW - probabilistic routing
KW - constrained optimization
KW - field study
Y1 - 2019
U6 - https://doi.org/10.3233/AIC-180574
SN - 0921-7126
SN - 1875-8452
VL - 32
IS - 2
SP - 113
EP - 124
PB - IOS Press
CY - Amsterdam
ER -
TY - JOUR
A1 - Friedrich, Tobias
A1 - Kötzing, Timo
A1 - Krejca, Martin Stefan
A1 - Sutton, Andrew M.
T1 - Robustness of Ant Colony Optimization to Noise
JF - Evolutionary computation
N2 - Recently, ant colony optimization (ACO) algorithms have proven to be efficient in uncertain environments, such as noisy or dynamically changing fitness functions. Most of these analyses have focused on combinatorial problems such as path finding. We rigorously analyze an ACO algorithm optimizing linear pseudo-Boolean functions under additive posterior noise. We study noise distributions whose tails decay exponentially fast, including the classical case of additive Gaussian noise. Without noise, the classical (μ + 1) EA outperforms any ACO algorithm, with smaller μ being better; however, in the case of large noise, the (μ + 1) EA fails, even for high values of μ (which are known to help against small noise). In this article, we show that ACO is able to deal with arbitrarily large noise in a graceful manner; that is, as long as the evaporation factor ρ is small enough, dependent on the variance σ² of the noise and the dimension n of the search space, optimization will be successful. We also briefly consider the case of prior noise and prove that ACO can also efficiently optimize linear functions under this noise model.
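To make the role of the evaporation factor ρ concrete, here is a minimal sketch of a generic MMAS-style ACO on OneMax (a linear pseudo-Boolean function) under additive Gaussian posterior noise. It is an illustrative assumption, not the algorithm analyzed in the article: the two-ant iteration-best update, the pheromone bounds [1/n, 1 - 1/n], and the names `noisy_onemax` and `aco_noisy` are hypothetical choices.

```python
import random

def noisy_onemax(x, sigma, rng):
    """OneMax (a linear pseudo-Boolean function) plus additive Gaussian posterior noise."""
    return sum(x) + rng.gauss(0.0, sigma)

def aco_noisy(n, rho, sigma, iterations, seed=0):
    """Hypothetical two-ant, MMAS-style ACO; returns the final pheromone vector.

    tau[i] is the probability of sampling a 1 at position i, rho is the
    evaporation factor, and pheromones are clamped to [1/n, 1 - 1/n].
    """
    rng = random.Random(seed)
    lo, hi = 1.0 / n, 1.0 - 1.0 / n
    tau = [0.5] * n
    for _ in range(iterations):
        # Construct two solutions (ants) from the current pheromones.
        ants = [[1 if rng.random() < t else 0 for t in tau] for _ in range(2)]
        # Reinforce whichever ant looks better under noise (each ant is evaluated once).
        best = max(ants, key=lambda x: noisy_onemax(x, sigma, rng))
        # Evaporate with factor rho, deposit on the chosen ant, then clamp to the bounds.
        tau = [min(hi, max(lo, (1 - rho) * t + rho * b)) for t, b in zip(tau, best)]
    return tau
```

With a smaller `rho`, each single comparison moves the pheromones less, so one misleading noisy evaluation does less damage; this is one intuition for why ρ must shrink as σ² grows, although the sketch says nothing about the article's actual bounds.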
KW - Ant colony optimization
KW - Noisy Fitness
KW - Theory
KW - Run time analysis
Y1 - 2016
U6 - https://doi.org/10.1162/EVCO_a_00178
SN - 1063-6560
SN - 1530-9304
VL - 24
SP - 237
EP - 254
PB - MIT Press
CY - Cambridge
ER -
TY - JOUR
A1 - Friedrich, Tobias
A1 - Kötzing, Timo
A1 - Krejca, Martin Stefan
T1 - Unbiasedness of estimation-of-distribution algorithms
JF - Theoretical computer science
N2 - In the context of black-box optimization, black-box complexity is used for understanding the inherent difficulty of a given optimization problem. Central to our understanding of nature-inspired search heuristics in this context is the notion of unbiasedness. Specialized black-box complexities have been developed in order to better understand the limitations of these heuristics, especially of (population-based) evolutionary algorithms (EAs). In contrast to this, we focus on a model for algorithms explicitly maintaining a probability distribution over the search space: so-called estimation-of-distribution algorithms (EDAs). We consider the recently introduced n-Bernoulli-λ-EDA framework, which subsumes, for example, the commonly known EDAs PBIL, UMDA, λ-MMAS_IB, and cGA. We show that an n-Bernoulli-λ-EDA is unbiased if and only if its probability distribution satisfies a certain invariance property under isometric automorphisms of [0, 1]^n. By restricting how an n-Bernoulli-λ-EDA can perform an update, in a way common to many examples, we derive more concise characterizations, which are easy to verify. We demonstrate this by showing that our examples above are all unbiased.
KW - Estimation-of-distribution algorithm
KW - Unbiasedness
KW - Theory
Y1 - 2019
U6 - https://doi.org/10.1016/j.tcs.2018.11.001
SN - 0304-3975
SN - 1879-2294
VL - 785
SP - 46
EP - 59
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Riechelman, Dana F. C.
A1 - Fohlmeister, Jens Bernd
A1 - Kluge, Tobias
A1 - Jochum, Klaus Peter
A1 - Richter, Detlev K.
A1 - Deininger, Michael
A1 - Friedrich, Ronny
A1 - Frank, Norbert
A1 - Scholz, Denis
T1 - Evaluating the potential of tree-ring methodology for cross-dating of three annually laminated stalagmites from Zoolithencave (SE Germany)
JF - Quaternary geochronology : the international research and review journal on advances in quaternary dating techniques
N2 - Three small stalagmites from Zoolithencave (southern Germany) show visible laminae, which consist of a clear and a brownish, pigmented layer pair. This potentially provides the opportunity to construct precise chronologies by counting annual laminae. The growth period of the three stalagmites was constrained by the C-14 bomb peak in the youngest part of all three stalagmites and C-14 dating of a piece of charcoal in the consolidated base part of stalagmite Zoo-rez-2. These data suggest an age of AD 1970 for the top laminae and a lower age limit of AD 1973-1682 or AD 1735-1778. Laminae were counted and their thickness determined on scanned thin sections of all stalagmites. On stalagmites Zoo-rez-1 and -2, three tracks were measured near the growth axes, each separated into three sections at prominent anchor laminae (I, II, III). Each section was replicated three times (a, b, c). For Zoo-rez-3, only one track was measured. The total number of laminae counted for Zoo-rez-1 ranges from 138 to 177, for Zoo-rez-2 from 119 to 145, and for Zoo-rez-3 from 159 to 166. The numbers agree well with the range constrained by the bomb peak and the age of the charcoal, which supports the annual origin of the laminae. The replicated measurements of the different tracks as well as the three different tracks on the stalagmites Zoo-rez-1 and -2 were cross-dated using the TSAP-Win® tree-ring software.
This software is very useful for cross-dating because it allows missing or false laminae to be inserted or deleted and common patterns to be identified by shifting the series back and forth in time. However, visual inspection of the thin sections was necessary to confirm the missing or false laminae detected by TSAP-Win®. For all three Zoo-rez speleothems, cross-dating of the mean lamina thickness series was not possible due to a missing common pattern. The cross-dating procedure results in three refined chronologies for the three Zoo-rez stalagmites, ranging from AD 1821-1970 (Zoo-rez-1), AD 1835-1970 (Zoo-rez-2), and AD 1808-1970 (Zoo-rez-3).
KW - Speleothems
KW - Annual laminae
KW - Lamina thickness
KW - C-14 bomb peak
KW - Tree-ring software
KW - Cross-dating
Y1 - 2019
U6 - https://doi.org/10.1016/j.quageo.2019.04.001
SN - 1871-1014
SN - 1878-0350
VL - 52
SP - 37
EP - 50
PB - Elsevier
CY - Oxford
ER -