TY - GEN A1 - Schlosser, Rainer A1 - Kossmann, Jan A1 - Boissier, Martin T1 - Efficient Scalable Multi-Attribute Index Selection Using Recursive Strategies T2 - 2019 IEEE 35th International Conference on Data Engineering (ICDE) N2 - An efficient selection of indexes is indispensable for database performance. For large problem instances with hundreds of tables, existing approaches are not suitable: They either exhibit prohibitive runtimes or yield far from optimal index configurations by strongly limiting the set of index candidates or not handling index interaction explicitly. We introduce a novel recursive strategy that does not exclude index candidates in advance and effectively accounts for index interaction. Using large real-world workloads, we demonstrate the applicability of our approach. Further, we evaluate our solution end to end with a commercial database system using a reproducible setup. We show that our solutions are near-optimal for small index selection problems. For larger problems, our strategy outperforms state-of-the-art approaches in both scalability and solution quality. Y1 - 2019 SN - 978-1-5386-7474-1 U6 - https://doi.org/10.1109/ICDE.2019.00113 SN - 1084-4627 SP - 1238 EP - 1249 PB - IEEE CY - New York ER - TY - JOUR A1 - Kossmann, Jan A1 - Halfpap, Stefan A1 - Jankrift, Marcel A1 - Schlosser, Rainer T1 - Magic mirror in my hand, which is the best in the land? BT - an experimental evaluation of index selection algorithms JF - Proceedings of the VLDB Endowment N2 - Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and challenging index selection problem range from metadata-based simple heuristics, over sophisticated multi-step algorithms, to approaches that yield optimal results. The main challenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) a large number of possible combinations resulting from workloads containing many queries and massive schemata with possibly thousands of attributes.
In this work, we describe and analyze eight index selection algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granularity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses, infer insights for index selection in general and each approach individually, before we give recommendations on when to use which approach. Y1 - 2020 U6 - https://doi.org/10.14778/3407790.3407832 SN - 2150-8097 VL - 13 IS - 11 SP - 2382 EP - 2395 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Kossmann, Jan A1 - Schlosser, Rainer T1 - Self-driving database systems BT - a conceptual approach JF - Distributed and parallel databases N2 - Challenges for self-driving database systems, which tune their physical design and configuration autonomously, are manifold: Such systems have to anticipate future workloads, find robust configurations efficiently, and incorporate knowledge gained by previous actions into later decisions. We present a component-based framework for self-driving database systems that enables database integration and development of self-managing functionality with low overhead by relying on separation of concerns. By keeping the components of the framework reusable and exchangeable, experiments are simplified, which promotes further research in that area. Moreover, to optimize multiple mutually dependent features, e.g., index selection and compression configurations, we propose a linear programming (LP) based algorithm to derive an efficient tuning order automatically. Afterwards, we demonstrate the applicability and scalability of our approach with reproducible examples. KW - database systems KW - self-driving KW - recursive tuning KW - workload prediction KW - robustness Y1 - 2020 U6 - https://doi.org/10.1007/s10619-020-07288-w SN - 0926-8782 SN - 1573-7578 VL - 38 IS - 4 SP - 795 EP - 817 PB - Springer CY - Dordrecht ER - TY - JOUR A1 - Koßmann, Jan A1 - Papenbrock, Thorsten A1 - Naumann, Felix T1 - Data dependencies for query optimization BT - a survey JF - The VLDB journal : the international journal on very large data bases / publ. on behalf of the VLDB Endowment N2 - Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey supports database vendors to identify optimization opportunities as well as DBMS researchers to find related work and open research questions. KW - Query optimization KW - Query execution KW - Data dependencies KW - Data profiling KW - Unique column combinations KW - Functional dependencies KW - Order dependencies KW - Inclusion dependencies KW - Relational data KW - SQL Y1 - 2021 U6 - https://doi.org/10.1007/s00778-021-00676-3 SN - 1066-8888 SN - 0949-877X VL - 31 IS - 1 SP - 1 EP - 22 PB - Springer CY - Berlin ; Heidelberg ; New York ER - TY - THES A1 - Koßmann, Jan T1 - Unsupervised database optimization BT - efficient index selection & data dependency-driven query optimization N2 - The amount of data stored in databases and the complexity of database workloads are ever- increasing. Database management systems (DBMSs) offer many configuration options, such as index creation or unique constraints, which must be adapted to the specific instance to efficiently process large volumes of data. Currently, such database optimization is complicated, manual work performed by highly skilled database administrators (DBAs). In cloud scenarios, manual database optimization even becomes infeasible: it exceeds the abilities of the best DBAs due to the enormous number of deployed DBMS instances (some providers maintain millions of instances), missing domain knowledge resulting from data privacy requirements, and the complexity of the configuration tasks. Therefore, we investigate how to automate the configuration of DBMSs efficiently with the help of unsupervised database optimization. While there are numerous configuration options, in this thesis, we focus on automatic index selection and the use of data dependencies, such as functional dependencies, for query optimization. Both aspects have an extensive performance impact and complement each other by approaching unsupervised database optimization from different perspectives. Our contributions are as follows: (1) we survey automated state-of-the-art index selection algorithms regarding various criteria, e.g., their support for index interaction. We contribute an extensible platform for evaluating the performance of such algorithms with industry-standard datasets and workloads. The platform is well-received by the community and has led to follow-up research. With our platform, we derive the strengths and weaknesses of the investigated algorithms. We conclude that existing solutions often have scalability issues and cannot quickly determine (near-)optimal solutions for large problem instances. (2) To overcome these limitations, we present two new algorithms. Extend determines (near-)optimal solutions with an iterative heuristic. It identifies the best index configurations for the evaluated benchmarks. Its selection runtimes are up to 10 times lower compared with other near-optimal approaches. SWIRL is based on reinforcement learning and delivers solutions instantly. These solutions perform within 3 % of the optimal ones. Extend and SWIRL are available as open-source implementations. (3) Our index selection efforts are complemented by a mechanism that analyzes workloads to determine data dependencies for query optimization in an unsupervised fashion. We describe and classify 58 query optimization techniques based on functional, order, and inclusion dependencies as well as on unique column combinations. The unsupervised mechanism and three optimization techniques are implemented in our open-source research DBMS Hyrise. Our approach reduces the Join Order Benchmark’s runtime by 26 % and accelerates some TPC-DS queries by up to 58 times. Additionally, we have developed a cockpit for unsupervised database optimization that allows interactive experiments to build confidence in such automated techniques. In summary, our contributions improve the performance of DBMSs, support DBAs in their work, and enable them to contribute their time to other, less arduous tasks. N2 - Sowohl die Menge der in Datenbanken gespeicherten Daten als auch die Komplexität der Datenbank-Workloads steigen stetig an. Datenbankmanagementsysteme bieten viele Konfigurationsmöglichkeiten, zum Beispiel das Anlegen von Indizes oder die Definition von Unique Constraints. Diese Konfigurations-möglichkeiten müssen für die spezifische Datenbankinstanz angepasst werden, um effizient große Datenmengen verarbeiten zu können. Heutzutage wird die komplizierte Datenbankoptimierung manuell von hochqualifizierten Datenbankadministratoren vollzogen. In Cloud-Szenarien ist die manuelle Daten-bankoptimierung undenkbar: Die enorme Anzahl der verwalteten Systeme (einige Anbieter verwalten Millionen von Instanzen), das fehlende Domänenwissen durch Datenschutzanforderungen und die Kom-plexität der Konfigurationsaufgaben übersteigen die Fähigkeiten der besten Datenbankadministratoren. Aus diesen Gründen betrachten wir, wie die Konfiguration von Datenbanksystemen mit der Hilfe von Unsupervised Database Optimization effizient automatisiert werden kann. Während viele Konfigura-tionsmöglichkeiten existieren, konzentrieren wir uns auf die automatische Indexauswahl und die Nutzung von Datenabhängigkeiten, zum Beispiel Functional Dependencies, für die Anfrageoptimierung. Beide Aspekte haben großen Einfluss auf die Performanz und ergänzen sich gegenseitig, indem sie Unsupervised Database Optimization aus verschiedenen Perspektiven betrachten. Wir leisten folgende Beiträge: (1) Wir untersuchen dem Stand der Technik entsprechende automatisierte Indexauswahlalgorithmen hinsichtlich verschiedener Kriterien, zum Beispiel bezüglich ihrer Berücksichtigung von Indexinteraktionen. Wir stellen eine erweiterbare Plattform zur Leistungsevaluierung solcher Algorithmen mit Industriestandarddatensätzen und -Workloads zur Verfügung. Diese Plattform wird von der Forschungsgemeinschaft aktiv verwendet und hat bereits zu weiteren Forschungsarbeiten geführt. Mit unserer Plattform leiten wir die Stärken und Schwächen der untersuchten Algorithmen ab. Wir kommen zu dem Schluss, dass bestehende Lösung häufig Skalierungsschwierigkeiten haben und nicht in der Lage sind, schnell (nahezu) optimale Lösungen für große Problemfälle zu ermitteln. (2) Um diese Einschränkungen zu bewältigen, stellen wir zwei neue Algorithmen vor. Extend ermittelt (nahezu) optimale Lösungen mit einer iterativen Heuristik. Das Verfahren identifiziert die besten Indexkonfigurationen für die evaluierten Benchmarks und seine Laufzeit ist bis zu 10-mal geringer als die Laufzeit anderer nahezu optimaler Ansätze. SWIRL basiert auf Reinforcement Learning und ermittelt Lösungen ohne Wartezeit. Diese Lösungen weichen maximal 3 % von den optimalen Lösungen ab. Extend und SWIRL sind verfügbar als Open-Source-Implementierungen. (3) Ein Mechanismus, der mittels automatischer Workload-Analyse Datenabhängigkeiten für die Anfrageoptimierung bestimmt, ergänzt die vorigen Beiträge. Wir beschreiben und klassifizieren 58 Techniken, die auf Functional, Order und Inclusion Dependencies sowie Unique Column Combinations basieren. Der Analysemechanismus und drei Optimierungstechniken sind in unserem Open-Source-Forschungsdatenbanksystem Hyrise implementiert. Der Ansatz reduziert die Laufzeit des Join Order Benchmark um 26 % und erreicht eine bis zu 58-fache Beschleunigung einiger TPC-DS-Anfragen. Darüber hinaus haben wir ein Cockpit für Unsupervised Database Optimization entwickelt. Dieses Cockpit ermöglicht interaktive Experimente, um Vertrauen in automatisierte Techniken zur Datenbankoptimie-rung zu schaffen. Zusammenfassend lässt sich festhalten, dass unsere Beiträge die Performanz von Datenbanksystemen verbessern, Datenbankadministratoren in ihrer Arbeit unterstützen und ihnen ermöglichen, ihre Zeit anderen, weniger mühsamen, Aufgaben zu widmen. KW - Datenbank KW - Datenbanksysteme KW - database KW - DBMS KW - Hyrise KW - index selection KW - database systems KW - RL KW - reinforcement learning KW - query optimization KW - data dependencies KW - functional dependencies KW - order dependencies KW - unique column combinations KW - inclusion dependencies KW - funktionale Abhängigkeiten KW - Anfrageoptimierung KW - Query-Optimierung KW - extend KW - SWIRL KW - unsupervised KW - database optimization KW - self-driving KW - autonomous Y1 - 2023 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-589490 ER -