Document ID | Document type | Author(s) | Editor(s) | Main title | Abstract | Edition | Place of publication | Publisher | Year of publication | Number of pages | Series title | Series volume | ISBN | Thesis source | Conference name | Source: title | Source: volume | Source: issue | Source: first page | Source: last page | URN | DOI | Departments
OPUS4-6737 Book (monograph) Meinel, Christoph; Schnjakin, Maxim; Metzke, Tobias; Freitag, Markus Anbieter von Cloud Speicherdiensten im Überblick [An overview of cloud storage service providers] Due to the ever-growing flood of digital information, more and more applications rely on low-cost cloud storage services. The number of providers offering these services has increased considerably in recent years. Finding the right provider for an application requires weighing various criteria individually. This study presents and compares a selection of providers of established basic storage services. For the comparison, criteria are extracted that apply to every provider examined and thus allow an assessment that is as objective as possible. These include, among others, cost, legal aspects, security, performance, and the interfaces provided. The criteria presented can be used to evaluate cloud storage providers with respect to a specific use case. Potsdam Universitätsverlag Potsdam 2014 84 978-3-86956-274-2 urn:nbn:de:kobv:517-opus-68780 Hasso-Plattner-Institut für Digital Engineering gGmbH
OPUS4-7206 Dissertation Tinnefeld, Christian Building a columnar database on shared main memory-based storage In the field of disk-based parallel database management systems, a great variety of solutions exists, based on either a shared-storage or a shared-nothing architecture.
In contrast, main memory-based parallel database management systems are dominated solely by the shared-nothing approach, as it preserves the in-memory performance advantage by processing data locally on each server. We argue that this one-sided development will come to an end due to the combination of the following three trends: a) Today's network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing main memory inside a server and accessing the main memory of a remote server to a single order of magnitude or even less. b) Modern storage systems scale gracefully, are elastic, and provide high availability. c) A modern storage system such as Stanford's RAMCloud even keeps all data resident in main memory. Exploiting these characteristics in the context of a main memory-based parallel database management system is desirable. The advent of RDMA-enabled network technology makes it feasible to build a parallel main memory DBMS based on a shared-storage approach. This thesis describes building a columnar database on shared main memory-based storage. It discusses the resulting architecture (Part I) and the implications for query processing (Part II), and presents an evaluation of the resulting solution in terms of performance, high availability, and elasticity (Part III). In our architecture, we use Stanford's RAMCloud as shared storage and, on top of it, AnalyticsDB, an in-memory relational query processor that we designed and developed ourselves. AnalyticsDB encapsulates data access and operator execution behind an interface that allows seamless switching between local and remote main memory, while RAMCloud provides not only storage capacity but also processing power. Combining both aspects allows the execution of database operators to be pushed down into the storage system. We describe how the columnar data processed by AnalyticsDB is mapped to RAMCloud's key-value data model and how the performance advantages of columnar data storage can be preserved.
The combination of fast network technology and the possibility of executing database operators in the storage system raises the question of site selection. We construct a system model that allows the estimation of operator execution costs in terms of network transfer, data processed in memory, and wall time. For database operators that work on one relation at a time, such as a scan or materialize operation, this model can be used to discuss the site selection problem (data pull vs. operator push). Since a database query translates into the execution of several database operators, the optimal site selection may vary per operator. For database operators that work on two (or more) relations at a time, such as a join, the system model is enriched by additional factors such as the chosen algorithm (e.g., Grace Join vs. Distributed Block Nested Loop Join vs. Cyclo-Join), the data partitioning of the respective relations and their overlap, and the allowed resource allocation. We present an evaluation on a cluster of 60 nodes, all connected via RDMA-enabled network equipment. Compared with operating only on main memory inside a server without any network communication at all, query processing is about 2.4x slower if everything is done via the data pull strategy (i.e., RAMCloud is used only for data access) and about 27% slower if operator execution is also supported inside RAMCloud. RAMCloud's fast crash recovery feature can be leveraged to provide high availability: for example, a server crash during query execution delays the query response by only about one second. Our solution is elastic in that it can adapt to changing workloads a) within seconds, b) without interrupting ongoing query processing, and c) without manual intervention.
2014 175 Die Erstellung einer spaltenorientierten Datenbank auf einem verteilten, Hauptspeicher-basierenden Speichersystem urn:nbn:de:kobv:517-opus4-72063 Mathematisch-Naturwissenschaftliche Fakultät
OPUS4-6982 Book (monograph) Meinel, Christoph; Polze, Andreas; Oswald, Gerhard; Strotmann, Rolf; Seibold, Ulrich; Schulzki, Bernard HPI Future SOC Lab The "HPI Future SOC Lab" is a cooperation between the Hasso-Plattner-Institut (HPI) and industrial partners. Its mission is to enable and promote exchange and interaction between the research community and the industrial partners. The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores. The offerings are aimed particularly, but not exclusively, at researchers in computer science and business information systems. Main areas of research include cloud computing, parallelization, and in-memory technologies. This technical report presents results of research projects executed in 2013. Selected projects presented their results at the Future SOC Lab Day events on April 10 and September 24, 2013. Potsdam Universitätsverlag Potsdam 2014 iii, 174 978-3-86956-282-7 88 urn:nbn:de:kobv:517-opus-68195 Hasso-Plattner-Institut für Digital Engineering gGmbH
OPUS4-8627 Book (monograph) Meinel, Christoph; Polze, Andreas; Oswald, Gerhard; Strotmann, Rolf; Seibold, Ulrich; Schulzki, Bernhard HPI Future SOC Lab The Future SOC Lab at HPI is a cooperation between the Hasso-Plattner-Institut and various industrial partners. Its task is to enable and promote exchange between the research community and industry. The lab provides interested researchers with an infrastructure of state-of-the-art hardware and software free of charge for research purposes. This includes technologies, some not yet available on the market, that a typical university could normally not afford, e.g., servers with up to 64 cores and 2 TB of main memory. These offerings are aimed particularly at researchers in computer science and business information systems. Key focus areas include cloud computing, parallelization, and in-memory technologies. This technical report presents the results of the research projects of 2014. Selected projects presented their results at the Future SOC Lab Day events on April 9, 2014 and October 29, 2014. 2014 vi, 250 urn:nbn:de:kobv:517-opus4-86271 Hasso-Plattner-Institut für Digital Engineering gGmbH
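The site-selection trade-off described in the OPUS4-7206 abstract (data pull vs. operator push for an operator that works on one relation at a time, such as a scan) can be illustrated with a deliberately simplified cost model. This is a minimal sketch under stated assumptions, not the system model from the thesis. The class and function names, the parameters (row count, value size, selectivity, and the three bandwidths), and the assumption that transfer and scanning run as sequential phases are all hypothetical. The model estimates only the bytes transferred over the network and a rough wall time.

```python
from dataclasses import dataclass


@dataclass
class ScanWorkload:
    """Hypothetical single-relation operator: a filtered scan over one column."""
    rows: int           # number of values in the scanned column
    value_bytes: int    # size of one column value in bytes
    selectivity: float  # fraction of values that satisfy the predicate (0.0 to 1.0)


@dataclass
class Hardware:
    """Assumed bandwidths in bytes per second (illustrative, not measured)."""
    network_bw: float       # transfer rate between query server and storage servers
    local_scan_bw: float    # scan rate on the query-processing server
    storage_scan_bw: float  # scan rate available inside the (shared) storage server


def data_pull_cost(w: ScanWorkload, hw: Hardware) -> tuple[int, float]:
    """Data pull: ship the whole column to the query server, then scan it there."""
    transferred = w.rows * w.value_bytes
    # Assumption: transfer and scan are sequential phases (no pipelining).
    wall_time = transferred / hw.network_bw + transferred / hw.local_scan_bw
    return transferred, wall_time


def operator_push_cost(w: ScanWorkload, hw: Hardware) -> tuple[int, float]:
    """Operator push: scan inside the storage server, ship only qualifying values."""
    scanned = w.rows * w.value_bytes
    transferred = round(scanned * w.selectivity)
    wall_time = scanned / hw.storage_scan_bw + transferred / hw.network_bw
    return transferred, wall_time


def choose_site(w: ScanWorkload, hw: Hardware) -> str:
    """Pick the strategy with the lower estimated wall time."""
    _, pull_time = data_pull_cost(w, hw)
    _, push_time = operator_push_cost(w, hw)
    return "operator push" if push_time < pull_time else "data pull"


if __name__ == "__main__":
    hw = Hardware(network_bw=4e9, local_scan_bw=8e9, storage_scan_bw=4e9)
    selective = ScanWorkload(rows=1_000_000, value_bytes=8, selectivity=0.25)
    print(choose_site(selective, hw))  # operator push
```

In this toy model, a selective predicate favors operator push because only a quarter of the column crosses the network, while an unselective scan on a storage server with less spare scan capacity (for example `storage_scan_bw=2e9`) favors data pull. The sketch ignores effects such as overlapping transfer with processing or spreading the scan over several storage servers, and it does not cover the two-relation case (joins) that the abstract also mentions.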