TY - THES A1 - Jiang, Lan T1 - Discovering metadata in data files N2 - It is estimated that data scientists spend up to 80% of the time exploring, cleaning, and transforming their data. A major reason for that expenditure is the lack of knowledge about the used data, which are often from different sources and have heterogeneous structures. As a means to describe various properties of data, metadata can help data scientists understand and prepare their data, saving time for innovative and valuable data analytics. However, metadata do not always exist: some data file formats are not capable of storing them; metadata were deleted for privacy concerns; legacy data may have been produced by systems that were not designed to store and handle meta- data. As data are being produced at an unprecedentedly fast pace and stored in diverse formats, manually creating metadata is not only impractical but also error-prone, demanding automatic approaches for metadata detection. In this thesis, we are focused on detecting metadata in CSV files – a type of plain-text file that, similar to spreadsheets, may contain different types of content at arbitrary positions. We propose a taxonomy of metadata in CSV files and specifically address the discovery of three different metadata: line and cell type, aggregations, and primary keys and foreign keys. Data are organized in an ad-hoc manner in CSV files, and do not follow a fixed structure, which is assumed by common data processing tools. Detecting the structure of such files is a prerequisite of extracting information from them, which can be addressed by detecting the semantic type, such as header, data, derived, or footnote, of each line or each cell. We propose the supervised- learning approach Strudel to detect the type of lines and cells. CSV files may also include aggregations. An aggregation represents the arithmetic relationship between a numeric cell and a set of other numeric cells. Our proposed AggreCol algorithm is capable of detecting aggregations of five arithmetic functions in CSV files. Note that stylistic features, such as font style and cell background color, do not exist in CSV files. Our proposed algorithms address the respective problems by using only content, contextual, and computational features. Storing a relational table is also a common usage of CSV files. Primary keys and foreign keys are important metadata for relational databases, which are usually not present for database instances dumped as plain-text files. We propose the HoPF algorithm to holistically detect both constraints in relational databases. Our approach is capable of distinguishing true primary and foreign keys from a great amount of spurious unique column combinations and inclusion dependencies, which can be detected by state-of-the-art data profiling algorithms. N2 - Schätzungen zufolge verbringen Datenwissenschaftler bis zu 80% ihrer Zeit mit der Erkundung, Bereinigung und Umwandlung ihrer Daten. Ein Hauptgrund für diesen Aufwand ist das fehlende Wissen über die verwendeten Daten, die oft aus unterschiedlichen Quellen stammen und heterogene Strukturen aufweisen. Als Mittel zur Beschreibung verschiedener Dateneigenschaften können Metadaten Datenwissenschaftlern dabei helfen, ihre Daten zu verstehen und aufzubereiten, und so wertvolle Zeit die Datenanalysen selbst sparen. Metadaten sind jedoch nicht immer vorhanden: Zum Beispiel sind einige Dateiformate nicht in der Lage, sie zu speichern; Metadaten können aus Datenschutzgründen gelöscht worden sein; oder ältere Daten wurden möglicherweise von Systemen erzeugt, die nicht für die Speicherung und Verarbeitung von Metadaten konzipiert waren. Da Daten in einem noch nie dagewesenen Tempo produziert und in verschiedenen Formaten gespeichert werden, ist die manuelle Erstellung von Metadaten nicht nur unpraktisch, sondern auch fehleranfällig, so dass automatische Ansätze zur Metadatenerkennung erforderlich sind. In dieser Arbeit konzentrieren wir uns auf die Erkennung von Metadaten in CSV-Dateien - einer Art von Klartextdateien, die, ähnlich wie Tabellenkalkulationen, verschiedene Arten von Inhalten an beliebigen Positionen enthalten können. Wir schlagen eine Taxonomie der Metadaten in CSV-Dateien vor und befassen uns speziell mit der Erkennung von drei verschiedenen Metadaten: Zeile und Zellensemantischer Typ, Aggregationen sowie Primärschlüssel und Fremdschlüssel. Die Daten sind in CSV-Dateien ad-hoc organisiert und folgen keiner festen Struktur, wie sie von gängigen Datenverarbeitungsprogrammen angenommen wird. Die Erkennung der Struktur solcher Dateien ist eine Voraussetzung für die Extraktion von Informationen aus ihnen, die durch die Erkennung des semantischen Typs jeder Zeile oder jeder Zelle, wie z. B. Kopfzeile, Daten, abgeleitete Daten oder Fußnote, angegangen werden kann. Wir schlagen den Ansatz des überwachten Lernens, genannt „Strudel“ vor, um den strukturellen Typ von Zeilen und Zellen zu klassifizieren. CSV-Dateien können auch Aggregationen enthalten. Eine Aggregation stellt die arithmetische Beziehung zwischen einer numerischen Zelle und einer Reihe anderer numerischer Zellen dar. Der von uns vorgeschlagene „Aggrecol“-Algorithmus ist in der Lage, Aggregationen von fünf arithmetischen Funktionen in CSV-Dateien zu erkennen. Da stilistische Merkmale wie Schriftart und Zellhintergrundfarbe in CSV-Dateien nicht vorhanden sind, die von uns vorgeschlagenen Algorithmen die entsprechenden Probleme, indem sie nur die Merkmale Inhalt, Kontext und Berechnungen verwenden. Die Speicherung einer relationalen Tabelle ist ebenfalls eine häufige Verwendung von CSV-Dateien. Primär- und Fremdschlüssel sind wichtige Metadaten für relationale Datenbanken, die bei Datenbankinstanzen, die als reine Textdateien gespeichert werden, normalerweise nicht vorhanden sind. Wir schlagen den „HoPF“-Algorithmus vor, um beide Constraints in relationalen Datenbanken ganzheitlich zu erkennen. Unser Ansatz ist in der Lage, echte Primär- und Fremdschlüssel von einer großen Menge an falschen eindeutigen Spaltenkombinationen und Einschlussabhängigkeiten zu unterscheiden, die von modernen Data-Profiling-Algorithmen erkannt werden können. KW - data preparation KW - metadata detection KW - data wrangling KW - Datenaufbereitung KW - Datentransformation KW - Erkennung von Metadaten Y1 - 2022 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-566204 ER - TY - JOUR A1 - Rosin, Paul L. A1 - Lai, Yu-Kun A1 - Mould, David A1 - Yi, Ran A1 - Berger, Itamar A1 - Doyle, Lars A1 - Lee, Seungyong A1 - Li, Chuan A1 - Liu, Yong-Jin A1 - Semmo, Amir A1 - Shamir, Ariel A1 - Son, Minjung A1 - Winnemöller, Holger T1 - NPRportrait 1.0: A three-level benchmark for non-photorealistic rendering of portraits JF - Computational visual media N2 - Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level, benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation for a wide variety of image stylisation methods (both portrait-specific and general purpose, and also both traditional NPR approaches and NST) using the new benchmark dataset. KW - non-photorealistic rendering (NPR) KW - image stylization KW - style transfer KW - portrait KW - evaluation KW - benchmark Y1 - 2022 U6 - https://doi.org/10.1007/s41095-021-0255-3 SN - 2096-0433 SN - 2096-0662 VL - 8 IS - 3 SP - 445 EP - 465 PB - Springer Nature CY - London ER - TY - CHAP A1 - Rojahn, Marcel A1 - Gronau, Norbert ED - Bui, Tung X. T1 - Openness indicators for the evaluation of digital platforms between the launch and maturity phase T2 - Proceedings of the 57th Annual Hawaii International Conference on System Sciences N2 - In recent years, the evaluation of digital platforms has become an important focus in the field of information systems science. The identification of influential indicators that drive changes in digital platforms, specifically those related to openness, is still an unresolved issue. This paper addresses the challenge of identifying measurable indicators and characterizing the transition from launch to maturity in digital platforms. It proposes a systematic analytical approach to identify relevant openness indicators for evaluation purposes. The main contributions of this study are the following (1) the development of a comprehensive procedure for analyzing indicators, (2) the categorization of indicators as evaluation metrics within a multidimensional grid-box model, (3) the selection and evaluation of relevant indicators, (4) the identification and assessment of digital platform architectures during the launch-to-maturity transition, and (5) the evaluation of the applicability of the conceptualization and design process for digital platform evaluation. KW - federated industrial platform ecosystems KW - technologies KW - business models KW - data-driven artifacts KW - design-science research KW - digital platform openness KW - evaluation KW - morphological analysis Y1 - 2024 SN - 978-0-99813-317-1 SP - 4516 EP - 4525 PB - Department of IT Management Shidler College of Business University of Hawaii CY - Honolulu, HI ER - TY - GEN A1 - Fandiño, Jorge T1 - Founded (auto)epistemic equilibrium logic satisfies epistemic splitting T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - In a recent line of research, two familiar concepts from logic programming semantics (unfounded sets and splitting) were extrapolated to the case of epistemic logic programs. The property of epistemic splitting provides a natural and modular way to understand programs without epistemic cycles but, surprisingly, was only fulfilled by Gelfond's original semantics (G91), among the many proposals in the literature. On the other hand, G91 may suffer from a kind of self-supported, unfounded derivations when epistemic cycles come into play. Recently, the absence of these derivations was also formalised as a property of epistemic semantics called foundedness. Moreover, a first semantics proved to satisfy foundedness was also proposed, the so-called Founded Autoepistemic Equilibrium Logic (FAEEL). In this paper, we prove that FAEEL also satisfies the epistemic splitting property something that, together with foundedness, was not fulfilled by any other approach up to date. To prove this result, we provide an alternative characterisation of FAEEL as a combination of G91 with a simpler logic we called Founded Epistemic Equilibrium Logic (FEEL), which is somehow an extrapolation of the stable model semantics to the modal logic S5. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1060 KW - answer set programming KW - epistemic specifications KW - epistemic logic programs Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-469685 SN - 1866-8372 IS - 1060 SP - 671 EP - 687 ER - TY - JOUR A1 - Cabalar, Pedro A1 - Fandiño, Jorge A1 - Fariñas del Cerro, Luis T1 - Splitting epistemic logic programs JF - Theory and practice of logic programming / publ. for the Association for Logic Programming N2 - Epistemic logic programs constitute an extension of the stable model semantics to deal with new constructs called subjective literals. Informally speaking, a subjective literal allows checking whether some objective literal is true in all or some stable models. As it can be imagined, the associated semantics has proved to be non-trivial, since the truth of subjective literals may interfere with the set of stable models it is supposed to query. As a consequence, no clear agreement has been reached and different semantic proposals have been made in the literature. Unfortunately, comparison among these proposals has been limited to a study of their effect on individual examples, rather than identifying general properties to be checked. In this paper, we propose an extension of the well-known splitting property for logic programs to the epistemic case. We formally define when an arbitrary semantics satisfies the epistemic splitting property and examine some of the consequences that can be derived from that, including its relation to conformant planning and to epistemic constraints. Interestingly, we prove (through counterexamples) that most of the existing approaches fail to fulfill the epistemic splitting property, except the original semantics proposed by Gelfond 1991 and a recent proposal by the authors, called Founded Autoepistemic Equilibrium Logic. KW - knowledge representation and nonmonotonic reasoning KW - logic programming methodology and applications KW - theory Y1 - 2021 U6 - https://doi.org/10.1017/S1471068420000058 SN - 1471-0684 SN - 1475-3081 VL - 21 IS - 3 SP - 296 EP - 316 PB - Cambridge Univ. Press CY - Cambridge [u.a.] ER - TY - GEN A1 - Aguado, Felicidad A1 - Cabalar, Pedro A1 - Fandiño, Jorge A1 - Pearce, David A1 - Perez, Gilberto A1 - Vidal, Concepcion T1 - Revisiting explicit negation in answer set programming T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - A common feature in Answer Set Programming is the use of a second negation, stronger than default negation and sometimes called explicit, strong or classical negation. This explicit negation is normally used in front of atoms, rather than allowing its use as a regular operator. In this paper we consider the arbitrary combination of explicit negation with nested expressions, as those defined by Lifschitz, Tang and Turner. We extend the concept of reduct for this new syntax and then prove that it can be captured by an extension of Equilibrium Logic with this second negation. We study some properties of this variant and compare to the already known combination of Equilibrium Logic with Nelson's strong negation. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1104 KW - Answer Set Programming KW - non-monotonic reasoning KW - Equilibrium logic KW - explicit negation Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-469697 SN - 1866-8372 IS - 1104 SP - 908 EP - 924 ER - TY - GEN A1 - Hägele, Claudia A1 - Schlagenhauf, Florian A1 - Rapp, Michael A. A1 - Sterzer, Philipp A1 - Beck, Anne A1 - Bermpohl, Felix A1 - Stoy, Meline A1 - Ströhle, Andreas A1 - Wittchen, Hans-Ulrich A1 - Dolan, Raymond J. A1 - Heinz, Andreas T1 - Dimensional psychiatry BT - reward dysfunction and depressive mood across psychiatric disorders T2 - Postprints der Universität Potsdam : Humanwissenschaftliche Reihe N2 - A dimensional approach in psychiatry aims to identify core mechanisms of mental disorders across nosological boundaries. We compared anticipation of reward between major psychiatric disorders, and investigated whether reward anticipation is impaired in several mental disorders and whether there is a common psychopathological correlate (negative mood) of such an impairment. We used functional magnetic resonance imaging (fMRI) and a monetary incentive delay (MID) task to study the functional correlates of reward anticipation across major psychiatric disorders in 184 subjects, with the diagnoses of alcohol dependence (n = 26), schizophrenia (n = 44), major depressive disorder (MDD, n = 24), bipolar disorder (acute manic episode, n = 13), attention deficit/hyperactivity disorder (ADHD, n = 23), and healthy controls (n = 54). Subjects' individual Beck Depression Inventory-and State-Trait Anxiety Inventory-scores were correlated with clusters showing significant activation during reward anticipation. During reward anticipation, we observed significant group differences in ventral striatal (VS) activation: patients with schizophrenia, alcohol dependence, and major depression showed significantly less ventral striatal activation compared to healthy controls. Depressive symptoms correlated with dysfunction in reward anticipation regardless of diagnostic entity. There was no significant correlation between anxiety symptoms and VS functional activation. Our findings demonstrate a neurobiological dysfunction related to reward prediction that transcended disorder categories and was related to measures of depressed mood. The findings underline the potential of a dimensional approach in psychiatry and strengthen the hypothesis that neurobiological research in psychiatric disorders can be targeted at core mechanisms that are likely to be implicated in a range of clinical entities. T3 - Zweitveröffentlichungen der Universität Potsdam : Humanwissenschaftliche Reihe - 653 KW - dimensional KW - fMRI KW - reward system KW - ventral striatum KW - monetary incentive delay task KW - depressive symptoms Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-431064 SN - 1866-8364 IS - 653 SP - 331 EP - 341 ER - TY - BOOK A1 - Kuban, Robert A1 - Rotta, Randolf A1 - Nolte, Jörg A1 - Chromik, Jonas A1 - Beilharz, Jossekin Jakob A1 - Pirl, Lukas A1 - Friedrich, Tobias A1 - Lenzner, Pascal A1 - Weyand, Christopher A1 - Juiz, Carlos A1 - Bermejo, Belen A1 - Sauer, Joao A1 - Coelh, Leandro dos Santos A1 - Najafi, Pejman A1 - Pünter, Wenzel A1 - Cheng, Feng A1 - Meinel, Christoph A1 - Sidorova, Julia A1 - Lundberg, Lars A1 - Vogel, Thomas A1 - Tran, Chinh A1 - Moser, Irene A1 - Grunske, Lars A1 - Elsaid, Mohamed Esameldin Mohamed A1 - Abbas, Hazem M. A1 - Rula, Anisa A1 - Sejdiu, Gezim A1 - Maurino, Andrea A1 - Schmidt, Christopher A1 - Hügle, Johannes A1 - Uflacker, Matthias A1 - Nozza, Debora A1 - Messina, Enza A1 - Hoorn, André van A1 - Frank, Markus A1 - Schulz, Henning A1 - Alhosseini Almodarresi Yasin, Seyed Ali A1 - Nowicki, Marek A1 - Muite, Benson K. A1 - Boysan, Mehmet Can A1 - Bianchi, Federico A1 - Cremaschi, Marco A1 - Moussa, Rim A1 - Abdel-Karim, Benjamin M. A1 - Pfeuffer, Nicolas A1 - Hinz, Oliver A1 - Plauth, Max A1 - Polze, Andreas A1 - Huo, Da A1 - Melo, Gerard de A1 - Mendes Soares, Fábio A1 - Oliveira, Roberto Célio Limão de A1 - Benson, Lawrence A1 - Paul, Fabian A1 - Werling, Christian A1 - Windheuser, Fabian A1 - Stojanovic, Dragan A1 - Djordjevic, Igor A1 - Stojanovic, Natalija A1 - Stojnev Ilic, Aleksandra A1 - Weidmann, Vera A1 - Lowitzki, Leon A1 - Wagner, Markus A1 - Ifa, Abdessatar Ben A1 - Arlos, Patrik A1 - Megia, Ana A1 - Vendrell, Joan A1 - Pfitzner, Bjarne A1 - Redondo, Alberto A1 - Ríos Insua, David A1 - Albert, Justin Amadeus A1 - Zhou, Lin A1 - Arnrich, Bert A1 - Szabó, Ildikó A1 - Fodor, Szabina A1 - Ternai, Katalin A1 - Bhowmik, Rajarshi A1 - Campero Durand, Gabriel A1 - Shevchenko, Pavlo A1 - Malysheva, Milena A1 - Prymak, Ivan A1 - Saake, Gunter ED - Meinel, Christoph ED - Polze, Andreas ED - Beins, Karsten ED - Strotmann, Rolf ED - Seibold, Ulrich ED - Rödszus, Kurt ED - Müller, Jürgen T1 - HPI Future SOC Lab – Proceedings 2019 N2 - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners. The HPI Future SOC Lab provides researchers with free of charge access to a complete infrastructure of state of the art hard and software. This infrastructure includes components, which might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from but not limited to the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies. This technical report presents results of research projects executed in 2019. Selected projects have presented their results on April 9th and November 12th 2019 at the Future SOC Lab Day events. N2 - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie. Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2019 vorgestellt. Ausgewählte Projekte stellten ihre Ergebnisse am 09. April und 12. November 2019 im Rahmen des Future SOC Lab Tags vor. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 158 KW - Future SOC Lab KW - research projects KW - multicore architectures KW - in-memory technology KW - cloud computing KW - machine learning KW - artifical intelligence KW - Future SOC Lab KW - Forschungsprojekte KW - Multicore Architekturen KW - In-Memory Technologie KW - Cloud Computing KW - maschinelles Lernen KW - künstliche Intelligenz Y1 - 2023 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-597915 SN - 978-3-86956-564-4 SN - 1613-5652 SN - 2191-1665 IS - 158 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - THES A1 - Richly, Keven T1 - Memory-efficient data management for spatio-temporal applications BT - workload-driven fine-grained configuration optimization for storing spatio-temporal data in columnar In-memory databases N2 - The wide distribution of location-acquisition technologies means that large volumes of spatio-temporal data are continuously being accumulated. Positioning systems such as GPS enable the tracking of various moving objects' trajectories, which are usually represented by a chronologically ordered sequence of observed locations. The analysis of movement patterns based on detailed positional information creates opportunities for applications that can improve business decisions and processes in a broad spectrum of industries (e.g., transportation, traffic control, or medicine). Due to the large data volumes generated in these applications, the cost-efficient storage of spatio-temporal data is desirable, especially when in-memory database systems are used to achieve interactive performance requirements. To efficiently utilize the available DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes structures). By considering horizontal data partitioning, we can independently apply different tuning options on a fine-grained level. However, the selection of cost and performance-balancing configurations is challenging, due to the vast number of possible setups consisting of mutually dependent individual decisions. In this thesis, we introduce multiple approaches to improve spatio-temporal data management by automatically optimizing diverse tuning options for the application-specific access patterns and data characteristics. Our contributions are as follows: (1) We introduce a novel approach to determine fine-grained table configurations for spatio-temporal workloads. Our linear programming (LP) approach jointly optimizes the (i) data compression, (ii) ordering, (iii) indexing, and (iv) tiering. We propose different models which address cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload, memory budgets, and data characteristics. To yield maintainable and robust configurations, we further extend our LP-based approach to incorporate reconfiguration costs as well as optimizations for multiple potential workload scenarios. (2) To optimize the storage layout of timestamps in columnar databases, we present a heuristic approach for the workload-driven combined selection of a data layout and compression scheme. By considering attribute decomposition strategies, we are able to apply application-specific optimizations that reduce the memory footprint and improve performance. (3) We introduce an approach that leverages past trajectory data to improve the dispatch processes of transportation network companies. Based on location probabilities, we developed risk-averse dispatch strategies that reduce critical delays. (4) Finally, we used the use case of a transportation network company to evaluate our database optimizations on a real-world dataset. We demonstrate that workload-driven fine-grained optimizations allow us to reduce the memory footprint (up to 71% by equal performance) or increase the performance (up to 90% by equal memory size) compared to established rule-based heuristics. Individually, our contributions provide novel approaches to the current challenges in spatio-temporal data mining and database research. Combining them allows in-memory databases to store and process spatio-temporal data more cost-efficiently. N2 - Durch die starke Verbreitung von Systemen zur Positionsbestimmung werden fortlaufend große Mengen an Bewegungsdaten mit einem räumlichen und zeitlichen Bezug gesammelt. Ortungssysteme wie GPS ermöglichen, die Bewegungen verschiedener Objekte (z. B. Personen oder Fahrzeuge) nachzuverfolgen. Diese werden in der Regel durch eine chronologisch geordnete Abfolge beobachteter Aufenthaltsorte repräsentiert. Die Analyse von Bewegungsmustern auf der Grundlage detaillierter Positionsinformationen schafft in unterschiedlichsten Branchen (z. B. Transportwesen, Verkehrssteuerung oder Medizin) die Möglichkeit Geschäftsentscheidungen und -prozesse zu verbessern. Aufgrund der großen Datenmengen, die bei diesen Anwendungen auftreten, stellt die kosteneffiziente Speicherung von Bewegungsdaten eine Herausforderung dar. Dies ist insbesondere der Fall, wenn Hauptspeicherdatenbanken zur Speicherung eingesetzt werden, um die Anforderungen bezüglich interaktiver Antwortzeiten zu erfüllen. Um die verfügbaren Speicherkapazitäten effizient zu nutzen, unterstützen moderne Datenbanksysteme verschiedene Optimierungsmöglichkeiten, um den Speicherbedarf zu reduzieren (z. B. durch Datenkomprimierung) oder die Performance zu erhöhen (z. B. durch Indexstrukturen). Dabei ermöglicht eine horizontale Partitionierung der Daten, dass unabhängig voneinander verschiedene Optimierungen feingranular auf einzelnen Bereichen der Daten angewendet werden können. Die Auswahl von Konfigurationen, die sowohl die Kosten als auch Leistungsanforderungen berücksichtigen, ist jedoch aufgrund der großen Anzahl möglicher Kombinationen -- die aus voneinander abhängigen Einzelentscheidungen bestehen -- komplex. In dieser Dissertation präsentieren wir mehrere Ansätze zur Verbesserung der Datenverwaltung, indem wir die Auswahl verschiedener Datenbankoptimierungen automatisch für die anwendungsspezifischen Zugriffsmuster und Dateneigenschaften anpassen. Diesbezüglich leistet die vorliegende Dissertation die folgenden Beiträge: (1) Wir stellen einen neuen Ansatz vor, um feingranulare Tabellenkonfigurationen für räumlich-zeitliche Workloads zu bestimmen. In diesem Zusammenhang optimiert unser Linear Programming (LP) Ansatz gemeinsam (i) die Datenkompression, (ii) die Sortierung, (iii) die Indizierung und (iv) die Datenplatzierung. Hierzu schlagen wir verschiedene Modelle mit unterschiedlichen Kostenabhängigkeiten vor, um optimierte Konfigurationen für einen gegebenen Workload, ein Speicherbudget und die vorliegenden Dateneigenschaften zu berechnen. Durch die Erweiterung des LP-basierten Ansatzes zur Berücksichtigung von Modifikationskosten und verschiedener potentieller Workloads ist es möglich, die Wartbarkeit und Robustheit der bestimmten Tabellenkonfiguration zu erhöhen. (2) Um die Speicherung von Timestamps in spalten-orientierten Datenbanken zu optimieren, stellen wir einen heuristischen Ansatz für die kombinierte Auswahl eines Speicherlayouts und eines Kompressionsschemas vor. Zudem sind wir durch die Berücksichtigung von Strategien zur Aufteilung von Attributen in der Lage, anwendungsspezifische Optimierungen anzuwenden, die den Speicherbedarf reduzieren und die Performance verbessern. (3) Wir stellen einen Ansatz vor, der in der Vergangenheit beobachtete Bewegungsmuster nutzt, um die Zuweisungsprozesse von Vermittlungsdiensten zur Personenbeförderung zu verbessern. Auf der Grundlage von Standortwahrscheinlichkeiten haben wir verschiedene Strategien für die Vergabe von Fahraufträgen an Fahrer entwickelt, die kritische Verspätungen reduzieren. (4) Abschließend haben wir unsere Datenbankoptimierungen anhand eines realen Datensatzes eines Transportdienstleisters evaluiert. In diesem Zusammenhang zeigen wir, dass wir durch feingranulare workload-basierte Optimierungen den Speicherbedarf (um bis zu 71% bei vergleichbarer Performance) reduzieren oder die Performance (um bis zu 90% bei gleichem Speicherverbrauch) im Vergleich zu regelbasierten Heuristiken verbessern können. Die einzelnen Beiträge stellen neuartige Ansätze für aktuelle Herausforderungen im Bereich des Data Mining und der Datenbankforschung dar. In Kombination ermöglichen sie eine kosteneffizientere Speicherung und Verarbeitung von Bewegungsdaten in Hauptspeicherdatenbanken. KW - spatio-temporal data management KW - trajectory data KW - columnar databases KW - in-memory data management KW - database tuning KW - spaltenorientierte Datenbanken KW - Datenbankoptimierung KW - Hauptspeicher Datenmanagement KW - Datenverwaltung für Daten mit räumlich-zeitlichem Bezug KW - Trajektoriendaten Y1 - 2024 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-635473 ER - TY - BOOK A1 - Meinel, Christoph A1 - Döllner, Jürgen Roland Friedrich A1 - Weske, Mathias A1 - Polze, Andreas A1 - Hirschfeld, Robert A1 - Naumann, Felix A1 - Giese, Holger A1 - Baudisch, Patrick A1 - Friedrich, Tobias A1 - Böttinger, Erwin A1 - Lippert, Christoph A1 - Dörr, Christian A1 - Lehmann, Anja A1 - Renard, Bernhard A1 - Rabl, Tilmann A1 - Uebernickel, Falk A1 - Arnrich, Bert A1 - Hölzle, Katharina T1 - Proceedings of the HPI Research School on Service-oriented Systems Engineering 2020 Fall Retreat N2 - Design and Implementation of service-oriented architectures imposes a huge number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for design and realization of complex web-based system. Both approaches allow for dynamic application adaptation as well as integration of enterprise application. Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns. The annual Ph.D. Retreat of the Research School provides each member the opportunity to present his/her current state of their research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment. N2 - Der Entwurf und die Realisierung dienstbasierender Architekturen wirft eine Vielzahl von Forschungsfragestellungen aus den Gebieten der Softwaretechnik, der Systemmodellierung und -analyse, sowie der Adaptierbarkeit und Integration von Applikationen auf. Komponentenorientierung und WebServices sind zwei Ansätze für den effizienten Entwurf und die Realisierung komplexer Web-basierender Systeme. Sie ermöglichen die Reaktion auf wechselnde Anforderungen ebenso, wie die Integration großer komplexer Softwaresysteme. "Service-Oriented Systems Engineering" repräsentiert die Symbiose bewährter Praktiken aus den Gebieten der Objektorientierung, der Komponentenprogrammierung, des verteilten Rechnen sowie der Geschäftsprozesse und berücksichtigt auch die Integration von Geschäftsanliegen und Informationstechnologien. Die Klausurtagung des Forschungskollegs "Service-oriented Systems Engineering" findet einmal jährlich statt und bietet allen Kollegiaten die Möglichkeit den Stand ihrer aktuellen Forschung darzulegen. Bedingt durch die Querschnittstruktur des Kollegs deckt dieser Bericht ein weites Spektrum aktueller Forschungsthemen ab. Dazu zählen unter anderem Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; sowie Services Specification, Composition, and Enactment. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 138 KW - Hasso Plattner Institute KW - research school KW - Ph.D. retreat KW - service-oriented systems engineering KW - Hasso-Plattner-Institut KW - Forschungskolleg KW - Klausurtagung KW - Service-oriented Systems Engineering Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-504132 SN - 978-3-86956-513-2 SN - 1613-5652 SN - 2191-1665 IS - 138 PB - Universitätsverlag Potsdam CY - Potsdam ER -