TY  - THES
A1  - Jiang, Lan
T1  - Discovering metadata in data files
N2  - It is estimated that data scientists spend up to 80% of the time exploring, cleaning, and transforming their data. A major reason for that expenditure is the lack of knowledge about the used data, which are often from different sources and have heterogeneous structures. As a means to describe various properties of data, metadata can help data scientists understand and prepare their data, saving time for innovative and valuable data analytics. However, metadata do not always exist: some data file formats are not capable of storing them; metadata were deleted for privacy concerns; legacy data may have been produced by systems that were not designed to store and handle meta- data. As data are being produced at an unprecedentedly fast pace and stored in diverse formats, manually creating metadata is not only impractical but also error-prone, demanding automatic approaches for metadata detection.

In this thesis, we are focused on detecting metadata in CSV files – a type of plain-text file that, similar to spreadsheets, may contain different types of content at arbitrary positions. We propose a taxonomy of metadata in CSV files and specifically address the discovery of three different metadata: line and cell type, aggregations, and primary keys and foreign keys.

Data are organized in an ad-hoc manner in CSV files, and do not follow a fixed structure, which is assumed by common data processing tools. Detecting the structure of such files is a prerequisite of extracting information from them, which can be addressed by detecting the semantic type, such as header, data, derived, or footnote, of each line or each cell. We propose the supervised- learning approach Strudel to detect the type of lines and cells. CSV files may also include aggregations. An aggregation represents the arithmetic relationship between a numeric cell and a set of other numeric cells. Our proposed AggreCol algorithm is capable of detecting aggregations of five arithmetic functions in CSV files. Note that stylistic features, such as font style and cell background color, do not exist in CSV files. Our proposed algorithms address the respective problems by using only content, contextual, and computational features.

Storing a relational table is also a common usage of CSV files. Primary keys and foreign keys are important metadata for relational databases, which are usually not present for database instances dumped as plain-text files. We propose the HoPF algorithm to holistically detect both constraints in relational databases. Our approach is capable of distinguishing true primary and foreign keys from a great amount of spurious unique column combinations and inclusion dependencies, which can be detected by state-of-the-art data profiling algorithms.
N2  - Schätzungen zufolge verbringen Datenwissenschaftler bis zu 80% ihrer Zeit mit der Erkundung, Bereinigung und Umwandlung ihrer Daten. Ein Hauptgrund für diesen Aufwand ist das fehlende Wissen über die verwendeten Daten, die oft aus unterschiedlichen Quellen stammen und heterogene Strukturen aufweisen.
Als Mittel zur Beschreibung verschiedener Dateneigenschaften können Metadaten Datenwissenschaftlern dabei helfen, ihre Daten zu verstehen und aufzubereiten, und so wertvolle Zeit die Datenanalysen selbst sparen.
Metadaten sind jedoch nicht immer vorhanden: Zum Beispiel sind einige Dateiformate nicht in der Lage, sie zu speichern; Metadaten können aus Datenschutzgründen gelöscht worden sein; oder ältere Daten wurden möglicherweise von Systemen erzeugt, die nicht für die Speicherung und Verarbeitung von Metadaten konzipiert waren. Da Daten in einem noch nie dagewesenen Tempo produziert und in verschiedenen Formaten gespeichert werden, ist die manuelle Erstellung von Metadaten nicht nur unpraktisch, sondern auch fehleranfällig, so dass automatische Ansätze zur Metadatenerkennung erforderlich sind.

In dieser Arbeit konzentrieren wir uns auf die Erkennung von Metadaten in CSV-Dateien - einer Art von Klartextdateien, die, ähnlich wie Tabellenkalkulationen, verschiedene Arten von Inhalten an beliebigen Positionen enthalten können. Wir schlagen eine Taxonomie der Metadaten in CSV-Dateien vor und befassen uns speziell mit der Erkennung von drei verschiedenen Metadaten: Zeile und Zellensemantischer Typ, Aggregationen sowie Primärschlüssel und Fremdschlüssel.

Die Daten sind in CSV-Dateien ad-hoc organisiert und folgen keiner festen Struktur, wie sie von gängigen Datenverarbeitungsprogrammen angenommen wird. Die Erkennung der Struktur solcher Dateien ist eine Voraussetzung für die Extraktion von Informationen aus ihnen, die durch die Erkennung des semantischen Typs jeder Zeile oder jeder Zelle, wie z. B. Kopfzeile, Daten, abgeleitete Daten oder Fußnote, angegangen werden kann. Wir schlagen den Ansatz des überwachten Lernens, genannt „Strudel“ vor, um den strukturellen Typ von Zeilen und Zellen zu klassifizieren. CSV-Dateien können auch Aggregationen enthalten. Eine Aggregation stellt die arithmetische Beziehung zwischen einer numerischen Zelle und einer Reihe anderer numerischer Zellen dar. Der von uns vorgeschlagene „Aggrecol“-Algorithmus ist in der Lage, Aggregationen von fünf arithmetischen Funktionen in CSV-Dateien zu erkennen. Da stilistische Merkmale wie Schriftart und Zellhintergrundfarbe in CSV-Dateien nicht vorhanden sind, die von uns vorgeschlagenen Algorithmen die entsprechenden Probleme, indem sie nur die Merkmale Inhalt, Kontext und Berechnungen verwenden.

Die Speicherung einer relationalen Tabelle ist ebenfalls eine häufige Verwendung von CSV-Dateien. Primär- und Fremdschlüssel sind wichtige Metadaten für relationale Datenbanken, die bei Datenbankinstanzen, die als reine Textdateien gespeichert werden, normalerweise nicht vorhanden sind. Wir schlagen den „HoPF“-Algorithmus vor, um beide Constraints in relationalen Datenbanken ganzheitlich zu erkennen. Unser Ansatz ist in der Lage, echte Primär- und Fremdschlüssel von einer großen Menge an falschen eindeutigen Spaltenkombinationen und Einschlussabhängigkeiten zu unterscheiden, die von modernen Data-Profiling-Algorithmen erkannt werden können.
KW  - data preparation
KW  - metadata detection
KW  - data wrangling
KW  - Datenaufbereitung
KW  - Datentransformation
KW  - Erkennung von Metadaten
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-566204
ER  - 
TY  - JOUR
A1  - Rosin, Paul L.
A1  - Lai, Yu-Kun
A1  - Mould, David
A1  - Yi, Ran
A1  - Berger, Itamar
A1  - Doyle, Lars
A1  - Lee, Seungyong
A1  - Li, Chuan
A1  - Liu, Yong-Jin
A1  - Semmo, Amir
A1  - Shamir, Ariel
A1  - Son, Minjung
A1  - Winnemöller, Holger
T1  - NPRportrait 1.0: A three-level benchmark for non-photorealistic rendering of portraits
JF  - Computational visual media
N2  - Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level, benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation for a wide variety of image stylisation methods (both portrait-specific and general purpose, and also both traditional NPR approaches and NST) using the new benchmark dataset.
KW  - non-photorealistic rendering (NPR)
KW  - image stylization
KW  - style transfer
KW  - portrait
KW  - evaluation
KW  - benchmark
Y1  - 2022
U6  - https://doi.org/10.1007/s41095-021-0255-3
SN  - 2096-0433
SN  - 2096-0662
VL  - 8
IS  - 3
SP  - 445
EP  - 465
PB  - Springer Nature
CY  - London
ER  - 
TY  - CHAP
A1  - Rojahn, Marcel
A1  - Gronau, Norbert
ED  - Bui, Tung X.
T1  - Openness indicators for the evaluation of digital platforms between the launch and maturity phase
T2  - Proceedings of the 57th Annual Hawaii International Conference on System Sciences
N2  - In recent years, the evaluation of digital platforms has become an important focus in the field of information systems science. The identification of influential indicators that drive changes in digital platforms, specifically those related to openness, is still an unresolved issue. This paper addresses the challenge of identifying measurable indicators and characterizing the transition from launch to maturity in digital platforms. It proposes a systematic analytical approach to identify relevant openness indicators for evaluation purposes. The main contributions of this study are the following (1) the development of a comprehensive procedure for analyzing indicators, (2) the categorization of indicators as evaluation metrics within a multidimensional grid-box model, (3) the selection and evaluation of relevant indicators, (4) the identification and assessment of digital platform architectures during the launch-to-maturity transition, and (5) the evaluation of the applicability of the conceptualization and design process for digital platform evaluation.
KW  - federated industrial platform ecosystems
KW  - technologies
KW  - business models
KW  - data-driven artifacts
KW  - design-science research
KW  - digital platform openness
KW  - evaluation
KW  - morphological analysis
Y1  - 2024
SN  - 978-0-99813-317-1
SP  - 4516
EP  - 4525
PB  - Department of IT Management Shidler College of Business University of Hawaii
CY  - Honolulu, HI
ER  - 
TY  - GEN
A1  - Fandiño, Jorge
T1  - Founded (auto)epistemic equilibrium logic satisfies epistemic splitting
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - In a recent line of research, two familiar concepts from logic programming semantics (unfounded sets and splitting) were extrapolated to the case of epistemic logic programs. The property of epistemic splitting provides a natural and modular way to understand programs without epistemic cycles but, surprisingly, was only fulfilled by Gelfond's original semantics (G91), among the many proposals in the literature. On the other hand, G91 may suffer from a kind of self-supported, unfounded derivations when epistemic cycles come into play. Recently, the absence of these derivations was also formalised as a property of epistemic semantics called foundedness. Moreover, a first semantics proved to satisfy foundedness was also proposed, the so-called Founded Autoepistemic Equilibrium Logic (FAEEL). In this paper, we prove that FAEEL also satisfies the epistemic splitting property something that, together with foundedness, was not fulfilled by any other approach up to date. To prove this result, we provide an alternative characterisation of FAEEL as a combination of G91 with a simpler logic we called Founded Epistemic Equilibrium Logic (FEEL), which is somehow an extrapolation of the stable model semantics to the modal logic S5.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1060 
KW  - answer set programming
KW  - epistemic specifications
KW  - epistemic logic programs
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-469685
SN  - 1866-8372
IS  - 1060
SP  - 671
EP  - 687
ER  - 
TY  - JOUR
A1  - Cabalar, Pedro
A1  - Fandiño, Jorge
A1  - Fariñas del Cerro, Luis
T1  - Splitting epistemic logic programs
JF  - Theory and practice of logic programming / publ. for the Association for Logic Programming
N2  - Epistemic logic programs constitute an extension of the stable model semantics to deal with new constructs called subjective literals. Informally speaking, a subjective literal allows checking whether some objective literal is true in all or some stable models. As it can be imagined, the associated semantics has proved to be non-trivial, since the truth of subjective literals may interfere with the set of stable models it is supposed to query. As a consequence, no clear agreement has been reached and different semantic proposals have been made in the literature. Unfortunately, comparison among these proposals has been limited to a study of their effect on individual examples, rather than identifying general properties to be checked. In this paper, we propose an extension of the well-known splitting property for logic programs to the epistemic case. We formally define when an arbitrary semantics satisfies the epistemic splitting property and examine some of the consequences that can be derived from that, including its relation to conformant planning and to epistemic constraints. Interestingly, we prove (through counterexamples) that most of the existing approaches fail to fulfill the epistemic splitting property, except the original semantics proposed by Gelfond 1991 and a recent proposal by the authors, called Founded Autoepistemic Equilibrium Logic.
KW  - knowledge representation and nonmonotonic reasoning
KW  - logic programming methodology and applications
KW  - theory
Y1  - 2021
U6  - https://doi.org/10.1017/S1471068420000058
SN  - 1471-0684
SN  - 1475-3081
VL  - 21
IS  - 3
SP  - 296
EP  - 316
PB  - Cambridge Univ. Press
CY  - Cambridge [u.a.]
ER  - 
TY  - GEN
A1  - Aguado, Felicidad
A1  - Cabalar, Pedro
A1  - Fandiño, Jorge
A1  - Pearce, David
A1  - Perez, Gilberto
A1  - Vidal, Concepcion
T1  - Revisiting explicit negation in answer set programming
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - A common feature in Answer Set Programming is the use of a second negation, stronger than default negation and sometimes called explicit, strong or classical negation. This explicit negation is normally used in front of atoms, rather than allowing its use as a regular operator. In this paper we consider the arbitrary combination of explicit negation with nested expressions, as those defined by Lifschitz, Tang and Turner. We extend the concept of reduct for this new syntax and then prove that it can be captured by an extension of Equilibrium Logic with this second negation. We study some properties of this variant and compare to the already known combination of Equilibrium Logic with Nelson's strong negation.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1104 
KW  - Answer Set Programming
KW  - non-monotonic reasoning
KW  - Equilibrium logic
KW  - explicit negation
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-469697
SN  - 1866-8372
IS  - 1104
SP  - 908
EP  - 924
ER  - 
TY  - GEN
A1  - Hägele, Claudia
A1  - Schlagenhauf, Florian
A1  - Rapp, Michael A.
A1  - Sterzer, Philipp
A1  - Beck, Anne
A1  - Bermpohl, Felix
A1  - Stoy, Meline
A1  - Ströhle, Andreas
A1  - Wittchen, Hans-Ulrich
A1  - Dolan, Raymond J.
A1  - Heinz, Andreas
T1  - Dimensional psychiatry
BT  - reward dysfunction and depressive mood across psychiatric disorders
T2  - Postprints der Universität Potsdam : Humanwissenschaftliche Reihe
N2  - A dimensional approach in psychiatry aims to identify core mechanisms of mental disorders across nosological boundaries.

We compared anticipation of reward between major psychiatric disorders, and investigated whether reward anticipation is impaired in several mental disorders and whether there is a common psychopathological correlate (negative mood) of such an impairment.

We used functional magnetic resonance imaging (fMRI) and a monetary incentive delay (MID) task to study the functional correlates of reward anticipation across major psychiatric disorders in 184 subjects, with the diagnoses of alcohol dependence (n = 26), schizophrenia (n = 44), major depressive disorder (MDD, n = 24), bipolar disorder (acute manic episode, n = 13), attention deficit/hyperactivity disorder (ADHD, n = 23), and healthy controls (n = 54). Subjects' individual Beck Depression Inventory-and State-Trait Anxiety Inventory-scores were correlated with clusters showing significant activation during reward anticipation.

During reward anticipation, we observed significant group differences in ventral striatal (VS) activation: patients with schizophrenia, alcohol dependence, and major depression showed significantly less ventral striatal activation compared to healthy controls. Depressive symptoms correlated with dysfunction in reward anticipation regardless of diagnostic entity. There was no significant correlation between anxiety symptoms and VS functional activation.

Our findings demonstrate a neurobiological dysfunction related to reward prediction that transcended disorder categories and was related to measures of depressed mood. The findings underline the potential of a dimensional approach in psychiatry and strengthen the hypothesis that neurobiological research in psychiatric disorders can be targeted at core mechanisms that are likely to be implicated in a range of clinical entities.
T3  - Zweitveröffentlichungen der Universität Potsdam : Humanwissenschaftliche Reihe - 653 
KW  - dimensional
KW  - fMRI
KW  - reward system
KW  - ventral striatum
KW  - monetary incentive delay task
KW  - depressive symptoms
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-431064
SN  - 1866-8364
IS  - 653
SP  - 331
EP  - 341
ER  - 
TY  - BOOK
A1  - Kuban, Robert
A1  - Rotta, Randolf
A1  - Nolte, Jörg
A1  - Chromik, Jonas
A1  - Beilharz, Jossekin Jakob
A1  - Pirl, Lukas
A1  - Friedrich, Tobias
A1  - Lenzner, Pascal
A1  - Weyand, Christopher
A1  - Juiz, Carlos
A1  - Bermejo, Belen
A1  - Sauer, Joao
A1  - Coelh, Leandro dos Santos
A1  - Najafi, Pejman
A1  - Pünter, Wenzel
A1  - Cheng, Feng
A1  - Meinel, Christoph
A1  - Sidorova, Julia
A1  - Lundberg, Lars
A1  - Vogel, Thomas
A1  - Tran, Chinh
A1  - Moser, Irene
A1  - Grunske, Lars
A1  - Elsaid, Mohamed Esameldin Mohamed
A1  - Abbas, Hazem M.
A1  - Rula, Anisa
A1  - Sejdiu, Gezim
A1  - Maurino, Andrea
A1  - Schmidt, Christopher
A1  - Hügle, Johannes
A1  - Uflacker, Matthias
A1  - Nozza, Debora
A1  - Messina, Enza
A1  - Hoorn, André van
A1  - Frank, Markus
A1  - Schulz, Henning
A1  - Alhosseini Almodarresi Yasin, Seyed Ali
A1  - Nowicki, Marek
A1  - Muite, Benson K.
A1  - Boysan, Mehmet Can
A1  - Bianchi, Federico
A1  - Cremaschi, Marco
A1  - Moussa, Rim
A1  - Abdel-Karim, Benjamin M.
A1  - Pfeuffer, Nicolas
A1  - Hinz, Oliver
A1  - Plauth, Max
A1  - Polze, Andreas
A1  - Huo, Da
A1  - Melo, Gerard de
A1  - Mendes Soares, Fábio
A1  - Oliveira, Roberto Célio Limão de
A1  - Benson, Lawrence
A1  - Paul, Fabian
A1  - Werling, Christian
A1  - Windheuser, Fabian
A1  - Stojanovic, Dragan
A1  - Djordjevic, Igor
A1  - Stojanovic, Natalija
A1  - Stojnev Ilic, Aleksandra
A1  - Weidmann, Vera
A1  - Lowitzki, Leon
A1  - Wagner, Markus
A1  - Ifa, Abdessatar Ben
A1  - Arlos, Patrik
A1  - Megia, Ana
A1  - Vendrell, Joan
A1  - Pfitzner, Bjarne
A1  - Redondo, Alberto
A1  - Ríos Insua, David
A1  - Albert, Justin Amadeus
A1  - Zhou, Lin
A1  - Arnrich, Bert
A1  - Szabó, Ildikó
A1  - Fodor, Szabina
A1  - Ternai, Katalin
A1  - Bhowmik, Rajarshi
A1  - Campero Durand, Gabriel
A1  - Shevchenko, Pavlo
A1  - Malysheva, Milena
A1  - Prymak, Ivan
A1  - Saake, Gunter
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Beins, Karsten
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Rödszus, Kurt
ED  - Müller, Jürgen
T1  - HPI Future SOC Lab – Proceedings 2019
N2  - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
  The HPI Future SOC Lab provides researchers with free of charge access to a complete infrastructure of state of the art hard and software. This infrastructure includes components, which might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from but not limited to the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
  This technical report presents results of research projects executed in 2019. Selected projects have presented their results on April 9th and November 12th 2019 at the Future SOC Lab Day events.
N2  - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie.
  Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. 
  In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2019 vorgestellt.  Ausgewählte Projekte stellten ihre Ergebnisse am 09. April und 12. November 2019 im Rahmen des Future SOC Lab Tags vor.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 158 
KW  - Future SOC Lab
KW  - research projects
KW  - multicore architectures
KW  - in-memory technology
KW  - cloud computing
KW  - machine learning
KW  - artifical intelligence
KW  - Future SOC Lab
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - künstliche Intelligenz
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-597915
SN  - 978-3-86956-564-4
SN  - 1613-5652
SN  - 2191-1665
IS  - 158
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Richly, Keven
T1  - Memory-efficient data management for spatio-temporal applications
BT  - workload-driven fine-grained configuration optimization for storing spatio-temporal data in columnar In-memory databases
N2  - The wide distribution of location-acquisition technologies means that large volumes of spatio-temporal data are continuously being accumulated. Positioning systems such as GPS enable the tracking of various moving objects' trajectories, which are usually represented by a chronologically ordered sequence of observed locations. The analysis of movement patterns based on detailed positional information creates opportunities for applications that can improve business decisions and processes in a broad spectrum of industries (e.g., transportation, traffic control, or medicine). Due to the large data volumes generated in these applications, the cost-efficient storage of spatio-temporal data is desirable, especially when in-memory database systems are used to achieve interactive performance requirements. 

To efficiently utilize the available DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes structures). By considering horizontal data partitioning, we can independently apply different tuning options on a fine-grained level. However, the selection of cost and performance-balancing configurations is challenging, due to the vast number of possible setups consisting of mutually dependent individual decisions. 

In this thesis, we introduce multiple approaches to improve spatio-temporal data management by automatically optimizing diverse tuning options for the application-specific access patterns and data characteristics. Our contributions are as follows:
(1) We introduce a novel approach to determine fine-grained table configurations for spatio-temporal workloads. Our linear programming (LP) approach jointly optimizes the (i) data compression, (ii) ordering, (iii) indexing, and (iv) tiering. We propose different models which address cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload, memory budgets, and data characteristics. To yield maintainable and robust configurations, we further extend our LP-based approach to incorporate reconfiguration costs as well as optimizations for multiple potential workload scenarios. 
(2) To optimize the storage layout of timestamps in columnar databases, we present a heuristic approach for the workload-driven combined selection of a data layout and compression scheme. By considering attribute decomposition strategies, we are able to apply application-specific optimizations that reduce the memory footprint and improve performance.   
(3) We introduce an approach that leverages past trajectory data to improve the dispatch processes of transportation network companies. Based on location probabilities, we developed risk-averse dispatch strategies that reduce critical delays.
(4) Finally, we used the use case of a transportation network company to evaluate our database optimizations on a real-world dataset. We demonstrate that workload-driven fine-grained optimizations allow us to reduce the memory footprint (up to 71% by equal performance) or increase the performance (up to 90% by equal memory size) compared to established rule-based heuristics. 

Individually, our contributions provide novel approaches to the current challenges in spatio-temporal data mining and database research. Combining them allows in-memory databases to store and process spatio-temporal data more cost-efficiently.
N2  - Durch die starke Verbreitung von Systemen zur Positionsbestimmung werden fortlaufend große Mengen an Bewegungsdaten mit einem räumlichen und zeitlichen Bezug gesammelt. Ortungssysteme wie GPS ermöglichen, die Bewegungen verschiedener Objekte (z. B. Personen oder Fahrzeuge) nachzuverfolgen. Diese werden in der Regel durch eine chronologisch geordnete Abfolge beobachteter Aufenthaltsorte repräsentiert. Die Analyse von Bewegungsmustern auf der Grundlage detaillierter Positionsinformationen schafft in unterschiedlichsten Branchen (z. B. Transportwesen, Verkehrssteuerung oder Medizin) die Möglichkeit Geschäftsentscheidungen und -prozesse zu verbessern. Aufgrund der großen Datenmengen, die bei diesen Anwendungen auftreten, stellt die kosteneffiziente Speicherung von Bewegungsdaten eine Herausforderung dar. Dies ist insbesondere der Fall, wenn Hauptspeicherdatenbanken zur Speicherung eingesetzt werden, um die Anforderungen bezüglich interaktiver Antwortzeiten zu erfüllen.

Um die verfügbaren Speicherkapazitäten effizient zu nutzen, unterstützen moderne Datenbanksysteme verschiedene Optimierungsmöglichkeiten, um den Speicherbedarf zu reduzieren (z. B. durch Datenkomprimierung) oder die Performance zu erhöhen (z. B. durch Indexstrukturen). Dabei ermöglicht eine horizontale Partitionierung der Daten, dass unabhängig voneinander verschiedene Optimierungen feingranular auf einzelnen Bereichen der Daten angewendet werden können. Die Auswahl von Konfigurationen, die sowohl die Kosten als auch Leistungsanforderungen berücksichtigen, ist jedoch aufgrund der großen Anzahl möglicher Kombinationen -- die aus voneinander abhängigen Einzelentscheidungen bestehen -- komplex.

In dieser Dissertation präsentieren wir mehrere Ansätze zur Verbesserung der Datenverwaltung, indem wir die Auswahl verschiedener Datenbankoptimierungen automatisch für die anwendungsspezifischen Zugriffsmuster und Dateneigenschaften anpassen. Diesbezüglich leistet die vorliegende Dissertation die folgenden Beiträge: (1) Wir stellen einen neuen Ansatz vor, um feingranulare Tabellenkonfigurationen für räumlich-zeitliche Workloads zu bestimmen. In diesem Zusammenhang optimiert unser Linear Programming (LP) Ansatz gemeinsam (i) die Datenkompression, (ii) die Sortierung, (iii) die Indizierung und (iv) die Datenplatzierung. Hierzu schlagen wir verschiedene Modelle mit unterschiedlichen Kostenabhängigkeiten vor, um optimierte Konfigurationen für einen gegebenen Workload, ein Speicherbudget und die vorliegenden Dateneigenschaften zu berechnen. Durch die Erweiterung des LP-basierten Ansatzes zur Berücksichtigung von Modifikationskosten und verschiedener potentieller Workloads ist es möglich, die Wartbarkeit und Robustheit der bestimmten Tabellenkonfiguration zu erhöhen.
(2) Um die Speicherung von Timestamps in spalten-orientierten Datenbanken zu optimieren, stellen wir einen heuristischen Ansatz für die kombinierte Auswahl eines Speicherlayouts und eines Kompressionsschemas vor. Zudem sind wir durch die Berücksichtigung von Strategien zur Aufteilung von Attributen in der Lage, anwendungsspezifische Optimierungen anzuwenden, die den Speicherbedarf reduzieren und die Performance verbessern.
(3) Wir stellen einen Ansatz vor, der in der Vergangenheit beobachtete Bewegungsmuster nutzt, um die Zuweisungsprozesse von Vermittlungsdiensten zur Personenbeförderung zu verbessern. Auf der Grundlage von Standortwahrscheinlichkeiten haben wir verschiedene Strategien für die Vergabe von Fahraufträgen an Fahrer entwickelt, die kritische Verspätungen reduzieren.
(4) Abschließend haben wir unsere Datenbankoptimierungen anhand eines realen Datensatzes eines Transportdienstleisters evaluiert. In diesem Zusammenhang zeigen wir, dass wir durch feingranulare workload-basierte Optimierungen den Speicherbedarf (um bis zu 71% bei vergleichbarer Performance) reduzieren oder die Performance (um bis zu 90% bei gleichem Speicherverbrauch) im Vergleich zu regelbasierten Heuristiken verbessern können.

Die einzelnen Beiträge stellen neuartige Ansätze für aktuelle Herausforderungen im Bereich des Data Mining und der Datenbankforschung dar. In Kombination ermöglichen sie eine kosteneffizientere Speicherung und Verarbeitung von Bewegungsdaten in Hauptspeicherdatenbanken.
KW  - spatio-temporal data management
KW  - trajectory data
KW  - columnar databases
KW  - in-memory data management
KW  - database tuning
KW  - spaltenorientierte Datenbanken
KW  - Datenbankoptimierung
KW  - Hauptspeicher Datenmanagement
KW  - Datenverwaltung für Daten mit räumlich-zeitlichem Bezug
KW  - Trajektoriendaten
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-635473
ER  - 
TY  - BOOK
A1  - Meinel, Christoph
A1  - Döllner, Jürgen Roland Friedrich
A1  - Weske, Mathias
A1  - Polze, Andreas
A1  - Hirschfeld, Robert
A1  - Naumann, Felix
A1  - Giese, Holger
A1  - Baudisch, Patrick
A1  - Friedrich, Tobias
A1  - Böttinger, Erwin
A1  - Lippert, Christoph
A1  - Dörr, Christian
A1  - Lehmann, Anja
A1  - Renard, Bernhard
A1  - Rabl, Tilmann
A1  - Uebernickel, Falk
A1  - Arnrich, Bert
A1  - Hölzle, Katharina
T1  - Proceedings of the HPI Research School on Service-oriented Systems Engineering 2020 Fall Retreat
N2  - Design and Implementation of service-oriented architectures imposes a huge number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for design and realization of complex web-based system. Both approaches allow for dynamic application adaptation as well as integration of enterprise application.

Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns.

The annual Ph.D. Retreat of the Research School provides each member the opportunity to present his/her current state of their research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
N2  - Der Entwurf und die Realisierung dienstbasierender Architekturen wirft eine Vielzahl von Forschungsfragestellungen aus den Gebieten der Softwaretechnik, der Systemmodellierung und -analyse, sowie der Adaptierbarkeit und Integration von Applikationen auf. Komponentenorientierung und WebServices sind zwei Ansätze für den effizienten Entwurf und die Realisierung komplexer Web-basierender Systeme. Sie ermöglichen die Reaktion auf wechselnde Anforderungen ebenso, wie die Integration großer komplexer Softwaresysteme.

"Service-Oriented Systems Engineering" repräsentiert die Symbiose bewährter Praktiken aus den Gebieten der Objektorientierung, der Komponentenprogrammierung, des verteilten Rechnen sowie der Geschäftsprozesse und berücksichtigt auch die Integration von Geschäftsanliegen und Informationstechnologien.

Die Klausurtagung des Forschungskollegs "Service-oriented Systems Engineering" findet einmal jährlich statt und bietet allen Kollegiaten die Möglichkeit den Stand ihrer aktuellen Forschung darzulegen. Bedingt durch die Querschnittstruktur des Kollegs deckt dieser Bericht ein weites Spektrum aktueller Forschungsthemen ab. Dazu zählen unter anderem Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; sowie Services Specification, Composition, and Enactment.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 138 
KW  - Hasso Plattner Institute
KW  - research school
KW  - Ph.D. retreat
KW  - service-oriented systems engineering
KW  - Hasso-Plattner-Institut
KW  - Forschungskolleg
KW  - Klausurtagung
KW  - Service-oriented Systems Engineering
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-504132
SN  - 978-3-86956-513-2
SN  - 1613-5652
SN  - 2191-1665
IS  - 138
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  -