TY  - JOUR
A1  - Şahin, Muhittin
A1  - Egloffstein, Marc
A1  - Bothe, Max
A1  - Rohloff, Tobias
A1  - Schenk, Nathanael
A1  - Schwerer, Florian
A1  - Ifenthaler, Dirk
T1  - Behavioral Patterns in Enterprise MOOCs at openSAP
JF  - EMOOCs 2021
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517350
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 281
EP  - 288
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Özdemir, Paker Doğu
A1  - Kurban, Caroline Fell
A1  - Pekkan, Zelha Tunç
T1  - MOOC-Based Online Instruction
BT  - A Case Study in Teacher Education
JF  - EMOOCs 2021
N2  - In a flipped learning approach, MOOC content can be used for online pre-class instruction, after which students can put the knowledge they gained from the MOOC into practice either synchronously or asynchronously. This study examined one such asynchronous course in teacher education. The course ran with 40 students over 13 weeks from February to May 2020. A case study approach was followed using mixed methods to assess the efficacy of the course. Quantitative data was gathered on achievement of learning outcomes, online engagement, and satisfaction. Qualitative data was gathered via student interviews, from which a thematic analysis was undertaken. From a combined analysis of the data, three themes emerged as pertinent to course efficacy: quality and quantity of communication and collaboration; suitability of the MOOC; and significance for career development.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-516900
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 17
EP  - 33
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Zuo, Zhe
T1  - From unstructured to structured: Context-based named entity mining from text
T1  - Von unstrukturiert zu strukturiert: Kontextbasierte Gewinnung benannter Entitäten von Text
N2  - With recent advances in the area of information extraction, automatically extracting structured information from vast amounts of unstructured textual data has become an important task, because it is infeasible for humans to capture all of this information manually. Named entities (e.g., persons, organizations, and locations), which are crucial components of texts, are usually the subjects of structured information from textual documents. Therefore, the task of named entity mining receives much attention. It consists of three major subtasks: named entity recognition, named entity linking, and relation extraction.

These three tasks build up an entire pipeline of a named entity mining system, where each of them has its challenges and can be employed for further applications. Named entity recognition is a fundamental task in the natural language processing domain; studies on it have a long history, and many existing approaches produce reliable results. The task aims to extract mentions of named entities in text and identify their types. Named entity linking has recently received much attention with the development of knowledge bases that contain rich information about entities. The goal is to disambiguate mentions of named entities and to link them to the corresponding entries in a knowledge base. Relation extraction, as the final step of named entity mining, is a highly challenging task that aims to extract semantic relations between named entities, e.g., the ownership relation between two companies.

In this thesis, we review the state of the art of the named entity mining domain in detail, including valuable features, techniques, and evaluation methodologies. Furthermore, we present two of our approaches, which focus on the named entity linking and relation extraction tasks, respectively.

To solve the named entity linking task, we propose the entity linking technique, BEL, which operates on a textual range of relevant terms and aggregates decisions from an ensemble of simple classifiers. Each of the classifiers operates on a randomly sampled subset of the above range. In extensive experiments on hand-labeled and benchmark datasets, our approach outperformed state-of-the-art entity linking techniques, both in terms of quality and efficiency. 

For the task of relation extraction, we focus on extracting a specific group of difficult relation types, business relations between companies. These relations can be used to gain valuable insight into the interactions between companies and perform complex analytics, such as predicting risk or valuating companies. Our semi-supervised strategy can extract business relations between companies based on only a few user-provided seed company pairs. By doing so, we also provide a solution for the problem of determining the direction of asymmetric relations, such as the ownership_of relation. We improve the reliability of the extraction process by using a holistic pattern identification method, which classifies the generated extraction patterns. Our experiments show that we can accurately and reliably extract new entity pairs occurring in the target relation by using as few as five labeled seed pairs.
N2  - Mit den jüngsten Fortschritten in den Gebieten der Informationsextraktion wird die automatisierte Extrahierung strukturierter Informationen aus einer unüberschaubaren Menge unstrukturierter Textdaten eine wichtige Aufgabe, deren manuelle Ausführung unzumutbar ist. Benannte Entitäten, (z.B. Personen, Organisationen oder Orte), essentielle Bestandteile in Texten, sind normalerweise der Gegenstand strukturierter Informationen aus Textdokumenten. Daher erhält die Aufgabe der Gewinnung benannter Entitäten viel Aufmerksamkeit. Sie besteht aus drei großen Unteraufgaben, nämlich Erkennung benannter Entitäten, Verbindung benannter Entitäten und Extraktion von Beziehungen.

Diese drei Aufgaben zusammen sind der Grundprozess eines Systems zur Gewinnung benannter Entitäten, wobei jede ihre eigene Herausforderung hat und für weitere Anwendungen eingesetzt werden kann. Als ein fundamentaler Aspekt in der Verarbeitung natürlicher Sprache haben Studien zur Erkennung benannter Entitäten eine lange Geschichte, und viele bestehende Ansätze erbringen verlässliche Ergebnisse. Die Aufgabe zielt darauf ab, Nennungen benannter Entitäten zu extrahieren und ihre Typen zu bestimmen. Verbindung benannter Entitäten hat in letzter Zeit durch die Entwicklung von Wissensdatenbanken, welche reiche Informationen über Entitäten enthalten, viel Aufmerksamkeit erhalten. Das Ziel ist es, Nennungen benannter Entitäten zu unterscheiden und diese mit dazugehörigen Einträgen in einer Wissensdatenbank zu verknüpfen. Der letzte Schritt der Gewinnung benannter Entitäten, die Extraktion von Beziehungen, ist eine stark anspruchsvolle Aufgabe, nämlich die Extraktion semantischer Beziehungen zwischen Entitäten, z.B. die Eigentümerschaft zwischen zwei Firmen.

In dieser Doktorarbeit arbeiten wir den aktuellen Stand der Wissenschaft in der Domäne der Gewinnung benannter Entitäten auf, unter anderem wertvolle Eigenschaften und Evaluationsmethoden. Darüber hinaus präsentieren wir zwei Ansätze von uns, die jeweils ihren Fokus auf die Verbindung benannter Entitäten sowie die Extraktion von Beziehungen legen.

Um die Aufgabe der Verbindung benannter Entitäten zu lösen, schlagen wir hier die Verbindungstechnik BEL vor, welche auf einer textuellen Bandbreite relevanter Begriffe agiert und Entscheidungen einer Kombination einfacher Klassifizierer aggregiert. Jeder dieser Klassifizierer arbeitet auf einer zufällig ausgewählten Teilmenge der obigen Bandbreite. In umfangreichen Experimenten mit handannotierten sowie Vergleichsdatensätzen hat unser Ansatz andere Lösungen zur Verbindung benannter Entitäten, die auf dem Stand der aktuellen Technik beruhen, sowohl in Bezug auf Qualität als auch Effizienz geschlagen.

Für die Aufgabe der Extraktion von Beziehungen fokussieren wir uns auf eine bestimmte Gruppe schwieriger Beziehungstypen, nämlich die Geschäftsbeziehungen zwischen Firmen. Diese Beziehungen können benutzt werden, um wertvolle Erkenntnisse über das Zusammenspiel von Firmen zu gewinnen und komplexe Analysen auszuführen, beispielsweise die Risikovorhersage oder Bewertung von Firmen. Unsere teilbeaufsichtigte Strategie kann Geschäftsbeziehungen zwischen Firmen anhand nur weniger nutzergegebener Startwerte von Firmenpaaren extrahieren. Dadurch bieten wir auch eine Lösung für das Problem der Richtungserkennung asymmetrischer Beziehungen, beispielsweise der Eigentumsbeziehung. Wir verbessern die Verlässlichkeit des Extraktionsprozesses, indem wir holistische Musteridentifikationsmethoden verwenden, welche die erstellten Extraktionsmuster klassifizieren. Unsere Experimente zeigen, dass wir neue Entitätenpaare akkurat und verlässlich in der Zielbeziehung mit bereits fünf bezeichneten Startpaaren extrahieren können.
KW  - named entity mining
KW  - information extraction
KW  - natural language processing
KW  - Gewinnung benannter Entitäten
KW  - Informationsextraktion
KW  - maschinelle Verarbeitung natürlicher Sprache
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-412576
ER  - 
TY  - BOOK
A1  - Zhang, Shuhao
A1  - Plauth, Max
A1  - Eberhardt, Felix
A1  - Polze, Andreas
A1  - Lehmann, Jens
A1  - Sejdiu, Gezim
A1  - Jabeen, Hajira
A1  - Servadei, Lorenzo
A1  - Möstl, Christian
A1  - Bär, Florian
A1  - Netzeband, André
A1  - Schmidt, Rainer
A1  - Knigge, Marlene
A1  - Hecht, Sonja
A1  - Prifti, Loina
A1  - Krcmar, Helmut
A1  - Sapegin, Andrey
A1  - Jaeger, David
A1  - Cheng, Feng
A1  - Meinel, Christoph
A1  - Friedrich, Tobias
A1  - Rothenberger, Ralf
A1  - Sutton, Andrew M.
A1  - Sidorova, Julia A.
A1  - Lundberg, Lars
A1  - Rosander, Oliver
A1  - Sköld, Lars
A1  - Di Varano, Igor
A1  - van der Walt, Estée
A1  - Eloff, Jan H. P.
A1  - Fabian, Benjamin
A1  - Baumann, Annika
A1  - Ermakova, Tatiana
A1  - Kelkel, Stefan
A1  - Choudhary, Yash
A1  - Cooray, Thilini
A1  - Rodríguez, Jorge
A1  - Medina-Pérez, Miguel Angel
A1  - Trejo, Luis A.
A1  - Barrera-Animas, Ari Yair
A1  - Monroy-Borja, Raúl
A1  - López-Cuevas, Armando
A1  - Ramírez-Márquez, José Emmanuel
A1  - Grohmann, Maria
A1  - Niederleithinger, Ernst
A1  - Podapati, Sasidhar
A1  - Schmidt, Christopher
A1  - Huegle, Johannes
A1  - de Oliveira, Roberto C. L.
A1  - Soares, Fábio Mendes
A1  - van Hoorn, André
A1  - Neumer, Tamas
A1  - Willnecker, Felix
A1  - Wilhelm, Mathias
A1  - Kuster, Bernhard
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Beins, Karsten
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Rödszus, Kurt
ED  - Müller, Jürgen
T1  - HPI Future SOC Lab – Proceedings 2017
N2  - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
  The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from, but not limited to, the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
  This technical report presents results of research projects executed in 2017. Selected projects presented their results on April 25th and November 15th, 2017, at the Future SOC Lab Day events.
N2  - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie.
  Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. 
  In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2017 vorgestellt.  Ausgewählte Projekte stellten ihre Ergebnisse am 25. April und 15. November 2017 im Rahmen der Future SOC Lab Tag Veranstaltungen vor.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 130 
KW  - Future SOC Lab
KW  - research projects
KW  - multicore architectures
KW  - In-Memory technology
KW  - cloud computing
KW  - machine learning
KW  - artificial intelligence
KW  - Future SOC Lab
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - Künstliche Intelligenz
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-433100
SN  - 978-3-86956-475-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 130
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Wolf, Johannes
T1  - Analysis and visualization of transport infrastructure based on large-scale geospatial mobile mapping data
T1  - Analyse und Visualisierung von Verkehrsinfrastruktur basierend auf großen Mobile-Mapping-Datensätzen
N2  - 3D point clouds are a universal and discrete digital representation of three-dimensional objects and environments. For geospatial applications, 3D point clouds have become a fundamental type of raw data acquired and generated using various methods and techniques. In particular, 3D point clouds serve as raw data for creating digital twins of the built environment.

This thesis concentrates on the research and development of concepts, methods, and techniques for preprocessing, semantically enriching, analyzing, and visualizing 3D point clouds for applications around transport infrastructure. It introduces a collection of preprocessing techniques that aim to harmonize raw 3D point cloud data, such as point density reduction and scan profile detection. Metrics such as local density, verticality, and planarity are calculated for later use. One of the key contributions tackles the problem of analyzing and deriving semantic information in 3D point clouds. Three different approaches are investigated: a geometric analysis, a machine learning approach operating on synthetically generated 2D images, and a machine learning approach operating on 3D point clouds without intermediate representation.

In the first application case, 2D image classification is applied and evaluated for mobile mapping data focusing on road networks to derive road marking vector data. The second application case investigates how 3D point clouds can be merged with ground-penetrating radar data for a combined visualization and to automatically identify atypical areas in the data. For example, the approach detects pavement regions with developing potholes. The third application case explores the combination of a 3D environment based on 3D point clouds with panoramic imagery to improve visual representation and the detection of 3D objects such as traffic signs.

The presented methods were implemented and tested based on software frameworks for 3D point clouds and 3D visualization. In particular, modules for metric computation, classification procedures, and visualization techniques were integrated into a modular pipeline-based C++ research framework for geospatial data processing, extended by Python machine learning scripts. All visualization and analysis techniques scale to large real-world datasets such as road networks of entire cities or railroad networks.

The thesis shows that some use cases allow taking advantage of established image vision methods to analyze images rendered from mobile mapping data efficiently. The two presented semantic classification methods working directly on 3D point clouds are use case independent and show similar overall accuracy when compared to each other. While the geometry-based method requires less computation time, the machine learning-based method supports arbitrary semantic classes but requires training the network with ground truth data. Both methods can be used in combination to gradually build this ground truth with manual corrections via a respective annotation tool.

This thesis contributes results for IT system engineering of applications, systems, and services that require spatial digital twins of transport infrastructure such as road networks and railroad networks based on 3D point clouds as raw data. It demonstrates the feasibility of fully automated data flows that map captured 3D point clouds to semantically classified models. This provides a key component for seamlessly integrated spatial digital twins in IT solutions that require up-to-date, object-based, and semantically enriched information about the built environment.
N2  - 3D-Punktwolken sind eine universelle und diskrete digitale Darstellung von dreidimensionalen Objekten und Umgebungen. Für raumbezogene Anwendungen sind 3D-Punktwolken zu einer grundlegenden Form von Rohdaten geworden, die mit verschiedenen Methoden und Techniken erfasst und erzeugt werden. Insbesondere dienen 3D-Punktwolken als Rohdaten für die Erstellung digitaler Zwillinge der bebauten Umwelt.

Diese Arbeit konzentriert sich auf die Erforschung und Entwicklung von Konzepten, Methoden und Techniken zur Vorverarbeitung, semantischen Anreicherung, Analyse und Visualisierung von 3D-Punktwolken für Anwendungen im Bereich der Verkehrsinfrastruktur. Es wird eine Sammlung von Vorverarbeitungstechniken vorgestellt, die auf die Harmonisierung von 3D-Punktwolken-Rohdaten abzielen, so z.B. die Reduzierung der Punktdichte und die Erkennung von Scanprofilen. Metriken wie bspw. die lokale Dichte, Vertikalität und Planarität werden zur späteren Verwendung berechnet. Einer der Hauptbeiträge befasst sich mit dem Problem der Analyse und Ableitung semantischer Informationen in 3D-Punktwolken. Es werden drei verschiedene Ansätze untersucht: Eine geometrische Analyse sowie zwei maschinelle Lernansätze, die auf synthetisch erzeugten 2D-Bildern, bzw. auf 3D-Punktwolken ohne Zwischenrepräsentation arbeiten.

Im ersten Anwendungsfall wird die 2D-Bildklassifikation für Mobile-Mapping-Daten mit Fokus auf Straßennetze angewendet und evaluiert, um Vektordaten für Straßenmarkierungen abzuleiten. Im zweiten Anwendungsfall wird untersucht, wie 3D-Punktwolken mit Bodenradardaten für eine kombinierte Visualisierung und automatische Identifikation atypischer Bereiche in den Daten zusammengeführt werden können. Der Ansatz erkennt zum Beispiel Fahrbahnbereiche mit entstehenden Schlaglöchern. Der dritte Anwendungsfall untersucht die Kombination einer 3D-Umgebung auf Basis von 3D-Punktwolken mit Panoramabildern, um die visuelle Darstellung und die Erkennung von 3D-Objekten wie Verkehrszeichen zu verbessern.

Die vorgestellten Methoden wurden auf Basis von Software-Frameworks für 3D-Punktwolken und 3D-Visualisierung implementiert und getestet. Insbesondere wurden Module für Metrikberechnungen, Klassifikationsverfahren und Visualisierungstechniken in ein modulares, pipelinebasiertes C++-Forschungsframework für die Geodatenverarbeitung integriert, das durch Python-Skripte für maschinelles Lernen erweitert wurde. Alle Visualisierungs- und Analysetechniken skalieren auf große reale Datensätze wie Straßennetze ganzer Städte oder Eisenbahnnetze.

Die Arbeit zeigt, dass es in einigen Anwendungsfällen möglich ist, die Vorteile etablierter Bildverarbeitungsmethoden zu nutzen, um aus Mobile-Mapping-Daten gerenderte Bilder effizient zu analysieren. Die beiden vorgestellten semantischen Klassifikationsverfahren, die direkt auf 3D-Punktwolken arbeiten, sind anwendungsfallunabhängig und zeigen im Vergleich zueinander eine ähnliche Gesamtgenauigkeit. Während die geometriebasierte Methode weniger Rechenzeit benötigt, unterstützt die auf maschinellem Lernen basierende Methode beliebige semantische Klassen, erfordert aber das Trainieren des Netzwerks mit Ground-Truth-Daten. Beide Methoden können in Kombination verwendet werden, um diese Ground Truth mit manuellen Korrekturen über ein entsprechendes Annotationstool schrittweise aufzubauen.

Diese Arbeit liefert Ergebnisse für das IT-System-Engineering von Anwendungen, Systemen und Diensten, die räumliche digitale Zwillinge von Verkehrsinfrastruktur wie Straßen- und Schienennetzen auf der Basis von 3D-Punktwolken als Rohdaten benötigen. Sie demonstriert die Machbarkeit von vollautomatisierten Datenflüssen, die erfasste 3D-Punktwolken auf semantisch klassifizierte Modelle abbilden. Dies stellt eine Schlüsselkomponente für nahtlos integrierte räumliche digitale Zwillinge in IT-Lösungen dar, die aktuelle, objektbasierte und semantisch angereicherte Informationen über die bebaute Umwelt benötigen.
KW  - 3D point cloud
KW  - geospatial data
KW  - mobile mapping
KW  - semantic classification
KW  - 3D visualization
KW  - 3D-Punktwolke
KW  - räumliche Geodaten
KW  - Mobile Mapping
KW  - semantische Klassifizierung
KW  - 3D-Visualisierung
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-536129
ER  - 
TY  - JOUR
A1  - Wittig, Alice
A1  - Miranda, Fabio Malcher
A1  - Hölzer, Martin
A1  - Altenburg, Tom
A1  - Bartoszewicz, Jakub Maciej
A1  - Beyvers, Sebastian
A1  - Dieckmann, Marius Alfred
A1  - Genske, Ulrich
A1  - Giese, Sven Hans-Joachim
A1  - Nowicka, Melania
A1  - Richard, Hugues
A1  - Schiebenhoefer, Henning
A1  - Schmachtenberg, Anna-Juliane
A1  - Sieben, Paul
A1  - Tang, Ming
A1  - Tembrockhaus, Julius
A1  - Renard, Bernhard Y.
A1  - Fuchs, Stephan
T1  - CovRadar
BT  - continuously tracking and filtering SARS-CoV-2 mutations for genomic surveillance
JF  - Bioinformatics
N2  - The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and to plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousands of sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast.
Y1  - 2022
U6  - https://doi.org/10.1093/bioinformatics/btac411
SN  - 1367-4803
SN  - 1367-4811
VL  - 38
IS  - 17
SP  - 4223
EP  - 4225
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Wiemker, Veronika
A1  - Bunova, Anna
A1  - Neufeld, Maria
A1  - Gornyi, Boris
A1  - Yurasova, Elena
A1  - Konigorski, Stefan
A1  - Kalinina, Anna
A1  - Kontsevaya, Anna
A1  - Ferreira-Borges, Carina
A1  - Probst, Charlotte
T1  - Pilot study to evaluate usability and acceptability of the 'Animated Alcohol Assessment Tool' in Russian primary healthcare
JF  - Digital health
N2  - Background and aims: Accurate and user-friendly assessment tools quantifying alcohol consumption are a prerequisite to effective prevention and treatment programmes, including Screening and Brief Intervention. Digital tools offer new potential in this field. We developed the ‘Animated Alcohol Assessment Tool’ (AAA-Tool), a mobile app providing an interactive version of the World Health Organization's Alcohol Use Disorders Identification Test (AUDIT) that facilitates the description of individual alcohol consumption via culturally informed animation features. This pilot study evaluated the Russia-specific version of the Animated Alcohol Assessment Tool with regard to (1) its usability and acceptability in a primary healthcare setting, (2) the plausibility of its alcohol consumption assessment results and (3) the adequacy of its Russia-specific vessel and beverage selection. Methods: Convenience samples of 55 patients (47% female) and 15 healthcare practitioners (80% female) in 2 Russian primary healthcare facilities self-administered the Animated Alcohol Assessment Tool and rated their experience on the Mobile Application Rating Scale – User Version. Usage data was automatically collected during app usage, and additional feedback on regional content was elicited in semi-structured interviews. Results: On average, patients completed the Animated Alcohol Assessment Tool in 6:38 min (SD = 2.49, range = 3.00–17.16). User satisfaction was good, with all subscale Mobile Application Rating Scale – User Version scores averaging >3 out of 5 points. A majority of patients (53%) and practitioners (93%) would recommend the tool to ‘many people’ or ‘everyone’. Assessed alcohol consumption was plausible, with a low number (14%) of logically impossible entries. Most patients reported the Animated Alcohol Assessment Tool to reflect all vessels (78%) and all beverages (71%) they typically used. 
Conclusion: High acceptability ratings by patients and healthcare practitioners, acceptable completion time, plausible alcohol usage assessment results and perceived adequacy of region-specific content underline the Animated Alcohol Assessment Tool's potential to provide a novel approach to alcohol assessment in primary healthcare. After its validation, the Animated Alcohol Assessment Tool might contribute to reducing alcohol-related harm by facilitating Screening and Brief Intervention implementation in Russia and beyond.
KW  - Alcohol use assessment
KW  - Alcohol Use Disorders Identification Test
KW  - screening tools
KW  - digital health
KW  - mobile applications
KW  - Russia
KW  - primary healthcare
KW  - usability
KW  - acceptability
Y1  - 2022
U6  - https://doi.org/10.1177/20552076211074491
SN  - 2055-2076
VL  - 8
PB  - Sage Publications
CY  - London
ER  - 
TY  - BOOK
A1  - Weber, Benedikt
T1  - Human pose estimation for decubitus prophylaxis
T1  - Verwendung von Posenabschätzung zur Dekubitusprophylaxe
N2  - Decubitus is one of the most relevant diseases in nursing and the most expensive to treat. It is caused by sustained pressure on tissue, so it particularly affects bed-bound patients. This work lays a foundation for pressure mattress-based decubitus prophylaxis by implementing a solution to the single-frame 2D Human Pose Estimation problem.
For this, methods of Deep Learning are employed. Two approaches are examined, a coarse-to-fine Convolutional Neural Network for direct regression of joint coordinates and a U-Net for the derivation of probability distribution heatmaps.

We conclude that training our models on a combined dataset of the publicly available Bodies at Rest and SLP data yields the best results. Furthermore, various preprocessing techniques are investigated, and a hyperparameter optimization is performed to discover an improved model architecture.
Another finding indicates that the heatmap-based approach outperforms direct regression.
This model achieves a mean per-joint position error of 9.11 cm for the Bodies at Rest data and 7.43 cm for the SLP data.
We find that it generalizes well on data from mattresses other than those seen during training but has difficulties detecting the arms correctly.

Additionally, we give a brief overview of the medical data annotation tool annoto we developed in the bachelor project and furthermore conclude that the Scrum framework and agile practices enhanced our development workflow.
N2  - Dekubitus ist eine der relevantesten Krankheiten in der Krankenpflege und die kostspieligste in der Behandlung. Sie wird durch anhaltenden Druck auf Gewebe verursacht, betrifft also insbesondere bettlägerige Patienten. Diese Arbeit legt eine Grundlage für druckmatratzenbasierte Dekubitusprophylaxe, indem eine Lösung für das Einzelbild-2D-Posenabschätzungsproblem implementiert wird.
Dafür werden Methoden des tiefen Lernens verwendet. Zwei Ansätze, basierend auf einem Gefalteten Neuronalen grob-zu-fein Netzwerk zur direkten Regression der Gelenkkoordinaten und auf einem U-Netzwerk zur Ableitung von Wahrscheinlichkeitsverteilungsbildern, werden untersucht.

Wir schlussfolgern, dass das Training unserer Modelle auf einem kombinierten Datensatz, bestehend aus den frei verfügbaren Bodies at Rest und SLP Daten, die besten Ergebnisse liefert. Weiterhin werden diverse Vorverarbeitungsverfahren untersucht und eine Hyperparameteroptimierung zum Finden einer verbesserten Modellarchitektur durchgeführt.
Der wahrscheinlichkeitsverteilungsbasierte Ansatz übertrifft die direkte Regression.
Dieses Modell erreicht einen durchschnittlichen Pro-Gelenk-Positionsfehler von 9,11 cm auf den Bodies at Rest und von 7,43 cm auf den SLP Daten. Wir sehen, dass es gut auf Daten anderer als der im Training verwendeten Matratzen funktioniert, aber Schwierigkeiten mit der korrekten Erkennung der Arme hat. 

Weiterhin geben wir eine kurze Übersicht des medizinischen Datenannotationstools annoto, welches wir im Zusammenhang mit dem Bachelorprojekt entwickelt haben, und schlussfolgern außerdem, dass Scrum und agile Praktiken unseren Entwicklungsprozess verbessert haben.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 153 
KW  - machine learning
KW  - deep learning
KW  - convolutional neural networks
KW  - pose estimation
KW  - decubitus
KW  - telemedicine
KW  - maschinelles Lernen
KW  - tiefes Lernen
KW  - gefaltete neuronale Netze
KW  - Posenabschätzung
KW  - Dekubitus
KW  - Telemedizin
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-567196
SN  - 978-3-86956-551-4
SN  - 1613-5652
SN  - 2191-1665
IS  - 153
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Vogel, Thomas
T1  - Model-driven engineering of self-adaptive software
T1  - Modellgetriebene Entwicklung von Selbst-Adaptiver Software
N2  - The development of self-adaptive software requires the engineering of an adaptation engine that controls the underlying adaptable software by a feedback loop. State-of-the-art approaches prescribe the feedback loop in terms of numbers, how the activities (e.g., monitor, analyze, plan, and execute (MAPE)) and the knowledge are structured to a feedback loop, and the type of knowledge. Moreover, the feedback loop is usually hidden in the implementation or framework and therefore not visible in the architectural design. Additionally, an adaptation engine often employs runtime models that either represent the adaptable software or capture strategic knowledge such as reconfiguration strategies. State-of-the-art approaches do not systematically address the interplay of such runtime models, which would otherwise allow developers to freely design the entire feedback loop.

This thesis presents ExecUtable RuntimE MegAmodels (EUREMA), an integrated model-driven engineering (MDE) solution that rigorously uses models for engineering feedback loops. EUREMA provides a domain-specific modeling language to specify and an interpreter to execute feedback loops. The language allows developers to freely design a feedback loop concerning the activities and runtime models (knowledge) as well as the number of feedback loops. It further supports structuring the feedback loops in the adaptation engine that follows a layered architectural style. Thus, EUREMA makes the feedback loops explicit in the design and enables developers to reason about design decisions. 

To address the interplay of runtime models, we propose the concept of a runtime megamodel, which is a runtime model that contains other runtime models as well as activities (e.g., MAPE) working on the contained models. This concept is the underlying principle of EUREMA. The resulting EUREMA (mega)models are kept alive at runtime and they are directly executed by the EUREMA interpreter to run the feedback loops. Interpretation provides the flexibility to dynamically adapt a feedback loop. In this context, EUREMA supports engineering self-adaptive software in which feedback loops run independently or in a coordinated fashion within the same layer as well as on top of each other in different layers of the adaptation engine. Moreover, we consider preliminary means to evolve self-adaptive software by providing a maintenance interface to the adaptation engine.

This thesis discusses in detail EUREMA by applying it to different scenarios such as single, multiple, and stacked feedback loops for self-repairing and self-optimizing the mRUBiS application. Moreover, it investigates the design and expressiveness of EUREMA, reports on experiments with a running system (mRUBiS) and with alternative solutions, and assesses EUREMA with respect to quality attributes such as performance and scalability.

The conducted evaluation provides evidence that EUREMA as an integrated and open MDE approach for engineering self-adaptive software seamlessly integrates the development and runtime environments using the same formalism to specify and execute feedback loops, supports the dynamic adaptation of feedback loops in layered architectures, and achieves an efficient execution of feedback loops by leveraging incrementality.
N2  - Die Entwicklung von selbst-adaptiven Softwaresystemen erfordert die Konstruktion einer geschlossenen Feedback Loop, die das System zur Laufzeit beobachtet und falls nötig anpasst. Aktuelle Konstruktionsverfahren schreiben eine bestimmte Feedback Loop im Hinblick auf Anzahl und Struktur vor. Die Struktur umfasst die vorhandenen Aktivitäten der Feedback Loop (z. B. Beobachtung, Analyse, Planung und Ausführung einer Adaption) und die Art des hierzu verwendeten Systemwissens. Dieses System- und zusätzlich das strategische Wissen (z. B. Adaptionsregeln) werden in der Regel in Laufzeitmodellen erfasst und in die Feedback Loop integriert. Aktuelle Verfahren berücksichtigen jedoch nicht systematisch die Laufzeitmodelle und deren Zusammenspiel, so dass Entwickler die Feedback Loop nicht frei entwerfen und gestalten können. Folglich wird die Feedback Loop während des Entwurfs der Softwarearchitektur häufig nicht explizit berücksichtigt. 

Diese Dissertation stellt mit EUREMA ein neues Konstruktionsverfahren für Feedback Loops vor. Basierend auf Prinzipien der modellgetriebenen Entwicklung (MDE) setzt EUREMA auf die konsequente Nutzung von Modellen für die Konstruktion, Ausführung und Adaption von selbst-adaptiven Softwaresystemen. Hierzu wird eine domänenspezifische Modellierungssprache (DSL) vorgestellt, mit der Entwickler die Feedback Loop frei entwerfen und gestalten können, d. h. ohne Einschränkung bezüglich der Aktivitäten, Laufzeitmodelle und Anzahl der Feedback Loops. Zusätzlich bietet die DSL eine Architektursicht auf das System, die die Feedback Loops berücksichtigt. Daher stellt die DSL Konstrukte zur Verfügung, mit denen Entwickler während des Entwurfs der Architektur die Feedback Loops explizit definieren und berücksichtigen können.

Um das Zusammenspiel der Laufzeitmodelle zu erfassen, wird das Konzept eines sogenannten Laufzeitmegamodells vorgeschlagen, das alle Aktivitäten und Laufzeitmodelle einer Feedback Loop erfasst. Dieses Konzept dient als Grundlage der vorgestellten DSL. Die bei der Konstruktion und mit der DSL erzeugten (Mega-)Modelle werden zur Laufzeit bewahrt und von einem Interpreter ausgeführt, um das spezifizierte Adaptionsverhalten zu realisieren. Der Interpreteransatz bietet die notwendige Flexibilität, um das Adaptionsverhalten zur Laufzeit anzupassen. Dies ermöglicht über die Entwicklung von Systemen mit mehreren Feedback Loops auf einer Ebene hinaus das Schichten von Feedback Loops im Sinne einer adaptiven Regelung. Zusätzlich bietet EUREMA eine Schnittstelle für Wartungsprozesse an, um das Adaptionsverhalten im laufenden System anzupassen.

Die Dissertation diskutiert den EUREMA-Ansatz und wendet diesen auf verschiedene Problemstellungen an, u. a. auf einzelne, mehrere und koordinierte als auch geschichtete Feedback Loops. Als Anwendungsbeispiel dient die Selbstheilung und Selbstoptimierung des Online-Marktplatzes mRUBiS. Für die Evaluierung von EUREMA werden Experimente mit dem laufenden mRUBiS und mit alternativen Lösungen durchgeführt, das Design und die Ausdrucksmächtigkeit der DSL untersucht und Qualitätsmerkmale wie Performanz und Skalierbarkeit betrachtet. Die Ergebnisse der Evaluierung legen nahe, dass EUREMA als integrierter und offener Ansatz für die Entwicklung selbst-adaptiver Softwaresysteme folgende Beiträge zum Stand der Technik leistet: eine nahtlose Integration der Entwicklungs- und Laufzeitumgebung durch die konsequente Verwendung von Modellen, die dynamische Anpassung des Adaptionsverhaltens in einer Schichtenarchitektur und eine effiziente Ausführung von Feedback Loops durch inkrementelle Verarbeitungsschritte.
KW  - model-driven engineering
KW  - self-adaptive software
KW  - domain-specific modeling
KW  - runtime models
KW  - software evolution
KW  - modellgetriebene Entwicklung
KW  - Selbst-Adaptive Software
KW  - Domänenspezifische Modellierung
KW  - Laufzeitmodelle
KW  - Software-Evolution
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-409755
ER  - 
TY  - JOUR
A1  - Vitagliano, Gerardo
A1  - Hameed, Mazhar
A1  - Jiang, Lan
A1  - Reisener, Lucas
A1  - Wu, Eugene
A1  - Naumann, Felix
T1  - Pollock: a data loading benchmark
JF  - Proceedings of the VLDB Endowment
N2  - Any system at play in a data-driven project has a fundamental requirement: the ability to load data. The de facto standard format to distribute and consume raw data is CSV. Yet, the plain text and flexible nature of this format make such files often difficult to parse and correctly load their content, requiring cumbersome data preparation steps. We propose a benchmark to assess the robustness of systems in loading data from non-standard CSV formats and with structural inconsistencies. First, we formalize a model to describe the issues that affect real-world files and use it to derive a systematic "pollution" process to generate dialects for any given grammar. Our benchmark leverages the pollution framework for the CSV format. To guide pollution, we have surveyed thousands of real-world, publicly available CSV files, recording the problems we encountered. We demonstrate the applicability of our benchmark by testing and scoring 16 different systems: popular CSV parsing frameworks, relational database tools, spreadsheet systems, and a data visualization tool.
Y1  - 2023
U6  - https://doi.org/10.14778/3594512.3594518
SN  - 2150-8097
VL  - 16
IS  - 8
SP  - 1870
EP  - 1882
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - THES
A1  - Vitagliano, Gerardo
T1  - Modeling the structure of tabular files for data preparation
T1  - Modellierung der Struktur von Tabellarische Dateien für die Datenaufbereitung
N2  - To manage tabular data files and leverage their content in a given downstream task, practitioners often design and execute complex transformation pipelines to prepare them. The complexity of such pipelines stems from different factors, including the nature of the preparation tasks, often exploratory or ad-hoc to specific datasets; the large repertory of tools, algorithms, and frameworks that practitioners need to master; and the volume, variety, and velocity of the files to be prepared. Metadata plays a fundamental role in reducing this complexity: characterizing a file assists end users in the design of data preprocessing pipelines, and furthermore paves the way for suggestion, automation, and optimization of data preparation tasks.
Previous research in the areas of data profiling, data integration, and data cleaning has focused on extracting and characterizing metadata regarding the content of tabular data files, i.e., about the records and attributes of tables. Content metadata are useful for the latter stages of a preprocessing pipeline, e.g., error correction, duplicate detection, or value normalization, but they require a properly formed tabular input. Therefore, these metadata are not relevant for the early stages of a preparation pipeline, i.e., to correctly parse tables out of files. In this dissertation, we turn our focus to what we call the structure of a tabular data file, i.e., the set of characters within a file that do not represent data values but are required to parse and understand the content of the file. We provide three different approaches to represent file structure: an explicit representation based on context-free grammars; an implicit representation based on file-wise similarity; and a learned representation based on machine learning.
In our first contribution, we use the grammar-based representation to characterize a set of over 3000 real-world csv files and identify multiple structural issues that let files deviate from the csv standard, e.g., by having inconsistent delimiters or containing multiple tables. We leverage our learnings about real-world files and propose Pollock, a benchmark to test how well systems parse csv files that have a non-standard structure, without any previous preparation. We report on our experiments on using Pollock to evaluate the performance of 16 real-world data management systems.
Next, we characterize the structure of files implicitly, by defining a measure of structural similarity for file pairs. We design a novel algorithm to compute this measure, which is based on a graph representation of the files' content. We leverage this algorithm and propose Mondrian, a graphical system to assist users in identifying layout templates in a dataset, i.e., classes of files that have the same structure and can therefore be prepared by applying the same preparation pipeline.
Finally, we introduce MaGRiTTE, a novel architecture that uses self-supervised learning to automatically learn structural representations of files in the form of vectorial embeddings at three different levels: cell level, row level, and file level. We experiment with the application of structural embeddings for several tasks, namely dialect detection, row classification, and data preparation effort estimation.
Our experimental results show that structural metadata, either identified explicitly on parsing grammars, derived implicitly as file-wise similarity, or learned with the help of machine learning architectures, is fundamental to automate several tasks, to scale up preparation to large quantities of files, and to provide repeatable preparation pipelines.
N2  - Anwender müssen häufig komplexe Pipelines zur Aufbereitung von tabellarischen Dateien entwerfen, um diese verwalten und ihre Inhalte für nachgelagerte Aufgaben nutzen zu können. Die Komplexität solcher Pipelines ergibt sich aus verschiedenen Faktoren, u.a. (i) aus der Art der Aufbereitungsaufgaben, die oft explorativ oder ad hoc für bestimmte Datensätze durchgeführt werden, (ii) aus dem großen Repertoire an Werkzeugen, Algorithmen und Frameworks, die von den Anwendern beherrscht werden müssen, sowie (iii) aus der Menge, der Größe und der Verschiedenartigkeit der aufzubereitenden Dateien. Metadaten spielen eine grundlegende Rolle bei der Verringerung dieser Komplexität: Die Charakterisierung einer Datei hilft den Nutzern bei der Gestaltung von Datenaufbereitungs-Pipelines und ebnet darüber hinaus den Weg für Vorschläge, Automatisierung und Optimierung von Datenaufbereitungsaufgaben. Bisherige Forschungsarbeiten in den Bereichen Data Profiling, Datenintegration und Datenbereinigung konzentrierten sich auf die Extraktion und Charakterisierung von Metadaten über die Inhalte der tabellarischen Dateien, d.h. über die Datensätze und Attribute von Tabellen. Inhalts-basierte Metadaten sind für die letzten Phasen einer Aufbereitungspipeline nützlich, z.B. für die Fehlerkorrektur, die Erkennung von Duplikaten oder die Normalisierung von Werten, aber sie erfordern eine korrekt geformte tabellarische Eingabe. Daher sind diese Metadaten für die frühen Phasen einer Aufbereitungspipeline, d.h. für das korrekte Parsen von Tabellen aus Dateien, nicht relevant. In dieser Dissertation konzentrieren wir uns auf das, was wir die Struktur einer tabellarischen Datei nennen, d.h. die Menge der Zeichen in einer Datei, die keine Datenwerte darstellen, aber erforderlich sind, um den Inhalt der Datei zu analysieren und zu verstehen.
Wir stellen drei verschiedene Ansätze zur Darstellung der Dateistruktur vor: eine explizite Darstellung auf der Grundlage kontextfreier Grammatiken, eine implizite Darstellung auf der Grundlage von Dateiähnlichkeiten und eine erlernte Darstellung auf der Grundlage von maschinellem Lernen. In unserem ersten Ansatz verwenden wir die grammatikbasierte Darstellung, um eine Menge von über 3000 realen CSV-Dateien zu charakterisieren und mehrere strukturelle Probleme zu identifizieren, die dazu führen, dass Dateien vom CSV-Standard abweichen, z.B. durch inkonsistente Begrenzungszeichen oder das Enthalten mehrerer Tabellen in einer einzelnen Datei. Wir nutzen unsere Erkenntnisse aus realen Dateien und schlagen Pollock vor, einen Benchmark, der testet, wie gut Systeme unaufbereitete CSV-Dateien parsen. Wir berichten über unsere Experimente zur Verwendung von Pollock, in denen wir die Leistung von 16 realen Datenverwaltungssystemen bewerten. Anschließend charakterisieren wir die Struktur von Dateien implizit, indem wir ein Maß für die strukturelle Ähnlichkeit von Dateipaaren definieren. Wir entwickeln einen neuartigen Algorithmus zur Berechnung dieses Maßes, der auf einer Graphen-basierten Darstellung des Dateiinhalts basiert. Wir nutzen diesen Algorithmus und schlagen Mondrian vor, ein grafisches System zur Unterstützung der Benutzer bei der Identifizierung von Layout-Vorlagen in einem Datensatz, d.h. von Dateiklassen, die die gleiche Struktur aufweisen und daher mit der gleichen Pipeline aufbereitet werden können. Schließlich stellen wir MaGRiTTE vor, eine neuartige Architektur, die selbstüberwachtes Lernen verwendet, um automatisch strukturelle Darstellungen von Dateien in Form von vektoriellen Einbettungen auf drei verschiedenen Ebenen zu lernen: auf Zellebene, auf Zeilenebene und auf Dateiebene.
Wir experimentieren mit der Anwendung von strukturellen Einbettungen für verschiedene Aufgaben, nämlich Dialekterkennung, Zeilenklassifizierung und der Schätzung des Aufwands für die Datenaufbereitung. Unsere experimentellen Ergebnisse zeigen, dass strukturelle Metadaten, die entweder explizit mit Hilfe von Parsing-Grammatiken identifiziert, implizit als Dateiähnlichkeit abgeleitet oder mit Machine-Learning-Architekturen erlernt werden, von grundlegender Bedeutung für die Automatisierung verschiedener Aufgaben, die Skalierung der Aufbereitung auf große Mengen von Dateien und die Bereitstellung wiederholbarer Aufbereitungspipelines sind.
KW  - data preparation
KW  - file structure
KW  - Datenaufbereitung
KW  - tabellarische Dateien
KW  - Dateistruktur
KW  - tabular data
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-624351
ER  - 
TY  - BOOK
A1  - van der Walt, Estee
A1  - Odun-Ayo, Isaac
A1  - Bastian, Matthias
A1  - Eldin Elsaid, Mohamed Esam
T1  - Proceedings of the Fifth HPI Cloud Symposium "Operating the Cloud" 2017
N2  - Every year, the Hasso Plattner Institute (HPI) invites guests from industry and academia to a collaborative scientific workshop on the topic Operating the Cloud. Our goal is to provide a forum for the exchange of knowledge and experience between industry and academia. Co-located with the event is the HPI’s Future SOC Lab day, which offers an additional attractive and conducive environment for scientific and industry related discussions. Operating the Cloud aims to be a platform for productive interactions of innovative ideas, visions, and upcoming technologies in the field of cloud operation and administration.

In these proceedings, the results of the fifth HPI cloud symposium Operating the Cloud 2017 are published. We thank the authors for exciting presentations and insights into their current work and research. Moreover, we look forward to more interesting submissions for the upcoming symposium in 2018.
N2  - Jedes Jahr lädt das Hasso-Plattner-Institut (HPI) Gäste aus der Industrie und der Wissenschaft zu einem kooperativen und wissenschaftlichen Symposium zum Thema Cloud Computing ein. Unser Ziel ist es, ein Forum für den Austausch von Wissen und Erfahrungen zwischen der Industrie und der Wissenschaft zu bieten. Parallel zur Veranstaltung findet der HPI Future SOC Lab Tag statt, der eine zusätzliche attraktive Umgebung für wissenschaftliche und branchenbezogene Diskussionen bietet. Das Symposium zielt darauf ab, eine Plattform für produktive Interaktionen von innovativen Ideen, Visionen und aufkommenden Technologien im Bereich von Cloud Computing zu bieten.

Anlässlich dieses Symposiums fordern wir die Einreichung von Forschungsarbeiten und Erfahrungsberichten. Dieser technische Bericht umfasst eine Zusammenstellung der im Rahmen des fünften HPI Cloud Symposiums "Operating the Cloud" 2017 angenommenen Forschungspapiere. Wir danken den Autoren für spannende Vorträge und Einblicke in ihre aktuelle Arbeit und Forschung. Darüber hinaus freuen wir uns auf weitere interessante Einreichungen für das kommende Symposium im Laufe des Jahres.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 122 
KW  - Sicherheit
KW  - verteilte Leistungsüberwachung
KW  - Identitätsmanagement
KW  - Leistungsmodelle von virtuellen Maschinen
KW  - Privatsphäre
KW  - security
KW  - distributed performance monitoring
KW  - identity management
KW  - performance models of virtual machines
KW  - privacy
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-411330
SN  - 978-3-86956-432-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 122
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Ulrich, Jens-Uwe
A1  - Lutfi, Ahmad
A1  - Rutzen, Kilian
A1  - Renard, Bernhard Y.
T1  - ReadBouncer
BT  - precise and scalable adaptive sampling for nanopore sequencing
JF  - Bioinformatics
N2  - Motivation: 
Nanopore sequencers allow targeted sequencing of interesting nucleotide sequences by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundant sequences by depleting overrepresented ones in silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space relying on fast graphics processing unit (GPU) base callers for real-time read rejection. Using nanopore long-read mapping tools is also not optimal when mapping shorter reads as usually analyzed in adaptive sampling applications.

Results: 
Here, we present a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low abundance sequences by its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes even reads belonging to large reference sequences while running on commodity hardware without GPUs, making adaptive sampling accessible for in-field researchers. ReadBouncer also provides a user-friendly interface and installer files for end-users without a bioinformatics background.
Y1  - 2022
U6  - https://doi.org/10.1093/bioinformatics/btac223
SN  - 1367-4803
SN  - 1367-4811
VL  - 38
IS  - SUPPL 1
SP  - 153
EP  - 160
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Trautmann, Justin
A1  - Zhou, Lin
A1  - Brahms, Clemens Markus
A1  - Tunca, Can
A1  - Ersoy, Cem
A1  - Granacher, Urs
A1  - Arnrich, Bert
T1  - TRIPOD
BT  - A treadmill walking dataset with IMU, pressure-distribution and photoelectric data for gait analysis
JF  - Data : open access ʻData in scienceʼ journal
N2  - Inertial measurement units (IMUs) enable easy-to-operate and low-cost data recording for gait analysis. When combined with treadmill walking, a large number of steps can be collected in a controlled environment without the need for a dedicated gait analysis laboratory. In order to evaluate existing and novel IMU-based gait analysis algorithms for treadmill walking, a reference dataset that includes IMU data as well as reliable ground truth measurements for multiple participants and walking speeds is needed. This article provides a reference dataset consisting of 15 healthy young adults who walked on a treadmill at three different speeds. Data were acquired using seven IMUs placed on the lower body, two different reference systems (Zebris FDMT-HQ and OptoGait), and two RGB cameras. Additionally, in order to validate an existing IMU-based gait analysis algorithm using the dataset, an adaptable modular data analysis pipeline was built. Our results show agreement between the pressure-sensitive Zebris and the photoelectric OptoGait system (r = 0.99), demonstrating the quality of our reference data. As a use case, the performance of an algorithm originally designed for overground walking was tested on treadmill data using the data pipeline. The accuracy of stride length and stride time estimations was comparable to that reported in other studies with overground data, indicating that the algorithm is equally applicable to treadmill data. The Python source code of the data pipeline is publicly available, and the dataset will be provided by the authors upon request, enabling future evaluations of IMU gait analysis algorithms without the need for recording new data.
KW  - inertial measurement unit
KW  - gait analysis algorithm
KW  - OptoGait
KW  - Zebris
KW  - data pipeline
KW  - public dataset
Y1  - 2021
U6  - https://doi.org/10.3390/data6090095
SN  - 2306-5729
VL  - 6
IS  - 9
PB  - MDPI
CY  - Basel
ER  - 
TY  - THES
A1  - Torcato Mordido, Gonçalo Filipe
T1  - Diversification, compression, and evaluation methods for generative adversarial networks
N2  - Generative adversarial networks (GANs) have been broadly applied to a wide range of application domains since their proposal. In this thesis, we propose several methods that aim to tackle different existing problems in GANs. Particularly, even though GANs are generally able to generate high-quality samples, the diversity of the generated set is often sub-optimal. Moreover, the common increase of the number of models in the original GANs framework, as well as their architectural sizes, introduces additional costs. Additionally, even though challenging, the proper evaluation of a generated set is an important direction to ultimately improve the generation process in GANs. We start by introducing two diversification methods that extend the original GANs framework to multiple adversaries to stimulate sample diversity in a generated set. Then, we introduce a new post-training compression method based on Monte Carlo methods and importance sampling to quantize and prune the weights and activations of pre-trained neural networks without any additional training. The previous method may be used to reduce the memory and computational costs introduced by increasing the number of models in the original GANs framework. Moreover, we use a similar procedure to quantize and prune gradients during training, which also reduces the communication costs between different workers in a distributed training setting. We introduce several topology-based evaluation methods to assess data generation in different settings, namely image generation and language generation. Our methods retrieve both single-valued and double-valued metrics, which, given a real set, may be used to broadly assess a generated set or separately evaluate sample quality and sample diversity, respectively. Moreover, two of our metrics use locality-sensitive hashing to accurately assess the generated sets of highly compressed GANs. 
The analysis of the compression effects in GANs paves the way for their efficient employment in real-world applications. Given their general applicability, the methods proposed in this thesis may be extended beyond the context of GANs. Hence, they may be generally applied to enhance existing neural networks and, in particular, generative frameworks.
N2  - Generative adversarial networks (GANs) wurden seit ihrer Einführung in einer Vielzahl von Anwendungsbereichen eingesetzt. In dieser Dissertation schlagen wir einige Verfahren vor, die darauf abzielen, verschiedene bestehende Probleme von GANs zu lösen. Insbesondere fokussieren wir uns auf das Problem, dass GANs zwar qualitativ hochwertige Samples generieren können, die Diversität aber oft sub-optimal ist. Darüber hinaus stellt die allgemein übliche Zunahme der Anzahl der Modelle unter dem ursprünglichen GAN-Framework sowie deren Modellgröße weitere Aufwendungskosten dar. Abschließend ist die richtige Evaluierung einer generierten Menge, wenn auch herausfordernd, eine wichtige Forschungsrichtung, um letztendlich den Generierungsprozess von GANs zu verbessern.

Wir beginnen mit der Einführung von zwei Diversifizierungsmethoden, die das ursprüngliche GAN-Framework um mehrere Gegenspieler erweitern, um die Diversität zu erhöhen. Um den zusätzlichen Speicher- und Rechenaufwand zu reduzieren, führen wir dann eine neue Kompressionsmethode ein. Diese Methode basiert auf Monte-Carlo-Methoden und Importance Sampling für das Quantisieren und Pruning der Gewichte und Aktivierungen von schon trainierten neuronalen Netzwerken ohne zusätzliches Trainieren. Wir erweitern die erwähnte Methode zusätzlich für das Quantisieren und Pruning von Gradienten während des Trainierens, was die Kommunikationskosten zwischen verschiedenen sogenannten „Workern“ in einer verteilten Trainingsumgebung reduziert.

Bezüglich der Bewertung der generierten Samples stellen wir mehrere topologiebasierte Evaluationsmethoden vor, die sich auf Bild- und Textgenerierung konzentrieren. Um verschiedene Anwendungsfälle zu erfassen, liefern unsere vorgestellten Methoden einwertige und doppelwertige Metriken. Diese können dazu genutzt werden, anhand einer Menge von echten Samples entweder die generierten Samples insgesamt oder die Qualität und Diversität der Samples getrennt zu bewerten. Außerdem verwenden zwei unserer vorgestellten Metriken sogenanntes locality-sensitive Hashing, um die generierten Samples von stark komprimierten GANs genau zu bewerten. Die Analyse von Kompressionseffekten in GANs ebnet den Weg für ihren effizienten Einsatz für reale Anwendungen.

Aufgrund der allgemeinen Anwendungsmöglichkeit von GANs können die in dieser Arbeit vorgestellten Methoden auch über den Kontext von GANs hinaus erweitert werden. Daher könnten sie allgemein auf existierende neuronale Netzwerke angewandt werden und insbesondere auf generative Frameworks.
KW  - deep learning
KW  - generative adversarial networks
KW  - erzeugende gegnerische Netzwerke
KW  - tiefes Lernen
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-535460
ER  - 
TY  - JOUR
A1  - Topali, Paraskevi
A1  - Chounta, Irene-Angelica
A1  - Ortega-Arranz, Alejandro
A1  - Villagrá-Sobrino, Sara L.
A1  - Martínez-Monés, Alejandra
T1  - CoFeeMOOC-v.2
BT  - Designing Contingent Feedback for Massive Open Online Courses
JF  - EMOOCs 2021
N2  - Providing adequate support to MOOC participants is often a challenging task due to the massiveness of the learner population and the asynchronous communication among peers and MOOC practitioners. This workshop aims at discussing common learners’ problems reported in the literature and reflecting on the design of adequate feedback interventions with the use of learning data. Our aim is three-fold: a) to pinpoint MOOC aspects that impact the planning of feedback, b) to explore the use of learning data in designing feedback strategies, and c) to propose design guidelines for developing and delivering scaffolding interventions for personalized feedback in MOOCs. To do so, we will carry out hands-on activities that aim to involve participants in interpreting learning data and using them to design adaptive feedback. This workshop appeals to researchers, practitioners and MOOC stakeholders who aim to provide contextualized scaffolding. We envision that this workshop will provide insights for bridging the gap between pedagogical theory and practice when it comes to feedback interventions in MOOCs.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517241
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 209
EP  - 217
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Taleb, Aiham
T1  - Self-supervised deep learning methods for medical image analysis
T1  - Selbstüberwachte Deep Learning Methoden für die medizinische Bildanalyse
N2  - Deep learning has seen widespread application in many domains, mainly for its ability to learn data representations from raw input data. Nevertheless, its success has so far been coupled with the availability of large annotated (labelled) datasets. This is a requirement that is difficult to fulfil in several domains, such as in medical imaging. Annotation costs form a barrier in extending deep learning to clinically-relevant use cases. The labels associated with medical images are scarce, since the generation of expert annotations of multimodal patient data at scale is non-trivial, expensive, and time-consuming. This substantiates the need for algorithms that learn from the increasing amounts of unlabeled data. Self-supervised representation learning algorithms offer a pertinent solution, as they allow solving real-world (downstream) deep learning tasks with fewer annotations. Self-supervised approaches leverage unlabeled samples to acquire generic features about different concepts, enabling annotation-efficient downstream task solving subsequently.
Nevertheless, medical images present multiple unique and inherent challenges for existing self-supervised learning approaches, which we seek to address in this thesis: (i) medical images are multimodal, and their multiple modalities are heterogeneous in nature and imbalanced in quantities, e.g. MRI and CT; (ii) medical scans are multi-dimensional, often in 3D instead of 2D; (iii) disease patterns in medical scans are numerous and their incidence exhibits a long-tail distribution, so it is oftentimes essential to fuse knowledge from different data modalities, e.g. genomics or clinical data, to capture disease traits more comprehensively; (iv) medical scans usually exhibit more uniform color density distributions, e.g. in dental X-rays, than natural images. Our proposed self-supervised methods meet these challenges, besides significantly reducing the amounts of required annotations.
We evaluate our self-supervised methods on a wide array of medical imaging applications and tasks. Our experimental results demonstrate the obtained gains in both annotation-efficiency and performance; our proposed methods outperform many approaches from related literature. Additionally, in case of fusion with genetic modalities, our methods also allow for cross-modal interpretability. In this thesis, we not only show that self-supervised learning is capable of mitigating manual annotation costs, but our proposed solutions also demonstrate how to better utilize it in the medical imaging domain. Progress in self-supervised learning has the potential to extend the application of deep learning algorithms to clinical scenarios.
N2  - Deep Learning findet in vielen Bereichen breite Anwendung, vor allem wegen seiner Fähigkeit, Datenrepräsentationen aus rohen Eingabedaten zu lernen. Dennoch war der Erfolg bisher an die Verfügbarkeit großer annotierter Datensätze geknüpft. Dies ist eine Anforderung, die in verschiedenen Bereichen, z. B. in der medizinischen Bildgebung, schwer zu erfüllen ist. Die Kosten für die Annotation stellen ein Hindernis für die Ausweitung des Deep Learning auf klinisch relevante Anwendungsfälle dar. Die mit medizinischen Bildern verbundenen Annotationen sind rar, da die Erstellung von Expertenannotationen für multimodale Patientendaten in großem Umfang nicht trivial, teuer und zeitaufwändig ist. Dies unterstreicht den Bedarf an Algorithmen, die aus den wachsenden Mengen an unbeschrifteten Daten lernen. Selbstüberwachte Algorithmen für das Repräsentationslernen bieten eine mögliche Lösung, da sie die Lösung realer (nachgelagerter) Deep-Learning-Aufgaben mit weniger Annotationen ermöglichen. Selbstüberwachte Ansätze nutzen unannotierte Stichproben, um generische Eigenschaften über verschiedene Konzepte zu erlangen und ermöglichen so eine annotationseffiziente Lösung nachgelagerter Aufgaben.
Medizinische Bilder stellen mehrere einzigartige und inhärente Herausforderungen für existierende selbstüberwachte Lernansätze dar, die wir in dieser Arbeit angehen wollen: (i) medizinische Bilder sind multimodal, und ihre verschiedenen Modalitäten sind von Natur aus heterogen und in ihren Mengen unausgewogen, z. B. MRT und CT; (ii) medizinische Scans sind mehrdimensional, oft in 3D statt in 2D; (iii) Krankheitsmuster in medizinischen Scans sind zahlreich und ihre Häufigkeit weist eine Long-Tail-Verteilung auf, so dass es oft unerlässlich ist, Wissen aus verschiedenen Datenmodalitäten, z. B. Genomik oder klinische Daten, zu verschmelzen, um Krankheitsmerkmale umfassender zu erfassen; (iv) medizinische Scans weisen in der Regel eine gleichmäßigere Farbdichteverteilung auf, z. B. in zahnmedizinischen Röntgenaufnahmen, als natürliche Bilder. Die von uns vorgeschlagenen selbstüberwachten Methoden adressieren diese Herausforderungen und reduzieren zudem die Menge der erforderlichen Annotationen erheblich.
Wir evaluieren unsere selbstüberwachten Methoden in verschiedenen Anwendungen und Aufgaben der medizinischen Bildgebung. Unsere experimentellen Ergebnisse zeigen, dass die von uns vorgeschlagenen Methoden sowohl die Effizienz der Annotation als auch die Leistung steigern und viele Ansätze aus der verwandten Literatur übertreffen. Darüber hinaus ermöglichen unsere Methoden im Falle der Fusion mit genetischen Modalitäten auch eine modalübergreifende Interpretierbarkeit. In dieser Arbeit zeigen wir nicht nur, dass selbstüberwachtes Lernen in der Lage ist, die Kosten für manuelle Annotationen zu senken, sondern auch, wie man es in der medizinischen Bildgebung besser nutzen kann. Fortschritte beim selbstüberwachten Lernen haben das Potenzial, die Anwendung von Deep-Learning-Algorithmen auf klinische Szenarien auszuweiten.
KW  - Artificial Intelligence
KW  - machine learning
KW  - unsupervised learning
KW  - representation learning
KW  - Künstliche Intelligenz
KW  - maschinelles Lernen
KW  - Repräsentationslernen
KW  - selbstüberwachtes Lernen
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-644089
ER  - 
TY  - THES
A1  - Taeumel, Marcel
T1  - Data-driven tool construction in exploratory programming environments
T1  - Datengetriebener Werkzeugbau in explorativen Programmierumgebungen
N2  - This work presents a new design for programming environments that promote the exploration of domain-specific software artifacts and the construction of graphical tools for such program comprehension tasks. In complex software projects, tool building is essential because domain- or task-specific tools can support decision making by representing concerns concisely with low cognitive effort. In contrast, generic tools can only support anticipated scenarios, which usually align with programming language concepts or well-known project domains.

However, the creation and modification of interactive tools is expensive because the glue that connects data to graphics is hard to find, change, and test. Even if valuable data is available in a common format and even if promising visualizations could be populated, programmers have to invest many resources to make changes in the programming environment. Consequently, only ideas of predictably high value will be implemented. In the non-graphical, command-line world, the situation looks different and inspiring: programmers can easily build their own tools as shell scripts by configuring and combining filter programs to process data.

We propose a new perspective on graphical tools and provide a concept to build and modify such tools with a focus on high quality, low effort, and continuous adaptability. That is, (1) we propose an object-oriented, data-driven, declarative scripting language that reduces the amount of glue code for view-model specifications and governs its effects, and (2) we propose a scalable UI-design language that promotes short feedback loops in an interactive, graphical environment such as Morphic, known from Self or Squeak/Smalltalk systems.

We implemented our concept as a tool building environment, which we call VIVIDE, on top of Squeak/Smalltalk and Morphic. We replaced existing code browsing and debugging tools to iterate within our solution more quickly. In several case studies with undergraduate and graduate students, we observed that VIVIDE can be applied to many domains such as live language development, source-code versioning, modular code browsing, and multi-language debugging. Then, we designed a controlled experiment to measure the effect on the time to build tools. Several pilot runs showed that training is crucial and, presumably, takes days or weeks, which implies a need for further research.

As a result, programmers as users can directly work with tangible representations of their software artifacts in the VIVIDE environment. Tool builders can write domain-specific scripts to populate views to approach comprehension tasks from different angles. Our novel perspective on graphical tools can inspire the creation of new trade-offs in modularity for both data providers and view designers.
N2  - Diese Arbeit schlägt einen neuartigen Entwurf für Programmierumgebungen vor, welche den Umgang mit domänenspezifischen Software-Artefakten erleichtern und die Konstruktion von unterstützenden, grafischen Werkzeugen fördern. Werkzeugbau ist in komplexen Software-Projekten ein essentieller Bestandteil, weil spezifische, auf Domäne und Aufgabe angepasste, Werkzeuge relevante Themen und Konzepte klar darstellen und somit effizient zur Entscheidungsfindung beitragen können. Im Gegensatz dazu sind vorhandene, traditionelle Werkzeuge nur an allgemeinen, wiederkehrenden Anforderungen ausgerichtet, welche im Spezialfall Gedankengänge nur unzureichend abbilden können.

Leider sind das Erstellen und Anpassen von interaktiven Werkzeugen teuer, weil die Beschreibungen zwischen Information und Repräsentation nur schwer auffindbar, änderbar und prüfbar sind. Selbst wenn relevante Daten verfügbar und vielversprechende Visualisierungen konfigurierbar sind, müssten Programmierer viele Ressourcen für das Verändern ihrer Programmierumgebungen investieren. Folglich können nur Ideen von hohem Wert umgesetzt werden, um diese Kosten zu rechtfertigen. Dabei sieht die Situation in der textuellen Welt der Kommandozeile sehr vielversprechend aus. Dort können Programmierer einfach ihre Werkzeuge in Form von Skripten anpassen und kleine Filterprogramme kombinieren, um Daten zu verarbeiten.

Wir stellen eine neuartige Perspektive auf grafische Werkzeuge vor und vermitteln dafür ein Konzept, um diese Werkzeuge mit geringem Aufwand und in hoher Qualität zu konstruieren. Im Detail beinhaltet das, erstens, eine objekt-orientierte, daten-getriebene, deklarative Skriptsprache, um die Programmierschnittstelle zwischen Information und Repräsentation zu vereinfachen. Zweitens ist dies eine skalierbare Entwurfssprache für Nutzerschnittstellen, welche kurze Feedback-Schleifen und Interaktivität kombiniert, wie es in den Umgebungen Self oder Squeak/Smalltalk typisch ist.

Wir haben unser Konzept in Form einer neuartigen Umgebung für Werkzeugbau mit Hilfe von Squeak/Smalltalk und Morphic umgesetzt. Die Umgebung trägt den Namen VIVIDE. Damit konnten wir die bestehenden Werkzeuge von Squeak für Quelltextexploration und -ausführung ersetzen, um unsere Lösung kontinuierlich zu verbessern. In mehreren Fallstudien mit Studenten konnten wir beobachten, dass sich VIVIDE in vielen Domänen anwenden lässt: interaktive Entwicklung von Programmiersprachen, modulare Versionierung und Exploration von Quelltext und Fehleranalyse von mehrsprachigen Systemen. Mit Blick auf zukünftige Forschung haben wir ebenfalls ein kontrolliertes Experiment entworfen. Nach einigen Testläufen stellte sich die Trainingsphase von VIVIDE als größte, und somit offene, Herausforderung heraus.

Im Ergebnis sind wir davon überzeugt, dass Programmierer in VIVIDE direkt mit greifbaren, interaktiven Darstellungen relevanter Software-Artefakte arbeiten können. Im Rahmen des Werkzeugbaus können Programmierer kompakte, angepasste Skripte schreiben, die Visualisierungen konfigurieren, um Programmieraufgaben spezifisch aus mehreren Blickwinkeln zu betrachten. Unsere neuartige Perspektive auf grafische Werkzeuge kann damit sowohl das Bereitstellen von Informationen, als auch den Entwurf interaktiver Grafik positiv beeinflussen.
KW  - programming
KW  - tool building
KW  - user interaction
KW  - exploration
KW  - liveness
KW  - immediacy
KW  - direct manipulation
KW  - scripting languages
KW  - Squeak/Smalltalk
KW  - Programmieren
KW  - Werkzeugbau
KW  - Nutzerinteraktion
KW  - Exploration
KW  - Lebendigkeit
KW  - Direkte Manipulation
KW  - Skriptsprachen
KW  - Squeak/Smalltalk
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-444289
ER  - 
TY  - THES
A1  - Shekhar, Sumit
T1  - Image and video processing based on intrinsic attributes
N2  - Advancements in computer vision techniques driven by machine learning have facilitated robust and efficient estimation of attributes such as depth, optical flow, albedo, and shading. To encapsulate all such underlying properties associated with images and videos, we evolve the concept of intrinsic images towards intrinsic attributes. Further, rapid hardware growth in the form of high-quality smartphone cameras, readily available depth sensors, mobile GPUs, or dedicated neural processing units has made image and video processing pervasive. In this thesis, we explore the synergies between the above two advancements and propose novel image and video processing techniques and systems based on them. To begin with, we investigate intrinsic image decomposition approaches and analyze how they can be implemented on mobile devices. We propose an approach that considers not only diffuse reflection but also specular reflection; it allows us to decompose an image into specularity, albedo, and shading on a resource-constrained system (e.g., smartphones or tablets) using the depth data provided by the built-in depth sensors. In addition, we explore how on-device depth data can further be used to add an immersive dimension to 2D photos, e.g., showcasing parallax effects via 3D photography. In this regard, we develop a novel system for interactive 3D photo generation and stylization on mobile devices. Further, we investigate how adaptive manipulation of baseline-albedo (i.e., chromaticity) can be used for efficient visual enhancement under low-lighting conditions. The proposed technique allows for interactive editing of enhancement settings while achieving improved quality and performance. We analyze the inherent optical flow and temporal noise as intrinsic properties of a video. We further propose two new techniques for applying the above intrinsic attributes for the purpose of consistent video filtering.
To this end, we investigate how to remove temporal inconsistencies perceived as flickering artifacts. One of the techniques does not require costly optical flow estimation, while both provide interactive consistency control. Using intrinsic attributes for image and video processing enables new solutions for mobile devices – a pervasive visual computing device – and will facilitate novel applications for Augmented Reality (AR), 3D photography, and video stylization. The proposed low-light enhancement techniques can also improve the accuracy of high-level computer vision tasks (e.g., face detection) under low-light conditions. Finally, our approach for consistent video filtering can extend a wide range of image-based processing for videos.
N2  - Fortschritte im Bereich der Computer-Vision-Techniken, die durch Maschinelles Lernen vorangetrieben werden, haben eine robuste und effiziente Schätzung von Attributen wie Tiefe, optischer Fluss, Albedo, und Schattierung ermöglicht. Um all diese zugrundeliegenden Eigenschaften von Bildern und Videos zu erfassen, entwickeln wir das Konzept der intrinsischen Bilder zu intrinsischen Attributen weiter. Darüber hinaus hat die rasante Entwicklung der Hardware in Form von hochwertigen Smartphone-Kameras, leicht verfügbaren Tiefensensoren, mobilen GPUs, oder speziellen neuronalen Verarbeitungseinheiten die Bild- und Videoverarbeitung allgegenwärtig gemacht. In dieser Arbeit erforschen wir die Synergien zwischen den beiden oben genannten Fortschritten und schlagen neue Bild- und Videoverarbeitungstechniken und -systeme vor, die auf ihnen basieren. Zunächst untersuchen wir intrinsische Bildzerlegungsansätze und analysieren, wie sie auf mobilen Geräten implementiert werden können. Wir schlagen einen Ansatz vor, der nicht nur die diffuse Reflexion, sondern auch die spiegelnde Reflexion berücksichtigt; er ermöglicht es uns, ein Bild auf einem ressourcenbeschränkten System (z. B. Smartphones oder Tablets) unter Verwendung der von den eingebauten Tiefensensoren bereitgestellten Tiefendaten in Spiegelung, Albedo und Schattierung zu zerlegen. Darüber hinaus erforschen wir, wie geräteinterne Tiefendaten genutzt werden können, um 2D-Fotos eine immersive Dimension hinzuzufügen, z. B. um Parallaxen-Effekte durch 3D-Fotografie darzustellen. In diesem Zusammenhang entwickeln wir ein neuartiges System zur interaktiven 3D-Fotoerstellung und -Stylisierung auf mobilen Geräten. Darüber hinaus untersuchen wir, wie eine adaptive Manipulation der Grundlinie-Albedo (d.h. der Farbintensität) für eine effiziente visuelle Verbesserung bei schlechten Lichtverhältnissen genutzt werden kann. 
Die vorgeschlagene Technik ermöglicht die interaktive Bearbeitung von Verbesserungseinstellungen bei verbesserter Qualität und Leistung. Wir analysieren den inhärenten optischen Fluss und die zeitliche Konsistenz als intrinsische Eigenschaften eines Videos. Darüber hinaus schlagen wir zwei neue Techniken zur Anwendung der oben genannten intrinsischen Attribute zum Zweck der konsistenten Videofilterung vor. Zu diesem Zweck untersuchen wir, wie zeitliche Inkonsistenzen, die als Flackerartefakte wahrgenommen werden, entfernt werden können. Eine der Techniken erfordert keine kostspielige optische Flussschätzung, während beide eine interaktive Konsistenzkontrolle bieten. Die Verwendung intrinsischer Attribute für die Bild- und Videoverarbeitung ermöglicht neue Lösungen für mobile Geräte - ein visuelles Computergerät, das aufgrund seiner weltweiten Verbreitung von großer Bedeutung ist - und wird neuartige Anwendungen für Augmented Reality (AR), 3D-Fotografie und Videostylisierung ermöglichen. Die vorgeschlagenen Low-Light-Enhancement-Techniken können auch die Genauigkeit von High-Level-Computer-Vision-Aufgaben (z. B. Objekt-Tracking) unter schlechten Lichtverhältnissen verbessern. Schließlich kann unser Ansatz zur konsistenten Videofilterung eine breite Palette von bildbasierten Verarbeitungen für Videos erweitern.
KW  - image processing
KW  - image-based rendering
KW  - non-photorealistic rendering
KW  - image stylization
KW  - computational photography
KW  - Bildverarbeitung
KW  - bildbasiertes Rendering
KW  - Non-photorealistic Rendering
KW  - Computational Photography
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-620049
ER  - 
TY  - JOUR
A1  - Seng, Cheyvuth
A1  - Carlon, May Kristine Jonson
A1  - Gayed, John Maurice
A1  - Cross, Jeffrey S.
T1  - Long-Term Effects of Short-Term Intervention Using MOOCs for Developing Cambodian Undergraduate Research Skills
JF  - EMOOCs 2021
N2  - Developing highly skilled researchers is essential to accelerate the economic progress of developing countries such as Cambodia in Southeast Asia. While there is continuing research investigating Cambodia’s potential to cultivate such a workforce, the circumstances of undergraduate students in public provincial universities do not receive ample attention. This is crucial as numerous multinational corporations are participating via foreign direct investments in special economic zones at the border provinces and need talented human resources in Cambodia as well as in neighboring Southeast Asian countries such as Thailand and Vietnam. Students’ research capability growth starts with their belief in their capacity to use the necessary information tools and their potential to succeed in research. In this research paper, we look at how such beliefs, specifically research self-efficacy and information literacy, can be developed through a short-term intervention that uses MOOCs and assess their long-term effects. Our previous research has shown that short-term training intervention has immediate positive effects on the undergraduate students’ self-efficacies in Cambodian public provincial universities. In this paper, we present the results of the follow-up study conducted sixteen months after the said short-term training intervention. Results from follow-up evaluations reveal that while students’ self-efficacies were significantly higher than before the short-term intervention was completed, they were lower than immediately after the intervention. Thus, while perfunctory interventions such as merely introducing the students to MOOCs and other relevant research tools over as little as three weeks can have significant positive effects, efforts must be made to sustain the benefits gained.
This implication is essential to developing countries such as Cambodia that need low-cost solutions with immediate positive results in developing human resources to conduct research, particularly in areas far from more developed capital cities.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-516929
VL  - 2021
SP  - 49
EP  - 62
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Seitz, Klara
A1  - Lincke, Jens
A1  - Rein, Patrick
A1  - Hirschfeld, Robert
T1  - Language and tool support for 3D crochet patterns
BT  - virtual crochet with a graph structure
N2  - Crochet is a popular handcraft all over the world. While other techniques such as knitting or weaving have received technical support over the years through machines, crochet is still a purely manual craft. Not only is the act of crocheting itself manual, but so is the process of creating instructions for new crochet patterns, which is barely supported by domain-specific digital solutions. This leads to unstructured and often also ambiguous and erroneous pattern instructions. In this report, we propose a concept to digitally represent crochet patterns. This format incorporates crochet techniques, which allows domain-specific support for crochet pattern designers during the pattern creation and instruction writing process. As contributions, we present a thorough domain analysis, the concept of a graph structure used as a domain-specific language to specify crochet patterns, and a prototype of a projectional editor using the graph as the representation format of patterns and a diagramming system to visualize them in 2D and 3D. By analyzing the domain, we learned about crochet techniques and pain points of designers in their pattern creation workflow. These insights are the basis on which we defined the pattern representation. In order to evaluate our concept, we built a prototype that demonstrates the feasibility of the concept, and we tested the software with professional crochet designers, who approved of the concept.
N2  - Häkeln ist eine weltweit verbreitete Handarbeitskunst. Obwohl andere Techniken wie Stricken und Weben über die Zeit maschinelle Unterstützung erhalten haben, ist Häkeln noch heute ein komplett manueller Vorgang. Nicht nur das Häkeln an sich, sondern auch der Prozess zur Anleitungserstellung von neuen Häkeldesigns ist kaum unterstützt mit digitalen Lösungen. In dieser Arbeit stellen wir ein Konzept vor, das Häkelanleitungen digital repräsentiert. Das entwickelte Format integriert Häkeltechniken, wodurch wir den Prozess des Anleitungsschreibens für Designer spezifisch für die Häkeldomäne unterstützen können. Als Beiträge analysieren wir umfassend die Häkeldomäne, entwickeln ein Konzept zur Repräsentation von Häkelanleitungen basierend auf einer Graphenstruktur als domänenspezifische Sprache und implementieren einen projektionalen Editor, der auf der besagten Graphenstruktur aufbaut und weiterhin die erstellten Anleitungen als schematische Darstellung in 2D und 3D visualisiert. Durch die Analyse der Domäne lernen wir Häkeltechniken und Schwachstellen beim Ablauf des Anleitungserstellens kennen. Basierend auf diesen Erkenntnissen entwickeln wir das digitale Format, um Anleitungen zu repräsentieren. Für die Evaluierung unseres Konzepts haben wir einen Prototypen implementiert, der die Machbarkeit demonstriert. Zudem haben wir die Software von professionellen Häkeldesignern testen lassen, die unsere Herangehensweise gutheißen.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 137 
KW  - crochet
KW  - visual language
KW  - tools
KW  - computer-aided design
KW  - Häkeln
KW  - visuelle Sprache
KW  - Werkzeuge
KW  - rechnerunterstütztes Konstruieren
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-492530
SN  - 978-3-86956-505-7
SN  - 1613-5652
SN  - 2191-1665
IS  - 137
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schwarzer, Ingo
A1  - Weiß-Saoumi, Said
A1  - Kittel, Roland
A1  - Friedrich, Tobias
A1  - Kaynak, Koraltan
A1  - Durak, Cemil
A1  - Isbarn, Andreas
A1  - Diestel, Jörg
A1  - Knittel, Jens
A1  - Franz, Marquart
A1  - Morra, Carlos
A1  - Stahnke, Susanne
A1  - Braband, Jens
A1  - Dittmann, Johannes
A1  - Griebel, Stephan
A1  - Krampf, Andreas
A1  - Link, Martin
A1  - Müller, Matthias
A1  - Radestock, Jens
A1  - Strub, Leo
A1  - Bleeke, Kai
A1  - Jehl, Leander
A1  - Kapitza, Rüdiger
A1  - Messadi, Ines
A1  - Schmidt, Stefan
A1  - Schwarz-Rüsch, Signe
A1  - Pirl, Lukas
A1  - Schmid, Robert
A1  - Friedenberger, Dirk
A1  - Beilharz, Jossekin Jakob
A1  - Boockmeyer, Arne
A1  - Polze, Andreas
A1  - Röhrig, Ralf
A1  - Schäbe, Hendrik
A1  - Thiermann, Ricky
T1  - RailChain
BT  - Abschlussbericht
N2  - The RailChain project designed, implemented, and experimentally evaluated a juridical recorder that is based on a distributed consensus protocol. That juridical blockchain recorder has been realized as a distributed ledger on board the advanced TrainLab (ICE-TD 605 017) of Deutsche Bahn.
For the project, a consortium consisting of DB Systel, Siemens, Siemens Mobility, the Hasso Plattner Institute for Digital Engineering, Technische Universität Braunschweig, TÜV Rheinland InterTraffic, and Spherity has been formed. These partners not only concentrated competencies in railway operation, computer science, regulation, and approval, but also combined experiences from industry, research from academia, and enthusiasm from startups.
Distributed ledger technologies (DLTs) define distributed databases and express a digital protocol for transactions between business partners without the need for a trusted intermediary. The implementation of a blockchain with real-time requirements for the local network of a railway system (e.g., interlocking or train) allows data in the distributed system to be logged verifiably in real time. For this, railway-specific assumptions can be leveraged to make modifications to standard blockchain protocols.
EULYNX and OCORA (Open CCS On-board Reference Architecture) are parts of a future European reference architecture for control command and signalling (CCS, Reference CCS Architecture – RCA). Both architectural concepts outline heterogeneous IT systems with components from multiple manufacturers. Such systems introduce novel challenges for the approved and safety-relevant CCS of railways, which so far have been considered for neither trackside nor on-board systems. Logging implementations, such as the common juridical recorder on vehicles, can no longer be realized as a central component of a single manufacturer. All centralized approaches are in question.
The research project RailChain is funded by the mFUND program and gives practical evidence that distributed consensus protocols are a proper means to immutably (for legal purposes) store state information of many system components from multiple manufacturers. The results of RailChain have been published, prototypically implemented, and experimentally evaluated in large-scale field tests on the advanced TrainLab. At the same time, the project showed how RailChain can be integrated into the trackside and on-board architecture given by OCORA and EULYNX.
Logged data can now be analysed sooner, and its trustworthiness is increased. This enables, e.g., auditable predictive maintenance, because it is ensured that data is authentic and unmodified at any point in time.
N2  - Das Projekt RailChain hat einen verteilten Juridical Recorder entworfen, implementiert und experimentell evaluiert, der auf einem echtzeitfähigen verteilten Konsensprotokoll basiert. Dieser Juridical Blockchain Recorder wurde als distributed ledger an Bord des advanced TrainLabs der Deutschen Bahn (ICE-TD 605 017) umgesetzt.
Für das Projekt hat sich ein Konsortium aus DB Systel, Siemens, Siemens Mobility, dem Hasso-Plattner-Institut für Digital Engineering, der Technischen Universität Braunschweig, sowie TÜV Rheinland InterTraffic und Spherity formiert und dabei Kompetenzen aus den Bereichen Bahnbetrieb, Informatik und Zulassungswesen gebündelt. Die Partner kombinieren Erfahrungen aus der Industrie und die akademische Forschung mit der Aufbruchstimmung aus dem Start-Up-Umfeld.
Distributed-Ledger-Technologien (DLTs) definieren verteilte Datenbanken und stellen ein digitales Protokoll für Transaktionen zwischen Geschäftspartnern dar, ohne dass ein Mittelsmann beteiligt sein müsste. Die Implementierung einer Blockchain mit Echtzeitanforderungen für das lokale Netzwerk einer Eisenbahnanlage (z. B. Stellwerk oder Zug) erlaubt es, die im verteilten System entstehenden Daten nachweislich in Echtzeit zu protokollieren. Dabei können eisenbahnspezifische Randbedingungen ausgenutzt werden, um Standard-Blockchain-Protokolle anzupassen.
EULYNX und OCORA (Open CCS On-board Reference Architecture) sind Bestandteile einer zukünftigen europäischen Referenzarchitektur für das Leit- und Sicherungssystem (Reference CCS Architecture – RCA, Control Command and Signalling – CCS). Beide Architekturkonzepte skizzieren herstellerübergreifende, komponentenbasierende heterogene IT-Systeme. Solche Systeme bergen neue Herausforderungen, die bislang im Kontext der zugelassenen, sicherheitsrelevanten Leit- und Sicherungstechnik der Bahn weder strecken- noch fahrzeugseitig adressiert werden mussten. Logbuch-Implementierungen, wie der gängige Juridical Recorder auf Fahrzeugen, können nun nicht mehr als zentrale Systemkomponente eines einzelnen Herstellers umgesetzt werden. Alle zentralisierten Lösungsansätze sind in Frage gestellt.
Das mFUND-geförderte Forschungsprojekt erbringt den praktischen Nachweis, dass Zustandsinformationen über eine Vielzahl von Systemkomponenten herstellerübergreifend und gerichtsfest mittels verteilten Konsensprotokollen gespeichert werden können. Ergebnisse von RailChain wurden publiziert, prototypisch implementiert und in großen Feldtests auf dem advanced TrainLab experimentell evaluiert. Gleichzeitig wurde aufgezeigt, wie sich RailChain in den mit OCORA und EULYNX vorgegebenen fahrzeug- und streckenseitigen Architekturentwurf integrieren lässt.
Daten können dadurch zeitnaher ausgewertet werden und gleichzeitig wird ihre Vertrauenswürdigkeit erhöht. Dies ermöglicht u. a. nachvollziehbare zustandsorientierte Wartung, denn es kann jederzeit sichergestellt werden, dass die Daten authentisch sind und auch nicht verändert wurden.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 152 
KW  - Distributed-Ledger-Technologie (DLT)
KW  - juridical recording
KW  - Konsensprotokolle
KW  - consensus protocols
KW  - Digitalisierung
KW  - digitalization
KW  - Bahnwesen
KW  - railways
KW  - Blockchain
KW  - asset management
KW  - selbstbestimmte Identitäten
KW  - self-sovereign identity
KW  - dezentrale Identitäten
KW  - decentral identities
KW  - überprüfbare Nachweise
KW  - verifiable credentials
KW  - Echtzeit
KW  - real-time
KW  - Standardisierung
KW  - standardization
KW  - Verlässlichkeit
KW  - dependability
KW  - Fehlertoleranz
KW  - fault tolerance
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-577409
SN  - 978-3-86956-550-7
SN  - 1613-5652
SN  - 2191-1665
IS  - 152
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Maximova, Maria
A1  - Giese, Holger
T1  - Invariant Analysis for Multi-Agent Graph Transformation Systems using k-Induction
N2  - The analysis of behavioral models such as Graph Transformation Systems (GTSs) is of central importance in model-driven engineering. However, GTSs often result in intractably large or even infinite state spaces and may be equipped with multiple or even infinitely many start graphs. To mitigate these problems, static analysis techniques based on finite symbolic representations of sets of states or paths thereof have been devised. We focus on the technique of k-induction for establishing invariants specified using graph conditions. To this end, k-induction generates symbolic paths backwards from a symbolic state representing a violation of a candidate invariant to gather information on how that violation could have been reached, possibly obtaining contradictions to assumed invariants. However, GTSs where multiple agents regularly perform actions independently of each other cannot currently be analyzed using this technique, as the independence among backward steps may prevent the gathering of relevant knowledge altogether.

In this paper, we extend k-induction to GTSs with multiple agents, thereby supporting a wide range of additional GTSs. As a running example, we consider an unbounded number of shuttles driving on a large-scale track topology, which adjust their velocity to speed limits to avoid derailing. As our central contribution, we develop pruning techniques based on causality and independence among backward steps and verify that k-induction remains sound under this adaptation and terminates in cases where it did not terminate before.
N2  - Die Analyse von Verhaltensmodellen wie Graphtransformationssystemen (GTSs) ist von zentraler Bedeutung im Model Driven Engineering. GTSs führen jedoch häufig zu unhandhabbar großen oder sogar unendlichen Zustandsräumen und können mit mehreren oder sogar unendlich vielen Startgraphen ausgestattet sein. Um diese Probleme abzumildern, wurden statische Analysetechniken entwickelt, die auf endlichen symbolischen Darstellungen von Mengen von Zuständen oder Pfaden basieren. Wir konzentrieren uns auf die Technik der k-Induktion zur Ermittlung von Invarianten, die unter Verwendung von Graphbedingungen spezifiziert sind. Zum Zweck der Analyse erzeugt die k-Induktion symbolische Rückwärtspfade von einem symbolischen Zustand, der eine Verletzung einer Kandidateninvariante darstellt, um Informationen darüber zu sammeln, wie diese Verletzung erreicht werden konnte, wodurch möglicherweise Widersprüche zu angenommenen Invarianten gefunden werden. GTSs, bei denen mehrere Agenten regelmäßig unabhängig voneinander Aktionen ausführen, können derzeit jedoch nicht mit dieser Technik analysiert werden, da die Unabhängigkeit zwischen Rückwärtsschritten das Sammeln von relevantem Wissen möglicherweise verhindert.

In diesem Artikel erweitern wir die k-Induktion auf GTSs mit mehreren Agenten und unterstützen dadurch eine breite Palette zusätzlicher GTSs. Als laufendes Beispiel betrachten wir eine unbegrenzte Anzahl von Shuttles, die auf einer großen Tracktopologie fahren und die ihre Geschwindigkeit an Geschwindigkeitsbegrenzungen anpassen, um ein Entgleisen zu vermeiden. Als zentralen Beitrag entwickeln wir Beschneidungstechniken basierend auf Kausalität und Unabhängigkeit zwischen Rückwärtsschritten und verifizieren, dass die k-Induktion unter dieser Anpassung korrekt bleibt und in Fällen terminiert, in denen sie zuvor nicht terminierte.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 143 
KW  - k-inductive invariant checking
KW  - causality
KW  - parallel and sequential independence
KW  - symbolic analysis
KW  - bounded backward model checking
KW  - k-induktive Invariantenprüfung
KW  - Kausalität
KW  - parallele und sequentielle Unabhängigkeit
KW  - symbolische Analyse
KW  - Bounded Backward Model Checking
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-545851
SN  - 978-3-86956-531-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 143
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Maximova, Maria
A1  - Giese, Holger
T1  - Probabilistic metric temporal graph logic
N2  - Cyber-physical systems often encompass complex concurrent behavior with timing constraints and probabilistic failures on demand. The analysis of whether such systems with probabilistic timed behavior adhere to a given specification is essential. When the states of the system can be represented by graphs, the rule-based formalism of Probabilistic Timed Graph Transformation Systems (PTGTSs) can be used to suitably capture structure dynamics as well as probabilistic and timed behavior of the system. The model checking support for PTGTSs w.r.t. properties specified using Probabilistic Timed Computation Tree Logic (PTCTL) has already been presented. Moreover, for timed graph-based runtime monitoring, Metric Temporal Graph Logic (MTGL) has been developed for stating metric temporal properties on identified subgraphs and their structural changes over time.

In this paper, we (a) extend MTGL to the Probabilistic Metric Temporal Graph Logic (PMTGL) by allowing for the specification of probabilistic properties, (b) adapt our MTGL satisfaction checking approach to PTGTSs, and (c) combine the approaches for PTCTL model checking and MTGL satisfaction checking to obtain a Bounded Model Checking (BMC) approach for PMTGL. In our evaluation, we apply an implementation of our BMC approach in AutoGraph to a running example.
N2  - Cyber-physische Systeme umfassen häufig ein komplexes nebenläufiges Verhalten mit Zeitbeschränkungen und probabilistischen Fehlern auf Anforderung. Die Analyse, ob solche Systeme mit probabilistischem gezeitetem Verhalten einer vorgegebenen Spezifikation entsprechen, ist essentiell. Wenn die Zustände des Systems durch Graphen dargestellt werden können, kann der regelbasierte Formalismus von probabilistischen gezeiteten Graphtransformationssystemen (PTGTSs) verwendet werden, um die Strukturdynamik sowie das probabilistische und gezeitete Verhalten des Systems geeignet zu erfassen. Die Modellprüfungsunterstützung für PTGTSs bzgl. Eigenschaften, die unter Verwendung von Probabilistic Timed Computation Tree Logic (PTCTL) spezifiziert wurden, wurde bereits entwickelt. Darüber hinaus wurde das gezeitete graphenbasierte Laufzeitmonitoring mittels metrischer temporaler Graphlogik (MTGL) entwickelt, um metrische temporale Eigenschaften auf identifizierten Untergraphen und ihre strukturellen Änderungen über die Zeit zu erfassen.

In diesem Artikel (a) erweitern wir MTGL auf die probabilistische metrische temporale Graphlogik (PMTGL), indem wir die Spezifikation probabilistischer Eigenschaften zulassen, (b) passen unseren MTGL-Prüfungsansatz auf PTGTSs an und (c) kombinieren die Ansätze für PTCTL-Modellprüfung und MTGL-Prüfung, um einen beschränkten Modellprüfungsansatz (BMC-Ansatz) für PMTGL zu erhalten. In unserer Auswertung wenden wir eine Implementierung unseres BMC-Ansatzes in AutoGraph auf ein Beispiel an.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 146 
KW  - cyber-physical systems
KW  - probabilistic timed systems
KW  - qualitative analysis
KW  - quantitative analysis
KW  - bounded model checking
KW  - cyber-physische Systeme
KW  - probabilistische gezeitete Systeme
KW  - qualitative Analyse
KW  - quantitative Analyse
KW  - Bounded Model Checking
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-545867
SN  - 978-3-86956-532-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 146
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Maximova, Maria
A1  - Giese, Holger
T1  - Probabilistic metric temporal graph logic
N2  - Cyber-physical systems often encompass complex concurrent behavior with timing constraints and probabilistic failures on demand. The analysis of whether such systems with probabilistic timed behavior adhere to a given specification is essential. When the states of the system can be represented by graphs, the rule-based formalism of Probabilistic Timed Graph Transformation Systems (PTGTSs) can be used to suitably capture structure dynamics as well as probabilistic and timed behavior of the system. The model checking support for PTGTSs w.r.t. properties specified using Probabilistic Timed Computation Tree Logic (PTCTL) has already been presented. Moreover, for timed graph-based runtime monitoring, Metric Temporal Graph Logic (MTGL) has been developed for stating metric temporal properties on identified subgraphs and their structural changes over time. In this paper, we (a) extend MTGL to the Probabilistic Metric Temporal Graph Logic (PMTGL) by allowing for the specification of probabilistic properties, (b) adapt our MTGL satisfaction checking approach to PTGTSs, and (c) combine the approaches for PTCTL model checking and MTGL satisfaction checking to obtain a Bounded Model Checking (BMC) approach for PMTGL. In our evaluation, we apply an implementation of our BMC approach in AutoGraph to a running example.
N2  - Cyber-physische Systeme umfassen häufig ein komplexes nebenläufiges Verhalten mit Zeitbeschränkungen und probabilistischen Fehlern auf Anforderung. Die Analyse, ob solche Systeme mit probabilistischem gezeitetem Verhalten einer vorgegebenen Spezifikation entsprechen, ist essentiell. Wenn die Zustände des Systems durch Graphen dargestellt werden können, kann der regelbasierte Formalismus von probabilistischen gezeiteten Graphtransformationssystemen (PTGTSs) verwendet werden, um die Strukturdynamik sowie das probabilistische und gezeitete Verhalten des Systems geeignet zu erfassen. Die Modellprüfungsunterstützung für PTGTSs bzgl. Eigenschaften, die unter Verwendung von probabilistischer zeitgesteuerter Berechnungsbaumlogik (PTCTL) spezifiziert wurden, wurde bereits entwickelt. Darüber hinaus wurde das gezeitete graphenbasierte Laufzeitmonitoring mittels metrischer temporaler Graphlogik (MTGL) entwickelt, um metrische temporale Eigenschaften auf identifizierten Untergraphen und ihre strukturellen Änderungen über die Zeit zu erfassen.

In diesem Artikel (a) erweitern wir MTGL auf die probabilistische metrische temporale Graphlogik (PMTGL), indem wir die Spezifikation probabilistischer Eigenschaften zulassen, (b) passen unseren MTGL-Prüfungsansatz auf PTGTSs an und (c) kombinieren die Ansätze für PTCTL-Modellprüfung und MTGL-Prüfung, um einen beschränkten Modellprüfungsansatz (BMC-Ansatz) für PMTGL zu erhalten. In unserer Auswertung wenden wir eine Implementierung unseres BMC-Ansatzes in AutoGraph auf ein Beispiel an.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 140 
KW  - cyber-physische Systeme
KW  - probabilistische gezeitete Systeme
KW  - qualitative Analyse
KW  - quantitative Analyse
KW  - Bounded Model Checking
KW  - cyber-physical systems
KW  - probabilistic timed systems
KW  - qualitative analysis
KW  - quantitative analysis
KW  - bounded model checking
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-515066
SN  - 978-3-86956-517-0
SN  - 1613-5652
SN  - 2191-1665
IS  - 140
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Lambers, Leen
A1  - Orejas, Fernando
T1  - A logic-based incremental approach to graph repair
T1  - Ein logikbasierter inkrementeller Ansatz für Graphreparatur
N2  - Graph repair, restoring consistency of a graph, plays a prominent role in several areas of computer science and beyond: For example, in model-driven engineering, the abstract syntax of models is usually encoded using graphs. Flexible edit operations temporarily create inconsistent graphs not representing a valid model, thus requiring graph repair. Similarly, in graph databases, which manage the storage and manipulation of graph data, updates may cause a given database to violate some integrity constraints, again requiring graph repair. We present a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing repairs. In our context, we formalize consistency by so-called graph conditions, which are equivalent to first-order logic on graphs. We present two kinds of repair algorithms: State-based repair restores consistency independent of the graph update history, whereas delta-based (or incremental) repair takes this history explicitly into account. Technically, our algorithms rely on an existing model generation algorithm for graph conditions implemented in AutoGraph. Moreover, the delta-based approach uses the new concept of satisfaction (ST) trees for encoding if and how a graph satisfies a graph condition. We then demonstrate how to manipulate these STs incrementally with respect to a graph update.
N2  - Die Reparatur von Graphen, die Wiederherstellung der Konsistenz eines Graphen, spielt in mehreren Bereichen der Informatik und darüber hinaus eine herausragende Rolle: Beispielsweise wird in der modellgetriebenen Konstruktion die abstrakte Syntax von Modellen in der Regel mithilfe von Graphen kodiert.
Flexible Bearbeitungsvorgänge erstellen vorübergehend inkonsistente Graphen, die kein gültiges Modell darstellen, und erfordern daher eine Graphreparatur.
Auf ähnliche Weise können Aktualisierungen in Graphendatenbanken - die das Speichern und Bearbeiten von Graphendaten verwalten - dazu führen, dass eine bestimmte Datenbank einige Integritätsbeschränkungen nicht erfüllt und auch eine Graphreparatur erforderlich macht.

Wir präsentieren einen logikbasierten inkrementellen Ansatz für die Graphreparatur, der eine solide und vollständige (nach Beendigung) Übersicht über die am wenigsten verändernden Reparaturen erstellt.
In unserem Kontext formalisieren wir die Konsistenz mittels sogenannter Graphbedingungen, die der Logik erster Ordnung auf Graphen entsprechen.
Wir stellen zwei Arten von Reparaturalgorithmen vor: Die zustandsbasierte Reparatur stellt die Konsistenz unabhängig vom Verlauf der Graphänderung wieder her, während die deltabasierte (oder inkrementelle) Reparatur diesen Verlauf explizit berücksichtigt.
Technisch stützen sich unsere Algorithmen auf einen vorhandenen Modellgenerierungsalgorithmus für in AutoGraph implementierte Graphbedingungen.
Darüber hinaus verwendet der deltabasierte Ansatz das neue Konzept der Erfüllungsbäume (STs) zum Kodieren, ob und wie ein Graph eine Graphbedingung erfüllt.
Wir zeigen dann, wie diese STs in Bezug auf eine Graphaktualisierung inkrementell manipuliert werden.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 126 
KW  - nested graph conditions
KW  - graph repair
KW  - model repair
KW  - consistency restoration
KW  - verschachtelte Graphbedingungen
KW  - Graphreparatur
KW  - Modellreparatur
KW  - Konsistenzrestauration
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-427517
SN  - 978-3-86956-462-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 126
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Sapegin, Andrey
T1  - High-Speed Security Log Analytics Using Hybrid Outlier Detection
N2  - The rapid development and integration of information technologies over the last decades have influenced all areas of our lives, including the business world. Yet not only are modern enterprises becoming digitalised; security and criminal threats are also moving into the digital sphere. To withstand these threats, modern companies must be aware of all activities within their computer networks.
The keystone for such continuous security monitoring is a Security Information and Event Management (SIEM) system that collects and processes all security-related log messages from the entire enterprise network. However, digital transformations and technologies, such as network virtualisation and widespread usage of mobile communications, lead to a constantly increasing number of monitored devices and systems. As a result, the amount of data that has to be processed by a SIEM system is increasing rapidly. Besides that, in-depth security analysis of the captured data requires the application of rather sophisticated outlier detection algorithms that have a high computational complexity. Existing outlier detection methods often suffer from performance issues and are not directly applicable for high-speed and high-volume analysis of heterogeneous security-related events, which becomes a major challenge for modern SIEM systems nowadays.
This thesis provides a number of solutions for the mentioned challenges. First, it proposes a new SIEM system architecture for high-speed processing of security events, implementing parallel, in-memory and in-database processing principles. The proposed architecture also utilises the most efficient log format for high-speed data normalisation. Next, the thesis offers several novel high-speed outlier detection methods, including generic Hybrid Outlier Detection that can efficiently be used for Big Data analysis. Finally, the special User Behaviour Outlier Detection is proposed for better threat detection and analysis of particular user behaviour cases.
The proposed architecture and methods were evaluated in terms of both performance and accuracy and compared with a classical architecture and existing algorithms. These evaluations were performed on multiple data sets, including simulated data, a well-known public intrusion detection data set, and real data from a large multinational enterprise. The evaluation results demonstrate the high performance and efficacy of the developed methods.
All concepts proposed in this thesis were integrated into the prototype of the SIEM system, capable of high-speed analysis of Big Security Data, which makes this integrated SIEM platform highly relevant for modern enterprise security applications.
N2  - In den letzten Jahrzehnten hat die schnelle Weiterentwicklung und Integration der Informationstechnologien alle Bereiche unseres Lebens beeinflusst, nicht zuletzt auch die Geschäftswelt. Aus der zunehmenden Digitalisierung des modernen Unternehmens ergeben sich jedoch auch neue digitale Sicherheitsrisiken und kriminelle Bedrohungen. Um sich vor diesen Bedrohungen zu schützen, muss das digitale Unternehmen alle Aktivitäten innerhalb seines Firmennetzes verfolgen.
Der Schlüssel zur kontinuierlichen Überwachung aller sicherheitsrelevanten Informationen ist ein sogenanntes Security Information und Event Management (SIEM) System, das alle Meldungen innerhalb des Firmennetzwerks zentral sammelt und verarbeitet. Jedoch führen die digitale Transformation der Unternehmen sowie neue Technologien, wie die Netzwerkvirtualisierung und mobile Endgeräte, zu einer konstant steigenden Anzahl zu überwachender Geräte und Systeme. Dies wiederum hat ein kontinuierliches Wachstum der Datenmengen zur Folge, die das SIEM System verarbeiten muss. Innerhalb eines möglichst kurzen Zeitraumes muss somit eine sehr große Datenmenge (Big Data) analysiert werden, um auf Bedrohungen zeitnah reagieren zu können. Eine gründliche Analyse der sicherheitsrelevanten Aspekte der aufgezeichneten Daten erfordert den Einsatz fortgeschrittener Algorithmen der Anomalieerkennung, die eine hohe Rechenkomplexität aufweisen. Existierende Methoden der Anomalieerkennung haben oftmals Geschwindigkeitsprobleme und sind deswegen nicht anwendbar für die sehr schnelle Analyse sehr großer Mengen heterogener sicherheitsrelevanter Ereignisse.
Diese Arbeit schlägt eine Reihe möglicher Lösungen für die benannten Herausforderungen vor. Zunächst wird eine neuartige SIEM Architektur vorgeschlagen, die es erlaubt, Ereignisse mit sehr hoher Geschwindigkeit zu verarbeiten. Das System basiert auf den Prinzipien der parallelen Programmierung sowie der In-Memory und In-Database Datenverarbeitung. Die vorgeschlagene Architektur verwendet außerdem das effizienteste Datenformat zur Vereinheitlichung der Daten in sehr hoher Geschwindigkeit. Des Weiteren wurden im Rahmen dieser Arbeit mehrere neuartige Hochgeschwindigkeitsverfahren zur Anomalieerkennung entwickelt. Eines ist die Hybride Anomalieerkennung (Hybrid Outlier Detection), die sehr effizient auf Big Data eingesetzt werden kann. Abschließend wird eine spezifische Anomalieerkennung für Nutzerverhalten (User Behaviour Outlier Detection) vorgeschlagen, die eine verbesserte Bedrohungsanalyse von spezifischen Verhaltensmustern der Benutzer erlaubt.
Die entwickelte Systemarchitektur und die Algorithmen wurden sowohl mit Hinblick auf Geschwindigkeit, als auch Genauigkeit evaluiert und mit traditionellen Architekturen und existierenden Algorithmen verglichen. Die Evaluation wurde auf mehreren Datensätzen durchgeführt, unter anderem simulierten Daten, gut erforschten öffentlichen Datensätzen und echten Daten großer internationaler Konzerne. Die Resultate der Evaluation belegen die Geschwindigkeit und Effizienz der entwickelten Methoden.
Alle Konzepte dieser Arbeit wurden in den Prototyp des SIEM Systems integriert, das in der Lage ist, Big Security Data mit sehr hoher Geschwindigkeit zu analysieren. Dies zeigt, dass diese integrierte SIEM Plattform eine hohe praktische Relevanz für moderne Sicherheitsanwendungen besitzt.
T2  - Sicherheitsanalyse in Hochgeschwindigkeit mithilfe der Hybriden Anomalieerkennung
KW  - intrusion detection
KW  - security
KW  - machine learning
KW  - anomaly detection
KW  - outlier detection
KW  - novelty detection
KW  - in-memory
KW  - SIEM
KW  - IDS
KW  - Angriffserkennung
KW  - Sicherheit
KW  - Maschinelles Lernen
KW  - Anomalieerkennung
KW  - In-Memory
KW  - SIEM
KW  - IDS
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-426118
ER  - 
TY  - THES
A1  - Sakizloglou, Lucas
T1  - Evaluating temporal queries over history-aware architectural runtime models
T1  - Ausführung temporaler Anfragen über geschichtsbewusste Architektur-Laufzeitmodelle
N2  - In model-driven engineering, the adaptation of large software systems with dynamic structure is enabled by architectural runtime models. Such a model represents an abstract state of the system as a graph of interacting components. Every relevant change in the system is mirrored in the model and triggers an evaluation of model queries, which search the model for structural patterns that should be adapted. This thesis focuses on a type of runtime models where the expressiveness of the model and model queries is extended to capture past changes and their timing. These history-aware models and temporal queries enable more informed decision-making during adaptation, as they support the formulation of requirements on the evolution of the pattern that should be adapted. However, evaluating temporal queries during adaptation poses significant challenges. First, it implies the capability to specify and evaluate requirements on the structure, as well as the ordering and timing in which structural changes occur. Then, query answers have to reflect that the history-aware model represents the architecture of a system whose execution may be ongoing, and thus answers may depend on future changes. Finally, query evaluation needs to be adequately fast and memory-efficient despite the increasing size of the history, especially for models that are altered by numerous, rapid changes.

The thesis presents a query language and a querying approach for the specification and evaluation of temporal queries. These contributions aim to cope with the challenges of evaluating temporal queries at runtime, a prerequisite for history-aware architectural monitoring and adaptation which has not been systematically treated by prior model-based solutions. The distinguishing features of our contributions are: the specification of queries based on a temporal logic which encodes structural patterns as graphs; the provision of formally precise query answers which account for timing constraints and ongoing executions; the incremental evaluation which avoids the re-computation of query answers after each change; and the option to discard history that is no longer relevant to queries. The query evaluation searches the model for occurrences of a pattern whose evolution satisfies a temporal logic formula. Therefore, besides model-driven engineering, another related research community is runtime verification. The approach differs from prior logic-based runtime verification solutions by supporting the representation and querying of structure via graphs and graph queries, respectively, which is more efficient for queries with complex patterns. We present a prototypical implementation of the approach and measure its speed and memory consumption in monitoring and adaptation scenarios from two application domains, with executions of an increasing size. We assess scalability by a comparison to the state-of-the-art from both related research communities. The implementation yields promising results, which pave the way for sophisticated history-aware self-adaptation solutions and indicate that the approach constitutes a highly effective technique for runtime monitoring on an architectural level.
N2  - In der modellgetriebenen Entwicklung wird die Adaptation großer Softwaresysteme mit dynamischer Struktur durch Architektur-Laufzeitmodelle ermöglicht. Ein solches Modell stellt einen abstrakten Zustand des Systems als einen Graphen von interagierenden Komponenten dar. Jede relevante Änderung im System spiegelt sich im Modell wider und löst eine Ausführung von Modellanfragen aus, die das Modell nach zu adaptierenden Strukturmustern durchsuchen. Diese Arbeit konzentriert sich auf eine Art von Laufzeitmodellen, bei denen die Ausdruckskraft des Modells und der Modellanfragen erweitert wird, um vergangene Änderungen und deren Zeitpunkt zu erfassen. Diese geschichtsbewussten Modelle und temporalen Anfragen ermöglichen eine fundiertere Entscheidungsfindung während der Adaptation, da sie die Formulierung von Anforderungen an die Entwicklung des Musters, das adaptiert werden soll, unterstützen. Die Ausführung von temporalen Anfragen während der Adaptation stellt jedoch eine große Herausforderung dar. Zunächst müssen Anforderungen an die Struktur sowie an die Reihenfolge und den Zeitpunkt von Strukturänderungen spezifiziert und evaluiert werden. Weiterhin müssen die Antworten auf die Anfragen berücksichtigen, dass das geschichtsbewusste Modell die Architektur eines Systems darstellt, dessen Ausführung fortlaufend sein kann, sodass die Antworten von zukünftigen Änderungen abhängen können. Schließlich muss die Anfrageausführung trotz der zunehmenden Größe der Historie hinreichend schnell und speichereffizient sein, insbesondere bei Modellen, die durch zahlreiche, schnelle Änderungen verändert werden.

In dieser Arbeit werden eine Sprache für die Spezifikation von temporalen Anfragen sowie eine Technik für deren Ausführung vorgestellt. Diese Beiträge zielen darauf ab, die Herausforderungen bei der Ausführung temporaler Anfragen zur Laufzeit zu bewältigen, eine Voraussetzung für ein geschichtsbewusstes Architekturmonitoring und geschichtsbewusste Architekturadaptation, die von früheren modellbasierten Lösungen nicht systematisch behandelt wurde. Die besonderen Merkmale unserer Beiträge sind: die Spezifikation von Anfragen auf der Basis einer temporalen Logik, die strukturelle Muster als Graphen kodiert; die Bereitstellung formal präziser Anfrageantworten, die temporale Einschränkungen und laufende Ausführungen berücksichtigen; die inkrementelle Ausführung, die die Neuberechnung von Abfrageantworten nach jeder Änderung vermeidet; und die Option, Historie zu verwerfen, die für Abfragen nicht mehr relevant ist. Bei der Anfrageausführung wird das Modell nach dem Auftreten eines Musters durchsucht, dessen Entwicklung eine temporallogische Formel erfüllt. Neben der modellgetriebenen Entwicklung ist daher die Laufzeitverifikation ein weiteres verwandtes Forschungsgebiet. Der Ansatz unterscheidet sich von bisherigen logikbasierten Lösungen zur Laufzeitverifikation, indem er die Darstellung und Abfrage von Strukturen über Graphen bzw. Graphanfragen unterstützt, was bei Anfragen mit komplexen Mustern effizienter ist. Wir stellen eine prototypische Implementierung des Ansatzes vor und messen seine Laufzeit und seinen Speicherverbrauch in Monitoring- und Adaptationsszenarien aus zwei Anwendungsdomänen mit Ausführungen von zunehmender Größe. Wir bewerten die Skalierbarkeit durch einen Vergleich mit dem Stand der Technik aus beiden verwandten Forschungsgebieten.
Die Implementierung liefert vielversprechende Ergebnisse, die den Weg für anspruchsvolle geschichtsbewusste Selbstadaptationslösungen ebnen und darauf hindeuten, dass der Ansatz eine effektive Technik für das Laufzeitmonitoring auf Architekturebene darstellt.
KW  - architectural adaptation
KW  - history-aware runtime models
KW  - incremental graph query evaluation
KW  - model-driven software engineering
KW  - temporal graph queries
KW  - Architekturadaptation
KW  - geschichtsbewusste Laufzeit-Modelle
KW  - inkrementelle Ausführung von Graphanfragen
KW  - modellgetriebene Softwaretechnik
KW  - temporale Graphanfragen
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-604396
ER  - 
TY  - JOUR
A1  - Rosin, Paul L.
A1  - Lai, Yu-Kun
A1  - Mould, David
A1  - Yi, Ran
A1  - Berger, Itamar
A1  - Doyle, Lars
A1  - Lee, Seungyong
A1  - Li, Chuan
A1  - Liu, Yong-Jin
A1  - Semmo, Amir
A1  - Shamir, Ariel
A1  - Son, Minjung
A1  - Winnemöller, Holger
T1  - NPRportrait 1.0: A three-level benchmark for non-photorealistic rendering of portraits
JF  - Computational visual media
N2  - Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level, benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation for a wide variety of image stylisation methods (both portrait-specific and general purpose, and also both traditional NPR approaches and NST) using the new benchmark dataset.
KW  - non-photorealistic rendering (NPR)
KW  - image stylization
KW  - style transfer
KW  - portrait
KW  - evaluation
KW  - benchmark
Y1  - 2022
U6  - https://doi.org/10.1007/s41095-021-0255-3
SN  - 2096-0433
SN  - 2096-0662
VL  - 8
IS  - 3
SP  - 445
EP  - 465
PB  - Springer Nature
CY  - London
ER  - 
TY  - THES
A1  - Rohloff, Tobias
T1  - Learning analytics at scale
BT  - supporting learning and teaching in MOOCs with data-driven insights
N2  - Digital technologies are paving the way for innovative educational approaches. The learning format of Massive Open Online Courses (MOOCs) provides a highly accessible path to lifelong learning while being more affordable and flexible than face-to-face courses. Thereby, thousands of learners can enroll in courses mostly without admission restrictions, but this also raises challenges. Individual supervision by teachers is barely feasible, and learning persistence and success depend on students' self-regulatory skills. Here, technology provides the means for support. The use of data for decision-making is already transforming many fields, whereas in education, it is still a young research discipline. Learning Analytics (LA) is defined as the measurement, collection, analysis, and reporting of data about learners and their learning contexts with the purpose of understanding and improving learning and learning environments. The vast amount of data that MOOCs produce on the learning behavior and success of thousands of students provides the opportunity to study human learning and develop approaches addressing the demands of learners and teachers.

The overall purpose of this dissertation is to investigate the implementation of LA at the scale of MOOCs and to explore how data-driven technology can support learning and teaching in this context. To this end, several research prototypes have been iteratively developed for the HPI MOOC Platform. Hence, they were tested and evaluated in an authentic real-world learning environment. Most of the results can be applied on a conceptual level to other MOOC platforms as well. The research contribution of this thesis thus provides practical insights beyond what is theoretically possible. In total, four system components were developed and extended:

(1) The Learning Analytics Architecture: A technical infrastructure to collect, process, and analyze event-driven learning data based on schema-agnostic pipelining in a service-oriented MOOC platform. (2) The Learning Analytics Dashboard for Learners: A tool for data-driven support of self-regulated learning, in particular to enable learners to evaluate and plan their learning activities, progress, and success by themselves. (3) Personalized Learning Objectives: A set of features to better connect learners' success to their personal intentions based on selected learning objectives to offer guidance and align the provided data-driven insights about their learning progress. (4) The Learning Analytics Dashboard for Teachers: A tool supporting teachers with data-driven insights to enable the monitoring of their courses with thousands of learners, identify potential issues, and take informed action.

For all aspects examined in this dissertation, related research is presented, development processes and implementation concepts are explained, and evaluations are conducted in case studies. Among other findings, the usage of the learner dashboard in combination with personalized learning objectives demonstrated improved certification rates of 11.62% to 12.63%. Furthermore, it was observed that the teacher dashboard is a key tool and an integral part of teaching in MOOCs. In addition to the results and contributions, general limitations of the work are discussed, which altogether provide a solid foundation for practical implications and future research.
N2  - Digitale Technologien sind Wegbereiter für innovative Bildungsansätze. Das Lernformat der Massive Open Online Courses (MOOCs) bietet einen einfachen und globalen Zugang zu lebenslangem Lernen und ist oft kostengünstiger und flexibler als klassische Präsenzlehre. Dabei können sich Tausende von Lernenden meist ohne Zulassungsbeschränkung in Kurse einschreiben, wodurch jedoch auch Herausforderungen entstehen. Eine individuelle Betreuung durch Lehrende ist kaum möglich und das Durchhaltevermögen und der Lernerfolg hängen von selbstregulatorischen Fähigkeiten der Lernenden ab. Hier bietet Technologie die Möglichkeit zur Unterstützung. Die Nutzung von Daten zur Entscheidungsfindung transformiert bereits viele Bereiche, aber im Bildungswesen ist dies noch eine junge Forschungsdisziplin. Als Learning Analytics (LA) wird das Messen, Erfassen, Analysieren und Auswerten von Daten über Lernende und ihren Lernkontext verstanden, mit dem Ziel, das Lernen und die Lernumgebungen zu verstehen und zu verbessern. Die riesige Menge an Daten, die MOOCs über das Lernverhalten und den Lernerfolg produzieren, bietet die Möglichkeit, das menschliche Lernen zu studieren und Ansätze zu entwickeln, die den Anforderungen von Lernenden und Lehrenden gerecht werden.

Der Schwerpunkt dieser Dissertation liegt auf der Implementierung von LA für die Größenordnung von MOOCs und erforscht dabei, wie datengetriebene Technologie das Lernen und Lehren in diesem Kontext unterstützen kann. Zu diesem Zweck wurden mehrere Forschungsprototypen iterativ für die HPI-MOOC-Plattform entwickelt. Daher wurden diese in einer authentischen und realen Lernumgebung getestet und evaluiert. Die meisten Ergebnisse lassen sich auf konzeptioneller Ebene auch auf andere MOOC-Plattformen übertragen, wodurch der Forschungsbeitrag dieser Arbeit praktische Erkenntnisse über das theoretisch Mögliche hinaus liefert. Insgesamt wurden vier Systemkomponenten entwickelt und erweitert:

(1) Die LA-Architektur: Eine technische Infrastruktur zum Sammeln, Verarbeiten und Analysieren von ereignisgesteuerten Lerndaten basierend auf einem schemaagnostischen Pipelining in einer serviceorientierten MOOC-Plattform. (2) Das LA-Dashboard für Lernende: Ein Werkzeug zur datengesteuerten Unterstützung der Selbstregulierung, insbesondere um Lernende in die Lage zu versetzen, ihre Lernaktivitäten, ihren Fortschritt und ihren Lernerfolg selbst zu evaluieren und zu planen. (3) Personalisierte Lernziele: Eine Reihe von Funktionen, um den Lernerfolg besser mit persönlichen Absichten zu verknüpfen, die auf ausgewählten Lernzielen basieren, um Leitlinien anzubieten und die bereitgestellten datengetriebenen Einblicke über den Lernfortschritt darauf abzustimmen. (4) Das LA-Dashboard für Lehrende: Ein Hilfsmittel, das Lehrkräfte mit datengetriebenen Erkenntnissen unterstützt, um ihre Kurse mit Tausenden von Lernenden zu überblicken, mögliche Probleme zu erkennen und fundierte Maßnahmen zu ergreifen.

Für alle untersuchten Aspekte dieser Dissertation werden verwandte Forschungsarbeiten vorgestellt, Entwicklungsprozesse und Implementierungskonzepte erläutert und Evaluierungen in Fallstudien durchgeführt. Unter anderem konnten durch den Einsatz des Dashboards für Lernende in Kombination mit personalisierten Lernzielen verbesserte Zertifizierungsraten von 11,62% bis 12,63% nachgewiesen werden. Außerdem wurde beobachtet, dass das Dashboard für Lehrende ein entscheidendes Werkzeug und ein integraler Bestandteil für die Lehre in MOOCs ist. Neben den Ergebnissen und Beiträgen werden generelle Einschränkungen der Arbeit diskutiert, die insgesamt eine fundierte Grundlage für praktische Implikationen und zukünftige Forschungsvorhaben schaffen.
KW  - Learning Analytics
KW  - MOOCs
KW  - Self-Regulated Learning
KW  - E-Learning
KW  - Service-Oriented Architecture
KW  - Online Learning Environments
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-526235
ER  - 
TY  - THES
A1  - Risch, Julian
T1  - Reader comment analysis on online news platforms
N2  - Comment sections of online news platforms are an essential space to express opinions and discuss political topics. However, the misuse by spammers, haters, and trolls raises doubts about whether the benefits justify the costs of the time-consuming content moderation. As a consequence, many platforms limited or even shut down comment sections completely. In this thesis, we present deep learning approaches for comment classification, recommendation, and prediction to foster respectful and engaging online discussions. The main focus is on two kinds of comments: toxic comments, which make readers leave a discussion, and engaging comments, which make readers join a discussion. First, we discourage and remove toxic comments, e.g., insults or threats. To this end, we present a semi-automatic comment moderation process, which is based on fine-grained text classification models and supports moderators. Our experiments demonstrate that data augmentation, transfer learning, and ensemble learning allow training robust classifiers even on small datasets. To establish trust in the machine-learned models, we reveal which input features are decisive for their output with attribution-based explanation methods. Second, we encourage and highlight engaging comments, e.g., serious questions or factual statements. We automatically identify the most engaging comments, so that readers need not scroll through thousands of comments to find them. The model training process builds on upvotes and replies as a measure of reader engagement. We also identify comments that address the article authors or are otherwise relevant to them to support interactions between journalists and their readership. Taking into account the readers' interests, we further provide personalized recommendations of discussions that align with their favored topics or involve frequent co-commenters. 
Our models outperform multiple baselines and recent related work in experiments on comment datasets from different platforms.
N2  - Kommentarspalten von Online-Nachrichtenplattformen sind ein essentieller Ort, um Meinungen zu äußern und politische Themen zu diskutieren. Der Missbrauch durch Trolle und Verbreiter von Hass und Spam lässt jedoch Zweifel aufkommen, ob der Nutzen die Kosten der zeitaufwendigen Kommentarmoderation rechtfertigt. Als Konsequenz daraus haben viele Plattformen ihre Kommentarspalten eingeschränkt oder sogar ganz abgeschaltet. In dieser Arbeit stellen wir Deep-Learning-Verfahren zur Klassifizierung, Empfehlung und Vorhersage von Kommentaren vor, um respektvolle und anregende Online-Diskussionen zu fördern. Das Hauptaugenmerk liegt dabei auf zwei Arten von Kommentaren: toxische Kommentare, die die Leser veranlassen, eine Diskussion zu verlassen, und anregende Kommentare, die die Leser veranlassen, sich an einer Diskussion zu beteiligen. Im ersten Schritt identifizieren und entfernen wir toxische Kommentare, z.B. Beleidigungen oder Drohungen. Zu diesem Zweck stellen wir einen halbautomatischen Moderationsprozess vor, der auf feingranularen Textklassifikationsmodellen basiert und Moderatoren unterstützt. Unsere Experimente zeigen, dass Datenanreicherung, Transfer- und Ensemble-Lernen das Trainieren robuster Klassifikatoren selbst auf kleinen Datensätzen ermöglichen. Um Vertrauen in die maschinell gelernten Modelle zu schaffen, zeigen wir mit attributionsbasierten Erklärungsmethoden auf, welche Teile der Eingabe für ihre Ausgabe entscheidend sind. Im zweiten Schritt ermutigen und markieren wir anregende Kommentare, z.B. ernsthafte Fragen oder sachliche Aussagen.
Wir identifizieren automatisch die anregendsten Kommentare, so dass die Leser nicht durch Tausende von Kommentaren blättern müssen, um sie zu finden. Der Trainingsprozess der Modelle baut auf Upvotes und Kommentarantworten als Maß für die Aktivität der Leser auf.
Wir identifizieren außerdem Kommentare, die sich an die Artikelautoren richten oder anderweitig für sie relevant sind, um die Interaktion zwischen Journalisten und ihrer Leserschaft zu unterstützen. Unter Berücksichtigung der Interessen der Leser bieten wir darüber hinaus personalisierte Diskussionsempfehlungen an, die sich an den von ihnen bevorzugten Themen oder häufigen Diskussionspartnern orientieren. In Experimenten mit Kommentardatensätzen von verschiedenen Plattformen übertreffen unsere Modelle mehrere grundlegende Vergleichsverfahren und aktuelle verwandte Arbeiten.
T2  - Analyse von Leserkommentaren auf Online-Nachrichtenplattformen
KW  - machine learning
KW  - Maschinelles Lernen
KW  - text classification
KW  - Textklassifikation
KW  - social media
KW  - Soziale Medien
KW  - hate speech detection
KW  - Hasserkennung
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-489222
ER  - 
TY  - THES
A1  - Richter, Rico
T1  - Concepts and techniques for processing and rendering of massive 3D point clouds
T1  - Konzepte und Techniken für die Verarbeitung und das Rendering von Massiven 3D-Punktwolken
N2  - Remote sensing technologies, such as airborne, mobile, or terrestrial laser scanning, and photogrammetric techniques are fundamental approaches for efficient, automatic creation of digital representations of spatial environments. For example, they allow us to generate 3D point clouds of landscapes, cities, infrastructure networks, and sites. As an essential and universal category of geodata, 3D point clouds are used and processed by a growing number of applications, services, and systems, for example in the domains of urban planning, landscape architecture, environmental monitoring, disaster management, and virtual geographic environments, as well as for spatial analysis and simulation.
While the acquisition processes for 3D point clouds become more and more reliable and widely used, applications and systems are faced with more and more 3D point cloud data. In addition, 3D point clouds, by their very nature, are raw data, i.e., they do not contain any structural or semantic information. Many processing strategies common to GIS, such as deriving polygon-based 3D models, generally do not scale for billions of points. GIS typically reduce data density and precision of 3D point clouds to cope with the sheer amount of data, but that results in a significant loss of valuable information at the same time.
This thesis proposes concepts and techniques designed to efficiently store and process massive 3D point clouds. To this end, object-class segmentation approaches are presented to attribute semantics to 3D point clouds, used, for example, to identify building, vegetation, and ground structures and, thus, to enable processing, analyzing, and visualizing 3D point clouds in a more effective and efficient way. Similarly, change detection and updating strategies for 3D point clouds are introduced that allow for reducing storage requirements and incrementally updating 3D point cloud databases. In addition, this thesis presents out-of-core, real-time rendering techniques used to interactively explore 3D point clouds and related analysis results. All techniques have been implemented based on specialized spatial data structures, out-of-core algorithms, and GPU-based processing schemas to cope with massive 3D point clouds having billions of points.  
All proposed techniques have been evaluated, and their applicability to geospatial applications and systems has been demonstrated, in particular for tasks such as classification, processing, and visualization. Case studies for 3D point clouds of entire cities with up to 80 billion points show that the presented approaches open up new ways to manage and apply large-scale, dense, and time-variant 3D point clouds as required by a rapidly growing number of applications and systems.
N2  - Fernerkundungstechnologien wie luftgestütztes, mobiles oder terrestrisches Laserscanning und photogrammetrische Techniken sind grundlegende Ansätze für die effiziente, automatische Erstellung von digitalen Repräsentationen räumlicher Umgebungen. Sie ermöglichen uns zum Beispiel die Erzeugung von 3D-Punktwolken für Landschaften, Städte, Infrastrukturnetze und Standorte. 3D-Punktwolken werden als wesentliche und universelle Kategorie von Geodaten von einer wachsenden Anzahl an Anwendungen, Diensten und Systemen genutzt und verarbeitet, zum Beispiel in den Bereichen Stadtplanung, Landschaftsarchitektur, Umweltüberwachung, Katastrophenmanagement, virtuelle geographische Umgebungen sowie zur räumlichen Analyse und Simulation.
Da die Erfassungsprozesse für 3D-Punktwolken immer zuverlässiger und verbreiteter werden, sehen sich Anwendungen und Systeme mit immer größeren 3D-Punktwolken-Daten konfrontiert. Darüber hinaus enthalten 3D-Punktwolken als Rohdaten von ihrer Art her keine strukturellen oder semantischen Informationen. Viele GIS-übliche Verarbeitungsstrategien, wie die Ableitung polygonaler 3D-Modelle, skalieren in der Regel nicht für Milliarden von Punkten. GIS reduzieren typischerweise die Datendichte und Genauigkeit von 3D-Punktwolken, um mit der immensen Datenmenge umgehen zu können, was aber zugleich zu einem signifikanten Verlust wertvoller Informationen führt.
Diese Arbeit präsentiert Konzepte und Techniken, die entwickelt wurden, um massive 3D-Punktwolken effizient zu speichern und zu verarbeiten. Hierzu werden Ansätze für die Objektklassen-Segmentierung vorgestellt, um 3D-Punktwolken mit Semantik anzureichern; so lassen sich beispielsweise Gebäude-, Vegetations- und Bodenstrukturen identifizieren, wodurch die Verarbeitung, Analyse und Visualisierung von 3D-Punktwolken effektiver und effizienter durchführbar werden. Ebenso werden Änderungserkennungs- und Aktualisierungsstrategien für 3D-Punktwolken vorgestellt, mit denen Speicheranforderungen reduziert und Datenbanken für 3D-Punktwolken inkrementell aktualisiert werden können. Des Weiteren beschreibt diese Arbeit Out-of-Core Echtzeit-Rendering-Techniken zur interaktiven Exploration von 3D-Punktwolken und zugehöriger Analyseergebnisse. Alle Techniken wurden mit Hilfe spezialisierter räumlicher Datenstrukturen, Out-of-Core-Algorithmen und GPU-basierter Verarbeitungsschemata implementiert, um massiven 3D-Punktwolken mit Milliarden von Punkten gerecht werden zu können.
Alle vorgestellten Techniken wurden evaluiert und die Anwendbarkeit für Anwendungen und Systeme, die mit raumbezogenen Daten arbeiten, wurde insbesondere für Aufgaben wie Klassifizierung, Verarbeitung und Visualisierung demonstriert. Fallstudien für 3D-Punktwolken von ganzen Städten mit bis zu 80 Milliarden Punkten zeigen, dass die vorgestellten Ansätze neue Wege zur Verwaltung und Verwendung von großflächigen, dichten und zeitvarianten 3D-Punktwolken eröffnen, die von einer wachsenden Anzahl an Anwendungen und Systemen benötigt werden.
KW  - 3D point clouds
KW  - 3D-Punktwolken
KW  - real-time rendering
KW  - Echtzeit-Rendering
KW  - 3D visualization
KW  - 3D-Visualisierung
KW  - classification
KW  - Klassifizierung
KW  - change detection
KW  - Veränderungsanalyse
KW  - LiDAR
KW  - remote sensing
KW  - Fernerkundung
KW  - mobile mapping
KW  - Mobile-Mapping
KW  - Big Data
KW  - GPU
KW  - laserscanning
KW  - Laserscanning
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-423304
ER  - 
TY  - JOUR
A1  - Richly, Keven
A1  - Schlosser, Rainer
A1  - Boissier, Martin
T1  - Budget-conscious fine-grained configuration optimization for spatio-temporal applications
JF  - Proceedings of the VLDB Endowment
N2  - Based on the performance requirements of modern spatio-temporal data mining applications, in-memory database systems are often used to store and process the data. To efficiently utilize the scarce DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes). However, selecting configurations that balance cost and performance is challenging due to the vast number of possible setups consisting of mutually dependent individual decisions. In this paper, we introduce a novel approach to jointly optimize the compression, sorting, indexing, and tiering configuration for spatio-temporal workloads. Further, we consider horizontal data partitioning, which enables the independent application of different tuning options on a fine-grained level. We propose different linear programming (LP) models addressing cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload and memory budgets. To yield maintainable and robust configurations, we extend our LP-based approach to incorporate reconfiguration costs as well as a worst-case optimization for potential workload scenarios. Further, we demonstrate on a real-world dataset that our models significantly reduce the memory footprint at equal performance or increase the performance at equal memory size compared to existing tuning heuristics.
KW  - General Earth and Planetary Sciences
KW  - Water Science and Technology
KW  - Geography, Planning and Development
Y1  - 2022
U6  - https://doi.org/10.14778/3565838.3565858
SN  - 2150-8097
VL  - 15
IS  - 13
SP  - 4079
EP  - 4092
PB  - Association for Computing Machinery (ACM)
CY  - [New York]
ER  - 
TY  - THES
A1  - Richly, Keven
T1  - Memory-efficient data management for spatio-temporal applications
BT  - workload-driven fine-grained configuration optimization for storing spatio-temporal data in columnar In-memory databases
N2  - The wide distribution of location-acquisition technologies means that large volumes of spatio-temporal data are continuously being accumulated. Positioning systems such as GPS enable the tracking of various moving objects' trajectories, which are usually represented by a chronologically ordered sequence of observed locations. The analysis of movement patterns based on detailed positional information creates opportunities for applications that can improve business decisions and processes in a broad spectrum of industries (e.g., transportation, traffic control, or medicine). Due to the large data volumes generated in these applications, the cost-efficient storage of spatio-temporal data is desirable, especially when in-memory database systems are used to achieve interactive performance requirements. 

To efficiently utilize the available DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional index structures). By considering horizontal data partitioning, we can independently apply different tuning options on a fine-grained level. However, selecting configurations that balance cost and performance is challenging due to the vast number of possible setups consisting of mutually dependent individual decisions.

In this thesis, we introduce multiple approaches to improve spatio-temporal data management by automatically optimizing diverse tuning options for the application-specific access patterns and data characteristics. Our contributions are as follows:
(1) We introduce a novel approach to determine fine-grained table configurations for spatio-temporal workloads. Our linear programming (LP) approach jointly optimizes the (i) data compression, (ii) ordering, (iii) indexing, and (iv) tiering. We propose different models which address cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload, memory budgets, and data characteristics. To yield maintainable and robust configurations, we further extend our LP-based approach to incorporate reconfiguration costs as well as optimizations for multiple potential workload scenarios. 
(2) To optimize the storage layout of timestamps in columnar databases, we present a heuristic approach for the workload-driven combined selection of a data layout and compression scheme. By considering attribute decomposition strategies, we are able to apply application-specific optimizations that reduce the memory footprint and improve performance.   
(3) We introduce an approach that leverages past trajectory data to improve the dispatch processes of transportation network companies. Based on location probabilities, we developed risk-averse dispatch strategies that reduce critical delays.
(4) Finally, we use the case of a transportation network company to evaluate our database optimizations on a real-world dataset. We demonstrate that workload-driven fine-grained optimizations allow us to reduce the memory footprint (by up to 71% at equal performance) or increase the performance (by up to 90% at equal memory size) compared to established rule-based heuristics.

Individually, our contributions provide novel approaches to the current challenges in spatio-temporal data mining and database research. Combining them allows in-memory databases to store and process spatio-temporal data more cost-efficiently.
N2  - Durch die starke Verbreitung von Systemen zur Positionsbestimmung werden fortlaufend große Mengen an Bewegungsdaten mit einem räumlichen und zeitlichen Bezug gesammelt. Ortungssysteme wie GPS ermöglichen, die Bewegungen verschiedener Objekte (z. B. Personen oder Fahrzeuge) nachzuverfolgen. Diese werden in der Regel durch eine chronologisch geordnete Abfolge beobachteter Aufenthaltsorte repräsentiert. Die Analyse von Bewegungsmustern auf der Grundlage detaillierter Positionsinformationen schafft in unterschiedlichsten Branchen (z. B. Transportwesen, Verkehrssteuerung oder Medizin) die Möglichkeit Geschäftsentscheidungen und -prozesse zu verbessern. Aufgrund der großen Datenmengen, die bei diesen Anwendungen auftreten, stellt die kosteneffiziente Speicherung von Bewegungsdaten eine Herausforderung dar. Dies ist insbesondere der Fall, wenn Hauptspeicherdatenbanken zur Speicherung eingesetzt werden, um die Anforderungen bezüglich interaktiver Antwortzeiten zu erfüllen.

Um die verfügbaren Speicherkapazitäten effizient zu nutzen, unterstützen moderne Datenbanksysteme verschiedene Optimierungsmöglichkeiten, um den Speicherbedarf zu reduzieren (z. B. durch Datenkomprimierung) oder die Performance zu erhöhen (z. B. durch Indexstrukturen). Dabei ermöglicht eine horizontale Partitionierung der Daten, dass unabhängig voneinander verschiedene Optimierungen feingranular auf einzelnen Bereichen der Daten angewendet werden können. Die Auswahl von Konfigurationen, die sowohl die Kosten als auch Leistungsanforderungen berücksichtigen, ist jedoch aufgrund der großen Anzahl möglicher Kombinationen -- die aus voneinander abhängigen Einzelentscheidungen bestehen -- komplex.

In dieser Dissertation präsentieren wir mehrere Ansätze zur Verbesserung der Datenverwaltung, indem wir die Auswahl verschiedener Datenbankoptimierungen automatisch für die anwendungsspezifischen Zugriffsmuster und Dateneigenschaften anpassen. Diesbezüglich leistet die vorliegende Dissertation die folgenden Beiträge: (1) Wir stellen einen neuen Ansatz vor, um feingranulare Tabellenkonfigurationen für räumlich-zeitliche Workloads zu bestimmen. In diesem Zusammenhang optimiert unser Linear Programming (LP) Ansatz gemeinsam (i) die Datenkompression, (ii) die Sortierung, (iii) die Indizierung und (iv) die Datenplatzierung. Hierzu schlagen wir verschiedene Modelle mit unterschiedlichen Kostenabhängigkeiten vor, um optimierte Konfigurationen für einen gegebenen Workload, ein Speicherbudget und die vorliegenden Dateneigenschaften zu berechnen. Durch die Erweiterung des LP-basierten Ansatzes zur Berücksichtigung von Modifikationskosten und verschiedener potentieller Workloads ist es möglich, die Wartbarkeit und Robustheit der bestimmten Tabellenkonfiguration zu erhöhen.
(2) Um die Speicherung von Timestamps in spalten-orientierten Datenbanken zu optimieren, stellen wir einen heuristischen Ansatz für die kombinierte Auswahl eines Speicherlayouts und eines Kompressionsschemas vor. Zudem sind wir durch die Berücksichtigung von Strategien zur Aufteilung von Attributen in der Lage, anwendungsspezifische Optimierungen anzuwenden, die den Speicherbedarf reduzieren und die Performance verbessern.
(3) Wir stellen einen Ansatz vor, der in der Vergangenheit beobachtete Bewegungsmuster nutzt, um die Zuweisungsprozesse von Vermittlungsdiensten zur Personenbeförderung zu verbessern. Auf der Grundlage von Standortwahrscheinlichkeiten haben wir verschiedene Strategien für die Vergabe von Fahraufträgen an Fahrer entwickelt, die kritische Verspätungen reduzieren.
(4) Abschließend haben wir unsere Datenbankoptimierungen anhand eines realen Datensatzes eines Transportdienstleisters evaluiert. In diesem Zusammenhang zeigen wir, dass wir durch feingranulare workload-basierte Optimierungen den Speicherbedarf (um bis zu 71% bei vergleichbarer Performance) reduzieren oder die Performance (um bis zu 90% bei gleichem Speicherverbrauch) im Vergleich zu regelbasierten Heuristiken verbessern können.

Die einzelnen Beiträge stellen neuartige Ansätze für aktuelle Herausforderungen im Bereich des Data Mining und der Datenbankforschung dar. In Kombination ermöglichen sie eine kosteneffizientere Speicherung und Verarbeitung von Bewegungsdaten in Hauptspeicherdatenbanken.
KW  - spatio-temporal data management
KW  - trajectory data
KW  - columnar databases
KW  - in-memory data management
KW  - database tuning
KW  - spaltenorientierte Datenbanken
KW  - Datenbankoptimierung
KW  - Hauptspeicher Datenmanagement
KW  - Datenverwaltung für Daten mit räumlich-zeitlichem Bezug
KW  - Trajektoriendaten
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-635473
ER  - 
TY  - BOOK
A1  - Reschke, Jakob
A1  - Taeumel, Marcel
A1  - Pape, Tobias
A1  - Niephaus, Fabio
A1  - Hirschfeld, Robert
T1  - Towards version control in object-based systems
T1  - Ein Vorschlag zur Versionsverwaltung in objektbasierten Systemen
N2  - Version control is a widely used practice among software developers. It reduces the risk of changing their software and allows them to manage different configurations and to collaborate with others more efficiently. This is amplified by code sharing platforms such as GitHub or Bitbucket. Most version control systems (e.g., Git, Mercurial, and Subversion) track files, but some programming environments, such as many Smalltalk implementations, operate on objects instead of files. Users of such environments want to use version control for their objects anyway. Specialized version control systems, such as the ones available for Smalltalk systems (e.g., ENVY/Developer and Monticello), focus on a small subset of objects that can be versioned. Most of these systems concentrate on the tracking of methods, classes, and configurations of these. Other user-defined and user-built objects are either not eligible for version control at all or can be tracked only through complicated workarounds, or a fixed, domain-unspecific serialization format is used that does not suit all kinds of objects equally. Moreover, version control systems that are specific to a programming environment require their own code sharing platforms; popular, well-established platforms for file-based version control systems cannot be used, or adapter solutions need to be implemented and maintained.

To improve the situation for version control of arbitrary objects, a framework for tracking, converting, and storing objects is presented in this report. It allows editions of objects to be stored in an exchangeable, existing backend version control system. The platforms of the backend version control system can thus be reused. Users and objects have control over how objects are captured for the purpose of version control. Domain-specific requirements can be implemented. The storage format (i.e., the file format, when file-based backend version control systems are used) can also vary from one object to another. Different editions of objects can be compared, and sets of changes can be applied to graphs of objects. A generic way of capturing and restoring that supports most kinds of objects is described. It models each object as a collection of slots. Thus, users can begin to track their objects without first having to implement version control supplements for their own kinds of objects. The proposed architecture is evaluated using a prototype implementation that can be used to track objects in Squeak/Smalltalk with Git. The prototype improves the suboptimal standing of user objects with respect to version control described above and also simplifies some version control tasks for classes and methods. It also raises new problems, which are discussed in this report.
N2  - Versionsverwaltung ist unter Softwareentwicklern weit verbreitet. Sie verringert das Risiko beim Ändern der Software und erlaubt den Entwicklern, verschiedene Konfigurationen zu verwalten und effizienter zusammenzuarbeiten. Dies wird durch Plattformen zum Teilen von Code wie GitHub oder Bitbucket zusätzlich unterstützt. Die meisten Versionsverwaltungssysteme verfolgen Dateien (z.B. Git, Mercurial und Subversion), aber manche Programmierumgebungen arbeiten nicht mit Dateien, sondern mit Objekten (viele Smalltalk-Implementierungen tun dies). Nutzer dieser Umgebungen möchten Versionsverwaltung für ihre Objekte dennoch einsetzen können. Spezialisierte Versionsverwaltungssysteme, wie die für Smalltalk verfügbaren (z.B. ENVY/Developer und Monticello), konzentrieren sich auf Methoden, Klassen und Konfigurationen selbiger. Andere von Benutzern definierte und konstruierte Objekte können damit oftmals gar nicht oder nur über komplizierte Umwege erfasst werden oder es wird ein fest vorgegebenes Format zur Serialisierung verwendet, das nicht für alle Arten von Objekten gleichermaßen geeignet ist. Des Weiteren können beliebte, bereits existierende Plattformen für dateibasierte Versionsverwaltung von diesen Systemen nicht verwendet werden oder Adapterlösungen müssen implementiert und gepflegt werden.

Um die Situation von Versionsverwaltung für beliebige Objekte zu verbessern, stellt diese Arbeit ein Framework zum Nachverfolgen, Konvertieren und Speichern von Objekten vor. Es erlaubt, Editionen von Objekten in einem austauschbaren, bestehenden Backend-Versionsverwaltungssystem zu speichern. Plattformen für dieses System können daher weiterbenutzt werden. Nutzer und Objekte können beeinflussen, wie Objekte zur Versionsverwaltung erfasst werden. Domänenspezifische Anforderungen lassen sich umsetzen. Das Speicherformat (d.h. das Dateiformat, wenn ein dateibasiertes Backend benutzt wird) kann auch von Objekt zu Objekt anders sein. Verschiedene Editionen von Objekten können verglichen und Änderungen auf Objektgraphen übertragen werden. Ein allgemeiner Ansatz zum Erfassen und Wiederherstellen von Objekten wird beschrieben, welcher jedes Objekt als eine Ansammlung von Slots betrachtet. Dadurch können Nutzer sofort anfangen, ihre Objekte zu versionieren, ohne dass sie ihre Objekte zunächst zur Versionsverwaltung erweitern müssen. Die vorgeschlagene Architektur wird anhand einer Prototyp-Implementierung evaluiert, die es erlaubt, Objekte in Squeak/Smalltalk mit Git zu versionieren. Der Prototyp verbessert den oben beschriebenen benachteiligten Status von Benutzerobjekten in Bezug auf Versionsverwaltung und erleichtert auch manche Versionsverwaltungs-Operationen für Klassen und Methoden. Er fördert auch neue Probleme zutage, die ebenfalls in dieser Arbeit diskutiert werden. Insofern ist diese Arbeit als ein erster Schritt in Richtung vollumfänglicher Versionsverwaltung für beliebige Objekte zu betrachten.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 121 
KW  - version control
KW  - object-oriented programming
KW  - exploratory programming
KW  - serialization
KW  - Versionsverwaltung
KW  - objektorientiertes Programmieren
KW  - exploratives Programmieren
KW  - Serialisierung
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-410812
SN  - 978-3-86956-430-2
SN  - 1613-5652
SN  - 2191-1665
VL  - 121
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Rana, Kaushik
A1  - Mohapatra, Durga Prasad
A1  - Sidorova, Julia
A1  - Lundberg, Lars
A1  - Sköld, Lars
A1  - Lopes Grim, Luís Fernando
A1  - Sampaio Gradvohl, André Leon
A1  - Cremerius, Jonas
A1  - Siegert, Simon
A1  - Weltzien, Anton von
A1  - Baldi, Annika
A1  - Klessascheck, Finn
A1  - Kalancha, Svitlana
A1  - Lichtenstein, Tom
A1  - Shaabani, Nuhad
A1  - Meinel, Christoph
A1  - Friedrich, Tobias
A1  - Lenzner, Pascal
A1  - Schumann, David
A1  - Wiese, Ingmar
A1  - Sarna, Nicole
A1  - Wiese, Lena
A1  - Tashkandi, Araek Sami
A1  - van der Walt, Estée
A1  - Eloff, Jan H. P.
A1  - Schmidt, Christopher
A1  - Hügle, Johannes
A1  - Horschig, Siegfried
A1  - Uflacker, Matthias
A1  - Najafi, Pejman
A1  - Sapegin, Andrey
A1  - Cheng, Feng
A1  - Stojanovic, Dragan
A1  - Stojnev Ilić, Aleksandra
A1  - Djordjevic, Igor
A1  - Stojanovic, Natalija
A1  - Predic, Bratislav
A1  - González-Jiménez, Mario
A1  - de Lara, Juan
A1  - Mischkewitz, Sven
A1  - Kainz, Bernhard
A1  - van Hoorn, André
A1  - Ferme, Vincenzo
A1  - Schulz, Henning
A1  - Knigge, Marlene
A1  - Hecht, Sonja
A1  - Prifti, Loina
A1  - Krcmar, Helmut
A1  - Fabian, Benjamin
A1  - Ermakova, Tatiana
A1  - Kelkel, Stefan
A1  - Baumann, Annika
A1  - Morgenstern, Laura
A1  - Plauth, Max
A1  - Eberhard, Felix
A1  - Wolff, Felix
A1  - Polze, Andreas
A1  - Cech, Tim
A1  - Danz, Noel
A1  - Noack, Nele Sina
A1  - Pirl, Lukas
A1  - Beilharz, Jossekin Jakob
A1  - De Oliveira, Roberto C. L.
A1  - Soares, Fábio Mendes
A1  - Juiz, Carlos
A1  - Bermejo, Belen
A1  - Mühle, Alexander
A1  - Grüner, Andreas
A1  - Saxena, Vageesh
A1  - Gayvoronskaya, Tatiana
A1  - Weyand, Christopher
A1  - Krause, Mirko
A1  - Frank, Markus
A1  - Bischoff, Sebastian
A1  - Behrens, Freya
A1  - Rückin, Julius
A1  - Ziegler, Adrian
A1  - Vogel, Thomas
A1  - Tran, Chinh
A1  - Moser, Irene
A1  - Grunske, Lars
A1  - Szárnyas, Gábor
A1  - Marton, József
A1  - Maginecz, János
A1  - Varró, Dániel
A1  - Antal, János Benjamin
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Beins, Karsten
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Rödszus, Kurt
ED  - Müller, Jürgen
T1  - HPI Future SOC Lab – Proceedings 2018
N2  - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
  The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from, but not limited to, the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
  This technical report presents results of research projects executed in 2018. Selected projects presented their results on April 17th and November 14th, 2018, at the Future SOC Lab Day events.
N2  - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie.
  Am Lab wird interessierten Wissenschaftler:innen eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen Systeme, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler:innen in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. 
  In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2018 vorgestellt.  Ausgewählte Projekte stellten ihre Ergebnisse am 17. April und 14. November 2018 im Rahmen des Future SOC Lab Tags vor.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 151 
KW  - Future SOC Lab
KW  - research projects
KW  - multicore architectures
KW  - in-memory technology
KW  - cloud computing
KW  - machine learning
KW  - artificial intelligence
KW  - Future SOC Lab
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - künstliche Intelligenz
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-563712
SN  - 978-3-86956-547-7
SN  - 1613-5652
SN  - 2191-1665
IS  - 151
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Puri, Manish
A1  - Varde, Aparna S.
A1  - Melo, Gerard de
T1  - Commonsense based text mining on urban policy
JF  - Language resources and evaluation
N2  - Local laws on urban policy, i.e., ordinances, directly affect our daily life in various ways (health, business, etc.), yet in practice, for many citizens they remain impervious and complex. This article focuses on an approach to make urban policy more accessible and comprehensible to the general public and to government officials, while also addressing pertinent social media postings. Due to the intricacies of natural language, ranging from complex legalese in ordinances to informal lingo in tweets, it is practical to harness human judgment here. To this end, we mine ordinances and tweets via reasoning based on commonsense knowledge so as to better account for pragmatics and semantics in the text. Ours is pioneering work in ordinance mining, and thus there is no prior labeled training data available for learning. This gap is filled by commonsense knowledge, a prudent choice in situations involving a lack of adequate training data. The ordinance mining can be beneficial to the public in fathoming policies and to officials in assessing policy effectiveness based on public reactions. This work contributes to smart governance, leveraging transparency in governing processes via public involvement. We focus significantly on ordinances contributing to smart cities, hence an important goal is to assess how well an urban region heads towards a smart city as per its policies mapping with smart city characteristics, and the corresponding public satisfaction.
KW  - Commonsense reasoning
KW  - Opinion mining
KW  - Ordinances
KW  - Smart cities
KW  - Social media
KW  - Text mining
Y1  - 2022
U6  - https://doi.org/10.1007/s10579-022-09584-6
SN  - 1574-020X
SN  - 1574-0218
VL  - 57
SP  - 733
EP  - 763
PB  - Springer
CY  - Dordrecht [u.a.]
ER  - 
TY  - JOUR
A1  - Ponce, Eva
A1  - Srinath, Sindhu
A1  - Allegue, Laura
T1  - Integrating Community Teaching in MOOCs
JF  - EMOOCs 2021
N2  - The MITx MicroMasters Program in Supply Chain Management (SCM) is a Massive Open Online Course (MOOC) based program that aims to impart quantitative and qualitative knowledge to SCM enthusiasts all around the world. The program, which started in 2014 with just one course, now offers five courses and one final proctored exam, which allows a learner to gain a MicroMasters credential upon completion. While the courses are delivered in the form of pre-recorded videos by the faculty members of Massachusetts Institute of Technology (MIT), the questions and comments posted by learners in discussion forums are addressed by a group of Community Teaching Assistants (CTAs) who volunteer for this role. The MITx staff carefully selects CTAs for each run of the individual courses as they take on a co-facilitator’s role in the program. This paper highlights the importance of community teaching and discusses the profile of CTAs involved with the program, their recruitment, training, tasks and responsibilities, engagement, and rewarding process. Finally, we share a few recommendations based on the lessons learned in community teaching during the last five years of running more than 45 MOOC courses that could help other MOOC teams deliver a high-touch experience.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517123
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 95
EP  - 109
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Poce, Antonella
A1  - Re, Maria Rosaria
A1  - Valente, Mara
T1  - Evaluating OERs in Museum Education Context
BT  - A Collaborative Online Experience
JF  - EMOOCs 2021
N2  - This paper aims to present the results of a higher education experience promoted by the research centres INTELLECT (University of Modena and Reggio Emilia) and CDM (University of Roma Tre), as part of different master’s degree programmes in the academic years 2018/2019, 2019/2020, and 2020/2021. Through different online activities, 37 students attended and evaluated a MOOC on museum education content, thus promoting their professional and transversal skills, such as critical thinking, and developing their knowledge of OERs within culture and heritage education contexts. Moreover, results from the online evaluation activities support the implementation of the MOOC in a collaborative way: during the academic years, evaluation data have been used by researchers to make changes to the course modules, thus realizing a more effective online path from an educational point of view.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517178
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 159
EP  - 168
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Perscheid, Cindy
T1  - Comprior
BT  - Facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets
JF  - BMC Bioinformatics
N2  - Background
Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated amongst each other, leaving open questions regarding their effectiveness.

Results
We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance.

Conclusion
Comprior allows reproducible benchmarking especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness.
KW  - Feature selection
KW  - Prior knowledge
KW  - Gene expression
KW  - Reproducible benchmarking
Y1  - 2021
U6  - https://doi.org/10.1186/s12859-021-04308-z
SN  - 1471-2105
VL  - 22
SP  - 1
EP  - 15
PB  - Springer Nature
CY  - London
ER  - 
TY  - THES
A1  - Perlich, Anja
T1  - Digital collaborative documentation in mental healthcare
T1  - Digitale Mittel zur kooperativen Dokumentation im Bereich der psychischen Gesundheit
N2  - With the growth of information technology, patient attitudes are shifting – away from passively receiving care towards actively taking responsibility for their well-being. Handling doctor-patient relationships collaboratively and providing patients access to their health information are crucial steps in empowering patients. In mental healthcare, the implicit consensus amongst practitioners has been that sharing medical records with patients may have an unpredictable, harmful impact on clinical practice. In order to involve patients more actively in mental healthcare processes, Tele-Board MED (TBM) allows for digital collaborative documentation in therapist-patient sessions. The TBM software system offers a whiteboard-inspired graphical user interface that allows therapist and patient to jointly take notes during the treatment session. Furthermore, it provides features to automatically reuse the digital treatment session notes for the creation of treatment session summaries and clinical case reports. This thesis presents the development of the TBM system and evaluates its effects on 1) the fulfillment of the therapist’s duties of clinical case documentation, 2) patient engagement in care processes, and 3) the therapist-patient relationship. Following the design research methodology, TBM was developed and tested in multiple evaluation studies in the domains of cognitive behavioral psychotherapy and addiction care. The results show that therapists are likely to use TBM with patients if they have a technology-friendly attitude and when its use suits the treatment context. Support in carrying out documentation duties as well as fulfilling legal requirements contributes to therapist acceptance. Furthermore, therapists value TBM as a tool to provide a discussion framework and quick access to worksheets during treatment sessions. Therapists express skepticism, however, regarding technology use in patient sessions and towards complete record transparency in general. 
Patients expect TBM to improve the communication with their therapist and to offer a better recall of discussed topics when taking a copy of their notes home after the session. Patients are doubtful regarding a possible distraction of the therapist and usage in situations when relationship-building is crucial. When applied in a clinical environment, collaborative note-taking with TBM encourages patient engagement and a team feeling between therapist and patient. Furthermore, it increases the patient’s acceptance of their diagnosis, which in turn is an important predictor for therapy success. In summary, TBM has a high potential to deliver more than documentation support and record transparency for patients, but also to contribute to a collaborative doctor-patient relationship. This thesis provides design implications for the development of digital collaborative documentation systems in (mental) healthcare as well as recommendations for a successful implementation in clinical practice.
N2  - Die Verbreitung von Informationstechnologie kann die Rolle von Patienten verändern: weg vom passiven Erhalt ärztlicher Zuwendung hin zur eigenverantwortlichen Mitwirkung an ihrer Genesung. Wesentliche Schritte zur Ermündigung von Patienten sind eine gute Zusammenarbeit mit dem behandelnden Arzt und der Zugang zu den eigenen Akten. Unter Psychotherapeuten gibt es jedoch einen impliziten Konsens darüber, dass die Einsicht in psychiatrische Akten unvorhersehbare, nachteilige Effekte auf die klinische Praxis hervorrufen könnte. Um auch Patienten aktiver an der Erhaltung und Wiederherstellung ihrer mentalen Gesundheit zu beteiligen, ermöglicht Tele-Board MED (TBM) das gemeinschaftliche Erstellen von digitalen Notizen. Diese Dissertation beschreibt die Entwicklung des TBM Software-Systems, das es Therapeut und Patient ermöglicht, gemeinsam während der Sitzung wie auf einem Whiteboard Notizen zu machen. Außerdem bietet TBM Funktionen, um auf Grundlage der digitalen Gesprächsnotizen automatisch Sitzungsprotokolle und klinische Fallberichte zu erstellen. Methodologisch basiert die Entwicklung und Evaluierung von TBM auf dem Paradigma für Design Research. Es wurden vielfältige Studien in den Bereichen der Verhaltens- und Suchttherapie durchgeführt, um die Auswirkungen auf folgende Aspekte zu evaluieren: 1) die Erfüllung der Dokumentationspflichten von Therapeuten, 2) das Engagement von Patienten in Behandlungsprozessen und 3) die Beziehung zwischen Patient und Therapeut. Die Studien haben gezeigt, dass Therapeuten dazu geneigt sind, TBM mit ihren Patienten zu nutzen, wenn sie technologie-freundlich eingestellt sind und wenn es zum Behandlungskontext passt. Zur Akzeptanz tragen auch die schnelle Erstellung von klinischen Dokumenten sowie die Erfüllung der gesetzlichen Forderung nach Aktentransparenz bei. Weiterhin schätzen Therapeuten TBM als Werkzeug, um Therapiegespräche zu strukturieren und während der Sitzung schnell auf Arbeitsblätter zuzugreifen. 
Therapeuten äußerten hingegen auch Skepsis gegenüber der Technologienutzung im Patientengespräch und vollständiger Aktentransparenz. Patienten erhoffen sich von TBM eine verbesserte Kommunikation mit ihrem Therapeuten und denken, dass sie sich besser an die Gesprächsinhalte erinnern können, wenn sie eine Kopie ihrer Akte erhalten. Patienten brachten Bedenken zum Ausdruck, TBM in Situationen zu nutzen, in denen der Beziehungsaufbau im Vordergrund steht, und darüber, dass Therapeuten sich abgelenkt fühlen könnten. Als TBM im klinischen Umfeld eingesetzt wurde, wurde ein erhöhtes Patientenengagement und ein gesteigertes Teamgefühl beobachtet. Außerdem stieg bei Patienten die Akzeptanz ihrer Diagnosen, welche wiederum ein wichtiger Prädiktor für Therapieerfolg ist. Zusammenfassend lässt sich festhalten, dass TBM großes Potential hat: Über die damit mögliche Dokumentationsunterstützung und Aktentransparenz hinaus wird auch die Zusammenarbeit von Therapeut und Patient unterstützt. Diese Dissertation fasst Kriterien zur Entwicklung von gemeinschaftlichen Dokumentationssystemen in der (psychischen) Gesundheitsfürsorge sowie Empfehlungen für eine erfolgreiche Implementierung in der klinischen Praxis zusammen.
KW  - medical documentation
KW  - psychotherapy
KW  - addiction care
KW  - computer-mediated therapy
KW  - digital whiteboard
KW  - patient empowerment
KW  - doctor-patient relationship
KW  - design research
KW  - user experience
KW  - evaluation
KW  - medizinische Dokumentation
KW  - Psychotherapie
KW  - Suchtberatung und -therapie
KW  - computervermittelte Therapie
KW  - digitales Whiteboard
KW  - Patientenermündigung
KW  - Arzt-Patient-Beziehung
KW  - Design-Forschung
KW  - User Experience
KW  - Evaluation
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-440292
ER  - 
TY  - JOUR
A1  - Perach, Shai
A1  - Alexandron, Giora
T1  - A MOOC-Based Computer Science Program for Middle School
BT  - Results, Challenges, and the Covid-19 Effect
JF  - EMOOCs 2021
N2  - In an attempt to pave the way for more extensive Computer Science Education (CSE) coverage in K-12, this research developed and made a preliminary evaluation of a blended-learning Introduction to CS program based on an academic MOOC. Using an academic MOOC that is pedagogically effective and engaging, such a program may provide teachers with disciplinary scaffolds and allow them to focus their attention on enhancing students’ learning experience and nurturing critical 21st-century skills such as self-regulated learning. As we demonstrate, this enabled us to introduce an academic level course to middle-school students. In this research, we developed the principles and initial version of such a program, targeting ninth-graders in science-track classes who learn CS as part of their standard curriculum. We found that the middle-schoolers who participated in the program achieved academic results on par with undergraduate students taking this MOOC for academic credit. Participating students also developed a more accurate perception of the essence of CS as a scientific discipline. The unplanned school closure due to the COVID-19 pandemic outbreak challenged the research but underlined the advantages of such a MOOC-based blended learning program over classic pedagogy in times of global or local crises that lead to school closure. While most of the science track classes seemed to stop learning CS almost entirely, and the end-of-year MoE exam was discarded, the program’s classes smoothly moved to remote learning mode, and students continued to study at a pace similar to that experienced before the school shut down.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517133
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 111
EP  - 127
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Pape, Tobias
T1  - Efficient compound values in virtual machines
N2  - Compound values are not universally supported in virtual machine (VM)-based programming systems and languages. However, providing data structures with value characteristics can be beneficial. On the one hand, programming systems and languages can adequately represent physical quantities with compound values and avoid inconsistencies, for example, in the representation of large numbers. On the other hand, just-in-time (JIT) compilers, which are often found in VMs, can rely on the fact that compound values are immutable, which is an important property in optimizing programs. Considering this, compound values have an optimization potential that can be put to use by implementing them in VMs in a way that is efficient in memory usage and execution time. Yet, optimized compound values in VMs face certain challenges: to maintain consistency, it should not be observable by the program whether compound values are represented in an optimized way by a VM; an optimization should take into account that the usage of compound values can exhibit certain patterns at run-time; and necessary value-incompatible properties due to implementation restrictions should be reduced.

We propose a technique to detect and compress common patterns of compound value usage at run-time to improve memory usage and execution speed. Our approach identifies patterns of frequent compound value references and introduces abbreviated forms for them. Thus, it is possible to store multiple inter-referenced compound values in an inlined memory representation, reducing the overhead of metadata and object references. We extend our approach by a notion of limited mutability, using cells that act as barriers for our approach and provide a location for shared, mutable access with the possibility of type specialization. We devise an extension to our approach that allows us to express automatic unboxing of boxed primitive data types in terms of our initial technique. We show that our approach is versatile enough to express another optimization technique that relies on values, such as Booleans, that are unique throughout a programming system. Furthermore, we demonstrate how to re-use learned usage patterns and optimizations across program runs, thus reducing the performance impact of pattern recognition.

We show in a best-case prototype that the implementation of our approach is feasible and can also be applied to general purpose programming systems, namely implementations of the Racket language and Squeak/Smalltalk. In several micro-benchmarks, we found that our approach can effectively reduce memory consumption and improve execution speed.
N2  - Zusammengesetzte Werte werden in VM-basierten Programmiersystemen und -sprachen nicht durchgängig unterstützt. Die Bereitstellung von Datenstrukturen mit Wertemerkmalen kann jedoch von Vorteil sein. Einerseits können Programmiersysteme und Sprachen physikalische Größen mit zusammengesetzten Werten, wie beispielsweise bei der Darstellung großer Zahlen, adäquat darstellen und Inkonsistenzen vermeiden. Andererseits können sich Just-in-time-Compiler, die oft in VMs zu finden sind, darauf verlassen, dass zusammengesetzte Werte unveränderlich sind, was eine wichtige Eigenschaft bei der Programmoptimierung ist. In Anbetracht dessen haben zusammengesetzte Werte ein Optimierungspotenzial, das genutzt werden kann, indem sie in VMs so implementiert werden, dass sie effizient in Speichernutzung und Ausführungszeit sind. Darüber hinaus stehen optimierte zusammengesetzte Werte in VMs vor bestimmten Herausforderungen: Um die Konsistenz zu erhalten, sollte das Programm nicht beobachten können, ob zusammengesetzte Werte durch eine VM in einer optimierten Weise dargestellt werden; eine Optimierung sollte berücksichtigen, dass die Verwendung von zusammengesetzten Werten bestimmte Muster zur Laufzeit aufweisen kann; und dass wertinkompatible Eigenschaften vermindert werden sollten, die nur aufgrund von Implementierungsbeschränkungen notwendig sind.

Wir schlagen eine Verfahrensweise vor, um gängige Muster der Verwendung von zusammengesetzten Werten zur Laufzeit zu erkennen und zu komprimieren, um die Speichernutzung und Ausführungsgeschwindigkeit zu verbessern. Unser Ansatz identifiziert Muster häufiger zusammengesetzter Wertreferenzen und führt für sie abgekürzte Formen ein. Dies ermöglicht es, mehrere miteinander verknüpfte zusammengesetzte Werte in einer eingebetteten Art und Weise im Speicher darzustellen, wodurch der Verwaltungsaufwand, der sich aus Metadaten und Objektreferenzen ergibt, reduziert wird. Wir erweitern unseren Ansatz um ein Konzept der eingeschränkten Veränderbarkeit, indem wir Zellen verwenden, die als Barrieren für unseren Ansatz dienen und einen Platz für einen gemeinsamen, schreibenden Zugriff mit der Möglichkeit der Typspezialisierung bieten. Wir entwickeln eine Erweiterung unseres Ansatzes, die es uns ermöglicht, mithilfe unserer ursprünglichen Technik das automatische Entpacken von primitiven geboxten Datentypen auszudrücken. Wir zeigen, dass unser Ansatz vielseitig genug ist, um auch eine andere Optimierungstechnik auszudrücken, die sich auf einzigartige Werte in einem Programmiersystem, wie beispielsweise Booleans, stützt. Darüber hinaus zeigen wir, wie erlernte Nutzungsmuster und Optimierungen über Programmausführungen hinweg wiederverwendet werden können, wodurch die Auswirkungen der Mustererkennung auf die Leistung reduziert werden.

Wir zeigen in einem Best-Case-Prototyp, dass unser Ansatz umsetzbar ist und auch auf allgemeinere Programmiersysteme wie Racket und Squeak/Smalltalk angewendet werden kann. In mehreren Mikro-Benchmarks haben wir festgestellt, dass unser Ansatz den Speicherverbrauch effektiv reduzieren und die Ausführungsgeschwindigkeit verbessern kann.
KW  - Compound Values
KW  - Objects
KW  - Data Structure Optimization
KW  - Virtual Machines
KW  - Smalltalk
KW  - Verbundwerte
KW  - Objekte
KW  - Datenstrukturoptimierung
KW  - Virtuelle Maschinen
KW  - Smalltalk
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-499134
ER  - 
TY  - BOOK
A1  - Niephaus, Fabio
A1  - Felgentreff, Tim
A1  - Hirschfeld, Robert
T1  - Squimera
BT  - a live, Smalltalk-based IDE for dynamic programming languages
N2  - Programmierwerkzeuge, die verschiedene Programmiersprachen unterstützen und sich konsistent bedienen lassen, sind hilfreich für Softwareentwickler, weil diese sich nicht erst mit neuen Werkzeugen vertraut machen müssen, wenn sie in einer neuen Sprache entwickeln wollen. Außerdem ist es nützlich, verschiedene Programmiersprachen in einer Anwendung kombinieren zu können, da Entwickler dann Softwareframeworks und -bibliotheken nicht in der jeweiligen Sprache nachbauen müssen und stattdessen bestehende Software wiederverwenden können.

Dennoch haben Entwickler eine sehr große Auswahl, wenn sie nach Werkzeugen suchen, die teilweise zudem speziell nur für eine Sprache ausgelegt sind. Einige integrierte Entwicklungsumgebungen unterstützen verschiedene Programmiersprachen, können aber häufig keine konsistente Bedienung ihrer Werkzeuge gewährleisten, da die jeweiligen Ausführungsumgebungen der Sprachen zu verschieden sind. Darüber hinaus gibt es bereits Mechanismen, die es erlauben, Programme aus anderen Sprachen in einem Programm wiederzuverwenden. Dazu werden häufig das Betriebssystem oder eine Netzwerkverbindung verwendet. Programmierwerkzeuge unterstützen jedoch häufig eine solche Indirektion nicht und sind deshalb nur eingeschränkt nutzbar bei beispielsweise Debugging-Szenarien.

In dieser Arbeit stellen wir einen neuartigen Ansatz vor, der das Programmiererlebnis in Bezug auf das Arbeiten mit mehreren dynamischen Programmiersprachen verbessern soll. Dazu verwenden wir die Werkzeuge einer Smalltalk Programmierumgebung wieder und entwickeln eine virtuelle Ausführungsumgebung, die verschiedene Sprachen gleichermaßen unterstützt.

Der auf unserem Ansatz basierende Prototyp Squimera demonstriert, dass es möglich ist, Programmierwerkzeuge in der Art wiederzuverwenden, sodass sie sich für verschiedene Programmiersprachen gleich verhalten und somit die Arbeit für Entwickler vereinfachen. Außerdem ermöglicht Squimera einfaches Wiederverwenden und darüber hinaus das Vermischen von in unterschiedlichen Sprachen geschriebenen Softwarebibliotheken und -frameworks und erlaubt dabei zusätzlich Debugging über mehrere Sprachen hinweg.
N2  - Software development tools that work and behave consistently across different programming languages are helpful for developers, because they do not have to familiarize themselves with new tooling whenever they decide to use a new language. Also, being able to combine multiple programming languages in a program increases reusability, as developers do not have to recreate software frameworks and libraries in the language they develop in and can reuse existing software instead.

However, developers often have a broad choice with regard to tools, some of which are designed for only one specific programming language. Various Integrated Development Environments have support for multiple languages, but are usually unable to provide a consistent programming experience due to different features of language runtimes. Furthermore, common mechanisms that allow reuse of software written in other languages usually use the operating system or a network connection as the abstract layer. Tools, however, often cannot support such indirections well and are therefore less useful in debugging scenarios for example.

In this report, we present a novel approach that aims to improve the programming experience with regard to working with multiple high-level programming languages. As part of this approach, we reuse the tools of a Smalltalk programming environment for other languages and build a multi-language virtual execution environment which is able to provide the same runtime capabilities for all languages.

The prototype system Squimera is an implementation of our approach and demonstrates that it is possible to reuse development tools, so that they behave in the same way across all supported programming languages. In addition, it provides convenient means to reuse and even mix software libraries and frameworks written in different languages without breaking the debugging experience.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 120 
KW  - Programmiererlebnis
KW  - integrierte Entwicklungsumgebungen
KW  - mehrsprachige Ausführungsumgebungen
KW  - Interpreter
KW  - Debugging
KW  - Smalltalk
KW  - Python
KW  - Ruby
KW  - programming experience
KW  - integrated development environments
KW  - polyglot execution environments
KW  - interpreters
KW  - debugging
KW  - Smalltalk
KW  - Python
KW  - Ruby
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-403387
SN  - 978-3-86956-422-7
IS  - 120
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Niephaus, Fabio
T1  - Exploratory tool-building platforms for polyglot virtual machines
N2  - Polyglot programming allows developers to use multiple programming languages within the same software project. While it is common to use more than one language in certain programming domains, developers also apply polyglot programming for other purposes such as to re-use software written in other languages. Although established approaches to polyglot programming come with significant limitations, for example, in terms of performance and tool support, developers still use them to be able to combine languages.
Polyglot virtual machines (VMs) such as GraalVM provide a new level of polyglot programming, allowing languages to directly interact with each other. This reduces the amount of glue code needed to combine languages, results in better performance, and enables tools such as debuggers to work across languages. However, only a little research has focused on novel tools that are designed to support developers in building software with polyglot VMs. One reason is that tool-building is often an expensive activity, another one is that polyglot VMs are still a moving target as their use cases and requirements are not yet well understood.
In this thesis, we present an approach that builds on existing self-sustaining programming systems such as Squeak/Smalltalk to enable exploratory programming, a practice for exploring and gathering software requirements, and re-use their extensive tool-building capabilities in the context of polyglot VMs. Based on TruffleSqueak, our implementation for the GraalVM, we further present five case studies that demonstrate how our approach helps tool developers to design and build tools for polyglot programming. We further show that TruffleSqueak can also be used by application developers to build and evolve polyglot applications at run-time and by language and runtime developers to understand the dynamic behavior of GraalVM languages and internals. Since our platform allows all these developers to apply polyglot programming, it can further help to better understand the advantages, use cases, requirements, and challenges of polyglot VMs. Moreover, we demonstrate that our approach can also be applied to other polyglot VMs and that insights gained through it are transferable to other programming systems.
We conclude that our research on tools for polyglot programming is an important step toward making polyglot VMs more approachable for developers in practice. With good tool support, we believe polyglot VMs can make it much more common for developers to take advantage of multiple languages and their ecosystems when building software.
N2  - Durch Polyglottes Programmieren können Softwareentwickler:innen mehrere Programmiersprachen für das Bauen von Software verwenden. Während diese Art von Programmierung in einigen Programmierdomänen üblich ist, wenden Entwickler:innen Polyglottes Programmieren auch aus anderen Gründen an, wie zum Beispiel, um Software über Programmiersprachen hinweg wiederverwenden zu können. Obwohl die bestehenden Ansätze zum Polyglotten Programmieren mit erheblichen Einschränkungen verbunden sind, wie beispielsweise in Bezug zur Laufzeitperformance oder der Unterstützung durch Programmierwerkzeuge, werden sie dennoch von Entwickler:innen genutzt, um Sprachen kombinieren zu können.
Mehrsprachige Ausführungsumgebungen wie zum Beispiel GraalVM bieten Polyglottes Programmieren auf einer neuen Ebene an, welche es Sprachen erlaubt, direkt miteinander zu interagieren. Dadurch wird die Menge an notwendigem Glue Code beim Kombinieren von Sprachen reduziert und die Laufzeitperformance verbessert. Außerdem können Debugger und andere Programmierwerkzeuge über mehrere Sprachen hinweg verwendet werden. Jedoch hat sich bisher nur wenig wissenschaftliche Arbeit mit neuartigen Werkzeugen beschäftigt, die darauf ausgelegt sind, Entwickler:innen beim Polyglotten Programmieren mit mehrsprachigen Ausführungsumgebungen zu unterstützen. Ein Grund dafür ist, dass das Bauen von Werkzeugen üblicherweise sehr aufwendig ist. Ein anderer Grund ist, dass sich mehrsprachige Ausführungsumgebungen immer noch ständig weiterentwickeln, da ihre Anwendungsfälle und Anforderungen noch nicht ausreichend verstanden sind.
In dieser Arbeit stellen wir einen Ansatz vor, der auf selbsttragenden Programmiersystemen wie zum Beispiel Squeak/Smalltalk aufbaut, um Exploratives Programmieren, eine Praktik zum Explorieren und Erfassen von Softwareanforderungen, sowie das Wiederverwenden ihrer umfangreichen Fähigkeiten zum Bauen von Werkzeugen im Rahmen von mehrsprachigen Ausführungsumgebungen zu ermöglichen. Basierend auf TruffleSqueak, unserer Implementierung für die GraalVM, zeigen wir anhand von fünf Fallstudien, wie unser Ansatz Werkzeugentwickler:innen dabei hilft, neue Werkzeuge zum Polyglotten Programmieren zu entwerfen und zu bauen. Außerdem demonstrieren wir, dass TruffleSqueak auch von Anwendungsentwickler:innen zum Bauen und Erweitern von polyglotten Anwendungen zur Laufzeit genutzt werden kann und Sprach- sowie Laufzeitentwickler:innen dabei hilft, das dynamische Verhalten von GraalVM-Sprachen und -Interna zu verstehen. Da unsere Plattform dabei all diesen Entwickler:innen Polyglottes Programmieren erlaubt, trägt sie außerdem dazu bei, dass Vorteile, Anwendungsfälle, Anforderungen und Herausforderungen von mehrsprachigen Ausführungsumgebungen besser verstanden werden können. Darüber hinaus zeigen wir, dass unser Ansatz auch auf andere mehrsprachige Ausführungsumgebungen angewandt werden kann und dass die Erkenntnisse, die man durch unseren Ansatz gewinnen kann, auch auf andere Programmiersysteme übertragbar sind.
Wir schlussfolgern, dass unsere Forschung an Werkzeugen zum Polyglotten Programmieren ein wichtiger Schritt ist, um mehrsprachige Ausführungsumgebungen zugänglicher für Entwickler:innen in der Praxis zu machen. Wir sind davon überzeugt, dass diese Ausführungsumgebungen mit guter Werkzeugunterstützung dazu führen können, dass Softwareentwickler:innen häufiger von den Vorteilen der Verwendung mehrerer Programmiersprachen zum Bauen von Software profitieren wollen.
KW  - polyglot programming
KW  - polyglottes Programmieren
KW  - programming tools
KW  - Programmierwerkzeuge
KW  - Smalltalk
KW  - Smalltalk
KW  - GraalVM
KW  - GraalVM
KW  - virtual machines
KW  - virtuelle Maschinen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-571776
ER  - 
TY  - JOUR
A1  - Navarro, Marisa
A1  - Orejas, Fernando
A1  - Pino, Elvira
A1  - Lambers, Leen
T1  - A navigational logic for reasoning about graph properties
JF  - Journal of logical and algebraic methods in programming
N2  - Graphs play an important role in many areas of Computer Science. In particular, our work is motivated by model-driven software development and by graph databases. For this reason, it is very important to have the means to express and to reason about the properties that a given graph may satisfy. With this aim, in this paper we present a visual logic that allows us to describe graph properties, including navigational properties, i.e., properties about the paths in a graph. The logic is equipped with a deductive tableau method that we have proved to be sound and complete.
KW  - Graph logic
KW  - Algebraic methods
KW  - Formal modelling
KW  - Specification
Y1  - 2021
U6  - https://doi.org/10.1016/j.jlamp.2020.100616
SN  - 2352-2208
SN  - 2352-2216
VL  - 118
PB  - Elsevier Science
CY  - Amsterdam [u.a.]
ER  - 
TY  - THES
A1  - Najafi, Pejman
T1  - Leveraging data science & engineering for advanced security operations
T1  - Der Einsatz von Data Science & Engineering für fortschrittliche Security Operations
N2  - The Security Operations Center (SOC) represents a specialized unit responsible for managing security within enterprises. To aid in its responsibilities, the SOC relies heavily on a Security Information and Event Management (SIEM) system that functions as a centralized repository for all security-related data, providing a comprehensive view of the organization's security posture. Due to the ability to offer such insights, SIEMs are considered indispensable tools facilitating SOC functions, such as monitoring, threat detection, and incident response.

Despite advancements in big data architectures and analytics, most SIEMs fall short of keeping pace. Architecturally, they function merely as log search engines, lacking the support for distributed large-scale analytics. Analytically, they rely on rule-based correlation, neglecting the adoption of more advanced data science and machine learning techniques.

This thesis first proposes a blueprint for next-generation SIEM systems that emphasize distributed processing and multi-layered storage to enable data mining at a big data scale. Next, with the architectural support, it introduces two data mining approaches for advanced threat detection as part of SOC operations.

First, a novel graph mining technique that formulates threat detection within the SIEM system as a large-scale graph mining and inference problem, built on the principles of guilt-by-association and exempt-by-reputation. The approach entails the construction of a Heterogeneous Information Network (HIN) that models shared characteristics and associations among entities extracted from SIEM-related events/logs. Thereon, a novel graph-based inference algorithm is used to infer a node's maliciousness score based on its associations with other entities in the HIN. Second, an innovative outlier detection technique that imitates a SOC analyst's reasoning process to find anomalies/outliers. The approach emphasizes explainability and simplicity, achieved by combining the output of simple context-aware univariate submodels that calculate an outlier score for each entry.

Both approaches were tested in academic and real-world settings, demonstrating high performance when compared to other algorithms as well as practicality alongside a large enterprise's SIEM system.

This thesis establishes the foundation for next-generation SIEM systems that can enhance today's SOCs and facilitate the transition from human-centric to data-driven security operations.
N2  - In einem Security Operations Center (SOC) werden alle sicherheitsrelevanten Prozesse, Daten und Personen einer Organisation zusammengefasst. Das Herzstück des SOCs ist ein Security Information and Event Management (SIEM)-System, welches als zentraler Speicher aller sicherheitsrelevanten Daten fungiert und einen Überblick über die Sicherheitslage einer Organisation geben kann. SIEM-Systeme sind unverzichtbare Werkzeuge für viele SOC-Funktionen wie Monitoring, Threat Detection und Incident Response.

Trotz der Fortschritte bei Big-Data-Architekturen und -Analysen können die meisten SIEMs nicht mithalten. Sie fungieren nur als Protokollsuchmaschine und unterstützen kein verteiltes Data Mining und Machine Learning.

In dieser Arbeit wird zunächst eine Blaupause für die nächste Generation von SIEM-Systemen vorgestellt, welche Daten verteilt verarbeitet und in mehreren Schichten speichert, um damit auch Data Mining im großen Stil zu ermöglichen. Zudem werden zwei Data Mining-Ansätze vorgeschlagen, mit denen auch anspruchsvolle Bedrohungen erkannt werden können.

Der erste Ansatz ist eine neue Graph-Mining-Technik, bei der SIEM-Daten als Graph strukturiert werden und Reputationsinferenz mithilfe der Prinzipien guilt-by-association (Kontaktschuld) und exempt-by-reputation (Reputationsbefreiung) implementiert wird. Der Ansatz nutzt ein heterogenes Informationsnetzwerk (HIN), welches gemeinsame Eigenschaften und Assoziationen zwischen Entitäten aus Event Logs verknüpft. Des Weiteren ermöglicht ein neuer Inferenzalgorithmus die Bestimmung der Schädlichkeit eines Kontos anhand seiner Verbindungen zu anderen Entitäten im HIN. Der zweite Ansatz ist eine innovative Methode zur Erkennung von Ausreißern, die den Entscheidungsprozess eines SOC-Analysten imitiert. Diese Methode ist besonders einfach und interpretierbar, da sie einzelne univariate Teilmodelle kombiniert, die sich jeweils auf eine kontextualisierte Eigenschaft einer Entität beziehen.

Beide Ansätze wurden sowohl akademisch als auch in der Praxis getestet und haben im Vergleich mit anderen Methoden auch in großen Unternehmen eine hohe Qualität bewiesen.

Diese Arbeit bildet die Grundlage für die nächste Generation von SIEM-Systemen, welche den Übergang von einer personalzentrischen zu einer datenzentrischen Perspektive auf SOCs ermöglichen.
KW  - cybersecurity
KW  - endpoint security
KW  - threat detection
KW  - intrusion detection
KW  - apt
KW  - advanced threats
KW  - advanced persistent threat
KW  - zero-day
KW  - security analytics
KW  - data-driven
KW  - data mining
KW  - data science
KW  - anomaly detection
KW  - outlier detection
KW  - graph mining
KW  - graph inference
KW  - machine learning
KW  - Advanced Persistent Threats
KW  - fortschrittliche Angriffe
KW  - Anomalieerkennung
KW  - APT
KW  - Cyber-Sicherheit
KW  - Data-Mining
KW  - Data-Science
KW  - datengetrieben
KW  - Endpunktsicherheit
KW  - Graphableitung
KW  - Graph-Mining
KW  - Einbruchserkennung
KW  - Machine-Learning
KW  - Ausreißererkennung
KW  - Sicherheitsanalyse
KW  - Bedrohungserkennung
KW  - 0-day
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-612257
ER  - 
TY  - GEN
A1  - Monti, Remo
A1  - Rautenstrauch, Pia
A1  - Ghanbari, Mahsa
A1  - James, Alva Rani
A1  - Kirchler, Matthias
A1  - Ohler, Uwe
A1  - Konigorski, Stefan
A1  - Lippert, Christoph
T1  - Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood-ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain of function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain of function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 16 
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-586078
IS  - 16
ER  - 
TY  - JOUR
A1  - Monti, Remo
A1  - Rautenstrauch, Pia
A1  - Ghanbari, Mahsa
A1  - James, Alva Rani
A1  - Kirchler, Matthias
A1  - Ohler, Uwe
A1  - Konigorski, Stefan
A1  - Lippert, Christoph
T1  - Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes
JF  - Nature Communications
N2  - Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood-ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain of function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain of function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
Y1  - 2022
U6  - https://doi.org/10.1038/s41467-022-32864-2
SN  - 2041-1723
VL  - 13
PB  - Nature Publishing Group UK
CY  - London
ER  - 
TY  - JOUR
A1  - Mihaescu, Vlad
A1  - Andone, Diana
A1  - Vasiu, Radu
T1  - DigiCulture MOOC Courses Piloting with Students
JF  - EMOOCs 2021
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517339
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 275
EP  - 279
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Meinel, Christoph
A1  - Willems, Christian
A1  - Staubitz, Thomas
A1  - Sauer, Dominic
A1  - Hagedorn, Christiane
T1  - openHPI
T1  - openHPI
BT  - 10 Years of MOOCs at the Hasso Plattner Institute
BT  - 10 Jahre MOOCs am Hasso-Plattner-Institut
N2  - On the occasion of the 10th openHPI anniversary, this technical report provides information about the HPI MOOC platform, including its core features, technology, and architecture.

In an introduction, the platform family with all partner platforms is presented; these now amount to nine platforms, including openHPI. This section introduces openHPI as an advisor and research partner in various projects. 

In the second chapter, the functionalities and common course formats of the platform are presented. The functionalities are divided into learner and admin features. The learner features section provides detailed information about performance records, courses, and the learning materials of which a course is composed: videos, texts, and quizzes. In addition, the learning materials can be enriched by adding external exercise tools that communicate with the HPI MOOC platform via the Learning Tools Interoperability (LTI) standard. Furthermore, the concept of peer assessments completes the possible learning materials.
The section then proceeds with further information on the discussion forum, a fundamental concept of MOOCs compared to traditional e-learning offers. The section is concluded with a description of the quiz recap, learning objectives, mobile applications, gameful learning, and the help desk.

The next part of this chapter deals with the admin features. The description is restricted to news and announcements, dashboards and statistics, reporting capabilities, research options with A/B testing, the course feed, and the TransPipe tool to support the process of creating automated or manual subtitles. The platform supports a large variety of additional features, but a detailed description of these features goes beyond the scope of this report.
The chapter then elaborates on common course formats and openHPI teaching activities at the HPI. The chapter concludes with some best practices for course design and delivery.

The third chapter provides insights into the technology and architecture behind openHPI. A special characteristic of the openHPI project is the conscious decision to operate the complete application from bare metal to platform development. Hence, the chapter starts with a section about the openHPI Cloud, including detailed information about the data center and devices, the cloud software in use (OpenStack and Ceph), as well as the openHPI Cloud Service provided for the HPI.

Afterward, a section on the application technology stack and development tooling describes the application infrastructure components, the used automation, the deployment pipeline, and the tools used for monitoring and alerting. The chapter is concluded with detailed information about the technology stack and concrete platform implementation details. The section describes the service-oriented Ruby on Rails application, inter-service communication, and public APIs. It also provides more information on the design system and components used in the application. The section concludes with a discussion of the original microservice architecture, where we share our insights and reasoning for migrating back to a monolithic application.

The last chapter provides a summary and an outlook on the future of digital education.
N2  - Anlässlich des 10-jährigen Jubiläums von openHPI informiert dieser technische Bericht über die HPI-MOOC-Plattform einschließlich ihrer Kernfunktionen, Technologie und Architektur.
In einer Einleitung wird die Plattformfamilie mit allen Partnerplattformen vorgestellt; diese belaufen sich inklusive openHPI aktuell auf neun Plattformen. In diesem Abschnitt wird außerdem gezeigt, wie openHPI als Berater und Forschungspartner in verschiedenen Projekten fungiert. 

Im zweiten Kapitel werden die Funktionalitäten und gängigen Kursformate der Plattform präsentiert. Die Funktionalitäten sind in Lerner- und Admin-Funktionen unterteilt. Der Bereich Lernerfunktionen bietet detaillierte Informationen zu Leistungsnachweisen, Kursen und den Lernmaterialien, aus denen sich ein Kurs zusammensetzt: Videos, Texte und Quiz. Darüber hinaus können die Lernmaterialien durch externe Übungstools angereichert werden, die über den Standard Learning Tools Interoperability (LTI) mit der HPI MOOC-Plattform kommunizieren. Das Konzept der Peer-Assessments rundet die möglichen Lernmaterialien ab.
Der Abschnitt geht dann weiter auf das Diskussionsforum ein, das einen grundlegenden Unterschied von MOOCs im Vergleich zu traditionellen E-Learning-Angeboten darstellt. Zum Abschluss des Abschnitts folgt eine Beschreibung von Quiz-Recap, Lernzielen, mobilen Anwendungen, spielerischem Lernen und dem Helpdesk.

Der nächste Teil dieses Kapitels beschäftigt sich mit den Admin-Funktionen. Die Funktionalitätsbeschreibung beschränkt sich auf Neuigkeiten und Ankündigungen, Dashboards und Statistiken, Berichtsfunktionen, Forschungsoptionen mit A/B-Tests, den Kurs-Feed und das TransPipe-Tool zur Unterstützung beim Erstellen von automatischen oder manuellen Untertiteln. Die Plattform unterstützt außerdem eine Vielzahl zusätzlicher Funktionen, doch eine detaillierte Beschreibung dieser Funktionen würde den Rahmen des Berichts sprengen.
Das Kapitel geht dann auf gängige Kursformate und openHPI-Lehrveranstaltungen am HPI ein, bevor es mit einigen Best Practices für die Gestaltung und Durchführung von Kursen schließt.
Zum Abschluss des technischen Berichts gibt das letzte Kapitel eine Zusammenfassung und einen Ausblick auf die Zukunft der digitalen Bildung. 

Ein besonderes Merkmal des openHPI-Projekts ist die bewusste Entscheidung, die komplette Anwendung von den physischen Netzwerkkomponenten bis zur Plattformentwicklung eigenständig zu betreiben. Bei der vorliegenden deutschen Variante handelt es sich um eine gekürzte Übersetzung des technischen Berichts 148, bei der kein Einblick in die Technologien und Architektur von openHPI gegeben wird. Interessierte Leser:innen können im technischen Bericht 148 (vollständige englische Version) detaillierte Informationen zum Rechenzentrum und den Geräten, der Cloud-Software und dem openHPI Cloud Service aber auch zu Infrastruktur-Anwendungskomponenten wie Entwicklungstools, Automatisierung, Deployment-Pipeline und Monitoring erhalten. Außerdem finden sich dort weitere Informationen über den Technologiestack und konkrete Implementierungsdetails der Plattform inklusive der serviceorientierten Ruby on Rails-Anwendung, die Kommunikation zwischen den Diensten, öffentliche APIs, sowie Designsystem und -komponenten. Der Abschnitt schließt mit einer Diskussion über die ursprüngliche Microservice-Architektur und die Migration zu einer monolithischen Anwendung.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 148 
KW  - openHPI
KW  - MOOC
KW  - digital learning platform
KW  - digital enlightenment
KW  - lifelong learning
KW  - openHPI
KW  - MOOC
KW  - digitale Lernplattform
KW  - digitale Aufklärung
KW  - lebenslanges Lernen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-560208
SN  - 978-3-86956-544-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 148
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Meinel, Christoph
A1  - Gayvoronskaya, Tatiana
A1  - Schnjakin, Maxim
T1  - Blockchain
BT  - hype or innovation
N2  - The term blockchain has recently become a buzzword, but only a few know exactly what lies behind this approach. According to a survey published in the first quarter of 2017, the term is known to only 35 percent of German medium-sized enterprise representatives. However, blockchain technology is very interesting for the mass media because of its rapid development and global capturing of different markets.

For example, many see blockchain technology either as an all-purpose weapon, which only a few have access to, or as a hacker technology for secret deals in the darknet. The innovation of blockchain technology is found in its successful combination of already existing approaches such as decentralized networks, cryptography, and consensus models. This innovative concept makes it possible to exchange values in a decentralized system. At the same time, there is no requirement for trust between its nodes (e.g. users).

With this study the Hasso Plattner Institute would like to help readers form their own opinion about blockchain technology, and to distinguish between truly innovative properties and hype.

The authors of the present study analyze the positive and negative properties of the blockchain architecture and suggest possible solutions, which can contribute to the efficient use of the technology. We recommend that every company define a clear target for the intended application, which is achievable with a reasonable cost-benefit ratio, before deciding on this technology. Both the possibilities and the limitations of blockchain technology need to be considered. The relevant steps that must be taken in this respect are summarized for the reader in this study.

Furthermore, this study elaborates on urgent problems such as the scalability of the blockchain, appropriate consensus algorithms, and security, including various types of possible attacks and their countermeasures. New blockchains, for example, run the risk of reducing security, as changes to existing technology can lead to security gaps and failures.

After discussing the innovative properties and problems of blockchain technology, the study turns to its implementation. Many implementation options are available to companies interested in realizing a blockchain. The numerous applications have either their own blockchain as a basis or use existing and widespread blockchain systems. Various consortia and projects offer "blockchain-as-a-service" and help other companies to develop, test and deploy their own applications.

This study gives a detailed overview of diverse relevant applications and projects in the field of blockchain technology. As this technology is still a relatively young and fast developing approach, it still lacks uniform standards to allow the cooperation of different systems and to which all developers can adhere. Currently, developers are orienting themselves to Bitcoin, Ethereum and Hyperledger systems, which serve as the basis for many other blockchain applications.

The goal is to give readers a clear and comprehensive overview of blockchain technology and its capabilities.
N2  - Der Begriff Blockchain ist in letzter Zeit zu einem Schlagwort geworden, aber nur wenige wissen, was sich genau dahinter verbirgt. Laut einer Umfrage, die im ersten Quartal 2017 veröffentlicht wurde, ist der Begriff nur bei 35 Prozent der deutschen Mittelständler bekannt. Dabei ist die Blockchain-Technologie durch ihre rasante Entwicklung und die globale Eroberung unterschiedlicher Märkte für Massenmedien sehr interessant.

So sehen viele die Blockchain-Technologie entweder als eine Allzweckwaffe, zu der aber nur wenige einen Zugang haben, oder als eine Hacker-Technologie für geheime Geschäfte im Darknet. Dabei liegt die Innovation der Blockchain-Technologie in ihrer erfolgreichen Zusammensetzung bereits vorhandener Ansätze: dezentrale Netzwerke, Kryptographie, Konsensfindungsmodelle. Durch das innovative Konzept wird ein Werte-Austausch in einem dezentralen System möglich. Dabei wird kein Vertrauen zwischen dessen Knoten (z.B. Nutzer) vorausgesetzt.

Mit dieser Studie möchte das Hasso-Plattner-Institut den Lesern helfen, ihren eigenen Standpunkt zur Blockchain-Technologie zu finden und dabei dazwischen unterscheiden zu können, welche Eigenschaften wirklich innovativ und welche nichts weiter als ein Hype sind.

Die Autoren der vorliegenden Arbeit analysieren positive und negative Eigenschaften, welche die Blockchain-Architektur prägen, und stellen mögliche Anpassungs- und Lösungsvorschläge vor, die zu einem effizienten Einsatz der Technologie beitragen können. Jedem Unternehmen, bevor es sich für diese Technologie entscheidet, wird dabei empfohlen, für den geplanten Anwendungszweck zunächst ein klares Ziel zu definieren, das mit einem angemessenen Kosten-Nutzen-Verhältnis angestrebt werden kann. Dabei sind sowohl die Möglichkeiten als auch die Grenzen der Blockchain-Technologie zu beachten. Die relevanten Schritte, die es in diesem Zusammenhang zu beachten gilt, fasst die Studie für die Leser übersichtlich zusammen.

Es wird ebenso auf akute Fragestellungen wie Skalierbarkeit der Blockchain, geeigneter Konsensalgorithmus und Sicherheit eingegangen, darunter verschiedene Arten möglicher Angriffe und die entsprechenden Gegenmaßnahmen zu deren Abwehr. Neue Blockchains etwa laufen Gefahr, geringere Sicherheit zu bieten, da Änderungen an der bereits bestehenden Technologie zu Schutzlücken und Mängeln führen können.

Nach Diskussion der innovativen Eigenschaften und Probleme der Blockchain-Technologie wird auf ihre Umsetzung eingegangen. Interessierten Unternehmen stehen viele Umsetzungsmöglichkeiten zur Verfügung. Die zahlreichen Anwendungen haben entweder eine eigene Blockchain als Grundlage oder nutzen bereits bestehende und weitverbreitete Blockchain-Systeme. Zahlreiche Konsortien und Projekte bieten „Blockchain-as-a-Service“ an und unterstützen andere Unternehmen beim Entwickeln, Testen und Bereitstellen von Anwendungen.

Die Studie gibt einen detaillierten Überblick über zahlreiche relevante Einsatzbereiche und Projekte im Bereich der Blockchain-Technologie. Dadurch, dass sie noch relativ jung ist und sich schnell entwickelt, fehlen ihr noch einheitliche Standards, die die Zusammenarbeit der verschiedenen Systeme erlauben und an die sich alle Entwickler halten können. Aktuell orientieren sich Entwickler an Bitcoin-, Ethereum- und Hyperledger-Systemen, die als Grundlage für viele weitere Blockchain-Anwendungen dienen.

Ziel ist, den Lesern einen klaren und umfassenden Überblick über die Blockchain-Technologie und deren Möglichkeiten zu vermitteln.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 124 
KW  - ACINQ
KW  - altchain
KW  - alternative chain
KW  - ASIC
KW  - atomic swap
KW  - Australian securities exchange
KW  - bidirectional payment channels
KW  - Bitcoin Core
KW  - bitcoins
KW  - BitShares
KW  - Blockchain Auth
KW  - blockchain consortium
KW  - cross-chain
KW  - inter-chain
KW  - blocks
KW  - blockchain
KW  - Blockstack ID
KW  - Blockstack
KW  - Bluemix platform
KW  - BTC
KW  - Byzantine Agreement
KW  - chain
KW  - cloud
KW  - Colored Coins
KW  - confirmation period
KW  - contest period
KW  - DAO
KW  - Delegated Proof-of-Stake
KW  - decentralized autonomous organization
KW  - Distributed Proof-of-Research
KW  - double hashing
KW  - DPoS
KW  - ECDSA
KW  - Eris
KW  - Ether
KW  - Ethereum
KW  - E-Wallet
KW  - Federated Byzantine Agreement
KW  - federated voting
KW  - FollowMyVote
KW  - Fork
KW  - Gridcoin
KW  - Hard Fork
KW  - Hashed Timelock Contracts
KW  - hashrate
KW  - identity management
KW  - smart contracts
KW  - Internet of Things
KW  - IoT
KW  - BCCC
KW  - Japanese Blockchain Consortium
KW  - consensus algorithm
KW  - consensus protocol
KW  - ledger assets
KW  - Lightning Network
KW  - Lock-Time-Parameter
KW  - merged mining
KW  - merkle root
KW  - micropayment
KW  - micropayment channels
KW  - Microsoft Azure
KW  - miner
KW  - mining
KW  - mining hardware
KW  - minting
KW  - Namecoin
KW  - NameID
KW  - NASDAQ
KW  - nonce
KW  - off-chain transaction
KW  - Onename
KW  - OpenBazaar
KW  - Oracles
KW  - Orphan Block
KW  - P2P
KW  - Peercoin
KW  - peer-to-peer network
KW  - pegged sidechains
KW  - PoB
KW  - PoS
KW  - PoW
KW  - Proof-of-Burn
KW  - Proof-of-Stake
KW  - Proof-of-Work
KW  - quorum slices
KW  - Ripple
KW  - rootstock
KW  - scarce tokens
KW  - difficulty
KW  - SCP
KW  - SHA
KW  - sidechain
KW  - Simplified Payment Verification
KW  - scalability of blockchain
KW  - Slock.it
KW  - Soft Fork
KW  - SPV
KW  - Steemit
KW  - Stellar Consensus Protocol
KW  - Storj
KW  - The Bitfury Group
KW  - transaction
KW  - Two-Way-Peg
KW  - The DAO
KW  - Unspent Transaction Output
KW  - contracts
KW  - Watson IoT
KW  - difficulty target
KW  - Zooko's triangle
KW  - Blockchain-Konsortium R3
KW  - blockchain-übergreifend
KW  - Blöcke
KW  - Blockkette
KW  - Bluemix-Plattform
KW  - dezentrale autonome Organisation
KW  - doppelter Hashwert
KW  - Identitätsmanagement
KW  - intelligente Verträge
KW  - Internet der Dinge
KW  - Japanisches Blockchain-Konsortium
KW  - Kette
KW  - Konsensalgorithmus
KW  - Konsensprotokoll
KW  - Micropayment-Kanäle
KW  - Off-Chain-Transaktionen
KW  - Peer-to-Peer Netz
KW  - Schwierigkeitsgrad
KW  - Skalierbarkeit der Blockchain
KW  - Transaktion
KW  - Verträge
KW  - Zielvorgabe
KW  - Zookos Dreieck
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-414525
SN  - 978-3-86956-441-8
SN  - 1613-5652
SN  - 2191-1665
IS  - 124
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Meinel, Christoph
A1  - Galbas, Michael
A1  - Hagebölling, David
T1  - Digital sovereignty: insights from Germany’s education sector
T1  - Digitale Souveränität: Erkenntnisse aus dem deutschen Bildungssektor
N2  - Digital technology offers significant political, economic, and societal opportunities. At the same time, the notion of digital sovereignty has become a leitmotif in German discourse: the state’s capacity to assume its responsibilities and safeguard society’s – and individuals’ – ability to shape the digital transformation in a self-determined way. The education sector is exemplary for the challenge faced by Germany, and indeed Europe, of harnessing the benefits of digital technology while navigating concerns around sovereignty. It encompasses education as a core public good, a rapidly growing field of business, and growing pools of highly sensitive personal data. The report describes pathways to mitigating the tension between digitalization and sovereignty at three different levels – state, economy, and individual – through the lens of concrete technical projects in the education sector: the HPI Schul-Cloud (state sovereignty), the MERLOT data spaces (economic sovereignty), and the openHPI platform (individual sovereignty).
N2  - Digitale Technologien bieten erhebliche politische, wirtschaftliche und gesellschaftliche Chancen. Zugleich ist der Begriff digitale Souveränität zu einem Leitmotiv im deutschen Diskurs über digitale Technologien geworden: das heißt, die Fähigkeit des Staates, seine Verantwortung wahrzunehmen und die Befähigung der Gesellschaft – und des Einzelnen – sicherzustellen, die digitale Transformation selbstbestimmt zu gestalten. Exemplarisch für die Herausforderung in Deutschland und Europa, die Vorteile digitaler Technologien zu nutzen und gleichzeitig Souveränitätsbedenken zu berücksichtigen, steht der Bildungssektor. Er umfasst Bildung als zentrales öffentliches Gut, ein schnell aufkommendes Geschäftsfeld und wachsende Bestände an hochsensiblen personenbezogenen Daten. Davon ausgehend beschreibt der Bericht Wege zur Entschärfung des Spannungsverhältnisses zwischen Digitalisierung und Souveränität auf drei verschiedenen Ebenen – Staat, Wirtschaft und Individuum – anhand konkreter technischer Projekte im Bildungsbereich: die HPI Schul-Cloud (staatliche Souveränität), die MERLOT-Datenräume (wirtschaftliche Souveränität) und die openHPI-Plattform (individuelle Souveränität).
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 157 
KW  - digitalization
KW  - digital sovereignty
KW  - digital education
KW  - HPI Schul-Cloud
KW  - MERLOT
KW  - openHPI
KW  - European Union
KW  - Digitalisierung
KW  - digitale Souveränität
KW  - digitale Bildung
KW  - HPI Schul-Cloud
KW  - MERLOT
KW  - openHPI
KW  - Europäische Union
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-597723
SN  - 978-3-86956-561-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 157
SP  - 1
EP  - 27
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Meinel, Christoph
A1  - Döllner, Jürgen Roland Friedrich
A1  - Weske, Mathias
A1  - Polze, Andreas
A1  - Hirschfeld, Robert
A1  - Naumann, Felix
A1  - Giese, Holger
A1  - Baudisch, Patrick
A1  - Friedrich, Tobias
A1  - Böttinger, Erwin
A1  - Lippert, Christoph
A1  - Dörr, Christian
A1  - Lehmann, Anja
A1  - Renard, Bernhard
A1  - Rabl, Tilmann
A1  - Uebernickel, Falk
A1  - Arnrich, Bert
A1  - Hölzle, Katharina
T1  - Proceedings of the HPI Research School on Service-oriented Systems Engineering 2020 Fall Retreat
N2  - The design and implementation of service-oriented architectures raise a large number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for the design and realization of complex web-based systems. Both approaches allow for dynamic application adaptation as well as the integration of enterprise applications.

Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns.

The annual Ph.D. Retreat of the Research School gives each member the opportunity to present the current state of their research and to outline a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
N2  - Der Entwurf und die Realisierung dienstbasierender Architekturen werfen eine Vielzahl von Forschungsfragestellungen aus den Gebieten der Softwaretechnik, der Systemmodellierung und -analyse sowie der Adaptierbarkeit und Integration von Applikationen auf. Komponentenorientierung und WebServices sind zwei Ansätze für den effizienten Entwurf und die Realisierung komplexer Web-basierender Systeme. Sie ermöglichen die Reaktion auf wechselnde Anforderungen ebenso wie die Integration großer komplexer Softwaresysteme.

"Service-Oriented Systems Engineering" repräsentiert die Symbiose bewährter Praktiken aus den Gebieten der Objektorientierung, der Komponentenprogrammierung, des verteilten Rechnens sowie der Geschäftsprozesse und berücksichtigt auch die Integration von Geschäftsanliegen und Informationstechnologien.

Die Klausurtagung des Forschungskollegs "Service-oriented Systems Engineering" findet einmal jährlich statt und bietet allen Kollegiaten die Möglichkeit, den Stand ihrer aktuellen Forschung darzulegen. Bedingt durch die Querschnittstruktur des Kollegs deckt dieser Bericht ein weites Spektrum aktueller Forschungsthemen ab. Dazu zählen unter anderem Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; sowie Services Specification, Composition, and Enactment.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 138 
KW  - Hasso Plattner Institute
KW  - research school
KW  - Ph.D. retreat
KW  - service-oriented systems engineering
KW  - Hasso-Plattner-Institut
KW  - Forschungskolleg
KW  - Klausurtagung
KW  - Service-oriented Systems Engineering
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-504132
SN  - 978-3-86956-513-2
SN  - 1613-5652
SN  - 2191-1665
IS  - 138
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Maximova, Maria
A1  - Schneider, Sven
A1  - Giese, Holger
T1  - Compositional analysis of probabilistic timed graph transformation systems
N2  - The analysis of behavioral models is of high importance for cyber-physical systems, as the systems often encompass complex behavior based on, e.g., concurrent components with mutual exclusion or probabilistic failures on demand. The rule-based formalism of probabilistic timed graph transformation systems (PTGTSs) is a suitable choice when the models representing states of the system can be understood as graphs and timed and probabilistic behavior is important. However, model checking PTGTSs is limited to systems with rather small state spaces.

We present an approach for the analysis of large scale systems modeled as probabilistic timed graph transformation systems by systematically decomposing their state spaces into manageable fragments. To obtain qualitative and quantitative analysis results for a large scale system, we verify that results obtained for its fragments serve as overapproximations for the corresponding results of the large scale system. Hence, our approach allows for the detection of violations of qualitative and quantitative safety properties for the large scale system under analysis. We consider a running example in which we model shuttles driving on tracks of a large scale topology and for which we verify that shuttles never collide and are unlikely to execute emergency brakes. In our evaluation, we apply an implementation of our approach to the running example.
N2  - Die Analyse von Verhaltensmodellen ist für cyber-physikalische Systeme von hoher Bedeutung, da die Systeme häufig komplexes Verhalten umfassen, das z.B. parallele Komponenten mit gegenseitigem Ausschluss oder probabilistischen Fehlern bei Bedarf umfasst. Der regelbasierte Formalismus probabilistischer zeitgesteuerter Graphtransformationssysteme ist eine geeignete Wahl, wenn die Modelle, die Zustände des Systems darstellen, als Graphen verstanden werden können und zeitgesteuertes und probabilistisches Verhalten wichtig ist. Modelchecking von PTGTSs ist jedoch auf Systeme mit relativ kleinen Zustandsräumen beschränkt.

Wir präsentieren einen Ansatz zur Analyse von Großsystemen, die als probabilistische zeitgesteuerte Graphtransformationssysteme modelliert wurden, indem ihre Zustandsräume systematisch in überschaubare Fragmente zerlegt werden. Um qualitative und quantitative Analyseergebnisse für ein Großsystem zu erhalten, überprüfen wir, ob die für seine Fragmente erhaltenen Ergebnisse als Überannäherungen für die entsprechenden Ergebnisse des Großsystems dienen. Unser Ansatz ermöglicht es daher, Verstöße gegen qualitative und quantitative Sicherheitseigenschaften für das untersuchte Großsystem zu erkennen. Wir betrachten ein Beispiel, in dem wir Shuttles modellieren, die auf Gleisen einer großen Topologie fahren, und für die wir überprüfen, dass Shuttles niemals kollidieren und wahrscheinlich keine Notbremsungen ausführen. In unserer Auswertung wenden wir eine Implementierung unseres Ansatzes auf das Beispiel an.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 133 
KW  - cyber-physical systems
KW  - graph transformation systems
KW  - qualitative analysis
KW  - quantitative analysis
KW  - probabilistic timed systems
KW  - compositional analysis
KW  - model checking
KW  - Cyber-physikalische Systeme
KW  - Graphentransformationssysteme
KW  - qualitative Analyse
KW  - quantitative Analyse
KW  - probabilistische zeitgesteuerte Systeme
KW  - Modellprüfung
KW  - kompositionale Analyse
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-490131
SN  - 978-3-86956-501-9
SN  - 1613-5652
SN  - 2191-1665
IS  - 133
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Maximova, Maria
A1  - Schneider, Sven
A1  - Giese, Holger
T1  - Interval probabilistic timed graph transformation systems
N2  - Formal modeling and analysis are of crucial importance for software development processes following the model-based approach. We present the formalism of Interval Probabilistic Timed Graph Transformation Systems (IPTGTSs) as a high-level modeling language. This language supports structure dynamics (based on graph transformation), timed behavior (based on clocks, guards, resets, and invariants as in Timed Automata (TA)), and interval probabilistic behavior (based on Discrete Interval Probability Distributions). That is, for the probabilistic behavior, the modeler using IPTGTSs does not need to provide precise probabilities, which are often impossible to obtain, but instead provides a probability range from which a precise probability is chosen nondeterministically. In fact, this way of capturing probabilistic behavior distinguishes IPTGTSs from Probabilistic Timed Graph Transformation Systems (PTGTSs) presented earlier.
Following earlier work on Interval Probabilistic Timed Automata (IPTA) and PTGTSs, we also provide an analysis tool chain for IPTGTSs based on inter-formalism transformations. In particular, we provide in our tool AutoGraph a translation of IPTGTSs to IPTA and rely on a mapping of IPTA to Probabilistic Timed Automata (PTA) to allow for the usage of the Prism model checker. The tool Prism can then be used to analyze the resulting PTA w.r.t. probabilistic real-time queries asking for worst-case and best-case probabilities to reach a certain set of target states in a given amount of time.
N2  - Die formale Modellierung und Analyse ist für Softwareentwicklungsprozesse nach dem modellbasierten Ansatz von entscheidender Bedeutung. Wir präsentieren den Formalismus von Interval Probabilistic Timed Graph Transformation Systems (IPTGTS) als Modellierungssprache auf hoher abstrakter Ebene. Diese Sprache unterstützt Strukturdynamik (basierend auf Graphtransformation), zeitgesteuertes Verhalten (basierend auf Clocks, Guards, Resets und Invarianten wie in Timed Automata (TA)) und intervallwahrscheinliches Verhalten (basierend auf diskreten Intervallwahrscheinlichkeitsverteilungen). Das heißt, für das probabilistische Verhalten muss der Modellierer, der IPTGTS verwendet, keine genauen Wahrscheinlichkeiten bereitstellen, die oft nicht zu bestimmen sind, sondern stattdessen einen Wahrscheinlichkeitsbereich bereitstellen, aus dem eine genaue Wahrscheinlichkeit nichtdeterministisch ausgewählt wird. Tatsächlich unterscheidet diese Funktion zur Erfassung des probabilistischen Verhaltens IPTGTS von den zuvor vorgestellten PTGTS (Probabilistic Timed Graph Transformation Systems).
Nach früheren Arbeiten zu Interval Probabilistic Timed Automata (IPTA) und PTGTS bieten wir auch eine Analyse-Toolkette für IPTGTS, die auf Interformalismus-Transformationen basiert. Insbesondere bieten wir in unserem Tool AutoGraph eine Übersetzung von IPTGTSs in IPTA und stützen uns auf eine Zuordnung von IPTA zu probabilistischen zeitgesteuerten Automaten (PTA), um die Verwendung des Prism-Modellprüfers zu ermöglichen. Das Werkzeug Prism kann dann verwendet werden, um den resultierenden PTA bezüglich probabilistischer Echtzeitabfragen (in denen nach Worst-Case- und Best-Case-Wahrscheinlichkeiten gefragt wird, um einen bestimmten Satz von Zielzuständen in einem bestimmten Zeitraum zu erreichen) zu analysieren.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 134 
KW  - cyber-physical systems
KW  - graph transformation systems
KW  - interval timed automata
KW  - timed automata
KW  - qualitative analysis
KW  - quantitative analysis
KW  - probabilistic timed systems
KW  - interval probabilistic timed systems
KW  - model checking
KW  - cyber-physikalische Systeme
KW  - Graphentransformationssysteme
KW  - Interval Timed Automata
KW  - Timed Automata
KW  - qualitative Analyse
KW  - quantitative Analyse
KW  - probabilistische zeitgesteuerte Systeme
KW  - interval probabilistische zeitgesteuerte Systeme
KW  - Modellprüfung
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-512895
SN  - 978-3-86956-502-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 134
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Marx, Susanne
A1  - Freundlich, Heidi
A1  - Klotz, Michael
A1  - Kylänen, Mika
A1  - Niedoszytko, Grazyna
A1  - Swacha, Jakub
A1  - Vollerthum, Anne
T1  - Towards an Online Learning Community on Digitalization in Tourism
JF  - EMOOCs 2021
N2  - Information technology and digital solutions as enablers in the tourism sector require continuous development of skills, as digital transformation is characterized by fast change, complexity and uncertainty. This research investigates how a cMOOC concept could support the tourism industry. A consortium of three universities, a tourism association, and a tourist attraction investigates online learning needs and habits of tourism industry stakeholders in the field of digitalization in a cross-border study in the Baltic Sea region. The multi-national survey (n = 244) reveals a high interest in participating in an online learning community, with two-thirds of respondents seeing opportunities to contribute to such a community apart from consuming knowledge. The paper demonstrates preferred ways of learning, motivational and hampering aspects as well as types of possible contributions.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-515986
SN  - 978-3-86956-512-5
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Maldonado-Mahauad, Jorge
A1  - Valdiviezo, Javier
A1  - Carvallo, Juan Pablo
A1  - Samaniego-Erazo, Nicolay
T1  - The MOOC-CEDIA Observatory
BT  - Study of the Current Situation of MOOCs and Recommendations To Improve Their Adoption in Ecuadorian Universities
JF  - EMOOCs 2021
N2  - In the last few years, a significant number of Massive Open Online Courses (MOOCs) have been made available to the worldwide community, mainly by European and North American universities (i.e., in the United States). Since their emergence, the adoption of these educational resources has been widely studied by several research groups and universities with the aim of understanding their evolution and impact on educational models over time. In the case of Latin America, data from the MOOC-UC Observatory (updated until 2018) shows that the adoption of these courses by universities in the region has been slow and heterogeneous. In the specific case of Ecuador, although some data is available, there is a lack of information regarding the construction, publication and/or adoption of such courses by universities in the country. Moreover, there are no up-to-date studies designed to identify and analyze the barriers and factors affecting the adoption of MOOCs in the country. The aim of this work is to present the MOOC-CEDIA Observatory, a web platform that offers interactive visualizations on the adoption of MOOCs in Ecuador. The main results of the study show that: (1) until 2020 there have been 99 MOOCs in Ecuador, (2) the domains of MOOCs are mostly related to applied sciences, social sciences and natural sciences, with the humanities being the least covered, (3) Open edX and Moodle are the most widely used platforms to deploy such courses. It is expected that the conclusions drawn from this analysis will allow the design of recommendations aimed at promoting the creation and use of quality MOOCs in Ecuador and help institutions to chart the route for their adoption, both for internal use by their community and by society in general.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517153
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 143
EP  - 158
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Lopes, Pedro
T1  - Interactive Systems Based on Electrical Muscle Stimulation
N2  - How can interactive devices connect with users in the most immediate and intimate way? This question has driven interactive computing for decades. Throughout the last decades, we witnessed how mobile devices moved computing into users’ pockets, and recently, wearables put computing in constant physical contact with the user’s skin. In both cases moving the devices closer to users allowed devices to sense more of the user, and thus act more personal. The main question that drives our research is: what is the next logical step? 
Some researchers argue that the next generation of interactive devices will move past the user’s skin and be directly implanted inside the user’s body. This has already happened in that we have pacemakers, insulin pumps, etc. However, we argue that what we see is not devices moving towards the inside of the user’s body, but rather towards the body’s biological “interface” they need to address in order to perform their function.
To implement our vision, we created a set of devices that intentionally borrow parts of the user’s body for input and output, rather than adding more technology to the body. 
In this dissertation we present one specific flavor of such devices, i.e., devices that borrow the user’s muscles. We engineered I/O devices that interact with the user by reading and controlling muscle activity. To achieve the latter, our devices are based on medical-grade signal generators and electrodes attached to the user’s skin that send electrical impulses to the user’s muscles; these impulses then cause the user’s muscles to contract. 
While electrical muscle stimulation (EMS) devices have been used to regenerate lost motor functions in rehabilitation medicine since the 1960s, in this dissertation, we propose a new perspective: EMS as a means for creating interactive systems. 
We start by presenting seven prototypes of interactive devices that we have created to illustrate several benefits of EMS.  These devices form two main categories: (1) Devices that allow users eyes-free access to information by means of their proprioceptive sense, such as the value of a variable in a computer system, a tool, or a plot; (2) Devices that increase immersion in virtual reality by simulating large forces, such as wind, physical impact, or walls and heavy objects. 
Then, we analyze the potential of EMS to build interactive systems that miniaturize well and discuss how they leverage our proprioceptive sense as an I/O modality. We proceed by laying out the benefits and disadvantages of both EMS and mechanical haptic devices, such as exoskeletons. 
We conclude by sketching an outline for future research on EMS by listing open technical, ethical and philosophical questions that we left unanswered.
N2  - Wie können interaktive Geräte auf unmittelbare und eng verknüpfte Weise mit dem Nutzer kommunizieren? Diese Frage beschäftigt die Forschung im Bereich Computer Interaktion seit Jahrzehnten. Besonders in den letzten Jahren haben wir miterlebt, wie Nutzer interaktive Geräte dauerhaft bei sich führen, im Falle von sogenannten Wearables sogar als Teil der Kleidung oder als Accessoires. In beiden Fällen sind die Geräte näher an den Nutzer gerückt, wodurch sie mehr Informationen vom Nutzer sammeln können und daher persönlicher erscheinen. Die Hauptfrage, die unsere Forschung antreibt, ist: Was ist der nächste logische Schritt in der Entwicklung interaktiver Geräte?
Manche Wissenschaftler argumentieren, dass die Haut nicht mehr die Barriere für die nächste Generation von interaktiven Geräten sein wird, sondern dass diese direkt in den Körper der Nutzer implantiert werden. Zum Teil ist dies auch bereits passiert, wie Herzschrittmacher oder Insulinpumpen zeigen. Wir argumentieren jedoch, dass Geräte sich in Zukunft nicht zwingend innerhalb des Körpers befinden müssen, sondern sich an der richtigen „Schnittstelle“ befinden sollen, um die Funktion des Gerätes zu ermöglichen.
Um diese Entwicklung voranzutreiben haben wir Geräte entwickelt, die Teile des Körpers selbst als Ein- und Ausgabe-Schnittstelle verwenden, anstatt weitere Geräte an den Körper anzubringen.
In dieser Dissertation zeigen wir eine bestimmte Art dieser Geräte, nämlich solche, die Muskeln verwenden. Wir haben Ein-/Ausgabegeräte gebaut, die mit dem Nutzer interagieren indem sie Muskelaktivität erkennen und kontrollieren. Um Muskelaktivität zu kontrollieren benutzen wir Signalgeber von medizinischer Qualität, die mithilfe von auf die Haut geklebten Elektroden elektrische Signale an die Muskeln des Nutzers senden. Diese Signale bewirken dann eine Kontraktion des Muskels.
Geräte zur elektrischen Muskelstimulation (EMS) werden seit den 1960er-Jahren zur Regeneration von motorischen Funktionen verwendet. In dieser Dissertation schlagen wir jedoch einen neuen Ansatz vor: elektrische Muskelstimulation als Kommunikationskanal zwischen Mensch und interaktiven Computersystemen.
Zunächst stellen wir unsere sieben interaktiven Prototypen vor, welche die zahlreichen Vorteile von EMS demonstrieren. Diese Geräte können in zwei Hauptkategorien unterteilt werden: (1) Geräte, die Nutzern Zugang zu Information direkt über ihre propriozeptive Wahrnehmung geben ohne einen visuellen Reiz. Diese Informationen können zum Beispiel Variablen, Diagramme oder die Handhabung von Werkzeugen beinhalten. (2) Des Weiteren zeigen wir Geräte, welche die Immersion in virtuelle Umgebungen erhöhen indem sie physikalische Kräfte wie Wind, physischen Kontakt, Wände oder schwere Objekte, simulieren.
Wir analysieren in dieser Arbeit außerdem das Potential von EMS für miniaturisierte interaktive Systeme und diskutieren, wie solche EMS Systeme die propriozeptive Wahrnehmung wirksam als Ein-/Ausgabemodalität nutzen können. Dazu stellen wir die Vor- und Nachteile von EMS und mechanisch-haptischen Geräten, wie zum Beispiel Exoskeletten, gegenüber. 
Zum Abschluss skizzieren wir zukünftige Richtungen in der Erforschung von interaktiven EMS Systemen, indem wir bislang offen gebliebene technische, ethische und philosophische Fragen aufzeigen.
KW  - electrical muscle stimulation
KW  - wearables
KW  - virtual reality
KW  - Wearable
KW  - elektrische Muskelstimulation
KW  - virtuelle Realität
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-421165
ER  - 
TY  - THES
A1  - Lindinger, Jakob
T1  - Variational inference for composite Gaussian process models
T1  - Variationelle Inferenz für zusammengesetzte Gauß-Prozess Modelle
N2  - Most machine learning methods provide only point estimates when being queried to predict on new data. This is problematic when the data is corrupted by noise, e.g. from imperfect measurements, or when the queried data point is very different to the data that the machine learning model has been trained with. Probabilistic modelling in machine learning naturally equips predictions with corresponding uncertainty estimates, which allows a practitioner to incorporate information about measurement noise into the modelling process and to know when not to trust the predictions. A well-understood, flexible probabilistic framework is provided by Gaussian processes that are ideal as building blocks of probabilistic models. They lend themselves naturally to the problem of regression, i.e., being given a set of inputs and corresponding observations and then predicting likely observations for new unseen inputs, and can also be adapted to many more machine learning tasks. However, exactly inferring the optimal parameters of such a Gaussian process model (in a computationally tractable manner) is only possible for regression tasks in small data regimes. Otherwise, approximate inference methods are needed, the most prominent of which is variational inference.
In this dissertation we study models that are composed of Gaussian processes embedded in other models in order to make those more flexible and/or probabilistic. The first example are deep Gaussian processes which can be thought of as a small network of Gaussian processes and which can be employed for flexible regression. The second model class that we study are Gaussian process state-space models. These can be used for time-series modelling, i.e., the task of being given a stream of data ordered by time and then predicting future observations. For both model classes the state-of-the-art approaches offer a trade-off between expressive models and computational properties (e.g. speed or convergence properties) and mostly employ variational inference. Our goal is to improve inference in both models by first getting a deep understanding of the existing methods and then, based on this, to design better inference methods. We achieve this by either exploring the existing trade-offs or by providing general improvements applicable to multiple methods.
We first provide an extensive background, introducing Gaussian processes and their sparse (approximate and efficient) variants. We continue with a description of the models under consideration in this thesis, deep Gaussian processes and Gaussian process state-space models, including detailed derivations and a theoretical comparison of existing methods.
Then we start analysing deep Gaussian processes more closely: Trading off the properties (good optimisation versus expressivity) of state-of-the-art methods in this field, we propose a new variational inference based approach. We then demonstrate experimentally that our new algorithm leads to better calibrated uncertainty estimates than existing methods.
Next, we turn our attention to Gaussian process state-space models, where we closely analyse the theoretical properties of existing methods. The understanding gained in this process leads us to propose a new inference scheme for general Gaussian process state-space models that incorporates effects on multiple time scales. This method is more efficient than previous approaches for long time series and outperforms its comparison partners on data sets in which effects on multiple time scales (fast and slowly varying dynamics) are present.
Finally, we propose a new inference approach for Gaussian process state-space models that trades off the properties of state-of-the-art methods in this field. By combining variational inference with another approximate inference method, the Laplace approximation, we design an efficient algorithm that outperforms its comparison partners since it achieves better calibrated uncertainties.
N2  - Bei Vorhersagen auf bisher ungesehenen Datenpunkten liefern die meisten maschinellen Lernmethoden lediglich Punktprognosen. Dies kann problematisch sein, wenn die Daten durch Rauschen verfälscht sind, z. B. durch unvollkommene Messungen, oder wenn der abgefragte Datenpunkt sich stark von den Daten unterscheidet, mit denen das maschinelle Lernmodell trainiert wurde. Mithilfe probabilistischer Modellierung (einem Teilgebiet des maschinellen Lernens) werden die Vorhersagen der Methoden auf natürliche Weise durch Unsicherheiten ergänzt. Dies erlaubt es, Informationen über Messunsicherheiten in den Modellierungsprozess mit einfließen zu lassen, sowie abzuschätzen, bei welchen Vorhersagen dem Modell vertraut werden kann. Grundlage vieler probabilistischer Modelle bilden Gaußprozesse, die gründlich erforscht und äußerst flexibel sind und daher häufig als Bausteine für größere Modelle dienen. Für Regressionsprobleme, was heißt, von einem Datensatz bestehend aus Eingangsgrößen und zugehörigen Messungen auf wahrscheinliche Messwerte für bisher ungesehene Eingangsgrößen zu schließen, sind Gaußprozesse hervorragend geeignet. Zusätzlich können sie an viele weitere Aufgabenstellungen des maschinellen Lernens angepasst werden. Die Bestimmung der optimalen Parameter eines solchen Gaußprozessmodells (in einer annehmbaren Zeit) ist jedoch nur für Regression auf kleinen Datensätzen möglich. In allen anderen Fällen muss auf approximative Inferenzmethoden zurückgegriffen werden, wobei variationelle Inferenz die bekannteste ist.  In dieser Dissertation untersuchen wir Modelle, die Gaußprozesse eingebettet in andere Modelle enthalten, um Letztere flexibler und/oder probabilistisch zu machen. Das erste Beispiel hierbei sind tiefe Gaußprozesse, die man sich als kleines Netzwerk von Gaußprozessen vorstellen kann und die für flexible Regression eingesetzt werden können. Die zweite Modellklasse, die wir genauer analysieren ist die der Gaußprozess-Zustandsraummodelle. 
Diese können zur Zeitreihenmodellierung verwendet werden, das heißt, um zukünftige Datenpunkte auf Basis eines nach der Zeit geordneten Eingangsdatensatzes vorherzusagen. Für beide genannten Modellklassen bieten die modernsten Ansätze einen Kompromiss zwischen expressiven Modellen und wünschenswerten rechentechnischen Eigenschaften (z. B. Geschwindigkeit oder Konvergenzeigenschaften). Des Weiteren wird für die meisten Methoden variationelle Inferenz verwendet. Unser Ziel ist es, die Inferenz für beide Modellklassen zu verbessern, indem wir zunächst ein tieferes Verständnis der bestehenden Ansätze erlangen und darauf aufbauend bessere Inferenzverfahren entwickeln. Indem wir die bestehenden Kompromisse der heutigen Methoden genauer untersuchen, oder dadurch, dass wir generelle Verbesserungen anbieten, die sich auf mehrere Modelle anwenden lassen, erreichen wir dieses Ziel. Wir beginnen die Thesis mit einer umfassenden Einführung, die den notwendigen technischen Hintergrund zu Gaußprozessen sowie spärlichen (approximativen und effizienten) Gaußprozessen enthält. Anschließend werden die in dieser Thesis behandelten Modellklassen, tiefe Gaußprozesse und Gaußprozess-Zustandsraummodelle, eingeführt, einschließlich detaillierter Herleitungen und eines theoretischen Vergleichs existierender Methoden. Darauf aufbauend untersuchen wir zuerst tiefe Gaußprozesse genauer und entwickeln dann eine neue Inferenzmethode. Diese basiert darauf, die wünschenswerten Eigenschaften (gute Optimierungseigenschaften gegenüber Expressivität) der modernsten Ansätze gegeneinander abzuwägen. Anschließend zeigen wir experimentell, dass unser neuer Algorithmus zu besser kalibrierten Unsicherheitsabschätzungen als bei bestehenden Methoden führt. Als Nächstes wenden wir uns Gaußprozess-Zustandsraummodellen zu, wo wir zuerst die theoretischen Eigenschaften existierender Ansätze genau analysieren. 
Wir nutzen das dabei gewonnene Verständnis, um ein neues Inferenzverfahren für Gaußprozess-Zustandsraummodelle einzuführen, welches Effekte auf verschiedenen Zeitskalen berücksichtigt. Für lange Zeitreihen ist diese Methode effizienter als bisherige Ansätze. Darüber hinaus übertrifft sie ihre Vergleichspartner auf Datensätzen, bei denen Effekte auf mehreren Zeitskalen (sich schnell und langsam verändernde Signale) auftreten. Zuletzt schlagen wir ein weiteres neues Inferenzverfahren für Gaußprozess-Zustandsraummodelle vor, das die Eigenschaften der aktuellsten Methoden auf diesem Gebiet gegeneinander abwägt. Indem wir variationelle Inferenz mit einem weiteren approximativen Inferenzverfahren, der Laplace-Approximation, kombinieren, entwerfen wir einen effizienten Algorithmus, der seine Vergleichspartner dadurch übertrifft, dass er besser kalibrierte Unsicherheitsvorhersagen erzielt.
KW  - probabilistic machine learning
KW  - Gaussian processes
KW  - variational inference
KW  - deep Gaussian processes
KW  - Gaussian process state-space models
KW  - Gauß-Prozess Zustandsraummodelle
KW  - Gauß-Prozesse
KW  - tiefe Gauß-Prozesse
KW  - probabilistisches maschinelles Lernen
KW  - variationelle Inferenz
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-604441
ER  - 
TY  - JOUR
A1  - Langseth, Inger
A1  - Jacobsen, Dan Yngve
A1  - Haugsbakken, Halvdan
T1  - MOOCs for Flexible and Lifelong Learning in Higher Education
BT  - The Struggle from within Loosely Coupled Organizations?
JF  - EMOOCs 2021
N2  - In this paper, we take a closer look at the development of Massive Open Online Courses (MOOCs) in Norway. We want to contribute to nuancing the image of a sound and sustainable policy for flexible and lifelong learning at national and institutional levels and point to some critical areas of improvement in higher education institutions (HEIs). Ten semi-structured qualitative interviews were carried out in autumn 2020 at ten different HEIs across Norway. The informants were strategically selected among employees involved in MOOC technology, MOOC production and MOOC support over the period 2010–2020. A main finding is that academics engaged in MOOCs find that their entrepreneurial ideas and results are, to a large extent, overlooked at higher institutional levels, and that progress is frustratingly slow. So far, there seems to be little common understanding of the MOOC concept and the disruptive and transformative effect that MOOC technology may have at HEIs. At the national level, digital strategies, funding and digital infrastructure are mainly provided in governmental silos. We suggest that governmental bodies and institutional stakeholders pay more attention to entrepreneurial MOOC initiatives to develop sustainability in flexible and lifelong learning in HEIs. This involves connecting the generous funding of digital projects to the provision of a national portal and platform for Open Access to education. To facilitate sustainable lifelong learning in and across HEIs, more quality control to enhance the legitimacy of MOOC certificates and micro-credentials is also a necessary measure.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-516930
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 63
EP  - 78
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Ladleif, Jan
A1  - Weske, Mathias
T1  - Which event happened first?
BT  - Deferred choice on blockchain using oracles
JF  - Frontiers in blockchain
N2  - First come, first served: Critical choices between alternative actions are often made based on events external to an organization, and reacting promptly to their occurrence can be a major advantage over the competition. In Business Process Management (BPM), such deferred choices can be expressed in process models, and they are an important aspect of process engines. Blockchain-based process execution approaches are no exception to this, but are severely limited by the inherent properties of the platform: The isolated environment prevents direct access to external entities and data, and the non-continual runtime based entirely on atomic transactions impedes the monitoring and detection of events. In this paper we provide an in-depth examination of the semantics of deferred choice, and transfer them to environments such as the blockchain. We introduce and compare several oracle architectures able to satisfy certain requirements, and show that they can be implemented using state-of-the-art blockchain technology.
KW  - business processes
KW  - business process management
KW  - deferred choice
KW  - workflow patterns
KW  - blockchain
KW  - smart contracts
KW  - oracles
KW  - formal semantics
Y1  - 2021
U6  - https://doi.org/10.3389/fbloc.2021.758169
SN  - 2624-7852
VL  - 4
SP  - 1
EP  - 16
PB  - Frontiers in Blockchain
CY  - Lausanne, Schweiz
ER  - 
TY  - GEN
A1  - Ladleif, Jan
A1  - Weske, Mathias
T1  - Which Event Happened First? Deferred Choice on Blockchain Using Oracles
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - First come, first served: Critical choices between alternative actions are often made based on events external to an organization, and reacting promptly to their occurrence can be a major advantage over the competition. In Business Process Management (BPM), such deferred choices can be expressed in process models, and they are an important aspect of process engines. Blockchain-based process execution approaches are no exception to this, but are severely limited by the inherent properties of the platform: The isolated environment prevents direct access to external entities and data, and the non-continual runtime based entirely on atomic transactions impedes the monitoring and detection of events. In this paper we provide an in-depth examination of the semantics of deferred choice, and transfer them to environments such as the blockchain. We introduce and compare several oracle architectures able to satisfy certain requirements, and show that they can be implemented using state-of-the-art blockchain technology.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 11 
KW  - business processes
KW  - business process management
KW  - deferred choice
KW  - workflow patterns
KW  - blockchain
KW  - smart contracts
KW  - oracles
KW  - formal semantics
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-550681
VL  - 4
SP  - 1
EP  - 16
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Kuban, Robert
A1  - Rotta, Randolf
A1  - Nolte, Jörg
A1  - Chromik, Jonas
A1  - Beilharz, Jossekin Jakob
A1  - Pirl, Lukas
A1  - Friedrich, Tobias
A1  - Lenzner, Pascal
A1  - Weyand, Christopher
A1  - Juiz, Carlos
A1  - Bermejo, Belen
A1  - Sauer, Joao
A1  - Coelho, Leandro dos Santos
A1  - Najafi, Pejman
A1  - Pünter, Wenzel
A1  - Cheng, Feng
A1  - Meinel, Christoph
A1  - Sidorova, Julia
A1  - Lundberg, Lars
A1  - Vogel, Thomas
A1  - Tran, Chinh
A1  - Moser, Irene
A1  - Grunske, Lars
A1  - Elsaid, Mohamed Esameldin Mohamed
A1  - Abbas, Hazem M.
A1  - Rula, Anisa
A1  - Sejdiu, Gezim
A1  - Maurino, Andrea
A1  - Schmidt, Christopher
A1  - Hügle, Johannes
A1  - Uflacker, Matthias
A1  - Nozza, Debora
A1  - Messina, Enza
A1  - Hoorn, André van
A1  - Frank, Markus
A1  - Schulz, Henning
A1  - Alhosseini Almodarresi Yasin, Seyed Ali
A1  - Nowicki, Marek
A1  - Muite, Benson K.
A1  - Boysan, Mehmet Can
A1  - Bianchi, Federico
A1  - Cremaschi, Marco
A1  - Moussa, Rim
A1  - Abdel-Karim, Benjamin M.
A1  - Pfeuffer, Nicolas
A1  - Hinz, Oliver
A1  - Plauth, Max
A1  - Polze, Andreas
A1  - Huo, Da
A1  - Melo, Gerard de
A1  - Mendes Soares, Fábio
A1  - Oliveira, Roberto Célio Limão de
A1  - Benson, Lawrence
A1  - Paul, Fabian
A1  - Werling, Christian
A1  - Windheuser, Fabian
A1  - Stojanovic, Dragan
A1  - Djordjevic, Igor
A1  - Stojanovic, Natalija
A1  - Stojnev Ilic, Aleksandra
A1  - Weidmann, Vera
A1  - Lowitzki, Leon
A1  - Wagner, Markus
A1  - Ifa, Abdessatar Ben
A1  - Arlos, Patrik
A1  - Megia, Ana
A1  - Vendrell, Joan
A1  - Pfitzner, Bjarne
A1  - Redondo, Alberto
A1  - Ríos Insua, David
A1  - Albert, Justin Amadeus
A1  - Zhou, Lin
A1  - Arnrich, Bert
A1  - Szabó, Ildikó
A1  - Fodor, Szabina
A1  - Ternai, Katalin
A1  - Bhowmik, Rajarshi
A1  - Campero Durand, Gabriel
A1  - Shevchenko, Pavlo
A1  - Malysheva, Milena
A1  - Prymak, Ivan
A1  - Saake, Gunter
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Beins, Karsten
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Rödszus, Kurt
ED  - Müller, Jürgen
T1  - HPI Future SOC Lab – Proceedings 2019
N2  - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
  The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly, but not exclusively, from the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and in-memory technologies.
  This technical report presents results of research projects executed in 2019. Selected projects presented their results on April 9 and November 12, 2019, at the Future SOC Lab Day events.
N2  - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie.
  Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. 
  In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2019 vorgestellt.  Ausgewählte Projekte stellten ihre Ergebnisse am 09. April und 12. November 2019 im Rahmen des Future SOC Lab Tags vor.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 158 
KW  - Future SOC Lab
KW  - research projects
KW  - multicore architectures
KW  - in-memory technology
KW  - cloud computing
KW  - machine learning
KW  - artificial intelligence
KW  - Future SOC Lab
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - künstliche Intelligenz
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-597915
SN  - 978-3-86956-564-4
SN  - 1613-5652
SN  - 2191-1665
IS  - 158
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Kruse, Sebastian
T1  - Scalable data profiling
T1  - Skalierbares Data Profiling
BT  - distributed discovery and analysis of structural metadata
BT  - Entdecken und Analysieren struktureller Metadaten
N2  - Data profiling is the act of extracting structural metadata from datasets. Structural metadata, such as data dependencies and statistics, can support data management operations, such as data integration and data cleaning. Data management is often the most time-consuming activity in any data-related project. Its support is extremely valuable in our data-driven world, so that more time can be spent on the actual utilization of the data, e.g., building analytical models. In most scenarios, however, structural metadata is not given and must be extracted first. Therefore, efficient data profiling methods are highly desirable.

Data profiling is a computationally expensive problem; in fact, most dependency discovery problems entail search spaces that grow exponentially in the number of attributes. To address this, this thesis introduces novel discovery algorithms for various types of data dependencies – namely inclusion dependencies, conditional inclusion dependencies, partial functional dependencies, and partial unique column combinations – that considerably improve over state-of-the-art algorithms in terms of efficiency and that scale to datasets that cannot be processed by existing algorithms. Key to those improvements are not only algorithmic innovations, such as novel pruning rules or traversal strategies, but also algorithm designs tailored for distributed execution. While distributed data profiling has been mostly neglected by previous works, it is a logical consequence of recent hardware trends and the computational hardness of dependency discovery.

To demonstrate the utility of data profiling for data management, this thesis furthermore presents Metacrate, a database for structural metadata. Its salient features are its flexible data model, the capability to integrate various kinds of structural metadata, and its rich metadata analytics library. We show how to perform a data anamnesis of unknown, complex datasets based on this technology. In particular, we describe in detail how to reconstruct the schemata and assess their quality as part of the data anamnesis.

The data profiling algorithms and Metacrate have been carefully implemented, integrated with the Metanome data profiling tool, and are available as free software. In that way, we intend to allow for easy repeatability of our research results and also provide them for actual usage in real-world data-related projects.
N2  - Data Profiling bezeichnet das Extrahieren struktureller Metadaten aus Datensätzen. Strukturelle Metadaten, z.B. Datenabhängigkeiten und Statistiken, können bei der Datenverwaltung unterstützen. Tatsächlich beansprucht das Verwalten von Daten, z.B. Datenreinigung und -integration, in vielen datenbezogenen Projekten einen Großteil der Zeit. Die Unterstützung solcher verwaltenden Aktivitäten ist in unserer datengetriebenen Welt insbesondere deswegen sehr wertvoll, weil so mehr Zeit auf die eigentlich wertschöpfende Arbeit mit den Daten verwendet werden kann, z.B. auf das Erstellen analytischer Modelle. Allerdings sind strukturelle Metadaten in den meisten Fällen nicht oder nur unvollständig vorhanden und müssen zunächst extrahiert werden. Somit sind effiziente Data-Profiling-Methoden erstrebenswert.

Probleme des Data Profiling sind in der Regel sehr berechnungsintensiv: Viele Datenabhängigkeitstypen spannen einen exponentiell in der Anzahl der Attribute wachsenden Suchraum auf. Aus diesem Grund beschreibt die vorliegende Arbeit neue Algorithmen zum Auffinden verschiedener Arten von Datenabhängigkeiten – nämlich Inklusionsabhängigkeiten, bedingter Inklusionsabhängigkeiten, partieller funktionaler Abhängigkeiten sowie partieller eindeutiger Spaltenkombinationen – die bekannte Algorithmen in Effizienz und Skalierbarkeit deutlich übertreffen und somit Datensätze verarbeiten können, an denen bisherige Algorithmen gescheitert sind.

Um die Nützlichkeit struktureller Metadaten für die Datenverwaltung zu demonstrieren, stellt diese Arbeit des Weiteren das System Metacrate vor, eine Datenbank für strukturelle Metadaten. Deren besondere Merkmale sind ein flexibles Datenmodell; die Fähigkeit, verschiedene Arten struktureller Metadaten zu integrieren; und eine umfangreiche Bibliothek an Metadatenanalysen. Mithilfe dieser Technologien führen wir eine Datenanamnese unbekannter, komplexer Datensätze durch. Insbesondere beschreiben wir dabei ausführlicher, wie Schemata rekonstruiert und deren Qualität abgeschätzt werden können.

Wir haben oben erwähnte Data-Profiling-Algorithmen sowie Metacrate sorgfältig implementiert, mit dem Data-Profiling-Programm Metanome integriert und stellen beide als freie Software zur Verfügung. Dadurch wollen wir nicht nur die Nachvollziehbarkeit unserer Forschungsergebnisse möglichst einfach gestalten, sondern auch deren Einsatz in der Praxis ermöglichen.
KW  - data profiling
KW  - metadata
KW  - inclusion dependencies
KW  - functional dependencies
KW  - distributed computation
KW  - metacrate
KW  - Data Profiling
KW  - Metadaten
KW  - Inklusionsabhängigkeiten
KW  - funktionale Abhängigkeiten
KW  - verteilte Berechnung
KW  - Metacrate
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-412521
ER  - 
TY  - THES
A1  - Koßmann, Jan
T1  - Unsupervised database optimization
BT  - efficient index selection & data dependency-driven query optimization
N2  - The amount of data stored in databases and the complexity of database workloads are ever-increasing. Database management systems (DBMSs) offer many configuration options, such as index creation or unique constraints, which must be adapted to the specific instance to efficiently process large volumes of data. Currently, such database optimization is complicated, manual work performed by highly skilled database administrators (DBAs). In cloud scenarios, manual database optimization even becomes infeasible: it exceeds the abilities of the best DBAs due to the enormous number of deployed DBMS instances (some providers maintain millions of instances), missing domain knowledge resulting from data privacy requirements, and the complexity of the configuration tasks.

Therefore, we investigate how to automate the configuration of DBMSs efficiently with the help of unsupervised database optimization. While there are numerous configuration options, in this thesis, we focus on automatic index selection and the use of data dependencies, such as functional dependencies, for query optimization. Both aspects have an extensive performance impact and complement each other by approaching unsupervised database optimization from different perspectives.

Our contributions are as follows: (1) we survey automated state-of-the-art index selection algorithms regarding various criteria, e.g., their support for index interaction. We contribute an extensible platform for evaluating the performance of such algorithms with industry-standard datasets and workloads. The platform is well-received by the community and has led to follow-up research. With our platform, we derive the strengths and weaknesses of the investigated algorithms. We conclude that existing solutions often have scalability issues and cannot quickly determine (near-)optimal solutions for large problem instances. (2) To overcome these limitations, we present two new algorithms. Extend determines (near-)optimal solutions with an iterative heuristic. It identifies the best index configurations for the evaluated benchmarks. Its selection runtimes are up to 10 times lower compared with other near-optimal approaches. SWIRL is based on reinforcement learning and delivers solutions instantly. These solutions perform within 3 % of the optimal ones. Extend and SWIRL are available as open-source implementations.

(3) Our index selection efforts are complemented by a mechanism that analyzes workloads to determine data dependencies for query optimization in an unsupervised fashion. We describe and classify 58 query optimization techniques based on functional, order, and inclusion dependencies as well as on unique column combinations. The unsupervised mechanism and three optimization techniques are implemented in our open-source research DBMS Hyrise. Our approach reduces the Join Order Benchmark’s runtime by 26 % and accelerates some TPC-DS queries by up to 58 times.

Additionally, we have developed a cockpit for unsupervised database optimization that allows interactive experiments to build confidence in such automated techniques. In summary, our contributions improve the performance of DBMSs, support DBAs in their work, and enable them to contribute their time to other, less arduous tasks.
N2  - Sowohl die Menge der in Datenbanken gespeicherten Daten als auch die Komplexität der Datenbank-Workloads steigen stetig an. Datenbankmanagementsysteme bieten viele Konfigurationsmöglichkeiten, zum Beispiel das Anlegen von Indizes oder die Definition von Unique Constraints. Diese Konfigurationsmöglichkeiten müssen für die spezifische Datenbankinstanz angepasst werden, um effizient große Datenmengen verarbeiten zu können. Heutzutage wird die komplizierte Datenbankoptimierung manuell von hochqualifizierten Datenbankadministratoren vollzogen. In Cloud-Szenarien ist die manuelle Datenbankoptimierung undenkbar: Die enorme Anzahl der verwalteten Systeme (einige Anbieter verwalten Millionen von Instanzen), das fehlende Domänenwissen durch Datenschutzanforderungen und die Komplexität der Konfigurationsaufgaben übersteigen die Fähigkeiten der besten Datenbankadministratoren.

Aus diesen Gründen betrachten wir, wie die Konfiguration von Datenbanksystemen mit der Hilfe von Unsupervised Database Optimization effizient automatisiert werden kann. Während viele Konfigurationsmöglichkeiten existieren, konzentrieren wir uns auf die automatische Indexauswahl und die Nutzung von Datenabhängigkeiten, zum Beispiel Functional Dependencies, für die Anfrageoptimierung. Beide Aspekte haben großen Einfluss auf die Performanz und ergänzen sich gegenseitig, indem sie Unsupervised Database Optimization aus verschiedenen Perspektiven betrachten.

Wir leisten folgende Beiträge: (1) Wir untersuchen dem Stand der Technik entsprechende automatisierte Indexauswahlalgorithmen hinsichtlich verschiedener Kriterien, zum Beispiel bezüglich ihrer Berücksichtigung von Indexinteraktionen. Wir stellen eine erweiterbare Plattform zur Leistungsevaluierung solcher Algorithmen mit Industriestandarddatensätzen und -Workloads zur Verfügung. Diese Plattform wird von der Forschungsgemeinschaft aktiv verwendet und hat bereits zu weiteren Forschungsarbeiten geführt. Mit unserer Plattform leiten wir die Stärken und Schwächen der untersuchten Algorithmen ab. Wir kommen zu dem Schluss, dass bestehende Lösungen häufig Skalierungsschwierigkeiten haben und nicht in der Lage sind, schnell (nahezu) optimale Lösungen für große Problemfälle zu ermitteln. (2) Um diese Einschränkungen zu bewältigen, stellen wir zwei neue Algorithmen vor. Extend ermittelt (nahezu) optimale Lösungen mit einer iterativen Heuristik. Das Verfahren identifiziert die besten Indexkonfigurationen für die evaluierten Benchmarks und seine Laufzeit ist bis zu 10-mal geringer als die Laufzeit anderer nahezu optimaler Ansätze. SWIRL basiert auf Reinforcement Learning und ermittelt Lösungen ohne Wartezeit. Diese Lösungen weichen maximal 3 % von den optimalen Lösungen ab. Extend und SWIRL sind verfügbar als Open-Source-Implementierungen.

(3) Ein Mechanismus, der mittels automatischer Workload-Analyse Datenabhängigkeiten für die Anfrageoptimierung bestimmt, ergänzt die vorigen Beiträge. Wir beschreiben und klassifizieren 58 Techniken, die auf Functional, Order und Inclusion Dependencies sowie Unique Column Combinations basieren. Der Analysemechanismus und drei Optimierungstechniken sind in unserem Open-Source-Forschungsdatenbanksystem Hyrise implementiert. Der Ansatz reduziert die Laufzeit des Join Order Benchmark um 26 % und erreicht eine bis zu 58-fache Beschleunigung einiger TPC-DS-Anfragen.

Darüber hinaus haben wir ein Cockpit für Unsupervised Database Optimization entwickelt. Dieses Cockpit ermöglicht interaktive Experimente, um Vertrauen in automatisierte Techniken zur Datenbankoptimierung zu schaffen. Zusammenfassend lässt sich festhalten, dass unsere Beiträge die Performanz von Datenbanksystemen verbessern, Datenbankadministratoren in ihrer Arbeit unterstützen und ihnen ermöglichen, ihre Zeit anderen, weniger mühsamen Aufgaben zu widmen.
KW  - Datenbank
KW  - Datenbanksysteme
KW  - database
KW  - DBMS
KW  - Hyrise
KW  - index selection
KW  - database systems
KW  - RL
KW  - reinforcement learning
KW  - query optimization
KW  - data dependencies
KW  - functional dependencies
KW  - order dependencies
KW  - unique column combinations
KW  - inclusion dependencies
KW  - funktionale Abhängigkeiten
KW  - Anfrageoptimierung
KW  - Query-Optimierung
KW  - extend
KW  - SWIRL
KW  - unsupervised
KW  - database optimization
KW  - self-driving
KW  - autonomous
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-589490
ER  - 
TY  - JOUR
A1  - Koskinen, Johanna
A1  - Kairikko, Anette
A1  - Suonpää, Maija
T1  - Hybrid MOOCs Enabling Global Collaboration Between Learners
JF  - EMOOCs 2021
N2  - The COVID-19 pandemic has accelerated the pace of digital transformation, which has forced people to quickly adapt to working and collaborating online. Learning in digital environments has without a doubt gained increased significance during this rather unique time and, therefore, Massive Open Online Courses (MOOCs) have more potential to attract a wider target audience. This has also brought about more possibilities for global collaboration among learners as learning is not limited to physical spaces. Despite the wide interest in MOOCs, there is a need for further research on the global collaboration potential they offer. The aim of this paper is to adopt an action research approach to study how a hybrid MOOC design enables learners’ global collaboration. During 2019–2020, together with an international consortium called Corship (Corporate Edupreneurship), we jointly designed, created and implemented a hybrid model MOOC called the “Co-innovation Journey for Startups and Corporates”. It was targeted towards startup entrepreneurs, corporate representatives and higher education students, and it was funded by the EU. The MOOC started with 2,438 enrolled learners, and the completion rate for the first four weeks was 29.7%. Of these, 208 learners enrolled for the last two weeks, which in turn had a completion rate of 58%. These figures were clearly above the general average for MOOCs. Based on our findings, we argue that a hybrid MOOC design may foster global collaboration within a learning community even beyond the course boundaries. The course included four weeks of independent learning, an xMOOC part, and two weeks of collaborative learning, a cMOOC part. The xMOOC part supported learners in creating a shared knowledge base, which enhanced the collaborative learning when entering the cMOOC part of the course.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-516917
SN  - 978-3-86956-512-5
SP  - 35
EP  - 48
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Klinke, Paula
A1  - Verhoeven, Silvan
A1  - Roth, Felix
A1  - Hagemann, Linus
A1  - Alnawa, Tarik
A1  - Lincke, Jens
A1  - Rein, Patrick
A1  - Hirschfeld, Robert
T1  - Tool support for collaborative creation of interactive storytelling media
N2  - Scrollytellings are an innovative form of web content. Combining the benefits of books, images, movies, and video games, they are a tool to tell compelling stories and provide excellent learning opportunities. Due to their multi-modality, creating high-quality scrollytellings is not an easy task. Different professions, such as content designers, graphics designers, and developers, need to collaborate to get the best out of the possibilities the scrollytelling format provides. Collaboration unlocks great potential. However, content designers cannot create scrollytellings directly and always need to consult with developers to implement their vision. This can result in misunderstandings. Often, the resulting scrollytelling will not match the designer’s vision sufficiently, causing unnecessary iterations. Our project partner Typeshift specializes in the creation of individualized scrollytellings for their clients. The existing solutions for authoring interactive content that we examined are not optimally suited for creating highly customized scrollytellings while still being able to manipulate all their elements programmatically. Based on Typeshift’s experience and expertise, we developed an editor to author scrollytellings in the lively.next live-programming environment. In this environment, a graphical user interface for content design is combined with powerful possibilities for programming behavior with the morphic system. The editor allows content designers to take on large parts of the creation process of scrollytellings on their own, such as creating the visible elements, animating content, and fine-tuning the scrollytelling. Hence, developers can focus on interactive elements such as simulations and games. Together with Typeshift, we evaluated the tool by recreating an existing scrollytelling and identified possible future enhancements. Our editor streamlines the creation process of scrollytellings. 
Content designers and developers can now both work on the same scrollytelling. Because the editor is integrated into the lively.next environment, both can work with a set of tools familiar to them. Thus, we mitigate unnecessary iterations and misunderstandings by enabling content designers to realize large parts of their vision of a scrollytelling on their own. Developers can add advanced and individual behavior. Thus, developers and content designers benefit from a clearer distribution of tasks while keeping the benefits of collaboration.
N2  - Scrollytellings sind innovative Webinhalte. Indem sie die Vorteile von Büchern, Bildern, Filmen und Videospielen vereinen, sind sie ein Werkzeug um Geschichten fesselnd zu erzählen und Lehrinhalte besonders effektiv zu vermitteln. Die Erstellung von Scrollytellings ist aufgrund ihrer Multimodalität keine einfache Aufgabe. Verschiedene Berufszweige wie Content-Designer:innen, Grafikdesigner:innen und Entwickler:innen müssen zusammenarbeiten, um das volle Potential des Scrollytelingformats auszuschöpfen. Jedoch können ContentDesigner:innen Scrollytellings nicht direkt selbst erstellen, sondern müssen ihre Vision stets gemeinsam mit Entwickler:innen umsetzen. Dabei können unnötige Iterationen über das Scrollytelling auftreten, wenn dieses den Visionen der Content-Designer:innen noch nicht entspricht. Außerdem können Missverständnisse entstehen. Unser Projektpartner Typeshift hat sich auf die Erstellung von, für seine Kund:innen individualisierten, Scrollytellings spezialisiert. Aufbauend auf Typeshifts Erfahrungen und Expertise haben wir einen Editor entwickelt, um Scrollytellings in der Live-Programmierumgebung lively.next zu erstellen. In lively.next wird eine graphische Oberfläche für die Erstellung von Inhalten mit weitreichenden Möglichkeiten zur Programmierung von Verhalten durch das Morphic-System kombiniert. Der Editor erlaubt es Content-Designer:innen eigenständig große Teile des Erstellungsprozesses von Scrollytellings durchzuführen, zum Beispiel das Erzeugen visueller Elemente, deren Animation sowie die Feinjustierung des gesamten Scrollytellings. So können Entwickler:innen sich auf die Erstellung von komplexen interaktiven Elementen, wie Simulationen oder Spiele, konzentrieren. Zusammen mit Typeshift haben wir die Nutzbarkeit unseres Editors durch die Nachbildung eines bereits existierenden Scrollytellings evaluiert und mögliche Verbesserungen identifiziert. Unser Editor vereinfacht den Erstellungsprozess von Scrollytellings. 
Content-Designer:innen und Entwickler:innen können jetzt beide an demselben Scrollytelling arbeiten. Durch den Editor, der in lively.next integriert ist, können beide Parteien mit den ihnen bekannten und vertrauten Werkzeugen arbeiten. Durch den Editor verringern wir unnötige Iterationen und Missverständnisse und erlauben Content-Designer:innen, große Teile ihrer Vision eines Scrollytellings eigenständig umzusetzen. Entwickler:innen können zusätzliches, individuelles Verhalten hinzufügen. So profitieren Entwickler:innen und Content-Designer:innen von einer besseren Aufgabenteilung, während die Vorteile von Zusammenarbeit bestehen bleiben.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 141 
KW  - scrollytelling
KW  - interactive media
KW  - web-based development
KW  - Lively Kernel
KW  - Scrollytelling
KW  - interaktive Medien
KW  - webbasierte Entwicklung
KW  - Lively Kernel
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-518570
SN  - 978-3-86956-521-7
SN  - 1613-5652
SN  - 2191-1665
IS  - 141
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Klimke, Jan
T1  - Web-based provisioning and application of large-scale virtual 3D city models
T1  - Webbasierte Bereitstellung und Anwendung von großen virtuellen 3D-Stadtmodellen
N2  - Virtual 3D city models represent and integrate a variety of spatial data and georeferenced data related to urban areas. With the help of improved remote-sensing technology, official 3D cadastral data, open data or geodata crowdsourcing, the quantity and availability of such data are constantly expanding and its quality is ever improving for many major cities and metropolitan regions. There are numerous fields of applications for such data, including city planning and development, environmental analysis and simulation, disaster and risk management, navigation systems, and interactive city maps.

The dissemination and the interactive use of virtual 3D city models represent key technical functionality required by nearly all corresponding systems, services, and applications. The size and complexity of virtual 3D city models, their management, their handling, and especially their visualization represent challenging tasks. For example, mobile applications can hardly handle these models due to their massive data volume and data heterogeneity. Therefore, the efficient usage of all computational resources (e.g., storage, processing power, main memory, and graphics hardware) is a key requirement for software engineering in this field. Common approaches are based on complex clients that require the 3D model data (e.g., 3D meshes and 2D textures) to be transferred to them and that then render those received 3D models. However, these applications have to implement most stages of the visualization pipeline on the client side. Thus, as high-quality 3D rendering processes strongly depend on locally available computer graphics resources, software engineering faces the challenge of building robust cross-platform client implementations.

Web-based provisioning aims at providing a service-oriented software architecture that consists of tailored functional components for building web-based and mobile applications that manage and visualize virtual 3D city models. This thesis presents corresponding concepts and techniques for web-based provisioning of virtual 3D city models. In particular, it introduces services that allow us to efficiently build applications for virtual 3D city models based on a fine-grained service concept. The thesis covers five main areas:

1. A Service-Based Concept for Image-Based Provisioning of Virtual 3D City Models: It creates a frame for a broad range of services related to the rendering and image-based dissemination of virtual 3D city models.

2. 3D Rendering Service for Virtual 3D City Models: This service provides efficient, high-quality 3D rendering functionality for virtual 3D city models. In particular, it copes with requirements such as standardized data formats, massive model texturing, detailed 3D geometry, access to associated feature data, and non-assumed frame-to-frame coherence for parallel service requests. In addition, it supports thematic and artistic styling based on an expandable graphics effects library.

3. Layered Map Service for Virtual 3D City Models: It generates a map-like representation of virtual 3D city models using an oblique view. It provides high visual quality, fast initial loading times, simple map-based interaction and feature data access. Based on a configurable client framework, mobile and web-based applications for virtual 3D city models can be created easily.

4. Video Service for Virtual 3D City Models: It creates and synthesizes videos from virtual 3D city models. Without requiring client-side 3D rendering capabilities, users can create camera paths via a map-based user interface and configure scene contents, styling, image overlays, text overlays, and their transitions. The service significantly reduces the manual effort typically required to produce such videos. The videos can automatically be updated when the underlying data changes.

5. Service-Based Camera Interaction: It supports task-based 3D camera interactions, which can be integrated seamlessly into service-based visualization applications. It is demonstrated how to build such web-based interactive applications for virtual 3D city models using this camera service.

These contributions provide a framework for design, implementation, and deployment of future web-based applications, systems, and services for virtual 3D city models. The approach shows how to decompose the complex, monolithic functionality of current 3D geovisualization systems into independently designed, implemented, and operated service-oriented units. In that sense, this thesis also contributes to microservice architectures for 3D geovisualization systems, a key challenge of today’s IT systems engineering to build scalable IT solutions.
N2  - Virtuelle 3D-Stadtmodelle repräsentieren und integrieren eine große Bandbreite von Geodaten und georeferenzierten Daten über städtische Gebiete. Verfügbarkeit, Quantität und Qualität solcher Daten verbessern sich ständig für viele Städte und Metropolregionen, nicht zuletzt bedingt durch verbesserte Erfassungstechnologien, amtliche 3D-Kataster, offene Geodaten oder Geodaten-Crowdsourcing. Die Anwendungsfelder für virtuelle 3D-Stadtmodelle sind vielfältig. Sie reichen von Stadtplanung und Stadtentwicklung, Umweltanalysen und -simulationen, über Katastrophen- und Risikomanagement, bis hin zu Navigationssystemen und interaktiven Stadtkarten.

Die Verbreitung und interaktive Nutzung von virtuellen 3D-Stadtmodellen stellt hierbei eine technische Kernfunktionalität für fast alle entsprechenden Systeme, Services und Anwendungen dar. Aufgrund der Komplexität und Größe virtueller 3D-Stadtmodelle stellt ihre Verwaltung, ihre Verarbeitung und insbesondere ihre Visualisierung eine große Herausforderung dar. Daher können zum Beispiel mobile Anwendungen virtuelle 3D-Stadtmodelle, wegen ihres massiven Datenvolumens und ihrer Datenheterogenität, kaum effizient handhaben. Die effiziente Nutzung von Rechenressourcen, wie zum Beispiel Prozessorleistung, Hauptspeicher, Festplattenspeicher und Grafikhardware, bildet daher eine Schlüsselanforderung an die Softwaretechnik in diesem Bereich. Heutige Ansätze beruhen häufig auf komplexen Clients, zu denen 3D-Modelldaten (z.B. 3D-Netze und 2D-Texturen) transferiert werden müssen und die das Rendering dieser Daten selbst ausführen. Nachteilig ist dabei unter anderem, dass sie die meisten Stufen der Visualisierungspipeline auf der Client-Seite ausführen müssen. Es ist daher softwaretechnisch schwer, robuste Cross-Plattform-Implementierungen für diese Clients zu erstellen, da hochqualitative 3D-Rendering-Prozesse nicht unwesentlich von lokalen computergrafischen Ressourcen abhängen.

Die webbasierte Bereitstellung virtueller 3D-Stadtmodelle beruht auf einer serviceorientierten Softwarearchitektur. Diese besteht aus spezifischen funktionalen Komponenten für die Konstruktion von mobilen oder webbasierten Anwendungen für die Verarbeitung und Visualisierung von komplexen virtuellen 3D-Stadtmodellen. Diese Arbeit beschreibt entsprechende Konzepte und Techniken für eine webbasierte Bereitstellung von virtuellen 3D-Stadtmodellen. Es werden insbesondere Services vorgestellt, die eine effiziente Entwicklung von Anwendungen für virtuelle 3D-Stadtmodelle auf Basis eines feingranularen Dienstekonzepts ermöglichen. Die Arbeit gliedert sich in fünf thematische Hauptbeiträge:

1. Ein servicebasiertes Konzept für die bildbasierte Bereitstellung von virtuellen 3D-Stadtmodellen: Es wird ein konzeptioneller Rahmen für eine Reihe von Services in Bezug auf das Rendering und die bildbasierte Bereitstellung virtueller 3D-Stadtmodelle eingeführt.

2. 3D-Rendering-Service für virtuelle 3D-Stadtmodelle: Dieser Service stellt eine effiziente, hochqualitative 3D-Renderingfunktionalität für virtuelle 3D-Stadtmodelle bereit. Insbesondere werden Anforderungen, wie zum Beispiel standardisierte Datenformate, massive Modelltexturierung, detaillierte 3D-Geometrien, Zugriff auf assoziierte Fachdaten und fehlende Frame-zu-Frame-Kohärenz bei parallelen Serviceanfragen erfüllt. Der Service unterstützt zudem die thematische und gestalterische Stilisierung der Darstellungen auf Basis einer erweiterbaren Grafikeffektbibliothek.

3. Layered-Map-Service für virtuelle 3D-Stadtmodelle: Dieser Service generiert eine kartenverwandte Darstellung in Form einer Schrägansicht auf virtuelle 3D-Stadtmodelle in hoher Renderingqualität. Er weist eine schnelle initiale Ladezeit, eine einfache, kartenbasierte Interaktion und Zugang zu den Fachdaten des virtuellen 3D-Stadtmodells auf. Mittels eines konfigurierbaren Client-Frameworks können damit sowohl mobile als auch webbasierte Anwendungen für virtuelle 3D-Stadtmodelle einfach erstellt werden.

4. Video-Service für virtuelle 3D-Stadtmodelle: Dieser Service erstellt und synthetisiert Videos aus virtuellen 3D-Stadtmodellen. Nutzern wird ermöglicht, 3D-Kamerapfade auf einfache Weise über eine kartenbasierte Nutzungsschnittstelle zu erstellen. Weiterhin können sie die Szeneninhalte, die Stilisierung der Szene sowie Bild- und Textüberlagerungen konfigurieren und Übergänge zwischen einzelnen Szenen festlegen, ohne dass dabei clientseitige 3D-Rendering-Fähigkeiten vorausgesetzt werden. Das System reduziert den manuellen Aufwand für die Produktion von Videos für virtuelle 3D-Stadtmodelle erheblich. Videos können zudem automatisiert aktualisiert werden, wenn sich zugrunde liegende Daten ändern.

5. Servicebasierte Kamerainteraktion: Die vorgestellten Services unterstützen aufgabenbasierte 3D-Kamerainteraktionen und deren Integration in servicebasierte Visualisierungsanwendungen. Es wird gezeigt, wie webbasierte interaktive Anwendungen für virtuelle 3D-Stadtmodelle mit Hilfe von Kameraservices umgesetzt werden können.

Diese Beiträge bieten einen Rahmen für das Design, die Implementierung und die Bereitstellung zukünftiger webbasierter Anwendungen, Systeme und Services für virtuelle 3D-Stadtmodelle. Der Ansatz zeigt, wie die meist komplexe, monolithische Funktionalität heutiger 3D-Geovisualisierungssysteme in unabhängig entworfene, implementierte und betriebene serviceorientierte Einheiten zerlegt werden kann. In diesem Sinne stellt diese Arbeit auch einen Beitrag für die Entwicklung von Microservice-Architekturen für 3D-Geovisualisierungssysteme bereit – eine aktuelle Herausforderung in der Softwaresystemtechnik in Hinblick auf den Aufbau skalierender IT-Lösungen.
KW  - 3D city model
KW  - 3D geovisualization
KW  - 3D portrayal
KW  - server-side 3D rendering
KW  - CityGML
KW  - 3D-Stadtmodell
KW  - 3D-Geovisualisierung
KW  - 3D-Rendering
KW  - serverseitiges 3D-Rendering
KW  - serviceorientierte Architekturen
KW  - service-oriented architectures
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-428053
ER  - 
TY  - JOUR
A1  - Khalil, Mohammad
T1  - Who Are the Students of MOOCs?
BT  - Experience from Learning Analytics Clustering Techniques
JF  - EMOOCs 2021
N2  - Clustering in education is important in identifying groups of objects in order to find linked patterns of correlations in educational datasets. As such, MOOCs provide a rich source of educational datasets which enable a wide selection of options to carry out clustering and an opportunity for cohort analyses. In this experience paper, five research studies on clustering in MOOCs are reviewed, drawing out several reasonings, methods, and students’ clusters that reflect certain kinds of learning behaviours. The collection of the varied clusters shows that each study identifies and defines clusters according to distinctive engagement patterns. Implications and a summary are provided at the end of the paper.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517298
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 259
EP  - 269
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Kerr, John
A1  - Lorenz, Anja
A1  - Schön, Sandra
A1  - Ebner, Martin
A1  - Wittke, Andreas
T1  - Open Tools and Methods to Support the Development of MOOCs
BT  - A Collection of How-tos, Monster Assignment and Kits
JF  - EMOOCs 2021
N2  - There is a plethora of ways to guide and support people in learning about MOOC (massive open online course) development, from their first interest, through sourcing supportive resources, methods and tools that aid their understanding of the concepts and pedagogical approaches of MOOC design, to becoming a MOOC developer. This contribution highlights tools and methods that are openly available and re-usable under Creative Commons licenses. Our collection builds upon the experiences from three MOOC development and hosting teams with joint experiences of several hundred MOOCs (University of Applied Sciences in Lübeck, Graz University of Technology, University of Glasgow) in three European countries: Germany, Austria and the UK. The contribution recommends and shares experiences with short articles and posters for first information sharing, a Monster MOOC assignment for beginners, a MOOC canvas for first sketches, the MOOC design kit for details of instructional design, a MOOC for MOOC makers, and a MOOC map as an introduction to a certain MOOC platform.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517219
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 187
EP  - 200
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Juiz, Carlos
A1  - Bermejo, Belen
A1  - Calle, Alejandro
A1  - Sidorova, Julia
A1  - Lundberg, Lars
A1  - Weidmann, Vera
A1  - Lowitzki, Leon
A1  - Mirtschin, Marvin
A1  - Hoorn, André van
A1  - Frank, Markus
A1  - Schulz, Henning
A1  - Stojanovic, Dragan
A1  - Stojanovic, Natalija
A1  - Stojnev Ilic, Aleksandra
A1  - Friedrich, Tobias
A1  - Lenzner, Pascal
A1  - Weyand, Christopher
A1  - Wagner, Markus
A1  - Plauth, Max
A1  - Polze, Andreas
A1  - Nowicki, Marek
A1  - Seth, Sugandh
A1  - Kaur Chahal, Kuljit
A1  - Singh, Gurwinder
A1  - Speth, Sandro
A1  - Janes, Andrea
A1  - Camilli, Matteo
A1  - Ziegler, Erik
A1  - Schmidberger, Marcel
A1  - Pörschke, Mats
A1  - Bartz, Christian
A1  - Lorenz, Martin
A1  - Meinel, Christoph
A1  - Beilich, Robert
A1  - Bertazioli, Dario
A1  - Carlomagno, Cristiano
A1  - Bedoni, Marzia
A1  - Messina, Vincenzina
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Beins, Karsten
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Rödszus, Kurt
ED  - Müller, Jürgen
ED  - Sommer, Jürgen
T1  - HPI Future SOC Lab
BT  - Proceedings 2020
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam
N2  - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from, but not limited to, the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and in-memory technologies.
This technical report presents results of research projects executed in 2020. Selected projects presented their results on April 21 and November 10, 2020, at the Future SOC Lab Day events.
N2  - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie.
Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. 
In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2020 vorgestellt. Ausgewählte Projekte stellten ihre Ergebnisse am 21. April und 10. November 2020 im Rahmen des Future SOC Lab Tags vor.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 159 
KW  - Future SOC Lab
KW  - research projects
KW  - multicore architectures
KW  - in-memory technology
KW  - cloud computing
KW  - machine learning
KW  - artificial intelligence
KW  - Future SOC Lab
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - künstliche Intelligenz
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-598014
SN  - 978-3-86956-565-1
SN  - 1613-5652
SN  - 2191-1665
IS  - 159
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Jonson Carlon, May Kristine
A1  - Gaddem, Mohamed Rami
A1  - Hernández Reyes, César Augusto
A1  - Nagahama, Toru
A1  - Cross, Jeffrey S.
T1  - Investigating Mechanical Engineering Learners’ Satisfaction with a Revised Monozukuri MOOC
JF  - EMOOCs 2021
N2  - Aside from providing instructional materials to the public, developing massive open online courses (MOOCs) can benefit institutions in different ways. Some examples include providing training opportunities for their students aspiring to work in the online learning space, strengthening their brand recognition through courses appealing to enthusiasts, and enabling online linkages with other universities. One such example is the monozukuri MOOC offered by the Tokyo Institute of Technology on edX, which initially presented the Japanese philosophy of making things in the context of a mechanical engineering course. In this paper, we describe the importance of involving a course development team with a diverse background. The monozukuri MOOC and its revision enabled us to showcase an otherwise distinctively Japanese topic (philosophy) as an intersection of various topics of interest to learners with an equally diverse background. The revision resulted in discussing monozukuri in a mechanical engineering lesson and how monozukuri is actively being practiced in the Japanese workplace and academic setting while juxtaposing it to the relatively Western concept of experiential learning. Aside from presenting the course with a broader perspective, the revision had been an exercise for its team members on working in a multicultural environment within a Japanese institution, thus developing their project management and communication skills.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517266
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 237
EP  - 247
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Jiang, Lan
T1  - Discovering metadata in data files
N2  - It is estimated that data scientists spend up to 80% of the time exploring, cleaning, and transforming their data. A major reason for that expenditure is the lack of knowledge about the used data, which are often from different sources and have heterogeneous structures. As a means to describe various properties of data, metadata can help data scientists understand and prepare their data, saving time for innovative and valuable data analytics. However, metadata do not always exist: some data file formats are not capable of storing them; metadata were deleted for privacy concerns; legacy data may have been produced by systems that were not designed to store and handle metadata. As data are being produced at an unprecedentedly fast pace and stored in diverse formats, manually creating metadata is not only impractical but also error-prone, demanding automatic approaches for metadata detection.

In this thesis, we are focused on detecting metadata in CSV files – a type of plain-text file that, similar to spreadsheets, may contain different types of content at arbitrary positions. We propose a taxonomy of metadata in CSV files and specifically address the discovery of three different metadata: line and cell type, aggregations, and primary keys and foreign keys.

Data are organized in an ad-hoc manner in CSV files, and do not follow a fixed structure, which is assumed by common data processing tools. Detecting the structure of such files is a prerequisite of extracting information from them, which can be addressed by detecting the semantic type, such as header, data, derived, or footnote, of each line or each cell. We propose the supervised-learning approach Strudel to detect the type of lines and cells. CSV files may also include aggregations. An aggregation represents the arithmetic relationship between a numeric cell and a set of other numeric cells. Our proposed AggreCol algorithm is capable of detecting aggregations of five arithmetic functions in CSV files. Note that stylistic features, such as font style and cell background color, do not exist in CSV files. Our proposed algorithms address the respective problems by using only content, contextual, and computational features.

Storing a relational table is also a common usage of CSV files. Primary keys and foreign keys are important metadata for relational databases, which are usually not present for database instances dumped as plain-text files. We propose the HoPF algorithm to holistically detect both constraints in relational databases. Our approach is capable of distinguishing true primary and foreign keys from a great amount of spurious unique column combinations and inclusion dependencies, which can be detected by state-of-the-art data profiling algorithms.
N2  - Schätzungen zufolge verbringen Datenwissenschaftler bis zu 80% ihrer Zeit mit der Erkundung, Bereinigung und Umwandlung ihrer Daten. Ein Hauptgrund für diesen Aufwand ist das fehlende Wissen über die verwendeten Daten, die oft aus unterschiedlichen Quellen stammen und heterogene Strukturen aufweisen.
Als Mittel zur Beschreibung verschiedener Dateneigenschaften können Metadaten Datenwissenschaftlern dabei helfen, ihre Daten zu verstehen und aufzubereiten, und so wertvolle Zeit für die Datenanalysen selbst sparen.
Metadaten sind jedoch nicht immer vorhanden: Zum Beispiel sind einige Dateiformate nicht in der Lage, sie zu speichern; Metadaten können aus Datenschutzgründen gelöscht worden sein; oder ältere Daten wurden möglicherweise von Systemen erzeugt, die nicht für die Speicherung und Verarbeitung von Metadaten konzipiert waren. Da Daten in einem noch nie dagewesenen Tempo produziert und in verschiedenen Formaten gespeichert werden, ist die manuelle Erstellung von Metadaten nicht nur unpraktisch, sondern auch fehleranfällig, so dass automatische Ansätze zur Metadatenerkennung erforderlich sind.

In dieser Arbeit konzentrieren wir uns auf die Erkennung von Metadaten in CSV-Dateien - einer Art von Klartextdateien, die, ähnlich wie Tabellenkalkulationen, verschiedene Arten von Inhalten an beliebigen Positionen enthalten können. Wir schlagen eine Taxonomie der Metadaten in CSV-Dateien vor und befassen uns speziell mit der Erkennung von drei verschiedenen Metadaten: semantischer Typ von Zeilen und Zellen, Aggregationen sowie Primärschlüssel und Fremdschlüssel.

Die Daten sind in CSV-Dateien ad-hoc organisiert und folgen keiner festen Struktur, wie sie von gängigen Datenverarbeitungsprogrammen angenommen wird. Die Erkennung der Struktur solcher Dateien ist eine Voraussetzung für die Extraktion von Informationen aus ihnen, die durch die Erkennung des semantischen Typs jeder Zeile oder jeder Zelle, wie z. B. Kopfzeile, Daten, abgeleitete Daten oder Fußnote, angegangen werden kann. Wir schlagen den Ansatz des überwachten Lernens, genannt „Strudel“, vor, um den strukturellen Typ von Zeilen und Zellen zu klassifizieren. CSV-Dateien können auch Aggregationen enthalten. Eine Aggregation stellt die arithmetische Beziehung zwischen einer numerischen Zelle und einer Reihe anderer numerischer Zellen dar. Der von uns vorgeschlagene „AggreCol“-Algorithmus ist in der Lage, Aggregationen von fünf arithmetischen Funktionen in CSV-Dateien zu erkennen. Da stilistische Merkmale wie Schriftart und Zellhintergrundfarbe in CSV-Dateien nicht vorhanden sind, gehen die von uns vorgeschlagenen Algorithmen die entsprechenden Probleme an, indem sie nur die Merkmale Inhalt, Kontext und Berechnungen verwenden.

Die Speicherung einer relationalen Tabelle ist ebenfalls eine häufige Verwendung von CSV-Dateien. Primär- und Fremdschlüssel sind wichtige Metadaten für relationale Datenbanken, die bei Datenbankinstanzen, die als reine Textdateien gespeichert werden, normalerweise nicht vorhanden sind. Wir schlagen den „HoPF“-Algorithmus vor, um beide Constraints in relationalen Datenbanken ganzheitlich zu erkennen. Unser Ansatz ist in der Lage, echte Primär- und Fremdschlüssel von einer großen Menge an falschen eindeutigen Spaltenkombinationen und Einschlussabhängigkeiten zu unterscheiden, die von modernen Data-Profiling-Algorithmen erkannt werden können.
KW  - data preparation
KW  - metadata detection
KW  - data wrangling
KW  - Datenaufbereitung
KW  - Datentransformation
KW  - Erkennung von Metadaten
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-566204
ER  - 
TY  - CHAP
A1  - Jacqmin, Julien
A1  - Özdemir, Paker Doğu
A1  - Fell Kurban, Caroline
A1  - Tunç Pekkan, Zelha
A1  - Koskinen, Johanna
A1  - Suonpää, Maija
A1  - Seng, Cheyvuth
A1  - Carlon, May Kristine Jonson
A1  - Gayed, John Maurice
A1  - Cross, Jeffrey S.
A1  - Langseth, Inger
A1  - Jacobsen, Dan Yngve
A1  - Haugsbakken, Halvdan
A1  - Bethge, Joseph
A1  - Serth, Sebastian
A1  - Staubitz, Thomas
A1  - Wuttke, Tobias
A1  - Nordemann, Oliver
A1  - Das, Partha-Pratim
A1  - Meinel, Christoph
A1  - Ponce, Eva
A1  - Srinath, Sindhu
A1  - Allegue, Laura
A1  - Perach, Shai
A1  - Alexandron, Giora
A1  - Corti, Paola
A1  - Baudo, Valeria
A1  - Turró, Carlos
A1  - Moura Santos, Ana
A1  - Nilsson, Charlotta
A1  - Maldonado-Mahauad, Jorge
A1  - Valdiviezo, Javier
A1  - Carvallo, Juan Pablo
A1  - Samaniego-Erazo, Nicolay
A1  - Poce, Antonella
A1  - Re, Maria Rosaria
A1  - Valente, Mara
A1  - Karp Gershon, Sa’ar
A1  - Ruipérez-Valiente, José A.
A1  - Despujol, Ignacio
A1  - Busquets, Jaime
A1  - Kerr, John
A1  - Lorenz, Anja
A1  - Schön, Sandra
A1  - Ebner, Martin
A1  - Wittke, Andreas
A1  - Beirne, Elaine
A1  - Nic Giolla Mhichíl, Mairéad
A1  - Brown, Mark
A1  - Mac Lochlainn, Conchúr
A1  - Topali, Paraskevi
A1  - Chounta, Irene-Angelica
A1  - Ortega-Arranz, Alejandro
A1  - Villagrá-Sobrino, Sara L.
A1  - Martínez-Monés, Alejandra
A1  - Blackwell, Virginia Katherine
A1  - Wiltrout, Mary Ellen
A1  - Rami Gaddem, Mohamed
A1  - Hernández Reyes, César Augusto
A1  - Nagahama, Toru
A1  - Buchem, Ilona
A1  - Okatan, Ebru
A1  - Khalil, Mohammad
A1  - Casiraghi, Daniela
A1  - Sancassani, Susanna
A1  - Brambilla, Federica
A1  - Mihaescu, Vlad
A1  - Andone, Diana
A1  - Vasiu, Radu
A1  - Şahin, Muhittin
A1  - Egloffstein, Marc
A1  - Bothe, Max
A1  - Rohloff, Tobias
A1  - Schenk, Nathanael
A1  - Schwerer, Florian
A1  - Ifenthaler, Dirk
A1  - Hense, Julia
A1  - Bernd, Mike
ED  - Meinel, Christoph
ED  - Staubitz, Thomas
ED  - Schweiger, Stefanie
ED  - Friedl, Christian
ED  - Kiers, Janine
ED  - Ebner, Martin
ED  - Lorenz, Anja
ED  - Ubachs, George
ED  - Mongenet, Catherine
ED  - Ruipérez-Valiente, José A.
ED  - Cortes Mendez, Manoel
T1  - EMOOCs 2021
N2  - From June 22 to June 24, 2021, Hasso Plattner Institute, Potsdam, hosted the seventh European MOOC Stakeholder Summit (EMOOCs 2021) together with the eighth ACM Learning@Scale Conference.
Due to the COVID-19 situation, the conference was held fully online.
The boost in digital education worldwide as a result of the pandemic was also one of the main topics of this year’s EMOOCs. All institutions of learning have been forced to transform and redesign their educational methods, moving from traditional models to hybrid or completely online models at scale. The learnings, derived from practical experience and research, have been explored in EMOOCs 2021 in six tracks and additional workshops, covering various aspects of this field. In this publication, we present papers from the conference’s Experience Track, the Policy Track, the Business Track, the International Track, and the Workshops.
KW  - e-learning
KW  - microcredential
KW  - MOOC
KW  - digital education
KW  - experience
KW  - online course design
KW  - online course creation
KW  - higher education
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-510300
SN  - 978-3-86956-512-5
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Jacqmin, Julien
T1  - What Drives Enrollment in Massive Open Online Courses?
BT  - Evidences from a French MOOC Platform
JF  - EMOOCs 2021
N2  - The goal of this paper is to study the demand factors driving enrollment in massive open online courses. Using course level data from a French MOOC platform, we study the course, teacher and institution related characteristics that influence the enrollment decision of students, in a setting where enrollment is open to all students without administrative barriers. Coverage from social and traditional media done around the course is a key driver. In addition, the language of instruction and the (estimated) amount of work needed to complete the course also have a significant impact. The data also suggests that the presence of same-side externalities is limited. Finally, preferences of national and of international students tend to differ on several dimensions.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-516899
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 1
EP  - 16
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Ihde, Sven
A1  - Pufahl, Luise
A1  - Völker, Maximilian
A1  - Goel, Asvin
A1  - Weske, Mathias
T1  - A framework for modeling and executing task-specific resource allocations in business processes
JF  - Computing : archives for informatics and numerical computation
N2  - As resources are valuable assets, organizations have to decide which resources to allocate to business process tasks in a way that the process is executed not only effectively but also efficiently. Traditional role-based resource allocation leads to effective process executions, since each task is performed by a resource that has the required skills and competencies to do so. However, the resulting allocations are typically not as efficient as they could be, since optimization techniques have yet to find their way in traditional business process management scenarios. On the other hand, operations research provides a rich set of analytical methods for supporting problem-specific decisions on resource allocation. This paper provides a novel framework for creating transparency on existing tasks and resources, supporting individualized allocations for each activity in a process, and the possibility to integrate problem-specific analytical methods of the operations research domain. To validate the framework, the paper reports on the design and prototypical implementation of a software architecture, which extends a traditional process engine with a dedicated resource management component. This component allows us to define specific resource allocation problems at design time, and it also facilitates optimized resource allocation at run time. The framework is evaluated using a real-world parcel delivery process. The evaluation shows that the quality of the allocation results increases significantly with a technique from operations research in contrast to the traditionally applied rule-based approach.
KW  - Process Execution
KW  - Business Process Management
KW  - Resource Allocation
KW  - Resource Management
KW  - Activity-oriented Optimization
Y1  - 2022
U6  - https://doi.org/10.1007/s00607-022-01093-2
SN  - 0010-485X
SN  - 1436-5057
VL  - 104
SP  - 2405
EP  - 2429
PB  - Springer
CY  - Wien
ER  - 
TY  - THES
A1  - Huegle, Johannes
T1  - Causal discovery in practice: Non-parametric conditional independence testing and tooling for causal discovery
T1  - Kausale Entdeckung in der Praxis: Nichtparametrische bedingte Unabhängigkeitstests und Werkzeuge für die Kausalentdeckung
N2  - Knowledge about causal structures is crucial for decision support in various domains. For example, in discrete manufacturing, identifying the root causes of failures and quality deviations that interrupt the highly automated production process requires causal structural knowledge. However, in practice, root cause analysis is usually built upon individual expert knowledge about associative relationships. But, "correlation does not imply causation", and misinterpreting associations often leads to incorrect conclusions. Recent developments in methods for causal discovery from observational data have opened the opportunity for a data-driven examination. Despite its potential for data-driven decision support, omnipresent challenges impede causal discovery in real-world scenarios. In this thesis, we make a threefold contribution to improving causal discovery in practice.

(1) The growing interest in causal discovery has led to a broad spectrum of methods with specific assumptions on the data and various implementations. Hence, application in practice requires careful consideration of existing methods, which becomes laborious when dealing with various parameters, assumptions, and implementations in different programming languages. Additionally, evaluation is challenging due to the lack of ground truth in practice and limited benchmark data that reflect real-world data characteristics.
To address these issues, we present a platform-independent modular pipeline for causal discovery and a ground truth framework for synthetic data generation that provides comprehensive evaluation opportunities, e.g., to examine the accuracy of causal discovery methods in case of inappropriate assumptions.

(2) Applying constraint-based methods for causal discovery requires selecting a conditional independence (CI) test, which is particularly challenging in mixed discrete-continuous data omnipresent in many real-world scenarios. In this context, inappropriate assumptions on the data or the commonly applied discretization of continuous variables reduce the accuracy of CI decisions, leading to incorrect causal structures. 
Therefore, we contribute a non-parametric CI test leveraging k-nearest neighbors methods and prove its statistical validity and power in mixed discrete-continuous data, as well as the asymptotic consistency when used in constraint-based causal discovery. An extensive evaluation on synthetic and real-world data shows that the proposed CI test outperforms state-of-the-art approaches in the accuracy of CI testing and causal discovery, particularly in settings with small sample sizes.

(3) To show the applicability and opportunities of causal discovery in practice, we examine our contributions in real-world discrete manufacturing use cases. For example, we showcase how causal structural knowledge helps to understand unforeseen production downtimes or adds decision support in case of failures and quality deviations in automotive body shop assembly lines.
N2  - Kenntnisse über die Strukturen zugrundeliegender kausaler Mechanismen sind eine Voraussetzung für die Entscheidungsunterstützung in verschiedenen Bereichen. In der Fertigungsindustrie beispielsweise erfordert die Fehler-Ursachen-Analyse von Störungen und Qualitätsabweichungen, die den hochautomatisierten Produktionsprozess unterbrechen, kausales Strukturwissen. In der Praxis stützt sich die Fehler-Ursachen-Analyse in der Regel jedoch auf individuelles Expertenwissen über assoziative Zusammenhänge. Aber "Korrelation impliziert nicht Kausalität", und die Fehlinterpretation assoziativer Zusammenhänge führt häufig zu falschen Schlussfolgerungen. Neueste Entwicklungen von Methoden des kausalen Strukturlernens haben die Möglichkeit einer datenbasierten Betrachtung eröffnet. Trotz seines Potenzials zur datenbasierten Entscheidungsunterstützung wird das kausale Strukturlernen in der Praxis jedoch durch allgegenwärtige Herausforderungen erschwert. In dieser Dissertation leisten wir einen dreifachen Beitrag zur Verbesserung des kausalen Strukturlernens in der Praxis.

(1) Das wachsende Interesse an kausalem Strukturlernen hat zu einer Vielzahl von Methoden mit spezifischen statistischen Annahmen über die Daten und verschiedenen Implementierungen geführt. Daher erfordert die Anwendung in der Praxis eine sorgfältige Prüfung der vorhandenen Methoden, was eine Herausforderung darstellt, wenn verschiedene Parameter, Annahmen und Implementierungen in unterschiedlichen Programmiersprachen betrachtet werden. Hierbei wird die Evaluierung von Methoden des kausalen Strukturlernens zusätzlich durch das Fehlen von "Ground Truth" in der Praxis und begrenzten Benchmark-Daten, welche die Eigenschaften realer Datencharakteristiken widerspiegeln, erschwert.
Um diese Probleme zu adressieren, stellen wir eine plattformunabhängige modulare Pipeline für kausales Strukturlernen und ein Tool zur Generierung synthetischer Daten vor, die umfassende Evaluierungsmöglichkeiten bieten, z.B. um Ungenauigkeiten von Methoden des Lernens kausaler Strukturen bei falschen Annahmen an die Daten aufzuzeigen.

(2) Die Anwendung von constraint-basierten Methoden des kausalen Strukturlernens erfordert die Wahl eines bedingten Unabhängigkeitstests (CI-Test), was insbesondere bei gemischten diskreten und kontinuierlichen Daten, die in vielen realen Szenarien allgegenwärtig sind, die Anwendung erschwert. Beispielsweise führen falsche Annahmen der CI-Tests oder die Diskretisierung kontinuierlicher Variablen zu einer Verschlechterung der Korrektheit der Testentscheidungen, was in fehlerhaften kausalen Strukturen resultiert. 
Um diese Probleme zu adressieren, stellen wir einen nicht-parametrischen CI-Test vor, der auf Nächste-Nachbar-Methoden basiert, und beweisen dessen statistische Validität und Trennschärfe bei gemischten diskreten und kontinuierlichen Daten, sowie dessen asymptotische Konsistenz in constraint-basiertem kausalem Strukturlernen. Eine umfangreiche Evaluation auf synthetischen und realen Daten zeigt, dass der vorgeschlagene CI-Test bestehende Verfahren hinsichtlich der Korrektheit der Testentscheidung und gelernter kausaler Strukturen übertrifft, insbesondere bei geringen Stichprobengrößen. 

(3) Um die Anwendbarkeit und Möglichkeiten kausalen Strukturlernens in der Praxis aufzuzeigen, untersuchen wir unsere Beiträge in realen Anwendungsfällen aus der Fertigungsindustrie. Wir zeigen an mehreren Beispielen aus der automobilen Karosseriefertigung, wie kausales Strukturwissen helfen kann, unvorhergesehene Produktionsausfälle zu verstehen oder eine Entscheidungsunterstützung bei Störungen und Qualitätsabweichungen zu geben.
KW  - causal discovery
KW  - causal structure learning
KW  - causal AI
KW  - non-parametric conditional independence testing
KW  - manufacturing
KW  - causal reasoning
KW  - mixed data
KW  - kausale KI
KW  - kausale Entdeckung
KW  - kausale Schlussfolgerung
KW  - kausales Strukturlernen
KW  - Fertigung
KW  - gemischte Daten
KW  - nicht-parametrische bedingte Unabhängigkeitstests
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-635820
ER  - 
TY  - JOUR
A1  - Hense, Julia
A1  - Bernd, Mike
T1  - Podcasts, Microcontent & MOOCs
BT  - The Integration of Digital Learning Formats into HEI Lectures
JF  - EMOOCs 2021
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517363
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 289
EP  - 295
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Halfpap, Stefan
T1  - Integer linear programming-based heuristics for partially replicated database clusters and selecting indexes
T1  - Auf ganzzahliger linearer Optimierung basierende Heuristiken für partiell-replizierte Datenbankcluster und das Auswählen von Indizes
N2  - Column-oriented database systems can efficiently process transactional and analytical queries on a single node. However, increasing or peak analytical loads can quickly saturate single-node database systems. Then, a common scale-out option is using a database cluster with a single primary node for transaction processing and read-only replicas. Using (the naive) full replication, queries are distributed among nodes independently of the accessed data. This approach is relatively expensive because all nodes must store all data and apply all data modifications caused by inserts, deletes, or updates.
In contrast to full replication, partial replication is a more cost-efficient implementation: Instead of duplicating all data to all replica nodes, partial replicas store only a subset of the data while being able to process a large workload share. Besides lower storage costs, partial replicas enable (i) better scaling because replicas must potentially synchronize only subsets of the data modifications and thus have more capacity for read-only queries and (ii) better elasticity because replicas have to load less data and can be set up faster. However, splitting the overall workload evenly among the replica nodes while optimizing the data allocation is a challenging assignment problem.
The calculation of optimized data allocations in a partially replicated database cluster can be modeled using integer linear programming (ILP). ILP is a common approach for solving assignment problems, also in the context of database systems. Because ILP is not scalable, existing approaches (also for calculating partial allocations) often fall back to simple (e.g., greedy) heuristics for larger problem instances. Simple heuristics may work well but can lose optimization potential.
In this thesis, we present optimal and ILP-based heuristic programming models for calculating data fragment allocations for partially replicated database clusters. Using ILP, we can flexibly extend our models to (i) consider data modifications and reallocations and (ii) increase the robustness of allocations to compensate for node failures and workload uncertainty. We evaluate our approaches for TPC-H, TPC-DS, and a real-world accounting workload and compare the results to state-of-the-art allocation approaches. Our evaluations show significant improvements for various allocation properties: Compared to existing approaches, we can, for example, (i) almost halve the amount of allocated data, (ii) improve the throughput in case of node failures and workload uncertainty while using even less memory, (iii) halve the costs of data modifications, and (iv) reallocate less than 90% of data when adding a node to the cluster. Importantly, we can calculate the corresponding ILP-based heuristic solutions within a few seconds. Finally, we demonstrate that the ideas of our ILP-based heuristics are also applicable to the index selection problem.
N2  - Spaltenorientierte Datenbanksysteme können transaktionale und analytische Abfragen effizient auf einem einzigen Rechenknoten verarbeiten. Steigende Lasten oder Lastspitzen können Datenbanksysteme mit nur einem Rechenknoten jedoch schnell überlasten. Dann besteht eine gängige Skalierungsmöglichkeit darin, einen Datenbankcluster mit einem einzigen Rechenknoten für die Transaktionsverarbeitung und Replikatknoten für lesende Datenbankanfragen zu verwenden. Bei der (naiven) vollständigen Replikation werden Anfragen unabhängig von den Daten, auf die zugegriffen wird, auf die Knoten verteilt. Dieser Ansatz ist relativ teuer, da alle Knoten alle Daten speichern und alle Datenänderungen anwenden müssen, die durch das Einfügen, Löschen oder Aktualisieren von Datenbankeinträgen verursacht werden.
Im Gegensatz zur vollständigen Replikation ist die partielle Replikation eine kostengünstige Alternative: Anstatt alle Daten auf alle Replikationsknoten zu duplizieren, speichern partielle Replikate nur eine Teilmenge der Daten und können gleichzeitig einen großen Anteil der Anfragelast verarbeiten. Neben niedrigeren Speicherkosten ermöglichen partielle Replikate (i) eine bessere Skalierung, da Replikate potenziell nur Teilmengen der Datenänderungen synchronisieren müssen und somit mehr Kapazität für lesende Anfragen haben, und (ii) eine bessere Elastizität, da Replikate weniger Daten laden müssen und daher schneller eingesetzt werden können. Die gleichmäßige Lastbalancierung auf die Replikatknoten bei gleichzeitiger Optimierung der Datenzuweisung ist jedoch ein schwieriges Zuordnungsproblem.
Die Berechnung einer optimierten Datenverteilung in einem Datenbankcluster mit partiellen Replikaten kann mithilfe der ganzzahligen linearen Optimierung (engl. integer linear programming, ILP) durchgeführt werden. ILP ist ein gängiger Ansatz zur Lösung von Zuordnungsproblemen, auch im Kontext von Datenbanksystemen. Da ILP nicht skalierbar ist, greifen bestehende Ansätze (auch zur Berechnung von partiellen Replikationen) für größere Probleminstanzen oft auf einfache Heuristiken (z.B. Greedy-Algorithmen) zurück. Einfache Heuristiken können gut funktionieren, aber auch Optimierungspotenzial einbüßen.
In dieser Arbeit stellen wir optimale und ILP-basierte heuristische Ansätze zur Berechnung von Datenzuweisungen für partiell-replizierte Datenbankcluster vor. Mithilfe von ILP können wir unsere Ansätze flexibel erweitern, um (i) Datenänderungen und -umverteilungen zu berücksichtigen und (ii) die Robustheit von Zuweisungen zu erhöhen, um Knotenausfälle und Unsicherheiten bezüglich der Anfragelast zu kompensieren. Wir evaluieren unsere Ansätze für TPC-H, TPC-DS und eine reale Buchhaltungsanfragelast und vergleichen die Ergebnisse mit herkömmlichen Verteilungsansätzen. Unsere Auswertungen zeigen signifikante Verbesserungen für verschiedene Eigenschaften der berechneten Datenzuordnungen: Im Vergleich zu bestehenden Ansätzen können wir beispielsweise (i) die Menge der gespeicherten Daten in Cluster fast halbieren, (ii) den Anfragedurchsatz bei Knotenausfällen und unsicherer Anfragelast verbessern und benötigen dafür auch noch weniger Speicher, (iii) die Kosten von Datenänderungen halbieren, und (iv) weniger als 90 % der Daten umverteilen, wenn ein Rechenknoten zum Cluster hinzugefügt wird. Wichtig ist, dass wir die entsprechenden ILP-basierten heuristischen Lösungen innerhalb weniger Sekunden berechnen können. Schließlich demonstrieren wir, dass die Ideen von unseren ILP-basierten Heuristiken auch auf das Indexauswahlproblem anwendbar sind.
KW  - database systems
KW  - integer linear programming
KW  - partial replication
KW  - index selection
KW  - load balancing
KW  - Datenbanksysteme
KW  - Indexauswahl
KW  - ganzzahlige lineare Optimierung
KW  - Lastverteilung
KW  - partielle Replikation
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-633615
ER  - 
TY  - THES
A1  - Hagedorn, Christopher
T1  - Parallel execution of causal structure learning on graphics processing units
T1  - Parallele Ausführung von kausalem Strukturlernen auf Grafikprozessoren
N2  - Learning the causal structures from observational data is an omnipresent challenge in data science. The amount of observational data available to Causal Structure Learning (CSL) algorithms is increasing as data is collected at high frequency from many data sources nowadays. While processing more data generally yields higher accuracy in CSL, the concomitant increase in the runtime of CSL algorithms hinders their widespread adoption in practice. CSL is a parallelizable problem. Existing parallel CSL algorithms address execution on multi-core Central Processing Units (CPUs) with dozens of compute cores. However, modern computing systems are often heterogeneous and equipped with Graphics Processing Units (GPUs) to accelerate computations. Typically, these GPUs provide several thousand compute cores for massively parallel data processing. 
To shorten the runtime of CSL algorithms, we design efficient execution strategies that leverage the parallel processing power of GPUs. Particularly, we derive GPU-accelerated variants of a well-known constraint-based CSL method, the PC algorithm, as it allows choosing a statistical Conditional Independence test (CI test) appropriate to the observational data characteristics. 
Our two main contributions are: (1) To reflect differences in the CI tests, we design three GPU-based variants of the PC algorithm tailored to CI tests that handle data with the following characteristics. We develop one variant for data assuming the Gaussian distribution model, one for discrete data, and another for mixed discrete-continuous data and data with non-linear relationships. Each variant is optimized for the appropriate CI test leveraging GPU hardware properties, such as shared or thread-local memory. Our GPU-accelerated variants outperform state-of-the-art parallel CPU-based algorithms by factors of up to 93.4× for data assuming the Gaussian distribution model, up to 54.3× for discrete data, up to 240× for continuous data with non-linear relationships, and up to 655× for mixed discrete-continuous data. However, the proposed GPU-based variants are limited to datasets that fit into a single GPU’s memory. (2) To overcome this shortcoming, we develop approaches to scale our GPU-based variants beyond a single GPU’s memory capacity. For example, we design an out-of-core GPU variant that employs explicit memory management to process arbitrary-sized datasets. Runtime measurements on a large gene expression dataset reveal that our out-of-core GPU variant is 364 times faster than a parallel CPU-based CSL algorithm. Overall, our proposed GPU-accelerated variants speed up CSL in numerous settings to foster CSL’s adoption in practice and research.
N2  - Das Lernen von kausalen Strukturen aus Beobachtungsdatensätzen ist eine allgegenwärtige Herausforderung im Data Science-Bereich. Die für die Algorithmen des kausalen Strukturlernens (CSL) zur Verfügung stehende Menge von Beobachtungsdaten nimmt zu, da heutzutage mit hoher Frequenz Daten aus vielen Datenquellen gesammelt werden. Während die Verarbeitung von höheren Datenmengen im Allgemeinen zu einer höheren Genauigkeit bei CSL führt, hindert die damit einhergehende Erhöhung der Laufzeit von CSL-Algorithmen deren breite Anwendung in der Praxis. CSL ist ein parallelisierbares Problem. Bestehende parallele CSL-Algorithmen eignen sich für die Ausführung auf Mehrkern-Hauptprozessoren (CPUs) mit Dutzenden von Rechenkernen. Moderne Computersysteme sind jedoch häufig heterogen. Um notwendige Berechnungen zu beschleunigen, sind die Computersysteme typischerweise mit Grafikprozessoren (GPUs) ausgestattet, wobei diese GPUs mehrere tausend Rechenkerne für eine massive parallele Datenverarbeitung bereitstellen. 
Um die Laufzeit von Algorithmen für das kausale Strukturlernen zu verkürzen, entwickeln wir im Rahmen dieser Arbeit effiziente Ausführungsstrategien, die die parallele Verarbeitungsleistung von GPUs nutzen. Dabei entwerfen wir insbesondere GPU-beschleunigte Varianten des PC-Algorithmus, der eine bekannte Constraint-basierte CSL-Methode ist. Dieser Algorithmus ermöglicht die Auswahl eines – den Eigenschaften der Beobachtungsdaten entsprechenden – statistischen Tests auf bedingte Unabhängigkeit (CI-Test). 
Wir leisten in dieser Doktorarbeit zwei wissenschaftliche Hauptbeiträge: (1) Um den Unterschieden in den CI-Tests Rechnung zu tragen, entwickeln wir drei GPU-basierte, auf CI-Tests zugeschnittene Varianten des PC-Algorithmus. Dadurch können Daten mit den folgenden Merkmalen verarbeitet werden: eine Variante fokussiert sich auf Daten, die das Gaußsche Verteilungsmodell annehmen, eine weitere auf diskrete Daten und die dritte Variante setzt den Fokus auf gemischte diskret-kontinuierliche Daten sowie Daten mit nicht-linearen funktionalen Beziehungen. Jede Variante ist für den entsprechenden CI-Test optimiert und nutzt Eigenschaften der GPU-Hardware wie beispielsweise ”Shared Memory” oder ”Thread-local Memory” aus. Unsere GPU-beschleunigten Varianten übertreffen die modernsten parallelen CPU-basierten Algorithmen um Faktoren von bis zu 93,4x für Daten, die das Gaußsche Verteilungsmodell annehmen, bis zu 54,3x für diskrete Daten, bis zu 240x für kontinuierliche Daten mit nichtlinearen Beziehungen und bis zu 655x für gemischte diskret-kontinuierliche Daten. Die vorgeschlagenen GPU-basierten Varianten sind dabei jedoch auf Datensätze beschränkt, die in den Speicher einer einzelnen GPU passen. (2) Um diese Schwachstelle zu beseitigen, entwickeln wir Ansätze zur Skalierung unserer GPU-basierten Varianten über die Speicherkapazität einer einzelnen GPU hinaus. So entwerfen wir beispielsweise eine auf einer expliziten Speicherverwaltung aufbauenden Out-of-Core-Variante für eine einzelne GPU, um Datensätze beliebiger Größe zu verarbeiten. Laufzeitmessungen auf einem großen Genexpressionsdatensatz zeigen, dass unsere Out-of-Core GPU-Variante 364-mal schneller ist als ein paralleler CPU-basierter CSL-Algorithmus. 
Insgesamt beschleunigen unsere vorgestellten GPU-basierten Varianten das kausale Strukturlernen in zahlreichen Situationen und unterstützen dadurch die breite Anwendung des kausalen Strukturlernens in Praxis und Forschung.
KW  - causal structure learning
KW  - GPU acceleration
KW  - causal discovery
KW  - parallel processing
KW  - GPU-Beschleunigung
KW  - kausale Entdeckung
KW  - kausales Strukturlernen
KW  - parallele Verarbeitung
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-597582
ER  - 
TY  - THES
A1  - Grütze, Toni
T1  - Adding value to text with user-generated content
N2  - In recent years, the ever-growing amount of documents on the Web as well as in closed systems for private or business contexts has led to a considerable increase in valuable textual information about topics, events, and entities. It is a truism that the majority of information (i.e., business-relevant data) is only available in unstructured textual form. The text mining research field comprises various practice areas that have the common goal of harvesting high-quality information from textual data. This information helps address users' information needs.

In this thesis, we utilize the knowledge represented in user-generated content (UGC) originating from various social media services to improve text mining results. These social media platforms provide a plethora of information with varying focuses. In many cases, an essential feature of such platforms is to share relevant content with a peer group. Thus, the data exchanged in these communities tend to be focused on the interests of the user base. The popularity of social media services is growing continuously and the inherent knowledge is available to be utilized. We show that this knowledge can be used for three different tasks.

First, we demonstrate that when searching for persons with ambiguous names, the information from Wikipedia can be bootstrapped to group web search results according to the individuals occurring in the documents. We introduce two models and different means to handle persons missing in the UGC source. We show that the proposed approaches outperform traditional algorithms for search result clustering. Secondly, we discuss how the categorization of texts according to continuously changing community-generated folksonomies helps users to identify new information related to their interests. We specifically target temporal changes in the UGC and show how they influence the quality of different tag recommendation approaches. Finally, we introduce an algorithm to tackle the entity linking problem, a necessity for harvesting entity knowledge from large text collections. The goal is the linkage of mentions within the documents with their real-world entities. A major focus lies on the efficient derivation of coherent links.

For each of the contributions, we provide a wide range of experiments on various text corpora as well as different sources of UGC.
The evaluation shows the added value that the usage of these sources provides and confirms the appropriateness of leveraging user-generated content to serve different information needs.
N2  - Die steigende Zahl an Dokumenten, welche in den letzten Jahren im Web sowie in geschlossenen Systemen aus dem privaten oder geschäftlichen Umfeld erstellt wurden, führte zu einem erheblichen Zuwachs an wertvollen Informationen über verschiedenste Themen, Ereignisse, Organisationen und Personen. Die meisten Informationen liegen lediglich in unstrukturierter, textueller Form vor. Das Forschungsgebiet des "Text Mining" befasst sich mit dem schwierigen Problem, hochwertige Informationen in strukturierter Form aus Texten zu gewinnen. Diese Informationen können dazu eingesetzt werden, Nutzern dabei zu helfen, ihren Informationsbedarf zu stillen.

In dieser Arbeit nutzen wir Wissen, welches in nutzergenerierten Inhalten verborgen ist und aus unterschiedlichsten sozialen Medien stammt, um Text Mining Ergebnisse zu verbessern. Soziale Medien bieten eine Fülle an Informationen mit verschiedenen Schwerpunkten. Eine wesentliche Funktion solcher Medien ist es, den Nutzern zu ermöglichen, Inhalte mit ihrer Interessensgruppe zu teilen. Somit sind die ausgetauschten Daten in diesen Diensten häufig auf die Interessen der Nutzerbasis ausgerichtet. Die Popularität sozialer Medien wächst stetig und führt dazu, dass immer mehr inhärentes Wissen verfügbar wird. Dieses Wissen kann unter anderem für drei verschiedene Aufgabenstellungen genutzt werden.

Zunächst zeigen wir, dass Informationen aus Wikipedia hilfreich sind, um Ergebnisse von Personensuchen im Web nach den in ihnen diskutierten Personen aufzuteilen. Dazu führen wir zwei Modelle zur Gruppierung der Ergebnisse und verschiedene Methoden zum Umgang mit fehlenden Wikipedia Einträgen ein, und zeigen, dass die entwickelten Ansätze traditionelle Methoden zur Gruppierung von Suchergebnissen übertreffen. Des Weiteren diskutieren wir, wie die Klassifizierung von Texten auf Basis von "Folksonomien" Nutzern dabei helfen kann, neue Informationen zu identifizieren, die ihren Interessen entsprechen. Wir konzentrieren uns insbesondere auf temporäre Änderungen in den nutzergenerierten Inhalten, um zu zeigen, wie stark ihr Einfluss auf die Qualität verschiedener "Tag"-Empfehlungsmethoden ist. Zu guter Letzt führen wir einen Algorithmus ein, der es ermöglicht, Nennungen von Echtweltinstanzen in Texten zu disambiguieren und mit ihren Repräsentationen in einer Wissensdatenbank zu verknüpfen. Das Hauptaugenmerk liegt dabei auf der effizienten Erkennung von kohärenten Verknüpfungen.

Wir stellen für jeden Teil der Arbeit eine große Vielfalt an Experimenten auf diversen Textkorpora und unterschiedlichen Quellen von nutzergenerierten Inhalten an. Damit heben wir das Potential hervor, das die Nutzung jener Quellen bietet, um die unterschiedlichen Informationsbedürfnisse abzudecken.
T2  - Mehrwert für Texte mittels nutzergenerierter Inhalte
KW  - nutzergenerierte Inhalte
KW  - text mining
KW  - Klassifikation
KW  - Clusteranalyse
KW  - Entitätsverknüpfung
KW  - user-generated content
KW  - classification
KW  - clustering
KW  - entity linking
Y1  - 2018
ER  - 
TY  - THES
A1  - Grüner, Andreas
T1  - Towards practical and trust-enhancing attribute aggregation for self-sovereign identity
N2  - Identity management is at the forefront of applications’ security posture. It separates the unauthorised user from the legitimate individual. Identity management models have evolved from the isolated to the centralised paradigm and identity federations. Within this advancement, the identity provider emerged as a trusted third party that holds a powerful position. Allen postulated the novel self-sovereign identity paradigm to establish a new balance. Thus, extensive research is required to comprehend its virtues and limitations. Analysing the new paradigm, initially, we investigate the blockchain-based self-sovereign identity concept structurally. Moreover, we examine trust requirements in this context by reference to patterns. These patterns comprise major entities linked by a decentralised identity provider. By comparison to the traditional models, we conclude that trust in credential management and authentication is removed. Trust-enhancing attribute aggregation based on multiple attribute providers provokes a further trust shift. Subsequently, we formalise attribute assurance trust modelling by a metaframework. It encompasses the attestation and trust network as well as the trust decision process, including the trust function, as central components. A secure attribute assurance trust model depends on the security of the trust function. The trust function should consider high trust values and several attribute authorities. Furthermore, we evaluate classification, conceptual study, practical analysis and simulation as assessment strategies of trust models. For realising trust-enhancing attribute aggregation, we propose a probabilistic approach. The method builds on the principal characteristics of correctness and validity. These values are combined for one provider and subsequently for multiple issuers. We embed this trust function in a model within the self-sovereign identity ecosystem.
To practically apply the trust function and solve several challenges for the service provider that arise from adopting self-sovereign identity solutions, we conceptualise and implement an identity broker. The mediator applies a component-based architecture to abstract from a single solution. Standard identity and access management protocols build the interface for applications. We can conclude that the broker’s usage at the side of the service provider does not undermine self-sovereign principles, but fosters the advancement of the ecosystem. The identity broker is applied to sample web applications with distinct attribute requirements to showcase usefulness for authentication and attribute-based access control within a case study.
N2  - Das Identitätsmanagement ist Kernbestandteil der Sicherheitsfunktionen von Applikationen. Es unterscheidet berechtigte Benutzung von illegitimer Verwendung. Die Modelle des Identitätsmanagements haben sich vom isolierten zum zentralisierten Paradigma und darüber hinaus zu Identitätsverbünden weiterentwickelt. Im Rahmen dieser Evolution ist der Identitätsanbieter zu einer mächtigen vertrauenswürdigen dritten Partei aufgestiegen. Zur Etablierung eines bis jetzt noch unvorstellbaren Machtgleichgewichts wurde der Grundgedanke der selbstbestimmten Identität proklamiert. Eine tiefgehende Analyse des neuen Konzepts unterstützt auf essentielle Weise das generelle Verständnis der Vorzüge und Defizite. Bei der Analyse des Modells untersuchen wir zu Beginn strukturelle Komponenten des selbstbestimmten Identitätsmanagements basierend auf der Blockchain-Technologie. Anschließend erforschen wir Vertrauensanforderungen in diesem Kontext anhand von Mustern. Diese schematischen Darstellungen illustrieren das Verhältnis der Hauptakteure im Verbund mit einem dezentralisierten Identitätsanbieter. Im Vergleich zu den traditionellen Paradigmen können wir feststellen, dass kein Vertrauen mehr in das Verwalten von Anmeldeinformationen und die korrekte Authentifizierung benötigt wird. Zusätzlich bewirkt die Verwendung von vertrauensfördernder Attributaggregation eine weitere Transformation der Vertrauenssituation. Darauffolgend formalisieren wir die Darstellung von Vertrauensmodellen in Attribute Assurance mit Hilfe eines Meta-Frameworks. Als zentrale Komponenten sind das Attestierungs- und Vertrauensnetzwerk sowie der Vertrauensentscheidungsprozess, einschließlich der Vertrauensfunktion, enthalten. Ein sicheres Vertrauensmodell beruht auf der Sicherheit der Vertrauensfunktion. Hohe Vertrauenswerte sowie mehrere Attributaussteller sollten dafür berücksichtigt werden.
Des Weiteren evaluieren wir Klassifikation, die konzeptionelle und praktische Analyse sowie die Simulation als Untersuchungsansätze für Vertrauensmodelle. Für die Umsetzung der vertrauensfördernden Attributaggregation schlagen wir einen wahrscheinlichkeitstheoretischen Ansatz vor. Die entwickelte Methode basiert auf den primären Charakteristiken der Korrektheit und Gültigkeit von Attributen. Diese Indikatoren werden für einen und anschließend für mehrere Merkmalsanbieter kombiniert. Zusätzlich betten wir die daraus entstehende Vertrauensfunktion in ein vollständiges Modell auf Basis des Ökosystems von selbstbestimmten Identitäten ein. Für die praktische Anwendung der Vertrauensfunktion und die Überwindung mehrerer Herausforderungen für den Dienstanbieter bei der Einführung selbstbestimmter Identitätslösungen konzipieren und implementieren wir einen Identitätsbroker. Dieser Vermittler besteht aus einer komponentenbasierten Architektur, um von einer dedizierten selbstbestimmten Identitätslösung zu abstrahieren. Zusätzlich bilden etablierte Identitäts- und Zugriffsverwaltungsprotokolle die Schnittstelle zu herkömmlichen Anwendungen. Der Einsatz des Brokers auf der Seite des Dienstanbieters unterminiert nicht die Grundsätze der selbstbestimmten Identität. Im Gegenteil wird die Weiterentwicklung des entsprechenden Ökosystems gefördert. Innerhalb einer Fallstudie wird die Verwendung des Identitätsbrokers bei Anwendungen mit unterschiedlichen Anforderungen an Benutzerattribute betrachtet, um die Nützlichkeit bei der Authentifizierung und attributbasierten Zugriffskontrolle zu demonstrieren.
KW  - identity
KW  - self-sovereign identity
KW  - trust
KW  - attribute assurance
KW  - Identität
KW  - selbst-souveräne Identitäten
KW  - Vertrauen
KW  - Attributsicherung
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-567450
ER  - 
TY  - BOOK
A1  - Giese, Holger
A1  - Maximova, Maria
A1  - Sakizloglou, Lucas
A1  - Schneider, Sven
T1  - Metric temporal graph logic over typed attributed graphs
N2  - Various kinds of typed attributed graphs are used to represent states of systems from a broad range of domains. For dynamic systems, established formalisms such as graph transformations provide a formal model for defining state sequences. We consider the extended case where time elapses between states and introduce a logic to reason about these sequences. With this logic we express properties on the structure and attributes of states as well as on the temporal occurrence of states that are related by their inner structure, which no formal logic over graphs accomplishes concisely so far. Firstly, we introduce graphs with history by equipping every graph element with the timestamp of its creation and, if applicable, its deletion. Secondly, we define a logic on graphs by integrating the temporal operator until into the well-established logic of nested graph conditions. Thirdly, we prove that our logic is equally expressive to nested graph conditions by providing a suitable reduction. Finally, the implementation of this reduction allows for the tool-based analysis of metric temporal properties for state sequences.
N2  - Verschiedene Arten von getypten attributierten Graphen werden benutzt, um Zustände von Systemen in vielen unterschiedlichen Anwendungsbereichen zu beschreiben. Der etablierte Formalismus der Graphtransformationen bietet ein formales Modell, um Zustandssequenzen für dynamische Systeme zu definieren. Wir betrachten den erweiterten Fall von solchen Sequenzen, in dem Zeit zwischen zwei verschiedenen Systemzuständen vergeht, und führen eine Logik ein, um solche Sequenzen zu beschreiben. Mit dieser Logik drücken wir zum einen Eigenschaften über die Struktur und die Attribute von Zuständen aus und beschreiben zum anderen temporale Vorkommen von Zuständen, die durch ihre innere Struktur verbunden sind. Solche Eigenschaften können bisher von keiner der existierenden Logiken auf Graphen vergleichbar dargestellt werden. Erstens führen wir Graphen mit Änderungshistorie ein, indem wir jedes Graphelement mit einem Zeitstempel seiner Erzeugung und, wenn nötig, seiner Löschung versehen. Zweitens definieren wir eine Logik auf Graphen, indem wir den Temporaloperator Until in die wohl-etablierte Logik der verschachtelten Graphbedingungen integrieren. Drittens beweisen wir, dass unsere Logik gleich ausdrucksmächtig ist wie die Logik der verschachtelten Graphbedingungen, indem wir eine passende Reduktionsoperation definieren. Zuletzt erlaubt uns die Implementierung dieser Reduktionsoperation, die werkzeugbasierte Analyse von metrisch-temporallogischen Eigenschaften für Zustandssequenzen durchzuführen.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 123 
KW  - nested graph conditions
KW  - sequence properties
KW  - symbolic graphs
KW  - typed attributed graphs
KW  - metric temporal logic
KW  - temporal logic
KW  - runtime monitoring
KW  - verschachtelte Anwendungsbedingungen
KW  - Sequenzeigenschaften
KW  - symbolische Graphen
KW  - getypte Attributierte Graphen
KW  - metrische Temporallogik
KW  - Temporallogik
KW  - Runtime-monitoring
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-411351
SN  - 978-3-86956-433-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 123
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Giese, Holger
A1  - Maximova, Maria
A1  - Sakizloglou, Lucas
A1  - Schneider, Sven
T1  - Metric temporal graph logic over typed attributed graphs
BT  - extended version
N2  - Graph repair, restoring consistency of a graph, plays a prominent role in several areas of computer science and beyond: For example, in model-driven engineering, the abstract syntax of models is usually encoded using graphs. Flexible edit operations temporarily create inconsistent graphs not representing a valid model, thus requiring graph repair. Similarly, in graph databases, which manage the storage and manipulation of graph data, updates may cause a given database to violate some integrity constraints, which likewise requires graph repair.
We present a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing repairs. In our context, we formalize consistency by so-called graph conditions, which are equivalent to first-order logic on graphs. We present two kinds of repair algorithms: State-based repair restores consistency independent of the graph update history, whereas delta-based (or incremental) repair takes this history explicitly into account. Technically, our algorithms rely on an existing model generation algorithm for graph conditions implemented in AutoGraph. Moreover, the delta-based approach uses the new concept of satisfaction (ST) trees for encoding if and how a graph satisfies a graph condition. We then demonstrate how to manipulate these STs incrementally with respect to a graph update.
N2  - Verschiedene Arten typisierter attributierter Graphen können verwendet werden, um Systemzustände aus einem breiten Bereich von Domänen darzustellen. Für dynamische Systeme können etablierte Formalismen wie die Graphtransformation ein formales Modell für die Definition von Zustandssequenzen liefern. Wir betrachten den Fall, in dem zwischen Zustandsänderungen Zeit vergehen kann, und führen eine Logik ein, die als Metric Temporal Graph Logic (MTGL) bezeichnet wird, um über solche zeitgesteuerten Graphsequenzen zu urteilen. Mit dieser Logik drücken wir Eigenschaften der Struktur und der Attribute von Zuständen sowie des Auftretens von Zuständen über die Zeit aus, die durch ihre innere Struktur miteinander verbunden sind, was bisher keine formale Logik über Graphen präzise bewerkstelligt. Erstens, basierend auf zeitgesteuerten Graphsequenzen als Modelle für die Systemevolution, definieren wir MTGL, indem wir den zeitlichen Operator bis zu einer gewissen Zeitgrenze in die etablierte Logik von (verschachtelten) Graphbedingungen integrieren. Zweitens skizzieren wir, wie eine endliche zeitgesteuerte Diagrammsequenz als einzelnes Diagramm dargestellt werden kann, das alle zeitlichen Änderungen enthält (als Diagramm mit Verlauf bezeichnet), wie die Erfüllung von MTGL-Bedingungen für ein solches Diagramm definiert werden kann, und zeigen, dass beide Darstellungen dieselben MTGL-Bedingungen erfüllen. Drittens zeigen wir, wie MTGL-Bedingungen auf (verschachtelte) Diagrammbedingungen reduziert werden können, und zeigen anhand dieser Reduzierung, dass beide zugrunde liegenden Logiken gleichermaßen aussagekräftig sind. 
Schließlich stellen wir eine Erweiterung des Tools AutoGraph vor, mit der die Erfüllung der MTGL-Bedingungen für zeitgesteuerte Diagrammsequenzen überprüft werden kann, indem die Erfüllung der (verschachtelten) Diagrammbedingungen überprüft wird, die unter Verwendung der vorgeschlagenen Reduzierung für das Diagramm mit dem Verlauf entsprechend dem zeitgesteuerten Diagramm erhalten wurden.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 127 
KW  - typisierte attributierte Graphen
KW  - metrisch temporale Graph Logic
KW  - Spezifikation von gezeiteten Graph Transformationen
KW  - typed attributed graphs
KW  - metric temporal graph logic
KW  - specification of timed graph transformations
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-427522
SN  - 978-3-86956-463-0
SN  - 1613-5652
SN  - 2191-1665
IS  - 127
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - GEN
A1  - Giese, Holger
A1  - Henkler, Stefan
A1  - Hirsch, Martin
T1  - A multi-paradigm approach supporting the modular execution of reconfigurable hybrid systems
N2  - Advanced mechatronic systems have to integrate existing technologies from mechanical, electrical and software engineering. They must be able to adapt their structure and behavior at runtime by reconfiguration to react flexibly to changes in the environment. Therefore, a tight integration of structural and behavioral models of the different domains is required. This integration results in complex reconfigurable hybrid systems, the execution logic of which cannot be addressed directly with existing standard modeling, simulation, and code-generation techniques. We present in this paper how our component-based approach for reconfigurable mechatronic systems, MECHATRONIC UML, efficiently handles the complex interplay of discrete behavior and continuous behavior in a modular manner. In addition, its extension to even more flexible reconfiguration cases is presented.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 410 
KW  - code generation
KW  - hybrid systems
KW  - reconfigurable systems
KW  - simulation
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-402896
ER  - 
TY  - JOUR
A1  - Gershon, Sa’ar Karp
A1  - Ruipérez-Valiente, José A.
A1  - Alexandron, Giora
T1  - MOOC Monetization Changes and Completion Rates
BT  - Are Learners from Countries of Different Development Status Equally Affected?
JF  - EMOOCs 2021
N2  - Massive Open Online Courses (MOOCs) offer online courses at low cost for anyone with internet access. In its early days, the MOOC movement raised the flag of democratizing education, but soon enough, this utopian idea collided with the need to find sustainable business models. We focus on the move from open access to a new, financially sustainable certification and monetization policy in December 2015 and compare completion rates before and after this change-point. In this study, we investigate the impact of the change on learners from countries of different development status. Our findings suggest that this change has lowered the completion rates among learners from developing countries, increasing gaps that already existed between global learners from countries of low and high development status. This suggests that more inclusive monetization policies may help the benefits of MOOCs spread more equally among global learners.
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-517189
SN  - 978-3-86956-512-5
VL  - 2021
SP  - 169
EP  - 179
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Gerken, Stefanie
A1  - Uebernickel, Falk
A1  - de Paula, Danielly
T1  - Design Thinking: a Global Study on Implementation Practices in Organizations
T1  - Design Thinking: eine globale Studie über Implementierungspraktiken in Organisationen
BT  - Past - Present - Future
BT  - Vergangenheit - Gegenwart - Zukunft
N2  - These days design thinking is no longer a “new approach”. Among practitioners, as well as academics, interest in the topic has gathered pace over the last two decades. However, opinions are divided over the longevity of the phenomenon: whether design thinking is merely “old wine in new bottles,” a passing trend, or still evolving as it is being spread to an increasing number of organizations and industries. Despite its growing relevance and the diffusion of design thinking, knowledge on the actual status quo in organizations remains scarce. With a new study, the research team of Prof. Uebernickel and Stefanie Gerken investigates temporal developments and changes in design thinking practices in organizations over the past six years comparing the results of the 2015 “Parts without a whole” study with current practices and future developments. Companies of all sizes and from different parts of the world participated in the survey. The findings from qualitative interviews with experts, i.e., people who have years of knowledge with design thinking, were cross-checked with the results from an exploratory analysis of the survey data. This analysis uncovers significant variances and similarities in how design thinking is interpreted and applied in businesses.
N2  - Heutzutage ist Design Thinking kein "neuer Ansatz" mehr. Unter Praktikern und Akademikern hat das Interesse an diesem Thema in den letzten zwei Jahrzehnten stark zugenommen. Die Meinungen sind jedoch geteilt, ob Design Thinking lediglich "alter Wein in neuen Schläuchen" ist, ein vorübergehender Trend, oder ein sich weiterentwickelndes Phänomen, welches in immer mehr Organisationen und Branchen Fuß fasst. Trotz der wachsenden Relevanz und Verbreitung von Design Thinking ist das Wissen über den tatsächlichen Status quo in Organisationen nach wie vor spärlich. Mit einer neuen Studie untersucht das Forschungsteam von Prof. Uebernickel, Stefanie Gerken und Dr. Danielly de Paula die zeitlichen Entwicklungen und Veränderungen von Design Thinking Praktiken in Organisationen über die letzten sechs Jahre und vergleicht die Ergebnisse der Studie "Parts without a whole" aus dem Jahr 2015 mit aktuellen Praktiken und perspektivischen Entwicklungen. An der Studie haben Unternehmen aller Größen und aus verschiedenen Teilen der Welt teilgenommen. Um dem komplexen Untersuchungsgegenstand gerecht zu werden, wurde ein Mixed-Method-Ansatz gewählt: Die Erkenntnisse aus qualitativen Experteninterviews, d.h. Personen, die sich seit Jahren mit dem Thema Design Thinking in der Praxis beschäftigen, wurden mit den Ergebnissen einer quantitativen Analyse von Umfragedaten abgeglichen. Die vorliegende Studie erörtert signifikante Unterschiede und Gemeinsamkeiten bei der Interpretation und Anwendung von Design Thinking in Unternehmen.
KW  - Design Thinking
KW  - Agile
KW  - Implementation in Organizations
KW  - life-centered
KW  - human-centered
KW  - Innovation
KW  - Behavior change
KW  - Problem Solving
KW  - Creative
KW  - Solution Space
KW  - Process
KW  - Mindset
KW  - Tools
KW  - Wicked Problems
KW  - VUCA-World
KW  - Ambiguity
KW  - Interdisciplinary Teams
KW  - Multidisciplinary Teams
KW  - Impact
KW  - Measurement
KW  - Ideation
KW  - Agilität
KW  - agil
KW  - Ambiguität
KW  - Verhaltensänderung
KW  - Kreativität
KW  - Design Thinking
KW  - Ideenfindung
KW  - Auswirkungen
KW  - Implementierung in Organisationen
KW  - Innovation
KW  - interdisziplinäre Teams
KW  - Messung
KW  - Denkweise
KW  - multidisziplinäre Teams
KW  - Problemlösung
KW  - Prozess
KW  - Lösungsraum
KW  - Werkzeuge
KW  - Aktivitäten
KW  - verzwickte Probleme
KW  - menschenzentriert
KW  - lebenszentriert
KW  - VUCA-World
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-534668
SN  - 978-3-86956-525-5
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Garus, Marcel
A1  - Sawahn, Rohan
A1  - Wanke, Jonas
A1  - Tiedt, Clemens
A1  - Granzow, Clara
A1  - Kuffner, Tim
A1  - Rosenbaum, Jannis
A1  - Hagemann, Linus
A1  - Wollnik, Tom
A1  - Woth, Lorenz
A1  - Auringer, Felix
A1  - Kantusch, Tobias
A1  - Roth, Felix
A1  - Hanff, Konrad
A1  - Schilli, Niklas
A1  - Seibold, Leonard
A1  - Lindner, Marc Fabian
A1  - Raschack, Selina
ED  - Grapentin, Andreas
ED  - Tiedt, Clemens
ED  - Polze, Andreas
T1  - Operating systems II - student projects
N2  - This technical report presents the results of student projects which were prepared during the lecture “Operating Systems II” offered by the “Operating Systems and Middleware” group at HPI in the Summer term of 2020. The lecture covered advanced aspects of operating system implementation and architecture on topics such as Virtualization, File Systems and Input/Output Systems. In addition to attending the lecture, the participating students were encouraged to gather practical experience by completing a project on a closely related topic over the course of the semester. The results of 10 selected exceptional projects are covered in this report.

The students have completed hands-on projects on the topics of Operating System Design Concepts and Implementation, Hardware/Software Co-Design, Reverse Engineering, Quantum Computing, Static Source-Code Analysis, Operating Systems History, Application Binary Formats and more. It should be recognized that over the course of the semester all of these projects have achieved outstanding results which went far beyond the scope and the expectations of the lecture, and we would like to thank all participating students for their commitment and their effort in completing their respective projects, as well as their work on compiling this report.
N2  - Dieser technische Bericht beschreibt die Ergebnisse der Projekte, welche im Rahmen der Lehrveranstaltung "Betriebssysteme II" von teilnehmenden Studierenden durchgeführt wurden. Die Lehrveranstaltung wurde von der Gruppe "Betriebssysteme und Middleware" am HPI im Sommersemester 2020 durchgeführt und behandelte fortgeschrittene Aspekte der Betriebssystemarchitektur und -Implementierung am Beispiel der Virtualisierung, der Dateisysteme und der Eingabe/Ausgabe (I/O) Systeme. Zusätzlich zu den Vorlesungen wurden die Studierenden angeleitet, durch die Durchführung eines begleitenden Projekts praktische Erfahrungen im Umgang mit den behandelten Themen zu sammeln. Die Ergebnisse von 10 ausgewählten, herausragenden Projekten werden in diesem Report vorgestellt.

Die Studierenden haben unter anderem Projekte zu den Themen Betriebssystemdesign und -Implementierung, Hardware/Software Co-Design, Reverse Engineering, Quanten-Computing, Statische Quellcodeanalyse, Betriebssystemgeschichte und dem Binärformat von ausführbaren Dateien durchgeführt. Es ist anzuerkennen, dass alle teilnehmenden Studierenden im Verlauf des Semesters herausragende Ergebnisse erzielt haben, die weit über die Anforderungen der Lehrveranstaltung hinausgingen. Wir möchten uns bei allen teilnehmenden Studierenden für ihren Einsatz bei der Durchführung der Projekte sowie bei der Erstellung dieses Reports bedanken.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 142 
KW  - operating systems
KW  - network protocols
KW  - software/hardware co-design
KW  - static source-code analysis
KW  - reverse engineering
KW  - quantum computing
KW  - Betriebssysteme
KW  - Netzwerkprotokolle
KW  - Software/Hardware Co-Design
KW  - statische Quellcodeanalyse
KW  - Reverse Engineering
KW  - Quanten-Computing
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-526363
SN  - 978-3-86956-524-8
SN  - 1613-5652
SN  - 2191-1665
IS  - 142
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Garrels, Tim
A1  - Khodabakhsh, Athar
A1  - Renard, Bernhard Y.
A1  - Baum, Katharina
T1  - LazyFox: fast and parallelized overlapping community detection in large graphs
JF  - PEERJ Computer Science
N2  - The detection of communities in graph datasets provides insight about a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecast, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for or even require overlapping communities that can only be determined at much higher computational cost. We build on an efficient algorithm, FOX, that detects such overlapping communities. FOX measures the closeness of a node to a community by approximating the count of triangles which that node forms with that community. We propose LAZYFOX, a multi-threaded adaptation of the FOX algorithm, which provides even faster detection without an impact on community quality. This allows for the analyses of significantly larger and more complex datasets. LAZYFOX enables overlapping community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LAZYFOX's implementation was published and is available as a tool under an MIT licence at https://github.com/TimGarrels/LazyFox.
KW  - Overlapping community detection
KW  - Large networks
KW  - Weighted clustering coefficient
KW  - Heuristic triangle estimation
KW  - Parallelized algorithm
KW  - C++ tool
KW  - Runtime improvement
KW  - Open source
KW  - Graph algorithm
KW  - Community analysis
Y1  - 2023
U6  - https://doi.org/10.7717/peerj-cs.1291
SN  - 2376-5992
VL  - 9
PB  - PeerJ Inc.
CY  - London
ER  - 
TY  - BOOK
A1  - Freund, Rieke
A1  - Rätsch, Jan Philip
A1  - Hradilak, Franziska
A1  - Vidic, Benedikt
A1  - Heß, Oliver
A1  - Lißner, Nils
A1  - Wölert, Hendrik
A1  - Lincke, Jens
A1  - Beckmann, Tom
A1  - Hirschfeld, Robert
T1  - Implementing a crowd-sourced picture archive for Bad Harzburg
N2  - Pictures are a medium that helps make the past tangible and preserve memories. Without context, they are not able to do so. Pictures are brought to life by their associated stories. However, the older pictures become, the fewer contemporary witnesses can tell these stories.
Especially for large, analog picture archives, knowledge and memories are spread over many people. This creates several challenges: First, the pictures must be digitized to save them from decaying and make them available to the public. Since a simple listing of all the pictures is confusing, the pictures should be structured accessibly. Second, known information that makes the stories vivid needs to be added to the pictures. Users should get the opportunity to contribute their knowledge and memories. To make this usable for all interested parties, even for older, less technophile generations, the interface should be intuitive and error-tolerant.
The resulting requirements are not covered in their entirety by any existing software solution without losing the intuitive interface or the scalability of the system.
Therefore, we have developed our digital picture archive within the scope of a bachelor project in cooperation with the Bad Harzburg-Stiftung. For the implementation of this web application, we use the UI framework React in the frontend, which communicates via a GraphQL interface with the Content Management System Strapi in the backend. The use of this system enables our project partner to create an efficient process from scanning analog pictures to presenting them to visitors in an organized and annotated way. To customize the solution for both picture delivery and information contribution for our target group, we designed prototypes and evaluated them with people from Bad Harzburg. This helped us gain valuable insights into our system’s usability and future challenges as well as requirements.
Our web application is already being used daily by our project partner. During the project, we still came up with numerous ideas for additional features to further support the exchange of knowledge.
N2  - Bilder können dabei helfen, die Vergangenheit greifbar zu machen und Erinnerungen zu bewahren, doch alleinstehende Bilder ohne Kontext erreichen das nur schwer. Der große Wert besteht in den Geschichten, die mit den Bildern verbunden sind. Je älter die Bilder jedoch werden, desto weniger Zeitzeugen können von diesen Geschichten berichten.
Besonders für große analoge Bildarchive, bei denen sich das Wissen und die Erinnerungen auf viele Personen verteilen, entstehen dadurch verschiedene Herausforderungen: Zunächst müssen die Bilder digitalisiert werden, um sie vor dem Zerfall zu schützen und um sie der Öffentlichkeit zugänglich machen zu können. Da eine einfache Aufreihung aller Bilder unübersichtlich ist, sollten die Bilder in eine zugängliche Struktur gebracht werden. Des Weiteren müssen zu den Bildern bekannte Informationen, aus denen ihre Geschichten erfahrbar werden, hinzugefügt werden. Nutzende sollen die Möglichkeit haben, eigenes Wissen und Erinnerungen beizutragen. Um dies für alle Interessierten, auch für ältere, evtl. wenig technikaffine Personen, nutzbar zu machen, sollte die Oberfläche eine intuitive und fehlertolerante Nutzung ermöglichen.
Die sich daraus ergebenden Anforderungen werden von keiner existierenden Softwarelösung im Gesamten abgedeckt, ohne die intuitive Oberfläche oder die Skalierbarkeit des Systems zu verlieren.

Daher haben wir im Rahmen eines Bachelorprojekts in Zusammenarbeit mit der Bad Harzburg-Stiftung ein eigenes digitales Bildarchiv entwickelt. Für die Umsetzung dieser Webapplikation nutzen wir das UI-Framework React im Frontend, welches über eine GraphQL-Schnittstelle mit dem Content Management System Strapi im Backend kommuniziert. Die Nutzung dieses Systems ermöglicht unserem Projektpartner einen effizienten Prozess vom Scannen der analogen Bilder bis zum geordneten und annotierten Darstellen für Besuchende. Um die Lösung sowohl für das Bereitstellen der Bilder als auch für das Beitragen von Informationen auf unsere Zielgruppe zuzuschneiden, haben wir Prototypen entworfen und mit Menschen aus Bad Harzburg getestet, um ihre Eindrücke auszuwerten. Mit diesen konnten wir wertvolle Erkenntnisse über die Nutzbarkeit und noch offene Herausforderungen und Anforderungen gewinnen.
Unsere Webanwendung ist bei unserem Projektpartner bereits im täglichen Einsatz. Trotzdem haben wir während des Projekts noch zahlreiche Ideen für zusätzliche Funktionen erarbeitet, um den Wissensaustausch weiter zu fördern.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 149 
KW  - digital picture archive
KW  - analog-to-digital conversion
KW  - user-generated content
KW  - intuitive interfaces
KW  - digitales Bildarchiv
KW  - Analog-zu-Digital-Konvertierung
KW  - benutzergenerierte Inhalte
KW  - intuitive Benutzeroberflächen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-560291
SN  - 978-3-86956-545-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 149
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Freitas da Cruz, Harry
A1  - Pfahringer, Boris
A1  - Martensen, Tom
A1  - Schneider, Frederic
A1  - Meyer, Alexander
A1  - Böttinger, Erwin
A1  - Schapranow, Matthieu-Patrick
T1  - Using interpretability approaches to update "black-box" clinical prediction models
BT  - an external validation study in nephrology
JF  - Artificial intelligence in medicine : AIM
N2  - Despite advances in machine learning-based clinical prediction models, only a few such models are actually deployed in clinical contexts. Among other reasons, this is due to a lack of validation studies. In this paper, we present and discuss the validation results of a machine learning model for the prediction of acute kidney injury in cardiac surgery patients initially developed on the MIMIC-III dataset when applied to an external cohort of an American research hospital. To help account for the performance differences observed, we utilized interpretability methods based on feature importance, which allowed experts to scrutinize model behavior both at the global and local level, making it possible to gain further insights into why it did not behave as expected on the validation cohort. The knowledge gleaned upon derivation can be potentially useful to assist model update during validation for more generalizable and simpler models. We argue that interpretability methods should be considered by practitioners as a further tool to help explain performance differences and inform model update in validation studies.
KW  - Clinical predictive modeling
KW  - Nephrology
KW  - Validation
KW  - Interpretability
KW  - methods
Y1  - 2021
U6  - https://doi.org/10.1016/j.artmed.2020.101982
SN  - 0933-3657
SN  - 1873-2860
VL  - 111
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - BOOK
A1  - Flotterer, Boris
A1  - Maximova, Maria
A1  - Schneider, Sven
A1  - Dyck, Johannes
A1  - Zöllner, Christian
A1  - Giese, Holger
A1  - Hély, Christelle
A1  - Gaucherel, Cédric
T1  - Modeling and Formal Analysis of Meta-Ecosystems with Dynamic Structure using Graph Transformation
N2  - The dynamics of ecosystems is of crucial importance. Various model-based approaches exist to understand and analyze their internal effects. In this paper, we model the space structure dynamics and ecological dynamics of meta-ecosystems using the formal technique of Graph Transformation (short GT). We build GT models to describe how a meta-ecosystem (modeled as a graph) can evolve over time (modeled by GT rules) and to analyze these GT models with respect to qualitative properties such as the existence of structural stabilities. As a case study, we build three GT models describing the space structure dynamics and ecological dynamics of three different savanna meta-ecosystems. The first GT model considers a savanna meta-ecosystem that is limited in space to two ecosystem patches, whereas the other two GT models consider two savanna meta-ecosystems that are unlimited in the number of ecosystem patches and only differ in one GT rule describing how the space structure of the meta-ecosystem grows. In the first two GT models, the space structure dynamics and ecological dynamics of the meta-ecosystem shows two main structural stabilities: the first one based on grassland-savanna-woodland transitions and the second one based on grassland-desert transitions. The transition between these two structural stabilities is driven by high-intensity fires affecting the tree components. In the third GT model, the GT rule for savanna regeneration induces desertification and therefore a collapse of the meta-ecosystem. We believe that GT models provide a complementary avenue to that of existing approaches to rigorously study ecological phenomena.
N2  - Die Dynamik von Ökosystemen ist von entscheidender Bedeutung. Es gibt verschiedene modellbasierte Ansätze, um ihre internen Effekte zu verstehen und zu analysieren. In diesem Beitrag modellieren wir die Raumstrukturdynamik und ökologische Dynamik von Metaökosystemen mit der formalen Technik der Graphtransformation (kurz GT). Wir bauen GT-Modelle, um zu beschreiben, wie sich ein Meta-Ökosystem (modelliert als Graph) im Laufe der Zeit entwickeln kann (modelliert durch GT-Regeln) und analysieren diese GT-Modelle hinsichtlich qualitativer Eigenschaften wie das Vorhandensein struktureller Stabilitäten. Als Fallstudie bauen wir drei GT-Modelle, die die Dynamik der Raumstruktur und die ökologische Dynamik von drei verschiedenen Savannen-Meta-Ökosystemen beschreiben. Das erste GT-Modell betrachtet ein Savannen-Meta-Ökosystem, das räumlich auf zwei Ökosystem-Abschnitte begrenzt ist, während die anderen beiden GT-Modelle zwei Savannen-Meta-Ökosysteme betrachten, die in der Anzahl von Ökosystem-Abschnitten uneingeschränkt sind und sich nur in einer GT-Regel unterscheiden, die beschreibt, wie die Raumstruktur des Meta-Ökosystems wächst. In den ersten beiden GT-Modellen zeigen die Raumstrukturdynamik und die ökologische Dynamik des Metaökosystems zwei Hauptstrukturstabilitäten: die erste basiert auf Grasland-Savannen-Wald-Übergängen und die zweite basiert auf Grasland-Wüsten-Übergängen. Der Übergang zwischen diesen beiden strukturellen Stabilitäten wird durch hochintensive Brände angetrieben, die die Baumkomponenten beeinträchtigen. Beim dritten GT-Modell führt die Savannenregeneration beschreibende GT-Regel zur Wüstenbildung und damit zum Kollaps des Meta-Ökosystems. Wir glauben, dass GT-Modelle eine gute Ergänzung zu bestehenden Ansätzen darstellen, um ökologische Phänomene rigoros zu untersuchen.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 147 
KW  - dynamic systems
KW  - discrete-event model
KW  - qualitative model
KW  - savanna
KW  - trajectories
KW  - desertification
KW  - dynamische Systeme
KW  - diskretes Ereignismodell
KW  - qualitatives Modell
KW  - Savanne
KW  - Trajektorien
KW  - Wüstenbildung
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-547643
SN  - 978-3-86956-533-0
SN  - 1613-5652
SN  - 2191-1665
IS  - 147
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Elsaid, Mohamed Esameldin Mohamed
T1  - Virtual machines live migration cost modeling and prediction
T1  - Modellierung und Vorhersage der Live-Migrationskosten für Virtuelle Maschinen
N2  - Dynamic resource management is an essential requirement for private and public cloud computing environments. With dynamic resource management, the assignment of physical resources to the cloud's virtual resources depends on the actual needs of the applications or the running services, which improves the utilization of the cloud's physical resources and reduces the cost of the offered services. In addition, virtual resources can be moved across different physical resources in the cloud environment without an obvious impact on the running applications or on service production. This means that the availability of the running services and applications in the cloud is independent of failures of the hardware resources, including servers, switches, and storage. This increases the reliability of cloud services compared to classical data-center environments.
In this thesis, we briefly discuss dynamic resource management and then focus in depth on live migration as the definition of dynamic management of compute resources. Live migration is a commonly used and essential feature in cloud and virtual data-center environments. Cloud computing load balancing, power saving, and fault tolerance features all depend on live migration to optimize the usage of virtual and physical resources. As we discuss in this thesis, live migration offers many benefits to cloud and virtual data-center environments; however, its cost cannot be ignored. Live migration cost includes the migration time, downtime, network overhead, increased power consumption, and CPU overhead.
IT admins run live migrations of virtual machines without any idea of the migration cost. As a result, resource bottlenecks, higher migration costs, and migration failures might occur. The first problem that we discuss in this thesis is how to model the cost of virtual machine live migration. Secondly, we investigate how machine learning techniques can help cloud admins obtain an estimate of this cost before initiating the migration of one or multiple virtual machines. We also discuss the optimal timing for live-migrating a specific virtual machine to another server. Finally, we propose practical solutions that cloud admins can integrate with cloud administration portals to answer the research questions raised above.
Our research methodology to achieve the project objectives is to propose empirical models based on VMware test-beds with different benchmark tools. We then use machine learning techniques to propose a prediction approach for the cost of virtual machine live migration. This thesis also proposes timing optimization for live migration, based on the cost prediction and on a prediction of the data-center network utilization. Live migration with persistent memory clusters is discussed at the end of the thesis. The cost prediction and timing optimization techniques proposed in this thesis could be practically integrated with the VMware vSphere cluster portal so that IT admins can use the cost prediction feature and the timing optimization option before proceeding with a virtual machine live migration.
Testing results show that our proposed approach for VM live migration cost prediction achieves acceptable results, with less than 20% prediction error, and can be easily implemented and integrated with VMware vSphere as an example of a commonly used resource management portal for virtual data centers and private cloud environments. The results also show that our proposed timing optimization technique for VM migration could save up to 51% of the migration time for memory-intensive workloads and up to 27% of the migration time for network-intensive workloads. This timing optimization technique can help network admins save migration time by utilizing a higher network rate, with a higher probability of success.
At the end of this thesis, we discuss persistent memory technology as a new trend in server memory technology. Persistent memory modes of operation and configurations are discussed in detail to explain how live migration works between servers with different memory configurations. Then, we build a VMware cluster with servers containing persistent memory as well as DRAM-only servers to show the difference in live migration cost between VMs with DRAM only and VMs with persistent memory.
N2  - Die dynamische Ressourcenverwaltung ist eine wesentliche Voraussetzung für private und öffentliche Cloud-Computing-Umgebungen. Bei der dynamischen Ressourcenverwaltung hängt die Zuweisung der physischen Ressourcen zu den virtuellen Cloud-Ressourcen vom tatsächlichen Bedarf der Anwendungen oder der laufenden Dienste ab, was die Auslastung der physischen Cloud-Ressourcen verbessert und die Kosten für die angebotenen Dienste reduziert. Darüber hinaus können die virtuellen Ressourcen über verschiedene physische Ressourcen in der Cloud-Umgebung verschoben werden, ohne dass dies einen offensichtlichen Einfluss auf die laufenden Anwendungen oder die Produktion der Dienste hat. Das bedeutet, dass die Verfügbarkeit der laufenden Dienste und Anwendungen in der Cloud unabhängig von den Hardwareressourcen einschließlich der Server, Netzwerke und Speicherausfälle ist. Dies erhöht die Zuverlässigkeit bei der Nutzung von Cloud-Diensten im Vergleich zu klassischen Rechenzentrumsumgebungen.
In dieser Arbeit wird das Thema der dynamischen Ressourcenverwaltung kurz erörtert, um sich dann eingehend mit der Live-Migration als Definition der dynamischen Verwaltung von Compute-Ressourcen zu beschäftigen. Live-Migration ist eine häufig verwendete und wesentliche Funktion in Cloud- und virtuellen Rechenzentrumsumgebungen. Cloud-Computing-Lastausgleich, Energiespar- und Fehlertoleranzfunktionen sind alle von der Live-Migration abhängig, um die Nutzung der virtuellen und physischen Ressourcen zu optimieren. Wie wir in dieser Arbeit erörtern werden, zeigt die Live-Migration viele Vorteile für Cloud- und virtuelle Rechenzentrumsumgebungen, jedoch können die Kosten der Live-Migration nicht ignoriert werden. Zu den Kosten der Live-Migration gehören die Migrationszeit, die Ausfallzeit, der Netzwerk-Overhead, der Anstieg des Stromverbrauchs und der CPU-Overhead.
IT-Administratoren führen Live-Migrationen von virtuellen Maschinen durch, ohne eine Vorstellung von den Migrationskosten zu haben. So kann es zu Ressourcenengpässen, höheren Migrationskosten und Migrationsfehlern kommen. Das erste Problem, das wir in dieser Arbeit diskutieren, ist, wie man die Kosten der Live-Migration virtueller Maschinen modellieren kann. Zweitens untersuchen wir, wie maschinelle Lerntechniken eingesetzt werden können, um den Cloud-Administratoren zu helfen, eine Schätzung dieser Kosten zu erhalten, bevor die Migration für eine oder mehrere virtuelle Maschinen eingeleitet wird. Außerdem diskutieren wir das optimale Timing für eine bestimmte virtuelle Maschine vor der Live-Migration auf einen anderen Server. Schließlich schlagen wir praktische Lösungen vor, die von den Cloud-Admins verwendet werden können, um in die Cloud-Administrationsportale integriert zu werden, um die oben aufgeworfenen Forschungsfragen zu beantworten.
Unsere Forschungsmethodik zur Erreichung der Projektziele besteht darin, empirische Modelle vorzuschlagen, die auf der Verwendung von VMware-Testbeds mit verschiedenen Benchmark-Tools basieren. Dann nutzen wir die Techniken des maschinellen Lernens, um einen Vorhersageansatz für die Kosten der Live-Migration virtueller Maschinen vorzuschlagen. Die Timing-Optimierung für die Live-Migration wird ebenfalls in dieser Arbeit vorgeschlagen, basierend auf der Kostenvorhersage und der Vorhersage der Netzwerkauslastung des Rechenzentrums. Die Live-Migration mit Clustern mit persistentem Speicher wird ebenfalls am Ende der Arbeit diskutiert.
Die in dieser Arbeit vorgeschlagenen Techniken zur Kostenvorhersage und Timing-Optimierung könnten praktisch in das VMware vSphere-Cluster-Portal integriert werden, so dass die IT-Administratoren nun die Funktion zur Kostenvorhersage und die Option zur Timing-Optimierung nutzen können, bevor sie mit einer Live-Migration der virtuellen Maschine fortfahren.
Die Testergebnisse zeigen, dass unser vorgeschlagener Ansatz für die VMs-Live-Migrationskostenvorhersage akzeptable Ergebnisse mit weniger als 20% Fehler in der Vorhersagegenauigkeit zeigt und leicht implementiert und in VMware vSphere als Beispiel für ein häufig verwendetes Ressourcenmanagement-Portal für virtuelle Rechenzentren und private Cloud-Umgebungen integriert werden kann. Die Ergebnisse zeigen, dass mit der von uns vorgeschlagenen Technik zur Timing-Optimierung der VMs-Migration auch bis zu 51% der Migrationszeit für speicherintensive Workloads und bis zu 27% der Migrationszeit für netzwerkintensive Workloads eingespart werden können. Diese Timing-Optimierungstechnik kann für Netzwerkadministratoren nützlich sein, um Migrationszeit zu sparen und dabei eine höhere Netzwerkrate und eine höhere Erfolgswahrscheinlichkeit zu nutzen.
Am Ende dieser Arbeit wird die persistente Speichertechnologie als neuer Trend in der Server-Speichertechnologie diskutiert. Die Betriebsarten und Konfigurationen des persistenten Speichers werden im Detail besprochen, um zu erklären, wie die Live-Migration zwischen Servern mit unterschiedlichen Speicherkonfigurationen funktioniert. Dann bauen wir einen VMware-Cluster mit persistentem Speicher im Server und auch mit Servern nur mit DRAM auf, um den Kostenunterschied bei der Live-Migration zwischen den VMs mit nur DRAM und den VMs mit persistentem Speicher im Server zu zeigen.
KW  - virtual
KW  - cloud
KW  - computing
KW  - machines
KW  - live migration
KW  - machine learning
KW  - prediction
KW  - Wolke
KW  - Computing
KW  - Live-Migration
KW  - maschinelles Lernen
KW  - Maschinen
KW  - Vorhersage
KW  - virtuell
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-540013
ER  - 
TY  - BOOK
A1  - Eichenroth, Friedrich
A1  - Rein, Patrick
A1  - Hirschfeld, Robert
T1  - Fast packrat parsing in a live programming environment
BT  - improving left-recursion in parsing expression grammars
N2  - Language developers who design domain-specific languages or new language features need a way to make fast changes to language definitions. Those fast changes require immediate feedback. Also, it should be possible to parse the developed languages quickly to handle extensive sets of code.

Parsing expression grammars provide an easy-to-understand method for language definitions. Packrat parsing is a method to parse grammars of this kind, but this method is unable to handle left-recursion properly. Existing solutions either partially rewrite left-recursive rules and partly forbid them, or use complex extensions to packrat parsing that are hard to understand and cost-intensive. We investigated methods to make parsing as fast as possible, using easy-to-follow algorithms while not losing the ability to make fast changes to grammars.

We focused our efforts on two approaches.

One is to start from an existing technique for limited left-recursion rewriting and enhance it to work for general left-recursive grammars. The second approach is to design a grammar compilation process to find left-recursion before parsing, and in this way, reduce computational costs wherever possible and generate ready to use parser classes.

Rewriting parsing expression grammars is a task that, if done in a general way, unveils a large number of cases such that any rewriting algorithm surpasses the complexity of other left-recursive parsing algorithms. Lookahead operators introduce this complexity. However, most languages have only small portions that are left-recursive and, in virtually all cases, have no indirect or hidden left-recursion. This means that distinguishing the left-recursive parts of grammars from the non-left-recursive components holds great potential for improving existing parsers.

In this report, we list all the required steps for grammar rewriting to handle left-recursion, including grammar analysis, grammar rewriting itself, and syntax tree restructuring. We also describe the implementation of a parsing expression grammar framework in Squeak/Smalltalk and the possible interactions with the already existing parser Ohm/S. We quantitatively benchmarked this framework, focusing on parsing time and the ability to use it in a live programming context. Compared with Ohm, we achieved massive parsing time improvements while preserving the ability to use our parser as a live programming tool.

The work is essential because, for one, we outlined the difficulties and complexity that come with grammar rewriting. Also, we removed the existing limitations that came with left-recursion by eliminating them before parsing.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 135 
KW  - packrat parsing
KW  - parsing expression grammars
KW  - left recursion
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-491242
SN  - 978-3-86956-503-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 135
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - GEN
A1  - Ehrig, Hartmut
A1  - Golas, Ulrike
A1  - Habel, Annegret
A1  - Lambers, Leen
A1  - Orejas, Fernando
T1  - M-adhesive transformation systems with nested application conditions
BT  - Part 1: parallelism, concurrency and amalgamation
T2  - Postprints der Universität Potsdam : Digital Engineering Reihe
N2  - Nested application conditions generalise the well-known negative application conditions and are important for several application domains. In this paper, we present Local Church-Rosser, Parallelism, Concurrency and Amalgamation Theorems for rules with nested application conditions in the framework of M-adhesive categories, where M-adhesive categories are slightly more general than weak adhesive high-level replacement categories. Most of the proofs are based on the corresponding statements for rules without application conditions and two shift lemmas stating that nested application conditions can be shifted over morphisms and rules.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 1 
KW  - level-replacement systems
KW  - graph-transformations
KW  - distributed systems
KW  - synchronization
KW  - confluence
KW  - categories
KW  - programs
KW  - grammars
KW  - model
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-415651
IS  - 001
ER  - 
TY  - BOOK
A1  - Dürsch, Falco
A1  - Rein, Patrick
A1  - Mattis, Toni
A1  - Hirschfeld, Robert
T1  - Learning from failure
BT  - a history-based, lightweight test prioritization technique connecting software changes to test failures
N2  - Regression testing is a widespread practice in today's software industry to ensure software product quality. Developers derive a set of test cases, and execute them frequently to ensure that their change did not adversely affect existing functionality. As the software product and its test suite grow, the time to feedback during regression test sessions increases, and impedes programmer productivity: developers wait longer for tests to complete, and delays in fault detection render fault removal increasingly difficult.

Test case prioritization addresses the problem of long feedback loops by reordering test cases, such that test cases of high failure probability run first, and test case failures become actionable early in the testing process. We ask, given test execution schedules reconstructed from publicly available data, to what extent can their fault detection efficiency be improved, and which technique yields the most efficient test schedules with respect to APFD?

To this end, we recover 6200 regression test sessions from the build log files of Travis CI, a popular continuous integration service, and gather 62000 accompanying changelists. We evaluate the efficiency of current test schedules, and examine the prioritization results of state-of-the-art lightweight, history-based heuristics. We propose and evaluate a novel set of prioritization algorithms, which connect software changes and test failures in a matrix-like data structure.

Our studies indicate that the optimization potential is substantial, because the existing test plans score only 30% APFD. The predictive power of past test failures proves to be outstanding: simple heuristics, such as repeating tests with failures in recent sessions, result in efficiency scores of 95% APFD. The best-performing matrix-based heuristic achieves a similar score of 92.5% APFD. In contrast to prior approaches, we argue that matrix-based techniques are useful beyond the scope of effective prioritization, and enable a number of use cases involving software maintenance.

We validate our findings from continuous integration processes by extending a continuous testing tool within development environments with means of test prioritization, and pose further research questions. We think that our findings are suited to propel adoption of (continuous) testing practices, and that programmers' toolboxes should contain test prioritization as an essential productivity tool.
N2  - Regressionstests sind in der heutigen Softwareindustrie weit verbreitete Praxis um die Qualität eines Softwareprodukts abzusichern. Dabei leiten Entwickler von den gestellten Anforderungen Testfälle ab und führen diese wiederholt aus, um sicherzustellen, dass ihre Änderungen die bereits existierende Funktionalität nicht negativ beeinträchtigen. Steigt die Größe und Komplexität der Software und ihrer Testsuite, so wird die Feedbackschleife der Testausführungen länger, und mindert die Produktivität der Entwickler: Sie warten länger auf das Testergebnis, und die Fehlerbehebung gestaltet sich umso schwieriger, je länger die Ursache zurückliegt.

Um die Feedbackschleife zu verkürzen, ändern Testpriorisierungs-Algorithmen die Reihenfolge der Testfälle, sodass Testfälle, die mit hoher Wahrscheinlichkeit fehlschlagen, zuerst ausgeführt werden. Der vorliegende Bericht beschäftigt sich mit der Frage nach der Effizienz von Testplänen, welche aus öffentlich einsehbaren Daten rekonstruierbar sind, und welche anwendbaren Priorisierungs-Techniken die effizienteste Testreihenfolge in Bezug auf APFD hervorbringen.

Zu diesem Zweck werden 6200 Testsitzungen aus den Logdateien von Travis CI, einem oft verwendeten Dienst für Continuous Integration, und über 62000 Änderungslisten rekonstruiert. Auf dieser Grundlage wird die Effizienz der derzeitigen Testpläne bewertet, als auch solcher, die aus der Neupriorisierung durch leichtgewichtige, verlaufsbasierte Algorithmen hervorgehen. Zudem schlägt der vorliegende Bericht eine neue Gruppe von Ansätzen vor, die Testfehlschläge und Softwareänderungen mit Hilfe einer Matrix in Bezug setzt.

Da die beobachteten Testreihenfolgen nur 30% APFD erzielen, liegt wesentliches Potential für Optimierung vor. Dabei besticht die Vorhersagekraft der unmittelbar vorangegangen Testfehlschläge: einfache Heuristiken, wie das Wiederholen von Tests, welche kürzlich fehlgeschlagen sind, führen zu Testplänen mit einer Effizienz von 95% APFD. Matrix-basierte Ansätze erreichen eine Fehlererkennungsrate von bis zu 92.5% APFD. Im Gegensatz zu den bisher bekannten Ansätzen sind die matrix-basierten Techniken auch über den Zweck der Testpriorisierung hinaus nützlich, und sind in der Softwarewartung anwendbar.

Zusätzlich werden die Ergebnisse der vorliegenden Studie für Continuous Integration Systeme im Kontext integrierter Entwicklungsumgebungen validiert, indem ein Tool für Continuous Testing um Testpriorisierung erweitert wird. Dies führt zu neuen Forschungsfragen. Die Untersuchungsergebnisse sind geeignet die Einführung von Continuous Testing zu befördern, und untermauern, dass Werkzeuge der Testpriorisierung für produktive Softwareentwicklung essenziell sind.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 145 
KW  - test case prioritization
KW  - continuous integration
KW  - regression testing
KW  - version control
KW  - live programming
KW  - heuristics
KW  - data set
KW  - test results
KW  - GitHub
KW  - Java
KW  - Testpriorisierungs
KW  - kontinuierliche Integration
KW  - Regressionstests
KW  - Versionsverwaltung
KW  - Live-Programmierung
KW  - Heuristiken
KW  - Datensatz
KW  - Testergebnisse
KW  - GitHub
KW  - Java
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-537554
SN  - 978-3-86956-528-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 145
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Draisbach, Uwe
T1  - Efficient duplicate detection and the impact of transitivity
T1  - Effiziente Dublettenerkennung und der Einfluss von Transitivität
N2  - Duplicate detection describes the process of finding multiple representations of the same real-world entity in the absence of a unique identifier, and has many application areas, such as customer relationship management, genealogy and social sciences, or online shopping. Due to the increasing amount of data in recent years, the problem has become even more challenging on the one hand, but has led to a renaissance in duplicate detection research on the other hand.
This thesis examines the effects and opportunities of transitive relationships on the duplicate detection process. Transitivity implies that if record pairs ⟨ri,rj⟩ and ⟨rj,rk⟩ are classified as duplicates, then record pair ⟨ri,rk⟩ also has to be a duplicate. However, this reasoning might contradict the pairwise classification, which is usually based on the similarity of objects. An essential property of similarity, in contrast to equivalence, is that similarity is not necessarily transitive.
First, we experimentally evaluate the effect of an increasing data volume on the threshold selection to classify whether a record pair is a duplicate or non-duplicate. Our experiments show that, independently of the pair selection algorithm and the similarity measure used, selecting a suitable threshold becomes more difficult with an increasing number of records due to an increased probability of adding a false duplicate to an existing cluster. Thus, the best threshold changes with the dataset size, and a good threshold for a small (possibly sampled) dataset is not necessarily a good threshold for a larger (possibly complete) dataset. As data grows over time, earlier selected thresholds are no longer a suitable choice, and the problem becomes worse for datasets with larger clusters.
Second, we present two alternatives to the standard Sorted Neighborhood Method (SNM) for the selection of candidate record pairs: the Duplicate Count Strategy (DCS) and its enhancement DCS++. DCS adapts SNM's window size based on the number of detected duplicates, and DCS++ uses transitive dependencies to save complex comparisons for finding duplicates in larger clusters. We prove that with a proper (domain- and data-independent!) threshold, DCS++ is more efficient than SNM without loss of effectiveness.
Third, we tackle the problem of contradicting pairwise classifications. Usually, the transitive closure is used for pairwise classifications to obtain a transitively closed result set. However, the transitive closure disregards negative classifications. We present three new and several existing clustering algorithms and experimentally evaluate them on various datasets and under various algorithm configurations. The results show that the commonly used transitive closure is inferior to most other clustering algorithms, especially for the precision of results. In scenarios with larger clusters, our proposed EMCC algorithm is, together with Markov Clustering, the best performing clustering approach for duplicate detection, although its runtime is longer than Markov Clustering due to the subexponential time complexity. EMCC especially outperforms Markov Clustering regarding the precision of the results and additionally has the advantage that it can also be used in scenarios where edge weights are not available.
N2  - Dubletten sind mehrere Repräsentationen derselben Entität in einem Datenbestand. Diese zu identifizieren ist das Ziel der Dublettenerkennung, wobei in der Regel Paare von Datensätzen anhand von Ähnlichkeitsmaßen miteinander verglichen und unter Verwendung eines Schwellwerts als Dublette oder Nicht-Dublette klassifiziert werden. Für Dublettenerkennung existieren verschiedene Anwendungsbereiche, beispielsweise im Kundenbeziehungsmanagement, beim Onlineshopping, der Genealogie und in den Sozialwissenschaften. Der in den letzten Jahren zu beobachtende Anstieg des gespeicherten Datenvolumens erschwert die Dublettenerkennung, da die Anzahl der benötigten Vergleiche quadratisch mit der Anzahl der Datensätze wächst. Durch Verwendung eines geeigneten Paarauswahl-Algorithmus kann die Anzahl der zu vergleichenden Paare jedoch reduziert und somit die Effizienz gesteigert werden.
Die Dissertation untersucht die Auswirkungen und Möglichkeiten transitiver Beziehungen auf den Dublettenerkennungsprozess. Durch Transitivität lässt sich beispielsweise ableiten, dass aufgrund einer Klassifikation der Datensatzpaare ⟨ri,rj⟩ und ⟨rj,rk⟩ als Dublette auch die Datensätze ⟨ri,rk⟩ eine Dublette sind. Dies kann jedoch im Widerspruch zu einer paarweisen Klassifizierung stehen, denn im Unterschied zur Äquivalenz ist die Ähnlichkeit von Objekten nicht notwendigerweise transitiv.
Im ersten Teil der Dissertation wird die Auswirkung einer steigenden Datenmenge auf die Wahl des Schwellwerts zur Klassifikation von Datensatzpaaren als Dublette oder Nicht-Dublette untersucht. Die Experimente zeigen, dass unabhängig von dem gewählten Paarauswahl-Algorithmus und des gewählten Ähnlichkeitsmaßes die Wahl eines geeigneten Schwellwerts mit steigender Datensatzanzahl schwieriger wird, da die Gefahr fehlerhafter Cluster-Zuordnungen steigt. Der optimale Schwellwert eines Datensatzes variiert mit dessen Größe. So ist ein guter Schwellwert für einen kleinen Datensatz (oder eine Stichprobe) nicht notwendigerweise ein guter Schwellwert für einen größeren (ggf. vollständigen) Datensatz. Steigt die Datensatzgröße im Lauf der Zeit an, so muss ein einmal gewählter Schwellwert ggf. nachjustiert werden. Aufgrund der Transitivität ist dies insbesondere bei Datensätzen mit größeren Clustern relevant.
Der zweite Teil der Dissertation beschäftigt sich mit Algorithmen zur Auswahl geeigneter Datensatz-Paare für die Klassifikation. Basierend auf der Sorted Neighborhood Method (SNM) werden mit der Duplicate Count Strategy (DCS) und ihrer Erweiterung DCS++ zwei neue Algorithmen vorgestellt. DCS adaptiert die Fenstergröße in Abhängigkeit der Anzahl gefundener Dubletten und DCS++ verwendet zudem die transitive Abhängigkeit, um kostspielige Vergleiche einzusparen und trotzdem größere Cluster von Dubletten zu identifizieren. Weiterhin wird bewiesen, dass mit einem geeigneten Schwellwert DCS++ ohne Einbußen bei der Effektivität effizienter als die Sorted Neighborhood Method ist.
Der dritte und letzte Teil der Arbeit beschäftigt sich mit dem Problem widersprüchlicher paarweiser Klassifikationen. In vielen Anwendungsfällen wird die Transitive Hülle zur Erzeugung konsistenter Cluster verwendet, wobei hierbei paarweise Klassifikationen als Nicht-Dublette missachtet werden. Es werden drei neue und mehrere existierende Cluster-Algorithmen vorgestellt und experimentell mit verschiedenen Datensätzen und Konfigurationen evaluiert. Die Ergebnisse zeigen, dass die Transitive Hülle den meisten anderen Clustering-Algorithmen insbesondere bei der Precision, definiert als Anteil echter Dubletten an der Gesamtzahl klassifizierter Dubletten, unterlegen ist. In Anwendungsfällen mit größeren Clustern ist der vorgeschlagene EMCC-Algorithmus trotz seiner subexponentiellen Laufzeit zusammen mit dem Markov-Clustering der beste Clustering-Ansatz für die Dublettenerkennung. EMCC übertrifft Markov Clustering insbesondere hinsichtlich der Precision der Ergebnisse und hat zusätzlich den Vorteil, dass dieser auch ohne Ähnlichkeitswerte eingesetzt werden kann.
KW  - Datenqualität
KW  - Datenintegration
KW  - Dubletten
KW  - Duplikaterkennung
KW  - data quality
KW  - data integration
KW  - duplicate detection
KW  - deduplication
KW  - entity resolution
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-572140
ER  - 
TY  - THES
A1  - Doskoč, Vanja
T1  - Mapping restrictions in behaviourally correct learning
N2  - In this thesis, we investigate language learning in the formalisation of Gold [Gol67]. Here, a learner, being successively presented all information of a target language, conjectures which language it believes to be shown. Once these hypotheses converge syntactically to a correct explanation of the target language, the learning is considered successful. Fittingly, this is termed explanatory learning. To model learning strategies, we impose restrictions on the hypotheses made, for example requiring the conjectures to follow a monotonic behaviour. This way, we can study the impact a certain restriction has on learning. 

Recently, the literature shifted towards map charting. Here, various seemingly unrelated restrictions are contrasted, unveiling interesting relations between them. The results are then depicted in maps. For explanatory learning, the literature already provides maps of common restrictions for various forms of data presentation.

In the case of behaviourally correct learning, where the learners are required to converge semantically instead of syntactically, the same restrictions as in explanatory learning have been investigated. However, a similarly complete picture regarding their interaction has not been presented yet. 

In this thesis, we transfer the map charting approach to behaviourally correct learning. In particular, we complete the partial results from the literature for many well-studied restrictions and provide full maps for behaviourally correct learning with different types of data presentation. We also study properties of learners deemed important in the literature. We are interested in whether learners are consistent, that is, whether their conjectures include the data they are built on. While learners cannot be assumed consistent in explanatory learning, the opposite is the case in behaviourally correct learning. Even further, it is known that learners following different restrictions may be assumed consistent. We contribute to the literature by showing that this is the case for all studied restrictions. 

We also investigate mathematically interesting properties of learners. In particular, we are interested in whether learning under a given restriction may be done with strongly Bc-locking learners. Such learners are of particular value as they allow simulation arguments to be applied when, for example, comparing two learning paradigms to each other. The literature gives a rich ground on when learners may be assumed strongly Bc-locking, which we complete for all studied restrictions.
N2  - In dieser Arbeit untersuchen wir das Sprachenlernen in der Formalisierung von Gold [Gol67]. Dabei stellt ein Lerner, dem nacheinander die volle Information einer Zielsprache präsentiert wird, Vermutungen darüber auf, welche Sprache er glaubt, präsentiert zu bekommen. Sobald diese Hypothesen syntaktisch zu einer korrekten Erklärung der Zielsprache konvergieren, wird das Lernen als erfolgreich angesehen. Dies wird passenderweise als erklärendes Lernen bezeichnet. Um Lernstrategien zu modellieren, werden den aufgestellten Hypothesen Einschränkungen auferlegt, zum Beispiel, dass die Vermutungen einem monotonen Verhalten folgen müssen. Auf diese Weise können wir untersuchen, welche Auswirkungen eine bestimmte Einschränkung auf das Lernen hat. 

In letzter Zeit hat sich die Literatur in Richtung Kartographie verlagert. Hier werden verschiedene, scheinbar nicht zusammenhängende Restriktionen einander gegenübergestellt, wodurch interessante Beziehungen zwischen ihnen aufgedeckt werden. Die Ergebnisse werden dann in so genannten Karten dargestellt. Für das erklärende Lernen gibt es in der Literatur bereits Karten geläufiger Einschränkungen für verschiedene Formen der Datenpräsentation.

Im Falle des verhaltenskorrekten Lernens, bei dem die Lerner nicht syntaktisch, sondern semantisch konvergieren sollen, wurden die gleichen Einschränkungen wie beim erklärenden Lernen untersucht. Ein ähnlich vollständiges Bild hinsichtlich ihrer Interaktion wurde jedoch noch nicht präsentiert. 

In dieser Arbeit übertragen wir den Kartographie-Ansatz auf das verhaltenskorrekte Lernen. Insbesondere vervollständigen wir die Teilergebnisse aus der Literatur für viele gut untersuchte Restriktionen und liefern Karten für verhaltenskorrektes Lernen mit verschiedenen Arten der Datenpräsentation. Wir untersuchen auch Eigenschaften von Lernern, die in der Literatur als wichtig eingestuft werden. Uns interessiert, ob die Lerner konsistent sind, das heißt ob ihre Vermutungen die Daten einschließen, auf denen sie aufgebaut sind. Während man beim erklärenden Lernen nicht davon ausgehen kann, dass die Lerner konsistent sind, ist beim verhaltenskorrekten Lernen das Gegenteil der Fall. Es ist sogar bekannt, dass Lerner, die verschiedenen Einschränkungen folgen, als konsistent angenommen werden können. Wir tragen zur Literatur bei, indem wir zeigen, dass dies für alle untersuchten Restriktionen der Fall ist. 

Wir untersuchen auch mathematisch interessante Eigenschaften von Lernern. Insbesondere interessiert uns, ob das Lernen unter einer gegebenen Restriktion mit stark Bc-sperrenden Lernern durchgeführt werden kann. Solche Lerner sind von besonderem Wert, da sie es erlauben, Simulationsargumente anzuwenden, wenn man zum Beispiel zwei Lernparadigmen miteinander vergleicht. Die Literatur bietet eine reichhaltige Grundlage dafür, wann Lerner als stark Bc-sperrend angenommen werden können, die wir auf alle untersuchten Einschränkungen erweitern.
KW  - language learning in the limit
KW  - behaviourally correct learning
KW  - maps
KW  - consistent learning
KW  - strongly behaviourally correct locking
KW  - verhaltenskorrektes Lernen
KW  - konsistentes Lernen
KW  - Sprachlernen im Limes
KW  - Karten
KW  - stark verhaltenskorrekt sperrend
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-593110
ER  - 
TY  - JOUR
A1  - Doerr, Benjamin
A1  - Kötzing, Timo
T1  - Multiplicative Up-Drift
JF  - Algorithmica
N2  - Drift analysis aims at translating the expected progress of an evolutionary algorithm (or more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analysis of evolutionary algorithms, but only in situations where the progress is constant or becomes weaker when approaching the target. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting a (1+delta)-multiplicative growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a simple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible near-linear dependence on 1/delta (the previous results had an at least near-quadratic dependence), and it only requires a population size near-linear in delta (this was super-quadratic in previous results). These improvements immediately lead to stronger run time guarantees for a number of applications. We also discuss the case of large delta and show stronger results for this setting.
KW  - drift theory
KW  - evolutionary computation
KW  - stochastic process
Y1  - 2020
U6  - https://doi.org/10.1007/s00453-020-00775-7
SN  - 0178-4617
SN  - 1432-0541
VL  - 83
IS  - 10
SP  - 3017
EP  - 3058
PB  - Springer
CY  - New York
ER  -