TY  - THES
A1  - Zuo, Zhe
T1  - From unstructured to structured: Context-based named entity mining from text
T1  - Von unstrukturiert zu strukturiert: Kontextbasierte Gewinnung benannter Entitäten von Text
N2  - With recent advances in the area of information extraction, automatically extracting structured information from a vast amount of unstructured textual data becomes an important task, which is infeasible for humans to capture all information manually. Named entities (e.g., persons, organizations, and locations), which are crucial components in texts, are usually the subjects of structured information from textual documents. Therefore, the task of named entity mining receives much attention. It consists of three major subtasks, which are named entity recognition, named entity linking, and relation extraction.

These three tasks build up an entire pipeline of a named entity mining system, where each of them has its challenges and can be employed for further applications. As a fundamental task in the natural language processing domain, studies on named entity recognition have a long history, and many existing approaches produce reliable results. The task is aiming to extract mentions of named entities in text and identify their types. Named entity linking recently received much attention with the development of knowledge bases that contain rich information about entities. The goal is to disambiguate mentions of named entities and to link them to the corresponding entries in a knowledge base. Relation extraction, as the final step of named entity mining, is a highly challenging task, which is to extract semantic relations between named entities, e.g., the ownership relation between two companies.

In this thesis, we review the state-of-the-art of named entity mining domain in detail, including valuable features, techniques, evaluation methodologies, and so on. Furthermore, we present two of our approaches that focus on the named entity linking and relation extraction tasks separately. 

To solve the named entity linking task, we propose the entity linking technique, BEL, which operates on a textual range of relevant terms and aggregates decisions from an ensemble of simple classifiers. Each of the classifiers operates on a randomly sampled subset of the above range. In extensive experiments on hand-labeled and benchmark datasets, our approach outperformed state-of-the-art entity linking techniques, both in terms of quality and efficiency. 

For the task of relation extraction, we focus on extracting a specific group of difficult relation types, business relations between companies. These relations can be used to gain valuable insight into the interactions between companies and perform complex analytics, such as predicting risk or valuating companies. Our semi-supervised strategy can extract business relations between companies based on only a few user-provided seed company pairs. By doing so, we also provide a solution for the problem of determining the direction of asymmetric relations, such as the ownership_of relation. We improve the reliability of the extraction process by using a holistic pattern identification method, which classifies the generated extraction patterns. Our experiments show that we can accurately and reliably extract new entity pairs occurring in the target relation by using as few as five labeled seed pairs.
N2  - Mit den jüngsten Fortschritten in den Gebieten der Informationsextraktion wird die automatisierte Extrahierung strukturierter Informationen aus einer unüberschaubaren Menge unstrukturierter Textdaten eine wichtige Aufgabe, deren manuelle Ausführung unzumutbar ist. Benannte Entitäten (z.B. Personen, Organisationen oder Orte), essentielle Bestandteile in Texten, sind normalerweise der Gegenstand strukturierter Informationen aus Textdokumenten. Daher erhält die Aufgabe der Gewinnung benannter Entitäten viel Aufmerksamkeit. Sie besteht aus drei großen Unteraufgaben, nämlich Erkennung benannter Entitäten, Verbindung benannter Entitäten und Extraktion von Beziehungen.

Diese drei Aufgaben zusammen sind der Grundprozess eines Systems zur Gewinnung benannter Entitäten, wobei jede ihre eigene Herausforderung hat und für weitere Anwendungen eingesetzt werden kann. Als ein fundamentaler Aspekt in der Verarbeitung natürlicher Sprache haben Studien zur Erkennung benannter Entitäten eine lange Geschichte, und viele bestehenden Ansätze erbringen verlässliche Ergebnisse. Die Aufgabe zielt darauf ab, Nennungen benannter Entitäten zu extrahieren und ihre Typen zu bestimmen. Verbindung benannter Entitäten hat in letzter Zeit durch die Entwicklung von Wissensdatenbanken, welche reiche Informationen über Entitäten enthalten, viel Aufmerksamkeit erhalten. Das Ziel ist es, Nennungen benannter Entitäten zu unterscheiden und diese mit dazugehörigen Einträgen in einer Wissensdatenbank zu verknüpfen. Der letzte Schritt der Gewinnung benannter Entitäten, die Extraktion von Beziehungen, ist eine stark anspruchsvolle Aufgabe, nämlich die Extraktion semantischer Beziehungen zwischen Entitäten, z.B. die Eigentümerschaft zwischen zwei Firmen.

In dieser Doktorarbeit arbeiten wir den aktuellen Stand der Wissenschaft in den Domäne der Gewinnung benannter Entitäten auf, unter anderem wertvolle Eigenschaften und Evaluationsmethoden. Darüberhinaus präsentieren wir zwei Ansätze von uns, die jeweils ihren Fokus auf die Verbindung benannter Entitäten sowie der Aufgaben der Extraktion von Beziehungen legen.

Um die Aufgabe der Verbindung benannter Entitäten zu lösen schlagen wir hier die Verbindungstechnik BEL vor, welche auf einer textuellen Bandbreite relevanter Begriffe agiert und Entscheidungen einer Kombination von einfacher Klassifizierer aggregiert. Jeder dieser Klassifizierer arbeitet auf einer zufällig ausgewählten Teilmenge der obigen Bandbreite. In umfangreichen Experimenten mit handannotierten sowie Vergleichsdatensätzen hat unser Ansatz andere Lösungen zur Verbindung benannter Entitäten, die auf dem Stand der aktuellen Technik beruhen, sowie in Bezug auf Qualität als auch Effizienz geschlagen.

Für die Aufgabe der Extraktion von Beziehungen fokussieren wir uns auf eine bestimmte Gruppe schwieriger Beziehungstypen, nämlich die Geschäftsbeziehungen zwischen Firmen. Diese Beziehungen können benutzt werden, um wertvolle Erkenntnisse in das Zusammenspiel von Firmen zu gelangen und komplexe Analysen ausführen, beispielsweise die Risikovorhersage oder Bewertung von Firmen. Unsere teilbeaufsichtigte Strategie kann Geschäftsbeziehungen zwischen Firmen anhand nur weniger nutzergegebener Startwerte von Firmenpaaren extrahieren. Dadurch bieten wir auch eine Lösung für das Problem der Richtungserkennung asymmetrischer Beziehungen, beispielsweise der Eigentumsbeziehung. Wir verbessern die Verlässlichkeit des Extraktionsprozesses, indem wir holistische Musteridentifikationsmethoden verwenden, welche die erstellten Extraktionsmuster klassifizieren. Unsere Experimente zeigen, dass wir neue Entitätenpaare akkurat und verlässlich in der Zielbeziehung mit bereits fünf bezeichneten Startpaaren extrahieren können.
KW  - named entity mining
KW  - information extraction
KW  - natural language processing
KW  - Gewinnung benannter Entitäten
KW  - Informationsextraktion
KW  - maschinelle Verarbeitung natürlicher Sprache
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-412576
ER  - 
TY  - JOUR
A1  - Ziegler, Joceline
A1  - Pfitzner, Bjarne
A1  - Schulz, Heinrich
A1  - Saalbach, Axel
A1  - Arnrich, Bert
T1  - Defending against Reconstruction Attacks through Differentially Private Federated Learning for Classification of Heterogeneous Chest X-ray Data
JF  - Sensors
N2  - Privacy regulations and the physical distribution of heterogeneous data are often primary concerns for the development of deep learning models in a medical context. This paper evaluates the feasibility of differentially private federated learning for chest X-ray classification as a defense against data privacy attacks. To the best of our knowledge, we are the first to directly compare the impact of differentially private training on two different neural network architectures, DenseNet121 and ResNet50. Extending the federated learning environments previously analyzed in terms of privacy, we simulated a heterogeneous and imbalanced federated setting by distributing images from the public CheXpert and Mendeley chest X-ray datasets unevenly among 36 clients. Both non-private baseline models achieved an area under the receiver operating characteristic curve (AUC) of 0.94 on the binary classification task of detecting the presence of a medical finding. We demonstrate that both model architectures are vulnerable to privacy violation by applying image reconstruction attacks to local model updates from individual clients. The attack was particularly successful during later training stages. To mitigate the risk of a privacy breach, we integrated Rényi differential privacy with a Gaussian noise mechanism into local model training. We evaluate model performance and attack vulnerability for privacy budgets ε ∈ {1, 3, 6, 10}. The DenseNet121 achieved the best utility-privacy trade-off with an AUC of 0.94 for ε = 6. Model performance deteriorated slightly for individual clients compared to the non-private baseline. The ResNet50 only reached an AUC of 0.76 in the same privacy setting. Its performance was inferior to that of the DenseNet121 for all considered privacy constraints, suggesting that the DenseNet121 architecture is more robust to differentially private training.
KW  - federated learning
KW  - privacy and security
KW  - privacy attack
KW  - X-ray
Y1  - 2022
U6  - https://doi.org/10.3390/s22145195
SN  - 1424-8220
VL  - 22
PB  - MDPI
CY  - Basel, Schweiz
ET  - 14
ER  - 
TY  - GEN
A1  - Ziegler, Joceline
A1  - Pfitzner, Bjarne
A1  - Schulz, Heinrich
A1  - Saalbach, Axel
A1  - Arnrich, Bert
T1  - Defending against Reconstruction Attacks through Differentially Private Federated Learning for Classification of Heterogeneous Chest X-ray Data
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - Privacy regulations and the physical distribution of heterogeneous data are often primary concerns for the development of deep learning models in a medical context. This paper evaluates the feasibility of differentially private federated learning for chest X-ray classification as a defense against data privacy attacks. To the best of our knowledge, we are the first to directly compare the impact of differentially private training on two different neural network architectures, DenseNet121 and ResNet50. Extending the federated learning environments previously analyzed in terms of privacy, we simulated a heterogeneous and imbalanced federated setting by distributing images from the public CheXpert and Mendeley chest X-ray datasets unevenly among 36 clients. Both non-private baseline models achieved an area under the receiver operating characteristic curve (AUC) of 0.94 on the binary classification task of detecting the presence of a medical finding. We demonstrate that both model architectures are vulnerable to privacy violation by applying image reconstruction attacks to local model updates from individual clients. The attack was particularly successful during later training stages. To mitigate the risk of a privacy breach, we integrated Rényi differential privacy with a Gaussian noise mechanism into local model training. We evaluate model performance and attack vulnerability for privacy budgets ε ∈ {1, 3, 6, 10}. The DenseNet121 achieved the best utility-privacy trade-off with an AUC of 0.94 for ε = 6. Model performance deteriorated slightly for individual clients compared to the non-private baseline. The ResNet50 only reached an AUC of 0.76 in the same privacy setting. Its performance was inferior to that of the DenseNet121 for all considered privacy constraints, suggesting that the DenseNet121 architecture is more robust to differentially private training.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 14 
KW  - federated learning
KW  - privacy and security
KW  - privacy attack
KW  - X-ray
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-581322
IS  - 14
ER  - 
TY  - THES
A1  - Zieger, Tobias
T1  - Self-adaptive data quality
BT  - automating duplicate detection
N2  - Carrying out business processes successfully is closely linked to the quality of the data inventory in an organization. Lacks in data quality lead to problems: Incorrect address data prevents (timely) shipments to customers. Erroneous orders lead to returns and thus to unnecessary effort. Wrong pricing forces companies to miss out on revenues or to impair customer satisfaction. If orders or customer records cannot be retrieved, complaint management takes longer. Due to erroneous inventories, too few or too much supplies might be reordered.

A special problem with data quality and the reason for many of the issues mentioned above are duplicates in databases. Duplicates are different representations of same real-world objects in a dataset. However, these representations differ from each other and are for that reason hard to match by a computer. Moreover, the number of required comparisons to find those duplicates grows with the square of the dataset size. To cleanse the data, these duplicates must be detected and removed. Duplicate detection is a very laborious process. To achieve satisfactory results, appropriate software must be created and configured (similarity measures, partitioning keys, thresholds, etc.). Both requires much manual effort and experience.

This thesis addresses automation of parameter selection for duplicate detection and presents several novel approaches that eliminate the need for human experience in parts of the duplicate detection process.

A pre-processing step is introduced that analyzes the datasets in question and classifies their attributes semantically. Not only do these annotations help understanding the respective datasets, but they also facilitate subsequent steps, for example, by selecting appropriate similarity measures or normalizing the data upfront. This approach works without schema information.

Following that, we show a partitioning technique that strongly reduces the number of pair comparisons for the duplicate detection process. The approach automatically finds particularly suitable partitioning keys that simultaneously allow for effective and efficient duplicate retrieval. By means of a user study, we demonstrate that this technique finds partitioning keys that outperform expert suggestions and additionally does not need manual configuration. Furthermore, this approach can be applied independently of the attribute types.

To measure the success of a duplicate detection process and to execute the described partitioning approach, a gold standard is required that provides information about the actual duplicates in a training dataset. This thesis presents a technique that uses existing duplicate detection results and crowdsourcing to create a near gold standard that can be used for the purposes above. Another part of the thesis describes and evaluates strategies how to reduce these crowdsourcing costs and to achieve a consensus with less effort.
N2  - Die erfolgreiche Ausführung von Geschäftsprozessen ist eng an die Datenqualität der Datenbestände in einer Organisation geknüpft. Bestehen Mängel in der Datenqualität, kann es zu Problemen kommen: Unkorrekte Adressdaten verhindern, dass Kunden (rechtzeitig) beliefert werden. Fehlerhafte Bestellungen führen zu Reklamationen und somit zu unnötigem Aufwand. Falsche Preisauszeichnungen zwingen Unternehmen, auf Einnahmen zu verzichten oder gefährden die Kundenzufriedenheit. Können Bestellungen oder Kundendaten nicht gefunden werden, verlängert sich die Abarbeitung von Beschwerden. Durch fehlerhafte Inventarisierung wird zu wenig oder zu viel Nachschub bestellt.

Ein spezielles Datenqualitätsproblem und der Grund für viele der genannten Datenqualitätsprobleme sind Duplikate in Datenbanken. Duplikate sind verschiedene Repräsentationen derselben Realweltobjekte im Datenbestand. Allerdings unterscheiden sich diese Repräsentationen voneinander und sind so für den Computer nur schwer als zusammengehörig zu erkennen. Außerdem wächst die Anzahl der zur Aufdeckung der Duplikate benötigten Vergleiche quadratisch mit der Datensatzgröße. Zum Zwecke der Datenreinigung müssen diese Duplikate erkannt und beseitigt werden. Diese Duplikaterkennung ist ein sehr aufwändiger Prozess. Um gute Ergebnisse zu erzielen, ist die Erstellung von entsprechender Software und das Konfigurieren vieler Parameter (Ähnlichkeitsmaße, Partitionierungsschlüssel, Schwellwerte usw.) nötig. Beides erfordert viel manuellen Aufwand und Erfahrung.

Diese Dissertation befasst sich mit dem Automatisieren der Parameterwahl für die Duplikaterkennung und stellt verschiedene neuartige Verfahren vor, durch die Teile des Duplikaterkennungsprozesses ohne menschliche Erfahrung gestaltet werden können.

Es wird ein Vorverarbeitungsschritt vorgestellt, der die betreffenden Datensätze analysiert und deren Attribute automatisch semantisch klassifiziert. Durch diese Annotationen wird nicht nur das Verständnis des Datensatzes verbessert, sondern es werden darüber hinaus die folgenden Schritte erleichtert, zum Beispiel können so geeignete Ähnlichkeitsmaße ausgewählt oder die Daten normalisiert werden. Dabei kommt der Ansatz ohne Schemainformationen aus.

Anschließend wird ein Partitionierungsverfahren gezeigt, das die Anzahl der für die Duplikaterkennung benötigten Vergleiche stark reduziert. Das Verfahren findet automatisch besonders geeignete Partitionierungsschlüssel, die eine gleichzeitig effektive und effiziente Duplikatsuche ermöglichen. Anhand einer Nutzerstudie wird gezeigt, dass die so gefundenen Partitionierungsschlüssel Expertenvorschlägen überlegen sind und zudem keine menschliche Konfiguration benötigen. Außerdem lässt sich das Verfahren unabhängig von den Attributtypen anwenden.

Zum Messen des Erfolges eines Duplikaterkennungsverfahrens und für das zuvor beschriebene Partitionierungsverfahren ist ein Goldstandard nötig, der Auskunft über die zu findenden Duplikate gibt. Die Dissertation stellt ein Verfahren vor, das anhand mehrerer vorhandener Duplikaterkennungsergebnisse und dem Einsatz von Crowdsourcing einen Nahezu-Goldstandard erzeugt, der für die beschriebenen Zwecke eingesetzt werden kann. Ein weiterer Teil der Arbeit beschreibt und evaluiert Strategien, wie die Kosten dieses Crowdsourcingeinsatzes reduziert werden können und mit geringerem Aufwand ein Konsens erreicht wird.
KW  - data quality
KW  - Datenqualität
KW  - Duplikaterkennung
KW  - duplicate detection
KW  - Machine Learning
KW  - Information Retrieval
KW  - Automatisierung
KW  - automation
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-410573
ER  - 
TY  - GEN
A1  - Zhou, Lin
A1  - Fischer, Eric
A1  - Tunca, Can
A1  - Brahms, Clemens Markus
A1  - Ersoy, Cem
A1  - Granacher, Urs
A1  - Arnrich, Bert
T1  - How We Found Our IMU
BT  - Guidelines to IMU Selection and a Comparison of Seven IMUs for Pervasive Healthcare Applications
T2  - Postprints der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - Inertial measurement units (IMUs) are commonly used for localization or movement tracking in pervasive healthcare-related studies, and gait analysis is one of the most often studied topics using IMUs. The increasing variety of commercially available IMU devices offers convenience by combining the sensor modalities and simplifies the data collection procedures. However, selecting the most suitable IMU device for a certain use case is increasingly challenging. In this study, guidelines for IMU selection are proposed. In particular, seven IMUs were compared in terms of their specifications, data collection procedures, and raw data quality. Data collected from the IMUs were then analyzed by a gait analysis algorithm. The difference in accuracy of the calculated gait parameters between the IMUs could be used to retrace the issues in raw data, such as acceleration range or sensor calibration. Based on our algorithm, we were able to identify the best-suited IMUs for our needs. This study provides an overview of how to select the IMUs based on the area of study with concrete examples, and gives insights into the features of seven commercial IMUs using real data.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 2 
KW  - inertial measurement unit
KW  - pervasive healthcare
KW  - gait analysis
KW  - comparison of devices
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-481628
IS  - 2
ER  - 
TY  - JOUR
A1  - Zhou, Lin
A1  - Fischer, Eric
A1  - Tunca, Can
A1  - Brahms, Clemens Markus
A1  - Ersoy, Cem
A1  - Granacher, Urs
A1  - Arnrich, Bert
T1  - How We Found Our IMU
BT  - Guidelines to IMU Selection and a Comparison of Seven IMUs for Pervasive Healthcare Applications
JF  - Sensors
N2  - Inertial measurement units (IMUs) are commonly used for localization or movement tracking in pervasive healthcare-related studies, and gait analysis is one of the most often studied topics using IMUs. The increasing variety of commercially available IMU devices offers convenience by combining the sensor modalities and simplifies the data collection procedures. However, selecting the most suitable IMU device for a certain use case is increasingly challenging. In this study, guidelines for IMU selection are proposed. In particular, seven IMUs were compared in terms of their specifications, data collection procedures, and raw data quality. Data collected from the IMUs were then analyzed by a gait analysis algorithm. The difference in accuracy of the calculated gait parameters between the IMUs could be used to retrace the issues in raw data, such as acceleration range or sensor calibration. Based on our algorithm, we were able to identify the best-suited IMUs for our needs. This study provides an overview of how to select the IMUs based on the area of study with concrete examples, and gives insights into the features of seven commercial IMUs using real data.
KW  - inertial measurement unit
KW  - pervasive healthcare
KW  - gait analysis
KW  - comparison of devices
Y1  - 2020
U6  - https://doi.org/10.3390/s20154090
SN  - 1424-8220
VL  - 20
IS  - 15
PB  - MDPI
CY  - Basel
ER  - 
TY  - BOOK
A1  - Zhang, Shuhao
A1  - Plauth, Max
A1  - Eberhardt, Felix
A1  - Polze, Andreas
A1  - Lehmann, Jens
A1  - Sejdiu, Gezim
A1  - Jabeen, Hajira
A1  - Servadei, Lorenzo
A1  - Möstl, Christian
A1  - Bär, Florian
A1  - Netzeband, André
A1  - Schmidt, Rainer
A1  - Knigge, Marlene
A1  - Hecht, Sonja
A1  - Prifti, Loina
A1  - Krcmar, Helmut
A1  - Sapegin, Andrey
A1  - Jaeger, David
A1  - Cheng, Feng
A1  - Meinel, Christoph
A1  - Friedrich, Tobias
A1  - Rothenberger, Ralf
A1  - Sutton, Andrew M.
A1  - Sidorova, Julia A.
A1  - Lundberg, Lars
A1  - Rosander, Oliver
A1  - Sköld, Lars
A1  - Di Varano, Igor
A1  - van der Walt, Estée
A1  - Eloff, Jan H. P.
A1  - Fabian, Benjamin
A1  - Baumann, Annika
A1  - Ermakova, Tatiana
A1  - Kelkel, Stefan
A1  - Choudhary, Yash
A1  - Cooray, Thilini
A1  - Rodríguez, Jorge
A1  - Medina-Pérez, Miguel Angel
A1  - Trejo, Luis A.
A1  - Barrera-Animas, Ari Yair
A1  - Monroy-Borja, Raúl
A1  - López-Cuevas, Armando
A1  - Ramírez-Márquez, José Emmanuel
A1  - Grohmann, Maria
A1  - Niederleithinger, Ernst
A1  - Podapati, Sasidhar
A1  - Schmidt, Christopher
A1  - Huegle, Johannes
A1  - de Oliveira, Roberto C. L.
A1  - Soares, Fábio Mendes
A1  - van Hoorn, André
A1  - Neumer, Tamas
A1  - Willnecker, Felix
A1  - Wilhelm, Mathias
A1  - Kuster, Bernhard
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Beins, Karsten
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Rödszus, Kurt
ED  - Müller, Jürgen
T1  - HPI Future SOC Lab – Proceedings 2017
N2  - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
  The HPI Future SOC Lab provides researchers with free of charge access to a complete infrastructure of state of the art hard and software. This infrastructure includes components, which might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from but not limited to the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
  This technical report presents results of research projects executed in 2017. Selected projects have presented their results on April 25th and November 15th 2017 at the Future SOC Lab Day events.
N2  - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie.
  Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. 
  In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2017 vorgestellt.  Ausgewählte Projekte stellten ihre Ergebnisse am 25. April und 15. November 2017 im Rahmen der Future SOC Lab Tag Veranstaltungen vor.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 130 
KW  - Future SOC Lab
KW  - research projects
KW  - multicore architectures
KW  - In-Memory technology
KW  - cloud computing
KW  - machine learning
KW  - artificial intelligence
KW  - Future SOC Lab
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - Künstliche Intelligenz
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-433100
SN  - 978-3-86956-475-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 130
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - GEN
A1  - Zenner, Alexander M.
A1  - Böttinger, Erwin
A1  - Konigorski, Stefan
T1  - StudyMe
BT  - a new mobile app for user-centric N-of-1 trials
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - N-of-1 trials are multi-crossover self-experiments that allow individuals to systematically evaluate the effect of interventions on their personal health goals. Although several tools for N-of-1 trials exist, there is a gap in supporting non-experts in conducting their own user-centric trials. In this study, we present StudyMe, an open-source mobile application that is freely available from https://play.google.com/store/apps/details?id=health.studyu.me and offers users flexibility and guidance in configuring every component of their trials. We also present research that informed the development of StudyMe, focusing on trial creation. Through an initial survey with 272 participants, we learned that individuals are interested in a variety of personal health aspects and have unique ideas on how to improve them. In an iterative, user-centered development process with intermediate user tests, we developed StudyMe that features an educational part to communicate N-of-1 trial concepts. A final empirical evaluation of StudyMe showed that all participants were able to create their own trials successfully using StudyMe and the app achieved a very good usability rating. Our findings suggest that StudyMe provides a significant step towards enabling individuals to apply a systematic science-oriented approach to personalize health-related interventions and behavior modifications in their everyday lives.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 18 
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-589763
IS  - 18
ER  - 
TY  - JOUR
A1  - Zenner, Alexander M.
A1  - Böttinger, Erwin
A1  - Konigorski, Stefan
T1  - StudyMe
BT  - a new mobile app for user-centric N-of-1 trials
JF  - Trials
N2  - N-of-1 trials are multi-crossover self-experiments that allow individuals to systematically evaluate the effect of interventions on their personal health goals. Although several tools for N-of-1 trials exist, there is a gap in supporting non-experts in conducting their own user-centric trials. In this study, we present StudyMe, an open-source mobile application that is freely available from https://play.google.com/store/apps/details?id=health.studyu.me and offers users flexibility and guidance in configuring every component of their trials. We also present research that informed the development of StudyMe, focusing on trial creation. Through an initial survey with 272 participants, we learned that individuals are interested in a variety of personal health aspects and have unique ideas on how to improve them. In an iterative, user-centered development process with intermediate user tests, we developed StudyMe that features an educational part to communicate N-of-1 trial concepts. A final empirical evaluation of StudyMe showed that all participants were able to create their own trials successfully using StudyMe and the app achieved a very good usability rating. Our findings suggest that StudyMe provides a significant step towards enabling individuals to apply a systematic science-oriented approach to personalize health-related interventions and behavior modifications in their everyday lives.
Y1  - 2022
U6  - https://doi.org/10.1186/s13063-022-06893-7
SN  - 1745-6215
VL  - 23
PB  - BioMed Central
CY  - London
ER  - 
TY  - JOUR
A1  - Yousfi, Alaaeddine
A1  - Hewelt, Marcin
A1  - Bauer, Christine
A1  - Weske, Mathias
T1  - Toward uBPMN-Based patterns for modeling ubiquitous business processes
JF  - IEEE Transactions on Industrial Informatics
N2  - Ubiquitous business processes are the new generation of processes that pervade the physical space and interact with their environments using a minimum of human involvement. Although they are now widely deployed in the industry, their deployment is still ad hoc. They are implemented after an arbitrary modeling phase or no modeling phase at all. The absence of a solid modeling phase backing up the implementation generates many loopholes that are stressed in the literature. Here, we tackle the issue of modeling ubiquitous business processes. We propose patterns to represent the recent ubiquitous computing features. These patterns are the outcome of an analysis we conducted in the field of human-computer interaction to examine how the features are actually deployed. The patterns' understandability, ease-of-use, usefulness, and completeness are examined via a user experiment. The results indicate that these four indexes are on the positive track. Hence, the patterns may be the backbone of ubiquitous business process modeling in industrial applications.
KW  - Ubiquitous business process
KW  - ubiquitous business process model and notation (uBPMN)
KW  - ubiquitous business process modeling
KW  - ubiquitous computing (ubicomp)
Y1  - 2017
U6  - https://doi.org/10.1109/TII.2017.2777847
SN  - 1551-3203
SN  - 1941-0050
VL  - 14
IS  - 8
SP  - 3358
EP  - 3367
PB  - Inst. of Electr. and Electronics Engineers
CY  - Piscataway
ER  - 
TY  - JOUR
A1  - Yousfi, Alaaeddine
A1  - Batoulis, Kimon
A1  - Weske, Mathias
T1  - Achieving Business Process Improvement via Ubiquitous Decision-Aware Business Processes
JF  - ACM Transactions on Internet Technology
N2  - Business process improvement is an endless challenge for many organizations. As long as there is a process, it must be improved. Nowadays, improvement initiatives are driven by professionals. This is no longer practical because people cannot perceive the enormous data of current business environments. Here, we introduce ubiquitous decision-aware business processes. They pervade the physical space, analyze the ever-changing environments, and make decisions accordingly. We explain how they can be built and used for improvement. Our approach can be a valuable improvement option to alleviate the workload of participants by helping focus on the crucial rather than the menial tasks.
KW  - Business process improvement
KW  - ubiquitous decision-aware business process
KW  - ubiquitous decisions
KW  - context
KW  - uBPMN
KW  - DMN
Y1  - 2019
U6  - https://doi.org/10.1145/3298986
SN  - 1533-5399
SN  - 1557-6051
VL  - 19
IS  - 1
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - THES
A1  - Yang, Haojin
T1  - Deep representation learning for multimedia data analysis
Y1  - 2019
ER  - 
TY  - JOUR
A1  - Wuttke, Matthias
A1  - Li, Yong
A1  - Li, Man
A1  - Sieber, Karsten B.
A1  - Feitosa, Mary F.
A1  - Gorski, Mathias
A1  - Tin, Adrienne
A1  - Wang, Lihua
A1  - Chu, Audrey Y.
A1  - Hoppmann, Anselm
A1  - Kirsten, Holger
A1  - Giri, Ayush
A1  - Chai, Jin-Fang
A1  - Sveinbjornsson, Gardar
A1  - Tayo, Bamidele O.
A1  - Nutile, Teresa
A1  - Fuchsberger, Christian
A1  - Marten, Jonathan
A1  - Cocca, Massimiliano
A1  - Ghasemi, Sahar
A1  - Xu, Yizhe
A1  - Horn, Katrin
A1  - Noce, Damia
A1  - Van der Most, Peter J.
A1  - Sedaghat, Sanaz
A1  - Yu, Zhi
A1  - Akiyama, Masato
A1  - Afaq, Saima
A1  - Ahluwalia, Tarunveer Singh
A1  - Almgren, Peter
A1  - Amin, Najaf
A1  - Arnlov, Johan
A1  - Bakker, Stephan J. L.
A1  - Bansal, Nisha
A1  - Baptista, Daniela
A1  - Bergmann, Sven
A1  - Biggs, Mary L.
A1  - Biino, Ginevra
A1  - Boehnke, Michael
A1  - Boerwinkle, Eric
A1  - Boissel, Mathilde
A1  - Böttinger, Erwin
A1  - Boutin, Thibaud S.
A1  - Brenner, Hermann
A1  - Brumat, Marco
A1  - Burkhardt, Ralph
A1  - Butterworth, Adam S.
A1  - Campana, Eric
A1  - Campbell, Archie
A1  - Campbell, Harry
A1  - Canouil, Mickael
A1  - Carroll, Robert J.
A1  - Catamo, Eulalia
A1  - Chambers, John C.
A1  - Chee, Miao-Ling
A1  - Chee, Miao-Li
A1  - Chen, Xu
A1  - Cheng, Ching-Yu
A1  - Cheng, Yurong
A1  - Christensen, Kaare
A1  - Cifkova, Renata
A1  - Ciullo, Marina
A1  - Concas, Maria Pina
A1  - Cook, James P.
A1  - Coresh, Josef
A1  - Corre, Tanguy
A1  - Sala, Cinzia Felicita
A1  - Cusi, Daniele
A1  - Danesh, John
A1  - Daw, E. Warwick
A1  - De Borst, Martin H.
A1  - De Grandi, Alessandro
A1  - De Mutsert, Renee
A1  - De Vries, Aiko P. J.
A1  - Degenhardt, Frauke
A1  - Delgado, Graciela
A1  - Demirkan, Ayse
A1  - Di Angelantonio, Emanuele
A1  - Dittrich, Katalin
A1  - Divers, Jasmin
A1  - Dorajoo, Rajkumar
A1  - Eckardt, Kai-Uwe
A1  - Ehret, Georg
A1  - Elliott, Paul
A1  - Endlich, Karlhans
A1  - Evans, Michele K.
A1  - Felix, Janine F.
A1  - Foo, Valencia Hui Xian
A1  - Franco, Oscar H.
A1  - Franke, Andre
A1  - Freedman, Barry I.
A1  - Freitag-Wolf, Sandra
A1  - Friedlander, Yechiel
A1  - Froguel, Philippe
A1  - Gansevoort, Ron T.
A1  - Gao, He
A1  - Gasparini, Paolo
A1  - Gaziano, J. Michael
A1  - Giedraitis, Vilmantas
A1  - Gieger, Christian
A1  - Girotto, Giorgia
A1  - Giulianini, Franco
A1  - Gogele, Martin
A1  - Gordon, Scott D.
A1  - Gudbjartsson, Daniel F.
A1  - Gudnason, Vilmundur
A1  - Haller, Toomas
A1  - Hamet, Pavel
A1  - Harris, Tamara B.
A1  - Hartman, Catharina A.
A1  - Hayward, Caroline
A1  - Hellwege, Jacklyn N.
A1  - Heng, Chew-Kiat
A1  - Hicks, Andrew A.
A1  - Hofer, Edith
A1  - Huang, Wei
A1  - Hutri-Kahonen, Nina
A1  - Hwang, Shih-Jen
A1  - Ikram, M. Arfan
A1  - Indridason, Olafur S.
A1  - Ingelsson, Erik
A1  - Ising, Marcus
A1  - Jaddoe, Vincent W. V.
A1  - Jakobsdottir, Johanna
A1  - Jonas, Jost B.
A1  - Joshi, Peter K.
A1  - Josyula, Navya Shilpa
A1  - Jung, Bettina
A1  - Kahonen, Mika
A1  - Kamatani, Yoichiro
A1  - Kammerer, Candace M.
A1  - Kanai, Masahiro
A1  - Kastarinen, Mika
A1  - Kerr, Shona M.
A1  - Khor, Chiea-Chuen
A1  - Kiess, Wieland
A1  - Kleber, Marcus E.
A1  - Koenig, Wolfgang
A1  - Kooner, Jaspal S.
A1  - Korner, Antje
A1  - Kovacs, Peter
A1  - Kraja, Aldi T.
A1  - Krajcoviechova, Alena
A1  - Kramer, Holly
A1  - Kramer, Bernhard K.
A1  - Kronenberg, Florian
A1  - Kubo, Michiaki
A1  - Kuhnel, Brigitte
A1  - Kuokkanen, Mikko
A1  - Kuusisto, Johanna
A1  - La Bianca, Martina
A1  - Laakso, Markku
A1  - Lange, Leslie A.
A1  - Langefeld, Carl D.
A1  - Lee, Jeannette Jen-Mai
A1  - Lehne, Benjamin
A1  - Lehtimaki, Terho
A1  - Lieb, Wolfgang
A1  - Lim, Su-Chi
A1  - Lind, Lars
A1  - Lindgren, Cecilia M.
A1  - Liu, Jun
A1  - Liu, Jianjun
A1  - Loeffler, Markus
A1  - Loos, Ruth J. F.
A1  - Lucae, Susanne
A1  - Lukas, Mary Ann
A1  - Lyytikainen, Leo-Pekka
A1  - Magi, Reedik
A1  - Magnusson, Patrik K. E.
A1  - Mahajan, Anubha
A1  - Martin, Nicholas G.
A1  - Martins, Jade
A1  - Marz, Winfried
A1  - Mascalzoni, Deborah
A1  - Matsuda, Koichi
A1  - Meisinger, Christa
A1  - Meitinger, Thomas
A1  - Melander, Olle
A1  - Metspalu, Andres
A1  - Mikaelsdottir, Evgenia K.
A1  - Milaneschi, Yuri
A1  - Miliku, Kozeta
A1  - Mishra, Pashupati P.
A1  - VA Million Veteran Program
A1  - Mohlke, Karen L.
A1  - Mononen, Nina
A1  - Montgomery, Grant W.
A1  - Mook-Kanamori, Dennis O.
A1  - Mychaleckyj, Josyf C.
A1  - Nadkarni, Girish N.
A1  - Nalls, Mike A.
A1  - Nauck, Matthias
A1  - Nikus, Kjell
A1  - Ning, Boting
A1  - Nolte, Ilja M.
A1  - Noordam, Raymond
A1  - Olafsson, Isleifur
A1  - Oldehinkel, Albertine J.
A1  - Orho-Melander, Marju
A1  - Ouwehand, Willem H.
A1  - Padmanabhan, Sandosh
A1  - Palmer, Nicholette D.
A1  - Palsson, Runolfur
A1  - Penninx, Brenda W. J. H.
A1  - Perls, Thomas
A1  - Perola, Markus
A1  - Pirastu, Mario
A1  - Pirastu, Nicola
A1  - Pistis, Giorgio
A1  - Podgornaia, Anna I.
A1  - Polasek, Ozren
A1  - Ponte, Belen
A1  - Porteous, David J.
A1  - Poulain, Tanja
A1  - Pramstaller, Peter P.
A1  - Preuss, Michael H.
A1  - Prins, Bram P.
A1  - Province, Michael A.
A1  - Rabelink, Ton J.
A1  - Raffield, Laura M.
A1  - Raitakari, Olli T.
A1  - Reilly, Dermot F.
A1  - Rettig, Rainer
A1  - Rheinberger, Myriam
A1  - Rice, Kenneth M.
A1  - Ridker, Paul M.
A1  - Rivadeneira, Fernando
A1  - Rizzi, Federica
A1  - Roberts, David J.
A1  - Robino, Antonietta
A1  - Rossing, Peter
A1  - Rudan, Igor
A1  - Rueedi, Rico
A1  - Ruggiero, Daniela
A1  - Ryan, Kathleen A.
A1  - Saba, Yasaman
A1  - Sabanayagam, Charumathi
A1  - Salomaa, Veikko
A1  - Salvi, Erika
A1  - Saum, Kai-Uwe
A1  - Schmidt, Helena
A1  - Schmidt, Reinhold
A1  - Schottker, Ben
A1  - Schulz, Christina-Alexandra
A1  - Schupf, Nicole
A1  - Shaffer, Christian M.
A1  - Shi, Yuan
A1  - Smith, Albert V.
A1  - Smith, Blair H.
A1  - Soranzo, Nicole
A1  - Spracklen, Cassandra N.
A1  - Strauch, Konstantin
A1  - Stringham, Heather M.
A1  - Stumvoll, Michael
A1  - Svensson, Per O.
A1  - Szymczak, Silke
A1  - Tai, E-Shyong
A1  - Tajuddin, Salman M.
A1  - Tan, Nicholas Y. Q.
A1  - Taylor, Kent D.
A1  - Teren, Andrej
A1  - Tham, Yih-Chung
A1  - Thiery, Joachim
A1  - Thio, Chris H. L.
A1  - Thomsen, Hauke
A1  - Thorleifsson, Gudmar
A1  - Toniolo, Daniela
A1  - Tonjes, Anke
A1  - Tremblay, Johanne
A1  - Tzoulaki, Ioanna
A1  - Uitterlinden, Andre G.
A1  - Vaccargiu, Simona
A1  - Van Dam, Rob M.
A1  - Van der Harst, Pim
A1  - Van Duijn, Cornelia M.
A1  - Velez Edwards, Digna R.
A1  - Verweij, Niek
A1  - Vogelezang, Suzanne
A1  - Volker, Uwe
A1  - Vollenweider, Peter
A1  - Waeber, Gerard
A1  - Waldenberger, Melanie
A1  - Wallentin, Lars
A1  - Wang, Ya Xing
A1  - Wang, Chaolong
A1  - Waterworth, Dawn M.
A1  - Wei, Wen Bin
A1  - White, Harvey
A1  - Whitfield, John B.
A1  - Wild, Sarah H.
A1  - Wilson, James F.
A1  - Wojczynski, Mary K.
A1  - Wong, Charlene
A1  - Wong, Tien-Yin
A1  - Xu, Liang
A1  - Yang, Qiong
A1  - Yasuda, Masayuki
A1  - Yerges-Armstrong, Laura M.
A1  - Zhang, Weihua
A1  - Zonderman, Alan B.
A1  - Rotter, Jerome I.
A1  - Bochud, Murielle
A1  - Psaty, Bruce M.
A1  - Vitart, Veronique
A1  - Wilson, James G.
A1  - Dehghan, Abbas
A1  - Parsa, Afshin
A1  - Chasman, Daniel I.
A1  - Ho, Kevin
A1  - Morris, Andrew P.
A1  - Devuyst, Olivier
A1  - Akilesh, Shreeram
A1  - Pendergrass, Sarah A.
A1  - Sim, Xueling
A1  - Boger, Carsten A.
A1  - Okada, Yukinori
A1  - Edwards, Todd L.
A1  - Snieder, Harold
A1  - Stefansson, Kari
A1  - Hung, Adriana M.
A1  - Heid, Iris M.
A1  - Scholz, Markus
A1  - Teumer, Alexander
A1  - Kottgen, Anna
A1  - Pattaro, Cristian
T1  - A catalog of genetic loci associated with kidney function from analyses of a million individuals
JF  - Nature genetics
N2  - Chronic kidney disease (CKD) is responsible for a public health burden with multi-systemic complications. Through transancestry meta-analysis of genome-wide association studies of estimated glomerular filtration rate (eGFR) and independent replication (n = 1,046,070), we identified 264 associated loci (166 new). Of these, 147 were likely to be relevant for kidney function on the basis of associations with the alternative kidney function marker blood urea nitrogen (n = 416,178). Pathway and enrichment analyses, including mouse models with renal phenotypes, support the kidney as the main target organ. A genetic risk score for lower eGFR was associated with clinically diagnosed CKD in 452,264 independent individuals. Colocalization analyses of associations with eGFR among 783,978 European-ancestry individuals and gene expression across 46 human tissues, including tubulo-interstitial and glomerular kidney compartments, identified 17 genes differentially expressed in kidney. Fine-mapping highlighted missense driver variants in 11 genes and kidney-specific regulatory variants. These results provide a comprehensive priority list of molecular targets for translational research.
Y1  - 2019
U6  - https://doi.org/10.1038/s41588-019-0407-x
SN  - 1061-4036
SN  - 1546-1718
VL  - 51
IS  - 6
SP  - 957
EP  - +
PB  - Nature Publ. Group
CY  - New York
ER  - 
TY  - THES
A1  - Wolf, Johannes
T1  - Analysis and visualization of transport infrastructure based on large-scale geospatial mobile mapping data
T1  - Analyse und Visualisierung von Verkehrsinfrastruktur basierend auf großen Mobile-Mapping-Datensätzen
N2  - 3D point clouds are a universal and discrete digital representation of three-dimensional objects and environments. For geospatial applications, 3D point clouds have become a fundamental type of raw data acquired and generated using various methods and techniques. In particular, 3D point clouds serve as raw data for creating digital twins of the built environment.

This thesis concentrates on the research and development of concepts, methods, and techniques for preprocessing, semantically enriching, analyzing, and visualizing 3D point clouds for applications around transport infrastructure. It introduces a collection of preprocessing techniques that aim to harmonize raw 3D point cloud data, such as point density reduction and scan profile detection. Metrics such as local density, verticality, and planarity are calculated for later use. One of the key contributions tackles the problem of analyzing and deriving semantic information in 3D point clouds. Three different approaches are investigated: a geometric analysis, a machine learning approach operating on synthetically generated 2D images, and a machine learning approach operating on 3D point clouds without intermediate representation.

In the first application case, 2D image classification is applied and evaluated for mobile mapping data focusing on road networks to derive road marking vector data. The second application case investigates how 3D point clouds can be merged with ground-penetrating radar data for a combined visualization and to automatically identify atypical areas in the data. For example, the approach detects pavement regions with developing potholes. The third application case explores the combination of a 3D environment based on 3D point clouds with panoramic imagery to improve visual representation and the detection of 3D objects such as traffic signs.

The presented methods were implemented and tested based on software frameworks for 3D point clouds and 3D visualization. In particular, modules for metric computation, classification procedures, and visualization techniques were integrated into a modular pipeline-based C++ research framework for geospatial data processing, extended by Python machine learning scripts. All visualization and analysis techniques scale to large real-world datasets such as road networks of entire cities or railroad networks.

The thesis shows that some use cases allow taking advantage of established image vision methods to analyze images rendered from mobile mapping data efficiently. The two presented semantic classification methods working directly on 3D point clouds are use case independent and show similar overall accuracy when compared to each other. While the geometry-based method requires less computation time, the machine learning-based method supports arbitrary semantic classes but requires training the network with ground truth data. Both methods can be used in combination to gradually build this ground truth with manual corrections via a respective annotation tool.

This thesis contributes results for IT system engineering of applications, systems, and services that require spatial digital twins of transport infrastructure such as road networks and railroad networks based on 3D point clouds as raw data. It demonstrates the feasibility of fully automated data flows that map captured 3D point clouds to semantically classified models. This provides a key component for seamlessly integrated spatial digital twins in IT solutions that require up-to-date, object-based, and semantically enriched information about the built environment.
N2  - 3D-Punktwolken sind eine universelle und diskrete digitale Darstellung von dreidimensionalen Objekten und Umgebungen. Für raumbezogene Anwendungen sind 3D-Punktwolken zu einer grundlegenden Form von Rohdaten geworden, die mit verschiedenen Methoden und Techniken erfasst und erzeugt werden. Insbesondere dienen 3D-Punktwolken als Rohdaten für die Erstellung digitaler Zwillinge der bebauten Umwelt.

Diese Arbeit konzentriert sich auf die Erforschung und Entwicklung von Konzepten, Methoden und Techniken zur Vorverarbeitung, semantischen Anreicherung, Analyse und Visualisierung von 3D-Punktwolken für Anwendungen im Bereich der Verkehrsinfrastruktur. Es wird eine Sammlung von Vorverarbeitungstechniken vorgestellt, die auf die Harmonisierung von 3D-Punktwolken-Rohdaten abzielen, so z.B. die Reduzierung der Punktdichte und die Erkennung von Scanprofilen. Metriken wie bspw. die lokale Dichte, Vertikalität und Planarität werden zur späteren Verwendung berechnet. Einer der Hauptbeiträge befasst sich mit dem Problem der Analyse und Ableitung semantischer Informationen in 3D-Punktwolken. Es werden drei verschiedene Ansätze untersucht: Eine geometrische Analyse sowie zwei maschinelle Lernansätze, die auf synthetisch erzeugten 2D-Bildern, bzw. auf 3D-Punktwolken ohne Zwischenrepräsentation arbeiten.

Im ersten Anwendungsfall wird die 2D-Bildklassifikation für Mobile-Mapping-Daten mit Fokus auf Straßennetze angewendet und evaluiert, um Vektordaten für Straßenmarkierungen abzuleiten. Im zweiten Anwendungsfall wird untersucht, wie 3D-Punktwolken mit Bodenradardaten für eine kombinierte Visualisierung und automatische Identifikation atypischer Bereiche in den Daten zusammengeführt werden können. Der Ansatz erkennt zum Beispiel Fahrbahnbereiche mit entstehenden Schlaglöchern. Der dritte Anwendungsfall untersucht die Kombination einer 3D-Umgebung auf Basis von 3D-Punktwolken mit Panoramabildern, um die visuelle Darstellung und die Erkennung von 3D-Objekten wie Verkehrszeichen zu verbessern.

Die vorgestellten Methoden wurden auf Basis von Software-Frameworks für 3D-Punktwolken und 3D-Visualisierung implementiert und getestet. Insbesondere wurden Module für Metrikberechnungen, Klassifikationsverfahren und Visualisierungstechniken in ein modulares, pipelinebasiertes C++-Forschungsframework für die Geodatenverarbeitung integriert, das durch Python-Skripte für maschinelles Lernen erweitert wurde. Alle Visualisierungs- und Analysetechniken skalieren auf große reale Datensätze wie Straßennetze ganzer Städte oder Eisenbahnnetze.

Die Arbeit zeigt, dass es in einigen Anwendungsfällen möglich ist, die Vorteile etablierter Bildverarbeitungsmethoden zu nutzen, um aus Mobile-Mapping-Daten gerenderte Bilder effizient zu analysieren. Die beiden vorgestellten semantischen Klassifikationsverfahren, die direkt auf 3D-Punktwolken arbeiten, sind anwendungsfallunabhängig und zeigen im Vergleich zueinander eine ähnliche Gesamtgenauigkeit. Während die geometriebasierte Methode weniger Rechenzeit benötigt, unterstützt die auf maschinellem Lernen basierende Methode beliebige semantische Klassen, erfordert aber das Trainieren des Netzwerks mit Ground-Truth-Daten. Beide Methoden können in Kombination verwendet werden, um diese Ground Truth mit manuellen Korrekturen über ein entsprechendes Annotationstool schrittweise aufzubauen.

Diese Arbeit liefert Ergebnisse für das IT-System-Engineering von Anwendungen, Systemen und Diensten, die räumliche digitale Zwillinge von Verkehrsinfrastruktur wie Straßen- und Schienennetzen auf der Basis von 3D-Punktwolken als Rohdaten benötigen. Sie demonstriert die Machbarkeit von vollautomatisierten Datenflüssen, die erfasste 3D-Punktwolken auf semantisch klassifizierte Modelle abbilden. Dies stellt eine Schlüsselkomponente für nahtlos integrierte räumliche digitale Zwillinge in IT-Lösungen dar, die aktuelle, objektbasierte und semantisch angereicherte Informationen über die bebaute Umwelt benötigen.
KW  - 3D point cloud
KW  - geospatial data
KW  - mobile mapping
KW  - semantic classification
KW  - 3D visualization
KW  - 3D-Punktwolke
KW  - räumliche Geodaten
KW  - Mobile Mapping
KW  - semantische Klassifizierung
KW  - 3D-Visualisierung
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-536129
ER  - 
TY  - JOUR
A1  - Wittig, Alice
A1  - Miranda, Fabio Malcher
A1  - Hölzer, Martin
A1  - Altenburg, Tom
A1  - Bartoszewicz, Jakub Maciej
A1  - Beyvers, Sebastian
A1  - Dieckmann, Marius Alfred
A1  - Genske, Ulrich
A1  - Giese, Sven Hans-Joachim
A1  - Nowicka, Melania
A1  - Richard, Hugues
A1  - Schiebenhoefer, Henning
A1  - Schmachtenberg, Anna-Juliane
A1  - Sieben, Paul
A1  - Tang, Ming
A1  - Tembrockhaus, Julius
A1  - Renard, Bernhard Y.
A1  - Fuchs, Stephan
T1  - CovRadar
BT  - continuously tracking and filtering SARS-CoV-2 mutations for genomic surveillance
JF  - Bioinformatics
N2  - The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and to plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousands of sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast.
Y1  - 2022
U6  - https://doi.org/10.1093/bioinformatics/btac411
SN  - 1367-4803
SN  - 1367-4811
VL  - 38
IS  - 17
SP  - 4223
EP  - 4225
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - GEN
A1  - Welearegai, Gebrehiwet B.
A1  - Schlueter, Max
A1  - Hammer, Christian
T1  - Static security evaluation of an industrial web application
T2  - Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
N2  - JavaScript is the most popular programming language for web applications. Static analysis of JavaScript applications is highly challenging due to its dynamic language constructs and event-driven asynchronous executions, which also give rise to many security-related bugs. Several static analysis tools to detect such bugs exist; however, research has not yet reported much on the precision and scalability trade-off of these analyzers. As a further obstacle, JavaScript programs structured in Node.js modules need to be collected for analysis, but existing bundlers are either specific to their respective analysis tools or not particularly suitable for static analysis.
KW  - JavaScript
KW  - WALA
KW  - SAFE
KW  - comparison
Y1  - 2019
SN  - 978-1-4503-5933-7
U6  - https://doi.org/10.1145/3297280.3297471
SP  - 1952
EP  - 1961
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - BOOK
A1  - Weber, Benedikt
T1  - Human pose estimation for decubitus prophylaxis
T1  - Verwendung von Posenabschätzung zur Dekubitusprophylaxe
N2  - Decubitus is one of the most relevant diseases in nursing and the most expensive to treat. It is caused by sustained pressure on tissue, so it particularly affects bed-bound patients. This work lays a foundation for pressure mattress-based decubitus prophylaxis by implementing a solution to the single-frame 2D Human Pose Estimation problem.
For this, methods of Deep Learning are employed. Two approaches are examined, a coarse-to-fine Convolutional Neural Network for direct regression of joint coordinates and a U-Net for the derivation of probability distribution heatmaps.

We conclude that training our models on a combined dataset of the publicly available Bodies at Rest and SLP data yields the best results. Furthermore, various preprocessing techniques are investigated, and a hyperparameter optimization is performed to discover an improved model architecture.
Another finding indicates that the heatmap-based approach outperforms direct regression.
This model achieves a mean per-joint position error of 9.11 cm for the Bodies at Rest data and 7.43 cm for the SLP data.
We find that it generalizes well on data from mattresses other than those seen during training but has difficulties detecting the arms correctly.

Additionally, we give a brief overview of the medical data annotation tool annoto we developed in the bachelor project and furthermore conclude that the Scrum framework and agile practices enhanced our development workflow.
N2  - Dekubitus ist eine der relevantesten Krankheiten in der Krankenpflege und die kostspieligste in der Behandlung. Sie wird durch anhaltenden Druck auf Gewebe verursacht, betrifft also insbesondere bettlägerige Patienten. Diese Arbeit legt eine Grundlage für druckmatratzenbasierte Dekubitusprophylaxe, indem eine Lösung für das Einzelbild-2D-Posenabschätzungsproblem implementiert wird.
Dafür werden Methoden des tiefen Lernens verwendet. Zwei Ansätze, basierend auf einem Gefalteten Neuronalen grob-zu-fein Netzwerk zur direkten Regression der Gelenkkoordinaten und auf einem U-Netzwerk zur Ableitung von Wahrscheinlichkeitsverteilungsbildern, werden untersucht.

Wir schlussfolgern, dass das Training unserer Modelle auf einem kombinierten Datensatz, bestehend aus den frei verfügbaren Bodies at Rest und SLP Daten, die besten Ergebnisse liefert. Weiterhin werden diverse Vorverarbeitungsverfahren untersucht und eine Hyperparameteroptimierung zum Finden einer verbesserten Modellarchitektur durchgeführt.
Der wahrscheinlichkeitsverteilungsbasierte Ansatz übertrifft die direkte Regression.
Dieses Modell erreicht einen durchschnittlichen Pro-Gelenk-Positionsfehler von 9,11 cm auf den Bodies at Rest und von 7,43 cm auf den SLP Daten. Wir sehen, dass es gut auf Daten anderer als der im Training verwendeten Matratzen funktioniert, aber Schwierigkeiten mit der korrekten Erkennung der Arme hat. 

Weiterhin geben wir eine kurze Übersicht des medizinischen Datenannotationstools annoto, welches wir im Zusammenhang mit dem Bachelorprojekt entwickelt haben, und schlussfolgern außerdem, dass Scrum und agile Praktiken unseren Entwicklungsprozess verbessert haben.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 153 
KW  - machine learning
KW  - deep learning
KW  - convolutional neural networks
KW  - pose estimation
KW  - decubitus
KW  - telemedicine
KW  - maschinelles Lernen
KW  - tiefes Lernen
KW  - gefaltete neuronale Netze
KW  - Posenabschätzung
KW  - Dekubitus
KW  - Telemedizin
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-567196
SN  - 978-3-86956-551-4
SN  - 1613-5652
SN  - 2191-1665
IS  - 153
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Vollmer, Jan Ole
A1  - Trapp, Matthias
A1  - Schumann, Heidrun
A1  - Döllner, Jürgen Roland Friedrich
T1  - Hierarchical spatial aggregation for level-of-detail visualization of 3D thematic data
JF  - ACM transactions on spatial algorithms and systems
N2  - Thematic maps are a common tool to visualize semantic data with a spatial reference. Combining thematic data with a geometric representation of their natural reference frame aids the viewer’s ability in gaining an overview, as well as perceiving patterns with respect to location; however, as the amount of data for visualization continues to increase, problems such as information overload and visual clutter impede perception, requiring data aggregation and level-of-detail visualization techniques. While existing aggregation techniques for thematic data operate in a 2D reference frame (i.e., map), we present two aggregation techniques for 3D spatial and spatiotemporal data mapped onto virtual city models that hierarchically aggregate thematic data in real time during rendering to support on-the-fly and on-demand level-of-detail generation. An object-based technique performs aggregation based on scene-specific objects and their hierarchy to facilitate per-object analysis, while the scene-based technique aggregates data solely based on spatial locations, thus supporting visual analysis of data with arbitrary reference geometry. Both techniques can apply different aggregation functions (mean, minimum, and maximum) for ordinal, interval, and ratio-scaled data and can be easily extended with additional functions. Our implementation utilizes the programmable graphics pipeline and requires suitably encoded data, i.e., textures or vertex attributes. We demonstrate the application of both techniques using real-world datasets, including solar potential analyses and the propagation of pressure waves in a virtual city model.
KW  - Level-of-detail visualization
KW  - spatial aggregation
KW  - real-time rendering
Y1  - 2018
U6  - https://doi.org/10.1145/3234506
SN  - 2374-0353
SN  - 2374-0361
VL  - 4
IS  - 3
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - THES
A1  - Vogel, Thomas
T1  - Model-driven engineering of self-adaptive software
T1  - Modellgetriebene Entwicklung von Selbst-Adaptiver Software
N2  - The development of self-adaptive software requires the engineering of an adaptation engine that controls the underlying adaptable software by a feedback loop. State-of-the-art approaches prescribe the feedback loop in terms of the number of loops, the way the activities (e.g., monitor, analyze, plan, and execute (MAPE)) and the knowledge are structured into a feedback loop, and the type of knowledge. Moreover, the feedback loop is usually hidden in the implementation or framework and therefore not visible in the architectural design. Additionally, an adaptation engine often employs runtime models that either represent the adaptable software or capture strategic knowledge such as reconfiguration strategies. State-of-the-art approaches do not systematically address the interplay of such runtime models, which would otherwise allow developers to freely design the entire feedback loop.

This thesis presents ExecUtable RuntimE MegAmodels (EUREMA), an integrated model-driven engineering (MDE) solution that rigorously uses models for engineering feedback loops. EUREMA provides a domain-specific modeling language to specify and an interpreter to execute feedback loops. The language allows developers to freely design a feedback loop concerning the activities and runtime models (knowledge) as well as the number of feedback loops. It further supports structuring the feedback loops in the adaptation engine that follows a layered architectural style. Thus, EUREMA makes the feedback loops explicit in the design and enables developers to reason about design decisions. 

To address the interplay of runtime models, we propose the concept of a runtime megamodel, which is a runtime model that contains other runtime models as well as activities (e.g., MAPE) working on the contained models. This concept is the underlying principle of EUREMA. The resulting EUREMA (mega)models are kept alive at runtime and they are directly executed by the EUREMA interpreter to run the feedback loops. Interpretation provides the flexibility to dynamically adapt a feedback loop. In this context, EUREMA supports engineering self-adaptive software in which feedback loops run independently or in a coordinated fashion within the same layer as well as on top of each other in different layers of the adaptation engine. Moreover, we consider preliminary means to evolve self-adaptive software by providing a maintenance interface to the adaptation engine.

This thesis discusses EUREMA in detail by applying it to different scenarios such as single, multiple, and stacked feedback loops for self-repairing and self-optimizing the mRUBiS application. Moreover, it investigates the design and expressiveness of EUREMA, reports on experiments with a running system (mRUBiS) and with alternative solutions, and assesses EUREMA with respect to quality attributes such as performance and scalability.

The conducted evaluation provides evidence that EUREMA as an integrated and open MDE approach for engineering self-adaptive software seamlessly integrates the development and runtime environments using the same formalism to specify and execute feedback loops, supports the dynamic adaptation of feedback loops in layered architectures, and achieves an efficient execution of feedback loops by leveraging incrementality.
N2  - Die Entwicklung von selbst-adaptiven Softwaresystemen erfordert die Konstruktion einer geschlossenen Feedback Loop, die das System zur Laufzeit beobachtet und falls nötig anpasst. Aktuelle Konstruktionsverfahren schreiben eine bestimmte Feedback Loop im Hinblick auf Anzahl und Struktur vor. Die Struktur umfasst die vorhandenen Aktivitäten der Feedback Loop (z. B. Beobachtung, Analyse, Planung und Ausführung einer Adaption) und die Art des hierzu verwendeten Systemwissens. Dieses System- und zusätzlich das strategische Wissen (z. B. Adaptionsregeln) werden in der Regel in Laufzeitmodellen erfasst und in die Feedback Loop integriert. Aktuelle Verfahren berücksichtigen jedoch nicht systematisch die Laufzeitmodelle und deren Zusammenspiel, so dass Entwickler die Feedback Loop nicht frei entwerfen und gestalten können. Folglich wird die Feedback Loop während des Entwurfs der Softwarearchitektur häufig nicht explizit berücksichtigt. 

Diese Dissertation stellt mit EUREMA ein neues Konstruktionsverfahren für Feedback Loops vor. Basierend auf Prinzipien der modellgetriebenen Entwicklung (MDE) setzt EUREMA auf die konsequente Nutzung von Modellen für die Konstruktion, Ausführung und Adaption von selbst-adaptiven Softwaresystemen. Hierzu wird eine domänenspezifische Modellierungssprache (DSL) vorgestellt, mit der Entwickler die Feedback Loop frei entwerfen und gestalten können, d. h. ohne Einschränkung bezüglich der Aktivitäten, Laufzeitmodelle und Anzahl der Feedback Loops. Zusätzlich bietet die DSL eine Architektursicht auf das System, die die Feedback Loops berücksichtigt. Daher stellt die DSL Konstrukte zur Verfügung, mit denen Entwickler während des Entwurfs der Architektur die Feedback Loops explizit definieren und berücksichtigen können.

Um das Zusammenspiel der Laufzeitmodelle zu erfassen, wird das Konzept eines sogenannten Laufzeitmegamodells vorgeschlagen, das alle Aktivitäten und Laufzeitmodelle einer Feedback Loop erfasst. Dieses Konzept dient als Grundlage der vorgestellten DSL. Die bei der Konstruktion und mit der DSL erzeugten (Mega-)Modelle werden zur Laufzeit bewahrt und von einem Interpreter ausgeführt, um das spezifizierte Adaptionsverhalten zu realisieren. Der Interpreteransatz bietet die notwendige Flexibilität, um das Adaptionsverhalten zur Laufzeit anzupassen. Dies ermöglicht über die Entwicklung von Systemen mit mehreren Feedback Loops auf einer Ebene hinaus das Schichten von Feedback Loops im Sinne einer adaptiven Regelung. Zusätzlich bietet EUREMA eine Schnittstelle für Wartungsprozesse an, um das Adaptionsverhalten im laufenden System anzupassen.

Die Dissertation diskutiert den EUREMA-Ansatz und wendet diesen auf verschiedene Problemstellungen an, u. a. auf einzelne, mehrere und koordinierte als auch geschichtete Feedback Loops. Als Anwendungsbeispiel dient die Selbstheilung und Selbstoptimierung des Online-Marktplatzes mRUBiS. Für die Evaluierung von EUREMA werden Experimente mit dem laufenden mRUBiS und mit alternativen Lösungen durchgeführt, das Design und die Ausdrucksmächtigkeit der DSL untersucht und Qualitätsmerkmale wie Performanz und Skalierbarkeit betrachtet. Die Ergebnisse der Evaluierung legen nahe, dass EUREMA als integrierter und offener Ansatz für die Entwicklung selbst-adaptiver Softwaresysteme folgende Beiträge zum Stand der Technik leistet: eine nahtlose Integration der Entwicklungs- und Laufzeitumgebung durch die konsequente Verwendung von Modellen, die dynamische Anpassung des Adaptionsverhaltens in einer Schichtenarchitektur und eine effiziente Ausführung von Feedback Loops durch inkrementelle Verarbeitungsschritte.
KW  - model-driven engineering
KW  - self-adaptive software
KW  - domain-specific modeling
KW  - runtime models
KW  - software evolution
KW  - modellgetriebene Entwicklung
KW  - Selbst-Adaptive Software
KW  - Domänenspezifische Modellierung
KW  - Laufzeitmodelle
KW  - Software-Evolution
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-409755
ER  - 
TY  - THES
A1  - Vitagliano, Gerardo
T1  - Modeling the structure of tabular files for data preparation
T1  - Modellierung der Struktur von Tabellarische Dateien für die Datenaufbereitung
N2  - To manage tabular data files and leverage their content in a given downstream task, practitioners often design and execute complex transformation pipelines to prepare them. The complexity of such pipelines stems from different factors, including the nature of the preparation tasks, often exploratory or ad-hoc to specific datasets; the large repertory of tools, algorithms, and frameworks that practitioners need to master; and the volume, variety, and velocity of the files to be prepared. Metadata plays a fundamental role in reducing this complexity: characterizing a file assists end users in the design of data preprocessing pipelines, and furthermore paves the way for suggestion, automation, and optimization of data preparation tasks.
Previous research in the areas of data profiling, data integration, and data cleaning has focused on extracting and characterizing metadata regarding the content of tabular data files, i.e., about the records and attributes of tables. Content metadata are useful for the later stages of a preprocessing pipeline, e.g., error correction, duplicate detection, or value normalization, but they require a properly formed tabular input. Therefore, these metadata are not relevant for the early stages of a preparation pipeline, i.e., to correctly parse tables out of files. In this dissertation, we turn our focus to what we call the structure of a tabular data file, i.e., the set of characters within a file that do not represent data values but are required to parse and understand the content of the file. We provide three different approaches to represent file structure: an explicit representation based on context-free grammars, an implicit representation based on file-wise similarity, and a learned representation based on machine learning.
In our first contribution, we use the grammar-based representation to characterize a set of over 3000 real-world CSV files and identify multiple structural issues that cause files to deviate from the CSV standard, e.g., by having inconsistent delimiters or containing multiple tables. We leverage our insights about real-world files and propose Pollock, a benchmark to test how well systems parse CSV files that have a non-standard structure, without any previous preparation. We report on our experiments on using Pollock to evaluate the performance of 16 real-world data management systems.
Next, we characterize the structure of files implicitly by defining a measure of structural similarity for file pairs. We design a novel algorithm to compute this measure, which is based on a graph representation of the files' content. We leverage this algorithm and propose Mondrian, a graphical system to assist users in identifying layout templates in a dataset, i.e., classes of files that have the same structure and can therefore be prepared by applying the same preparation pipeline.
Finally, we introduce MaGRiTTE, a novel architecture that uses self-supervised learning to automatically learn structural representations of files in the form of vectorial embeddings at three different levels: cell level, row level, and file level. We experiment with the application of structural embeddings for several tasks, namely dialect detection, row classification, and the estimation of data preparation effort.
Our experimental results show that structural metadata, whether identified explicitly with parsing grammars, derived implicitly as file-wise similarity, or learned with the help of machine learning architectures, is fundamental for automating several tasks, scaling up preparation to large quantities of files, and providing repeatable preparation pipelines.
N2  - Anwender müssen häufig komplexe Pipelines zur Aufbereitung von tabellarischen Dateien entwerfen, um diese verwalten und ihre Inhalte für nachgelagerte Aufgaben nutzen zu können. Die Komplexität solcher Pipelines ergibt sich aus verschiedenen Faktoren, u.a. (i) aus der Art der Aufbereitungsaufgaben, die oft explorativ oder ad hoc für bestimmte Datensätze durchgeführt werden, (ii) aus dem großen Repertoire an Werkzeugen, Algorithmen und Frameworks, die von den Anwendern beherrscht werden müssen, sowie (iii) aus der Menge, der Größe und der Verschiedenartigkeit der aufzubereitenden Dateien. Metadaten spielen eine grundlegende Rolle bei der Verringerung dieser Komplexität: Die Charakterisierung einer Datei hilft den Nutzern bei der Gestaltung von Datenaufbereitungs-Pipelines und ebnet darüber hinaus den Weg für Vorschläge, Automatisierung und Optimierung von Datenaufbereitungsaufgaben. Bisherige Forschungsarbeiten in den Bereichen Data Profiling, Datenintegration und Datenbereinigung konzentrierten sich auf die Extraktion und Charakterisierung von Metadaten über die Inhalte der tabellarischen Dateien, d.h. über die Datensätze und Attribute von Tabellen. Inhaltsbasierte Metadaten sind für die letzten Phasen einer Aufbereitungspipeline nützlich, z.B. für die Fehlerkorrektur, die Erkennung von Duplikaten oder die Normalisierung von Werten, aber sie erfordern eine korrekt geformte tabellarische Eingabe. Daher sind diese Metadaten für die frühen Phasen einer Aufbereitungspipeline, d.h. für das korrekte Parsen von Tabellen aus Dateien, nicht relevant. In dieser Dissertation konzentrieren wir uns auf das, was wir die Struktur einer tabellarischen Datei nennen, d.h. die Menge der Zeichen in einer Datei, die keine Datenwerte darstellen, aber erforderlich sind, um den Inhalt der Datei zu analysieren und zu verstehen. 
Wir stellen drei verschiedene Ansätze zur Darstellung der Dateistruktur vor: eine explizite Darstellung auf der Grundlage kontextfreier Grammatiken, eine implizite Darstellung auf der Grundlage von Dateiähnlichkeiten und eine erlernte Darstellung auf der Grundlage von maschinellem Lernen. In unserem ersten Ansatz verwenden wir die grammatikbasierte Darstellung, um eine Menge von über 3000 realen CSV-Dateien zu charakterisieren und mehrere strukturelle Probleme zu identifizieren, die dazu führen, dass Dateien vom CSV-Standard abweichen, z.B. durch inkonsistente Begrenzungszeichen oder das Enthalten mehrerer Tabellen in einer einzelnen Datei. Wir nutzen unsere Erkenntnisse aus realen Dateien und schlagen Pollock vor, einen Benchmark, der testet, wie gut Systeme unaufbereitete CSV-Dateien parsen. Wir berichten über unsere Experimente zur Verwendung von Pollock, in denen wir die Leistung von 16 realen Datenverwaltungssystemen bewerten. Anschließend charakterisieren wir die Struktur von Dateien implizit, indem wir ein Maß für die strukturelle Ähnlichkeit von Dateipaaren definieren. Wir entwickeln einen neuartigen Algorithmus zur Berechnung dieses Maßes, der auf einer Graphen-basierten Darstellung des Dateiinhalts basiert. Wir nutzen diesen Algorithmus und schlagen Mondrian vor, ein grafisches System zur Unterstützung der Benutzer bei der Identifizierung von Layout-Vorlagen in einem Datensatz, d.h. von Dateiklassen, die die gleiche Struktur aufweisen und daher mit der gleichen Pipeline aufbereitet werden können. Schließlich stellen wir MaGRiTTE vor, eine neuartige Architektur, die selbstüberwachtes Lernen verwendet, um automatisch strukturelle Darstellungen von Dateien in Form von vektoriellen Einbettungen auf drei verschiedenen Ebenen zu lernen: auf Zellebene, auf Zeilenebene und auf Dateiebene. 
Wir experimentieren mit der Anwendung von strukturellen Einbettungen für verschiedene Aufgaben, nämlich Dialekterkennung, Zeilenklassifizierung und die Schätzung des Aufwands für die Datenaufbereitung. Unsere experimentellen Ergebnisse zeigen, dass strukturelle Metadaten, die entweder explizit mit Hilfe von Parsing-Grammatiken identifiziert, implizit als Dateiähnlichkeit abgeleitet oder mit Machine-Learning-Architekturen erlernt werden, von grundlegender Bedeutung für die Automatisierung verschiedener Aufgaben, die Skalierung der Aufbereitung auf große Mengen von Dateien und die Bereitstellung wiederholbarer Aufbereitungspipelines sind.
KW  - data preparation
KW  - file structure
KW  - Datenaufbereitung
KW  - tabellarische Dateien
KW  - Dateistruktur
KW  - tabular data
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-624351
ER  - 
TY  - JOUR
A1  - Van Hout, Cristopher V.
A1  - Tachmazidou, Ioanna
A1  - Backman, Joshua D.
A1  - Hoffman, Joshua D.
A1  - Liu, Daren
A1  - Pandey, Ashutosh K.
A1  - Gonzaga-Jauregui, Claudia
A1  - Khalid, Shareef
A1  - Ye, Bin
A1  - Banerjee, Nilanjana
A1  - Li, Alexander H.
A1  - O'Dushlaine, Colm
A1  - Marcketta, Anthony
A1  - Staples, Jeffrey
A1  - Schurmann, Claudia
A1  - Hawes, Alicia
A1  - Maxwell, Evan
A1  - Barnard, Leland
A1  - Lopez, Alexander
A1  - Penn, John
A1  - Habegger, Lukas
A1  - Blumenfeld, Andrew L.
A1  - Bai, Xiaodong
A1  - O'Keeffe, Sean
A1  - Yadav, Ashish
A1  - Praveen, Kavita
A1  - Jones, Marcus
A1  - Salerno, William J.
A1  - Chung, Wendy K.
A1  - Surakka, Ida
A1  - Willer, Cristen J.
A1  - Hveem, Kristian
A1  - Leader, Joseph B.
A1  - Carey, David J.
A1  - Ledbetter, David H.
A1  - Cardon, Lon
A1  - Yancopoulos, George D.
A1  - Economides, Aris
A1  - Coppola, Giovanni
A1  - Shuldiner, Alan R.
A1  - Balasubramanian, Suganthi
A1  - Cantor, Michael
A1  - Nelson, Matthew R.
A1  - Whittaker, John
A1  - Reid, Jeffrey G.
A1  - Marchini, Jonathan
A1  - Overton, John D.
A1  - Scott, Robert A.
A1  - Abecasis, Goncalo R.
A1  - Yerges-Armstrong, Laura M.
A1  - Baras, Aris
T1  - Exome sequencing and characterization of 49,960 individuals in the UK Biobank
JF  - Nature : the international weekly journal of science
N2  - The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world (1). Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants in the UK Biobank highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
KW  - clinical exome
KW  - breast-cancer
KW  - mutations
KW  - recommendations
KW  - gene
KW  - metaanalysis
KW  - variants
KW  - BRCA1
KW  - risk
KW  - susceptibility
Y1  - 2020
U6  - https://doi.org/10.1038/s41586-020-2853-0
SN  - 0028-0836
SN  - 1476-4687
VL  - 586
IS  - 7831
SP  - 749
EP  - 756
PB  - Macmillan Publishers Limited
CY  - London
ER  - 
TY  - BOOK
A1  - van der Walt, Estee
A1  - Odun-Ayo, Isaac
A1  - Bastian, Matthias
A1  - Eldin Elsaid, Mohamed Esam
T1  - Proceedings of the Fifth HPI Cloud Symposium "Operating the Cloud" 2017
N2  - Every year, the Hasso Plattner Institute (HPI) invites guests from industry and academia to a collaborative scientific workshop on the topic Operating the Cloud. Our goal is to provide a forum for the exchange of knowledge and experience between industry and academia. Co-located with the event is the HPI’s Future SOC Lab day, which offers an additional attractive and conducive environment for scientific and industry related discussions. Operating the Cloud aims to be a platform for productive interactions of innovative ideas, visions, and upcoming technologies in the field of cloud operation and administration.

In these proceedings, the results of the fifth HPI cloud symposium Operating the Cloud 2017 are published. We thank the authors for exciting presentations and insights into their current work and research. Moreover, we look forward to more interesting submissions for the upcoming symposium in 2018.
N2  - Jedes Jahr lädt das Hasso-Plattner-Institut (HPI) Gäste aus der Industrie und der Wissenschaft zu einem kooperativen und wissenschaftlichen Symposium zum Thema Cloud Computing ein. Unser Ziel ist es, ein Forum für den Austausch von Wissen und Erfahrungen zwischen der Industrie und der Wissenschaft zu bieten. Parallel zur Veranstaltung findet der HPI Future SOC Lab Tag statt, der eine zusätzliche attraktive Umgebung für wissenschaftliche und branchenbezogene Diskussionen bietet. Das Symposium zielt darauf ab, eine Plattform für produktive Interaktionen von innovativen Ideen, Visionen und aufkommenden Technologien im Bereich von Cloud Computing zu bieten. 

Anlässlich dieses Symposiums fordern wir die Einreichung von Forschungsarbeiten und Erfahrungsberichten. Dieser technische Bericht umfasst eine Zusammenstellung der im Rahmen des fünften HPI Cloud Symposiums "Operating the Cloud" 2017 angenommenen Forschungspapiere. Wir danken den Autoren für spannende Vorträge und Einblicke in ihre aktuelle Arbeit und Forschung. Darüber hinaus freuen wir uns auf weitere interessante Einreichungen für das kommende Symposium im Laufe des Jahres.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 122 
KW  - Sicherheit
KW  - verteilte Leistungsüberwachung
KW  - Identitätsmanagement
KW  - Leistungsmodelle von virtuellen Maschinen
KW  - Privatsphäre
KW  - security
KW  - distributed performance monitoring
KW  - identity management
KW  - performance models of virtual machines
KW  - privacy
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-411330
SN  - 978-3-86956-432-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 122
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - van der Aa, Han
A1  - Leopold, Henrik
A1  - Weidlich, Matthias
T1  - Partial order resolution of event logs for process conformance checking
JF  - Decision support systems : DSS
N2  - While supporting the execution of business processes, information systems record event logs. Conformance checking relies on these logs to analyze whether the recorded behavior of a process conforms to the behavior of a normative specification. A key assumption of existing conformance checking techniques, however, is that all events are associated with timestamps from which a total order of events per process instance can be inferred. Unfortunately, this assumption is often violated in practice. Due to synchronization issues, manual event recordings, or data corruption, events are only partially ordered. In this paper, we put forward the problem of partial order resolution of event logs to close this gap. It refers to the construction of a probability distribution over all possible total orders of events of an instance. To cope with the order uncertainty in real-world data, we present several estimators for this task, incorporating different notions of behavioral abstraction. Moreover, to reduce the runtime of conformance checking based on partial order resolution, we introduce an approximation method that comes with a bounded error in terms of accuracy. Our experiments with real-world and synthetic data reveal that our approach considerably improves accuracy over the state of the art.
KW  - process mining
KW  - conformance checking
KW  - partial order resolution
KW  - data
KW  - uncertainty
Y1  - 2020
U6  - https://doi.org/10.1016/j.dss.2020.113347
SN  - 0167-9236
SN  - 1873-5797
VL  - 136
PB  - Elsevier
CY  - Amsterdam [u.a.]
ER  - 
TY  - THES
A1  - Ussath, Martin Georg
T1  - Analytical approaches for advanced attacks
Y1  - 2017
ER  - 
TY  - JOUR
A1  - Ulrich, Jens-Uwe
A1  - Lutfi, Ahmad
A1  - Rutzen, Kilian
A1  - Renard, Bernhard Y.
T1  - ReadBouncer
BT  - precise and scalable adaptive sampling for nanopore sequencing
JF  - Bioinformatics
N2  - Motivation: 
Nanopore sequencers allow targeted sequencing of interesting nucleotide sequences by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundance sequences by depleting overrepresented ones in silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space, relying on fast graphics processing unit (GPU) base callers for real-time read rejection. Using nanopore long-read mapping tools is also not optimal when mapping shorter reads, as usually analyzed in adaptive sampling applications. 

Results: 
Here, we present a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low-abundance sequences by its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes even reads belonging to large reference sequences while running on commodity hardware without GPUs, making adaptive sampling accessible for in-field researchers. ReadBouncer also provides a user-friendly interface and installer files for end-users without a bioinformatics background.
Y1  - 2022
U6  - https://doi.org/10.1093/bioinformatics/btac223
SN  - 1367-4803
SN  - 1460-2059
VL  - 38
IS  - SUPPL 1
SP  - 153
EP  - 160
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - GEN
A1  - Ullrich, Andre
A1  - Enke, Judith
A1  - Teichmann, Malte
A1  - Kress, Antonio
A1  - Gronau, Norbert
T1  - Audit - and then what?
BT  - a roadmap for digitization of learning factories
T2  - Procedia Manufacturing
N2  - Current trends such as digital transformation, Internet of Things, or Industry 4.0 are challenging the majority of learning factories. Regardless of whether it is a conventional learning factory, a model factory, or a digital learning factory, traditional approaches such as the monotonous execution of specific instructions do not meet learners' needs, market requirements, or, in particular, current technological developments. Contemporary teaching environments need a clear strategy, a road to follow, to cope successfully with these changes and develop towards digitized learning factories. This demand-driven necessity of transformation leads to another obstacle: assessing the status quo and developing and implementing adequate action plans. This paper presents details of a maturity-based audit of the hybrid learning factory in the Research and Application Centre Industry 4.0 and a roadmap derived from it for the digitization of a learning factory.
KW  - Audit
KW  - Digitization
KW  - Learning Factory
KW  - Roadmap
Y1  - 2019
U6  - https://doi.org/10.1016/j.promfg.2019.03.025
SN  - 2351-9789
VL  - 31
SP  - 162
EP  - 168
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Trilla, Irene
A1  - Drimalla, Hanna
A1  - Bajbouj, Malek
A1  - Dziobek, Isabel
T1  - The influence of reward on facial mimicry
BT  - no evidence for a significant effect of oxytocin
JF  - Frontiers in behavioral neuroscience
N2  - Recent findings suggest a role of oxytocin on the tendency to spontaneously mimic the emotional facial expressions of others. Oxytocin-related increases of facial mimicry, however, seem to be dependent on contextual factors. Given previous literature showing that people preferentially mimic emotional expressions of individuals associated with high (vs. low) rewards, we examined whether the reward value of the mimicked agent is one factor influencing the oxytocin effects on facial mimicry. To test this hypothesis, 60 male adults received 24 IU of either intranasal oxytocin or placebo in a double-blind, between-subject experiment. Next, the value of male neutral faces was manipulated using an associative learning task with monetary rewards. After the reward associations were learned, participants watched videos of the same faces displaying happy and angry expressions. Facial reactions to the emotional expressions were measured with electromyography. We found that participants judged as more pleasant the face identities associated with high reward values than with low reward values. However, happy expressions by low rewarding faces were more spontaneously mimicked than high rewarding faces. Contrary to our expectations, we did not find a significant direct effect of intranasal oxytocin on facial mimicry, nor on the reward-driven modulation of mimicry. Our results support the notion that mimicry is a complex process that depends on contextual factors, but failed to provide conclusive evidence of a role of oxytocin on the modulation of facial mimicry.
KW  - oxytocin
KW  - facial mimicry
KW  - reward
KW  - EMG
KW  - social modulation
KW  - null results
Y1  - 2020
U6  - https://doi.org/10.3389/fnbeh.2020.00088
SN  - 1662-5153
VL  - 14
PB  - Frontiers Media
CY  - Lausanne
ER  - 
TY  - JOUR
A1  - Trautmann, Justin
A1  - Zhou, Lin
A1  - Brahms, Clemens Markus
A1  - Tunca, Can
A1  - Ersoy, Cem
A1  - Granacher, Urs
A1  - Arnrich, Bert
T1  - TRIPOD
BT  - A treadmill walking dataset with IMU, pressure-distribution and photoelectric data for gait analysis
JF  - Data : open access ʻData in scienceʼ journal
N2  - Inertial measurement units (IMUs) enable easy to operate and low-cost data recording for gait analysis. When combined with treadmill walking, a large number of steps can be collected in a controlled environment without the need of a dedicated gait analysis laboratory. In order to evaluate existing and novel IMU-based gait analysis algorithms for treadmill walking, a reference dataset that includes IMU data as well as reliable ground truth measurements for multiple participants and walking speeds is needed. This article provides a reference dataset consisting of 15 healthy young adults who walked on a treadmill at three different speeds. Data were acquired using seven IMUs placed on the lower body, two different reference systems (Zebris FDMT-HQ and OptoGait), and two RGB cameras. Additionally, in order to validate an existing IMU-based gait analysis algorithm using the dataset, an adaptable modular data analysis pipeline was built. Our results show agreement between the pressure-sensitive Zebris and the photoelectric OptoGait system (r = 0.99), demonstrating the quality of our reference data. As a use case, the performance of an algorithm originally designed for overground walking was tested on treadmill data using the data pipeline. The accuracy of stride length and stride time estimations was comparable to that reported in other studies with overground data, indicating that the algorithm is equally applicable to treadmill data. The Python source code of the data pipeline is publicly available, and the dataset will be provided by the authors upon request, enabling future evaluations of IMU gait analysis algorithms without the need of recording new data.
KW  - inertial measurement unit
KW  - gait analysis algorithm
KW  - OptoGait
KW  - Zebris
KW  - data pipeline
KW  - public dataset
Y1  - 2021
U6  - https://doi.org/10.3390/data6090095
SN  - 2306-5729
VL  - 6
IS  - 9
PB  - MDPI
CY  - Basel
ER  - 
TY  - GEN
A1  - Trautmann, Justin
A1  - Zhou, Lin
A1  - Brahms, Clemens Markus
A1  - Tunca, Can
A1  - Ersoy, Cem
A1  - Granacher, Urs
A1  - Arnrich, Bert
T1  - TRIPOD - A Treadmill Walking Dataset with IMU, Pressure-distribution and Photoelectric Data for Gait Analysis
T2  - Postprints der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - Inertial measurement units (IMUs) enable easy to operate and low-cost data recording for gait analysis. When combined with treadmill walking, a large number of steps can be collected in a controlled environment without the need of a dedicated gait analysis laboratory. In order to evaluate existing and novel IMU-based gait analysis algorithms for treadmill walking, a reference dataset that includes IMU data as well as reliable ground truth measurements for multiple participants and walking speeds is needed. This article provides a reference dataset consisting of 15 healthy young adults who walked on a treadmill at three different speeds. Data were acquired using seven IMUs placed on the lower body, two different reference systems (Zebris FDMT-HQ and OptoGait), and two RGB cameras. Additionally, in order to validate an existing IMU-based gait analysis algorithm using the dataset, an adaptable modular data analysis pipeline was built. Our results show agreement between the pressure-sensitive Zebris and the photoelectric OptoGait system (r = 0.99), demonstrating the quality of our reference data. As a use case, the performance of an algorithm originally designed for overground walking was tested on treadmill data using the data pipeline. The accuracy of stride length and stride time estimations was comparable to that reported in other studies with overground data, indicating that the algorithm is equally applicable to treadmill data. The Python source code of the data pipeline is publicly available, and the dataset will be provided by the authors upon request, enabling future evaluations of IMU gait analysis algorithms without the need of recording new data.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 6 
KW  - inertial measurement unit
KW  - gait analysis algorithm
KW  - OptoGait
KW  - Zebris
KW  - data pipeline
KW  - public dataset
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-522027
IS  - 6
ER  - 
TY  - THES
A1  - Traifeh, Hanadi
T1  - Design Thinking in the Arab world
T1  - Design Thinking in der Arabischen Welt
BT  - perspectives, challenges and opportunities
BT  - Perspektiven, Herausforderungen und Potentiale
N2  - Design Thinking is a human-centered approach to innovation that has become increasingly popular globally over the last decade. While the spread of Design Thinking is well understood and documented in the Western cultural contexts, particularly in Europe and the US due to the popularity of the Stanford-Potsdam Design Thinking education model, this is not the case when it comes to non-Western cultural contexts. This thesis fills a gap identified in the literature regarding how Design Thinking emerged, was perceived, adopted, and practiced in the Arab world. The culture in that part of the world differs from that of the Western context, which impacts the mindset of people and how they interact with Design Thinking tools and methods. 

A mixed-methods research approach was followed in which both quantitative and qualitative methods were employed. First, two methods were used in the quantitative phase: a social media analysis using Twitter as a source of data, and an online questionnaire. The results and analysis of the quantitative data informed the design of the qualitative phase in which two methods were employed: ten semi-structured interviews, and participant observation of seven Design Thinking training events. 

According to the analyzed data, the Arab world appears to have had an early, though relatively weak and slow, adoption of Design Thinking since 2006. Increasing adoption, however, has been witnessed over the last decade, especially in Saudi Arabia, the United Arab Emirates and Egypt. The results also show that despite its limited spread, Design Thinking has been practiced the most in education, information technology and communication, administrative services, and the non-profit sectors. The way it is being practiced, though, is not fully aligned with how it is being practiced and taught in the US and Europe, as most people in the region do not necessarily believe in all mindset attributes introduced by the Stanford-Potsdam tradition.

Practitioners in the Arab world also seem to shy away from the 'wild side' of Design Thinking in particular, and do not fully appreciate the connection between art-design and science-engineering. This calls into question the role of the educational institutions in the region since, according to the findings, they appear to be leading the movement in promoting and developing Design Thinking in the Arab world. Nonetheless, it is notable that people seem to be aware of the positive impact of applying Design Thinking in the region, and its potential to bring meaningful transformation. However, they also seem to be concerned about the current cultural, social, political, and economic challenges that may hinder this transformation. Therefore, they call for more awareness and demand the creation of Arabic, culturally appropriate programs that respond to local needs. On another note, the lack of Arabic content and local case studies on Design Thinking were identified by several interviewees and were also confirmed by the participant observation as major challenges that are slowing down the spread of Design Thinking or sometimes hampering capacity building in the region. Other challenges revealed by the study are changing the mindset of people, the lack of dedicated Design Thinking spaces, and the need for clear instructions on how to apply Design Thinking methods and activities. The concept of time and how Arabs deal with it, gender management during trainings, and hierarchy and power dynamics among training participants are also among the identified challenges. Another key finding revealed by the study is the confirmation of التفكير التصميمي as the most widely adopted Arabic term for Design Thinking in the region, given that four other Arabic terms were also found to be associated with it.

Based on the findings of the study, the thesis concludes by presenting a list of recommendations on how to overcome the mentioned challenges and what factors should be considered when designing and implementing culturally-customized Design Thinking training in the Arab region.
N2  - Design Thinking ist ein nutzerzentrierter Innovationsansatz, der in den letzten zehn Jahren weltweit an Bekanntheit gewonnen hat. Während die Verbreitung von Design Thinking im westlichen Kulturkreis – insbesondere in Europa und den USA – aufgrund der Bedeutung des Stanford-Potsdam Design Thinking-Ausbildungsmodells gut verstanden und dokumentiert ist, ist dies nicht der Fall, wenn es sich um nicht-westliche Kulturkreise handelt. Diese Arbeit schließt eine Lücke in der Literatur darüber, wie Design Thinking in der arabischen Welt entstanden ist, wahrgenommen, angenommen und praktiziert wurde. Die vorhandenen kulturellen Unterschiede zwischen der westlichen und der arabischen Welt wirken sich auch auf die Denkweise der Menschen aus, und darauf, wie sie mit Design Thinking-Tools und -Methoden umgehen.
Es wurde ein ‚Mixed Methods‘-Forschungsansatz verfolgt, d.h. sowohl quantitative als auch qualitative Methoden wurden eingesetzt. In der quantitativen Phase kamen zunächst zwei Methoden zum Einsatz: eine Social-Media-Analyse mit Twitter als Datenquelle und ein Online-Fragebogen. Die Ergebnisse und die Analyse der quantitativen Daten bildeten die Grundlage für die Gestaltung der qualitativen Phase, in der zwei Methoden angewendet wurden: zehn halbstrukturierte Interviews und die teilnehmende Beobachtung von sieben Design Thinking-Trainings. 
Den analysierten Daten zufolge scheint es in der arabischen Welt seit 2006 eine frühe, wenn auch relativ schwache und langsame Einführung von Design Thinking gegeben zu haben. In den letzten zehn Jahren ist jedoch eine zunehmende Akzeptanz zu beobachten, insbesondere in Saudi-Arabien, den Vereinigten Arabischen Emiraten und Ägypten. Die Ergebnisse zeigen auch, dass Design Thinking trotz seiner begrenzten Verbreitung am häufigsten im Bildungswesen, in der Informationstechnologie und Kommunikation, in der Verwaltung und im Non-Profit-Sektor angewandt wird. Die Art und Weise, wie Design Thinking praktiziert wird, stimmt jedoch nicht vollständig mit der Art und Weise überein, wie es in den USA und Europa praktiziert und gelehrt wird, da die meisten Menschen in der Region nicht unbedingt an alle Denkattribute glauben, die im Stanford-Potsdam-Modell eingeführt wurden.
Die Praktiker in der arabischen Welt scheinen auch vor der "wilden Seite" des Design Thinking zurückzuschrecken und die Verbindung zwischen Kunst und Design auf der einen sowie Wissenschaft und Technik auf der anderen Seite nicht vollumfänglich zu schätzen. Dies wirft die Frage nach der Rolle von Bildungseinrichtungen in der Region auf, da sie - den Ergebnissen zufolge - die Bewegung zur Förderung und Entwicklung von Design Thinking in der arabischen Welt anzuführen scheinen. Nichtsdestotrotz ist es bemerkenswert, dass sich die Menschen der positiven Auswirkungen der Anwendung von Design Thinking in der Region und seines Potenzials, sinnvolle Veränderungen zu bewirken, bewusst zu sein scheinen. Sie scheinen jedoch auch besorgt zu sein über die aktuellen kulturellen, sozialen, politischen und wirtschaftlichen Herausforderungen, die diese Transformation in Frage stellen könnten. Daher fordern sie eine stärkere Sensibilisierung und die Schaffung von arabischen, kulturell angemessenen Programmen, um auf die lokalen Bedürfnisse einzugehen. Auch das Fehlen arabischer Inhalte und lokaler Fallstudien zu Design Thinking wurde von mehreren Befragten genannt und durch die teilnehmende Beobachtung bestätigt, da dies die Verbreitung von Design Thinking verlangsamt oder den Aufbau von Kapazitäten in der Region behindert. Weitere Herausforderungen, die sich aus der Studie ergaben, sind: die Veränderung des Mindsets der Menschen, das Fehlen spezieller Design-Thinking-Räume und der Bedarf an klaren Anweisungen zur Anwendung von Design-Thinking-Methoden und -Aktivitäten. Das Konzept von Zeit und der Umgang der arabischen Welt damit, Gender-Management während der Schulungen sowie Hierarchie und Machtdynamik unter den Schulungsteilnehmern gehören ebenfalls zu den identifizierten Herausforderungen. 
Ein weiteres wichtiges Ergebnis der Studie ist die Bestätigung von التفكير التصميمي als dem in der Region am weitesten verbreiteten arabischen Begriff für Design Thinking, da vier weitere arabische Begriffe mit Design Thinking in Verbindung gebracht werden konnten.
Basierend auf den Ergebnissen der Studie schließt die Arbeit mit einer Liste von Empfehlungen, wie die genannten Herausforderungen überwunden werden können und welche Faktoren bei der Entwicklung und Implementierung von kulturell angepassten Design Thinking-Trainings in der arabischen Welt berücksichtigt werden sollten.
KW  - Design Thinking
KW  - human-centered design
KW  - the Arab world
KW  - emergence
KW  - adoption
KW  - implementation
KW  - culture
KW  - Design Thinking
KW  - Annahme
KW  - Kultur
KW  - Entstehung
KW  - menschenzentriertes Design
KW  - Implementierung
KW  - die arabische Welt
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-598911
ER  - 
TY  - GEN
A1  - Torkura, Kennedy A.
A1  - Sukmana, Muhammad Ihsan Haikal
A1  - Strauss, Tim
A1  - Graupner, Hendrik
A1  - Cheng, Feng
A1  - Meinel, Christoph
T1  - CSBAuditor
BT  - proactive security risk analysis for cloud storage broker systems
T2  - 17th International Symposium on Network Computing and Applications (NCA)
N2  - Cloud Storage Brokers (CSB) provide seamless and concurrent access to multiple Cloud Storage Services (CSS) while abstracting cloud complexities from end-users. However, this multi-cloud strategy faces several security challenges including enlarged attack surfaces, malicious insider threats, security complexities due to integration of disparate components and API interoperability issues. Novel security approaches are imperative to tackle these security issues. Therefore, this paper proposes CSBAuditor, a novel cloud security system that continuously audits CSB resources, to detect malicious activities and unauthorized changes e.g. bucket policy misconfigurations, and remediates these anomalies. The cloud state is maintained via a continuous snapshotting mechanism thereby ensuring fault tolerance. We adopt the principles of chaos engineering by integrating Broker Monkey, a component that continuously injects failure into our reference CSB system, Cloud RAID. Hence, CSBAuditor is continuously tested for efficiency i.e. its ability to detect the changes injected by Broker Monkey. CSBAuditor employs security metrics for risk analysis by computing severity scores for detected vulnerabilities using the Common Configuration Scoring System, thereby overcoming the limitation of insufficient security metrics in existing cloud auditing schemes. CSBAuditor has been tested using various strategies including chaos engineering failure injection strategies. Our experimental evaluation validates the efficiency of our approach against the aforementioned security issues with a detection and recovery rate of over 96 %.
KW  - Cloud-Security
KW  - Cloud Audit
KW  - Security Metrics
KW  - Security Risk Assessment
KW  - Secure Configuration
Y1  - 2018
SN  - 978-1-5386-7659-2
U6  - https://doi.org/10.1109/NCA.2018.8548329
PB  - IEEE
CY  - New York
ER  - 
TY  - GEN
A1  - Torkura, Kennedy A.
A1  - Sukmana, Muhammad Ihsan Haikal
A1  - Meinig, Michael
A1  - Kayem, Anne V. D. M.
A1  - Cheng, Feng
A1  - Meinel, Christoph
A1  - Graupner, Hendrik
T1  - Securing cloud storage brokerage systems through threat models
T2  - Proceedings IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA)
N2  - Cloud storage brokerage is an abstraction aimed at providing value-added services. However, Cloud Service Brokers are challenged by several security issues including enlarged attack surfaces due to integration of disparate components and API interoperability issues. Therefore, appropriate security risk assessment methods are required to identify and evaluate these security issues, and examine the efficiency of countermeasures. A possible approach for satisfying these requirements is the employment of threat modeling concepts, which have been successfully applied in traditional paradigms. In this work, we employ threat models including attack trees, attack graphs and Data Flow Diagrams against a Cloud Service Broker (CloudRAID) and analyze these security threats and risks. Furthermore, we propose an innovative technique for combining Common Vulnerability Scoring System (CVSS) and Common Configuration Scoring System (CCSS) base scores in probabilistic attack graphs to cater for configuration-based vulnerabilities which are typically leveraged for attacking cloud storage systems. This approach is necessary since existing schemes do not provide sufficient security metrics, which are imperative for comprehensive risk assessments. We demonstrate the efficiency of our proposal by devising CCSS base scores for two common attacks against cloud storage: Cloud Storage Enumeration Attack and Cloud Storage Exploitation Attack. These metrics are then used in Attack Graph Metric-based risk assessment. Our experimental evaluation shows that our approach caters for the aforementioned gaps and provides efficient security hardening options. Therefore, our proposals can be employed to improve cloud security.
KW  - Cloud-Security
KW  - Threat Models
KW  - Security Metrics
KW  - Security Risk Assessment
KW  - Secure Configuration
Y1  - 2018
SN  - 978-1-5386-2195-0
U6  - https://doi.org/10.1109/AINA.2018.00114
SN  - 1550-445X
SP  - 759
EP  - 768
PB  - IEEE
CY  - New York
ER  - 
TY  - GEN
A1  - Torkura, Kennedy A.
A1  - Sukmana, Muhammad Ihsan Haikal
A1  - Kayem, Anne V. D. M.
A1  - Cheng, Feng
A1  - Meinel, Christoph
T1  - A cyber risk based moving target defense mechanism for microservice architectures
T2  - IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom)
N2  - Microservice Architectures (MSA) structure applications as a collection of loosely coupled services that implement business capabilities. The key advantages of MSA include inherent support for continuous deployment of large complex applications, agility and enhanced productivity. However, studies indicate that most MSA are homogeneous and introduce shared vulnerabilities, and are thus vulnerable to multi-step attacks, which are economics-of-scale incentives to attackers. In this paper, we address the issue of shared vulnerabilities in microservices with a novel solution based on the concept of Moving Target Defenses (MTD). Our mechanism works by performing risk analysis against microservices to detect and prioritize vulnerabilities. Thereafter, security risk-oriented software diversification is employed, guided by a defined diversification index. The diversification is performed at runtime, leveraging both model and template based automatic code generation techniques to automatically transform programming languages and container images of the microservices. Consequently, the microservices' attack surfaces are altered, thereby introducing uncertainty for attackers while reducing the attackability of the microservices. Our experiments demonstrate the efficiency of our solution, with an average success rate of over 70% attack surface randomization.
KW  - Security Risk Assessment
KW  - Security Metrics
KW  - Moving Target Defense
KW  - Microservices Security
KW  - Application Container Security
Y1  - 2018
SN  - 978-1-7281-1141-4
U6  - https://doi.org/10.1109/BDCloud.2018.00137
SN  - 2158-9178
SP  - 932
EP  - 939
PB  - Institute of Electrical and Electronics Engineers
CY  - Los Alamitos
ER  - 
TY  - THES
A1  - Torcato Mordido, Gonçalo Filipe
T1  - Diversification, compression, and evaluation methods for generative adversarial networks
N2  - Generative adversarial networks (GANs) have been broadly applied to a wide range of application domains since their proposal. In this thesis, we propose several methods that aim to tackle different existing problems in GANs. Particularly, even though GANs are generally able to generate high-quality samples, the diversity of the generated set is often sub-optimal. Moreover, the common increase of the number of models in the original GANs framework, as well as their architectural sizes, introduces additional costs. Additionally, even though challenging, the proper evaluation of a generated set is an important direction to ultimately improve the generation process in GANs. We start by introducing two diversification methods that extend the original GANs framework to multiple adversaries to stimulate sample diversity in a generated set. Then, we introduce a new post-training compression method based on Monte Carlo methods and importance sampling to quantize and prune the weights and activations of pre-trained neural networks without any additional training. The previous method may be used to reduce the memory and computational costs introduced by increasing the number of models in the original GANs framework. Moreover, we use a similar procedure to quantize and prune gradients during training, which also reduces the communication costs between different workers in a distributed training setting. We introduce several topology-based evaluation methods to assess data generation in different settings, namely image generation and language generation. Our methods retrieve both single-valued and double-valued metrics, which, given a real set, may be used to broadly assess a generated set or separately evaluate sample quality and sample diversity, respectively. Moreover, two of our metrics use locality-sensitive hashing to accurately assess the generated sets of highly compressed GANs. 
The analysis of the compression effects in GANs paves the way for their efficient employment in real-world applications. Given their general applicability, the methods proposed in this thesis may be extended beyond the context of GANs. Hence, they may be generally applied to enhance existing neural networks and, in particular, generative frameworks.
N2  - Generative adversarial networks (GANs) wurden seit ihrer Einführung in einer Vielzahl von Anwendungsbereichen eingesetzt. In dieser Dissertation schlagen wir einige Verfahren vor, die darauf abzielen, verschiedene bestehende Probleme von GANs zu lösen. Insbesondere fokussieren wir uns auf das Problem, dass GANs zwar qualitativ hochwertige Samples generieren können, die Diversität aber oft suboptimal ist. Darüber hinaus stellt die allgemein übliche Zunahme der Anzahl der Modelle unter dem ursprünglichen GAN-Framework sowie deren Modellgröße weitere Aufwendungskosten dar. Abschließend ist die richtige Evaluierung einer generierten Menge, wenn auch herausfordernd, eine wichtige Forschungsrichtung, um letztendlich den Generierungsprozess von GANs zu verbessern.

Wir beginnen mit der Einführung von zwei Diversifizierungsmethoden, die das ursprüngliche GAN-Framework um mehrere Gegenspieler erweitern, um die Diversität zu erhöhen. Um den zusätzlichen Speicher- und Rechenaufwand zu reduzieren, führen wir dann eine neue Kompressionsmethode ein. Diese Methode basiert auf den Monte-Carlo-Methoden und Importance Sampling, für das Quantisieren und Pruning der Gewichte und Aktivierungen von schon trainierten neuronalen Netzwerken ohne zusätzliches Trainieren. Wir erweitern die erwähnte Methode zusätzlich für das Quantisieren und Pruning von Gradienten während des Trainierens, was die Kommunikationskosten zwischen verschiedenen sogenannten „Workern“ in einer verteilten Trainingsumgebung reduziert.

Bezüglich der Bewertung der generierten Samples stellen wir mehrere topologiebasierte Evaluationsmethoden vor, die sich auf Bild- und Textgenerierung konzentrieren. Um verschiedene Anwendungsfälle zu erfassen, liefern unsere vorgestellten Methoden einwertige und doppelwertige Metriken. Diese können einerseits dazu genutzt werden, generierte Samples, oder die Qualität und Verteilung der Samples anhand einer Menge von echten Samples zu bewerten. Außerdem verwenden zwei unserer vorgestellten Metriken so genanntes locality-sensitive Hashing, um die generierten Samples von stark komprimierten GANs genau zu bewerten. Die Analyse von Kompressionseffekten in GANs ebnet den Weg für ihren effizienten Einsatz für reale Anwendungen.

Aufgrund der allgemeinen Anwendungsmöglichkeit von GANs können die in dieser Arbeit vorgestellten Methoden auch über den Kontext von GANs hinaus erweitert werden. Daher könnten sie allgemein auf existierende neuronale Netzwerke angewandt werden und insbesondere auf generative Frameworks.
KW  - deep learning
KW  - generative adversarial networks
KW  - erzeugende gegnerische Netzwerke
KW  - tiefes Lernen
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-535460
ER  - 
TY  - JOUR
A1  - Thienen, Julia von
A1  - Weinstein, Theresa Julia
A1  - Meinel, Christoph
T1  - Creative metacognition in design thinking
BT  - exploring theories, educational practices, and their implications for measurement
JF  - Frontiers in psychology
N2  - Design thinking is a well-established practical and educational approach to fostering high-level creativity and innovation, which has been refined since the 1950s with the participation of experts like Joy Paul Guilford and Abraham Maslow. Through real-world projects, trainees learn to optimize their creative outcomes by developing and practicing creative cognition and metacognition. This paper provides a holistic perspective on creativity, enabling the formulation of a comprehensive theoretical framework of creative metacognition. It focuses on the design thinking approach to creativity and explores the role of metacognition in four areas of creativity expertise: Products, Processes, People, and Places. The analysis includes task-outcome relationships (product metacognition), the monitoring of strategy effectiveness (process metacognition), an understanding of individual or group strengths and weaknesses (people metacognition), and an examination of the mutual impact between environments and creativity (place metacognition). It also reviews measures taken in design thinking education, including a distribution of cognition and metacognition, to support students in their development of creative mastery. On these grounds, we propose extended methods for measuring creative metacognition with the goal of enhancing comprehensive assessments of the phenomenon. Proposed methodological advancements include accuracy sub-scales, experimental tasks where examinees explore problem and solution spaces, combinations of naturalistic observations with capability testing, as well as physiological assessments as indirect measures of creative metacognition.
KW  - accuracy
KW  - creativity
KW  - design thinking
KW  - education
KW  - measurement
KW  - metacognition
KW  - innovation
KW  - framework
Y1  - 2023
U6  - https://doi.org/10.3389/fpsyg.2023.1157001
SN  - 1664-1078
VL  - 14
PB  - Frontiers Research Foundation
CY  - Lausanne
ER  - 
TY  - JOUR
A1  - Thienen, Julia von
A1  - Clancey, William J.
A1  - Corazza, Giovanni Emanuele
A1  - Meinel, Christoph
T1  - Theoretical foundations of design thinking creative thinking theories
JF  - Design Thinking Research: Making Distinctions: Collaboration versus Cooperation
N2  - Design thinking is acknowledged as a thriving innovation practice plus something more, something in the line of a deep understanding of innovation processes. At the same time, quite how and why design thinking works, in scientific terms, appeared an open question at first. Over recent years, empirical research has achieved great progress in illuminating the principles that make design thinking successful. Lately, the community began to explore an additional approach. Rather than setting up novel studies, investigations into the history of design thinking hold the promise of adding systematically to our comprehension of basic principles. This chapter makes a start in revisiting design thinking history with the aim of explicating scientific understandings that inform design thinking practices today. It offers a summary of creative thinking theories that were brought to Stanford Engineering in the 1950s by John E. Arnold.
Y1  - 2018
SN  - 978-3-319-60967-6
SN  - 978-3-319-60966-9
U6  - https://doi.org/10.1007/978-3-319-60967-6_2
SP  - 13
EP  - 40
PB  - Springer
CY  - New York
ER  - 
TY  - GEN
A1  - Teusner, Ralf
A1  - Matthies, Christoph
A1  - Staubitz, Thomas
T1  - What Stays in Mind?
BT  - Retention Rates in Programming MOOCs
T2  - IEEE Frontiers in Education Conference (FIE)
Y1  - 2018
SN  - 978-1-5386-1174-6
U6  - https://doi.org/10.1109/FIE.2018.8658890
SN  - 0190-5848
PB  - IEEE
CY  - New York
ER  - 
TY  - THES
A1  - Teusner, Ralf
T1  - Situational interventions and peer feedback in massive open online courses
T1  - Situationsabhängige Interventionen und Peer-Feedback in Massive Open Online Courses
BT  - narrowing the gap between learners and instructors in online programming education
N2  - Massive Open Online Courses (MOOCs) open up new opportunities to learn a wide variety of skills online and are thus well suited for individual education, especially where proficient teachers are not available locally. At the same time, modern society is undergoing a digital transformation, requiring the training of large numbers of current and future employees. Abstract thinking, logical reasoning, and the need to formulate instructions for computers are becoming increasingly relevant. A holistic way to train these skills is to learn how to program. Programming, in addition to being a mental discipline, is also considered a craft, and practical training is required to achieve mastery. In order to effectively convey programming skills in MOOCs, practical exercises are incorporated into the course curriculum to offer students the necessary hands-on experience to reach an in-depth understanding of the programming concepts presented. Our preliminary analysis showed that while being an integral and rewarding part of courses, practical exercises bear the risk of overburdening students who are struggling with conceptual misunderstandings and unknown syntax. In this thesis, we develop, implement, and evaluate different interventions with the aim to improve the learning experience, sustainability, and success of online programming courses. Data from four programming MOOCs, with a total of over 60,000 participants, are employed to determine criteria for practical programming exercises best suited for a given audience.

Based on over five million executions and scoring runs from students' task submissions, we deduce exercise difficulties, students' patterns in approaching the exercises, and potential flaws in exercise descriptions as well as preparatory videos. The primary issue in online learning is that students face a social gap caused by their isolated physical situation. Each individual student usually learns alone in front of a computer and suffers from the absence of a pre-determined time structure as provided in traditional school classes. Furthermore, online learning usually presses students into a one-size-fits-all curriculum, which presents the same content to all students, regardless of their individual needs and learning styles. Any means of a personalization of content or individual feedback regarding problems they encounter are mostly ruled out by the discrepancy between the number of learners and the number of instructors. This results in a high demand for self-motivation and determination of MOOC participants. Social distance exists between individual students as well as between students and course instructors. It decreases engagement and poses a threat to learning success. Within this research, we approach the identified issues within MOOCs and suggest scalable technical solutions, improving social interaction and balancing content difficulty.

Our contributions include situational interventions, approaches for personalizing educational content as well as concepts for fostering collaborative problem-solving. With these approaches, we reduce counterproductive struggles and create a universal improvement for future programming MOOCs. We evaluate our approaches and methods in detail to improve programming courses for students as well as instructors and to advance the state of knowledge in online education.

Data gathered from our experiments show that receiving peer feedback on one's programming problems improves overall course scores by up to 17%. Merely the act of phrasing a question about one's problem improved overall scores by about 14%. The rate of students reaching out for help was significantly improved by situational just-in-time interventions. Request for Comment interventions increased the share of students asking for help by up to 158%. Data from our four MOOCs further provide detailed insight into the learning behavior of students. We outline additional significant findings with regard to student behavior and demographic factors. Our approaches, the technical infrastructure, the numerous educational resources developed, and the data collected provide a solid foundation for future research.
N2  - MOOCs (Massive Open Online Courses) ermöglichen es jedem Interessierten sich in verschiedenen Fachrichtungen online weiterzubilden. Sie fördern die persönliche individuelle Entwicklung und ermöglichen lebenslanges Lernen auch dort, wo geeignete Lehrer nicht verfügbar sind. Unsere Gesellschaft befindet sich derzeit in der sogenannten "digitalen Transformation". Von vielen Arbeitnehmern werden in diesem Zusammenhang zunehmend Fähigkeiten wie abstraktes Denken und logisches Schlussfolgern erwartet. Das Erlernen einer Programmiersprache ist eine geeignete Möglichkeit, diese Fähigkeiten zu erlangen. Obwohl Programmieren als geistige Disziplin angesehen wird, ist es zu einem gewissen Grad auch ein Handwerk, bei dem sich das individuelle Können insbesondere durch stetige praktische Anwendung entwickelt. Um Programmierkenntnisse effektiv in einem MOOC zu vermitteln, sollten daher praktische Aufgaben von vornherein in den Lehrstoff des Kurses integriert werden, um die vorgestellten Konzepte geeignet zu vertiefen und zu festigen. Neben den positiven Aspekten für die Lernenden weisen praktische Programmieraufgaben jedoch auch ein erhöhtes Frustpotential auf. Kryptische Fehlermeldungen und teils unbekannte Syntax überfordern insbesondere diejenigen Teilnehmer, welche zusätzlich mit konzeptionellen Missverständnissen zu kämpfen haben.

Im Rahmen dieser Arbeit entwickeln und analysieren wir mehrere Interventionsmöglichkeiten, um die Lernerfahrung und den Lernerfolg von Teilnehmern in Programmier-MOOCs zu verbessern. Daten von über 60.000 Teilnehmern aus vier Programmier-MOOCs bilden die Grundlage für eine Analyse von Kriterien für geeignete Programmieraufgaben für spezifische Teilnehmergruppen. Auf Basis von 5 Millionen Codeausführungen von Teilnehmern leiten wir Schwachstellen in Aufgaben und typische Herangehensweisen der Teilnehmer ab. Die Hauptschwierigkeit beim Lernen in einer virtuellen Umgebung ist die durch physische Isolation hervorgerufene soziale Entkopplung. Jeder Teilnehmer lernt alleine vor einem Bildschirm, ein gemeinsamer Stundenplan wie im klassischen Schulunterricht fehlt. Weiterhin präsentieren bestehende Online-Kurse den Teilnehmern in der Regel lediglich universell einsetzbare Lerninhalte, welche in keiner Weise auf die jeweiligen Bedürfnisse und Vorerfahrungen der individuellen Teilnehmer angepasst sind. Personalisierte Lerninhalte bzw. individuelles Feedback sind in MOOCs aufgrund der großen Anzahl an Teilnehmern und der nur kleinen Anzahl an Lehrenden oft nur schwer bzw. gar nicht zu realisieren. Daraus resultieren wiederum hohe Anforderungen an das individuelle Durchhaltevermögen und die Selbstmotivation der MOOC-Teilnehmer. Die soziale Entkopplung manifestiert sich sowohl zwischen den Teilnehmern untereinander als auch zwischen den Lehrenden und den Teilnehmern. Negative Folgen sind ein häufig verringertes Engagement und damit eine Gefährdung des Lernerfolgs. In dieser Arbeit schlagen wir als Gegenmaßnahme skalierbare technische Lösungen vor, um die soziale Interaktion zu verbessern und inhaltliche Schwierigkeiten zu überwinden.

Unsere wissenschaftlichen Beiträge umfassen situationsabhängige Interventionen, Ansätze zur Personalisierung von Lerninhalten, sowie Konzepte und Anreize zur Verbesserung der Kollaboration der Teilnehmer untereinander. Mit diesen Maßnahmen schaffen wir es, kontraproduktive Blockaden beim Lernen zu lösen und stellen damit einen universell einsetzbaren Ansatz zur Verbesserung von zukünftigen Programmier-MOOCs bereit.

Die aus unseren Experimenten gesammelten Daten zeigen, dass bei Programmierproblemen gewährtes Feedback von anderen Teilnehmern die Gesamtpunktzahl innerhalb des Teilnehmerfeldes durchschnittlich um bis zu 17% verbessert. Bereits das Formulieren des jeweiligen individuellen Problems verbesserte die Gesamtpunktzahl um etwa 14%. Durch situative Interventionen konnte weiterhin der Anteil der Teilnehmer, die nach Hilfe fragen, um bis zu 158% gesteigert werden. Die gesammelten Daten aus unseren vier MOOCs ermöglichen darüber hinaus detaillierte Einblicke in das Lernverhalten der Teilnehmer. Wir zeigen zusätzlich Erkenntnisse in Bezug auf das Verhalten der Teilnehmer und zu demografischen Faktoren auf. Die in dieser Arbeit beschriebenen Ansätze, die geschaffene technische Infrastruktur, das entworfene Lehrmaterial, sowie der umfangreiche gesammelte Datenbestand bilden darüber hinaus eine vielversprechende Grundlage für weitere zukünftige Forschung.
KW  - programming
KW  - MOOC
KW  - intervention
KW  - collaboration
KW  - peer feedback
KW  - Programmierung
KW  - MOOC
KW  - Interventionen
KW  - Kollaboration
KW  - Peer-feedback
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-507587
ER  - 
TY  - GEN
A1  - Teichmann, Malte
A1  - Ullrich, Andre
A1  - Gronau, Norbert
T1  - Subject-oriented learning
BT  - a new perspective for vocational training in learning factories
T2  - Procedia Manufacturing
N2  - The transformation to a digitized company changes not only the work but also the social context for the employees and requires, inter alia, new knowledge and skills from them. Additionally, individual action problems arise. This contribution proposes the subject-oriented learning theory, in which the employees' action problems are the starting point of training activities in learning factories. In this contribution, the subject-oriented learning theory is exemplified and respective advantages for vocational training in learning factories are pointed out both theoretically and practically. Thereby, especially the individual action problems of learners and the infrastructure are emphasized as the starting point for learning processes and competence development.
KW  - Subject-oriented learning
KW  - action problems
KW  - vocational training
KW  - learning factories
Y1  - 2019
U6  - https://doi.org/10.1016/j.promfg.2019.03.012
SN  - 2351-9789
VL  - 31
SP  - 72
EP  - 78
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Tang, Mitchell
A1  - Nakamoto, Carter H.
A1  - Stern, Ariel Dora
A1  - Mehrotra, Ateev
T1  - Trends in remote patient monitoring use in traditional Medicare
JF  - JAMA Internal Medicine
N2  - This cross-sectional study uses traditional Medicare claims data to assess trends in general remote patient monitoring from January 2018 through September 2021.
Y1  - 2022
U6  - https://doi.org/10.1001/jamainternmed.2022.3043
SN  - 2168-6106
SN  - 2168-6114
VL  - 182
IS  - 9
SP  - 1005
EP  - 1006
PB  - American Medical Association
CY  - Chicago
ER  - 
TY  - THES
A1  - Tan, Jing
T1  - Multi-Agent Reinforcement Learning for Interactive Decision-Making
T1  - Multiagenten Verstärkendes Lernen für Interaktive Entscheidungsfindung
N2  - Distributed decision-making studies the choices made among a group of interactive and self-interested agents. Specifically, this thesis is concerned with the optimal sequence of choices an agent makes as it tries to maximize its achievement on one or multiple objectives in the dynamic environment. The optimization of distributed decision-making is important in many real-life applications, e.g., resource allocation (of products, energy, bandwidth, computing power, etc.) and robotics (heterogeneous agent cooperation on games or tasks), in various fields such as vehicular network, Internet of Things, smart grid, etc.
This thesis proposes three multi-agent reinforcement learning algorithms combined with game-theoretic tools to study strategic interaction between decision makers, using resource allocation in vehicular network as an example. Specifically, the thesis designs an interaction mechanism based on second-price auction, incentivizes the agents to maximize multiple short-term and long-term, individual and system objectives, and simulates a dynamic environment with realistic mobility data to evaluate algorithm performance and study agent behavior. 

Theoretical results show that the mechanism has Nash equilibria, maximizes social welfare, and yields a Pareto-optimal allocation of resources in a stationary environment. Empirical results show that in the dynamic environment, our proposed learning algorithms outperform state-of-the-art algorithms in single and multi-objective optimization, and generalize very well to significantly different environments. Specifically, with the long-term multi-objective learning algorithm, we demonstrate that by considering the long-term impact of decisions, as well as by incentivizing the agents with a system fairness reward, the agents achieve better results in both individual and system objectives, even when their objectives are private, randomized, and changing over time. Moreover, the agents show competitive behavior to maximize individual payoff when resources are scarce, and cooperative behavior in achieving a system objective when resources are abundant; they also learn the rules of the game, without prior knowledge, to overcome disadvantages in initial parameters (e.g., a lower budget).

To address practicality concerns, the thesis also provides several computational performance improvement methods, and tests the algorithm in a single-board computer. Results show the feasibility of online training and inference in milliseconds. 

This work suggests several topics for future research. 1) The interaction mechanism can be modified into a double auction, which eliminates the auctioneer and resembles a completely distributed ad hoc network. 2) The objectives are assumed to be independent in this thesis; a more realistic assumption may involve correlation between objectives, such as a hierarchy of objectives. 3) The current work limits information sharing between agents, a setup that befits applications with privacy requirements or sparse signaling; by allowing more information sharing between the agents, the algorithms can be modified for more cooperative scenarios such as robotics.
N2  - Die Verteilte Entscheidungsfindung untersucht Entscheidungen innerhalb einer Gruppe von interaktiven und eigennützigen Agenten. Diese Arbeit befasst sich insbesondere mit der optimalen Folge von Entscheidungen eines Agenten, der das Erreichen eines oder mehrerer Ziele in einer dynamischen Umgebung zu maximieren versucht. Die Optimierung einer verteilten Entscheidungsfindung ist in vielen alltäglichen Anwendungen relevant, z.B. zur Allokation von Ressourcen (Produkte, Energie, Bandbreite, Rechenressourcen etc.) und in der Robotik (heterogene Agenten-Kooperation in Spielen oder Aufträgen) in diversen Feldern wie Fahrzeugkommunikation, Internet of Things, Smart Grid, usw.
Diese Arbeit schlägt drei Multi-Agenten Reinforcement Learning Algorithmen kombiniert mit spieltheoretischen Ansätzen vor, um die strategische Interaktion zwischen Entscheidungsträgern zu untersuchen. Dies wird am Beispiel einer Ressourcenallokation in der Fahrzeug-zu-X-Kommunikation (vehicle-to-everything) gezeigt. Speziell wird in der Arbeit ein Interaktionsmechanismus entwickelt, der auf Basis einer Zweitpreisauktion den Agenten zur Maximierung mehrerer kurz- und langfristiger Ziele sowie individueller und Systemziele anregt. Dabei wird eine dynamische Umgebung mit realistischen Mobilitätsdaten simuliert, um die Leistungsfähigkeit des Algorithmus zu evaluieren und das Agentenverhalten zu untersuchen.

Eine theoretische Analyse zeigt, dass bei diesem Mechanismus das Nash-Gleichgewicht sowie eine Maximierung von Wohlfahrt und Pareto-optimaler Ressourcenallokation in einer statischen Umgebung vorliegen. Empirische Untersuchungen ergeben, dass in einer dynamischen Umgebung der vorgeschlagene Lernalgorithmus den aktuellen Stand der Technik bei ein- und mehrdimensionaler Optimierung übertrifft, und dabei sehr gut auch auf stark abweichende Umgebungen generalisiert werden kann.

Speziell mit dem langfristigen mehrdimensionalen Lernalgorithmus wird gezeigt, dass bei Berücksichtigung von langfristigen Auswirkungen von Entscheidungen, als auch durch einen Anreiz zur Systemgerechtigkeit, die Agenten in individuellen als auch Systemzielen bessere Ergebnisse liefern, und das auch, wenn ihre Ziele privat, zufällig und zeitveränderlich sind. Weiter zeigen die Agenten Wettbewerbsverhalten, um ihre eigenen Ziele zu maximieren, wenn die Ressourcen knapp sind, und kooperatives Verhalten, um Systemziele zu erreichen, wenn die Ressourcen ausreichend sind. Darüber hinaus lernen sie die Ziele des Spiels ohne vorheriges Wissen über dieses, um Startschwierigkeiten, wie z.B. ein niedrigeres Budget, zu überwinden.

Für die praktische Umsetzung zeigt diese Arbeit auch mehrere Methoden auf, welche die Rechenleistung verbessern können, und testet den Algorithmus auf einem handelsüblichen Einplatinencomputer. Die Ergebnisse zeigen die Durchführbarkeit von inkrementellem Lernen und Inferenz innerhalb weniger Millisekunden auf. Ausgehend von den Ergebnissen dieser Arbeit könnten sich verschiedene Forschungsfragen anschließen: 1) Der Interaktionsmechanismus kann zu einer Doppelauktion verändert und dabei der Auktionator entfernt werden. Dies würde einem vollständig verteilten Ad-Hoc-Netzwerk entsprechen. 2) Die Ziele werden in dieser Arbeit als unabhängig betrachtet. Es könnte eine Korrelation zwischen mehreren Zielen angenommen werden, so wie eine Zielhierarchie. 3) Die aktuelle Arbeit begrenzt den Informationsaustausch zwischen Agenten. Diese Annahme passt zu Anwendungen mit Anforderungen an den Schutz der Privatsphäre oder bei spärlichen Signalen. Indem der Informationsaustausch erhöht wird, könnte der Algorithmus auf stärker kooperative Anwendungen wie z.B. in der Robotik erweitert werden.
KW  - V2X
KW  - distributed systems
KW  - reinforcement learning
KW  - game theory
KW  - auction
KW  - decision making
KW  - behavioral sciences
KW  - multi-objective
KW  - V2X
KW  - Verteilte Systeme
KW  - Spieltheorie
KW  - Auktion
KW  - Entscheidungsfindung
KW  - Verhaltensforschung
KW  - verstärkendes Lernen
KW  - Multiziel
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-607000
ER  - 
TY  - GEN
A1  - Tala, Mahdi
A1  - Schrape, Oliver
A1  - Krstić, Miloš
A1  - Bertozzi, Davide
T1  - Exploring the Performance-Energy Optimization Space of a Bridge Between 3D-Stacked Electronic and Optical Networks-on-Chip
T2  - XXXIII Conference on Design of Circuits and Integrated Systems (DCIS)
N2  - The relentless improvement of silicon photonics is making optical interconnects and networks appealing for use in miniaturized systems, where electrical interconnects cannot keep up with the growing levels of core integration due to bandwidth density and power efficiency limitations. At the same time, solutions such as 3D stacking or 2.5D integration open the door to a fully dedicated process optimization for the photonic die. However, an architecture-level integration challenge arises between the electronic network and the optical one in such tightly integrated parallel systems. It consists of adapting signaling rates, matching the different levels of communication parallelism, handling cross-domain flow control, addressing re-synchronization concerns, and avoiding protocol-dependent deadlock. The associated energy and performance overhead may offset the inherent benefits of the emerging technology itself. This paper explores a hybrid CMOS-ECL bridge architecture between 3D-stacked technology-heterogeneous networks-on-chip (NoCs). The different ways of overcoming the serialization challenge (i.e., through an improvement of the signaling rate and/or through space-/wavelength-division multiplexing options) give rise to a configuration space that the paper explores, in search of the most energy-efficient configuration for high performance.
Y1  - 2018
SN  - 978-1-7281-0171-2
U6  - https://doi.org/10.1109/DCIS.2018.8681461
SN  - 2471-6170
SN  - 2640-5563
PB  - IEEE
CY  - New York
ER  - 
TY  - THES
A1  - Taeumel, Marcel
T1  - Data-driven tool construction in exploratory programming environments
T1  - Datengetriebener Werkzeugbau in explorativen Programmierumgebungen
N2  - This work presents a new design for programming environments that promote the exploration of domain-specific software artifacts and the construction of graphical tools for such program comprehension tasks. In complex software projects, tool building is essential because domain- or task-specific tools can support decision making by representing concerns concisely with low cognitive effort. In contrast, generic tools can only support anticipated scenarios, which usually align with programming language concepts or well-known project domains.

However, the creation and modification of interactive tools is expensive because the glue that connects data to graphics is hard to find, change, and test. Even if valuable data is available in a common format and even if promising visualizations could be populated, programmers have to invest many resources to make changes in the programming environment. Consequently, only ideas of predictably high value will be implemented. In the non-graphical, command-line world, the situation looks different and inspiring: programmers can easily build their own tools as shell scripts by configuring and combining filter programs to process data.

We propose a new perspective on graphical tools and provide a concept to build and modify such tools with a focus on high quality, low effort, and continuous adaptability. That is, (1) we propose an object-oriented, data-driven, declarative scripting language that reduces the amount of and governs the effects of glue code for view-model specifications, and (2) we propose a scalable UI-design language that promotes short feedback loops in an interactive, graphical environment such as Morphic known from Self or Squeak/Smalltalk systems.

We implemented our concept as a tool building environment, which we call VIVIDE, on top of Squeak/Smalltalk and Morphic. We replaced existing code browsing and debugging tools to iterate within our solution more quickly. In several case studies with undergraduate and graduate students, we observed that VIVIDE can be applied to many domains such as live language development, source-code versioning, modular code browsing, and multi-language debugging. Then, we designed a controlled experiment to measure the effect on the time to build tools. Several pilot runs showed that training is crucial and, presumably, takes days or weeks, which implies a need for further research.

As a result, programmers as users can directly work with tangible representations of their software artifacts in the VIVIDE environment. Tool builders can write domain-specific scripts to populate views to approach comprehension tasks from different angles. Our novel perspective on graphical tools can inspire the creation of new trade-offs in modularity for both data providers and view designers.
N2  - Diese Arbeit schlägt einen neuartigen Entwurf für Programmierumgebungen vor, welche den Umgang mit domänenspezifischen Software-Artefakten erleichtern und die Konstruktion von unterstützenden, grafischen Werkzeugen fördern. Werkzeugbau ist in komplexen Software-Projekten ein essentieller Bestandteil, weil spezifische, auf Domäne und Aufgabe angepasste, Werkzeuge relevante Themen und Konzepte klar darstellen und somit effizient zur Entscheidungsfindung beitragen können. Im Gegensatz dazu sind vorhandene, traditionelle Werkzeuge nur an allgemeinen, wiederkehrenden Anforderungen ausgerichtet, welche im Spezialfall Gedankengänge nur unzureichend abbilden können.

Leider sind das Erstellen und Anpassen von interaktiven Werkzeugen teuer, weil die Beschreibungen zwischen Information und Repräsentation nur schwer auffindbar, änderbar und prüfbar sind. Selbst wenn relevante Daten verfügbar und vielversprechende Visualisierungen konfigurierbar sind, müssten Programmierer viele Ressourcen für das Verändern ihrer Programmierumgebungen investieren. Folglich können nur Ideen von hohem Wert umgesetzt werden, um diese Kosten zu rechtfertigen. Dabei sieht die Situation in der textuellen Welt der Kommandozeile sehr vielversprechend aus. Dort können Programmierer einfach ihre Werkzeuge in Form von Skripten anpassen und kleine Filterprogramme kombinieren, um Daten zu verarbeiten.

Wir stellen eine neuartige Perspektive auf grafische Werkzeuge vor und vermitteln dafür ein Konzept, um diese Werkzeuge mit geringem Aufwand und in hoher Qualität zu konstruieren. Im Detail beinhaltet das, erstens, eine objekt-orientierte, daten-getriebene, deklarative Skriptsprache, um die Programmierschnittstelle zwischen Information und Repräsentation zu vereinfachen. Zweitens ist dies eine skalierbare Entwurfssprache für Nutzerschnittstellen, welche kurze Feedback-Schleifen und Interaktivität kombiniert, wie es in den Umgebungen Self oder Squeak/Smalltalk typisch ist.

Wir haben unser Konzept in Form einer neuartigen Umgebung für Werkzeugbau mit Hilfe von Squeak/Smalltalk und Morphic umgesetzt. Die Umgebung trägt den Namen VIVIDE. Damit konnten wir die bestehenden Werkzeuge von Squeak für Quelltextexploration und -ausführung ersetzen, um unsere Lösung kontinuierlich zu verbessern. In mehreren Fallstudien mit Studenten konnten wir beobachten, dass sich VIVIDE in vielen Domänen anwenden lässt: interaktive Entwicklung von Programmiersprachen, modulare Versionierung und Exploration von Quelltext und Fehleranalyse von mehrsprachigen Systemen. Mit Blick auf zukünftige Forschung haben wir ebenfalls ein kontrolliertes Experiment entworfen. Nach einigen Testläufen stellte sich die Trainingsphase von VIVIDE als größte, und somit offene, Herausforderung heraus.

Im Ergebnis sind wir davon überzeugt, dass Programmierer in VIVIDE direkt mit greifbaren, interaktiven Darstellungen relevanter Software-Artefakte arbeiten können. Im Rahmen des Werkzeugbaus können Programmierer kompakte, angepasste Skripte schreiben, die Visualisierungen konfigurieren, um Programmieraufgaben spezifisch aus mehreren Blickwinkeln zu betrachten. Unsere neuartige Perspektive auf grafische Werkzeuge kann damit sowohl das Bereitstellen von Informationen, als auch den Entwurf interaktiver Grafik positiv beeinflussen.
KW  - programming
KW  - tool building
KW  - user interaction
KW  - exploration
KW  - liveness
KW  - immediacy
KW  - direct manipulation
KW  - scripting languages
KW  - Squeak/Smalltalk
KW  - Programmieren
KW  - Werkzeugbau
KW  - Nutzerinteraktion
KW  - Exploration
KW  - Lebendigkeit
KW  - Direkte Manipulation
KW  - Skriptsprachen
KW  - Squeak/Smalltalk
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-444289
ER  - 
TY  - JOUR
A1  - Söchting, Maximilian
A1  - Trapp, Matthias
T1  - Controlling image-stylization techniques using eye tracking
JF  - Science and Technology Publications
N2  - With the spread of smart phones capable of taking high-resolution photos and the development of high-speed mobile data infrastructure, digital visual media is becoming one of the most important forms of modern communication. With this development, however, also comes a devaluation of images as a media form with the focus becoming the frequency at which visual content is generated instead of the quality of the content. In this work, an interactive system using image-abstraction techniques and an eye tracking sensor is presented, which allows users to experience diverting and dynamic artworks that react to their eye movement. The underlying modular architecture enables a variety of different interaction techniques that share common design principles, making the interface as intuitive as possible. The resulting experience allows users to experience a game-like interaction in which they aim for a reward, the artwork, while being held under constraints, e.g., not blinking. The conscious eye movements that are required by some interaction techniques hint at an interesting, possible future extension for this work into the field of relaxation exercises and concentration training.
KW  - Eye-tracking
KW  - Image Abstraction
KW  - Image Processing
KW  - Artistic Image Stylization
KW  - Interactive Media
Y1  - 2020
SN  - 2184-4321
PB  - Springer
CY  - Berlin
ER  - 
TY  - GEN
A1  - Söchting, Maximilian
A1  - Trapp, Matthias
T1  - Controlling image-stylization techniques using eye tracking
T2  - Postprints der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - With the spread of smart phones capable of taking high-resolution photos and the development of high-speed mobile data infrastructure, digital visual media is becoming one of the most important forms of modern communication. With this development, however, also comes a devaluation of images as a media form with the focus becoming the frequency at which visual content is generated instead of the quality of the content. In this work, an interactive system using image-abstraction techniques and an eye tracking sensor is presented, which allows users to experience diverting and dynamic artworks that react to their eye movement. The underlying modular architecture enables a variety of different interaction techniques that share common design principles, making the interface as intuitive as possible. The resulting experience allows users to experience a game-like interaction in which they aim for a reward, the artwork, while being held under constraints, e.g., not blinking. The conscious eye movements that are required by some interaction techniques hint at an interesting, possible future extension for this work into the field of relaxation exercises and concentration training.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 7 
KW  - eye-tracking
KW  - image abstraction
KW  - image processing
KW  - artistic image stylization
KW  - interactive media
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-524717
IS  - 7
ER  - 
TY  - GEN
A1  - Sukmana, Muhammad Ihsan Haikal
A1  - Torkura, Kennedy A.
A1  - Cheng, Feng
A1  - Meinel, Christoph
A1  - Graupner, Hendrik
T1  - Unified logging system for monitoring multiple cloud storage providers in cloud storage broker
T2  - 32nd International Conference on Information Networking (ICOIN)
N2  - With the increasing demand for personal and enterprise data storage services, a Cloud Storage Broker (CSB) provides cloud storage service using multiple Cloud Service Providers (CSPs) with guaranteed Quality of Service (QoS), such as data availability and security. However, monitoring cloud storage usage across multiple CSPs has become a challenge for the CSB because the lack of a standardized logging format for cloud services causes each CSP to implement its own format. In this paper, we propose a unified logging system that can be used by a CSB to monitor cloud storage usage across multiple CSPs. We gather cloud storage log files from three different CSPs and normalise these into our proposed log format, which can be used for further analysis. We show that our work enables a coherent view suitable for data navigation, monitoring, and analytics.
KW  - Unified logging system
KW  - Cloud Service Provider
KW  - cloud monitoring
KW  - data integration
KW  - security analytics
Y1  - 2018
SN  - 978-1-5386-2290-2
U6  - https://doi.org/10.1109/ICOIN.2018.8343081
SP  - 44
EP  - 49
PB  - IEEE
CY  - New York
ER  - 
TY  - THES
A1  - Sukmana, Muhammad Ihsan Haikal
T1  - Security improvements for enterprise file synchronization and sharing system
T1  - Sicherheitsverbesserungen für Enterprise File Synchronization und Sharing System
N2  - With the fast rise of cloud computing adoption in the past few years, more companies are migrating their confidential files from their private data centers to the cloud to support their digital transformation. Enterprise file synchronization and share (EFSS) is one of the solutions offered to enterprises for storing their files in the cloud with secure and easy file sharing and collaboration between their employees. However, the rapidly increasing number of cyberattacks on the cloud means that a company's files in the cloud may be stolen or leaked to the public. It is then the responsibility of the EFSS system to ensure that the company's confidential files are accessible only to authorized employees.
CloudRAID is a secure personal cloud storage research collaboration project that provides data availability and confidentiality in the cloud. It combines erasure and cryptographic techniques to securely store files as multiple encrypted file chunks in various cloud service providers (CSPs). However, several aspects of CloudRAID's concept are unsuitable for secure and scalable enterprise cloud storage solutions, particularly the key management system, location-based access control, multi-cloud storage management, and cloud file access monitoring.
This Ph.D. thesis focuses on CloudRAID for Business (CfB), as it resolves four main challenges of CloudRAID's concept for a secure and scalable EFSS system. First, the key management system is implemented using an attribute-based encryption scheme to provide secure and scalable intra-company and inter-company file-sharing functionalities. Second, an Internet-based location file access control functionality is introduced to ensure that files can only be accessed at pre-determined trusted locations. Third, a unified multi-cloud storage resource management framework is utilized to securely manage the cloud storage resources available in various CSPs for authorized CfB stakeholders. Lastly, a multi-cloud storage monitoring system is introduced to monitor the activities of files in the cloud using the cloud storage log files generated by multiple CSPs.
In summary, this thesis helps the CfB system provide holistic security for a company's confidential files at the cloud, system, and file levels, ensuring that only the authorized company and its employees can access the files.
N2  - Mit der raschen Verbreitung von Cloud Computing in den letzten Jahren verlagern immer mehr Unternehmen ihre vertraulichen Dateien von ihren privaten Rechenzentren in die Cloud, um den digitalen Transformationsprozess des Unternehmens zu unterstützen. Enterprise File Synchronization and Share (EFSS) ist eine der Lösungen, die Unternehmen angeboten werden, um ihre Dateien in der Cloud zu speichern und so eine sichere und einfache gemeinsame Nutzung von Dateien und die Zusammenarbeit zwischen den Mitarbeitern zu ermöglichen. Die schnell wachsende Zahl von Cyberangriffen auf die Cloud kann jedoch dazu führen, dass die in der Cloud gespeicherten Unternehmensdateien gestohlen werden oder an die Öffentlichkeit gelangen. Es liegt dann in der Verantwortung des EFSS-Systems, sicherzustellen, dass die vertraulichen Dateien des Unternehmens nur für autorisierte Mitarbeiter zugänglich sind.
CloudRAID ist ein Forschungsprojekt für sichere persönliche Cloud-Speicher, das die Verfügbarkeit und Vertraulichkeit von Daten in der Cloud gewährleistet. Es kombiniert Lösch- und Verschlüsselungstechniken, um Dateien in Form von mehreren verschlüsselten Datei-Blöcken bei verschiedenen Cloud-Service-Providern (CSPs) sicher zu speichern. Mehrere Aspekte des CloudRAID-Konzepts sind jedoch für sichere und skalierbare Cloud-Speicherlösungen für Unternehmen ungeeignet, insbesondere das Schlüsselverwaltungssystem, die standortbasierte Zugriffskontrolle, die Verwaltung mehrerer Cloud-Speicher und die Überwachung des Zugriffs auf Cloud-Dateien.
Diese Doktorarbeit konzentriert sich auf CloudRAID for Business (CfB), da es die vier wichtigsten Herausforderungen des CloudRAID-Konzepts für ein sicheres und skalierbares EFSS-System löst. Erstens wird das Verwaltungssystem der kryptografischen Schlüssel unter Verwendung des attributbasierten Verschlüsselungsschemas implementiert, um sichere und skalierbare unternehmensinterne und -übergreifende Dateifreigabefunktionen bereitzustellen. Zweitens wird eine internetbasierte Dateizugriffskontrolle eingeführt, um sicherzustellen, dass der Zugriff auf Dateien nur an vorher festgelegten vertrauenswürdigen Standorten möglich ist. Drittens wird ein einheitlicher Rahmen für die Verwaltung von Multi-Cloud-Speicherressourcen verwendet, um die in verschiedenen CSPs verfügbaren Cloud-Speicherressourcen für autorisierte CfB-Akteure sicher zu verwalten. Schließlich wird ein Multi-Cloud-Storage-Monitoring-System eingeführt, um die Aktivitäten von Dateien in der Cloud anhand der von mehreren CSPs generierten Cloud-Storage-Protokolldateien zu überwachen.
Zusammenfassend lässt sich sagen, dass diese Arbeit dem CfB-System hilft, ganzheitliche Sicherheit für vertrauliche Unternehmensdateien auf Cloud-, System- und Dateiebene zu bieten, um sicherzustellen, dass nur autorisierte Unternehmen und ihre Mitarbeiter auf die Dateien zugreifen können.
KW  - CloudRAID
KW  - CloudRAID for Business
KW  - Cloud Computing
KW  - Cybersecurity
KW  - Cryptography
KW  - Access Control
KW  - Enterprise File Synchronization and Share
KW  - Zugriffskontrolle
KW  - Cloud Computing
KW  - CloudRAID
KW  - CloudRAID for Business
KW  - Kryptografie
KW  - Cybersicherheit
KW  - Unternehmensdateien synchronisieren und teilen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-549996
ER  - 
TY  - JOUR
A1  - Stojanovic, Vladeta
A1  - Trapp, Matthias
A1  - Richter, Rico
A1  - Döllner, Jürgen Roland Friedrich
T1  - Service-oriented semantic enrichment of indoor point clouds using octree-based multiview classification
JF  - Graphical Models
N2  - The use of Building Information Modeling (BIM) for Facility Management (FM) in the Operation and Maintenance (O&M) stages of the building life-cycle is intended to bridge the gap between operations and digital data, but lacks the functionality of assessing the state of the built environment due to non-automated generation of associated semantics. 3D point clouds can be used to capture the physical state of the built environment, but also lack these associated semantics. A prototypical implementation of a service-oriented architecture for classification of indoor point cloud scenes of office environments is presented, using multiview classification. The multiview classification approach is tested using a retrained Convolutional Neural Network (CNN) model - Inception V3. The presented approach for classifying common office furniture objects (chairs, sofas and desks), contained in 3D point cloud scans, is tested and evaluated. The results show that the presented approach can classify common office furniture up to an acceptable degree of accuracy, and is suitable for quick and robust semantics approximation - based on RGB (red, green and blue color channel) cubemap images of the octree partitioned areas of the 3D point cloud scan. Additional methods for web-based 3D visualization, editing and annotation of point clouds are also discussed. Using the described approach, captured scans of indoor environments can be semantically enriched using object annotations derived from multiview classification results. Furthermore, the presented approach is suited for semantic enrichment of lower resolution indoor point clouds acquired using commodity mobile devices.
KW  - Semantic enrichment
KW  - 3D point clouds
KW  - Multiview classification
KW  - Service-oriented
KW  - Indoor environments
Y1  - 2019
U6  - https://doi.org/10.1016/j.gmod.2019.101039
SN  - 1524-0703
SN  - 1524-0711
VL  - 105
PB  - Elsevier
CY  - San Diego
ER  - 
TY  - THES
A1  - Stojanovic, Vladeta
T1  - Digital twins for indoor built environments
T1  - Digitale Zwillinge für gebaute Innenumgebungen
N2  - One of the key challenges in modern Facility Management (FM) is to digitally reflect the current state of the built environment, referred to as-is or as-built versus as-designed representation. While the use of Building Information Modeling (BIM) can address the issue of digital representation, the generation and maintenance of BIM data requires a considerable amount of manual work and domain expertise. Another key challenge is being able to monitor the current state of the built environment, which is used to provide feedback and enhance decision making. The need for an integrated solution for all data associated with the operational life cycle of a building is becoming more pronounced as practices from Industry 4.0 are currently being evaluated and adopted for FM use. This research presents an approach for digital representation of indoor environments in their current state within the life cycle of a given building. Such an approach requires the fusion of various sources of digital data. The key to solving such a complex issue of digital data integration, processing and representation is with the use of a Digital Twin (DT). A DT is a digital duplicate of the physical environment, states, and processes. A DT fuses as-designed and as-built digital representations of built environment with as-is data, typically in the form of floorplans, point clouds and BIMs, with additional information layers pertaining to the current and predicted states of an indoor environment or a complete building (e.g., sensor data). The design, implementation and initial testing of prototypical DT software services for indoor environments is presented and described. These DT software services are implemented within a service-oriented paradigm, and their feasibility is presented through functioning and tested key software components within prototypical Service-Oriented System (SOS) implementations. 
The main outcome of this research shows that key data related to the built environment can be semantically enriched and combined to enable digital representations of indoor environments, based on the concept of a DT. Furthermore, the outcomes of this research show that digital data, related to FM and Architecture, Construction, Engineering, Owner and Occupant (AECOO) activity, can be combined, analyzed and visualized in real-time using a service-oriented approach. This has great potential to benefit decision making related to Operation and Maintenance (O&M) procedures within the scope of the post-construction life cycle stages of typical office buildings.
N2  - Eine der wichtigsten Herausforderungen im modernen Facility Management (FM) besteht darin, den aktuellen Zustand der gebauten Umgebung digital wiederzugeben und die tatsächliche mit der geplanten Gebäudedarstellung zu vergleichen. Während die Verwendung von Building Information Modeling (BIM) das Problem der digitalen Darstellung lösen kann, erfordert die Generierung und Pflege von BIM-Daten einen erheblichen manuellen Aufwand und Fachkenntnisse. Eine weitere wichtige Herausforderung besteht darin, den aktuellen Zustand der gebauten Umgebung zu überwachen, um Feedback zu geben und die Entscheidungsfindung zu verbessern. Die Notwendigkeit einer integrierten Lösung für alle Daten im Zusammenhang mit dem Betriebslebenszyklus eines Gebäudes wird immer deutlicher, da derzeit Praktiken aus Industrie 4.0 evaluiert und für die FM-Nutzung übernommen werden. Diese Studie präsentiert einen Ansatz zur digitalen Darstellung von Innenräumen in ihrem aktuellen Zustand innerhalb des Lebenszyklus eines bestimmten Gebäudes. Ein solcher Ansatz erfordert die Fusion verschiedener Quellen digitaler Daten. Der Schlüssel zur Lösung eines solch komplexen Problems der Integration, Verarbeitung und Darstellung digitaler Daten liegt in der Verwendung eines Digital Twin (DT). Ein DT ist ein digitales Duplikat der physischen Umgebung, Zustände und Prozesse. Ein DT verschmilzt die entworfenen und gebauten digitalen Darstellungen der gebauten Umwelt mit aktuellen Repräsentationsdaten, typischerweise in Form von Grundrissen, Punktwolken und BIMs, mit zusätzlichen Informationsebenen, die sich auf die aktuellen und vorhergesagten Zustände einer Innenumgebung oder eines kompletten Gebäudes beziehen (z.B. Sensordaten). Das Design, die Implementierung und die ersten Tests prototypischer DT-Software-Dienstleistungen für Innenräume werden vorgestellt und beschrieben.
Die DT-Software-Dienstleistungen werden innerhalb eines serviceorientierten Paradigmas implementiert, und ihre Machbarkeit wird durch funktionierende und getestete wichtige Softwarekomponenten in prototypischen SOS-Implementierungen dargestellt. Das Hauptergebnis dieser Forschung zeigt, dass Schlüsseldaten in Bezug auf die gebaute Umgebung semantisch angereichert und kombiniert werden können, um digitale Darstellungen von Innenumgebungen basierend auf dem Konzept eines DT zu ermöglichen. Darüber hinaus zeigen die Ergebnisse dieser Forschung, dass digitale Daten in Bezug auf FM und Architektur, Bauwesen, Ingenieurwesen, Eigentümer- und Insassenaktivitäten mithilfe eines serviceorientierten Ansatzes in Echtzeit kombiniert, analysiert und visualisiert werden können. Dies hat ein großes Potenzial für die Entscheidungsfindung in Bezug auf Betriebs- und Wartungsverfahren im Rahmen der Lebenszyklusphasen typischer Bürogebäude nach dem Bau.
KW  - Digital Twin
KW  - BIM
KW  - Point Clouds
KW  - Service-Oriented Systems
KW  - 3D Visualization
KW  - Data Analytics
KW  - Machine Learning
KW  - Deep Learning
KW  - Semantic Enrichment
KW  - Indoor Point Clouds
KW  - Real Estate 4.0
KW  - Facility Management
KW  - Building Management
KW  - Sensor Analytics
KW  - Visualization
KW  - 3D-Visualisierung
KW  - Gebäudeinformationsmodellierung
KW  - Gebäudemanagement
KW  - Daten-Analytik
KW  - Tiefes Lernen
KW  - Digitaler Zwilling
KW  - Indoor-Punktwolken
KW  - Maschinelles Lernen
KW  - Punktwolken
KW  - Immobilien 4.0
KW  - Semantische Anreicherung
KW  - Sensor-Analytik
KW  - Service-Orientierte Systeme
KW  - Visualisierung
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-509134
ER  - 
TY  - JOUR
A1  - Steinbeck, Hendrik
A1  - Meinel, Christoph
ED  - Meinel, Christoph
ED  - Schweiger, Stefanie
ED  - Staubitz, Thomas
ED  - Conrad, Robert
ED  - Alario Hoyos, Carlos
ED  - Ebner, Martin
ED  - Sancassani, Susanna
ED  - Żur, Agnieszka
ED  - Friedl, Christian
ED  - Halawa, Sherif
ED  - Gamage, Dilrukshi
ED  - Scott, Jeffrey
ED  - Carlon, May Kristine Jonson
ED  - Deville, Yves
ED  - Gaebel, Michael
ED  - Delgado Kloos, Carlos
ED  - von Schmieden, Karen
T1  - What makes an educational video?
BT  - deconstructing characteristics of video production styles for MOOCs
JF  - EMOOCs 2023 : Post-Covid Prospects for Massive Open Online Courses - Boost or Backlash?
N2  - In an effort to describe and produce different formats for video instruction, the research community in technology-enhanced learning, and MOOC scholars in particular, have focused on the general style of video production: whether it is a digitally scripted “talk-and-chalk” or a “talking head” version of a learning unit. Since these production styles include various sub-elements, this paper deconstructs the inherited elements of video production in the context of educational live-streams. Using over 700 videos – both from synchronous and asynchronous modalities of large video-based platforms (YouTube and Twitch), 92 features were found in eight categories of video production. These include commonly analyzed features such as the use of green screen and a visible instructor, but also less studied features such as social media connections and changing camera perspective depending on the topic being covered. Overall, the research results enable an analysis of common video production styles and a toolbox for categorizing new formats – independent of their final (a)synchronous use in MOOCs. Keywords: video production, MOOC video styles, live-streaming.
KW  - Digitale Bildung
KW  - Kursdesign
KW  - MOOC
KW  - Micro Degree
KW  - Online-Lehre
KW  - Onlinekurs
KW  - Onlinekurs-Produktion
KW  - digital education
KW  - e-learning
KW  - micro degree
KW  - micro-credential
KW  - online course creation
KW  - online course design
KW  - online teaching
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-622086
SP  - 47
EP  - 58
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - GEN
A1  - Staubitz, Thomas
A1  - Teusner, Ralf
A1  - Meinel, Christoph
T1  - MOOCs in Secondary Education
BT  - Experiments and Observations from German Classrooms
T2  - 2019 IEEE Global Engineering Education Conference (EDUCON)
N2  - Computer science education in German schools is often less than optimal. It is only mandatory in a few of the federal states and there is a lack of qualified teachers. As a MOOC (Massive Open Online Course) provider with a German background, we developed the idea to implement a MOOC addressing pupils in secondary schools to fill this gap. The course targeted high school pupils and enabled them to learn the Python programming language. In 2014, we successfully conducted the first iteration of this MOOC with more than 7000 participants. However, the share of pupils in the course was not quite satisfactory. So we conducted several workshops with teachers to find out why they had not used the course to the extent that we had imagined. The paper at hand explores and discusses the steps we have taken in the following years as a result of these workshops.
KW  - MOOC
KW  - Secondary Education
KW  - School
KW  - Teamwork
KW  - K-12
KW  - Programming course
KW  - Java
KW  - Python
Y1  - 2019
SN  - 978-1-5386-9506-7
U6  - https://doi.org/10.1109/EDUCON.2019.8725138
SN  - 2165-9567
SP  - 173
EP  - 182
PB  - IEEE
CY  - New York
ER  - 
TY  - JOUR
A1  - Staubitz, Thomas
A1  - Serth, Sebastian
A1  - Thomas, Max
A1  - Ebner, Martin
A1  - Koschutnig-Ebner, Markus
A1  - Rampelt, Florian
A1  - von Stetten, Alexander
A1  - Wittke, Andreas
ED  - Meinel, Christoph
ED  - Schweiger, Stefanie
ED  - Staubitz, Thomas
ED  - Conrad, Robert
ED  - Alario Hoyos, Carlos
ED  - Ebner, Martin
ED  - Sancassani, Susanna
ED  - Żur, Agnieszka
ED  - Friedl, Christian
ED  - Halawa, Sherif
ED  - Gamage, Dilrukshi
ED  - Scott, Jeffrey
ED  - Carlon, May Kristine Jonson
ED  - Deville, Yves
ED  - Gaebel, Michael
ED  - Delgado Kloos, Carlos
ED  - von Schmieden, Karen
T1  - A metastandard for the international exchange of MOOCs
BT  - the MOOChub as first prototype
JF  - EMOOCs 2023 : Post-Covid Prospects for Massive Open Online Courses - Boost or Backlash?
N2  - The MOOChub is a joint web-based catalog of all relevant German and Austrian MOOC platforms that lists well over 750 Massive Open Online Courses (MOOCs). Automatically building such a catalog requires that all partners describe and publicly offer the metadata of their courses in the same way. The paper at hand presents the genesis of the idea to establish a common metadata standard and the story of its subsequent development. The result of this effort is, first, an open-licensed de facto standard, which is based on existing commonly used standards, and second, a first prototypical platform that uses this standard: the MOOChub, which lists all courses of the involved partners. This catalog is searchable and provides a more comprehensive overview of essentially all MOOCs that are offered by German and Austrian MOOC platforms. Finally, the upcoming developments to further optimize the catalog and the metadata standard are reported.
KW  - Digitale Bildung
KW  - Kursdesign
KW  - MOOC
KW  - Micro Degree
KW  - Online-Lehre
KW  - Onlinekurs
KW  - Onlinekurs-Produktion
KW  - digital education
KW  - e-learning
KW  - micro degree
KW  - micro-credential
KW  - online course creation
KW  - online course design
KW  - online teaching
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-624154
SP  - 147
EP  - 161
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - GEN
A1  - Staubitz, Thomas
A1  - Meinel, Christoph
T1  - Graded Team Assignments in MOOCs
BT  - Effects of Team Composition and Further Factors on Team Dropout Rates and Performance
T2  - SCALE
N2  - The ability to work in teams is an important skill in today's work environments. In MOOCs, however, team work, team tasks, and graded team-based assignments play only a marginal role. To close this gap, we have been exploring ways to integrate graded team-based assignments in MOOCs. Some goals of our work are to determine simple criteria to match teams in a volatile environment and to enable a frictionless online collaboration for the participants within our MOOC platform. The high dropout rates in MOOCs pose particular challenges for team work in this context. By now, we have conducted 15 MOOCs containing graded team-based assignments in a variety of topics. The paper at hand presents a study that aims to establish a solid understanding of the participants in the team tasks. Furthermore, we attempt to determine which team compositions are particularly successful. Finally, we examine how several modifications to our platform's collaborative toolset have affected the dropout rates and performance of the teams.
KW  - Teamwork
KW  - MOOCs
KW  - Team-based Learning
KW  - Team Assessment
KW  - Peer Assessment
KW  - Project-based learning
Y1  - 2019
SN  - 978-1-4503-6804-9
U6  - https://doi.org/10.1145/3330430.3333619
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - GEN
A1  - Staubitz, Thomas
A1  - Meinel, Christoph
T1  - Collaborative Learning in MOOCs - Approaches and Experiments
T2  - 2018 IEEE Frontiers in Education (FIE) Conference
N2  - This Research-to-Practice paper examines the practical application of various forms of collaborative learning in MOOCs. Since 2012, about 60 MOOCs in the wider context of Information Technology and Computer Science have been conducted on our self-developed MOOC platform. The platform is also used by several customers, who either run their own platform instances or use our white label platform. We, as well as some of our partners, have experimented with different approaches to collaborative learning in these courses. Based on the results of early experiments, surveys amongst our participants, and requests by our business partners, we have integrated several options for offering forms of collaborative learning into the system. The results of our experiments are directly fed back into the platform development, allowing us to fine-tune existing tools and to add new ones where necessary. In the paper at hand, we discuss the benefits and disadvantages of decisions in the design of a MOOC with regard to the various forms of collaborative learning. While the focus of the paper at hand is on forms of large group collaboration, two types of small group collaboration on our platforms are briefly introduced.
KW  - MOOC
KW  - Collaborative learning
KW  - Peer assessment
KW  - Team based assignment
KW  - Teamwork
Y1  - 2018
SN  - 978-1-5386-1174-6
SN  - 0190-5848
PB  - IEEE
CY  - New York
ER  - 
TY  - THES
A1  - Staubitz, Thomas
T1  - Gradable team assignments in large scale learning environments
BT  - collaborative learning, teamwork, and peer assessment in MOOCs
BT  - Kollaboratives Lernen, Teamarbeit und Peer Assessment in MOOCs
N2  - Lifelong learning plays an increasingly important role in many societies. Technology is changing faster than ever, and what is important to learn today may be obsolete tomorrow. The role of informal programs is becoming increasingly important. In particular, Massive Open Online Courses have become popular among learners and instructors. In 2008, a group of Canadian education enthusiasts started the first Massive Open Online Courses, or MOOCs, to prove their cognitive theory of Connectivism. Around 2012, a variety of American start-ups redefined the concept of MOOCs. Instead of following the connectivist doctrine, they returned to a more traditional approach. They focused on video lecturing and combined this with a course forum that allowed the participants to discuss with each other and the teaching team. While this new version of the concept was enormously successful in terms of massiveness (hundreds of thousands of participants from all over the world joined the first of these courses), many educators criticized the relapse to the cognitivist model. In the early days, the evolving platforms often did not have more features than a video player, simple multiple-choice quizzes, and the course forum. It soon became a major interest of research to allow the scaling of more modern approaches of learning and teaching for the massiveness of these courses. Hands-on exercises, alternative forms of assessment, collaboration, and teamwork are some of the topics on the agenda. The insights provided by cognitive and pedagogical theories, however, do not necessarily always run in sync with the needs and the preferences of the majority of participants. While the former promote action-learning, hands-on-learning, competence-based-learning, project-based-learning, and team-based-learning as the holy grail, many of the latter often prefer a more laid-back style of learning, sometimes referred to as edutainment.
Obviously, given the large numbers of participants in these courses, there is not just one type of learner. Participants are not a homogeneous mass but a potpourri of individuals with a wildly heterogeneous mix of backgrounds, previous knowledge, familial and professional circumstances, countries of origin, gender, age, and so on. For the majority of participants, a full-time job and/or a family often just does not leave enough room for more time-intensive tasks, such as practical exercises or teamwork. Others, however, particularly enjoy these hands-on or collaborative aspects of MOOCs. Furthermore, many subjects particularly require these possibilities and simply cannot be taught or learned in courses that lack collaborative or hands-on features. In this context, the thesis discusses how team assignments have been implemented on the HPI MOOC platform. In recent years, several experiments have been conducted and a great amount of experience has been gained by employing team assignments in courses in areas such as Object-Oriented Programming, Design Thinking, and Business Innovation on various instances of this platform: openHPI, openSAP, and mooc.house.
N2  - In einer Zeit stetigen Wandels und immer schneller wechselnder Technologien nimmt das lebenslange Lernen einen immer höheren Stellenwert ein. Massive Open Online Courses (MOOCs) sind ein hervorragendes Werkzeug, um in kurzer Zeit und mit vergleichsweise wenig Aufwand breite Teile der Bevölkerung zu erreichen. Das HPI leistet mit der eigenen Plattform openHPI und den für diverse Partner betriebenen Plattformen openSAP, OpenWHO und mooc.house sowohl im deutschsprachigen Raum als auch international einen wichtigen Beitrag zur digitalen Aufklärung. In vielen Bereichen ist die Plattform State of the Art und ist den international bekannteren Plattformen zumindest ebenbürtig. Gerade bei der Entwicklung und Anwendung von neuen Lehr- und Lernmethoden und deren technischer Unterstützung ist openHPI auch international richtungsweisend. Die vorliegende Dissertation befasst sich mit den Möglichkeiten der technischen und didaktischen Unterstützung von bewertbaren Aufgabenstellungen in MOOCs, die im Team zu bearbeiten sind. Durch die Größe der Kurse (in der Regel steht hier ein kleines Teaching Team mehreren tausend Teilnehmern gegenüber) ist eine manuelle Bewertung der Teilnehmenden durch die Lehrenden nicht möglich. Hier wird eine der alternativen Möglichkeiten zur Bewertung von Aufgaben, das sogenannte Peer Assessment, eingesetzt und für die speziellen Gegebenheiten der Bearbeitung von Aufgaben im Team angepasst. In den vergangenen fünf Jahren wurde eine iterative Langzeitstudie durchgeführt, bei der verschiedene qualitative und quantitative Methoden der Auswertung eingesetzt wurden. Das Ergebnis dieser Forschungsarbeit ist eine tiefgehende Einsicht in die Mechanismen der Teamarbeit in skalierenden digitalen Lernplattformen sowie eine Reihe von Empfehlungen zur weiteren Verbesserung der kollaborativen Eigenschaften der HPI-Plattformen, die zum Teil bereits umgesetzt wurden bzw. gerade umgesetzt werden.
T2  - Benotete Teamaufgaben in skalierenden E-Learning-Systemen
KW  - massive open online courses
KW  - MOOC
KW  - collaborative learning
KW  - online learning
KW  - teamwork
KW  - peer assessment
KW  - digital learning
KW  - eLearning
KW  - collaborative work
KW  - kollaboratives Lernen
KW  - kollaboratives Arbeiten
KW  - digitales Lernen
KW  - MOOC
KW  - Massive Open Online Courses
KW  - Online-Lernen
KW  - Peer Assessment
KW  - Teamarbeit
KW  - eLearning
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-471830
ER  - 
TY  - GEN
A1  - Sianipar, Johannes Harungguan
A1  - Sukmana, Muhammad Ihsan Haikal
A1  - Meinel, Christoph
T1  - Moving sensitive data against live memory dumping, spectre and meltdown attacks
T2  - 26th International Conference on Systems Engineering (ICSEng)
N2  - The emergence of cloud computing allows users to easily host their Virtual Machines (VMs) with no up-front investment and with the guarantee of availability anytime and anywhere. However, when a VM is hosted outside the user's premises, the user loses physical control of the VM, as it could be running on untrusted host machines in the cloud. A malicious host administrator could launch live memory dumping, Spectre, or Meltdown attacks in order to extract sensitive information from the VM's memory, e.g., passwords or cryptographic keys of applications running in the VM. In this paper, inspired by the moving target defense (MTD) scheme, we propose a novel approach to increase the security of an application's sensitive data in the VM by continuously moving the sensitive data among several memory allocations (blocks) in Random Access Memory (RAM). A movement function is added to the application source code so that it runs concurrently with the application's main function. Our approach could reduce the possibility of the VM's sensitive data in memory being leaked into a memory dump file by 25% and secure the sensitive data from Spectre and Meltdown attacks. The overhead of our approach depends on the number and the size of the sensitive data.
KW  - Virtual Machine
KW  - Memory Dumping
KW  - Security
KW  - Cloud Computing
KW  - Spectre
KW  - Meltdown
Y1  - 2019
SN  - 978-1-5386-7834-3
PB  - IEEE
CY  - New York
ER  - 
TY  - THES
A1  - Sianipar, Johannes Harungguan
T1  - Towards scalable and secure virtual laboratory for cybersecurity e-learning
N2  - A distance education or e-Learning platform should be able to provide a virtual laboratory that lets participants gain hands-on experience by practicing their skills remotely. This holds especially for cybersecurity e-Learning, where participants need to be able to attack or defend an IT system. For hands-on exercises, the virtual laboratory environment must be similar to the real operational environment, where an attacker or a victim is represented by a node in the virtual laboratory environment. A node is usually represented by a Virtual Machine (VM). Scalability has become a primary issue in virtual laboratories for cybersecurity e-Learning because a VM needs a significant and fixed allocation of resources. Available resources limit the number of simultaneous users. Scalability can be increased by using the available resources more efficiently and by providing more resources. Increasing scalability means increasing the number of simultaneous users.
In this thesis, we propose two approaches to increase the efficiency of using the available resources. The first approach in increasing efficiency is by replacing virtual machines (VMs) with containers whenever it is possible. The second approach is sharing the load with the user-on-premise machine, where the user-on-premise machine represents one of the nodes in a virtual laboratory scenario. We also propose two approaches in providing more resources. One way to provide more resources is by using public cloud services. Another way to provide more resources is by gathering resources from the crowd, which is referred to as Crowdresourcing Virtual Laboratory (CRVL).
In CRVL, the crowd can contribute their unused resources in the form of a VM, a bare metal system, an account in a public cloud, a private cloud, or an isolated group of VMs, but in this thesis, we focus on a VM. The contributor must give the credentials of the VM admin or root user to the CRVL system. We propose an architecture and methods to automatically integrate VMs into the CRVL system or remove them from it. A team placement algorithm must also be investigated to optimize the usage of resources while at the same time giving the best service to the user. Because the CRVL system does not manage the contributor's host machine, the CRVL system must be able to make sure that the VM integration will not harm the contributor's system and that the training material will be stored securely on the contributor's side, so that no one is able to take the training material away without permission. We are investigating ways to handle these kinds of threats.
We propose three approaches to harden the VM against a malicious host admin. To verify the integrity of a VM before its integration into the CRVL system, we propose a remote verification method that does not require any additional hardware such as a Trusted Platform Module chip. As the owner of the host machine, the host admin could gain access to the VM's data in Random Access Memory (RAM) through live memory dumping, Spectre, and Meltdown attacks. To make it harder for a malicious host admin to obtain sensitive data from RAM, we propose a method that continually moves sensitive data in RAM. We also propose a method to monitor the host machine by installing an agent on it. The agent monitors the hypervisor configurations and the host admin activities.
To evaluate our approaches, we conduct extensive experiments with different settings. The use case in our approach is Tele-Lab, a Virtual Laboratory platform for Cyber Security e-Learning. We use this platform as a basis for designing and developing our approaches. The results show that our approaches are practical and provide enhanced security.
N2  - Die Fernunterrichts- oder E-Learning-Plattform sollte ein virtuelles Labor bieten, in dem die Teilnehmer praktische Übungserfahrungen sammeln können, um ihre Fähigkeiten aus der Ferne zu üben. Insbesondere im Bereich Cybersicherheit E-Learning, wo die Teilnehmer in der Lage sein müssen, das IT-System anzugreifen oder zu verteidigen. Um eine praktische Übung durchzuführen, muss die virtuelle Laborumgebung der realen Betriebsumgebung ähnlich sein, in der ein Angriff oder ein Opfer durch einen Knoten in einer virtuellen Laborumgebung repräsentiert wird. Ein Knoten wird normalerweise durch eine virtuelle Maschine (VM) repräsentiert. Die Skalierbarkeit ist zu einem Hauptproblem des virtuellen Labors für E-Learning im Bereich Cybersicherheit geworden, da für eine VM eine erhebliche und feste Zuweisung von Ressourcen erforderlich ist. Die verfügbaren Ressourcen begrenzen die Anzahl der gleichzeitigen Benutzer. Die Skalierbarkeit kann erhöht werden, indem die verfügbaren Ressourcen effizienter genutzt und mehr Ressourcen bereitgestellt werden. Die Erhöhung der Skalierbarkeit bedeutet die Erhöhung der Anzahl gleichzeitiger Benutzer.
In dieser Arbeit schlagen wir zwei Ansätze vor, um die Effizienz der Nutzung der verfügbaren Ressourcen zu erhöhen. Der erste Ansatz zur Erhöhung der Effizienz besteht darin, virtuelle Maschinen (VMs) durch Container zu ersetzen, wann immer dies möglich ist. Der zweite Ansatz besteht darin, die Last auf den Benutzer vor-ort-maschine zu verteilen, wobei der Benutzer vor-ort-maschine einen der Knoten in einem virtuellen Laborszenario repräsentiert. Wir schlagen auch zwei Ansätze vor, um mehr Ressourcen bereitzustellen. Eine Möglichkeit, mehr Ressourcen bereitzustellen, ist die Nutzung von Public Cloud Services. Eine andere Möglichkeit, mehr Ressourcen bereitzustellen, besteht darin, Ressourcen aus der Menge zu sammeln, die als Crowd-Resourcing Virtual Laboratory (CRVL) bezeichnet wird.
In CRVL kann die Menge ihre ungenutzten Ressourcen in Form einer VM, eines Bare-Metal-Systems, eines Accounts in einer Public Cloud, einer Private Cloud und einer isolierten Gruppe von VMs einbringen, aber in dieser Arbeit konzentrieren wir uns auf eine VM. Der Mitwirkende muss dem CRVL-System den Berechtigungsnachweis des VM-Administrators oder des Root-Benutzers geben. Wir schlagen eine Architektur und Methoden vor, um VMs automatisch in das CRVL-System zu integrieren oder daraus zu entfernen. Ein Team-Placement-Algorithmus muss ebenfalls untersucht werden, um die Ressourcennutzung zu optimieren und gleichzeitig den besten Service für den Benutzer zu bieten. Da das CRVL-System den Beitragsgeber-Hostcomputer nicht verwaltet, muss das CRVL-System in der Lage sein, sicherzustellen, dass die VM-Integration ihr System nicht beeinträchtigt und das Schulungsmaterial sicher auf den Beitragsgeberseiten aufbewahrt wird, damit niemand das Trainingsmaterial ohne Erlaubnis wegnehmen kann. Wir untersuchen Möglichkeiten, um mit dieser Art von Bedrohungen umzugehen.
Wir schlagen drei Ansätze vor, um die VM von einem bösartigen Host-Administrator zu stärken. Um die Integrität einer VM vor der Integration in das CRVL-System zu überprüfen, schlagen wir eine Remote-Verifikationsmethode ohne zusätzliche Hardware wie den Trusted Platform Module-Chip vor. Als Besitzer des Host-Rechners können die Host-Administratoren über Random Access Memory (RAM) auf die Daten der VM zugreifen, indem sie Live Memory Dumping, Spectre- und Meltdown-Angriffe durchführen. Um es dem bösartigen Host-Administrator zu erschweren, die sensiblen Daten aus dem RAM zu erhalten, schlagen wir eine Methode vor, die kontinuierlich sensible Daten im RAM bewegt. Wir schlagen auch eine Methode zur Überwachung des Host-Rechners vor, indem ein Agent darauf installiert wird. Der Agent überwacht die Hypervisor-Konfigurationen und die Aktivitäten des Host-Administrators. Um unsere Ansätze zu bewerten, führen wir umfangreiche Experimente mit unterschiedlichen Einstellungen durch. Der Anwendungsfall in unserem Ansatz ist Tele-Lab, eine virtuelle Laborplattform für Cybersicherheit E-Learning. Wir nutzen diese Plattform als Grundlage für die Gestaltung und Entwicklung unserer Ansätze. Die Ergebnisse zeigen, dass unsere Ansätze praktisch sind und mehr Sicherheit bieten.
KW  - Virtual Laboratory
KW  - Cybersecurity e-Learning
KW  - Scalability
KW  - Crowd Resourcing
KW  - Crowd Resourcing
KW  - Cybersicherheit E-Learning
KW  - Skalierbarkeit
KW  - Virtuelles Labor
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-502793
ER  - 
TY  - JOUR
A1  - Shi, Feng
A1  - Schirneck, Friedrich Martin
A1  - Friedrich, Tobias
A1  - Kötzing, Timo
A1  - Neumann, Frank
T1  - Reoptimization time analysis of evolutionary algorithms on linear functions under dynamic uniform constraints
JF  - Algorithmica : an international journal in computer science
N2  - Rigorous runtime analysis is a major approach towards understanding evolutionary computing techniques, and in this area linear pseudo-Boolean objective functions play a central role. Having an additional linear constraint is then equivalent to the NP-hard Knapsack problem; certain classes thereof have been studied in recent works. In this article, we present a dynamic model of optimizing linear functions under uniform constraints. Starting from an optimal solution with respect to a given constraint bound, we investigate the runtimes that different evolutionary algorithms need to recompute an optimal solution when the constraint bound changes by a certain amount. The classical (1+1) EA and several population-based algorithms are designed for that purpose, and are shown to recompute efficiently. Furthermore, a variant of the (1+(λ,λ))GA for the dynamic optimization problem is studied, whose performance is better when the change of the constraint bound is small.
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-605295
SN  - 0178-4617
SN  - 1432-0541
VL  - 82
IS  - 10
SP  - 3117
EP  - 3123
PB  - Springer
CY  - New York
ER  - 
TY  - THES
A1  - Shekhar, Sumit
T1  - Image and video processing based on intrinsic attributes
N2  - Advancements in computer vision techniques driven by machine learning have facilitated robust and efficient estimation of attributes such as depth, optical flow, albedo, and shading. To encapsulate all such underlying properties associated with images and videos, we evolve the concept of intrinsic images towards intrinsic attributes. Further, rapid hardware growth in the form of high-quality smartphone cameras, readily available depth sensors, mobile GPUs, or dedicated neural processing units have made image and video processing pervasive. In this thesis, we explore the synergies between the above two advancements and propose novel image and video processing techniques and systems based on them. To begin with, we investigate intrinsic image decomposition approaches and analyze how they can be implemented on mobile devices. We propose an approach that considers not only diffuse reflection but also specular reflection; it allows us to decompose an image into specularity, albedo, and shading on a resource constrained system (e.g., smartphones or tablets) using the depth data provided by the built-in depth sensors. In addition, we explore how on-device depth data can further be used to add an immersive dimension to 2D photos, e.g., showcasing parallax effects via 3D photography. In this regard, we develop a novel system for interactive 3D photo generation and stylization on mobile devices. Further, we investigate how adaptive manipulation of baseline-albedo (i.e., chromaticity) can be used for efficient visual enhancement under low-lighting conditions. The proposed technique allows for interactive editing of enhancement settings while achieving improved quality and performance. We analyze the inherent optical flow and temporal noise as intrinsic properties of a video. We further propose two new techniques for applying the above intrinsic attributes for the purpose of consistent video filtering. 
To this end, we investigate how to remove temporal inconsistencies perceived as flickering artifacts. One of the techniques does not require costly optical flow estimation, while both provide interactive consistency control. Using intrinsic attributes for image and video processing enables new solutions for mobile devices – a pervasive visual computing device – and will facilitate novel applications for Augmented Reality (AR), 3D photography, and video stylization. The proposed low-light enhancement techniques can also improve the accuracy of high-level computer vision tasks (e.g., face detection) under low-light conditions. Finally, our approach for consistent video filtering can extend a wide range of image-based processing for videos.
N2  - Fortschritte im Bereich der Computer-Vision-Techniken, die durch Maschinelles Lernen vorangetrieben werden, haben eine robuste und effiziente Schätzung von Attributen wie Tiefe, optischer Fluss, Albedo, und Schattierung ermöglicht. Um all diese zugrundeliegenden Eigenschaften von Bildern und Videos zu erfassen, entwickeln wir das Konzept der intrinsischen Bilder zu intrinsischen Attributen weiter. Darüber hinaus hat die rasante Entwicklung der Hardware in Form von hochwertigen Smartphone-Kameras, leicht verfügbaren Tiefensensoren, mobilen GPUs, oder speziellen neuronalen Verarbeitungseinheiten die Bild- und Videoverarbeitung allgegenwärtig gemacht. In dieser Arbeit erforschen wir die Synergien zwischen den beiden oben genannten Fortschritten und schlagen neue Bild- und Videoverarbeitungstechniken und -systeme vor, die auf ihnen basieren. Zunächst untersuchen wir intrinsische Bildzerlegungsansätze und analysieren, wie sie auf mobilen Geräten implementiert werden können. Wir schlagen einen Ansatz vor, der nicht nur die diffuse Reflexion, sondern auch die spiegelnde Reflexion berücksichtigt; er ermöglicht es uns, ein Bild auf einem ressourcenbeschränkten System (z. B. Smartphones oder Tablets) unter Verwendung der von den eingebauten Tiefensensoren bereitgestellten Tiefendaten in Spiegelung, Albedo und Schattierung zu zerlegen. Darüber hinaus erforschen wir, wie geräteinterne Tiefendaten genutzt werden können, um 2D-Fotos eine immersive Dimension hinzuzufügen, z. B. um Parallaxen-Effekte durch 3D-Fotografie darzustellen. In diesem Zusammenhang entwickeln wir ein neuartiges System zur interaktiven 3D-Fotoerstellung und -Stylisierung auf mobilen Geräten. Darüber hinaus untersuchen wir, wie eine adaptive Manipulation der Grundlinie-Albedo (d.h. der Farbintensität) für eine effiziente visuelle Verbesserung bei schlechten Lichtverhältnissen genutzt werden kann. 
Die vorgeschlagene Technik ermöglicht die interaktive Bearbeitung von Verbesserungseinstellungen bei verbesserter Qualität und Leistung. Wir analysieren den inhärenten optischen Fluss und die zeitliche Konsistenz als intrinsische Eigenschaften eines Videos. Darüber hinaus schlagen wir zwei neue Techniken zur Anwendung der oben genannten intrinsischen Attribute zum Zweck der konsistenten Videofilterung vor. Zu diesem Zweck untersuchen wir, wie zeitliche Inkonsistenzen, die als Flackerartefakte wahrgenommen werden, entfernt werden können. Eine der Techniken erfordert keine kostspielige optische Flussschätzung, während beide eine interaktive Konsistenzkontrolle bieten. Die Verwendung intrinsischer Attribute für die Bild- und Videoverarbeitung ermöglicht neue Lösungen für mobile Geräte - ein visuelles Computergerät, das aufgrund seiner weltweiten Verbreitung von großer Bedeutung ist - und wird neuartige Anwendungen für Augmented Reality (AR), 3D-Fotografie und Videostylisierung ermöglichen. Die vorgeschlagenen Low-Light-Enhancement-Techniken können auch die Genauigkeit von High-Level-Computer-Vision-Aufgaben (z. B. Objekt-Tracking) unter schlechten Lichtverhältnissen verbessern. Schließlich kann unser Ansatz zur konsistenten Videofilterung eine breite Palette von bildbasierten Verarbeitungen für Videos erweitern.
KW  - image processing
KW  - image-based rendering
KW  - non-photorealistic rendering
KW  - image stylization
KW  - computational photography
KW  - Bildverarbeitung
KW  - bildbasiertes Rendering
KW  - Non-photorealistic Rendering
KW  - Computational Photography
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-620049
ER  - 
TY  - GEN
A1  - Shaabani, Nuhad
A1  - Meinel, Christoph
T1  - Improving the efficiency of inclusion dependency detection
T2  - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
N2  - The detection of all inclusion dependencies (INDs) in an unknown dataset is at the core of any data profiling effort. Apart from the discovery of foreign key relationships, INDs can help perform data integration, integrity checking, schema (re-)design, and query optimization. With the advent of Big Data, the demand increases for efficient IND discovery algorithms that can scale with the input data size. To this end, we propose S-INDD++ as a scalable system for detecting unary INDs in large datasets. S-INDD++ applies a new stepwise partitioning technique that helps discard a large number of attributes in early phases of the detection by processing the first partitions of smaller sizes. S-INDD++ also extends the concept of attribute clustering to decide which attributes to discard based on the clustering result of each partition. Moreover, in contrast to the state-of-the-art, S-INDD++ does not require the partition to fit into the main memory, which is a highly desirable property in the face of ever-growing datasets. We conducted an exhaustive evaluation of S-INDD++ by applying it to large datasets with thousands of attributes and more than 266 million tuples. The results show the clear superiority of S-INDD++ over the state-of-the-art. S-INDD++ reduced the runtime by up to 50 % in comparison with BINDER, and by up to 98 % in comparison with S-INDD.
KW  - Algorithms
KW  - Data partitioning
KW  - Data profiling
KW  - Data mining
Y1  - 2018
SN  - 978-1-4503-6014-2
U6  - https://doi.org/10.1145/3269206.3271724
SP  - 207
EP  - 216
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - GEN
A1  - Serth, Sebastian
A1  - Staubitz, Thomas
A1  - van Elten, Martin
A1  - Meinel, Christoph
ED  - Gamage, Dilrukshi
T1  - Measuring the effects of course modularizations in online courses for life-long learners
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - Many participants in Massive Open Online Courses are full-time employees seeking greater flexibility in their time commitment and the available learning paths. We recently addressed these requirements by splitting up our 6-week courses into three 2-week modules followed by a separate exam. Modularizing courses offers many advantages: Shorter modules are more sustainable and can be combined, reused, and incorporated into learning paths more easily. Time flexibility for learners is also improved as exams can now be offered multiple times per year, while the learning content is available independently. In this article, we answer the question of what impact this modularization has on key learning metrics, such as course completion rates, learning success, and no-show rates. Furthermore, we investigate the influence of longer breaks between modules on these metrics. According to our analysis, course modules facilitate more selective learning behaviors that encourage learners to focus on the topics they are most interested in. At the same time, participation in overarching exams across all modules seems to be less appealing compared to an integrated exam of a 6-week course. While breaks between the modules increase the distinctive appearance of individual modules, a break before the final exam further reduces initial interest in the exams. We further reveal that participation in self-paced courses as a preparation for the final exam is unlikely to attract new learners to the course offerings, even though learners' performance is comparable to instructor-paced courses. The results of our long-term study on course modularization provide a solid foundation for future research and enable educators to make informed decisions about the design of their courses.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 17 
KW  - Massive Open Online Course (MOOC)
KW  - course design
KW  - modularization
KW  - learning path
KW  - flexibility
KW  - e-learning
KW  - assignments
KW  - self-paced learning
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-589182
IS  - 17
ER  - 
TY  - JOUR
A1  - Serth, Sebastian
A1  - Staubitz, Thomas
A1  - van Elten, Martin
A1  - Meinel, Christoph
ED  - Gamage, Dilrukshi
T1  - Measuring the effects of course modularizations in online courses for life-long learners
JF  - Frontiers in Education
N2  - Many participants in Massive Open Online Courses are full-time employees seeking greater flexibility in their time commitment and the available learning paths. We recently addressed these requirements by splitting up our 6-week courses into three 2-week modules followed by a separate exam. Modularizing courses offers many advantages: Shorter modules are more sustainable and can be combined, reused, and incorporated into learning paths more easily. Time flexibility for learners is also improved as exams can now be offered multiple times per year, while the learning content is available independently. In this article, we answer the question of what impact this modularization has on key learning metrics, such as course completion rates, learning success, and no-show rates. Furthermore, we investigate the influence of longer breaks between modules on these metrics. According to our analysis, course modules facilitate more selective learning behaviors that encourage learners to focus on the topics they are most interested in. At the same time, participation in overarching exams across all modules seems to be less appealing compared to an integrated exam of a 6-week course. While breaks between the modules increase the distinctive appearance of individual modules, a break before the final exam further reduces initial interest in the exams. We further reveal that participation in self-paced courses as a preparation for the final exam is unlikely to attract new learners to the course offerings, even though learners' performance is comparable to instructor-paced courses. The results of our long-term study on course modularization provide a solid foundation for future research and enable educators to make informed decisions about the design of their courses.
KW  - Massive Open Online Course (MOOC)
KW  - course design
KW  - modularization
KW  - learning path
KW  - flexibility
KW  - e-learning
KW  - assignments
KW  - self-paced learning
Y1  - 2022
U6  - https://doi.org/10.3389/feduc.2022.1008545
SN  - 2504-284X
VL  - 7
PB  - Frontiers
CY  - Lausanne, Schweiz
ER  - 
TY  - BOOK
A1  - Seitz, Klara
A1  - Lincke, Jens
A1  - Rein, Patrick
A1  - Hirschfeld, Robert
T1  - Language and tool support for 3D crochet patterns
BT  - virtual crochet with a graph structure
N2  - Crochet is a popular handcraft all over the world. While other techniques such as knitting or weaving have received technical support over the years through machines, crochet is still a purely manual craft. Not only the act of crocheting itself is manual, but also the process of creating instructions for new crochet patterns, which is barely supported by domain-specific digital solutions. This leads to unstructured and often ambiguous and erroneous pattern instructions. In this report, we propose a concept to digitally represent crochet patterns. This format incorporates crochet techniques, which allows domain-specific support for crochet pattern designers during the pattern creation and instruction writing process. As contributions, we present a thorough domain analysis, the concept of a graph structure used as a domain-specific language to specify crochet patterns, and a prototype of a projectional editor that uses the graph as the representation format of patterns and a diagramming system to visualize them in 2D and 3D. By analyzing the domain, we learned about crochet techniques and pain points of designers in their pattern creation workflow. These insights are the basis on which we defined the pattern representation. To evaluate our concept, we built a prototype that demonstrates the feasibility of the concept, and we tested the software with professional crochet designers, who approved of the concept.
N2  - Häkeln ist eine weltweit verbreitete Handarbeitskunst. Obwohl andere Techniken wie Stricken und Weben über die Zeit maschinelle Unterstützung erhalten haben, ist Häkeln noch heute ein komplett manueller Vorgang. Nicht nur das Häkeln an sich, sondern auch der Prozess zur Anleitungserstellung von neuen Häkeldesigns ist kaum unterstützt mit digitalen Lösungen. In dieser Arbeit stellen wir ein Konzept vor, das Häkelanleitungen digital repräsentiert. Das entwickelte Format integriert Häkeltechniken, wodurch wir den Prozess des Anleitungsschreibens für Designer spezifisch für die Häkeldomäne unterstützen können. Als Beiträge analysieren wir umfassend die Häkeldomäne, entwickeln ein Konzept zur Repräsentation von Häkelanleitungen basierend auf einer Graphenstruktur als domänenspezifische Sprache und implementieren einen projektionalen Editor, der auf der besagten Graphenstruktur aufbaut und weiterhin die erstellten Anleitungen als schematische Darstellung in 2D und 3D visualisiert. Durch die Analyse der Domäne lernen wir Häkeltechniken und Schwachstellen beim Ablauf des Anleitungserstellens kennen. Basierend auf diesen Erkenntnissen entwickeln wir das digitale Format, um Anleitungen zu repräsentieren. Für die Evaluierung unseres Konzepts haben wir einen Prototypen implementiert, der die Machbarkeit demonstriert. Zudem haben wir die Software von professionellen Häkeldesignern testen lassen, die unsere Herangehensweise gutheißen.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 137 
KW  - crochet
KW  - visual language
KW  - tools
KW  - computer-aided design
KW  - Häkeln
KW  - visuelle Sprache
KW  - Werkzeuge
KW  - rechnerunterstütztes Konstruieren
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-492530
SN  - 978-3-86956-505-7
SN  - 1613-5652
SN  - 2191-1665
IS  - 137
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Seidel, Karen
T1  - Modelling binary classification with computability theory
T1  - Binäre Klassifikation modellieren mit Berechenbarkeitstheorie
N2  - We investigate models for incremental binary classification, an example of supervised online learning. Our starting point is a model for human and machine learning suggested by E. M. Gold.
In the first part, we consider incremental learning algorithms that use all of the available binary labeled training data in order to compute the current hypothesis. For this model, we observe that the algorithm can be assumed to always terminate and that the distribution of the training data does not influence learnability. This is still true if we pose additional delayable requirements that remain valid despite a hypothesis output delayed in time. Additionally, we consider the non-delayable requirement of consistent learning. Our corresponding results underpin the claim that delayability is a suitable structural property to describe and collectively investigate a major part of learning success criteria. Our first theorem states the pairwise implications or incomparabilities between an established collection of delayable learning success criteria, the so-called complete map. In particular, the learning algorithm can be assumed to only change its last hypothesis in case it is inconsistent with the current training data. Such a learning behaviour is called conservative.
By referring to results on learning functions, we obtain a hierarchy of approximative learning success criteria. Here, we allow an increasing finite number of errors in the concept hypothesized by the learning algorithm compared with the concept to be learned. Moreover, we observe a duality depending on whether vacillations between infinitely many different correct hypotheses are still considered a successful learning behaviour. This contrasts with the vacillatory hierarchy for learning from solely positive information.
We also consider a hypothesis space located between the two most common hypothesis space types in the relevant literature and provide the complete map.
In the second part, we model more efficient learning algorithms. These update their hypothesis based on the current datum and without direct recourse to past training data. We focus on iterative (hypothesis-based) and BMS (state-based) learning algorithms. Iterative learning algorithms use the last hypothesis and the current datum in order to infer the new hypothesis.
Past research analyzed, for example, the above-mentioned pairwise relations between delayable learning success criteria when learning from purely positive training data. We compare delayable learning success criteria with respect to iterative learning algorithms, as well as learning from either exclusively positive or binary labeled data. The existence of concept classes that can be learned by an iterative learning algorithm but not in a conservative way had already been observed, showing that conservativeness is restrictive. An additional requirement arising from cognitive science research is U-shapedness, stating that the learning algorithm diverges from a correct hypothesis. We show that forbidding U-shapes also restricts iterative learners from binary labeled data.
In order to compute the next hypothesis, BMS learning algorithms refer to the currently observed datum and the actual state of the learning algorithm. For learning algorithms equipped with infinitely many states, we provide the complete map. A learning success criterion is semantic if it still holds when the learning algorithm outputs other parameters standing for the same classifier. Syntactic (non-semantic) learning success criteria, for example conservativeness and syntactic non-U-shapedness, restrict BMS learning algorithms. For proving the equivalence of the syntactic requirements, we refer to witness-based learning processes. In these, every change of the hypothesis is justified by a later correctly classified witness from the training data. Moreover, for every semantic delayable learning requirement, iterative and BMS learning algorithms are equivalent. In case the considered learning success criterion incorporates syntactic non-U-shapedness, BMS learning algorithms can learn more concept classes than iterative learning algorithms.
The proofs are combinatorial, are inspired by the study of formal languages, or employ results from computability theory, such as infinite recursion theorems (fixed point theorems).
N2  - Wir untersuchen Modelle für inkrementelle binäre Klassifikation, ein Beispiel für überwachtes online Lernen. Den Ausgangspunkt bildet ein Modell für menschliches und maschinelles Lernen von E.M.Gold.
Im ersten Teil untersuchen wir inkrementelle Lernalgorithmen, welche zur Berechnung der Hypothesen jeweils die gesamten binär gelabelten Trainingsdaten heranziehen. Bezogen auf dieses Modell können wir annehmen, dass der Lernalgorithmus stets terminiert und die Verteilung der Trainingsdaten die grundsätzliche Lernbarkeit nicht beeinflusst. Dies bleibt bestehen, wenn wir zusätzliche Anforderungen an einen erfolgreichen Lernprozess stellen, die bei einer zeitlich verzögerten Ausgabe von Hypothesen weiterhin zutreffen. Weiterhin untersuchen wir nicht verzögerbare konsistente Lernprozesse. Unsere Ergebnisse bekräftigen die Behauptung, dass Verzögerbarkeit eine geeignete strukturelle Eigenschaft ist, um einen Großteil der Lernerfolgskriterien zu beschreiben und gesammelt zu untersuchen. Unser erstes Theorem klärt für dieses Modell die paarweisen Implikationen oder Unvergleichbarkeiten innerhalb einer etablierten Auswahl verzögerbarer Lernerfolgskriterien auf. Insbesondere können wir annehmen, dass der inkrementelle Lernalgorithmus seine Hypothese nur dann verändert, wenn die aktuellen Trainingsdaten der letzten Hypothese widersprechen. Ein solches Lernverhalten wird als konservativ bezeichnet. 
Ausgehend von Resultaten über Funktionenlernen erhalten wir eine strikte Hierarchie von approximativen Lernerfolgskriterien. Hierbei wird eine aufsteigende endliche Zahl von Anomalien (Fehlern) des durch den Lernalgorithmus vorgeschlagenen Konzepts im Vergleich zum Lernziel erlaubt. Weiterhin ergibt sich eine Dualität abhängig davon, ob das Oszillieren zwischen korrekten Hypothesen als erfolgreiches Lernen angesehen wird. Dies steht im Gegensatz zur oszillierenden Hierarchie, wenn der Lernalgorithmus von ausschließlich positiven Daten lernt.
Auch betrachten wir einen Hypothesenraum, der einen Kompromiss zwischen den beiden am häufigsten in der naheliegenden Literatur vertretenen Arten von Hypothesenräumen darstellt.
Im zweiten Teil modellieren wir effizientere Lernalgorithmen. Diese aktualisieren ihre Hypothese ausgehend vom aktuellen Datum, jedoch ohne Zugriff auf die zurückliegenden Trainingsdaten. Wir konzentrieren uns auf iterative (hypothesenbasierte) und BMS (zustandsbasierte) Lernalgorithmen. Iterative Lernalgorithmen nutzen ihre letzte Hypothese und das aktuelle Datum, um die neue Hypothese zu berechnen.
Die bisherige Forschung klärt beispielsweise die oben erwähnten paarweisen Vergleiche zwischen den verzögerbaren Lernerfolgskriterien, wenn von ausschließlich positiven Trainingsdaten gelernt wird. Wir vergleichen verzögerbare Lernerfolgskriterien bezogen auf iterative Lernalgorithmen, sowie das Lernen von ausschließlich positiven oder binär gelabelten Daten. Bereits bekannt war die Existenz von Konzeptklassen, die von einem iterativen Lernalgorithmus gelernt werden können, jedoch nicht auf eine konservative Weise. U-shapedness ist ein in den Kognitionswissenschaften beobachtetes Phänomen, demzufolge der Lerner im Lernprozess von einer bereits korrekten Hypothese divergiert. Wir zeigen, dass iterative Lernalgorithmen auch durch das Verbieten von U-Shapes eingeschränkt werden.
Zur Berechnung der nächsten Hypothese nutzen BMS-Lernalgorithmen ergänzend zum aktuellen Datum den aktuellen Zustand des Lernalgorithmus. Für Lernalgorithmen, die über unendlich viele mögliche Zustände verfügen, leiten wir alle paarweisen Implikationen oder Unvergleichbarkeiten innerhalb der etablierten Auswahl verzögerbarer Lernerfolgskriterien her. Ein Lernerfolgskriterium ist semantisch, wenn es weiterhin gilt, falls im Lernprozess andere Parameter ausgegeben werden, die jeweils für die gleichen Klassifikatoren stehen. Syntaktische (nicht-semantische) Lernerfolgskriterien, beispielsweise Konservativität und syntaktische Non-U-Shapedness, schränken BMS-Lernalgorithmen ein. Um die Äquivalenz der syntaktischen Lernerfolgskriterien zu zeigen, betrachten wir witness-based Lernprozesse. In diesen wird jeder Hypothesenwechsel durch einen später korrekt klassifizierten Zeugen in den Trainingsdaten gerechtfertigt. Weiterhin sind iterative und BMS-Lernalgorithmen für die semantischen verzögerbaren Lernerfolgskriterien jeweils äquivalent. Ist syntaktische Non-U-Shapedness Teil des Lernerfolgskriteriums, sind BMS-Lernalgorithmen mächtiger als iterative Lernalgorithmen.
Die Beweise sind kombinatorisch, angelehnt an Untersuchungen zu formalen Sprachen oder nutzen Resultate aus dem Gebiet der Berechenbarkeitstheorie, beispielsweise unendliche Rekursionstheoreme (Fixpunktsätze).
KW  - Binary Classification
KW  - Recursion
KW  - U-Shaped-Learning
KW  - Simulation
KW  - Binäre Klassifikation
KW  - Rekursion
KW  - U-Förmiges Lernen
KW  - Simulation
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-529988
ER  - 
TY  - BOOK
A1  - Schwarzer, Ingo
A1  - Weiß-Saoumi, Said
A1  - Kittel, Roland
A1  - Friedrich, Tobias
A1  - Kaynak, Koraltan
A1  - Durak, Cemil
A1  - Isbarn, Andreas
A1  - Diestel, Jörg
A1  - Knittel, Jens
A1  - Franz, Marquart
A1  - Morra, Carlos
A1  - Stahnke, Susanne
A1  - Braband, Jens
A1  - Dittmann, Johannes
A1  - Griebel, Stephan
A1  - Krampf, Andreas
A1  - Link, Martin
A1  - Müller, Matthias
A1  - Radestock, Jens
A1  - Strub, Leo
A1  - Bleeke, Kai
A1  - Jehl, Leander
A1  - Kapitza, Rüdiger
A1  - Messadi, Ines
A1  - Schmidt, Stefan
A1  - Schwarz-Rüsch, Signe
A1  - Pirl, Lukas
A1  - Schmid, Robert
A1  - Friedenberger, Dirk
A1  - Beilharz, Jossekin Jakob
A1  - Boockmeyer, Arne
A1  - Polze, Andreas
A1  - Röhrig, Ralf
A1  - Schäbe, Hendrik
A1  - Thiermann, Ricky
T1  - RailChain
BT  - Abschlussbericht
N2  - The RailChain project designed, implemented, and experimentally evaluated a juridical recorder that is based on a distributed consensus protocol. This juridical blockchain recorder has been realized as a distributed ledger on board the advanced TrainLab (ICE-TD 605 017) of Deutsche Bahn.
For the project, a consortium consisting of DB Systel, Siemens, Siemens Mobility, the Hasso Plattner Institute for Digital Engineering, Technische Universität Braunschweig, TÜV Rheinland InterTraffic, and Spherity has been formed. These partners not only concentrated competencies in railway operation, computer science, regulation, and approval, but also combined experiences from industry, research from academia, and enthusiasm from startups.
Distributed ledger technologies (DLTs) define distributed databases and express a digital protocol for transactions between business partners without the need for a trusted intermediary. The implementation of a blockchain with real-time requirements for the local network of a railway system (e.g., interlocking or train) makes it possible to log data in the distributed system verifiably in real-time. For this, railway-specific assumptions can be leveraged to make modifications to standard blockchain protocols.
EULYNX and OCORA (Open CCS On-board Reference Architecture) are parts of a future European reference architecture for control command and signalling (CCS, Reference CCS Architecture – RCA). Both architectural concepts outline heterogeneous IT systems with components from multiple manufacturers. Such systems introduce novel challenges for the approved and safety-relevant CCS of railways, which have so far been considered neither for trackside nor for on-board systems. Logging implementations, such as the common juridical recorder on vehicles, can no longer be realized as a central component of a single manufacturer. All centralized approaches are in question.
The research project RailChain is funded by the mFUND program and gives practical evidence that distributed consensus protocols are a suitable means to immutably (for legal purposes) store state information of many system components from multiple manufacturers. The results of RailChain have been published, prototypically implemented, and experimentally evaluated in large-scale field tests on the advanced TrainLab. At the same time, the project showed how RailChain can be integrated into the trackside and on-board architecture given by OCORA and EULYNX.
Logged data can now be analysed sooner, and its trustworthiness is increased. This enables, e.g., auditable predictive maintenance, because it is ensured that data is authentic and unmodified at any point in time.
N2  - Das Projekt RailChain hat einen verteilten Juridical Recorder entworfen, implementiert und experimentell evaluiert, der auf einem echtzeitfähigen verteilten Konsensprotokoll basiert. Dieser Juridical Blockchain Recorder wurde als distributed ledger an Bord des advanced TrainLabs der Deutschen Bahn (ICE-TD 605 017) umgesetzt.
Für das Projekt hat sich ein Konsortium aus DB Systel, Siemens, Siemens Mobility, dem Hasso-Plattner-Institut für Digital Engineering, der Technischen Universität Braunschweig, sowie TÜV Rheinland InterTraffic und Spherity formiert und dabei Kompetenzen aus den Bereichen Bahnbetrieb, Informatik und Zulassungswesen gebündelt. Die Partner kombinieren Erfahrungen aus der Industrie und die akademische Forschung mit der Aufbruchstimmung aus dem Start-Up-Umfeld.
Distributed-Ledger-Technologien (DLTs) definieren verteilte Datenbanken und stellen ein digitales Protokoll für Transaktionen zwischen Geschäftspartnern dar, ohne dass ein Mittelsmann beteiligt sein müsste. Die Implementierung einer Blockchain mit Echtzeitanforderungen für das lokale Netzwerk einer Eisenbahnanlage (z. B. Stellwerk oder Zug) erlaubt es, die im verteilten System entstehenden Daten nachweislich in Echtzeit zu protokollieren. Dabei können eisenbahnspezifische Randbedingungen ausgenutzt werden, um Standard-Blockchain-Protokolle anzupassen.
EULYNX und OCORA (Open CCS On-board Reference Architecture) sind Bestandteile einer zukünftigen europäischen Referenzarchitektur für das Leit- und Sicherungssystem (Reference CCS Architecture – RCA, Control Command and Signalling – CCS). Beide Architekturkonzepte skizzieren herstellerübergreifende, komponentenbasierende heterogene IT-Systeme. Solche Systeme bergen neue Herausforderungen, die bislang im Kontext der zugelassenen, sicherheitsrelevanten Leit- und Sicherungstechnik der Bahn weder strecken- noch fahrzeugseitig adressiert werden mussten. Logbuch-Implementierungen, wie der gängige Juridical Recorder auf Fahrzeugen, können nun nicht mehr als zentrale Systemkomponente eines einzelnen Herstellers umgesetzt werden. Alle zentralisierten Lösungsansätze sind in Frage gestellt.
Das mFUND-geförderte Forschungsprojekt erbringt den praktischen Nachweis, dass Zustandsinformationen über eine Vielzahl von Systemkomponenten herstellerübergreifend und gerichtsfest mittels verteilten Konsensprotokollen gespeichert werden können. Ergebnisse von RailChain wurden publiziert, prototypisch implementiert und in großen Feldtests auf dem advanced TrainLab experimentell evaluiert. Gleichzeitig wurde aufgezeigt, wie sich RailChain in den mit OCORA und EULYNX vorgegebenen fahrzeug- und streckenseitigen Architekturentwurf integrieren lässt.
Daten können dadurch zeitnaher ausgewertet werden und gleichzeitig wird ihre Vertrauenswürdigkeit erhöht. Dies ermöglicht u. a. nachvollziehbare zustandsorientierte Wartung, denn es kann jederzeit sichergestellt werden, dass die Daten authentisch sind und auch nicht verändert wurden.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 152 
KW  - Distributed-Ledger-Technologie (DLT)
KW  - juridical recording
KW  - Konsensprotokolle
KW  - consensus protocols
KW  - Digitalisierung
KW  - digitalization
KW  - Bahnwesen
KW  - railways
KW  - Blockchain
KW  - asset management
KW  - selbstbestimmte Identitäten
KW  - self-sovereign identity
KW  - dezentrale Identitäten
KW  - decentral identities
KW  - überprüfbare Nachweise
KW  - verifiable credentials
KW  - Echtzeit
KW  - real-time
KW  - Standardisierung
KW  - standardization
KW  - Verlässlichkeit
KW  - dependability
KW  - Fehlertoleranz
KW  - fault tolerance
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-577409
SN  - 978-3-86956-550-7
SN  - 1613-5652
SN  - 2191-1665
IS  - 152
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Schneider, Sven
A1  - Maximova, Maria
A1  - Sakizloglou, Lucas
A1  - Giese, Holger
T1  - Formal testing of timed graph transformation systems using metric temporal graph logic
JF  - International journal on software tools for technology transfer
N2  - Embedded real-time systems generate state sequences where time elapses between state changes. Ensuring that such systems adhere to a provided specification of admissible or desired behavior is essential. Formal model-based testing is often a suitable cost-effective approach. We introduce an extended version of the formalism of symbolic graphs, which encompasses types as well as attributes, for representing states of dynamic systems. Relying on this extension of symbolic graphs, we present a novel formalism of timed graph transformation systems (TGTSs) that supports the model-based development of dynamic real-time systems at an abstract level where possible state changes and delays are specified by graph transformation rules. We then introduce an extended form of the metric temporal graph logic (MTGL) with increased expressiveness to improve the applicability of MTGL for the specification of timed graph sequences generated by a TGTS. Based on the metric temporal operators of MTGL and its built-in graph binding mechanics, we express properties on the structure and attributes of graphs as well as on the occurrence of graphs over time that are related by their inner structure. We provide formal support for checking whether a single generated timed graph sequence adheres to a provided MTGL specification. Relying on this logical foundation, we develop a testing framework for TGTSs that are specified using MTGL. Lastly, we apply this testing framework to a running example by using our prototypical implementation in the tool AutoGraph.
KW  - formal testing
KW  - typed attributed symbolic graphs
KW  - timed graph
KW  - transformation
KW  - graph conditions
KW  - metric temporal graph logic
Y1  - 2021
U6  - https://doi.org/10.1007/s10009-020-00585-w
SN  - 1433-2779
SN  - 1433-2787
VL  - 23
IS  - 3
SP  - 411
EP  - 488
PB  - Springer
CY  - Heidelberg
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Maximova, Maria
A1  - Giese, Holger
T1  - Invariant Analysis for Multi-Agent Graph Transformation Systems using k-Induction
N2  - The analysis of behavioral models such as Graph Transformation Systems (GTSs) is of central importance in model-driven engineering. However, GTSs often result in intractably large or even infinite state spaces and may be equipped with multiple or even infinitely many start graphs. To mitigate these problems, static analysis techniques based on finite symbolic representations of sets of states or paths thereof have been devised. We focus on the technique of k-induction for establishing invariants specified using graph conditions. To this end, k-induction generates symbolic paths backwards from a symbolic state representing a violation of a candidate invariant to gather information on how that violation could have been reached, possibly obtaining contradictions to assumed invariants. However, GTSs where multiple agents regularly perform actions independently from each other cannot currently be analyzed using this technique, as the independence among backward steps may prevent the gathering of relevant knowledge altogether.

In this paper, we extend k-induction to GTSs with multiple agents, thereby supporting a wide range of additional GTSs. As a running example, we consider an unbounded number of shuttles driving on a large-scale track topology, which adjust their velocity to speed limits to avoid derailing. As our central contribution, we develop pruning techniques based on causality and independence among backward steps and verify that k-induction remains sound under this adaptation and terminates in cases where it did not terminate before.
N2  - Die Analyse von Verhaltensmodellen wie Graphtransformationssystemen (GTSs) ist von zentraler Bedeutung im Model Driven Engineering. GTSs führen jedoch häufig zu unhandhabbar großen oder sogar unendlichen Zustandsräumen und können mit mehreren oder sogar unendlich vielen Startgraphen ausgestattet sein. Um diese Probleme abzumildern, wurden statische Analysetechniken entwickelt, die auf endlichen symbolischen Darstellungen von Mengen von Zuständen oder Pfaden basieren. Wir konzentrieren uns auf die Technik der k-Induktion zur Ermittlung von Invarianten, die unter Verwendung von Graphbedingungen spezifiziert sind. Zum Zweck der Analyse erzeugt die k-Induktion symbolische Rückwärtspfade von einem symbolischen Zustand, der eine Verletzung einer Kandidateninvariante darstellt, um Informationen darüber zu sammeln, wie diese Verletzung erreicht werden konnte, wodurch möglicherweise Widersprüche zu angenommenen Invarianten gefunden werden. GTSs, bei denen mehrere Agenten regelmäßig unabhängig voneinander Aktionen ausführen, können derzeit jedoch nicht mit dieser Technik analysiert werden, da die Unabhängigkeit zwischen Rückwärtsschritten das Sammeln von relevantem Wissen möglicherweise verhindert.

In diesem Artikel erweitern wir die k-Induktion auf GTSs mit mehreren Agenten und unterstützen dadurch eine breite Palette zusätzlicher GTSs. Als laufendes Beispiel betrachten wir eine unbegrenzte Anzahl von Shuttles, die auf einer großen Tracktopologie fahren und die ihre Geschwindigkeit an Geschwindigkeitsbegrenzungen anpassen, um ein Entgleisen zu vermeiden. Als zentralen Beitrag entwickeln wir Beschneidungstechniken basierend auf Kausalität und Unabhängigkeit zwischen Rückwärtsschritten und verifizieren, dass die k-Induktion unter dieser Anpassung korrekt bleibt und in Fällen terminiert, in denen sie zuvor nicht terminierte.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 143 
KW  - k-inductive invariant checking
KW  - causality
KW  - parallel and sequential independence
KW  - symbolic analysis
KW  - bounded backward model checking
KW  - k-induktive Invariantenprüfung
KW  - Kausalität
KW  - parallele und sequentielle Unabhängigkeit
KW  - symbolische Analyse
KW  - Bounded Backward Model Checking
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-545851
SN  - 978-3-86956-531-6
SN  - 1613-5652
SN  - 2191-1665
IS  - 143
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Maximova, Maria
A1  - Giese, Holger
T1  - Probabilistic metric temporal graph logic
N2  - Cyber-physical systems often encompass complex concurrent behavior with timing constraints and probabilistic failures on demand. Analyzing whether such systems with probabilistic timed behavior adhere to a given specification is essential. When the states of the system can be represented by graphs, the rule-based formalism of Probabilistic Timed Graph Transformation Systems (PTGTSs) can be used to suitably capture structure dynamics as well as probabilistic and timed behavior of the system. Model checking support for PTGTSs w.r.t. properties specified using Probabilistic Timed Computation Tree Logic (PTCTL) has already been presented. Moreover, for timed graph-based runtime monitoring, Metric Temporal Graph Logic (MTGL) has been developed for stating metric temporal properties on identified subgraphs and their structural changes over time.

In this paper, we (a) extend MTGL to the Probabilistic Metric Temporal Graph Logic (PMTGL) by allowing for the specification of probabilistic properties, (b) adapt our MTGL satisfaction checking approach to PTGTSs, and (c) combine the approaches for PTCTL model checking and MTGL satisfaction checking to obtain a Bounded Model Checking (BMC) approach for PMTGL. In our evaluation, we apply an implementation of our BMC approach in AutoGraph to a running example.
N2  - Cyber-physische Systeme umfassen häufig ein komplexes nebenläufiges Verhalten mit Zeitbeschränkungen und probabilistischen Fehlern auf Anforderung. Die Analyse, ob solche Systeme mit probabilistischem gezeitetem Verhalten einer vorgegebenen Spezifikation entsprechen, ist essentiell. Wenn die Zustände des Systems durch Graphen dargestellt werden können, kann der regelbasierte Formalismus von probabilistischen gezeiteten Graphtransformationssystemen (PTGTSs) verwendet werden, um die Strukturdynamik sowie das probabilistische und gezeitete Verhalten des Systems geeignet zu erfassen. Die Modellprüfungsunterstützung für PTGTSs bzgl. Eigenschaften, die unter Verwendung von Probabilistic Timed Computation Tree Logic (PTCTL) spezifiziert wurden, wurde bereits entwickelt. Darüber hinaus wurde das gezeitete graphenbasierte Laufzeitmonitoring mittels metrischer temporaler Graphlogik (MTGL) entwickelt, um metrische temporale Eigenschaften auf identifizierten Untergraphen und ihre strukturellen Änderungen über die Zeit zu erfassen.

In diesem Artikel (a) erweitern wir MTGL auf die probabilistische metrische temporale Graphlogik (PMTGL), indem wir die Spezifikation probabilistischer Eigenschaften zulassen, (b) passen unseren MTGL-Prüfungsansatz auf PTGTSs an und (c) kombinieren die Ansätze für PTCTL-Modellprüfung und MTGL-Prüfung, um einen beschränkten Modellprüfungsansatz (BMC-Ansatz) für PMTGL zu erhalten. In unserer Auswertung wenden wir eine Implementierung unseres BMC-Ansatzes in AutoGraph auf ein Beispiel an.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 146 
KW  - cyber-physical systems
KW  - probabilistic timed systems
KW  - qualitative analysis
KW  - quantitative analysis
KW  - bounded model checking
KW  - cyber-physische Systeme
KW  - probabilistische gezeitete Systeme
KW  - qualitative Analyse
KW  - quantitative Analyse
KW  - Bounded Model Checking
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-545867
SN  - 978-3-86956-532-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 146
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Maximova, Maria
A1  - Giese, Holger
T1  - Probabilistic metric temporal graph logic
N2  - Cyber-physical systems often encompass complex concurrent behavior with timing constraints and probabilistic failures on demand. Analyzing whether such systems with probabilistic timed behavior adhere to a given specification is essential. When the states of the system can be represented by graphs, the rule-based formalism of Probabilistic Timed Graph Transformation Systems (PTGTSs) can be used to suitably capture structure dynamics as well as probabilistic and timed behavior of the system. Model checking support for PTGTSs w.r.t. properties specified using Probabilistic Timed Computation Tree Logic (PTCTL) has already been presented. Moreover, for timed graph-based runtime monitoring, Metric Temporal Graph Logic (MTGL) has been developed for stating metric temporal properties on identified subgraphs and their structural changes over time. In this paper, we (a) extend MTGL to the Probabilistic Metric Temporal Graph Logic (PMTGL) by allowing for the specification of probabilistic properties, (b) adapt our MTGL satisfaction checking approach to PTGTSs, and (c) combine the approaches for PTCTL model checking and MTGL satisfaction checking to obtain a Bounded Model Checking (BMC) approach for PMTGL. In our evaluation, we apply an implementation of our BMC approach in AutoGraph to a running example.
N2  - Cyber-physische Systeme umfassen häufig ein komplexes nebenläufiges Verhalten mit Zeitbeschränkungen und probabilistischen Fehlern auf Anforderung. Die Analyse, ob solche Systeme mit probabilistischem gezeitetem Verhalten einer vorgegebenen Spezifikation entsprechen, ist essentiell. Wenn die Zustände des Systems durch Graphen dargestellt werden können, kann der regelbasierte Formalismus von probabilistischen gezeiteten Graphtransformationssystemen (PTGTSs) verwendet werden, um die Strukturdynamik sowie das probabilistische und gezeitete Verhalten des Systems geeignet zu erfassen. Die Modellprüfungsunterstützung für PTGTSs bzgl. Eigenschaften, die unter Verwendung von probabilistischer zeitgesteuerter Berechnungsbaumlogik (PTCTL) spezifiziert wurden, wurde bereits entwickelt. Darüber hinaus wurde das gezeitete graphenbasierte Laufzeitmonitoring mittels metrischer temporaler Graphlogik (MTGL) entwickelt, um metrische temporale Eigenschaften auf identifizierten Untergraphen und ihre strukturellen Änderungen über die Zeit zu erfassen.

In diesem Artikel (a) erweitern wir MTGL auf die probabilistische metrische temporale Graphlogik (PMTGL), indem wir die Spezifikation probabilistischer Eigenschaften zulassen, (b) passen unseren MTGL-Prüfungsansatz auf PTGTSs an und (c) kombinieren die Ansätze für PTCTL-Modellprüfung und MTGL-Prüfung, um einen beschränkten Modellprüfungsansatz (BMC-Ansatz) für PMTGL zu erhalten. In unserer Auswertung wenden wir eine Implementierung unseres BMC-Ansatzes in AutoGraph auf ein Beispiel an.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 140 
KW  - cyber-physische Systeme
KW  - probabilistische gezeitete Systeme
KW  - qualitative Analyse
KW  - quantitative Analyse
KW  - Bounded Model Checking
KW  - cyber-physical systems
KW  - probabilistic timed systems
KW  - qualitative analysis
KW  - quantitative analysis
KW  - bounded model checking
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-515066
SN  - 978-3-86956-517-0
SN  - 1613-5652
SN  - 2191-1665
IS  - 140
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schneider, Sven
A1  - Lambers, Leen
A1  - Orejas, Fernando
T1  - A logic-based incremental approach to graph repair
T1  - Ein logikbasierter inkrementeller Ansatz für Graphreparatur
N2  - Graph repair, restoring consistency of a graph, plays a prominent role in several areas of computer science and beyond: For example, in model-driven engineering, the abstract syntax of models is usually encoded using graphs. Flexible edit operations temporarily create inconsistent graphs not representing a valid model, thus requiring graph repair. Similarly, in graph databases, which manage the storage and manipulation of graph data, updates may cause a given database to violate some integrity constraints, which also requires graph repair. We present a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing repairs. In our context, we formalize consistency by so-called graph conditions, which are equivalent to first-order logic on graphs. We present two kinds of repair algorithms: state-based repair restores consistency independent of the graph update history, whereas delta-based (or incremental) repair takes this history explicitly into account. Technically, our algorithms rely on an existing model generation algorithm for graph conditions implemented in AutoGraph. Moreover, the delta-based approach uses the new concept of satisfaction (ST) trees for encoding if and how a graph satisfies a graph condition. We then demonstrate how to manipulate these STs incrementally with respect to a graph update.
N2  - Die Reparatur von Graphen, die Wiederherstellung der Konsistenz eines Graphen, spielt in mehreren Bereichen der Informatik und darüber hinaus eine herausragende Rolle: Beispielsweise wird in der modellgetriebenen Konstruktion die abstrakte Syntax von Modellen in der Regel mithilfe von Graphen kodiert.
Flexible Bearbeitungsvorgänge erzeugen vorübergehend inkonsistente Graphen, die kein gültiges Modell darstellen, und erfordern daher eine Graphreparatur.
Auf ähnliche Weise können Aktualisierungen in Graphendatenbanken - die das Speichern und Bearbeiten von Graphendaten verwalten - dazu führen, dass eine bestimmte Datenbank einige Integritätsbeschränkungen nicht erfüllt und auch eine Graphreparatur erforderlich macht.

Wir präsentieren einen logikbasierten inkrementellen Ansatz für die Graphreparatur, der eine solide und vollständige (nach Beendigung) Übersicht über die am wenigsten verändernden Reparaturen erstellt.
In unserem Kontext formalisieren wir die Konsistenz mittels sogenannter Graphbedingungen, die der Logik erster Ordnung auf Graphen entsprechen.
Wir stellen zwei Arten von Reparaturalgorithmen vor: Die zustandsbasierte Reparatur stellt die Konsistenz unabhängig vom Verlauf der Graphänderung wieder her, während die deltabasierte (oder inkrementelle) Reparatur diesen Verlauf explizit berücksichtigt.
Technisch stützen sich unsere Algorithmen auf einen vorhandenen Modellgenerierungsalgorithmus für in AutoGraph implementierte Graphbedingungen.
Darüber hinaus verwendet der deltabasierte Ansatz das neue Konzept der Erfüllungsbäume (STs) zum Kodieren, ob und wie ein Graph eine Graphbedingung erfüllt.
Wir zeigen dann, wie diese STs in Bezug auf eine Graphaktualisierung inkrementell manipuliert werden.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 126 
KW  - nested graph conditions
KW  - graph repair
KW  - model repair
KW  - consistency restoration
KW  - verschachtelte Graphbedingungen
KW  - Graphreparatur
KW  - Modellreparatur
KW  - Konsistenzrestauration
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-427517
SN  - 978-3-86956-462-3
SN  - 1613-5652
SN  - 2191-1665
IS  - 126
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Schlosser, Rainer
A1  - Walther, Carsten
A1  - Boissier, Martin
A1  - Uflacker, Matthias
T1  - Automated repricing and ordering strategies in competitive markets
JF  - AI communications : AICOM ; the European journal on artificial intelligence
N2  - Merchants on modern e-commerce platforms face a highly competitive environment. They compete against each other using automated dynamic pricing and ordering strategies. Successfully managing both inventory levels as well as offer prices is a challenging task as (i) demand is uncertain, (ii) competitors strategically interact, and (iii) optimized pricing and ordering decisions are mutually dependent. We show how to derive optimized data-driven pricing and ordering strategies which are based on demand learning techniques and efficient dynamic optimization models. We verify the superior performance of our self-adaptive strategies by comparing them to different rule-based as well as data-driven strategies in duopoly and oligopoly settings. Further, to study and to optimize joint dynamic ordering and pricing strategies on online marketplaces, we built an interactive simulation platform. To be both flexible and scalable, the platform has a microservice-based architecture and allows handling dozens of competing merchants and streams of consumers with configurable characteristics.
KW  - Dynamic pricing
KW  - inventory management
KW  - demand learning
KW  - oligopoly competition
KW  - e-commerce
Y1  - 2019
U6  - https://doi.org/10.3233/AIC-180603
SN  - 0921-7126
SN  - 1875-8452
VL  - 32
IS  - 1
SP  - 15
EP  - 29
PB  - IOS Press
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Schlosser, Rainer
A1  - Richly, Keven
T1  - Dynamic pricing under competition with data-driven price anticipations and endogenous reference price effects
JF  - Journal of revenue and pricing management
N2  - Online markets have become highly dynamic and competitive. Many sellers use automated data-driven strategies to estimate demand and to update prices frequently. Further, notification services offered by marketplaces allow sellers to track markets continuously and to react to competitors’ price adjustments instantaneously. Deriving successful automated repricing strategies is challenging, as competitors’ strategies are typically not known. In this paper, we analyze automated repricing strategies with data-driven price anticipations under duopoly competition. In addition, we account for reference price effects in demand, which are affected by the price adjustments of both competitors. We show how to derive optimized self-adaptive pricing strategies that anticipate price reactions of the competitor and take the evolution of the reference price into account. We verify that the results of our adaptive learning strategy tend toward optimal solutions, which can be derived for scenarios with full information. Finally, we analyze the case in which our learning strategy is played against itself. We find that our self-adaptive strategies can be used to approximate equilibria in mixed strategies.
KW  - Dynamic pricing competition
KW  - Data-driven price anticipation
KW  - e-Commerce
KW  - Dynamic programming
KW  - Response strategies
Y1  - 2019
U6  - https://doi.org/10.1057/s41272-019-00206-5
SN  - 1476-6930
SN  - 1477-657X
VL  - 18
IS  - 6
SP  - 451
EP  - 464
PB  - Palgrave Macmillan
CY  - Basingstoke
ER  - 
TY  - GEN
A1  - Schlosser, Rainer
A1  - Kossmann, Jan
A1  - Boissier, Martin
T1  - Efficient Scalable Multi-Attribute Index Selection Using Recursive Strategies
T2  - 2019 IEEE 35th International Conference on Data Engineering (ICDE)
N2  - An efficient selection of indexes is indispensable for database performance. For large problem instances with hundreds of tables, existing approaches are not suitable: They either exhibit prohibitive runtimes or yield far from optimal index configurations by strongly limiting the set of index candidates or not handling index interaction explicitly. We introduce a novel recursive strategy that does not exclude index candidates in advance and effectively accounts for index interaction. Using large real-world workloads, we demonstrate the applicability of our approach. Further, we evaluate our solution end to end with a commercial database system using a reproducible setup. We show that our solutions are near-optimal for small index selection problems. For larger problems, our strategy outperforms state-of-the-art approaches in both scalability and solution quality.
Y1  - 2019
SN  - 978-1-5386-7474-1
U6  - https://doi.org/10.1109/ICDE.2019.00113
SN  - 1084-4627
SP  - 1238
EP  - 1249
PB  - IEEE
CY  - New York
ER  - 
TY  - JOUR
A1  - Schlosser, Rainer
A1  - Chenavaz, Régis Y.
A1  - Dimitrov, Stanko
T1  - Circular economy
BT  - joint dynamic pricing and recycling investments
JF  - International journal of production economics
N2  - In a circular economy, the use of recycled resources in production is a key performance indicator for management. Yet, academic studies are still unable to inform managers on appropriate recycling and pricing policies. We develop an optimal control model integrating the recycling rate of a firm that can use both virgin and recycled resources in the production process. Our model accounts for the influence of recycling on both the supply and demand sides. The positive effect of a firm's use of recycled resources diminishes over time but may increase through investments. Using general formulations for demand and cost, we analytically examine joint dynamic pricing and recycling investment policies in order to determine their optimal interplay over time. We provide numerical experiments to assess the existence of a steady state and to conduct sensitivity analyses with respect to various model parameters. The analysis shows how to dynamically adapt jointly optimized controls to reach sustainability in the production process. Our results pave the way to sounder sustainable practices for firms operating within a circular economy.
KW  - Dynamic pricing
KW  - Recycling investments
KW  - Optimal control
KW  - General demand function
KW  - Circular economy
Y1  - 2021
U6  - https://doi.org/10.1016/j.ijpe.2021.108117
SN  - 0925-5273
SN  - 1873-7579
VL  - 236
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Schlosser, Rainer
A1  - Boissier, Martin
T1  - Dealing with the dimensionality curse in dynamic pricing competition
BT  - Using frequent repricing to compensate imperfect market anticipations
JF  - Computers & Operations Research
N2  - Most sales applications are characterized by competition and limited demand information. For successful pricing strategies, frequent price adjustments as well as anticipation of market dynamics are crucial. Both effects are challenging as competitive markets are complex and computations of optimized pricing adjustments can be time-consuming. We analyze stochastic dynamic pricing models under oligopoly competition for the sale of perishable goods. To circumvent the curse of dimensionality, we propose a heuristic approach to efficiently compute price adjustments. To demonstrate our strategy’s applicability even if the number of competitors is large and their strategies are unknown, we consider different competitive settings in which competitors frequently and strategically adjust their prices. For all settings, we verify that our heuristic strategy yields promising results. We compare the performance of our heuristic against upper bounds, which are obtained by optimal strategies that take advantage of perfect price anticipations. We find that price adjustment frequencies can have a larger impact on expected profits than price anticipations. Finally, our approach has been applied on Amazon for the sale of used books. We have used a seller’s historical market data to calibrate our model. Sales results show that our data-driven strategy outperforms the rule-based strategy of an experienced seller by a profit increase of more than 20%.
KW  - Dynamic pricing
KW  - Oligopoly competition
KW  - Dynamic programming
KW  - Data-driven strategies
KW  - E-commerce
Y1  - 2018
U6  - https://doi.org/10.1016/j.cor.2018.07.011
SN  - 0305-0548
SN  - 1873-765X
VL  - 100
SP  - 26
EP  - 42
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Schlosser, Rainer
T1  - Stochastic dynamic pricing and advertising in isoelastic oligopoly models
JF  - European Journal of Operational Research
N2  - In this paper, we analyze stochastic dynamic pricing and advertising differential games in special oligopoly markets with constant price and advertising elasticity. We consider the sale of perishable as well as durable goods and include adoption effects in the demand. Based on a unique stochastic feedback Nash equilibrium, we derive closed-form solution formulas of the value functions and the optimal feedback policies of all competing firms. Efficient simulation techniques are used to evaluate optimally controlled sales processes over time. This way, the evolution of optimal controls as well as the firms’ profit distributions are analyzed. Moreover, we are able to compare feedback solutions of the stochastic model with its deterministic counterpart. We show that the market power of the competing firms is exactly the same as in the deterministic version of the model. Further, we discover two fundamental effects that determine the relation between both models. First, the volatility in demand results in a decline of expected profits compared to the deterministic model. Second, we find that saturation effects in demand have an opposite character. We show that the second effect can be strong enough to either exactly balance or even overcompensate the first one. As a result we are able to identify cases in which feedback solutions of the deterministic model provide useful approximations of solutions of the stochastic model.
KW  - Pricing
KW  - Advertising
KW  - Stochastic differential games
KW  - Oligopoly competition
KW  - Adoption effects
Y1  - 2017
U6  - https://doi.org/10.1016/j.ejor.2016.11.021
SN  - 0377-2217
SN  - 1872-6860
VL  - 259
SP  - 1144
EP  - 1155
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - THES
A1  - Schirneck, Friedrich Martin
T1  - Enumeration algorithms in data profiling
N2  - Data profiling is the extraction of metadata from relational databases. An important class of metadata are multi-column dependencies. They come associated with two computational tasks. The detection problem is to decide whether a dependency of a given type and size holds in a database. The discovery problem instead asks to enumerate all valid dependencies of that type. We investigate the two problems for three types of dependencies: unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs). 

We first treat the parameterized complexity of the detection variants. We prove that the detection of UCCs and FDs, respectively, is W[2]-complete when parameterized by the size of the dependency. The detection of INDs is shown to be one of the first natural W[3]-complete problems. We further settle the enumeration complexity of the three discovery problems by presenting parsimonious equivalences with well-known enumeration problems. Namely, the discovery of UCCs is equivalent to the famous transversal hypergraph problem of enumerating the hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. Finally, the discovery of INDs is shown to be equivalent to enumerating the satisfying assignments of antimonotone, 3-normalized Boolean formulas. 

In the remainder of the thesis, we design and analyze discovery algorithms for unique column combinations. Since this is as hard as the general transversal hypergraph problem, it is an open question whether the UCCs of a database can be computed in output-polynomial time in the worst case. For the analysis, we therefore focus on instances that are structurally close to databases in practice, most notably, inputs that have small solutions. The equivalence between UCCs and hitting sets transfers the computational hardness, but also allows us to apply ideas from hypergraph theory to data profiling. We devise a discovery algorithm that runs in polynomial space on arbitrary inputs and achieves polynomial delay whenever the maximum size of any minimal UCC is bounded. Central to our approach is the extension problem for minimal hitting sets, that is, to decide for a set of vertices whether they are contained in any minimal solution. We prove that this is yet another problem that is complete for the complexity class W[3], when parameterized by the size of the set that is to be extended. We also give several conditional lower bounds under popular hardness conjectures such as the Strong Exponential Time Hypothesis (SETH). The lower bounds suggest that the running time of our algorithm for the extension problem is close to optimal.

We further conduct an empirical analysis of our discovery algorithm on real-world databases to confirm that the hitting set perspective on data profiling has merits also in practice. We show that the resulting enumeration times undercut their theoretical worst-case bounds on practical data, and that the memory consumption of our method is much smaller than that of previous solutions. During the analysis we make two observations about the connection between databases and their corresponding hypergraphs. On the one hand, the hypergraph representations containing all relevant information are usually significantly smaller than the original inputs. On the other hand, obtaining those hypergraphs is the actual bottleneck of any practical application. The latter often takes much longer than enumerating the solutions, which is in stark contrast to the fact that the preprocessing is guaranteed to be polynomial while the enumeration may take exponential time.

To make the first observation rigorous, we introduce a maximum-entropy model for non-uniform random hypergraphs and prove that their expected number of minimal hyperedges undergoes a phase transition with respect to the total number of edges. The result also explains why larger databases may have smaller hypergraphs. Motivated by the second observation, we present a new kind of UCC discovery algorithm called Hitting Set Enumeration with Partial Information and Validation (HPIValid). It utilizes the fast enumeration times in practice in order to speed up the computation of the corresponding hypergraph. This way, we sidestep the bottleneck while maintaining the advantages of the hitting set perspective. An exhaustive empirical evaluation shows that HPIValid outperforms the current state of the art in UCC discovery. It is capable of processing databases that were previously out of reach for data profiling.
N2  - Data Profiling ist die Erhebung von Metadaten über relationale Datenbanken. Eine wichtige Klasse von Metadaten sind Abhängigkeiten zwischen verschiedenen Spalten. Für diese gibt es zwei wesentliche algorithmische Probleme. Beim Detektionsproblem soll entschieden werden, ob eine Datenbank eine Abhängigkeit eines bestimmten Typs und einer bestimmten Größe aufweist; beim Entdeckungsproblem müssen dagegen alle gültigen Abhängigkeiten aufgezählt werden. Wir behandeln beide Probleme für drei Typen von Abhängigkeiten: eindeutige Spaltenkombinationen (UCCs), funktionale Abhängigkeiten (FDs) und Inklusionsabhängigkeiten (INDs).

Wir untersuchen zunächst deren parametrisierte Komplexität und beweisen, dass die Detektion von UCCs und FDs W[2]-vollständig ist, wobei die Größe der Abhängigkeit als Parameter dient. Ferner identifizieren wir die Detektion von INDs als eines der ersten natürlichen W[3]-vollständigen Probleme. Danach klären wir die Aufzählungskomplexität der drei Entdeckungsprobleme, indem wir lösungserhaltende Äquivalenzen zu bekannten Aufzählungsproblemen konstruieren. Die Entdeckung von UCCs zeigt sich dabei als äquivalent zum berühmten Transversal-Hypergraph-Problem, bei dem die Hitting Sets eines Hypergraphens aufzuzählen sind. Die Entdeckung von FDs ist äquivalent zum simultanen Aufzählen der Hitting Sets mehrerer Hypergraphen und INDs sind äquivalent zu den erfüllenden Belegungen antimonotoner, 3-normalisierter boolescher Formeln.

Anschließend beschäftigen wir uns mit dem Entwurf und der Analyse von Entdeckungsalgorithmen für eindeutige Spaltenkombinationen. Es ist unbekannt, ob alle UCCs einer Datenbank in worst-case ausgabepolynomieller Zeit berechnet werden können, da dies genauso schwer ist wie das allgemeine Transversal-Hypergraph-Problem. Wir konzentrieren uns daher bei der Analyse auf Instanzen, die strukturelle Ähnlichkeiten mit Datenbanken aus der Praxis aufweisen; insbesondere solche, deren Lösungen sehr klein sind. Die Äquivalenz zwischen UCCs und Hitting Sets überträgt zwar die algorithmische Schwere, erlaubt es uns aber auch Konzepte aus der Theorie von Hypergraphen auf das Data Profiling anzuwenden. Wir entwickeln daraus einen Entdeckungsalgorithmus, dessen Berechnungen auf beliebigen Eingaben nur polynomiellen Platz benötigen. Ist zusätzlich die Maximalgröße der minimalen UCCs durch eine Konstante beschränkt, so hat der Algorithmus außerdem polynomiell beschränkten Delay. Der zentrale Baustein unseres Ansatzes ist das Erweiterbarkeitsproblem für minimale Hitting Sets, das heißt, die Entscheidung, ob eine gegebene Knotenmenge in einer minimalen Lösung vorkommt. Wir zeigen, dass dies, mit der Größe der Knotenmenge als Parameter, ein weiteres natürliches Problem ist, welches vollständig für die Komplexitätsklasse W[3] ist. Außerdem beweisen wir bedingte untere Laufzeitschranken unter der Annahme gängiger Schwere-Vermutungen wie der Starken Exponentialzeithypothese (SETH). Dies belegt, dass die Laufzeit unseres Algorithmus für das Erweiterbarkeitsproblem beinahe optimal ist.

Eine empirische Untersuchung unseres Entdeckungsalgorithmus auf realen Daten bestätigt, dass die Hitting-Set-Perspektive auch praktische Vorteile für das Data Profiling hat. So sind die Berechnungszeiten für das Finden der UCCs bereits sehr kurz und der Speicherverbrauch unseres Ansatzes ist deutlich geringer als der existierender Methoden. Die Untersuchung zeigt auch zwei interessante Verbindungen zwischen Datenbanken und ihren zugehörigen Hypergraphen: Einerseits sind die Hypergraphen, die alle relevanten Informationen enthalten, meist viel kleiner als die Eingabe-Datenbanken, andererseits ist die Berechnung dieser Hypergraphen die eigentliche Engstelle in der Praxis. Sie nimmt in der Regel viel mehr Zeit in Anspruch als das Aufzählen aller Lösungen. Dies steht im deutlichen Gegensatz zu den bekannten theoretischen Resultaten, die besagen, dass die Hypergraph-Vorberechnung polynomiell ist, während der Aufzählungsschritt exponentielle Zeit benötigen kann.

Um die erste Beobachtung zu formalisieren, führen wir ein Maximum-Entropie-Modell für nicht-uniforme Hypergraphen ein und zeigen, dass die erwartete Anzahl ihrer minimalen Hyperkanten einen Phasenübergang durchläuft. Unsere Ergebnisse erklären auch, warum größere Datenbanken mitunter kleinere Hypergraphen haben. Die zweite Beobachtung inspiriert uns zu einem Entdeckungsalgorithmus neuer Art, „Hitting Set Enumeration with Partial Information and Validation“ (HPIValid). Dieser nutzt die schnellen Aufzählungszeiten auf praktischen Daten aus, um die langwierige Berechnung des zu Grunde liegenden Hypergraphens zu beschleunigen. Dadurch umgehen wir die Engstelle und können gleichzeitig die Vorteile der Hitting-Set-Perspektive beibehalten. Eine ausgiebige empirische Analyse zeigt, dass HPIValid den aktuellen Stand der Technik im Bereich der UCC-Entdeckung deutlich übertrifft. HPIValid kann Datenbanken verarbeiten, für die Data Profiling zuvor unmöglich war.
T2  - Aufzählungsalgorithmen für das Data Profiling
KW  - Chernoff-Hoeffding theorem
KW  - data profiling
KW  - enumeration algorithms
KW  - hitting sets
KW  - PhD thesis
KW  - transversal hypergraph
KW  - unique column combinations
KW  - Satz von Chernoff-Hoeffding
KW  - Dissertation
KW  - Data Profiling
KW  - Aufzählungsalgorithmen
KW  - Hitting Sets
KW  - Transversal-Hypergraph
KW  - eindeutige Spaltenkombination
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-556726
ER  - 
TY  - JOUR
A1  - Scheibel, Willy
A1  - Trapp, Matthias
A1  - Limberger, Daniel
A1  - Döllner, Jürgen Roland Friedrich
T1  - A taxonomy of treemap visualization techniques
JF  - Science and Technology Publications
N2  - A treemap is a visualization that has been specifically designed to facilitate the exploration of tree-structured data and, more generally, hierarchically structured data. The family of visualization techniques that use a visual metaphor for parent-child relationships based “on the property of containment” (Johnson, 1993) is commonly referred to as treemaps. However, as the number of variations of treemaps grows, it becomes increasingly important to distinguish clearly between techniques and their specific characteristics. This paper proposes to discern between Space-filling Treemap TS, Containment Treemap TC, Implicit Edge Representation Tree TIE, and Mapped Tree TMT for classification of hierarchy visualization techniques and highlights their respective properties. This taxonomy is created as a hyponymy, i.e., its classes have an is-a relationship to one another: TS ⊂ TC ⊂ TIE ⊂ TMT. With this proposal, we intend to stimulate a discussion on a more unambiguous classification of treemaps and, furthermore, broaden what is understood by the concept of treemap itself.
KW  - Treemaps
KW  - Taxonomy
Y1  - 2020
PB  - Springer
CY  - Berlin
ER  - 
TY  - GEN
A1  - Scheibel, Willy
A1  - Trapp, Matthias
A1  - Limberger, Daniel
A1  - Döllner, Jürgen Roland Friedrich
T1  - A taxonomy of treemap visualization techniques
T2  - Postprints der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - A treemap is a visualization that has been specifically designed to facilitate the exploration of tree-structured data and, more generally, hierarchically structured data. The family of visualization techniques that use a visual metaphor for parent-child relationships based “on the property of containment” (Johnson, 1993) is commonly referred to as treemaps. However, as the number of variations of treemaps grows, it becomes increasingly important to distinguish clearly between techniques and their specific characteristics. This paper proposes to discern between Space-filling Treemap TS, Containment Treemap TC, Implicit Edge Representation Tree TIE, and Mapped Tree TMT for classification of hierarchy visualization techniques and highlights their respective properties. This taxonomy is created as a hyponymy, i.e., its classes have an is-a relationship to one another: TS ⊂ TC ⊂ TIE ⊂ TMT. With this proposal, we intend to stimulate a discussion on a more unambiguous classification of treemaps and, furthermore, broaden what is understood by the concept of treemap itself.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 8 
KW  - treemaps
KW  - taxonomy
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-524693
IS  - 8
ER  - 
TY  - BOOK
A1  - Scheer, August-Wilhelm
T1  - Was macht das Hasso-Plattner-Institut für Digital Engineering zu einer Besonderheit?
T1  - What makes the Hasso Plattner Institute for Digital Engineering special?
BT  - Festrede zum Anlass des 20-jährigen Bestehens des Hasso-Plattner-Instituts
BT  - speech on the occasion of the 20th anniversary of Hasso Plattner Institute
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 131 
KW  - Hasso-Plattner-Institut
KW  - Digital Engineering
KW  - Innovation
KW  - Design Thinking
KW  - In-Memory
KW  - Hasso Plattner Institute
KW  - Digital Engineering
KW  - innovation
KW  - Design Thinking
KW  - In-Memory
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-439232
SN  - 978-3-86956-481-4
SN  - 1613-5652
SN  - 2191-1665
IS  - 131
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Sapegin, Andrey
T1  - High-Speed Security Log Analytics Using Hybrid Outlier Detection
N2  - The rapid development and integration of information technologies over the last decades have influenced all areas of our lives, including the business world. Yet not only have modern enterprises become digitalised; security and criminal threats have also moved into the digital sphere. To withstand these threats, modern companies must be aware of all activities within their computer networks.
The keystone for such continuous security monitoring is a Security Information and Event Management (SIEM) system that collects and processes all security-related log messages from the entire enterprise network. However, digital transformations and technologies, such as network virtualisation and widespread usage of mobile communications, lead to a constantly increasing number of monitored devices and systems. As a result, the amount of data that has to be processed by a SIEM system is increasing rapidly. Besides that, in-depth security analysis of the captured data requires the application of rather sophisticated outlier detection algorithms that have a high computational complexity. Existing outlier detection methods often suffer from performance issues and are not directly applicable for high-speed and high-volume analysis of heterogeneous security-related events, which becomes a major challenge for modern SIEM systems nowadays.
This thesis provides a number of solutions for the mentioned challenges. First, it proposes a new SIEM system architecture for high-speed processing of security events, implementing parallel, in-memory and in-database processing principles. The proposed architecture also utilises the most efficient log format for high-speed data normalisation. Next, the thesis offers several novel high-speed outlier detection methods, including a generic Hybrid Outlier Detection method that can be used efficiently for Big Data analysis. Finally, a specialised User Behaviour Outlier Detection method is proposed for better threat detection and analysis of particular user behaviour cases.
The proposed architecture and methods were evaluated in terms of both performance and accuracy, and compared with a classical architecture and existing algorithms. These evaluations were performed on multiple data sets, including simulated data, a well-known public intrusion detection data set, and real data from a large multinational enterprise. The evaluation results demonstrate the high performance and efficacy of the developed methods.
All concepts proposed in this thesis were integrated into the prototype of the SIEM system, capable of high-speed analysis of Big Security Data, which makes this integrated SIEM platform highly relevant for modern enterprise security applications.
N2  - In den letzten Jahrzehnten hat die schnelle Weiterentwicklung und Integration der Informationstechnologien alle Bereiche unseres Lebens beeinflusst, nicht zuletzt auch die Geschäftswelt. Aus der zunehmenden Digitalisierung des modernen Unternehmens ergeben sich jedoch auch neue digitale Sicherheitsrisiken und kriminelle Bedrohungen. Um sich vor diesen Bedrohungen zu schützen, muss das digitale Unternehmen alle Aktivitäten innerhalb seines Firmennetzes verfolgen.
Der Schlüssel zur kontinuierlichen Überwachung aller sicherheitsrelevanten Informationen ist ein sogenanntes Security Information und Event Management (SIEM) System, das alle Meldungen innerhalb des Firmennetzwerks zentral sammelt und verarbeitet. Jedoch führen die digitale Transformation der Unternehmen sowie neue Technologien, wie die Netzwerkvirtualisierung und mobile Endgeräte, zu einer konstant steigenden Anzahl zu überwachender Geräte und Systeme. Dies wiederum hat ein kontinuierliches Wachstum der Datenmengen zur Folge, die das SIEM System verarbeiten muss. Innerhalb eines möglichst kurzen Zeitraumes muss somit eine sehr große Datenmenge (Big Data) analysiert werden, um auf Bedrohungen zeitnah reagieren zu können. Eine gründliche Analyse der sicherheitsrelevanten Aspekte der aufgezeichneten Daten erfordert den Einsatz fortgeschrittener Algorithmen der Anomalieerkennung, die eine hohe Rechenkomplexität aufweisen. Existierende Methoden der Anomalieerkennung haben oftmals Geschwindigkeitsprobleme und sind deswegen nicht anwendbar für die sehr schnelle Analyse sehr großer Mengen heterogener sicherheitsrelevanter Ereignisse.
Diese Arbeit schlägt eine Reihe möglicher Lösungen für die benannten Herausforderungen vor. Zunächst wird eine neuartige SIEM Architektur vorgeschlagen, die es erlaubt, Ereignisse mit sehr hoher Geschwindigkeit zu verarbeiten. Das System basiert auf den Prinzipien der parallelen Programmierung sowie der In-Memory und In-Database Datenverarbeitung. Die vorgeschlagene Architektur verwendet außerdem das effizienteste Datenformat zur Vereinheitlichung der Daten in sehr hoher Geschwindigkeit. Des Weiteren wurden im Rahmen dieser Arbeit mehrere neuartige Hochgeschwindigkeitsverfahren zur Anomalieerkennung entwickelt. Eines ist die Hybride Anomalieerkennung (Hybrid Outlier Detection), die sehr effizient auf Big Data eingesetzt werden kann. Abschließend wird eine spezifische Anomalieerkennung für Nutzerverhalten (User Behaviour Outlier Detection) vorgeschlagen, die eine verbesserte Bedrohungsanalyse von spezifischen Verhaltensmustern der Benutzer erlaubt.
Die entwickelte Systemarchitektur und die Algorithmen wurden sowohl mit Hinblick auf Geschwindigkeit, als auch Genauigkeit evaluiert und mit traditionellen Architekturen und existierenden Algorithmen verglichen. Die Evaluation wurde auf mehreren Datensätzen durchgeführt, unter anderem simulierten Daten, gut erforschten öffentlichen Datensätzen und echten Daten großer internationaler Konzerne. Die Resultate der Evaluation belegen die Geschwindigkeit und Effizienz der entwickelten Methoden.
Alle Konzepte dieser Arbeit wurden in den Prototyp des SIEM Systems integriert, das in der Lage ist, Big Security Data mit sehr hoher Geschwindigkeit zu analysieren. Dies zeigt, dass diese integrierte SIEM Plattform eine hohe praktische Relevanz für moderne Sicherheitsanwendungen besitzt.
T2  - Sicherheitsanalyse in Hochgeschwindigkeit mithilfe der Hybride Anomalieerkennung
KW  - intrusion detection
KW  - security
KW  - machine learning
KW  - anomaly detection
KW  - outlier detection
KW  - novelty detection
KW  - in-memory
KW  - SIEM
KW  - IDS
KW  - Angriffserkennung
KW  - Sicherheit
KW  - Maschinelles Lernen
KW  - Anomalieerkennung
KW  - In-Memory
KW  - SIEM
KW  - IDS
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-426118
ER  - 
TY  - THES
A1  - Santuber, Joaquin
T1  - Designing for digital justice
T1  - Designing for Digital Justice
T1  - Diseñar para la justicia digital
BT  - an entanglement of people, law, and technologies in Chilean courts
BT  - eine Verflechtung von Menschen, Recht und Technologien in chilenischen Gerichten
BT  - una maraña de personas, leyes y tecnologías en los tribunales chilenos
N2  - At the beginning of 2020, with COVID-19, courts of justice worldwide had to move online to continue providing judicial service. Digital technologies materialized the court practices in ways unthinkable shortly before the pandemic creating resonances with judicial and legal regulation, as well as frictions. A better understanding of the dynamics at play in the digitalization of courts is paramount for designing justice systems that serve their users better, ensure fair and timely dispute resolutions, and foster access to justice. Building on three major bodies of literature —e-justice, digitalization and organization studies, and design research— Designing for Digital Justice takes a nuanced approach to account for human and more-than-human agencies. 
Using a qualitative approach, I have studied in depth the digitalization of Chilean courts during the pandemic, specifically between April 2020 and September 2022. Leveraging a comprehensive source of primary and secondary data, I traced back the genealogy of the novel materializations of courts’ practices structured by the possibilities offered by digital technologies. In five (5) case studies, I show in detail how the courts got to 1) work remotely, 2) host hearings via videoconference, 3) engage with users via social media (i.e., Facebook and Chat Messenger), 4) broadcast a show with judges answering questions from users via Facebook Live, and 5) record, stream, and upload judicial hearings to YouTube to fulfil the publicity requirement of criminal hearings. The digitalization of courts during the pandemic is characterized by a suspended normativity, which makes innovation possible yet presents risks. While digital technologies enabled the judiciary to provide services continuously, they also created the risk of displacing traditional judicial and legal regulation.
Contributing to liminal innovation and digitalization research, Designing for Digital Justice theorizes four phases: 1) the pre-digitalization phase resulting in the development of regulation, 2) the hotspot of digitalization resulting in the extension of regulation, 3) the digital innovation redeveloping regulation (moving to a new, preliminary phase), and 4) the permanence of temporal practices displacing regulation. Contributing to design research, Designing for Digital Justice provides new possibilities for innovation in the courts, focusing on different levels to better address tensions generated by digitalization. Fellow researchers will find in these pages a sound theoretical advancement at the intersection of digitalization and justice with novel methodological references. Practitioners will benefit from the actionable governance framework Designing for Digital Justice Model, which provides three fields of possibilities for action to design better justice systems. Only by taking into account digital, legal, and social factors can we design better systems that promote access to justice, the rule of law, and, ultimately, social peace.
N2  - Durch COVID-19 mussten zu Beginn des Jahres 2020 die Gerichte weltweit, um ihren Dienst fortzusetzen, Onlinekommunikation und digitale Technologien nutzen. Die digitalen Technologien haben die Gerichtspraktiken in einer Weise verändert, die kurz vor der Pandemie noch undenkbar war, was zu Resonanzen mit der Rechtsprechung und der gesetzlichen Regelung sowie zu Reibungen führte. Ein besseres Verständnis der Dynamik, die bei der Digitalisierung von Gerichten im Spiel ist, ist von entscheidender Bedeutung für die Gestaltung von Justizsystemen, die ihren Nutzern besser dienen, faire und zeitnahe Streitbeilegung gewährleisten und den Zugang zur Justiz und zur Rechtsstaatlichkeit fördern. Aufbauend auf den drei großen Themenkomplexen E-Justiz, Digitalisierung und Organisationen sowie Designforschung verfolgt „Designing for Digital Justice“ einen nuancierten Ansatz, um menschliche und nicht-menschliche Akteure zu berücksichtigen.

Mit Hilfe eines qualitativen Forschungsansatzes habe ich die Digitalisierung der chilenischen Gerichte während der Pandemie, insbesondere im Zeitraum von April 2020 bis September 2022, eingehend untersucht. Auf der Grundlage einer umfassenden Quelle von Primär- und Sekundärdaten habe ich die Genealogie der neuartigen Materialisierung von Gerichtspraktiken zurückverfolgt, die durch die Möglichkeiten der digitalen Technologien strukturiert wurden. In fünf (5) Fallstudien zeige ich im Detail, wie die Gerichte 1) aus der Ferne arbeiten, 2) Anhörungen per Videokonferenz abhalten, 3) mit Nutzern über soziale Medien (beispielsweise Facebook und Chat Messenger) in Kontakt treten, 4) eine Sendung mit Richtern, die Fragen von Nutzern beantworten, über Facebook Live ausstrahlen und 5) Gerichtsverhandlungen aufzeichnen, streamen und auf YouTube hochladen, um die Anforderungen an die Öffentlichkeit von Strafverhandlungen zu erfüllen. Hierbei zeigt sich, dass digitale Technologien der Justiz zwar eine kontinuierliche Bereitstellung von Dienstleistungen ermöglichten. Sie bergen aber auch die Gefahr, dass sie die traditionelle gerichtliche und rechtliche Regulierung verdrängen.

Als Beitrag zum Forschungsstrom zu „Liminal Innovation“ und Digitalisierung theoretisiert „Designing for Digital Justice“ vier Phasen: 1) Vor-Digitalisierung, die zur Entwicklung von Regulierung führt, 2) der Hotspot der Digitalisierung, der zur Ausweitung der Regulierung führt, 3) digitale Innovation, die die Regulierung neu entwickelt (Übergang zu einer neuen, provisorischen Phase) und 4) die Permanenz der temporären Praktiken, die die Regulierung verdrängt. Als Beitrag zur Designforschung bietet „Designing for Digital Justice“ neue Möglichkeiten für die Gestaltung von Justizsystemen, indem es Spannungen und Interventionsebenen miteinander verbindet. Forscherkolleg*innen finden auf diesen Seiten eine fundierte theoretische Weiterentwicklung an der Schnittstelle von Digitalisierung und Gerechtigkeit sowie neue methodische Hinweise. Praktiker sollen von dem Handlungsrahmen „Designing for Digital Justice Model“ profitieren, der drei Handlungsfelder für die Gestaltung besserer Justizsysteme bietet. Nur wenn wir die digitalen, rechtlichen und sozialen Akteure berücksichtigen, können wir bessere Systeme entwerfen, die sich für den Zugang zur Justiz, die Rechtsstaatlichkeit und letztlich den sozialen Frieden einsetzen.
N2  - A principios de 2020, con la COVID-19, los tribunales de justicia de todo el mundo tuvieron que ponerse en línea para continuar con el servicio. Las tecnologías digitales materializaron las prácticas de los tribunales de formas impensables poco antes de la pandemia, creando resonancias con la regulación judicial y legal, así como fricciones. Comprender mejor las dinámicas en juego en la digitalización de los tribunales es primordial para diseñar sistemas de justicia que sirvan mejor a sus usuarios, garanticen una resolución de conflictos justa y oportuna y fomenten el acceso a la justicia. Sobre la base de tres grandes temas en la literatura -justicia electrónica, digitalización y organizaciones, e investigación del diseño-, Designing for Digital Justice adopta un enfoque matizado para tener en cuenta los organismos humanos y más que humanos.

Utilizando un enfoque cualitativo, he estudiado en profundidad la digitalización de los tribunales chilenos durante la pandemia, concretamente entre abril de 2020 y septiembre de 2022. Aprovechando una amplia fuente de datos primarios y secundarios, he rastreado la genealogía de las nuevas materializaciones de las prácticas de los tribunales estructuradas por las posibilidades que ofrecen las tecnologías digitales. En cinco (5) estudios de caso, muestro en detalle cómo los tribunales llegaron a 1) trabajar a distancia, 2) celebrar audiencias por videoconferencia, 3) relacionarse con los usuarios a través de las redes sociales (es decir, Facebook y Chat Messenger), 4) emitir un espectáculo con jueces que responden a las preguntas de los usuarios a través de Facebook Live, y 5) grabar, transmitir y subir las audiencias judiciales a YouTube para cumplir con el requisito de publicidad de las audiencias penales. La digitalización de los tribunales durante la pandemia se caracteriza por una normatividad suspendida, que posibilita la innovación, pero presenta riesgos. Si bien las tecnologías digitales permitieron al poder judicial prestar servicios de forma continua, también crearon el riesgo de desplazar la normativa judicial y legal tradicional.

Contribuyendo a la teoría de la innovación liminar y digitalización, Designing for Digital Justice teoriza cuatro fases: 1) la fase de pre-digitalización que da lugar al desarrollo de la regulación, 2) el hotspot de digitalización que da lugar a la ampliación de la regulación, 3) la innovación liminal que vuelve a desarrollar la regulación (pasando a una nueva fase preliminar), y 4) la permanencia de prácticas temporales que desplaza la regulación. Contribuyendo a la investigación sobre el diseño, Designing for Digital Justice ofrece nuevas posibilidades de intervención para el diseño de la justicia, conectando las tensiones y los niveles para intervenir en ellos. Los colegas investigadores encontrarán en estas páginas un sólido avance teórico en la intersección de la digitalización y la justicia y novedosas referencias metodológicas. Los profesionales se beneficiarán del marco de gobernanza Designing for Digital Justice Model, que ofrece tres campos de posibilidades de actuación para diseñar mejores sistemas de justicia. Sólo teniendo en cuenta las agencias digitales, jurídicas y sociales podremos diseñar mejores sistemas que se comprometan con el acceso a la justicia, el Estado de Derecho y, en última instancia, la paz social.
KW  - digitalisation
KW  - courts of justice
KW  - COVID-19
KW  - Chile
KW  - online courts
KW  - design
KW  - law
KW  - organization studies
KW  - innovation
KW  - COVID-19
KW  - Chile
KW  - Gerichtsbarkeit
KW  - Design
KW  - Digitalisierung
KW  - Innovation
KW  - Recht
KW  - Online-Gerichte
KW  - Organisationsstudien
KW  - COVID-19
KW  - Chile
KW  - tribunales de justicia
KW  - diseño
KW  - digitalización
KW  - innovación
KW  - Derecho
KW  - tribunales en línea
KW  - estudios de organización
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-604178
ER  - 
TY  - THES
A1  - Sakizloglou, Lucas
T1  - Evaluating temporal queries over history-aware architectural runtime models
T1  - Ausführung temporaler Anfragen über geschichtsbewusste Architektur-Laufzeitmodelle
N2  - In model-driven engineering, the adaptation of large software systems with dynamic structure is enabled by architectural runtime models. Such a model represents an abstract state of the system as a graph of interacting components. Every relevant change in the system is mirrored in the model and triggers an evaluation of model queries, which search the model for structural patterns that should be adapted. This thesis focuses on a type of runtime models where the expressiveness of the model and model queries is extended to capture past changes and their timing. These history-aware models and temporal queries enable more informed decision-making during adaptation, as they support the formulation of requirements on the evolution of the pattern that should be adapted. However, evaluating temporal queries during adaptation poses significant challenges. First, it implies the capability to specify and evaluate requirements on the structure, as well as the ordering and timing in which structural changes occur. Then, query answers have to reflect that the history-aware model represents the architecture of a system whose execution may be ongoing, and thus answers may depend on future changes. Finally, query evaluation needs to be adequately fast and memory-efficient despite the increasing size of the history, especially for models that are altered by numerous, rapid changes.

The thesis presents a query language and a querying approach for the specification and evaluation of temporal queries. These contributions aim to cope with the challenges of evaluating temporal queries at runtime, a prerequisite for history-aware architectural monitoring and adaptation which has not been systematically treated by prior model-based solutions. The distinguishing features of our contributions are: the specification of queries based on a temporal logic which encodes structural patterns as graphs; the provision of formally precise query answers which account for timing constraints and ongoing executions; the incremental evaluation which avoids the re-computation of query answers after each change; and the option to discard history that is no longer relevant to queries. The query evaluation searches the model for occurrences of a pattern whose evolution satisfies a temporal logic formula. Therefore, besides model-driven engineering, another related research community is runtime verification. The approach differs from prior logic-based runtime verification solutions by supporting the representation and querying of structure via graphs and graph queries, respectively, which is more efficient for queries with complex patterns. We present a prototypical implementation of the approach and measure its speed and memory consumption in monitoring and adaptation scenarios from two application domains, with executions of an increasing size. We assess scalability by a comparison to the state-of-the-art from both related research communities. The implementation yields promising results, which pave the way for sophisticated history-aware self-adaptation solutions and indicate that the approach constitutes a highly effective technique for runtime monitoring on an architectural level.
N2  - In der modellgetriebenen Entwicklung wird die Adaptation großer Softwaresysteme mit dynamischer Struktur durch Architektur-Laufzeitmodelle ermöglicht. Ein solches Modell stellt einen abstrakten Zustand des Systems als einen Graphen von interagierenden Komponenten dar. Jede relevante Änderung im System spiegelt sich im Modell wider und löst eine Ausführung von Modellanfragen aus, die das Modell nach zu adaptierenden Strukturmustern durchsuchen. Diese Arbeit konzentriert sich auf eine Art von Laufzeitmodellen, bei denen die Ausdruckskraft des Modells und der Modellanfragen erweitert wird, um vergangene Änderungen und deren Zeitpunkt zu erfassen. Diese geschichtsbewussten Modelle und temporalen Anfragen ermöglichen eine fundiertere Entscheidungsfindung während der Adaptation, da sie die Formulierung von Anforderungen an die Entwicklung des Musters, das adaptiert werden soll, unterstützen. Die Ausführung von temporalen Anfragen während der Adaptation stellt jedoch eine große Herausforderung dar. Zunächst müssen Anforderungen an die Struktur sowie an die Reihenfolge und den Zeitpunkt von Strukturänderungen spezifiziert und evaluiert werden. Weiterhin müssen die Antworten auf die Anfragen berücksichtigen, dass das geschichtsbewusste Modell die Architektur eines Systems darstellt, dessen Ausführung fortlaufend sein kann, sodass die Antworten von zukünftigen Änderungen abhängen können. Schließlich muss die Anfrageausführung trotz der zunehmenden Größe der Historie hinreichend schnell und speichereffizient sein, insbesondere bei Modellen, die durch zahlreiche, schnelle Änderungen verändert werden.

In dieser Arbeit werden eine Sprache für die Spezifikation von temporalen Anfragen sowie eine Technik für deren Ausführung vorgestellt. Diese Beiträge zielen darauf ab, die Herausforderungen bei der Ausführung temporaler Anfragen zur Laufzeit zu bewältigen, eine Voraussetzung für ein geschichtsbewusstes Architekturmonitoring und geschichtsbewusste Architekturadaptation, die von früheren modellbasierten Lösungen nicht systematisch behandelt wurde. Die besonderen Merkmale unserer Beiträge sind: die Spezifikation von Anfragen auf der Basis einer temporalen Logik, die strukturelle Muster als Graphen kodiert; die Bereitstellung formal präziser Anfrageantworten, die temporale Einschränkungen und laufende Ausführungen berücksichtigen; die inkrementelle Ausführung, die die Neuberechnung von Abfrageantworten nach jeder Änderung vermeidet; und die Option, Historie zu verwerfen, die für Abfragen nicht mehr relevant ist. Bei der Anfrageausführung wird das Modell nach dem Auftreten eines Musters durchsucht, dessen Entwicklung eine temporallogische Formel erfüllt. Neben der modellgetriebenen Entwicklung ist daher die Laufzeitverifikation ein weiteres verwandtes Forschungsgebiet. Der Ansatz unterscheidet sich von bisherigen logikbasierten Lösungen zur Laufzeitverifikation, indem er die Darstellung und Abfrage von Strukturen über Graphen bzw. Graphanfragen unterstützt, was bei Anfragen mit komplexen Mustern effizienter ist. Wir stellen eine prototypische Implementierung des Ansatzes vor und messen seine Laufzeit und seinen Speicherverbrauch in Monitoring- und Adaptationsszenarien aus zwei Anwendungsdomänen mit Ausführungen von zunehmender Größe. Wir bewerten die Skalierbarkeit durch einen Vergleich mit dem Stand der Technik aus beiden verwandten Forschungsgebieten. Die Implementierung liefert vielversprechende Ergebnisse, die den Weg für anspruchsvolle geschichtsbewusste Selbstadaptationslösungen ebnen und darauf hindeuten, dass der Ansatz eine effektive Technik für das Laufzeitmonitoring auf Architekturebene darstellt.
KW  - architectural adaptation
KW  - history-aware runtime models
KW  - incremental graph query evaluation
KW  - model-driven software engineering
KW  - temporal graph queries
KW  - Architekturadaptation
KW  - geschichtsbewusste Laufzeit-Modelle
KW  - inkrementelle Ausführung von Graphanfragen
KW  - modellgetriebene Softwaretechnik
KW  - temporale Graphanfragen
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-604396
ER  - 
TY  - GEN
A1  - Sahlmann, Kristina
A1  - Scheffler, Thomas
A1  - Schnor, Bettina
T1  - Ontology-driven Device Descriptions for IoT Network Management
T2  - 2018 Global Internet of Things Summit (GIoTS)
N2  - One particular challenge in the Internet of Things is the management of many heterogeneous things. The things are typically constrained devices with limited memory, power, network and processing capacity. Configuring every device manually is a tedious task. We propose an interoperable way to configure an IoT network automatically using existing standards. The proposed NETCONF-MQTT bridge mediates between the constrained devices (speaking MQTT) and the network management standard NETCONF. The NETCONF-MQTT bridge dynamically generates YANG data models from the semantic description of the device capabilities based on the oneM2M ontology. We evaluate the approach for two use cases, i.e., describing an actuator and a sensor scenario.
KW  - Internet of Things
KW  - Interoperability
KW  - oneM2M
KW  - Ontology
KW  - Semantic Web
KW  - NETCONF
KW  - YANG
KW  - MQTT
Y1  - 2018
SN  - 978-1-5386-6451-3
U6  - https://doi.org/10.1109/GIOTS.2018.8534569
SP  - 295
EP  - 300
PB  - IEEE
CY  - New York
ER  - 
TY  - JOUR
A1  - Rüther, Ferenc Darius
A1  - Sebode, Marcial
A1  - Lohse, Ansgar W.
A1  - Wernicke, Sarah
A1  - Böttinger, Erwin
A1  - Casar, Christian
A1  - Braun, Felix
A1  - Schramm, Christoph
T1  - Mobile app requirements for patients with rare liver diseases
BT  - a single center survey for the ERN RARE-LIVER
JF  - Clinics and research in hepatology and gastroenterology
N2  - Background: 
More patient data are needed to improve research on rare liver diseases. Mobile health apps enable an exhaustive data collection. Therefore, the European Reference Network on Hepatological diseases (ERN RARE-LIVER) intends to implement an app for patients with rare liver diseases communicating with a patient registry, but little is known about which features patients and their healthcare providers regard as being useful. 

Aims: 
This study aimed to investigate how an app for rare liver diseases would be accepted, and to find out which features are considered useful. 

Methods: 
An anonymous survey was conducted among adult patients with rare liver diseases at a single academic, tertiary care outpatient service. Additionally, medical experts of the ERN working group on autoimmune hepatitis were invited to participate in an online survey. 

Results: 
In total, the responses from 100 patients with autoimmune (n = 90) or other rare (n = 10) liver diseases and 32 experts were analyzed. Patients were willing to use a disease-specific app (80%) and expected some benefit to their health (78%), but responses differed significantly between younger and older patients (93% vs. 62%, p < 0.001; 88% vs. 64%, p < 0.01). Comparing patients' and experts' feedback, patients more often expected a simplified healthcare pathway (e.g. 89% vs. 59% (p < 0.001) wanted access to their own medical records), while healthcare providers saw the benefit mainly in improving compliance and treatment outcome (e.g. 93% vs. 31% (p < 0.001) and 70% vs. 21% (p < 0.001) expected the app to reduce mistakes in taking medication and improve quality of life, respectively).
KW  - Primary sclerosing cholangitis
KW  - Primary biliary cholangitis
KW  - Autoimmune hepatitis
KW  - European reference networks
KW  - Mobile applications
KW  - Patient-reported outcome measures
Y1  - 2021
U6  - https://doi.org/10.1016/j.clinre.2021.101760
SN  - 2210-7401
SN  - 2210-741X
VL  - 45
IS  - 6
PB  - Elsevier Masson
CY  - Amsterdam
ER  - 
TY  - THES
A1  - Roumen, Thijs
T1  - Portable models for laser cutting
N2  - Laser cutting is a fast and precise fabrication process. This makes laser cutting a powerful process in custom industrial production. Since the patents on the original technology started to expire, a growing community of tech-enthusiasts embraced the technology and started sharing the models they fabricate online. Surprisingly, the shared models appear to largely be one-offs (e.g., they proudly showcase what a single person can make in one afternoon). For laser cutting to become a relevant mainstream phenomenon (as opposed to the current tech enthusiasts and industry users), it is crucial to enable users to reproduce models made by more experienced modelers, and to build on the work of others instead of creating one-offs.
We create a technological basis that allows users to build on the work of others—a progression that is currently held back by the use of exchange formats that disregard mechanical differences between machines and therefore overlook implications with respect to how well parts fit together mechanically (aka engineering fit).
For the field to progress, we need a machine-independent sharing infrastructure.
In this thesis, we outline three approaches that together get us closer to this:
(1) 2D cutting plans that are tolerant to machine variations. Our initial take is a minimally invasive approach: replacing machine-specific elements in cutting plans with more tolerant elements using mechanical hacks like springs and wedges. The resulting models fabricate on any consumer laser cutter and in a range of materials.
(2) Sharing models in 3D. To allow building on the work of others, we build a 3D modeling environment for laser cutting (kyub). After users design a model, they export their 3D models to 2D cutting plans optimized for the machine and material at hand. We extend this volumetric environment with tools to edit individual plates, allowing users to leverage the efficiency of volumetric editing while having control over the most detailed elements in laser cutting (plates).
(3) Converting legacy 2D cutting plans to 3D models. To handle legacy models, we build software to interactively reconstruct 3D models from 2D cutting plans. This allows users to reuse the models in more productive ways. We revisit this by automating the assembly process for a large subset of models.
The above-mentioned software is integrated into a larger system (kyub, 140,000 lines of code). This system integration enables the push towards actual use, which we demonstrate through a range of workshops where users build complex models such as fully functional guitars. By simplifying sharing and reuse, and through the resulting increase in model complexity, this line of work forms a small step towards enabling personal fabrication to scale past the maker phenomenon and become a mainstream phenomenon, the same way that other fields, such as print (PostScript) and ultimately computing itself (portable programming languages, etc.), reached mass adoption.
N2  - Laserschneiden ist ein schnelles und präzises Fertigungsverfahren. Diese Eigenschaften haben das Laserschneiden zu einem starken Anwärter für die industrielle Produktion gemacht. Seitdem die Patente für die ursprüngliche Technologie begannen abzulaufen, nahm eine wachsende Gemeinschaft von Technikbegeisterten die Technologie an und begann, ihre Modelle online zu teilen. Überraschenderweise scheinen die gemeinsam genutzten Modelle größtenteils Einzelstücke zu sein (z.B. zeigten sie stolz, was eine einzelne Person an einem Nachmittag entwickeln kann). Damit das Laserschneiden zu einem relevanten Mainstream-Phänomen wird, ist es entscheidend, dass die Benutzer die Möglichkeit haben Modelle zu reproduzieren, die von erfahrenen Modellierern erstellt wurden, und somit auf der Arbeit anderer aufbauen zu können, anstatt Einzelstücke zu erstellen.
Wir schaffen eine technologische Basis, die es Benutzern ermöglicht, auf der Arbeit anderer aufzubauen—eine Entwicklung, die derzeit gehemmt wird durch die Verwendung von Austauschformaten, die mechanische Unterschiede zwischen Maschinen außer Acht lassen und daher Auswirkungen darauf übersehen, wie gut Teile mechanisch zusammenpassen (aka Passung).
Damit sich das Feld weiterentwickeln kann, brauchen wir eine maschinenunabhängige Infrastruktur für gemeinsame Nutzung.
In dieser Dissertation präsentieren wir drei Ansätze, die uns diesem Ziel näherbringen:
(1) 2D-Schnittpläne, die gegenüber Maschinenvariationen tolerant sind. Unser erster Ansatz ist ein minimalinvasiver Ansatz: Wir ersetzen maschinenspezifische Elemente in Schnittplänen durch tolerantere Elemente unter Verwendung mechanischer Hacks wie Federn und Keile. Die resultierenden Modelle können auf jedem handelsüblichen Laserschneider und in einer Reihe von Materialien hergestellt werden.
(2) Teilen von Modellen in 3D. Um auf der Arbeit anderer aufbauen zu können, erstellen wir eine 3D-Modellierungsumgebung für das Laserschneiden (kyub). Nachdem die Benutzer ein Modell entworfen haben, exportieren sie ihre 3D-Modelle in 2D-Schnittpläne, die für die jeweilige Maschine und das vorhandene Material optimiert sind. Wir erweitern diese volumetrische Umgebung mit Werkzeugen zum Bearbeiten einzelner Platten, sodass Benutzer die Effizienz der volumetrischen Bearbeitung nutzen und gleichzeitig die detailliertesten Elemente beim Laserschneiden (Platten) steuern können.
(3) Umwandlung von Legacy-2D-Schnittplänen in 3D-Modelle. Um mit Legacy-Modellen umzugehen, entwickeln wir Software, um 3D-Modelle interaktiv aus 2D-Schnittplänen zu rekonstruieren. Dies ermöglicht Benutzern, die Modelle auf produktivere Weise wiederzuverwenden. Wir behandeln dies erneut, indem wir den Rekonstruierungsprozess für eine große Teilmenge von Modellen automatisieren.
Die oben genannte Software ist in ein größeres System integriert (kyub, 140.000 Codezeilen). Diese Systemintegration ermöglicht es, den tatsächlichen Gebrauch voranzutreiben, was wir in einer Reihe von Workshops demonstrieren, in denen Benutzer komplexe Modelle wie voll funktionsfähige Gitarren bauen. Durch die Vereinfachung der gemeinsamen Nutzung und Wiederverwendung und die daraus resultierende Zunahme der Modellkomplexität wird diese Arbeitsrichtung und das daraus resultierende System letztendlich (teilweise) dazu beitragen, dass die persönliche Fertigung über das Maker-Phänomen hinausgeht und sich zu einem Mainstream-Phänomen entwickelt – genauso wie andere Bereiche, z.B. als Druck (Postscript) und schließlich selbst Computer (portable Programmiersprachen usw.), um eine Massenakzeptanz zu erreichen.
KW  - human computer interaction
KW  - digital fabrication
KW  - laser cutting
KW  - IT systems engineering
KW  - IT Softwarentwicklung
KW  - digitale Fabrikation
KW  - Mensch-Maschine Interaktion
KW  - Laserschneiden
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-578141
ER  - 
TY  - THES
A1  - Rothenberger, Ralf
T1  - Satisfiability thresholds for non-uniform random k-SAT
T1  - Erfüllbarkeitsschwellwerte für nicht-uniformes zufälliges k-SAT
N2  - Boolean Satisfiability (SAT) is one of the problems at the core of theoretical computer science.  It was the first problem proven to be NP-complete by Cook and, independently, by Levin. Nowadays it is conjectured that SAT cannot be solved in sub-exponential time. Thus, it is generally assumed that SAT and its restricted version k-SAT are hard to solve. However, state-of-the-art SAT solvers can solve even huge practical instances of these problems in a reasonable amount of time.

Why is SAT hard in theory, but easy in practice?  One approach to answering this question is investigating the average runtime of SAT. In order to analyze this average runtime the random k-SAT model was introduced. The model generates all k-SAT instances with n variables and m clauses with uniform probability. Researching random k-SAT led to a multitude of insights and tools for analyzing random structures in general. One major observation was the emergence of the so-called satisfiability threshold:  A phase transition point in the number of clauses at which the generated formulas go from asymptotically almost surely satisfiable to asymptotically almost surely unsatisfiable. Additionally, instances around the threshold seem to be particularly hard to solve.

In this thesis we analyze a more general model of random k-SAT that we call non-uniform random k-SAT. In contrast to the classical model each of the n Boolean variables now has a distinct probability of being drawn. For each of the m clauses we draw k variables according to the variable distribution and choose their signs uniformly at random. Non-uniform random k-SAT gives us more control over the distribution of Boolean variables in the resulting formulas. This allows us to tailor distributions to the ones observed in practice. Notably, non-uniform random k-SAT contains the previously proposed models random k-SAT, power-law random k-SAT and geometric random k-SAT as special cases.

We analyze the satisfiability threshold in non-uniform random k-SAT depending on the variable probability distribution. Our goal is to derive conditions on this distribution under which an equivalent of the satisfiability threshold conjecture holds. We start with the arguably simpler case of non-uniform random 2-SAT. For this model we show under which conditions a threshold exists, whether it is sharp or coarse, and what the leading constant of the threshold function is. These are exactly the three ingredients one needs in order to prove or disprove the satisfiability threshold conjecture. For non-uniform random k-SAT with k≥3 we only prove sufficient conditions under which a threshold exists. We also show some properties of the variable probabilities under which the threshold is sharp in this case. These are the first results on the threshold behavior of non-uniform random k-SAT.
N2  - Das Boolesche Erfüllbarkeitsproblem (SAT) ist eines der zentralsten Probleme der theoretischen Informatik. Es war das erste Problem, dessen NP-Vollständigkeit nachgewiesen wurde, von Cook und Levin unabhängig voneinander. Heutzutage wird vermutet, dass SAT nicht in subexponentieller Zeit gelöst werden kann. Darum wird allgemein angenommen, dass SAT und seine eingeschränkte Version k-SAT nicht effizient zu lösen sind. Trotzdem können moderne SAT-Solver sogar riesige Echtweltinstanzen dieser Probleme in angemessener Zeit lösen.

Warum ist SAT theoretisch schwer, aber einfach in der Praxis? Ein Ansatz, um diese Frage zu beantworten, ist die Untersuchung der durchschnittlichen Laufzeit von SAT. Um diese durchschnittliche oder typische Laufzeit analysieren zu können, wurde zufälliges k-SAT eingeführt. Dieses Modell erzeugt alle k-SAT-Instanzen mit n Variablen und m Klauseln mit gleicher Wahrscheinlichkeit. Die Untersuchung des Zufallsmodells für k-SAT führte zu einer Vielzahl von Erkenntnissen und Techniken zur Untersuchung zufälliger Strukturen im Allgemeinen. Eine der größten Entdeckungen in diesem Zusammenhang war das Auftreten des sogenannten Erfüllbarkeitsschwellwerts: Ein Phasenübergang in der Anzahl der Klauseln, an dem die generierten Formeln von asymptotisch sicher erfüllbar zu asymptotisch sicher unerfüllbar wechseln. Zusätzlich scheinen Instanzen, die um diesen Übergang herum erzeugt werden, besonders schwer zu lösen zu sein.

In dieser Arbeit analysieren wir ein allgemeineres Zufallsmodell für k-SAT, das wir nichtuniformes zufälliges k-SAT nennen. Im Gegensatz zum klassischen Modell, hat jede Boolesche Variable jetzt eine bestimmte Wahrscheinlichkeit gezogen zu werden. Für jede der m Klauseln ziehen wir k Variablen entsprechend ihrer Wahrscheinlichkeitsverteilung und wählen ihre Vorzeichen uniform zufällig. Nichtuniformes zufälliges k-SAT gibt uns mehr Kontrolle über die Verteilung Boolescher Variablen in den resultierenden Formeln. Das erlaubt uns diese Verteilungen auf die in der Praxis beobachteten zuzuschneiden. Insbesondere enthält nichtuniformes zufälliges k-SAT die zuvor vorgestellten Modelle zufälliges k-SAT, skalenfreies zufälliges k-SAT und geometrisches zufälliges k-SAT als Spezialfälle.

Wir analysieren den Erfüllbarkeitsschwellwert in nichtuniformem zufälligen k-SAT abhängig von den Wahrscheinlichkeitsverteilungen für Variablen. Unser Ziel ist es, Bedingungen an diese Verteilungen abzuleiten, unter denen ein Äquivalent der Erfüllbarkeitsschwellwertsvermutung für zufälliges k-SAT gilt. Wir fangen mit dem wahrscheinlich einfacheren Modell nichtuniformem zufälligen 2-SAT an. Für dieses Modell zeigen wir, unter welchen Bedingungen ein Schwellwert existiert, ob er steil oder flach ansteigt und was die führende Konstante der Schwellwertfunktion ist. Das sind genau die Zutaten, die man benötigt um die Erfüllbarkeitsschwellwertsvermutung zu bestätigen oder zu widerlegen. Für nichtuniformes zufälliges k-SAT mit k≥3 zeigen wir nur hinreichende Bedingungen, unter denen ein Schwellwert existiert. Wir zeigen außerdem einige Eigenschaften der Variablenwahrscheinlichkeiten, die dazu führen, dass der Schwellwert steil ansteigt. Dies sind unseres Wissens nach die ersten allgemeinen Resultate zum Schwellwertverhalten von nichtuniformem zufälligen k-SAT.
KW  - Boolean satisfiability
KW  - random k-SAT
KW  - satisfiability threshold
KW  - non-uniform distribution
KW  - Boolsche Erfüllbarkeit
KW  - nicht-uniforme Verteilung
KW  - zufälliges k-SAT
KW  - Erfüllbarkeitsschwellwert
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-549702
ER  - 
TY  - THES
A1  - Rohloff, Tobias
T1  - Learning analytics at scale
BT  - supporting learning and teaching in MOOCs with data-driven insights
N2  - Digital technologies are paving the way for innovative educational approaches. The learning format of Massive Open Online Courses (MOOCs) provides a highly accessible path to lifelong learning while being more affordable and flexible than face-to-face courses. Thereby, thousands of learners can enroll in courses mostly without admission restrictions, but this also raises challenges. Individual supervision by teachers is barely feasible, and learning persistence and success depend on students' self-regulatory skills. Here, technology provides the means for support. The use of data for decision-making is already transforming many fields, whereas in education, it is still a young research discipline. Learning Analytics (LA) is defined as the measurement, collection, analysis, and reporting of data about learners and their learning contexts with the purpose of understanding and improving learning and learning environments. The vast amount of data that MOOCs produce on the learning behavior and success of thousands of students provides the opportunity to study human learning and develop approaches addressing the demands of learners and teachers.

The overall purpose of this dissertation is to investigate the implementation of LA at the scale of MOOCs and to explore how data-driven technology can support learning and teaching in this context. To this end, several research prototypes have been iteratively developed for the HPI MOOC Platform. Hence, they were tested and evaluated in an authentic real-world learning environment. Most of the results can be applied on a conceptual level to other MOOC platforms as well. The research contribution of this thesis thus provides practical insights beyond what is theoretically possible. In total, four system components were developed and extended:

(1) The Learning Analytics Architecture: A technical infrastructure to collect, process, and analyze event-driven learning data based on schema-agnostic pipelining in a service-oriented MOOC platform. (2) The Learning Analytics Dashboard for Learners: A tool for data-driven support of self-regulated learning, in particular to enable learners to evaluate and plan their learning activities, progress, and success by themselves. (3) Personalized Learning Objectives: A set of features to better connect learners' success to their personal intentions based on selected learning objectives to offer guidance and align the provided data-driven insights about their learning progress. (4) The Learning Analytics Dashboard for Teachers: A tool supporting teachers with data-driven insights to enable the monitoring of their courses with thousands of learners, identify potential issues, and take informed action.

For all aspects examined in this dissertation, related research is presented, development processes and implementation concepts are explained, and evaluations are conducted in case studies. Among other findings, the usage of the learner dashboard in combination with personalized learning objectives demonstrated improved certification rates of 11.62% to 12.63%. Furthermore, it was observed that the teacher dashboard is a key tool and an integral part of teaching in MOOCs. In addition to the results and contributions, general limitations of the work are discussed, which altogether provide a solid foundation for practical implications and future research.
N2  - Digitale Technologien sind Wegbereiter für innovative Bildungsansätze. Das Lernformat der Massive Open Online Courses (MOOCs) bietet einen einfachen und globalen Zugang zu lebenslangem Lernen und ist oft kostengünstiger und flexibler als klassische Präsenzlehre. Dabei können sich Tausende von Lernenden meist ohne Zulassungsbeschränkung in Kurse einschreiben, wodurch jedoch auch Herausforderungen entstehen. Eine individuelle Betreuung durch Lehrende ist kaum möglich und das Durchhaltevermögen und der Lernerfolg hängen von selbstregulatorischen Fähigkeiten der Lernenden ab. Hier bietet Technologie die Möglichkeit zur Unterstützung. Die Nutzung von Daten zur Entscheidungsfindung transformiert bereits viele Bereiche, aber im Bildungswesen ist dies noch eine junge Forschungsdisziplin. Als Learning Analytics (LA) wird das Messen, Erfassen, Analysieren und Auswerten von Daten über Lernende und ihren Lernkontext verstanden, mit dem Ziel, das Lernen und die Lernumgebungen zu verstehen und zu verbessern. Die riesige Menge an Daten, die MOOCs über das Lernverhalten und den Lernerfolg produzieren, bietet die Möglichkeit, das menschliche Lernen zu studieren und Ansätze zu entwickeln, die den Anforderungen von Lernenden und Lehrenden gerecht werden.

Der Schwerpunkt dieser Dissertation liegt auf der Implementierung von LA für die Größenordnung von MOOCs und erforscht dabei, wie datengetriebene Technologie das Lernen und Lehren in diesem Kontext unterstützen kann. Zu diesem Zweck wurden mehrere Forschungsprototypen iterativ für die HPI-MOOC-Plattform entwickelt. Daher wurden diese in einer authentischen und realen Lernumgebung getestet und evaluiert. Die meisten Ergebnisse lassen sich auf konzeptioneller Ebene auch auf andere MOOC-Plattformen übertragen, wodurch der Forschungsbeitrag dieser Arbeit praktische Erkenntnisse über das theoretisch Mögliche hinaus liefert. Insgesamt wurden vier Systemkomponenten entwickelt und erweitert:

(1) Die LA-Architektur: Eine technische Infrastruktur zum Sammeln, Verarbeiten und Analysieren von ereignisgesteuerten Lerndaten basierend auf einem schemaagnostischen Pipelining in einer serviceorientierten MOOC-Plattform. (2) Das LA-Dashboard für Lernende: Ein Werkzeug zur datengesteuerten Unterstützung der Selbstregulierung, insbesondere um Lernende in die Lage zu versetzen, ihre Lernaktivitäten, ihren Fortschritt und ihren Lernerfolg selbst zu evaluieren und zu planen. (3) Personalisierte Lernziele: Eine Reihe von Funktionen, um den Lernerfolg besser mit persönlichen Absichten zu verknüpfen, die auf ausgewählten Lernzielen basieren, um Leitlinien anzubieten und die bereitgestellten datengetriebenen Einblicke über den Lernfortschritt darauf abzustimmen. (4) Das LA-Dashboard für Lehrende: Ein Hilfsmittel, das Lehrkräfte mit datengetriebenen Erkenntnissen unterstützt, um ihre Kurse mit Tausenden von Lernenden zu überblicken, mögliche Probleme zu erkennen und fundierte Maßnahmen zu ergreifen.

Für alle untersuchten Aspekte dieser Dissertation werden verwandte Forschungsarbeiten vorgestellt, Entwicklungsprozesse und Implementierungskonzepte erläutert und Evaluierungen in Fallstudien durchgeführt. Unter anderem konnten durch den Einsatz des Dashboards für Lernende in Kombination mit personalisierten Lernzielen verbesserte Zertifizierungsraten von 11,62% bis 12,63% nachgewiesen werden. Außerdem wurde beobachtet, dass das Dashboard für Lehrende ein entscheidendes Werkzeug und ein integraler Bestandteil für die Lehre in MOOCs ist. Neben den Ergebnissen und Beiträgen werden generelle Einschränkungen der Arbeit diskutiert, die insgesamt eine fundierte Grundlage für praktische Implikationen und zukünftige Forschungsvorhaben schaffen.
KW  - Learning Analytics
KW  - MOOCs
KW  - Self-Regulated Learning
KW  - E-Learning
KW  - Service-Oriented Architecture
KW  - Online Learning Environments
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-526235
ER  - 
TY  - GEN
A1  - Risch, Julian
A1  - Krestel, Ralf
T1  - My Approach = Your Apparatus?
BT  - Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections
T2  - Libraries
N2  - Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use cross-collection topic modeling for the exploration, clustering, and comparison of large sets of documents, such as digital libraries. However, topic modeling on documents from different collections is challenging because of domain-specific vocabulary. We present a cross-collection topic model combined with automatic domain term extraction and phrase segmentation. This model distinguishes collection-specific and collection-independent words based on information entropy and reveals commonalities and differences of multiple text collections. We evaluate our model on patents, scientific papers, newspaper articles, forum posts, and Wikipedia articles. In comparison to state-of-the-art cross-collection topic modeling, our model achieves up to 13% higher topic coherence, up to 4% lower perplexity, and up to 31% higher document classification accuracy. More importantly, our approach is the first topic model that ensures disjunct general and specific word distributions, resulting in clear-cut topic representations.
KW  - Topic modeling
KW  - Automatic domain term extraction
KW  - Entropy
Y1  - 2018
SN  - 978-1-4503-5178-2
U6  - https://doi.org/10.1145/3197026.3197038
SN  - 2575-7865
SN  - 2575-8152
SP  - 283
EP  - 292
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Risch, Julian
A1  - Krestel, Ralf
T1  - Domain-specific word embeddings for patent classification
JF  - Data Technologies and Applications
N2  - Purpose
Patent offices and other stakeholders in the patent domain need to classify patent applications according to a standardized classification scheme. To examine the novelty of an application, it can then be compared to previously granted patents in the same class. Automatic classification would be highly beneficial because of the large volume of patents and the domain-specific knowledge needed to accomplish this costly manual task. However, a challenge for the automation is patent-specific language use, such as special vocabulary and phrases.

Design/methodology/approach
To account for this language use, the authors present domain-specific pre-trained word embeddings for the patent domain. The authors train the model on a very large data set of more than 5 million patents and evaluate it on the task of patent classification. To this end, the authors propose a deep learning approach based on gated recurrent units for automatic patent classification built on the trained word embeddings.

Findings
Experiments on a standardized evaluation data set show that the approach increases average precision for patent classification by 17 percent compared to state-of-the-art approaches. In this paper, the authors further investigate the model’s strengths and weaknesses. An extensive error analysis reveals that the learned embeddings indeed mirror patent-specific language use. The imbalanced training data and underrepresented classes are the most difficult remaining challenges.

Originality/value
The proposed approach fulfills the need for domain-specific word embeddings for downstream tasks in the patent domain, such as patent classification or patent analysis.
KW  - Deep learning
KW  - Document classification
KW  - Word embedding
KW  - Patents
Y1  - 2019
U6  - https://doi.org/10.1108/DTA-01-2019-0002
SN  - 2514-9288
SN  - 2514-9318
VL  - 53
IS  - 1
SP  - 108
EP  - 122
PB  - Emerald Group Publishing Limited
CY  - Bingley
ER  - 
TY  - JOUR
A1  - Risch, Julian
A1  - Krestel, Ralf
ED  - Agarwal, Basant
ED  - Nayak, Richi
ED  - Mittal, Namita
ED  - Patnaik, Srikanta
T1  - Toxic comment detection in online discussions
JF  - Deep learning-based approaches for sentiment analysis
N2  - Comment sections of online news platforms are an essential space to express opinions and discuss political topics. In contrast to other online posts, news discussions are related to particular news articles, comments refer to each other, and individual conversations emerge. However, the misuse by spammers, haters, and trolls makes costly content moderation necessary. Sentiment analysis can not only support moderation but also help to understand the dynamics of online discussions. A subtask of content moderation is the identification of toxic comments. To this end, we describe the concept of toxicity and characterize its subclasses. Further, we present various deep learning approaches, including datasets and architectures, tailored to sentiment analysis in online discussions. One way to make these approaches more comprehensible and trustworthy is fine-grained instead of binary comment classification. On the downside, more classes require more training data. Therefore, we propose to augment training data by using transfer learning. We discuss real-world applications, such as semi-automated comment moderation and troll detection. Finally, we outline future challenges and current limitations in light of the most recent research publications.
KW  - deep learning
KW  - natural language processing
KW  - user-generated content
KW  - toxic comment classification
KW  - hate speech detection
Y1  - 2020
SN  - 978-981-15-1216-2
SN  - 978-981-15-1215-5
U6  - https://doi.org/10.1007/978-981-15-1216-2_4
SN  - 2524-7565
SN  - 2524-7573
SP  - 85
EP  - 109
PB  - Springer
CY  - Singapore
ER  - 
TY  - THES
A1  - Risch, Julian
T1  - Reader comment analysis on online news platforms
N2  - Comment sections of online news platforms are an essential space to express opinions and discuss political topics. However, the misuse by spammers, haters, and trolls raises doubts about whether the benefits justify the costs of the time-consuming content moderation. As a consequence, many platforms limited or even shut down comment sections completely. In this thesis, we present deep learning approaches for comment classification, recommendation, and prediction to foster respectful and engaging online discussions. The main focus is on two kinds of comments: toxic comments, which make readers leave a discussion, and engaging comments, which make readers join a discussion. First, we discourage and remove toxic comments, e.g., insults or threats. To this end, we present a semi-automatic comment moderation process, which is based on fine-grained text classification models and supports moderators. Our experiments demonstrate that data augmentation, transfer learning, and ensemble learning allow training robust classifiers even on small datasets. To establish trust in the machine-learned models, we reveal which input features are decisive for their output with attribution-based explanation methods. Second, we encourage and highlight engaging comments, e.g., serious questions or factual statements. We automatically identify the most engaging comments, so that readers need not scroll through thousands of comments to find them. The model training process builds on upvotes and replies as a measure of reader engagement. We also identify comments that address the article authors or are otherwise relevant to them to support interactions between journalists and their readership. Taking into account the readers' interests, we further provide personalized recommendations of discussions that align with their favored topics or involve frequent co-commenters. 
Our models outperform multiple baselines and recent related work in experiments on comment datasets from different platforms.
N2  - Kommentarspalten von Online-Nachrichtenplattformen sind ein essentieller Ort, um Meinungen zu äußern und politische Themen zu diskutieren. Der Missbrauch durch Trolle und Verbreiter von Hass und Spam lässt jedoch Zweifel aufkommen, ob der Nutzen die Kosten der zeitaufwendigen Kommentarmoderation rechtfertigt. Als Konsequenz daraus haben viele Plattformen ihre Kommentarspalten eingeschränkt oder sogar ganz abgeschaltet. In dieser Arbeit stellen wir Deep-Learning-Verfahren zur Klassifizierung, Empfehlung und Vorhersage von Kommentaren vor, um respektvolle und anregende Online-Diskussionen zu fördern. Das Hauptaugenmerk liegt dabei auf zwei Arten von Kommentaren: toxische Kommentare, die die Leser veranlassen, eine Diskussion zu verlassen, und anregende Kommentare, die die Leser veranlassen, sich an einer Diskussion zu beteiligen. Im ersten Schritt identifizieren und entfernen wir toxische Kommentare, z.B. Beleidigungen oder Drohungen. Zu diesem Zweck stellen wir einen halbautomatischen Moderationsprozess vor, der auf feingranularen Textklassifikationsmodellen basiert und Moderatoren unterstützt. Unsere Experimente zeigen, dass Datenanreicherung, Transfer- und Ensemble-Lernen das Trainieren robuster Klassifikatoren selbst auf kleinen Datensätzen ermöglichen. Um Vertrauen in die maschinell gelernten Modelle zu schaffen, zeigen wir mit attributionsbasierten Erklärungsmethoden auf, welche Teile der Eingabe für ihre Ausgabe entscheidend sind. Im zweiten Schritt ermutigen und markieren wir anregende Kommentare, z.B. ernsthafte Fragen oder sachliche Aussagen.
Wir identifizieren automatisch die anregendsten Kommentare, so dass die Leser nicht durch Tausende von Kommentaren blättern müssen, um sie zu finden. Der Trainingsprozess der Modelle baut auf Upvotes und Kommentarantworten als Maß für die Aktivität der Leser auf.
Wir identifizieren außerdem Kommentare, die sich an die Artikelautoren richten oder anderweitig für sie relevant sind, um die Interaktion zwischen Journalisten und ihrer Leserschaft zu unterstützen. Unter Berücksichtigung der Interessen der Leser bieten wir darüber hinaus personalisierte Diskussionsempfehlungen an, die sich an den von ihnen bevorzugten Themen oder häufigen Diskussionspartnern orientieren. In Experimenten mit Kommentardatensätzen von verschiedenen Plattformen übertreffen unsere Modelle mehrere grundlegende Vergleichsverfahren und aktuelle verwandte Arbeiten.
T2  - Analyse von Leserkommentaren auf Online-Nachrichtenplattformen
KW  - machine learning
KW  - Maschinelles Lernen
KW  - text classification
KW  - Textklassifikation
KW  - social media
KW  - Soziale Medien
KW  - hate speech detection
KW  - Hasserkennung
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-489222
ER  - 
TY  - THES
A1  - Richter, Rico
T1  - Concepts and techniques for processing and rendering of massive 3D point clouds
T1  - Konzepte und Techniken für die Verarbeitung und das Rendering von Massiven 3D-Punktwolken
N2  - Remote sensing technology, such as airborne, mobile, or terrestrial laser scanning, and photogrammetric techniques, are fundamental approaches for efficient, automatic creation of digital representations of spatial environments. For example, they allow us to generate 3D point clouds of landscapes, cities, infrastructure networks, and sites. As an essential and universal category of geodata, 3D point clouds are used and processed by a growing number of applications, services, and systems such as in the domains of urban planning, landscape architecture, environmental monitoring, disaster management, virtual geographic environments as well as for spatial analysis and simulation.
While the acquisition processes for 3D point clouds become more and more reliable and widely used, applications and systems are faced with more and more 3D point cloud data. In addition, 3D point clouds, by their very nature, are raw data, i.e., they do not contain any structural or semantic information. Many processing strategies common to GIS such as deriving polygon-based 3D models generally do not scale for billions of points. GIS typically reduce data density and precision of 3D point clouds to cope with the sheer amount of data, but that results in a significant loss of valuable information at the same time.
This thesis proposes concepts and techniques designed to efficiently store and process massive 3D point clouds. To this end, object-class segmentation approaches are presented to attribute semantics to 3D point clouds, used, for example, to identify building, vegetation, and ground structures and, thus, to enable processing, analyzing, and visualizing 3D point clouds in a more effective and efficient way. Similarly, change detection and updating strategies for 3D point clouds are introduced that allow for reducing storage requirements and incrementally updating 3D point cloud databases. In addition, this thesis presents out-of-core, real-time rendering techniques used to interactively explore 3D point clouds and related analysis results. All techniques have been implemented based on specialized spatial data structures, out-of-core algorithms, and GPU-based processing schemas to cope with massive 3D point clouds having billions of points.  
All proposed techniques have been evaluated, and their applicability to geospatial applications and systems has been demonstrated, in particular for tasks such as classification, processing, and visualization. Case studies for 3D point clouds of entire cities with up to 80 billion points show that the presented approaches open up new ways to manage and apply large-scale, dense, and time-variant 3D point clouds as required by a rapidly growing number of applications and systems.
N2  - Fernerkundungstechnologien wie luftgestütztes, mobiles oder terrestrisches Laserscanning und photogrammetrische Techniken sind grundlegende Ansätze für die effiziente, automatische Erstellung von digitalen Repräsentationen räumlicher Umgebungen. Sie ermöglichen uns zum Beispiel die Erzeugung von 3D-Punktwolken für Landschaften, Städte, Infrastrukturnetze und Standorte. 3D-Punktwolken werden als wesentliche und universelle Kategorie von Geodaten von einer wachsenden Anzahl an Anwendungen, Diensten und Systemen genutzt und verarbeitet, zum Beispiel in den Bereichen Stadtplanung, Landschaftsarchitektur, Umweltüberwachung, Katastrophenmanagement, virtuelle geographische Umgebungen sowie zur räumlichen Analyse und Simulation.
Da die Erfassungsprozesse für 3D-Punktwolken immer zuverlässiger und verbreiteter werden, sehen sich Anwendungen und Systeme mit immer größeren 3D-Punktwolken-Daten konfrontiert. Darüber hinaus enthalten 3D-Punktwolken als Rohdaten von ihrer Art her keine strukturellen oder semantischen Informationen. Viele GIS-übliche Verarbeitungsstrategien, wie die Ableitung polygonaler 3D-Modelle, skalieren in der Regel nicht für Milliarden von Punkten. GIS reduzieren typischerweise die Datendichte und Genauigkeit von 3D-Punktwolken, um mit der immensen Datenmenge umgehen zu können, was aber zugleich zu einem signifikanten Verlust wertvoller Informationen führt.
Diese Arbeit präsentiert Konzepte und Techniken, die entwickelt wurden, um massive 3D-Punktwolken effizient zu speichern und zu verarbeiten. Hierzu werden Ansätze für die Objektklassen-Segmentierung vorgestellt, um 3D-Punktwolken mit Semantik anzureichern; so lassen sich beispielsweise Gebäude-, Vegetations- und Bodenstrukturen identifizieren, wodurch die Verarbeitung, Analyse und Visualisierung von 3D-Punktwolken effektiver und effizienter durchführbar werden. Ebenso werden Änderungserkennungs- und Aktualisierungsstrategien für 3D-Punktwolken vorgestellt, mit denen Speicheranforderungen reduziert und Datenbanken für 3D-Punktwolken inkrementell aktualisiert werden können. Des Weiteren beschreibt diese Arbeit Out-of-Core Echtzeit-Rendering-Techniken zur interaktiven Exploration von 3D-Punktwolken und zugehöriger Analyseergebnisse. Alle Techniken wurden mit Hilfe spezialisierter räumlicher Datenstrukturen, Out-of-Core-Algorithmen und GPU-basierter Verarbeitungsschemata implementiert, um massiven 3D-Punktwolken mit Milliarden von Punkten gerecht werden zu können.
Alle vorgestellten Techniken wurden evaluiert und die Anwendbarkeit für Anwendungen und Systeme, die mit raumbezogenen Daten arbeiten, wurde insbesondere für Aufgaben wie Klassifizierung, Verarbeitung und Visualisierung demonstriert. Fallstudien für 3D-Punktwolken von ganzen Städten mit bis zu 80 Milliarden Punkten zeigen, dass die vorgestellten Ansätze neue Wege zur Verwaltung und Verwendung von großflächigen, dichten und zeitvarianten 3D-Punktwolken eröffnen, die von einer wachsenden Anzahl an Anwendungen und Systemen benötigt werden.
KW  - 3D point clouds
KW  - 3D-Punktwolken
KW  - real-time rendering
KW  - Echtzeit-Rendering
KW  - 3D visualization
KW  - 3D-Visualisierung
KW  - classification
KW  - Klassifizierung
KW  - change detection
KW  - Veränderungsanalyse
KW  - LiDAR
KW  - LiDAR
KW  - remote sensing
KW  - Fernerkundung
KW  - mobile mapping
KW  - Mobile-Mapping
KW  - Big Data
KW  - Big Data
KW  - GPU
KW  - GPU
KW  - laserscanning
KW  - Laserscanning
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-423304
ER  - 
TY  - JOUR
A1  - Richly, Keven
A1  - Schlosser, Rainer
A1  - Boissier, Martin
T1  - Budget-conscious fine-grained configuration optimization for spatio-temporal applications
JF  - Proceedings of the VLDB Endowment
N2  - Based on the performance requirements of modern spatio-temporal data mining applications, in-memory database systems are often used to store and process the data. To efficiently utilize the scarce DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes). However, the selection of cost and performance balancing configurations is challenging due to the vast number of possible setups consisting of mutually dependent individual decisions. In this paper, we introduce a novel approach to jointly optimize the compression, sorting, indexing, and tiering configuration for spatio-temporal workloads. Further, we consider horizontal data partitioning, which enables the independent application of different tuning options on a fine-grained level. We propose different linear programming (LP) models addressing cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload and memory budgets. To yield maintainable and robust configurations, we extend our LP-based approach to incorporate reconfiguration costs as well as a worst-case optimization for potential workload scenarios. Further, we demonstrate on a real-world dataset that our models make it possible to significantly reduce the memory footprint at equal performance or to increase performance at equal memory size compared to existing tuning heuristics.
KW  - General Earth and Planetary Sciences
KW  - Water Science and Technology
KW  - Geography, Planning and Development
Y1  - 2022
U6  - https://doi.org/10.14778/3565838.3565858
SN  - 2150-8097
VL  - 15
IS  - 13
SP  - 4079
EP  - 4092
PB  - Association for Computing Machinery (ACM)
CY  - [New York]
ER  - 
TY  - JOUR
A1  - Richly, Keven
A1  - Brauer, Janos
A1  - Schlosser, Rainer
T1  - Predicting location probabilities of drivers to improve dispatch decisions of transportation network companies based on trajectory data
JF  - Proceedings of the 9th International Conference on Operations Research and Enterprise Systems - ICORES
N2  - The demand for peer-to-peer ridesharing services has increased rapidly over the last years. Dispatching orders cost-efficiently and communicating accurate pick-up times is challenging because the current location of each available driver is not exactly known: observed locations can be outdated by several seconds. The developed trajectory visualization tool enables transportation network companies to analyze dispatch processes and determine the causes of unexpected delays. As dispatching algorithms are based on the accuracy of arrival time predictions, we account for factors like noise, sample rate, technical and economic limitations as well as the duration of the entire process as they have an impact on the accuracy of spatio-temporal data. To improve dispatching strategies, we propose a prediction approach that provides a probability distribution for a driver’s future locations based on patterns observed in past trajectories. We demonstrate the capabilities of our prediction results to (i) avoid critical delays, (ii) estimate waiting times with higher confidence, and (iii) enable risk considerations in dispatching strategies.
KW  - trajectory data
KW  - location prediction algorithm
KW  - Peer-to-Peer ridesharing
KW  - transport network companies
KW  - risk-aware dispatching
Y1  - 2020
PB  - Springer
CY  - Berlin
ER  - 
TY  - GEN
A1  - Richly, Keven
A1  - Brauer, Janos
A1  - Schlosser, Rainer
T1  - Predicting location probabilities of drivers to improve dispatch decisions of transportation network companies based on trajectory data
T2  - Postprints der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - The demand for peer-to-peer ridesharing services has increased rapidly over the last years. Dispatching orders cost-efficiently and communicating accurate pick-up times is challenging because the current location of each available driver is not exactly known: observed locations can be outdated by several seconds. The developed trajectory visualization tool enables transportation network companies to analyze dispatch processes and determine the causes of unexpected delays. As dispatching algorithms are based on the accuracy of arrival time predictions, we account for factors like noise, sample rate, technical and economic limitations as well as the duration of the entire process as they have an impact on the accuracy of spatio-temporal data. To improve dispatching strategies, we propose a prediction approach that provides a probability distribution for a driver’s future locations based on patterns observed in past trajectories. We demonstrate the capabilities of our prediction results to (i) avoid critical delays, (ii) estimate waiting times with higher confidence, and (iii) enable risk considerations in dispatching strategies.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 9 
KW  - trajectory data
KW  - location prediction algorithm
KW  - Peer-to-Peer ridesharing
KW  - transport network companies
KW  - risk-aware dispatching
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-524040
IS  - 9
ER  - 
TY  - GEN
A1  - Richly, Keven
T1  - A survey on trajectory data management for hybrid transactional and analytical workloads
T2  - IEEE International Conference on Big Data (Big Data)
N2  - Rapid advances in location-acquisition technologies have led to large amounts of trajectory data. This data is the foundation for a broad spectrum of services driven and improved by trajectory data mining. However, for hybrid transactional and analytical workloads, the storing and processing of rapidly accumulated trajectory data is a non-trivial task. In this paper, we present a detailed survey of state-of-the-art trajectory data management systems. To determine the relevant aspects and requirements for such systems, we developed a trajectory data mining framework, which summarizes the different steps in the trajectory data mining process. Based on the derived requirements, we analyze different concepts to store, compress, index, and process spatio-temporal data. There are various trajectory management systems, which are optimized for scalability, data footprint reduction, elasticity, or query performance. To get a comprehensive overview, we describe and compare different existing systems. Additionally, the observed similarities in the general structure of different systems are consolidated in a general blueprint of trajectory management systems.
KW  - Trajectory Data Management
KW  - Spatio-Temporal Data
KW  - Survey
Y1  - 2019
SN  - 978-1-5386-5035-6
U6  - https://doi.org/10.1109/BigData.2018.8622394
SN  - 2639-1589
SP  - 562
EP  - 569
PB  - IEEE
CY  - New York
ER  - 
TY  - GEN
A1  - Richly, Keven
T1  - Leveraging spatio-temporal soccer data to define a graphical query language for game recordings
T2  - IEEE International Conference on Big Data (Big Data)
N2  - For professional soccer clubs, performance and video analysis are an integral part of the preparation and post-processing of games. Coaches, scouts, and video analysts extract information about strengths and weaknesses of their team as well as opponents by manually analyzing video recordings of past games. Since video recordings are an unstructured data source, it is a complex and time-intensive task to find specific game situations and identify similar patterns. In this paper, we present a novel approach to detect patterns and situations (e.g., playmaking and ball passing of midfielders) based on trajectory data. The application uses the metaphor of a tactic board to offer a graphical query language. With this interactive tactic board, the user can model a game situation or mark a specific situation in the video recording for which all matching occurrences in various games are immediately displayed, and the user can directly jump to the corresponding game scene. Through the additional visualization of key performance indicators (e.g., the physical load of the players), the user can get a better overall assessment of situations. With the capabilities to find specific game situations and complex patterns in video recordings, the interactive tactic board serves as a useful tool to improve the video analysis process of professional sports teams.
KW  - Spatio-temporal data analysis
KW  - soccer analytics
KW  - graphical query language
Y1  - 2019
SN  - 978-1-5386-5035-6
U6  - https://doi.org/10.1109/BigData.2018.8622159
SN  - 2639-1589
SP  - 3456
EP  - 3463
PB  - IEEE
CY  - New York
ER  - 
TY  - JOUR
A1  - Rezaei, Mina
A1  - Yang, Haojin
A1  - Meinel, Christoph
T1  - Recurrent generative adversarial network for learning imbalanced medical image semantic segmentation
JF  - Multimedia tools and applications : an international journal
N2  - We propose a new recurrent generative adversarial architecture named RNN-GAN to mitigate the imbalanced data problem in medical image semantic segmentation, where the number of pixels belonging to the desired object is significantly lower than the number belonging to the background. A model trained with imbalanced data tends to be biased towards healthy data, which is not desired in clinical applications, and the outputs predicted by such networks have high precision and low recall. To mitigate the impact of imbalanced training data, we train RNN-GAN with the proposed complementary segmentation masks in addition to ordinary segmentation masks. The RNN-GAN consists of two components: a generator and a discriminator. The generator is trained on the sequence of medical images to learn the corresponding segmentation label map plus the proposed complementary label, both at the pixel level, while the discriminator is trained to distinguish whether a segmentation image comes from the ground truth or from the generator network. Both the generator and the discriminator use bidirectional LSTM units to enhance temporal consistency and to obtain inter- and intra-slice representations of the features. We show evidence that the proposed framework is applicable to different types of medical images of varied sizes. In our experiments on the ACDC-2017, HVSMR-2016, and LiTS-2017 benchmarks, we find consistently improved results, demonstrating the efficacy of our approach.
KW  - Imbalanced medical image semantic segmentation
KW  - Recurrent generative
KW  - adversarial network
Y1  - 2019
U6  - https://doi.org/10.1007/s11042-019-7305-1
SN  - 1380-7501
SN  - 1573-7721
VL  - 79
IS  - 21-22
SP  - 15329
EP  - 15348
PB  - Springer
CY  - Dordrecht
ER  - 
TY  - THES
A1  - Rezaei, Mina
T1  - Deep representation learning from imbalanced medical imaging
N2  - Medical imaging plays an important role in disease diagnosis, treatment planning, and clinical monitoring. One of the major challenges in medical image analysis is imbalanced training data, in which the class of interest is much rarer than the other classes. Canonical machine learning algorithms assume that the number of samples from different classes in the training dataset is roughly similar or balanced. Training a machine learning model on an imbalanced dataset can introduce unique challenges to the learning problem.

A model learned from imbalanced training data is biased towards the high-frequency samples. The predicted results of such networks have low sensitivity and high precision. In medical applications, the cost of misclassifying the minority class can exceed the cost of misclassifying the majority class. For example, the risk of not detecting a tumor could be much higher than that of referring a healthy subject to a doctor. This Ph.D. thesis introduces several deep learning-based approaches for handling class-imbalance problems when learning multiple tasks, such as disease classification and semantic segmentation.

At the data level, the objective is to balance the data distribution by re-sampling the data space: we propose novel approaches to correct the internal bias with respect to less frequent samples. These approaches include patient-wise batch sampling, complementary labels, and supervised and unsupervised minority oversampling, all using generative adversarial networks.

On the other hand, at the algorithm level, we modify the learning algorithm to alleviate the bias towards majority classes. In this regard, we propose different generative adversarial networks for cost-sensitive learning, ensemble learning, and mutual learning to deal with highly imbalanced imaging data.

We show evidence that the proposed approaches are applicable to different types of medical images of varied sizes in different routine clinical tasks, such as disease classification and semantic segmentation. Our implemented algorithms have shown outstanding results on different medical imaging challenges.
N2  - Medizinische Bildanalyse spielt eine wichtige Rolle bei der Diagnose von Krankheiten, der Behandlungsplanung und der klinischen Überwachung. Eines der großen Probleme in der medizinischen Bildanalyse ist das Vorhandensein von nicht ausbalancierten Trainingsdaten, bei denen die Anzahl der Datenpunkte der Zielklasse in der Unterzahl ist. Die Aussagen eines Modells, welches auf einem unbalancierten Datensatz trainiert wurde, tendieren dazu, Datenpunkte in die Klasse mit der Mehrzahl an Trainingsdaten einzuordnen. Die Aussagen eines solchen Modells haben eine geringe Sensitivität aber hohe Genauigkeit. Im medizinischen Anwendungsbereich kann die Einordnung eines Datenpunktes in eine falsche Klasse schwerwiegende Ergebnisse mit sich bringen. Die Nichterkennung eines Tumors birgt beispielsweise ein viel höheres Risiko für einen Patienten, als wenn ein gesunder Patient zum Arzt geschickt wird.

Das Problem des Lernens unter Nutzung von nicht ausbalancierten Trainingsdaten wird erst seit Kurzem bei der Klassifizierung von Krankheiten, der Entdeckung von Tumoren und bei der Segmentierung von Tumoren untersucht. In der Literatur wird hier zwischen zwei verschiedenen Ansätzen unterschieden: datenbasierte und algorithmische Ansätze. Die vorliegende Arbeit behandelt das Lernen unter Nutzung von unbalancierten medizinischen Bilddatensätzen mittels datenbasierter und algorithmischer Ansätze.

Bei den datenbasierten Ansätzen ist es unser Ziel, die Datenverteilung durch gezieltes Nutzen der vorliegenden Datenbasis auszubalancieren. Dazu schlagen wir neuartige Ansätze vor, um eine ausgeglichene Einordnung der Daten aus seltenen Klassen vornehmen zu können. Diese Ansätze sind unter anderem synthesize minority class sampling, patient-wise batch normalization und die Erstellung von komplementären Labels unter Nutzung von generative adversarial networks. Auf der Seite der algorithmischen Ansätze verändern wir den Trainingsalgorithmus, um die Tendenz in Richtung der Klasse mit der Mehrzahl an Trainingsdaten zu verringern. Dafür schlagen wir verschiedene Algorithmen im Bereich des kostensensitiven Lernens, Ensemble-Lernens und des gemeinsamen Lernens vor, um mit stark unbalancierten Trainingsdaten umgehen zu können.

Wir zeigen, dass unsere vorgeschlagenen Ansätze für verschiedenste Typen von medizinischen Bildern, mit variierender Größe, auf verschiedene Anwendungen im klinischen Alltag, z. B. Krankheitsklassifizierung, oder semantische Segmentierung, anwendbar sind. Weiterhin haben unsere Algorithmen hervorragende Ergebnisse bei unterschiedlichen Wettbewerben zur medizinischen Bildanalyse gezeigt.
KW  - machine learning
KW  - deep learning
KW  - computer vision
KW  - imbalanced learning
KW  - medical image analysis
KW  - Maschinenlernen
KW  - tiefes Lernen
KW  - unbalancierter Datensatz
KW  - Computervision
KW  - medizinische Bildanalyse
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-442759
ER  -