004 Datenverarbeitung; Informatik
Refine
Year of publication
- 2021 (78) (remove)
Document Type
- Article (57)
- Doctoral Thesis (7)
- Monograph/Edited Volume (6)
- Conference Proceeding (5)
- Postprint (3)
Is part of the Bibliography
- yes (78) (remove)
Keywords
- N-of-1 trial (2)
- SCED (2)
- app (2)
- blockchain (2)
- business process management (2)
- business processes (2)
- computer vision (2)
- cyber-physical systems (2)
- deferred choice (2)
- digital health (2)
Institute
- Hasso-Plattner-Institut für Digital Engineering GmbH (23)
- Hasso-Plattner-Institut für Digital Engineering gGmbH (19)
- Institut für Informatik und Computational Science (10)
- Fachgruppe Betriebswirtschaftslehre (7)
- Bürgerliches Recht (4)
- Department Linguistik (3)
- Extern (3)
- Institut für Biochemie und Biologie (3)
- Department Erziehungswissenschaft (2)
- Digital Engineering Fakultät (2)
- Institut für Physik und Astronomie (2)
- Wirtschaftswissenschaften (2)
- Department Sport- und Gesundheitswissenschaften (1)
- Institut für Mathematik (1)
- Institut für Umweltwissenschaften und Geographie (1)
- Potsdam Transfer - Zentrum für Gründung, Innovation, Wissens- und Technologietransfer (1)
- Öffentliches Recht (1)
Viper
(2021)
Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply drop-in replacing existing storage with PMem does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem's byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random-write speed of DRAM and efficient sequential-write performance PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4-18x for write workloads, while matching or surpassing their get performance.
Which event happened first?
(2021)
First come, first served: Critical choices between alternative actions are often made based on events external to an organization, and reacting promptly to their occurrence can be a major advantage over the competition. In Business Process Management (BPM), such deferred choices can be expressed in process models, and they are an important aspect of process engines. Blockchain-based process execution approaches are no exception to this, but are severely limited by the inherent properties of the platform: The isolated environment prevents direct access to external entities and data, and the non-continual runtime based entirely on atomic transactions impedes the monitoring and detection of events. In this paper we provide an in-depth examination of the semantics of deferred choice, and transfer them to environments such as the blockchain. We introduce and compare several oracle architectures able to satisfy certain requirements, and show that they can be implemented using state-of-the-art blockchain technology.
First come, first served: Critical choices between alternative actions are often made based on events external to an organization, and reacting promptly to their occurrence can be a major advantage over the competition. In Business Process Management (BPM), such deferred choices can be expressed in process models, and they are an important aspect of process engines. Blockchain-based process execution approaches are no exception to this, but are severely limited by the inherent properties of the platform: The isolated environment prevents direct access to external entities and data, and the non-continual runtime based entirely on atomic transactions impedes the monitoring and detection of events. In this paper we provide an in-depth examination of the semantics of deferred choice, and transfer them to environments such as the blockchain. We introduce and compare several oracle architectures able to satisfy certain requirements, and show that they can be implemented using state-of-the-art blockchain technology.
Background:
Research into the application of virtual reality technology in the health care sector has rapidly increased, resulting in a large body of research that is difficult to keep up with.
Objective:
We will provide an overview of the annual publication numbers in this field and the most productive and influential countries, journals, and authors, as well as the most used, most co-occurring, and most recent keywords.
Methods:
Based on a data set of 356 publications and 20,363 citations derived from Web of Science, we conducted a bibliometric analysis using BibExcel, HistCite, and VOSviewer.
Results:
The strongest growth in publications occurred in 2020, accounting for 29.49% of all publications so far. The most productive countries are the United States, the United Kingdom, and Spain; the most influential countries are the United States, Canada, and the United Kingdom. The most productive journals are the Journal of Medical Internet Research (JMIR), JMIR Serious Games, and the Games for Health Journal; the most influential journals are Patient Education and Counselling, Medical Education, and Quality of Life Research. The most productive authors are Riva, del Piccolo, and Schwebel; the most influential authors are Finset, del Piccolo, and Eide. The most frequently occurring keywords other than “virtual” and “reality” are “training,” “trial,” and “patients.” The most relevant research themes are communication, education, and novel treatments; the most recent research trends are fitness and exergames.
Conclusions:
The analysis shows that the field has left its infant state and its specialization is advancing, with a clear focus on patient usability.
Comprior
(2021)
Background
Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated amongst each other, leaving open questions regarding their effectiveness.
Results
We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance.
Conclusion
Comprior allows reproducible benchmarking especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness
Generative adversarial networks (GANs) have been broadly applied to a wide range of application domains since their proposal. In this thesis, we propose several methods that aim to tackle different existing problems in GANs. Particularly, even though GANs are generally able to generate high-quality samples, the diversity of the generated set is often sub-optimal. Moreover, the common increase of the number of models in the original GANs framework, as well as their architectural sizes, introduces additional costs. Additionally, even though challenging, the proper evaluation of a generated set is an important direction to ultimately improve the generation process in GANs. We start by introducing two diversification methods that extend the original GANs framework to multiple adversaries to stimulate sample diversity in a generated set. Then, we introduce a new post-training compression method based on Monte Carlo methods and importance sampling to quantize and prune the weights and activations of pre-trained neural networks without any additional training. The previous method may be used to reduce the memory and computational costs introduced by increasing the number of models in the original GANs framework. Moreover, we use a similar procedure to quantize and prune gradients during training, which also reduces the communication costs between different workers in a distributed training setting. We introduce several topology-based evaluation methods to assess data generation in different settings, namely image generation and language generation. Our methods retrieve both single-valued and double-valued metrics, which, given a real set, may be used to broadly assess a generated set or separately evaluate sample quality and sample diversity, respectively. Moreover, two of our metrics use locality-sensitive hashing to accurately assess the generated sets of highly compressed GANs. The analysis of the compression effects in GANs paves the way for their efficient employment in real-world applications. Given their general applicability, the methods proposed in this thesis may be extended beyond the context of GANs. Hence, they may be generally applied to enhance existing neural networks and, in particular, generative frameworks.
3D point clouds are a universal and discrete digital representation of three-dimensional objects and environments. For geospatial applications, 3D point clouds have become a fundamental type of raw data acquired and generated using various methods and techniques. In particular, 3D point clouds serve as raw data for creating digital twins of the built environment.
This thesis concentrates on the research and development of concepts, methods, and techniques for preprocessing, semantically enriching, analyzing, and visualizing 3D point clouds for applications around transport infrastructure. It introduces a collection of preprocessing techniques that aim to harmonize raw 3D point cloud data, such as point density reduction and scan profile detection. Metrics such as, e.g., local density, verticality, and planarity are calculated for later use. One of the key contributions tackles the problem of analyzing and deriving semantic information in 3D point clouds. Three different approaches are investigated: a geometric analysis, a machine learning approach operating on synthetically generated 2D images, and a machine learning approach operating on 3D point clouds without intermediate representation.
In the first application case, 2D image classification is applied and evaluated for mobile mapping data focusing on road networks to derive road marking vector data. The second application case investigates how 3D point clouds can be merged with ground-penetrating radar data for a combined visualization and to automatically identify atypical areas in the data. For example, the approach detects pavement regions with developing potholes. The third application case explores the combination of a 3D environment based on 3D point clouds with panoramic imagery to improve visual representation and the detection of 3D objects such as traffic signs.
The presented methods were implemented and tested based on software frameworks for 3D point clouds and 3D visualization. In particular, modules for metric computation, classification procedures, and visualization techniques were integrated into a modular pipeline-based C++ research framework for geospatial data processing, extended by Python machine learning scripts. All visualization and analysis techniques scale to large real-world datasets such as road networks of entire cities or railroad networks.
The thesis shows that some use cases allow taking advantage of established image vision methods to analyze images rendered from mobile mapping data efficiently. The two presented semantic classification methods working directly on 3D point clouds are use case independent and show similar overall accuracy when compared to each other. While the geometry-based method requires less computation time, the machine learning-based method supports arbitrary semantic classes but requires training the network with ground truth data. Both methods can be used in combination to gradually build this ground truth with manual corrections via a respective annotation tool.
This thesis contributes results for IT system engineering of applications, systems, and services that require spatial digital twins of transport infrastructure such as road networks and railroad networks based on 3D point clouds as raw data. It demonstrates the feasibility of fully automated data flows that map captured 3D point clouds to semantically classified models. This provides a key component for seamlessly integrated spatial digital twins in IT solutions that require up-to-date, object-based, and semantically enriched information about the built environment.
Die stetige Weiterentwicklung von VR-Systemen bietet neue Möglichkeiten der Interaktion mit virtuellen Objekten im dreidimensionalen Raum, stellt Entwickelnde von VRAnwendungen aber auch vor neue Herausforderungen. Selektions- und Manipulationstechniken müssen unter Berücksichtigung des Anwendungsszenarios, der Zielgruppe und der zur Verfügung stehenden Ein- und Ausgabegeräte ausgewählt werden. Diese Arbeit leistet einen Beitrag dazu, die Auswahl von passenden Interaktionstechniken zu unterstützen. Hierfür wurde eine repräsentative Menge von Selektions- und Manipulationstechniken untersucht und, unter Berücksichtigung existierender Klassifikationssysteme, eine Taxonomie entwickelt, die die Analyse der Techniken hinsichtlich interaktionsrelevanter Eigenschaften ermöglicht. Auf Basis dieser Taxonomie wurden Techniken ausgewählt, die in einer explorativen Studie verglichen wurden, um Rückschlüsse auf die Dimensionen der Taxonomie zu ziehen und neue Indizien für Vor- und Nachteile der Techniken in spezifischen Anwendungsszenarien zu generieren. Die Ergebnisse der Arbeit münden in eine Webanwendung, die Entwickelnde von VR-Anwendungen gezielt dabei unterstützt, passende Selektions- und Manipulationstechniken für ein Anwendungsszenario auszuwählen, indem Techniken auf Basis der Taxonomie gefiltert und unter Verwendung der Resultate aus der Studie sortiert werden können.
The formal modeling and analysis is of crucial importance for software development processes following the model based approach. We present the formalism of Interval Probabilistic Timed Graph Transformation Systems (IPTGTSs) as a high-level modeling language. This language supports structure dynamics (based on graph transformation), timed behavior (based on clocks, guards, resets, and invariants as in Timed Automata (TA)), and interval probabilistic behavior (based on Discrete Interval Probability Distributions). That is, for the probabilistic behavior, the modeler using IPTGTSs does not need to provide precise probabilities, which are often impossible to obtain, but rather provides a probability range instead from which a precise probability is chosen nondeterministically. In fact, this feature on capturing probabilistic behavior distinguishes IPTGTSs from Probabilistic Timed Graph Transformation Systems (PTGTSs) presented earlier.
Following earlier work on Interval Probabilistic Timed Automata (IPTA) and PTGTSs, we also provide an analysis tool chain for IPTGTSs based on inter-formalism transformations. In particular, we provide in our tool AutoGraph a translation of IPTGTSs to IPTA and rely on a mapping of IPTA to Probabilistic Timed Automata (PTA) to allow for the usage of the Prism model checker. The tool Prism can then be used to analyze the resulting PTA w.r.t. probabilistic real-time queries asking for worst-case and best-case probabilities to reach a certain set of target states in a given amount of time.
Learning analytics at scale
(2021)
Digital technologies are paving the way for innovative educational approaches. The learning format of Massive Open Online Courses (MOOCs) provides a highly accessible path to lifelong learning while being more affordable and flexible than face-to-face courses. Thereby, thousands of learners can enroll in courses mostly without admission restrictions, but this also raises challenges. Individual supervision by teachers is barely feasible, and learning persistence and success depend on students' self-regulatory skills. Here, technology provides the means for support. The use of data for decision-making is already transforming many fields, whereas in education, it is still a young research discipline. Learning Analytics (LA) is defined as the measurement, collection, analysis, and reporting of data about learners and their learning contexts with the purpose of understanding and improving learning and learning environments. The vast amount of data that MOOCs produce on the learning behavior and success of thousands of students provides the opportunity to study human learning and develop approaches addressing the demands of learners and teachers.
The overall purpose of this dissertation is to investigate the implementation of LA at the scale of MOOCs and to explore how data-driven technology can support learning and teaching in this context. To this end, several research prototypes have been iteratively developed for the HPI MOOC Platform. Hence, they were tested and evaluated in an authentic real-world learning environment. Most of the results can be applied on a conceptual level to other MOOC platforms as well. The research contribution of this thesis thus provides practical insights beyond what is theoretically possible. In total, four system components were developed and extended:
(1) The Learning Analytics Architecture: A technical infrastructure to collect, process, and analyze event-driven learning data based on schema-agnostic pipelining in a service-oriented MOOC platform. (2) The Learning Analytics Dashboard for Learners: A tool for data-driven support of self-regulated learning, in particular to enable learners to evaluate and plan their learning activities, progress, and success by themselves. (3) Personalized Learning Objectives: A set of features to better connect learners' success to their personal intentions based on selected learning objectives to offer guidance and align the provided data-driven insights about their learning progress. (4) The Learning Analytics Dashboard for Teachers: A tool supporting teachers with data-driven insights to enable the monitoring of their courses with thousands of learners, identify potential issues, and take informed action.
For all aspects examined in this dissertation, related research is presented, development processes and implementation concepts are explained, and evaluations are conducted in case studies. Among other findings, the usage of the learner dashboard in combination with personalized learning objectives demonstrated improved certification rates of 11.62% to 12.63%. Furthermore, it was observed that the teacher dashboard is a key tool and an integral part for teaching in MOOCs. In addition to the results and contributions, general limitations of the work are discussed—which altogether provide a solid foundation for practical implications and future research.