000 Informatik, Informationswissenschaft, allgemeine Werke
Refine
Year of publication
Document Type
- Article (203)
- Other (117)
- Doctoral Thesis (87)
- Part of a Book (11)
- Conference Proceeding (10)
- Postprint (10)
- Monograph/Edited Volume (5)
- Master's Thesis (4)
- Habilitation Thesis (1)
- Part of Periodical (1)
Keywords
- MOOC (39)
- e-learning (35)
- Digitale Bildung (34)
- Kursdesign (33)
- Micro Degree (33)
- Online-Lehre (33)
- Onlinekurs (33)
- Onlinekurs-Produktion (33)
- digital education (33)
- micro degree (33)
Institute
- Hasso-Plattner-Institut für Digital Engineering GmbH (224)
- Institut für Informatik und Computational Science (92)
- Hasso-Plattner-Institut für Digital Engineering gGmbH (78)
- Wirtschaftswissenschaften (23)
- Institut für Physik und Astronomie (8)
- Universitätsbibliothek (5)
- Fachgruppe Betriebswirtschaftslehre (3)
- Institut für Mathematik (3)
- Department Linguistik (2)
- Fachgruppe Volkswirtschaftslehre (2)
Data stream processing systems (DSPSs) are a key enabler to integrate continuously generated data, such as sensor measurements, into enterprise applications. DSPSs allow to steadily analyze information from data streams, e.g., to monitor manufacturing processes and enable fast reactions to anomalous behavior. Moreover, DSPSs continuously filter, sample, and aggregate incoming streams of data, which reduces the data size, and thus data storage costs.
The growing volumes of generated data have increased the demand for high-performance DSPSs, leading to a higher interest in these systems and to the development of new DSPSs. While having more DSPSs is favorable for users as it allows choosing the system that satisfies their requirements the most, it also introduces the challenge of identifying the most suitable DSPS regarding current needs as well as future demands. Having a solution to this challenge is important because replacements of DSPSs require the costly re-writing of applications if no abstraction layer is used for application development. However, quantifying performance differences between DSPSs is a difficult task. Existing benchmarks fail to integrate all core functionalities of DSPSs and lack tool support, which hinders objective result comparisons. Moreover, no current benchmark covers the combination of streaming data with existing structured business data, which is particularly relevant for companies.
This thesis proposes a performance benchmark for enterprise stream processing called ESPBench. With enterprise stream processing, we refer to the combination of streaming and structured business data. Our benchmark design represents real-world scenarios and allows for an objective result comparison as well as scaling of data. The defined benchmark query set covers all core functionalities of DSPSs. The benchmark toolkit automates the entire benchmark process and provides important features, such as query result validation and a configurable data ingestion rate.
To validate ESPBench and to ease the use of the benchmark, we propose an example implementation of the ESPBench queries leveraging the Apache Beam software development kit (SDK). The Apache Beam SDK is an abstraction layer designed for developing stream processing applications that is applied in academia as well as enterprise contexts. It allows to run the defined applications on any of the supported DSPSs. The performance impact of Apache Beam is studied in this dissertation as well. The results show that there is a significant influence that differs among DSPSs and stream processing applications. For validating ESPBench, we use the example implementation of the ESPBench queries developed using the Apache Beam SDK. We benchmark the implemented queries executed on three modern DSPSs: Apache Flink, Apache Spark Streaming, and Hazelcast Jet. The results of the study prove the functioning of ESPBench and its toolkit. ESPBench is capable of quantifying performance characteristics of DSPSs and of unveiling differences among systems.
The benchmark proposed in this thesis covers all requirements to be applied in enterprise stream processing settings, and thus represents an improvement over the current state-of-the-art.
We analyze the problem of response suggestion in a closed domain along a real-world scenario of a digital library. We present a text-processing pipeline to generate question-answer pairs from chat transcripts. On this limited amount of training data, we compare retrieval-based, conditioned-generation, and dedicated representation learning approaches for response suggestion. Our results show that retrieval-based methods that strive to find similar, known contexts are preferable over parametric approaches from the conditioned-generation family, when the training data is limited. We, however, identify a specific representation learning approach that is competitive to the retrieval-based approaches despite the training data limitation.
A catalog of genetic loci associated with kidney function from analyses of a million individuals
(2019)
Chronic kidney disease (CKD) is responsible for a public health burden with multi-systemic complications. Through transancestry meta-analysis of genome-wide association studies of estimated glomerular filtration rate (eGFR) and independent replication (n = 1,046,070), we identified 264 associated loci (166 new). Of these,147 were likely to be relevant for kidney function on the basis of associations with the alternative kidney function marker blood urea nitrogen (n = 416,178). Pathway and enrichment analyses, including mouse models with renal phenotypes, support the kidney as the main target organ. A genetic risk score for lower eGFR was associated with clinically diagnosed CKD in 452,264 independent individuals. Colocalization analyses of associations with eGFR among 783,978 European-ancestry individuals and gene expression across 46 human tissues, including tubulo-interstitial and glomerular kidney compartments, identified 17 genes differentially expressed in kidney. Fine-mapping highlighted missense driver variants in 11 genes and kidney-specific regulatory variants. These results provide a comprehensive priority list of molecular targets for translational research.
Increasing demand for analytical processing capabilities can be managed by replication approaches. However, to evenly balance the replicas' workload shares while at the same time minimizing the data replication factor is a highly challenging allocation problem. As optimal solutions are only applicable for small problem instances, effective heuristics are indispensable. In this paper, we test and compare state-of-the-art allocation algorithms for partial replication. By visualizing and exploring their (heuristic) solutions for different benchmark workloads, we are able to derive structural insights and to detect an algorithm's strengths as well as its potential for improvement. Further, our application enables end-to-end evaluations of different allocations to verify their theoretical performance.
Microservice Architectures (MSA) structure applications as a collection of loosely coupled services that implement business capabilities. The key advantages of MSA include inherent support for continuous deployment of large complex applications, agility and enhanced productivity. However, studies indicate that most MSA are homogeneous, and introduce shared vulnerabilites, thus vulnerable to multi-step attacks, which are economics-of-scale incentives to attackers. In this paper, we address the issue of shared vulnerabilities in microservices with a novel solution based on the concept of Moving Target Defenses (MTD). Our mechanism works by performing risk analysis against microservices to detect and prioritize vulnerabilities. Thereafter, security risk-oriented software diversification is employed, guided by a defined diversification index. The diversification is performed at runtime, leveraging both model and template based automatic code generation techniques to automatically transform programming languages and container images of the microservices. Consequently, the microservices attack surfaces are altered thereby introducing uncertainty for attackers while reducing the attackability of the microservices. Our experiments demonstrate the efficiency of our solution, with an average success rate of over 70% attack surface randomization.
With the emergence of the Internet of things (IoT), plenty of battery-powered and energy-harvesting devices are being deployed to fulfill sensing and actuation tasks in a variety of application areas, such as smart homes, precision agriculture, smart cities, and industrial automation. In this context, a critical issue is that of denial-of-sleep attacks. Such attacks temporarily or permanently deprive battery-powered, energy-harvesting, or otherwise energy-constrained devices of entering energy-saving sleep modes, thereby draining their charge. At the very least, a successful denial-of-sleep attack causes a long outage of the victim device. Moreover, to put battery-powered devices back into operation, their batteries have to be replaced. This is tedious and may even be infeasible, e.g., if a battery-powered device is deployed at an inaccessible location. While the research community came up with numerous defenses against denial-of-sleep attacks, most present-day IoT protocols include no denial-of-sleep defenses at all, presumably due to a lack of awareness and unsolved integration problems. After all, despite there are many denial-of-sleep defenses, effective defenses against certain kinds of denial-of-sleep attacks are yet to be found.
The overall contribution of this dissertation is to propose a denial-of-sleep-resilient medium access control (MAC) layer for IoT devices that communicate over IEEE 802.15.4 links. Internally, our MAC layer comprises two main components. The first main component is a denial-of-sleep-resilient protocol for establishing session keys among neighboring IEEE 802.15.4 nodes. The established session keys serve the dual purpose of implementing (i) basic wireless security and (ii) complementary denial-of-sleep defenses that belong to the second main component. The second main component is a denial-of-sleep-resilient MAC protocol. Notably, this MAC protocol not only incorporates novel denial-of-sleep defenses, but also state-of-the-art mechanisms for achieving low energy consumption, high throughput, and high delivery ratios. Altogether, our MAC layer resists, or at least greatly mitigates, all denial-of-sleep attacks against it we are aware of. Furthermore, our MAC layer is self-contained and thus can act as a drop-in replacement for IEEE 802.15.4-compliant MAC layers. In fact, we implemented our MAC layer in the Contiki-NG operating system, where it seamlessly integrates into an existing protocol stack.
Nowadays, graph data models are employed, when relationships between entities have to be stored and are in the scope of queries. For each entity, this graph data model locally stores relationships to adjacent entities. Users employ graph queries to query and modify these entities and relationships. These graph queries employ graph patterns to lookup all subgraphs in the graph data that satisfy certain graph structures. These subgraphs are called graph pattern matches. However, this graph pattern matching is NP-complete for subgraph isomorphism. Thus, graph queries can suffer a long response time, when the number of entities and relationships in the graph data or the graph patterns increases.
One possibility to improve the graph query performance is to employ graph views that keep ready graph pattern matches for complex graph queries for later retrieval. However, these graph views must be maintained by means of an incremental graph pattern matching to keep them consistent with the graph data from which they are derived, when the graph data changes. This maintenance adds subgraphs that satisfy a graph pattern to the graph views and removes subgraphs that do not satisfy a graph pattern anymore from the graph views.
Current approaches for incremental graph pattern matching employ Rete networks. Rete networks are discrimination networks that enumerate and maintain all graph pattern matches of certain graph queries by employing a network of condition tests, which implement partial graph patterns that together constitute the overall graph query. Each condition test stores all subgraphs that satisfy the partial graph pattern. Thus, Rete networks suffer high memory consumptions, because they store a large number of partial graph pattern matches. But, especially these partial graph pattern matches enable Rete networks to update the stored graph pattern matches efficiently, because the network maintenance exploits the already stored partial graph pattern matches to find new graph pattern matches. However, other kinds of discrimination networks exist that can perform better in time and space than Rete networks. Currently, these other kinds of networks are not used for incremental graph pattern matching.
This thesis employs generalized discrimination networks for incremental graph pattern matching. These discrimination networks permit a generalized network structure of condition tests to enable users to steer the trade-off between memory consumption and execution time for the incremental graph pattern matching. For that purpose, this thesis contributes a modeling language for the effective definition of generalized discrimination networks. Furthermore, this thesis contributes an efficient and scalable incremental maintenance algorithm, which updates the (partial) graph pattern matches that are stored by each condition test. Moreover, this thesis provides a modeling evaluation, which shows that the proposed modeling language enables the effective modeling of generalized discrimination networks. Furthermore, this thesis provides a performance evaluation, which shows that a) the incremental maintenance algorithm scales, when the graph data becomes large, and b) the generalized discrimination network structures can outperform Rete network structures in time and space at the same time for incremental graph pattern matching.
Mobile devices and associated applications (apps) are an indispensable part of daily life and provide access to important information anytime and anywhere. However, the availability of university-wide services in the mobile sector is still poor. If they exist they usually result from individual activities of students and teachers. Mobile applications can have an essential impact on the improvement of students’ self-organization as well as on the design and enhancement of specific learning scenarios, though. This article introduces a mobile campus app framework, which integrates central campus services and decentralized learning applications. An analysis of strengths and weaknesses of different approaches is presented to summarize and evaluate them in terms of requirements, development, maintenance and operation. The article discusses the underlying service-oriented architecture that allows transferring the campus app to other universities or institutions at reasonable cost. It concludes with a presentation of the results as well as ongoing discussions and future work
A Landscape for Case Models
(2019)
Case Management is a paradigm to support knowledge-intensive processes. The different approaches developed for modeling these types of processes tend to result in scattered models due to the low abstraction level at which the inherently complex processes are therein represented. Thus, readability and understandability is more challenging than that of traditional process models. By reviewing existing proposals in the field of process overviews and case models, this paper extends a case modeling language - the fragment-based Case Management (fCM) language - with the goal of modeling knowledge-intensive processes from a higher abstraction level - to generate a so-called fCM landscape. This proposal is empirically evaluated via an online experiment. Results indicate that interpreting an fCM landscape might be more effective and efficient than interpreting an informationally equivalent case model.