Data stream processing systems (DSPSs) are a key enabler for integrating continuously generated data, such as sensor measurements, into enterprise applications. DSPSs make it possible to continuously analyze information from data streams, e.g., to monitor manufacturing processes and enable fast reactions to anomalous behavior. Moreover, DSPSs continuously filter, sample, and aggregate incoming streams of data, which reduces the data size, and thus data storage costs.
The growing volumes of generated data have increased the demand for high-performance DSPSs, leading to a higher interest in these systems and to the development of new DSPSs. While having more DSPSs is favorable for users as it allows choosing the system that satisfies their requirements the most, it also introduces the challenge of identifying the most suitable DSPS regarding current needs as well as future demands. Having a solution to this challenge is important because replacements of DSPSs require the costly re-writing of applications if no abstraction layer is used for application development. However, quantifying performance differences between DSPSs is a difficult task. Existing benchmarks fail to integrate all core functionalities of DSPSs and lack tool support, which hinders objective result comparisons. Moreover, no current benchmark covers the combination of streaming data with existing structured business data, which is particularly relevant for companies.
This thesis proposes a performance benchmark for enterprise stream processing called ESPBench. With enterprise stream processing, we refer to the combination of streaming and structured business data. Our benchmark design represents real-world scenarios and allows for an objective result comparison as well as scaling of data. The defined benchmark query set covers all core functionalities of DSPSs. The benchmark toolkit automates the entire benchmark process and provides important features, such as query result validation and a configurable data ingestion rate.
To validate ESPBench and to ease the use of the benchmark, we propose an example implementation of the ESPBench queries leveraging the Apache Beam software development kit (SDK). The Apache Beam SDK is an abstraction layer designed for developing stream processing applications that is applied in academia as well as enterprise contexts. It allows the defined applications to run on any of the supported DSPSs. The performance impact of Apache Beam is studied in this dissertation as well. The results show that there is a significant influence that differs among DSPSs and stream processing applications. For validating ESPBench, we use the example implementation of the ESPBench queries developed using the Apache Beam SDK. We benchmark the implemented queries executed on three modern DSPSs: Apache Flink, Apache Spark Streaming, and Hazelcast Jet. The results of the study confirm that ESPBench and its toolkit function as intended. ESPBench is capable of quantifying performance characteristics of DSPSs and of unveiling differences among systems.
The benchmark proposed in this thesis covers all requirements to be applied in enterprise stream processing settings, and thus represents an improvement over the current state-of-the-art.
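The filter-and-aggregate step that DSPSs apply to incoming streams can be illustrated with a minimal sketch. It is not part of ESPBench or of any DSPS API; the `Reading` type, the function name, and the threshold parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Reading(NamedTuple):
    """Hypothetical sensor reading; field names are assumptions."""
    timestamp: float  # seconds since the start of the stream
    sensor_id: str
    value: float


def tumbling_window_mean(
    readings: Iterable[Reading], window_s: float, min_value: float
) -> Dict[Tuple[int, str], float]:
    """Drop readings below min_value, then average per (window, sensor)."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for r in readings:
        if r.value < min_value:  # filter step
            continue
        key = (int(r.timestamp // window_s), r.sensor_id)  # window index
        sums[key][0] += r.value
        sums[key][1] += 1
    # aggregate step: one mean per window and sensor instead of raw readings
    return {key: total / count for key, (total, count) in sums.items()}


stream = [
    Reading(0.5, "a", 10.0),
    Reading(0.7, "a", -1.0),  # removed by the filter
    Reading(1.5, "a", 20.0),
    Reading(2.5, "a", 30.0),
]
result = tumbling_window_mean(stream, window_s=2.0, min_value=0.0)
```

Four raw readings are reduced to two aggregated values here, which is the kind of size reduction described above.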
We analyze the problem of response suggestion in a closed domain along a real-world scenario of a digital library. We present a text-processing pipeline to generate question-answer pairs from chat transcripts. On this limited amount of training data, we compare retrieval-based, conditioned-generation, and dedicated representation learning approaches for response suggestion. Our results show that retrieval-based methods that strive to find similar, known contexts are preferable over parametric approaches from the conditioned-generation family, when the training data is limited. We, however, identify a specific representation learning approach that is competitive to the retrieval-based approaches despite the training data limitation.
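A retrieval-based suggester of the kind described above can be sketched as a nearest-neighbor lookup over known question-answer pairs. The sketch below uses Jaccard token overlap as a stand-in similarity measure; this is an assumption for illustration and not necessarily the measure used in the paper, and the example pairs are invented.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (deliberately simplistic)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens common to both sets; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_response(query: str, qa_pairs: List[Tuple[str, str]]) -> str:
    """Return the answer whose known question is most similar to the query."""
    query_tokens = tokenize(query)
    best = max(qa_pairs, key=lambda pair: jaccard(query_tokens, tokenize(pair[0])))
    return best[1]


# Invented example pairs, standing in for pairs mined from chat transcripts.
pairs = [
    ("how do i renew a book", "Use the renew button in your account."),
    ("what are the opening hours", "We are open 9 to 5."),
]
answer = suggest_response("can i renew my book", pairs)
```

Because the lookup only needs stored pairs and a similarity function, it has no parameters to fit, which is one reason such methods can hold up when training data is scarce.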
A catalog of genetic loci associated with kidney function from analyses of a million individuals
(2019)
Chronic kidney disease (CKD) is responsible for a public health burden with multi-systemic complications. Through transancestry meta-analysis of genome-wide association studies of estimated glomerular filtration rate (eGFR) and independent replication (n = 1,046,070), we identified 264 associated loci (166 new). Of these, 147 were likely to be relevant for kidney function on the basis of associations with the alternative kidney function marker blood urea nitrogen (n = 416,178). Pathway and enrichment analyses, including mouse models with renal phenotypes, support the kidney as the main target organ. A genetic risk score for lower eGFR was associated with clinically diagnosed CKD in 452,264 independent individuals. Colocalization analyses of associations with eGFR among 783,978 European-ancestry individuals and gene expression across 46 human tissues, including tubulo-interstitial and glomerular kidney compartments, identified 17 genes differentially expressed in kidney. Fine-mapping highlighted missense driver variants in 11 genes and kidney-specific regulatory variants. These results provide a comprehensive priority list of molecular targets for translational research.
Increasing demand for analytical processing capabilities can be managed by replication approaches. However, to evenly balance the replicas' workload shares while at the same time minimizing the data replication factor is a highly challenging allocation problem. As optimal solutions are only applicable for small problem instances, effective heuristics are indispensable. In this paper, we test and compare state-of-the-art allocation algorithms for partial replication. By visualizing and exploring their (heuristic) solutions for different benchmark workloads, we are able to derive structural insights and to detect an algorithm's strengths as well as its potential for improvement. Further, our application enables end-to-end evaluations of different allocations to verify their theoretical performance.
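The trade-off between balanced workload shares and a low replication factor can be made concrete with a deliberately simple greedy heuristic. This is an illustrative sketch only, not one of the algorithms compared in the paper; the query names, loads, and fragment sets are invented.

```python
from typing import Dict, List, Set, Tuple


def greedy_allocate(
    queries: Dict[str, Tuple[float, Set[str]]], num_replicas: int
) -> Tuple[Dict[str, int], List[float], List[Set[str]]]:
    """Assign each query to one replica.

    queries maps a query name to (load, fragments it reads). Heaviest queries
    are placed first; each goes to the replica with the lowest resulting load,
    with ties broken by how few new fragments must be copied there.
    """
    loads = [0.0] * num_replicas
    stored: List[Set[str]] = [set() for _ in range(num_replicas)]
    assignment: Dict[str, int] = {}
    for name, (load, frags) in sorted(queries.items(), key=lambda kv: -kv[1][0]):
        target = min(
            range(num_replicas),
            key=lambda i: (loads[i] + load, len(frags - stored[i])),
        )
        loads[target] += load
        stored[target] |= frags  # fragments the replica must now hold
        assignment[name] = target
    return assignment, loads, stored


# Invented workload: four queries over fragments A, B, and C.
workload = {
    "q1": (4.0, {"A", "B"}),
    "q2": (3.0, {"B", "C"}),
    "q3": (2.0, {"C"}),
    "q4": (1.0, {"A"}),
}
assignment, loads, stored = greedy_allocate(workload, num_replicas=2)
# Stored fragment copies divided by distinct fragments in the workload.
replication_factor = sum(len(s) for s in stored) / len(set().union(*stored))
```

In this toy workload both replicas end up with equal load while only fragment B is stored twice. A load-first rule like this one can replicate far more data on less favorable workloads, which is where more careful heuristics matter.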
Microservice Architectures (MSA) structure applications as a collection of loosely coupled services that implement business capabilities. The key advantages of MSA include inherent support for continuous deployment of large complex applications, agility and enhanced productivity. However, studies indicate that most MSA are homogeneous and introduce shared vulnerabilities, and are thus vulnerable to multi-step attacks, which offer economies-of-scale incentives to attackers. In this paper, we address the issue of shared vulnerabilities in microservices with a novel solution based on the concept of Moving Target Defenses (MTD). Our mechanism works by performing risk analysis against microservices to detect and prioritize vulnerabilities. Thereafter, security risk-oriented software diversification is employed, guided by a defined diversification index. The diversification is performed at runtime, leveraging both model- and template-based automatic code generation techniques to automatically transform programming languages and container images of the microservices. Consequently, the microservices' attack surfaces are altered, thereby introducing uncertainty for attackers while reducing the attackability of the microservices. Our experiments demonstrate the efficiency of our solution, with an average success rate of over 70% attack surface randomization.
With the emergence of the Internet of things (IoT), plenty of battery-powered and energy-harvesting devices are being deployed to fulfill sensing and actuation tasks in a variety of application areas, such as smart homes, precision agriculture, smart cities, and industrial automation. In this context, a critical issue is that of denial-of-sleep attacks. Such attacks temporarily or permanently prevent battery-powered, energy-harvesting, or otherwise energy-constrained devices from entering energy-saving sleep modes, thereby draining their charge. At the very least, a successful denial-of-sleep attack causes a long outage of the victim device. Moreover, to put battery-powered devices back into operation, their batteries have to be replaced. This is tedious and may even be infeasible, e.g., if a battery-powered device is deployed at an inaccessible location. While the research community came up with numerous defenses against denial-of-sleep attacks, most present-day IoT protocols include no denial-of-sleep defenses at all, presumably due to a lack of awareness and unsolved integration problems. Moreover, although many denial-of-sleep defenses exist, effective defenses against certain kinds of denial-of-sleep attacks are yet to be found.
The overall contribution of this dissertation is to propose a denial-of-sleep-resilient medium access control (MAC) layer for IoT devices that communicate over IEEE 802.15.4 links. Internally, our MAC layer comprises two main components. The first main component is a denial-of-sleep-resilient protocol for establishing session keys among neighboring IEEE 802.15.4 nodes. The established session keys serve the dual purpose of implementing (i) basic wireless security and (ii) complementary denial-of-sleep defenses that belong to the second main component. The second main component is a denial-of-sleep-resilient MAC protocol. Notably, this MAC protocol not only incorporates novel denial-of-sleep defenses, but also state-of-the-art mechanisms for achieving low energy consumption, high throughput, and high delivery ratios. Altogether, our MAC layer resists, or at least greatly mitigates, all denial-of-sleep attacks against it we are aware of. Furthermore, our MAC layer is self-contained and thus can act as a drop-in replacement for IEEE 802.15.4-compliant MAC layers. In fact, we implemented our MAC layer in the Contiki-NG operating system, where it seamlessly integrates into an existing protocol stack.
As resources are valuable assets, organizations have to decide which resources to allocate to business process tasks in a way that the process is executed not only effectively but also efficiently. Traditional role-based resource allocation leads to effective process executions, since each task is performed by a resource that has the required skills and competencies to do so. However, the resulting allocations are typically not as efficient as they could be, since optimization techniques have yet to find their way into traditional business process management scenarios. On the other hand, operations research provides a rich set of analytical methods for supporting problem-specific decisions on resource allocation. This paper provides a novel framework for creating transparency on existing tasks and resources, supporting individualized allocations for each activity in a process, and the possibility to integrate problem-specific analytical methods of the operations research domain. To validate the framework, the paper reports on the design and prototypical implementation of a software architecture, which extends a traditional process engine with a dedicated resource management component. This component allows us to define specific resource allocation problems at design time, and it also facilitates optimized resource allocation at run time. The framework is evaluated using a real-world parcel delivery process. The evaluation shows that the quality of the allocation results increases significantly with a technique from operations research in contrast to the traditionally applied rule-based approach.
A Landscape for Case Models
(2019)
Case Management is a paradigm to support knowledge-intensive processes. The different approaches developed for modeling these types of processes tend to result in scattered models due to the low abstraction level at which the inherently complex processes are therein represented. Thus, readability and understandability are more challenging than those of traditional process models. By reviewing existing proposals in the field of process overviews and case models, this paper extends a case modeling language - the fragment-based Case Management (fCM) language - with the goal of modeling knowledge-intensive processes from a higher abstraction level - to generate a so-called fCM landscape. This proposal is empirically evaluated via an online experiment. Results indicate that interpreting an fCM landscape might be more effective and efficient than interpreting an informationally equivalent case model.
Graph repair, restoring consistency of a graph, plays a prominent role in several areas of computer science and beyond: For example, in model-driven engineering, the abstract syntax of models is usually encoded using graphs. Flexible edit operations temporarily create inconsistent graphs not representing a valid model, thus requiring graph repair. Similarly, in graph databases, which manage the storage and manipulation of graph data, updates may cause a given database to violate some integrity constraints, which also requires graph repair. We present a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing repairs. In our context, we formalize consistency by so-called graph conditions being equivalent to first-order logic on graphs. We present two kinds of repair algorithms: State-based repair restores consistency independent of the graph update history, whereas delta-based (or incremental) repair takes this history explicitly into account. Technically, our algorithms rely on an existing model generation algorithm for graph conditions implemented in AutoGraph. Moreover, the delta-based approach uses the new concept of satisfaction (ST) trees for encoding if and how a graph satisfies a graph condition. We then demonstrate how to manipulate these STs incrementally with respect to a graph update.
Industry 4.0 is transforming how businesses innovate and, as a result, companies are spearheading the movement towards 'Digital Transformation'. While some scholars advocate the use of design thinking to identify new innovative behaviours, cognition experts emphasise the importance of top managers in supporting employees to develop these behaviours. However, there is a dearth of research in this domain and companies are struggling to implement the required behaviours. To address this gap, this study aims to identify and prioritise behavioural strategies conducive to design thinking to inform the creation of a managerial mental model. We identify 20 behavioural strategies from 45 interviews with practitioners and educators and combine them with the concepts of 'paradigm-mindset-mental model' from cognition theory. The paper contributes to the body of knowledge by identifying and prioritising specific behavioural strategies to form a novel set of survival conditions aligned to the new industrial paradigm of Industry 4.0.
The MOOChub is a joint web-based catalog of all relevant German and Austrian MOOC platforms that lists well over 750 Massive Open Online Courses (MOOCs). Automatically building such a catalog requires that all partners describe and publicly offer the metadata of their courses in the same way. The paper at hand presents the genesis of the idea to establish a common metadata standard and the story of its subsequent development. The result of this effort is, first, an open-licensed de facto standard, which is based on existing commonly used standards, and second, a first prototypical platform that is using this standard: the MOOChub, which lists all courses of the involved partners. This catalog is searchable and provides a more comprehensive overview of basically all MOOCs that are offered by German and Austrian MOOC platforms. Finally, the upcoming developments to further optimize the catalog and the metadata standard are reported.
In an attempt to pave the way for more extensive Computer Science Education (CSE) coverage in K-12, this research developed and made a preliminary evaluation of a blended-learning Introduction to CS program based on an academic MOOC. Using an academic MOOC that is pedagogically effective and engaging, such a program may provide teachers with disciplinary scaffolds and allow them to focus their attention on enhancing students' learning experience and nurturing critical 21st-century skills such as self-regulated learning. As we demonstrate, this enabled us to introduce an academic-level course to middle-school students. In this research, we developed the principles and initial version of such a program, targeting ninth-graders in science-track classes who learn CS as part of their standard curriculum. We found that the middle-schoolers who participated in the program achieved academic results on par with undergraduate students taking this MOOC for academic credit. Participating students also developed a more accurate perception of the essence of CS as a scientific discipline. The unplanned school closure due to the COVID-19 pandemic outbreak challenged the research but underlined the advantages of such a MOOC-based blended learning program over classic pedagogy in times of global or local crises that lead to school closure. While most of the science-track classes seemed to stop learning CS almost entirely, and the end-of-year MoE exam was discarded, the program's classes smoothly moved to remote learning mode, and students continued to study at a pace similar to that experienced before the school shut down.
Advanced mechatronic systems have to integrate existing technologies from mechanical, electrical and software engineering. They must be able to adapt their structure and behavior at runtime by reconfiguration to react flexibly to changes in the environment. Therefore, a tight integration of structural and behavioral models of the different domains is required. This integration results in complex reconfigurable hybrid systems, the execution logic of which cannot be addressed directly with existing standard modeling, simulation, and code-generation techniques. We present in this paper how our component-based approach for reconfigurable mechatronic systems, MechatronicUML, efficiently handles the complex interplay of discrete behavior and continuous behavior in a modular manner. In addition, its extension to even more flexible reconfiguration cases is presented.
Graphs play an important role in many areas of Computer Science. In particular, our work is motivated by model-driven software development and by graph databases. For this reason, it is very important to have the means to express and to reason about the properties that a given graph may satisfy. With this aim, in this paper we present a visual logic that allows us to describe graph properties, including navigational properties, i.e., properties about the paths in a graph. The logic is equipped with a deductive tableau method that we have proved to be sound and complete.
Resource-constrained smart micro-grid architectures describe a class of smart micro-grid architectures that handle communications operations over a lossy network and depend on a distributed collection of power generation and storage units. Disadvantaged communities with no or intermittent access to national power networks can benefit from such a micro-grid model by using low-cost communication devices to coordinate the power generation, consumption, and storage. Furthermore, this solution is both cost-effective and environmentally friendly. One model for such micro-grids is for users to agree to coordinate a power sharing scheme in which individual generator owners sell excess unused power to users wanting access to power. Since the micro-grid relies on distributed renewable energy generation sources which are variable and only partly predictable, coordinating micro-grid operations with distributed algorithms is a necessity for grid stability. Grid stability is crucial in retaining user trust in the dependability of the micro-grid, and user participation in the power sharing scheme, because user withdrawals can cause the grid to break down, which is undesirable. In this chapter, we present a distributed architecture for fair power distribution and billing on micro-grids. The architecture is designed to operate efficiently over a lossy communication network, which is an advantage for disadvantaged communities. We build on the architecture to discuss grid coordination, notably how tasks such as metering, power resource allocation, forecasting, and scheduling can be handled. All four tasks are managed by a feedback control loop that monitors the performance and behaviour of the micro-grid, and based on historical data makes decisions to ensure the smooth operation of the grid. Finally, since lossy networks are undependable, differentiating system failures from adversarial manipulations is an important consideration for grid stability.
We therefore provide a characterisation of potential adversarial models and discuss possible mitigation measures.
The integration of MOOCs into the Moroccan Higher Education (MHE) took place in 2013 by developing different partnerships and projects at national and international levels. As elsewhere, the Covid-19 crisis has played an important role in accelerating distance education in MHE. However, based on our experience as both university professors and specialists in educational engineering, the effective execution of the digital transition has not yet been implemented. Thus, in this article, we present a retrospective feedback of MOOCs in Morocco, focusing on the policies taken by the government to better support the digital transition in general and MOOCs in particular. We are therefore seeking to establish an optimal scenario for the promotion of MOOCs, which emphasizes the policies to be considered, and which recalls the importance of conducting a delicate articulation taking into account four levels, namely environmental, institutional, organizational and individual. We conclude with recommendations that are inspired by the Moroccan academic context and that focus on the major role that MOOCs play for university students and on maintaining lifelong learning.
3D point cloud technology facilitates the automated and highly detailed digital acquisition of real-world environments such as assets, sites, cities, and countries; the acquired 3D point clouds represent an essential category of geodata used in a variety of geoinformation applications and systems. In this paper, we present a web-based system for the interactive and collaborative exploration and inspection of arbitrarily large 3D point clouds. Our approach is based on standard WebGL on the client side and is able to render 3D point clouds with billions of points. It uses spatial data structures and level-of-detail representations to manage the 3D point cloud data and to deploy out-of-core and web-based rendering concepts. By providing functionality for both thin-client and thick-client applications, the system scales for client devices that are vastly different in computing capabilities. Different 3D point-based rendering techniques and post-processing effects are provided to enable task-specific and data-specific filtering and highlighting, e.g., based on per-point surface categories or temporal information. A set of interaction techniques allows users to collaboratively work with the data, e.g., by measuring distances and areas, by annotating, or by selecting and extracting data subsets. Additional value is provided by the system's ability to display additional, context-providing geodata alongside 3D point clouds and to integrate task-specific processing and analysis operations. We have evaluated the presented techniques and the prototype system with different data sets from aerial, mobile, and terrestrial acquisition campaigns with up to 120 billion points to show their practicality and feasibility.
Spatio-temporal data denotes a category of data that contains spatial as well as temporal components. For example, time-series of geo-data, thematic maps that change over time, or tracking data of moving entities can be interpreted as spatio-temporal data.
In today's automated world, an increasing number of data sources exist, which constantly generate spatio-temporal data. This includes for example traffic surveillance systems, which gather movement data about human or vehicle movements, remote-sensing systems, which frequently scan our surroundings and produce digital representations of cities and landscapes, as well as sensor networks in different domains, such as logistics, animal behavior study, or climate research.
For the analysis of spatio-temporal data, in addition to automatic statistical and data mining methods, exploratory analysis methods are employed, which are based on interactive visualization. These analysis methods let users explore a data set by interactively manipulating a visualization, thereby employing the human cognitive system and knowledge of the users to find patterns and gain insight into the data.
This thesis describes a software framework for the visualization of spatio-temporal data, which consists of GPU-based techniques to enable the interactive visualization and exploration of large spatio-temporal data sets. The developed techniques include data management, processing, and rendering, facilitating real-time processing and visualization of large geo-temporal data sets. It includes three main contributions:
- Concept and Implementation of a GPU-Based Visualization Pipeline.
The developed visualization methods are based on the concept of a GPU-based visualization pipeline, in which all steps -- processing, mapping, and rendering -- are implemented on the GPU. With this concept, spatio-temporal data is represented directly in GPU memory, using shader programs to process and filter the data, apply mappings to visual properties, and finally generate the geometric representations for a visualization during the rendering process. Data processing, filtering, and mapping are thereby executed in real-time, enabling dynamic control over the mapping and a visualization process which can be controlled interactively by a user.
- Attributed 3D Trajectory Visualization.
A visualization method has been developed for the interactive exploration of large numbers of 3D movement trajectories. The trajectories are visualized in a virtual geographic environment, supporting basic geometries such as lines, ribbons, spheres, or tubes. Interactive mapping can be applied to visualize the values of per-node or per-trajectory attributes, supporting shape, height, size, color, texturing, and animation as visual properties. Using the dynamic mapping system, several kinds of visualization methods have been implemented, such as focus+context visualization of trajectories using interactive density maps, and space-time cube visualization to focus on the temporal aspects of individual movements.
- Geographic Network Visualization.
A method for the interactive exploration of geo-referenced networks has been developed, which enables the visualization of large numbers of nodes and edges in a geographic context. Several geographic environments are supported, such as a 3D globe, as well as 2D maps using different map projections, to enable the analysis of networks in different contexts and scales. Interactive filtering, mapping, and selection can be applied to analyze these geographic networks, and visualization methods for specific types of networks, such as coupled 3D networks or temporal networks have been implemented.
As a demonstration of the developed visualization concepts, interactive visualization tools for two distinct use cases have been developed. The first contains the visualization of attributed 3D movement trajectories of airplanes around an airport. It allows users to explore and analyze the trajectories of approaching and departing aircraft, which have been recorded over the period of a month. By applying the interactive visualization methods for trajectory visualization and interactive density maps, analysts can derive insight from the data, such as common flight paths, regular and irregular patterns, or uncommon incidents such as missed approaches at the airport.
The second use case involves the visualization of climate networks, which are geographic networks in the climate research domain. They represent the dynamics of the climate system using a network structure that expresses statistical interrelationships between different regions. The interactive tool allows climate analysts to explore these large networks, analyzing the network's structure and relating it to the geographic background. Interactive filtering and selection enable them to find patterns in the climate data and identify, e.g., clusters in the networks or flow patterns.
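The density maps used for the trajectory use case can be illustrated by counting, per grid cell, how many trajectories pass through it. The thesis performs such steps on the GPU; the sketch below is a plain CPU-side illustration with an assumed 2D grid, and all names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def density_map(
    trajectories: Sequence[Sequence[Point]],
    cell_size: float,
    width: int,
    height: int,
) -> List[List[int]]:
    """Count, for every grid cell, the trajectories with a sample point in it."""
    grid = [[0] * width for _ in range(height)]
    for trajectory in trajectories:
        visited = set()  # count each trajectory at most once per cell
        for x, y in trajectory:
            col, row = int(x // cell_size), int(y // cell_size)
            if 0 <= col < width and 0 <= row < height:
                visited.add((row, col))
        for row, col in visited:
            grid[row][col] += 1
    return grid


# Two invented trajectories on a grid of 2 x 1 cells with edge length 1.0.
tracks = [
    [(0.5, 0.5), (1.5, 0.5)],
    [(0.2, 0.8), (0.4, 0.9)],
]
grid = density_map(tracks, cell_size=1.0, width=2, height=1)
```

Cells with high counts correspond to frequently used paths, which is how common flight routes stand out in such a map. The sketch only counts sample points, so a segment that crosses a cell without a sample inside it is missed.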
Patent document collections are an immense source of knowledge for research and innovation communities worldwide. The rapid growth of the number of patent documents poses an enormous challenge for retrieving and analyzing information from this source in an effective manner. Based on deep learning methods for natural language processing, novel approaches have been developed in the field of patent analysis. The goal of these approaches is to reduce costs by automating tasks that previously only domain experts could solve. In this article, we provide a comprehensive survey of the application of deep learning for patent analysis. We summarize the state-of-the-art techniques and describe how they are applied to various tasks in the patent domain. In a detailed discussion, we categorize 40 papers based on the dataset, the representation, and the deep learning architecture that were used, as well as the patent analysis task that was targeted. With our survey, we aim to foster future research at the intersection of patent analysis and deep learning and we conclude by listing promising paths for future work.
Rapid advances in location-acquisition technologies have led to large amounts of trajectory data. This data is the foundation for a broad spectrum of services driven and improved by trajectory data mining. However, for hybrid transactional and analytical workloads, the storing and processing of rapidly accumulated trajectory data is a non-trivial task. In this paper, we present a detailed survey of state-of-the-art trajectory data management systems. To determine the relevant aspects and requirements for such systems, we developed a trajectory data mining framework, which summarizes the different steps in the trajectory data mining process. Based on the derived requirements, we analyze different concepts to store, compress, index, and process spatio-temporal data. There are various trajectory management systems, which are optimized for scalability, data footprint reduction, elasticity, or query performance. To get a comprehensive overview, we describe and compare different existing systems. Additionally, the observed similarities in the general structure of different systems are consolidated in a general blueprint of trajectory management systems.