Filtern
Volltext vorhanden
- ja (210) (entfernen)
Erscheinungsjahr
Dokumenttyp
- Dissertation (88)
- Wissenschaftlicher Artikel (57)
- Monographie/Sammelband (39)
- Postprint (22)
- Konferenzveröffentlichung (3)
- Bericht (1)
Schlagworte
- MOOC (38)
- digital education (37)
- e-learning (35)
- Digitale Bildung (34)
- online course creation (34)
- online course design (34)
- Kursdesign (33)
- Micro Degree (33)
- Online-Lehre (33)
- Onlinekurs (33)
Institut
- Hasso-Plattner-Institut für Digital Engineering GmbH (210) (entfernen)
At the beginning of 2020, with COVID-19, courts of justice worldwide had to move online to continue providing judicial service. Digital technologies materialized the court practices in ways unthinkable shortly before the pandemic creating resonances with judicial and legal regulation, as well as frictions. A better understanding of the dynamics at play in the digitalization of courts is paramount for designing justice systems that serve their users better, ensure fair and timely dispute resolutions, and foster access to justice. Building on three major bodies of literature —e-justice, digitalization and organization studies, and design research— Designing for Digital Justice takes a nuanced approach to account for human and more-than-human agencies.
Using a qualitative approach, I have studied in depth the digitalization of Chilean courts during the pandemic, specifically between April 2020 and September 2022. Leveraging a comprehensive source of primary and secondary data, I traced back the genealogy of the novel materializations of courts’ practices structured by the possibilities offered by digital technologies. In five (5) cases studies, I show in detail how the courts got to 1) work remotely, 2) host hearings via videoconference, 3) engage with users via social media (i.e., Facebook and Chat Messenger), 4) broadcast a show with judges answering questions from users via Facebook Live, and 5) record, stream, and upload judicial hearings to YouTube to fulfil the publicity requirement of criminal hearings. The digitalization of courts during the pandemic is characterized by a suspended normativity, which makes innovation possible yet presents risks. While digital technologies enabled the judiciary to provide services continuously, they also created the risk of displacing traditional judicial and legal regulation.
Contributing to liminal innovation and digitalization research, Designing for Digital Justice theorizes four phases: 1) the pre-digitalization phase resulting in the development of regulation, 2) the hotspot of digitalization resulting in the extension of regulation, 3) the digital innovation redeveloping regulation (moving to a new, preliminary phase), and 4) the permanence of temporal practices displacing regulation. Contributing to design research Designing for Digital Justice provides new possibilities for innovation in the courts, focusing at different levels to better address tensions generated by digitalization. Fellow researchers will find in these pages a sound theoretical advancement at the intersection of digitalization and justice with novel methodological references. Practitioners will benefit from the actionable governance framework Designing for Digital Justice Model, which provides three fields of possibilities for action to design better justice systems. Only by taking into account digital, legal, and social factors can we design better systems that promote access to justice, the rule of law, and, ultimately social peace.
In model-driven engineering, the adaptation of large software systems with dynamic structure is enabled by architectural runtime models. Such a model represents an abstract state of the system as a graph of interacting components. Every relevant change in the system is mirrored in the model and triggers an evaluation of model queries, which search the model for structural patterns that should be adapted. This thesis focuses on a type of runtime models where the expressiveness of the model and model queries is extended to capture past changes and their timing. These history-aware models and temporal queries enable more informed decision-making during adaptation, as they support the formulation of requirements on the evolution of the pattern that should be adapted. However, evaluating temporal queries during adaptation poses significant challenges. First, it implies the capability to specify and evaluate requirements on the structure, as well as the ordering and timing in which structural changes occur. Then, query answers have to reflect that the history-aware model represents the architecture of a system whose execution may be ongoing, and thus answers may depend on future changes. Finally, query evaluation needs to be adequately fast and memory-efficient despite the increasing size of the history---especially for models that are altered by numerous, rapid changes.
The thesis presents a query language and a querying approach for the specification and evaluation of temporal queries. These contributions aim to cope with the challenges of evaluating temporal queries at runtime, a prerequisite for history-aware architectural monitoring and adaptation which has not been systematically treated by prior model-based solutions. The distinguishing features of our contributions are: the specification of queries based on a temporal logic which encodes structural patterns as graphs; the provision of formally precise query answers which account for timing constraints and ongoing executions; the incremental evaluation which avoids the re-computation of query answers after each change; and the option to discard history that is no longer relevant to queries. The query evaluation searches the model for occurrences of a pattern whose evolution satisfies a temporal logic formula. Therefore, besides model-driven engineering, another related research community is runtime verification. The approach differs from prior logic-based runtime verification solutions by supporting the representation and querying of structure via graphs and graph queries, respectively, which is more efficient for queries with complex patterns. We present a prototypical implementation of the approach and measure its speed and memory consumption in monitoring and adaptation scenarios from two application domains, with executions of an increasing size. We assess scalability by a comparison to the state-of-the-art from both related research communities. The implementation yields promising results, which pave the way for sophisticated history-aware self-adaptation solutions and indicate that the approach constitutes a highly effective technique for runtime monitoring on an architectural level.
Laser cutting is a fast and precise fabrication process. This makes laser cutting a powerful process in custom industrial production. Since the patents on the original technology started to expire, a growing community of tech-enthusiasts embraced the technology and started sharing the models they fabricate online. Surprisingly, the shared models appear to largely be one-offs (e.g., they proudly showcase what a single person can make in one afternoon). For laser cutting to become a relevant mainstream phenomenon (as opposed to the current tech enthusiasts and industry users), it is crucial to enable users to reproduce models made by more experienced modelers, and to build on the work of others instead of creating one-offs.
We create a technological basis that allows users to build on the work of others—a progression that is currently held back by the use of exchange formats that disregard mechanical differences between machines and therefore overlook implications with respect to how well parts fit together mechanically (aka engineering fit).
For the field to progress, we need a machine-independent sharing infrastructure.
In this thesis, we outline three approaches that together get us closer to this:
(1) 2D cutting plans that are tolerant to machine variations. Our initial take is a minimally invasive approach: replacing machine-specific elements in cutting plans with more tolerant elements using mechanical hacks like springs and wedges. The resulting models fabricate on any consumer laser cutter and in a range of materials.
(2) sharing models in 3D. To allow building on the work of others, we build a 3D modeling environment for laser cutting (kyub). After users design a model, they export their 3D models to 2D cutting plans optimized for the machine and material at hand. We extend this volumetric environment with tools to edit individual plates, allowing users to leverage the efficiency of volumetric editing while having control over the most detailed elements in laser-cutting (plates)
(3) converting legacy 2D cutting plans to 3D models. To handle legacy models, we build software to interactively reconstruct 3D models from 2D cutting plans. This allows users to reuse the models in more productive ways. We revisit this by automating the assembly process for a large subset of models.
The above-mentioned software composes a larger system (kyub, 140,000 lines of code). This system integration enables the push towards actual use, which we demonstrate through a range of workshops where users build complex models such as fully functional guitars. By simplifying sharing and re-use and the resulting increase in model complexity, this line of work forms a small step to enable personal fabrication to scale past the maker phenomenon, towards a mainstream phenomenon—the same way that other fields, such as print (postscript) and ultimately computing itself (portable programming languages, etc.) reached mass adoption.
Boolean Satisfiability (SAT) is one of the problems at the core of theoretical computer science. It was the first problem proven to be NP-complete by Cook and, independently, by Levin. Nowadays it is conjectured that SAT cannot be solved in sub-exponential time. Thus, it is generally assumed that SAT and its restricted version k-SAT are hard to solve. However, state-of-the-art SAT solvers can solve even huge practical instances of these problems in a reasonable amount of time.
Why is SAT hard in theory, but easy in practice? One approach to answering this question is investigating the average runtime of SAT. In order to analyze this average runtime the random k-SAT model was introduced. The model generates all k-SAT instances with n variables and m clauses with uniform probability. Researching random k-SAT led to a multitude of insights and tools for analyzing random structures in general. One major observation was the emergence of the so-called satisfiability threshold: A phase transition point in the number of clauses at which the generated formulas go from asymptotically almost surely satisfiable to asymptotically almost surely unsatisfiable. Additionally, instances around the threshold seem to be particularly hard to solve.
In this thesis we analyze a more general model of random k-SAT that we call non-uniform random k-SAT. In contrast to the classical model each of the n Boolean variables now has a distinct probability of being drawn. For each of the m clauses we draw k variables according to the variable distribution and choose their signs uniformly at random. Non-uniform random k-SAT gives us more control over the distribution of Boolean variables in the resulting formulas. This allows us to tailor distributions to the ones observed in practice. Notably, non-uniform random k-SAT contains the previously proposed models random k-SAT, power-law random k-SAT and geometric random k-SAT as special cases.
We analyze the satisfiability threshold in non-uniform random k-SAT depending on the variable probability distribution. Our goal is to derive conditions on this distribution under which an equivalent of the satisfiability threshold conjecture holds. We start with the arguably simpler case of non-uniform random 2-SAT. For this model we show under which conditions a threshold exists, if it is sharp or coarse, and what the leading constant of the threshold function is. These are exactly the three ingredients one needs in order to prove or disprove the satisfiability threshold conjecture. For non-uniform random k-SAT with k=3 we only prove sufficient conditions under which a threshold exists. We also show some properties of the variable probabilities under which the threshold is sharp in this case. These are the first results on the threshold behavior of non-uniform random k-SAT.
Learning analytics at scale
(2021)
Digital technologies are paving the way for innovative educational approaches. The learning format of Massive Open Online Courses (MOOCs) provides a highly accessible path to lifelong learning while being more affordable and flexible than face-to-face courses. Thereby, thousands of learners can enroll in courses mostly without admission restrictions, but this also raises challenges. Individual supervision by teachers is barely feasible, and learning persistence and success depend on students' self-regulatory skills. Here, technology provides the means for support. The use of data for decision-making is already transforming many fields, whereas in education, it is still a young research discipline. Learning Analytics (LA) is defined as the measurement, collection, analysis, and reporting of data about learners and their learning contexts with the purpose of understanding and improving learning and learning environments. The vast amount of data that MOOCs produce on the learning behavior and success of thousands of students provides the opportunity to study human learning and develop approaches addressing the demands of learners and teachers.
The overall purpose of this dissertation is to investigate the implementation of LA at the scale of MOOCs and to explore how data-driven technology can support learning and teaching in this context. To this end, several research prototypes have been iteratively developed for the HPI MOOC Platform. Hence, they were tested and evaluated in an authentic real-world learning environment. Most of the results can be applied on a conceptual level to other MOOC platforms as well. The research contribution of this thesis thus provides practical insights beyond what is theoretically possible. In total, four system components were developed and extended:
(1) The Learning Analytics Architecture: A technical infrastructure to collect, process, and analyze event-driven learning data based on schema-agnostic pipelining in a service-oriented MOOC platform. (2) The Learning Analytics Dashboard for Learners: A tool for data-driven support of self-regulated learning, in particular to enable learners to evaluate and plan their learning activities, progress, and success by themselves. (3) Personalized Learning Objectives: A set of features to better connect learners' success to their personal intentions based on selected learning objectives to offer guidance and align the provided data-driven insights about their learning progress. (4) The Learning Analytics Dashboard for Teachers: A tool supporting teachers with data-driven insights to enable the monitoring of their courses with thousands of learners, identify potential issues, and take informed action.
For all aspects examined in this dissertation, related research is presented, development processes and implementation concepts are explained, and evaluations are conducted in case studies. Among other findings, the usage of the learner dashboard in combination with personalized learning objectives demonstrated improved certification rates of 11.62% to 12.63%. Furthermore, it was observed that the teacher dashboard is a key tool and an integral part for teaching in MOOCs. In addition to the results and contributions, general limitations of the work are discussed—which altogether provide a solid foundation for practical implications and future research.
Comment sections of online news platforms are an essential space to express opinions and discuss political topics. However, the misuse by spammers, haters, and trolls raises doubts about whether the benefits justify the costs of the time-consuming content moderation. As a consequence, many platforms limited or even shut down comment sections completely. In this thesis, we present deep learning approaches for comment classification, recommendation, and prediction to foster respectful and engaging online discussions. The main focus is on two kinds of comments: toxic comments, which make readers leave a discussion, and engaging comments, which make readers join a discussion. First, we discourage and remove toxic comments, e.g., insults or threats. To this end, we present a semi-automatic comment moderation process, which is based on fine-grained text classification models and supports moderators. Our experiments demonstrate that data augmentation, transfer learning, and ensemble learning allow training robust classifiers even on small datasets. To establish trust in the machine-learned models, we reveal which input features are decisive for their output with attribution-based explanation methods. Second, we encourage and highlight engaging comments, e.g., serious questions or factual statements. We automatically identify the most engaging comments, so that readers need not scroll through thousands of comments to find them. The model training process builds on upvotes and replies as a measure of reader engagement. We also identify comments that address the article authors or are otherwise relevant to them to support interactions between journalists and their readership. Taking into account the readers' interests, we further provide personalized recommendations of discussions that align with their favored topics or involve frequent co-commenters. Our models outperform multiple baselines and recent related work in experiments on comment datasets from different platforms.
Remote sensing technology, such as airborne, mobile, or terrestrial laser scanning, and photogrammetric techniques, are fundamental approaches for efficient, automatic creation of digital representations of spatial environments. For example, they allow us to generate 3D point clouds of landscapes, cities, infrastructure networks, and sites. As essential and universal category of geodata, 3D point clouds are used and processed by a growing number of applications, services, and systems such as in the domains of urban planning, landscape architecture, environmental monitoring, disaster management, virtual geographic environments as well as for spatial analysis and simulation.
While the acquisition processes for 3D point clouds become more and more reliable and widely-used, applications and systems are faced with more and more 3D point cloud data. In addition, 3D point clouds, by their very nature, are raw data, i.e., they do not contain any structural or semantics information. Many processing strategies common to GIS such as deriving polygon-based 3D models generally do not scale for billions of points. GIS typically reduce data density and precision of 3D point clouds to cope with the sheer amount of data, but that results in a significant loss of valuable information at the same time.
This thesis proposes concepts and techniques designed to efficiently store and process massive 3D point clouds. To this end, object-class segmentation approaches are presented to attribute semantics to 3D point clouds, used, for example, to identify building, vegetation, and ground structures and, thus, to enable processing, analyzing, and visualizing 3D point clouds in a more effective and efficient way. Similarly, change detection and updating strategies for 3D point clouds are introduced that allow for reducing storage requirements and incrementally updating 3D point cloud databases. In addition, this thesis presents out-of-core, real-time rendering techniques used to interactively explore 3D point clouds and related analysis results. All techniques have been implemented based on specialized spatial data structures, out-of-core algorithms, and GPU-based processing schemas to cope with massive 3D point clouds having billions of points.
All proposed techniques have been evaluated and demonstrated their applicability to the field of geospatial applications and systems, in particular for tasks such as classification, processing, and visualization. Case studies for 3D point clouds of entire cities with up to 80 billion points show that the presented approaches open up new ways to manage and apply large-scale, dense, and time-variant 3D point clouds as required by a rapidly growing number of applications and systems.
The demand for peer-to-peer ridesharing services increased over the last years rapidly. To cost-efficiently dispatch orders and communicate accurate pick-up times is challenging as the current location of each available driver is not exactly known since observed locations can be outdated for several seconds. The developed trajectory visualization tool enables transportation network companies to analyze dispatch processes and determine the causes of unexpected delays. As dispatching algorithms are based on the accuracy of arrival time predictions, we account for factors like noise, sample rate, technical and economic limitations as well as the duration of the entire process as they have an impact on the accuracy of spatio-temporal data. To improve dispatching strategies, we propose a prediction approach that provides a probability distribution for a driver’s future locations based on patterns observed in past trajectories. We demonstrate the capabilities of our prediction results to ( i) avoid critical delays, (ii) to estimate waiting times with higher confidence, and (iii) to enable risk considerations in dispatching strategies.
Medical imaging plays an important role in disease diagnosis, treatment planning, and clinical monitoring. One of the major challenges in medical image analysis is imbalanced training data, in which the class of interest is much rarer than the other classes. Canonical machine learning algorithms suppose that the number of samples from different classes in the training dataset is roughly similar or balance. Training a machine learning model on an imbalanced dataset can introduce unique challenges to the learning problem.
A model learned from imbalanced training data is biased towards the high-frequency samples. The predicted results of such networks have low sensitivity and high precision. In medical applications, the cost of misclassification of the minority class could be more than the cost of misclassification of the majority class. For example, the risk of not detecting a tumor could be much higher than referring to a healthy subject to a doctor. The current Ph.D. thesis introduces several deep learning-based approaches for handling class imbalanced problems for learning multi-task such as disease classification and semantic segmentation.
At the data-level, the objective is to balance the data distribution through re-sampling the data space: we propose novel approaches to correct internal bias towards fewer frequency samples. These approaches include patient-wise batch sampling, complimentary labels, supervised and unsupervised minority oversampling using generative adversarial networks for all.
On the other hand, at algorithm-level, we modify the learning algorithm to alleviate the bias towards majority classes. In this regard, we propose different generative adversarial networks for cost-sensitive learning, ensemble learning, and mutual learning to deal with highly imbalanced imaging data.
We show evidence that the proposed approaches are applicable to different types of medical images of varied sizes on different applications of routine clinical tasks, such as disease classification and semantic segmentation. Our various implemented algorithms have shown outstanding results on different medical imaging challenges.
Version control is a widely used practice among software developers. It reduces the risk of changing their software and allows them to manage different configurations and to collaborate with others more efficiently. This is amplified by code sharing platforms such as GitHub or Bitbucket. Most version control systems track files (e.g., Git, Mercurial, and Subversion do), but some programming environments do not operate on files, but on objects instead (many Smalltalk implementations do). Users of such environments want to use version control for their objects anyway. Specialized version control systems, such as the ones available for Smalltalk systems (e.g., ENVY/Developer and Monticello), focus on a small subset of objects that can be versioned. Most of these systems concentrate on the tracking of methods, classes, and configurations of these. Other user-defined and user-built objects are either not eligible for version control at all, tracking them involves complicated workarounds, or a fixed, domain-unspecific serialization format is used that does not equally suit all kinds of objects. Moreover, these version control systems that are specific to a programming environment require their own code sharing platforms; popular, well-established platforms for file-based version control systems cannot be used or adapter solutions need to be implemented and maintained.
To improve the situation for version control of arbitrary objects, a framework for tracking, converting, and storing of objects is presented in this report. It allows editions of objects to be stored in an exchangeable, existing backend version control system. The platforms of the backend version control system can thus be reused. Users and objects have control over how objects are captured for the purpose of version control. Domain-specific requirements can be implemented. The storage format (i.e. the file format, when file-based backend version control systems are used) can also vary from one object to another. Different editions of objects can be compared and sets of changes can be applied to graphs of objects. A generic way for capturing and restoring that supports most kinds of objects is described. It models each object as a collection of slots. Thus, users can begin to track their objects without first having to implement version control supplements for their own kinds of objects. The proposed architecture is evaluated using a prototype implementation that can be used to track objects in Squeak/Smalltalk with Git. The prototype improves the suboptimal standing of user objects with respect to version control described above and also simplifies some version control tasks for classes and methods as well. It also raises new problems, which are discussed in this report as well.
Text collections, such as corpora of books, research articles, news, or business documents are an important resource for knowledge discovery. Exploring large document collections by hand is a cumbersome but necessary task to gain new insights and find relevant information. Our digitised society allows us to utilise algorithms to support the information seeking process, for example with the help of retrieval or recommender systems. However, these systems only provide selective views of the data and require some prior knowledge to issue meaningful queries and asses a system’s response. The advancements of machine learning allow us to reduce this gap and better assist the information seeking process. For example, instead of sighting countless business documents by hand, journalists and investigator scan employ natural language processing techniques, such as named entity recognition. Al-though this greatly improves the capabilities of a data exploration platform, the wealth of information is still overwhelming. An overview of the entirety of a dataset in the form of a two-dimensional map-like visualisation may help to circumvent this issue. Such overviews enable novel interaction paradigms for users, which are similar to the exploration of digital geographical maps. In particular, they can provide valuable context by indicating how apiece of information fits into the bigger picture.This thesis proposes algorithms that appropriately pre-process heterogeneous documents and compute the layout for datasets of all kinds. Traditionally, given high-dimensional semantic representations of the data, so-called dimensionality reduction algorithms are usedto compute a layout of the data on a two-dimensional canvas. In this thesis, we focus on text corpora and go beyond only projecting the inherent semantic structure itself. Therefore,we propose three dimensionality reduction approaches that incorporate additional information into the layout process: (1) a multi-objective dimensionality reduction algorithm to jointly visualise semantic information with inherent network information derived from the underlying data; (2) a comparison of initialisation strategies for different dimensionality reduction algorithms to generate a series of layouts for corpora that grow and evolve overtime; (3) and an algorithm that updates existing layouts by incorporating user feedback provided by pointwise drag-and-drop edits. This thesis also contains system prototypes to demonstrate the proposed technologies, including pre-processing and layout of the data and presentation in interactive user interfaces.
In unserer digitalisierten Welt verlagert sich das Lernen in die Cloud. Vom Unterricht in der Schule und der Tafel zum Tablet, hin zu einem lebenslangen Lernen in der Arbeitswelt und sogar darüber hinaus. Wie erfolgreich und attraktiv dieses zeitgemäße Lernen erfolgt, hängt nicht unwesentlich von den technologischen Möglichkeiten ab, die digitale Lernplattformen rund um MOOCs und Schul-Clouds bieten.
Bei deren Weiterentwicklung sollten statt ökonomischen Messgrößen und KPIs die Lernenden und ihre Lernerfahrungen im Vordergrund stehen.
Hierfür wurde ein Optimierungsframework entwickelt, das für die Entwicklung von Lernplattformen anhand verschiedener qualitativer und quantitative Methoden Verbesserungen identifiziert, priorisiert und deren Beurteilung und Umsetzung steuert.
Datengestützte Entscheidungen sollten auf einer ausreichenden Datenbasis aufbauen. Moderne Web-Anwendungen bestehen aber oft aus mehreren Microservices mit jeweils eigener Datenhaltung. Viele Daten sind daher nicht mehr einfach zugänglich. Daher wird in dieser Arbeit ein Learning Analytics Dienst eingeführt, der diese Daten sammelt und verarbeitet. Darauf aufbauend werden Metriken eingeführt, auf deren Grundlage die erfassten Daten nutzbar werden und die somit zu verschiedenen Zwecken verwendet werden können.
Neben der Visualisierung der Daten in Dashboards werden die Daten für eine automatisierte Qualitätskontrolle herangezogen. So kann festgestellt werden, wenn Tests zu schwierig oder die soziale Interaktion in einem MOOC zu gering ist.
Die vorgestellte Infrastruktur lässt sich aber auch verwenden, um verschiedene A/B/n-Tests durchzuführen. In solchen Tests gibt es mehrere Varianten, die an verschiedene Nutzergruppen in einem kontrollierten Experiment erprobt werden. Dank der vorgestellten Testinfrastruktur, die in der HPI MOOC Plattform eingebaut wurde, kann ermittelt werden, ob sich für diese Gruppen statistisch signifikante Änderungen in der Nutzung feststellen lassen. Dies wurde mit fünf verschiedenen Verbesserungen der HPI MOOC Plattform evaluiert, auf der auch openHPI und openSAP basieren.
Dabei konnte gezeigt werden, dass sich Lernende mit reaktivierenden Mails zurück in den Kurs holen lassen. Es ist primär die Kommunikation der unbearbeiteten Lerninhalte des Nutzers, die eine reaktivierende Wirkung hat.
Auch Übersichtsmails, die die Forenaktivität zusammenfassen, haben einen positiven Effekt erzielt.
Ein gezieltes On-Boarding kann dazu führen, dass die Nutzer die Plattform besser verstehen und hierdurch aktiver sind.
Der vierte Test konnte zeigen, dass die Zuordnung von Forenfragen zu einem bestimmten Zeitpunkt im Video und die grafische Anzeige dieser Informationen zu einer erhöhten Forenaktivität führt.
Auch die experimentelle Erprobung von unterschiedlichen Lernmaterialien, wie sie im fünften Test durchgeführt wurde, ist in MOOCs hilfreich, um eine Verbesserung der Kursmaterialien zu erreichen.
Neben diesen funktionalen Verbesserungen wird untersucht wie MOOC Plattformen und Schul-Clouds einen Nutzen bieten können, wenn Nutzern nur eine schwache oder unzuverlässige Internetanbindung zur Verfügung steht (wie dies in vielen deutschen Schulen der Fall ist). Hier wird gezeigt, dass durch ein geschicktes Vorausladen von Daten die Internetanbindungen entlastet werden können. Teile der Lernanwendungen funktionieren dank dieser Anpassungen, selbst wenn keine Verbindung zum Internet besteht.
Als Letztes wird gezeigt, wie Endgeräte sich in einem lokalen Peer-to-Peer CDN gegenseitig mit Daten versorgen können, ohne dass diese aus dem Internet heruntergeladen werden müssen.
The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
The HPI Future SOC Lab provides researchers with free of charge access to a complete infrastructure of state of the art hard and software. This infrastructure includes components, which might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from but not limited to the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
This technical report presents results of research projects executed in 2018. Selected projects have presented their results on April 17th and November 14th 2017 at the Future SOC Lab Day events.
Modern datasets often exhibit diverse, feature-rich, unstructured data, and they are massive in size. This is the case of social networks, human genome, and e-commerce databases. As Artificial Intelligence (AI) systems are increasingly used to detect pattern in data and predict future outcome, there are growing concerns on their ability to process large amounts of data. Motivated by these concerns, we study the problem of designing AI systems that are scalable to very large and heterogeneous data-sets.
Many AI systems require to solve combinatorial optimization problems in their course of action. These optimization problems are typically NP-hard, and they may exhibit additional side constraints. However, the underlying objective functions often exhibit additional properties. These properties can be exploited to design suitable optimization algorithms. One of these properties is the well-studied notion of submodularity, which captures diminishing returns. Submodularity is often found in real-world applications. Furthermore, many relevant applications exhibit generalizations of this property.
In this thesis, we propose new scalable optimization algorithms for combinatorial problems with diminishing returns. Specifically, we focus on three problems, the Maximum Entropy Sampling problem, Video Summarization, and Feature Selection. For each problem, we propose new algorithms that work at scale. These algorithms are based on a variety of techniques, such as forward step-wise selection and adaptive sampling. Our proposed algorithms yield strong approximation guarantees, and the perform well experimentally.
We first study the Maximum Entropy Sampling problem. This problem consists of selecting a subset of random variables from a larger set, that maximize the entropy. By using diminishing return properties, we develop a simple forward step-wise selection optimization algorithm for this problem. Then, we study the problem of selecting a subset of frames, that represent a given video. Again, this problem corresponds to a submodular maximization problem. We provide a new adaptive sampling algorithm for this problem, suitable to handle the complex side constraints imposed by the application. We conclude by studying Feature Selection. In this case, the underlying objective functions generalize the notion of submodularity. We provide a new adaptive sequencing algorithm for this problem, based on the Orthogonal Matching Pursuit paradigm.
Overall, we study practically relevant combinatorial problems, and we propose new algorithms to solve them. We demonstrate that these algorithms are suitable to handle massive datasets. However, our analysis is not problem-specific, and our results can be applied to other domains, if diminishing return properties hold. We hope that the flexibility of our framework inspires further research into scalability in AI.
Business process automation improves organizations’ efficiency to perform work. Therefore, a business process is first documented as a process model which then serves as blueprint for a number of process instances representing the execution of specific business cases. In existing business process management systems, process instances run independently from each other. However, in practice, instances are also collected in groups at certain process activities for a combined execution to improve the process performance. Currently, this so-called batch processing is executed manually or supported by external software. Only few research proposals exist to explicitly represent and execute batch processing needs in business process models. These works also lack a comprehensive understanding of requirements.
This thesis addresses the described issues by providing a basic concept, called batch activity. It allows an explicit representation of batch processing configurations in process models and provides a corresponding execution semantics, thereby easing automation. The batch activity groups different process instances based on their data context and can synchronize their execution over one or as well multiple process activities. The concept is conceived based on a requirements analysis considering existing literature on batch processing from different domains and industry examples. Further, this thesis provides two extensions: First, a flexible batch configuration concept, based on event processing techniques, is introduced to allow run time adaptations of batch configurations. Second, a concept for collecting and batching activity instances of multiple different process models is given. Thereby, the batch configuration is centrally defined, independently of the process models, which is especially beneficial for organizations with large process model collections. This thesis provides a technical evaluation as well as a validation of the presented concepts. A prototypical implementation in an existing open-source BPMS shows that with a few extensions, batch processing is enabled. Further, it demonstrates that the consolidated view of several work items in one user form can improve work efficiency. The validation, in which the batch activity concept is applied to different use cases in a simulated environment, implies cost-savings for business processes when a suitable batch configuration is used. For the validation, an extensible business process simulator was developed. It enables process designers to study the influence of a batch activity in a process with regards to its performance.
The MITx MicroMasters Program in Supply Chain Management (SCM) is a Massive Open Online Course (MOOC) based program that aims to impart quantitative and qualitative knowledge to SCM enthusiasts all around the world. The program that started in 2014 with just one course, now offers 5 courses and one final proctored exam, which allows a learner to gain a MicroMasters credential upon completion. While the courses are delivered in the form of pre-recorded videos by the faculty members of Massachusetts Institute of Technology (MIT), the questions and comments posted by learners in discussion forums are addressed by a group of Community Teaching Assistants (CTAs) who volunteer for this role. The MITx staff carefully selects CTAs for each run of the individual courses as they take on a co-facilitator’s role in the program. This paper highlights the importance of community teaching, discusses the profile of CTAs involved with the program, their recruitment, training, tasks and responsibilities, engagement, and rewarding process. In the end we also share a few recommendations based on the lessons learned in community teaching during the last five years of running more than 45 MOOC courses, that could help other MOOC teams deliver a high-touch experience.
Personal data privacy is considered to be a fundamental right. It forms a part of our highest ethical standards and is anchored in legislation and various best practices from the technical perspective. Yet, protecting against personal data exposure is a challenging problem from the perspective of generating privacy-preserving datasets to support machine learning and data mining operations. The issue is further compounded by the fact that devices such as consumer wearables and sensors track user behaviours on such a fine-grained level, thereby accelerating the formation of multi-attribute and large-scale high-dimensional datasets.
In recent years, increasing news coverage regarding de-anonymisation incidents, including but not limited to the telecommunication, transportation, financial transaction, and healthcare sectors, have resulted in the exposure of sensitive private information. These incidents indicate that releasing privacy-preserving datasets requires serious consideration from the pre-processing perspective. A critical problem that appears in this regard is the time complexity issue in applying syntactic anonymisation methods, such as k-anonymity, l-diversity, or t-closeness to generating privacy-preserving data. Previous studies have shown that this problem is NP-hard.
This thesis focuses on large high-dimensional datasets as an example of a special case of data that is characteristically challenging to anonymise using syntactic methods. In essence, large high-dimensional data contains a proportionately large number of attributes in proportion to the population of attribute values. Applying standard syntactic data anonymisation approaches to generating privacy-preserving data based on such methods results in high information-loss, thereby rendering the data useless for analytics operations or in low privacy due to inferences based on the data when information loss is minimised.
We postulate that this problem can be resolved effectively by searching for and eliminating all the quasi-identifiers present in a high-dimensional dataset. Essentially, we quantify the privacy-preserving data sharing problem as the Find-QID problem.
Further, we show that despite the complex nature of absolute privacy, the discovery of QID can be achieved reliably for large datasets. The risk of private data exposure through inferences can be circumvented, and both can be practicably achieved without the need for high-performance computers.
For this purpose, we present, implement, and empirically assess both mathematical and engineering optimisation methods for a deterministic discovery of privacy-violating inferences. This includes a greedy search scheme by efficiently queuing QID candidates based on their tuple characteristics, projecting QIDs on Bayesian inferences, and countering Bayesian network’s state-space-explosion with an aggregation strategy taken from multigrid context and vectorised GPU acceleration. Part of this work showcases magnitudes of processing acceleration, particularly in high dimensions. We even achieve near real-time runtime for currently impractical applications. At the same time, we demonstrate how such contributions could be abused to de-anonymise Kristine A. and Cameron R. in a public Twitter dataset addressing the US Presidential Election 2020.
Finally, this work contributes, implements, and evaluates an extended and generalised version of the novel syntactic anonymisation methodology, attribute compartmentation. Attribute compartmentation promises sanitised datasets without remaining quasi-identifiers while minimising information loss. To prove its functionality in the real world, we partner with digital health experts to conduct a medical use case study. As part of the experiments, we illustrate that attribute compartmentation is suitable for everyday use and, as a positive side effect, even circumvents a common domain issue of base rate neglect.
This paper aims to present the results of a higher education experience promoted by the research centres INTELLECT (University of Modena and Reggio Emilia) and CDM (University of Roma Tre), as part of difference master’s degrees programme of the academic years 2018/2019, 2019/2020, and 2020/2021. Through different online activities, 37 students attended and evaluated a MOOC on museum education content, such promoting their professionals and transverse skills, such as critical thinking, and developing their knowledge relative to OERs, within culture and heritage education contexts. Moreover, results from the online evaluation activities support the implementation of the MOOC in a collaborative way: during the academic years, evaluation data have been used by researcher to make changes to the course modules, thus realizing a more effective online path from and educational point of view.
Gene expression data is analyzed to identify biomarkers, e.g. relevant genes, which serve for diagnostic, predictive, or prognostic use. Traditional approaches for biomarker detection select distinctive features from the data based exclusively on the signals therein, facing multiple shortcomings in regards to overfitting, biomarker robustness, and actual biological relevance. Prior knowledge approaches are expected to address these issues by incorporating prior biological knowledge, e.g. on gene-disease associations, into the actual analysis. However, prior knowledge approaches are currently not widely applied in practice because they are often use-case specific and seldom applicable in a different scope. This leads to a lack of comparability of prior knowledge approaches, which in turn makes it currently impossible to assess their effectiveness in a broader context.
Our work addresses the aforementioned issues with three contributions. Our first contribution provides formal definitions for both prior knowledge and the flexible integration thereof into the feature selection process. Central to these concepts is the automatic retrieval of prior knowledge from online knowledge bases, which allows for streamlining the retrieval process and agreeing on a uniform definition for prior knowledge. We subsequently describe novel and generalized prior knowledge approaches that are flexible regarding the used prior knowledge and applicable to varying use case domains. Our second contribution is the benchmarking platform Comprior. Comprior applies the aforementioned concepts in practice and allows for flexibly setting up comprehensive benchmarking studies for examining the performance of existing and novel prior knowledge approaches. It streamlines the retrieval of prior knowledge and allows for combining it with prior knowledge approaches. Comprior demonstrates the practical applicability of our concepts and further fosters the overall development and comparability of prior knowledge approaches. Our third contribution is a comprehensive case study on the effectiveness of prior knowledge approaches. For that, we used Comprior and tested a broad range of both traditional and prior knowledge approaches in combination with multiple knowledge bases on data sets from multiple disease domains. Ultimately, our case study constitutes a thorough assessment of a) the suitability of selected knowledge bases for integration, b) the impact of prior knowledge being applied at different integration levels, and c) the improvements in terms of classification performance, biological relevance, and overall robustness.
In summary, our contributions demonstrate that generalized concepts for prior knowledge and a streamlined retrieval process improve the applicability of prior knowledge approaches. Results from our case study show that the integration of prior knowledge positively affects biomarker results, particularly regarding their robustness. Our findings provide the first in-depth insights on the effectiveness of prior knowledge approaches and build a valuable foundation for future research.
Background
Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated amongst each other, leaving open questions regarding their effectiveness.
Results
We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance.
Conclusion
Comprior allows reproducible benchmarking especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness
With the growth of information technology, patient attitudes are shifting – away from passively receiving care towards actively taking responsibility for their well- being. Handling doctor-patient relationships collaboratively and providing patients access to their health information are crucial steps in empowering patients. In mental healthcare, the implicit consensus amongst practitioners has been that sharing medical records with patients may have an unpredictable, harmful impact on clinical practice. In order to involve patients more actively in mental healthcare processes, Tele-Board MED (TBM) allows for digital collaborative documentation in therapist-patient sessions. The TBM software system offers a whiteboard-inspired graphical user interface that allows therapist and patient to jointly take notes during the treatment session. Furthermore, it provides features to automatically reuse the digital treatment session notes for the creation of treatment session summaries and clinical case reports. This thesis presents the development of the TBM system and evaluates its effects on 1) the fulfillment of the therapist’s duties of clinical case documentation, 2) patient engagement in care processes, and 3) the therapist-patient relationship. Following the design research methodology, TBM was developed and tested in multiple evaluation studies in the domains of cognitive behavioral psychotherapy and addiction care. The results show that therapists are likely to use TBM with patients if they have a technology-friendly attitude and when its use suits the treatment context. Support in carrying out documentation duties as well as fulfilling legal requirements contributes to therapist acceptance. Furthermore, therapists value TBM as a tool to provide a discussion framework and quick access to worksheets during treatment sessions. Therapists express skepticism, however, regarding technology use in patient sessions and towards complete record transparency in general. Patients expect TBM to improve the communication with their therapist and to offer a better recall of discussed topics when taking a copy of their notes home after the session. Patients are doubtful regarding a possible distraction of the therapist and usage in situations when relationship-building is crucial. When applied in a clinical environment, collaborative note-taking with TBM encourages patient engagement and a team feeling between therapist and patient. Furthermore, it increases the patient’s acceptance of their diagnosis, which in turn is an important predictor for therapy success. In summary, TBM has a high potential to deliver more than documentation support and record transparency for patients, but also to contribute to a collaborative doctor-patient relationship. This thesis provides design implications for the development of digital collaborative documentation systems in (mental) healthcare as well as recommendations for a successful implementation in clinical practice.
In an attempt to pave the way for more extensive Computer Science Education (CSE) coverage in K-12, this research developed and made a preliminary evaluation of a blended-learning Introduction to CS program based on an academic MOOC. Using an academic MOOC that is pedagogically effective and engaging, such a program may provide teachers with disciplinary scaffolds and allow them to focus their attention on enhancing students’ learning experience and nurturing critical 21st-century skills such as self-regulated learning. As we demonstrate, this enabled us to introduce an academic level course to middle-school students. In this research, we developed the principals and initial version of such a program, targeting ninth-graders in science-track classes who learn CS as part of their standard curriculum. We found that the middle-schoolers who participated in the program achieved academic results on par with undergraduate students taking this MOOC for academic credit. Participating students also developed a more accurate perception of the essence of CS as a scientific discipline. The unplanned school closure due to the COVID19 pandemic outbreak challenged the research but underlined the advantages of such a MOOCbased blended learning program above classic pedagogy in times of global or local crises that lead to school closure. While most of the science track classes seem to stop learning CS almost entirely, and the end-of-year MoE exam was discarded, the program’s classes smoothly moved to remote learning mode, and students continued to study at a pace similar to that experienced before the school shut down.
Data profiling is the computer science discipline of analyzing a given dataset for its metadata. The types of metadata range from basic statistics, such as tuple counts, column aggregations, and value distributions, to much more complex structures, in particular inclusion dependencies (INDs), unique column combinations (UCCs), and functional dependencies (FDs). If present, these statistics and structures serve to efficiently store, query, change, and understand the data. Most datasets, however, do not provide their metadata explicitly so that data scientists need to profile them.
While basic statistics are relatively easy to calculate, more complex structures present difficult, mostly NP-complete discovery tasks; even with good domain knowledge, it is hardly possible to detect them manually. Therefore, various profiling algorithms have been developed to automate the discovery. None of them, however, can process datasets of typical real-world size, because their resource consumptions and/or execution times exceed effective limits.
In this thesis, we propose novel profiling algorithms that automatically discover the three most popular types of complex metadata, namely INDs, UCCs, and FDs, which all describe different kinds of key dependencies. The task is to extract all valid occurrences from a given relational instance. The three algorithms build upon known techniques from related work and complement them with algorithmic paradigms, such as divide & conquer, hybrid search, progressivity, memory sensitivity, parallelization, and additional pruning to greatly improve upon current limitations. Our experiments show that the proposed algorithms are orders of magnitude faster than related work. They are, in particular, now able to process datasets of real-world, i.e., multiple gigabytes size with reasonable memory and time consumption.
Due to the importance of data profiling in practice, industry has built various profiling tools to support data scientists in their quest for metadata. These tools provide good support for basic statistics and they are also able to validate individual dependencies, but they lack real discovery features even though some fundamental discovery techniques are known for more than 15 years. To close this gap, we developed Metanome, an extensible profiling platform that incorporates not only our own algorithms but also many further algorithms from other researchers. With Metanome, we make our research accessible to all data scientists and IT-professionals that are tasked with data profiling. Besides the actual metadata discovery, the platform also offers support for the ranking and visualization of metadata result sets.
Being able to discover the entire set of syntactically valid metadata naturally introduces the subsequent task of extracting only the semantically meaningful parts. This is challenge, because the complete metadata results are surprisingly large (sometimes larger than the datasets itself) and judging their use case dependent semantic relevance is difficult. To show that the completeness of these metadata sets is extremely valuable for their usage, we finally exemplify the efficient processing and effective assessment of functional dependencies for the use case of schema normalization.
Compound values are not universally supported in virtual machine (VM)-based programming systems and languages. However, providing data structures with value characteristics can be beneficial. On one hand, programming systems and languages can adequately represent physical quantities with compound values and avoid inconsistencies, for example, in representation of large numbers. On the other hand, just-in-time (JIT) compilers, which are often found in VMs, can rely on the fact that compound values are immutable, which is an important property in optimizing programs. Considering this, compound values have an optimization potential that can be put to use by implementing them in VMs in a way that is efficient in memory usage and execution time. Yet, optimized compound values in VMs face certain challenges: to maintain consistency, it should not be observable by the program whether compound values are represented in an optimized way by a VM; an optimization should take into account, that the usage of compound values can exhibit certain patterns at run-time; and that necessary value-incompatible properties due to implementation restrictions should be reduced.
We propose a technique to detect and compress common patterns of compound value usage at run-time to improve memory usage and execution speed. Our approach identifies patterns of frequent compound value references and introduces abbreviated forms for them. Thus, it is possible to store multiple inter-referenced compound values in an inlined memory representation, reducing the overhead of metadata and object references. We extend our approach by a notion of limited mutability, using cells that act as barriers for our approach and provide a location for shared, mutable access with the possibility of type specialization. We devise an extension to our approach that allows us to express automatic unboxing of boxed primitive data types in terms of our initial technique. We show that our approach is versatile enough to express another optimization technique that relies on values, such as Booleans, that are unique throughout a programming system. Furthermore, we demonstrate how to re-use learned usage patterns and optimizations across program runs, thus reducing the performance impact of pattern recognition.
We show in a best-case prototype that the implementation of our approach is feasible and can also be applied to general purpose programming systems, namely implementations of the Racket language and Squeak/Smalltalk. In several micro-benchmarks, we found that our approach can effectively reduce memory consumption and improve execution speed.
This research paper aims to introduce a novel practitioner-oriented and research-based taxonomy of video genres. This taxonomy can serve as a scaffolding strategy to support educators throughout the entire educational system in creating videos for pedagogical purposes. A taxonomy of video genres is essential as videos are highly valued resources among learners. Although the use of videos in education has been extensively researched and well-documented in systematic research reviews, gaps remain in the literature. Predominantly, researchers employ sophisticated quantitative methods and similar approaches to measure the performance of videos. This trend has led to the emergence of a strong learning analytics research tradition with its embedded literature. This body of research includes analysis of performance of videos in online courses such as Massive Open Online Courses (MOOCs). Surprisingly, this same literature is limited in terms of research outlining approaches to designing and creating educational videos, which applies to both video-based learning and online courses. This issue results in a knowledge gap, highlighting the need for developing pedagogical tools and strategies for video making. These can be found in frameworks, guidelines, and taxonomies, which can serve as scaffolding strategies. In contrast, there appears to be very few frameworks available for designing and creating videos for pedagogica purposes, apart from a few well-known frameworks. In this regard, this research paper proposes a novel taxonomy of video genres that educators can utilize when creating videos intended for use in either video-based learning environments or online courses. To create this taxonomy, a large number of videos from online courses were collected and analyzed using a mixed-method research design approach.
Restful choreographies
(2019)
Business process management has become a key instrument to organize work as many companies represent their operations in business process models. Recently, business process choreography diagrams have been introduced as part of the Business Process Model and Notation standard to represent interactions between business processes, run by different partners. When it comes to the interactions between services on the Web, Representational State Transfer (REST) is one of the primary architectural styles employed by web services today. Ideally, the RESTful interactions between participants should implement the interactions defined at the business choreography level.
The problem, however, is the conceptual gap between the business process choreography diagrams and RESTful interactions. Choreography diagrams, on the one hand, are modeled from business domain experts with the purpose of capturing, communicating and, ideally, driving the business interactions. RESTful interactions, on the other hand, depend on RESTful interfaces that are designed by web engineers with the purpose of facilitating the interaction between participants on the internet. In most cases however, business domain experts are unaware of the technology behind web service interfaces and web engineers tend to overlook the overall business goals of web services. While there is considerable work on using process models during process implementation, there is little work on using choreography models to implement interactions between business processes. This thesis addresses this research gap by raising the following research question: How to close the conceptual gap between business process choreographies and RESTful interactions? This thesis offers several research contributions that jointly answer the research question.
The main research contribution is the design of a language that captures RESTful interactions between participants---RESTful choreography modeling language. Formal completeness properties (with respect to REST) are introduced to validate its instances, called RESTful choreographies. A systematic semi-automatic method for deriving RESTful choreographies from business process choreographies is proposed. The method employs natural language processing techniques to translate business interactions into RESTful interactions. The effectiveness of the approach is shown by developing a prototypical tool that evaluates the derivation method over a large number of choreography models.
In addition, the thesis proposes solutions towards implementing RESTful choreographies. In particular, two RESTful service specifications are introduced for aiding, respectively, the execution of choreographies' exclusive gateways and the guidance of RESTful interactions.
Squimera
(2017)
Software development tools that work and behave consistently across different programming languages are helpful for developers, because they do not have to familiarize themselves with new tooling whenever they decide to use a new language. Also, being able to combine multiple programming languages in a program increases reusability, as developers do not have to recreate software frameworks and libraries in the language they develop in and can reuse existing software instead.
However, developers often have a broad choice with regard to tools, some of which are designed for only one specific programming language. Various Integrated Development Environments have support for multiple languages, but are usually unable to provide a consistent programming experience due to different features of language runtimes. Furthermore, common mechanisms that allow reuse of software written in other languages usually use the operating system or a network connection as the abstract layer. Tools, however, often cannot support such indirections well and are therefore less useful in debugging scenarios for example.
In this report, we present a novel approach that aims to improve the programming experience with regard to working with multiple high-level programming languages. As part of this approach, we reuse the tools of a Smalltalk programming environment for other languages and build a multi-language virtual execution environment which is able to provide the same runtime capabilities for all languages.
The prototype system Squimera is an implementation of our approach and demonstrates that it is possible to reuse development tools, so that they behave in the same way across all supported programming languages. In addition, it provides convenient means to reuse and even mix software libraries and frameworks written in different languages without breaking the debugging experience.
Polyglot programming allows developers to use multiple programming languages within the same software project. While it is common to use more than one language in certain programming domains, developers also apply polyglot programming for other purposes such as to re-use software written in other languages. Although established approaches to polyglot programming come with significant limitations, for example, in terms of performance and tool support, developers still use them to be able to combine languages.
Polyglot virtual machines (VMs) such as GraalVM provide a new level of polyglot programming, allowing languages to directly interact with each other. This reduces the amount of glue code needed to combine languages, results in better performance, and enables tools such as debuggers to work across languages. However, only a little research has focused on novel tools that are designed to support developers in building software with polyglot VMs. One reason is that tool-building is often an expensive activity, another one is that polyglot VMs are still a moving target as their use cases and requirements are not yet well understood.
In this thesis, we present an approach that builds on existing self-sustaining programming systems such as Squeak/Smalltalk to enable exploratory programming, a practice for exploring and gathering software requirements, and re-use their extensive tool-building capabilities in the context of polyglot VMs. Based on TruffleSqueak, our implementation for the GraalVM, we further present five case studies that demonstrate how our approach helps tool developers to design and build tools for polyglot programming. We further show that TruffleSqueak can also be used by application developers to build and evolve polyglot applications at run-time and by language and runtime developers to understand the dynamic behavior of GraalVM languages and internals. Since our platform allows all these developers to apply polyglot programming, it can further help to better understand the advantages, use cases, requirements, and challenges of polyglot VMs. Moreover, we demonstrate that our approach can also be applied to other polyglot VMs and that insights gained through it are transferable to other programming systems.
We conclude that our research on tools for polyglot programming is an important step toward making polyglot VMs more approachable for developers in practice. With good tool support, we believe polyglot VMs can make it much more common for developers to take advantage of multiple languages and their ecosystems when building software.
To implement OERs at HEIs sustainably, not just technical infrastructure is required, but also well-trained staff. The University of Graz is in charge of an OER training program for university staff as part of the collaborative project Open Education Austria Advanced (OEAA) with the aim of ensuring long-term competence growth in the use and creation of OERs. The program consists of a MOOC and a guided blended learning format that was evaluated to find out which accompanying teaching and learning concepts can best facilitate targeted competence development. The evaluation of the program shows that learning videos, self-study assignments and synchronous sessions are most useful for the learning process. The results indicate that the creation of OERs is a complex process that can be undergone more effectively in the guided program.
The Security Operations Center (SOC) represents a specialized unit responsible for managing security within enterprises. To aid in its responsibilities, the SOC relies heavily on a Security Information and Event Management (SIEM) system that functions as a centralized repository for all security-related data, providing a comprehensive view of the organization's security posture. Due to the ability to offer such insights, SIEMS are considered indispensable tools facilitating SOC functions, such as monitoring, threat detection, and incident response.
Despite advancements in big data architectures and analytics, most SIEMs fall short of keeping pace. Architecturally, they function merely as log search engines, lacking the support for distributed large-scale analytics. Analytically, they rely on rule-based correlation, neglecting the adoption of more advanced data science and machine learning techniques.
This thesis first proposes a blueprint for next-generation SIEM systems that emphasize distributed processing and multi-layered storage to enable data mining at a big data scale. Next, with the architectural support, it introduces two data mining approaches for advanced threat detection as part of SOC operations.
First, a novel graph mining technique that formulates threat detection within the SIEM system as a large-scale graph mining and inference problem, built on the principles of guilt-by-association and exempt-by-reputation. The approach entails the construction of a Heterogeneous Information Network (HIN) that models shared characteristics and associations among entities extracted from SIEM-related events/logs. Thereon, a novel graph-based inference algorithm is used to infer a node's maliciousness score based on its associations with other entities in the HIN. Second, an innovative outlier detection technique that imitates a SOC analyst's reasoning process to find anomalies/outliers. The approach emphasizes explainability and simplicity, achieved by combining the output of simple context-aware univariate submodels that calculate an outlier score for each entry.
Both approaches were tested in academic and real-world settings, demonstrating high performance when compared to other algorithms as well as practicality alongside a large enterprise's SIEM system.
This thesis establishes the foundation for next-generation SIEM systems that can enhance today's SOCs and facilitate the transition from human-centric to data-driven security operations.
How to reuse inclusive stem Moocs in blended settings to engage young girls to scientific careers
(2023)
The FOSTWOM project (2019–2022), an ERASMUS+ funding, gave METID (Politecnico di Milano) and the MOOC Técnico (Instituto Superior Técnico, University of Lisbon), together with other partners, the opportunity to support the design and creation of gender-inclusive MOOCs. Among other project outputs, we designed a toolkit and a framework that enabled the production of two MOOCs for undergraduate and graduate students in Science, Technology, Engineering and Maths (STEM) and used them as academic content free of gender stereotypes about intellectual ability. In this short paper, the authors aim to 1) briefly share the main outputs of the project; 2) tell the story of how the FOSTWOM approach together with 3) a motivational strategy, the Heroine’s Learning Journey, proved to be effective in the context of rural and marginal areas in Brazil, with young girls as a specific target audience.
This work explores the use of different generative AI tools in the design of MOOC courses. Authors in this experience employed a variety of AI-based tools, including natural language processing tools (e.g. Chat-GPT), and multimedia content authoring tools (e.g. DALLE-2, Midjourney, Tome.ai) to assist in the course design process. The aim was to address the unique challenges of MOOC course design, which includes to create engaging and effective content, to design interactive learning activities, and to assess student learning outcomes. The authors identified positive results with the incorporation of AI-based tools, which significantly improved the quality and effectiveness of MOOC course design. The tools proved particularly effective in analyzing and categorizing course content, identifying key learning objectives, and designing interactive learning activities that engaged students and facilitated learning. Moreover, the use of AI-based tools, streamlined the course design process, significantly reducing the time required to design and prepare the courses. In conclusion, the integration of generative AI tools into the MOOC course design process holds great potential for improving the quality and efficiency of these courses. Researchers and course designers should consider the advantages of incorporating generative AI tools into their design process to enhance their course offerings and facilitate student learning outcomes while also reducing the time and effort required for course development.
Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene- based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~ 4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for mis- sense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood- ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain of function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain of function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
Risiken für Cyberressourcen können durch unbeabsichtigte oder absichtliche Bedrohungen entstehen. Dazu gehören Insider-Bedrohungen von unzufriedenen oder nachlässigen Mitarbeitern und Partnern, eskalierende und aufkommende Bedrohungen aus aller Welt, die stetige Weiterentwicklung der Angriffstechnologien und die Entstehung neuer und zerstörerischer Angriffe. Informationstechnik spielt mittlerweile in allen Bereichen des Lebens eine entscheidende Rolle, u. a. auch im Bereich des Militärs. Ein ineffektiver Schutz von Cyberressourcen kann hier Sicherheitsvorfälle und Cyberattacken erleichtern, welche die kritischen Vorgänge stören, zu unangemessenem Zugriff, Offenlegung, Änderung oder Zerstörung sensibler Informationen führen und somit die nationale Sicherheit, das wirtschaftliche Wohlergehen sowie die öffentliche Gesundheit und Sicherheit gefährden. Oftmals ist allerdings nicht klar, welche Bedrohungen konkret vorhanden sind und welche der kritischen Systemressourcen besonders gefährdet ist.
In dieser Dissertation werden verschiedene Analyseverfahren für Bedrohungen in militärischer Informationstechnik vorgeschlagen und in realen Umgebungen getestet. Dies bezieht sich auf Infrastrukturen, IT-Systeme, Netze und Anwendungen, welche Verschlusssachen (VS)/Staatsgeheimnisse verarbeiten, wie zum Beispiel bei militärischen oder Regierungsorganisationen. Die Besonderheit an diesen Organisationen ist das Konzept der Informationsräume, in denen verschiedene Datenelemente, wie z. B. Papierdokumente und Computerdateien, entsprechend ihrer Sicherheitsempfindlichkeit eingestuft werden, z. B. „STRENG GEHEIM“, „GEHEIM“, „VS-VERTRAULICH“, „VS-NUR-FÜR-DEN-DIENSTGEBRAUCH“ oder „OFFEN“.
Die Besonderheit dieser Arbeit ist der Zugang zu eingestuften Informationen aus verschiedenen Informationsräumen und der Prozess der Freigabe dieser. Jede in der Arbeit entstandene Veröffentlichung wurde mit Angehörigen in der Organisation besprochen, gegengelesen und freigegeben, so dass keine eingestuften Informationen an die Öffentlichkeit gelangen.
Die Dissertation beschreibt zunächst Bedrohungsklassifikationsschemen und Angreiferstrategien, um daraus ein ganzheitliches, strategiebasiertes Bedrohungsmodell für Organisationen abzuleiten. Im weiteren Verlauf wird die Erstellung und Analyse eines Sicherheitsdatenflussdiagramms definiert, welches genutzt wird, um in eingestuften Informationsräumen operationelle Netzknoten zu identifizieren, die aufgrund der Bedrohungen besonders gefährdet sind. Die spezielle, neuartige Darstellung ermöglicht es, erlaubte und verbotene Informationsflüsse innerhalb und zwischen diesen Informationsräumen zu verstehen.
Aufbauend auf der Bedrohungsanalyse werden im weiteren Verlauf die Nachrichtenflüsse der operationellen Netzknoten auf Verstöße gegen Sicherheitsrichtlinien analysiert und die Ergebnisse mit Hilfe des Sicherheitsdatenflussdiagramms anonymisiert dargestellt. Durch Anonymisierung der Sicherheitsdatenflussdiagramme ist ein Austausch mit externen Experten zur Diskussion von Sicherheitsproblematiken möglich.
Der dritte Teil der Arbeit zeigt, wie umfangreiche Protokolldaten der Nachrichtenflüsse dahingehend untersucht werden können, ob eine Reduzierung der Menge an Daten möglich ist. Dazu wird die Theorie der groben Mengen aus der Unsicherheitstheorie genutzt. Dieser Ansatz wird in einer Fallstudie, auch unter Berücksichtigung von möglichen auftretenden Anomalien getestet und ermittelt, welche Attribute in Protokolldaten am ehesten redundant sind.
openHPI
(2022)
On the occasion of the 10th openHPI anniversary, this technical report provides information about the HPI MOOC platform, including its core features, technology, and architecture.
In an introduction, the platform family with all partner platforms is presented; these now amount to nine platforms, including openHPI. This section introduces openHPI as an advisor and research partner in various projects.
In the second chapter, the functionalities and common course formats of the platform are presented. The functionalities are divided into learner and admin features. The learner features section provides detailed information about performance records, courses, and the learning materials of which a course is composed: videos, texts, and quizzes. In addition, the learning materials can be enriched by adding external exercise tools that communicate with the HPI MOOC platform via the Learning Tools Interoperability (LTI) standard. Furthermore, the concept of peer assessments completed the possible learning materials.
The section then proceeds with further information on the discussion forum, a fundamental concept of MOOCs compared to traditional e-learning offers. The section is concluded with a description of the quiz recap, learning objectives, mobile applications, gameful learning, and the help desk.
The next part of this chapter deals with the admin features. The described functionality is restricted to describing the news and announcements, dashboards and statistics, reporting capabilities, research options with A/B testing, the course feed, and the TransPipe tool to support the process of creating automated or manual subtitles. The platform supports a large variety of additional features, but a detailed description of these features goes beyond the scope of this report.
The chapter then elaborates on common course formats and openHPI teaching activities at the HPI. The chapter concludes with some best practices for course design and delivery.
The third chapter provides insights into the technology and architecture behind openHPI. A special characteristic of the openHPI project is the conscious decision to operate the complete application from bare metal to platform development. Hence, the chapter starts with a section about the openHPI Cloud, including detailed information about the data center and devices, the used cloud software OpenStack and Ceph, as well as the openHPI Cloud Service provided for the HPI.
Afterward, a section on the application technology stack and development tooling describes the application infrastructure components, the used automation, the deployment pipeline, and the tools used for monitoring and alerting. The chapter is concluded with detailed information about the technology stack and concrete platform implementation details. The section describes the service-oriented Ruby on Rails application, inter-service communication, and public APIs. It also provides more information on the design system and components used in the application. The section concludes with a discussion of the original microservice architecture, where we share our insights and reasoning for migrating back to a monolithic application.
The last chapter provides a summary and an outlook on the future of digital education.
openHPI
(2022)
Anlässlich des 10-jährigen Jubiläums von openHPI informiert dieser technische Bericht über die HPI-MOOC-Plattform einschließlich ihrer Kernfunktionen, Technologie und Architektur.
In einer Einleitung wird die Plattformfamilie mit allen Partnerplattformen vorgestellt; diese belaufen sich inklusive openHPI aktuell auf neun Plattformen. In diesem Abschnitt wird außerdem gezeigt, wie openHPI als Berater und Forschungspartner in verschiedenen Projekten fungiert.
Im zweiten Kapitel werden die Funktionalitäten und gängigen Kursformate der Plattform präsentiert. Die Funktionalitäten sind in Lerner- und Admin-Funktionen unterteilt. Der Bereich Lernerfunktionen bietet detaillierte Informationen zu Leistungsnachweisen, Kursen und den Lernmaterialien, aus denen sich ein Kurs zusammensetzt: Videos, Texte und Quiz. Darüber hinaus können die Lernmaterialien durch externe Übungstools angereichert werden, die über den Standard Learning Tools Interoperability (LTI) mit der HPI MOOC-Plattform kommunizieren. Das Konzept der Peer-Assessments rundet die möglichen Lernmaterialien ab.
Der Abschnitt geht dann weiter auf das Diskussionsforum ein, das einen grundlegenden Unterschied von MOOCs im Vergleich zu traditionellen E-Learning-Angeboten darstellt. Zum Abschluss des Abschnitts folgen eine Beschreibung von Quiz-Recap, Lernzielen, mobilen Anwendungen, spielerischen Lernens und dem Helpdesk.
Der nächste Teil dieses Kapitels beschäftigt sich mit den Admin-Funktionen. Die Funktionalitätsbeschreibung beschränkt sich Neuigkeiten und Ankündigungen, Dashboards und Statistiken, Berichtsfunktionen, Forschungsoptionen mit A/B-Tests, den Kurs-Feed und das TransPipe-Tool zur Unterstützung beim Erstellen von automatischen oder manuellen Untertiteln. Die Plattform unterstützt außerdem eine Vielzahl zusätzlicher Funktionen, doch eine detaillierte Beschreibung dieser Funktionen würde den Rahmen des Berichts sprengen.
Das Kapitel geht dann auf gängige Kursformate und openHPI-Lehrveranstaltungen am HPI ein, bevor es mit einigen Best Practices für die Gestaltung und Durchführung von Kursen schließt.
Zum Abschluss des technischen Berichts gibt das letzte Kapitel eine Zusammenfassung und einen Ausblick auf die Zukunft der digitalen Bildung.
Ein besonderes Merkmal des openHPI-Projekts ist die bewusste Entscheidung, die komplette Anwendung von den physischen Netzwerkkomponenten bis zur Plattformentwicklung eigenständig zu betreiben. Bei der vorliegenden deutschen Variante handelt es sich um eine gekürzte Übersetzung des technischen Berichts 148, bei der kein Einblick in die Technologien und Architektur von openHPI gegeben wird. Interessierte Leser:innen können im technischen Bericht 148 (vollständige englische Version) detaillierte Informationen zum Rechenzentrum und den Geräten, der Cloud-Software und dem openHPI Cloud Service aber auch zu Infrastruktur-Anwendungskomponenten wie Entwicklungstools, Automatisierung, Deployment-Pipeline und Monitoring erhalten. Außerdem finden sich dort weitere Informationen über den Technologiestack und konkrete Implementierungsdetails der Plattform inklusive der serviceorientierten Ruby on Rails-Anwendung, die Kommunikation zwischen den Diensten, öffentliche APIs, sowie Designsystem und -komponenten. Der Abschnitt schließt mit einer Diskussion über die ursprüngliche Microservice-Architektur und die Migration zu einer monolithischen Anwendung.
Die HPI Schul-Cloud
(2019)
Die digitale Transformation durchdringt alle gesellschaftlichen Ebenen und Felder, nicht zuletzt auch das Bildungssystem. Dieses ist auf die Veränderungen kaum vorbereitet und begegnet ihnen vor allem auf Basis des Eigenengagements seiner Lehrer*innen. Strukturelle Reaktionen auf den Mangel an qualitativ hochwertigen Fortbildungen, auf schlecht ausgestattete Unterrichtsräume und nicht professionell gewartete Computersysteme gibt es erst seit kurzem. Doch auch wenn Beharrungskräfte unter Pädagog*innen verbreitet sind, erfordert die Transformation des Systems Schule auch eine neue Mentalität und neue Arbeits- und Kooperationsformen.
Zeitgemäßer Unterricht benötigt moderne Technologie und zeitgemäße IT-Architekturen. Nur Systeme, die für Lehrer*innen und Schüler*innen problemlos verfügbar, benutzerfreundlich zu bedienen und didaktisch flexibel einsetzbar sind, finden in Schulen Akzeptanz. Hierfür haben wir die HPI Schul-Cloud entwickelt. Sie ermöglicht den einfachen Zugang zu neuesten, professionell gewarteten Anwendungen, verschiedensten digitalen Medien, die Vernetzung verschiedener Lernorte und den rechtssicheren Einsatz von Kommunikations- und Kollaborationstools.
Die Entwicklung der HPI Schul-Cloud ist umso notwendiger, als dass rechtliche Anforderungen - insbesondere aus der Datenschutzgrundverordnung der EU herrührend - den Einsatz von Cloud-Anwendungen, die in der Arbeitswelt verbreitet sind, in Schulen unmöglich machen. Im Bildungsbereich verbreitete Anwendungen sind größtenteils technisch veraltet und nicht benutzerfreundlich.
Dies nötigt die Bundesländer zu kostspieligen Eigenentwicklungen mit Aufwänden im zweistelligen Millionenbereich - Projekte die teilweise gescheitert sind. Dank der modularen Micro-Service-Architektur können die Bundesländer zukünftig auf die HPI Schul-Cloud als technische Grundlage für ihre Eigen- oder Gemeinschaftsprojekte zurückgreifen. Hierfür gilt es, eine nachhaltige Struktur für die Weiterentwicklung der Open-Source-Software HPI Schul-Cloud zu schaffen.
Dieser Bericht beschreibt den Entwicklungsstand und die weiteren Perspektiven des Projekts HPI Schul-Cloud im Januar 2019. 96 Schulen deutschlandweit nutzen die HPI Schul-Cloud, bereitgestellt durch das Hasso-Plattner-Institut. Weitere 45 Schulen und Studienseminare nutzen die Niedersächsische Bildungscloud, die technisch auf der HPI Schul-Cloud basiert. Das vom Bundesministerium für Bildung und Forschung geförderte Projekt läuft in der gegenwärtigen Roll-Out-Phase bis zum 31. Juli 2021. Gemeinsam mit unserem Kooperationspartner MINT-EC streben wir an, die HPI Schul-Cloud möglichst an allen Schulen des Netzwerks einzusetzen.
Digitale Medien sind aus unserem Alltag kaum noch wegzudenken. Einer der zentralsten Bereiche für unsere Gesellschaft, die schulische Bildung, darf hier nicht hintanstehen. Wann immer der Einsatz digital unterstützter Tools pädagogisch sinnvoll ist, muss dieser in einem sicheren Rahmen ermöglicht werden können. Die HPI Schul-Cloud ist dieser Vision gefolgt, die vom Nationalen IT-Gipfel 2016 angestoßen wurde und dem Bericht vorangestellt ist – gefolgt. Sie hat sich in den vergangenen fünf Jahren vom Pilotprojekt zur unverzichtbaren IT-Infrastruktur für zahlreiche Schulen entwickelt. Während der Corona-Pandemie hat sie für viele Tausend Schulen wichtige Unterstützung bei der Umsetzung ihres Bildungsauftrags geboten. Das Ziel, eine zukunftssichere und datenschutzkonforme Infrastruktur zur digitalen Unterstützung des Unterrichts zur Verfügung zu stellen, hat sie damit mehr als erreicht. Aktuell greifen rund 1,4 Millionen Lehrkräfte und Schülerinnen und Schüler bundesweit und an den deutschen Auslandsschulen auf die HPI Schul-Cloud zu.
Blockchain
(2018)
The term blockchain has recently become a buzzword, but only few know what exactly lies behind this approach. According to a survey, issued in the first quarter of 2017, the term is only known by 35 percent of German medium-sized enterprise representatives. However, the blockchain technology is very interesting for the mass media because of its rapid development and global capturing of different markets.
For example, many see blockchain technology either as an all-purpose weapon— which only a few have access to—or as a hacker technology for secret deals in the darknet. The innovation of blockchain technology is found in its successful combination of already existing approaches: such as decentralized networks, cryptography, and consensus models. This innovative concept makes it possible to exchange values in a decentralized system. At the same time, there is no requirement for trust between its nodes (e.g. users).
With this study the Hasso Plattner Institute would like to help readers form their own opinion about blockchain technology, and to distinguish between truly innovative properties and hype.
The authors of the present study analyze the positive and negative properties of the blockchain architecture and suggest possible solutions, which can contribute to the efficient use of the technology. We recommend that every company define a clear target for the intended application, which is achievable with a reasonable cost-benefit ration, before deciding on this technology. Both the possibilities and the limitations of blockchain technology need to be considered. The relevant steps that must be taken in this respect are summarized /summed up for the reader in this study.
Furthermore, this study elaborates on urgent problems such as the scalability of the blockchain, appropriate consensus algorithm and security, including various types of possible attacks and their countermeasures. New blockchains, for example, run the risk of reducing security, as changes to existing technology can lead to lacks in the security and failures.
After discussing the innovative properties and problems of the blockchain technology, its implementation is discussed. There are a lot of implementation opportunities for companies available who are interested in the blockchain realization. The numerous applications have either their own blockchain as a basis or use existing and widespread blockchain systems. Various consortia and projects offer "blockchain-as-a-serviceänd help other companies to develop, test and deploy their own applications.
This study gives a detailed overview of diverse relevant applications and projects in the field of blockchain technology. As this technology is still a relatively young and fast developing approach, it still lacks uniform standards to allow the cooperation of different systems and to which all developers can adhere. Currently, developers are orienting themselves to Bitcoin, Ethereum and Hyperledger systems, which serve as the basis for many other blockchain applications.
The goal is to give readers a clear and comprehensive overview of blockchain technology and its capabilities.
Digitale Technologien bieten erhebliche politische, wirtschaftliche und gesellschaftliche Chancen. Zugleich ist der Begriff digitale Souveränität zu einem Leitmotiv im deutschen Diskurs über digitale Technologien geworden: das heißt, die Fähigkeit des Staates, seine Verantwortung wahrzunehmen und die Befähigung der Gesellschaft – und des Einzelnen – sicherzustellen, die digitale Transformation selbstbestimmt zu gestalten. Exemplarisch für die Herausforderung in Deutschland und Europa, die Vorteile digitaler Technologien zu nutzen und gleichzeitig Souveränitätsbedenken zu berücksichtigen, steht der Bildungssektor. Er umfasst Bildung als zentrales öffentliches Gut, ein schnell aufkommendes Geschäftsfeld und wachsende Bestände an hochsensiblen personenbezogenen Daten. Davon ausgehend beschreibt der Bericht Wege zur Entschärfung des Spannungsverhältnisses zwischen Digitalisierung und Souveränität auf drei verschiedenen Ebenen – Staat, Wirtschaft und Individuum – anhand konkreter technischer Projekte im Bildungsbereich: die HPI Schul-Cloud (staatliche Souveränität), die MERLOT-Datenräume (wirtschaftliche Souveränität) und die openHPI-Plattform (individuelle Souveränität).
Digital technology offers significant political, economic, and societal opportunities. At the same time, the notion of digital sovereignty has become a leitmotif in German discourse: the state’s capacity to assume its responsibilities and safeguard society’s – and individuals’ – ability to shape the digital transformation in a self-determined way. The education sector is exemplary for the challenge faced by Germany, and indeed Europe, of harnessing the benefits of digital technology while navigating concerns around sovereignty. It encompasses education as a core public good, a rapidly growing field of business, and growing pools of highly sensitive personal data. The report describes pathways to mitigating the tension between digitalization and sovereignty at three different levels – state, economy, and individual – through the lens of concrete technical projects in the education sector: the HPI Schul-Cloud (state sovereignty), the MERLOT data spaces (economic sovereignty), and the openHPI platform (individual sovereignty).
Proceedings of the HPI Research School on Service-oriented Systems Engineering 2020 Fall Retreat
(2021)
Design and Implementation of service-oriented architectures imposes a huge number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for design and realization of complex web-based system. Both approaches allow for dynamic application adaptation as well as integration of enterprise application.
Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns.
The annual Ph.D. Retreat of the Research School provides each member the opportunity to present his/her current state of their research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.
The analysis of behavioral models is of high importance for cyber-physical systems, as the systems often encompass complex behavior based on e.g. concurrent components with mutual exclusion or probabilistic failures on demand. The rule-based formalism of probabilistic timed graph transformation systems is a suitable choice when the models representing states of the system can be understood as graphs and timed and probabilistic behavior is important. However, model checking PTGTSs is limited to systems with rather small state spaces.
We present an approach for the analysis of large scale systems modeled as probabilistic timed graph transformation systems by systematically decomposing their state spaces into manageable fragments. To obtain qualitative and quantitative analysis results for a large scale system, we verify that results obtained for its fragments serve as overapproximations for the corresponding results of the large scale system. Hence, our approach allows for the detection of violations of qualitative and quantitative safety properties for the large scale system under analysis. We consider a running example in which we model shuttles driving on tracks of a large scale topology and for which we verify that shuttles never collide and are unlikely to execute emergency brakes. In our evaluation, we apply an implementation of our approach to the running example.
The formal modeling and analysis is of crucial importance for software development processes following the model based approach. We present the formalism of Interval Probabilistic Timed Graph Transformation Systems (IPTGTSs) as a high-level modeling language. This language supports structure dynamics (based on graph transformation), timed behavior (based on clocks, guards, resets, and invariants as in Timed Automata (TA)), and interval probabilistic behavior (based on Discrete Interval Probability Distributions). That is, for the probabilistic behavior, the modeler using IPTGTSs does not need to provide precise probabilities, which are often impossible to obtain, but rather provides a probability range instead from which a precise probability is chosen nondeterministically. In fact, this feature on capturing probabilistic behavior distinguishes IPTGTSs from Probabilistic Timed Graph Transformation Systems (PTGTSs) presented earlier.
Following earlier work on Interval Probabilistic Timed Automata (IPTA) and PTGTSs, we also provide an analysis tool chain for IPTGTSs based on inter-formalism transformations. In particular, we provide in our tool AutoGraph a translation of IPTGTSs to IPTA and rely on a mapping of IPTA to Probabilistic Timed Automata (PTA) to allow for the usage of the Prism model checker. The tool Prism can then be used to analyze the resulting PTA w.r.t. probabilistic real-time queries asking for worst-case and best-case probabilities to reach a certain set of target states in a given amount of time.
Information technology and digital solutions as enablers in the tourism sector require continuous development of skills, as digital transformation is characterized by fast change, complexity and uncertainty. This research investigates how a cMOOC concept could support the tourism industry. A consortium of three universities, a tourism association, and a tourist attraction investigates online learning needs and habits of tourism industry stakeholders in the field of digitalization in a cross-border study in the Baltic Sea region. The multi-national survey (n = 244) reveals a high interest in participating in an online learning community, with two-thirds of respondents seeing opportunities to contributing to such community apart from consuming knowledge. The paper demonstrates preferred ways of learning, motivational and hampering aspects as well as types of possible contributions.
Eskalation des Commitments in Wirtschaftsinformatik Projekten: eine kognitiv-affektive Perspektive
(2024)
Projekte im Bereich der Wirtschaftsinformatik (IS-Projekte) sind von zentraler Bedeutung für die Steuerung von Unternehmensstrategien und die Aufrechterhaltung von Wettbewerbsvorteilen, überschreiten jedoch häufig das Budget, sprengen den Zeitrahmen und weisen eine hohe Misserfolgsquote auf. Diese Dissertation befasst sich mit den psychologischen Grundlagen menschlichen Verhaltens - insbesondere Kognition und Emotion - im Zusammenhang mit einem weit verbreiteten Problem im IS-Projektmanagement: der Tendenz, an fehlgehenden Handlungssträngen festzuhalten, auch Eskalation des Commitments (Englisch: “escalation of commitment” - EoC) genannt.
Mit einem kombinierten Forschungsansatz (dem Mix von qualitativen und quantitativen Methoden) untersuche ich in meiner Dissertation die emotionalen und kognitiven Grundlagen der Entscheidungsfindung hinter eskalierendem Commitment zu scheiternden IS-Projekten und deren Entwicklung über die Zeit. Die Ergebnisse eines psychophysiologischen Laborexperiments liefern Belege auf die Vorhersagen bezüglich der Rolle von negativen und komplexen situativen Emotionen der kognitiven Dissonanz Theorie gegenüber der Coping-Theorie und trägt zu einem besseren Verständnis dafür bei, wie sich Eskalationstendenzen während sequenzieller Entscheidungsfindung aufgrund kognitiver Lerneffekte verändern. Mit Hilfe psychophysiologischer Messungen, einschließlich der Daten-Triangulation zwischen elektrodermaler und kardiovaskulärer Aktivität sowie künstliche Intelligenz-basierter Analyse von Gesichtsmikroexpressionen, enthüllt diese Forschung physiologische Marker für eskalierendes Commitment. Ergänzend zu dem Experiment zeigt eine qualitative Analyse text-basierter Reflexionen während der Eskalationssituationen, dass Entscheidungsträger verschiedene kognitive Begründungsmuster verwenden, um eskalierende Verhaltensweisen zu rechtfertigen, die auf eine Sequenz von vier unterschiedlichen kognitiven Phasen schließen lassen.
Durch die Integration von qualitativen und quantitativen Erkenntnissen entwickelt diese Dissertation ein umfassendes theoretisches Model dafür, wie Kognition und Emotion eskalierendes Commitment über die Zeit beeinflussen. Ich schlage vor, dass eskalierendes Commitment eine zyklische Anpassung von Denkmodellen ist, die sich durch Veränderungen in kognitiven Begründungsmustern, Variationen im zeitlichen Kognitionsmodus und Interaktionen mit situativen Emotionen und deren Erwartung auszeichnet. Der Hauptbeitrag dieser Arbeit liegt in der Entflechtung der emotionalen und kognitiven Mechanismen, die eskalierendes Commitment im Kontext von IS-Projekten antreiben. Die Erkenntnisse tragen dazu bei, die Qualität von Entscheidungen unter Unsicherheit zu verbessern und liefern die Grundlage für die Entwicklung von Deeskalationsstrategien. Beteiligte an „in Schieflage geratenden“ IS-Projekten sollten sich der Tendenz auf fehlgeschlagenen Aktionen zu beharren und der Bedeutung der zugrundeliegenden emotionalen und kognitiven Dynamiken bewusst sein.
Virtualizing physical space
(2021)
The true cost for virtual reality is not the hardware, but the physical space it requires, as a one-to-one mapping of physical space to virtual space allows for the most immersive way of navigating in virtual reality. Such “real-walking” requires physical space to be of the same size and the same shape of the virtual world represented. This generally prevents real-walking applications from running on any space that they were not designed for.
To reduce virtual reality’s demand for physical space, creators of such applications let users navigate virtual space by means of a treadmill, altered mappings of physical to virtual space, hand-held controllers, or gesture-based techniques. While all of these solutions succeed at reducing virtual reality’s demand for physical space, none of them reach the same level of immersion that real-walking provides.
Our approach is to virtualize physical space: instead of accessing physical space directly, we allow applications to express their need for space in an abstract way, which our software systems then map to the physical space available. We allow real-walking applications to run in spaces of different size, different shape, and in spaces containing different physical objects. We also allow users immersed in different virtual environments to share the same space.
Our systems achieve this by using a tracking volume-independent representation of real-walking experiences — a graph structure that expresses the spatial and logical relationships between virtual locations, virtual elements contained within those locations, and user interactions with those elements. When run in a specific physical space, this graph representation is used to define a custom mapping of the elements of the virtual reality application and the physical space by parsing the graph using a constraint solver. To re-use space, our system splits virtual scenes and overlap virtual geometry. The system derives this split by means of hierarchically clustering of our virtual objects as nodes of our bi-partite directed graph that represents the logical ordering of events of the experience. We let applications express their demands for physical space and use pre-emptive scheduling between applications to have them share space. We present several application examples enabled by our system. They all enable real-walking, despite being mapped to physical spaces of different size and shape, containing different physical objects or other users.
We see substantial real-world impact in our systems. Today’s commercial virtual reality applications are generally designing to be navigated using less immersive solutions, as this allows them to be operated on any tracking volume. While this is a commercial necessity for the developers, it misses out on the higher immersion offered by real-walking. We let developers overcome this hurdle by allowing experiences to bring real-walking to any tracking volume, thus potentially bringing real-walking to consumers.
Die eigentlichen Kosten für Virtual Reality Anwendungen entstehen nicht primär durch die erforderliche Hardware, sondern durch die Nutzung von physischem Raum, da die eins-zu-eins Abbildung von physischem auf virtuellem Raum die immersivste Art von Navigation ermöglicht. Dieses als „Real-Walking“ bezeichnete Erlebnis erfordert hinsichtlich Größe und Form eine Entsprechung von physischem Raum und virtueller Welt. Resultierend daraus können Real-Walking-Anwendungen nicht an Orten angewandt werden, für die sie nicht entwickelt wurden.
Um den Bedarf an physischem Raum zu reduzieren, lassen Entwickler von Virtual Reality-Anwendungen ihre Nutzer auf verschiedene Arten navigieren, etwa mit Hilfe eines Laufbandes, verfälschten Abbildungen von physischem zu virtuellem Raum, Handheld-Controllern oder gestenbasierten Techniken. All diese Lösungen reduzieren zwar den Bedarf an physischem Raum, erreichen jedoch nicht denselben Grad an Immersion, den Real-Walking bietet.
Unser Ansatz zielt darauf, physischen Raum zu virtualisieren: Anstatt auf den physischen Raum direkt zuzugreifen, lassen wir Anwendungen ihren Raumbedarf auf abstrakte Weise formulieren, den unsere Softwaresysteme anschließend auf den verfügbaren physischen Raum abbilden. Dadurch ermöglichen wir Real-Walking-Anwendungen Räume mit unterschiedlichen Größen und Formen und Räume, die unterschiedliche physische Objekte enthalten, zu nutzen. Wir ermöglichen auch die zeitgleiche Nutzung desselben Raums durch mehrere Nutzer verschiedener Real-Walking-Anwendungen.
Unsere Systeme erreichen dieses Resultat durch eine Repräsentation von Real-Walking-Erfahrungen, die unabhängig sind vom gegebenen Trackingvolumen – eine Graphenstruktur, die die räumlichen und logischen Beziehungen zwischen virtuellen Orten, den virtuellen Elementen innerhalb dieser Orte, und Benutzerinteraktionen mit diesen Elementen, ausdrückt. Bei der Instanziierung der Anwendung in einem bestimmten physischen Raum wird diese Graphenstruktur und ein Constraint Solver verwendet, um eine individuelle Abbildung der virtuellen Elemente auf den physischen Raum zu erreichen. Zur mehrmaligen Verwendung des Raumes teilt unser System virtuelle Szenen und überlagert virtuelle Geometrie. Das System leitet diese Aufteilung anhand eines hierarchischen Clusterings unserer virtuellen Objekte ab, die als Knoten unseres bi-partiten, gerichteten Graphen die logische Reihenfolge aller Ereignisse repräsentieren. Wir verwenden präemptives Scheduling zwischen den Anwendungen für die zeitgleiche Nutzung von physischem Raum. Wir stellen mehrere Anwendungsbeispiele vor, die Real-Walking ermöglichen – in physischen Räumen mit unterschiedlicher Größe und Form, die verschiedene physische Objekte oder weitere Nutzer enthalten.
Wir sehen in unseren Systemen substantielles Potential. Heutige Virtual Reality-Anwendungen sind bisher zwar so konzipiert, dass sie auf einem beliebigen Trackingvolumen betrieben werden können, aber aus kommerzieller Notwendigkeit kein Real-Walking beinhalten. Damit entgeht Entwicklern die Gelegenheit eine höhere Immersion herzustellen. Indem wir es ermöglichen, Real-Walking auf jedes Trackingvolumen zu bringen, geben wir Entwicklern die Möglichkeit Real-Walking zu ihren Nutzern zu bringen.
Business process management (BPM) deals with modeling, executing, monitoring, analyzing, and improving business processes. During execution, the process communicates with its environment to get relevant contextual information represented as events. Recent development of big data and the Internet of Things (IoT) enables sources like smart devices and sensors to generate tons of events which can be filtered, grouped, and composed to trigger and drive business processes.
The industry standard Business Process Model and Notation (BPMN) provides several event constructs to capture the interaction possibilities between a process and its environment, e.g., to instantiate a process, to abort an ongoing activity in an exceptional situation, to take decisions based on the information carried by the events, as well as to choose among the alternative paths for further process execution. The specifications of such interactions are termed as event handling. However, in a distributed setup, the event sources are most often unaware of the status of process execution and therefore, an event is produced irrespective of the process being ready to consume it. BPMN semantics does not support such scenarios and thus increases the chance of processes getting delayed or getting in a deadlock by missing out on event occurrences which might still be relevant.
The work in this thesis reviews the challenges and shortcomings of integrating real-world events into business processes, especially the subscription management. The basic integration is achieved with an architecture consisting of a process modeler, a process engine, and an event processing platform. Further, points of subscription and unsubscription along the process execution timeline are defined for different BPMN event constructs. Semantic and temporal dependencies among event subscription, event occurrence, event consumption and event unsubscription are considered. To this end, an event buffer with policies for updating the buffer, retrieving the most suitable event for the current process instance, and reusing the event has been discussed that supports issuing of early subscription.
The Petri net mapping of the event handling model provides our approach with a translation of semantics from a business process perspective. Two applications based on this formal foundation are presented to support the significance of different event handling configurations on correct process execution and reachability of a process path. Prototype implementations of the approaches show that realizing flexible event handling is feasible with minor extensions of off-the-shelf process engines and event platforms.
The MOOC-CEDIA Observatory
(2021)
In the last few years, an important amount of Massive Open Online Courses (MOOCS) has been made available to the worldwide community, mainly by European and North American universities (i.e. United States). Since its emergence, the adoption of these educational resources has been widely studied by several research groups and universities with the aim of understanding their evolution and impact in educational models, through the time. In the case of Latin America, data from the MOOC-UC Observatory (updated until 2018) shows that, the adoption of these courses by universities in the region has been slow and heterogeneous. In the specific case of Ecuador, although some data is available, there is lack of information regarding the construction, publication and/or adoption of such courses by universities in the country. Moreover, there are not updated studies designed to identify and analyze the barriers and factors affecting the adoption of MOOCs in the country. The aim of this work is to present the MOOC-CEDIA Observatory, a web platform that offers interactive visualizations on the adoption of MOOCs in Ecuador. The main results of the study show that: (1) until 2020 there have been 99 MOOCs in Ecuador, (2) the domains of MOOCs are mostly related to applied sciences, social sciences and natural sciences, with the humanities being the least covered, (3) Open edX and Moodle are the most widely used platforms to deploy such courses. It is expected that the conclusions drawn from this analysis, will allow the design of recommendations aimed to promote the creation and use of quality MOOCs in Ecuador and help institutions to chart the route for their adoption, both for internal use by their community but also by society in general.