publish.UP Search

On the operationalization of graph queries with generalized discrimination networks (2016)

Beyhl, Thomas ; Blouin, Dominique ; Giese, Holger ; Lambers, Leen

Graph queries have lately gained increased interest due to application areas such as social networks, biological networks, or model queries. For the relational database case the relational algebra and generalized discrimination networks have been studied to find appropriate decompositions into subqueries and ordering of these subqueries for query evaluation or incremental updates of query results. For graph database queries however there is no formal underpinning yet that allows us to find such suitable operationalizations. Consequently, we suggest a simple operational concept for the decomposition of arbitrary complex queries into simpler subqueries and the ordering of these subqueries in form of generalized discrimination networks for graph queries inspired by the relational case. The approach employs graph transformation rules for the nodes of the network and thus we can employ the underlying theory. We further show that the proposed generalized discrimination networks have the same expressive power as nested graph conditions.

Proceedings of the 10th Ph.D. Retreat of the HPI Research School on Service-oriented Systems Engineering (2016)

Design and Implementation of service-oriented architectures imposes a huge number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for design and realization of complex web-based system. Both approaches allow for dynamic application adaptation as well as integration of enterprise application. Commonly used technologies, such as J2EE and .NET, form de facto standards for the realization of complex distributed systems. Evolution of component systems has lead to web services and service-based architectures. This has been manifested in a multitude of industry standards and initiatives such as XML, WSDL UDDI, SOAP, etc. All these achievements lead to a new and promising paradigm in IT systems engineering which proposes to design complex software solutions as collaboration of contractually defined software services. Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns. The annual Ph.D. Retreat of the Research School provides each member the opportunity to present his/her current state of their research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the research school, this technical report covers a wide range of topics. These include but are not limited to: Human Computer Interaction and Computer Vision as Service; Service-oriented Geovisualization Systems; Algorithm Engineering for Service-oriented Systems; Modeling and Verification of Self-adaptive Service-oriented Systems; Tools and Methods for Software Engineering in Service-oriented Systems; Security Engineering of Service-based IT Systems; Service-oriented Information Systems; Evolutionary Transition of Enterprise Applications to Service Orientation; Operating System Abstractions for Service-oriented Computing; and Services Specification, Composition, and Enactment.

An interactive platform to simulate dynamic pricing competition on online marketplaces (2017)

Serth, Sebastian ; Podlesny, Nikolai ; Bornstein, Marvin ; Lindemann, Jan ; Latt, Johanna ; Selke, Jan ; Schlosser, Rainer ; Boissier, Martin ; Uflacker, Matthias

E-commerce marketplaces are highly dynamic with constant competition. While this competition is challenging for many merchants, it also provides plenty of opportunities, e.g., by allowing them to automatically adjust prices in order to react to changing market situations. For practitioners however, testing automated pricing strategies is time-consuming and potentially hazardously when done in production. Researchers, on the other side, struggle to study how pricing strategies interact under heavy competition. As a consequence, we built an open continuous time framework to simulate dynamic pricing competition called Price Wars. The microservice-based architecture provides a scalable platform for large competitions with dozens of merchants and a large random stream of consumers. Our platform stores each event in a distributed log. This allows to provide different performance measures enabling users to compare profit and revenue of various repricing strategies in real-time. For researchers, price trajectories are shown which ease evaluating mutual price reactions of competing strategies. Furthermore, merchants can access historical marketplace data and apply machine learning. By providing a set of customizable, artificial merchants, users can easily simulate both simple rule-based strategies as well as sophisticated data-driven strategies using demand learning to optimize their pricing strategies.

Next generation cooperative wearables (2017)

Seiffert, Martin ; Holstein, Flavio ; Schlosser, Rainer ; Schiller, Jochen

Currently available wearables are usually based on a single sensor node with integrated capabilities for classifying different activities. The next generation of cooperative wearables could be able to identify not only activities, but also to evaluate them qualitatively using the data of several sensor nodes attached to the body, to provide detailed feedback for the improvement of the execution. Especially within the application domains of sports and health-care, such immediate feedback to the execution of body movements is crucial for (re-) learning and improving motor skills. To enable such systems for a broad range of activities, generalized approaches for human motion assessment within sensor networks are required. In this paper, we present a generalized trainable activity assessment chain (AAC) for the online assessment of periodic human activity within a wireless body area network. AAC evaluates the execution of separate movements of a prior trained activity on a fine-grained quality scale. We connect qualitative assessment with human knowledge by projecting the AAC on the hierarchical decomposition of motion performed by the human body as well as establishing the assessment on a kinematic evaluation of biomechanically distinct motion fragments. We evaluate AAC in a real-world setting and show that AAC successfully delimits the movements of correctly performed activity from faulty executions and provides detailed reasons for the activity assessment.

Mixed-projection treemaps (2017)

Limberger, Daniel ; Scheibel, Willy ; Trapp, Matthias ; Döllner, Jürgen Roland Friedrich

This paper presents a novel technique for combining 2D and 2.5D treemaps using multi-perspective views to leverage the advantages of both treemap types. It enables a new form of overview+detail visualization for tree-structured data and contributes new concepts for real-time rendering of and interaction with treemaps. The technique operates by tilting the graphical elements representing inner nodes using affine transformations and animated state transitions. We explain how to mix orthogonal and perspective projections within a single treemap. Finally, we show application examples that benefit from the reduced interaction overhead.

Implementing record and refinement for debugging timing-dependent communication (2017)

Felgentreff, Tim ; Perscheid, Michael ; Hirschfeld, Robert

Distributed applications are hard to debug because timing-dependent network communication is a source of non-deterministic behavior. Current approaches to debug non deterministic failures include post-mortem debugging as well as record and replay. However, the first impairs system performance to gather data, whereas the latter requires developers to understand the timing-dependent communication at a lower level of abstraction than they develop at. Furthermore, both approaches require intrusive core library modifications to gather data from live systems. In this paper, we present the Peek-At-Talk debugger for investigating non-deterministic failures with low overhead in a systematic, top-down method, with a particular focus on tool-building issues in the following areas: First, we show how our debugging framework Path Tools guides developers from failures to their root causes and gathers run-time data with low overhead. Second, we present Peek-At-Talk, an extension to our Path Tools framework to record non-deterministic communication and refine behavioral data that connects source code with network events. Finally, we scope changes to the core library to record network communication without impacting other network applications.

Formal models and analysis for self-adaptive cyber-physical systems (2017)

Giese, Holger

In this extended abstract, we will analyze the current challenges for the envisioned Self-Adaptive CPS. In addition, we will outline our results to approach these challenges with SMARTSOS [10] a generic approach based on extensions of graph transformation systems employing open and adaptive collaborations and models at runtime for trustworthy self-adaptation, self-organization, and evolution of the individual systems and the system-of-systems level taking the independent development, operation, management, and evolution of these systems into account.

SPARQL with property paths on the Web (2017)

Hartig, Olaf ; Pirrò, Giuseppe

Linked Data on the Web represents an immense source of knowledge suitable to be automatically processed and queried. In this respect, there are different approaches for Linked Data querying that differ on the degree of centralization adopted. On one hand, the SPARQL query language, originally defined for querying single datasets, has been enhanced with features to query federations of datasets; however, this attempt is not sufficient to cope with the distributed nature of data sources available as Linked Data. On the other hand, extensions or variations of SPARQL aim to find trade-offs between centralized and fully distributed querying. The idea is to partially move the computational load from the servers to the clients. Despite the variety and the relative merits of these approaches, as of today, there is no standard language for querying Linked Data on theWeb. A specific requirement for such a language to capture the distributed, graph-like nature of Linked Data sources on the Web is a support of graph navigation. Recently, SPARQL has been extended with a navigational feature called property paths (PPs). However, the semantics of SPARQL restricts the scope of navigation via PPs to single RDF graphs. This restriction limits the applicability of PPs for querying distributed Linked Data sources on the Web. To fill this gap, in this paper we provide formal foundations for evaluating PPs on the Web, thus contributing to the definition of a query language for Linked Data. We first introduce a family of reachability-based query semantics for PPs that distinguish between navigation on the Web and navigation at the data level. Thereafter, we consider another, alternative query semantics that couples Web graph navigation and data level navigation; we call it context-based semantics. Given these semantics, we find that for some PP-based SPARQL queries a complete evaluation on the Web is not possible. To study this phenomenon we introduce a notion of Web-safeness of queries, and prove a decidable syntactic property that enables systems to identify queries that areWeb-safe. In addition to establishing these formal foundations, we conducted an experimental comparison of the context-based semantics and a reachability- based semantics. Our experiments show that when evaluating a PP-based query under the context-based semantics one experiences a significantly smaller number of dereferencing operations, but the computed query result may contain less solutions.

Optimal price reaction strategies in the presence of active and passive competitors (2017)

Schlosser, Rainer ; Boissier, Martin

Many markets are characterized by pricing competition. Typically, competitors are involved that adjust their prices in response to other competitors with different frequencies. We analyze stochastic dynamic pricing models under competition for the sale of durable goods. Given a competitor’s pricing strategy, we show how to derive optimal response strategies that take the anticipated competitor’s price adjustments into account. We study resulting price cycles and the associated expected long-term profits. We show that reaction frequencies have a major impact on a strategy’s performance. In order not to act predictable our model also allows to include randomized reaction times. Additionally, we study to which extent optimal response strategies of active competitors are affected by additional passive competitors that use constant prices. It turns out that optimized feedback strategies effectively avoid a decline in price. They help to gain profits, especially, when aggressive competitor s are involved.

Assessing NUMA performance based on hardware event counters (2017)

Plauth, Max ; Sterz, Christoph ; Eberhardt, Felix ; Feinbube, Frank ; Polze, Andreas

Cost models play an important role for the efficient implementation of software systems. These models can be embedded in operating systems and execution environments to optimize execution at run time. Even though non-uniform memory access (NUMA) architectures are dominating today's server landscape, there is still a lack of parallel cost models that represent NUMA system sufficiently. Therefore, the existing NUMA models are analyzed, and a two-step performance assessment strategy is proposed that incorporates low-level hardware counters as performance indicators. To support the two-step strategy, multiple tools are developed, all accumulating and enriching specific hardware event counter information, to explore, measure, and visualize these low-overhead performance indicators. The tools are showcased and discussed alongside specific experiments in the realm of performance assessment.

FaVe: Modeling IPv6 firewalls for fast formal verification (2017)

Lorenz, Claas ; Kiekheben, Sebastian ; Schnor, Bettina

As virtualization drives the automation of networking, the validation of security properties becomes more and more challenging eventually ruling out manual inspections. While formal verification in Software Defined Networks is provided by comprehensive tools with high speed reverification capabilities like NetPlumber for instance, the presence of middlebox functionality like firewalls is not considered. Also, they lack the ability to handle dynamic protocol elements like IPv6 extension header chains. In this work, we provide suitable modeling abstractions to enable both - the inclusion of firewalls and dynamic protocol elements. We exemplarily model the Linux ip6tables/netfilter packet filter and also provide abstractions for an application layer gateway. Finally, we present a prototype of our formal verification system FaVe.

A framework for incremental view graph maintenance (2017)

Beyhl, Thomas

Nowadays, graph data models are employed, when relationships between entities have to be stored and are in the scope of queries. For each entity, this graph data model locally stores relationships to adjacent entities. Users employ graph queries to query and modify these entities and relationships. These graph queries employ graph patterns to lookup all subgraphs in the graph data that satisfy certain graph structures. These subgraphs are called graph pattern matches. However, this graph pattern matching is NP-complete for subgraph isomorphism. Thus, graph queries can suffer a long response time, when the number of entities and relationships in the graph data or the graph patterns increases. One possibility to improve the graph query performance is to employ graph views that keep ready graph pattern matches for complex graph queries for later retrieval. However, these graph views must be maintained by means of an incremental graph pattern matching to keep them consistent with the graph data from which they are derived, when the graph data changes. This maintenance adds subgraphs that satisfy a graph pattern to the graph views and removes subgraphs that do not satisfy a graph pattern anymore from the graph views. Current approaches for incremental graph pattern matching employ Rete networks. Rete networks are discrimination networks that enumerate and maintain all graph pattern matches of certain graph queries by employing a network of condition tests, which implement partial graph patterns that together constitute the overall graph query. Each condition test stores all subgraphs that satisfy the partial graph pattern. Thus, Rete networks suffer high memory consumptions, because they store a large number of partial graph pattern matches. But, especially these partial graph pattern matches enable Rete networks to update the stored graph pattern matches efficiently, because the network maintenance exploits the already stored partial graph pattern matches to find new graph pattern matches. However, other kinds of discrimination networks exist that can perform better in time and space than Rete networks. Currently, these other kinds of networks are not used for incremental graph pattern matching. This thesis employs generalized discrimination networks for incremental graph pattern matching. These discrimination networks permit a generalized network structure of condition tests to enable users to steer the trade-off between memory consumption and execution time for the incremental graph pattern matching. For that purpose, this thesis contributes a modeling language for the effective definition of generalized discrimination networks. Furthermore, this thesis contributes an efficient and scalable incremental maintenance algorithm, which updates the (partial) graph pattern matches that are stored by each condition test. Moreover, this thesis provides a modeling evaluation, which shows that the proposed modeling language enables the effective modeling of generalized discrimination networks. Furthermore, this thesis provides a performance evaluation, which shows that a) the incremental maintenance algorithm scales, when the graph data becomes large, and b) the generalized discrimination network structures can outperform Rete network structures in time and space at the same time for incremental graph pattern matching.

k-Inductive invariant checking for graph transformation systems (2017)

Dyck, Johannes ; Giese, Holger

While offering significant expressive power, graph transformation systems often come with rather limited capabilities for automated analysis, particularly if systems with many possible initial graphs and large or infinite state spaces are concerned. One approach that tries to overcome these limitations is inductive invariant checking. However, the verification of inductive invariants often requires extensive knowledge about the system in question and faces the approach-inherent challenges of locality and lack of context. To address that, this report discusses k-inductive invariant checking for graph transformation systems as a generalization of inductive invariants. The additional context acquired by taking multiple (k) steps into account is the key difference to inductive invariant checking and is often enough to establish the desired invariants without requiring the iterative development of additional properties. To analyze possibly infinite systems in a finite fashion, we introduce a symbolic encoding for transformation traces using a restricted form of nested application conditions. As its central contribution, this report then presents a formal approach and algorithm to verify graph constraints as k-inductive invariants. We prove the approach's correctness and demonstrate its applicability by means of several examples evaluated with a prototypical implementation of our algorithm.

Probabilistic timed graph transformation systems (2017)

Maximova, Maria ; Giese, Holger ; Krause, Christian

Today, software has become an intrinsic part of complex distributed embedded real-time systems. The next generation of embedded real-time systems will interconnect the today unconnected systems via complex software parts and the service-oriented paradigm. Therefore besides timed behavior and probabilistic behaviour also structure dynamics, where the architecture can be subject to changes at run-time, e.g. when dynamic binding of service end-points is employed or complex collaborations are established dynamically, is required. However, a modeling and analysis approach that combines all these necessary aspects does not exist so far. To fill the identified gap, we propose Probabilistic Timed Graph Transformation Systems (PTGTSs) as a high-level description language that supports all the necessary aspects of structure dynamics, timed behavior, and probabilistic behavior. We introduce the formal model of PTGTSs in this paper and present a mapping of models with finite state spaces to probabilistic timed automata (PTA) that allows to use the PRISM model checker to analyze PTGTS models with respect to PTCTL properties.

Symbolic model generation for graph properties (2017)

Schneider, Sven ; Lambers, Leen ; Orejas, Fernando

Graphs are ubiquitous in Computer Science. For this reason, in many areas, it is very important to have the means to express and reason about graph properties. In particular, we want to be able to check automatically if a given graph property is satisfiable. Actually, in most application scenarios it is desirable to be able to explore graphs satisfying the graph property if they exist or even to get a complete and compact overview of the graphs satisfying the graph property. We show that the tableau-based reasoning method for graph properties as introduced by Lambers and Orejas paves the way for a symbolic model generation algorithm for graph properties. Graph properties are formulated in a dedicated logic making use of graphs and graph morphisms, which is equivalent to firstorder logic on graphs as introduced by Courcelle. Our parallelizable algorithm gradually generates a finite set of so-called symbolic models, where each symbolic model describes a set of finite graphs (i.e., finite models) satisfying the graph property. The set of symbolic models jointly describes all finite models for the graph property (complete) and does not describe any finite graph violating the graph property (sound). Moreover, no symbolic model is already covered by another one (compact). Finally, the algorithm is able to generate from each symbolic model a minimal finite model immediately and allows for an exploration of further finite models. The algorithm is implemented in the new tool AutoGraph.

Proceedings of the Fourth HPI Cloud Symposium "Operating the Cloud" 2016 (2017)

Klauck, Stefan ; Maschler, Fabian ; Tausche, Karsten

Every year, the Hasso Plattner Institute (HPI) invites guests from industry and academia to a collaborative scientific workshop on the topic Every year, the Hasso Plattner Institute (HPI) invites guests from industry and academia to a collaborative scientific workshop on the topic "Operating the Cloud". Our goal is to provide a forum for the exchange of knowledge and experience between industry and academia. Co-located with the event is the HPI's Future SOC Lab day, which offers an additional attractive and conducive environment for scientific and industry related discussions. "Operating the Cloud" aims to be a platform for productive interactions of innovative ideas, visions, and upcoming technologies in the field of cloud operation and administration. On the occasion of this symposium we called for submissions of research papers and practitioner's reports. A compilation of the research papers realized during the fourth HPI cloud symposium "Operating the Cloud" 2016 are published in this proceedings. We thank the authors for exciting presentations and insights into their current work and research. Moreover, we look forward to more interesting submissions for the upcoming symposium later in the year. Every year, the Hasso Plattner Institute (HPI) invites guests from industry and academia to a collaborative scientific workshop on the topic "Operating the Cloud". Our goal is to provide a forum for the exchange of knowledge and experience between industry and academia. Co-located with the event is the HPI's Future SOC Lab day, which offers an additional attractive and conducive environment for scientific and industry related discussions. "Operating the Cloud" aims to be a platform for productive interactions of innovative ideas, visions, and upcoming technologies in the field of cloud operation and administration.

Automatic verification of behavior preservation at the transformation level for relational model transformation (2017)

Dyck, Johannes ; Giese, Holger ; Lambers, Leen

The correctness of model transformations is a crucial element for model-driven engineering of high quality software. In particular, behavior preservation is the most important correctness property avoiding the introduction of semantic errors during the model-driven engineering process. Behavior preservation verification techniques either show that specific properties are preserved, or more generally and complex, they show some kind of behavioral equivalence or refinement between source and target model of the transformation. Both kinds of behavior preservation verification goals have been presented with automatic tool support for the instance level, i.e. for a given source and target model specified by the model transformation. However, up until now there is no automatic verification approach available at the transformation level, i.e. for all source and target models specified by the model transformation. In this report, we extend our results presented in [27] and outline a new sophisticated approach for the automatic verification of behavior preservation captured by bisimulation resp. simulation for model transformations specified by triple graph grammars and semantic definitions given by graph transformation rules. In particular, we show that the behavior preservation problem can be reduced to invariant checking for graph transformation and that the resulting checking problem can be addressed by our own invariant checker even for a complex example where a sequence chart is transformed into communicating automata. We further discuss today's limitations of invariant checking for graph transformation and motivate further lines of future work in this direction.

Improving hosted continuous integration services (2017)

Weyand, Christopher ; Chromik, Jonas ; Wolf, Lennard ; Kötte, Steffen ; Haase, Konstantin ; Felgentreff, Tim ; Lincke, Jens ; Hirschfeld, Robert

Developing large software projects is a complicated task and can be demanding for developers. Continuous integration is common practice for reducing complexity. By integrating and testing changes often, changesets are kept small and therefore easily comprehensible. Travis CI is a service that offers continuous integration and continuous deployment in the cloud. Software projects are build, tested, and deployed using the Travis CI infrastructure without interrupting the development process. This report describes how Travis CI works, presents how time-driven, periodic building is implemented as well as how CI data visualization can be done, and proposes a way of dealing with dependency problems.

Towards a system for complex analysis of security events in large-scale networks (2017)

Sapegin, Andrey ; Jaeger, David ; Cheng, Feng ; Meinel, Christoph

After almost two decades of development, modern Security Information and Event Management (SIEM) systems still face issues with normalisation of heterogeneous data sources, high number of false positive alerts and long analysis times, especially in large-scale networks with high volumes of security events. In this paper, we present our own prototype of SIEM system, which is capable of dealing with these issues. For efficient data processing, our system employs in-memory data storage (SAP HANA) and our own technologies from the previous work, such as the Object Log Format (OLF) and high-speed event normalisation. We analyse normalised data using a combination of three different approaches for security analysis: misuse detection, query-based analytics, and anomaly detection. Compared to the previous work, we have significantly improved our unsupervised anomaly detection algorithms. Most importantly, we have developed a novel hybrid outlier detection algorithm that returns ranked clusters of anomalies. It lets an operator of a SIEM system to concentrate on the several top-ranked anomalies, instead of digging through an unsorted bundle of suspicious events. We propose to use anomaly detection in a combination with signatures and queries, applied on the same data, rather than as a full replacement for misuse detection. In this case, the majority of attacks will be captured with misuse detection, whereas anomaly detection will highlight previously unknown behaviour or attacks. We also propose that only the most suspicious event clusters need to be checked by an operator, whereas other anomalies, including false positive alerts, do not need to be explicitly checked if they have a lower ranking. We have proved our concepts and algorithms on a dataset of 160 million events from a network segment of a big multinational company and suggest that our approach and methods are highly relevant for modern SIEM systems.

Matching cognitively sympathetic individual styles to develop collective intelligence in digital communities (2017)

Chujfi-La-Roche, Salim ; Meinel, Christoph

Creation, collection and retention of knowledge in digital communities is an activity that currently requires being explicitly targeted as a secure method of keeping intellectual capital growing in the digital era. In particular, we consider it relevant to analyze and evaluate the empathetic cognitive personalities and behaviors that individuals now have with the change from face-to-face communication (F2F) to computer-mediated communication (CMC) online. This document proposes a cyber-humanistic approach to enhance the traditional SECI knowledge management model. A cognitive perception is added to its cyclical process following design thinking interaction, exemplary for improvement of the method in which knowledge is continuously created, converted and shared. In building a cognitive-centered model, we specifically focus on the effective identification and response to cognitive stimulation of individuals, as they are the intellectual generators and multiplicators of knowledge in the online environment. Our target is to identify how geographically distributed-digital-organizations should align the individual's cognitive abilities to promote iteration and improve interaction as a reliable stimulant of collective intelligence. The new model focuses on analyzing the four different stages of knowledge processing, where individuals with sympathetic cognitive personalities can significantly boost knowledge creation in a virtual social system. For organizations, this means that multidisciplinary individuals can maximize their extensive potential, by externalizing their knowledge in the correct stage of the knowledge creation process, and by collaborating with their appropriate sympathetically cognitive remote peers.

Embedded smart home (2017)

Malchow, Martin ; Renz, Jan ; Bauer, Matthias ; Meinel, Christoph

The popularity of MOOCs has increased considerably in the last years. A typical MOOC course consists of video content, self tests after a video and homework, which is normally in multiple choice format. After solving this homeworks for every week of a MOOC, the final exam certificate can be issued when the student has reached a sufficient score. There are also some attempts to include practical tasks, such as programming, in MOOCs for grading. Nevertheless, until now there is no known possibility to teach embedded system programming in a MOOC course where the programming can be done in a remote lab and where grading of the tasks is additionally possible. This embedded programming includes communication over GPIO pins to control LEDs and measure sensor values. We started a MOOC course called "Embedded Smart Home" as a pilot to prove the concept to teach real hardware programming in a MOOC environment under real life MOOC conditions with over 6000 students. Furthermore, also students with real hardware have the possibility to program on their own real hardware and grade their results in the MOOC course. Finally, we evaluate our approach and analyze the student acceptance of this approach to offer a course on embedded programming. We also analyze the hardware usage and working time of students solving tasks to find out if real hardware programming is an advantage and motivating achievement to support students learning success.

A variant of genetic algorithm for non-homogeneous population (2017)

Alibabaie, Najmeh ; Ghasemzadeh, Mohammad ; Meinel, Christoph

Selection of initial points, the number of clusters and finding proper clusters centers are still the main challenge in clustering processes. In this paper, we suggest genetic algorithm based method which searches several solution spaces simultaneously. The solution spaces are population groups consisting of elements with similar structure. Elements in a group have the same size, while elements in different groups are of different sizes. The proposed algorithm processes the population in groups of chromosomes with one gene, two genes to k genes. These genes hold corresponding information about the cluster centers. In the proposed method, the crossover and mutation operators can accept parents with different sizes; this can lead to versatility in population and information transfer among sub-populations. We implemented the proposed method and evaluated its performance against some random datasets and the Ruspini dataset as well. The experimental results show that the proposed method could effectively determine the appropriate number of clusters and recognize their centers. Overall this research implies that using heterogeneous population in the genetic algorithm can lead to better results.

PVD: Passive Vulnerability Detection (2017)

Gawron, Marian ; Cheng, Feng ; Meinel, Christoph

The identification of vulnerabilities relies on detailed information about the target infrastructure. The gathering of the necessary information is a crucial step that requires an intensive scanning or mature expertise and knowledge about the system even though the information was already available in a different context. In this paper we propose a new method to detect vulnerabilities that reuses the existing information and eliminates the necessity of a comprehensive scan of the target system. Since our approach is able to identify vulnerabilities without the additional effort of a scan, we are able to increase the overall performance of the detection. Because of the reuse and the removal of the active testing procedures, our approach could be classified as a passive vulnerability detection. We will explain the approach and illustrate the additional possibility to increase the security awareness of users. Therefore, we applied the approach on an experimental setup and extracted security relevant information from web logs.

Leveraging cloud native design patterns for security-as-a-service applications (2017)

Torkura, Kennedy A. ; Sukmana, Muhammad Ihsan Haikal ; Cheng, Feng ; Meinel, Christoph

This paper discusses a new approach for designing and deploying Security-as-a-Service (SecaaS) applications using cloud native design patterns. Current SecaaS approaches do not efficiently handle the increasing threats to computer systems and applications. For example, requests for security assessments drastically increase after a high-risk security vulnerability is disclosed. In such scenarios, SecaaS applications are unable to dynamically scale to serve requests. A root cause of this challenge is employment of architectures not specifically fitted to cloud environments. Cloud native design patterns resolve this challenge by enabling certain properties e.g. massive scalability and resiliency via the combination of microservice patterns and cloud-focused design patterns. However adopting these patterns is a complex process, during which several security issues are introduced. In this work, we investigate these security issues, we redesign and deploy a monolithic SecaaS application using cloud native design patterns while considering appropriate, layered security counter-measures i.e. at the application and cloud networking layer. Our prototype implementation out-performs traditional, monolithic applications with an average Scanner Time of 6 minutes, without compromising security. Our approach can be employed for designing secure, scalable and performant SecaaS applications that effectively handle unexpected increase in security assessment requests.

Securing e-prescription from medical identity theft using steganography and antiphishing techniques (2017)

Omotosho, Adebayo ; Emuoyibofarhe, Justice ; Meinel, Christoph

Drug prescription is among the health care process that usually makes references to the patients’ medical and insurance information among other personal data, because this information is very vital and delicate, it should be adequately protected from identity thieves. This article aims at securing Electronic Prescription (EP) in order to minimize patient’s data theft and foster patients’ trust of EP system. This paper presents a steganography and antiphishing technique for preventing medical identity theft in EP. The proposed EP system design focused on the security features in the prescriber and dispensers’ modules of EP by ensuring the prescriber sends the prescription of the patient in a safe manner and to the right dispenser without the interference of fake third parties. Hexadecimal steganography image system is used to cover and secure the sent prescription details. Malicious electronic dispensing system is prevented through an authentication technique where a dispenser uses a captcha together with a one-time password, and the web server encrypted token for prescriber’s device authentication. The steganography system is evaluated using Peak Signal to Noise Ratio (PSNR). The system implementation results showed that steganography and antiphishing techniques are capable of providing a secure EP systems.

The Gamification of a MOOC Platform (2017)

Staubitz, Thomas ; Wilkins, Christian ; Hagedorn, Christiane ; Meinel, Christoph

Massive Open Online Courses (MOOCs) have left their mark on the face of education during the recent years. At the Hasso Plattner Institute (HPI) in Potsdam, Germany, we are actively developing a MOOC platform, which provides our research with a plethora of e-learning topics, such as learning analytics, automated assessment, peer assessment, team-work, online proctoring, and gamification. We run several instances of this platform. On openHPI, we provide our own courses from within the HPI context. Further instances are openSAP, openWHO, and mooc.HOUSE, which is the smallest of these platforms, targeting customers with a less extensive course portfolio. In 2013, we started to work on the gamification of our platform. By now, we have implemented about two thirds of the features that we initially have evaluated as useful for our purposes. About a year ago we activated the implemented gamification features on mooc.HOUSE. Before activating the features on openHPI as well, we examined, and re-evaluated our initial considerations based on the data we collected so far and the changes in other contexts of our platforms.

Weight-based strategy for an I/O-intensive application at a cloud data center (2018)

Peng, Junjie ; Liu, Danxu ; Wang, Yingtao ; Zeng, Ying ; Cheng, Feng ; Zhang, Wenqiang

Applications with different characteristics in the cloud may have different resources preferences. However, traditional resource allocation and scheduling strategies rarely take into account the characteristics of applications. Considering that an I/O-intensive application is a typical type of application and that frequent I/O accesses, especially small files randomly accessing the disk, may lead to an inefficient use of resources and reduce the quality of service (QoS) of applications, a weight allocation strategy is proposed based on the available resources that a physical server can provide as well as the characteristics of the applications. Using the weight obtained, a resource allocation and scheduling strategy is presented based on the specific application characteristics in the data center. Extensive experiments show that the strategy is correct and can guarantee a high concurrency of I/O per second (IOPS) in a cloud data center with high QoS. Additionally, the strategy can efficiently improve the utilization of the disk and resources of the data center without affecting the service quality of applications.

Telemedical Interventional Management in Heart Failure II (TIM-HF2), a randomised, controlled trial investigating the impact of telemedicine on unplanned cardiovascular hospitalisations and mortality in heart failure patients (2018)

Koehler, Friedrich ; Koehler, Kerstin ; Deckwart, Oliver ; Prescher, Sandra ; Wegscheider, Karl ; Winkler, Sebastian ; Vettorazzi, Eik ; Polze, Andreas ; Stangl, Karl ; Hartmann, Oliver ; Marx, Almuth ; Neuhaus, Petra ; Scherf, Michael ; Kirwan, Bridget-Anne ; Anker, Stefan D.

Background Heart failure (HF) is a complex, chronic condition that is associated with debilitating symptoms, all of which necessitate close follow-up by health care providers. Lack of disease monitoring may result in increased mortality and more frequent hospital readmissions for decompensated HF. Remote patient management (RPM) in this patient population may help to detect early signs and symptoms of cardiac decompensation, thus enabling a prompt initiation of the appropriate treatment and care before a manifestation of HF decompensation. Objective The objective of the present article is to describe the design of a new trial investigating the impact of RPM on unplanned cardiovascular hospitalisations and mortality in HF patients. Methods The TIM-HF2 trial is designed as a prospective, randomised, controlled, parallel group, open (with randomisation concealment), multicentre trial with pragmatic elements introduced for data collection. Eligible patients with HF are randomised (1:1) to either RPM + usual care or to usual care only and are followed for 12 months. The primary outcome is the percentage of days lost due to unplanned cardiovascular hospitalisations or all-cause death. The main secondary outcomes are all-cause and cardiovascular mortality. Conclusion The TIM-HF2 trial will provide important prospective data on the potential beneficial effect of telemedical monitoring and RPM on unplanned cardiovascular hospitalisations and mortality in HF patients.

Blockchains for Business Process Management (2018)

Blockchain technology offers a sizable promise to rethink the way interorganizational business processes are managed because of its potential to realize execution without a central party serving as a single point of trust (and failure). To stimulate research on this promise and the limits thereof, in this article, we outline the challenges and opportunities of blockchain for business process management (BPM). We first reflect how blockchains could be used in the context of the established BPM lifecycle and second how they might become relevant beyond. We conclude our discourse with a summary of seven research directions for investigating the application of blockchain technology in the context of BPM.

A service-oriented approach for classifying 3D points clouds by example of office furniture classification (2018)

Stojanovic, Vladeta ; Trapp, Matthias ; Richter, Rico ; Döllner, Jürgen Roland Friedrich

The rapid digitalization of the Facility Management (FM) sector has increased the demand for mobile, interactive analytics approaches concerning the operational state of a building. These approaches provide the key to increasing stakeholder engagement associated with Operation and Maintenance (O&M) procedures of living and working areas, buildings, and other built environment spaces. We present a generic and fast approach to process and analyze given 3D point clouds of typical indoor office spaces to create corresponding up-to-date approximations of classified segments and object-based 3D models that can be used to analyze, record and highlight changes of spatial configurations. The approach is based on machine-learning methods used to classify the scanned 3D point cloud data using 2D images. This approach can be used to primarily track changes of objects over time for comparison, allowing for routine classification, and presentation of results used for decision making. We specifically focus on classification, segmentation, and reconstruction of multiple different object types in a 3D point-cloud scene. We present our current research and describe the implementation of these technologies as a web-based application using a services-oriented methodology.

Temporal answer set programming on finite traces (2018)

Cabalar, Pedro ; Kaminski, Roland ; Schaub, Torsten H. ; Schuhmann, Anna

In this paper, we introduce an alternative approach to Temporal Answer Set Programming that relies on a variation of Temporal Equilibrium Logic (TEL) for finite traces. This approach allows us to even out the expressiveness of TEL over infinite traces with the computational capacity of (incremental) Answer Set Programming (ASP). Also, we argue that finite traces are more natural when reasoning about action and change. As a result, our approach is readily implementable via multi-shot ASP systems and benefits from an extension of ASP's full-fledged input language with temporal operators. This includes future as well as past operators whose combination offers a rich temporal modeling language. For computation, we identify the class of temporal logic programs and prove that it constitutes a normal form for our approach. Finally, we outline two implementations, a generic one and an extension of the ASP system clingo. Under consideration for publication in Theory and Practice of Logic Programming (TPLP)

Experimenting with robotic intra-logistics domains (2018)

Gebser, Martin ; Obermeier, Philipp ; Otto, Thomas ; Schaub, Torsten H. ; Sabuncu, Orkunt ; Van Nguyen ; Tran Cao Son

We introduce the asprilo1 framework to facilitate experimental studies of approaches addressing complex dynamic applications. For this purpose, we have chosen the domain of robotic intra-logistics. This domain is not only highly relevant in the context of today's fourth industrial revolution but it moreover combines a multitude of challenging issues within a single uniform framework. This includes multi-agent planning, reasoning about action, change, resources, strategies, etc. In return, asprilo allows users to study alternative solutions as regards effectiveness and scalability. Although asprilo relies on Answer Set Programming and Python, it is readily usable by any system complying with its fact-oriented interface format. This makes it attractive for benchmarking and teaching well beyond logic programming. More precisely, asprilo consists of a versatile benchmark generator, solution checker and visualizer as well as a bunch of reference encodings featuring various ASP techniques. Importantly, the visualizer's animation capabilities are indispensable for complex scenarios like intra-logistics in order to inspect valid as well as invalid solution candidates. Also, it allows for graphically editing benchmark layouts that can be used as a basis for generating benchmark suites.

Experience: Enhancing address matching with geocoding and similarity measure selection (2018)

Koumarelas, Ioannis ; Kroschk, Axel ; Mosley, Clifford ; Naumann, Felix

Given a query record, record matching is the problem of finding database records that represent the same real-world object. In the easiest scenario, a database record is completely identical to the query. However, in most cases, problems do arise, for instance, as a result of data errors or data integrated from multiple sources or received from restrictive form fields. These problems are usually difficult, because they require a variety of actions, including field segmentation, decoding of values, and similarity comparisons, each requiring some domain knowledge. In this article, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process, we propose a domain-specific procedure to, first, enrich each record with a more complete representation of the address information through geocoding and reverse-geocoding and, second, to select the best similarity measure per each address attribute that will finally help the classifier to achieve the best f-measure. We report on our experience in selecting geocoding services and discovering similarity measures for a concrete but common industry use-case.

Automated reasoning for attributed graph properties (2018)

Schneider, Sven ; Lambers, Leen ; Orejas, Fernando

Graphs are ubiquitous in computer science. Moreover, in various application fields, graphs are equipped with attributes to express additional information such as names of entities or weights of relationships. Due to the pervasiveness of attributed graphs, it is highly important to have the means to express properties on attributed graphs to strengthen modeling capabilities and to enable analysis. Firstly, we introduce a new logic of attributed graph properties, where the graph part and attribution part are neatly separated. The graph part is equivalent to first-order logic on graphs as introduced by Courcelle. It employs graph morphisms to allow the specification of complex graph patterns. The attribution part is added to this graph part by reverting to the symbolic approach to graph attribution, where attributes are represented symbolically by variables whose possible values are specified by a set of constraints making use of algebraic specifications. Secondly, we extend our refutationally complete tableau-based reasoning method as well as our symbolic model generation approach for graph properties to attributed graph properties. Due to the new logic mentioned above, neatly separating the graph and attribution parts, and the categorical constructions employed only on a more abstract level, we can leave the graph part of the algorithms seemingly unchanged. For the integration of the attribution part into the algorithms, we use an oracle, allowing for flexible adoption of different available SMT solvers in the actual implementation. Finally, our automated reasoning approach for attributed graph properties is implemented in the tool AutoGraph integrating in particular the SMT solver Z3 for the attribute part of the properties. We motivate and illustrate our work with a particular application scenario on graph database query validation.

Unbounded Discrepancy of Deterministic Random Walks on Grids (2018)

Friedrich, Tobias ; Katzmann, Maximilian ; Krohmer, Anton

Random walks are frequently used in randomized algorithms. We study a derandomized variant of a random walk on graphs called the rotor-router model. In this model, instead of distributing tokens randomly, each vertex serves its neighbors in a fixed deterministic order. For most setups, both processes behave in a remarkably similar way: Starting with the same initial configuration, the number of tokens in the rotor-router model deviates only slightly from the expected number of tokens on the corresponding vertex in the random walk model. The maximal difference over all vertices and all times is called single vertex discrepancy. Cooper and Spencer [Combin. Probab. Comput., 15 (2006), pp. 815-822] showed that on Z(d), the single vertex discrepancy is only a constant c(d). Other authors also determined the precise value of c(d) for d = 1, 2. All of these results, however, assume that initially all tokens are only placed on one partition of the bipartite graph Z(d). We show that this assumption is crucial by proving that, otherwise, the single vertex discrepancy can become arbitrarily large. For all dimensions d >= 1 and arbitrary discrepancies l >= 0, we construct configurations that reach a discrepancy of at least l.

DualPanto (2018)

Schneider, Oliver ; Shigeyama, Jotaro ; Kovacs, Robert ; Roumen, Thijs Jan ; Marwecki, Sebastian ; Böckhoff, Nico ; Glöckner, Daniel Amadeus Johannes ; Bounama, Jonas ; Baudisch, Patrick

We present a new haptic device that enables blind users to continuously track the absolute position of moving objects in spatial virtual environments, as is the case in sports or shooter games. Users interact with DualPanto by operating the me handle with one hand and by holding on to the it handle with the other hand. Each handle is connected to a pantograph haptic input/output device. The key feature is that the two handles are spatially registered with respect to each other. When guiding their avatar through a virtual world using the me handle, spatial registration enables users to track moving objects by having the device guide the output hand. This allows blind players of a 1-on-1 soccer game to race for the ball or evade an opponent; it allows blind players of a shooter game to aim at an opponent and dodge shots. In our user study, blind participants reported very high enjoyment when using the device to play (6.5/7).

Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning (2018)

Wang, Cheng ; Yang, Haojin ; Meinel, Christoph

Generating a novel and descriptive caption of an image is drawing increasing interests in computer vision, natural language processing, and multimedia communities. In this work, we propose an end-to-end trainable deep bidirectional LSTM (Bi-LSTM (Long Short-Term Memory)) model to address the problem. By combining a deep convolutional neural network (CNN) and two separate LSTM networks, our model is capable of learning long-term visual-language interactions by making use of history and future context information at high-level semantic space. We also explore deep multimodal bidirectional models, in which we increase the depth of nonlinearity transition in different ways to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale, and vertical mirror are proposed to prevent over-fitting in training deep models. To understand how our models "translate" image to sentence, we visualize and qualitatively analyze the evolution of Bi-LSTM internal states over time. The effectiveness and generality of proposed models are evaluated on four benchmark datasets: Flickr8K, Flickr30K, MSCOCO, and Pascal1K datasets. We demonstrate that Bi-LSTM models achieve highly competitive performance on both caption generation and image-sentence retrieval even without integrating an additional mechanism (e.g., object detection, attention model). Our experiments also prove that multi-task learning is beneficial to increase model generality and gain performance. We also demonstrate the performance of transfer learning of the Bi-LSTM model significantly outperforms previous methods on the Pascal1K dataset.

Bridging the Gap (2019)

Herzog, Benedict ; Hönig, Timo ; Schröder-Preikschat, Wolfgang ; Plauth, Max ; Köhler, Sven ; Polze, Andreas

The recent restructuring of the electricity grid (i.e., smart grid) introduces a number of challenges for today's large-scale computing systems. To operate reliable and efficient, computing systems must adhere not only to technical limits (i.e., thermal constraints) but they must also reduce operating costs, for example, by increasing their energy efficiency. Efforts to improve the energy efficiency, however, are often hampered by inflexible software components that hardly adapt to underlying hardware characteristics. In this paper, we propose an approach to bridge the gap between inflexible software and heterogeneous hardware architectures. Our proposal introduces adaptive software components that dynamically adapt to heterogeneous processing units (i.e., accelerators) during runtime to improve the energy efficiency of computing systems.

Mise-Unseen (2019)

Marwecki, Sebastian ; Wilson, Andrew D. ; Ofek, Eyal ; Franco, Mar Gonzalez ; Holz, Christian

Creating or arranging objects at runtime is needed in many virtual reality applications, but such changes are noticed when they occur inside the user's field of view. We present Mise-Unseen, a software system that applies such scene changes covertly inside the user's field of view. Mise-Unseen leverages gaze tracking to create models of user attention, intention, and spatial memory to determine if and when to inject a change. We present seven applications of Mise-Unseen to unnoticeably modify the scene within view (i) to hide that task difficulty is adapted to the user, (ii) to adapt the experience to the user's preferences, (iii) to time the use of low fidelity effects, (iv) to detect user choice for passive haptics even when lacking physical props, (v) to sustain physical locomotion despite a lack of physical space, (vi) to reduce motion sickness during virtual locomotion, and (vii) to verify user understanding during story progression. We evaluated Mise-Unseen and our applications in a user study with 15 participants and find that while gaze data indeed supports obfuscating changes inside the field of view, a change is rendered unnoticeably by using gaze in combination with common masking techniques.

Editorial (2019)

Björk, Jennie ; Hölze, Katharina

Discovering commute patterns via process mining (2019)

Yousfi, Alaaeddine ; Weske, Mathias

Ubiquitous computing has proven its relevance and efficiency in improving the user experience across a myriad of situations. It is now the ineluctable solution to keep pace with the ever-changing environments in which current systems operate. Despite the achievements of ubiquitous computing, this discipline is still overlooked in business process management. This is surprising, since many of today’s challenges, in this domain, can be addressed by methods and techniques from ubiquitous computing, for instance user context and dynamic aspects of resource locations. This paper takes a first step to integrate methods and techniques from ubiquitous computing in business process management. To do so, we propose discovering commute patterns via process mining. Through our proposition, we can deduce the users’ significant locations, routes, travel times and travel modes. This information can be a stepping-stone toward helping the business process management community embrace the latest achievements in ubiquitous computing, mainly in location-based service. To corroborate our claims, a user study was conducted. The significant places, routes, travel modes and commuting times of our test subjects were inferred with high accuracies. All in all, ubiquitous computing can enrich the processes with new capabilities that go beyond what has been established in business process management so far.

Interactive Close-Up Rendering for Detail plus Overview Visualization of 3D Digital Terrain Models (2019)

Trapp, Matthias ; Döllner, Jürgen Roland Friedrich

This paper presents an interactive rendering technique for detail+overview visualization of 3D digital terrain models using interactive close-ups. A close-up is an alternative presentation of input data varying with respect to geometrical scale, mapping, appearance, as well as Level-of-Detail (LOD) and Level-of-Abstraction (LOA) used. The presented 3D close-up approach enables in-situ comparison of multiple Regionof-Interests (ROIs) simultaneously. We describe a GPU-based rendering technique for the image-synthesis of multiple close-ups in real-time.

Semantic-driven Visualization Techniques for Interactive Exploration of 3D Indoor Models (2019)

Florio, Alessandro ; Trapp, Matthias ; Döllner, Jürgen Roland Friedrich

The availability of detailed virtual 3D building models including representations of indoor elements, allows for a wide number of applications requiring effective exploration and navigation functionality. Depending on the application context, users should be enabled to focus on specific Objects-of-Interests (OOIs) or important building elements. This requires approaches to filtering building parts as well as techniques to visualize important building objects and their relations. For it, this paper explores the application and combination of interactive rendering techniques as well as their semanticallydriven configuration in the context of 3D indoor models.

Real-time Screen-space Geometry Draping for 3D Digital Terrain Models (2019)

Trapp, Matthias ; Döllner, Jürgen Roland Friedrich

A fundamental task in 3D geovisualization and GIS applications is the visualization of vector data that can represent features such as transportation networks or land use coverage. Mapping or draping vector data represented by geometric primitives (e.g., polylines or polygons) to 3D digital elevation or 3D digital terrain models is a challenging task. We present an interactive GPU-based approach that performs geometry-based draping of vector data on per-frame basis using an image-based representation of a 3D digital elevation or terrain model only.

Kyub (2019)

Baudisch, Patrick Markus ; Silber, Arthur ; Kommana, Yannis ; Gruner, Milan ; Wall, Ludwig ; Reuss, Kevin ; Heilman, Lukas ; Kovacs, Robert ; Rechlitz, Daniel ; Roumen, Thijs

We present an interactive editing system for laser cutting called kyub. Kyub allows users to create models efficiently in 3D, which it then unfolds into the 2D plates laser cutters expect. Unlike earlier systems, such as FlatFitFab, kyub affords construction based on closed box structures, which allows users to turn very thin material, such as 4mm plywood, into objects capable of withstanding large forces, such as chairs users can actually sit on. To afford such sturdy construction, every kyub project begins with a simple finger-joint "boxel"-a structure we found to be capable of withstanding over 500kg of load. Users then extend their model by attaching additional boxels. Boxels merge automatically, resulting in larger, yet equally strong structures. While the concept of stacking boxels allows kyub to offer the strong affordance and ease of use of a voxel-based editor, boxels are not confined to a grid and readily combine with kuyb's various geometry deformation tools. In our technical evaluation, objects built with kyub withstood hundreds of kilograms of loads. In our user study, non-engineers rated the learnability of kyub 6.1/7.

Understanding Metamaterial Mechanisms (2019)

Ion, Alexandra ; Lindlbauer, David ; Herholz, Philipp ; Alexa, Marc ; Baudisch, Patrick Markus

In this paper, we establish the underlying foundations of mechanisms that are composed of cell structures-known as metamaterial mechanisms. Such metamaterial mechanisms were previously shown to implement complete mechanisms in the cell structure of a 3D printed material, without the need for assembly. However, their design is highly challenging. A mechanism consists of many cells that are interconnected and impose constraints on each other. This leads to unobvious and non-linear behavior of the mechanism, which impedes user design. In this work, we investigate the underlying topological constraints of such cell structures and their influence on the resulting mechanism. Based on these findings, we contribute a computational design tool that automatically creates a metamaterial mechanism from user-defined motion paths. This tool is only feasible because our novel abstract representation of the global constraints highly reduces the search space of possible cell arrangements.

Software Engineering for Smart Cyber-Physical Systems (2019)

Giese, Holger Burkhard

Currently, a transformation of our technical world into a networked technical world where besides the embedded systems with their interaction with the physical world the interconnection of these nodes in the cyber world becomes a reality can be observed. In parallel nowadays there is a strong trend to employ artificial intelligence techniques and in particular machine learning to make software behave smart. Often cyber-physical systems must be self-adaptive at the level of the individual systems to operate as elements in open, dynamic, and deviating overall structures and to adapt to open and dynamic contexts while being developed, operated, evolved, and governed independently. In this presentation, we will first discuss the envisioned future scenarios for cyber-physical systems with an emphasis on the synergies networking can offer and then characterize which challenges for the design, production, and operation of these systems result. We will then discuss to what extent our current capabilities, in particular concerning software engineering match these challenges and where substantial improvements for the software engineering are crucial. In today's software engineering for embedded systems models are used to plan systems upfront to maximize envisioned properties on the one hand and minimize cost on the other hand. When applying the same ideas to software for smart cyber-physical systems, it soon turned out that for these systems often somehow more subtle links between the involved models and the requirements, users, and environment exist. Self-adaptation and runtime models have been advocated as concepts to covers the demands that result from these subtler links. Lately, both trends have been brought together more thoroughly by the notion of self-aware computing systems. We will review the underlying causes, discuss some our work in this direction, and outline related open challenges and potential for future approaches to software engineering for smart cyber-physical systems.

On the tree conjecture for the network creation game (2019)

Bilò, Davide ; Lenzner, Pascal

Selfish Network Creation focuses on modeling real world networks from a game-theoretic point of view. One of the classic models by Fabrikant et al. (2003) is the network creation game, where agents correspond to nodes in a network which buy incident edges for the price of alpha per edge to minimize their total distance to all other nodes. The model is well-studied but still has intriguing open problems. The most famous conjectures state that the price of anarchy is constant for all alpha and that for alpha >= n all equilibrium networks are trees. We introduce a novel technique for analyzing stable networks for high edge-price alpha and employ it to improve on the best known bound for the latter conjecture. In particular we show that for alpha > 4n - 13 all equilibrium networks must be trees, which implies a constant price of anarchy for this range of alpha. Moreover, we also improve the constant upper bound on the price of anarchy for equilibrium trees.

Transforming pairwise duplicates to entity clusters for high-quality duplicate detection (2019)

Draisbach, Uwe ; Christen, Peter ; Naumann, Felix

Duplicate detection algorithms produce clusters of database records, each cluster representing a single real-world entity. As most of these algorithms use pairwise comparisons, the resulting (transitive) clusters can be inconsistent: Not all records within a cluster are sufficiently similar to be classified as duplicate. Thus, one of many subsequent clustering algorithms can further improve the result. We explain in detail, compare, and evaluate many of these algorithms and introduce three new clustering algorithms in the specific context of duplicate detection. Two of our three new algorithms use the structure of the input graph to create consistent clusters. Our third algorithm, and many other clustering algorithms, focus on the edge weights, instead. For evaluation, in contrast to related work, we experiment on true real-world datasets, and in addition examine in great detail various pair-selection strategies used in practice. While no overall winner emerges, we are able to identify best approaches for different situations. In scenarios with larger clusters, our proposed algorithm, Extended Maximum Clique Clustering (EMCC), and Markov Clustering show the best results. EMCC especially outperforms Markov Clustering regarding the precision of the results and additionally has the advantage that it can also be used in scenarios where edge weights are not available.

Concepts and techniques for web-based visualization and processing of massive 3D point clouds with semantics (2019)

Discher, Sören ; Richter, Rico ; Döllner, Jürgen Roland Friedrich

3D point cloud technology facilitates the automated and highly detailed acquisition of real-world environments such as assets, sites, and countries. We present a web-based system for the interactive exploration and inspection of arbitrary large 3D point clouds. Our approach is able to render 3D point clouds with billions of points using spatial data structures and level-of-detail representations. Point-based rendering techniques and post-processing effects are provided to enable task-specific and data-specific filtering, e.g., based on semantics. A set of interaction techniques allows users to collaboratively work with the data (e.g., measuring distances and annotating). Additional value is provided by the system’s ability to display additional, context-providing geodata alongside 3D point clouds and to integrate processing and analysis operations. We have evaluated the presented techniques and in case studies and with different data sets from aerial, mobile, and terrestrial acquisition with up to 120 billion points to show their practicality and feasibility.

SpringFit (2019)

Roumen, Thijs ; Shigeyama, Jotaro ; Rudolph, Julius Cosmo Romeo ; Grzelka, Felix ; Baudisch, Patrick

Joints are crucial to laser cutting as they allow making three-dimensional objects; mounts are crucial because they allow embedding technical components, such as motors. Unfortunately, mounts and joints tend to fail when trying to fabricate a model on a different laser cutter or from a different material. The reason for this lies in the way mounts and joints hold objects in place, which is by forcing them into slightly smaller openings. Such "press fit" mechanisms unfortunately are susceptible to the small changes in diameter that occur when switching to a machine that removes more or less material ("kerf"), as well as to changes in stiffness, as they occur when switching to a different material. We present a software tool called springFit that resolves this problem by replacing the problematic press fit-based mounts and joints with what we call cantilever-based mounts and joints. A cantilever spring is simply a long thin piece of material that pushes against the object to be held. Unlike press fits, cantilever springs are robust against variations in kerf and material; they can even handle very high variations, simply by using longer springs. SpringFit converts models in the form of 2D cutting plans by replacing all contained mounts, notch joints, finger joints, and t-joints. In our technical evaluation, we used springFit to convert 14 models downloaded from the web.

Using Hidden Markov Models for the accurate linguistic analysis of process model activity labels (2019)

Leopold, Henrik ; van der Aa, Han ; Offenberg, Jelmer ; Reijers, Hajo A.

Many process model analysis techniques rely on the accurate analysis of the natural language contents captured in the models’ activity labels. Since these labels are typically short and diverse in terms of their grammatical style, standard natural language processing tools are not suitable to analyze them. While a dedicated technique for the analysis of process model activity labels was proposed in the past, it suffers from considerable limitations. First of all, its performance varies greatly among data sets with different characteristics and it cannot handle uncommon grammatical styles. What is more, adapting the technique requires in-depth domain knowledge. We use this paper to propose a machine learning-based technique for activity label analysis that overcomes the issues associated with this rule-based state of the art. Our technique conceptualizes activity label analysis as a tagging task based on a Hidden Markov Model. By doing so, the analysis of activity labels no longer requires the manual specification of rules. An evaluation using a collection of 15,000 activity labels demonstrates that our machine learning-based technique outperforms the state of the art in all aspects.

Performance evaluation for self-healing systems (2019)

Ghahremani, Sona ; Giese, Holger

Evaluating the performance of self-adaptive systems (SAS) is challenging due to their complexity and interaction with the often highly dynamic environment. In the context of self-healing systems (SHS), employing simulators has been shown to be the most dominant means for performance evaluation. Simulating a SHS also requires realistic fault injection scenarios. We study the state of the practice for evaluating the performance of SHS by means of a systematic literature review. We present the current practice and point out that a more thorough and careful treatment in evaluating the performance of SHS is required.

Adding Value by Combining Business and Sensor Data (2019)

Hesse, Günter ; Matthies, Christoph ; Sinzig, Werner ; Uflacker, Matthias

Industry 4.0 and the Internet of Things are recent developments that have lead to the creation of new kinds of manufacturing data. Linking this new kind of sensor data to traditional business information is crucial for enterprises to take advantage of the data’s full potential. In this paper, we present a demo which allows experiencing this data integration, both vertically between technical and business contexts and horizontally along the value chain. The tool simulates a manufacturing company, continuously producing both business and sensor data, and supports issuing ad-hoc queries that answer specific questions related to the business. In order to adapt to different environments, users can configure sensor characteristics to their needs.

Virtual machine integrity verification in Crowd-Resourcing Virtual Laboratory (2019)

Sianipar, Johannes Harungguan ; Willems, Christian ; Meinel, Christoph

In cloud computing, users are able to use their own operating system (OS) image to run a virtual machine (VM) on a remote host. The virtual machine OS is started by the user using some interfaces provided by a cloud provider in public or private cloud. In peer to peer cloud, the VM is started by the host admin. After the VM is running, the user could get a remote access to the VM to install, configure, and run services. For the security reasons, the user needs to verify the integrity of the running VM, because a malicious host admin could modify the image or even replace the image with a similar image, to be able to get sensitive data from the VM. We propose an approach to verify the integrity of a running VM on a remote host, without using any specific hardware such as Trusted Platform Module (TPM). Our approach is implemented on a Linux platform where the kernel files (vmlinuz and initrd) could be replaced with new files, while the VM is running. kexec is used to reboot the VM with the new kernel files. The new kernel has secret codes that will be used to verify whether the VM was started using the new kernel files. The new kernel is used to further measuring the integrity of the running VM.

Attribute Compartmentation and Greedy UCC Discovery for High-Dimensional Data Anonymisation (2019)

Podlesny, Nikolai Jannik ; Kayem, Anne V. D. M. ; Meinel, Christoph

High-dimensional data is particularly useful for data analytics research. In the healthcare domain, for instance, high-dimensional data analytics has been used successfully for drug discovery. Yet, in order to adhere to privacy legislation, data analytics service providers must guarantee anonymity for data owners. In the context of high-dimensional data, ensuring privacy is challenging because increased data dimensionality must be matched by an exponential growth in the size of the data to avoid sparse datasets. Syntactically, anonymising sparse datasets with methods that rely of statistical significance, makes obtaining sound and reliable results, a challenge. As such, strong privacy is only achievable at the cost of high information loss, rendering the data unusable for data analytics. In this paper, we make two contributions to addressing this problem from both the privacy and information loss perspectives. First, we show that by identifying dependencies between attribute subsets we can eliminate privacy violating attributes from the anonymised dataset. Second, to minimise information loss, we employ a greedy search algorithm to determine and eliminate maximal partial unique attribute combinations. Thus, one only needs to find the minimal set of identifying attributes to prevent re-identification. Experiments on a health cloud based on the SAP HANA platform using a semi-synthetic medical history dataset comprised of 109 attributes, demonstrate the effectiveness of our approach.

Deep En-Route Filtering of Constrained Application Protocol (CoAP) Messages on 6LoWPAN Border Routers (2019)

Seidel, Felix ; Krentz, Konrad-Felix ; Meinel, Christoph

Devices on the Internet of Things (IoT) are usually battery-powered and have limited resources. Hence, energy-efficient and lightweight protocols were designed for IoT devices, such as the popular Constrained Application Protocol (CoAP). Yet, CoAP itself does not include any defenses against denial-of-sleep attacks, which are attacks that aim at depriving victim devices of entering low-power sleep modes. For example, a denial-of-sleep attack against an IoT device that runs a CoAP server is to send plenty of CoAP messages to it, thereby forcing the IoT device to expend energy for receiving and processing these CoAP messages. All current security solutions for CoAP, namely Datagram Transport Layer Security (DTLS), IPsec, and OSCORE, fail to prevent such attacks. To fill this gap, Seitz et al. proposed a method for filtering out inauthentic and replayed CoAP messages "en-route" on 6LoWPAN border routers. In this paper, we expand on Seitz et al.'s proposal in two ways. First, we revise Seitz et al.'s software architecture so that 6LoWPAN border routers can not only check the authenticity and freshness of CoAP messages, but can also perform a wide range of further checks. Second, we propose a couple of such further checks, which, as compared to Seitz et al.'s original checks, more reliably protect IoT devices that run CoAP servers from remote denial-of-sleep attacks, as well as from remote exploits. We prototyped our solution and successfully tested its compatibility with Contiki-NG's CoAP implementation.

LoANs (2019)

Bartz, Christian ; Yang, Haojin ; Bethge, Joseph ; Meinel, Christoph

Recently, deep neural networks have achieved remarkable performance on the task of object detection and recognition. The reason for this success is mainly grounded in the availability of large scale, fully annotated datasets, but the creation of such a dataset is a complicated and costly task. In this paper, we propose a novel method for weakly supervised object detection that simplifies the process of gathering data for training an object detector. We train an ensemble of two models that work together in a student-teacher fashion. Our student (localizer) is a model that learns to localize an object, the teacher (assessor) assesses the quality of the localization and provides feedback to the student. The student uses this feedback to learn how to localize objects and is thus entirely supervised by the teacher, as we are using no labels for training the localizer. In our experiments, we show that our model is very robust to noise and reaches competitive performance compared to a state-of-the-art fully supervised approach. We also show the simplicity of creating a new dataset, based on a few videos (e.g. downloaded from YouTube) and artificially generated data.

Unified Cloud Access Control Model for Cloud Storage Broker (2019)

Sukmana, Muhammad Ihsan Haikal ; Torkura, Kennedy A. ; Graupner, Hendrik ; Cheng, Feng ; Meinel, Christoph

Cloud Storage Broker (CSB) provides value-added cloud storage service for enterprise usage by leveraging multi-cloud storage architecture. However, it raises several challenges for managing resources and its access control in multiple Cloud Service Providers (CSPs) for authorized CSB stakeholders. In this paper we propose unified cloud access control model that provides the abstraction of CSP's services for centralized and automated cloud resource and access control management in multiple CSPs. Our proposal offers role-based access control for CSB stakeholders to access cloud resources by assigning necessary privileges and access control list for cloud resources and CSB stakeholders, respectively, following privilege separation concept and least privilege principle. We implement our unified model in a CSB system called CloudRAID for Business (CfB) with the evaluation result shows it provides system-and-cloud level security service for cfB and centralized resource and access control management in multiple CSPs.

Full Lecture Recording Watching Behavior, or Why Students Watch 90-Min Lectures in 5 Min (2019)

Bauer, Matthias ; Malchow, Martin ; Meinel, Christoph

Many universities record the lectures being held in their facilities to preserve knowledge and to make it available to their students and, at least for some universities and classes, to the broad public. The way with the least effort is to record the whole lecture, which in our case usually is 90 min long. This saves the labor and time of cutting and rearranging lectures scenes to provide short learning videos as known from Massive Open Online Courses (MOOCs), etc. Many lecturers fear that recording their lectures and providing them via an online platform might lead to less participation in the actual lecture. Also, many teachers fear that the lecture recordings are not used with the same focus and dedication as lectures in a lecture hall. In this work, we show that in our experience, full lectures have an average watching duration of just a few minutes and explain the reasons for that and why, in most cases, teachers do not have to worry about that.

A quantifiable trustmModel for Blockchain-based identity management (2019)

Grüner, Andreas ; Mühle, Alexander ; Gayvoronskaya, Tatiana ; Meinel, Christoph

Integrating Biological Context into the Analysis of Gene Expression Data (2019)

Perscheid, Cindy ; Uflacker, Matthias

High-throughput RNA sequencing produces large gene expression datasets whose analysis leads to a better understanding of diseases like cancer. The nature of RNA-Seq data poses challenges to its analysis in terms of its high dimensionality, noise, and complexity of the underlying biological processes. Researchers apply traditional machine learning approaches, e. g. hierarchical clustering, to analyze this data. Until it comes to validation of the results, the analysis is based on the provided data only and completely misses the biological context. However, gene expression data follows particular patterns - the underlying biological processes. In our research, we aim to integrate the available biological knowledge earlier in the analysis process. We want to adapt state-of-the-art data mining algorithms to consider the biological context in their computations and deliver meaningful results for researchers.

From face to face (2019)

Drimalla, Hanna ; Landwehr, Niels ; Hess, Ursula ; Dziobek, Isabel

Despite advances in the conceptualisation of facial mimicry, its role in the processing of social information is a matter of debate. In the present study, we investigated the relationship between mimicry and cognitive and emotional empathy. To assess mimicry, facial electromyography was recorded for 70 participants while they completed the Multifaceted Empathy Test, which presents complex context-embedded emotional expressions. As predicted, inter-individual differences in emotional and cognitive empathy were associated with the level of facial mimicry. For positive emotions, the intensity of the mimicry response scaled with the level of state emotional empathy. Mimicry was stronger for the emotional empathy task compared to the cognitive empathy task. The specific empathy condition could be successfully detected from facial muscle activity at the level of single individuals using machine learning techniques. These results support the view that mimicry occurs depending on the social context as a tool to affiliate and it is involved in cognitive as well as emotional empathy.

On integrating design thinking for human-centered requirements engineering (2019)

Hehn, Jennifer ; Mendez, Daniel ; Uebernickel, Falk ; Brenner, Walter ; Broy, Manfred

We elaborate on the possibilities and needs to integrate design thinking into requirements engineering, drawing from our research and project experiences. We suggest three approaches for tailoring and integrating design thinking and requirements engineering with complementary synergies and point at open challenges for research and practice.

An Exploratory Study to Detect Temporal Orientation Using Bluetooth's sensor (2019)

Hernandez, Netzahualcoyotl ; Demiray, Burcu ; Arnrich, Bert ; Favela, Jesus

Mobile sensing technology allows us to investigate human behaviour on a daily basis. In the study, we examined temporal orientation, which refers to the capacity of thinking or talking about personal events in the past and future. We utilise the mksense platform that allows us to use the experience-sampling method. Individual's thoughts and their relationship with smartphone's Bluetooth data is analysed to understand in which contexts people are influenced by social environments, such as the people they spend the most time with. As an exploratory study, we analyse social condition influence through a collection of Bluetooth data and survey information from participant's smartphones. Preliminary results show that people are likely to focus on past events when interacting with close-related people, and focus on future planning when interacting with strangers. Similarly, people experience present temporal orientation when accompanied by known people. We believe that these findings are linked to emotions since, in its most basic state, emotion is a state of physiological arousal combined with an appropriated cognition. In this contribution, we envision a smartphone application for automatically inferring human emotions based on user's temporal orientation by using Bluetooth sensors, we briefly elaborate on the influential factor of temporal orientation episodes and conclude with a discussion and lessons learned.

Magic mirror in my hand, which is the best in the land? (2020)

Kossmann, Jan ; Halfpap, Stefan ; Jankrift, Marcel ; Schlosser, Rainer

Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and challenging index selection problem range from metadata-based simple heuristics, over sophisticated multi-step algorithms, to approaches that yield optimal results. The main challenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) a large number of possible combinations resulting from workloads containing many queries and massive schemata with possibly thousands of attributes. In this work, we describe and analyze eight index selection algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granularity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses, infer insights for index selection in general and each approach individually, before we give recommendations on when to use which approach.

MDedup (2020)

Koumarelas, Ioannis ; Papenbrock, Thorsten ; Naumann, Felix

Duplicate detection is an integral part of data cleaning and serves to identify multiple representations of same real-world entities in (relational) datasets. Existing duplicate detection approaches are effective, but they are also hard to parameterize or require a lot of pre-labeled training data. Both parameterization and pre-labeling are at least domain-specific if not dataset-specific, which is a problem if a new dataset needs to be cleaned. For this reason, we propose a novel, rule-based and fully automatic duplicate detection approach that is based on matching dependencies (MDs). Our system uses automatically discovered MDs, various dataset features, and known gold standards to train a model that selects MDs as duplicate detection rules. Once trained, the model can select useful MDs for duplicate detection on any new dataset. To increase the generally low recall of MD-based data cleaning approaches, we propose an additional boosting step. Our experiments show that this approach reaches up to 94% F-measure and 100% precision on our evaluation datasets, which are good numbers considering that the system does not require domain or target data-specific configuration.

Risk-sensitive control of Markov decision processes (2020)

Schlosser, Rainer

In many revenue management applications risk-averse decision-making is crucial. In dynamic settings, however, it is challenging to find the right balance between maximizing expected rewards and minimizing various kinds of risk. In existing approaches utility functions, chance constraints, or (conditional) value at risk considerations are used to influence the distribution of rewards in a preferred way. Nevertheless, common techniques are not flexible enough and typically numerically complex. In our model, we exploit the fact that a distribution is characterized by its mean and higher moments. We present a multi-valued dynamic programming heuristic to compute risk-sensitive feedback policies that are able to directly control the moments of future rewards. Our approach is based on recursive formulations of higher moments and does not require an extension of the state space. Finally, we propose a self-tuning algorithm, which allows to identify feedback policies that approximate predetermined (risk-sensitive) target distributions. We illustrate the effectiveness and the flexibility of our approach for different dynamic pricing scenarios. (C) 2020 Elsevier Ltd. All rights reserved.

Data preparation for duplicate detection (2020)

Koumarelas, Ioannis ; Jiang, Lan ; Naumann, Felix

Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal. However, the particular error of duplicate records, where multiple records refer to the same entity, is usually eliminated independently with specialized techniques. Our work is the first to bring these two areas together by applying data preparation operations under a systematic approach prior to performing duplicate detection. Our process workflow can be summarized as follows: It begins with the user providing as input a sample of the gold standard, the actual dataset, and optionally some constraints to domain-specific data preparations, such as address normalization. The preparation selection operates in two consecutive phases. First, to vastly reduce the search space of ineffective data preparations, decisions are made based on the improvement or worsening of pair similarities. Second, using the remaining data preparations an iterative leave-one-out classification process removes preparations one by one and determines the redundant preparations based on the achieved area under the precision-recall curve (AUC-PR). Using this workflow, we manage to improve the results of duplicate detection up to 19% in AUC-PR.

Self-driving database systems (2020)

Kossmann, Jan ; Schlosser, Rainer

Challenges for self-driving database systems, which tune their physical design and configuration autonomously, are manifold: Such systems have to anticipate future workloads, find robust configurations efficiently, and incorporate knowledge gained by previous actions into later decisions. We present a component-based framework for self-driving database systems that enables database integration and development of self-managing functionality with low overhead by relying on separation of concerns. By keeping the components of the framework reusable and exchangeable, experiments are simplified, which promotes further research in that area. Moreover, to optimize multiple mutually dependent features, e.g., index selection and compression configurations, we propose a linear programming (LP) based algorithm to derive an efficient tuning order automatically. Afterwards, we demonstrate the applicability and scalability of our approach with reproducible examples.

Geospatial artificial intelligence (2020)

Döllner, Jürgen Roland Friedrich

Artificial intelligence (AI) is changing fundamentally the way how IT solutions are implemented and operated across all application domains, including the geospatial domain. This contribution outlines AI-based techniques for 3D point clouds and geospatial digital twins as generic components of geospatial AI. First, we briefly reflect on the term "AI" and outline technology developments needed to apply AI to IT solutions, seen from a software engineering perspective. Next, we characterize 3D point clouds as key category of geodata and their role for creating the basis for geospatial digital twins; we explain the feasibility of machine learning (ML) and deep learning (DL) approaches for 3D point clouds. In particular, we argue that 3D point clouds can be seen as a corpus with similar properties as natural language corpora and formulate a "Naturalness Hypothesis" for 3D point clouds. In the main part, we introduce a workflow for interpreting 3D point clouds based on ML/DL approaches that derive domain-specific and application-specific semantics for 3D point clouds without having to create explicit spatial 3D models or explicit rule sets. Finally, examples are shown how ML/DL enables us to efficiently build and maintain base data for geospatial digital twins such as virtual 3D city models, indoor models, or building information models.

More than a quarter century of creativity and innovation management (2020)

Rose, Robert ; Hölzle, Katharina ; Björk, Jennie

When this journal was founded in 1992 by Tudor Rickards and Susan Moger, there was no academic outlet available that addressed issues at the intersection of creativity and innovation. From zero to 1,163 records, from the new kid on the block to one of the leading journals in creativity and innovation management has been quite a journey, and we would like to reflect on the past 28 years and the intellectual and conceptual structure of Creativity and Innovation Management (CIM). Specifically, we highlight milestones and influential articles, identify how key journal characteristics evolved, outline the (co-)authorship structure, and finally, map the thematic landscape of CIM by means of a text-mining analysis. This study represents the first systematic and comprehensive assessment of the journal's published body of knowledge and helps to understand the journal's influence on the creativity and innovation management community. We conclude by discussing future topics and paths of the journal as well as limitations of our approach.

IMDfence (2020)

Siddiqi, Muhammad Ali ; Dörr, Christian ; Strydis, Christos

Over the past decade, focus on the security and privacy aspects of implantable medical devices (IMDs) has intensified, driven by the multitude of cybersecurity vulnerabilities found in various existing devices. However, due to their strict computational, energy and physical constraints, conventional security protocols are not directly applicable to IMDs. Custom-tailored schemes have been proposed instead which, however, fail to cover the full spectrum of security features that modern IMDs and their ecosystems so critically require. In this paper we propose IMDfence, a security protocol for IMD ecosystems that provides a comprehensive yet practical security portfolio, which includes availability, non-repudiation, access control, entity authentication, remote monitoring and system scalability. The protocol also allows emergency access that results in the graceful degradation of offered services without compromising security and patient safety. The performance of the security protocol as well as its feasibility and impact on modern IMDs are extensively analyzed and evaluated. We find that IMDfence achieves the above security requirements at a mere less than 7% increase in total IMD energy consumption, and less than 14 ms and 9 kB increase in system delay and memory footprint, respectively.

Data Preparation (2020)

Hameed, Mazhar ; Naumann, Felix

Raw data are often messy: they follow different encodings, records are not well structured, values do not adhere to patterns, etc. Such data are in general not fit to be ingested by downstream applications, such as data analytics tools, or even by data management systems. The act of obtaining information from raw data relies on some data preparation process. Data preparation is integral to advanced data analysis and data management, not only for data science but for any data-driven applications. Existing data preparation tools are operational and useful, but there is still room for improvement and optimization. With increasing data volume and its messy nature, the demand for prepared data increases day by day. To cater to this demand, companies and researchers are developing techniques and tools for data preparation. To better understand the available data preparation systems, we have conducted a survey to investigate (1) prominent data preparation tools, (2) distinctive tool features, (3) the need for preliminary data processing even for these tools and, (4) features and abilities that are still lacking. We conclude with an argument in support of automatic and intelligent data preparation beyond traditional and simplistic techniques.

Greed is good for deterministic scale-free networks (2020)

Chauhan, Ankit ; Friedrich, Tobias ; Rothenberger, Ralf

Large real-world networks typically follow a power-law degree distribution. To study such networks, numerous random graph models have been proposed. However, real-world networks are not drawn at random. Therefore, Brach et al. (27th symposium on discrete algorithms (SODA), pp 1306-1325, 2016) introduced two natural deterministic conditions: (1) a power-law upper bound on the degree distribution (PLB-U) and (2) power-law neighborhoods, that is, the degree distribution of neighbors of each vertex is also upper bounded by a power law (PLB-N). They showed that many real-world networks satisfy both properties and exploit them to design faster algorithms for a number of classical graph problems. We complement their work by showing that some well-studied random graph models exhibit both of the mentioned PLB properties. PLB-U and PLB-N hold with high probability for Chung-Lu Random Graphs and Geometric Inhomogeneous Random Graphs and almost surely for Hyperbolic Random Graphs. As a consequence, all results of Brach et al. also hold with high probability or almost surely for those random graph classes. In the second part we study three classical NP-hard optimization problems on PLB networks. It is known that on general graphs with maximum degree Delta, a greedy algorithm, which chooses nodes in the order of their degree, only achieves a Omega (ln Delta)-approximation forMinimum Vertex Cover and Minimum Dominating Set, and a Omega(Delta)-approximation forMaximum Independent Set. We prove that the PLB-U property with beta>2 suffices for the greedy approach to achieve a constant-factor approximation for all three problems. We also show that these problems are APX-hard even if PLB-U, PLB-N, and an additional power-law lower bound on the degree distribution hold. Hence, a PTAS cannot be expected unless P = NP. Furthermore, we prove that all three problems are in MAX SNP if the PLB-U property holds.

Hybrid search plan generation for generalized graph pattern matching (2020)

Barkowsky, Matthias ; Giese, Holger

In recent years, the increased interest in application areas such as social networks has resulted in a rising popularity of graph-based approaches for storing and processing large amounts of interconnected data. To extract useful information from the growing network structures, efficient querying techniques are required. In this paper, we propose an approach for graph pattern matching that allows a uniform handling of arbitrary constraints over the query vertices. Our technique builds on a previously introduced matching algorithm, which takes concrete host graph information into account to dynamically adapt the employed search plan during query execution. The dynamic algorithm is combined with an existing static approach for search plan generation, resulting in a hybrid technique which we further extend by a more sophisticated handling of filtering effects caused by constraint checks. We evaluate the presented concepts empirically based on an implementation for our graph pattern matching tool, the Story Diagram Interpreter, with queries and data provided by the LDBC Social Network Benchmark. Our results suggest that the hybrid technique may improve search efficiency in several cases, and rarely reduces efficiency.

Scalable relaxation techniques to solve stochastic dynamic multi-product pricing problems with substitution effects (2020)

Schlosser, Rainer

In many businesses, firms are selling different types of products, which share mutual substitution effects in demand. To compute effective pricing strategies is challenging as the sales probabilities of each of a firm's products can also be affected by the prices of potential substitutes. In this paper, we analyze stochastic dynamic multi-product pricing models for the sale of perishable goods. To circumvent the limitations of time-consuming optimal solutions for highly complex models, we propose different relaxation techniques, which allow to reduce the size of critical model components, such as the state space, the action space, or the set of potential sales events. Our heuristics are able to decrease the size of those components by forming corresponding clusters and using subsets of representative elements. Using numerical examples, we verify that our heuristics make it possible to dramatically reduce the computation time while still obtaining close-to-optimal expected profits. Further, we show that our heuristics are (i) flexible, (ii) scalable, and (iii) can be arbitrarily combined in a mutually supportive way.

Efficient discovery of matching dependencies (2020)

Schirmer, Philipp ; Papenbrock, Thorsten ; Koumarelas, Ioannis ; Naumann, Felix

Matching dependencies (MDs) are data profiling results that are often used for data integration, data cleaning, and entity matching. They are a generalization of functional dependencies (FDs) matching similar rather than same elements. As their discovery is very difficult, existing profiling algorithms find either only small subsets of all MDs or their scope is limited to only small datasets. We focus on the efficient discovery of all interesting MDs in real-world datasets. For this purpose, we propose HyMD, a novel MD discovery algorithm that finds all minimal, non-trivial MDs within given similarity boundaries. The algorithm extracts the exact similarity thresholds for the individual MDs from the data instead of using predefined similarity thresholds. For this reason, it is the first approach to solve the MD discovery problem in an exact and truly complete way. If needed, the algorithm can, however, enforce certain properties on the reported MDs, such as disjointness and minimum support, to focus the discovery on such results that are actually required by downstream use cases. HyMD is technically a hybrid approach that combines the two most popular dependency discovery strategies in related work: lattice traversal and inference from record pairs. Despite the additional effort of finding exact similarity thresholds for all MD candidates, the algorithm is still able to efficiently process large datasets, e.g., datasets larger than 3 GB.

Explainable AI under contract and tort law (2020)

Hacker, Philipp ; Krestel, Ralf ; Grundmann, Stefan ; Naumann, Felix

This paper shows that the law, in subtle ways, may set hitherto unrecognized incentives for the adoption of explainable machine learning applications. In doing so, we make two novel contributions. First, on the legal side, we show that to avoid liability, professional actors, such as doctors and managers, may soon be legally compelled to use explainable ML models. We argue that the importance of explainability reaches far beyond data protection law, and crucially influences questions of contractual and tort liability for the use of ML models. To this effect, we conduct two legal case studies, in medical and corporate merger applications of ML. As a second contribution, we discuss the (legally required) trade-off between accuracy and explainability and demonstrate the effect in a technical case study in the context of spam classification.

Destructiveness of lexicographic parsimony pressure and alleviation by a concatenation crossover in genetic programming (2020)

Kötzing, Timo ; Lagodzinski, Gregor J. A. ; Lengler, Johannes ; Melnichenko, Anna

For theoretical analyses there are two specifics distinguishing GP from many other areas of evolutionary computation: the variable size representations, in particular yielding a possible bloat (i.e. the growth of individuals with redundant parts); and also the role and the realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover had surprisingly little share in this work. We analyze a simple crossover operator in combination with randomized local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); we denote the resulting algorithm Concatenation Crossover GP. We consider three variants of the well-studied MAJORITY test function, adding large plateaus in different ways to the fitness landscape and thus giving a test bed for analyzing the interplay of variation operators and bloat control mechanisms in a setting with local optima. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants independent of employing bloat control. (C) 2019 Elsevier B.V. All rights reserved.

Bridge damage (2020)

Isailović, Dušan ; Stojanovic, Vladeta ; Trapp, Matthias ; Richter, Rico ; Hajdin, Rade ; Döllner, Jürgen Roland Friedrich

Building Information Modeling (BIM) representations of bridges enriched by inspection data will add tremendous value to future Bridge Management Systems (BMSs). This paper presents an approach for point cloud-based detection of spalling damage, as well as integrating damage components into a BIM via semantic enrichment of an as-built Industry Foundation Classes (IFC) model. An approach for generating the as-built BIM, geometric reconstruction of detected damage point clusters and semantic-enrichment of the corresponding IFC model is presented. Multiview-classification is used and evaluated for the detection of spalling damage features. The semantic enrichment of as-built IFC models is based on injecting classified and reconstructed damage clusters back into the as-built IFC, thus generating an accurate as-is IFC model compliant to the BMS inspection requirements.

Preface to the special issue on the 11th International Conference on Graph Transformation (2020)

Lambers, Leen ; Weber, Jens

This special issue contains extended versions of four selected papers from the 11th International Conference on Graph Transformation (ICGT 2018). The articles cover a tool for computing core graphs via SAT/SMT solvers (graph language definition), graph transformation through graph surfing in reaction systems (a new graph transformation formalism), the essence and initiality of conflicts in M-adhesive transformation systems, and a calculus of concurrent graph-rewriting processes (theory on conflicts and parallel independence).

Mary, Hugo, and Hugo* (2020)

Thamsen, Lauritz ; Beilharz, Jossekin Jakob ; Vinh Thuy Tran ; Nedelkoski, Sasho ; Kao, Odej

Distributed data-parallel processing systems like MapReduce, Spark, and Flink are popular for analyzing large datasets using cluster resources. Resource management systems like YARN or Mesos in turn allow multiple data-parallel processing jobs to share cluster resources in temporary containers. Often, the containers do not isolate resource usage to achieve high degrees of overall resource utilization despite overprovisioning and the often fluctuating utilization of specific jobs. However, some combinations of jobs utilize resources better and interfere less with each other when running on the same shared nodes than others. This article presents an approach for improving the resource utilization and job throughput when scheduling recurring distributed data-parallel processing jobs in shared clusters. The approach is based on reinforcement learning and a measure of co-location goodness to have cluster schedulers learn over time which jobs are best executed together on shared resources. We evaluated this approach over the last years with three prototype schedulers that build on each other: Mary, Hugo, and Hugo*. For the evaluation we used exemplary Flink and Spark jobs from different application domains and clusters of commodity nodes managed by YARN. The results of these experiments show that our approach can increase resource utilization and job throughput significantly.

Lower bounds on the run time of the Univariate Marginal Distribution Algorithm on OneMax (2020)

Krejca, Martin S. ; Witt, Carsten

The Univariate Marginal Distribution Algorithm (UMDA) - a popular estimation-of-distribution algorithm - is studied from a run time perspective. On the classical OneMax benchmark function on bit strings of length n, a lower bound of Omega(lambda + mu root n + n logn), where mu and lambda are algorithm-specific parameters, on its expected run time is proved. This is the first direct lower bound on the run time of UMDA. It is stronger than the bounds that follow from general black-box complexity theory and is matched by the run time of many evolutionary algorithms. The results are obtained through advanced analyses of the stochastic change of the frequencies of bit values maintained by the algorithm, including carefully designed potential functions. These techniques may prove useful in advancing the field of run time analysis for estimation-of-distribution algorithms in general.

Hitting set enumeration with partial information for unique column combination discovery (2020)

Birnick, Johann ; Bläsius, Thomas ; Friedrich, Tobias ; Naumann, Felix ; Papenbrock, Thorsten ; Schirneck, Friedrich Martin

Unique column combinations (UCCs) are a fundamental concept in relational databases. They identify entities in the data and support various data management activities. Still, UCCs are usually not explicitly defined and need to be discovered. State-of-the-art data profiling algorithms are able to efficiently discover UCCs in moderately sized datasets, but they tend to fail on large and, in particular, on wide datasets due to run time and memory limitations. In this paper, we introduce HPIValid, a novel UCC discovery algorithm that implements a faster and more resource-saving search strategy. HPIValid models the metadata discovery as a hitting set enumeration problem in hypergraphs. In this way, it combines efficient discovery techniques from data profiling research with the most recent theoretical insights into enumeration algorithms. Our evaluation shows that HPIValid is not only orders of magnitude faster than related work, it also has a much smaller memory footprint.

On the complexity of the smallest grammar problem over fixed alphabets (2020)

Casel, Katrin ; Fernau, Henning ; Gaspers, Serge ; Gras, Benjamin ; Schmid, Markus L.

In the smallest grammar problem, we are given a word w and we want to compute a preferably small context-free grammar G for the singleton language {w} (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is NP-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless P = NP. We settle the long-standing open problem whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains NP-complete (and its optimisation version is APX-hard), even if the alphabet is fixed and has size of at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size-measure of grammars that is commonly used in the literature (i. e., a size measure also taking the number of rules into account), and it also allows to conclude that even computing the number of rules required by a smallest grammar is a hard problem. On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields W[1]-hardness. Furthermore, we present an O(3(vertical bar w vertical bar)) exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i. e., grammars for which only the start rule contains nonterminals on the right side; thus, investigating the impact of the "hierarchical depth" of grammars on the complexity of the smallest grammar problem. In this regard, we obtain for 1-level grammars similar, but slightly stronger results.

Human centered AI design for clinical monitoring and data management (2020)

Adnan, Hassan Sami ; Matthews, Sam ; Hackl, M. ; Das, P. P. ; Manaswini, Manisha ; Gadamsetti, S. ; Filali, Maroua ; Owoyele, Babajide ; Santuber, Joaquín ; Edelman, Jonathan

In clinical settings, significant resources are spent on data collection and monitoring patients' health parameters to improve decision-making and provide better care. With increased digitization, the healthcare sector is shifting towards implementing digital technologies for data management and in administration. New technologies offer better treatment opportunities and streamline clinical workflow, but the complexity can cause ineffectiveness, frustration, and errors. To address this, we believe digital solutions alone are not sufficient. Therefore, we take a human-centred design approach for AI development, and apply systems engineering methods to identify system leverage points. We demonstrate how automation enables monitoring clinical parameters, using existing non-intrusive sensor technology, resulting in more resources toward patient care. Furthermore, we provide a framework on digitization of clinical data for integration with data management.

Mobile phone ownership and willingness to receive mHealth services among patients with diabetes mellitus in South-West, Nigeria (2020)

Olamoyegun, Michael Adeyemi ; Raimi, Taiwo Hassan ; Ala, Oluwabukola Ayodele ; Fadare, Joseph Olusesan

Introduction: mobile phone technology is increasingly used to overcome traditional barriers to limiting access to diabetes care. This study evaluated mobile phone ownership and willingness to receive and pay for mobile phone-based diabetic services among people with diabetes in South-West, Nigeria. Methods: two hundred and fifty nine patients with diabetes were consecutively recruited from three tertiary health institutions in South-West, Nigeria. Questionnaire was used to evaluate mobile phone ownership, willingness to receive and pay for mobile phone-based diabetic health care services via voice call and text messaging. Results: 97.3% owned a mobile phone, with 38.9% and 61.1% owning smartphone and basic phone respectively. Males were significantly more willing to receive mobile-phone-based health services than females (81.1% vs 68.1%, p=0.025), likewise married compared to unmarried [77.4% vs 57.1%, p=0.0361. Voice calls (41.3%) and text messages (32.4%), were the most preferred modes of receiving diabetes-related health education with social media (3.1%) and email (1.5%) least. Almost three-quarter of participants (72.6%) who owned mobile phone, were willing to receive mobile phone-based diabetes health services. The educational status of patients (adjusted OR [AORJ: 1.7(95% CI: 1.6 to 2.11), glucometers possession (ACM: 2.0 [95% CI: 1.9 to 2.1) and type of mobile phone owned (AOR: 2.9 [95% CI: 2.8 to 5.0]) were significantly associated with the willingness to receive mobile phone-based diabetic services. Conclusion: the majority of study participants owned mobile phones and would be willing to receive and pay for diabetes-related healthcare delivery services provided the cost is minimal and affordable.

Cyber threat intelligence: A product without a process? (2020)

Oosthoek, Kris ; Doerr, Christian

Purely attention based local feature integration for video classification (2020)

Long, Xiang ; de Melo, Gerard ; He, Dongliang ; Li, Fu ; Chi, Zhizhen ; Wen, Shilei ; Gan, Chuang

Recently, substantial research effort has focused on how to apply CNNs or RNNs to better capture temporal patterns in videos, so as to improve the accuracy of video classification. In this paper, we investigate the potential of a purely attention based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the output of multiple attention units applied in parallel, and introduce a shifting operation to capture more diverse signals. Experiments show that BAC can achieve excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for achieving a finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring the sequential relationships. To improve over BAC, we further propose the channel pyramid attention schema by splitting features into sub-features at multiple scales for coarse-to-fine sub-feature interaction modeling, and propose the temporal pyramid attention schema by dividing the feature sequences into ordered sub-sequences of multiple lengths to account for the sequential order. Our final model pyramidxpyramid attention clusters (PPAC) combines both channel pyramid attention and temporal pyramid attention to focus on the most important sub-features, while also preserving the temporal information of the video. We demonstrate the effectiveness of PPAC on seven real-world video classification datasets. Our model achieves competitive results across all of these, showing that our proposed framework can consistently outperform the existing local feature integration methods across a range of different scenarios.

The impact of lexicographic parsimony pressure for ORDER/MAJORITY on the run time (2020)

Doerr, Benjamin ; Kötzing, Timo ; Lagodzinski, Gregor J. A. ; Lengler, Johannes

While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat, that is, the unnecessary growth of solution lengths, which may slow down the optimization process. So far, the mathematical runtime analysis could not deal well with bloat and required explicit assumptions limiting bloat. In this paper, we provide the first mathematical runtime analysis of a GP algorithm that does not require any assumptions on the bloat. Previous performance guarantees were only proven conditionally for runs in which no strong bloat occurs. Together with improved analyses for the case with bloat restrictions our results show that such assumptions on the bloat are not necessary and that the algorithm is efficient without explicit bloat control mechanism. More specifically, we analyzed the performance of the (1 + 1) GP on the two benchmark functions ORDER and MAJORITY. When using lexicographic parsimony pressure as bloat control, we show a tight runtime estimate of O(T-init + nlogn) iterations both for ORDER and MAJORITY. For the case without bloat control, the bounds O(T-init logT(i)(nit) + n(logn)(3)) and Omega(T-init + nlogn) (and Omega(T-init log T-init) for n = 1) hold for MAJORITY(1).

Significance-based estimation-of-distribution algorithms (2020)

Doerr, Benjamin ; Krejca, Martin S.

Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that create a probabilistic model of the solution space, which is updated iteratively, based on the quality of the solutions sampled according to the model. As previous works show, this iteration-based perspective can lead to erratic updates of the model, in particular, to bit-frequencies approaching a random boundary value. In order to overcome this problem, we propose a new EDA based on the classic compact genetic algorithm (cGA) that takes into account a longer history of samples and updates its model only with respect to information which it classifies as statistically significant. We prove that this significance-based cGA (sig-cGA) optimizes the commonly regarded benchmark functions OneMax (OM), LeadingOnes, and BinVal all in quasilinear time, a result shown for no other EDA or evolutionary algorithm so far. For the recently proposed stable compact genetic algorithm-an EDA that tries to prevent erratic model updates by imposing a bias to the uniformly distributed model-we prove that it optimizes OM only in a time exponential in its hypothetical population size. Similarly, we show that the convex search algorithm cannot optimize OM in polynomial time.

Using AI for mental health analysis and prediction in school surveys (2020)

Adnan, Hassan Sami ; Srsic, Amanda ; Venticich, Pete Milos ; Townend, David M.R.

Background: Childhood and adolescence are critical stages of life for mental health and well-being. Schools are a key setting for mental health promotion and illness prevention. One in five children and adolescents have a mental disorder, about half of mental disorders beginning before the age of 14. Beneficial and explainable artificial intelligence can replace current paper- based and online approaches to school mental health surveys. This can enhance data acquisition, interoperability, data driven analysis, trust and compliance. This paper presents a model for using chatbots for non-obtrusive data collection and supervised machine learning models for data analysis; and discusses ethical considerations pertaining to the use of these models. Methods: For data acquisition, the proposed model uses chatbots which interact with students. The conversation log acts as the source of raw data for the machine learning. Pre-processing of the data is automated by filtering for keywords and phrases. Existing survey results, obtained through current paper-based data collection methods, are evaluated by domain experts (health professionals). These can be used to create a test dataset to validate the machine learning models. Supervised learning can then be deployed to classify specific behaviour and mental health patterns. Results: We present a model that can be used to improve upon current paper-based data collection and manual data analysis methods. An open-source GitHub repository contains necessary tools and components of this model. Privacy is respected through rigorous observance of confidentiality and data protection requirements. Critical reflection on these ethics and law aspects is included in the project. Conclusions: This model strengthens mental health surveillance in schools. The same tools and components could be applied to other public health data. Future extensions of this model could also incorporate unsupervised learning to find clusters and patterns of unknown effects.

Improving scalability and reward of utility-driven self-healing for large dynamic architectures (2020)

Ghahremani, Sona ; Giese, Holger ; Vogel, Thomas

Self-adaptation can be realized in various ways. Rule-based approaches prescribe the adaptation to be executed if the system or environment satisfies certain conditions. They result in scalable solutions but often with merely satisfying adaptation decisions. In contrast, utility-driven approaches determine optimal decisions by using an often costly optimization, which typically does not scale for large problems. We propose a rule-based and utility-driven adaptation scheme that achieves the benefits of both directions such that the adaptation decisions are optimal, whereas the computation scales by avoiding an expensive optimization. We use this adaptation scheme for architecture-based self-healing of large software systems. For this purpose, we define the utility for large dynamic architectures of such systems based on patterns that define issues the self-healing must address. Moreover, we use pattern-based adaptation rules to resolve these issues. Using a pattern-based scheme to define the utility and adaptation rules allows us to compute the impact of each rule application on the overall utility and to realize an incremental and efficient utility-driven self-healing. In addition to formally analyzing the computational effort and optimality of the proposed scheme, we thoroughly demonstrate its scalability and optimality in terms of reward in comparative experiments with a static rule-based approach as a baseline and a utility-driven approach using a constraint solver. These experiments are based on different failure profiles derived from real-world failure logs. We also investigate the impact of different failure profile characteristics on the scalability and reward to evaluate the robustness of the different approaches.

Evaluation of self-healing systems (2020)

Ghahremani, Sona ; Giese, Holger

Evaluating the performance of self-adaptive systems is challenging due to their interactions with often highly dynamic environments. In the specific case of self-healing systems, the performance evaluations of self-healing approaches and their parameter tuning rely on the considered characteristics of failure occurrences and the resulting interactions with the self-healing actions. In this paper, we first study the state-of-the-art for evaluating the performances of self-healing systems by means of a systematic literature review. We provide a classification of different input types for such systems and analyse the limitations of each input type. A main finding is that the employed inputs are often not sophisticated regarding the considered characteristics for failure occurrences. To further study the impact of the identified limitations, we present experiments demonstrating that wrong assumptions regarding the characteristics of the failure occurrences can result in large performance prediction errors, disadvantageous design-time decisions concerning the selection of alternative self-healing approaches, and disadvantageous deployment-time decisions concerning parameter tuning. Furthermore, the experiments indicate that employing multiple alternative input characteristics can help with reducing the risk of premature disadvantageous design-time decisions.

On discovering and incrementally updating inclusion dependencies (2020)

Shaabani, Nuhad

In today's world, many applications produce large amounts of data at an enormous rate. Analyzing such datasets for metadata is indispensable for effectively understanding, storing, querying, manipulating, and mining them. Metadata summarizes technical properties of a dataset which rang from basic statistics to complex structures describing data dependencies. One type of dependencies is inclusion dependency (IND), which expresses subset-relationships between attributes of datasets. Therefore, inclusion dependencies are important for many data management applications in terms of data integration, query optimization, schema redesign, or integrity checking. So, the discovery of inclusion dependencies in unknown or legacy datasets is at the core of any data profiling effort. For exhaustively detecting all INDs in large datasets, we developed S-indd++, a new algorithm that eliminates the shortcomings of existing IND-detection algorithms and significantly outperforms them. S-indd++ is based on a novel concept for the attribute clustering for efficiently deriving INDs. Inferring INDs from our attribute clustering eliminates all redundant operations caused by other algorithms. S-indd++ is also based on a novel partitioning strategy that enables discording a large number of candidates in early phases of the discovering process. Moreover, S-indd++ does not require to fit a partition into the main memory--this is a highly appreciable property in the face of ever-growing datasets. S-indd++ reduces up to 50% of the runtime of the state-of-the-art approach. None of the approach for discovering INDs is appropriate for the application on dynamic datasets; they can not update the INDs after an update of the dataset without reprocessing it entirely. To this end, we developed the first approach for incrementally updating INDs in frequently changing datasets. We achieved that by reducing the problem of incrementally updating INDs to the incrementally updating the attribute clustering from which all INDs are efficiently derivable. We realized the update of the clusters by designing new operations to be applied to the clusters after every data update. The incremental update of INDs reduces the time of the complete rediscovery by up to 99.999%. All existing algorithms for discovering n-ary INDs are based on the principle of candidate generation--they generate candidates and test their validity in the given data instance. The major disadvantage of this technique is the exponentially growing number of database accesses in terms of SQL queries required for validation. We devised Mind2, the first approach for discovering n-ary INDs without candidate generation. Mind2 is based on a new mathematical framework developed in this thesis for computing the maximum INDs from which all other n-ary INDs are derivable. The experiments showed that Mind2 is significantly more scalable and effective than hypergraph-based algorithms.

CloudStrike (2020)

Torkura, Kennedy A. ; Sukmana, Muhammad Ihsan Haikal ; Cheng, Feng ; Meinel, Christoph

Most cyber-attacks and data breaches in cloud infrastructure are due to human errors and misconfiguration vulnerabilities. Cloud customer-centric tools are imperative for mitigating these issues, however existing cloud security models are largely unable to tackle these security challenges. Therefore, novel security mechanisms are imperative, we propose Risk-driven Fault Injection (RDFI) techniques to address these challenges. RDFI applies the principles of chaos engineering to cloud security and leverages feedback loops to execute, monitor, analyze and plan security fault injection campaigns, based on a knowledge-base. The knowledge-base consists of fault models designed from secure baselines, cloud security best practices and observations derived during iterative fault injection campaigns. These observations are helpful for identifying vulnerabilities while verifying the correctness of security attributes (integrity, confidentiality and availability). Furthermore, RDFI proactively supports risk analysis and security hardening efforts by sharing security information with security mechanisms. We have designed and implemented the RDFI strategies including various chaos engineering algorithms as a software tool: CloudStrike. Several evaluations have been conducted with CloudStrike against infrastructure deployed on two major public cloud infrastructure: Amazon Web Services and Google Cloud Platform. The time performance linearly increases, proportional to increasing attack rates. Also, the analysis of vulnerabilities detected via security fault injection has been used to harden the security of cloud resources to demonstrate the effectiveness of the security information provided by CloudStrike. Therefore, we opine that our approaches are suitable for overcoming contemporary cloud security issues.

Generative multi-adversarial network for striking the right balance in abdominal image segmentation (2020)

Rezaei, Mina ; Näppi, Janne J. ; Lippert, Christoph ; Meinel, Christoph ; Yoshida, Hiroyuki

Purpose: The identification of abnormalities that are relatively rare within otherwise normal anatomy is a major challenge for deep learning in the semantic segmentation of medical images. The small number of samples of the minority classes in the training data makes the learning of optimal classification challenging, while the more frequently occurring samples of the majority class hamper the generalization of the classification boundary between infrequently occurring target objects and classes. In this paper, we developed a novel generative multi-adversarial network, called Ensemble-GAN, for mitigating this class imbalance problem in the semantic segmentation of abdominal images. Method: The Ensemble-GAN framework is composed of a single-generator and a multi-discriminator variant for handling the class imbalance problem to provide a better generalization than existing approaches. The ensemble model aggregates the estimates of multiple models by training from different initializations and losses from various subsets of the training data. The single generator network analyzes the input image as a condition to predict a corresponding semantic segmentation image by use of feedback from the ensemble of discriminator networks. To evaluate the framework, we trained our framework on two public datasets, with different imbalance ratios and imaging modalities: the Chaos 2019 and the LiTS 2017. Result: In terms of the F1 score, the accuracies of the semantic segmentation of healthy spleen, liver, and left and right kidneys were 0.93, 0.96, 0.90 and 0.94, respectively. The overall F1 scores for simultaneous segmentation of the lesions and liver were 0.83 and 0.94, respectively. Conclusion: The proposed Ensemble-GAN framework demonstrated outstanding performance in the semantic segmentation of medical images in comparison with other approaches on popular abdominal imaging benchmarks. The Ensemble-GAN has the potential to segment abdominal images more accurately than human experts.

What are we talking about when we talk about technology pivots? (2020)

Bohn, Nicolai ; Kundisch, Dennis

Technology pivots were designed to help digital startups make adjustments to the technology underpinning their products and services. While academia and the media make liberal use of the term "technology pivot," they rarely align themselves to Ries' foundational conceptualization. Recent research suggests that a more granulated conceptualization of technology pivots is required. To scientifically derive a comprehensive conceptualization, we conduct a Delphi study with a panel of 38 experts drawn from academia and practice to explore their understanding of "technology pivots." Our study thus makes an important contribution to advance the seminal work by Ries on technology pivots.

Affect-aware word clouds (2020)

Kulahcioglu, Tugba ; Melo, Gerard de

Word clouds are widely used for non-analytic purposes, such as introducing a topic to students, or creating a gift with personally meaningful text. Surveys show that users prefer tools that yield word clouds with a stronger emotional impact. Fonts and color palettes are powerful typographical signals that may determine this impact. Typically, these signals are assigned randomly, or expected to be chosen by the users. We present an affect-aware font and color palette selection methodology that aims to facilitate more informed choices. We infer associations of fonts with a set of eight affects, and evaluate the resulting data in a series of user studies both on individual words as well as in word clouds. Relying on a recent study to procure affective color palettes, we carry out a similar user study to understand the impact of color choices on word clouds. Our findings suggest that both fonts and color palettes are powerful tools contributing to the affects evoked by a word cloud. The experiments further confirm that the novel datasets we propose are successful in enabling this. We also find that, for the majority of the affects, both signals need to be congruent to create a stronger impact. Based on this data, we implement a prototype that allows users to specify a desired affect and recommends congruent fonts and color palettes for the word.

Refine

Has Fulltext

Author

Year of publication

Document Type

Language

Is part of the Bibliography

Keywords

Institute

366 search hits