Hasso-Plattner-Institut für Digital Engineering GmbH
Filtern
Volltext vorhanden
- nein (196) (entfernen)
Erscheinungsjahr
Dokumenttyp
- Wissenschaftlicher Artikel (103)
- Sonstiges (83)
- Dissertation (8)
- Teil eines Buches (Kapitel) (1)
- Habilitation (1)
Gehört zur Bibliographie
- ja (196)
Schlagworte
- E-Learning (4)
- MOOC (4)
- Scrum (4)
- BPMN (3)
- Business process models (3)
- DMN (3)
- Dynamic pricing (3)
- Security Metrics (3)
- Security Risk Assessment (3)
- Teamwork (3)
Local laws on urban policy, i.e., ordinances directly affect our daily life in various ways (health, business etc.), yet in practice, for many citizens they remain impervious and complex. This article focuses on an approach to make urban policy more accessible and comprehensible to the general public and to government officials, while also addressing pertinent social media postings. Due to the intricacies of the natural language, ranging from complex legalese in ordinances to informal lingo in tweets, it is practical to harness human judgment here. To this end, we mine ordinances and tweets via reasoning based on commonsense knowledge so as to better account for pragmatics and semantics in the text. Ours is pioneering work in ordinance mining, and thus there is no prior labeled training data available for learning. This gap is filled by commonsense knowledge, a prudent choice in situations involving a lack of adequate training data. The ordinance mining can be beneficial to the public in fathoming policies and to officials in assessing policy effectiveness based on public reactions. This work contributes to smart governance, leveraging transparency in governing processes via public involvement. We focus significantly on ordinances contributing to smart cities, hence an important goal is to assess how well an urban region heads towards a smart city as per its policies mapping with smart city characteristics, and the corresponding public satisfaction.
VLDB 2021
(2021)
The 47th International Conference on Very Large Databases (VLDB'21) was held on August 16-20, 2021 as a hybrid conference. It attracted 180 in-person attendees in Copenhagen and 840 remote attendees. In this paper, we describe our key decisions as general chairs and program committee chairs and share the lessons we learned.
About 15 years ago, the first Massive Open Online Courses (MOOCs) appeared and revolutionized online education with more interactive and engaging course designs. Yet, keeping learners motivated and ensuring high satisfaction is one of the challenges today's course designers face. Therefore, many MOOC providers employed gamification elements that only boost extrinsic motivation briefly and are limited to platform support. In this article, we introduce and evaluate a gameful learning design we used in several iterations on computer science education courses. For each of the courses on the fundamentals of the Java programming language, we developed a self-contained, continuous story that accompanies learners through their learning journey and helps visualize key concepts. Furthermore, we share our approach to creating the surrounding story in our MOOCs and provide a guideline for educators to develop their own stories. Our data and the long-term evaluation spanning over four Java courses between 2017 and 2021 indicates the openness of learners toward storified programming courses in general and highlights those elements that had the highest impact. While only a few learners did not like the story at all, most learners consumed the additional story elements we provided. However, learners' interest in influencing the story through majority voting was negligible and did not show a considerable positive impact, so we continued with a fixed story instead. We did not find evidence that learners just participated in the narrative because they worked on all materials. Instead, for 10-16% of learners, the story was their main course motivation. We also investigated differences in the presentation format and concluded that several longer audio-book style videos were most preferred by learners in comparison to animated videos or different textual formats. Surprisingly, the availability of a coherent story embedding examples and providing a context for the practical programming exercises also led to a slightly higher ranking in the perceived quality of the learning material (by 4%). With our research in the context of storified MOOCs, we advance gameful learning designs, foster learner engagement and satisfaction in online courses, and help educators ease knowledge transfer for their learners.
The transversal hypergraph problem asks to enumerate the minimal hitting sets of a hypergraph. If the solutions have bounded size, Eiter and Gottlob [SICOMP'95] gave an algorithm running in output-polynomial time, but whose space requirement also scales with the output. We improve this to polynomial delay and space. Central to our approach is the extension problem, deciding for a set X of vertices whether it is contained in any minimal hitting set. We show that this is one of the first natural problems to be W[3]-complete. We give an algorithm for the extension problem running in time O(m(vertical bar X vertical bar+1) n) and prove a SETH-lower bound showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases.
Circular economy
(2021)
In a circular economy, the use of recycled resources in production is a key performance indicator for management. Yet, academic studies are still unable to inform managers on appropriate recycling and pricing policies. We develop an optimal control model integrating a firm's recycling rate, which can use both virgin and recycled resources in the production process. Our model accounts for recycling influence both at the supply- and demandsides. The positive effect of a firm's use of recycled resources diminishes over time but may increase through investments. Using general formulations for demand and cost, we analytically examine joint dynamic pricing and recycling investment policies in order to determine their optimal interplay over time. We provide numerical experiments to assess the existence of a steady-state and to calculate sensitivity analyses with respect to various model parameters. The analysis shows how to dynamically adapt jointly optimized controls to reach sustainability in the production process. Our results pave the way to sounder sustainable practices for firms operating within a circular economy.
Design thinking is a well-established practical and educational approach to fostering high-level creativity and innovation, which has been refined since the 1950s with the participation of experts like Joy Paul Guilford and Abraham Maslow. Through real-world projects, trainees learn to optimize their creative outcomes by developing and practicing creative cognition and metacognition. This paper provides a holistic perspective on creativity, enabling the formulation of a comprehensive theoretical framework of creative metacognition. It focuses on the design thinking approach to creativity and explores the role of metacognition in four areas of creativity expertise: Products, Processes, People, and Places. The analysis includes task-outcome relationships (product metacognition), the monitoring of strategy effectiveness (process metacognition), an understanding of individual or group strengths and weaknesses (people metacognition), and an examination of the mutual impact between environments and creativity (place metacognition). It also reviews measures taken in design thinking education, including a distribution of cognition and metacognition, to support students in their development of creative mastery. On these grounds, we propose extended methods for measuring creative metacognition with the goal of enhancing comprehensive assessments of the phenomenon. Proposed methodological advancements include accuracy sub-scales, experimental tasks where examinees explore problem and solution spaces, combinations of naturalistic observations with capability testing, as well as physiological assessments as indirect measures of creative metacognition.
CrashNet
(2021)
Destructive car crash tests are an elaborate, time-consuming, and expensive necessity of the automotive development process. Today, finite element method (FEM) simulations are used to reduce costs by simulating car crashes computationally. We propose CrashNet, an encoder-decoder deep neural network architecture that reduces costs further and models specific outcomes of car crashes very accurately. We achieve this by formulating car crash events as time series prediction enriched with a set of scalar features. Traditional sequence-to-sequence models are usually composed of convolutional neural network (CNN) and CNN transpose layers. We propose to concatenate those with an MLP capable of learning how to inject the given scalars into the output time series. In addition, we replace the CNN transpose with 2D CNN transpose layers in order to force the model to process the hidden state of the set of scalars as one time series. The proposed CrashNet model can be trained efficiently and is able to process scalars and time series as input in order to infer the results of crash tests. CrashNet produces results faster and at a lower cost compared to destructive tests and FEM simulations. Moreover, it represents a novel approach in the car safety management domain.
Invention
(2023)
This entry addresses invention from five different perspectives: (i) definition of the term, (ii) mechanisms underlying invention processes, (iii) (pre-)history of human inventions, (iv) intellectual property protection vs open innovation, and (v) case studies of great inventors. Regarding the definition, an invention is the outcome of a creative process taking place within a technological milieu, which is recognized as successful in terms of its effectiveness as an original technology. In the process of invention, a technological possibility becomes realized. Inventions are distinct from either discovery or innovation. In human creative processes, seven mechanisms of invention can be observed, yielding characteristic outcomes: (1) basic inventions, (2) invention branches, (3) invention combinations, (4) invention toolkits, (5) invention exaptations, (6) invention values, and (7) game-changing inventions. The development of humanity has been strongly shaped by inventions ever since early stone tools and the conception of agriculture. An “explosion of creativity” has been associated with Homo sapiens, and inventions in all fields of human endeavor have followed suit, engendering an exponential growth of cumulative culture. This culture development emerges essentially through a reuse of previous inventions, their revision, amendment and rededication. In sociocultural terms, humans have increasingly regulated processes of invention and invention-reuse through concepts such as intellectual property, patents, open innovation and licensing methods. Finally, three case studies of great inventors are considered: Edison, Marconi, and Montessori, next to a discussion of human invention processes as collaborative endeavors.
In liquid-chromatography-tandem-mass-spectrometry-based proteomics, information about the presence and stoichiometry ofprotein modifications is not readily available. To overcome this problem,we developed multiFLEX-LF, a computational tool that builds uponFLEXIQuant, which detects modified peptide precursors and quantifiestheir modification extent by monitoring the differences between observedand expected intensities of the unmodified precursors. multiFLEX-LFrelies on robust linear regression to calculate the modification extent of agiven precursor relative to a within-study reference. multiFLEX-LF cananalyze entire label-free discovery proteomics data sets in a precursor-centric manner without preselecting a protein of interest. To analyzemodification dynamics and coregulated modifications, we hierarchicallyclustered the precursors of all proteins based on their computed relativemodification scores. We applied multiFLEX-LF to a data-independent-acquisition-based data set acquired using the anaphase-promoting complex/cyclosome (APC/C) isolated at various time pointsduring mitosis. The clustering of the precursors allows for identifying varying modification dynamics and ordering the modificationevents. Overall, multiFLEX-LF enables the fast identification of potentially differentially modified peptide precursors and thequantification of their differential modification extent in large data sets using a personal computer. Additionally, multiFLEX-LF candrive the large-scale investigation of the modification dynamics of peptide precursors in time-series and case-control studies.multiFLEX-LF is available athttps://gitlab.com/SteenOmicsLab/multiflex-lf.
CovRadar
(2022)
The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousand sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast.
Omics and male infertility
(2022)
Male infertility is a multifaceted disorder affecting approximately 50% of male partners in infertile couples.
Over the years, male infertility has been diagnosed mainly through semen analysis, hormone evaluations, medical records and physical examinations, which of course are fundamental, but yet inefficient, because 30% of male infertility cases remain idiopathic. This dilemmatic status of the unknown needs to be addressed with more sophisticated and result-driven technologies and/or techniques.
Genetic alterations have been linked with male infertility, thereby unveiling the practicality of investigating this disorder from the "omics" perspective.
Omics aims at analyzing the structure and functions of a whole constituent of a given biological function at different levels, including the molecular gene level (genomics), transcript level (transcriptomics), protein level (proteomics) and metabolites level (metabolomics). In the current study, an overview of the four branches of omics and their roles in male infertility are briefly discussed; the potential usefulness of assessing transcriptomic data to understand this pathology is also elucidated.
After assessing the publicly obtainable transcriptomic data for datasets on male infertility, a total of 1385 datasets were retrieved, of which 10 datasets met the inclusion criteria and were used for further analysis.
These datasets were classified into groups according to the disease or cause of male infertility.
The groups include non-obstructive azoospermia (NOA), obstructive azoospermia (OA), non-obstructive and obstructive azoospermia (NOA and OA), spermatogenic dysfunction, sperm dysfunction, and Y chromosome microdeletion.
Findings revealed that 8 genes (LDHC, PDHA2, TNP1, TNP2, ODF1, ODF2, SPINK2, PCDHB3) were commonly differentially expressed between all disease groups.
Likewise, 56 genes were common between NOA versus NOA and OA (ADAD1, BANF2, BCL2L14, C12orf50, C20orf173, C22orf23, C6orf99, C9orf131, C9orf24, CABS1, CAPZA3, CCDC187, CCDC54, CDKN3, CEP170, CFAP206, CRISP2, CT83, CXorf65, FAM209A, FAM71F1, FAM81B, GALNTL5, GTSF1, H1FNT, HEMGN, HMGB4, KIF2B, LDHC, LOC441601, LYZL2, ODF1, ODF2, PCDHB3, PDHA2, PGK2, PIH1D2, PLCZ1, PROCA1, RIMBP3, ROPN1L, SHCBP1L, SMCP, SPATA16, SPATA19, SPINK2, TEX33, TKTL2, TMCO2, TMCO5A, TNP1, TNP2, TSPAN16, TSSK1B, TTLL2, UBQLN3).
These genes, particularly the above-mentioned 8 genes, are involved in diverse biological processes such as germ cell development, spermatid development, spermatid differentiation, regulation of proteolysis, spermatogenesis and metabolic processes.
Owing to the stage-specific expression of these genes, any mal-expression can ultimately lead to male infertility.
Therefore, currently available data on all branches of omics relating to male fertility can be used to identify biomarkers for diagnosing male infertility, which can potentially help in unravelling some idiopathic cases.
Peer assessment in MOOCs
(2021)
We report on a systematic review of the landscape of peer assessment in massive open online courses (MOOCs) with papers from 2014 to 2020 in 20 leading education technology publication venues across four databases containing education technology-related papers, addressing three research issues: the evolution of peer assessment in MOOCs during the period 2014 to 2020, the methods used in MOOCs to assess peers, and the challenges of and future directions in MOOC peer assessment. We provide summary statistics and a review of methods across the corpus and highlight three directions for improving the use of peer assessment in MOOCs: the need for focusing on scaling learning through peer evaluations, the need for scaling and optimizing team submissions in team peer assessments, and the need for embedding a social process for peer assessment.
A path in an edge-colored graph is rainbow if no two edges of it are colored the same, and the graph is rainbow-connected if there is a rainbow path between each pair of its vertices. The minimum number of colors needed to rainbow-connect a graph G is the rainbow connection number of G, denoted by rc(G).& nbsp;A simple way to rainbow-connect a graph G is to color the edges of a spanning tree with distinct colors and then re-use any of these colors to color the remaining edges of G. This proves that rc(G) <= |V (G)|-1. We ask whether there is a stronger connection between tree-like structures and rainbow coloring than that is implied by the above trivial argument. For instance, is it possible to find an upper bound of t(G)-1 for rc(G), where t(G) is the number of vertices in the largest induced tree of G? The answer turns out to be negative, as there are counter-examples that show that even c .t(G) is not an upper bound for rc(G) for any given constant c.& nbsp;In this work we show that if we consider the forest number f(G), the number of vertices in a maximum induced forest of G, instead of t(G), then surprisingly we do get an upper bound. More specifically, we prove that rc(G) <= f(G) + 2. Our result indicates a stronger connection between rainbow connection and tree-like structures than that was suggested by the simple spanning tree based upper bound.
Graphs play an important role in many areas of Computer Science. In particular, our work is motivated by model-driven software development and by graph databases. For this reason, it is very important to have the means to express and to reason about the properties that a given graph may satisfy. With this aim, in this paper we present a visual logic that allows us to describe graph properties, including navigational properties, i.e., properties about the paths in a graph. The logic is equipped with a deductive tableau method that we have proved to be sound and complete.
Industry 4.0 is transforming how businesses innovate and, as a result, companies are spearheading the movement towards 'Digital Transformation'. While some scholars advocate the use of design thinking to identify new innovative behaviours, cognition experts emphasise the importance of top managers in supporting employees to develop these behaviours. However, there is a dearth of research in this domain and companies are struggling to implement the required behaviours. To address this gap, this study aims to identify and prioritise behavioural strategies conducive to design thinking to inform the creation of a managerial mental model. We identify 20 behavioural strategies from 45 interviewees with practitioners and educators and combine them with the concepts of 'paradigm-mindset-mental model' from cognition theory. The paper contributes to the body of knowledge by identifying and prioritising specific behavioural strategies to form a novel set of survival conditions aligned to the new industrial paradigm of Industry 4.0.
As resources are valuable assets, organizations have to decide which resources to allocate to business process tasks in a way that the process is executed not only effectively but also efficiently. Traditional role-based resource allocation leads to effective process executions, since each task is performed by a resource that has the required skills and competencies to do so. However, the resulting allocations are typically not as efficient as they could be, since optimization techniques have yet to find their way in traditional business process management scenarios. On the other hand, operations research provides a rich set of analytical methods for supporting problem-specific decisions on resource allocation. This paper provides a novel framework for creating transparency on existing tasks and resources, supporting individualized allocations for each activity in a process, and the possibility to integrate problem-specific analytical methods of the operations research domain. To validate the framework, the paper reports on the design and prototypical implementation of a software architecture, which extends a traditional process engine with a dedicated resource management component. This component allows us to define specific resource allocation problems at design time, and it also facilitates optimized resource allocation at run time. The framework is evaluated using a real-world parcel delivery process. The evaluation shows that the quality of the allocation results increase significantly with a technique from operations research in contrast to the traditional applied rule-based approach.
In the field of Business Process Management (BPM), modeling business processes and related data is a critical issue since process activities need to manage data stored in databases. The connection between processes and data is usually handled at the implementation level, even if modeling both processes and data at the conceptual level should help designers in improving business process models and identifying requirements for implementation. Especially in data -and decision-intensive contexts, business process activities need to access data stored both in databases and data warehouses. In this paper, we complete our approach for defining a novel conceptual view that bridges process activities and data. The proposed approach allows the designer to model the connection between business processes and database models and define the operations to perform, providing interesting insights on the overall connected perspective and hints for identifying activities that are crucial for decision support.
Background:
More patient data are needed to improve research on rare liver diseases. Mobile health apps enable an exhaustive data collection. Therefore, the European Reference Network on Hepatological diseases (ERN RARE-LIVER) intends to implement an app for patients with rare liver diseases communicating with a patient registry, but little is known about which features patients and their healthcare providers regard as being useful.
Aims:
This study aimed to investigate how an app for rare liver diseases would be accepted, and to find out which features are considered useful.
Methods:
An anonymous survey was conducted on adult patients with rare liver diseases at a single academic, tertiary care outpatient-service. Additionally, medical experts of the ERN working group on autoimmune hepatitis were invited to participate in an online survey.
Results:
In total, the responses from 100 patients with autoimmune (n = 90) or other rare (n = 10) liver diseases and 32 experts were analyzed. Patients were convinced to use a disease specific app (80%) and expected some benefit to their health (78%) but responses differed signifi-cantly between younger and older patients (93% vs. 62%, p < 0.001; 88% vs. 64%, p < 0.01). Comparing patients' and experts' feedback, patients more often expected a simplified healthcare pathway (e.g. 89% vs. 59% (p < 0.001) wanted access to one's own medical records), while healthcare providers saw the benefit mainly in improving compliance and treatment outcome (e.g. 93% vs. 31% (p < 0.001) and 70% vs. 21% (p < 0.001) expected the app to reduce mistakes in taking medication and improve quality of life, respectively).
Process mining techniques are valuable to gain insights into and help improve (work) processes. Many of these techniques focus on the sequential order in which activities are performed. Few of these techniques consider the statistical relations within processes. In particular, existing techniques do not allow insights into how responses to an event (action) result in desired or undesired outcomes (effects). We propose and formalize the ARE miner, a novel technique that allows us to analyze and understand these action-response-effect patterns. We take a statistical approach to uncover potential dependency relations in these patterns. The goal of this research is to generate processes that are: (1) appropriately represented, and (2) effectively filtered to show meaningful relations. We evaluate the ARE miner in two ways. First, we use an artificial data set to demonstrate the effectiveness of the ARE miner compared to two traditional process-oriented approaches. Second, we apply the ARE miner to a real-world data set from a Dutch healthcare institution. We show that the ARE miner generates comprehensible representations that lead to informative insights into statistical relations between actions, responses, and effects.
Despite advances in machine learning-based clinical prediction models, only few of such models are actually deployed in clinical contexts. Among other reasons, this is due to a lack of validation studies. In this paper, we present and discuss the validation results of a machine learning model for the prediction of acute kidney injury in cardiac surgery patients initially developed on the MIMIC-III dataset when applied to an external cohort of an American research hospital. To help account for the performance differences observed, we utilized interpretability methods based on feature importance, which allowed experts to scrutinize model behavior both at the global and local level, making it possible to gain further insights into why it did not behave as expected on the validation cohort. The knowledge gleaned upon derivation can be potentially useful to assist model update during validation for more generalizable and simpler models. We argue that interpretability methods should be considered by practitioners as a further tool to help explain performance differences and inform model update in validation studies.
Anomaly detection in process mining aims to recognize outlying or unexpected behavior in event logs for purposes such as the removal of noise and identification of conformance violations. Existing techniques for this task are primarily frequency-based, arguing that behavior is anomalous because it is uncommon. However, such techniques ignore the semantics of recorded events and, therefore, do not take the meaning of potential anomalies into consideration. In this work, we overcome this caveat and focus on the detection of anomalies from a semantic perspective, arguing that anomalies can be recognized when process behavior does not make sense. To achieve this, we propose an approach that exploits the natural language associated with events. Our key idea is to detect anomalous process behavior by identifying semantically inconsistent execution patterns. To detect such patterns, we first automatically extract business objects and actions from the textual labels of events. We then compare these against a process-independent knowledge base. By populating this knowledge base with patterns from various kinds of resources, our approach can be used in a range of contexts and domains. We demonstrate the capability of our approach to successfully detect semantic execution anomalies through an evaluation based on a set of real-world and synthetic event logs and show the complementary nature of semantics-based anomaly detection to existing frequency-based techniques.
ReadBouncer
(2022)
Motivation:
Nanopore sequencers allow targeted sequencing of interesting nucleotide sequences by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundant sequences by depleting overrepresented ones in-silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space relying on fast graphical processing units (GPU) base callers for real-time read rejection. Using nanopore long-read mapping tools is also not optimal when mapping shorter reads as usually analyzed in adaptive sampling applications.
Results:
Here, we present a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low abundance sequences by its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes even reads belonging to large reference sequences while running on commodity hardware without GPUs, making adaptive sampling accessible for in-field researchers. Readbouncer also provides a user-friendly interface and installer files for end-users without a bioinformatics background.
Based on the performance requirements of modern spatio-temporal data mining applications, in-memory database systems are often used to store and process the data. To efficiently utilize the scarce DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes). However, the selection of cost and performance balancing configurations is challenging due to the vast number of possible setups consisting of mutually dependent individual decisions. In this paper, we introduce a novel approach to jointly optimize the compression, sorting, indexing, and tiering configuration for spatio-temporal workloads. Further, we consider horizontal data partitioning, which enables the independent application of different tuning options on a fine-grained level. We propose different linear programming (LP) models addressing cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload and memory budgets. To yield maintainable and robust configurations, we extend our LP-based approach to incorporate reconfiguration costs as well as a worst-case optimization for potential workload scenarios. Further, we demonstrate on a real-world dataset that our models allow to significantly reduce the memory footprint with equal performance or increase the performance with equal memory size compared to existing tuning heuristics.
Recent findings suggest a role of oxytocin on the tendency to spontaneously mimic the emotional facial expressions of others. Oxytocin-related increases of facial mimicry, however, seem to be dependent on contextual factors. Given previous literature showing that people preferentially mimic emotional expressions of individuals associated with high (vs. low) rewards, we examined whether the reward value of the mimicked agent is one factor influencing the oxytocin effects on facial mimicry. To test this hypothesis, 60 male adults received 24 IU of either intranasal oxytocin or placebo in a double-blind, between-subject experiment. Next, the value of male neutral faces was manipulated using an associative learning task with monetary rewards. After the reward associations were learned, participants watched videos of the same faces displaying happy and angry expressions. Facial reactions to the emotional expressions were measured with electromyography. We found that participants judged as more pleasant the face identities associated with high reward values than with low reward values. However, happy expressions by low rewarding faces were more spontaneously mimicked than high rewarding faces. Contrary to our expectations, we did not find a significant direct effect of intranasal oxytocin on facial mimicry, nor on the reward-driven modulation of mimicry. Our results support the notion that mimicry is a complex process that depends on contextual factors, but failed to provide conclusive evidence of a role of oxytocin on the modulation of facial mimicry.
Somatosensory input generated by one's actions (i.e., self-initiated body movements) is generally attenuated. Conversely, externally caused somatosensory input is enhanced, for example, during active touch and the haptic exploration of objects. Here, we used functional magnetic resonance imaging (fMRI) to ask how the brain accomplishes this delicate weighting of self-generated versus externally caused somatosensory components. Finger movements were either self-generated by our participants or induced by functional electrical stimulation (FES) of the same muscles. During half of the trials, electrotactile impulses were administered when the (actively or passively) moving finger reached a predefined flexion threshold. fMRI revealed an interaction effect in the contralateral posterior insular cortex (pIC), which responded more strongly to touch during self-generated than during FES-induced movements. A network analysis via dynamic causal modeling revealed that connectivity from the secondary somatosensory cortex via the pIC to the supplementary motor area was generally attenuated during self-generated relative to FES-induced movements-yet specifically enhanced by touch received during self-generated, but not FES-induced movements. Together, these results suggest a crucial role of the parietal operculum and the posterior insula in differentiating self-generated from externally caused somatosensory information received from one's moving limb.
Rigorous runtime analysis is a major approach towards understanding evolutionary computing techniques, and in this area linear pseudo-Boolean objective functions play a central role. Having an additional linear constraint is then equivalent to the NP-hard Knapsack problem, certain classes thereof have been studied in recent works. In this article, we present a dynamic model of optimizing linear functions under uniform constraints. Starting from an optimal solution with respect to a given constraint bound, we investigate the runtimes that different evolutionary algorithms need to recompute an optimal solution when the constraint bound changes by a certain amount. The classical (1+1) EA and several population-based algorithms are designed for that purpose, and are shown to recompute efficiently. Furthermore, a variant of the (1+(λ,λ))GA for the dynamic optimization problem is studied, whose performance is better when the change of the constraint bound is small.
ganon
(2020)
Motivation:
The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing amount of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers. Few methods address these issues thus far, and even though many can theoretically handle large amounts of references, time/memory requirements are prohibitive in practice. As a result, many studies that require sequence classification use often outdated and almost never truly up-to-date indices.
Results:
Motivated by those limitations, we created ganon, a k-mer-based read classification tool that uses Interleaved Bloom Filters in conjunction with a taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references, keeping them updated. It requires <55 min to index the complete RefSeq of bacteria, archaea, fungi and viruses. The tool can further keep these indices up-to-date in a fraction of the time necessary to create them. Ganon makes it possible to query against very large reference sets and therefore it classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices and hierarchical classification.
An independency (cliquy) tree of an n-vertex graph G is a spanning tree of G in which the set of leaves induces an independent set (clique). We study the problems of minimizing or maximizing the number of leaves of such trees, and fully characterize their parameterized complexity. We show that all four variants of deciding if an independency/cliquy tree with at least/most l leaves exists parameterized by l are either Para-NP- or W[1]-hard. We prove that minimizing the number of leaves of a cliquy tree parameterized by the number of internal vertices is Para-NP-hard too. However, we show that minimizing the number of leaves of an independency tree parameterized by the number k of internal vertices has an O*(4(k))-time algorithm and a 2k vertex kernel. Moreover, we prove that maximizing the number of leaves of an independency/cliquy tree parameterized by the number k of internal vertices both have an O*(18(k))-time algorithm and an O(k 2(k)) vertex kernel, but no polynomial kernel unless the polynomial hierarchy collapses to the third level. Finally, we present an O(3(n) . f(n))-time algorithm to find a spanning tree where the leaf set has a property that can be decided in f (n) time and has minimum or maximum size.
We present fully polynomial time approximation schemes for a broad class of Holant problems with complex edge weights, which we call Holant polynomials. We transform these problems into partition functions of abstract combinatorial structures known as polymers in statistical physics. Our method involves establishing zero-free regions for the partition functions of polymer models and using the most significant terms of the cluster expansion to approximate them. Results of our technique include new approximation and sampling algorithms for a diverse class of Holant polynomials in the low-temperature regime (i.e. small external field) and approximation algorithms for general Holant problems with small signature weights. Additionally, we give randomised approximation and sampling algorithms with faster running times for more restrictive classes. Finally, we improve the known zero-free regions for a perfect matching polynomial.
Patent document collections are an immense source of knowledge for research and innovation communities worldwide. The rapid growth of the number of patent documents poses an enormous challenge for retrieving and analyzing information from this source in an effective manner. Based on deep learning methods for natural language processing, novel approaches have been developed in the field of patent analysis. The goal of these approaches is to reduce costs by automating tasks that previously only domain experts could solve. In this article, we provide a comprehensive survey of the application of deep learning for patent analysis. We summarize the state-of-the-art techniques and describe how they are applied to various tasks in the patent domain. In a detailed discussion, we categorize 40 papers based on the dataset, the representation, and the deep learning architecture that were used, as well as the patent analysis task that was targeted. With our survey, we aim to foster future research at the intersection of patent analysis and deep learning and we conclude by listing promising paths for future work.
In discrete manufacturing, the knowledge about causal relationships makes it possible to avoid unforeseen production downtimes by identifying their root causes. Learning causal structures from real-world settings remains challenging due to high-dimensional data, a mix of discrete and continuous variables, and requirements for preprocessing log data under the causal perspective. In our work, we address these challenges proposing a process for causal reasoning based on raw machine log data from production monitoring. Within this process, we define a set of transformation rules to extract independent and identically distributed observations. Further, we incorporate a variable selection step to handle high-dimensionality and a discretization step to include continuous variables. We enrich a commonly used causal structure learning algorithm with domain-related orientation rules, which provides a basis for causal reasoning. We demonstrate the process on a real-world dataset from a globally operating precision mechanical engineering company. The dataset contains over 40 million log data entries from production monitoring of a single machine. In this context, we determine the causal structures embedded in operational processes. Further, we examine causal effects to support machine operators in avoiding unforeseen production stops, i.e., by detaining machine operators from drawing false conclusions on impacting factors of unforeseen production stops based on experience.
RHEEMix in the data jungle
(2020)
Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.
While supporting the execution of business processes, information systems record event logs. Conformance checking relies on these logs to analyze whether the recorded behavior of a process conforms to the behavior of a normative specification. A key assumption of existing conformance checking techniques, however, is that all events are associated with timestamps that allow to infer a total order of events per process instance. Unfortunately, this assumption is often violated in practice. Due to synchronization issues, manual event recordings, or data corruption, events are only partially ordered. In this paper, we put forward the problem of partial order resolution of event logs to close this gap. It refers to the construction of a probability distribution over all possible total orders of events of an instance. To cope with the order uncertainty in real-world data, we present several estimators for this task, incorporating different notions of behavioral abstraction. Moreover, to reduce the runtime of conformance checking based on partial order resolution, we introduce an approximation method that comes with a bounded error in terms of accuracy. Our experiments with real-world and synthetic data reveal that our approach improves accuracy over the state-of-the-art considerably.
This paper discusses the fitting of linear state space models to given multivariate time series in the presence of constraints imposed on the four main parameter matrices of these models. Constraints arise partly from the assumption that the models have a block-diagonal structure, with each block corresponding to an ARMA process, that allows the reconstruction of independent source components from linear mixtures, and partly from the need to keep models identifiable. The first stage of parameter fitting is performed by the expectation maximisation (EM) algorithm. Due to the identifiability constraint, a subset of the diagonal elements of the dynamical noise covariance matrix needs to be constrained to fixed values (usually unity). For this kind of constraints, so far, no closed-form update rules were available. We present new update rules for this situation, both for updating the dynamical noise covariance matrix directly and for updating a matrix square-root of this matrix. The practical applicability of the proposed algorithm is demonstrated by a low-dimensional simulation example. The behaviour of the EM algorithm, as observed in this example, illustrates the well-known fact that in practical applications, the EM algorithm should be combined with a different algorithm for numerical optimisation, such as a quasi-Newton algorithm.
Many participants in Massive Open Online Courses are full-time employees seeking greater flexibility in their time commitment and the available learning paths. We recently addressed these requirements by splitting up our 6-week courses into three 2-week modules followed by a separate exam. Modularizing courses offers many advantages: Shorter modules are more sustainable and can be combined, reused, and incorporated into learning paths more easily. Time flexibility for learners is also improved as exams can now be offered multiple times per year, while the learning content is available independently. In this article, we answer the question of which impact this modularization has on key learning metrics, such as course completion rates, learning success, and no-show rates. Furthermore, we investigate the influence of longer breaks between modules on these metrics. According to our analysis, course modules facilitate more selective learning behaviors that encourage learners to focus on topics they are the most interested in. At the same time, participation in overarching exams across all modules seems to be less appealing compared to an integrated exam of a 6-week course. While breaks between the modules increase the distinctive appearance of individual modules, a break before the final exam further reduces initial interest in the exams. We further reveal that participation in self-paced courses as a preparation for the final exam is unlikely to attract new learners to the course offerings, even though learners' performance is comparable to instructor-paced courses. The results of our long-term study on course modularization provide a solid foundation for future research and enable educators to make informed decisions about the design of their courses.
StudyMe
(2022)
N-of-1 trials are multi-crossover self-experiments that allow individuals to systematically evaluate the effect of interventions on their personal health goals. Although several tools for N-of-1 trials exist, there is a gap in supporting non-experts in conducting their own user-centric trials. In this study, we present StudyMe, an open-source mobile application that is freely available from https://play.google.com/store/apps/details?id=health.studyu.me and offers users flexibility and guidance in configuring every component of their trials. We also present research that informed the development of StudyMe, focusing on trial creation. Through an initial survey with 272 participants, we learned that individuals are interested in a variety of personal health aspects and have unique ideas on how to improve them. In an iterative, user-centered development process with intermediate user tests, we developed StudyMe that features an educational part to communicate N-of-1 trial concepts. A final empirical evaluation of StudyMe showed that all participants were able to create their own trials successfully using StudyMe and the app achieved a very good usability rating. Our findings suggest that StudyMe provides a significant step towards enabling individuals to apply a systematic science-oriented approach to personalize health-related interventions and behavior modifications in their everyday lives.
Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene- based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~ 4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for mis- sense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood- ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain of function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain of function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
Piloting a Survey-Based Assessment of Transparency and Trustworthiness with Three Medical AI Tools
(2022)
Artificial intelligence (AI) offers the potential to support healthcare delivery, but poorly trained or validated algorithms bear risks of harm. Ethical guidelines stated transparency about model development and validation as a requirement for trustworthy AI. Abundant guidance exists to provide transparency through reporting, but poorly reported medical AI tools are common. To close this transparency gap, we developed and piloted a framework to quantify the transparency of medical AI tools with three use cases. Our framework comprises a survey to report on the intended use, training and validation data and processes, ethical considerations, and deployment recommendations. The transparency of each response was scored with either 0, 0.5, or 1 to reflect if the requested information was not, partially, or fully provided. Additionally, we assessed on an analogous three-point scale if the provided responses fulfilled the transparency requirement for a set of trustworthiness criteria from ethical guidelines. The degree of transparency and trustworthiness was calculated on a scale from 0% to 100%. Our assessment of three medical AI use cases pin-pointed reporting gaps and resulted in transparency scores of 67% for two use cases and one with 59%. We report anecdotal evidence that business constraints and limited information from external datasets were major obstacles to providing transparency for the three use cases. The observed transparency gaps also lowered the degree of trustworthiness, indicating compliance gaps with ethical guidelines. All three pilot use cases faced challenges to provide transparency about medical AI tools, but more studies are needed to investigate those in the wider medical AI sector. Applying this framework for an external assessment of transparency may be infeasible if business constraints prevent the disclosure of information. New strategies may be necessary to enable audits of medical AI tools while preserving business secrets.
Privacy regulations and the physical distribution of heterogeneous data are often primary concerns for the development of deep learning models in a medical context. This paper evaluates the feasibility of differentially private federated learning for chest X-ray classification as a defense against data privacy attacks. To the best of our knowledge, we are the first to directly compare the impact of differentially private training on two different neural network architectures, DenseNet121 and ResNet50. Extending the federated learning environments previously analyzed in terms of privacy, we simulated a heterogeneous and imbalanced federated setting by distributing images from the public CheXpert and Mendeley chest X-ray datasets unevenly among 36 clients. Both non-private baseline models achieved an area under the receiver operating characteristic curve (AUC) of 0.940.94 on the binary classification task of detecting the presence of a medical finding. We demonstrate that both model architectures are vulnerable to privacy violation by applying image reconstruction attacks to local model updates from individual clients. The attack was particularly successful during later training stages. To mitigate the risk of a privacy breach, we integrated Rényi differential privacy with a Gaussian noise mechanism into local model training. We evaluate model performance and attack vulnerability for privacy budgets ε∈{1,3,6,10}�∈{1,3,6,10}. The DenseNet121 achieved the best utility-privacy trade-off with an AUC of 0.940.94 for ε=6�=6. Model performance deteriorated slightly for individual clients compared to the non-private baseline. The ResNet50 only reached an AUC of 0.760.76 in the same privacy setting. Its performance was inferior to that of the DenseNet121 for all considered privacy constraints, suggesting that the DenseNet121 architecture is more robust to differentially private training.
Quantifying neurological disorders from voice is a rapidly growing field of research and holds promise for unobtrusive and large-scale disorder monitoring. The data recording setup and data analysis pipelines are both crucial aspects to effectively obtain relevant information from participants. Therefore, we performed a systematic review to provide a high-level overview of practices across various neurological disorders and highlight emerging trends. PRISMA-based literature searches were conducted through PubMed, Web of Science, and IEEE Xplore to identify publications in which original (i.e., newly recorded) datasets were collected. Disorders of interest were psychiatric as well as neurodegenerative disorders, such as bipolar disorder, depression, and stress, as well as amyotrophic lateral sclerosis amyotrophic lateral sclerosis, Alzheimer's, and Parkinson's disease, and speech impairments (aphasia, dysarthria, and dysphonia). Of the 43 retrieved studies, Parkinson's disease is represented most prominently with 19 discovered datasets. Free speech and read speech tasks are most commonly used across disorders. Besides popular feature extraction toolkits, many studies utilise custom-built feature sets. Correlations of acoustic features with psychiatric and neurodegenerative disorders are presented. In terms of analysis, statistical analysis for significance of individual features is commonly used, as well as predictive modeling approaches, especially with support vector machines and a small number of artificial neural networks. An emerging trend and recommendation for future studies is to collect data in everyday life to facilitate longitudinal data collection and to capture the behavior of participants more naturally. Another emerging trend is to record additional modalities to voice, which can potentially increase analytical performance.
Multiplicative Up-Drift
(2020)
Drift analysis aims at translating the expected progress of an evolutionary algorithm (or more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analysis of evolutionary algorithms, however, only for the situation that the progress is constant or becomes weaker when approaching the target. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting a (1+delta)-multiplicative growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a simple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible near-linear dependence on 1/delta} (the previous results had an at least near-quadratic dependence), and it only requires a population size near-linear in delta (this was super-quadratic in previous results). These improvements immediately lead to stronger run time guarantees for a number of applications. We also discuss the case of large delta and show stronger results for this setting.
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world(1). Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, includingPIEZO1on varicose veins,COL6A1on corneal resistance,MEPEon bone density, andIQGAP2andGMPRon blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenicBRCA1andBRCA2variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community. <br /> Exome sequences from the first 49,960 participants in the UK Biobank highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
Recurrent generative adversarial network for learning imbalanced medical image semantic segmentation
(2020)
We propose a new recurrent generative adversarial architecture named RNN-GAN to mitigate imbalance data problem in medical image semantic segmentation where the number of pixels belongs to the desired object are significantly lower than those belonging to the background. A model trained with imbalanced data tends to bias towards healthy data which is not desired in clinical applications and predicted outputs by these networks have high precision and low recall. To mitigate imbalanced training data impact, we train RNN-GAN with proposed complementary segmentation mask, in addition, ordinary segmentation masks. The RNN-GAN consists of two components: a generator and a discriminator. The generator is trained on the sequence of medical images to learn corresponding segmentation label map plus proposed complementary label both at a pixel level, while the discriminator is trained to distinguish a segmentation image coming from the ground truth or from the generator network. Both generator and discriminator substituted with bidirectional LSTM units to enhance temporal consistency and get inter and intra-slice representation of the features. We show evidence that the proposed framework is applicable to different types of medical images of varied sizes. In our experiments on ACDC-2017, HVSMR-2016, and LiTS-2017 benchmarks we find consistently improved results, demonstrating the efficacy of our approach.
Primary keys (PKs) and foreign keys (FKs) are important elements of relational schemata in various applications, such as query optimization and data integration. However, in many cases, these constraints are unknown or not documented. Detecting them manually is time-consuming and even infeasible in large-scale datasets. We study the problem of discovering primary keys and foreign keys automatically and propose an algorithm to detect both, namely Holistic Primary Key and Foreign Key Detection (HoPF). PKs and FKs are subsets of the sets of unique column combinations (UCCs) and inclusion dependencies (INDs), respectively, for which efficient discovery algorithms are known. Using score functions, our approach is able to effectively extract the true PKs and FKs from the vast sets of valid UCCs and INDs. Several pruning rules are employed to speed up the procedure. We evaluate precision and recall on three benchmarks and two real-world datasets. The results show that our method is able to retrieve on average 88% of all primary keys, and 91% of all foreign keys. We compare the performance of HoPF with two baseline approaches that both assume the existence of primary keys.
Comment sections of online news platforms are an essential space to express opinions and discuss political topics. In contrast to other online posts, news discussions are related to particular news articles, comments refer to each other, and individual conversations emerge. However, the misuse by spammers, haters, and trolls makes costly content moderation necessary. Sentiment analysis can not only support moderation but also help to understand the dynamics of online discussions. A subtask of content moderation is the identification of toxic comments. To this end, we describe the concept of toxicity and characterize its subclasses. Further, we present various deep learning approaches, including datasets and architectures, tailored to sentiment analysis in online discussions. One way to make these approaches more comprehensible and trustworthy is fine-grained instead of binary comment classification. On the downside, more classes require more training data. Therefore, we propose to augment training data by using transfer learning. We discuss real-world applications, such as semi-automated comment moderation and troll detection. Finally, we outline future challenges and current limitations in light of most recent research publications.
Rapid decline of glomerular filtration rate estimated from creatinine (eGFRcrea) is associated with severe clinical endpoints. In contrast to cross-sectionally assessed eGFRcrea, the genetic basis for rapid eGFRcrea decline is largely unknown. To help define this, we meta-analyzed 42 genome-wide association studies from the Chronic Kidney Diseases Genetics Consortium and United Kingdom Biobank to identify genetic loci for rapid eGFRcrea decline. Two definitions of eGFRcrea decline were used: 3 mL/min/1.73m(2)/year or more ("Rapid3"; encompassing 34,874 cases, 107,090 controls) and eGFRcrea decline 25% or more and eGFRcrea under 60 mL/min/1.73m(2) at follow-up among those with eGFRcrea 60 mL/min/1.73m(2) or more at baseline ("CKDi25"; encompassing 19,901 cases, 175,244 controls). Seven independent variants were identified across six loci for Rapid3 and/or CKDi25: consisting of five variants at four loci with genome-wide significance (near UMOD-PDILT (2), PRKAG2, WDR72, OR2S2) and two variants among 265 known eGFRcrea variants (near GATM, LARP4B). All these loci were novel for Rapid3 and/or CKDi25 and our bioinformatic follow-up prioritized variants and genes underneath these loci. The OR2S2 locus is novel for any eGFRcrea trait including interesting candidates. For the five genome-wide significant lead variants, we found supporting effects for annual change in blood urea nitrogen or cystatin-based eGFR, but not for GATM or (LARP4B). Individuals at high compared to those at low genetic risk (8-14 vs. 0-5 adverse alleles) had a 1.20-fold increased risk of acute kidney injury (95% confidence interval 1.08-1.33). Thus, our identified loci for rapid kidney function decline may help prioritize therapeutic targets and identify mechanisms and individuals at risk for sustained deterioration of kidney function.
The 2020 European Bioinformatics Community for Mass Spectrometry (EuBIC-MS) Developers’ meeting was held from January 13th to January 17th 2020 in Nyborg, Denmark. Among the participants were scientists as well as developers working in the field of computational mass spectrometry (MS) and proteomics. The 4-day program was split between introductory keynote lectures and parallel hackathon sessions. During the latter, the participants developed bioinformatics tools and resources addressing outstanding needs in the community. The hackathons allowed less experienced participants to learn from more advanced computational MS experts, and to actively contribute to highly relevant research projects. We successfully produced several new tools that will be useful to the proteomics community by improving data analysis as well as facilitating future research. All keynote recordings are available on https://doi.org/10.5281/zenodo.3890181.
Social networking sites (SNS) are a rich source of latent information about individual characteristics. Crawling and analyzing this content provides a new approach for enterprises to personalize services and put forward product recommendations. In the past few years, commercial brands made a gradual appearance on social media platforms for advertisement, customers support and public relation purposes and by now it became a necessity throughout all branches. This online identity can be represented as a brand personality that reflects how a brand is perceived by its customers. We exploited recent research in text analysis and personality detection to build an automatic brand personality prediction model on top of the (Five-Factor Model) and (Linguistic Inquiry and Word Count) features extracted from publicly available benchmarks. Predictive evaluation on brands' accounts reveals that Facebook platform provides a slight advantage over Twitter platform in offering more self-disclosure for users' to express their emotions especially their demographic and psychological traits. Results also confirm the wider perspective that the same social media account carry a quite similar and comparable personality scores over different social media platforms. For evaluating our prediction results on actual brands' accounts, we crawled the Facebook API and Twitter API respectively for 100k posts from the most valuable brands' pages in the USA and we visualize exemplars of comparison results and present suggestions for future directions.
Viper
(2021)
Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply drop-in replacing existing storage with PMem does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem's byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random-write speed of DRAM and efficient sequential-write performance PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4-18x for write workloads, while matching or surpassing their get performance.
Embedded real-time systems generate state sequences where time elapses between state changes. Ensuring that such systems adhere to a provided specification of admissible or desired behavior is essential. Formal model-based testing is often a suitable cost-effective approach. We introduce an extended version of the formalism of symbolic graphs, which encompasses types as well as attributes, for representing states of dynamic systems. Relying on this extension of symbolic graphs, we present a novel formalism of timed graph transformation systems (TGTSs) that supports the model-based development of dynamic real-time systems at an abstract level where possible state changes and delays are specified by graph transformation rules. We then introduce an extended form of the metric temporal graph logic (MTGL) with increased expressiveness to improve the applicability of MTGL for the specification of timed graph sequences generated by a TGTS. Based on the metric temporal operators of MTGL and its built-in graph binding mechanics, we express properties on the structure and attributes of graphs as well as on the occurrence of graphs over time that are related by their inner structure. We provide formal support for checking whether a single generated timed graph sequence adheres to a provided MTGL specification. Relying on this logical foundation, we develop a testing framework for TGTSs that are specified using MTGL. Lastly, we apply this testing framework to a running example by using our prototypical implementation in the tool AutoGraph.
Web-based E-Learning uses Internet technologies and digital media to deliver education content to learners. Many universities in recent years apply their capacity in producing Massive Open Online Courses (MOOCs). They have been offering MOOCs with an expectation of rendering a comprehensive online apprenticeship. Typically, an online content delivery process requires an Internet connection. However, access to the broadband has never been a readily available resource in many regions. In Africa, poor and no networks are yet predominantly experienced by Internet users, frequently causing offline each moment a digital device disconnect from a network. As a result, a learning process is always disrupted, delayed and terminated in such regions. This paper raises the concern of E-Learning in poor and low bandwidths, in fact, it highlights the needs for an Offline-Enabled mode. The paper also explores technical approaches beamed to enhance the user experience inWeb-based E-Learning, particular in Africa.
In 1997, Henry Lieberman stated that debugging is the dirty little secret of computer science. Since then, several promising debugging technologies have been developed such as back-in-time debuggers and automatic fault localization methods. However, the last study about the state-of-the-art in debugging is still more than 15 years old and so it is not clear whether these new approaches have been applied in practice or not. For that reason, we investigate the current state of debugging in a comprehensive study. First, we review the available literature and learn about current approaches and study results. Second, we observe several professional developers while debugging and interview them about their experiences. Third, we create a questionnaire that serves as the basis for a larger online debugging survey. Based on these results, we present new insights into debugging practice that help to suggest new directions for future research.
Which event happened first?
(2021)
First come, first served: Critical choices between alternative actions are often made based on events external to an organization, and reacting promptly to their occurrence can be a major advantage over the competition. In Business Process Management (BPM), such deferred choices can be expressed in process models, and they are an important aspect of process engines. Blockchain-based process execution approaches are no exception to this, but are severely limited by the inherent properties of the platform: The isolated environment prevents direct access to external entities and data, and the non-continual runtime based entirely on atomic transactions impedes the monitoring and detection of events. In this paper we provide an in-depth examination of the semantics of deferred choice, and transfer them to environments such as the blockchain. We introduce and compare several oracle architectures able to satisfy certain requirements, and show that they can be implemented using state-of-the-art blockchain technology.
Comprior
(2021)
Background
Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated amongst each other, leaving open questions regarding their effectiveness.
Results
We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance.
Conclusion
Comprior allows reproducible benchmarking especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness
In this paper, we analyze stochastic dynamic pricing and advertising differential games in special oligopoly markets with constant price and advertising elasticity. We consider the sale of perishable as well as durable goods and include adoption effects in the demand. Based on a unique stochastic feedback Nash equilibrium, we derive closed-form solution formulas of the value functions and the optimal feedback policies of all competing firms. Efficient simulation techniques are used to evaluate optimally controlled sales processes over time. This way, the evolution of optimal controls as well as the firms’ profit distributions are analyzed. Moreover, we are able to compare feedback solutions of the stochastic model with its deterministic counterpart. We show that the market power of the competing firms is exactly the same as in the deterministic version of the model. Further, we discover two fundamental effects that determine the relation between both models. First, the volatility in demand results in a decline of expected profits compared to the deterministic model. Second, we find that saturation effects in demand have an opposite character. We show that the second effect can be strong enough to either exactly balance or even overcompensate the first one. As a result we are able to identify cases in which feedback solutions of the deterministic model provide useful approximations of solutions of the stochastic model.
In the course of patient treatments, psychotherapists aim to meet the challenges of being both a trusted, knowledgeable conversation partner and a diligent documentalist. We are developing the digital whiteboard system Tele-Board MED (TBM), which allows the therapist to take digital notes during the session together with the patient. This study investigates what therapists are experiencing when they document with TBM in patient sessions for the first time and whether this documentation saves them time when writing official clinical documents. As the core of this study, we conducted four anamnesis session dialogues with behavior psychotherapists and volunteers acting in the role of patients. Following a mixed-method approach, the data collection and analysis involved self-reported emotion samples, user experience curves and questionnaires. We found that even in the very first patient session with TBM, therapists come to feel comfortable, develop a positive feeling and can concentrate on the patient. Regarding administrative documentation tasks, we found with the TBM report generation feature the therapists save 60% of the time they normally spend on writing case reports to the health insurance.
The classification of vulnerabilities is a fundamental step to derive formal attributes that allow a deeper analysis. Therefore, it is required that this classification has to be performed timely and accurate. Since the current situation demands a manual interaction in the classification process, the timely processing becomes a serious issue. Thus, we propose an automated alternative to the manual classification, because the amount of identified vulnerabilities per day cannot be processed manually anymore. We implemented two different approaches that are able to automatically classify vulnerabilities based on the vulnerability description. We evaluated our approaches, which use Neural Networks and the Naive Bayes methods respectively, on the base of publicly known vulnerabilities.
Business process simulation is an important means for quantitative analysis of a business process and to compare different process alternatives. With the Business Process Model and Notation (BPMN) being the state-of-the-art language for the graphical representation of business processes, many existing process simulators support already the simulation of BPMN diagrams. However, they do not provide well-defined interfaces to integrate new concepts in the simulation environment. In this work, we present the design and architecture of a proof-of-concept implementation of an open and extensible BPMN process simulator. It also supports the simulation of multiple BPMN processes at a time and relies on the building blocks of the well-founded discrete event simulation. The extensibility is assured by a plug-in concept. Its feasibility is demonstrated by extensions supporting new BPMN concepts, such as the simulation of business rule activities referencing decision models and batch activities.
In university teaching today, it is common practice to record regular lectures and special events such as conferences and speeches. With these recordings, a large fundus of video teaching material can be created quickly and easily. Typically, lectures have a length of about one and a half hours and usually take place once or twice a week based on the credit hours. Depending on the number of lectures and other events recorded, the number of recordings available is increasing rapidly, which means that an appropriate form of provisioning is essential for the students. This is usually done in the form of lecture video platforms. In this work, we have investigated how lecture video platforms and the contained knowledge can be improved and accessed more easily by an increasing number of students. We came up with a multistep process we have applied to our own lecture video web portal that can be applied to other solutions as well.
Embedded smart home — remote lab MOOC with optional real hardware experience for over 4000 students
(2018)
MOOCs (Massive Open Online Courses) become more and more popular for learners of all ages to study further or to learn new subjects of interest. The purpose of this paper is to introduce a different MOOC course style. Typically, video content is shown teaching the student new information. After watching a video, self-test questions can be answered. Finally, the student answers weekly exams and final exams like the self test questions. Out of the points that have been scored for weekly and final exams a certificate can be issued. Our approach extends the possibility to receive points for the final score with practical programming exercises on real hardware. It allows the student to do embedded programming by communicating over GPIO pins to control LEDs and measure sensor values. Additionally, they can visualize values on an embedded display using web technologies, which are an essential part of embedded and smart home devices to communicate with common APIs. Students have the opportunity to solve all tasks within the online remote lab and at home on the same kind of hardware. The evaluation of this MOOCs indicates the interesting design for students to learn an engineering technique with new technology approaches in an appropriate, modern, supporting and motivating way of teaching.
When students watch learning videos online, they usually need to watch several hours of video content. In the end, not every minute of a video is relevant for the exam. Additionally, students need to add notes to clarify issues of a lecture. There are several possibilities to enhance the metadata of a video, e.g. a typical way to add user-specific information to an online video is a comment functionality, which allows users to share their thoughts and questions with the public. In contrast to common video material which can be found online, lecture videos are used for exam preparation. Due to this difference, the idea comes up to annotate lecture videos with markers and personal notes for a better understanding of the taught content. Especially, students learning for an exam use their notes to refresh their memories. To ease this learning method with lecture videos, we introduce the annotation feature in our video lecture archive. This functionality supports the students with keeping track of their thoughts by providing an intuitive interface to easily add, modify or remove their ideas. This annotation function is integrated in the video player. Hence, scrolling to a separate annotation area on the website is not necessary. Furthermore, the annotated notes can be exported together with the slide content to a PDF file, which can then be printed easily. Lecture video annotations support and motivate students to learn and watch videos from an E-Learning video archive.
An efficient Design Space Exploration (DSE) is imperative for the design of modern, highly complex embedded systems in order to steer the development towards optimal design points. The early evaluation of design decisions at system-level abstraction layer helps to find promising regions for subsequent development steps in lower abstraction levels by diminishing the complexity of the search problem. In recent works, symbolic techniques, especially Answer Set Programming (ASP) modulo Theories (ASPmT), have been shown to find feasible solutions of highly complex system-level synthesis problems with non-linear constraints very efficiently. In this paper, we present a novel approach to a holistic system-level DSE based on ASPmT. To this end, we include additional background theories that concurrently guarantee compliance with hard constraints and perform the simultaneous optimization of several design objectives. We implement and compare our approach with a state-of-the-art preference handling framework for ASP. Experimental results indicate that our proposed method produces better solutions with respect to both diversity and convergence to the true Pareto front.
Operational decisions in business processes can be modeled by using the Decision Model and Notation (DMN). The complementary use of DMN for decision modeling and of the Business Process Model and Notation (BPMN) for process design realizes the separation of concerns principle. For supporting separation of concerns during the design phase, it is crucial to understand which aspects of decision-making enclosed in a process model should be captured by a dedicated decision model. Whereas existing work focuses on the extraction of decision models from process control flow, the connection of process-related data and decision models is still unexplored. In this paper, we investigate how process-related data used for making decisions can be represented in process models and we distinguish a set of BPMN patterns capturing such information. Then, we provide a formal mapping of the identified BPMN patterns to corresponding DMN models and apply our approach to a real-world healthcare process.
Making the domain tangible
(2017)
Programmers collaborate continuously with domain experts to explore the problem space and to shape a solution that fits the users’ needs. In doing so, all parties develop a shared vocabulary, which is above all a list of named concepts and their relationships to each other. Nowadays, many programmers favor object-oriented programming because it allows them to directly represent real-world concepts and interactions from the vocabulary as code. However, when existing domain data is not yet represented as objects, it becomes a challenge to initially bring existing domain data into object-oriented systems and to keep the source code readable. While source code might be comprehensible to programmers, domain experts can struggle, given their non-programming background. We present a new approach to provide a mapping of existing data sources into the object-oriented programming environment. We support keeping the code of the domain model compact and readable while adding implicit means to access external information as internal domain objects. This should encourage programmers to explore different ways to build the software system quickly. Eventually, our approach fosters communication with the domain experts, especially at the beginning of a project. When the details in the problem space are not yet clear, the source code provides a valuable, tangible communication artifact.
Design thinking is acknowledged as a thriving innovation practice plus something more, something in the line of a deep understanding of innovation processes. At the same time, quite how and why design thinking works-in scientific terms-appeared an open question at first. Over recent years, empirical research has achieved great progress in illuminating the principles that make design thinking successful. Lately, the community began to explore an additional approach. Rather than setting up novel studies, investigations into the history of design thinking hold the promise of adding systematically to our comprehension of basic principles. This chapter makes a start in revisiting design thinking history with the aim of explicating scientific understandings that inform design thinking practices today. It offers a summary of creative thinking theories that were brought to Stanford Engineering in the 1950s by John E. Arnold.
Modern server systems with large NUMA architectures necessitate (i) data being distributed over the available computing nodes and (ii) NUMA-aware query processing to enable effective parallel processing in database systems. As these architectures incur significant latency and throughout penalties for accessing non-local data, queries should be executed as close as possible to the data. To further increase both performance and efficiency, data that is not relevant for the query result should be skipped as early as possible. One way to achieve this goal is horizontal partitioning to improve static partition pruning. As part of our ongoing work on workload-driven partitioning, we have implemented a recent approach called aggressive data skipping and extended it to handle both analytical as well as transactional access patterns. In this paper, we evaluate this approach with the workload and data of a production enterprise system of a Global 2000 company. The results show that over 80% of all tuples can be skipped in average while the resulting partitioning schemata are surprisingly stable over time.
Logical modeling has been widely used to understand and expand the knowledge about protein interactions among different pathways. Realizing this, the caspo-ts system has been proposed recently to learn logical models from time series data. It uses Answer Set Programming to enumerate Boolean Networks (BNs) given prior knowledge networks and phosphoproteomic time series data. In the resulting sequence of solutions, similar BNs are typically clustered together. This can be problematic for large scale problems where we cannot explore the whole solution space in reasonable time. Our approach extends the caspo-ts system to cope with the important use case of finding diverse solutions of a problem with a large number of solutions. We first present the algorithm for finding diverse solutions and then we demonstrate the results of the proposed approach on two different benchmark scenarios in systems biology: (1) an artificial dataset to model TCR signaling and (2) the HPN-DREAM challenge dataset to model breast cancer cell lines.
Metamaterial Devices
(2018)
In our hands-on demonstration, we show several objects, the functionality of which is defined by the objects' internal micro-structure. Such metamaterial machines can (1) be mechanisms based on their microstructures, (2) employ simple mechanical computation, or (3) change their outside to interact with their environment. They are 3D printed from one piece and we support their creating by providing interactive software tools.
Cloud storage brokerage is an abstraction aimed at providing value-added services. However, Cloud Service Brokers are challenged by several security issues including enlarged attack surfaces due to integration of disparate components and API interoperability issues. Therefore, appropriate security risk assessment methods are required to identify and evaluate these security issues, and examine the efficiency of countermeasures. A possible approach for satisfying these requirements is employment of threat modeling concepts, which have been successfully applied in traditional paradigms. In this work, we employ threat models including attack trees, attack graphs and Data Flow Diagrams against a Cloud Service Broker (CloudRAID) and analyze these security threats and risks. Furthermore, we propose an innovative technique for combining Common Vulnerability Scoring System (CVSS) and Common Configuration Scoring System (CCSS) base scores in probabilistic attack graphs to cater for configuration-based vulnerabilities which are typically leveraged for attacking cloud storage systems. This approach is necessary since existing schemes do not provide sufficient security metrics, which are imperatives for comprehensive risk assessments. We demonstrate the efficiency of our proposal by devising CCSS base scores for two common attacks against cloud storage: Cloud Storage Enumeration Attack and Cloud Storage Exploitation Attack. These metrics are then used in Attack Graph Metric-based risk assessment. Our experimental evaluation shows that our approach caters for the aforementioned gaps and provides efficient security hardening options. Therefore, our proposals can be employed to improve cloud security.
The problem of constructing and maintaining a tree topology in a distributed manner is a challenging task in WSNs. This is because the nodes have limited computational and memory resources and the network changes over time. We propose the Dynamic Gallager-Humblet-Spira (D-GHS) algorithm that builds and maintains a minimum spanning tree. To do so, we divide D-GHS into four phases, namely neighbor discovery, tree construction, data collection, and tree maintenance. In the neighbor discovery phase, the nodes collect information about their neighbors and the link quality. In the tree construction, D-GHS finds the minimum spanning tree by executing the Gallager-Humblet-Spira algorithm. In the data collection phase, the sink roots the minimum spanning tree at itself, and each node sends data packets. In the tree maintenance phase, the nodes repair the tree when communication failures occur. The emulation results show that D-GHS reduces the number of control messages and the energy consumption, at the cost of a slight increase in memory size and convergence time.
An energy consumption model for multiModal wireless sensor networks based on wake-up radio receivers
(2018)
Energy consumption is a major concern in Wireless Sensor Networks. A significant waste of energy occurs due to the idle listening and overhearing problems, which are typically avoided by turning off the radio, while no transmission is ongoing. The classical approach for allowing the reception of messages in such situations is to use a low-duty-cycle protocol, and to turn on the radio periodically, which reduces the idle listening problem, but requires timers and usually unnecessary wakeups. A better solution is to turn on the radio only on demand by using a Wake-up Radio Receiver (WuRx). In this paper, an energy model is presented to estimate the energy saving in various multi-hop network topologies under several use cases, when a WuRx is used instead of a classical low-duty-cycling protocol. The presented model also allows for estimating the benefit of various WuRx properties like using addressing or not.
Scrum2kanban
(2018)
Using university capstone courses to teach agile software development methodologies has become commonplace, as agile methods have gained support in professional software development. This usually means students are introduced to and work with the currently most popular agile methodology: Scrum. However, as the agile methods employed in the industry change and are adapted to different contexts, university courses must follow suit. A prime example of this is the Kanban method, which has recently gathered attention in the industry. In this paper, we describe a capstone course design, which adds the hands-on learning of the lean principles advocated by Kanban into a capstone project run with Scrum. This both ensures that students are aware of recent process frameworks and ideas as well as gain a more thorough overview of how agile methods can be employed in practice. We describe the details of the course and analyze the participating students' perceptions as well as our observations. We analyze the development artifacts, created by students during the course in respect to the two different development methodologies. We further present a summary of the lessons learned as well as recommendations for future similar courses. The survey conducted at the end of the course revealed an overwhelmingly positive attitude of students towards the integration of Kanban into the course.
802.15.4 security protects against the replay, injection, and eavesdropping of 802.15.4 frames. A core concept of 802.15.4 security is the use of frame counters for both nonce generation and anti-replay protection. While being functional, frame counters (i) cause an increased energy consumption as they incur a per-frame overhead of 4 bytes and (ii) only provide sequential freshness. The Last Bits (LB) optimization does reduce the per-frame overhead of frame counters, yet at the cost of an increased RAM consumption and occasional energy-and time-consuming resynchronization actions. Alternatively, the timeslotted channel hopping (TSCH) media access control (MAC) protocol of 802.15.4 avoids the drawbacks of frame counters by replacing them with timeslot indices, but findings of Yang et al. question the security of TSCH in general. In this paper, we assume the use of ContikiMAC, which is a popular asynchronous MAC protocol for 802.15.4 networks. Under this assumption, we propose an Intra-Layer Optimization for 802.15.4 Security (ILOS), which intertwines 802.15.4 security and ContikiMAC. In effect, ILOS reduces the security-related per-frame overhead even more than the LB optimization, as well as achieves strong freshness. Furthermore, unlike the LB optimization, ILOS neither incurs an increased RAM consumption nor requires resynchronization actions. Beyond that, ILOS integrates with and advances other security supplements to ContikiMAC. We implemented ILOS using OpenMotes and the Contiki operating system.
CurEx
(2018)
The integration of diverse structured and unstructured information sources into a unified, domain-specific knowledge base is an important task in many areas. A well-maintained knowledge base enables data analysis in complex scenarios, such as risk analysis in the financial sector or investigating large data leaks, such as the Paradise or Panama papers. Both the creation of such knowledge bases, as well as their continuous maintenance and curation involves many complex tasks and considerable manual effort. With CurEx, we present a modular system that allows structured and unstructured data sources to be integrated into a domain-specific knowledge base. In particular, we (i) enable the incremental improvement of each individual integration component; (ii) enable the selective generation of multiple knowledge graphs from the information contained in the knowledge base; and (iii) provide two distinct user interfaces tailored to the needs of data engineers and end-users respectively. The former has curation capabilities and controls the integration process, whereas the latter focuses on the exploration of the generated knowledge graph.
Beacon in the Dark
(2018)
The large amount of heterogeneous data in these email corpora renders experts' investigations by hand infeasible. Auditors or journalists, e.g., who are looking for irregular or inappropriate content or suspicious patterns, are in desperate need for computer-aided exploration tools to support their investigations.
We present our Beacon system for the exploration of such corpora at different levels of detail. A distributed processing pipeline combines text mining methods and social network analysis to augment the already semi-structured nature of emails. The user interface ties into the resulting cleaned and enriched dataset. For the interface design we identify three objectives expert users have: gain an initial overview of the data to identify leads to investigate, understand the context of the information at hand, and have meaningful filters to iteratively focus onto a subset of emails. To this end we make use of interactive visualisations based on rearranged and aggregated extracted information to reveal salient patterns.
The detection of all inclusion dependencies (INDs) in an unknown dataset is at the core of any data profiling effort. Apart from the discovery of foreign key relationships, INDs can help perform data integration, integrity checking, schema (re-)design, and query optimization. With the advent of Big Data, the demand increases for efficient INDs discovery algorithms that can scale with the input data size. To this end, we propose S-INDD++ as a scalable system for detecting unary INDs in large datasets. S-INDD++ applies a new stepwise partitioning technique that helps discard a large number of attributes in early phases of the detection by processing the first partitions of smaller sizes. S-INDD++ also extends the concept of the attribute clustering to decide which attributes to be discarded based on the clustering result of each partition. Moreover, in contrast to the state-of-the-art, S-INDD++ does not require the partition to fit into the main memory-which is a highly appreciable property in the face of the ever growing datasets. We conducted an exhaustive evaluation of S-INDD++ by applying it to large datasets with thousands attributes and more than 266 million tuples. The results show the high superiority of S-INDD++ over the state-of-the-art. S-INDD++ reduced up to 50 % of the runtime in comparison with BINDER, and up to 98 % in comparison with S-INDD.
One particular challenge in the Internet of Things is the management of many heterogeneous things. The things are typically constrained devices with limited memory, power, network and processing capacity. Configuring every device manually is a tedious task. We propose an interoperable way to configure an IoT network automatically using existing standards. The proposed NETCONF-MQTT bridge intermediates between the constrained devices (speaking MQTT) and the network management standard NETCONF. The NETCONF-MQTT bridge generates dynamically YANG data models from the semantic description of the device capabilities based on the oneM2M ontology. We evaluate the approach for two use cases, i.e. describing an actuator and a sensor scenario.
Live migration is an important feature in modern software-defined datacenters and cloud computing environments. Dynamic resource management, load balance, power saving and fault tolerance are all dependent on the live migration feature. Despite the importance of live migration, the cost of live migration cannot be ignored and may result in service availability degradation. Live migration cost includes the migration time, downtime, CPU overhead, network and power consumption. There are many research articles that discuss the problem of live migration cost with different scopes like analyzing the cost and relate it to the parameters that control it, proposing new migration algorithms that minimize the cost and also predicting the migration cost. For the best of our knowledge, most of the papers that discuss the migration cost problem focus on open source hypervisors. For the research articles focus on VMware environments, none of the published articles proposed migration time, network overhead and power consumption modeling for single and multiple VMs live migration. In this paper, we propose empirical models for the live migration time, network overhead and power consumption for single and multiple VMs migration. The proposed models are obtained using a VMware based testbed.
For the last ten years, almost every theoretical result concerning the expected run time of a randomized search heuristic used drift theory, making it the arguably most important tool in this domain. Its success is due to its ease of use and its powerful result: drift theory allows the user to derive bounds on the expected first-hitting time of a random process by bounding expected local changes of the process - the drift. This is usually far easier than bounding the expected first-hitting time directly. Due to the widespread use of drift theory, it is of utmost importance to have the best drift theorems possible. We improve the fundamental additive, multiplicative, and variable drift theorems by stating them in a form as general as possible and providing examples of why the restrictions we keep are still necessary. Our additive drift theorem for upper bounds only requires the process to be nonnegative, that is, we remove unnecessary restrictions like a finite, discrete, or bounded search space. As corollaries, the same is true for our upper bounds in the case of variable and multiplicative drift.
One of the most important aspects of a randomized algorithm is bounding its expected run time on various problems. Formally speaking, this means bounding the expected first-hitting time of a random process. The two arguably most popular tools to do so are the fitness level method and drift theory. The fitness level method considers arbitrary transition probabilities but only allows the process to move toward the goal. On the other hand, drift theory allows the process to move into any direction as long as it move closer to the goal in expectation; however, this tendency has to be monotone and, thus, the transition probabilities cannot be arbitrary. We provide a result that combines the benefit of these two approaches: our result gives a lower and an upper bound for the expected first-hitting time of a random process over {0,..., n} that is allowed to move forward and backward by 1 and can use arbitrary transition probabilities. In case that the transition probabilities are known, our bounds coincide and yield the exact value of the expected first-hitting time. Further, we also state the stationary distribution as well as the mixing time of a special case of our scenario.
For theoretical analyses there are two specifics distinguishing GP from many other areas of evolutionary computation. First, the variable size representations, in particular yielding a possible bloat (i.e. the growth of individuals with redundant parts). Second, the role and realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover had a surprisingly little share in this work. We analyze a simple crossover operator in combination with local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); the resulting algorithm is denoted Concatenation Crossover GP. For this purpose three variants of the wellstudied Majority test function with large plateaus are considered. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants independent of employing bloat control.
High-throughput RNA sequencing (RNAseq) produces large data sets containing expression levels of thousands of genes. The analysis of RNAseq data leads to a better understanding of gene functions and interactions, which eventually helps to study diseases like cancer and develop effective treatments. Large-scale RNAseq expression studies on cancer comprise samples from multiple cancer types and aim to identify their distinct molecular characteristics. Analyzing samples from different cancer types implies analyzing samples from different tissue origin. Such multi-tissue RNAseq data sets require a meaningful analysis that accounts for the inherent tissue-related bias: The identified characteristics must not originate from the differences in tissue types, but from the actual differences in cancer types. However, current analysis procedures do not incorporate that aspect. As a result, we propose to integrate a tissue-awareness into the analysis of multi-tissue RNAseq data. We introduce an extension for gene selection that provides a tissue-wise context for every gene and can be flexibly combined with any existing gene selection approach. We suggest to expand conventional evaluation by additional metrics that are sensitive to the tissue-related bias. Evaluations show that especially low complexity gene selection approaches profit from introducing tissue-awareness.
User-generated content on social media platforms is a rich source of latent information about individual variables. Crawling and analyzing this content provides a new approach for enterprises to personalize services and put forward product recommendations. In the past few years, brands made a gradual appearance on social media platforms for advertisement, customers support and public relation purposes and by now it became a necessity throughout all branches. This online identity can be represented as a brand personality that reflects how a brand is perceived by its customers. We exploited recent research in text analysis and personality detection to build an automatic brand personality prediction model on top of the (Five-Factor Model) and (Linguistic Inquiry and Word Count) features extracted from publicly available benchmarks. The proposed model reported significant accuracy in predicting specific personality traits form brands. For evaluating our prediction results on actual brands, we crawled the Facebook API for 100k posts from the most valuable brands' pages in the USA and we visualize exemplars of comparison results and present suggestions for future directions.
This paper investigates the applicability of CMOS decoupling cells for mitigating the Single Event Transient (SET) effects in standard combinational gates. The concept is based on the insertion of two decoupling cells between the gate's output and the power/ground terminals. To verify the proposed hardening approach, extensive SPICE simulations have been performed with standard combinational cells designed in IHP's 130 nm bulk CMOS technology. Obtained simulation results have shown that the insertion of decoupling cells results in the increase of the gate's critical charge, thus reducing the gate's soft error rate (SER). Moreover, the decoupling cells facilitate the suppression of SET pulses propagating through the gate. It has been shown that the decoupling cells may be a competitive alternative to gate upsizing and gate duplication for hardening the gates with lower critical charge and multiple (3 or 4) inputs, as well as for filtering the short SET pulses induced by low-LET particles.
In rural/remote areas, resource constrained smart micro-grid (RCSMG) architectures can provide a cost-effective power supply alternative in cases when connectivity to the national power grid is impeded by factors such as load shedding. RCSMG architectures can be designed to handle communications over a distributed lossy network in order to minimise operation costs. However, due to the unreliable nature of lossy networks communication data can be distorted by noise additions that alter the veracity of the data. In this chapter, we consider cases in which an adversary who is internal to the RCSMG, deliberately distorts communicated data to gain an unfair advantage over the RCSMG’s users. The adversary’s goal is to mask malicious data manipulations as distortions due to additive noise due to communication channel unreliability. Distinguishing malicious data distortions from benign distortions is important in ensuring trustworthiness of the RCSMG. Perturbation data anonymisation algorithms can be used to alter transmitted data to ensure that adversarial manipulation of the data reveals no information that the adversary can take advantage of. However, because existing data perturbation anonymisation algorithms operate by using additive noise to anonymise data, using these algorithms in the RCSMG context is challenging. This is due to the fact that distinguishing benign noise additions from malicious noise additions is a difficult problem. In this chapter, we present a brief survey of cases of privacy violations due to inferences drawn from observed power consumption patterns in RCSMGs centred on inference, and propose a method of mitigating these risks. The lesson here is that while RCSMGs give users more control over power management and distribution, good anonymisation is essential to protecting personal information on RCSMGs.
In this chapter, we provide a framework to specify how cheating attacks can be conducted successfully on power marketing schemes in resource constrained smart micro-grids. This is an important problem because such cheating attacks can destabilise and in the worst case result in a breakdown of the micro-grid. We consider three aspects, in relation to modelling cheating attacks on power auctioning schemes. First, we aim to specify exactly how in spite of the resource constrained character of the micro-grid, cheating can be conducted successfully. Second, we consider how mitigations can be modelled to prevent cheating, and third, we discuss methods of maintaining grid stability and reliability even in the presence of cheating attacks. We use an Automated-Cheating-Attack (ACA) conception to build a taxonomy of cheating attacks based on the idea of adversarial acquisition of surplus energy. Adversarial acquisitions of surplus energy allow malicious users to pay less for access to more power than the quota allowed for the price paid. The impact on honest users, is the lack of an adequate supply of energy to meet power demand requests. We conclude with a discussion of the performance overhead of provoking, detecting, and mitigating such attacks efficiently.
Resource constrained smart micro-grid architectures describe a class of smart micro-grid architectures that handle communications operations over a lossy network and depend on a distributed collection of power generation and storage units. Disadvantaged communities with no or intermittent access to national power networks can benefit from such a micro-grid model by using low cost communication devices to coordinate the power generation, consumption, and storage. Furthermore, this solution is both cost-effective and environmentally-friendly. One model for such micro-grids, is for users to agree to coordinate a power sharing scheme in which individual generator owners sell excess unused power to users wanting access to power. Since the micro-grid relies on distributed renewable energy generation sources which are variable and only partly predictable, coordinating micro-grid operations with distributed algorithms is necessity for grid stability. Grid stability is crucial in retaining user trust in the dependability of the micro-grid, and user participation in the power sharing scheme, because user withdrawals can cause the grid to breakdown which is undesirable. In this chapter, we present a distributed architecture for fair power distribution and billing on microgrids. The architecture is designed to operate efficiently over a lossy communication network, which is an advantage for disadvantaged communities. We build on the architecture to discuss grid coordination notably how tasks such as metering, power resource allocation, forecasting, and scheduling can be handled. All four tasks are managed by a feedback control loop that monitors the performance and behaviour of the micro-grid, and based on historical data makes decisions to ensure the smooth operation of the grid. Finally, since lossy networks are undependable, differentiating system failures from adversarial manipulations is an important consideration for grid stability. We therefore provide a characterisation of potential adversarial models and discuss possible mitigation measures.
Studies indicate that reliable access to power is an important enabler for economic growth. To this end, modern energy management systems have seen a shift from reliance on time-consuming manual procedures , to highly automated management , with current energy provisioning systems being run as cyber-physical systems . Operating energy grids as a cyber-physical system offers the advantage of increased reliability and dependability , but also raises issues of security and privacy. In this chapter, we provide an overview of the contents of this book showing the interrelation between the topics of the chapters in terms of smart energy provisioning. We begin by discussing the concept of smart-grids in general, proceeding to narrow our focus to smart micro-grids in particular. Lossy networks also provide an interesting framework for enabling the implementation of smart micro-grids in remote/rural areas, where deploying standard smart grids is economically and structurally infeasible. To this end, we consider an architectural design for a smart micro-grid suited to low-processing capable devices. We model malicious behaviour, and propose mitigation measures based properties to distinguish normal from malicious behaviour .
Power Systems
(2018)
Studies indicate that reliable access to power is an important enabler for economic growth. To this end, modern energy management systems have seen a shift from reliance on time-consuming manual procedures, to highly automated management, with current energy provisioning systems being run as cyber-physical systems. Operating energy grids as a cyber-physical system offers the advantage of increased reliability and dependability, but also raises issues of security and privacy. In this chapter, we provide an overview of the contents of this book showing the interrelation between the topics of the chapters in terms of smart energy provisioning. We begin by discussing the concept of smart-grids in general, proceeding to narrow our focus to smart micro-grids in particular. Lossy networks also provide an interesting framework for enabling the implementation of smart micro-grids in remote/rural areas, where deploying standard smart grids is economically and structurally infeasible. To this end, we consider an architectural design for a smart micro-grid suited to low-processing capable devices. We model malicious behaviour, and propose mitigation measures based properties to distinguish normal from malicious behaviour.
Monitoring is a key prerequisite for self-adaptive software and many other forms of operating software. Monitoring relevant lower level phenomena like the occurrences of exceptions and diagnosis data requires to carefully examine which detailed information is really necessary and feasible to monitor. Adaptive monitoring permits observing a greater variety of details with less overhead, if most of the time the MAPE-K loop can operate using only a small subset of all those details. However, engineering such an adaptive monitoring is a major engineering effort on its own that further complicates the development of self-adaptive software. The proposed approach overcomes the outlined problems by providing generic adaptive monitoring via runtime models. It reduces the effort to introduce and apply adaptive monitoring by avoiding additional development effort for controlling the monitoring adaptation. Although the generic approach is independent from the monitoring purpose, it still allows for substantial savings regarding the monitoring resource consumption as demonstrated by an example.
Modern routing algorithms reduce query time by depending heavily on preprocessed data. The recently developed Navigation Data Standard (NDS) enforces a separation between algorithms and map data, rendering preprocessing inapplicable. Furthermore, map data is partitioned into tiles with respect to their geographic coordinates. With the limited memory found in portable devices, the number of tiles loaded becomes the major factor for run time. We study routing under these restrictions and present new algorithms as well as empirical evaluations. Our results show that, on average, the most efficient algorithm presented uses more than 20 times fewer tile loads than a normal A*.
Minimising Information Loss on Anonymised High Dimensional Data with Greedy In-Memory Processing
(2018)
Minimising information loss on anonymised high dimensional data is important for data utility. Syntactic data anonymisation algorithms address this issue by generating datasets that are neither use-case specific nor dependent on runtime specifications. This results in anonymised datasets that can be re-used in different scenarios which is performance efficient. However, syntactic data anonymisation algorithms incur high information loss on high dimensional data, making the data unusable for analytics. In this paper, we propose an optimised exact quasi-identifier identification scheme, based on the notion of k-anonymity, to generate anonymised high dimensional datasets efficiently, and with low information loss. The optimised exact quasi-identifier identification scheme works by identifying and eliminating maximal partial unique column combination (mpUCC) attributes that endanger anonymity. By using in-memory processing to handle the attribute selection procedure, we significantly reduce the processing time required. We evaluated the effectiveness of our proposed approach with an enriched dataset drawn from multiple real-world data sources, and augmented with synthetic values generated in close alignment with the real-world data distributions. Our results indicate that in-memory processing drops attribute selection time for the mpUCC candidates from 400s to 100s, while significantly reducing information loss. In addition, we achieve a time complexity speed-up of O(3(n/3)) approximate to O(1.4422(n)).
We analyze the problem of response suggestion in a closed domain along a real-world scenario of a digital library. We present a text-processing pipeline to generate question-answer pairs from chat transcripts. On this limited amount of training data, we compare retrieval-based, conditioned-generation, and dedicated representation learning approaches for response suggestion. Our results show that retrieval-based methods that strive to find similar, known contexts are preferable over parametric approaches from the conditioned-generation family, when the training data is limited. We, however, identify a specific representation learning approach that is competitive to the retrieval-based approaches despite the training data limitation.
PlAnalyzer
(2018)
In this work we propose PIAnalyzer, a novel approach to analyze PendingIntent related vulnerabilities. We empirically evaluate PIAnalyzer on a set of 1000 randomly selected applications from the Google Play Store and find 1358 insecure usages of Pendinglntents, including 70 severe vulnerabilities. We manually inspected ten reported vulnerabilities out of which nine correctly reported vulnerabilities, indicating a high precision. The evaluation shows that PIAnalyzer is efficient with an average execution time of 13 seconds per application.
Currently we are witnessing profound changes in the geospatial domain. Driven by recent ICT developments, such as web services, serviceoriented computing or open-source software, an explosion of geodata and geospatial applications or rapidly growing communities of non-specialist users, the crucial issue is the provision and integration of geospatial intelligence in these rapidly changing, heterogeneous developments. This paper introduces the concept of Servicification into geospatial data processing. Its core idea is the provision of expertise through a flexible number of web-based software service modules. Selection and linkage of these services to user profiles, application tasks, data resources, or additional software allow for the compilation of flexible, time-sensitive geospatial data handling processes. Encapsulated in a string of discrete services, the approach presented here aims to provide non-specialist users with geospatial expertise required for the effective, professional solution of a defined application problem. Providing users with geospatial intelligence in the form of web-based, modular services, is a completely different approach to geospatial data processing. This novel concept puts geospatial intelligence, made available through services encapsulating rule bases and algorithms, in the centre and at the disposal of the users, regardless of their expertise.
Recently blockchain technology has been introduced to execute interacting business processes in a secure and transparent way. While the foundations for process enactment on blockchain have been researched, the execution of decisions on blockchain has not been addressed yet. In this paper we argue that decisions are an essential aspect of interacting business processes, and, therefore, also need to be executed on blockchain. The immutable representation of decision logic can be used by the interacting processes, so that decision taking will be more secure, more transparent, and better auditable. The approach is based on a mapping of the DMN language S-FEEL to Solidity code to be run on the Ethereum blockchain. The work is evaluated by a proof-of-concept prototype and an empirical cost evaluation.
OpenLL
(2018)
Today's rendering APIs lack robust functionality and capabilities for dynamic, real-time text rendering and labeling, which represent key requirements for 3D application design in many fields. As a consequence, most rendering systems are barely or not at all equipped with respective capabilities. This paper drafts the unified text rendering and labeling API OpenLL intended to complement common rendering APIs, frameworks, and transmission formats. For it, various uses of static and dynamic placement of labels are showcased and a text interaction technique is presented. Furthermore, API design constraints with respect to state-of-the-art text rendering techniques are discussed. This contribution is intended to initiate a community-driven specification of a free and open label library.
The emergence of cloud computing allows users to easily host their Virtual Machines with no up-front investment and the guarantee of always available anytime anywhere. But with the Virtual Machine (VM) is hosted outside of user's premise, the user loses the physical control of the VM as it could be running on untrusted host machines in the cloud. Malicious host administrator could launch live memory dumping, Spectre, or Meltdown attacks in order to extract sensitive information from the VM's memory, e.g. passwords or cryptographic keys of applications running in the VM. In this paper, inspired by the moving target defense (MTD) scheme, we propose a novel approach to increase the security of application's sensitive data in the VM by continuously moving the sensitive data among several memory allocations (blocks) in Random Access Memory (RAM). A movement function is added into the application source code in order for the function to be running concurrently with the application's main function. Our approach could reduce the possibility of VM's sensitive data in the memory to be leaked into memory dump file by 2 5% and secure the sensitive data from Spectre and Meltdown attacks. Our approach's overhead depends on the number and the size of the sensitive data.
Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use cross-collection topic modeling for the exploration, clustering, and comparison of large sets of documents, such as digital libraries. However, topic modeling on documents from different collections is challenging because of domain-specific vocabulary. We present a cross-collection topic model combined with automatic domain term extraction and phrase segmentation. This model distinguishes collection-specific and collection-independent words based on information entropy and reveals commonalities and differences of multiple text collections. We evaluate our model on patents, scientific papers, newspaper articles, forum posts, and Wikipedia articles. In comparison to state-of-the-art cross-collection topic modeling, our model achieves up to 13% higher topic coherence, up to 4% lower perplexity, and up to 31% higher document classification accuracy. More importantly, our approach is the first topic model that ensures disjunct general and specific word distributions, resulting in clear-cut topic representations.