Discovering commute patterns via process mining (2019)

Ubiquitous computing has proven its relevance and efficiency in improving the user experience across a myriad of situations. It is now the ineluctable solution to keep pace with the ever-changing environments in which current systems operate. Despite the achievements of ubiquitous computing, this discipline is still overlooked in business process management. This is surprising, since many of today’s challenges, in this domain, can be addressed by methods and techniques from ubiquitous computing, for instance user context and dynamic aspects of resource locations. This paper takes a first step to integrate methods and techniques from ubiquitous computing in business process management. To do so, we propose discovering commute patterns via process mining. Through our proposition, we can deduce the users’ significant locations, routes, travel times and travel modes. This information can be a stepping-stone toward helping the business process management community embrace the latest achievements in ubiquitous computing, mainly in location-based service. To corroborate our claims, a user study was conducted. The significant places, routes, travel modes and commuting times of our test subjects were inferred with high accuracies. All in all, ubiquitous computing can enrich the processes with new capabilities that go beyond what has been established in business process management so far.

Spear spamming-resistant expertise analysis and ranking in collaborative tagging systems (2011)

Yeung, Ching-man Au ; Noll, Michael G. ; Gibbins, Nicholas ; Meinel, Christoph ; Shadbolt, Nigel

In this article, we discuss the notions of experts and expertise in resource discovery in the context of collaborative tagging systems. We propose that the level of expertise of a user with respect to a particular topic is mainly determined by two factors. First, an expert should possess a high-quality collection of resources, while the quality of a Web resource in turn depends on the expertise of the users who have assigned tags to it, forming a mutual reinforcement relationship. Second, an expert should be one who tends to identify interesting or useful resources before other users discover them, thus bringing these resources to the attention of the community of users. We propose a graph-based algorithm, SPEAR (spamming-resistant expertise analysis and ranking), which implements the above ideas for ranking users in a folksonomy. Our experiments show that our assumptions on expertise in resource discovery, and SPEAR as an implementation of these ideas, allow us to promote experts and demote spammers at the same time, with performance significantly better than the original hypertext-induced topic search algorithm and simple statistical measures currently used in most collaborative tagging systems.
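
The two factors described above (mutual reinforcement between user expertise and resource quality, plus extra credit for early discovery) can be illustrated with a HITS-style iteration. This is a minimal sketch and not the authors' implementation: the function name `spear`, the `(user, resource, timestamp)` input format, and the square-root credit function are assumptions made for illustration.

```python
import math


def spear(taggings, iterations=50):
    """Rank users and resources from (user, resource, timestamp) triples.

    Assumes each user tags a given resource at most once.
    """
    users = sorted({u for u, _, _ in taggings})
    resources = sorted({r for _, r, _ in taggings})

    # Credit of a user for a resource grows with the number of users who
    # tagged the same resource later (assumed credit function: square root).
    credit = {u: {} for u in users}
    for user, resource, time in taggings:
        later = sum(1 for _, r, t in taggings if r == resource and t > time)
        credit[user][resource] = math.sqrt(1 + later)

    expertise = dict.fromkeys(users, 1.0)
    quality = dict.fromkeys(resources, 1.0)
    for _ in range(iterations):
        # Expertise follows from the quality of the resources a user tagged.
        expertise = {
            u: sum(c * quality[r] for r, c in credit[u].items()) for u in users
        }
        # Quality follows from the expertise of the users who tagged it.
        quality = {
            r: sum(credit[u].get(r, 0.0) * expertise[u] for u in users)
            for r in resources
        }
        # Normalize so the scores stay bounded across iterations.
        e_total = sum(expertise.values()) or 1.0
        q_total = sum(quality.values()) or 1.0
        expertise = {u: v / e_total for u, v in expertise.items()}
        quality = {r: v / q_total for r, v in quality.items()}
    return expertise, quality
```

Under these assumptions, a user who tags resources only after many others have done so receives the minimum credit for each of them, which is how a spammer who tags everything late would be demoted relative to early discoverers.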

Generalized aggregate quality of service computation for composite services (2012)

Yang, Yong ; Dumas, Marlon ; Garcia-Banuelos, Luciano ; Polyvyanyy, Artem ; Zhang, Liang

This article addresses the problem of estimating the Quality of Service (QoS) of a composite service given the QoS of the services participating in the composition. Previous solutions to this problem impose restrictions on the topology of the orchestration models, limiting their applicability to well-structured orchestration models for example. This article lifts these restrictions by proposing a method for aggregate QoS computation that deals with more general types of unstructured orchestration models. The applicability and scalability of the proposed method are validated using a collection of models from industrial practice.
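
For context, the aggregation rules commonly applied to well-structured (block-structured) orchestrations, the restricted setting that the abstract attributes to previous solutions, can be sketched as follows. This does not reproduce the article's method for unstructured models. The `QoS` attributes and the function names `sequence`, `parallel`, and `choice` are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class QoS:
    time: float         # expected execution time
    cost: float         # monetary cost
    reliability: float  # probability of successful execution


def sequence(*parts):
    """Services executed one after another."""
    return QoS(
        time=sum(p.time for p in parts),
        cost=sum(p.cost for p in parts),
        reliability=math.prod(p.reliability for p in parts),
    )


def parallel(*parts):
    """Services executed concurrently and synchronized at a join."""
    return QoS(
        time=max(p.time for p in parts),
        cost=sum(p.cost for p in parts),
        reliability=math.prod(p.reliability for p in parts),
    )


def choice(branches):
    """Exclusive choice; branches is a list of (probability, QoS) pairs."""
    return QoS(
        time=sum(p * q.time for p, q in branches),
        cost=sum(p * q.cost for p, q in branches),
        reliability=sum(p * q.reliability for p, q in branches),
    )
```

Nesting these functions mirrors the block structure of the model, for example `sequence(a, parallel(b, c))`. Orchestrations that cannot be decomposed into such nested blocks are the unstructured case the article targets.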

A framework for improved video text detection and recognition (2014)

Yang, Haojin ; Quehl, Bernhard ; Sack, Harald

Text displayed in a video is an essential part for the high-level semantic information of the video content. Therefore, video text can be used as a valuable source for automated video indexing in digital video libraries. In this paper, we propose a workflow for video text detection and recognition. In the text detection stage, we have developed a fast localization-verification scheme, in which an edge-based multi-scale text detector first identifies potential text candidates with high recall rate. Then, detected candidate text lines are refined by using an image entropy-based filter. Finally, Stroke Width Transform (SWT)- and Support Vector Machine (SVM)-based verification procedures are applied to eliminate the false alarms. For text recognition, we have developed a novel skeleton-based binarization method in order to separate text from complex backgrounds to make it processible for standard OCR (Optical Character Recognition) software. Operability and accuracy of proposed text detection and binarization methods have been evaluated by using publicly available test data sets.
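
The abstract does not specify how the image entropy-based filter is computed. As a rough illustration, the Shannon entropy of a candidate region's grey-level histogram could be computed and thresholded as below. The function names, the threshold value, and the assumption that low-entropy candidates are the ones discarded are all hypothetical.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy (in bits) of a sequence of grey-level values."""
    counts = Counter(pixels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def keep_candidate(pixels, threshold=0.5):
    """Keep a candidate text line if its entropy reaches the threshold.

    The threshold of 0.5 bits is a placeholder, not a value from the paper.
    """
    return image_entropy(pixels) >= threshold
```

A flat, single-colour region has zero entropy under this measure, while a region with strong contrast between strokes and background has higher entropy.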

Avoiding irreducible CSC conflicts by internal communication (2009)

Wist, Dominic ; Wollowski, Ralf ; Schaefer, Mark ; Vogler, Walter

Resynthesis of handshake specifications obtained e.g. from BALSA or TANGRAM with speed-independent logic synthesis from STGs is a promising approach. To deal with state-space explosion, we suggested STG decomposition; a problem is that decomposition can lead to irreducible CSC conflicts. Here, we present a new approach to solve such conflicts by introducing internal communication between the components. We give some first, very encouraging results for very large STGs concerning synthesis time and circuit area.

Signal transition graph decomposition: internal communication for speed independent circuit implementation (2011)

Wist, Dominic ; Schaefer, Mark ; Vogler, Walter ; Wollowski, Ralf

Logic synthesis of speed independent circuits based on signal transition graph (STG) decomposition is a promising approach to tackle complexity problems like state-space explosion. Unfortunately, decomposition can result in components that in isolation have irreducible complete state coding conflicts. In earlier work, the authors showed how to resolve such conflicts by introducing internal communication between components, but only for very restricted specification structures. Here, they improve their former work by presenting algorithms for identifying delay transitions and inserting gyroscopes for specifications having a much more general structure. Thus, the authors are now able to synthesise controllers from real-life specifications. For all algorithms, they present correctness proofs and show their successful application to benchmarks, including very complex STGs arising in the context of control resynthesis.

VMI-PL: A monitoring language for virtual platforms using virtual machine introspection (2014)

Westphal, Florian ; Axelsson, Stefan ; Neuhaus, Christian ; Polze, Andreas

With the growth of virtualization and cloud computing, more and more forensic investigations rely on being able to perform live forensics on a virtual machine using virtual machine introspection (VMI). Inspecting a virtual machine through its hypervisor enables investigation without risking contamination of the evidence, crashing the computer, etc. To make these techniques more accessible to investigators and researchers, we have developed a new VMI monitoring language. This language is based on a review of the most commonly used VMI techniques to date, and it enables the user to monitor the virtual machine's memory, events and data streams. A prototype of our monitoring system was implemented in KVM, though implementation on any hypervisor that uses the common x86 virtualization hardware assistance support should be straightforward. Our prototype outperforms the proprietary VMware VProbes in many cases, with a maximum performance loss of 18% for a realistic test case, which we consider acceptable. Our implementation is freely available under a liberal software distribution license.

What's creative about sentences? (2022)

Weinstein, Theresa Julia ; Ceh, Simon Majed ; Meinel, Christoph ; Benedek, Mathias

Evaluating creativity of verbal responses or texts is a challenging task due to psychometric issues associated with subjective ratings and the peculiarities of textual data. We explore an approach to objectively assess the creativity of responses in a sentence generation task to 1) better understand what language-related aspects are valued by human raters and 2) further advance the developments toward automating creativity evaluations. Over the course of two prior studies, participants generated 989 four-word sentences based on a four-letter prompt with the instruction to be creative. We developed an algorithm that scores each sentence on eight different metrics including 1) general word infrequency, 2) word combination infrequency, 3) context-specific word uniqueness, 4) syntax uniqueness, 5) rhyme, 6) phonetic similarity, and similarity of 7) sequence spelling and 8) semantic meaning to the cue. The text metrics were then used to explain the averaged creativity ratings of eight human raters. We found six metrics to be significantly correlated with the human ratings, explaining a total of 16% of their variance. We conclude that the creative impression of sentences is partly driven by different aspects of novelty in word choice and syntax, as well as rhythm and sound, which are amenable to objective assessment.

Optimizing event pattern matching using business process models (2014)

Weidlich, Matthias ; Ziekow, Holger ; Gal, Avigdor ; Mendling, Jan ; Weske, Mathias

A growing number of enterprises use complex event processing for monitoring and controlling their operations, while business process models are used to document working procedures. In this work, we propose a comprehensive method for complex event processing optimization using business process models. Our proposed method is based on the extraction of behavioral constraints that are used, in turn, to rewrite patterns for event detection, and select and transform execution plans. We offer a set of rewriting rules that is shown to be complete with respect to the all, seq, and any patterns. The effectiveness of our method is demonstrated in an experimental evaluation with a large number of processes from an insurance company. We illustrate that the proposed optimization leads to significant savings in query processing. By integrating the optimization in state-of-the-art systems for event pattern matching, we demonstrate that these savings materialize in different technical infrastructures and can be combined with existing optimization techniques.

Causal behavioural profiles - efficient computation, applications, and evaluation (2011)

Weidlich, Matthias ; Polyvyanyy, Artem ; Mendling, Jan ; Weske, Mathias

Analysis of behavioural consistency is an important aspect of software engineering. In process and service management, consistency verification of behavioural models has manifold applications. For instance, a business process model used as system specification and a corresponding workflow model used as implementation have to be consistent. Another example would be the analysis of the degree to which a process log of executed business operations is consistent with the corresponding normative process model. Typically, existing notions of behaviour equivalence, such as bisimulation and trace equivalence, are applied as consistency notions. Still, these notions are exponential in computation and yield a Boolean result. In many cases, however, a quantification of behavioural deviation is needed along with concepts to isolate the source of deviation. In this article, we propose causal behavioural profiles as the basis for a consistency notion. These profiles capture essential behavioural information, such as order, exclusiveness, and causality between pairs of activities of a process model. Consistency based on these profiles is weaker than trace equivalence, but can be computed efficiently for a broad class of models. In this article, we introduce techniques for the computation of causal behavioural profiles using structural decomposition techniques for sound free-choice workflow systems if unstructured net fragments are acyclic or can be traced back to S- or T-nets. We also elaborate on the findings of applying our technique to three industry model collections.
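
To make the pairwise relations concrete, the sketch below derives order, exclusiveness, and interleaving relations from a finite set of example traces. It is an illustration under stated assumptions, not the article's technique: the article computes profiles from the structure of the process model rather than from traces, and the causality (co-occurrence) part of a causal behavioural profile is omitted here. The function names and relation labels are chosen for illustration.

```python
from itertools import combinations


def weak_order(traces):
    """Pairs (a, b) such that a occurs before b in at least one trace."""
    pairs = set()
    for trace in traces:
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                pairs.add((a, b))
    return pairs


def behavioural_profile(traces):
    """Classify every unordered pair of distinct activities."""
    order = weak_order(traces)
    activities = sorted({a for trace in traces for a in trace})
    profile = {}
    for a, b in combinations(activities, 2):
        forward, backward = (a, b) in order, (b, a) in order
        if forward and backward:
            profile[(a, b)] = "interleaving"
        elif forward:
            profile[(a, b)] = "strict order"
        elif backward:
            profile[(a, b)] = "reverse strict order"
        else:
            profile[(a, b)] = "exclusive"
    return profile
```

Comparing two such profiles pair by pair yields a graded notion of consistency, in contrast to the Boolean result of trace equivalence.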

Process compliance analysis based on behavioural profiles (2011)

Weidlich, Matthias ; Polyvyanyy, Artem ; Desai, Nirmit ; Mendling, Jan ; Weske, Mathias

Process compliance measurement is getting increasing attention in companies due to stricter legal requirements and market pressure for operational excellence. In order to judge the compliance of business processing, the degree of behavioural deviation of a case, i.e., an observed execution sequence, is quantified with respect to a process model (referred to as fitness, or recall). Recently, different compliance measures have been proposed. Still, nearly all of them are grounded on state-based techniques and the trace equivalence criterion, in particular. As a consequence, these approaches have to deal with the state explosion problem. In this paper, we argue that a behavioural abstraction may be leveraged to measure the compliance of a process log - a collection of cases. To this end, we utilise causal behavioural profiles that capture the behavioural characteristics of process models and cases, and can be computed efficiently. We propose different compliance measures based on these profiles, discuss the impact of noise in process logs on our measures, and show how diagnostic information on non-compliance is derived. As a validation, we report on findings of applying our approach in a case study with an international service provider.

Propagating changes between aligned process models (2012)

Weidlich, Matthias ; Mendling, Jan ; Weske, Mathias

There is a wide variety of drivers for business process modelling initiatives, reaching from organisational redesign to the development of information systems. Consequently, a common business process is often captured in multiple models that overlap in content due to serving different purposes. Business process management aims at flexible adaptation to changing business needs. Hence, changes of business processes occur frequently and have to be incorporated in the respective process models. Once a process model is changed, related process models have to be updated accordingly, despite the fact that those process models may only be loosely coupled. In this article, we introduce an approach that supports change propagation between related process models. Given a change in one process model, we leverage the behavioural abstraction of behavioural profiles for corresponding activities in order to determine a change region in another model. Our approach is able to cope with changes in pairs of models that are not related by hierarchical refinement and show behavioural inconsistencies. We evaluate the applicability of our approach with two real-world process model collections. To this end, we either deduce change operations from different model revisions or rely on synthetic change operations.

Perceived consistency between process models (2012)

Weidlich, Matthias ; Mendling, Jan

Process-aware information systems typically involve various kinds of process stakeholders. That, in turn, leads to multiple process models that capture a common process from different perspectives and at different levels of abstraction. In order to guarantee a certain degree of uniformity, the consistency of such related process models is evaluated using formal criteria. However, it is unclear how modelling experts assess the consistency between process models, and which kind of notion they perceive to be appropriate. In this paper, we focus on control flow aspects and investigate the adequacy of consistency notions. In particular, we report findings from an online experiment, which allows us to compare how far trace equivalence and two notions based on behavioural profiles approximate expert perceptions on consistency. Analysing 69 expert statements from process analysts, we conclude that trace equivalence is not suited to be applied as a consistency notion, whereas the notions based on behavioural profiles approximate the perceived consistency of our subjects significantly. Therefore, our contribution is an empirically founded answer to the correlation of behaviour consistency notions and the consistency perception by experts in the field of business process modelling.

Behaviour equivalence and compatibility of business process models with complex correspondences (2012)

Weidlich, Matthias ; Dijkman, Remco ; Weske, Mathias

Once multiple models of a business process are created for different purposes or to capture different variants, verification of behaviour equivalence or compatibility is needed. Equivalence verification ensures that two business process models specify the same behaviour. Since different process models are likely to differ with respect to their assumed level of abstraction and the actions that they take into account, equivalence notions have to cope with correspondences between sets of actions and actions that exist in one process but not in the other. In this paper, we present notions of equivalence and compatibility that can handle these problems. In essence, we present a notion of equivalence that works on correspondences between sets of actions rather than single actions. We then integrate our equivalence notion with work on behaviour inheritance that copes with actions that exist in one process but not in the other, leading to notions of behaviour compatibility. Compatibility notions verify that two models have the same behaviour with respect to the actions that they have in common. As such, our contribution is a collection of behaviour equivalence and compatibility notions that are applicable in more general settings than existing ones.

Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning (2018)

Wang, Cheng ; Yang, Haojin ; Meinel, Christoph

Generating a novel and descriptive caption of an image is drawing increasing interest in computer vision, natural language processing, and multimedia communities. In this work, we propose an end-to-end trainable deep bidirectional LSTM (Bi-LSTM (Long Short-Term Memory)) model to address the problem. By combining a deep convolutional neural network (CNN) and two separate LSTM networks, our model is capable of learning long-term visual-language interactions by making use of history and future context information at high-level semantic space. We also explore deep multimodal bidirectional models, in which we increase the depth of nonlinearity transition in different ways to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale, and vertical mirror are proposed to prevent over-fitting in training deep models. To understand how our models "translate" image to sentence, we visualize and qualitatively analyze the evolution of Bi-LSTM internal states over time. The effectiveness and generality of proposed models are evaluated on four benchmark datasets: Flickr8K, Flickr30K, MSCOCO, and Pascal1K datasets. We demonstrate that Bi-LSTM models achieve highly competitive performance on both caption generation and image-sentence retrieval even without integrating an additional mechanism (e.g., object detection, attention model). Our experiments also prove that multi-task learning is beneficial to increase model generality and gain performance. We also demonstrate that the transfer learning performance of the Bi-LSTM model significantly exceeds that of previous methods on the Pascal1K dataset.

data4life - Eine nutzerkontrollierte Gesundheitsdaten-Infrastruktur (2019)

von Schorlemer, Stephan ; Weiß, Christian-Cornelius

Model-Driven engineering of self-adaptive software with EUREMA (2014)

Vogel, Thomas ; Giese, Holger

The development of self-adaptive software requires the engineering of an adaptation engine that controls the underlying adaptable software by feedback loops. The engine often describes the adaptation by runtime models representing the adaptable software and by activities such as analysis and planning that use these models. To systematically address the interplay between runtime models and adaptation activities, runtime megamodels have been proposed. A runtime megamodel is a specific model capturing runtime models and adaptation activities. In this article, we go one step further and present an executable modeling language for ExecUtable RuntimE MegAmodels (EUREMA) that eases the development of adaptation engines by following a model-driven engineering approach. We provide a domain-specific modeling language and a runtime interpreter for adaptation engines, in particular feedback loops. Megamodels are kept alive at runtime and by interpreting them, they are directly executed to run feedback loops. Additionally, they can be dynamically adjusted to adapt feedback loops. Thus, EUREMA supports development by making feedback loops explicit at a higher level of abstraction and it enables solutions where multiple feedback loops interact or operate on top of each other and self-adaptation co-exists with offline adaptation for evolution.

Detecting layout templates in complex multiregion files (2021)

Vitagliano, Gerardo ; Jiang, Lan ; Naumann, Felix

Spreadsheets are among the most commonly used file formats for data management, distribution, and analysis. Their widespread employment makes it easy to gather large collections of data, but their flexible canvas-based structure makes automated analysis difficult without heavy preparation. One of the common problems that practitioners face is the presence of multiple, independent regions in a single spreadsheet, possibly separated by repeated empty cells. We define such files as "multiregion" files. In collections of various spreadsheets, we can observe that some share the same layout. We present the Mondrian approach to automatically identify layout templates across multiple files and systematically extract the corresponding regions. Our approach is composed of three phases: first, each file is rendered as an image and inspected for elements that could form regions; then, using a clustering algorithm, the identified elements are grouped to form regions; finally, every file layout is represented as a graph and compared with others to find layout templates. We compare our method to state-of-the-art table recognition algorithms on two corpora of real-world enterprise spreadsheets. Our approach shows the best performances in detecting reliable region boundaries within each file and can correctly identify recurring layouts across files.

Cultural Theory’s contributions to climate science (2022)

Verweij, Marco ; Ney, Steven ; Thompson, Michael

In his article, 'Social constructionism and climate science denial', Hansson claims to present empirical evidence that the cultural theory developed by Dame Mary Douglas, Aaron Wildavsky and ourselves (among others) leads to (climate) science denial. In this reply, we show that there is no validity to these claims. First, we show that Hansson's empirical evidence that cultural theory has led to climate science denial falls apart under closer inspection. Contrary to Hansson's claims, cultural theory has made significant contributions to understanding and addressing climate change. Second, we discuss various features of Douglas' cultural theory that differentiate it from other constructivist approaches and make it compatible with the scientific method. Thus, we also demonstrate that cultural theory cannot be accused of epistemic relativism.

Machine learning to predict mortality and critical events in a cohort of patients with COVID-19 in New York City: model development and validation (2020)

Vaid, Akhil ; Somani, Sulaiman ; Russak, Adam J. ; De Freitas, Jessica K. ; Chaudhry, Fayzan F. ; Paranjpe, Ishan ; Johnson, Kipp W. ; Lee, Samuel J. ; Miotto, Riccardo ; Richter, Felix ; Zhao, Shan ; Beckmann, Noam D. ; Naik, Nidhi ; Kia, Arash ; Timsina, Prem ; Lala, Anuradha ; Paranjpe, Manish ; Golden, Eddye ; Danieletto, Matteo ; Singh, Manbir ; Meyer, Dara ; O'Reilly, Paul F. ; Huckins, Laura ; Kovatch, Patricia ; Finkelstein, Joseph ; Freeman, Robert M. ; Argulian, Edgar ; Kasarskis, Andrew ; Percha, Bethany ; Aberg, Judith A. ; Bagiella, Emilia ; Horowitz, Carol R. ; Murphy, Barbara ; Nestler, Eric J. ; Schadt, Eric E. ; Cho, Judy H. ; Cordon-Cardo, Carlos ; Fuster, Valentin ; Charney, Dennis S. ; Reich, David L. ; Böttinger, Erwin ; Levin, Matthew A. ; Narula, Jagat ; Fayad, Zahi A. ; Just, Allan C. ; Charney, Alexander W. ; Nadkarni, Girish N. ; Glicksberg, Benjamin S.

Background: COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking. Objective: The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points. Methods: We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19-positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients from a single hospital (n=1514) before or on May 1, externally validated on patients from four other hospitals (n=2201) before or on May 1, and prospectively validated on all patients after May 1 (n=383). Finally, we established model interpretability to identify and rank variables that drive model predictions. Results: Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction. Conclusions: We externally and prospectively trained and validated machine learning models for mortality and critical events for patients with COVID-19 at different time horizons. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes.
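
The AUC-ROC values reported above can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The following sketch computes the metric directly from that definition. It is a generic illustration with a hypothetical function name, not the study's evaluation code, and its pairwise loop is only practical for small samples.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive cases (e.g. event occurred), 0 for negative cases.
    scores: predicted risk scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    # Count correctly ordered pairs; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of positive and negative cases.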

Predictive approaches for acute dialysis requirement and death in COVID-19 (2021)

Vaid, Akhil ; Chan, Lili ; Chaudhary, Kumardeep ; Jaladanki, Suraj K. ; Paranjpe, Ishan ; Russak, Adam J. ; Kia, Arash ; Timsina, Prem ; Levin, Matthew A. ; He, John Cijiang ; Böttinger, Erwin ; Charney, Alexander W. ; Fayad, Zahi A. ; Coca, Steven G. ; Glicksberg, Benjamin S. ; Nadkarni, Girish N.

Background and objectives: AKI treated with dialysis initiation is a common complication of coronavirus disease 2019 (COVID-19) among hospitalized patients. However, dialysis supplies and personnel are often limited. Design, setting, participants, & measurements: Using data from adult patients hospitalized with COVID-19 from five hospitals from the Mount Sinai Health System who were admitted between March 10 and December 26, 2020, we developed and validated several models (logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO), random forest, and eXtreme Gradient Boosting [XGBoost; with and without imputation]) for predicting treatment with dialysis or death at various time horizons (1, 3, 5, and 7 days) after hospital admission. Patients admitted to the Mount Sinai Hospital were used for internal validation, whereas the other hospitals formed part of the external validation cohort. Features included demographics, comorbidities, and laboratory and vital signs within 12 hours of hospital admission. Results: A total of 6093 patients (2442 in training and 3651 in external validation) were included in the final cohort. Of the different modeling approaches used, XGBoost without imputation had the highest area under the receiver operating characteristic (AUROC) curve on internal validation (range of 0.93-0.98) and area under the precision-recall curve (AUPRC; range of 0.78-0.82) for all time points. XGBoost without imputation also had the highest test parameters on external validation (AUROC range of 0.85-0.87, and AUPRC range of 0.27-0.54) across all time windows. XGBoost without imputation outperformed all models with higher precision and recall (mean difference in AUROC of 0.04; mean difference in AUPRC of 0.15). Features of creatinine, BUN, and red cell distribution width were major drivers of the model's prediction. Conclusions: An XGBoost model without imputation for prediction of a composite outcome of either death or dialysis in patients positive for COVID-19 had the best performance, as compared with standard and other machine learning models.

Object and process migration in .NET (2009)

Troeger, Peter ; Polze, Andreas

Many of today's distributed computing systems in the field do not support the migration of execution entities among computing nodes during runtime. The relatively static association between units of processing and computing nodes makes it difficult to implement fault-tolerant behavior or load-balancing schemes. The concept of code migration may provide a solution to the above-mentioned problems. It can be defined as the movement of processes, objects, or components from one computing node to another during system runtime in a distributed environment. With the advent of the virtual machine-based .NET framework, many of the cross-language heterogeneity issues have been resolved. With the commercial implementation, the shared source "Rotor", and the open-source "Mono" implementation on hand, we have focused on cross-operating system heterogeneity issues and present interoperability and migration schemes for applications distributed over different operating systems (namely Linux and Windows 2000) as well as various .NET implementations. Within this paper, we describe the integration of a migration facility with the help of Aspect-Oriented Programming (AOP) into the .NET framework. AOP is interesting as it addresses non-functional system properties on the middleware level, without the need to manipulate lower system layers like the operating system itself. Most features required to implement object or process migration (such as reflection mechanisms or a machine-independent executable format) are already present in the .NET frameworks, so the integration of such a concept is a natural extension of the system capabilities. We have implemented several proof-of-concept applications for different use case scenarios. The paper contains an experimental evaluation of the performance impact of object migration in context of those applications.

CloudStrike (2020)

Torkura, Kennedy A. ; Sukmana, Muhammad Ihsan Haikal ; Cheng, Feng ; Meinel, Christoph

Most cyber-attacks and data breaches in cloud infrastructure are due to human errors and misconfiguration vulnerabilities. Cloud customer-centric tools are imperative for mitigating these issues; however, existing cloud security models are largely unable to tackle these security challenges. Novel security mechanisms are therefore needed, and we propose Risk-driven Fault Injection (RDFI) techniques to address these challenges. RDFI applies the principles of chaos engineering to cloud security and leverages feedback loops to execute, monitor, analyze and plan security fault injection campaigns, based on a knowledge-base. The knowledge-base consists of fault models designed from secure baselines, cloud security best practices and observations derived during iterative fault injection campaigns. These observations are helpful for identifying vulnerabilities while verifying the correctness of security attributes (integrity, confidentiality and availability). Furthermore, RDFI proactively supports risk analysis and security hardening efforts by sharing security information with security mechanisms. We have designed and implemented the RDFI strategies, including various chaos engineering algorithms, as a software tool: CloudStrike. Several evaluations have been conducted with CloudStrike against infrastructure deployed on two major public cloud infrastructures: Amazon Web Services and Google Cloud Platform. The time performance linearly increases, proportional to increasing attack rates. Also, the analysis of vulnerabilities detected via security fault injection has been used to harden the security of cloud resources to demonstrate the effectiveness of the security information provided by CloudStrike. Therefore, we opine that our approaches are suitable for overcoming contemporary cloud security issues.

Mary, Hugo, and Hugo* (2020)

Thamsen, Lauritz ; Beilharz, Jossekin Jakob ; Vinh Thuy Tran ; Nedelkoski, Sasho ; Kao, Odej

Distributed data-parallel processing systems like MapReduce, Spark, and Flink are popular for analyzing large datasets using cluster resources. Resource management systems like YARN or Mesos in turn allow multiple data-parallel processing jobs to share cluster resources in temporary containers. Often, the containers do not isolate resource usage to achieve high degrees of overall resource utilization despite overprovisioning and the often fluctuating utilization of specific jobs. However, some combinations of jobs utilize resources better and interfere less with each other when running on the same shared nodes than others. This article presents an approach for improving the resource utilization and job throughput when scheduling recurring distributed data-parallel processing jobs in shared clusters. The approach is based on reinforcement learning and a measure of co-location goodness to have cluster schedulers learn over time which jobs are best executed together on shared resources. We evaluated this approach over the last years with three prototype schedulers that build on each other: Mary, Hugo, and Hugo*. For the evaluation we used exemplary Flink and Spark jobs from different application domains and clusters of commodity nodes managed by YARN. The results of these experiments show that our approach can increase resource utilization and job throughput significantly.

Multiperiod robust optimization for proactive resource provisioning in virtualized data centers (2014)

Takouna, Ibrahim ; Sachs, Kai ; Meinel, Christoph

Object Versioning to Support Recovery Needs: Using Proxies to Preserve Previous Development States in Lively (2015)

Steinert, Bastian ; Thamsen, Lauritz ; Felgentreff, Tim ; Hirschfeld, Robert

We present object versioning as a generic approach to preserve access to previous development and application states. Version-aware references can manage the modifications made to the target object and record versions as desired. Such references can be provided without modifications to the virtual machine. We used proxies to implement the proposed concepts and demonstrate the Lively Kernel running on top of this object versioning layer. This enables Lively users to undo the effects of direct manipulation and other programming actions.
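
The idea of a version-aware reference can be illustrated with a minimal Python sketch. It is not the authors' Lively Kernel implementation: the names `VersionedRef`, `Morph`, and `undo` are hypothetical, and the sketch assumes, as a simplification, that a new version is recorded on every attribute write rather than only when desired.

```python
import copy


class VersionedRef:
    """Hypothetical proxy that forwards access to the latest version of a target."""

    def __init__(self, target):
        # Bypass our own __setattr__ so the version list lives on the proxy itself.
        object.__setattr__(self, "_versions", [target])

    def __getattr__(self, name):
        # Called only for names not found on the proxy: read from the latest version.
        return getattr(self._versions[-1], name)

    def __setattr__(self, name, value):
        # Never mutate an old version: copy the latest state, change the copy, record it.
        new_state = copy.copy(self._versions[-1])
        setattr(new_state, name, value)
        self._versions.append(new_state)

    def undo(self):
        # Drop the most recent version, but always keep the initial one.
        if len(self._versions) > 1:
            self._versions.pop()


class Morph:
    """Stand-in for a graphical object that a user manipulates directly."""

    def __init__(self, color):
        self.color = color


ref = VersionedRef(Morph("red"))
ref.color = "blue"   # recorded as a second version
ref.undo()           # the reference resolves to the first version again
```

Because the versioning logic sits entirely in the proxy, the target class and the runtime stay unmodified, which mirrors the abstract's point that such references need no changes to the virtual machine.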

Leveraging parameter dependencies in high-field asymmetric waveform ion-mobility spectrometry and size exclusion chromatography for proteome-wide cross-linking mass spectrometry (2022)

Sinn, Ludwig R. ; Giese, Sven Hans-Joachim ; Stuiver, Marchel ; Rappsilber, Juri

Ion-mobility spectrometry shows great promise to tackle analytically challenging research questions by adding another separation dimension to liquid chromatography-mass spectrometry. The understanding of how analyte properties influence ion mobility has increased through recent studies, but no clear rationale for the design of customized experimental settings has emerged. Here, we leverage machine learning to deepen our understanding of field asymmetric waveform ion-mobility spectrometry for the analysis of cross-linked peptides. Knowing that predominantly m/z and then the size and charge state of an analyte influence the separation, we found ideal compensation voltages correlating with the size exclusion chromatography fraction number. The effect of this relationship on the analytical depth can be substantial as exploiting it allowed us to almost double unique residue pair detections in a proteome-wide cross-linking experiment. Other applications involving liquid- and gas-phase separation may also benefit from considering such parameter dependencies.

Coronavirus 2019 and people living with human immunodeficiency virus (2020)

Background: There are limited data regarding the clinical impact of coronavirus disease 2019 (COVID-19) on people living with human immunodeficiency virus (PLWH). In this study, we compared outcomes for PLWH with COVID-19 to a matched comparison group. Methods: We identified 88 PLWH hospitalized with laboratory-confirmed COVID-19 in our hospital system in New York City between 12 March and 23 April 2020. We collected data on baseline clinical characteristics, laboratory values, HIV status, treatment, and outcomes from this group and matched comparators (1 PLWH to up to 5 patients by age, sex, race/ethnicity, and calendar week of infection). We compared clinical characteristics and outcomes (death, mechanical ventilation, hospital discharge) for these groups, as well as cumulative incidence of death by HIV status. Results: Patients did not differ significantly by HIV status by age, sex, or race/ethnicity due to the matching algorithm. PLWH hospitalized with COVID-19 had high proportions of HIV virologic control on antiretroviral therapy. PLWH had greater proportions of smoking (P < .001) and comorbid illness than uninfected comparators. There was no difference in COVID-19 severity on admission by HIV status (P = .15). Poor outcomes for hospitalized PLWH were frequent but similar to proportions in comparators; 18% required mechanical ventilation and 21% died during follow-up (compared with 23% and 20%, respectively). There was similar cumulative incidence of death over time by HIV status (P = .94). Conclusions: We found no differences in adverse outcomes associated with HIV infection for hospitalized COVID-19 patients compared with a demographically similar patient group.

IMDfence (2020)

Siddiqi, Muhammad Ali ; Dörr, Christian ; Strydis, Christos

Over the past decade, focus on the security and privacy aspects of implantable medical devices (IMDs) has intensified, driven by the multitude of cybersecurity vulnerabilities found in various existing devices. However, due to their strict computational, energy and physical constraints, conventional security protocols are not directly applicable to IMDs. Custom-tailored schemes have been proposed instead which, however, fail to cover the full spectrum of security features that modern IMDs and their ecosystems so critically require. In this paper we propose IMDfence, a security protocol for IMD ecosystems that provides a comprehensive yet practical security portfolio, which includes availability, non-repudiation, access control, entity authentication, remote monitoring and system scalability. The protocol also allows emergency access that results in the graceful degradation of offered services without compromising security and patient safety. The performance of the security protocol as well as its feasibility and impact on modern IMDs are extensively analyzed and evaluated. We find that IMDfence achieves the above security requirements at a mere less than 7% increase in total IMD energy consumption, and less than 14 ms and 9 kB increase in system delay and memory footprint, respectively.

Interactive photo editing on smartphones via intrinsic decomposition (2021)

Shekhar, Sumit ; Reimann, Max ; Mayer, Maximilian ; Semmo, Amir ; Pasewaldt, Sebastian ; Döllner, Jürgen ; Trapp, Matthias

Intrinsic decomposition refers to the problem of estimating scene characteristics, such as albedo and shading, when one view or multiple views of a scene are provided. The inverse problem setting, where multiple unknowns are solved given a single known pixel-value, is highly under-constrained. When provided with correlating image and depth data, which nowadays are easy to acquire with the depth sensors of high-end smartphones, intrinsic scene decomposition can be facilitated using depth-based priors. In this work, we present a system for intrinsic decomposition of RGB-D images on smartphones and the algorithmic as well as design choices therein. Unlike state-of-the-art methods that assume only diffuse reflectance, we consider both diffuse and specular pixels. For this purpose, we present a novel specularity extraction algorithm based on a multi-scale intensity decomposition and chroma inpainting. The diffuse component is then further decomposed into albedo and shading components. We use an inertial proximal algorithm for non-convex optimization (iPiano) to ensure albedo sparsity. Our GPU-based visual processing is implemented on iOS via the Metal API and enables interactive performance on an iPhone 11 Pro. A qualitative evaluation shows that we are able to obtain high-quality outputs. Furthermore, our proposed approach for specularity removal outperforms state-of-the-art approaches for real-world images, while our albedo and shading layer decomposition is faster than the prior work at a comparable output quality. Manifold applications such as recoloring, retexturing, relighting, appearance editing, and stylization are shown, each using the intrinsic layers obtained with our method and/or the corresponding depth data.

Interactive visualization of generalized virtual 3D city models using level-of-abstraction transitions (2012)

Semmo, Amir ; Trapp, Matthias ; Kyprianidis, Jan Eric ; Döllner, Jürgen Roland Friedrich

Virtual 3D city models play an important role in the communication of complex geospatial information in a growing number of applications, such as urban planning, navigation, tourist information, and disaster management. In general, homogeneous graphic styles are used for visualization. For instance, photorealism is suitable for detailed presentations, and non-photorealism or abstract stylization is used to facilitate guidance of a viewer's gaze to prioritized information. However, to adapt visualization to different contexts and contents and to support saliency-guided visualization based on user interaction or dynamically changing thematic information, a combination of different graphic styles is necessary. Design and implementation of such combined graphic styles pose a number of challenges, specifically from the perspective of real-time 3D visualization. In this paper, the authors present a concept and an implementation of a system that enables different presentation styles, their seamless integration within a single view, and parametrized transitions between them, which are defined according to tasks, camera view, and image resolution. The paper outlines potential usage scenarios and application fields together with a performance evaluation of the implementation.

Cartography-Oriented Design of 3D Geospatial Information Visualization - Overview and Techniques (2015)

Semmo, Amir ; Trapp, Matthias ; Jobst, Markus ; Döllner, Jürgen Roland Friedrich

In economy, society and personal life map-based interactive geospatial visualization becomes a natural element of a growing number of applications and systems. The visualization of 3D geospatial information, however, raises the question how to represent the information in an effective way. Considerable research has been done in technology-driven directions in the fields of cartography and computer graphics (e.g., design principles, visualization techniques). Here, non-photorealistic rendering (NPR) represents a promising visualization category - situated between both fields - that offers a large number of degrees for the cartography-oriented visual design of complex 2D and 3D geospatial information for a given application context. Still today, however, specifications and techniques for mapping cartographic design principles to the state-of-the-art rendering pipeline of 3D computer graphics remain to be explored. This paper revisits cartographic design principles for 3D geospatial visualization and introduces an extended 3D semiotic model that complies with the general, interactive visualization pipeline. Based on this model, we propose NPR techniques to interactively synthesize cartographic renditions of basic feature types, such as terrain, water, and buildings. In particular, it includes a novel iconification concept to seamlessly interpolate between photorealistic and cartographic representations of 3D landmarks. Our work concludes with a discussion of open challenges in this field of research, including topics, such as user interaction and evaluation.

Concepts for cartography-oriented visualization of virtual 3D city models (2012)

Semmo, Amir ; Hildebrandt, Dieter ; Trapp, Matthias ; Döllner, Jürgen Roland Friedrich

Virtual 3D city models serve as an effective medium with manifold applications in geoinformation systems and services. To date, most 3D city models are visualized using photorealistic graphics. But an effective communication of geoinformation significantly depends on how important information is designed and cognitively processed in the given application context. One possibility to visually emphasize important information is based on non-photorealistic rendering, which comprehends artistic depiction styles and is characterized by its expressiveness and communication aspects. However, a direct application of non-photorealistic rendering techniques primarily results in monotonic visualization that lacks cartographic design aspects. In this work, we present concepts for cartography-oriented visualization of virtual 3D city models. These are based on coupling non-photorealistic rendering techniques and semantics-based information for a user, context, and media-dependent representation of thematic information. This work highlights challenges for cartography-oriented visualization of 3D geovirtual environments, presents stylization techniques and discusses their applications and ideas for a standardized visualization. In particular, the presented concepts enable a real-time and dynamic visualization of thematic geoinformation.

Interactive image filtering for level-of-abstraction texturing of virtual 3D scenes (2015)

Semmo, Amir ; Döllner, Jürgen Roland Friedrich

Texture mapping is a key technology in computer graphics. For the visual design of 3D scenes, in particular, effective texturing depends significantly on how important contents are expressed, e.g., by preserving global salient structures, and how their depiction is cognitively processed by the user in an application context. Edge-preserving image filtering is one key approach to address these concerns. Much research has focused on applying image filters in a post-process stage to generate artistically stylized depictions. However, these approaches generally do not preserve depth cues, which are important for the perception of 3D visualization (e.g., texture gradient). To this end, filtering is required that processes texture data coherently with respect to linear perspective and spatial relationships. In this work, we present an approach for texturing 3D scenes with perspective coherence by arbitrary image filters. We propose decoupled deferred texturing with (1) caching strategies to interactively perform image filtering prior to texture mapping and (2) for each mipmap level separately to enable a progressive level of abstraction, using (3) direct interaction interfaces to parameterize the visualization according to spatial, semantic, and thematic data. We demonstrate the potentials of our method by several applications using touch or natural language inputs to serve the different interests of users in specific information, including illustrative visualization, focus+context visualization, geometric detail removal, and semantic depth of field. The approach supports frame-to-frame coherence, order-independent transparency, multitexturing, and content-based filtering. In addition, it seamlessly integrates into real-time rendering pipelines and is extensible for custom interaction techniques. (C) 2015 Elsevier Ltd. All rights reserved.

Next generation cooperative wearables (2017)

Seiffert, Martin ; Holstein, Flavio ; Schlosser, Rainer ; Schiller, Jochen

Currently available wearables are usually based on a single sensor node with integrated capabilities for classifying different activities. The next generation of cooperative wearables could be able to identify not only activities, but also to evaluate them qualitatively using the data of several sensor nodes attached to the body, to provide detailed feedback for the improvement of the execution. Especially within the application domains of sports and health-care, such immediate feedback to the execution of body movements is crucial for (re-) learning and improving motor skills. To enable such systems for a broad range of activities, generalized approaches for human motion assessment within sensor networks are required. In this paper, we present a generalized trainable activity assessment chain (AAC) for the online assessment of periodic human activity within a wireless body area network. AAC evaluates the execution of separate movements of a prior trained activity on a fine-grained quality scale. We connect qualitative assessment with human knowledge by projecting the AAC on the hierarchical decomposition of motion performed by the human body as well as establishing the assessment on a kinematic evaluation of biomechanically distinct motion fragments. We evaluate AAC in a real-world setting and show that AAC successfully delimits the movements of correctly performed activity from faulty executions and provides detailed reasons for the activity assessment.

Dynamic hierarchical mega models : comprehensive traceability and its efficient maintenance (2010)

Seibel, Andreas ; Neumann, Stefan ; Giese, Holger

In the world of model-driven engineering (MDE) support for traceability and maintenance of traceability information is essential. On the one hand, classical traceability approaches for MDE address this need by supporting automated creation of traceability information on the model element level. On the other hand, global model management approaches manually capture traceability information on the model level. However, there is currently no approach that supports comprehensive traceability, comprising traceability information on both levels, and efficient maintenance of traceability information, which requires a high-degree of automation and scalability. In this article, we present a comprehensive traceability approach that combines classical traceability approaches for MDE and global model management in form of dynamic hierarchical mega models. We further integrate efficient maintenance of traceability information based on top of dynamic hierarchical mega models. The proposed approach is further outlined by using an industrial case study and by presenting an implementation of the concepts in form of a prototype.

Automated reasoning for attributed graph properties (2018)

Schneider, Sven ; Lambers, Leen ; Orejas, Fernando

Graphs are ubiquitous in computer science. Moreover, in various application fields, graphs are equipped with attributes to express additional information such as names of entities or weights of relationships. Due to the pervasiveness of attributed graphs, it is highly important to have the means to express properties on attributed graphs to strengthen modeling capabilities and to enable analysis. Firstly, we introduce a new logic of attributed graph properties, where the graph part and attribution part are neatly separated. The graph part is equivalent to first-order logic on graphs as introduced by Courcelle. It employs graph morphisms to allow the specification of complex graph patterns. The attribution part is added to this graph part by reverting to the symbolic approach to graph attribution, where attributes are represented symbolically by variables whose possible values are specified by a set of constraints making use of algebraic specifications. Secondly, we extend our refutationally complete tableau-based reasoning method as well as our symbolic model generation approach for graph properties to attributed graph properties. Due to the new logic mentioned above, neatly separating the graph and attribution parts, and the categorical constructions employed only on a more abstract level, we can leave the graph part of the algorithms seemingly unchanged. For the integration of the attribution part into the algorithms, we use an oracle, allowing for flexible adoption of different available SMT solvers in the actual implementation. Finally, our automated reasoning approach for attributed graph properties is implemented in the tool AutoGraph integrating in particular the SMT solver Z3 for the attribute part of the properties. We motivate and illustrate our work with a particular application scenario on graph database query validation.

A logic-based incremental approach to graph repair featuring delta preservation (2021)

Schneider, Sven ; Lambers, Leen ; Orejas, Fernando

We introduce a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing graph repairs from which a user may select a graph repair based on non-formalized further requirements. This incremental approach features delta preservation as it allows to restrict the generation of graph repairs to delta-preserving graph repairs, which do not revert the additions and deletions of the most recent consistency-violating graph update. We specify consistency of graphs using the logic of nested graph conditions, which is equivalent to first-order logic on graphs. Technically, the incremental approach encodes if and how the graph under repair satisfies a graph condition using the novel data structure of satisfaction trees, which are adapted incrementally according to the graph updates applied. In addition to the incremental approach, we also present two state-based graph repair algorithms, which restore consistency of a graph independent of the most recent graph update and which generate additional graph repairs using a global perspective on the graph under repair. We evaluate the developed algorithms using our prototypical implementation in the tool AutoGraph and illustrate our incremental approach using a case study from the graph database domain.

Distributed detection of sequential anomalies in univariate time series (2021)

Schneider, Johannes ; Wenig, Phillip ; Papenbrock, Thorsten

The automated detection of sequential anomalies in time series is an essential task for many applications, such as the monitoring of technical systems, fraud detection in high-frequency trading, or the early detection of disease symptoms. All these applications require the detection to find all sequential anomalies as fast as possible on potentially very large time series. In other words, the detection needs to be effective, efficient and scalable w.r.t. the input size. Series2Graph is an effective solution based on graph embeddings that is robust against re-occurring anomalies, can discover sequential anomalies of arbitrary length, and works without training data. Yet, Series2Graph is not scalable due to its single-threaded approach; it cannot, in particular, process arbitrarily large sequences due to the memory constraints of a single machine. In this paper, we propose our distributed anomaly detection system, DADS for short, which is an efficient and scalable adaptation of Series2Graph. Based on the actor programming model, DADS distributes the input time sequence, intermediate state and the computation to all processors of a cluster in a way that minimizes communication costs and synchronization barriers. Our evaluation shows that DADS is orders of magnitude faster than Series2Graph, scales almost linearly with the number of processors in the cluster and can process much larger input sequences due to its scale-out property.

Efficient distributed discovery of bidirectional order dependencies (2022)

Schmidl, Sebastian ; Papenbrock, Thorsten

Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; it can, in particular, process much larger datasets.
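
The book example from the abstract can be made concrete with a small validity check. This is a naive pairwise test of a single, given bOD candidate, written only to illustrate what such a dependency states; it is not the DISTOD discovery algorithm, and the function name `holds_bod` and the sample data are assumptions made for this sketch.

```python
from itertools import combinations


def holds_bod(rows, lhs, rhs, rhs_descending=False):
    """Check whether ordering `rows` by `lhs` ascending also orders them by `rhs`.

    With rhs_descending=True, the right-hand side must be ordered descending.
    Naive O(n^2) check over all ordered pairs of rows.
    """
    sign = -1 if rhs_descending else 1
    for s, t in combinations(rows, 2):
        for a, b in ((s, t), (t, s)):
            # If a comes no later than b on lhs, it must come no later on rhs too.
            if a[lhs] <= b[lhs] and not (sign * a[rhs] <= sign * b[rhs]):
                return False
    return True


# Hypothetical sample table: age is derived from the publication year.
books = [
    {"title": "A", "published": 1990, "age": 35},
    {"title": "B", "published": 2005, "age": 20},
    {"title": "C", "published": 2020, "age": 5},
]

# Sorting by publication year ascending sorts by age descending, not ascending.
ascending_then_descending = holds_bod(books, "published", "age", rhs_descending=True)
ascending_then_ascending = holds_bod(books, "published", "age")
```

Validating one candidate is cheap; the exponential cost mentioned in the abstract comes from the number of attribute-list candidates that a discovery algorithm has to consider.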

Risk-sensitive control of Markov decision processes (2020)

Schlosser, Rainer

In many revenue management applications risk-averse decision-making is crucial. In dynamic settings, however, it is challenging to find the right balance between maximizing expected rewards and minimizing various kinds of risk. In existing approaches utility functions, chance constraints, or (conditional) value at risk considerations are used to influence the distribution of rewards in a preferred way. Nevertheless, common techniques are not flexible enough and typically numerically complex. In our model, we exploit the fact that a distribution is characterized by its mean and higher moments. We present a multi-valued dynamic programming heuristic to compute risk-sensitive feedback policies that are able to directly control the moments of future rewards. Our approach is based on recursive formulations of higher moments and does not require an extension of the state space. Finally, we propose a self-tuning algorithm, which allows to identify feedback policies that approximate predetermined (risk-sensitive) target distributions. We illustrate the effectiveness and the flexibility of our approach for different dynamic pricing scenarios. (C) 2020 Elsevier Ltd. All rights reserved.

Scalable relaxation techniques to solve stochastic dynamic multi-product pricing problems with substitution effects (2020)

Schlosser, Rainer

In many businesses, firms are selling different types of products, which share mutual substitution effects in demand. To compute effective pricing strategies is challenging as the sales probabilities of each of a firm's products can also be affected by the prices of potential substitutes. In this paper, we analyze stochastic dynamic multi-product pricing models for the sale of perishable goods. To circumvent the limitations of time-consuming optimal solutions for highly complex models, we propose different relaxation techniques, which allow to reduce the size of critical model components, such as the state space, the action space, or the set of potential sales events. Our heuristics are able to decrease the size of those components by forming corresponding clusters and using subsets of representative elements. Using numerical examples, we verify that our heuristics make it possible to dramatically reduce the computation time while still obtaining close-to-optimal expected profits. Further, we show that our heuristics are (i) flexible, (ii) scalable, and (iii) can be arbitrarily combined in a mutually supportive way.

Heuristic mean-variance optimization in Markov decision processes using state-dependent risk aversion (2022)

Schlosser, Rainer

In dynamic decision problems, it is challenging to find the right balance between maximizing expected rewards and minimizing risks. In this paper, we consider NP-hard mean-variance (MV) optimization problems in Markov decision processes with a finite time horizon. We present a heuristic approach to solve MV problems, which is based on state-dependent risk aversion and efficient dynamic programming techniques. Our approach can also be applied to mean-semivariance (MSV) problems, which particularly focus on the downside risk. We demonstrate the applicability and the effectiveness of our heuristic for dynamic pricing applications. Using reproducible examples, we show that our approach outperforms existing state-of-the-art benchmark models for MV and MSV problems while also providing competitive runtimes. Further, compared to models based on constant risk levels, we find that state-dependent risk aversion allows to more effectively intervene in case sales processes deviate from their planned paths. Our concepts are domain independent, easy to implement and of low computational complexity.

Dynamic pricing and advertising of perishable products with inventory holding costs (2015)

Schlosser, Rainer

We examine a special class of dynamic pricing and advertising models for the sale of perishable goods, including marginal unit costs and inventory holding costs. The time horizon is assumed to be finite and we allow several model parameters to be dependent on time. For the stochastic version of the model, we derive closed-form expressions of the value function as well as of the optimal pricing and advertising policy in feedback form. Moreover, we show that for small unit shares, the model converges to a deterministic version of the problem, whose explicit solution is characterized by an overage and an underage case. We quantify the close relationship between the open-loop solution of the deterministic model and the expected evolution of optimally controlled stochastic sales processes. For both models, we derive sensitivity results. We find that in the case of positive holding costs, on average, optimal prices increase in time and advertising rates decrease. Furthermore, we analytically verify the excellent quality of optimal feedback policies of deterministic models applied in stochastic models. (C) 2015 Elsevier B.V. All rights reserved.

Efficient discovery of matching dependencies (2020)

Schirmer, Philipp ; Papenbrock, Thorsten ; Koumarelas, Ioannis ; Naumann, Felix

Matching dependencies (MDs) are data profiling results that are often used for data integration, data cleaning, and entity matching. They are a generalization of functional dependencies (FDs) that match similar rather than identical elements. As their discovery is very difficult, existing profiling algorithms either find only small subsets of all MDs or are limited to small datasets. We focus on the efficient discovery of all interesting MDs in real-world datasets. For this purpose, we propose HyMD, a novel MD discovery algorithm that finds all minimal, non-trivial MDs within given similarity boundaries. The algorithm extracts the exact similarity thresholds for the individual MDs from the data instead of using predefined similarity thresholds. For this reason, it is the first approach to solve the MD discovery problem in an exact and truly complete way. If needed, the algorithm can, however, enforce certain properties on the reported MDs, such as disjointness and minimum support, to focus the discovery on results that are actually required by downstream use cases. HyMD is technically a hybrid approach that combines the two most popular dependency discovery strategies in related work: lattice traversal and inference from record pairs. Despite the additional effort of finding exact similarity thresholds for all MD candidates, the algorithm is still able to efficiently process large datasets, e.g., datasets larger than 3 GB.
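
To make the notion of a matching dependency concrete, the following minimal sketch checks whether one given MD holds on a small set of records by comparing every record pair. It is a brute-force validity check only, not the HyMD discovery algorithm. The function names `similarity` and `md_holds`, the use of `difflib.SequenceMatcher` as the similarity measure, and the sample records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Illustrative string similarity in [0, 1] (an assumption, not HyMD's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def md_holds(records, lhs, rhs):
    """Return True if the matching dependency lhs -> rhs holds on all record pairs.

    lhs: dict mapping attribute name -> minimum similarity (the premise)
    rhs: tuple (attribute name, minimum similarity) (the conclusion)
    """
    rhs_attr, rhs_min = rhs
    for r1, r2 in combinations(records, 2):
        # The premise is met when the pair is similar enough on every LHS attribute.
        premise = all(similarity(r1[a], r2[a]) >= t for a, t in lhs.items())
        # A pair that meets the premise but differs too much on the RHS violates the MD.
        if premise and similarity(r1[rhs_attr], r2[rhs_attr]) < rhs_min:
            return False
    return True


# Hypothetical sample data.
consistent = [
    {"name": "Anna Smith", "city": "Potsdam"},
    {"name": "Anna Smith", "city": "Potsdam"},
    {"name": "Bob Jones", "city": "Berlin"},
]
conflicting = [
    {"name": "Anna Smith", "city": "Potsdam"},
    {"name": "Anna Smith", "city": "Berlin"},
]

holds_on_consistent = md_holds(consistent, {"name": 1.0}, ("city", 1.0))
holds_on_conflicting = md_holds(conflicting, {"name": 1.0}, ("city", 1.0))
```

With thresholds of 1.0 on both sides, the check reduces to an ordinary functional dependency test; lowering the thresholds admits similar rather than identical values.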

Towards a system for complex analysis of security events in large-scale networks (2017)

Sapegin, Andrey ; Jaeger, David ; Cheng, Feng ; Meinel, Christoph

After almost two decades of development, modern Security Information and Event Management (SIEM) systems still face issues with normalisation of heterogeneous data sources, a high number of false positive alerts, and long analysis times, especially in large-scale networks with high volumes of security events. In this paper, we present our own prototype of a SIEM system, which is capable of dealing with these issues. For efficient data processing, our system employs in-memory data storage (SAP HANA) and our own technologies from previous work, such as the Object Log Format (OLF) and high-speed event normalisation. We analyse normalised data using a combination of three different approaches for security analysis: misuse detection, query-based analytics, and anomaly detection. Compared to the previous work, we have significantly improved our unsupervised anomaly detection algorithms. Most importantly, we have developed a novel hybrid outlier detection algorithm that returns ranked clusters of anomalies. It lets an operator of a SIEM system concentrate on the few top-ranked anomalies instead of digging through an unsorted bundle of suspicious events. We propose to use anomaly detection in combination with signatures and queries, applied to the same data, rather than as a full replacement for misuse detection. In this case, the majority of attacks will be captured with misuse detection, whereas anomaly detection will highlight previously unknown behaviour or attacks. We also propose that only the most suspicious event clusters need to be checked by an operator, whereas other anomalies, including false positive alerts, do not need to be explicitly checked if they have a lower ranking. We have proved our concepts and algorithms on a dataset of 160 million events from a network segment of a big multinational company and suggest that our approach and methods are highly relevant for modern SIEM systems.

More than a quarter century of creativity and innovation management (2020)

Rose, Robert ; Hölzle, Katharina ; Björk, Jennie

When this journal was founded in 1992 by Tudor Rickards and Susan Moger, there was no academic outlet available that addressed issues at the intersection of creativity and innovation. From zero to 1,163 records, from the new kid on the block to one of the leading journals in creativity and innovation management has been quite a journey, and we would like to reflect on the past 28 years and the intellectual and conceptual structure of Creativity and Innovation Management (CIM). Specifically, we highlight milestones and influential articles, identify how key journal characteristics evolved, outline the (co-)authorship structure, and finally, map the thematic landscape of CIM by means of a text-mining analysis. This study represents the first systematic and comprehensive assessment of the journal's published body of knowledge and helps to understand the journal's influence on the creativity and innovation management community. We conclude by discussing future topics and paths of the journal as well as limitations of our approach.

An alert correlation platform for memory-supported techniques (2012)

Roschke, Sebastian ; Cheng, Feng ; Meinel, Christoph

Intrusion Detection Systems (IDS) have been widely deployed in practice for detecting malicious behavior in network communication and on hosts. False-positive alerts are a common problem for most IDS approaches. A solution to this problem is to enhance the detection process by correlating and clustering alerts. To meet practical requirements, this process needs to finish quickly, which is a challenging task as the number of alerts in large-scale IDS deployments is very high. We identify data storage and processing algorithms to be the most important factors influencing the performance of clustering and correlation. We propose and implement a highly efficient alert correlation platform. For storage, a column-based database, an In-Memory alert storage, and memory-based index tables lead to significant improvements in performance. For processing, algorithms are designed and implemented that are optimized for In-Memory databases, e.g. an attack graph-based correlation algorithm. The platform can be distributed over multiple processing units to share memory and processing power. A standardized interface is designed to provide a unified view of result reports for end users. The efficiency of the platform is tested by practical experiments with several alert storage approaches, multiple algorithms, as well as a local and a distributed deployment.

High-quality attack graph-based IDS correlation (2013)

Roschke, Sebastian ; Cheng, Feng ; Meinel, Christoph

Intrusion Detection Systems are widely deployed in computer networks. As modern attacks are getting more sophisticated and the number of sensors and network nodes grows, the problem of false positives and alert analysis becomes more difficult to solve. Alert correlation was proposed to analyse alerts and to decrease false positives. Knowledge about the target system or environment is usually necessary for efficient alert correlation. For representing the environment information as well as potential exploits, the existing vulnerabilities and their Attack Graph (AG) are used. It is useful for networks to generate an AG and to organize certain vulnerabilities in a reasonable way. In this article, a correlation algorithm based on AGs is designed that is capable of detecting multiple attack scenarios for forensic analysis. It can be parameterized to adjust the robustness and accuracy. A formal model of the algorithm is presented and an implementation is tested to analyse the different parameters on a real set of alerts from a local network. To improve the speed of the algorithm, a multi-core version is proposed, and an HMM-supported version can be used to further improve the quality. The parallel implementation is tested on a multi-core correlation platform, using CPUs and GPUs.

Pareto optimization for subset selection with dynamic cost constraints (2022)

Roostapour, Vahid ; Neumann, Aneta ; Neumann, Frank ; Friedrich, Tobias

We consider the subset selection problem for a function $f$ with a constraint bound $B$ that changes over time. Within the area of submodular optimization, various greedy approaches are commonly used. For dynamic environments we observe that the adaptive variants of these greedy approaches are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a $\phi = (\alpha_f/2)(1 - 1/e^{\alpha_f})$-approximation, where $\alpha_f$ is the submodularity ratio of $f$, for each possible constraint bound $b \le B$. Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that $B$ increases. Our experimental investigations for influence maximization in social networks show the advantage of POMC over generalized greedy algorithms. We also consider EAMC, a new evolutionary algorithm with a polynomial expected time guarantee to maintain the $\phi$ approximation ratio, and NSGA-II with two different population sizes as an advanced multi-objective optimization algorithm, to demonstrate their challenges in optimizing the maximum coverage problem. Our empirical analysis shows that, within the same number of evaluations, POMC is able to perform as well as NSGA-II under a linear constraint, while EAMC performs significantly worse than all considered algorithms in most cases.
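
As a small worked illustration of the approximation ratio stated in the abstract, $\phi = (\alpha_f/2)(1 - 1/e^{\alpha_f})$, the sketch below evaluates it for a given submodularity ratio. The function name `approximation_ratio` is hypothetical, and the range check assumes that the submodularity ratio lies in [0, 1], as is usual for this quantity.

```python
import math


def approximation_ratio(alpha_f: float) -> float:
    """Evaluate phi = (alpha_f / 2) * (1 - 1 / e**alpha_f).

    alpha_f is the submodularity ratio of f, assumed here to lie in [0, 1].
    """
    if not 0.0 <= alpha_f <= 1.0:
        raise ValueError("alpha_f is assumed to lie in [0, 1]")
    # 1 / e**alpha_f is written as exp(-alpha_f).
    return (alpha_f / 2.0) * (1.0 - math.exp(-alpha_f))


# For alpha_f = 1 the ratio becomes (1/2) * (1 - 1/e), roughly 0.316.
phi_at_one = approximation_ratio(1.0)
phi_at_half = approximation_ratio(0.5)
```

The guarantee weakens as the submodularity ratio shrinks, and it vanishes at $\alpha_f = 0$.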
