TY  - JOUR
A1  - Hacker, Philipp
A1  - Krestel, Ralf
A1  - Grundmann, Stefan
A1  - Naumann, Felix
T1  - Explainable AI under contract and tort law
BT  - legal incentives and technical challenges
JF  - Artificial intelligence and law
N2  - This paper shows that the law, in subtle ways, may set hitherto unrecognized incentives for the adoption of explainable machine learning applications. In doing so, we make two novel contributions. First, on the legal side, we show that to avoid liability, professional actors, such as doctors and managers, may soon be legally compelled to use explainable ML models. We argue that the importance of explainability reaches far beyond data protection law, and crucially influences questions of contractual and tort liability for the use of ML models. To this effect, we conduct two legal case studies, in medical and corporate merger applications of ML. As a second contribution, we discuss the (legally required) trade-off between accuracy and explainability and demonstrate the effect in a technical case study in the context of spam classification.
KW  - explainability
KW  - explainable AI
KW  - interpretable machine learning
KW  - contract
KW  - law
KW  - tort law
KW  - explainability-accuracy trade-off
KW  - medical malpractice
KW  - corporate takeovers
Y1  - 2020
U6  - https://doi.org/10.1007/s10506-020-09260-6
SN  - 0924-8463
SN  - 1572-8382
VL  - 28
IS  - 4
SP  - 415
EP  - 439
PB  - Springer
CY  - Dordrecht
ER  - 
TY  - JOUR
A1  - Lambers, Leen
A1  - Weber, Jens
T1  - Preface to the special issue on the 11th International Conference on Graph Transformation
JF  - Journal of Logical and Algebraic Methods in Programming
N2  - This special issue contains extended versions of four selected papers from the 11th International Conference on Graph Transformation (ICGT 2018). The articles cover a tool for computing core graphs via SAT/SMT solvers (graph language definition), graph transformation through graph surfing in reaction systems (a new graph transformation formalism), the essence and initiality of conflicts in M-adhesive transformation systems, and a calculus of concurrent graph-rewriting processes (theory on conflicts and parallel independence).
KW  - graph transformation
KW  - graph languages
KW  - conflicts and dependencies in concurrent graph rewriting
Y1  - 2020
U6  - https://doi.org/10.1016/j.jlamp.2020.100525
SN  - 2352-2208
VL  - 112
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Lambers, Leen
A1  - Orejas, Fernando
T1  - Transformation rules with nested application conditions
BT  - critical pairs, initial conflicts & minimality
JF  - Theoretical computer science
N2  - Recently, initial conflicts were introduced in the framework of M-adhesive categories as an important optimization of critical pairs. In particular, they represent a proper subset such that each conflict is represented in a minimal context by a unique initial one. The theory of critical pairs has been extended in the framework of M-adhesive categories to rules with nested application conditions (ACs), restricting the applicability of a rule and generalizing the well-known negative application conditions. A notion of initial conflicts for rules with ACs does not exist yet. In this paper, on the one hand, we extend the theory of initial conflicts in the framework of M-adhesive categories to transformation rules with ACs. They represent a proper subset again of critical pairs for rules with ACs, and represent each conflict in a minimal context uniquely. They are moreover symbolic because we can show that in general no finite and complete set of conflicts for rules with ACs exists. On the other hand, we show that critical pairs are minimally M-complete, whereas initial conflicts are minimally complete. Finally, we introduce important special cases of rules with ACs for which we can obtain finite, minimally (M-)complete sets of conflicts.
KW  - Graph transformation
KW  - Critical pairs
KW  - Initial conflicts
KW  - Application conditions
Y1  - 2021
U6  - https://doi.org/10.1016/j.tcs.2021.07.023
SN  - 0304-3975
SN  - 1879-2294
VL  - 884
SP  - 44
EP  - 67
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Discher, Sören
A1  - Richter, Rico
A1  - Döllner, Jürgen Roland Friedrich
T1  - Interactive and View-Dependent See-Through Lenses for Massive 3D Point Clouds
JF  - Advances in 3D Geoinformation
N2  - 3D point clouds are a digital representation of our world and are used in a variety of applications. They are captured with LiDAR or derived by image-matching approaches to get surface information of objects, e.g., indoor scenes, buildings, infrastructures, cities, and landscapes. We present novel interaction and visualization techniques for heterogeneous, time-variant, and semantically rich 3D point clouds. Interactive and view-dependent see-through lenses are introduced as exploration tools to enhance recognition of objects, semantics, and temporal changes within 3D point cloud depictions. We also develop filtering and highlighting techniques that are used to dissolve occlusion to give context-specific insights. All techniques can be combined with an out-of-core real-time rendering system for massive 3D point clouds. We have evaluated the presented approach with 3D point clouds from different application domains. The results show the usability and how different visualization and exploration tasks can be improved for a variety of domain-specific applications.
KW  - 3D point clouds
KW  - LIDAR
KW  - Visualization
KW  - Point-based rendering
Y1  - 2016
SN  - 978-3-319-25691-7
SN  - 978-3-319-25689-4
U6  - https://doi.org/10.1007/978-3-319-25691-7_3
SN  - 1863-2246
SP  - 49
EP  - 62
PB  - Springer
CY  - Cham
ER  - 
TY  - JOUR
A1  - Koßmann, Jan
A1  - Papenbrock, Thorsten
A1  - Naumann, Felix
T1  - Data dependencies for query optimization
BT  - a survey
JF  - The VLDB journal : the international journal on very large data bases / publ. on behalf of the VLDB Endowment
N2  - Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey supports database vendors to identify optimization opportunities as well as DBMS researchers to find related work and open research questions.
KW  - Query optimization
KW  - Query execution
KW  - Data dependencies
KW  - Data profiling
KW  - Unique column combinations
KW  - Functional dependencies
KW  - Order dependencies
KW  - Inclusion dependencies
KW  - Relational data
KW  - SQL
Y1  - 2021
U6  - https://doi.org/10.1007/s00778-021-00676-3
SN  - 1066-8888
SN  - 0949-877X
VL  - 31
IS  - 1
SP  - 1
EP  - 22
PB  - Springer
CY  - Berlin ; Heidelberg ; New York
ER  - 
TY  - JOUR
A1  - Quinzan, Francesco
A1  - Göbel, Andreas
A1  - Wagner, Markus
A1  - Friedrich, Tobias
T1  - Evolutionary algorithms and submodular functions
BT  - benefits of heavy-tailed mutations
JF  - Natural computing : an innovative journal bridging biosciences and computer sciences ; an international journal
N2  - A core operator of evolutionary algorithms (EAs) is the mutation. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this area of work, we propose a new mutation operator and analyze its performance on the (1 + 1) Evolutionary Algorithm (EA). Our analyses show that this mutation operator competes with pre-existing ones, when used by the (1 + 1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1 + 1) EA using our mutation operator finds a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1 + 1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems and outperforms them with constant probability. Finally, we evaluate the performance of the (1 + 1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges and (b) the symmetric mutual information problem using a four month period air pollution data set. In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances.
KW  - Evolutionary algorithms
KW  - Mutation operators
KW  - Submodular functions
KW  - Matroids
Y1  - 2021
U6  - https://doi.org/10.1007/s11047-021-09841-7
SN  - 1572-9796
VL  - 20
IS  - 3
SP  - 561
EP  - 575
PB  - Springer Science + Business Media B.V.
CY  - Dordrecht
ER  - 
TY  - JOUR
A1  - Oosthoek, Kris
A1  - Dörr, Christian
T1  - Cyber security threats to bitcoin exchanges
BT  - adversary exploitation and laundering techniques
JF  - IEEE transactions on network and service management : a publication of the IEEE
N2  - Bitcoin is gaining traction as an alternative store of value. Its market capitalization transcends all other cryptocurrencies in the market. But its high monetary value also makes it an attractive target to cyber criminal actors. Hacking campaigns usually target an ecosystem's weakest points. In Bitcoin, the exchange platforms are one of them. Each exchange breach is a threat not only to direct victims, but to the credibility of Bitcoin's entire ecosystem. Based on an extensive analysis of 36 breaches of Bitcoin exchanges, we show the attack patterns used to exploit Bitcoin exchange platforms using an industry standard for reporting intelligence on cyber security breaches. Based on this we are able to provide an overview of the most common attack vectors, showing that all except three hacks were possible due to relatively lax security. We show that while the security regimen of Bitcoin exchanges is subpar compared to other financial service providers, the use of stolen credentials, which does not require any hacking, is decreasing. We also show that the amount of BTC taken during a breach is decreasing, as well as the exchanges that terminate after being breached. Furthermore we show that overall security posture has improved, but still has major flaws. To discover adversarial methods post-breach, we have analyzed two cases of BTC laundering. Through this analysis we provide insight into how exchange platforms with lax cyber security even further increase the intermediary risk introduced by them into the Bitcoin ecosystem.
KW  - Bitcoin
KW  - Computer crime
KW  - Cryptography
KW  - Ecosystems
KW  - Currencies
KW  - Industries
KW  - Vocabulary
KW  - cryptocurrency exchanges
KW  - cyber security
KW  - cyber threat intelligence
KW  - attacks
KW  - vulnerabilities
KW  - forensics
Y1  - 2021
U6  - https://doi.org/10.1109/TNSM.2020.3046145
SN  - 1932-4537
VL  - 18
IS  - 2
SP  - 1616
EP  - 1628
PB  - IEEE
CY  - New York
ER  - 
TY  - JOUR
A1  - Schneider, Sven
A1  - Lambers, Leen
A1  - Orejas, Fernando
T1  - A logic-based incremental approach to graph repair featuring delta preservation
JF  - International journal on software tools for technology transfer : STTT
N2  - We introduce a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing graph repairs from which a user may select a graph repair based on non-formalized further requirements. This incremental approach features delta preservation as it allows to restrict the generation of graph repairs to delta-preserving graph repairs, which do not revert the additions and deletions of the most recent consistency-violating graph update. We specify consistency of graphs using the logic of nested graph conditions, which is equivalent to first-order logic on graphs. Technically, the incremental approach encodes if and how the graph under repair satisfies a graph condition using the novel data structure of satisfaction trees, which are adapted incrementally according to the graph updates applied. In addition to the incremental approach, we also present two state-based graph repair algorithms, which restore consistency of a graph independent of the most recent graph update and which generate additional graph repairs using a global perspective on the graph under repair. We evaluate the developed algorithms using our prototypical implementation in the tool AutoGraph and illustrate our incremental approach using a case study from the graph database domain.
KW  - Nested graph conditions
KW  - Graph repair
KW  - Model repair
KW  - Consistency restoration
KW  - Delta preservation
KW  - Graph databases
KW  - Model-driven engineering
Y1  - 2021
U6  - https://doi.org/10.1007/s10009-020-00584-x
SN  - 1433-2779
SN  - 1433-2787
VL  - 23
IS  - 3
SP  - 369
EP  - 410
PB  - Springer
CY  - Berlin ; Heidelberg
ER  - 
TY  - JOUR
A1  - Göbel, Andreas
A1  - Lagodzinski, Gregor J. A.
A1  - Seidel, Karen
T1  - Counting homomorphisms to trees modulo a prime
JF  - ACM transactions on computation theory : TOCT / Association for Computing Machinery
N2  - Many important graph-theoretic notions can be encoded as counting graph homomorphism problems, such as partition functions in statistical physics, in particular independent sets and colourings. In this article, we study the complexity of #pHOMSTOH, the problem of counting graph homomorphisms from an input graph to a graph H modulo a prime number p. Dyer and Greenhill proved a dichotomy stating that the tractability of non-modular counting graph homomorphisms depends on the structure of the target graph. Many intractable cases in non-modular counting become tractable in modular counting due to the common phenomenon of cancellation. In subsequent studies on counting modulo 2, however, the influence of the structure of H on the tractability was shown to persist, which yields similar dichotomies. Our main result states that for every tree H and every prime p the problem #pHOMSTOH is either polynomial time computable or #P-p-complete. This relates to the conjecture of Faben and Jerrum stating that this dichotomy holds for every graph H when counting modulo 2. In contrast to previous results on modular counting, the tractable cases of #pHOMSTOH are essentially the same for all values of the modulo when H is a tree. To prove this result, we study the structural properties of a homomorphism. As an important interim result, our study yields a dichotomy for the problem of counting weighted independent sets in a bipartite graph modulo some prime p. These results are the first suggesting that such dichotomies hold not only for the modulo 2 case but also for the modular counting functions of all primes p.
KW  - Graph homomorphisms
KW  - modular counting
KW  - complexity dichotomy
Y1  - 2021
U6  - https://doi.org/10.1145/3460958
SN  - 1942-3454
SN  - 1942-3462
VL  - 13
IS  - 3
SP  - 1
EP  - 33
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Doerr, Benjamin
A1  - Krejca, Martin Stefan
T1  - A simplified run time analysis of the univariate marginal distribution algorithm on LeadingOnes
JF  - Theoretical computer science
N2  - With elementary means, we prove a stronger run time guarantee for the univariate marginal distribution algorithm (UMDA) optimizing the LEADINGONES benchmark function in the desirable regime with low genetic drift. If the population size is at least quasilinear, then, with high probability, the UMDA samples the optimum in a number of iterations that is linear in the problem size divided by the logarithm of the UMDA's selection rate. This improves over the previous guarantee, obtained by Dang and Lehre (2015) via the deep level-based population method, both in terms of the run time and by demonstrating further run time gains from small selection rates. Under similar assumptions, we prove a lower bound that matches our upper bound up to constant factors.
KW  - Theory
KW  - Estimation-of-distribution algorithm
KW  - Run time analysis
Y1  - 2021
U6  - https://doi.org/10.1016/j.tcs.2020.11.028
SN  - 0304-3975
SN  - 1879-2294
VL  - 851
SP  - 121
EP  - 128
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Haarmann, Stephan
A1  - Holfter, Adrian
A1  - Pufahl, Luise
A1  - Weske, Mathias
T1  - Formal framework for checking compliance of data-driven case management
JF  - Journal on data semantics : JoDS
N2  - Business processes are often specified in descriptive or normative models. Both types of models should adhere to internal and external regulations, such as company guidelines or laws. Employing compliance checking techniques, it is possible to verify process models against rules. While traditionally compliance checking focuses on well-structured processes, we address case management scenarios. In case management, knowledge workers drive multi-variant and adaptive processes. Our contribution is based on the fragment-based case management approach, which splits a process into a set of fragments. The fragments are synchronized through shared data but can, otherwise, be dynamically instantiated and executed. We formalize case models using Petri nets. We demonstrate the formalization for design-time and run-time compliance checking and present a proof-of-concept implementation. The application of the implemented compliance checking approach to a use case exemplifies its effectiveness while designing a case model. The empirical evaluation on a set of case models for measuring the performance of the approach shows that rules can often be checked in less than a second.
KW  - Compliance checking
KW  - Case management
KW  - Model verification
KW  - Data-centric processes
Y1  - 2021
U6  - https://doi.org/10.1007/s13740-021-00120-3
SN  - 1861-2032
SN  - 1861-2040
VL  - 10
IS  - 1-2
SP  - 143
EP  - 163
PB  - Springer
CY  - Heidelberg
ER  - 
TY  - JOUR
A1  - Bonifati, Angela
A1  - Mior, Michael J.
A1  - Naumann, Felix
A1  - Noack, Nele Sina
T1  - How inclusive are we?
BT  - an analysis of gender diversity in database venues
JF  - SIGMOD record / Association for Computing Machinery, Special Interest Group on Management of Data
N2  - ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as do many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide gender, ethnicity, and other properties of the authors. We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and also compare women's authorship percentages in both single-blind and double-blind venues along the years. We also obtained a cross comparison of the obtained results in data management with other relevant areas in Computer Science.
Y1  - 2022
U6  - https://doi.org/10.1145/3516431.3516438
SN  - 0163-5808
SN  - 1943-5835
VL  - 50
IS  - 4
SP  - 30
EP  - 35
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Shekhar, Sumit
A1  - Reimann, Max
A1  - Mayer, Maximilian
A1  - Semmo, Amir
A1  - Pasewaldt, Sebastian
A1  - Döllner, Jürgen
A1  - Trapp, Matthias
T1  - Interactive photo editing on smartphones via intrinsic decomposition
JF  - Computer graphics forum : journal of the European Association for Computer Graphics
N2  - Intrinsic decomposition refers to the problem of estimating scene characteristics, such as albedo and shading, when one view or multiple views of a scene are provided. The inverse problem setting, where multiple unknowns are solved given a single known pixel-value, is highly under-constrained. When provided with correlating image and depth data, intrinsic scene decomposition can be facilitated using depth-based priors, which nowadays is easy to acquire with high-end smartphones by utilizing their depth sensors. In this work, we present a system for intrinsic decomposition of RGB-D images on smartphones and the algorithmic as well as design choices therein. Unlike state-of-the-art methods that assume only diffuse reflectance, we consider both diffuse and specular pixels. For this purpose, we present a novel specularity extraction algorithm based on a multi-scale intensity decomposition and chroma inpainting. At this, the diffuse component is further decomposed into albedo and shading components. We use an inertial proximal algorithm for non-convex optimization (iPiano) to ensure albedo sparsity. Our GPU-based visual processing is implemented on iOS via the Metal API and enables interactive performance on an iPhone 11 Pro. Further, a qualitative evaluation shows that we are able to obtain high-quality outputs. Furthermore, our proposed approach for specularity removal outperforms state-of-the-art approaches for real-world images, while our albedo and shading layer decomposition is faster than the prior work at a comparable output quality. Manifold applications such as recoloring, retexturing, relighting, appearance editing, and stylization are shown, each using the intrinsic layers obtained with our method and/or the corresponding depth data.
KW  - Computing methodologies
KW  - Image-based rendering
KW  - Image processing
KW  - Computational photography
Y1  - 2021
U6  - https://doi.org/10.1111/cgf.142650
SN  - 0167-7055
SN  - 1467-8659
VL  - 40
SP  - 497
EP  - 510
PB  - Blackwell
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Doerr, Benjamin
A1  - Neumann, Frank
A1  - Sutton, Andrew M.
T1  - Time Complexity Analysis of Evolutionary Algorithms on Random Satisfiable k-CNF Formulas
JF  - Algorithmica : an international journal in computer science
N2  - We contribute to the theoretical understanding of randomized search heuristics by investigating their optimization behavior on satisfiable random k-satisfiability instances both in the planted solution model and the uniform model conditional on satisfiability. Denoting the number of variables by n, our main technical result is that the simple (1+1) evolutionary algorithm with high probability finds a satisfying assignment in time when the clause-variable density is at least logarithmic. For low density instances, evolutionary algorithms seem to be less effective, and all we can show is a subexponential upper bound on the runtime for densities below . We complement these mathematical results with numerical experiments on a broader density spectrum. They indicate that, indeed, the (1+1) EA is less efficient on lower densities. Our experiments also suggest that the implicit constants hidden in our main runtime guarantee are low. Our main result extends and considerably improves the result obtained by Sutton and Neumann (Lect Notes Comput Sci 8672:942-951, 2014) in terms of runtime, minimum density, and clause length. These improvements are made possible by establishing a close fitness-distance correlation in certain parts of the search space. This approach might be of independent interest and could be useful for other average-case analyses of randomized search heuristics. While the notion of a fitness-distance correlation has been around for a long time, to the best of our knowledge, this is the first time that fitness-distance correlation is explicitly used to rigorously prove a performance statement for an evolutionary algorithm.
KW  - Runtime analysis
KW  - Satisfiability
KW  - Fitness-distance correlation
Y1  - 2016
U6  - https://doi.org/10.1007/s00453-016-0190-3
SN  - 0178-4617
SN  - 1432-0541
VL  - 78
SP  - 561
EP  - 586
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Boissier, Martin
T1  - Robust and budget-constrained encoding configurations for in-memory database systems
JF  - Proceedings of the VLDB Endowment
N2  - Data encoding has been applied to database systems for decades as it mitigates bandwidth bottlenecks and reduces storage requirements. But even in the presence of these advantages, most in-memory database systems use data encoding only conservatively as the negative impact on runtime performance can be severe. Real-world systems with large parts being infrequently accessed and cost efficiency constraints in cloud environments require solutions that automatically and efficiently select encoding techniques, including heavy-weight compression. In this paper, we introduce workload-driven approaches to automatically determine memory budget-constrained encoding configurations using greedy heuristics and linear programming. We show for TPC-H, TPC-DS, and the Join Order Benchmark that optimized encoding configurations can reduce the main memory footprint significantly without a loss in runtime performance over state-of-the-art dictionary encoding. To yield robust selections, we extend the linear programming-based approach to incorporate query runtime constraints and mitigate unexpected performance regressions.
Y1  - 2021
U6  - https://doi.org/10.14778/3503585.3503588
SN  - 2150-8097
VL  - 15
IS  - 4
SP  - 780
EP  - 793
PB  - Association for Computing Machinery (ACM)
CY  - [New York]
ER  - 
TY  - JOUR
A1  - Vitagliano, Gerardo
A1  - Jiang, Lan
A1  - Naumann, Felix
T1  - Detecting layout templates in complex multiregion files
JF  - Proceedings of the VLDB Endowment
N2  - Spreadsheets are among the most commonly used file formats for data management, distribution, and analysis. Their widespread employment makes it easy to gather large collections of data, but their flexible canvas-based structure makes automated analysis difficult without heavy preparation. One of the common problems that practitioners face is the presence of multiple, independent regions in a single spreadsheet, possibly separated by repeated empty cells. We define such files as "multiregion" files. In collections of various spreadsheets, we can observe that some share the same layout. We present the Mondrian approach to automatically identify layout templates across multiple files and systematically extract the corresponding regions. Our approach is composed of three phases: first, each file is rendered as an image and inspected for elements that could form regions; then, using a clustering algorithm, the identified elements are grouped to form regions; finally, every file layout is represented as a graph and compared with others to find layout templates. We compare our method to state-of-the-art table recognition algorithms on two corpora of real-world enterprise spreadsheets. Our approach shows the best performances in detecting reliable region boundaries within each file and can correctly identify recurring layouts across files.
Y1  - 2022
U6  - https://doi.org/10.14778/3494124.3494145
SN  - 2150-8097
VL  - 15
IS  - 3
SP  - 646
EP  - 658
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Ghahremani, Sona
A1  - Giese, Holger
T1  - Evaluation of self-healing systems
BT  - An analysis of the state-of-the-art and required improvements
JF  - Computers
N2  - Evaluating the performance of self-adaptive systems is challenging due to their interactions with often highly dynamic environments. In the specific case of self-healing systems, the performance evaluations of self-healing approaches and their parameter tuning rely on the considered characteristics of failure occurrences and the resulting interactions with the self-healing actions. In this paper, we first study the state-of-the-art for evaluating the performances of self-healing systems by means of a systematic literature review. We provide a classification of different input types for such systems and analyse the limitations of each input type. A main finding is that the employed inputs are often not sophisticated regarding the considered characteristics for failure occurrences. To further study the impact of the identified limitations, we present experiments demonstrating that wrong assumptions regarding the characteristics of the failure occurrences can result in large performance prediction errors, disadvantageous design-time decisions concerning the selection of alternative self-healing approaches, and disadvantageous deployment-time decisions concerning parameter tuning. Furthermore, the experiments indicate that employing multiple alternative input characteristics can help with reducing the risk of premature disadvantageous design-time decisions.
KW  - self-healing
KW  - failure model
KW  - performance
KW  - simulation
KW  - evaluation
Y1  - 2020
U6  - https://doi.org/10.3390/computers9010016
SN  - 2073-431X
VL  - 9
IS  - 1
PB  - MDPI
CY  - Basel
ER  - 
TY  - GEN
A1  - Hesse, Günter
A1  - Matthies, Christoph
A1  - Sinzig, Werner
A1  - Uflacker, Matthias
T1  - Adding Value by Combining Business and Sensor Data
BT  - an Industry 4.0 Use Case
T2  - Database Systems for Advanced Applications
N2  - Industry 4.0 and the Internet of Things are recent developments that have led to the creation of new kinds of manufacturing data. Linking this new kind of sensor data to traditional business information is crucial for enterprises to take advantage of the data’s full potential. In this paper, we present a demo which allows experiencing this data integration, both vertically between technical and business contexts and horizontally along the value chain. The tool simulates a manufacturing company, continuously producing both business and sensor data, and supports issuing ad-hoc queries that answer specific questions related to the business. In order to adapt to different environments, users can configure sensor characteristics to their needs.
KW  - Industry 4.0
KW  - Internet of Things
KW  - Data integration
Y1  - 2019
SN  - 978-3-030-18590-9
SN  - 978-3-030-18589-3
U6  - https://doi.org/10.1007/978-3-030-18590-9_80
SN  - 0302-9743
SN  - 1611-3349
VL  - 11448
SP  - 528
EP  - 532
PB  - Springer
CY  - Cham
ER  - 
TY  - JOUR
A1  - Wang, Cheng
A1  - Yang, Haojin
A1  - Meinel, Christoph
T1  - Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning
JF  - ACM transactions on multimedia computing, communications, and applications
N2  - Generating a novel and descriptive caption of an image is drawing increasing interest in computer vision, natural language processing, and multimedia communities. In this work, we propose an end-to-end trainable deep bidirectional LSTM (Bi-LSTM (Long Short-Term Memory)) model to address the problem. By combining a deep convolutional neural network (CNN) and two separate LSTM networks, our model is capable of learning long-term visual-language interactions by making use of history and future context information at high-level semantic space. We also explore deep multimodal bidirectional models, in which we increase the depth of nonlinearity transition in different ways to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale, and vertical mirror are proposed to prevent over-fitting in training deep models. To understand how our models "translate" image to sentence, we visualize and qualitatively analyze the evolution of Bi-LSTM internal states over time. The effectiveness and generality of proposed models are evaluated on four benchmark datasets: Flickr8K, Flickr30K, MSCOCO, and Pascal1K datasets. We demonstrate that Bi-LSTM models achieve highly competitive performance on both caption generation and image-sentence retrieval even without integrating an additional mechanism (e.g., object detection, attention model). Our experiments also prove that multi-task learning is beneficial to increase model generality and gain performance. We also demonstrate that the performance of transfer learning of the Bi-LSTM model significantly outperforms previous methods on the Pascal1K dataset.
KW  - Deep learning
KW  - LSTM
KW  - multimodal representations
KW  - image captioning
KW  - multi-task learning
Y1  - 2018
U6  - https://doi.org/10.1145/3115432
SN  - 1551-6857
SN  - 1551-6865
VL  - 14
IS  - 2
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - CHAP
A1  - Grüner, Andreas
A1  - Mühle, Alexander
A1  - Gayvoronskaya, Tatiana
A1  - Meinel, Christoph
T1  - A quantifiable trust model for Blockchain-based identity management
T2  - IEEE 2018 International Congress on Cybermatics / 2018 IEEE Conferences on Internet of Things, Green Computing and Communications, Cyber, Physical and Social Computing, Smart Data, Blockchain, Computer and Information Technology
KW  - Blockchain
KW  - distributed ledger technology
KW  - digital identity
KW  - self-sovereign identity
KW  - trust
KW  - identity management
Y1  - 2019
SN  - 978-1-5386-7975-3
U6  - https://doi.org/10.1109/Cybermatics_2018.2018.00250
SP  - 1475
EP  - 1482
PB  - IEEE
CY  - New York
ER  - 
TY  - JOUR
A1  - Chujfi-La-Roche, Salim
A1  - Meinel, Christoph
T1  - Matching cognitively sympathetic individual styles to develop collective intelligence in digital communities
JF  - AI & society : the journal of human-centred systems and machine intelligence
N2  - Creation, collection and retention of knowledge in digital communities is an activity that currently requires being explicitly targeted as a secure method of keeping intellectual capital growing in the digital era. In particular, we consider it relevant to analyze and evaluate the empathetic cognitive personalities and behaviors that individuals now have with the change from face-to-face communication (F2F) to computer-mediated communication (CMC) online. This document proposes a cyber-humanistic approach to enhance the traditional SECI knowledge management model. A cognitive perception is added to its cyclical process following design thinking interaction, exemplary for improvement of the method in which knowledge is continuously created, converted and shared. In building a cognitive-centered model, we specifically focus on the effective identification and response to cognitive stimulation of individuals, as they are the intellectual generators and multipliers of knowledge in the online environment. Our target is to identify how geographically distributed digital organizations should align the individual's cognitive abilities to promote iteration and improve interaction as a reliable stimulant of collective intelligence. The new model focuses on analyzing the four different stages of knowledge processing, where individuals with sympathetic cognitive personalities can significantly boost knowledge creation in a virtual social system. For organizations, this means that multidisciplinary individuals can maximize their extensive potential, by externalizing their knowledge in the correct stage of the knowledge creation process, and by collaborating with their appropriate sympathetically cognitive remote peers.
KW  - argumentation research
KW  - cyber humanistic
KW  - cognition
KW  - collaboration
KW  - knowledge building
KW  - knowledge management
KW  - teamwork
KW  - virtual groups
Y1  - 2017
U6  - https://doi.org/10.1007/s00146-017-0780-x
SN  - 0951-5666
SN  - 1435-5655
VL  - 35
IS  - 1
SP  - 5
EP  - 15
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Torkura, Kennedy A.
A1  - Sukmana, Muhammad Ihsan Haikal
A1  - Cheng, Feng
A1  - Meinel, Christoph
T1  - CloudStrike
BT  - chaos engineering for security and resiliency in cloud infrastructure
JF  - IEEE access : practical research, open solutions
N2  - Most cyber-attacks and data breaches in cloud infrastructure are due to human errors and misconfiguration vulnerabilities. Cloud customer-centric tools are imperative for mitigating these issues; however, existing cloud security models are largely unable to tackle these security challenges. Novel security mechanisms are therefore needed, and we propose Risk-driven Fault Injection (RDFI) techniques to address these challenges. RDFI applies the principles of chaos engineering to cloud security and leverages feedback loops to execute, monitor, analyze and plan security fault injection campaigns, based on a knowledge-base. The knowledge-base consists of fault models designed from secure baselines, cloud security best practices and observations derived during iterative fault injection campaigns. These observations are helpful for identifying vulnerabilities while verifying the correctness of security attributes (integrity, confidentiality and availability). Furthermore, RDFI proactively supports risk analysis and security hardening efforts by sharing security information with security mechanisms. We have designed and implemented the RDFI strategies including various chaos engineering algorithms as a software tool: CloudStrike. Several evaluations have been conducted with CloudStrike against infrastructure deployed on two major public cloud infrastructures: Amazon Web Services and Google Cloud Platform. The time performance increases linearly, in proportion to increasing attack rates. Also, the analysis of vulnerabilities detected via security fault injection has been used to harden the security of cloud resources to demonstrate the effectiveness of the security information provided by CloudStrike. Therefore, we opine that our approaches are suitable for overcoming contemporary cloud security issues.
KW  - cloud security
KW  - security chaos engineering
KW  - resilient architectures
KW  - security risk assessment
Y1  - 2020
U6  - https://doi.org/10.1109/ACCESS.2020.3007338
SN  - 2169-3536
VL  - 8
SP  - 123044
EP  - 123060
PB  - Institute of Electrical and Electronics Engineers
CY  - Piscataway
ER  - 
TY  - JOUR
A1  - Grüner, Andreas
A1  - Mühle, Alexander
A1  - Meinel, Christoph
T1  - ATIB
BT  - Design and evaluation of an architecture for brokered self-sovereign identity integration and trust-enhancing attribute aggregation for service provider
JF  - IEEE access : practical research, open solutions / Institute of Electrical and Electronics Engineers
N2  - Identity management is a principal component of securing online services. In the advancement of traditional identity management patterns, the identity provider remained a Trusted Third Party (TTP). The service provider and the user need to trust a particular identity provider for correct attributes amongst other demands. This paradigm changed with the invention of blockchain-based Self-Sovereign Identity (SSI) solutions that primarily focus on the users. SSI reduces the functional scope of the identity provider to an attribute provider while enabling attribute aggregation. Besides that, the development of new protocols, disregarding established protocols and a significantly fragmented landscape of SSI solutions pose considerable challenges for an adoption by service providers. We propose an Attribute Trust-enhancing Identity Broker (ATIB) to leverage the potential of SSI for trust-enhancing attribute aggregation. Furthermore, ATIB abstracts from a dedicated SSI solution and offers standard protocols. Therefore, it facilitates the adoption by service providers. Despite the brokered integration approach, we show that ATIB provides a high security posture. Additionally, ATIB does not compromise the ten foundational SSI principles for the users.
KW  - Blockchains
KW  - Protocols
KW  - Authentication
KW  - Licenses
KW  - Security
KW  - Privacy
KW  - Identity management systems
KW  - Attribute aggregation
KW  - attribute assurance
KW  - digital identity
KW  - identity broker
KW  - self-sovereign identity
KW  - trust model
Y1  - 2021
U6  - https://doi.org/10.1109/ACCESS.2021.3116095
SN  - 2169-3536
VL  - 9
SP  - 138553
EP  - 138570
PB  - Institute of Electrical and Electronics Engineers
CY  - New York, NY
ER  - 
TY  - THES
A1  - Wang, Cheng
T1  - Deep Learning of Multimodal Representations
Y1  - 2016
ER  - 
TY  - JOUR
A1  - Perscheid, Cindy
T1  - Integrative biomarker detection on high-dimensional gene expression data sets
BT  - a survey on prior knowledge approaches
JF  - Briefings in bioinformatics
N2  - Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview on the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.
KW  - gene selection
KW  - external knowledge bases
KW  - biomarker detection
KW  - gene expression
KW  - prior knowledge
Y1  - 2021
U6  - https://doi.org/10.1093/bib/bbaa151
SN  - 1467-5463
SN  - 1477-4054
VL  - 22
IS  - 3
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Loster, Michael
A1  - Koumarelas, Ioannis
A1  - Naumann, Felix
T1  - Knowledge transfer for entity resolution with siamese neural networks
JF  - ACM journal of data and information quality
N2  - The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is time-consuming and requires extensive domain expertise. We propose a deep Siamese neural network, capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compare our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.
KW  - Entity resolution
KW  - duplicate detection
KW  - transfer learning
KW  - neural networks
KW  - metric learning
KW  - similarity learning
KW  - data quality
Y1  - 2021
U6  - https://doi.org/10.1145/3410157
SN  - 1936-1955
SN  - 1936-1963
VL  - 13
IS  - 1
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Kaitoua, Abdulrahman
A1  - Rabl, Tilmann
A1  - Markl, Volker
T1  - A distributed data exchange engine for polystores
JF  - Information technology : methods and applications of informatics and information technology
JF  - Information technology : Methoden und innovative Anwendungen der Informatik und Informationstechnik
N2  - There is an increasing interest in fusing data from heterogeneous sources. Combining data sources increases the utility of existing datasets, generating new information and creating services of higher quality. A central issue in working with heterogeneous sources is data migration: In order to share and process data in different engines, resource intensive and complex movements and transformations between computing engines, services, and stores are necessary.
Muses is a distributed, high-performance data migration engine that is able to interconnect distributed data stores by forwarding, transforming, repartitioning, or broadcasting data among distributed engines' instances in a resource-, cost-, and performance-adaptive manner. As such, it performs seamless information sharing across all participating resources in a standard, modular manner. We show an overall improvement of 30 % for pipelining jobs across multiple engines, even when we count the overhead of Muses in the execution time. This performance gain implies that Muses can be used to optimise large pipelines that leverage multiple engines.
KW  - distributed systems
KW  - data migration
KW  - data transformation
KW  - big data
KW  - engine
KW  - data integration
Y1  - 2020
U6  - https://doi.org/10.1515/itit-2019-0037
SN  - 1611-2776
SN  - 2196-7032
VL  - 62
IS  - 3-4
SP  - 145
EP  - 156
PB  - De Gruyter
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Dreseler, Markus
A1  - Boissier, Martin
A1  - Rabl, Tilmann
A1  - Uflacker, Matthias
T1  - Quantifying TPC-H choke points and their optimizations
JF  - Proceedings of the VLDB Endowment
N2  - TPC-H continues to be the most widely used benchmark for relational OLAP systems. It poses a number of challenges, also known as "choke points", which database systems have to solve in order to achieve good benchmark results. Examples include joins across multiple tables, correlated subqueries, and correlations within the TPC-H data set. Knowing the impact of such optimizations helps in developing optimizers as well as in interpreting TPC-H results across database systems.
This paper provides a systematic analysis of choke points and their optimizations. It complements previous work on TPC-H choke points by providing a quantitative discussion of their relevance. It focuses on eleven choke points where the optimizations are beneficial independently of the database system. Of these, the flattening of subqueries and the placement of predicates have the biggest impact. Three queries (Q2, Q17, and Q21) are strongly influenced by the choice of an efficient query plan; three others (Q1, Q13, and Q18) are less influenced by plan optimizations and more dependent on an efficient execution engine.
Y1  - 2020
U6  - https://doi.org/10.14778/3389133.3389138
SN  - 2150-8097
VL  - 13
IS  - 8
SP  - 1206
EP  - 1220
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Borchert, Florian
A1  - Mock, Andreas
A1  - Tomczak, Aurelie
A1  - Hügel, Jonas
A1  - Alkarkoukly, Samer
A1  - Knurr, Alexander
A1  - Volckmar, Anna-Lena
A1  - Stenzinger, Albrecht
A1  - Schirmacher, Peter
A1  - Debus, Jürgen
A1  - Jäger, Dirk
A1  - Longerich, Thomas
A1  - Fröhling, Stefan
A1  - Eils, Roland
A1  - Bougatf, Nina
A1  - Sax, Ulrich
A1  - Schapranow, Matthieu-Patrick
T1  - Correction to: Knowledge bases and software support for variant interpretation in precision oncology
JF  - Briefings in bioinformatics
Y1  - 2021
U6  - https://doi.org/10.1093/bib/bbab246
SN  - 1467-5463
SN  - 1477-4054
VL  - 22
IS  - 6
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Roostapour, Vahid
A1  - Neumann, Aneta
A1  - Neumann, Frank
A1  - Friedrich, Tobias
T1  - Pareto optimization for subset selection with dynamic cost constraints
JF  - Artificial intelligence
N2  - We consider the subset selection problem for function f with constraint bound B that changes over time. Within the area of submodular optimization, various greedy approaches are commonly used. For dynamic environments we observe that the adaptive variants of these greedy approaches are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a $\phi = (\alpha_f/2)(1 - 1/e^{\alpha_f})$-approximation, where $\alpha_f$ is the submodularity ratio of f, for each possible constraint bound b <= B. Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that B increases. Our experimental investigations for the influence maximization in social networks show the advantage of POMC over generalized greedy algorithms. We also consider EAMC, a new evolutionary algorithm with polynomial expected time guarantee to maintain the $\phi$ approximation ratio, and NSGA-II with two different population sizes as an advanced multi-objective optimization algorithm, to demonstrate their challenges in optimizing the maximum coverage problem. Our empirical analysis shows that, within the same number of evaluations, POMC is able to perform as well as NSGA-II under linear constraint, while EAMC performs significantly worse than all considered algorithms in most cases.
KW  - Subset selection
KW  - Submodular function
KW  - Multi-objective optimization
KW  - Runtime analysis
Y1  - 2022
U6  - https://doi.org/10.1016/j.artint.2021.103597
SN  - 0004-3702
SN  - 1872-7921
VL  - 302
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Taleb, Aiham
A1  - Rohrer, Csaba
A1  - Bergner, Benjamin
A1  - De Leon, Guilherme
A1  - Rodrigues, Jonas Almeida
A1  - Schwendicke, Falk
A1  - Lippert, Christoph
A1  - Krois, Joachim
T1  - Self-supervised learning methods for label-efficient dental caries classification
JF  - Diagnostics : open access journal
N2  - High annotation costs are a substantial bottleneck in applying deep learning architectures to clinically relevant use cases, substantiating the need for algorithms to learn from unlabeled data. In this work, we propose employing self-supervised methods. To that end, we trained with three self-supervised algorithms on a large corpus of unlabeled dental images, which contained 38K bitewing radiographs (BWRs). We then applied the learned neural network representations on tooth-level dental caries classification, for which we utilized labels extracted from electronic health records (EHRs). Finally, a holdout test-set was established, which consisted of 343 BWRs and was annotated by three dental professionals and approved by a senior dentist. This test-set was used to evaluate the fine-tuned caries classification models. Our experimental results demonstrate the obtained gains by pretraining models using self-supervised algorithms. These include improved caries classification performance (6 p.p. increase in sensitivity) and, most importantly, improved label-efficiency. In other words, the resulting models can be fine-tuned using few labels (annotations). Our results show that using as few as 18 annotations can produce >= 45% sensitivity, which is comparable to human-level diagnostic performance. This study shows that self-supervision can provide gains in medical image analysis, particularly when obtaining labels is costly and expensive.
KW  - unsupervised methods
KW  - self-supervised learning
KW  - representation learning
KW  - dental caries classification
KW  - data driven approaches
KW  - annotation
KW  - efficient deep learning
Y1  - 2022
U6  - https://doi.org/10.3390/diagnostics12051237
SN  - 2075-4418
VL  - 12
IS  - 5
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Pfitzner, Bjarne
A1  - Steckhan, Nico
A1  - Arnrich, Bert
T1  - Federated learning in a medical context
BT  - a systematic literature review
JF  - ACM transactions on internet technology : TOIT / Association for Computing
N2  - Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients' anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
KW  - Federated learning
Y1  - 2021
U6  - https://doi.org/10.1145/3412357
SN  - 1533-5399
SN  - 1557-6051
VL  - 21
IS  - 2
SP  - 1
EP  - 31
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - De Freitas, Jessica K.
A1  - Johnson, Kipp W.
A1  - Golden, Eddye
A1  - Nadkarni, Girish N.
A1  - Dudley, Joel T.
A1  - Böttinger, Erwin
A1  - Glicksberg, Benjamin S.
A1  - Miotto, Riccardo
T1  - Phe2vec
BT  - Automated disease phenotyping based on unsupervised embeddings from electronic health records
JF  - Patterns
N2  - Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning, and assess its effectiveness against standard rule-based algorithms from Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and patients' clinical history. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. Comparing Phe2vec and PheKB cohorts head-to-head using chart review, Phe2vec performed on par or better in nine out of ten diseases. Unlike other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
Y1  - 2021
U6  - https://doi.org/10.1016/j.patter.2021.100337
SN  - 2666-3899
VL  - 2
IS  - 9
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Borchert, Florian
A1  - Mock, Andreas
A1  - Tomczak, Aurelie
A1  - Hügel, Jonas
A1  - Alkarkoukly, Samer
A1  - Knurr, Alexander
A1  - Volckmar, Anna-Lena
A1  - Stenzinger, Albrecht
A1  - Schirmacher, Peter
A1  - Debus, Jürgen
A1  - Jäger, Dirk
A1  - Longerich, Thomas
A1  - Fröhling, Stefan
A1  - Eils, Roland
A1  - Bougatf, Nina
A1  - Sax, Ulrich
A1  - Schapranow, Matthieu-Patrick
T1  - Knowledge bases and software support for variant interpretation in precision oncology
JF  - Briefings in bioinformatics
N2  - Precision oncology is a rapidly evolving interdisciplinary medical specialty. Comprehensive cancer panels are becoming increasingly available at pathology departments worldwide, creating the urgent need for scalable cancer variant annotation and molecularly informed treatment recommendations. A wealth of mainly academia-driven knowledge bases calls for software tools supporting the multi-step diagnostic process. We derive a comprehensive list of knowledge bases relevant for variant interpretation by a review of existing literature followed by a survey among medical experts from university hospitals in Germany. In addition, we review cancer variant interpretation tools, which integrate multiple knowledge bases. We categorize the knowledge bases along the diagnostic process in precision oncology and analyze programmatic access options as well as the integration of knowledge bases into software tools. The most commonly used knowledge bases provide good programmatic access options and have been integrated into a range of software tools. For the wider set of knowledge bases, access options vary across different parts of the diagnostic process. Programmatic access is limited for information regarding clinical classifications of variants and for therapy recommendations. The main issue for databases used for biological classification of pathogenic variants and pathway context information is the lack of standardized interfaces. There is no single cancer variant interpretation tool that integrates all identified knowledge bases. Specialized tools are available and need to be further developed for different steps in the diagnostic process.
KW  - HiGHmed
KW  - personalized medicine
KW  - molecular tumor board
KW  - data integration
KW  - cancer therapy
Y1  - 2021
U6  - https://doi.org/10.1093/bib/bbab134
SN  - 1467-5463
SN  - 1477-4054
VL  - 22
IS  - 6
PB  - Oxford Univ. Press
CY  - Oxford
ER  -