TY - JOUR
A1 - Mattis, Toni
A1 - Beckmann, Tom
A1 - Rein, Patrick
A1 - Hirschfeld, Robert
T1 - First-class concepts
BT - Reified architectural knowledge beyond dominant decompositions
JF - Journal of object technology : JOT / ETH Zürich, Department of Computer Science
N2 - Ideally, programs are partitioned into independently maintainable and understandable modules. As a system grows, its architecture gradually loses the capability to accommodate new concepts in a modular way. While refactoring is expensive and not always possible, and the programming language might lack dedicated primary language constructs to express certain cross-cutting concerns, programmers are still able to explain and delineate convoluted concepts through secondary means: code comments, use of whitespace and arrangement of code, documentation, or communicating tacit knowledge.
Secondary constructs are easy to change and provide high flexibility in communicating cross-cutting concerns and other concepts among programmers. However, such secondary constructs usually have no reified representation that can be explored and manipulated as first-class entities through the programming environment.
In this exploratory work, we discuss novel ways to express a wide range of concepts, including cross-cutting concerns, patterns, and lifecycle artifacts independently of the dominant decomposition imposed by an existing architecture. We propose the representation of concepts as first-class objects inside the programming environment that retain the capability to change as easily as code comments. We explore new tools that allow programmers to view, navigate, and change programs based on conceptual perspectives. In a small case study, we demonstrate how such views can be created and how the programming experience changes from draining programmers' attention by stretching it across multiple modules toward focusing it on cohesively presented concepts. Our designs are geared toward facilitating multiple secondary perspectives on a system to co-exist in symbiosis with the original architecture, hence making it easier to explore, understand, and explain complex contexts and narratives that are hard or impossible to express using primary modularity constructs.
KW - software engineering
KW - modularity
KW - exploratory programming
KW - program
KW - comprehension
KW - remodularization
KW - architecture recovery
Y1 - 2022
U6 - https://doi.org/10.5381/jot.2022.21.2.a6
SN - 1660-1769
VL - 21
IS - 2
SP - 1
EP - 15
PB - ETH Zürich, Department of Computer Science
CY - Zürich
ER -
TY - JOUR
A1 - Schmidl, Sebastian
A1 - Papenbrock, Thorsten
T1 - Efficient distributed discovery of bidirectional order dependencies
JF - The VLDB journal
N2 - Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; it can, in particular, process much larger datasets.
KW - Bidirectional order dependencies
KW - Distributed computing
KW - Actor
KW - programming
KW - Parallelization
KW - Data profiling
KW - Dependency discovery
Y1 - 2021
U6 - https://doi.org/10.1007/s00778-021-00683-4
SN - 1066-8888
SN - 0949-877X
VL - 31
IS - 1
SP - 49
EP - 74
PB - Springer
CY - Berlin ; Heidelberg ; New York
ER -
TY - JOUR
A1 - Bonifati, Angela
A1 - Mior, Michael J.
A1 - Naumann, Felix
A1 - Noack, Nele Sina
T1 - How inclusive are we?
BT - an analysis of gender diversity in database venues
JF - SIGMOD record / Association for Computing Machinery, Special Interest Group on Management of Data
N2 - ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as do many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide gender, ethnicity, and other properties of the authors.
We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and also compare women's authorship percentages in both single-blind and double-blind venues along the years. We also obtained a cross comparison of the obtained results in data management with other relevant areas in Computer Science.
Y1 - 2022
U6 - https://doi.org/10.1145/3516431.3516438
SN - 0163-5808
SN - 1943-5835
VL - 50
IS - 4
SP - 30
EP - 35
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Roostapour, Vahid
A1 - Neumann, Aneta
A1 - Neumann, Frank
A1 - Friedrich, Tobias
T1 - Pareto optimization for subset selection with dynamic cost constraints
JF - Artificial intelligence
N2 - We consider the subset selection problem for function f with constraint bound B that changes over time. Within the area of submodular optimization, various greedy approaches are commonly used. For dynamic environments we observe that the adaptive variants of these greedy approaches are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a phi=(alpha(f)/2)(1 - 1/e(alpha)f)-approximation, where alpha(f) is the submodularity ratio of f, for each possible constraint bound b <= B. Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that B increases. Our experimental investigations for the influence maximization in social networks show the advantage of POMC over generalized greedy algorithms. We also consider EAMC, a new evolutionary algorithm with polynomial expected time guarantee to maintain phi approximation ratio, and NSGA-II with two different population sizes as advanced multi-objective optimization algorithm, to demonstrate their challenges in optimizing the maximum coverage problem. Our empirical analysis shows that, within the same number of evaluations, POMC is able to perform as good as NSGA-II under linear constraint, while EAMC performs significantly worse than all considered algorithms in most cases.
KW - Subset selection
KW - Submodular function
KW - Multi-objective optimization
KW - Runtime analysis
Y1 - 2022
U6 - https://doi.org/10.1016/j.artint.2021.103597
SN - 0004-3702
SN - 1872-7921
VL - 302
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Taleb, Aiham
A1 - Rohrer, Csaba
A1 - Bergner, Benjamin
A1 - De Leon, Guilherme
A1 - Rodrigues, Jonas Almeida
A1 - Schwendicke, Falk
A1 - Lippert, Christoph
A1 - Krois, Joachim
T1 - Self-supervised learning methods for label-efficient dental caries classification
JF - Diagnostics : open access journal
N2 - High annotation costs are a substantial bottleneck in applying deep learning architectures to clinically relevant use cases, substantiating the need for algorithms to learn from unlabeled data.
In this work, we propose employing self-supervised methods. To that end, we trained with three self-supervised algorithms on a large corpus of unlabeled dental images, which contained 38K bitewing radiographs (BWRs). We then applied the learned neural network representations on tooth-level dental caries classification, for which we utilized labels extracted from electronic health records (EHRs). Finally, a holdout test-set was established, which consisted of 343 BWRs and was annotated by three dental professionals and approved by a senior dentist.
This test-set was used to evaluate the fine-tuned caries classification models. Our experimental results demonstrate the obtained gains by pretraining models using self-supervised algorithms. These include improved caries classification performance (6 p.p. increase in sensitivity) and, most importantly, improved label-efficiency.
In other words, the resulting models can be fine-tuned using few labels (annotations).
Our results show that using as few as 18 annotations can produce >= 45% sensitivity, which is comparable to human-level diagnostic performance.
This study shows that self-supervision can provide gains in medical image analysis, particularly when obtaining labels is costly and expensive.
KW - unsupervised methods
KW - self-supervised learning
KW - representation learning
KW - dental caries classification
KW - data driven approaches
KW - annotation
KW - efficient deep learning
Y1 - 2022
U6 - https://doi.org/10.3390/diagnostics12051237
SN - 2075-4418
VL - 12
IS - 5
PB - MDPI
CY - Basel
ER -
TY - JOUR
A1 - Lambers, Leen
A1 - Orejas, Fernando
T1 - Transformation rules with nested application conditions
BT - critical pairs, initial conflicts & minimality
JF - Theoretical computer science
N2 - Recently, initial conflicts were introduced in the framework of M-adhesive categories as an important optimization of critical pairs. In particular, they represent a proper subset such that each conflict is represented in a minimal context by a unique initial one. The theory of critical pairs has been extended in the framework of M-adhesive categories to rules with nested application conditions (ACs), restricting the applicability of a rule and generalizing the well-known negative application conditions. A notion of initial conflicts for rules with ACs does not exist yet.
In this paper, on the one hand, we extend the theory of initial conflicts in the framework of M-adhesive categories to transformation rules with ACs. They represent a proper subset again of critical pairs for rules with ACs, and represent each conflict in a minimal context uniquely. They are moreover symbolic because we can show that in general no finite and complete set of conflicts for rules with ACs exists. On the other hand, we show that critical pairs are minimally M-complete, whereas initial conflicts are minimally complete. Finally, we introduce important special cases of rules with ACs for which we can obtain finite, minimally (M-)complete sets of conflicts.
KW - Graph transformation
KW - Critical pairs
KW - Initial conflicts
KW - Application
KW - conditions
Y1 - 2021
U6 - https://doi.org/10.1016/j.tcs.2021.07.023
SN - 0304-3975
SN - 1879-2294
VL - 884
SP - 44
EP - 67
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Koßmann, Jan
A1 - Papenbrock, Thorsten
A1 - Naumann, Felix
T1 - Data dependencies for query optimization
BT - a survey
JF - The VLDB journal : the international journal on very large data bases / publ. on behalf of the VLDB Endowment
N2 - Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey supports database vendors to identify optimization opportunities as well as DBMS researchers to find related work and open research questions.
KW - Query optimization
KW - Query execution
KW - Data dependencies
KW - Data profiling
KW - Unique column combinations
KW - Functional dependencies
KW - Order dependencies
KW - Inclusion dependencies
KW - Relational data
KW - SQL
Y1 - 2021
U6 - https://doi.org/10.1007/s00778-021-00676-3
SN - 1066-8888
SN - 0949-877X
VL - 31
IS - 1
SP - 1
EP - 22
PB - Springer
CY - Berlin ; Heidelberg ; New York
ER -
TY - JOUR
A1 - Quinzan, Francesco
A1 - Göbel, Andreas
A1 - Wagner, Markus
A1 - Friedrich, Tobias
T1 - Evolutionary algorithms and submodular functions
BT - benefits of heavy-tailed mutations
JF - Natural computing : an innovative journal bridging biosciences and computer sciences ; an international journal
N2 - A core operator of evolutionary algorithms (EAs) is the mutation. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this area of work, we propose a new mutation operator and analyze its performance on the (1 + 1) Evolutionary Algorithm (EA). Our analyses show that this mutation operator competes with pre-existing ones, when used by the (1 + 1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1 + 1) EA using our mutation operator finds a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1 + 1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems and outperforms them with constant probability. Finally, we evaluate the performance of the (1 + 1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges and (b) the symmetric mutual information problem using a four month period air pollution data set. In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances.
KW - Evolutionary algorithms
KW - Mutation operators
KW - Submodular functions
KW - Matroids
Y1 - 2021
U6 - https://doi.org/10.1007/s11047-021-09841-7
SN - 1572-9796
VL - 20
IS - 3
SP - 561
EP - 575
PB - Springer Science + Business Media B.V.
CY - Dordrecht
ER -
TY - JOUR
A1 - Oosthoek, Kris
A1 - Dörr, Christian
T1 - Cyber security threats to bitcoin exchanges
BT - adversary exploitation and laundering techniques
JF - IEEE transactions on network and service management : a publication of the IEEE
N2 - Bitcoin is gaining traction as an alternative store of value. Its market capitalization transcends all other cryptocurrencies in the market. But its high monetary value also makes it an attractive target to cyber criminal actors. Hacking campaigns usually target an ecosystem's weakest points. In Bitcoin, the exchange platforms are one of them. Each exchange breach is a threat not only to direct victims, but to the credibility of Bitcoin's entire ecosystem. Based on an extensive analysis of 36 breaches of Bitcoin exchanges, we show the attack patterns used to exploit Bitcoin exchange platforms using an industry standard for reporting intelligence on cyber security breaches. Based on this we are able to provide an overview of the most common attack vectors, showing that all except three hacks were possible due to relatively lax security. We show that while the security regimen of Bitcoin exchanges is subpar compared to other financial service providers, the use of stolen credentials, which does not require any hacking, is decreasing. We also show that the amount of BTC taken during a breach is decreasing, as well as the exchanges that terminate after being breached. Furthermore we show that overall security posture has improved, but still has major flaws. To discover adversarial methods post-breach, we have analyzed two cases of BTC laundering. Through this analysis we provide insight into how exchange platforms with lax cyber security even further increase the intermediary risk introduced by them into the Bitcoin ecosystem.
KW - Bitcoin
KW - Computer crime
KW - Cryptography
KW - Ecosystems
KW - Currencies
KW - Industries
KW - Vocabulary
KW - cryptocurrency exchanges
KW - cyber
KW - security
KW - cyber threat intelligence
KW - attacks
KW - vulnerabilities
KW - forensics
Y1 - 2021
U6 - https://doi.org/10.1109/TNSM.2020.3046145
SN - 1932-4537
VL - 18
IS - 2
SP - 1616
EP - 1628
PB - IEEE
CY - New York
ER -
TY - JOUR
A1 - Schneider, Sven
A1 - Lambers, Leen
A1 - Orejas, Fernando
T1 - A logic-based incremental approach to graph repair featuring delta preservation
JF - International journal on software tools for technology transfer : STTT
N2 - We introduce a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing graph repairs from which a user may select a graph repair based on non-formalized further requirements. This incremental approach features delta preservation as it allows to restrict the generation of graph repairs to delta-preserving graph repairs, which do not revert the additions and deletions of the most recent consistency-violating graph update. We specify consistency of graphs using the logic of nested graph conditions, which is equivalent to first-order logic on graphs. Technically, the incremental approach encodes if and how the graph under repair satisfies a graph condition using the novel data structure of satisfaction trees, which are adapted incrementally according to the graph updates applied. In addition to the incremental approach, we also present two state-based graph repair algorithms, which restore consistency of a graph independent of the most recent graph update and which generate additional graph repairs using a global perspective on the graph under repair. We evaluate the developed algorithms using our prototypical implementation in the tool AutoGraph and illustrate our incremental approach using a case study from the graph database domain.
KW - Nested graph conditions
KW - Graph repair
KW - Model repair
KW - Consistency
KW - restoration
KW - Delta preservation
KW - Graph databases
KW - Model-driven
KW - engineering
Y1 - 2021
U6 - https://doi.org/10.1007/s10009-020-00584-x
SN - 1433-2779
SN - 1433-2787
VL - 23
IS - 3
SP - 369
EP - 410
PB - Springer
CY - Berlin ; Heidelberg
ER -