TY - JOUR
A1 - Mattis, Toni
A1 - Beckmann, Tom
A1 - Rein, Patrick
A1 - Hirschfeld, Robert
T1 - First-class concepts
BT - Reified architectural knowledge beyond dominant decompositions
JF - Journal of object technology : JOT / ETH Zürich, Department of Computer Science
N2 - Ideally, programs are partitioned into independently maintainable and understandable modules. As a system grows, its architecture gradually loses the capability to accommodate new concepts in a modular way. While refactoring is expensive and not always possible, and the programming language might lack dedicated primary language constructs to express certain cross-cutting concerns, programmers are still able to explain and delineate convoluted concepts through secondary means: code comments, use of whitespace and arrangement of code, documentation, or communicating tacit knowledge.
Secondary constructs are easy to change and provide high flexibility in communicating cross-cutting concerns and other concepts among programmers. However, such secondary constructs usually have no reified representation that can be explored and manipulated as first-class entities through the programming environment.
In this exploratory work, we discuss novel ways to express a wide range of concepts, including cross-cutting concerns, patterns, and lifecycle artifacts independently of the dominant decomposition imposed by an existing architecture. We propose the representation of concepts as first-class objects inside the programming environment that retain the capability to change as easily as code comments. We explore new tools that allow programmers to view, navigate, and change programs based on conceptual perspectives. In a small case study, we demonstrate how such views can be created and how the programming experience changes from draining programmers' attention by stretching it across multiple modules toward focusing it on cohesively presented concepts. Our designs are geared toward facilitating multiple secondary perspectives on a system to co-exist in symbiosis with the original architecture, hence making it easier to explore, understand, and explain complex contexts and narratives that are hard or impossible to express using primary modularity constructs.
KW - software engineering
KW - modularity
KW - exploratory programming
KW - program
KW - comprehension
KW - remodularization
KW - architecture recovery
Y1 - 2022
U6 - https://doi.org/10.5381/jot.2022.21.2.a6
SN - 1660-1769
VL - 21
IS - 2
SP - 1
EP - 15
PB - ETH Zürich, Department of Computer Science
CY - Zürich
ER -
TY - JOUR
A1 - Schmidl, Sebastian
A1 - Papenbrock, Thorsten
T1 - Efficient distributed discovery of bidirectional order dependencies
JF - The VLDB journal
N2 - Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express that, for example, sorting books by publication date in ascending order also sorts them by age in descending order. The knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, or consistency checking. Because the bODs of a specific dataset are usually not explicitly given, they need to be discovered. The discovery of all minimal bODs (in set-based canonical form) is a task with exponential complexity in the number of attributes, though, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose the distributed bOD discovery algorithm DISTOD, whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; it can, in particular, process much larger datasets.
KW - Bidirectional order dependencies
KW - Distributed computing
KW - Actor
KW - programming
KW - Parallelization
KW - Data profiling
KW - Dependency discovery
Y1 - 2021
U6 - https://doi.org/10.1007/s00778-021-00683-4
SN - 1066-8888
SN - 0949-877X
VL - 31
IS - 1
SP - 49
EP - 74
PB - Springer
CY - Berlin ; Heidelberg ; New York
ER -
TY - JOUR
A1 - Bonifati, Angela
A1 - Mior, Michael J.
A1 - Naumann, Felix
A1 - Noack, Nele Sina
T1 - How inclusive are we?
BT - an analysis of gender diversity in database venues
JF - SIGMOD record / Association for Computing Machinery, Special Interest Group on Management of Data
N2 - ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as do many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide gender, ethnicity, and other properties of the authors.
We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and also compare women's authorship percentages in both single-blind and double-blind venues along the years. We also obtained a cross comparison of the obtained results in data management with other relevant areas in Computer Science.
Y1 - 2022
U6 - https://doi.org/10.1145/3516431.3516438
SN - 0163-5808
SN - 1943-5835
VL - 50
IS - 4
SP - 30
EP - 35
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Roostapour, Vahid
A1 - Neumann, Aneta
A1 - Neumann, Frank
A1 - Friedrich, Tobias
T1 - Pareto optimization for subset selection with dynamic cost constraints
JF - Artificial intelligence
N2 - We consider the subset selection problem for function f with constraint bound B that changes over time. Within the area of submodular optimization, various greedy approaches are commonly used. For dynamic environments we observe that the adaptive variants of these greedy approaches are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a phi=(alpha(f)/2)(1 - 1/e(alpha)f)-approximation, where alpha(f) is the submodularity ratio of f, for each possible constraint bound b <= B. Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that B increases. Our experimental investigations for the influence maximization in social networks show the advantage of POMC over generalized greedy algorithms. We also consider EAMC, a new evolutionary algorithm with polynomial expected time guarantee to maintain phi approximation ratio, and NSGA-II with two different population sizes as advanced multi-objective optimization algorithm, to demonstrate their challenges in optimizing the maximum coverage problem. Our empirical analysis shows that, within the same number of evaluations, POMC is able to perform as good as NSGA-II under linear constraint, while EAMC performs significantly worse than all considered algorithms in most cases.
KW - Subset selection
KW - Submodular function
KW - Multi-objective optimization
KW - Runtime analysis
Y1 - 2022
U6 - https://doi.org/10.1016/j.artint.2021.103597
SN - 0004-3702
SN - 1872-7921
VL - 302
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Gévay, Gábor E.
A1 - Rabl, Tilmann
A1 - Breß, Sebastian
A1 - Madai-Tahy, Loránd
A1 - Quiané-Ruiz, Jorge-Arnulfo
A1 - Markl, Volker
T1 - Imperative or functional control flow handling
BT - why not the best of both worlds?
JF - SIGMOD record / Association for Computing Machinery, Special Interest Group on Management of Data
N2 - Modern data analysis tasks often involve control flow statements, such as the iterations in PageRank and K-means. To achieve scalability, developers usually implement these tasks in distributed dataflow systems, such as Spark and Flink. Designers of such systems have to choose between providing imperative or functional control flow constructs to users. Imperative constructs are easier to use, but functional constructs are easier to compile to an efficient dataflow job. We propose Mitos, a system where control flow is both easy to use and efficient. Mitos relies on an intermediate representation based on the static single assignment form. This allows us to abstract away from specific control flow constructs and treat any imperative control flow uniformly both when building the dataflow job and when coordinating the distributed execution.
Y1 - 2022
U6 - https://doi.org/10.1145/3542700.3542715
SN - 0163-5808
VL - 51
IS - 1
SP - 60
EP - 67
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Schneider, Johannes
A1 - Wenig, Phillip
A1 - Papenbrock, Thorsten
T1 - Distributed detection of sequential anomalies in univariate time series
JF - The VLDB journal : the international journal on very large data bases
N2 - The automated detection of sequential anomalies in time series is an essential task for many applications, such as the monitoring of technical systems, fraud detection in high-frequency trading, or the early detection of disease symptoms. All these applications require the detection to find all sequential anomalies possibly fast on potentially very large time series. In other words, the detection needs to be effective, efficient and scalable w.r.t. the input size. Series2Graph is an effective solution based on graph embeddings that are robust against re-occurring anomalies and can discover sequential anomalies of arbitrary length and works without training data. Yet, Series2Graph is no t scalable due to its single-threaded approach; it cannot, in particular, process arbitrarily large sequences due to the memory constraints of a single machine. In this paper, we propose our distributed anomaly detection system, short DADS, which is an efficient and scalable adaptation of Series2Graph. Based on the actor programming model, DADS distributes the input time sequence, intermediate state and the computation to all processors of a cluster in a way that minimizes communication costs and synchronization barriers. Our evaluation shows that DADS is orders of magnitude faster than S2G, scales almost linearly with the number of processors in the cluster and can process much larger input sequences due to its scale-out property.
KW - Distributed programming
KW - Sequential anomaly
KW - Actor model
KW - Data mining
KW - Time series
Y1 - 2021
U6 - https://doi.org/10.1007/s00778-021-00657-6
SN - 1066-8888
SN - 0949-877X
VL - 30
IS - 4
SP - 579
EP - 602
PB - Springer
CY - Berlin
ER -
TY - JOUR
A1 - Pfitzner, Bjarne
A1 - Steckhan, Nico
A1 - Arnrich, Bert
T1 - Federated learning in a medical context
BT - a systematic literature review
JF - ACM transactions on internet technology : TOIT / Association for Computing
N2 - Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients' anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
KW - Federated learning
Y1 - 2021
U6 - https://doi.org/10.1145/3412357
SN - 1533-5399
SN - 1557-6051
VL - 21
IS - 2
SP - 1
EP - 31
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Lambers, Leen
A1 - Orejas, Fernando
T1 - Transformation rules with nested application conditions
BT - critical pairs, initial conflicts & minimality
JF - Theoretical computer science
N2 - Recently, initial conflicts were introduced in the framework of M-adhesive categories as an important optimization of critical pairs. In particular, they represent a proper subset such that each conflict is represented in a minimal context by a unique initial one. The theory of critical pairs has been extended in the framework of M-adhesive categories to rules with nested application conditions (ACs), restricting the applicability of a rule and generalizing the well-known negative application conditions. A notion of initial conflicts for rules with ACs does not exist yet.
In this paper, on the one hand, we extend the theory of initial conflicts in the framework of M-adhesive categories to transformation rules with ACs. They represent a proper subset again of critical pairs for rules with ACs, and represent each conflict in a minimal context uniquely. They are moreover symbolic because we can show that in general no finite and complete set of conflicts for rules with ACs exists. On the other hand, we show that critical pairs are minimally M-complete, whereas initial conflicts are minimally complete. Finally, we introduce important special cases of rules with ACs for which we can obtain finite, minimally (M-)complete sets of conflicts.
KW - Graph transformation
KW - Critical pairs
KW - Initial conflicts
KW - Application
KW - conditions
Y1 - 2021
U6 - https://doi.org/10.1016/j.tcs.2021.07.023
SN - 0304-3975
SN - 1879-2294
VL - 884
SP - 44
EP - 67
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Koßmann, Jan
A1 - Papenbrock, Thorsten
A1 - Naumann, Felix
T1 - Data dependencies for query optimization
BT - a survey
JF - The VLDB journal : the international journal on very large data bases / publ. on behalf of the VLDB Endowment
N2 - Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey supports database vendors to identify optimization opportunities as well as DBMS researchers to find related work and open research questions.
KW - Query optimization
KW - Query execution
KW - Data dependencies
KW - Data profiling
KW - Unique column combinations
KW - Functional dependencies
KW - Order dependencies
KW - Inclusion dependencies
KW - Relational data
KW - SQL
Y1 - 2021
U6 - https://doi.org/10.1007/s00778-021-00676-3
SN - 1066-8888
SN - 0949-877X
VL - 31
IS - 1
SP - 1
EP - 22
PB - Springer
CY - Berlin ; Heidelberg ; New York
ER -
TY - JOUR
A1 - Quinzan, Francesco
A1 - Göbel, Andreas
A1 - Wagner, Markus
A1 - Friedrich, Tobias
T1 - Evolutionary algorithms and submodular functions
BT - benefits of heavy-tailed mutations
JF - Natural computing : an innovative journal bridging biosciences and computer sciences ; an international journal
N2 - A core operator of evolutionary algorithms (EAs) is the mutation. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this area of work, we propose a new mutation operator and analyze its performance on the (1 + 1) Evolutionary Algorithm (EA). Our analyses show that this mutation operator competes with pre-existing ones, when used by the (1 + 1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1 + 1) EA using our mutation operator finds a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1 + 1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems and outperforms them with constant probability. Finally, we evaluate the performance of the (1 + 1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges and (b) the symmetric mutual information problem using a four month period air pollution data set. In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances.
KW - Evolutionary algorithms
KW - Mutation operators
KW - Submodular functions
KW - Matroids
Y1 - 2021
U6 - https://doi.org/10.1007/s11047-021-09841-7
SN - 1572-9796
VL - 20
IS - 3
SP - 561
EP - 575
PB - Springer Science + Business Media B.V.
CY - Dordrecht
ER -