Refine
Year of publication
- 2021 (35) (remove)
Language
- English (35) (remove)
Is part of the Bibliography
- yes (35)
Keywords
- COVID-19 (3)
- Functional dependencies (2)
- data profiling (2)
- databases (2)
- dialysis (2)
- machine learning (2)
- AKI (1)
- Actor model (1)
- Application (1)
- Approximation algorithms (1)
- Artificial neural networks (1)
- Attribute aggregation (1)
- Authentication (1)
- Big Data (1)
- Bitcoin (1)
- Blockchains (1)
- CCS Concepts (1)
- Case management (1)
- Complexity theory (1)
- Compliance checking (1)
- Computational photography (1)
- Computer crime (1)
- Consistency (1)
- Critical pairs (1)
- Cryptography (1)
- Currencies (1)
- Data dependencies (1)
- Data mining (1)
- Data profiling (1)
- Data-centric (1)
- Delta preservation (1)
- Distributed (1)
- Distributed programming (1)
- Ecosystems (1)
- Electrocardiography (1)
- Entity resolution (1)
- Estimation-of-distribution algorithm (1)
- Evolutionary algorithms (1)
- Federated learning (1)
- Gender Inequality (1)
- General Earth and Planetary Sciences (1)
- Geography, Planning and Development (1)
- Graph databases (1)
- Graph homomorphisms (1)
- Graph repair (1)
- Graph transformation (1)
- HiGHmed (1)
- IMU (1)
- Identity management systems (1)
- Image (1)
- Image-based rendering (1)
- Inclusion dependencies (1)
- Industries (1)
- Initial conflicts (1)
- Lakes (1)
- Licenses (1)
- Machine (1)
- Matroids (1)
- Model repair (1)
- Model verification (1)
- Model-driven (1)
- Mutation operators (1)
- Nested graph conditions (1)
- Order dependencies (1)
- PPMI (parkinson's progression markers initiative) (1)
- Privacy (1)
- Protocols (1)
- QRS detection (1)
- Query execution (1)
- Query optimization (1)
- R Shiny (1)
- Relational data (1)
- Run time analysis (1)
- SQL (1)
- Scientific Publication Indicators (1)
- Security (1)
- Semantics (1)
- Sequential anomaly (1)
- Sexism (1)
- Signal-to-noise ratio (1)
- Submodular functions (1)
- Theory (1)
- Time series (1)
- Unique column combination (1)
- Unique column combinations (1)
- Vocabulary (1)
- W[3]-completeness (1)
- Water Science and Technology (1)
- abundance (1)
- acute renal failure (1)
- analysis (1)
- application (1)
- attacks (1)
- attribute assurance (1)
- biomarker detection (1)
- cancer therapy (1)
- center dot Computing (1)
- chronic dialysis (1)
- clinical (1)
- clinical nephrology (1)
- complexity dichotomy (1)
- conditions (1)
- cryptocurrency exchanges (1)
- cyber (1)
- cyber threat intelligence (1)
- data cleansing (1)
- data integration (1)
- data quality (1)
- dependency (1)
- differential (1)
- digital identity (1)
- duplicate detection (1)
- ePA (1)
- electronic health records (1)
- end-stage kidney disease (1)
- endogenous (1)
- engineering (1)
- enumeration complexity (1)
- epistasis (1)
- experimental design (1)
- expression (1)
- external knowledge bases (1)
- factual (1)
- feature selection (1)
- forensics (1)
- functional dependency (1)
- gene (1)
- gene selection (1)
- health information privacy concern (1)
- identity broker (1)
- inclusion (1)
- information storage and (1)
- label-free (1)
- learning (1)
- methodologie (1)
- metric learning (1)
- mixed-methods (1)
- modular counting (1)
- molecular tumor board (1)
- motion capture (1)
- motivations (1)
- networks (1)
- neural (1)
- orthopedic (1)
- parameterized complexity (1)
- parkinson's disease (1)
- parsimonious reduction (1)
- personal electronic health records (1)
- personalized medicine (1)
- prediction (1)
- prior knowledge (1)
- processes (1)
- processing (1)
- proteomics (1)
- reliability (1)
- restoration (1)
- retrieval (1)
- security (1)
- self-sovereign identity (1)
- similarity learning (1)
- software/instrumentation (1)
- technology adoption (1)
- transfer learning (1)
- transversal hypergraph (1)
- trust model (1)
- vulnerabilities (1)
- wearable movement sensor (1)
- workflow (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (35) (remove)
The automated detection of sequential anomalies in time series is an essential task for many applications, such as the monitoring of technical systems, fraud detection in high-frequency trading, or the early detection of disease symptoms. All these applications require the detection to find all sequential anomalies possibly fast on potentially very large time series. In other words, the detection needs to be effective, efficient and scalable w.r.t. the input size. Series2Graph is an effective solution based on graph embeddings that are robust against re-occurring anomalies and can discover sequential anomalies of arbitrary length and works without training data. Yet, Series2Graph is no t scalable due to its single-threaded approach; it cannot, in particular, process arbitrarily large sequences due to the memory constraints of a single machine. In this paper, we propose our distributed anomaly detection system, short DADS, which is an efficient and scalable adaptation of Series2Graph. Based on the actor programming model, DADS distributes the input time sequence, intermediate state and the computation to all processors of a cluster in a way that minimizes communication costs and synchronization barriers. Our evaluation shows that DADS is orders of magnitude faster than S2G, scales almost linearly with the number of processors in the cluster and can process much larger input sequences due to its scale-out property.
Recently, initial conflicts were introduced in the framework of M-adhesive categories as an important optimization of critical pairs. In particular, they represent a proper subset such that each conflict is represented in a minimal context by a unique initial one. The theory of critical pairs has been extended in the framework of M-adhesive categories to rules with nested application conditions (ACs), restricting the applicability of a rule and generalizing the well-known negative application conditions. A notion of initial conflicts for rules with ACs does not exist yet.
In this paper, on the one hand, we extend the theory of initial conflicts in the framework of M-adhesive categories to transformation rules with ACs. They represent a proper subset again of critical pairs for rules with ACs, and represent each conflict in a minimal context uniquely. They are moreover symbolic because we can show that in general no finite and complete set of conflicts for rules with ACs exists. On the other hand, we show that critical pairs are minimally M-complete, whereas initial conflicts are minimally complete. Finally, we introduce important special cases of rules with ACs for which we can obtain finite, minimally (M-)complete sets of conflicts.
Germany's electronic patient record ("ePA") launched in 2021 with several attempts and years of delay. The development of such a large-scale project is a complex task, and so is its adoption. Individual attitudes towards an electronic health record are crucial, as individuals can reject opting-in to it and making any national efforts unachievable. Although the integration of an electronic health record serves potential benefits, it also constitutes risks for an individual's privacy. With a mixed-methods study design, this work provides evidence that different types of motivations and contextual privacy antecedents affect usage intentions towards the ePA. Most significantly, individual motivations stemming from feelings of volition or external mandates positively affect ePA adoption, although internal incentives are more powerful.
With the advent of big data and data lakes, data are often integrated from multiple sources. Such integrated data are often of poor quality, due to inconsistencies, errors, and so forth. One way to check the quality of data is to infer functional dependencies (fds). However, in many modern applications it might be necessary to extract properties and relationships that are not captured through fds, due to the necessity to admit exceptions, or to consider similarity rather than equality of data values. Relaxed fds (rfds) have been introduced to meet these needs, but their discovery from data adds further complexity to an already complex problem, also due to the necessity of specifying similarity and validity thresholds. We propose Domino, a new discovery algorithm for rfds that exploits the concept of dominance in order to derive similarity thresholds of attribute values while inferring rfds. An experimental evaluation on real datasets demonstrates the discovery performance and the effectiveness of the proposed algorithm.
Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey supports database vendors to identify optimization opportunities as well as DBMS researchers to find related work and open research questions.
A core operator of evolutionary algorithms (EAs) is the mutation. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this area of work, we propose a new mutation operator and analyze its performance on the (1 + 1) Evolutionary Algorithm (EA). Our analyses show that this mutation operator competes with pre-existing ones, when used by the (1 + 1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1 + 1) EA using our mutation operator finds a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1 + 1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems and outperforms them with constant probability. Finally, we evaluate the performance of the (1 + 1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges and (b) the symmetric mutual information problem using a four month period air pollution data set. In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances.
Bitcoin is gaining traction as an alternative store of value. Its market capitalization transcends all other cryptocurrencies in the market. But its high monetary value also makes it an attractive target to cyber criminal actors. Hacking campaigns usually target an ecosystem's weakest points. In Bitcoin, the exchange platforms are one of them. Each exchange breach is a threat not only to direct victims, but to the credibility of Bitcoin's entire ecosystem. Based on an extensive analysis of 36 breaches of Bitcoin exchanges, we show the attack patterns used to exploit Bitcoin exchange platforms using an industry standard for reporting intelligence on cyber security breaches. Based on this we are able to provide an overview of the most common attack vectors, showing that all except three hacks were possible due to relatively lax security. We show that while the security regimen of Bitcoin exchanges is subpar compared to other financial service providers, the use of stolen credentials, which does not require any hacking, is decreasing. We also show that the amount of BTC taken during a breach is decreasing, as well as the exchanges that terminate after being breached. Furthermore we show that overall security posture has improved, but still has major flaws. To discover adversarial methods post-breach, we have analyzed two cases of BTC laundering. Through this analysis we provide insight into how exchange platforms with lax cyber security even further increase the intermediary risk introduced by them into the Bitcoin ecosystem.
We introduce a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing graph repairs from which a user may select a graph repair based on non-formalized further requirements. This incremental approach features delta preservation as it allows to restrict the generation of graph repairs to delta-preserving graph repairs, which do not revert the additions and deletions of the most recent consistency-violating graph update. We specify consistency of graphs using the logic of nested graph conditions, which is equivalent to first-order logic on graphs. Technically, the incremental approach encodes if and how the graph under repair satisfies a graph condition using the novel data structure of satisfaction trees, which are adapted incrementally according to the graph updates applied. In addition to the incremental approach, we also present two state-based graph repair algorithms, which restore consistency of a graph independent of the most recent graph update and which generate additional graph repairs using a global perspective on the graph under repair. We evaluate the developed algorithms using our prototypical implementation in the tool AutoGraph and illustrate our incremental approach using a case study from the graph database domain.
Many important graph-theoretic notions can be encoded as counting graph homomorphism problems, such as partition functions in statistical physics, in particular independent sets and colourings. In this article, we study the complexity of #(p) HOMSTOH, the problem of counting graph homomorphisms from an input graph to a graph H modulo a prime number p. Dyer and Greenhill proved a dichotomy stating that the tractability of non-modular counting graph homomorphisms depends on the structure of the target graph. Many intractable cases in non-modular counting become tractable in modular counting due to the common phenomenon of cancellation. In subsequent studies on counting modulo 2, however, the influence of the structure of H on the tractability was shown to persist, which yields similar dichotomies. <br /> Our main result states that for every tree H and every prime p the problem #pHOMSTOH is either polynomial time computable or #P-p-complete. This relates to the conjecture of Faben and Jerrum stating that this dichotomy holds for every graph H when counting modulo 2. In contrast to previous results on modular counting, the tractable cases of #pHOMSTOH are essentially the same for all values of the modulo when H is a tree. To prove this result, we study the structural properties of a homomorphism. As an important interim result, our study yields a dichotomy for the problem of counting weighted independent sets in a bipartite graph modulo some prime p. These results are the first suggesting that such dichotomies hold not only for the modulo 2 case but also for the modular counting functions of all primes p.
A simplified run time analysis of the univariate marginal distribution algorithm on LeadingOnes
(2021)
With elementary means, we prove a stronger run time guarantee for the univariate marginal distribution algorithm (UMDA) optimizing the LEADINGONES benchmark function in the desirable regime with low genetic drift. If the population size is at least quasilinear, then, with high probability, the UMDA samples the optimum in a number of iterations that is linear in the problem size divided by the logarithm of the UMDA's selection rate. This improves over the previous guarantee, obtained by Dang and Lehre (2015) via the deep level-based population method, both in terms of the run time and by demonstrating further run time gains from small selection rates. Under similar assumptions, we prove a lower bound that matches our upper bound up to constant factors.