Refine
Year of publication
- 2021 (35) (remove)
Language
- English (35) (remove)
Is part of the Bibliography
- yes (35)
Keywords
- COVID-19 (3)
- Functional dependencies (2)
- data profiling (2)
- databases (2)
- dialysis (2)
- machine learning (2)
- AKI (1)
- Actor model (1)
- Application (1)
- Approximation algorithms (1)
- Artificial neural networks (1)
- Attribute aggregation (1)
- Authentication (1)
- Big Data (1)
- Bitcoin (1)
- Blockchains (1)
- CCS Concepts (1)
- Case management (1)
- Complexity theory (1)
- Compliance checking (1)
- Computational photography (1)
- Computer crime (1)
- Consistency (1)
- Critical pairs (1)
- Cryptography (1)
- Currencies (1)
- Data dependencies (1)
- Data mining (1)
- Data profiling (1)
- Data-centric (1)
- Delta preservation (1)
- Distributed (1)
- Distributed programming (1)
- Ecosystems (1)
- Electrocardiography (1)
- Entity resolution (1)
- Estimation-of-distribution algorithm (1)
- Evolutionary algorithms (1)
- Federated learning (1)
- Gender Inequality (1)
- General Earth and Planetary Sciences (1)
- Geography, Planning and Development (1)
- Graph databases (1)
- Graph homomorphisms (1)
- Graph repair (1)
- Graph transformation (1)
- HiGHmed (1)
- IMU (1)
- Identity management systems (1)
- Image (1)
- Image-based rendering (1)
- Inclusion dependencies (1)
- Industries (1)
- Initial conflicts (1)
- Lakes (1)
- Licenses (1)
- Machine (1)
- Matroids (1)
- Model repair (1)
- Model verification (1)
- Model-driven (1)
- Mutation operators (1)
- Nested graph conditions (1)
- Order dependencies (1)
- PPMI (parkinson's progression markers initiative) (1)
- Privacy (1)
- Protocols (1)
- QRS detection (1)
- Query execution (1)
- Query optimization (1)
- R Shiny (1)
- Relational data (1)
- Run time analysis (1)
- SQL (1)
- Scientific Publication Indicators (1)
- Security (1)
- Semantics (1)
- Sequential anomaly (1)
- Sexism (1)
- Signal-to-noise ratio (1)
- Submodular functions (1)
- Theory (1)
- Time series (1)
- Unique column combination (1)
- Unique column combinations (1)
- Vocabulary (1)
- W[3]-completeness (1)
- Water Science and Technology (1)
- abundance (1)
- acute renal failure (1)
- analysis (1)
- application (1)
- attacks (1)
- attribute assurance (1)
- biomarker detection (1)
- cancer therapy (1)
- center dot Computing (1)
- chronic dialysis (1)
- clinical (1)
- clinical nephrology (1)
- complexity dichotomy (1)
- conditions (1)
- cryptocurrency exchanges (1)
- cyber (1)
- cyber threat intelligence (1)
- data cleansing (1)
- data integration (1)
- data quality (1)
- dependency (1)
- differential (1)
- digital identity (1)
- duplicate detection (1)
- ePA (1)
- electronic health records (1)
- end-stage kidney disease (1)
- endogenous (1)
- engineering (1)
- enumeration complexity (1)
- epistasis (1)
- experimental design (1)
- expression (1)
- external knowledge bases (1)
- factual (1)
- feature selection (1)
- forensics (1)
- functional dependency (1)
- gene (1)
- gene selection (1)
- health information privacy concern (1)
- identity broker (1)
- inclusion (1)
- information storage and (1)
- label-free (1)
- learning (1)
- methodologie (1)
- metric learning (1)
- mixed-methods (1)
- modular counting (1)
- molecular tumor board (1)
- motion capture (1)
- motivations (1)
- networks (1)
- neural (1)
- orthopedic (1)
- parameterized complexity (1)
- parkinson's disease (1)
- parsimonious reduction (1)
- personal electronic health records (1)
- personalized medicine (1)
- prediction (1)
- prior knowledge (1)
- processes (1)
- processing (1)
- proteomics (1)
- reliability (1)
- restoration (1)
- retrieval (1)
- security (1)
- self-sovereign identity (1)
- similarity learning (1)
- software/instrumentation (1)
- technology adoption (1)
- transfer learning (1)
- transversal hypergraph (1)
- trust model (1)
- vulnerabilities (1)
- wearable movement sensor (1)
- workflow (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (35) (remove)
The automated detection of sequential anomalies in time series is an essential task for many applications, such as the monitoring of technical systems, fraud detection in high-frequency trading, or the early detection of disease symptoms. All these applications require the detection to find all sequential anomalies possibly fast on potentially very large time series. In other words, the detection needs to be effective, efficient and scalable w.r.t. the input size. Series2Graph is an effective solution based on graph embeddings that are robust against re-occurring anomalies and can discover sequential anomalies of arbitrary length and works without training data. Yet, Series2Graph is no t scalable due to its single-threaded approach; it cannot, in particular, process arbitrarily large sequences due to the memory constraints of a single machine. In this paper, we propose our distributed anomaly detection system, short DADS, which is an efficient and scalable adaptation of Series2Graph. Based on the actor programming model, DADS distributes the input time sequence, intermediate state and the computation to all processors of a cluster in a way that minimizes communication costs and synchronization barriers. Our evaluation shows that DADS is orders of magnitude faster than S2G, scales almost linearly with the number of processors in the cluster and can process much larger input sequences due to its scale-out property.
Polygenic risk scores (PRS) aggregating results from genome-wide association studies are the state of the art in the prediction of susceptibility to complex traits or diseases, yet their predictive performance is limited for various reasons, not least of which is their failure to incorporate the effects of gene-gene interactions. Novel machine learning algorithms that use large amounts of data promise to find gene-gene interactions in order to build models with better predictive performance than PRS. Here, we present a data preprocessing step by using data-mining of contextual information to reduce the number of features, enabling machine learning algorithms to identify gene-gene interactions. We applied our approach to the Parkinson's Progression Markers Initiative (PPMI) dataset, an observational clinical study of 471 genotyped subjects (368 cases and 152 controls). With an AUC of 0.85 (95% CI = [0.72; 0.96]), the interaction-based prediction model outperforms the PRS (AUC of 0.58 (95% CI = [0.42; 0.81])). Furthermore, feature importance analysis of the model provided insights into the mechanism of Parkinson's disease. For instance, the model revealed an interaction of previously described drug target candidate genes TMEM175 and GAPDHP25. These results demonstrate that interaction-based machine learning models can improve genetic prediction models and might provide an answer to the missing heritability problem.
Detection of the QRS complex is a long-standing topic in the context of electrocardiography and many algorithms build upon the knowledge of the QRS positions. Although the first solutions to this problem were proposed in the 1970s and 1980s, there is still potential for improvements. Advancements in neural network technology made in recent years also lead to the emergence of enhanced QRS detectors based on artificial neural networks. In this work, we propose a method for assessing the certainty that is in each of the detected QRS complexes, i.e. how confident the QRS detector is that there is, in fact, a QRS complex in the position where it was detected. We further show how this metric can be utilised to distinguish correctly detected QRS complexes from false detections.
Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients' anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
Recently, initial conflicts were introduced in the framework of M-adhesive categories as an important optimization of critical pairs. In particular, they represent a proper subset such that each conflict is represented in a minimal context by a unique initial one. The theory of critical pairs has been extended in the framework of M-adhesive categories to rules with nested application conditions (ACs), restricting the applicability of a rule and generalizing the well-known negative application conditions. A notion of initial conflicts for rules with ACs does not exist yet.
In this paper, on the one hand, we extend the theory of initial conflicts in the framework of M-adhesive categories to transformation rules with ACs. They represent a proper subset again of critical pairs for rules with ACs, and represent each conflict in a minimal context uniquely. They are moreover symbolic because we can show that in general no finite and complete set of conflicts for rules with ACs exists. On the other hand, we show that critical pairs are minimally M-complete, whereas initial conflicts are minimally complete. Finally, we introduce important special cases of rules with ACs for which we can obtain finite, minimally (M-)complete sets of conflicts.
Germany's electronic patient record ("ePA") launched in 2021 with several attempts and years of delay. The development of such a large-scale project is a complex task, and so is its adoption. Individual attitudes towards an electronic health record are crucial, as individuals can reject opting-in to it and making any national efforts unachievable. Although the integration of an electronic health record serves potential benefits, it also constitutes risks for an individual's privacy. With a mixed-methods study design, this work provides evidence that different types of motivations and contextual privacy antecedents affect usage intentions towards the ePA. Most significantly, individual motivations stemming from feelings of volition or external mandates positively affect ePA adoption, although internal incentives are more powerful.
With the advent of big data and data lakes, data are often integrated from multiple sources. Such integrated data are often of poor quality, due to inconsistencies, errors, and so forth. One way to check the quality of data is to infer functional dependencies (fds). However, in many modern applications it might be necessary to extract properties and relationships that are not captured through fds, due to the necessity to admit exceptions, or to consider similarity rather than equality of data values. Relaxed fds (rfds) have been introduced to meet these needs, but their discovery from data adds further complexity to an already complex problem, also due to the necessity of specifying similarity and validity thresholds. We propose Domino, a new discovery algorithm for rfds that exploits the concept of dominance in order to derive similarity thresholds of attribute values while inferring rfds. An experimental evaluation on real datasets demonstrates the discovery performance and the effectiveness of the proposed algorithm.
Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey supports database vendors to identify optimization opportunities as well as DBMS researchers to find related work and open research questions.
A core operator of evolutionary algorithms (EAs) is the mutation. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this area of work, we propose a new mutation operator and analyze its performance on the (1 + 1) Evolutionary Algorithm (EA). Our analyses show that this mutation operator competes with pre-existing ones, when used by the (1 + 1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1 + 1) EA using our mutation operator finds a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1 + 1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems and outperforms them with constant probability. Finally, we evaluate the performance of the (1 + 1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges and (b) the symmetric mutual information problem using a four month period air pollution data set. In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances.