TY  - JOUR
A1  - Schneider, Johannes
A1  - Wenig, Phillip
A1  - Papenbrock, Thorsten
T1  - Distributed detection of sequential anomalies in univariate time series
JF  - The VLDB journal : the international journal on very large data bases
N2  - The automated detection of sequential anomalies in time series is an essential task for many applications, such as the monitoring of technical systems, fraud detection in high-frequency trading, or the early detection of disease symptoms. All these applications require the detection to find all sequential anomalies as fast as possible on potentially very large time series. In other words, the detection needs to be effective, efficient and scalable w.r.t. the input size. Series2Graph (S2G) is an effective solution based on graph embeddings that are robust against re-occurring anomalies; it can discover sequential anomalies of arbitrary length and works without training data. Yet, Series2Graph is not scalable due to its single-threaded approach; in particular, it cannot process arbitrarily large sequences due to the memory constraints of a single machine. In this paper, we propose our distributed anomaly detection system, DADS for short, which is an efficient and scalable adaptation of Series2Graph. Based on the actor programming model, DADS distributes the input time sequence, intermediate state and the computation to all processors of a cluster in a way that minimizes communication costs and synchronization barriers. Our evaluation shows that DADS is orders of magnitude faster than S2G, scales almost linearly with the number of processors in the cluster and can process much larger input sequences due to its scale-out property.
KW  - Distributed programming
KW  - Sequential anomaly
KW  - Actor model
KW  - Data mining
KW  - Time series
Y1  - 2021
U6  - https://doi.org/10.1007/s00778-021-00657-6
SN  - 1066-8888
SN  - 0949-877X
VL  - 30
IS  - 4
SP  - 579
EP  - 602
PB  - Springer
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Lambers, Leen
A1  - Orejas, Fernando
T1  - Transformation rules with nested application conditions
BT  - critical pairs, initial conflicts & minimality
JF  - Theoretical computer science
N2  - Recently, initial conflicts were introduced in the framework of M-adhesive categories as an important optimization of critical pairs. In particular, they represent a proper subset such that each conflict is represented in a minimal context by a unique initial one. The theory of critical pairs has been extended in the framework of M-adhesive categories to rules with nested application conditions (ACs), restricting the applicability of a rule and generalizing the well-known negative application conditions. A notion of initial conflicts for rules with ACs does not exist yet. 

In this paper, on the one hand, we extend the theory of initial conflicts in the framework of M-adhesive categories to transformation rules with ACs. These initial conflicts again represent a proper subset of the critical pairs for rules with ACs, and represent each conflict in a minimal context uniquely. They are moreover symbolic because we can show that in general no finite and complete set of conflicts for rules with ACs exists. On the other hand, we show that critical pairs are minimally M-complete, whereas initial conflicts are minimally complete. Finally, we introduce important special cases of rules with ACs for which we can obtain finite, minimally (M-)complete sets of conflicts.
KW  - Graph transformation
KW  - Critical pairs
KW  - Initial conflicts
KW  - Application conditions
Y1  - 2021
U6  - https://doi.org/10.1016/j.tcs.2021.07.023
SN  - 0304-3975
SN  - 1879-2294
VL  - 884
SP  - 44
EP  - 67
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Henkenjohann, Richard
T1  - Role of individual motivations and privacy concerns in the adoption of German electronic patient record apps
BT  - a mixed-methods study
JF  - International journal of environmental research and public health : IJERPH / Molecular Diversity Preservation International
N2  - Germany's electronic patient record ("ePA") launched in 2021 after several attempts and years of delay. The development of such a large-scale project is a complex task, and so is its adoption. Individual attitudes towards an electronic health record are crucial, as individuals can refuse to opt in to it, rendering national efforts unachievable. Although the integration of an electronic health record offers potential benefits, it also constitutes risks for an individual's privacy. With a mixed-methods study design, this work provides evidence that different types of motivations and contextual privacy antecedents affect usage intentions towards the ePA. Most significantly, individual motivations stemming from feelings of volition or external mandates positively affect ePA adoption, although internal incentives are more powerful.
KW  - personal electronic health records
KW  - technology adoption
KW  - endogenous motivations
KW  - health information privacy concern
KW  - mixed-methods
KW  - ePA
Y1  - 2021
U6  - https://doi.org/10.3390/ijerph18189553
SN  - 1660-4601
VL  - 18
IS  - 18
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Caruccio, Loredana
A1  - Deufemia, Vincenzo
A1  - Naumann, Felix
A1  - Polese, Giuseppe
T1  - Discovering relaxed functional dependencies based on multi-attribute dominance
JF  - IEEE transactions on knowledge and data engineering
N2  - With the advent of big data and data lakes, data are often integrated from multiple sources. Such integrated data are often of poor quality, due to inconsistencies, errors, and so forth. One way to check the quality of data is to infer functional dependencies (fds). However, in many modern applications it might be necessary to extract properties and relationships that are not captured through fds, due to the necessity to admit exceptions, or to consider similarity rather than equality of data values. Relaxed fds (rfds) have been introduced to meet these needs, but their discovery from data adds further complexity to an already complex problem, also due to the necessity of specifying similarity and validity thresholds. We propose Domino, a new discovery algorithm for rfds that exploits the concept of dominance in order to derive similarity thresholds of attribute values while inferring rfds. An experimental evaluation on real datasets demonstrates the discovery performance and the effectiveness of the proposed algorithm.
KW  - Complexity theory
KW  - Approximation algorithms
KW  - Big Data
KW  - Distributed databases
KW  - Semantics
KW  - Lakes
KW  - Functional dependencies
KW  - data profiling
KW  - data cleansing
Y1  - 2020
U6  - https://doi.org/10.1109/TKDE.2020.2967722
SN  - 1041-4347
SN  - 1558-2191
VL  - 33
IS  - 9
SP  - 3212
EP  - 3228
PB  - Institute of Electrical and Electronics Engineers
CY  - New York, NY
ER  - 
TY  - JOUR
A1  - Koßmann, Jan
A1  - Papenbrock, Thorsten
A1  - Naumann, Felix
T1  - Data dependencies for query optimization
BT  - a survey
JF  - The VLDB journal : the international journal on very large data bases / publ. on behalf of the VLDB Endowment
N2  - Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey helps database vendors identify optimization opportunities and DBMS researchers find related work and open research questions.
KW  - Query optimization
KW  - Query execution
KW  - Data dependencies
KW  - Data profiling
KW  - Unique column combinations
KW  - Functional dependencies
KW  - Order dependencies
KW  - Inclusion dependencies
KW  - Relational data
KW  - SQL
Y1  - 2021
U6  - https://doi.org/10.1007/s00778-021-00676-3
SN  - 1066-8888
SN  - 0949-877X
VL  - 31
IS  - 1
SP  - 1
EP  - 22
PB  - Springer
CY  - Berlin ; Heidelberg ; New York
ER  - 
TY  - JOUR
A1  - Quinzan, Francesco
A1  - Göbel, Andreas
A1  - Wagner, Markus
A1  - Friedrich, Tobias
T1  - Evolutionary algorithms and submodular functions
BT  - benefits of heavy-tailed mutations
JF  - Natural computing : an innovative journal bridging biosciences and computer sciences ; an international journal
N2  - A core operator of evolutionary algorithms (EAs) is the mutation. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this area of work, we propose a new mutation operator and analyze its performance on the (1 + 1) Evolutionary Algorithm (EA). Our analyses show that this mutation operator competes with pre-existing ones when used by the (1 + 1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1 + 1) EA using our mutation operator achieves a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1 + 1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems and outperforms them with constant probability. Finally, we evaluate the performance of the (1 + 1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges, and (b) the symmetric mutual information problem using an air pollution data set covering a four-month period. In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances.
KW  - Evolutionary algorithms
KW  - Mutation operators
KW  - Submodular functions
KW  - Matroids
Y1  - 2021
U6  - https://doi.org/10.1007/s11047-021-09841-7
SN  - 1572-9796
VL  - 20
IS  - 3
SP  - 561
EP  - 575
PB  - Springer Science + Business Media B.V.
CY  - Dordrecht
ER  - 
TY  - JOUR
A1  - Oosthoek, Kris
A1  - Dörr, Christian
T1  - Cyber security threats to bitcoin exchanges
BT  - adversary exploitation and laundering techniques
JF  - IEEE transactions on network and service management : a publication of the IEEE
N2  - Bitcoin is gaining traction as an alternative store of value. Its market capitalization exceeds that of all other cryptocurrencies in the market. But its high monetary value also makes it an attractive target to cyber criminal actors. Hacking campaigns usually target an ecosystem's weakest points. In Bitcoin, the exchange platforms are one of them. Each exchange breach is a threat not only to direct victims, but to the credibility of Bitcoin's entire ecosystem. Based on an extensive analysis of 36 breaches of Bitcoin exchanges, we show the attack patterns used to exploit Bitcoin exchange platforms using an industry standard for reporting intelligence on cyber security breaches. Based on this, we are able to provide an overview of the most common attack vectors, showing that all except three hacks were possible due to relatively lax security. We show that while the security regimen of Bitcoin exchanges is subpar compared to other financial service providers, the use of stolen credentials, which does not require any hacking, is decreasing. We also show that the amount of BTC taken during a breach is decreasing, as is the number of exchanges that terminate after being breached. Furthermore, we show that the overall security posture has improved, but still has major flaws. To discover adversarial methods post-breach, we have analyzed two cases of BTC laundering. Through this analysis, we provide insight into how exchange platforms with lax cyber security further increase the intermediary risk they introduce into the Bitcoin ecosystem.
KW  - Bitcoin
KW  - Computer crime
KW  - Cryptography
KW  - Ecosystems
KW  - Currencies
KW  - Industries
KW  - Vocabulary
KW  - cryptocurrency exchanges
KW  - cyber security
KW  - cyber threat intelligence
KW  - attacks
KW  - vulnerabilities
KW  - forensics
Y1  - 2021
U6  - https://doi.org/10.1109/TNSM.2020.3046145
SN  - 1932-4537
VL  - 18
IS  - 2
SP  - 1616
EP  - 1628
PB  - IEEE
CY  - New York
ER  - 
TY  - JOUR
A1  - Schneider, Sven
A1  - Lambers, Leen
A1  - Orejas, Fernando
T1  - A logic-based incremental approach to graph repair featuring delta preservation
JF  - International journal on software tools for technology transfer : STTT
N2  - We introduce a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing graph repairs from which a user may select a graph repair based on non-formalized further requirements. This incremental approach features delta preservation, as it allows the generation of graph repairs to be restricted to delta-preserving graph repairs, which do not revert the additions and deletions of the most recent consistency-violating graph update. We specify consistency of graphs using the logic of nested graph conditions, which is equivalent to first-order logic on graphs. Technically, the incremental approach encodes if and how the graph under repair satisfies a graph condition using the novel data structure of satisfaction trees, which are adapted incrementally according to the graph updates applied. In addition to the incremental approach, we also present two state-based graph repair algorithms, which restore consistency of a graph independent of the most recent graph update and which generate additional graph repairs using a global perspective on the graph under repair. We evaluate the developed algorithms using our prototypical implementation in the tool AutoGraph and illustrate our incremental approach using a case study from the graph database domain.
KW  - Nested graph conditions
KW  - Graph repair
KW  - Model repair
KW  - Consistency restoration
KW  - Delta preservation
KW  - Graph databases
KW  - Model-driven engineering
Y1  - 2021
U6  - https://doi.org/10.1007/s10009-020-00584-x
SN  - 1433-2779
SN  - 1433-2787
VL  - 23
IS  - 3
SP  - 369
EP  - 410
PB  - Springer
CY  - Berlin ; Heidelberg
ER  - 
TY  - JOUR
A1  - Göbel, Andreas
A1  - Lagodzinski, Gregor J. A.
A1  - Seidel, Karen
T1  - Counting homomorphisms to trees modulo a prime
JF  - ACM transactions on computation theory : TOCT / Association for Computing Machinery
N2  - Many important graph-theoretic notions can be encoded as counting graph homomorphism problems, such as partition functions in statistical physics, in particular independent sets and colourings. In this article, we study the complexity of #pHOMSTOH, the problem of counting graph homomorphisms from an input graph to a graph H modulo a prime number p. Dyer and Greenhill proved a dichotomy stating that the tractability of non-modular counting graph homomorphisms depends on the structure of the target graph. Many intractable cases in non-modular counting become tractable in modular counting due to the common phenomenon of cancellation. In subsequent studies on counting modulo 2, however, the influence of the structure of H on the tractability was shown to persist, which yields similar dichotomies. Our main result states that for every tree H and every prime p the problem #pHOMSTOH is either polynomial time computable or #P-p-complete. This relates to the conjecture of Faben and Jerrum stating that this dichotomy holds for every graph H when counting modulo 2. In contrast to previous results on modular counting, the tractable cases of #pHOMSTOH are essentially the same for all values of the modulus when H is a tree. To prove this result, we study the structural properties of a homomorphism. As an important interim result, our study yields a dichotomy for the problem of counting weighted independent sets in a bipartite graph modulo some prime p. These results are the first suggesting that such dichotomies hold not only for the modulo 2 case but also for the modular counting functions of all primes p.
KW  - Graph homomorphisms
KW  - modular counting
KW  - complexity dichotomy
Y1  - 2021
U6  - https://doi.org/10.1145/3460958
SN  - 1942-3454
SN  - 1942-3462
VL  - 13
IS  - 3
SP  - 1
EP  - 33
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Doerr, Benjamin
A1  - Krejca, Martin Stefan
T1  - A simplified run time analysis of the univariate marginal distribution algorithm on LeadingOnes
JF  - Theoretical computer science
N2  - With elementary means, we prove a stronger run time guarantee for the univariate marginal distribution algorithm (UMDA) optimizing the LEADINGONES benchmark function in the desirable regime with low genetic drift. If the population size is at least quasilinear, then, with high probability, the UMDA samples the optimum in a number of iterations that is linear in the problem size divided by the logarithm of the UMDA's selection rate. This improves over the previous guarantee, obtained by Dang and Lehre (2015) via the deep level-based population method, both in terms of the run time and by demonstrating further run time gains from small selection rates. Under similar assumptions, we prove a lower bound that matches our upper bound up to constant factors.
KW  - Theory
KW  - Estimation-of-distribution algorithm
KW  - Run time analysis
Y1  - 2021
U6  - https://doi.org/10.1016/j.tcs.2020.11.028
SN  - 0304-3975
SN  - 1879-2294
VL  - 851
SP  - 121
EP  - 128
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Haarmann, Stephan
A1  - Holfter, Adrian
A1  - Pufahl, Luise
A1  - Weske, Mathias
T1  - Formal framework for checking compliance of data-driven case management
JF  - Journal on data semantics : JoDS
N2  - Business processes are often specified in descriptive or normative models. Both types of models should adhere to internal and external regulations, such as company guidelines or laws. Employing compliance checking techniques, it is possible to verify process models against rules. While traditionally compliance checking focuses on well-structured processes, we address case management scenarios. In case management, knowledge workers drive multi-variant and adaptive processes. Our contribution is based on the fragment-based case management approach, which splits a process into a set of fragments. The fragments are synchronized through shared data but can, otherwise, be dynamically instantiated and executed. We formalize case models using Petri nets. We demonstrate the formalization for design-time and run-time compliance checking and present a proof-of-concept implementation. The application of the implemented compliance checking approach to a use case exemplifies its effectiveness while designing a case model. The empirical evaluation on a set of case models for measuring the performance of the approach shows that rules can often be checked in less than a second.
KW  - Compliance checking
KW  - Case management
KW  - Model verification
KW  - Data-centric processes
Y1  - 2021
U6  - https://doi.org/10.1007/s13740-021-00120-3
SN  - 1861-2032
SN  - 1861-2040
VL  - 10
IS  - 1-2
SP  - 143
EP  - 163
PB  - Springer
CY  - Heidelberg
ER  - 
TY  - JOUR
A1  - Oliveira-Ciabati, Livia
A1  - Loures dos Santos, Luciane
A1  - Hsiou Schmaltz, Annie
A1  - Sasso, Ariane Morassi
A1  - Castro, Margaret de
A1  - Souza, João Paulo
T1  - Scientific sexism
BT  - the gender bias in the scientific production of the Universidade de São Paulo
JF  - Revista de saúde pública : publication of the Faculdade de Saúde Pública da Universidade de São Paulo = Journal of public health
N2  - OBJECTIVE: 
To investigate gender inequity in the scientific production of the University of Sao Paulo. 

METHODS: 
Members of the University of Sao Paulo faculty are the study population. The Web of Science repository was the source of the publication metrics. We selected the measures: total publications and citations, average of citations per year and item, H-index, and history of citations between 1950 and 2019. We used the name of the faculty member as a proxy for gender identity. We used descriptive statistics to characterize the metrics. We evaluated the scissors effect by selecting faculty members with a high H-index. The historical series of citations was projected until 2100. We carried out analyses for the general population and working time subgroups: less than 10 years, 10 to 20 years, and 20 years or more. 

RESULTS: 
Of the 8,325 faculty members, we included 3,067 (36.8%). Among those included, 1,893 (61.7%) were male and 1,174 (38.3%) female. The male gender presented higher values in the publication metrics (average of articles: M = 67.0 versus F = 49.7; average of citations/year: M = 53.9 versus F = 35.9), and H-index (M = 14.5 versus F = 12.4). Among the 100 individuals with the highest H-index (>= 37), 83% are male. The male curve grows faster in the historical series of citations, opening a difference between the groups whose separation is confirmed by the projection. 


DISCUSSION:
Scientific production at the Universidade de Sao Paulo is subject to a gender bias. Two-thirds of the faculty are male, and hiring over the past few decades perpetuates this pattern. The large majority of high impact faculty members are male.

CONCLUSION: 
Our analysis suggests that the Universidade de Sao Paulo will not overcome gender inequality in scientific production without substantive affirmative action. Development does not happen by chance but through choices that are affirmative, decisive, and long-term oriented.
KW  - Sexism
KW  - Scientific Publication Indicators
KW  - Gender Inequality
Y1  - 2021
U6  - https://doi.org/10.11606/s1518-8787.2021055002939
SN  - 1518-8787
VL  - 55
PB  - Faculdade de Saúde Pública da Universidade de São Paulo
CY  - São Paulo
ER  - 
TY  - JOUR
A1  - Shekhar, Sumit
A1  - Reimann, Max
A1  - Mayer, Maximilian
A1  - Semmo, Amir
A1  - Pasewaldt, Sebastian
A1  - Döllner, Jürgen
A1  - Trapp, Matthias
T1  - Interactive photo editing on smartphones via intrinsic decomposition
JF  - Computer graphics forum : journal of the European Association for Computer Graphics
N2  - Intrinsic decomposition refers to the problem of estimating scene characteristics, such as albedo and shading, when one view or multiple views of a scene are provided. The inverse problem setting, where multiple unknowns are solved given a single known pixel-value, is highly under-constrained. When provided with correlating image and depth data, which nowadays are easy to acquire with high-end smartphones by utilizing their depth sensors, intrinsic scene decomposition can be facilitated using depth-based priors. In this work, we present a system for intrinsic decomposition of RGB-D images on smartphones and the algorithmic as well as design choices therein. Unlike state-of-the-art methods that assume only diffuse reflectance, we consider both diffuse and specular pixels. For this purpose, we present a novel specularity extraction algorithm based on a multi-scale intensity decomposition and chroma inpainting. The diffuse component is then further decomposed into albedo and shading components. We use an inertial proximal algorithm for non-convex optimization (iPiano) to ensure albedo sparsity. Our GPU-based visual processing is implemented on iOS via the Metal API and enables interactive performance on an iPhone 11 Pro. Further, a qualitative evaluation shows that we are able to obtain high-quality outputs. Moreover, our proposed approach for specularity removal outperforms state-of-the-art approaches for real-world images, while our albedo and shading layer decomposition is faster than the prior work at a comparable output quality. Manifold applications such as recoloring, retexturing, relighting, appearance editing, and stylization are shown, each using the intrinsic layers obtained with our method and/or the corresponding depth data.
KW  - Computing methodologies
KW  - Image-based rendering
KW  - Image processing
KW  - Computational photography
Y1  - 2021
U6  - https://doi.org/10.1111/cgf.142650
SN  - 0167-7055
SN  - 1467-8659
VL  - 40
SP  - 497
EP  - 510
PB  - Blackwell
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Prill, Robert
A1  - Walter, Marina
A1  - Królikowska, Aleksandra
A1  - Becker, Roland
T1  - A systematic review of diagnostic accuracy and clinical applications of wearable movement sensors for knee joint rehabilitation
JF  - Sensors
N2  - In clinical practice, only a few reliable measurement instruments are available for monitoring knee joint rehabilitation. Advances to replace motion capturing with sensor data measurement have been made in recent years. Thus, a systematic review of the literature was performed, focusing on the implementation, diagnostic accuracy, and facilitators and barriers of integrating wearable sensor technology in clinical practices based on a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. For critical appraisal, the COSMIN Risk of Bias tool for reliability and measurement of error was used. PUBMED, Prospero, Cochrane database, and EMBASE were searched for eligible studies. Six studies reporting reliability aspects in using wearable sensor technology at any point after knee surgery in humans were included. All studies reported excellent results with high reliability coefficients, high limits of agreement, or few detectable errors. However, they used different or partly inappropriate methods for estimating reliability or failed to report essential information. Therefore, a moderate risk of bias must be considered. Further quality criterion studies in clinical settings are needed to synthesize the evidence for providing transparent recommendations for the clinical use of wearable movement sensors in knee joint rehabilitation.
KW  - wearable movement sensor
KW  - IMU
KW  - motion capture
KW  - reliability
KW  - clinical
KW  - orthopedic
Y1  - 2021
U6  - https://doi.org/10.3390/s21248221
SN  - 1424-8220
VL  - 21
IS  - 24
PB  - MDPI
CY  - Basel
ER  - 
TY  - JOUR
A1  - Boissier, Martin
T1  - Robust and budget-constrained encoding configurations for in-memory database systems
JF  - Proceedings of the VLDB Endowment
N2  - Data encoding has been applied to database systems for decades as it mitigates bandwidth bottlenecks and reduces storage requirements. But even in the presence of these advantages, most in-memory database systems use data encoding only conservatively as the negative impact on runtime performance can be severe. Real-world systems with large parts being infrequently accessed and cost efficiency constraints in cloud environments require solutions that automatically and efficiently select encoding techniques, including heavy-weight compression. In this paper, we introduce workload-driven approaches to automatically determine memory budget-constrained encoding configurations using greedy heuristics and linear programming. We show for TPC-H, TPC-DS, and the Join Order Benchmark that optimized encoding configurations can reduce the main memory footprint significantly without a loss in runtime performance over state-of-the-art dictionary encoding. To yield robust selections, we extend the linear programming-based approach to incorporate query runtime constraints and mitigate unexpected performance regressions.
KW  - General Earth and Planetary Sciences
KW  - Water Science and Technology
KW  - Geography, Planning and Development
Y1  - 2021
U6  - https://doi.org/10.14778/3503585.3503588
SN  - 2150-8097
VL  - 15
IS  - 4
SP  - 780
EP  - 793
PB  - Association for Computing Machinery (ACM)
CY  - [New York]
ER  - 
TY  - JOUR
A1  - Vitagliano, Gerardo
A1  - Jiang, Lan
A1  - Naumann, Felix
T1  - Detecting layout templates in complex multiregion files
JF  - Proceedings of the VLDB Endowment
N2  - Spreadsheets are among the most commonly used file formats for data management, distribution, and analysis. Their widespread use makes it easy to gather large collections of data, but their flexible canvas-based structure makes automated analysis difficult without heavy preparation. One of the common problems that practitioners face is the presence of multiple, independent regions in a single spreadsheet, possibly separated by repeated empty cells. We define such files as "multiregion" files. In collections of various spreadsheets, we can observe that some share the same layout. We present the Mondrian approach to automatically identify layout templates across multiple files and systematically extract the corresponding regions. Our approach is composed of three phases: first, each file is rendered as an image and inspected for elements that could form regions; then, using a clustering algorithm, the identified elements are grouped to form regions; finally, every file layout is represented as a graph and compared with others to find layout templates. We compare our method to state-of-the-art table recognition algorithms on two corpora of real-world enterprise spreadsheets. Our approach shows the best performance in detecting reliable region boundaries within each file and can correctly identify recurring layouts across files.
Y1  - 2022
U6  - https://doi.org/10.14778/3494124.3494145
SN  - 2150-8097
VL  - 15
IS  - 3
SP  - 646
EP  - 658
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Björk, Jennie
A1  - Hölzle, Katharina
A1  - Boer, Harry
T1  - ‘What will we learn from the current crisis?’
JF  - Creativity and innovation management
Y1  - 2021
U6  - https://doi.org/10.1111/caim.12442
SN  - 0963-1690
SN  - 1467-8691
VL  - 30
IS  - 2
SP  - 231
EP  - 232
PB  - Wiley-Blackwell
CY  - Oxford [u.a.]
ER  - 
TY  - JOUR
A1  - Blaesius, Thomas
A1  - Friedrich, Tobias
A1  - Schirneck, Friedrich Martin
T1  - The complexity of dependency detection and discovery in relational databases
JF  - Theoretical computer science
N2  - Multi-column dependencies in relational databases come associated with two different computational tasks. The detection problem is to decide whether a dependency of a certain type and size holds in a given database; the discovery problem asks to enumerate all valid dependencies of that type. We settle the complexity of both of these problems for unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs). We show that the detection of UCCs and FDs is W[2]-complete when parameterized by the solution size. The discovery of inclusion-wise minimal UCCs is proven to be equivalent under parsimonious reductions to the transversal hypergraph problem of enumerating the minimal hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. We further identify the detection of INDs as one of the first natural W[3]-complete problems. The discovery of maximal INDs is shown to be equivalent to enumerating the maximal satisfying assignments of antimonotone, 3-normalized Boolean formulas.
KW  - data profiling
KW  - enumeration complexity
KW  - functional dependency
KW  - inclusion dependency
KW  - parameterized complexity
KW  - parsimonious reduction
KW  - transversal hypergraph
KW  - Unique column combination
KW  - W[3]-completeness
Y1  - 2021
U6  - https://doi.org/10.1016/j.tcs.2021.11.020
SN  - 0304-3975
SN  - 1879-2294
VL  - 900
SP  - 79
EP  - 96
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Grüner, Andreas
A1  - Mühle, Alexander
A1  - Meinel, Christoph
T1  - ATIB
BT  - Design and evaluation of an architecture for brokered self-sovereign identity integration and trust-enhancing attribute aggregation for service provider
JF  - IEEE access : practical research, open solutions / Institute of Electrical and Electronics Engineers
N2  - Identity management is a principal component of securing online services. In the advancement of traditional identity management patterns, the identity provider remained a Trusted Third Party (TTP). The service provider and the user need to trust a particular identity provider for correct attributes amongst other demands. This paradigm changed with the invention of blockchain-based Self-Sovereign Identity (SSI) solutions that primarily focus on the users. SSI reduces the functional scope of the identity provider to an attribute provider while enabling attribute aggregation. Besides that, the development of new protocols, disregarding established protocols and a significantly fragmented landscape of SSI solutions pose considerable challenges for an adoption by service providers. We propose an Attribute Trust-enhancing Identity Broker (ATIB) to leverage the potential of SSI for trust-enhancing attribute aggregation. Furthermore, ATIB abstracts from a dedicated SSI solution and offers standard protocols. Therefore, it facilitates the adoption by service providers. Despite the brokered integration approach, we show that ATIB provides a high security posture. Additionally, ATIB does not compromise the ten foundational SSI principles for the users.
KW  - Blockchains
KW  - Protocols
KW  - Authentication
KW  - Licenses
KW  - Security
KW  - Privacy
KW  - Identity management systems
KW  - Attribute aggregation
KW  - attribute assurance
KW  - digital identity
KW  - identity broker
KW  - self-sovereign identity
KW  - trust model
Y1  - 2021
U6  - https://doi.org/10.1109/ACCESS.2021.3116095
SN  - 2169-3536
VL  - 9
SP  - 138553
EP  - 138570
PB  - Institute of Electrical and Electronics Engineers
CY  - New York, NY
ER  - 
TY  - JOUR
A1  - Perscheid, Cindy
T1  - Integrative biomarker detection on high-dimensional gene expression data sets
BT  - a survey on prior knowledge approaches
JF  - Briefings in bioinformatics
N2  - Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview on the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.
KW  - gene selection
KW  - external knowledge bases
KW  - biomarker detection
KW  - gene expression
KW  - prior knowledge
Y1  - 2021
U6  - https://doi.org/10.1093/bib/bbaa151
SN  - 1467-5463
SN  - 1477-4054
VL  - 22
IS  - 3
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Loster, Michael
A1  - Koumarelas, Ioannis
A1  - Naumann, Felix
T1  - Knowledge transfer for entity resolution with siamese neural networks
JF  - ACM journal of data and information quality
N2  - The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise. We propose a deep Siamese neural network, capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compared our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.
KW  - Entity resolution
KW  - duplicate detection
KW  - transfer learning
KW  - neural networks
KW  - metric learning
KW  - similarity learning
KW  - data quality
Y1  - 2021
U6  - https://doi.org/10.1145/3410157
SN  - 1936-1955
SN  - 1936-1963
VL  - 13
IS  - 1
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Borchert, Florian
A1  - Mock, Andreas
A1  - Tomczak, Aurelie
A1  - Hügel, Jonas
A1  - Alkarkoukly, Samer
A1  - Knurr, Alexander
A1  - Volckmar, Anna-Lena
A1  - Stenzinger, Albrecht
A1  - Schirmacher, Peter
A1  - Debus, Jürgen
A1  - Jäger, Dirk
A1  - Longerich, Thomas
A1  - Fröhling, Stefan
A1  - Eils, Roland
A1  - Bougatf, Nina
A1  - Sax, Ulrich
A1  - Schapranow, Matthieu-Patrick
T1  - Knowledge bases and software support for variant interpretation in precision oncology
JF  - Briefings in bioinformatics
N2  - Precision oncology is a rapidly evolving interdisciplinary medical specialty. Comprehensive cancer panels are becoming increasingly available at pathology departments worldwide, creating the urgent need for scalable cancer variant annotation and molecularly informed treatment recommendations. A wealth of mainly academia-driven knowledge bases calls for software tools supporting the multi-step diagnostic process. We derive a comprehensive list of knowledge bases relevant for variant interpretation by a review of existing literature followed by a survey among medical experts from university hospitals in Germany. In addition, we review cancer variant interpretation tools, which integrate multiple knowledge bases. We categorize the knowledge bases along the diagnostic process in precision oncology and analyze programmatic access options as well as the integration of knowledge bases into software tools. The most commonly used knowledge bases provide good programmatic access options and have been integrated into a range of software tools. For the wider set of knowledge bases, access options vary across different parts of the diagnostic process. Programmatic access is limited for information regarding clinical classifications of variants and for therapy recommendations. The main issue for databases used for biological classification of pathogenic variants and pathway context information is the lack of standardized interfaces. There is no single cancer variant interpretation tool that integrates all identified knowledge bases. Specialized tools are available and need to be further developed for different steps in the diagnostic process.
KW  - HiGHmed
KW  - personalized medicine
KW  - molecular tumor board
KW  - data integration
KW  - cancer therapy
Y1  - 2021
U6  - https://doi.org/10.1093/bib/bbab134
SN  - 1467-5463
SN  - 1477-4054
VL  - 22
IS  - 6
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Kraus, Sara Milena
A1  - Mathew-Stephen, Mariet
A1  - Schapranow, Matthieu-Patrick
T1  - Eatomics
BT  - Shiny exploration of quantitative proteomics data
JF  - Journal of proteome research
N2  - Quantitative proteomics data are becoming increasingly available, and as a consequence are being analyzed and interpreted by a larger group of users. However, many of these users have less programming experience. Furthermore, experimental designs and setups are getting more complicated, especially when tissue biopsies are analyzed. Luckily, the proteomics community has already established some best practices on how to conduct quality control, differential abundance analysis and enrichment analysis. However, an easy-to-use application that wraps together all steps for the exploration and flexible analysis of quantitative proteomics data is not yet available. For Eatomics, we utilize the R Shiny framework to implement carefully chosen parts of established analysis workflows to (i) make them accessible in a user-friendly way, (ii) add a multitude of interactive exploration possibilities, and (iii) develop a unique experimental design setup module, which interactively translates a given research hypothesis into a differential abundance and enrichment analysis formula. In this, we aim to fulfill the needs of a growing group of inexperienced quantitative proteomics data analysts. Eatomics may be tested with demo data directly online via https://we.analyzegenomes.com/now/eatomics/ or with the user's own data by installation from the GitHub repository at https://github.com/Millchmaedchen/Eatomics.
KW  - R Shiny
KW  - application
KW  - label-free
KW  - proteomics
KW  - analysis
KW  - differential abundance
KW  - experimental design
Y1  - 2021
U6  - https://doi.org/10.1021/acs.jproteome.0c00398
SN  - 1535-3893
SN  - 1535-3907
VL  - 20
IS  - 1
SP  - 1070
EP  - 1078
PB  - American Chemical Society
CY  - Washington
ER  - 
TY  - JOUR
A1  - Borchert, Florian
A1  - Mock, Andreas
A1  - Tomczak, Aurelie
A1  - Hügel, Jonas
A1  - Alkarkoukly, Samer
A1  - Knurr, Alexander
A1  - Volckmar, Anna-Lena
A1  - Stenzinger, Albrecht
A1  - Schirmacher, Peter
A1  - Debus, Jürgen
A1  - Jäger, Dirk
A1  - Longerich, Thomas
A1  - Fröhling, Stefan
A1  - Eils, Roland
A1  - Bougatf, Nina
A1  - Sax, Ulrich
A1  - Schapranow, Matthieu-Patrick
T1  - Correction to: Knowledge bases and software support for variant interpretation in precision oncology
JF  - Briefings in bioinformatics
Y1  - 2021
U6  - https://doi.org/10.1093/bib/bbab246
SN  - 1467-5463
SN  - 1477-4054
VL  - 22
IS  - 6
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Vaid, Akhil
A1  - Chan, Lili
A1  - Chaudhary, Kumardeep
A1  - Jaladanki, Suraj K.
A1  - Paranjpe, Ishan
A1  - Russak, Adam J.
A1  - Kia, Arash
A1  - Timsina, Prem
A1  - Levin, Matthew A.
A1  - He, John Cijiang
A1  - Böttinger, Erwin
A1  - Charney, Alexander W.
A1  - Fayad, Zahi A.
A1  - Coca, Steven G.
A1  - Glicksberg, Benjamin S.
A1  - Nadkarni, Girish N.
T1  - Predictive approaches for acute dialysis requirement and death in COVID-19
JF  - Clinical journal of the American Society of Nephrology : CJASN
N2  - Background and objectives
AKI treated with dialysis initiation is a common complication of coronavirus disease 2019 (COVID-19) among hospitalized patients. However, dialysis supplies and personnel are often limited. 

Design, setting, participants, & measurements
Using data from adult patients hospitalized with COVID-19 from five hospitals from the Mount Sinai Health System who were admitted between March 10 and December 26, 2020, we developed and validated several models (logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO), random forest, and eXtreme Gradient Boosting [XGBoost; with and without imputation]) for predicting treatment with dialysis or death at various time horizons (1, 3, 5, and 7 days) after hospital admission. Patients admitted to the Mount Sinai Hospital were used for internal validation, whereas the other hospitals formed part of the external validation cohort. Features included demographics, comorbidities, and laboratory and vital signs within 12 hours of hospital admission.

Results
A total of 6093 patients (2442 in training and 3651 in external validation) were included in the final cohort. Of the different modeling approaches used, XGBoost without imputation had the highest area under the receiver operating characteristic (AUROC) curve on internal validation (range of 0.93-0.98) and area under the precision-recall curve (AUPRC; range of 0.78-0.82) for all time points. XGBoost without imputation also had the highest test parameters on external validation (AUROC range of 0.85-0.87, and AUPRC range of 0.27-0.54) across all time windows. XGBoost without imputation outperformed all models with higher precision and recall (mean difference in AUROC of 0.04; mean difference in AUPRC of 0.15). Features of creatinine, BUN, and red cell distribution width were major drivers of the model's prediction.

Conclusions
An XGBoost model without imputation for prediction of a composite outcome of either death or dialysis in patients positive for COVID-19 had the best performance, as compared with standard and other machine learning models.
KW  - COVID-19
KW  - dialysis
KW  - machine learning
KW  - prediction
KW  - AKI
Y1  - 2021
U6  - https://doi.org/10.2215/CJN.17311120
SN  - 1555-9041
SN  - 1555-905X
VL  - 16
IS  - 8
SP  - 1158
EP  - 1168
PB  - American Society of Nephrology
CY  - Washington
ER  - 
TY  - JOUR
A1  - Dellepiane, Sergio
A1  - Vaid, Akhil
A1  - Jaladanki, Suraj K.
A1  - Coca, Steven
A1  - Fayad, Zahi A.
A1  - Charney, Alexander W.
A1  - Böttinger, Erwin
A1  - He, John Cijiang
A1  - Glicksberg, Benjamin S.
A1  - Chan, Lili
A1  - Nadkarni, Girish
T1  - Acute kidney injury in patients hospitalized with COVID-19 in New York City
BT  - Temporal Trends From March 2020 to April 2021
JF  - Kidney medicine
Y1  - 2021
U6  - https://doi.org/10.1016/j.xkme.2021.06.008
SN  - 2590-0595
VL  - 3
IS  - 5
SP  - 877
EP  - 879
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - GEN
A1  - Dellepiane, Sergio
A1  - Vaid, Akhil
A1  - Jaladanki, Suraj K.
A1  - Coca, Steven
A1  - Fayad, Zahi A.
A1  - Charney, Alexander W.
A1  - Böttinger, Erwin
A1  - He, John Cijiang
A1  - Glicksberg, Benjamin S.
A1  - Chan, Lili
A1  - Nadkarni, Girish
T1  - Acute kidney injury in patients hospitalized with COVID-19 in New York City
BT  - Temporal Trends From March 2020 to April 2021
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 21 
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-585415
SN  - 2590-0595
IS  - 5
ER  - 
TY  - JOUR
A1  - Datta, Suparno
A1  - Sachs, Jan Philipp
A1  - Freitas da Cruz, Harry
A1  - Martensen, Tom
A1  - Bode, Philipp
A1  - Morassi Sasso, Ariane
A1  - Glicksberg, Benjamin S.
A1  - Böttinger, Erwin
T1  - FIBER
BT  - enabling flexible retrieval of electronic health records data for clinical predictive modeling
JF  - JAMIA open
N2  - Objectives: 
The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames. 

Materials and Methods: 
FIBER was developed on top of a large-scale star schema EHR database which contains data from 8 million patients and over 120 million encounters. To illustrate FIBER's capabilities, we present its application by building a heart surgery patient cohort with subsequent prediction of acute kidney injury (AKI) with various machine learning models. 

Results:
Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients that developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case.

Conclusion: 
FIBER is an open-source Python library developed for extracting information from star schema clinical data warehouses and reduces time-to-modeling, helping to streamline the clinical modeling process.
KW  - databases, factual
KW  - electronic health records
KW  - information storage and retrieval
KW  - workflow
KW  - software/instrumentation
Y1  - 2021
U6  - https://doi.org/10.1093/jamiaopen/ooab048
SN  - 2574-2531
VL  - 4
IS  - 3
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - De Freitas, Jessica K.
A1  - Johnson, Kipp W.
A1  - Golden, Eddye
A1  - Nadkarni, Girish N.
A1  - Dudley, Joel T.
A1  - Böttinger, Erwin
A1  - Glicksberg, Benjamin S.
A1  - Miotto, Riccardo
T1  - Phe2vec
BT  - Automated disease phenotyping based on unsupervised embeddings from electronic health records
JF  - Patterns
N2  - Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning and assess its effectiveness against standard rule-based algorithms from Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and patients' clinical history. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. Comparing Phe2vec and PheKB cohorts head-to-head using chart review, Phe2vec performed on par or better in nine out of ten diseases. Differently from other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
Y1  - 2021
U6  - https://doi.org/10.1016/j.patter.2021.100337
SN  - 2666-3899
VL  - 2
IS  - 9
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - JOUR
A1  - Chromik, Jonas
A1  - Pirl, Lukas
A1  - Beilharz, Jossekin Jakob
A1  - Arnrich, Bert
A1  - Polze, Andreas
T1  - Certainty in QRS detection with artificial neural networks
JF  - Biomedical signal processing and control
N2  - Detection of the QRS complex is a long-standing topic in the context of electrocardiography and many algorithms build upon the knowledge of the QRS positions. Although the first solutions to this problem were proposed in the 1970s and 1980s, there is still potential for improvements. Advancements in neural network technology made in recent years also lead to the emergence of enhanced QRS detectors based on artificial neural networks. In this work, we propose a method for assessing the certainty that is in each of the detected QRS complexes, i.e. how confident the QRS detector is that there is, in fact, a QRS complex in the position where it was detected. We further show how this metric can be utilised to distinguish correctly detected QRS complexes from false detections.
KW  - QRS detection
KW  - Electrocardiography
KW  - Artificial neural networks
KW  - Machine learning
KW  - Signal-to-noise ratio
Y1  - 2021
U6  - https://doi.org/10.1016/j.bspc.2021.102628
SN  - 1746-8094
SN  - 1746-8108
VL  - 68
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Konigorski, Stefan
T1  - Causal inference in developmental medicine and neurology
JF  - Developmental medicine and child neurology
Y1  - 2021
U6  - https://doi.org/10.1111/dmcn.14813
SN  - 0012-1622
SN  - 1469-8749
VL  - 63
IS  - 5
SP  - 498
EP  - 498
PB  - Wiley-Blackwell
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Cope, Justin L.
A1  - Baukmann, Hannes A.
A1  - Klinger, Jörn E.
A1  - Ravarani, Charles N. J.
A1  - Böttinger, Erwin
A1  - Konigorski, Stefan
A1  - Schmidt, Marco F.
T1  - Interaction-based feature selection algorithm outperforms polygenic risk score in predicting Parkinson’s Disease status
JF  - Frontiers in genetics
N2  - Polygenic risk scores (PRS) aggregating results from genome-wide association studies are the state of the art in the prediction of susceptibility to complex traits or diseases, yet their predictive performance is limited for various reasons, not least of which is their failure to incorporate the effects of gene-gene interactions. Novel machine learning algorithms that use large amounts of data promise to find gene-gene interactions in order to build models with better predictive performance than PRS. Here, we present a data preprocessing step by using data-mining of contextual information to reduce the number of features, enabling machine learning algorithms to identify gene-gene interactions. We applied our approach to the Parkinson's Progression Markers Initiative (PPMI) dataset, an observational clinical study of 471 genotyped subjects (368 cases and 152 controls). With an AUC of 0.85 (95% CI = [0.72; 0.96]), the interaction-based prediction model outperforms the PRS (AUC of 0.58 (95% CI = [0.42; 0.81])). Furthermore, feature importance analysis of the model provided insights into the mechanism of Parkinson's disease. For instance, the model revealed an interaction of previously described drug target candidate genes TMEM175 and GAPDHP25. These results demonstrate that interaction-based machine learning models can improve genetic prediction models and might provide an answer to the missing heritability problem.
KW  - epistasis
KW  - machine learning
KW  - feature selection
KW  - parkinson's disease
KW  - PPMI (parkinson's progression markers initiative)
Y1  - 2021
U6  - https://doi.org/10.3389/fgene.2021.744557
SN  - 1664-8021
VL  - 12
PB  - Frontiers Media
CY  - Lausanne
ER  - 
TY  - JOUR
A1  - Pfitzner, Bjarne
A1  - Steckhan, Nico
A1  - Arnrich, Bert
T1  - Federated learning in a medical context
BT  - a systematic literature review
JF  - ACM transactions on internet technology : TOIT / Association for Computing
N2  - Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients' anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
KW  - Federated learning
Y1  - 2021
U6  - https://doi.org/10.1145/3412357
SN  - 1533-5399
SN  - 1557-6051
VL  - 21
IS  - 2
SP  - 1
EP  - 31
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Chan, Lili
A1  - Jaladanki, Suraj K.
A1  - Somani, Sulaiman
A1  - Paranjpe, Ishan
A1  - Kumar, Arvind
A1  - Zhao, Shan
A1  - Kaufman, Lewis
A1  - Leisman, Staci
A1  - Sharma, Shuchita
A1  - He, John Cijiang
A1  - Murphy, Barbara
A1  - Fayad, Zahi A.
A1  - Levin, Matthew A.
A1  - Böttinger, Erwin
A1  - Charney, Alexander W.
A1  - Glicksberg, Benjamin
A1  - Coca, Steven G.
A1  - Nadkarni, Girish N.
T1  - Outcomes of patients on maintenance dialysis hospitalized with COVID-19
JF  - Clinical journal of the American Society of Nephrology : CJASN
KW  - chronic dialysis
KW  - COVID-19
KW  - end-stage kidney disease
Y1  - 2021
U6  - https://doi.org/10.2215/CJN.12360720
SN  - 1555-9041
SN  - 1555-905X
VL  - 16
IS  - 3
SP  - 452
EP  - 455
PB  - American Society of Nephrology
CY  - Washington
ER  - 
TY  - JOUR
A1  - Chan, Lili
A1  - Chaudhary, Kumardeep
A1  - Saha, Aparna
A1  - Chauhan, Kinsuk
A1  - Vaid, Akhil
A1  - Zhao, Shan
A1  - Paranjpe, Ishan
A1  - Somani, Sulaiman
A1  - Richter, Felix
A1  - Miotto, Riccardo
A1  - Lala, Anuradha
A1  - Kia, Arash
A1  - Timsina, Prem
A1  - Li, Li
A1  - Freeman, Robert
A1  - Chen, Rong
A1  - Narula, Jagat
A1  - Just, Allan C.
A1  - Horowitz, Carol
A1  - Fayad, Zahi
A1  - Cordon-Cardo, Carlos
A1  - Schadt, Eric
A1  - Levin, Matthew A.
A1  - Reich, David L.
A1  - Fuster, Valentin
A1  - Murphy, Barbara
A1  - He, John C.
A1  - Charney, Alexander W.
A1  - Böttinger, Erwin
A1  - Glicksberg, Benjamin
A1  - Coca, Steven G.
A1  - Nadkarni, Girish N.
T1  - AKI in hospitalized patients with COVID-19
JF  - Journal of the American Society of Nephrology : JASN
N2  - Background:
Early reports indicate that AKI is common among patients with coronavirus disease 2019 (COVID-19) and associated with worse outcomes. However, AKI among hospitalized patients with COVID-19 in the United States is not well described. 

Methods:
This retrospective, observational study involved a review of data from electronic health records of patients aged >= 18 years with laboratory-confirmed COVID-19 admitted to the Mount Sinai Health System from February 27 to May 30, 2020. We describe the frequency of AKI and dialysis requirement, AKI recovery, and adjusted odds ratios (aORs) with mortality. 

Results:
Of 3993 hospitalized patients with COVID-19, AKI occurred in 1835 (46%) patients; 347 (19%) of the patients with AKI required dialysis. The proportions with stages 1, 2, or 3 AKI were 39%, 19%, and 42%, respectively. A total of 976 (24%) patients were admitted to intensive care, and 745 (76%) experienced AKI. Of the 435 patients with AKI and urine studies, 84% had proteinuria, 81% had hematuria, and 60% had leukocyturia. Independent predictors of severe AKI were CKD, men, and higher serum potassium at admission. In-hospital mortality was 50% among patients with AKI versus 8% among those without AKI (aOR, 9.2; 95% confidence interval, 7.5 to 11.3). Of survivors with AKI who were discharged, 35% had not recovered to baseline kidney function by the time of discharge. An additional 28 of 77 (36%) patients who had not recovered kidney function at discharge did so on posthospital follow-up. 

Conclusions:
AKI is common among patients hospitalized with COVID-19 and is associated with high mortality. Of all patients with AKI, only 30% survived with recovery of kidney function by the time of discharge.
KW  - acute renal failure
KW  - clinical nephrology
KW  - dialysis
KW  - COVID-19
Y1  - 2021
U6  - https://doi.org/10.1681/ASN.2020050615
SN  - 1046-6673
SN  - 1533-3450
VL  - 32
IS  - 1
SP  - 151
EP  - 160
PB  - American Society of Nephrology
CY  - Washington
ER  -