TY - JOUR
A1 - Vaid, Akhil
A1 - Somani, Sulaiman
A1 - Russak, Adam J.
A1 - De Freitas, Jessica K.
A1 - Chaudhry, Fayzan F.
A1 - Paranjpe, Ishan
A1 - Johnson, Kipp W.
A1 - Lee, Samuel J.
A1 - Miotto, Riccardo
A1 - Richter, Felix
A1 - Zhao, Shan
A1 - Beckmann, Noam D.
A1 - Naik, Nidhi
A1 - Kia, Arash
A1 - Timsina, Prem
A1 - Lala, Anuradha
A1 - Paranjpe, Manish
A1 - Golden, Eddye
A1 - Danieletto, Matteo
A1 - Singh, Manbir
A1 - Meyer, Dara
A1 - O'Reilly, Paul F.
A1 - Huckins, Laura
A1 - Kovatch, Patricia
A1 - Finkelstein, Joseph
A1 - Freeman, Robert M.
A1 - Argulian, Edgar
A1 - Kasarskis, Andrew
A1 - Percha, Bethany
A1 - Aberg, Judith A.
A1 - Bagiella, Emilia
A1 - Horowitz, Carol R.
A1 - Murphy, Barbara
A1 - Nestler, Eric J.
A1 - Schadt, Eric E.
A1 - Cho, Judy H.
A1 - Cordon-Cardo, Carlos
A1 - Fuster, Valentin
A1 - Charney, Dennis S.
A1 - Reich, David L.
A1 - Böttinger, Erwin
A1 - Levin, Matthew A.
A1 - Narula, Jagat
A1 - Fayad, Zahi A.
A1 - Just, Allan C.
A1 - Charney, Alexander W.
A1 - Nadkarni, Girish N.
A1 - Glicksberg, Benjamin S.
T1 - Machine learning to predict mortality and critical events in a cohort of patients with COVID-19 in New York City: model development and validation
JF - Journal of medical internet research : international scientific journal for medical research, information and communication on the internet ; JMIR
N2 - Background:
COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking.
Objective:
The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points.
Methods:
We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19-positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients from a single hospital (n=1514) before or on May 1, externally validated on patients from four other hospitals (n=2201) before or on May 1, and prospectively validated on all patients after May 1 (n=383). Finally, we established model interpretability to identify and rank variables that drive model predictions.
Results:
Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, for critical event prediction, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction.
Conclusions:
We developed machine learning models that predict mortality and critical events for patients with COVID-19 at different time horizons, and validated them both externally and prospectively. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes.
KW - machine learning
KW - COVID-19
KW - electronic health record
KW - TRIPOD
KW - clinical
KW - informatics
KW - prediction
KW - mortality
KW - EHR
KW - cohort
KW - hospital
KW - performance
Y1 - 2020
U6 - https://doi.org/10.2196/24018
SN - 1439-4456
SN - 1438-8871
VL - 22
IS - 11
PB - Healthcare World
CY - Richmond, Va.
ER -
TY - JOUR
A1 - Torkura, Kennedy A.
A1 - Sukmana, Muhammad Ihsan Haikal
A1 - Cheng, Feng
A1 - Meinel, Christoph
T1 - CloudStrike
BT - chaos engineering for security and resiliency in cloud infrastructure
JF - IEEE access : practical research, open solutions
N2 - Most cyber-attacks and data breaches in cloud infrastructure are due to human error and misconfiguration vulnerabilities. Cloud customer-centric tools are imperative for mitigating these issues; however, existing cloud security models are largely unable to tackle these security challenges. Novel security mechanisms are therefore needed, and we propose Risk-driven Fault Injection (RDFI) techniques to address them. RDFI applies the principles of chaos engineering to cloud security and leverages feedback loops to execute, monitor, analyze, and plan security fault injection campaigns based on a knowledge base. The knowledge base consists of fault models designed from secure baselines, cloud security best practices, and observations derived during iterative fault injection campaigns. These observations help identify vulnerabilities while verifying the correctness of security attributes (integrity, confidentiality, and availability). Furthermore, RDFI proactively supports risk analysis and security hardening efforts by sharing security information with other security mechanisms. We have designed and implemented the RDFI strategies, including various chaos engineering algorithms, as a software tool: CloudStrike. Several evaluations were conducted with CloudStrike against infrastructure deployed on two major public cloud platforms: Amazon Web Services and Google Cloud Platform. Execution time increases linearly, proportional to increasing attack rates. Furthermore, the analysis of vulnerabilities detected via security fault injection has been used to harden the security of cloud resources, demonstrating the effectiveness of the security information provided by CloudStrike. We therefore opine that our approaches are suitable for overcoming contemporary cloud security issues.
KW - cloud security
KW - security chaos engineering
KW - resilient architectures
KW - security risk assessment
Y1 - 2020
U6 - https://doi.org/10.1109/ACCESS.2020.3007338
SN - 2169-3536
VL - 8
SP - 123044
EP - 123060
PB - Institute of Electrical and Electronics Engineers
CY - Piscataway
ER -
TY - JOUR
A1 - Thamsen, Lauritz
A1 - Beilharz, Jossekin Jakob
A1 - Tran, Vinh Thuy
A1 - Nedelkoski, Sasho
A1 - Kao, Odej
T1 - Mary, Hugo, and Hugo*
BT - learning to schedule distributed data-parallel processing jobs on shared clusters
JF - Concurrency and computation : practice & experience
N2 - Distributed data-parallel processing systems like MapReduce, Spark, and Flink are popular for analyzing large datasets using cluster resources. Resource management systems like YARN or Mesos in turn allow multiple data-parallel processing jobs to share cluster resources in temporary containers. Often, these containers do not fully isolate resource usage, so as to achieve high overall resource utilization despite overprovisioning and the frequently fluctuating utilization of individual jobs. However, some combinations of jobs utilize resources better and interfere less with each other than others when running on the same shared nodes. This article presents an approach for improving resource utilization and job throughput when scheduling recurring distributed data-parallel processing jobs in shared clusters. The approach is based on reinforcement learning and a measure of co-location goodness, letting cluster schedulers learn over time which jobs are best executed together on shared resources. We evaluated this approach over the last years with three prototype schedulers that build on each other: Mary, Hugo, and Hugo*. For the evaluation we used exemplary Flink and Spark jobs from different application domains and clusters of commodity nodes managed by YARN. The results of these experiments show that our approach can increase resource utilization and job throughput significantly.
KW - cluster resource management
KW - distributed data-parallel processing
KW - job
KW - co-location
KW - reinforcement learning
KW - self-learning scheduler
Y1 - 2020
U6 - https://doi.org/10.1002/cpe.5823
SN - 1532-0626
SN - 1532-0634
VL - 33
IS - 18
PB - Wiley
CY - Hoboken
ER -
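The co-location learning summarized in the abstract above can be sketched as a simple epsilon-greedy bandit over job pairings. This is an illustrative toy, not the Mary/Hugo/Hugo* schedulers; the job names, goodness values, and noise model are invented for the example:

```python
import random

# Toy epsilon-greedy sketch of learning job co-location "goodness"
# (illustrative only -- job names, goodness table, and the reward
# signal are hypothetical, not taken from the paper).

def schedule(pairs, true_goodness, rounds=2000, eps=0.1, seed=7):
    """Repeatedly co-locate a job pair and learn which pair works best."""
    rng = random.Random(seed)
    q = {p: 0.0 for p in pairs}   # learned goodness estimate per pair
    n = {p: 0 for p in pairs}     # times each pair was co-located
    for _ in range(rounds):
        # Explore a random pairing occasionally, otherwise exploit.
        if rng.random() < eps:
            p = rng.choice(pairs)
        else:
            p = max(pairs, key=q.get)
        # Observe a noisy reward, e.g. normalized throughput of the run.
        r = true_goodness[p] + rng.gauss(0, 0.1)
        n[p] += 1
        q[p] += (r - q[p]) / n[p]  # incremental mean update
    return max(q, key=q.get)       # best-known co-location
```

In the real systems, the reward would come from measured job runtimes and resource utilization on the shared nodes rather than from a fixed table.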
TY - JOUR
A1 - Sigel, Keith Magnus
A1 - Swartz, Talia H.
A1 - Golden, Eddye
A1 - Paranjpe, Ishan
A1 - Somani, Sulaiman
A1 - Richter, Felix
A1 - De Freitas, Jessica K.
A1 - Miotto, Riccardo
A1 - Zhao, Shan
A1 - Polak, Paz
A1 - Mutetwa, Tinaye
A1 - Factor, Stephanie
A1 - Mehandru, Saurabh
A1 - Mullen, Michael
A1 - Cossarini, Francesca
A1 - Böttinger, Erwin
A1 - Fayad, Zahi
A1 - Merad, Miriam
A1 - Gnjatic, Sacha
A1 - Aberg, Judith
A1 - Charney, Alexander
A1 - Nadkarni, Girish
A1 - Glicksberg, Benjamin S.
T1 - Coronavirus 2019 and people living with human immunodeficiency virus
BT - outcomes for hospitalized patients in New York City
JF - Clinical infectious diseases : electronic edition
N2 - Background:
There are limited data regarding the clinical impact of coronavirus disease 2019 (COVID-19) on people living with human immunodeficiency virus (PLWH). In this study, we compared outcomes for PLWH with COVID-19 to a matched comparison group.
Methods:
We identified 88 PLWH hospitalized with laboratory-confirmed COVID-19 in our hospital system in New York City between 12 March and 23 April 2020. We collected data on baseline clinical characteristics, laboratory values, HIV status, treatment, and outcomes from this group and matched comparators (1 PLWH to up to 5 patients by age, sex, race/ethnicity, and calendar week of infection). We compared clinical characteristics and outcomes (death, mechanical ventilation, hospital discharge) for these groups, as well as cumulative incidence of death by HIV status.
Results:
Owing to the matching algorithm, patients did not differ significantly by age, sex, or race/ethnicity across HIV status. A high proportion of PLWH hospitalized with COVID-19 had virologic control of HIV on antiretroviral therapy. PLWH had greater proportions of smoking (P < .001) and comorbid illness than uninfected comparators. There was no difference in COVID-19 severity on admission by HIV status (P = .15). Poor outcomes for hospitalized PLWH were frequent but similar to proportions in comparators; 18% required mechanical ventilation and 21% died during follow-up (compared with 23% and 20%, respectively). There was a similar cumulative incidence of death over time by HIV status (P = .94).
Conclusions:
We found no differences in adverse outcomes associated with HIV infection for hospitalized COVID-19 patients compared with a demographically similar patient group.
KW - human immunodeficiency virus
KW - coronavirus 2019
KW - severe acute respiratory
KW - syndrome coronavirus 2
Y1 - 2020
U6 - https://doi.org/10.1093/cid/ciaa880
SN - 1058-4838
SN - 1537-6591
VL - 71
IS - 11
SP - 2933
EP - 2938
PB - Oxford Univ. Press
CY - Cary, NC
ER -
TY - JOUR
A1 - Siddiqi, Muhammad Ali
A1 - Dörr, Christian
A1 - Strydis, Christos
T1 - IMDfence
BT - architecting a secure protocol for implantable medical devices
JF - IEEE access
N2 - Over the past decade, focus on the security and privacy aspects of implantable medical devices (IMDs) has intensified, driven by the multitude of cybersecurity vulnerabilities found in various existing devices. However, due to their strict computational, energy, and physical constraints, conventional security protocols are not directly applicable to IMDs. Custom-tailored schemes have been proposed instead which, however, fail to cover the full spectrum of security features that modern IMDs and their ecosystems so critically require. In this paper, we propose IMDfence, a security protocol for IMD ecosystems that provides a comprehensive yet practical security portfolio, which includes availability, non-repudiation, access control, entity authentication, remote monitoring, and system scalability. The protocol also allows emergency access that results in the graceful degradation of offered services without compromising security and patient safety. The performance of the security protocol as well as its feasibility and impact on modern IMDs are extensively analyzed and evaluated. We find that IMDfence achieves the above security requirements at less than a 7% increase in total IMD energy consumption and increases of less than 14 ms and 9 kB in system delay and memory footprint, respectively.
KW - protocols
KW - implants
KW - authentication
KW - ecosystems
KW - remote monitoring
KW - scalability
KW - authentication protocol
KW - battery-depletion attack
KW - battery
KW - DoS
KW - denial-of-service attack
KW - IMD
KW - implantable medical device
KW - non-repudiation
KW - smart card
KW - zero-power defense
Y1 - 2020
U6 - https://doi.org/10.1109/ACCESS.2020.3015686
SN - 2169-3536
VL - 8
SP - 147948
EP - 147964
PB - Institute of Electrical and Electronics Engineers
CY - Piscataway
ER -
TY - THES
A1 - Shaabani, Nuhad
T1 - On discovering and incrementally updating inclusion dependencies
N2 - In today's world, many applications produce large amounts of data at an enormous rate. Analyzing such datasets for metadata is indispensable for effectively understanding, storing, querying, manipulating, and mining them. Metadata summarizes technical properties of a dataset, which range from basic statistics to complex structures describing data dependencies. One type of dependency is the inclusion dependency (IND), which expresses subset relationships between attributes of datasets. Inclusion dependencies are therefore important for many data management applications, such as data integration, query optimization, schema redesign, and integrity checking. Thus, the discovery of inclusion dependencies in unknown or legacy datasets is at the core of any data profiling effort.
For exhaustively detecting all INDs in large datasets, we developed S-indd++, a new algorithm that eliminates the shortcomings of existing IND-detection algorithms and significantly outperforms them. S-indd++ is based on a novel concept of attribute clustering for efficiently deriving INDs. Inferring INDs from our attribute clustering eliminates all redundant operations incurred by other algorithms. S-indd++ is also based on a novel partitioning strategy that enables discarding a large number of candidates in early phases of the discovery process. Moreover, S-indd++ does not require a partition to fit into main memory, which is a highly desirable property in the face of ever-growing datasets. S-indd++ reduces the runtime of the state-of-the-art approach by up to 50%.
None of the approaches for discovering INDs is appropriate for application on dynamic datasets; they cannot update the INDs after an update of the dataset without reprocessing it entirely. To this end, we developed the first approach for incrementally updating INDs in frequently changing datasets. We achieved this by reducing the problem of incrementally updating INDs to that of incrementally updating the attribute clustering, from which all INDs are efficiently derivable. We realized the update of the clusters by designing new operations to be applied to the clusters after every data update. The incremental update of INDs reduces the time of a complete rediscovery by up to 99.999%.
All existing algorithms for discovering n-ary INDs are based on the principle of candidate generation: they generate candidates and test their validity against the given data instance. The major disadvantage of this technique is the exponentially growing number of database accesses, in the form of SQL queries, required for validation. We devised Mind2, the first approach for discovering n-ary INDs without candidate generation. Mind2 is based on a new mathematical framework, developed in this thesis, for computing the maximum INDs from which all other n-ary INDs are derivable. The experiments showed that Mind2 is significantly more scalable and effective than hypergraph-based algorithms.
N2 - Many applications produce large amounts of data at a rapid pace. Profiling such datasets for their metadata is indispensable for their effective management and analysis. Metadata summarizes technical properties of a dataset, ranging from simple statistics to complex structures that describe data dependencies. One form of such dependencies are inclusion dependencies (INDs), which express subset relationships between attributes of datasets. This makes INDs important for many applications such as data integration, query optimization, schema design, and integrity checking. The discovery of INDs in unknown datasets is therefore a central task of data profiling.
I developed a new algorithm called S-indd++ for IND discovery in large datasets. S-indd++ eliminates the deficiencies of existing IND discovery algorithms and is therefore more performant. S-indd++ computes INDs very efficiently based on a new clustering of the attributes. S-indd++ also applies a new partitioning method that enables discarding a large number of candidates in early phases of the discovery process. Moreover, S-indd++ does not require a data partition to fit completely into main memory. S-indd++ reduces the runtime of IND discovery by up to 50%.
None of the IND discovery algorithms is suitable for application on dynamic data. To this end, I developed the first method for the incremental update of INDs in frequently changing data. I achieved this by reducing the problem of incrementally updating INDs to that of incrementally updating the attribute clustering, from which INDs are efficiently derivable. I realized the update of the clusters by designing new operations that are applied to the clusters after every update of the data. The incremental update of INDs reduces the time of static IND discovery by up to 99.999%.
All existing algorithms for n-ary IND discovery are based on the principle of candidate generation. The main disadvantage of this method is the exponentially growing number of SQL queries required to validate the candidates. To this end, I developed Mind2, the first algorithm for n-ary IND discovery without candidate generation. Mind2 is based on a new mathematical framework for computing the maximum INDs, from which all other n-ary INDs are derivable. The experiments showed that Mind2 is substantially more scalable and effective than the hypergraph-based algorithms.
T2 - Beitrag zur Entdeckung und inkrementellen Aktualisierung von Inklusionsabhängigkeiten
KW - Inclusion Dependency
KW - Data Profiling
KW - Data Mining
KW - Algorithms
KW - Inclusion Dependency Discovery
KW - Incrementally Inclusion Dependencies Discovery
KW - Metadata Discovery
KW - S-indd++
KW - Mind2
KW - Change Data Capture
KW - Incremental Discovery
KW - Big Data
KW - Data Integration
KW - Foreign Keys
KW - Dynamic Data
KW - Foreign Keys Discovery
Y1 - 2020
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-471862
ER -
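The unary inclusion dependencies at the heart of the thesis abstract above have a simple subset semantics, sketched here as a naive validator. This is for illustration only; S-indd++ avoids exactly this quadratic pairwise testing via its attribute clustering, and the table and column names are hypothetical:

```python
# Naive sketch of unary inclusion dependency (IND) discovery: an IND
# R.A ⊆ S.B holds when every value of column A also appears in column
# B. Illustrative only -- real algorithms such as S-indd++ avoid this
# pairwise subset testing; table/column names are hypothetical.

def discover_unary_inds(tables):
    """tables: {table: {column: list_of_values}} -> sorted IND strings."""
    # Collect every attribute with its distinct value set.
    attrs = {(t, c): set(vals)
             for t, cols in tables.items() for c, vals in cols.items()}
    inds = []
    for lhs, lvals in attrs.items():
        for rhs, rvals in attrs.items():
            if lhs != rhs and lvals <= rvals:  # subset test
                inds.append(f"{lhs[0]}.{lhs[1]} ⊆ {rhs[0]}.{rhs[1]}")
    return sorted(inds)
```

Scalable discovery algorithms share scans across attributes instead of materializing and comparing distinct-value sets for every attribute pair.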
TY - JOUR
A1 - Schlosser, Rainer
T1 - Risk-sensitive control of Markov decision processes
BT - a moment-based approach with target distributions
JF - Computers & operations research : and their applications to problems of world concern
N2 - In many revenue management applications, risk-averse decision-making is crucial. In dynamic settings, however, it is challenging to find the right balance between maximizing expected rewards and minimizing various kinds of risk. Existing approaches use utility functions, chance constraints, or (conditional) value-at-risk considerations to influence the distribution of rewards in a preferred way. Nevertheless, common techniques are not flexible enough and are typically numerically complex. In our model, we exploit the fact that a distribution is characterized by its mean and higher moments. We present a multi-valued dynamic programming heuristic to compute risk-sensitive feedback policies that are able to directly control the moments of future rewards. Our approach is based on recursive formulations of higher moments and does not require an extension of the state space. Finally, we propose a self-tuning algorithm that identifies feedback policies approximating predetermined (risk-sensitive) target distributions. We illustrate the effectiveness and flexibility of our approach for different dynamic pricing scenarios.
KW - risk aversion
KW - Markov decision process
KW - dynamic programming
KW - dynamic
KW - pricing
KW - heuristics
Y1 - 2020
U6 - https://doi.org/10.1016/j.cor.2020.104997
SN - 0305-0548
VL - 123
PB - Elsevier
CY - Oxford
ER -
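The recursive moment formulation mentioned in the abstract above can be illustrated on a minimal reward process. This is a toy sketch under simplifying assumptions (i.i.d. Bernoulli rewards and no state dependence), not the paper's Markov decision process model:

```python
# Toy sketch of recursive reward moments (simplifying assumptions:
# i.i.d. Bernoulli(p) step rewards, no state dependence -- not the
# paper's MDP model).

def reward_moments(T, p):
    """First and second moment (E[S], E[S^2]) of the total reward S
    collected over T steps, computed by backward recursion."""
    m1, m2 = 0.0, 0.0  # moments of the reward still to come
    for _ in range(T):
        # E[(r + S')^2] = E[r^2] + 2*E[r]*E[S'] + E[S'^2]
        # (step reward r independent of the remaining sum S')
        m2 = p * (1 + 2 * m1) + m2
        m1 = p + m1
    return m1, m2
```

A risk-sensitive policy can then trade off the mean m1 against the variance m2 - m1**2 when comparing actions, which is the kind of direct moment control the paper's heuristic targets.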
TY - JOUR
A1 - Schlosser, Rainer
T1 - Scalable relaxation techniques to solve stochastic dynamic multi-product pricing problems with substitution effects
JF - Journal of revenue and pricing management
N2 - In many businesses, firms sell different types of products that share mutual substitution effects in demand. Computing effective pricing strategies is challenging, as the sales probabilities of each of a firm's products can also be affected by the prices of potential substitutes. In this paper, we analyze stochastic dynamic multi-product pricing models for the sale of perishable goods. To circumvent the limitations of time-consuming optimal solutions for highly complex models, we propose different relaxation techniques, which reduce the size of critical model components, such as the state space, the action space, or the set of potential sales events. Our heuristics are able to decrease the size of those components by forming corresponding clusters and using subsets of representative elements. Using numerical examples, we verify that our heuristics make it possible to dramatically reduce the computation time while still obtaining close-to-optimal expected profits. Further, we show that our heuristics are (i) flexible, (ii) scalable, and (iii) can be arbitrarily combined in a mutually supportive way.
KW - multi-product pricing
KW - substitution effects
KW - data-driven demand
KW - dynamic
KW - programming
KW - heuristics
Y1 - 2020
U6 - https://doi.org/10.1057/s41272-020-00249-z
SN - 1476-6930
SN - 1477-657X
VL - 20
IS - 1
SP - 54
EP - 65
PB - Palgrave Macmillan
CY - Basingstoke
ER -
TY - JOUR
A1 - Schirmer, Philipp
A1 - Papenbrock, Thorsten
A1 - Koumarelas, Ioannis
A1 - Naumann, Felix
T1 - Efficient discovery of matching dependencies
JF - ACM transactions on database systems : TODS
N2 - Matching dependencies (MDs) are data profiling results that are often used for data integration, data cleaning, and entity matching. They are a generalization of functional dependencies (FDs), matching similar rather than identical elements. Because their discovery is very difficult, existing profiling algorithms either find only small subsets of all MDs or are limited to small datasets.
We focus on the efficient discovery of all interesting MDs in real-world datasets. For this purpose, we propose HyMD, a novel MD discovery algorithm that finds all minimal, non-trivial MDs within given similarity boundaries. The algorithm extracts the exact similarity thresholds for the individual MDs from the data instead of using predefined similarity thresholds. For this reason, it is the first approach to solve the MD discovery problem in an exact and truly complete way. If needed, the algorithm can, however, enforce certain properties on the reported MDs, such as disjointness and minimum support, to focus the discovery on such results that are actually required by downstream use cases. HyMD is technically a hybrid approach that combines the two most popular dependency discovery strategies in related work: lattice traversal and inference from record pairs. Despite the additional effort of finding exact similarity thresholds for all MD candidates, the algorithm is still able to efficiently process large datasets, e.g., datasets larger than 3 GB.
KW - matching dependencies
KW - functional dependencies
KW - dependency discovery
KW - data profiling
KW - data matching
KW - entity resolution
KW - similarity measures
Y1 - 2020
U6 - https://doi.org/10.1145/3392778
SN - 0362-5915
SN - 1557-4644
VL - 45
IS - 3
PB - Association for Computing Machinery
CY - New York
ER -
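What a single matching dependency asserts can be made concrete with a small validity check. This is a hypothetical illustration, not the HyMD discovery algorithm; the attributes, the similarity measure, and the thresholds are invented for the example:

```python
from difflib import SequenceMatcher

# Hypothetical illustration of what one matching dependency (MD)
# asserts -- NOT the HyMD algorithm: whenever two records' lhs values
# (e.g. names) are similar above a threshold, their rhs values (e.g.
# zip codes) must also be similar above another threshold.

def sim(a, b):
    """String similarity in [0, 1] via difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio()

def md_holds(records, lhs, rhs, lhs_min, rhs_min):
    """Check the MD  sim(lhs) >= lhs_min => sim(rhs) >= rhs_min
    on every pair of records."""
    for i, r in enumerate(records):
        for s in records[i + 1:]:
            if sim(r[lhs], s[lhs]) >= lhs_min and sim(r[rhs], s[rhs]) < rhs_min:
                return False  # this record pair violates the dependency
    return True
```

HyMD goes the other way around: rather than validating fixed thresholds, it extracts the exact similarity thresholds for each minimal MD from the data itself.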
TY - JOUR
A1 - Rose, Robert
A1 - Hölzle, Katharina
A1 - Björk, Jennie
T1 - More than a quarter century of creativity and innovation management
BT - the journal's characteristics, evolution, and a look ahead
JF - Creativity and innovation management
N2 - When this journal was founded in 1992 by Tudor Rickards and Susan Moger, no academic outlet was available that addressed issues at the intersection of creativity and innovation. Growing from zero to 1,163 records, and from the new kid on the block to one of the leading journals in creativity and innovation management, has been quite a journey, and we would like to reflect on the past 28 years and the intellectual and conceptual structure of Creativity and Innovation Management (CIM). Specifically, we highlight milestones and influential articles, identify how key journal characteristics evolved, outline the (co-)authorship structure, and finally map the thematic landscape of CIM by means of a text-mining analysis. This study represents the first systematic and comprehensive assessment of the journal's published body of knowledge and helps in understanding the journal's influence on the creativity and innovation management community. We conclude by discussing future topics and paths of the journal, as well as limitations of our approach.
KW - anniversary
KW - bibliometrics
KW - creativity and innovation management
KW - science mapping
Y1 - 2020
U6 - https://doi.org/10.1111/caim.12361
SN - 0963-1690
SN - 1467-8691
VL - 29
IS - 1
SP - 5
EP - 20
PB - Wiley-Blackwell
CY - Oxford
ER -
TY - JOUR
A1 - Rezaei, Mina
A1 - Näppi, Janne J.
A1 - Lippert, Christoph
A1 - Meinel, Christoph
A1 - Yoshida, Hiroyuki
T1 - Generative multi-adversarial network for striking the right balance in abdominal image segmentation
JF - International journal of computer assisted radiology and surgery
N2 - Purpose:
The identification of abnormalities that are relatively rare within otherwise normal anatomy is a major challenge for deep learning in the semantic segmentation of medical images. The small number of samples of the minority classes in the training data makes the learning of optimal classification challenging, while the more frequently occurring samples of the majority class hamper the generalization of the classification boundary between infrequently occurring target objects and classes. In this paper, we developed a novel generative multi-adversarial network, called Ensemble-GAN, for mitigating this class imbalance problem in the semantic segmentation of abdominal images.
Method:
The Ensemble-GAN framework is composed of a single-generator and a multi-discriminator variant for handling the class imbalance problem to provide a better generalization than existing approaches. The ensemble model aggregates the estimates of multiple models by training from different initializations and losses from various subsets of the training data. The single generator network analyzes the input image as a condition to predict a corresponding semantic segmentation image by use of feedback from the ensemble of discriminator networks. To evaluate the framework, we trained our framework on two public datasets with different imbalance ratios and imaging modalities: the Chaos 2019 and the LiTS 2017.
Result:
In terms of the F1 score, the accuracies of the semantic segmentation of healthy spleen, liver, and left and right kidneys were 0.93, 0.96, 0.90 and 0.94, respectively. The overall F1 scores for simultaneous segmentation of the lesions and liver were 0.83 and 0.94, respectively.
Conclusion:
The proposed Ensemble-GAN framework demonstrated outstanding performance in the semantic segmentation of medical images in comparison with other approaches on popular abdominal imaging benchmarks. The Ensemble-GAN has the potential to segment abdominal images more accurately than human experts.
KW - imbalanced learning
KW - generative multi-discriminative networks
KW - semantic
KW - segmentation
KW - abdominal imaging
Y1 - 2020
U6 - https://doi.org/10.1007/s11548-020-02254-4
SN - 1861-6410
SN - 1861-6429
VL - 15
IS - 11
SP - 1847
EP - 1858
PB - Springer
CY - Berlin
ER -
TY - JOUR
A1 - Oosthoek, Kris
A1 - Doerr, Christian
T1 - Cyber threat intelligence
BT - a product without a process?
JF - International journal of intelligence and counterintelligence
Y1 - 2020
U6 - https://doi.org/10.1080/08850607.2020.1780062
SN - 0885-0607
SN - 1521-0561
VL - 34
IS - 2
SP - 300
EP - 315
PB - Taylor & Francis
CY - London
ER -
TY - JOUR
A1 - Olamoyegun, Michael Adeyemi
A1 - Raimi, Taiwo Hassan
A1 - Ala, Oluwabukola Ayodele
A1 - Fadare, Joseph Olusesan
T1 - Mobile phone ownership and willingness to receive mHealth services among patients with diabetes mellitus in South-West, Nigeria
JF - Pan African medical journal : PAMJ
N2 - Introduction:
mobile phone technology is increasingly used to overcome traditional barriers limiting access to diabetes care. This study evaluated mobile phone ownership and willingness to receive and pay for mobile phone-based diabetic services among people with diabetes in South-West, Nigeria.
Methods:
two hundred and fifty-nine patients with diabetes were consecutively recruited from three tertiary health institutions in South-West, Nigeria. A questionnaire was used to evaluate mobile phone ownership and willingness to receive and pay for mobile phone-based diabetic health care services via voice calls and text messaging.
Results:
97.3% of patients owned a mobile phone, with 38.9% and 61.1% owning a smartphone and a basic phone, respectively. Males were significantly more willing to receive mobile phone-based health services than females (81.1% vs 68.1%, p=0.025), as were married participants compared with unmarried ones (77.4% vs 57.1%, p=0.036). Voice calls (41.3%) and text messages (32.4%) were the most preferred modes of receiving diabetes-related health education, while social media (3.1%) and email (1.5%) were the least preferred. Almost three-quarters of participants who owned a mobile phone (72.6%) were willing to receive mobile phone-based diabetes health services. The educational status of patients (adjusted odds ratio [AOR]: 1.7; 95% CI: 1.6 to 2.1), glucometer possession (AOR: 2.0; 95% CI: 1.9 to 2.1), and type of mobile phone owned (AOR: 2.9; 95% CI: 2.8 to 5.0) were significantly associated with the willingness to receive mobile phone-based diabetic services.
Conclusion:
the majority of study participants owned mobile phones and would be willing to receive and pay for diabetes-related healthcare delivery services provided the cost is minimal and affordable.
KW - mobile phone
KW - ownership
KW - diabetes
KW - healthcare
KW - Nigeria
Y1 - 2020
U6 - https://doi.org/10.11604/pamj.2020.37.29.25174
SN - 1937-8688
VL - 37
PB - African Field Epidemiology Network (AFENET)
CY - Kampala, Uganda
ER -
TY - JOUR
A1 - Long, Xiang
A1 - de Melo, Gerard
A1 - He, Dongliang
A1 - Li, Fu
A1 - Chi, Zhizhen
A1 - Wen, Shilei
A1 - Gan, Chuang
T1 - Purely attention based local feature integration for video classification
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
N2 - Recently, substantial research effort has focused on how to apply CNNs or RNNs to better capture temporal patterns in videos, so as to improve the accuracy of video classification. In this paper, we investigate the potential of purely attention-based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the output of multiple attention units applied in parallel, and introduce a shifting operation to capture more diverse signals. Experiments show that BAC can achieve excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for achieving a finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring the sequential relationships. To improve over BAC, we further propose the channel pyramid attention schema by splitting features into sub-features at multiple scales for coarse-to-fine sub-feature interaction modeling, and propose the temporal pyramid attention schema by dividing the feature sequences into ordered sub-sequences of multiple lengths to account for the sequential order. Our final model, pyramid×pyramid attention clusters (PPAC), combines both channel pyramid attention and temporal pyramid attention to focus on the most important sub-features, while also preserving the temporal information of the video. We demonstrate the effectiveness of PPAC on seven real-world video classification datasets. Our model achieves competitive results across all of these, showing that our proposed framework can consistently outperform the existing local feature integration methods across a range of different scenarios.
KW - Feature extraction
KW - Convolution
KW - Computational modeling
KW - Plugs
KW - Three-dimensional displays
KW - Task analysis
KW - Two dimensional displays
KW - Video classification
KW - action recognition
KW - attention mechanism
KW - computer vision
KW - Algorithms
KW - Neural Networks, Computer
Y1 - 2020
U6 - https://doi.org/10.1109/TPAMI.2020.3029554
SN - 0162-8828
SN - 1939-3539
SN - 2160-9292
VL - 44
IS - 4
SP - 2140
EP - 2154
PB - Institute of Electrical and Electronics Engineers
CY - Los Alamitos
ER -
TY - JOUR
A1 - Lambers, Leen
A1 - Weber, Jens
T1 - Preface to the special issue on the 11th International Conference on Graph Transformation
JF - Journal of Logical and Algebraic Methods in Programming
N2 - This special issue contains extended versions of four selected papers from the 11th International Conference on Graph Transformation (ICGT 2018). The articles cover a tool for computing core graphs via SAT/SMT solvers (graph language definition), graph transformation through graph surfing in reaction systems (a new graph transformation formalism), the essence and initiality of conflicts in M-adhesive transformation systems, and a calculus of concurrent graph-rewriting processes (theory on conflicts and parallel independence).
KW - graph transformation
KW - graph languages
KW - conflicts and dependencies in concurrent graph rewriting
Y1 - 2020
U6 - https://doi.org/10.1016/j.jlamp.2020.100525
SN - 2352-2208
VL - 112
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Kötzing, Timo
A1 - Lagodzinski, Gregor J. A.
A1 - Lengler, Johannes
A1 - Melnichenko, Anna
T1 - Destructiveness of lexicographic parsimony pressure and alleviation by a concatenation crossover in genetic programming
JF - Theoretical computer science
N2 - For theoretical analyses, two specifics distinguish GP from many other areas of evolutionary computation: the variable-size representations, which in particular can yield bloat (i.e. the growth of individuals with redundant parts), and the role and realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover has received surprisingly little attention in this work.
We analyze a simple crossover operator in combination with randomized local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); we denote the resulting algorithm Concatenation Crossover GP. We consider three variants of the well-studied MAJORITY test function, adding large plateaus in different ways to the fitness landscape and thus giving a test bed for analyzing the interplay of variation operators and bloat control mechanisms in a setting with local optima. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants independent of employing bloat control.
KW - genetic programming
KW - mutation
KW - theory
KW - run time analysis
Y1 - 2020
U6 - https://doi.org/10.1016/j.tcs.2019.11.036
SN - 0304-3975
VL - 816
SP - 96
EP - 113
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Kulahcioglu, Tugba
A1 - Melo, Gerard de
T1 - Affect-aware word clouds
JF - ACM transactions on interactive intelligent systems
N2 - Word clouds are widely used for non-analytic purposes, such as introducing a topic to students, or creating a gift with personally meaningful text. Surveys show that users prefer tools that yield word clouds with a stronger emotional impact. Fonts and color palettes are powerful typographical signals that may determine this impact. Typically, these signals are assigned randomly, or expected to be chosen by the users. We present an affect-aware font and color palette selection methodology that aims to facilitate more informed choices. We infer associations of fonts with a set of eight affects, and evaluate the resulting data in a series of user studies both on individual words as well as in word clouds. Relying on a recent study to procure affective color palettes, we carry out a similar user study to understand the impact of color choices on word clouds. Our findings suggest that both fonts and color palettes are powerful tools contributing to the affects evoked by a word cloud. The experiments further confirm that the novel datasets we propose are successful in enabling this. We also find that, for the majority of the affects, both signals need to be congruent to create a stronger impact. Based on this data, we implement a prototype that allows users to specify a desired affect and recommends congruent fonts and color palettes for the word cloud.
KW - affective interfaces
KW - word clouds
KW - typography
KW - color palettes
Y1 - 2020
U6 - https://doi.org/10.1145/3370928
SN - 2160-6455
SN - 2160-6463
VL - 10
IS - 4
PB - Association for Computing Machinery
CY - New York, NY
ER -
TY - JOUR
A1 - Krejca, Martin S.
A1 - Witt, Carsten
T1 - Lower bounds on the run time of the Univariate Marginal Distribution Algorithm on OneMax
JF - Theoretical computer science : the journal of the EATCS
N2 - The Univariate Marginal Distribution Algorithm (UMDA) - a popular estimation-of-distribution algorithm - is studied from a run time perspective. On the classical OneMax benchmark function on bit strings of length n, a lower bound of Ω(λ + μ√n + n log n), where μ and λ are algorithm-specific parameters, on its expected run time is proved. This is the first direct lower bound on the run time of UMDA. It is stronger than the bounds that follow from general black-box complexity theory and is matched by the run time of many evolutionary algorithms. The results are obtained through advanced analyses of the stochastic change of the frequencies of bit values maintained by the algorithm, including carefully designed potential functions. These techniques may prove useful in advancing the field of run time analysis for estimation-of-distribution algorithms in general.
KW - estimation-of-distribution algorithm
KW - run time analysis
KW - lower bound
Y1 - 2020
U6 - https://doi.org/10.1016/j.tcs.2018.06.004
SN - 0304-3975
SN - 1879-2294
VL - 832
SP - 143
EP - 165
PB - Elsevier
CY - Amsterdam [u.a.]
ER -
TY - JOUR
A1 - Koumarelas, Ioannis
A1 - Papenbrock, Thorsten
A1 - Naumann, Felix
T1 - MDedup
BT - duplicate detection with matching dependencies
JF - Proceedings of the VLDB Endowment
N2 - Duplicate detection is an integral part of data cleaning and serves to identify multiple representations of the same real-world entities in (relational) datasets. Existing duplicate detection approaches are effective, but they are also hard to parameterize or require a lot of pre-labeled training data. Both parameterization and pre-labeling are at least domain-specific if not dataset-specific, which is a problem if a new dataset needs to be cleaned.
For this reason, we propose a novel, rule-based and fully automatic duplicate detection approach that is based on matching dependencies (MDs). Our system uses automatically discovered MDs, various dataset features, and known gold standards to train a model that selects MDs as duplicate detection rules. Once trained, the model can select useful MDs for duplicate detection on any new dataset. To increase the generally low recall of MD-based data cleaning approaches, we propose an additional boosting step. Our experiments show that this approach reaches up to 94% F-measure and 100% precision on our evaluation datasets, which are good numbers considering that the system does not require domain or target data-specific configuration.
Y1 - 2020
U6 - https://doi.org/10.14778/3377369.3377379
SN - 2150-8097
VL - 13
IS - 5
SP - 712
EP - 725
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Koumarelas, Ioannis
A1 - Jiang, Lan
A1 - Naumann, Felix
T1 - Data preparation for duplicate detection
JF - Journal of data and information quality : (JDIQ)
N2 - Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal. However, the particular error of duplicate records, where multiple records refer to the same entity, is usually eliminated independently with specialized techniques. Our work is the first to bring these two areas together by applying data preparation operations under a systematic approach prior to performing duplicate detection.
Our process workflow can be summarized as follows: It begins with the user providing as input a sample of the gold standard, the actual dataset, and optionally some constraints to domain-specific data preparations, such as address normalization. The preparation selection operates in two consecutive phases. First, to vastly reduce the search space of ineffective data preparations, decisions are made based on the improvement or worsening of pair similarities. Second, using the remaining data preparations an iterative leave-one-out classification process removes preparations one by one and determines the redundant preparations based on the achieved area under the precision-recall curve (AUC-PR). Using this workflow, we manage to improve the results of duplicate detection up to 19% in AUC-PR.
KW - data preparation
KW - data wrangling
KW - record linkage
KW - duplicate detection
KW - similarity measures
Y1 - 2020
U6 - https://doi.org/10.1145/3377878
SN - 1936-1955
SN - 1936-1963
VL - 12
IS - 3
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Kossmann, Jan
A1 - Schlosser, Rainer
T1 - Self-driving database systems
BT - a conceptual approach
JF - Distributed and parallel databases
N2 - Challenges for self-driving database systems, which tune their physical design and configuration autonomously, are manifold: Such systems have to anticipate future workloads, find robust configurations efficiently, and incorporate knowledge gained by previous actions into later decisions. We present a component-based framework for self-driving database systems that enables database integration and development of self-managing functionality with low overhead by relying on separation of concerns. By keeping the components of the framework reusable and exchangeable, experiments are simplified, which promotes further research in that area. Moreover, to optimize multiple mutually dependent features, e.g., index selection and compression configurations, we propose a linear programming (LP) based algorithm to derive an efficient tuning order automatically. Afterwards, we demonstrate the applicability and scalability of our approach with reproducible examples.
KW - database systems
KW - self-driving
KW - recursive tuning
KW - workload prediction
KW - robustness
Y1 - 2020
U6 - https://doi.org/10.1007/s10619-020-07288-w
SN - 0926-8782
SN - 1573-7578
VL - 38
IS - 4
SP - 795
EP - 817
PB - Springer
CY - Dordrecht
ER -
TY - JOUR
A1 - Kossmann, Jan
A1 - Halfpap, Stefan
A1 - Jankrift, Marcel
A1 - Schlosser, Rainer
T1 - Magic mirror in my hand, which is the best in the land?
BT - an experimental evaluation of index selection algorithms
JF - Proceedings of the VLDB Endowment
N2 - Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and challenging index selection problem range from metadata-based simple heuristics, over sophisticated multi-step algorithms, to approaches that yield optimal results. The main challenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) a large number of possible combinations resulting from workloads containing many queries and massive schemata with possibly thousands of attributes.
In this work, we describe and analyze eight index selection algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granularity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses, infer insights for index selection in general and each approach individually, before we give recommendations on when to use which approach.
Y1 - 2020
U6 - https://doi.org/10.14778/3407790.3407832
SN - 2150-8097
VL - 13
IS - 11
SP - 2382
EP - 2395
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Kaitoua, Abdulrahman
A1 - Rabl, Tilmann
A1 - Markl, Volker
T1 - A distributed data exchange engine for polystores
JF - Information technology : methods and applications of informatics and information technology
JF - Information technology : Methoden und innovative Anwendungen der Informatik und Informationstechnik
N2 - There is an increasing interest in fusing data from heterogeneous sources. Combining data sources increases the utility of existing datasets, generating new information and creating services of higher quality. A central issue in working with heterogeneous sources is data migration: In order to share and process data in different engines, resource intensive and complex movements and transformations between computing engines, services, and stores are necessary.
Muses is a distributed, high-performance data migration engine that is able to interconnect distributed data stores by forwarding, transforming, repartitioning, or broadcasting data among distributed engines' instances in a resource-, cost-, and performance-adaptive manner. As such, it performs seamless information sharing across all participating resources in a standard, modular manner. We show an overall improvement of 30 % for pipelining jobs across multiple engines, even when we count the overhead of Muses in the execution time. This performance gain implies that Muses can be used to optimise large pipelines that leverage multiple engines.
KW - distributed systems
KW - data migration
KW - data transformation
KW - big data
KW - engine
KW - data integration
Y1 - 2020
U6 - https://doi.org/10.1515/itit-2019-0037
SN - 1611-2776
SN - 2196-7032
VL - 62
IS - 3-4
SP - 145
EP - 156
PB - De Gruyter
CY - Berlin
ER -
TY - JOUR
A1 - Isailović, Dušan
A1 - Stojanovic, Vladeta
A1 - Trapp, Matthias
A1 - Richter, Rico
A1 - Hajdin, Rade
A1 - Döllner, Jürgen Roland Friedrich
T1 - Bridge damage
BT - detection, IFC-based semantic enrichment and visualization
JF - Automation in construction : an international research journal
N2 - Building Information Modeling (BIM) representations of bridges enriched by inspection data will add tremendous value to future Bridge Management Systems (BMSs). This paper presents an approach for point cloud-based detection of spalling damage, as well as integrating damage components into a BIM via semantic enrichment of an as-built Industry Foundation Classes (IFC) model. An approach for generating the as-built BIM, geometric reconstruction of detected damage point clusters and semantic-enrichment of the corresponding IFC model is presented. Multiview-classification is used and evaluated for the detection of spalling damage features. The semantic enrichment of as-built IFC models is based on injecting classified and reconstructed damage clusters back into the as-built IFC, thus generating an accurate as-is IFC model compliant to the BMS inspection requirements.
KW - damage detection
KW - building information modeling
KW - 3D point clouds
KW - multiview classification
KW - bridge management systems
Y1 - 2020
U6 - https://doi.org/10.1016/j.autcon.2020.103088
SN - 0926-5805
SN - 1872-7891
VL - 112
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Hameed, Mazhar
A1 - Naumann, Felix
T1 - Data Preparation
BT - a survey of commercial tools
JF - SIGMOD record
N2 - Raw data are often messy: they follow different encodings, records are not well structured, values do not adhere to patterns, etc. Such data are in general not fit to be ingested by downstream applications, such as data analytics tools, or even by data management systems. The act of obtaining information from raw data relies on some data preparation process. Data preparation is integral to advanced data analysis and data management, not only for data science but for any data-driven applications. Existing data preparation tools are operational and useful, but there is still room for improvement and optimization. With increasing data volume and its messy nature, the demand for prepared data increases day by day.
To cater to this demand, companies and researchers are developing techniques and tools for data preparation. To better understand the available data preparation systems, we have conducted a survey to investigate (1) prominent data preparation tools, (2) distinctive tool features, (3) the need for preliminary data processing even for these tools and, (4) features and abilities that are still lacking. We conclude with an argument in support of automatic and intelligent data preparation beyond traditional and simplistic techniques.
KW - data quality
KW - data cleaning
KW - data wrangling
Y1 - 2020
U6 - https://doi.org/10.1145/3444831.3444835
SN - 0163-5808
SN - 1943-5835
VL - 49
IS - 3
SP - 18
EP - 29
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Hacker, Philipp
A1 - Krestel, Ralf
A1 - Grundmann, Stefan
A1 - Naumann, Felix
T1 - Explainable AI under contract and tort law
BT - legal incentives and technical challenges
JF - Artificial intelligence and law
N2 - This paper shows that the law, in subtle ways, may set hitherto unrecognized incentives for the adoption of explainable machine learning applications. In doing so, we make two novel contributions. First, on the legal side, we show that to avoid liability, professional actors, such as doctors and managers, may soon be legally compelled to use explainable ML models. We argue that the importance of explainability reaches far beyond data protection law, and crucially influences questions of contractual and tort liability for the use of ML models. To this effect, we conduct two legal case studies, in medical and corporate merger applications of ML. As a second contribution, we discuss the (legally required) trade-off between accuracy and explainability and demonstrate the effect in a technical case study in the context of spam classification.
KW - explainability
KW - explainable AI
KW - interpretable machine learning
KW - contract
KW - law
KW - tort law
KW - explainability-accuracy trade-off
KW - medical malpractice
KW - corporate takeovers
Y1 - 2020
U6 - https://doi.org/10.1007/s10506-020-09260-6
SN - 0924-8463
SN - 1572-8382
VL - 28
IS - 4
SP - 415
EP - 439
PB - Springer
CY - Dordrecht
ER -
TY - JOUR
A1 - Ghahremani, Sona
A1 - Giese, Holger
A1 - Vogel, Thomas
T1 - Improving scalability and reward of utility-driven self-healing for large dynamic architectures
JF - ACM transactions on autonomous and adaptive systems
N2 - Self-adaptation can be realized in various ways. Rule-based approaches prescribe the adaptation to be executed if the system or environment satisfies certain conditions. They result in scalable solutions but often with merely satisfying adaptation decisions. In contrast, utility-driven approaches determine optimal decisions by using an often costly optimization, which typically does not scale for large problems. We propose a rule-based and utility-driven adaptation scheme that achieves the benefits of both directions such that the adaptation decisions are optimal, whereas the computation scales by avoiding an expensive optimization. We use this adaptation scheme for architecture-based self-healing of large software systems. For this purpose, we define the utility for large dynamic architectures of such systems based on patterns that define issues the self-healing must address. Moreover, we use pattern-based adaptation rules to resolve these issues. Using a pattern-based scheme to define the utility and adaptation rules allows us to compute the impact of each rule application on the overall utility and to realize an incremental and efficient utility-driven self-healing. In addition to formally analyzing the computational effort and optimality of the proposed scheme, we thoroughly demonstrate its scalability and optimality in terms of reward in comparative experiments with a static rule-based approach as a baseline and a utility-driven approach using a constraint solver. These experiments are based on different failure profiles derived from real-world failure logs. We also investigate the impact of different failure profile characteristics on the scalability and reward to evaluate the robustness of the different approaches.
KW - self-healing
KW - adaptation rules
KW - architecture-based adaptation
KW - utility
KW - reward
KW - scalability
KW - performance
KW - failure profile model
Y1 - 2020
U6 - https://doi.org/10.1145/3380965
SN - 1556-4665
SN - 1556-4703
VL - 14
IS - 3
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Ghahremani, Sona
A1 - Giese, Holger
T1 - Evaluation of self-healing systems
BT - An analysis of the state-of-the-art and required improvements
JF - Computers
N2 - Evaluating the performance of self-adaptive systems is challenging due to their interactions with often highly dynamic environments. In the specific case of self-healing systems, the performance evaluations of self-healing approaches and their parameter tuning rely on the considered characteristics of failure occurrences and the resulting interactions with the self-healing actions. In this paper, we first study the state-of-the-art for evaluating the performances of self-healing systems by means of a systematic literature review. We provide a classification of different input types for such systems and analyse the limitations of each input type. A main finding is that the employed inputs are often not sophisticated regarding the considered characteristics for failure occurrences. To further study the impact of the identified limitations, we present experiments demonstrating that wrong assumptions regarding the characteristics of the failure occurrences can result in large performance prediction errors, disadvantageous design-time decisions concerning the selection of alternative self-healing approaches, and disadvantageous deployment-time decisions concerning parameter tuning. Furthermore, the experiments indicate that employing multiple alternative input characteristics can help with reducing the risk of premature disadvantageous design-time decisions.
KW - self-healing
KW - failure model
KW - performance
KW - simulation
KW - evaluation
Y1 - 2020
U6 - https://doi.org/10.3390/computers9010016
SN - 2073-431X
VL - 9
IS - 1
PB - MDPI
CY - Basel
ER -
TY - JOUR
A1 - Döllner, Jürgen Roland Friedrich
T1 - Geospatial artificial intelligence
BT - potentials of machine learning for 3D point clouds and geospatial digital twins
JF - Journal of photogrammetry, remote sensing and geoinformation science : PFG : Photogrammetrie, Fernerkundung, Geoinformation
N2 - Artificial intelligence (AI) is fundamentally changing the way IT solutions are implemented and operated across all application domains, including the geospatial domain. This contribution outlines AI-based techniques for 3D point clouds and geospatial digital twins as generic components of geospatial AI. First, we briefly reflect on the term "AI" and outline technology developments needed to apply AI to IT solutions, seen from a software engineering perspective. Next, we characterize 3D point clouds as a key category of geodata and their role in creating the basis for geospatial digital twins; we explain the feasibility of machine learning (ML) and deep learning (DL) approaches for 3D point clouds. In particular, we argue that 3D point clouds can be seen as a corpus with similar properties as natural language corpora and formulate a "Naturalness Hypothesis" for 3D point clouds. In the main part, we introduce a workflow for interpreting 3D point clouds based on ML/DL approaches that derive domain-specific and application-specific semantics for 3D point clouds without having to create explicit spatial 3D models or explicit rule sets. Finally, examples are shown how ML/DL enables us to efficiently build and maintain base data for geospatial digital twins such as virtual 3D city models, indoor models, or building information models.
N2 - Georäumliche Künstliche Intelligenz: Potentiale des Maschinellen Lernens für 3D-Punktwolken und georäumliche digitale Zwillinge. Künstliche Intelligenz (KI) verändert grundlegend die Art und Weise, wie IT-Lösungen in allen Anwendungsbereichen, einschließlich dem Geoinformationsbereich, implementiert und betrieben werden. In diesem Beitrag stellen wir KI-basierte Techniken für 3D-Punktwolken als einen Baustein der georäumlichen KI vor. Zunächst werden kurz der Begriff "KI" und die technologischen Entwicklungen skizziert, die für die Anwendung von KI auf IT-Lösungen aus der Sicht der Softwaretechnik erforderlich sind. Als nächstes charakterisieren wir 3D-Punktwolken als Schlüsselkategorie von Geodaten und ihre Rolle für den Aufbau von räumlichen digitalen Zwillingen; wir erläutern die Machbarkeit der Ansätze für Maschinelles Lernen (ML) und Deep Learning (DL) in Bezug auf 3D-Punktwolken. Insbesondere argumentieren wir, dass 3D-Punktwolken als Korpus mit ähnlichen Eigenschaften wie natürlichsprachliche Korpusse gesehen werden können, und formulieren eine "Natürlichkeitshypothese" für 3D-Punktwolken. Im Hauptteil stellen wir einen Workflow zur Interpretation von 3D-Punktwolken auf der Grundlage von ML/DL-Ansätzen vor, die eine domänenspezifische und anwendungsspezifische Semantik für 3D-Punktwolken ableiten, ohne explizite räumliche 3D-Modelle oder explizite Regelsätze erstellen zu müssen. Abschließend wird an Beispielen gezeigt, wie ML/DL es ermöglichen, Basisdaten für räumliche digitale Zwillinge, wie z.B. für virtuelle 3D-Stadtmodelle, Innenraummodelle oder Gebäudeinformationsmodelle, effizient aufzubauen und zu pflegen.
KW - geospatial artificial intelligence
KW - machine learning
KW - deep learning
KW - 3D
KW - point clouds
KW - geospatial digital twins
KW - 3D city models
Y1 - 2020
U6 - https://doi.org/10.1007/s41064-020-00102-3
SN - 2512-2789
SN - 2512-2819
VL - 88
IS - 1
SP - 15
EP - 24
PB - Springer International Publishing
CY - Cham
ER -
TY - JOUR
A1 - Dreseler, Markus
A1 - Boissier, Martin
A1 - Rabl, Tilmann
A1 - Uflacker, Matthias
T1 - Quantifying TPC-H choke points and their optimizations
JF - Proceedings of the VLDB Endowment
N2 - TPC-H continues to be the most widely used benchmark for relational OLAP systems. It poses a number of challenges, also known as "choke points", which database systems have to solve in order to achieve good benchmark results. Examples include joins across multiple tables, correlated subqueries, and correlations within the TPC-H data set. Knowing the impact of such optimizations helps in developing optimizers as well as in interpreting TPC-H results across database systems.
This paper provides a systematic analysis of choke points and their optimizations. It complements previous work on TPC-H choke points by providing a quantitative discussion of their relevance. It focuses on eleven choke points where the optimizations are beneficial independently of the database system. Of these, the flattening of subqueries and the placement of predicates have the biggest impact. Three queries (Q2, Q17, and Q21) are strongly influenced by the choice of an efficient query plan; three others (Q1, Q13, and Q18) are less influenced by plan optimizations and more dependent on an efficient execution engine.
Y1 - 2020
U6 - https://doi.org/10.14778/3389133.3389138
SN - 2150-8097
VL - 13
IS - 8
SP - 1206
EP - 1220
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Doerr, Benjamin
A1 - Kötzing, Timo
A1 - Lagodzinski, Gregor J. A.
A1 - Lengler, Johannes
T1 - The impact of lexicographic parsimony pressure for ORDER/MAJORITY on the run time
JF - Theoretical computer science : the journal of the EATCS
N2 - While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat, that is, the unnecessary growth of solution lengths, which may slow down the optimization process. So far, the mathematical runtime analysis could not deal well with bloat and required explicit assumptions limiting bloat.
In this paper, we provide the first mathematical runtime analysis of a GP algorithm that does not require any assumptions on the bloat. Previous performance guarantees were only proven conditionally for runs in which no strong bloat occurs. Together with improved analyses for the case with bloat restrictions our results show that such assumptions on the bloat are not necessary and that the algorithm is efficient without explicit bloat control mechanism.
More specifically, we analyzed the performance of the (1 + 1) GP on the two benchmark functions ORDER and MAJORITY. When using lexicographic parsimony pressure as bloat control, we show a tight runtime estimate of O(T_init + n log n) iterations both for ORDER and MAJORITY. For the case without bloat control, the bounds O(T_init log T_init + n (log n)^3) and Ω(T_init + n log n) (and Ω(T_init log T_init) for n = 1) hold for MAJORITY.
KW - genetic programming
KW - bloat control
KW - theory
KW - runtime analysis
Y1 - 2020
U6 - https://doi.org/10.1016/j.tcs.2020.01.011
SN - 0304-3975
SN - 1879-2294
VL - 816
SP - 144
EP - 168
PB - Elsevier
CY - Amsterdam [u.a.]
ER -
TY - JOUR
A1 - Doerr, Benjamin
A1 - Krejca, Martin S.
T1 - Significance-based estimation-of-distribution algorithms
JF - IEEE transactions on evolutionary computation
N2 - Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that create a probabilistic model of the solution space, which is updated iteratively, based on the quality of the solutions sampled according to the model. As previous works show, this iteration-based perspective can lead to erratic updates of the model, in particular, to bit-frequencies approaching a random boundary value. In order to overcome this problem, we propose a new EDA based on the classic compact genetic algorithm (cGA) that takes into account a longer history of samples and updates its model only with respect to information which it classifies as statistically significant. We prove that this significance-based cGA (sig-cGA) optimizes the commonly regarded benchmark functions OneMax (OM), LeadingOnes, and BinVal all in quasilinear time, a result shown for no other EDA or evolutionary algorithm so far. For the recently proposed stable compact genetic algorithm (an EDA that tries to prevent erratic model updates by imposing a bias toward the uniformly distributed model), we prove that it optimizes OM only in a time exponential in its hypothetical population size. Similarly, we show that the convex search algorithm cannot optimize OM in polynomial time.
KW - heuristic algorithms
KW - sociology
KW - statistics
KW - history
KW - probabilistic
KW - logic
KW - benchmark testing
KW - genetic algorithms
KW - estimation-of-distribution algorithm (EDA)
KW - run time analysis
KW - theory
Y1 - 2020
U6 - https://doi.org/10.1109/TEVC.2019.2956633
SN - 1089-778X
SN - 1941-0026
VL - 24
IS - 6
SP - 1025
EP - 1034
PB - Institute of Electrical and Electronics Engineers
CY - New York, NY
ER -
TY - JOUR
A1 - Cseh, Ágnes
A1 - Fleiner, Tamas
T1 - The complexity of cake cutting with unequal shares
JF - ACM transactions on algorithms : TALG
N2 - An unceasing problem of our prevailing society is the fair division of goods. The problem of proportional cake cutting focuses on dividing a heterogeneous and divisible resource, the cake, among n players who value pieces according to their own measure function. The goal is to assign each player a not necessarily connected part of the cake that the player evaluates at least as much as her proportional share.
In this article, we investigate the problem of proportional division with unequal shares, where each player is entitled to receive a predetermined portion of the cake. Our main contribution is threefold. First we present a protocol for integer demands, which delivers a proportional solution in fewer queries than all known protocols. By giving a matching lower bound, we then show that our protocol is asymptotically the fastest possible. Finally, we turn to irrational demands and solve the proportional cake cutting problem by reducing it to the same problem with integer demands only. All results remain valid in a highly general cake cutting model, which can be of independent interest.
KW - Cake cutting
KW - fair division
KW - unequal shares
KW - proportional division
Y1 - 2020
U6 - https://doi.org/10.1145/3380742
SN - 1549-6325
SN - 1549-6333
VL - 16
IS - 3
PB - Association for Computing Machinery
CY - New York
ER -
TY - JOUR
A1 - Cseh, Agnes
A1 - Heeger, Klaus
T1 - The stable marriage problem with ties and restricted edges
JF - Discrete optimization
N2 - In the stable marriage problem, a set of men and a set of women are given, each of whom has a strictly ordered preference list over the acceptable agents in the opposite class. A matching is called stable if it is not blocked by any pair of agents, who mutually prefer each other to their respective partner. Ties in the preferences allow for three different definitions for a stable matching: weak, strong and super-stability. Besides this, acceptable pairs in the instance can be restricted in their ability to block a matching or to be part of it, which again generates three categories of restrictions on acceptable pairs. Forced pairs must be in a stable matching, forbidden pairs must not appear in it, and lastly, free pairs cannot block any matching.
Our computational complexity study targets the existence of a stable solution for each of the three stability definitions, in the presence of each of the three types of restricted pairs. We solve all cases that were still open. As a byproduct, we also derive that the maximum size weakly stable matching problem is hard even in very dense graphs, which may be of independent interest.
KW - stable matchings
KW - restricted edges
KW - complexity
Y1 - 2020
U6 - https://doi.org/10.1016/j.disopt.2020.100571
SN - 1572-5286
SN - 1873-636X
VL - 36
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Chauhan, Ankit
A1 - Friedrich, Tobias
A1 - Rothenberger, Ralf
T1 - Greed is good for deterministic scale-free networks
JF - Algorithmica : an international journal in computer science
N2 - Large real-world networks typically follow a power-law degree distribution. To study such networks, numerous random graph models have been proposed. However, real-world networks are not drawn at random. Therefore, Brach et al. (27th symposium on discrete algorithms (SODA), pp 1306-1325, 2016) introduced two natural deterministic conditions: (1) a power-law upper bound on the degree distribution (PLB-U) and (2) power-law neighborhoods, that is, the degree distribution of neighbors of each vertex is also upper bounded by a power law (PLB-N). They showed that many real-world networks satisfy both properties and exploit them to design faster algorithms for a number of classical graph problems. We complement their work by showing that some well-studied random graph models exhibit both of the mentioned PLB properties. PLB-U and PLB-N hold with high probability for Chung-Lu Random Graphs and Geometric Inhomogeneous Random Graphs and almost surely for Hyperbolic Random Graphs. As a consequence, all results of Brach et al. also hold with high probability or almost surely for those random graph classes. In the second part, we study three classical NP-hard optimization problems on PLB networks. It is known that on general graphs with maximum degree Delta, a greedy algorithm, which chooses nodes in the order of their degree, only achieves an Omega(ln Delta)-approximation for Minimum Vertex Cover and Minimum Dominating Set, and an Omega(Delta)-approximation for Maximum Independent Set. We prove that the PLB-U property with beta > 2 suffices for the greedy approach to achieve a constant-factor approximation for all three problems. We also show that these problems are APX-hard even if PLB-U, PLB-N, and an additional power-law lower bound on the degree distribution hold. Hence, a PTAS cannot be expected unless P = NP. Furthermore, we prove that all three problems are in MAX SNP if the PLB-U property holds.
KW - random graphs
KW - deterministic properties
KW - power-law
KW - approximation
KW - APX-hardness
Y1 - 2020
U6 - https://doi.org/10.1007/s00453-020-00729-z
SN - 0178-4617
SN - 1432-0541
VL - 82
IS - 11
SP - 3338
EP - 3389
PB - Springer
CY - New York
ER -
TY - JOUR
A1 - Casel, Katrin
A1 - Fernau, Henning
A1 - Gaspers, Serge
A1 - Gras, Benjamin
A1 - Schmid, Markus L.
T1 - On the complexity of the smallest grammar problem over fixed alphabets
JF - Theory of computing systems
N2 - In the smallest grammar problem, we are given a word w and we want to compute a preferably small context-free grammar G for the singleton language {w} (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is NP-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless P = NP. We settle the long-standing open problem whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains NP-complete (and its optimisation version is APX-hard), even if the alphabet is fixed and has a size of at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size-measure of grammars that is commonly used in the literature (i.e., a size measure also taking the number of rules into account), and it also allows us to conclude that even computing the number of rules required by a smallest grammar is a hard problem. On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields W[1]-hardness. Furthermore, we present an O(3^|w|) exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i.e., grammars for which only the start rule contains nonterminals on the right side; thus investigating the impact of the "hierarchical depth" of grammars on the complexity of the smallest grammar problem.
In this regard, we obtain for 1-level grammars similar, but slightly stronger results.
KW - grammar-based compression
KW - smallest grammar problem
KW - straight-line programs
KW - NP-completeness
KW - exact exponential-time algorithms
Y1 - 2020
U6 - https://doi.org/10.1007/s00224-020-10013-w
SN - 1432-4350
SN - 1433-0490
VL - 65
IS - 2
SP - 344
EP - 409
PB - Springer
CY - New York
ER -
TY - JOUR
A1 - Bohn, Nicolai
A1 - Kundisch, Dennis
T1 - What are we talking about when we talk about technology pivots?
BT - a Delphi study
JF - Information & management
N2 - Technology pivots were designed to help digital startups make adjustments to the technology underpinning their products and services. While academia and the media make liberal use of the term "technology pivot," they rarely align themselves to Ries' foundational conceptualization. Recent research suggests that a more granulated conceptualization of technology pivots is required. To scientifically derive a comprehensive conceptualization, we conduct a Delphi study with a panel of 38 experts drawn from academia and practice to explore their understanding of "technology pivots." Our study thus makes an important contribution to advance the seminal work by Ries on technology pivots.
KW - digital startup
KW - lean startup approach
KW - technology pivot
KW - conceptualization
KW - Delphi study
Y1 - 2020
U6 - https://doi.org/10.1016/j.im.2020.103319
SN - 0378-7206
SN - 1872-7530
VL - 57
IS - 6
PB - Elsevier
CY - Amsterdam
ER -
TY - JOUR
A1 - Birnick, Johann
A1 - Bläsius, Thomas
A1 - Friedrich, Tobias
A1 - Naumann, Felix
A1 - Papenbrock, Thorsten
A1 - Schirneck, Friedrich Martin
T1 - Hitting set enumeration with partial information for unique column combination discovery
JF - Proceedings of the VLDB Endowment
N2 - Unique column combinations (UCCs) are a fundamental concept in relational databases. They identify entities in the data and support various data management activities. Still, UCCs are usually not explicitly defined and need to be discovered. State-of-the-art data profiling algorithms are able to efficiently discover UCCs in moderately sized datasets, but they tend to fail on large and, in particular, on wide datasets due to run time and memory limitations.
In this paper, we introduce HPIValid, a novel UCC discovery algorithm that implements a faster and more resource-saving search strategy. HPIValid models the metadata discovery as a hitting set enumeration problem in hypergraphs. In this way, it combines efficient discovery techniques from data profiling research with the most recent theoretical insights into enumeration algorithms. Our evaluation shows that HPIValid is not only orders of magnitude faster than related work, it also has a much smaller memory footprint.
Y1 - 2020
U6 - https://doi.org/10.14778/3407790.3407824
SN - 2150-8097
VL - 13
IS - 11
SP - 2270
EP - 2283
PB - Association for Computing Machinery
CY - [New York, NY]
ER -
TY - JOUR
A1 - Barkowsky, Matthias
A1 - Giese, Holger
T1 - Hybrid search plan generation for generalized graph pattern matching
JF - Journal of logical and algebraic methods in programming
N2 - In recent years, the increased interest in application areas such as social networks has resulted in a rising popularity of graph-based approaches for storing and processing large amounts of interconnected data. To extract useful information from the growing network structures, efficient querying techniques are required.
In this paper, we propose an approach for graph pattern matching that allows a uniform handling of arbitrary constraints over the query vertices. Our technique builds on a previously introduced matching algorithm, which takes concrete host graph information into account to dynamically adapt the employed search plan during query execution. The dynamic algorithm is combined with an existing static approach for search plan generation, resulting in a hybrid technique which we further extend by a more sophisticated handling of filtering effects caused by constraint checks. We evaluate the presented concepts empirically based on an implementation for our graph pattern matching tool, the Story Diagram Interpreter, with queries and data provided by the LDBC Social Network Benchmark. Our results suggest that the hybrid technique may improve search efficiency in several cases, and rarely reduces efficiency.
KW - graph pattern matching
KW - search plan generation
Y1 - 2020
U6 - https://doi.org/10.1016/j.jlamp.2020.100563
SN - 2352-2208
VL - 114
PB - Elsevier
CY - New York
ER -
TY - JOUR
A1 - Adnan, Hassan Sami
A1 - Srsic, Amanda
A1 - Venticich, Pete Milos
A1 - Townend, David M.R.
T1 - Using AI for mental health analysis and prediction in school surveys
JF - European journal of public health
N2 - Background:
Childhood and adolescence are critical stages of life for mental health and well-being. Schools are a key setting for mental health promotion and illness prevention. One in five children and adolescents have a mental disorder, and about half of all mental disorders begin before the age of 14. Beneficial and explainable artificial intelligence can replace current paper-based and online approaches to school mental health surveys. This can enhance data acquisition, interoperability, data-driven analysis, trust, and compliance. This paper presents a model for using chatbots for non-obtrusive data collection and supervised machine learning models for data analysis, and discusses ethical considerations pertaining to the use of these models.
Methods:
For data acquisition, the proposed model uses chatbots which interact with students. The conversation log acts as the source of raw data for the machine learning. Pre-processing of the data is automated by filtering for keywords and phrases.
Existing survey results, obtained through current paper-based data collection methods, are evaluated by domain experts (health professionals). These can be used to create a test dataset to validate the machine learning models. Supervised learning can then be deployed to classify specific behaviour and mental health patterns.
Results:
We present a model that can be used to improve upon current paper-based data collection and manual data analysis methods. An open-source GitHub repository contains the necessary tools and components of this model. Privacy is respected through rigorous observance of confidentiality and data protection requirements. Critical reflection on these ethical and legal aspects is included in the project.
Conclusions:
This model strengthens mental health surveillance in schools. The same tools and components could be applied to other public health data. Future extensions of this model could also incorporate unsupervised learning to find clusters and patterns of unknown effects.
KW - ethics
KW - artificial intelligence
KW - adolescent
KW - child
KW - confidentiality
KW - health personnel
KW - mental disorders
KW - mental health
KW - personal satisfaction
KW - privacy
KW - school (environment)
KW - statutes and laws
KW - public health medicine
KW - surveillance
KW - medical
KW - prevention
KW - datasets
KW - machine learning
KW - supervised machine learning
KW - data analysis
Y1 - 2020
U6 - https://doi.org/10.1093/eurpub/ckaa165.336
SN - 1101-1262
SN - 1464-360X
VL - 30
SP - V125
EP - V125
PB - Oxford Univ. Press
CY - Oxford [u.a.]
ER -
TY - JOUR
A1 - Adnan, Hassan Sami
A1 - Matthews, Sam
A1 - Hackl, M.
A1 - Das, P. P.
A1 - Manaswini, Manisha
A1 - Gadamsetti, S.
A1 - Filali, Maroua
A1 - Owoyele, Babajide
A1 - Santuber, Joaquín
A1 - Edelman, Jonathan
T1 - Human centered AI design for clinical monitoring and data management
JF - European journal of public health : official journal of the European Public Health Association
N2 - In clinical settings, significant resources are spent on data collection and monitoring patients' health parameters to improve decision-making and provide better care. With increased digitization, the healthcare sector is shifting towards implementing digital technologies for data management and administration. New technologies offer better treatment opportunities and streamline clinical workflow, but their complexity can cause ineffectiveness, frustration, and errors. To address this, we believe digital solutions alone are not sufficient. Therefore, we take a human-centred design approach to AI development and apply systems engineering methods to identify system leverage points. We demonstrate how automation enables monitoring of clinical parameters, using existing non-intrusive sensor technology, resulting in more resources toward patient care. Furthermore, we provide a framework for the digitization of clinical data for integration with data management.
Y1 - 2020
U6 - https://doi.org/10.1093/eurpub/ckaa165.225
SN - 1101-1262
SN - 1464-360X
VL - 30
IS - 5
SP - V86
EP - V86
PB - Oxford Univ. Press
CY - Oxford
ER -