TY - CHAP A1 - Hartmann, Anika M. A1 - Kandil, Farid I. A1 - Steckhan, Nico A1 - Häupl, Thomas A1 - Kessler, Christian S. A1 - Michalsen, Andreas A1 - Koppold-Liebscher, Daniela A. T1 - Rheumatoid arthritis benefits from fasting and plant-based diet: an exploratory randomized controlled trial (NUTRIFAST) T2 - Annals of the rheumatic diseases Y1 - 2022 U6 - https://doi.org/10.1136/annrheumdis-2022-eular.452 SN - 0003-4967 SN - 1468-2060 VL - 81 SP - 558 EP - 559 PB - BMJ Publishing Group CY - London ER - TY - CHAP A1 - Masanneck, Lars A1 - Räuber, S. A1 - Gieseler, Pauline A1 - Ruck, T. A1 - Stern, Ariel Dora A1 - Meuth, S. G. A1 - Pawlitzki, M. T1 - Geography and a changing technology landscape: analysing coverage of German multiple sclerosis care networks and digital health technology adoption in multiple sclerosis trials T2 - Multiple sclerosis journal Y1 - 2022 U6 - https://doi.org/10.1177/13524585221123687 SN - 1352-4585 SN - 1477-0970 VL - 28 IS - Supplement 3 SP - 492 EP - 493 PB - Sage CY - London ER - TY - JOUR A1 - Chromik, Jonas A1 - Klopfenstein, Sophie Anne Ines A1 - Pfitzner, Bjarne A1 - Sinno, Zeena-Carola A1 - Arnrich, Bert A1 - Balzer, Felix A1 - Poncette, Akira-Sebastian T1 - Computational approaches to alleviate alarm fatigue in intensive care medicine: a systematic literature review JF - Frontiers in digital health N2 - For decades, patient monitoring technology has been used in the intensive care unit (ICU) to guide therapy and to alert staff when a vital sign leaves a predefined range. However, large numbers of technically false or clinically irrelevant alarms provoke alarm fatigue in staff, leading to desensitisation towards critical alarms. In this systematic review, we follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist to summarise scientific efforts that aimed to develop IT systems to reduce alarm fatigue in ICUs. In total, 69 peer-reviewed publications were included.
The majority of publications targeted the avoidance of technically false alarms, while the remainder focused on the prediction of patient deterioration or on alarm presentation. The investigated alarm types were mostly associated with heart rate or arrhythmia, followed by arterial blood pressure, oxygen saturation, and respiratory rate. Most publications focused on the development of software solutions, while some focused on wearables, smartphones, or head-mounted displays for delivering alarms to staff. The most commonly used statistical models were tree-based. In conclusion, we found strong evidence that alarm fatigue can be alleviated by IT-based solutions. However, future efforts should focus more on the avoidance of clinically non-actionable alarms, an effort that could be accelerated by improving data availability. KW - alarm fatigue KW - alarm management KW - alarm optimisation KW - intensive care unit KW - IT system KW - patient monitoring KW - ICU KW - critical care Y1 - 2022 U6 - https://doi.org/10.3389/fdgth.2022.843747 SN - 2673-253X VL - 4 PB - Frontiers Media CY - Lausanne ER - TY - JOUR A1 - Petzolt, Sophie A1 - Hölzle, Katharina A1 - Kullik, Oliver A1 - Gergeleit, Wiebke A1 - Radunski, Anne T1 - Organisational digital transformation of SMEs—development and application of a digital transformation maturity model for business model transformation JF - International journal of innovation in management N2 - One of the greatest challenges for incumbent organisations, especially small- and medium-sized enterprises (SMEs), is managing digital transformation driven by technological change. Incumbent organisations' responses to digital transformation have been extensively studied in the current literature. However, most research neglects digital transformation in SMEs, and there are hardly any validated measures for the maturity of digital transformation.
We present a holistic digital transformation maturity model based on an extensive literature review, qualitative computer-assisted data analysis, and empirical findings. The model focuses on the unique features and characteristics of small- and medium-sized enterprises. We demonstrated the practical applicability and relevance of the digital transformation maturity model in an extensive study involving various organisations, particularly German SMEs (n = 310). Organisations can use this model for an initial self-assessment and, through this process, gain a comprehensive understanding of the multiple forms of digital transformation. KW - organisational digital transformation KW - German Mittelstand KW - SMEs KW - maturity model KW - business model transformation KW - organizational change Y1 - 2022 U6 - https://doi.org/10.1142/S1363919622400175 SN - 1363-9196 SN - 1757-5877 VL - 26 IS - 3 PB - World Scientific Publ. CY - Singapore ER - TY - JOUR A1 - Andjelkovic, Marko A1 - Marjanovic, Milos A1 - Drasko, Bojan A1 - Calligaro, Cristiano A1 - Schrape, Oliver A1 - Gatti, Umberto A1 - Kuentzer, Felipe A. A1 - Ilic, Stefan A1 - Ristic, Goran A1 - Krstić, Miloš T1 - Analysis of single event transient effects in standard delay cells based on decoupling capacitors JF - Journal of circuits, systems, and computers : JCSC N2 - Single Event Transients (SETs), i.e., voltage glitches induced in combinational logic by the passage of energetic particles, represent an increasingly critical reliability threat for modern complementary metal oxide semiconductor (CMOS) integrated circuits (ICs) employed in space missions. In rad-hard ICs implemented with standard digital cells, special design techniques should be applied to reduce the Soft Error Rate (SER) due to SETs. To this end, it is essential to consider the SET robustness of individual standard cells.
Among the wide range of logic cells available in standard cell libraries, the standard delay cells (SDCs) implemented with skew-sized inverters are exceptionally vulnerable to SETs. In particular, the SET pulses induced in these cells may be hundreds of picoseconds longer than those in other standard cells. In this work, an alternative design of an SDC based on two inverters and two decoupling capacitors is introduced. Electrical simulations have shown that the propagation delay and SET robustness of the proposed delay cell are strongly influenced by the transistor sizes and supply voltage, while the impact of temperature is moderate. The proposed design is more tolerant to SETs than SDCs with skew-sized inverters, and occupies less area than hardening configurations based on partial and complete duplication. Due to its low transistor count (only six transistors), the proposed delay cell could also be used as a SET filter. KW - single event transients KW - standard delay cells KW - decoupling capacitors Y1 - 2022 U6 - https://doi.org/10.1142/S0218126622400072 SN - 0218-1266 SN - 1793-6454 VL - 31 IS - 18 PB - World Scientific CY - Singapore [u.a.] ER - TY - JOUR A1 - Brönneke, Jan Benedikt A1 - Müller, Jennifer A1 - Mouratis, Konstantinos A1 - Hagen, Julia A1 - Stern, Ariel Dora T1 - Regulatory, legal, and market aspects of smart wearables for cardiac monitoring JF - Sensors N2 - In the area of cardiac monitoring, the use of digitally driven technologies is on the rise. While the development of medical products is advancing rapidly, allowing for new use cases in cardiac monitoring and other areas, the regulatory and legal requirements that govern market access often evolve slowly, sometimes creating market barriers.
This article gives a brief overview of the existing clinical studies regarding the use of smart wearables in cardiac monitoring and provides insight into the main regulatory and legal aspects that need to be considered when such products are intended to be used in a health care setting. Based on this brief overview, the article elaborates on the specific requirements in the main areas of authorization/certification and reimbursement/compensation, as well as data protection and data security. Three case studies, covering the USA, Germany, and Belgium, are presented as examples of specific market access procedures. This article concludes that, despite the differences in specific requirements, market access pathways in most countries are characterized by a number of similarities, which should be considered early on in product development. The article also elaborates on how regulatory and legal requirements are currently being adapted for digitally driven wearables and proposes an ongoing evolution of these requirements to facilitate market access for beneficial medical technology in the future. KW - medical devices KW - regulation KW - market access KW - smart wearables Y1 - 2021 U6 - https://doi.org/10.3390/s21144937 SN - 1424-8220 VL - 21 IS - 14 PB - MDPI CY - Basel ER - TY - JOUR A1 - Owoyele, Babajide A1 - Trujillo, James A1 - de Melo, Gerard A1 - Pouw, Wim T1 - Masked-Piper: masking personal identities in visual recordings while preserving multimodal information JF - SoftwareX N2 - In this increasingly data-rich world, visual recordings of human behavior often cannot be shared due to privacy concerns. Consequently, data sharing in fields such as behavioral science, multimodal communication, and human movement research is often limited. In addition, in legal and other non-scientific contexts, privacy-related concerns may preclude the sharing of video recordings and thus remove the rich multimodal context that humans recruit to communicate.
Minimizing the risk of identity exposure while preserving critical behavioral information would maximize the utility of public resources (e.g., research grants) and of the time invested in audio-visual research. Here we present an open-source computer vision tool that masks the identities of humans while maintaining rich information about communicative body movements. Furthermore, this masking tool can easily be applied to many videos, leveraging computational tools to augment the reproducibility and accessibility of behavioral research. The tool is designed for researchers and practitioners engaged in kinematic and affective research. Application areas include teaching/education, communication and human movement research, CCTV, and legal contexts. KW - multimodal communication KW - kinematic research KW - data privacy KW - open science KW - masking KW - research reproducibility Y1 - 2022 U6 - https://doi.org/10.1016/j.softx.2022.101236 SN - 2352-7110 VL - 20 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Datta, Suparno A1 - Morassi Sasso, Ariane A1 - Kiwit, Nina A1 - Bose, Subhronil A1 - Nadkarni, Girish A1 - Miotto, Riccardo A1 - Böttinger, Erwin P. T1 - Predicting hypertension onset from longitudinal electronic health records with deep learning JF - JAMIA Open N2 - Objective: Hypertension has long been recognized as one of the most important predisposing factors for cardiovascular diseases and mortality. In recent years, machine learning methods have shown potential in diagnostic and predictive approaches in chronic diseases. Electronic health records (EHRs) have emerged as a reliable source of longitudinal data. The aim of this study is to predict the onset of hypertension using modern deep learning (DL) architectures, specifically long short-term memory (LSTM) networks, and longitudinal EHRs. Materials and Methods: We compare this approach to the best-performing models reported in previous works, particularly XGBoost, applied to aggregated features.
Our work is based on data from 233 895 adult patients from a large health system in the United States. We divided our population into 2 distinct longitudinal datasets based on the diagnosis date. To ensure generalization to unseen data, we trained our models on the first dataset (dataset A "train and validation") using cross-validation, and then applied the models to a second dataset (dataset B "test") to assess their performance. We also experimented with 2 different time windows before the onset of hypertension and evaluated their impact on model performance. Results: With the LSTM network, we achieved an area under the receiver operating characteristic curve of 0.98 in the "train and validation" dataset A and 0.94 in the "test" dataset B for a prediction time window of 1 year. Lipid disorders, type 2 diabetes, and renal disorders were found to be associated with incident hypertension. Conclusion: These findings show that DL models based on temporal EHR data can improve the identification of patients at high risk of hypertension and of the corresponding driving factors. In the long term, this work may support identifying individuals who are at high risk for developing hypertension and facilitate earlier intervention to prevent it. KW - machine learning KW - electronic health records KW - deep learning KW - hypertension Y1 - 2022 U6 - https://doi.org/10.1093/jamiaopen/ooac097 SN - 2574-2531 VL - 5 IS - 4 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - Bläsius, Thomas A1 - Friedrich, Tobias A1 - Katzmann, Maximilian A1 - Meyer, Ulrich A1 - Penschuck, Manuel A1 - Weyand, Christopher T1 - Efficiently generating geometric inhomogeneous and hyperbolic random graphs JF - Network Science N2 - Hyperbolic random graphs (HRGs) and geometric inhomogeneous random graphs (GIRGs) are two similar generative network models that were designed to resemble complex real-world networks.
In particular, they have a power-law degree distribution with controllable exponent β and high clustering that can be controlled via the temperature T. We present the first implementation of an efficient GIRG generator running in expected linear time. Besides varying temperatures, it also supports underlying geometries of higher dimensions. It is capable of generating graphs with ten million edges in under a second on commodity hardware. The algorithm can be adapted to HRGs. Our resulting implementation is the fastest sequential HRG generator, despite the fact that we support non-zero temperatures. Though non-zero temperatures are crucial for many applications, most existing generators are restricted to T = 0. We also support parallelization, although this is not the focus of this paper. Furthermore, our generators draw from the correct probability distribution, that is, they involve no approximation. Besides the generators themselves, we also provide an efficient algorithm to determine the non-trivial dependency between the average degree of the resulting graph and the input parameters of the GIRG model. This makes it possible to specify the desired expected average degree as input. Moreover, we investigate the differences between HRGs and GIRGs, shedding new light on the nature of the relation between the two models. Although HRGs represent, in a certain sense, a special case of the GIRG model, we find that a straightforward inclusion does not hold in practice. However, the difference is negligible for most use cases. KW - hyperbolic random graphs KW - geometric inhomogeneous random graph Y1 - 2022 U6 - https://doi.org/10.1017/nws.2022.32 SN - 2050-1242 SN - 2050-1250 VL - 10 IS - 4 SP - 361 EP - 380 PB - Cambridge Univ. Press
CY - New York ER - TY - JOUR A1 - Tan, Jing A1 - Khalili, Ramin A1 - Karl, Holger A1 - Hecker, Artur T1 - Multi-agent reinforcement learning for long-term network resource allocation through auction: a V2X application JF - Computer communications : the international journal for the computer and telecommunications industry N2 - We formulate the offloading of computational tasks from a dynamic group of mobile agents (e.g., cars) as decentralized decision-making among autonomous agents. We design an interaction mechanism that incentivizes such agents to align private and system goals by balancing competition and cooperation. In the static case, the mechanism provably has Nash equilibria with optimal resource allocation. In a dynamic environment, the complete information this mechanism requires is impossible to obtain. For such environments, we propose a novel multi-agent online learning algorithm that learns with partial, delayed, and noisy state information, thus greatly reducing the information needed. Our algorithm is also capable of learning from long-term and sparse reward signals with varying delay. Empirical results from the simulation of a V2X application confirm that, through learning, agents with the learning algorithm significantly improve both system and individual performance, reducing offloading failure rate, communication overhead, and load variation by up to 30% while increasing computation resource utilization and fairness. Results also confirm the algorithm's good convergence and generalization properties in different environments. KW - offloading KW - distributed systems KW - reinforcement learning KW - decentralized decision-making Y1 - 2022 U6 - https://doi.org/10.1016/j.comcom.2022.07.047 SN - 0140-3664 SN - 1873-703X VL - 194 SP - 333 EP - 347 PB - Elsevier Science CY - Amsterdam [u.a.]
ER - TY - JOUR A1 - Reimann, Max A1 - Buchheim, Benito A1 - Semmo, Amir A1 - Döllner, Jürgen A1 - Trapp, Matthias T1 - Controlling strokes in fast neural style transfer using content transforms JF - The visual computer : international journal of computer graphics N2 - Fast style transfer methods have recently gained popularity in art-related applications as they make generalized real-time stylization of images practicable. However, with regard to the interactive adjustment of style elements, they are mostly limited to one-shot stylizations. In particular, expressive control over stroke sizes or stroke orientations remains an open challenge. To address this, we propose a novel stroke-adjustable fast style transfer network that enables simultaneous control over stroke size and intensity, and allows a wider range of expressive editing than current approaches by utilizing the scale-variance of convolutional neural networks. Furthermore, we introduce a network-agnostic approach for style-element editing by applying reversible input transformations that can adjust strokes in the stylized output. In this way, stroke orientations can be adjusted, and warping-based effects can be applied to stylistic elements, such as swirls or waves. To demonstrate the real-world applicability of our approach, we present StyleTune, a mobile app for interactive editing of neural style transfers at multiple levels of control. Our app allows stroke adjustments on both a global and a local level. It furthermore implements an on-device patch-based upsampling step that enables users to achieve results with high output fidelity and resolutions of more than 20 megapixels. Our approach allows users to art-direct their creations and achieve results that are not possible with current style transfer applications.
Y1 - 2022 U6 - https://doi.org/10.1007/s00371-077-07518-x SN - 0178-2789 SN - 1432-2315 VL - 38 SP - 4019 EP - 4033 PB - Springer CY - New York ER - TY - JOUR A1 - Wenig, Phillip A1 - Schmidl, Sebastian A1 - Papenbrock, Thorsten T1 - TimeEval: a benchmarking toolkit for time series anomaly detection algorithms JF - Proceedings of the VLDB Endowment N2 - Detecting anomalous subsequences in time series is an important task in time series analytics because it enables the identification of special events, such as production faults, delivery bottlenecks, system defects, or heart flicker. Consequently, many algorithms have been developed for the automatic detection of such anomalous patterns. However, the enormous number of approaches (more than 158 as of today), the lack of properly labeled test data, and the complexity of time series anomaly benchmarking have led to a situation in which choosing the best detection technique for a given anomaly detection task is difficult. In this demonstration, we present TimeEval, an extensible, scalable, and automatic benchmarking toolkit for time series anomaly detection algorithms. TimeEval includes an extensive data generator and supports both interactive and batch evaluation scenarios. With our novel toolkit, we aim to ease the evaluation effort and help the community to provide more meaningful evaluations. Y1 - 2022 U6 - https://doi.org/10.14778/3554821.3554873 SN - 2150-8097 VL - 15 IS - 12 SP - 3678 EP - 3681 PB - Association for Computing Machinery CY - New York, NY ER - TY - JOUR A1 - Simonini, Giovanni A1 - Zecchini, Luca A1 - Bergamaschi, Sonia A1 - Naumann, Felix T1 - Entity resolution on-demand JF - Proceedings of the VLDB Endowment N2 - Entity Resolution (ER) aims to identify and merge records that refer to the same real-world entity. ER is typically employed as an expensive cleaning step on the entire dataset before it is consumed.
Yet, determining which entities are useful once cleaned depends solely on the user's application, which may need only a fraction of them. For instance, when dealing with Web data, we would like to be able to filter the entities of interest gathered from multiple sources without cleaning the entire, continuously growing data. Similarly, when querying data lakes, we want to transform data on demand and return the results in a timely manner, a fundamental requirement of ELT (Extract-Load-Transform) pipelines. We propose BrewER, a framework to evaluate SQL SP queries on dirty data while progressively returning results as if they were issued on cleaned data. BrewER tries to focus the cleaning effort on one entity at a time, following an ORDER BY predicate. Thus, it inherently supports top-k and stop-and-resume execution. For a wide range of applications, a significant amount of resources can be saved. We exhaustively evaluate and show the efficacy of BrewER on four real-world datasets. Y1 - 2022 U6 - https://doi.org/10.14778/3523210.3523226 SN - 2150-8097 VL - 15 IS - 7 SP - 1506 EP - 1518 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Benson, Lawrence A1 - Papke, Leon A1 - Rabl, Tilmann T1 - PerMA-Bench: benchmarking persistent memory access JF - Proceedings of the VLDB Endowment N2 - The byte-addressability and persistence of persistent memory (PMem), at DRAM-like speed and with SSD-like capacity, have the potential to cause a major performance shift in database storage systems. With the availability of Intel Optane DC Persistent Memory, initial benchmarks have evaluated the performance of real PMem hardware. However, these results apply to only a single server, and it is not yet clear how workloads compare across different PMem servers. In this paper, we propose PerMA-Bench, a configurable benchmark framework that allows users to evaluate the bandwidth, latency, and operations per second for customizable database-related PMem access.
Based on PerMA-Bench, we perform an extensive evaluation of PMem performance across four different server configurations, containing both first- and second-generation Optane, with additional parameters such as DIMM power budget and number of DIMMs per server. We validate our results with existing systems and show the impact of low-level design choices. We conduct a price-performance comparison showing that, while there are large differences across Optane DIMMs, PMem is generally competitive with DRAM. We discuss our findings and identify eight general and implementation-specific aspects that influence PMem performance and should be considered in future work to improve PMem-aware designs. Y1 - 2022 U6 - https://doi.org/10.14778/3551793.3551807 SN - 2150-8097 VL - 15 IS - 11 SP - 2463 EP - 2476 PB - Association for Computing Machinery CY - New York, NY ER - TY - JOUR A1 - Konak, Orhan A1 - van de Water, Robin A1 - Döring, Valentin A1 - Fiedler, Tobias A1 - Liebe, Lucas A1 - Masopust, Leander A1 - Postnov, Kirill A1 - Sauerwald, Franz A1 - Treykorn, Felix A1 - Wischmann, Alexander A1 - Gjoreski, Hristijan A1 - Luštrek, Mitja A1 - Arnrich, Bert T1 - HARE BT - unifying the human activity recognition engineering workflow JF - Sensors N2 - Sensor-based human activity recognition is becoming ever more prevalent. The increasing importance of distinguishing human movements, particularly in healthcare, coincides with the advent of increasingly compact sensors. A complex sequence of individual steps currently characterizes the activity recognition pipeline. It involves separate data collection, preparation, and processing steps, resulting in a heterogeneous and fragmented process. To address these challenges, we present a comprehensive framework, HARE, which seamlessly integrates all necessary steps.
HARE offers synchronized data collection and labeling, integrated pose estimation for data anonymization, a multimodal classification approach, and a novel method for determining optimal sensor placement to enhance classification results. Additionally, our framework incorporates real-time activity recognition with on-device model adaptation capabilities. To validate the effectiveness of our framework, we conducted extensive evaluations using diverse datasets, including a dataset we collected ourselves that focuses on nursing activities. Our results show that HARE’s multimodal and on-device trained model outperforms conventional single-modal and offline variants. Furthermore, our vision-based approach for optimal sensor placement yields results comparable to those of the trained model. Our work advances the field of sensor-based human activity recognition by introducing a comprehensive framework that streamlines data collection and classification while offering a novel method for determining optimal sensor placement. KW - human activity recognition KW - multimodal classification KW - privacy preservation KW - real-time classification KW - sensor placement Y1 - 2023 U6 - https://doi.org/10.3390/s23239571 SN - 1424-8220 VL - 23 IS - 23 PB - MDPI CY - Basel ER - TY - JOUR A1 - Zhou, Lin A1 - Fischer, Eric A1 - Brahms, Clemens Markus A1 - Granacher, Urs A1 - Arnrich, Bert T1 - DUO-GAIT BT - a gait dataset for walking under dual-task and fatigue conditions with inertial measurement units JF - Scientific data N2 - In recent years, there has been growing interest in developing and evaluating gait analysis algorithms based on inertial measurement unit (IMU) data, which has important applications in sports, disease assessment, and rehabilitation. Multi-tasking and physical fatigue are two relevant aspects of daily-life gait monitoring, but there is a lack of publicly available datasets to support the development and testing of methods using a mobile IMU setup.
We present a dataset consisting of 6-minute walks by sixteen healthy adults under single-task (walking only) and dual-task (walking while performing a cognitive task) conditions in unfatigued and fatigued states. Specifically, nine IMUs were placed on the head, chest, lower back, wrists, legs, and feet to record data under each of the above-mentioned conditions. The dataset also includes a rich set of spatio-temporal gait parameters that capture pace, symmetry, and variability, as well as additional study-related information to support further analysis. This dataset can serve as a foundation for future research on gait monitoring in free-living environments. Y1 - 2023 U6 - https://doi.org/10.1038/s41597-023-02391-w SN - 2052-4463 VL - 10 IS - 1 PB - Nature Publ. Group CY - London ER - TY - JOUR A1 - Anders, Christoph A1 - Arnrich, Bert T1 - Wearable electroencephalography and multi-modal mental state classification: a systematic literature review JF - Computers in biology and medicine : an international journal N2 - Background: Wearable multi-modal time-series classification applications outperform their best uni-modal counterparts and hold great promise. Electroencephalography is a modality that directly measures electrical correlates of brain activity. Due to varying noise sources, different key brain regions, key frequency bands, and signal characteristics such as non-stationarity, techniques for data pre-processing and classification algorithms are task-dependent. Method: Here, a systematic literature review on mental state classification for wearable electroencephalography is presented. Four search terms in different combinations were used for an in-title search. The search was executed on the 29th of June 2022 across Google Scholar, PubMed, IEEE Xplore, and ScienceDirect. The 76 most relevant publications were set into context as the current state of the art in mental state time-series classification.
Results: Pre-processing techniques, features, and time-series classification models were analyzed. Across publications, a window length of one second was most commonly chosen for classification, and spectral features were used most often. The achieved performance of each time-series classification model is analyzed, finding that linear discriminant analysis, decision trees, and k-nearest neighbors models outperform support-vector machines by a factor of up to 1.5. A historical analysis indicates future trends, and under-reported aspects relevant to practical applications are discussed. Conclusions: Five main conclusions are given, covering the use of the available head area for electrode placement, the most and least frequently used features and time-series classification model architectures, baseline reporting practices, and the explainability and interpretability of deep learning. The importance of a 'test battery' assessing the influence of data pre-processing and multi-modality on time-series classification performance is emphasized. KW - wearable electroencephalography KW - systematic literature review KW - mental state classification KW - time-series classification KW - affective computing KW - data pre-processing KW - feature extraction KW - reproducibility KW - multi-modality KW - filtering Y1 - 2022 U6 - https://doi.org/10.1016/j.compbiomed.2022.106088 SN - 0010-4825 SN - 1879-0534 VL - 150 PB - Elsevier Science CY - Amsterdam [u.a.]
ER - TY - JOUR A1 - Nordmeyer, Sarah A1 - Kraus, Milena A1 - Ziehm, Matthias A1 - Kirchner, Marieluise A1 - Schafstedde, Marie A1 - Kelm, Marcus A1 - Niquet, Sylvia A1 - Stephen, Mariet Mathew A1 - Baczko, Istvan A1 - Knosalla, Christoph A1 - Schapranow, Matthieu-Patrick A1 - Dittmar, Gunnar A1 - Gotthardt, Michael A1 - Falcke, Martin A1 - Regitz-Zagrosek, Vera A1 - Kuehne, Titus A1 - Mertins, Philipp T1 - Disease- and sex-specific differences in patients with heart valve disease BT - a proteome study JF - Life Science Alliance N2 - Pressure overload in patients with aortic valve stenosis and volume overload in mitral valve regurgitation trigger specific forms of cardiac remodeling; however, little is known about similarities and differences in myocardial proteome regulation. We performed proteome profiling of 75 human left ventricular myocardial biopsies (aortic stenosis = 41, mitral regurgitation = 17, and controls = 17) using high-resolution tandem mass spectrometry alongside clinical and hemodynamic parameter acquisition. In patients of both disease groups, proteins related to the extracellular matrix (ECM) and cytoskeleton were more abundant, whereas those related to energy metabolism and proteostasis were less abundant compared with controls. In addition, disease group-specific and sex-specific differences were observed. Male patients with aortic stenosis showed more proteins related to fibrosis and fewer related to energy metabolism, whereas female patients showed a strong reduction in proteostasis-related proteins. Clinical imaging was in line with the proteomic findings, showing elevated fibrosis in both patient groups as well as sex differences. Disease- and sex-specific proteomic profiles provide insight into cardiac remodeling in patients with heart valve disease and might help improve the understanding of molecular mechanisms and the development of individualized treatment strategies.
Y1 - 2023 U6 - https://doi.org/10.26508/lsa.202201411 SN - 2575-1077 VL - 6 IS - 3 PB - EMBO Press CY - Heidelberg ER - TY - JOUR A1 - Taleb, Aiham A1 - Rohrer, Csaba A1 - Bergner, Benjamin A1 - De Leon, Guilherme A1 - Rodrigues, Jonas Almeida A1 - Schwendicke, Falk A1 - Lippert, Christoph A1 - Krois, Joachim T1 - Self-supervised learning methods for label-efficient dental caries classification JF - Diagnostics : open access journal N2 - High annotation costs are a substantial bottleneck in applying deep learning architectures to clinically relevant use cases, underscoring the need for algorithms that learn from unlabeled data. In this work, we propose employing self-supervised methods. To that end, we trained three self-supervised algorithms on a large corpus of unlabeled dental images, which contained 38K bitewing radiographs (BWRs). We then applied the learned neural network representations to tooth-level dental caries classification, for which we utilized labels extracted from electronic health records (EHRs). Finally, a holdout test set was established, which consisted of 343 BWRs and was annotated by three dental professionals and approved by a senior dentist. This test set was used to evaluate the fine-tuned caries classification models. Our experimental results demonstrate the gains obtained by pretraining models using self-supervised algorithms. These include improved caries classification performance (a 6 p.p. increase in sensitivity) and, most importantly, improved label efficiency. In other words, the resulting models can be fine-tuned using few labels (annotations). Our results show that using as few as 18 annotations can produce ≥ 45% sensitivity, which is comparable to human-level diagnostic performance. This study shows that self-supervision can provide gains in medical image analysis, particularly when obtaining labels is costly.
KW - unsupervised methods KW - self-supervised learning KW - representation learning KW - dental caries classification KW - data driven approaches KW - annotation KW - efficient deep learning Y1 - 2022 U6 - https://doi.org/10.3390/diagnostics12051237 SN - 2075-4418 VL - 12 IS - 5 PB - MDPI CY - Basel ER - TY - JOUR A1 - Tang, Mitchell A1 - Nakamoto, Carter H. A1 - Stern, Ariel Dora A1 - Mehrotra, Ateev T1 - Trends in remote patient monitoring use in traditional Medicare JF - JAMA internal medicine Y1 - 2022 U6 - https://doi.org/10.1001/jamainternmed.2022.3043 SN - 2168-6106 SN - 2168-6114 VL - 182 IS - 9 SP - 1005 EP - 1006 PB - American Medical Association CY - Chicago, Ill. ER - TY - JOUR A1 - Altenburg, Tom A1 - Giese, Sven Hans-Joachim A1 - Wang, Shengbo A1 - Muth, Thilo A1 - Renard, Bernhard Y. T1 - Ad hoc learning of peptide fragmentation from mass spectra enables an interpretable detection of phosphorylated and cross-linked peptides JF - Nature machine intelligence N2 - Fragmentation of peptides leaves characteristic patterns in mass spectrometry data, which can be used to identify protein sequences, but this method is challenging for mutated or modified sequences for which limited information exists. Altenburg et al. use an ad hoc learning approach to learn relevant patterns directly from unannotated fragmentation spectra. Mass spectrometry-based proteomics provides a holistic snapshot of the entire protein set of living cells on a molecular level. Currently, only a few deep learning approaches exist that involve peptide fragmentation spectra, which represent partial sequence information of proteins. Commonly, these approaches lack the ability to characterize less studied or even unknown patterns in spectra because of their use of explicit domain knowledge.
Here, to elevate unrestricted learning from spectra, we introduce 'ad hoc learning of fragmentation' (AHLF), a deep learning model that is end-to-end trained on 19.2 million spectra from several phosphoproteomic datasets. AHLF is interpretable, and we show that peak-level feature importance values and pairwise interactions between peaks are in line with corresponding peptide fragments. We demonstrate our approach by detecting post-translational modifications, specifically protein phosphorylation based on only the fragmentation spectrum without a database search. AHLF increases the area under the receiver operating characteristic curve (AUC) by an average of 9.4% on recent phosphoproteomic data compared with the current state of the art on this task. Furthermore, use of AHLF in rescoring search results increases the number of phosphopeptide identifications by a margin of up to 15.1% at a constant false discovery rate. To show the broad applicability of AHLF, we use transfer learning to also detect cross-linked peptides, as used in protein structure analysis, with an AUC of up to 94%. Y1 - 2022 U6 - https://doi.org/10.1038/s42256-022-00467-7 SN - 2522-5839 VL - 4 IS - 4 SP - 378 EP - 388 PB - Springer Nature Publishing CY - London ER - TY - JOUR A1 - Konigorski, Stefan A1 - Wernicke, Sarah A1 - Slosarek, Tamara A1 - Zenner, Alexander M. A1 - Strelow, Nils A1 - Ruether, Darius F. A1 - Henschel, Florian A1 - Manaswini, Manisha A1 - Pottbäcker, Fabian A1 - Edelman, Jonathan A. A1 - Owoyele, Babajide A1 - Danieletto, Matteo A1 - Golden, Eddye A1 - Zweig, Micol A1 - Nadkarni, Girish N. A1 - Böttinger, Erwin T1 - StudyU: a platform for designing and conducting innovative digital N-of-1 trials JF - Journal of medical internet research N2 - N-of-1 trials are the gold standard study design to evaluate individual treatment effects and derive personalized treatment strategies. 
Digital tools have the potential to initiate a new era of N-of-1 trials in terms of scale and scope, but fully functional platforms are not yet available. Here, we present the open source StudyU platform, which includes the StudyU Designer and StudyU app. With the StudyU Designer, scientists are given a collaborative web application to digitally specify, publish, and conduct N-of-1 trials. The StudyU app is a smartphone app with innovative user-centric elements for participants to partake in trials published through the StudyU Designer to assess the effects of different interventions on their health. Thereby, the StudyU platform allows clinicians and researchers worldwide to easily design and conduct digital N-of-1 trials in a safe manner. We envision that StudyU can change the landscape of personalized treatments both for patients and healthy individuals, democratize and personalize evidence generation for self-optimization and medicine, and can be integrated in clinical practice. KW - digital interventions KW - N-of-1 trial KW - SCED KW - single-case experimental design KW - web application KW - mobile application KW - app KW - digital health Y1 - 2022 U6 - https://doi.org/10.2196/35884 SN - 1439-4456 SN - 1438-8871 VL - 24 IS - 7 PB - Healthcare World CY - Richmond, Va. ER - TY - GEN A1 - Dellepiane, Sergio A1 - Vaid, Akhil A1 - Jaladanki, Suraj K. A1 - Coca, Steven A1 - Fayad, Zahi A. A1 - Charney, Alexander W. A1 - Böttinger, Erwin A1 - He, John Cijiang A1 - Glicksberg, Benjamin S. 
A1 - Chan, Lili A1 - Nadkarni, Girish T1 - Acute kidney injury in patients hospitalized with COVID-19 in New York City BT - temporal trends from March 2020 to April 2021 T2 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät T3 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 21 Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-585415 SN - 2590-0595 IS - 5 ER - TY - JOUR A1 - Langenhan, Jennifer A1 - Jaeger, Carsten A1 - Baum, Katharina A1 - Simon, Mareike A1 - Lisec, Jan T1 - A flexible tool to correct superimposed mass isotopologue distributions in GC-APCI-MS flux experiments JF - Metabolites N2 - The investigation of metabolic fluxes and metabolite distributions within cells by means of tracer molecules is a valuable tool to unravel the complexity of biological systems. Technological advances in mass spectrometry (MS) technology such as atmospheric pressure chemical ionization (APCI) coupled with high resolution (HR), not only allows for highly sensitive analyses but also broadens the usefulness of tracer-based experiments, as interesting signals can be annotated de novo when not yet present in a compound library. However, several effects in the APCI ion source, i.e., fragmentation and rearrangement, lead to superimposed mass isotopologue distributions (MID) within the mass spectra, which need to be corrected during data evaluation as they will impair enrichment calculation otherwise. Here, we present and evaluate a novel software tool to automatically perform such corrections. We discuss the different effects, explain the implemented algorithm, and show its application on several experimental datasets. This adjustable tool is available as an R package from CRAN. 
KW - mass isotopologue distribution KW - enrichment calculation KW - flux KW - experiments KW - atmospheric pressure chemical ionization KW - R package KW - CorMID Y1 - 2022 U6 - https://doi.org/10.3390/metabo12050408 SN - 2218-1989 VL - 12 IS - 5 PB - MDPI CY - Basel ER - TY - JOUR A1 - Sinn, Ludwig R. A1 - Giese, Sven Hans-Joachim A1 - Stuiver, Marchel A1 - Rappsilber, Juri T1 - Leveraging parameter dependencies in high-field asymmetric waveform ion-mobility spectrometry and size exclusion chromatography for proteome-wide cross-linking mass spectrometry JF - Analytical chemistry : the authoritative voice of the analytical community N2 - Ion-mobility spectrometry shows great promise to tackle analytically challenging research questions by adding another separation dimension to liquid chromatography-mass spectrometry. The understanding of how analyte properties influence ion mobility has increased through recent studies, but no clear rationale for the design of customized experimental settings has emerged. Here, we leverage machine learning to deepen our understanding of field asymmetric waveform ion-mobility spectrometry for the analysis of cross-linked peptides. Knowing that predominantly m/z and then the size and charge state of an analyte influence the separation, we found ideal compensation voltages correlating with the size exclusion chromatography fraction number. The effect of this relationship on the analytical depth can be substantial as exploiting it allowed us to almost double unique residue pair detections in a proteome-wide cross-linking experiment. Other applications involving liquid- and gas-phase separation may also benefit from considering such parameter dependencies. Y1 - 2022 U6 - https://doi.org/10.1021/acs.analchem.1c04373 SN - 0003-2700 SN - 1520-6882 VL - 94 IS - 11 SP - 4627 EP - 4634 PB - American Chemical Society CY - Columbus, Ohio ER - TY - JOUR A1 - Gevay, Gabor E. 
A1 - Rabl, Tilmann A1 - Bress, Sebastian A1 - Maclai-Tahy, Lorand A1 - Quiane-Ruiz, Jorge-Arnulfo A1 - Markl, Volker T1 - Imperative or functional control flow handling BT - why not the best of both worlds? JF - SIGMOD record N2 - Modern data analysis tasks often involve control flow statements, such as the iterations in PageRank and K-means. To achieve scalability, developers usually implement these tasks in distributed dataflow systems, such as Spark and Flink. Designers of such systems have to choose between providing imperative or functional control flow constructs to users. Imperative constructs are easier to use, but functional constructs are easier to compile to an efficient dataflow job. We propose Mitos, a system where control flow is both easy to use and efficient. Mitos relies on an intermediate representation based on the static single assignment form. This allows us to abstract away from specific control flow constructs and treat any imperative control flow uniformly both when building the dataflow job and when coordinating the distributed execution. Y1 - 2022 U6 - https://doi.org/10.1145/3542700.3542715 SN - 0163-5808 SN - 1943-5835 VL - 51 IS - 1 SP - 60 EP - 67 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Verweij, Marco A1 - Ney, Steven A1 - Thompson, Michael T1 - Cultural Theory’s contributions to climate science BT - reply to Hansson JF - European journal for philosophy of science N2 - In his article, 'Social constructionism and climate science denial', Hansson claims to present empirical evidence that the cultural theory developed by Dame Mary Douglas, Aaron Wildavsky and ourselves (among others) leads to (climate) science denial. In this reply, we show that there is no validity to these claims. First, we show that Hansson's empirical evidence that cultural theory has led to climate science denial falls apart under closer inspection. 
Contrary to Hansson's claims, cultural theory has made significant contributions to understanding and addressing climate change. Second, we discuss various features of Douglas' cultural theory that differentiate it from other constructivist approaches and make it compatible with the scientific method. Thus, we also demonstrate that cultural theory cannot be accused of epistemic relativism. KW - Mary Douglas KW - Aaron Wildavsky KW - Cultural theory KW - Climate change Y1 - 2022 U6 - https://doi.org/10.1007/s13194-022-00464-y SN - 1879-4912 SN - 1879-4920 VL - 12 IS - 2 PB - Springer CY - Dordrecht ER - TY - JOUR A1 - Boissier, Martin T1 - Robust and budget-constrained encoding configurations for in-memory database systems JF - Proceedings of the VLDB Endowment N2 - Data encoding has been applied to database systems for decades as it mitigates bandwidth bottlenecks and reduces storage requirements. But even in the presence of these advantages, most in-memory database systems use data encoding only conservatively as the negative impact on runtime performance can be severe. Real-world systems with large parts being infrequently accessed and cost efficiency constraints in cloud environments require solutions that automatically and efficiently select encoding techniques, including heavy-weight compression. In this paper, we introduce workload-driven approaches to automatically determine memory budget-constrained encoding configurations using greedy heuristics and linear programming. We show for TPC-H, TPC-DS, and the Join Order Benchmark that optimized encoding configurations can reduce the main memory footprint significantly without a loss in runtime performance over state-of-the-art dictionary encoding. To yield robust selections, we extend the linear programming-based approach to incorporate query runtime constraints and mitigate unexpected performance regressions.
Y1 - 2021 U6 - https://doi.org/10.14778/3503585.3503588 SN - 2150-8097 VL - 15 IS - 4 SP - 780 EP - 793 PB - Association for Computing Machinery (ACM) CY - [New York] ER - TY - JOUR A1 - Björk, Jennie A1 - Hölzle, Katharina A1 - Boer, Harry T1 - ‘What will we learn from the current crisis?’ JF - Creativity and innovation management Y1 - 2021 U6 - https://doi.org/10.1111/caim.12442 SN - 0963-1690 SN - 1467-8691 VL - 30 IS - 2 SP - 231 EP - 232 PB - Wiley-Blackwell CY - Oxford [u.a.] ER - TY - JOUR A1 - Bonifati, Angela A1 - Mior, Michael J. A1 - Naumann, Felix A1 - Noack, Nele Sina T1 - How inclusive are we? BT - an analysis of gender diversity in database venues JF - SIGMOD record / Association for Computing Machinery, Special Interest Group on Management of Data N2 - ACM SIGMOD, VLDB and other database organizations have committed to fostering an inclusive and diverse community, as do many other scientific organizations. Recently, different measures have been taken to advance these goals, especially for underrepresented groups. One possible measure is double-blind reviewing, which aims to hide gender, ethnicity, and other properties of the authors.
We report the preliminary results of a gender diversity analysis of publications of the database community across several peer-reviewed venues, and also compare women's authorship percentages in both single-blind and double-blind venues along the years. We also obtained a cross comparison of the obtained results in data management with other relevant areas in Computer Science. Y1 - 2022 U6 - https://doi.org/10.1145/3516431.3516438 SN - 0163-5808 SN - 1943-5835 VL - 50 IS - 4 SP - 30 EP - 35 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Reimann, Max A1 - Buchheim, Benito A1 - Semmo, Amir A1 - Döllner, Jürgen A1 - Trapp, Matthias T1 - Controlling strokes in fast neural style transfer using content transforms JF - The Visual Computer N2 - Fast style transfer methods have recently gained popularity in art-related applications as they make a generalized real-time stylization of images practicable. However, they are mostly limited to one-shot stylizations concerning the interactive adjustment of style elements. In particular, the expressive control over stroke sizes or stroke orientations remains an open challenge. To this end, we propose a novel stroke-adjustable fast style transfer network that enables simultaneous control over the stroke size and intensity, and allows a wider range of expressive editing than current approaches by utilizing the scale-variance of convolutional neural networks. Furthermore, we introduce a network-agnostic approach for style-element editing by applying reversible input transformations that can adjust strokes in the stylized output. At this, stroke orientations can be adjusted, and warping-based effects can be applied to stylistic elements, such as swirls or waves. To demonstrate the real-world applicability of our approach, we present StyleTune, a mobile app for interactive editing of neural style transfers at multiple levels of control. Our app allows stroke adjustments on a global and local level. 
It furthermore implements an on-device patch-based upsampling step that enables users to achieve results with high output fidelity and resolutions of more than 20 megapixels. Our approach allows users to art-direct their creations and achieve results that are not possible with current style transfer applications. Y1 - 2022 U6 - https://doi.org/10.1007/s00371-022-02518-x SN - 0178-2789 SN - 1432-2315 VL - 38 IS - 12 SP - 4019 EP - 4033 PB - Springer CY - New York ER - TY - JOUR A1 - Borchert, Florian A1 - Mock, Andreas A1 - Tomczak, Aurelie A1 - Hügel, Jonas A1 - Alkarkoukly, Samer A1 - Knurr, Alexander A1 - Volckmar, Anna-Lena A1 - Stenzinger, Albrecht A1 - Schirmacher, Peter A1 - Debus, Jürgen A1 - Jäger, Dirk A1 - Longerich, Thomas A1 - Fröhling, Stefan A1 - Eils, Roland A1 - Bougatf, Nina A1 - Sax, Ulrich A1 - Schapranow, Matthieu-Patrick T1 - Correction to: Knowledge bases and software support for variant interpretation in precision oncology JF - Briefings in bioinformatics Y1 - 2021 U6 - https://doi.org/10.1093/bib/bbab246 SN - 1467-5463 SN - 1477-4054 VL - 22 IS - 6 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - Borchert, Florian A1 - Mock, Andreas A1 - Tomczak, Aurelie A1 - Hügel, Jonas A1 - Alkarkoukly, Samer A1 - Knurr, Alexander A1 - Volckmar, Anna-Lena A1 - Stenzinger, Albrecht A1 - Schirmacher, Peter A1 - Debus, Jürgen A1 - Jäger, Dirk A1 - Longerich, Thomas A1 - Fröhling, Stefan A1 - Eils, Roland A1 - Bougatf, Nina A1 - Sax, Ulrich A1 - Schapranow, Matthieu-Patrick T1 - Knowledge bases and software support for variant interpretation in precision oncology JF - Briefings in bioinformatics N2 - Precision oncology is a rapidly evolving interdisciplinary medical specialty. Comprehensive cancer panels are becoming increasingly available at pathology departments worldwide, creating the urgent need for scalable cancer variant annotation and molecularly informed treatment recommendations. 
A wealth of mainly academia-driven knowledge bases calls for software tools supporting the multi-step diagnostic process. We derive a comprehensive list of knowledge bases relevant for variant interpretation by a review of existing literature followed by a survey among medical experts from university hospitals in Germany. In addition, we review cancer variant interpretation tools, which integrate multiple knowledge bases. We categorize the knowledge bases along the diagnostic process in precision oncology and analyze programmatic access options as well as the integration of knowledge bases into software tools. The most commonly used knowledge bases provide good programmatic access options and have been integrated into a range of software tools. For the wider set of knowledge bases, access options vary across different parts of the diagnostic process. Programmatic access is limited for information regarding clinical classifications of variants and for therapy recommendations. The main issue for databases used for biological classification of pathogenic variants and pathway context information is the lack of standardized interfaces. There is no single cancer variant interpretation tool that integrates all identified knowledge bases. Specialized tools are available and need to be further developed for different steps in the diagnostic process. KW - HiGHmed KW - personalized medicine KW - molecular tumor board KW - data integration KW - cancer therapy Y1 - 2021 U6 - https://doi.org/10.1093/bib/bbab134 SN - 1467-5463 SN - 1477-4054 VL - 22 IS - 6 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - Prill, Robert A1 - Walter, Marina A1 - Królikowska, Aleksandra A1 - Becker, Roland T1 - A systematic review of diagnostic accuracy and clinical applications of wearable movement sensors for knee joint rehabilitation JF - Sensors N2 - In clinical practice, only a few reliable measurement instruments are available for monitoring knee joint rehabilitation. 
Advances to replace motion capturing with sensor data measurement have been made in the last years. Thus, a systematic review of the literature was performed, focusing on the implementation, diagnostic accuracy, and facilitators and barriers of integrating wearable sensor technology in clinical practices based on a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. For critical appraisal, the COSMIN Risk of Bias tool for reliability and measurement of error was used. PUBMED, Prospero, Cochrane database, and EMBASE were searched for eligible studies. Six studies reporting reliability aspects in using wearable sensor technology at any point after knee surgery in humans were included. All studies reported excellent results with high reliability coefficients, high limits of agreement, or a few detectable errors. They used different or partly inappropriate methods for estimating reliability or missed reporting essential information. Therefore, a moderate risk of bias must be considered. Further quality criterion studies in clinical settings are needed to synthesize the evidence for providing transparent recommendations for the clinical use of wearable movement sensors in knee joint rehabilitation. KW - wearable movement sensor KW - IMU KW - motion capture KW - reliability KW - clinical KW - orthopedic Y1 - 2021 U6 - https://doi.org/10.3390/s21248221 SN - 1424-8220 VL - 21 IS - 24 PB - MDPI CY - Basel ER - TY - JOUR A1 - Chan, Lili A1 - Jaladanki, Suraj K. A1 - Somani, Sulaiman A1 - Paranjpe, Ishan A1 - Kumar, Arvind A1 - Zhao, Shan A1 - Kaufman, Lewis A1 - Leisman, Staci A1 - Sharma, Shuchita A1 - He, John Cijiang A1 - Murphy, Barbara A1 - Fayad, Zahi A. A1 - Levin, Matthew A. A1 - Böttinger, Erwin A1 - Charney, Alexander W. A1 - Glicksberg, Benjamin A1 - Coca, Steven G. A1 - Nadkarni, Girish N. 
T1 - Outcomes of patients on maintenance dialysis hospitalized with COVID-19 JF - Clinical journal of the American Society of Nephrology : CJASN KW - chronic dialysis KW - COVID-19 KW - end-stage kidney disease Y1 - 2021 U6 - https://doi.org/10.2215/CJN.12360720 SN - 1555-9041 SN - 1555-905X VL - 16 IS - 3 SP - 452 EP - 455 PB - American Society of Nephrology CY - Washington ER - TY - JOUR A1 - Datta, Suparno A1 - Sachs, Jan Philipp A1 - Freitas da Cruz, Harry A1 - Martensen, Tom A1 - Bode, Philipp A1 - Morassi Sasso, Ariane A1 - Glicksberg, Benjamin S. A1 - Böttinger, Erwin T1 - FIBER BT - enabling flexible retrieval of electronic health records data for clinical predictive modeling JF - JAMIA open N2 - Objectives: The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames. Materials and Methods: FIBER was developed on top of a large-scale star schema EHR database which contains data from 8 million patients and over 120 million encounters. To illustrate FIBER's capabilities, we present its application by building a heart surgery patient cohort with subsequent prediction of acute kidney injury (AKI) with various machine learning models. 
Results: Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients that developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case. Conclusion: FIBER is an open-source Python library developed for extracting information from star schema clinical data warehouses and reduces time-to-modeling, helping to streamline the clinical modeling process. KW - databases, factual KW - electronic health records KW - information storage and retrieval KW - workflow KW - software/instrumentation Y1 - 2021 U6 - https://doi.org/10.1093/jamiaopen/ooab048 SN - 2574-2531 VL - 4 IS - 3 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - De Freitas, Jessica K. A1 - Johnson, Kipp W. A1 - Golden, Eddye A1 - Nadkarni, Girish N. A1 - Dudley, Joel T. A1 - Böttinger, Erwin A1 - Glicksberg, Benjamin S. A1 - Miotto, Riccardo T1 - Phe2vec BT - Automated disease phenotyping based on unsupervised embeddings from electronic health records JF - Patterns N2 - Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning and assess its effectiveness against standard rule-based algorithms from Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and patients' clinical history. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. Comparing Phe2vec and PheKB cohorts head-to-head using chart review, Phe2vec performed on par or better in nine out of ten diseases.
Differently from other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts. Y1 - 2021 U6 - https://doi.org/10.1016/j.patter.2021.100337 SN - 2666-3899 VL - 2 IS - 9 PB - Elsevier CY - Amsterdam ER - TY - CHAP A1 - Adnan, Hassan Sami A1 - Srsic, Amanda A1 - Venticich, Pete Milos A1 - Townend, David M.R. T1 - Using AI for mental health analysis and prediction in school surveys T2 - European journal of public health N2 - Background: Childhood and adolescence are critical stages of life for mental health and well-being. Schools are a key setting for mental health promotion and illness prevention. One in five children and adolescents have a mental disorder, about half of mental disorders beginning before the age of 14. Beneficial and explainable artificial intelligence can replace current paper-based and online approaches to school mental health surveys. This can enhance data acquisition, interoperability, data driven analysis, trust and compliance. This paper presents a model for using chatbots for non-obtrusive data collection and supervised machine learning models for data analysis; and discusses ethical considerations pertaining to the use of these models. Methods: For data acquisition, the proposed model uses chatbots which interact with students. The conversation log acts as the source of raw data for the machine learning. Pre-processing of the data is automated by filtering for keywords and phrases. Existing survey results, obtained through current paper-based data collection methods, are evaluated by domain experts (health professionals). These can be used to create a test dataset to validate the machine learning models. Supervised learning can then be deployed to classify specific behaviour and mental health patterns.
Results: We present a model that can be used to improve upon current paper-based data collection and manual data analysis methods. An open-source GitHub repository contains necessary tools and components of this model. Privacy is respected through rigorous observance of confidentiality and data protection requirements. Critical reflection on these ethics and law aspects is included in the project. Conclusions: This model strengthens mental health surveillance in schools. The same tools and components could be applied to other public health data. Future extensions of this model could also incorporate unsupervised learning to find clusters and patterns of unknown effects. KW - ethics KW - artificial intelligence KW - adolescent KW - child KW - confidentiality KW - health personnel KW - mental disorders KW - mental health KW - personal satisfaction KW - privacy KW - school (environment) KW - statutes and laws KW - public health medicine KW - surveillance KW - medical KW - prevention KW - datasets KW - machine learning KW - supervised machine learning KW - data analysis Y1 - 2020 U6 - https://doi.org/10.1093/eurpub/ckaa165.336 SN - 1101-1262 SN - 1464-360X VL - 30 SP - V125 EP - V125 PB - Oxford Univ. Press CY - Oxford [u.a.] ER - TY - JOUR A1 - Chan, Lili A1 - Chaudhary, Kumardeep A1 - Saha, Aparna A1 - Chauhan, Kinsuk A1 - Vaid, Akhil A1 - Zhao, Shan A1 - Paranjpe, Ishan A1 - Somani, Sulaiman A1 - Richter, Felix A1 - Miotto, Riccardo A1 - Lala, Anuradha A1 - Kia, Arash A1 - Timsina, Prem A1 - Li, Li A1 - Freeman, Robert A1 - Chen, Rong A1 - Narula, Jagat A1 - Just, Allan C. A1 - Horowitz, Carol A1 - Fayad, Zahi A1 - Cordon-Cardo, Carlos A1 - Schadt, Eric A1 - Levin, Matthew A. A1 - Reich, David L. A1 - Fuster, Valentin A1 - Murphy, Barbara A1 - He, John C. A1 - Charney, Alexander W. A1 - Böttinger, Erwin A1 - Glicksberg, Benjamin A1 - Coca, Steven G. A1 - Nadkarni, Girish N. 
T1 - AKI in hospitalized patients with COVID-19 JF - Journal of the American Society of Nephrology : JASN N2 - Background: Early reports indicate that AKI is common among patients with coronavirus disease 2019 (COVID-19) and associated with worse outcomes. However, AKI among hospitalized patients with COVID-19 in the United States is not well described. Methods: This retrospective, observational study involved a review of data from electronic health records of patients aged >= 18 years with laboratory-confirmed COVID-19 admitted to the Mount Sinai Health System from February 27 to May 30, 2020. We describe the frequency of AKI and dialysis requirement, AKI recovery, and adjusted odds ratios (aORs) with mortality. Results: Of 3993 hospitalized patients with COVID-19, AKI occurred in 1835 (46%) patients; 347 (19%) of the patients with AKI required dialysis. The proportions with stages 1, 2, or 3 AKI were 39%, 19%, and 42%, respectively. A total of 976 (24%) patients were admitted to intensive care, and 745 (76%) experienced AKI. Of the 435 patients with AKI and urine studies, 84% had proteinuria, 81% had hematuria, and 60% had leukocyturia. Independent predictors of severe AKI were CKD, men, and higher serum potassium at admission. In-hospital mortality was 50% among patients with AKI versus 8% among those without AKI (aOR, 9.2; 95% confidence interval, 7.5 to 11.3). Of survivors with AKI who were discharged, 35% had not recovered to baseline kidney function by the time of discharge. An additional 28 of 77 (36%) patients who had not recovered kidney function at discharge did so on posthospital follow-up. Conclusions: AKI is common among patients hospitalized with COVID-19 and is associated with high mortality. Of all patients with AKI, only 30% survived with recovery of kidney function by the time of discharge.
KW - acute renal failure KW - clinical nephrology KW - dialysis KW - COVID-19 Y1 - 2021 U6 - https://doi.org/10.1681/ASN.2020050615 SN - 1046-6673 SN - 1533-3450 VL - 32 IS - 1 SP - 151 EP - 160 PB - American Society of Nephrology CY - Washington ER - TY - JOUR A1 - Oliveira-Ciabati, Livia A1 - Loures dos Santos, Luciane A1 - Hsiou Schmaltz, Annie A1 - Sasso, Ariane Morassi A1 - Castro, Margaret de A1 - Souza, João Paulo T1 - Scientific sexism BT - the gender bias in the scientific production of the Universidade de São Paulo JF - Revista de saúde pública : publication of the Faculdade de Saúde Pública da Universidade de São Paulo = Journal of public health N2 - OBJECTIVE: To investigate gender inequity in the scientific production of the University of Sao Paulo. METHODS: Members of the University of Sao Paulo faculty are the study population. The Web of Science repository was the source of the publication metrics. We selected the measures: total publications and citations, average of citations per year and item, H-index, and history of citations between 1950 and 2019. We used the name of the faculty member as a proxy to the gender identity. We use descriptive statistics to characterize the metrics. We evaluated the scissors effect by selecting faculty members with a high H-index. The historical series of citations was projected until 2100. We carry out analyses for the general population and working time subgroups: less than 10 years, 10 to 20 years, and 20 years or more. RESULTS: Of the 8,325 faculty members, we included 3,067 (36.8%). Among those included, 1,893 (61.7%) were male and 1,174 (38.28%) female. The male gender presented higher values in the publication metrics (average of articles: M = 67.0 versus F = 49.7; average of citations/year: M = 53.9 versus F = 35.9), and H-index (M = 14.5 versus F = 12.4). Among the 100 individuals with the highest H-index (>= 37), 83% are male. 
The male curve grows faster in the historical series of citations, opening a difference between the groups whose separation is confirmed by the projection. DISCUSSION: Scientific production at the Universidade de Sao Paulo is subject to a gender bias. Two-thirds of the faculty are male, and hiring over the past few decades has perpetuated this pattern. The vast majority of high-impact faculty members are male. CONCLUSION: Our analysis suggests that the Universidade de Sao Paulo will not overcome gender inequality in scientific production without substantive affirmative action. Development does not happen by chance but through choices that are affirmative, decisive, and long-term oriented. KW - Sexism KW - Scientific Publication Indicators KW - Gender Inequality Y1 - 2021 U6 - https://doi.org/10.11606/s1518-8787.2021055002939 SN - 1518-8787 VL - 55 PB - Faculdade de Saúde Pública da Universidade de São Paulo CY - São Paulo ER - TY - JOUR A1 - Long, Xiang A1 - de Melo, Gerard A1 - He, Dongliang A1 - Li, Fu A1 - Chi, Zhizhen A1 - Wen, Shilei A1 - Gan, Chuang T1 - Purely attention based local feature integration for video classification JF - IEEE Transactions on Pattern Analysis and Machine Intelligence N2 - Recently, substantial research effort has focused on how to apply CNNs or RNNs to better capture temporal patterns in videos, so as to improve the accuracy of video classification. In this paper, we investigate the potential of a purely attention based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the output of multiple attention units applied in parallel, and introduce a shifting operation to capture more diverse signals. Experiments show that BAC can achieve excellent results on multiple datasets.
However, BAC treats all feature channels as an indivisible whole, which is suboptimal for achieving a finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring the sequential relationships. To improve over BAC, we further propose the channel pyramid attention schema by splitting features into sub-features at multiple scales for coarse-to-fine sub-feature interaction modeling, and propose the temporal pyramid attention schema by dividing the feature sequences into ordered sub-sequences of multiple lengths to account for the sequential order. Our final model, pyramid×pyramid attention clusters (PPAC), combines both channel pyramid attention and temporal pyramid attention to focus on the most important sub-features, while also preserving the temporal information of the video. We demonstrate the effectiveness of PPAC on seven real-world video classification datasets. Our model achieves competitive results across all of these, showing that our proposed framework can consistently outperform the existing local feature integration methods across a range of different scenarios. KW - Feature extraction KW - Convolution KW - Computational modeling KW - Plugs KW - Three-dimensional displays KW - Task analysis KW - Two dimensional displays KW - Video classification KW - action recognition KW - attention mechanism KW - computer vision KW - Algorithms KW - Neural Networks KW - Computer Y1 - 2020 U6 - https://doi.org/10.1109/TPAMI.2020.3029554 SN - 0162-8828 SN - 1939-3539 SN - 2160-9292 VL - 44 IS - 4 SP - 2140 EP - 2154 PB - Inst. of Electr. and Electronics Engineers CY - Los Alamitos ER - TY - JOUR A1 - Haarmann, Stephan A1 - Holfter, Adrian A1 - Pufahl, Luise A1 - Weske, Mathias T1 - Formal framework for checking compliance of data-driven case management JF - Journal on data semantics : JoDS N2 - Business processes are often specified in descriptive or normative models.
Both types of models should adhere to internal and external regulations, such as company guidelines or laws. Employing compliance checking techniques, it is possible to verify process models against rules. While traditionally compliance checking focuses on well-structured processes, we address case management scenarios. In case management, knowledge workers drive multi-variant and adaptive processes. Our contribution is based on the fragment-based case management approach, which splits a process into a set of fragments. The fragments are synchronized through shared data but can otherwise be dynamically instantiated and executed. We formalize case models using Petri nets. We demonstrate the formalization for design-time and run-time compliance checking and present a proof-of-concept implementation. The application of the implemented compliance checking approach to a use case exemplifies its effectiveness while designing a case model. The empirical evaluation on a set of case models for measuring the performance of the approach shows that rules can often be checked in less than a second. KW - Compliance checking KW - Case management KW - Model verification KW - Data-centric processes Y1 - 2021 U6 - https://doi.org/10.1007/s13740-021-00120-3 SN - 1861-2032 SN - 1861-2040 VL - 10 IS - 1-2 SP - 143 EP - 163 PB - Springer CY - Heidelberg ER - TY - JOUR A1 - Doerr, Benjamin A1 - Kötzing, Timo T1 - Lower bounds from fitness levels made easy JF - Algorithmica N2 - One of the first and easiest-to-use techniques for proving run time bounds for evolutionary algorithms is the so-called method of fitness levels by Wegener. It uses a partition of the search space into a sequence of levels which are traversed by the algorithm in increasing order, possibly skipping levels. An easy but often strong upper bound for the run time can then be derived by adding the reciprocals of the probabilities to leave the levels (or upper bounds for these).
Unfortunately, a similarly effective method for proving lower bounds has not yet been established. The strongest such method, proposed by Sudholt (2013), requires a careful choice of the viscosity parameters γ_{i,j}, 0 ≤ i < j ≤ n. In this paper we present two new variants of the method, one for upper and one for lower bounds. Besides the level leaving probabilities, they only rely on the probabilities that levels are visited at all. We show that these can be computed or estimated without great difficulty and apply our method to reprove the following known results in an easy and natural way. (i) The precise run time of the (1+1) EA on LEADINGONES. (ii) A lower bound for the run time of the (1+1) EA on ONEMAX, tight apart from an O(n) term. (iii) A lower bound for the run time of the (1+1) EA on long k-paths (which differs slightly from the previous result due to a small error in the latter). We also prove a tighter lower bound for the run time of the (1+1) EA on jump functions by showing that, regardless of the jump size, the algorithm can avoid jumping over the valley of low fitness only with probability O(2^{-n}). KW - First hitting time KW - Fitness level method KW - Evolutionary computation Y1 - 2022 U6 - https://doi.org/10.1007/s00453-022-00952-w SN - 0178-4617 SN - 1432-0541 PB - Springer CY - New York ER - TY - JOUR A1 - Bläsius, Thomas A1 - Freiberger, Cedric A1 - Friedrich, Tobias A1 - Katzmann, Maximilian A1 - Montenegro-Retana, Felix A1 - Thieffry, Marianne T1 - Efficient Shortest Paths in Scale-Free Networks with Underlying Hyperbolic Geometry JF - ACM Transactions on Algorithms N2 - A standard approach to accelerating shortest path algorithms on networks is the bidirectional search, which explores the graph from the start and the destination, simultaneously. In practice this strategy performs particularly well on scale-free real-world networks.
Such networks typically have a heterogeneous degree distribution (e.g., a power-law distribution) and high clustering (i.e., vertices with a common neighbor are likely to be connected themselves). These two properties can be obtained by assuming an underlying hyperbolic geometry.
To explain the observed behavior of the bidirectional search, we analyze its running time on hyperbolic random graphs and prove that it is Õ(n^{2-1/α} + n^{1/(2α)} + δ_max) with high probability, where α ∈ (1/2, 1) controls the power-law exponent of the degree distribution, and δ_max is the maximum degree. This bound is sublinear, improving the obvious worst-case linear bound. Although our analysis depends on the underlying geometry, the algorithm itself is oblivious to it. KW - Random graphs KW - hyperbolic geometry KW - scale-free networks KW - bidirectional shortest path Y1 - 2022 U6 - https://doi.org/10.1145/3516483 SN - 1549-6325 SN - 1549-6333 VL - 18 IS - 2 SP - 1 EP - 32 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Doerr, Benjamin A1 - Krejca, Martin Stefan T1 - A simplified run time analysis of the univariate marginal distribution algorithm on LeadingOnes JF - Theoretical computer science N2 - With elementary means, we prove a stronger run time guarantee for the univariate marginal distribution algorithm (UMDA) optimizing the LEADINGONES benchmark function in the desirable regime with low genetic drift. If the population size is at least quasilinear, then, with high probability, the UMDA samples the optimum in a number of iterations that is linear in the problem size divided by the logarithm of the UMDA's selection rate. This improves over the previous guarantee, obtained by Dang and Lehre (2015) via the deep level-based population method, both in terms of the run time and by demonstrating further run time gains from small selection rates. Under similar assumptions, we prove a lower bound that matches our upper bound up to constant factors.
KW - Theory KW - Estimation-of-distribution algorithm KW - Run time analysis Y1 - 2021 U6 - https://doi.org/10.1016/j.tcs.2020.11.028 SN - 0304-3975 SN - 1879-2294 VL - 851 SP - 121 EP - 128 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Bano, Dorina A1 - Michael, Judith A1 - Rumpe, Bernhard A1 - Varga, Simon A1 - Weske, Mathias T1 - Process-aware digital twin cockpit synthesis from event logs JF - Journal of computer languages N2 - The engineering of digital twins and their user interaction parts with explicated processes, namely process-aware digital twin cockpits (PADTCs), is challenging due to the complexity of the systems and the need for information from different disciplines within the engineering process. Therefore, it is interesting to investigate how to facilitate their engineering by using already existing data, namely event logs, and reducing the number of manual steps for their engineering. Current research lacks systematic, automated approaches to derive process-aware digital twin cockpits even though some helpful techniques already exist in the areas of process mining and software engineering. Within this paper, we present a low-code development approach that reduces the amount of hand-written code needed and uses process mining techniques to derive PADTCs. We describe what models could be derived from event log data, which generative steps are needed for the engineering of PADTCs, and how process mining could be incorporated into the resulting application. This process is evaluated using the MIMIC III dataset for the creation of a PADTC prototype for an automated hospital transportation system. This approach can be used for early prototyping of PADTCs as it needs no hand-written code in the first place, but it still allows for the iterative evolution of the application. This empowers domain experts to create their PADTC prototypes.
KW - process-aware digital twin cockpit KW - low-code development approaches KW - sensor data KW - event log KW - process mining KW - process-awareness Y1 - 2022 U6 - https://doi.org/10.1016/j.cola.2022.101121 SN - 2590-1184 SN - 2665-9182 VL - 70 PB - Elsevier CY - Amsterdam [u.a.] ER - TY - JOUR A1 - Blaesius, Thomas A1 - Friedrich, Tobias A1 - Schirneck, Friedrich Martin T1 - The complexity of dependency detection and discovery in relational databases JF - Theoretical computer science N2 - Multi-column dependencies in relational databases come associated with two different computational tasks. The detection problem is to decide whether a dependency of a certain type and size holds in a given database; the discovery problem asks to enumerate all valid dependencies of that type. We settle the complexity of both of these problems for unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs). We show that the detection of UCCs and FDs is W[2]-complete when parameterized by the solution size. The discovery of inclusion-wise minimal UCCs is proven to be equivalent under parsimonious reductions to the transversal hypergraph problem of enumerating the minimal hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. We further identify the detection of INDs as one of the first natural W[3]-complete problems. The discovery of maximal INDs is shown to be equivalent to enumerating the maximal satisfying assignments of antimonotone, 3-normalized Boolean formulas.
KW - data profiling KW - enumeration complexity KW - functional dependency KW - inclusion dependency KW - parameterized complexity KW - parsimonious reduction KW - transversal hypergraph KW - Unique column combination KW - W[3]-completeness Y1 - 2021 U6 - https://doi.org/10.1016/j.tcs.2021.11.020 SN - 0304-3975 SN - 1879-2294 VL - 900 SP - 79 EP - 96 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Andree, Kerstin A1 - Ihde, Sven A1 - Weske, Mathias A1 - Pufahl, Luise T1 - An exception handling framework for case management JF - Software and Systems Modeling N2 - In order to achieve their business goals, organizations heavily rely on the operational excellence of their business processes. In traditional scenarios, business processes are usually well-structured, clearly specifying when and how certain tasks have to be executed. Flexible and knowledge-intensive processes, in which a knowledge worker drives the execution of a process case and determines the exact process path at runtime, are gathering momentum. In the case of an exception, the knowledge worker decides on an appropriate handling. While there is initial work on exception handling in well-structured business processes, exceptions in case management have not been sufficiently researched. This paper proposes an exception handling framework for stage-oriented case management languages, namely Guard Stage Milestone Model, Case Management Model and Notation, and Fragment-based Case Management. The effectiveness of the framework is evaluated with two real-world use cases showing that it covers all relevant exceptions and proposed handling strategies.
KW - Exception handling KW - Knowledge-intensive processes KW - Flexible processes KW - Case management Y1 - 2022 U6 - https://doi.org/10.1007/s10270-022-00993-3 SN - 1619-1366 SN - 1619-1374 VL - 21 IS - 3 SP - 939 EP - 962 PB - Springer CY - Heidelberg ER - TY - JOUR A1 - Koßmann, Jan A1 - Papenbrock, Thorsten A1 - Naumann, Felix T1 - Data dependencies for query optimization BT - a survey JF - The VLDB journal : the international journal on very large data bases / publ. on behalf of the VLDB Endowment N2 - Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey helps database vendors identify optimization opportunities and DBMS researchers find related work and open research questions.
KW - Query optimization KW - Query execution KW - Data dependencies KW - Data profiling KW - Unique column combinations KW - Functional dependencies KW - Order dependencies KW - Inclusion dependencies KW - Relational data KW - SQL Y1 - 2021 U6 - https://doi.org/10.1007/s00778-021-00676-3 SN - 1066-8888 SN - 0949-877X VL - 31 IS - 1 SP - 1 EP - 22 PB - Springer CY - Berlin ; Heidelberg ; New York ER - TY - JOUR A1 - Roostapour, Vahid A1 - Neumann, Aneta A1 - Neumann, Frank A1 - Friedrich, Tobias T1 - Pareto optimization for subset selection with dynamic cost constraints JF - Artificial intelligence N2 - We consider the subset selection problem for function f with constraint bound B that changes over time. Within the area of submodular optimization, various greedy approaches are commonly used. For dynamic environments we observe that the adaptive variants of these greedy approaches are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a φ = (α_f/2)(1 - 1/e^{α_f})-approximation, where α_f is the submodularity ratio of f, for each possible constraint bound b ≤ B. Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that B increases. Our experimental investigations for the influence maximization in social networks show the advantage of POMC over generalized greedy algorithms. We also consider EAMC, a new evolutionary algorithm with polynomial expected time guarantee to maintain φ approximation ratio, and NSGA-II with two different population sizes as an advanced multi-objective optimization algorithm, to demonstrate their challenges in optimizing the maximum coverage problem. Our empirical analysis shows that, within the same number of evaluations, POMC is able to perform as well as NSGA-II under linear constraints, while EAMC performs significantly worse than all considered algorithms in most cases.
KW - Subset selection KW - Submodular function KW - Multi-objective optimization KW - Runtime analysis Y1 - 2022 U6 - https://doi.org/10.1016/j.artint.2021.103597 SN - 0004-3702 SN - 1872-7921 VL - 302 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Cseh, Ágnes A1 - Fleiner, Tamas T1 - The complexity of cake cutting with unequal shares JF - ACM transactions on algorithms : TALG N2 - A persistent problem in our society is the fair division of goods. The problem of proportional cake cutting focuses on dividing a heterogeneous and divisible resource, the cake, among n players who value pieces according to their own measure function. The goal is to assign each player a not necessarily connected part of the cake that the player values at least as much as her proportional share.
In this article, we investigate the problem of proportional division with unequal shares, where each player is entitled to receive a predetermined portion of the cake. Our main contribution is threefold. First we present a protocol for integer demands, which delivers a proportional solution in fewer queries than all known protocols. By giving a matching lower bound, we then show that our protocol is asymptotically the fastest possible. Finally, we turn to irrational demands and solve the proportional cake cutting problem by reducing it to the same problem with integer demands only. All results remain valid in a highly general cake cutting model, which can be of independent interest. KW - Cake cutting KW - fair division KW - unequal shares KW - proportional division Y1 - 2020 U6 - https://doi.org/10.1145/3380742 SN - 1549-6325 SN - 1549-6333 VL - 16 IS - 3 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Hehn, Jennifer A1 - Mendez, Daniel A1 - Uebernickel, Falk A1 - Brenner, Walter A1 - Broy, Manfred T1 - On integrating design thinking for human-centered requirements engineering JF - IEEE software N2 - We elaborate on the possibilities of, and the need for, integrating design thinking into requirements engineering, drawing on our research and project experience. We suggest three approaches for tailoring and integrating design thinking and requirements engineering with complementary synergies and point out open challenges for research and practice. KW - requirements engineering KW - prototypes KW - software KW - electronic mail KW - tools KW - organizations KW - design thinking Y1 - 2019 U6 - https://doi.org/10.1109/MS.2019.2957715 SN - 0740-7459 SN - 1937-4194 VL - 37 IS - 2 SP - 25 EP - 31 PB - Inst. of Electr.
and Electronics Engineers CY - Los Alamitos ER - TY - JOUR A1 - Lambers, Leen A1 - Orejas, Fernando T1 - Transformation rules with nested application conditions BT - critical pairs, initial conflicts & minimality JF - Theoretical computer science N2 - Recently, initial conflicts were introduced in the framework of M-adhesive categories as an important optimization of critical pairs. In particular, they represent a proper subset such that each conflict is represented in a minimal context by a unique initial one. The theory of critical pairs has been extended in the framework of M-adhesive categories to rules with nested application conditions (ACs), restricting the applicability of a rule and generalizing the well-known negative application conditions. A notion of initial conflicts for rules with ACs does not exist yet. In this paper, on the one hand, we extend the theory of initial conflicts in the framework of M-adhesive categories to transformation rules with ACs. Again, they represent a proper subset of the critical pairs for rules with ACs and represent each conflict uniquely in a minimal context. They are moreover symbolic because we can show that in general no finite and complete set of conflicts for rules with ACs exists. On the other hand, we show that critical pairs are minimally M-complete, whereas initial conflicts are minimally complete. Finally, we introduce important special cases of rules with ACs for which we can obtain finite, minimally (M-)complete sets of conflicts. KW - Graph transformation KW - Critical pairs KW - Initial conflicts KW - Application conditions Y1 - 2021 U6 - https://doi.org/10.1016/j.tcs.2021.07.023 SN - 0304-3975 SN - 1879-2294 VL - 884 SP - 44 EP - 67 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Weinstein, Theresa Julia A1 - Ceh, Simon Majed A1 - Meinel, Christoph A1 - Benedek, Mathias T1 - What's creative about sentences?
BT - a computational approach to assessing creativity in a sentence generation task JF - Creativity Research Journal N2 - Evaluating the creativity of verbal responses or texts is a challenging task due to psychometric issues associated with subjective ratings and the peculiarities of textual data. We explore an approach to objectively assess the creativity of responses in a sentence generation task to 1) better understand what language-related aspects are valued by human raters and 2) further advance the developments toward automating creativity evaluations. Over the course of two prior studies, participants generated 989 four-word sentences based on a four-letter prompt with the instruction to be creative. We developed an algorithm that scores each sentence on eight different metrics including 1) general word infrequency, 2) word combination infrequency, 3) context-specific word uniqueness, 4) syntax uniqueness, 5) rhyme, 6) phonetic similarity, and similarity of 7) sequence spelling and 8) semantic meaning to the cue. The text metrics were then used to explain the averaged creativity ratings of eight human raters. We found six metrics to be significantly correlated with the human ratings, explaining a total of 16% of their variance. We conclude that the creative impression of sentences is partly driven by different aspects of novelty in word choice and syntax, as well as rhythm and sound, which are amenable to objective assessment.
Y1 - 2022 U6 - https://doi.org/10.1080/10400419.2022.2124777 SN - 1040-0419 SN - 1532-6934 VL - 34 IS - 4 SP - 419 EP - 430 PB - Routledge, Taylor & Francis Group CY - Abingdon ER - TY - JOUR A1 - Doerr, Benjamin A1 - Kötzing, Timo A1 - Lagodzinski, Julius Albert Gregor A1 - Lengler, Johannes T1 - The impact of lexicographic parsimony pressure for ORDER/MAJORITY on the run time JF - Theoretical computer science : the journal of the EATCS N2 - While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat, that is, the unnecessary growth of solution lengths, which may slow down the optimization process. So far, the mathematical runtime analysis could not deal well with bloat and required explicit assumptions limiting bloat. In this paper, we provide the first mathematical runtime analysis of a GP algorithm that does not require any assumptions on the bloat. Previous performance guarantees were only proven conditionally for runs in which no strong bloat occurs. Together with improved analyses for the case with bloat restrictions, our results show that such assumptions on the bloat are not necessary and that the algorithm is efficient without an explicit bloat control mechanism. More specifically, we analyze the performance of the (1 + 1) GP on the two benchmark functions ORDER and MAJORITY. When using lexicographic parsimony pressure as bloat control, we show a tight runtime estimate of O(T_init + n log n) iterations for both ORDER and MAJORITY. For the case without bloat control, the bounds O(T_init log T_init + n (log n)^3) and Ω(T_init + n log n) (and Ω(T_init log T_init) for n = 1) hold for MAJORITY.
KW - genetic programming KW - bloat control KW - theory KW - runtime analysis Y1 - 2020 U6 - https://doi.org/10.1016/j.tcs.2020.01.011 SN - 0304-3975 SN - 1879-2294 VL - 816 SP - 144 EP - 168 PB - Elsevier CY - Amsterdam [u.a.] ER - TY - JOUR A1 - Oosthoek, Kris A1 - Dörr, Christian T1 - Cyber security threats to bitcoin exchanges BT - adversary exploitation and laundering techniques JF - IEEE transactions on network and service management : a publication of the IEEE N2 - Bitcoin is gaining traction as an alternative store of value. Its market capitalization exceeds that of all other cryptocurrencies in the market. But its high monetary value also makes it an attractive target to cyber criminal actors. Hacking campaigns usually target an ecosystem's weakest points. In Bitcoin, the exchange platforms are one of them. Each exchange breach is a threat not only to direct victims, but to the credibility of Bitcoin's entire ecosystem. Based on an extensive analysis of 36 breaches of Bitcoin exchanges, we show the attack patterns used to exploit Bitcoin exchange platforms using an industry standard for reporting intelligence on cyber security breaches. Based on this we are able to provide an overview of the most common attack vectors, showing that all except three hacks were possible due to relatively lax security. We show that while the security regimen of Bitcoin exchanges is subpar compared to other financial service providers, the use of stolen credentials, which does not require any hacking, is decreasing. We also show that the amount of BTC taken during a breach is decreasing, as is the number of exchanges that shut down after being breached. Furthermore, we show that the overall security posture has improved but still has major flaws. To discover adversarial methods post-breach, we have analyzed two cases of BTC laundering.
Through this analysis we provide insight into how exchange platforms with lax cyber security even further increase the intermediary risk introduced by them into the Bitcoin ecosystem. KW - Bitcoin KW - Computer crime KW - Cryptography KW - Ecosystems KW - Currencies KW - Industries KW - Vocabulary KW - cryptocurrency exchanges KW - cyber security KW - cyber threat intelligence KW - attacks KW - vulnerabilities KW - forensics Y1 - 2021 U6 - https://doi.org/10.1109/TNSM.2020.3046145 SN - 1932-4537 VL - 18 IS - 2 SP - 1616 EP - 1628 PB - IEEE CY - New York ER - TY - JOUR A1 - Vaid, Akhil A1 - Chan, Lili A1 - Chaudhary, Kumardeep A1 - Jaladanki, Suraj K. A1 - Paranjpe, Ishan A1 - Russak, Adam J. A1 - Kia, Arash A1 - Timsina, Prem A1 - Levin, Matthew A. A1 - He, John Cijiang A1 - Böttinger, Erwin A1 - Charney, Alexander W. A1 - Fayad, Zahi A. A1 - Coca, Steven G. A1 - Glicksberg, Benjamin S. A1 - Nadkarni, Girish N. T1 - Predictive approaches for acute dialysis requirement and death in COVID-19 JF - Clinical journal of the American Society of Nephrology : CJASN N2 - Background and objectives AKI treated with dialysis initiation is a common complication of coronavirus disease 2019 (COVID-19) among hospitalized patients. However, dialysis supplies and personnel are often limited. Design, setting, participants, & measurements Using data from adult patients hospitalized with COVID-19 from five hospitals from the Mount Sinai Health System who were admitted between March 10 and December 26, 2020, we developed and validated several models (logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO), random forest, and eXtreme Gradient Boosting [XGBoost; with and without imputation]) for predicting treatment with dialysis or death at various time horizons (1, 3, 5, and 7 days) after hospital admission. Patients admitted to the Mount Sinai Hospital were used for internal validation, whereas the other hospitals formed part of the external validation cohort.
Features included demographics, comorbidities, and laboratory values and vital signs within 12 hours of hospital admission. Results A total of 6093 patients (2442 in training and 3651 in external validation) were included in the final cohort. Of the different modeling approaches used, XGBoost without imputation had the highest area under the receiver operating characteristic curve (AUROC) on internal validation (range of 0.93-0.98) and area under the precision-recall curve (AUPRC; range of 0.78-0.82) for all time points. XGBoost without imputation also had the highest test parameters on external validation (AUROC range of 0.85-0.87, and AUPRC range of 0.27-0.54) across all time windows. XGBoost without imputation outperformed all models with higher precision and recall (mean difference in AUROC of 0.04; mean difference in AUPRC of 0.15). Features of creatinine, BUN, and red cell distribution width were major drivers of the model's prediction. Conclusions An XGBoost model without imputation for prediction of a composite outcome of either death or dialysis in patients positive for COVID-19 had the best performance, as compared with standard and other machine learning models. KW - COVID-19 KW - dialysis KW - machine learning KW - prediction KW - AKI Y1 - 2021 U6 - https://doi.org/10.2215/CJN.17311120 SN - 1555-9041 SN - 1555-905X VL - 16 IS - 8 SP - 1158 EP - 1168 PB - American Society of Nephrology CY - Washington ER - TY - JOUR A1 - Vaid, Akhil A1 - Somani, Sulaiman A1 - Russak, Adam J. A1 - De Freitas, Jessica K. A1 - Chaudhry, Fayzan F. A1 - Paranjpe, Ishan A1 - Johnson, Kipp W. A1 - Lee, Samuel J. A1 - Miotto, Riccardo A1 - Richter, Felix A1 - Zhao, Shan A1 - Beckmann, Noam D. A1 - Naik, Nidhi A1 - Kia, Arash A1 - Timsina, Prem A1 - Lala, Anuradha A1 - Paranjpe, Manish A1 - Golden, Eddye A1 - Danieletto, Matteo A1 - Singh, Manbir A1 - Meyer, Dara A1 - O'Reilly, Paul F. A1 - Huckins, Laura A1 - Kovatch, Patricia A1 - Finkelstein, Joseph A1 - Freeman, Robert M.
A1 - Argulian, Edgar A1 - Kasarskis, Andrew A1 - Percha, Bethany A1 - Aberg, Judith A. A1 - Bagiella, Emilia A1 - Horowitz, Carol R. A1 - Murphy, Barbara A1 - Nestler, Eric J. A1 - Schadt, Eric E. A1 - Cho, Judy H. A1 - Cordon-Cardo, Carlos A1 - Fuster, Valentin A1 - Charney, Dennis S. A1 - Reich, David L. A1 - Böttinger, Erwin A1 - Levin, Matthew A. A1 - Narula, Jagat A1 - Fayad, Zahi A. A1 - Just, Allan C. A1 - Charney, Alexander W. A1 - Nadkarni, Girish N. A1 - Glicksberg, Benjamin S. T1 - Machine learning to predict mortality and critical events in a cohort of patients with COVID-19 in New York City: model development and validation JF - Journal of medical internet research : international scientific journal for medical research, information and communication on the internet ; JMIR N2 - Background: COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking. Objective: The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points. Methods: We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19-positive patients admitted from March 15 to May 22, 2020. 
The models were first trained on patients from a single hospital (n=1514) before or on May 1, externally validated on patients from four other hospitals (n=2201) before or on May 1, and prospectively validated on all patients after May 1 (n=383). Finally, we established model interpretability to identify and rank variables that drive model predictions. Results: Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction. Conclusions: We externally and prospectively trained and validated machine learning models for mortality and critical events for patients with COVID-19 at different time horizons. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes. KW - machine learning KW - COVID-19 KW - electronic health record KW - TRIPOD KW - clinical KW - informatics KW - prediction KW - mortality KW - EHR KW - cohort KW - hospital KW - performance Y1 - 2020 U6 - https://doi.org/10.2196/24018 SN - 1439-4456 SN - 1438-8871 VL - 22 IS - 11 PB - Healthcare World CY - Richmond, Va. 
ER - TY - JOUR A1 - Cseh, Agnes A1 - Heeger, Klaus T1 - The stable marriage problem with ties and restricted edges JF - Discrete optimization N2 - In the stable marriage problem, a set of men and a set of women are given, each of whom has a strictly ordered preference list over the acceptable agents in the opposite class. A matching is called stable if it is not blocked by any pair of agents, who mutually prefer each other to their respective partner. Ties in the preferences allow for three different definitions for a stable matching: weak, strong and super-stability. Besides this, acceptable pairs in the instance can be restricted in their ability of blocking a matching or being part of it, which again generates three categories of restrictions on acceptable pairs. Forced pairs must be in a stable matching, forbidden pairs must not appear in it, and lastly, free pairs cannot block any matching. Our computational complexity study targets the existence of a stable solution for each of the three stability definitions, in the presence of each of the three types of restricted pairs. We solve all cases that were still open. As a byproduct, we also derive that the maximum size weakly stable matching problem is hard even in very dense graphs, which may be of independent interest. KW - stable matchings KW - restricted edges KW - complexity Y1 - 2020 U6 - https://doi.org/10.1016/j.disopt.2020.100571 SN - 1572-5286 SN - 1873-636X VL - 36 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Caruccio, Loredana A1 - Deufemia, Vincenzo A1 - Naumann, Felix A1 - Polese, Giuseppe T1 - Discovering relaxed functional dependencies based on multi-attribute dominance JF - IEEE transactions on knowledge and data engineering N2 - With the advent of big data and data lakes, data are often integrated from multiple sources. Such integrated data are often of poor quality, due to inconsistencies, errors, and so forth. One way to check the quality of data is to infer functional dependencies (fds). 
However, in many modern applications it might be necessary to extract properties and relationships that are not captured through fds, due to the necessity to admit exceptions, or to consider similarity rather than equality of data values. Relaxed fds (rfds) have been introduced to meet these needs, but their discovery from data adds further complexity to an already complex problem, also due to the necessity of specifying similarity and validity thresholds. We propose Domino, a new discovery algorithm for rfds that exploits the concept of dominance in order to derive similarity thresholds of attribute values while inferring rfds. An experimental evaluation on real datasets demonstrates the discovery performance and the effectiveness of the proposed algorithm. KW - Complexity theory KW - Approximation algorithms KW - Big Data KW - Distributed databases KW - Semantics KW - Lakes KW - Functional dependencies KW - data profiling KW - data cleansing Y1 - 2020 U6 - https://doi.org/10.1109/TKDE.2020.2967722 SN - 1041-4347 SN - 1558-2191 VL - 33 IS - 9 SP - 3212 EP - 3228 PB - Institute of Electrical and Electronics Engineers CY - New York, NY ER - TY - JOUR A1 - Quinzan, Francesco A1 - Göbel, Andreas A1 - Wagner, Markus A1 - Friedrich, Tobias T1 - Evolutionary algorithms and submodular functions BT - benefits of heavy-tailed mutations JF - Natural computing : an innovative journal bridging biosciences and computer sciences ; an international journal N2 - A core operator of evolutionary algorithms (EAs) is the mutation. Recently, much attention has been devoted to the study of mutation operators with dynamic and non-uniform mutation rates. Following up on this area of work, we propose a new mutation operator and analyze its performance on the (1 + 1) Evolutionary Algorithm (EA).
Our analyses show that this mutation operator competes with pre-existing ones, when used by the (1 + 1) EA on classes of problems for which results on the other mutation operators are available. We show that the (1 + 1) EA using our mutation operator finds a (1/3)-approximation ratio on any non-negative submodular function in polynomial time. We also consider the problem of maximizing a symmetric submodular function under a single matroid constraint and show that the (1 + 1) EA using our operator finds a (1/3)-approximation within polynomial time. This performance matches that of combinatorial local search algorithms specifically designed to solve these problems and outperforms them with constant probability. Finally, we evaluate the performance of the (1 + 1) EA using our operator experimentally by considering two applications: (a) the maximum directed cut problem on real-world graphs of different origins, with up to 6.6 million vertices and 56 million edges and (b) the symmetric mutual information problem using a four month period air pollution data set. In comparison with uniform mutation and a recently proposed dynamic scheme, our operator comes out on top on these instances. KW - Evolutionary algorithms KW - Mutation operators KW - Submodular functions KW - Matroids Y1 - 2021 U6 - https://doi.org/10.1007/s11047-021-09841-7 SN - 1572-9796 VL - 20 IS - 3 SP - 561 EP - 575 PB - Springer Science + Business Media B.V. CY - Dordrecht ER - TY - JOUR A1 - Henkenjohann, Richard T1 - Role of individual motivations and privacy concerns in the adoption of German electronic patient record apps BT - a mixed-methods study JF - International journal of environmental research and public health : IJERPH / Molecular Diversity Preservation International N2 - Germany's electronic patient record ("ePA") launched in 2021 with several attempts and years of delay. The development of such a large-scale project is a complex task, and so is its adoption. 
Individual attitudes towards an electronic health record are crucial, as individuals can reject opting-in to it and making any national efforts unachievable. Although the integration of an electronic health record serves potential benefits, it also constitutes risks for an individual's privacy. With a mixed-methods study design, this work provides evidence that different types of motivations and contextual privacy antecedents affect usage intentions towards the ePA. Most significantly, individual motivations stemming from feelings of volition or external mandates positively affect ePA adoption, although internal incentives are more powerful. KW - personal electronic health records KW - technology adoption KW - endogenous motivations KW - health information privacy concern KW - mixed-methods KW - ePA Y1 - 2021 U6 - https://doi.org/10.3390/ijerph18189553 SN - 1660-4601 VL - 18 IS - 18 PB - MDPI CY - Basel ER - TY - JOUR A1 - Shekhar, Sumit A1 - Reimann, Max A1 - Mayer, Maximilian A1 - Semmo, Amir A1 - Pasewaldt, Sebastian A1 - Döllner, Jürgen A1 - Trapp, Matthias T1 - Interactive photo editing on smartphones via intrinsic decomposition JF - Computer graphics forum : journal of the European Association for Computer Graphics N2 - Intrinsic decomposition refers to the problem of estimating scene characteristics, such as albedo and shading, when one view or multiple views of a scene are provided. The inverse problem setting, where multiple unknowns are solved given a single known pixel-value, is highly under-constrained. When provided with correlating image and depth data, intrinsic scene decomposition can be facilitated using depth-based priors, which nowadays is easy to acquire with high-end smartphones by utilizing their depth sensors. In this work, we present a system for intrinsic decomposition of RGB-D images on smartphones and the algorithmic as well as design choices therein.
Unlike state-of-the-art methods that assume only diffuse reflectance, we consider both diffuse and specular pixels. For this purpose, we present a novel specularity extraction algorithm based on a multi-scale intensity decomposition and chroma inpainting. At this, the diffuse component is further decomposed into albedo and shading components. We use an inertial proximal algorithm for non-convex optimization (iPiano) to ensure albedo sparsity. Our GPU-based visual processing is implemented on iOS via the Metal API and enables interactive performance on an iPhone 11 Pro. Further, a qualitative evaluation shows that we are able to obtain high-quality outputs. Furthermore, our proposed approach for specularity removal outperforms state-of-the-art approaches for real-world images, while our albedo and shading layer decomposition is faster than the prior work at a comparable output quality. Manifold applications such as recoloring, retexturing, relighting, appearance editing, and stylization are shown, each using the intrinsic layers obtained with our method and/or the corresponding depth data. KW - CCS Concepts KW - Computing methodologies KW - Image-based rendering KW - Image processing KW - Computational photography Y1 - 2021 U6 - https://doi.org/10.1111/cgf.142650 SN - 0167-7055 SN - 1467-8659 VL - 40 SP - 497 EP - 510 PB - Blackwell CY - Oxford ER - TY - JOUR A1 - Schneider, Sven A1 - Lambers, Leen A1 - Orejas, Fernando T1 - A logic-based incremental approach to graph repair featuring delta preservation JF - International journal on software tools for technology transfer : STTT N2 - We introduce a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least-changing graph repairs from which a user may select a graph repair based on non-formalized further requirements.
This incremental approach features delta preservation as it allows to restrict the generation of graph repairs to delta-preserving graph repairs, which do not revert the additions and deletions of the most recent consistency-violating graph update. We specify consistency of graphs using the logic of nested graph conditions, which is equivalent to first-order logic on graphs. Technically, the incremental approach encodes if and how the graph under repair satisfies a graph condition using the novel data structure of satisfaction trees, which are adapted incrementally according to the graph updates applied. In addition to the incremental approach, we also present two state-based graph repair algorithms, which restore consistency of a graph independent of the most recent graph update and which generate additional graph repairs using a global perspective on the graph under repair. We evaluate the developed algorithms using our prototypical implementation in the tool AutoGraph and illustrate our incremental approach using a case study from the graph database domain. KW - Nested graph conditions KW - Graph repair KW - Model repair KW - Consistency restoration KW - Delta preservation KW - Graph databases KW - Model-driven engineering Y1 - 2021 U6 - https://doi.org/10.1007/s10009-020-00584-x SN - 1433-2779 SN - 1433-2787 VL - 23 IS - 3 SP - 369 EP - 410 PB - Springer CY - Berlin ; Heidelberg ER - TY - JOUR A1 - Olamoyegun, Michael Adeyemi A1 - Raimi, Taiwo Hassan A1 - Ala, Oluwabukola Ayodele A1 - Fadare, Joseph Olusesan T1 - Mobile phone ownership and willingness to receive mHealth services among patients with diabetes mellitus in South-West, Nigeria JF - Pan African medical journal : PAMJ N2 - Introduction: mobile phone technology is increasingly used to overcome traditional barriers to limiting access to diabetes care.
This study evaluated mobile phone ownership and willingness to receive and pay for mobile phone-based diabetic services among people with diabetes in South-West, Nigeria. Methods: two hundred and fifty nine patients with diabetes were consecutively recruited from three tertiary health institutions in South-West, Nigeria. A questionnaire was used to evaluate mobile phone ownership, willingness to receive and pay for mobile phone-based diabetic health care services via voice call and text messaging. Results: 97.3% owned a mobile phone, with 38.9% and 61.1% owning smartphone and basic phone respectively. Males were significantly more willing to receive mobile-phone-based health services than females (81.1% vs 68.1%, p=0.025), likewise married compared to unmarried (77.4% vs 57.1%, p=0.036). Voice calls (41.3%) and text messages (32.4%), were the most preferred modes of receiving diabetes-related health education with social media (3.1%) and email (1.5%) least. Almost three-quarters of participants (72.6%) who owned mobile phone, were willing to receive mobile phone-based diabetes health services. The educational status of patients (adjusted OR [AOR]: 1.7 [95% CI: 1.6 to 2.1]), glucometer possession (AOR: 2.0 [95% CI: 1.9 to 2.1]) and type of mobile phone owned (AOR: 2.9 [95% CI: 2.8 to 5.0]) were significantly associated with the willingness to receive mobile phone-based diabetic services. Conclusion: the majority of study participants owned mobile phones and would be willing to receive and pay for diabetes-related healthcare delivery services provided the cost is minimal and affordable. KW - mobile phone KW - ownership KW - diabetes KW - healthcare KW - Nigeria Y1 - 2020 U6 - https://doi.org/10.11604/pamj.2020.37.29.25174 SN - 1937-8688 VL - 37 PB - African Field Epidemiology Network (AFENET) CY - Kampala, Uganda ER - TY - JOUR A1 - Sigel, Keith Magnus A1 - Swartz, Talia H.
A1 - Golden, Eddye A1 - Paranjpe, Ishan A1 - Somani, Sulaiman A1 - Richter, Felix A1 - De Freitas, Jessica K. A1 - Miotto, Riccardo A1 - Zhao, Shan A1 - Polak, Paz A1 - Mutetwa, Tinaye A1 - Factor, Stephanie A1 - Mehandru, Saurabh A1 - Mullen, Michael A1 - Cossarini, Francesca A1 - Böttinger, Erwin A1 - Fayad, Zahi A1 - Merad, Miriam A1 - Gnjatic, Sacha A1 - Aberg, Judith A1 - Charney, Alexander A1 - Nadkarni, Girish A1 - Glicksberg, Benjamin S. T1 - Coronavirus 2019 and people living with human immunodeficiency virus BT - outcomes for hospitalized patients in New York City JF - Clinical infectious diseases : electronic edition N2 - Background: There are limited data regarding the clinical impact of coronavirus disease 2019 (COVID-19) on people living with human immunodeficiency virus (PLWH). In this study, we compared outcomes for PLWH with COVID-19 to a matched comparison group. Methods: We identified 88 PLWH hospitalized with laboratory-confirmed COVID-19 in our hospital system in New York City between 12 March and 23 April 2020. We collected data on baseline clinical characteristics, laboratory values, HIV status, treatment, and outcomes from this group and matched comparators (1 PLWH to up to 5 patients by age, sex, race/ethnicity, and calendar week of infection). We compared clinical characteristics and outcomes (death, mechanical ventilation, hospital discharge) for these groups, as well as cumulative incidence of death by HIV status. Results: Patients did not differ significantly by HIV status by age, sex, or race/ethnicity due to the matching algorithm. PLWH hospitalized with COVID-19 had high proportions of HIV virologic control on antiretroviral therapy. PLWH had greater proportions of smoking (P < .001) and comorbid illness than uninfected comparators. There was no difference in COVID-19 severity on admission by HIV status (P = .15). 
Poor outcomes for hospitalized PLWH were frequent but similar to proportions in comparators; 18% required mechanical ventilation and 21% died during follow-up (compared with 23% and 20%, respectively). There was similar cumulative incidence of death over time by HIV status (P = .94). Conclusions: We found no differences in adverse outcomes associated with HIV infection for hospitalized COVID-19 patients compared with a demographically similar patient group. KW - human immunodeficiency virus KW - coronavirus 2019 KW - severe acute respiratory syndrome coronavirus 2 Y1 - 2020 U6 - https://doi.org/10.1093/cid/ciaa880 SN - 1058-4838 SN - 1537-6591 VL - 71 IS - 11 SP - 2933 EP - 2938 PB - Oxford Univ. Press CY - Cary, NC ER - TY - JOUR A1 - Doerr, Benjamin A1 - Krejca, Martin S. T1 - Significance-based estimation-of-distribution algorithms JF - IEEE transactions on evolutionary computation N2 - Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that create a probabilistic model of the solution space, which is updated iteratively, based on the quality of the solutions sampled according to the model. As previous works show, this iteration-based perspective can lead to erratic updates of the model, in particular, to bit-frequencies approaching a random boundary value. In order to overcome this problem, we propose a new EDA based on the classic compact genetic algorithm (cGA) that takes into account a longer history of samples and updates its model only with respect to information which it classifies as statistically significant. We prove that this significance-based cGA (sig-cGA) optimizes the commonly regarded benchmark functions OneMax (OM), LeadingOnes, and BinVal all in quasilinear time, a result shown for no other EDA or evolutionary algorithm so far.
For the recently proposed stable compact genetic algorithm (an EDA that tries to prevent erratic model updates by imposing a bias to the uniformly distributed model), we prove that it optimizes OM only in a time exponential in its hypothetical population size. Similarly, we show that the convex search algorithm cannot optimize OM in polynomial time. KW - heuristic algorithms KW - sociology KW - statistics KW - history KW - probabilistic logic KW - benchmark testing KW - genetic algorithms KW - estimation-of-distribution algorithm (EDA) KW - run time analysis KW - theory Y1 - 2020 U6 - https://doi.org/10.1109/TEVC.2019.2956633 SN - 1089-778X SN - 1941-0026 VL - 24 IS - 6 SP - 1025 EP - 1034 PB - Institute of Electrical and Electronics Engineers CY - New York, NY ER - TY - JOUR A1 - Thamsen, Lauritz A1 - Beilharz, Jossekin Jakob A1 - Tran, Vinh Thuy A1 - Nedelkoski, Sasho A1 - Kao, Odej T1 - Mary, Hugo, and Hugo* BT - learning to schedule distributed data-parallel processing jobs on shared clusters JF - Concurrency and computation : practice & experience N2 - Distributed data-parallel processing systems like MapReduce, Spark, and Flink are popular for analyzing large datasets using cluster resources. Resource management systems like YARN or Mesos in turn allow multiple data-parallel processing jobs to share cluster resources in temporary containers. Often, the containers do not isolate resource usage to achieve high degrees of overall resource utilization despite overprovisioning and the often fluctuating utilization of specific jobs. However, some combinations of jobs utilize resources better and interfere less with each other when running on the same shared nodes than others. This article presents an approach for improving the resource utilization and job throughput when scheduling recurring distributed data-parallel processing jobs in shared clusters.
The approach is based on reinforcement learning and a measure of co-location goodness to have cluster schedulers learn over time which jobs are best executed together on shared resources. We evaluated this approach over the last years with three prototype schedulers that build on each other: Mary, Hugo, and Hugo*. For the evaluation we used exemplary Flink and Spark jobs from different application domains and clusters of commodity nodes managed by YARN. The results of these experiments show that our approach can increase resource utilization and job throughput significantly. KW - cluster resource management KW - distributed data-parallel processing KW - job co-location KW - reinforcement learning KW - self-learning scheduler Y1 - 2020 U6 - https://doi.org/10.1002/cpe.5823 SN - 1532-0626 SN - 1532-0634 VL - 33 IS - 18 PB - Wiley CY - Hoboken ER - TY - JOUR A1 - Oosthoek, Kris A1 - Doerr, Christian T1 - Cyber threat intelligence: A product without a process? JF - International journal of intelligence and counterintelligence Y1 - 2020 U6 - https://doi.org/10.1080/08850607.2020.1780062 SN - 0885-0607 SN - 1521-0561 VL - 34 IS - 2 SP - 300 EP - 315 PB - Taylor & Francis CY - London ER - TY - JOUR A1 - Birnick, Johann A1 - Bläsius, Thomas A1 - Friedrich, Tobias A1 - Naumann, Felix A1 - Papenbrock, Thorsten A1 - Schirneck, Friedrich Martin T1 - Hitting set enumeration with partial information for unique column combination discovery JF - Proceedings of the VLDB Endowment N2 - Unique column combinations (UCCs) are a fundamental concept in relational databases. They identify entities in the data and support various data management activities. Still, UCCs are usually not explicitly defined and need to be discovered. State-of-the-art data profiling algorithms are able to efficiently discover UCCs in moderately sized datasets, but they tend to fail on large and, in particular, on wide datasets due to run time and memory limitations.
In this paper, we introduce HPIValid, a novel UCC discovery algorithm that implements a faster and more resource-saving search strategy. HPIValid models the metadata discovery as a hitting set enumeration problem in hypergraphs. In this way, it combines efficient discovery techniques from data profiling research with the most recent theoretical insights into enumeration algorithms. Our evaluation shows that HPIValid is not only orders of magnitude faster than related work, it also has a much smaller memory footprint. Y1 - 2020 U6 - https://doi.org/10.14778/3407790.3407824 SN - 2150-8097 VL - 13 IS - 11 SP - 2270 EP - 2283 PB - Association for Computing Machinery CY - [New York, NY] ER - TY - JOUR A1 - Lewkowicz, Daniel A1 - Wohlbrandt, Attila M. A1 - Böttinger, Erwin T1 - Digital therapeutic care apps with decision-support interventions for people with low back pain in Germany BT - Cost-effectiveness analysis JF - JMIR mhealth and uhealth N2 - Background: Digital therapeutic care apps provide a new effective and scalable approach for people with nonspecific low back pain (LBP). Digital therapeutic care apps are also driven by personalized decision-support interventions that support the user in self-managing LBP, and may induce prolonged behavior change to reduce the frequency and intensity of pain episodes. However, these therapeutic apps are associated with high attrition rates, and the initial prescription cost is higher than that of face-to-face physiotherapy. In Germany, digital therapeutic care apps are now being reimbursed by statutory health insurance; however, price targets and cost-driving factors for the formation of the reimbursement rate remain unexplored. Objective: The aim of this study was to evaluate the cost-effectiveness of a digital therapeutic care app compared to treatment as usual (TAU) in Germany. We further aimed to explore under which circumstances the reimbursement rate could be modified to consider value-based pricing. 
Methods: We developed a state-transition Markov model based on a best-practice analysis of prior LBP-related decision-analytic models, and evaluated the cost utility of a digital therapeutic care app compared to TAU in Germany. Based on a 3-year time horizon, we simulated the incremental cost and quality-adjusted life years (QALYs) for people with nonacute LBP from the societal perspective. In the deterministic sensitivity and scenario analyses, we focused on diverging attrition rates and app cost to assess our model's robustness and conditions for changing the reimbursement rate. All costs are reported in Euro (€1 = US $1.12). Results: Our base case results indicated that the digital therapeutic care strategy led to an incremental cost of €121.59, but also generated 0.0221 additional QALYs compared to the TAU strategy, with an estimated incremental cost-effectiveness ratio (ICER) of €5486 per QALY. The sensitivity analysis revealed that the reimbursement rate and the capability of digital therapeutic care to prevent reoccurring LBP episodes have a significant impact on the ICER. At the same time, the other parameters remained unaffected and thus supported the robustness of our model. In the scenario analysis, the different model time horizons and attrition rates strongly influenced the economic outcome. Reducing the cost of the app to €99 per 3 months or decreasing the app's attrition rate resulted in digital therapeutic care being significantly less costly with more generated QALYs, and is thus considered to be the dominant strategy over TAU. Conclusions: The current reimbursement rate for a digital therapeutic care app in the statutory health insurance can be considered a cost-effective measure compared to TAU. The app's attrition rate and effect on the patient's prolonged behavior change essentially influence the settlement of an appropriate reimbursement rate.
Future value-based pricing targets should focus on additional outcome parameters besides pain intensity and functional disability by including attrition rates and the app's long-term effect on quality of life. KW - cost-utility analysis KW - low back pain KW - back pain KW - cost-effectiveness KW - Markov model KW - digital therapy KW - digital health app KW - mHealth KW - orthopedic KW - eHealth KW - mobile health KW - digital health KW - pain management KW - health apps Y1 - 2022 U6 - https://doi.org/10.2196/35042 SN - 2291-5222 VL - 10 IS - 2 PB - JMIR Publications CY - Toronto ER - TY - JOUR A1 - Kulahcioglu, Tugba A1 - Melo, Gerard de T1 - Affect-aware word clouds JF - ACM transactions on interactive intelligent systems N2 - Word clouds are widely used for non-analytic purposes, such as introducing a topic to students, or creating a gift with personally meaningful text. Surveys show that users prefer tools that yield word clouds with a stronger emotional impact. Fonts and color palettes are powerful typographical signals that may determine this impact. Typically, these signals are assigned randomly, or expected to be chosen by the users. We present an affect-aware font and color palette selection methodology that aims to facilitate more informed choices. We infer associations of fonts with a set of eight affects, and evaluate the resulting data in a series of user studies both on individual words as well as in word clouds. Relying on a recent study to procure affective color palettes, we carry out a similar user study to understand the impact of color choices on word clouds. Our findings suggest that both fonts and color palettes are powerful tools contributing to the affects evoked by a word cloud. The experiments further confirm that the novel datasets we propose are successful in enabling this. We also find that, for the majority of the affects, both signals need to be congruent to create a stronger impact.
Based on this data, we implement a prototype that allows users to specify a desired affect and recommends congruent fonts and color palettes for the word. KW - affective interfaces KW - word clouds KW - typography KW - color palettes Y1 - 2020 U6 - https://doi.org/10.1145/3370928 SN - 2160-6455 SN - 2160-6463 VL - 10 IS - 4 PB - Association for Computing Machinery CY - New York, NY ER - TY - JOUR A1 - Krejca, Martin S. A1 - Witt, Carsten T1 - Lower bounds on the run time of the Univariate Marginal Distribution Algorithm on OneMax JF - Theoretical computer science : the journal of the EATCS N2 - The Univariate Marginal Distribution Algorithm (UMDA) - a popular estimation-of-distribution algorithm - is studied from a run time perspective. On the classical OneMax benchmark function on bit strings of length n, a lower bound of Ω(λ + μ√n + n log n), where μ and λ are algorithm-specific parameters, on its expected run time is proved. This is the first direct lower bound on the run time of UMDA. It is stronger than the bounds that follow from general black-box complexity theory and is matched by the run time of many evolutionary algorithms. The results are obtained through advanced analyses of the stochastic change of the frequencies of bit values maintained by the algorithm, including carefully designed potential functions. These techniques may prove useful in advancing the field of run time analysis for estimation-of-distribution algorithms in general. KW - estimation-of-distribution algorithm KW - run time analysis KW - lower bound Y1 - 2020 U6 - https://doi.org/10.1016/j.tcs.2018.06.004 SN - 0304-3975 SN - 1879-2294 VL - 832 SP - 143 EP - 165 PB - Elsevier CY - Amsterdam [u.a.] ER - TY - JOUR A1 - Bohn, Nicolai A1 - Kundisch, Dennis T1 - What are we talking about when we talk about technology pivots?
BT - a Delphi study JF - Information & management N2 - Technology pivots were designed to help digital startups make adjustments to the technology underpinning their products and services. While academia and the media make liberal use of the term "technology pivot," they rarely align themselves to Ries' foundational conceptualization. Recent research suggests that a more granulated conceptualization of technology pivots is required. To scientifically derive a comprehensive conceptualization, we conduct a Delphi study with a panel of 38 experts drawn from academia and practice to explore their understanding of "technology pivots." Our study thus makes an important contribution to advance the seminal work by Ries on technology pivots. KW - digital startup KW - lean startup approach KW - technology pivot KW - conceptualization KW - Delphi study Y1 - 2020 U6 - https://doi.org/10.1016/j.im.2020.103319 SN - 0378-7206 SN - 1872-7530 VL - 57 IS - 6 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Lambers, Leen A1 - Weber, Jens T1 - Preface to the special issue on the 11th International Conference on Graph Transformation JF - Journal of Logical and Algebraic Methods in Programming N2 - This special issue contains extended versions of four selected papers from the 11th International Conference on Graph Transformation (ICGT 2018). The articles cover a tool for computing core graphs via SAT/SMT solvers (graph language definition), graph transformation through graph surfing in reaction systems (a new graph transformation formalism), the essence and initiality of conflicts in M-adhesive transformation systems, and a calculus of concurrent graph-rewriting processes (theory on conflicts and parallel independence). KW - graph transformation KW - graph languages KW - conflicts and dependencies in concurrent graph rewriting Y1 - 2020 U6 - https://doi.org/10.1016/j.jlamp.2020.100525 SN - 2352-2208 VL - 112 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Cope, Justin L.
A1 - Baukmann, Hannes A. A1 - Klinger, Jörn E. A1 - Ravarani, Charles N. J. A1 - Böttinger, Erwin A1 - Konigorski, Stefan A1 - Schmidt, Marco F. T1 - Interaction-based feature selection algorithm outperforms polygenic risk score in predicting Parkinson’s Disease status JF - Frontiers in genetics N2 - Polygenic risk scores (PRS) aggregating results from genome-wide association studies are the state of the art in the prediction of susceptibility to complex traits or diseases, yet their predictive performance is limited for various reasons, not least of which is their failure to incorporate the effects of gene-gene interactions. Novel machine learning algorithms that use large amounts of data promise to find gene-gene interactions in order to build models with better predictive performance than PRS. Here, we present a data preprocessing step by using data-mining of contextual information to reduce the number of features, enabling machine learning algorithms to identify gene-gene interactions. We applied our approach to the Parkinson's Progression Markers Initiative (PPMI) dataset, an observational clinical study of 471 genotyped subjects (368 cases and 152 controls). With an AUC of 0.85 (95% CI = [0.72; 0.96]), the interaction-based prediction model outperforms the PRS (AUC of 0.58 (95% CI = [0.42; 0.81])). Furthermore, feature importance analysis of the model provided insights into the mechanism of Parkinson's disease. For instance, the model revealed an interaction of previously described drug target candidate genes TMEM175 and GAPDHP25. These results demonstrate that interaction-based machine learning models can improve genetic prediction models and might provide an answer to the missing heritability problem. 
KW - epistasis KW - machine learning KW - feature selection KW - parkinson's disease KW - PPMI (parkinson's progression markers initiative) Y1 - 2021 U6 - https://doi.org/10.3389/fgene.2021.744557 SN - 1664-8021 VL - 12 PB - Frontiers Media CY - Lausanne ER - TY - JOUR A1 - Siddiqi, Muhammad Ali A1 - Dörr, Christian A1 - Strydis, Christos T1 - IMDfence BT - architecting a secure protocol for implantable medical devices JF - IEEE access N2 - Over the past decade, focus on the security and privacy aspects of implantable medical devices (IMDs) has intensified, driven by the multitude of cybersecurity vulnerabilities found in various existing devices. However, due to their strict computational, energy and physical constraints, conventional security protocols are not directly applicable to IMDs. Custom-tailored schemes have been proposed instead; however, they fail to cover the full spectrum of security features that modern IMDs and their ecosystems so critically require. In this paper, we propose IMDfence, a security protocol for IMD ecosystems that provides a comprehensive yet practical security portfolio, which includes availability, non-repudiation, access control, entity authentication, remote monitoring and system scalability. The protocol also allows emergency access that results in the graceful degradation of offered services without compromising security and patient safety. The performance of the security protocol as well as its feasibility and impact on modern IMDs are extensively analyzed and evaluated. We find that IMDfence meets the above security requirements at the cost of less than a 7% increase in total IMD energy consumption, and increases of less than 14 ms in system delay and less than 9 kB in memory footprint.
KW - protocols KW - implants KW - authentication KW - ecosystems KW - remote monitoring KW - scalability KW - authentication protocol KW - battery-depletion attack KW - battery KW - DoS KW - denial-of-service attack KW - IMD KW - implantable medical device KW - non-repudiation KW - smart card KW - zero-power defense Y1 - 2020 U6 - https://doi.org/10.1109/ACCESS.2020.3015686 SN - 2169-3536 VL - 8 SP - 147948 EP - 147964 PB - Institute of Electrical and Electronics Engineers CY - Piscataway ER - TY - JOUR A1 - Dellepiane, Sergio A1 - Vaid, Akhil A1 - Jaladanki, Suraj K. A1 - Coca, Steven A1 - Fayad, Zahi A. A1 - Charney, Alexander W. A1 - Böttinger, Erwin A1 - He, John Cijiang A1 - Glicksberg, Benjamin S. A1 - Chan, Lili A1 - Nadkarni, Girish T1 - Acute kidney injury in patients hospitalized with COVID-19 in New York City BT - temporal trends from March 2020 to April 2021 JF - Kidney medicine Y1 - 2021 U6 - https://doi.org/10.1016/j.xkme.2021.06.008 SN - 2590-0595 VL - 3 IS - 5 SP - 877 EP - 879 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Kötzing, Timo A1 - Lagodzinski, Julius Albert Gregor A1 - Lengler, Johannes A1 - Melnichenko, Anna T1 - Destructiveness of lexicographic parsimony pressure and alleviation by a concatenation crossover in genetic programming JF - Theoretical computer science N2 - For theoretical analyses, two specifics distinguish genetic programming (GP) from many other areas of evolutionary computation: the variable-size representations, which in particular can yield bloat (i.e., the growth of individuals with redundant parts), and the role and realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover has received surprisingly little attention in this work.
We analyze a simple crossover operator in combination with randomized local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); we denote the resulting algorithm Concatenation Crossover GP. We consider three variants of the well-studied MAJORITY test function, adding large plateaus in different ways to the fitness landscape and thus giving a test bed for analyzing the interplay of variation operators and bloat control mechanisms in a setting with local optima. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants independent of employing bloat control. KW - genetic programming KW - mutation KW - theory KW - run time analysis Y1 - 2020 U6 - https://doi.org/10.1016/j.tcs.2019.11.036 SN - 0304-3975 VL - 816 SP - 96 EP - 113 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Hacker, Philipp A1 - Krestel, Ralf A1 - Grundmann, Stefan A1 - Naumann, Felix T1 - Explainable AI under contract and tort law BT - legal incentives and technical challenges JF - Artificial intelligence and law N2 - This paper shows that the law, in subtle ways, may set hitherto unrecognized incentives for the adoption of explainable machine learning applications. In doing so, we make two novel contributions. First, on the legal side, we show that to avoid liability, professional actors, such as doctors and managers, may soon be legally compelled to use explainable ML models. We argue that the importance of explainability reaches far beyond data protection law, and crucially influences questions of contractual and tort liability for the use of ML models. To this effect, we conduct two legal case studies, in medical and corporate merger applications of ML.
As a second contribution, we discuss the (legally required) trade-off between accuracy and explainability and demonstrate the effect in a technical case study in the context of spam classification. KW - explainability KW - explainable AI KW - interpretable machine learning KW - contract law KW - tort law KW - explainability-accuracy trade-off KW - medical malpractice KW - corporate takeovers Y1 - 2020 U6 - https://doi.org/10.1007/s10506-020-09260-6 SN - 0924-8463 SN - 1572-8382 VL - 28 IS - 4 SP - 415 EP - 439 PB - Springer CY - Dordrecht ER - TY - JOUR A1 - Kaitoua, Abdulrahman A1 - Rabl, Tilmann A1 - Markl, Volker T1 - A distributed data exchange engine for polystores JF - Information technology : methods and applications of informatics and information technology JF - Information technology : Methoden und innovative Anwendungen der Informatik und Informationstechnik N2 - There is an increasing interest in fusing data from heterogeneous sources. Combining data sources increases the utility of existing datasets, generating new information and creating services of higher quality. A central issue in working with heterogeneous sources is data migration: in order to share and process data in different engines, resource-intensive and complex movements and transformations between computing engines, services, and stores are necessary. Muses is a distributed, high-performance data migration engine that is able to interconnect distributed data stores by forwarding, transforming, repartitioning, or broadcasting data among distributed engines' instances in a resource-, cost-, and performance-adaptive manner. As such, it performs seamless information sharing across all participating resources in a standard, modular manner. We show an overall improvement of 30% for pipelining jobs across multiple engines, even when we count the overhead of Muses in the execution time.
This performance gain implies that Muses can be used to optimise large pipelines that leverage multiple engines. KW - distributed systems KW - data migration KW - data transformation KW - big data KW - engine KW - data integration Y1 - 2020 U6 - https://doi.org/10.1515/itit-2019-0037 SN - 1611-2776 SN - 2196-7032 VL - 62 IS - 3-4 SP - 145 EP - 156 PB - De Gruyter CY - Berlin ER - TY - JOUR A1 - Isailović, Dušan A1 - Stojanovic, Vladeta A1 - Trapp, Matthias A1 - Richter, Rico A1 - Hajdin, Rade A1 - Döllner, Jürgen Roland Friedrich T1 - Bridge damage BT - detection, IFC-based semantic enrichment and visualization JF - Automation in construction : an international research journal N2 - Building Information Modeling (BIM) representations of bridges enriched by inspection data will add tremendous value to future Bridge Management Systems (BMSs). This paper presents an approach for point cloud-based detection of spalling damage, as well as integrating damage components into a BIM via semantic enrichment of an as-built Industry Foundation Classes (IFC) model. An approach for generating the as-built BIM, geometric reconstruction of detected damage point clusters and semantic enrichment of the corresponding IFC model is presented. Multiview classification is used and evaluated for the detection of spalling damage features. The semantic enrichment of as-built IFC models is based on injecting classified and reconstructed damage clusters back into the as-built IFC, thus generating an accurate as-is IFC model compliant with the BMS inspection requirements.
KW - damage detection KW - building information modeling KW - 3D point clouds KW - multiview classification KW - bridge management systems Y1 - 2020 U6 - https://doi.org/10.1016/j.autcon.2020.103088 SN - 0926-5805 SN - 1872-7891 VL - 112 PB - Elsevier CY - Amsterdam ER - TY - JOUR A1 - Pfitzner, Bjarne A1 - Steckhan, Nico A1 - Arnrich, Bert T1 - Federated learning in a medical context BT - a systematic literature review JF - ACM transactions on internet technology : TOIT / Association for Computing N2 - Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients' anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets. KW - Federated learning Y1 - 2021 U6 - https://doi.org/10.1145/3412357 SN - 1533-5399 SN - 1557-6051 VL - 21 IS - 2 SP - 1 EP - 31 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Rezaei, Mina A1 - Näppi, Janne J. 
A1 - Lippert, Christoph A1 - Meinel, Christoph A1 - Yoshida, Hiroyuki T1 - Generative multi-adversarial network for striking the right balance in abdominal image segmentation JF - International journal of computer assisted radiology and surgery N2 - Purpose: The identification of abnormalities that are relatively rare within otherwise normal anatomy is a major challenge for deep learning in the semantic segmentation of medical images. The small number of samples of the minority classes in the training data makes the learning of optimal classification challenging, while the more frequently occurring samples of the majority class hamper the generalization of the classification boundary between infrequently occurring target objects and classes. In this paper, we developed a novel generative multi-adversarial network, called Ensemble-GAN, for mitigating this class imbalance problem in the semantic segmentation of abdominal images. Method: The Ensemble-GAN framework is composed of a single-generator and a multi-discriminator variant for handling the class imbalance problem to provide a better generalization than existing approaches. The ensemble model aggregates the estimates of multiple models by training from different initializations and losses from various subsets of the training data. The single generator network analyzes the input image as a condition to predict a corresponding semantic segmentation image by use of feedback from the ensemble of discriminator networks. To evaluate the framework, we trained our framework on two public datasets, with different imbalance ratios and imaging modalities: the Chaos 2019 and the LiTS 2017. Result: In terms of the F1 score, the accuracies of the semantic segmentation of healthy spleen, liver, and left and right kidneys were 0.93, 0.96, 0.90 and 0.94, respectively. The overall F1 scores for simultaneous segmentation of the lesions and liver were 0.83 and 0.94, respectively. 
Conclusion: The proposed Ensemble-GAN framework demonstrated outstanding performance in the semantic segmentation of medical images in comparison with other approaches on popular abdominal imaging benchmarks. The Ensemble-GAN has the potential to segment abdominal images more accurately than human experts. KW - imbalanced learning KW - generative multi-discriminative networks KW - semantic segmentation KW - abdominal imaging Y1 - 2020 U6 - https://doi.org/10.1007/s11548-020-02254-4 SN - 1861-6410 SN - 1861-6429 VL - 15 IS - 11 SP - 1847 EP - 1858 PB - Springer CY - Berlin ER - TY - JOUR A1 - Ghahremani, Sona A1 - Giese, Holger A1 - Vogel, Thomas T1 - Improving scalability and reward of utility-driven self-healing for large dynamic architectures JF - ACM transactions on autonomous and adaptive systems N2 - Self-adaptation can be realized in various ways. Rule-based approaches prescribe the adaptation to be executed if the system or environment satisfies certain conditions. They result in scalable solutions but often with merely satisfying adaptation decisions. In contrast, utility-driven approaches determine optimal decisions by using an often costly optimization, which typically does not scale for large problems. We propose a rule-based and utility-driven adaptation scheme that achieves the benefits of both directions such that the adaptation decisions are optimal, whereas the computation scales by avoiding an expensive optimization. We use this adaptation scheme for architecture-based self-healing of large software systems. For this purpose, we define the utility for large dynamic architectures of such systems based on patterns that define issues the self-healing must address. Moreover, we use pattern-based adaptation rules to resolve these issues.
Using a pattern-based scheme to define the utility and adaptation rules allows us to compute the impact of each rule application on the overall utility and to realize an incremental and efficient utility-driven self-healing. In addition to formally analyzing the computational effort and optimality of the proposed scheme, we thoroughly demonstrate its scalability and optimality in terms of reward in comparative experiments with a static rule-based approach as a baseline and a utility-driven approach using a constraint solver. These experiments are based on different failure profiles derived from real-world failure logs. We also investigate the impact of different failure profile characteristics on the scalability and reward to evaluate the robustness of the different approaches. KW - self-healing KW - adaptation rules KW - architecture-based adaptation KW - utility KW - reward KW - scalability KW - performance KW - failure profile model Y1 - 2020 U6 - https://doi.org/10.1145/3380965 SN - 1556-4665 SN - 1556-4703 VL - 14 IS - 3 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Rose, Robert A1 - Hölzle, Katharina A1 - Björk, Jennie T1 - More than a quarter century of creativity and innovation management BT - the journal's characteristics, evolution, and a look ahead JF - Creativity and innovation management N2 - When this journal was founded in 1992 by Tudor Rickards and Susan Moger, there was no academic outlet available that addressed issues at the intersection of creativity and innovation. From zero to 1,163 records, from the new kid on the block to one of the leading journals in creativity and innovation management has been quite a journey, and we would like to reflect on the past 28 years and the intellectual and conceptual structure of Creativity and Innovation Management (CIM). 
Specifically, we highlight milestones and influential articles, identify how key journal characteristics evolved, outline the (co-)authorship structure, and finally, map the thematic landscape of CIM by means of a text-mining analysis. This study represents the first systematic and comprehensive assessment of the journal's published body of knowledge and helps to understand the journal's influence on the creativity and innovation management community. We conclude by discussing future topics and paths of the journal as well as limitations of our approach. KW - anniversary KW - bibliometrics KW - creativity and innovation management KW - science mapping Y1 - 2020 U6 - https://doi.org/10.1111/caim.12361 SN - 0963-1690 SN - 1467-8691 VL - 29 IS - 1 SP - 5 EP - 20 PB - Wiley-Blackwell CY - Oxford ER - TY - JOUR A1 - Draisbach, Uwe A1 - Christen, Peter A1 - Naumann, Felix T1 - Transforming pairwise duplicates to entity clusters for high-quality duplicate detection JF - ACM Journal of Data and Information Quality N2 - Duplicate detection algorithms produce clusters of database records, each cluster representing a single real-world entity. As most of these algorithms use pairwise comparisons, the resulting (transitive) clusters can be inconsistent: Not all records within a cluster are sufficiently similar to be classified as duplicate. Thus, one of many subsequent clustering algorithms can further improve the result.
We explain in detail, compare, and evaluate many of these algorithms and introduce three new clustering algorithms in the specific context of duplicate detection. Two of our three new algorithms use the structure of the input graph to create consistent clusters. Our third algorithm, and many other clustering algorithms, focus on the edge weights, instead. For evaluation, in contrast to related work, we experiment on true real-world datasets, and in addition examine in great detail various pair-selection strategies used in practice. While no overall winner emerges, we are able to identify best approaches for different situations. In scenarios with larger clusters, our proposed algorithm, Extended Maximum Clique Clustering (EMCC), and Markov Clustering show the best results. EMCC especially outperforms Markov Clustering regarding the precision of the results and additionally has the advantage that it can also be used in scenarios where edge weights are not available. KW - Record linkage KW - data matching KW - entity resolution KW - deduplication KW - clustering Y1 - 2019 U6 - https://doi.org/10.1145/3352591 SN - 1936-1955 SN - 1936-1963 VL - 12 IS - 1 SP - 1 EP - 30 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Bilò, Davide A1 - Lenzner, Pascal T1 - On the tree conjecture for the network creation game JF - Theory of computing systems N2 - Selfish Network Creation focuses on modeling real world networks from a game-theoretic point of view. One of the classic models by Fabrikant et al. (2003) is the network creation game, where agents correspond to nodes in a network which buy incident edges for the price of alpha per edge to minimize their total distance to all other nodes. The model is well-studied but still has intriguing open problems. The most famous conjectures state that the price of anarchy is constant for all alpha and that for alpha >= n all equilibrium networks are trees. 
We introduce a novel technique for analyzing stable networks for high edge-price alpha and employ it to improve on the best-known bound for the latter conjecture. In particular, we show that for alpha > 4n - 13, all equilibrium networks must be trees, which implies a constant price of anarchy for this range of alpha. Moreover, we also improve the constant upper bound on the price of anarchy for equilibrium trees. KW - network creation games KW - price of anarchy KW - tree conjecture KW - algorithmic game theory Y1 - 2019 U6 - https://doi.org/10.1007/s00224-019-09945-9 SN - 1432-4350 SN - 1433-0490 VL - 64 IS - 3 SP - 422 EP - 443 PB - Springer CY - New York ER - TY - JOUR A1 - Kastius, Alexander A1 - Schlosser, Rainer T1 - Dynamic pricing under competition using reinforcement learning JF - Journal of revenue and pricing management N2 - Dynamic pricing is considered a way to gain an advantage over competitors in modern online markets. The past advancements in Reinforcement Learning (RL) provided more capable algorithms that can be used to solve pricing problems. In this paper, we study the performance of Deep Q-Networks (DQN) and Soft Actor Critic (SAC) in different market models. We consider tractable duopoly settings, where optimal solutions derived by dynamic programming techniques can be used for verification, as well as oligopoly settings, which are usually intractable due to the curse of dimensionality. We find that both algorithms provide reasonable results, while SAC performs better than DQN. Moreover, we show that under certain conditions, RL algorithms can be forced into collusion by their competitors without direct communication.
KW - Dynamic pricing KW - Competition KW - Reinforcement learning KW - E-commerce KW - Price collusion Y1 - 2021 U6 - https://doi.org/10.1057/s41272-021-00285-3 SN - 1476-6930 SN - 1477-657X VL - 21 IS - 1 SP - 50 EP - 63 PB - Springer Nature Switzerland AG CY - Cham ER - TY - JOUR A1 - Mattis, Toni A1 - Beckmann, Tom A1 - Rein, Patrick A1 - Hirschfeld, Robert T1 - First-class concepts BT - Reified architectural knowledge beyond dominant decompositions JF - Journal of object technology : JOT / ETH Zürich, Department of Computer Science N2 - Ideally, programs are partitioned into independently maintainable and understandable modules. As a system grows, its architecture gradually loses the capability to accommodate new concepts in a modular way. While refactoring is expensive and not always possible, and the programming language might lack dedicated primary language constructs to express certain cross-cutting concerns, programmers are still able to explain and delineate convoluted concepts through secondary means: code comments, use of whitespace and arrangement of code, documentation, or communicating tacit knowledge.
Secondary constructs are easy to change and provide high flexibility in communicating cross-cutting concerns and other concepts among programmers. However, such secondary constructs usually have no reified representation that can be explored and manipulated as first-class entities through the programming environment.
In this exploratory work, we discuss novel ways to express a wide range of concepts, including cross-cutting concerns, patterns, and lifecycle artifacts, independently of the dominant decomposition imposed by an existing architecture. We propose the representation of concepts as first-class objects inside the programming environment that retain the capability to change as easily as code comments. We explore new tools that allow programmers to view, navigate, and change programs based on conceptual perspectives. In a small case study, we demonstrate how such views can be created and how the programming experience changes from draining programmers' attention by stretching it across multiple modules toward focusing it on cohesively presented concepts. Our designs are geared toward enabling multiple secondary perspectives on a system to coexist in symbiosis with the original architecture, hence making it easier to explore, understand, and explain complex contexts and narratives that are hard or impossible to express using primary modularity constructs. KW - software engineering KW - modularity KW - exploratory programming KW - program comprehension KW - remodularization KW - architecture recovery Y1 - 2022 U6 - https://doi.org/10.5381/jot.2022.21.2.a6 SN - 1660-1769 VL - 21 IS - 2 SP - 1 EP - 15 PB - ETH Zürich, Department of Computer Science CY - Zürich ER - TY - JOUR A1 - Döllner, Jürgen Roland Friedrich T1 - Geospatial artificial intelligence BT - potentials of machine learning for 3D point clouds and geospatial digital twins JF - Journal of photogrammetry, remote sensing and geoinformation science : PFG : Photogrammetrie, Fernerkundung, Geoinformation N2 - Artificial intelligence (AI) is fundamentally changing the way IT solutions are implemented and operated across all application domains, including the geospatial domain.
This contribution outlines AI-based techniques for 3D point clouds and geospatial digital twins as generic components of geospatial AI. First, we briefly reflect on the term "AI" and outline technology developments needed to apply AI to IT solutions, seen from a software engineering perspective. Next, we characterize 3D point clouds as a key category of geodata and their role in creating the basis for geospatial digital twins; we explain the feasibility of machine learning (ML) and deep learning (DL) approaches for 3D point clouds. In particular, we argue that 3D point clouds can be seen as a corpus with properties similar to those of natural-language corpora and formulate a "Naturalness Hypothesis" for 3D point clouds. In the main part, we introduce a workflow for interpreting 3D point clouds based on ML/DL approaches that derive domain-specific and application-specific semantics for 3D point clouds without having to create explicit spatial 3D models or explicit rule sets. Finally, examples show how ML/DL enables us to efficiently build and maintain base data for geospatial digital twins such as virtual 3D city models, indoor models, or building information models. N2 - Geospatial artificial intelligence: potentials of machine learning for 3D point clouds and geospatial digital twins. Artificial intelligence (AI) is fundamentally changing the way IT solutions are implemented and operated in all application domains, including the geoinformation domain. In this contribution, we present AI-based techniques for 3D point clouds as a building block of geospatial AI. First, the term "AI" and the technological developments that, from a software engineering perspective, are required to apply AI to IT solutions are briefly outlined.
Next, we characterize 3D point clouds as a key category of geodata and their role in building spatial digital twins; we explain the feasibility of machine learning (ML) and deep learning (DL) approaches for 3D point clouds. In particular, we argue that 3D point clouds can be seen as a corpus with properties similar to those of natural-language corpora and formulate a "Naturalness Hypothesis" for 3D point clouds. In the main part, we present a workflow for interpreting 3D point clouds based on ML/DL approaches that derive domain-specific and application-specific semantics for 3D point clouds without having to create explicit spatial 3D models or explicit rule sets. Finally, examples show how ML/DL make it possible to efficiently build and maintain base data for spatial digital twins, such as virtual 3D city models, indoor models, or building information models. KW - geospatial artificial intelligence KW - machine learning KW - deep learning KW - 3D point clouds KW - geospatial digital twins KW - 3D city models Y1 - 2020 U6 - https://doi.org/10.1007/s41064-020-00102-3 SN - 2512-2789 SN - 2512-2819 VL - 88 IS - 1 SP - 15 EP - 24 PB - Springer International Publishing CY - Cham ER - TY - JOUR A1 - Grüner, Andreas A1 - Mühle, Alexander A1 - Meinel, Christoph T1 - ATIB BT - Design and evaluation of an architecture for brokered self-sovereign identity integration and trust-enhancing attribute aggregation for service provider JF - IEEE access : practical research, open solutions / Institute of Electrical and Electronics Engineers N2 - Identity management is a principal component of securing online services. Throughout the evolution of traditional identity management patterns, the identity provider has remained a Trusted Third Party (TTP).
The service provider and the user need to trust a particular identity provider for correct attributes amongst other demands. This paradigm changed with the invention of blockchain-based Self-Sovereign Identity (SSI) solutions that primarily focus on the users. SSI reduces the functional scope of the identity provider to an attribute provider while enabling attribute aggregation. Besides that, the development of new protocols, disregarding established protocols and a significantly fragmented landscape of SSI solutions pose considerable challenges for an adoption by service providers. We propose an Attribute Trust-enhancing Identity Broker (ATIB) to leverage the potential of SSI for trust-enhancing attribute aggregation. Furthermore, ATIB abstracts from a dedicated SSI solution and offers standard protocols. Therefore, it facilitates the adoption by service providers. Despite the brokered integration approach, we show that ATIB provides a high security posture. Additionally, ATIB does not compromise the ten foundational SSI principles for the users. KW - Blockchains KW - Protocols KW - Authentication KW - Licenses KW - Security KW - Privacy KW - Identity management systems KW - Attribute aggregation KW - attribute assurance KW - digital identity KW - identity broker KW - self-sovereign identity KW - trust model Y1 - 2021 U6 - https://doi.org/10.1109/ACCESS.2021.3116095 SN - 2169-3536 VL - 9 SP - 138553 EP - 138570 PB - Institute of Electrical and Electronics Engineers CY - New York, NY ER - TY - JOUR A1 - Chromik, Jonas A1 - Kirsten, Kristina A1 - Herdick, Arne A1 - Kappattanavar, Arpita Mallikarjuna A1 - Arnrich, Bert T1 - SensorHub BT - Multimodal sensing in real-life enables home-based studies JF - Sensors N2 - Observational studies are an important tool for determining whether the findings from controlled experiments can be transferred into scenarios that are closer to subjects' real-life circumstances. 
A rigorous approach to observational studies involves collecting data from different sensors to comprehensively capture the situation of the subject. However, this leads to technical difficulties, especially if the sensors are from different manufacturers, as multiple data collection tools have to run simultaneously. We present SensorHub, a system that can collect data from various wearable devices from different manufacturers, such as inertial measurement units, portable electrocardiographs, portable electroencephalographs, portable photoplethysmographs, and sensors for electrodermal activity. Additionally, our tool makes it possible to include ecological momentary assessments (EMAs) in studies. Hence, SensorHub enables multimodal sensor data collection under real-world conditions and allows direct user feedback to be collected through questionnaires, enabling studies at home. In a first study with 11 participants, we successfully used SensorHub to record multiple signals with different devices and collected additional information with the help of EMAs. In addition, we evaluated SensorHub's technical capabilities in several trials with up to 21 participants recording simultaneously using multiple sensors with sampling frequencies as high as 1000 Hz. We show that, although there is a theoretical limitation to the transmissible data rate, in practice this limitation is not an issue and data loss is rare. We conclude that, with modern communication protocols and increasingly powerful smartphones and wearables, a system like SensorHub establishes an interoperability framework for adequately combining consumer-grade sensing hardware, which enables observational studies in real life.
KW - multimodal sensing KW - home-based studies KW - activity recognition KW - sensor KW - systems KW - ecological momentary assessment KW - digital health Y1 - 2022 U6 - https://doi.org/10.3390/s22010408 SN - 1424-8220 VL - 22 IS - 1 PB - MDPI CY - Basel ER - TY - JOUR A1 - Schlosser, Rainer T1 - Risk-sensitive control of Markov decision processes BT - a moment-based approach with target distributions JF - Computers & operations research : and their applications to problems of world concern N2 - In many revenue management applications, risk-averse decision-making is crucial. In dynamic settings, however, it is challenging to find the right balance between maximizing expected rewards and minimizing various kinds of risk. In existing approaches, utility functions, chance constraints, or (conditional) value-at-risk considerations are used to influence the distribution of rewards in a preferred way. Nevertheless, common techniques are not flexible enough and are typically numerically complex. In our model, we exploit the fact that a distribution is characterized by its mean and higher moments. We present a multi-valued dynamic programming heuristic to compute risk-sensitive feedback policies that are able to directly control the moments of future rewards. Our approach is based on recursive formulations of higher moments and does not require an extension of the state space. Finally, we propose a self-tuning algorithm that makes it possible to identify feedback policies that approximate predetermined (risk-sensitive) target distributions. We illustrate the effectiveness and the flexibility of our approach for different dynamic pricing scenarios. © 2020 Elsevier Ltd. All rights reserved.
KW - risk aversion KW - Markov decision process KW - dynamic programming KW - dynamic KW - pricing KW - heuristics Y1 - 2020 U6 - https://doi.org/10.1016/j.cor.2020.104997 SN - 0305-0548 VL - 123 PB - Elsevier CY - Oxford ER - TY - JOUR A1 - Koumarelas, Ioannis A1 - Jiang, Lan A1 - Naumann, Felix T1 - Data preparation for duplicate detection JF - Journal of data and information quality : (JDIQ) N2 - Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal. However, the particular error of duplicate records, where multiple records refer to the same entity, is usually eliminated independently with specialized techniques. Our work is the first to bring these two areas together by applying data preparation operations under a systematic approach prior to performing duplicate detection.
Our process workflow can be summarized as follows: It begins with the user providing as input a sample of the gold standard, the actual dataset, and optionally some constraints to domain-specific data preparations, such as address normalization. The preparation selection operates in two consecutive phases. First, to vastly reduce the search space of ineffective data preparations, decisions are made based on the improvement or worsening of pair similarities. Second, using the remaining data preparations, an iterative leave-one-out classification process removes preparations one by one and determines the redundant preparations based on the achieved area under the precision-recall curve (AUC-PR). Using this workflow, we improve the results of duplicate detection by up to 19% in AUC-PR. KW - data preparation KW - data wrangling KW - record linkage KW - duplicate detection KW - similarity measures Y1 - 2020 U6 - https://doi.org/10.1145/3377878 SN - 1936-1955 SN - 1936-1963 VL - 12 IS - 3 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Dreseler, Markus A1 - Boissier, Martin A1 - Rabl, Tilmann A1 - Uflacker, Matthias T1 - Quantifying TPC-H choke points and their optimizations JF - Proceedings of the VLDB Endowment N2 - TPC-H continues to be the most widely used benchmark for relational OLAP systems. It poses a number of challenges, also known as "choke points", which database systems have to solve in order to achieve good benchmark results. Examples include joins across multiple tables, correlated subqueries, and correlations within the TPC-H data set. Knowing the impact of such optimizations helps in developing optimizers as well as in interpreting TPC-H results across database systems. This paper provides a systematic analysis of choke points and their optimizations. It complements previous work on TPC-H choke points by providing a quantitative discussion of their relevance.
It focuses on eleven choke points where the optimizations are beneficial independently of the database system. Of these, the flattening of subqueries and the placement of predicates have the biggest impact. Three queries (Q2, Q17, and Q21) are strongly influenced by the choice of an efficient query plan; three others (Q1, Q13, and Q18) are less influenced by plan optimizations and more dependent on an efficient execution engine. Y1 - 2020 U6 - https://doi.org/10.14778/3389133.3389138 SN - 2150-8097 VL - 13 IS - 8 SP - 1206 EP - 1220 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Casel, Katrin A1 - Fernau, Henning A1 - Gaspers, Serge A1 - Gras, Benjamin A1 - Schmid, Markus L. T1 - On the complexity of the smallest grammar problem over fixed alphabets JF - Theory of computing systems N2 - In the smallest grammar problem, we are given a word w and we want to compute a preferably small context-free grammar G for the singleton language {w} (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is NP-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless P = NP. We settle the long-standing open problem of whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains NP-complete (and its optimisation version is APX-hard), even if the alphabet is fixed and has a size of at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size-measure of grammars that is commonly used in the literature (i. e., a size measure also taking the number of rules into account), and it also allows us to conclude that even computing the number of rules required by a smallest grammar is a hard problem.
On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields W[1]-hardness. Furthermore, we present an O(3^{|w|}) exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i. e., grammars for which only the start rule contains nonterminals on the right side, thereby investigating the impact of the "hierarchical depth" of grammars on the complexity of the smallest grammar problem. In this regard, we obtain similar, but slightly stronger, results for 1-level grammars. KW - grammar-based compression KW - smallest grammar problem KW - straight-line KW - programs KW - NP-completeness KW - exact exponential-time algorithms Y1 - 2020 U6 - https://doi.org/10.1007/s00224-020-10013-w SN - 1432-4350 SN - 1433-0490 VL - 65 IS - 2 SP - 344 EP - 409 PB - Springer CY - New York ER - TY - JOUR A1 - Kossmann, Jan A1 - Halfpap, Stefan A1 - Jankrift, Marcel A1 - Schlosser, Rainer T1 - Magic mirror in my hand, which is the best in the land? BT - an experimental evaluation of index selection algorithms JF - Proceedings of the VLDB Endowment N2 - Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and challenging index selection problem range from simple metadata-based heuristics through sophisticated multi-step algorithms to approaches that yield optimal results.
The main challenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) to handle the large number of possible combinations resulting from workloads containing many queries and massive schemata with possibly thousands of attributes.
In this work, we describe and analyze eight index selection algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granularity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses and infer insights for index selection in general and for each approach individually, before we give recommendations on when to use which approach. Y1 - 2020 U6 - https://doi.org/10.14778/3407790.3407832 SN - 2150-8097 VL - 13 IS - 11 SP - 2382 EP - 2395 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Koumarelas, Ioannis A1 - Papenbrock, Thorsten A1 - Naumann, Felix T1 - MDedup BT - duplicate detection with matching dependencies JF - Proceedings of the VLDB Endowment N2 - Duplicate detection is an integral part of data cleaning and serves to identify multiple representations of the same real-world entities in (relational) datasets. Existing duplicate detection approaches are effective, but they are also hard to parameterize or require a lot of pre-labeled training data. Both parameterization and pre-labeling are at least domain-specific if not dataset-specific, which is a problem if a new dataset needs to be cleaned. For this reason, we propose a novel, rule-based and fully automatic duplicate detection approach that is based on matching dependencies (MDs). Our system uses automatically discovered MDs, various dataset features, and known gold standards to train a model that selects MDs as duplicate detection rules. Once trained, the model can select useful MDs for duplicate detection on any new dataset. To increase the generally low recall of MD-based data cleaning approaches, we propose an additional boosting step.
Our experiments show that this approach reaches up to 94% F-measure and 100% precision on our evaluation datasets, which are good numbers considering that the system does not require domain- or target-data-specific configuration. Y1 - 2020 U6 - https://doi.org/10.14778/3377369.3377379 SN - 2150-8097 VL - 13 IS - 5 SP - 712 EP - 725 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Schirmer, Philipp A1 - Papenbrock, Thorsten A1 - Koumarelas, Ioannis A1 - Naumann, Felix T1 - Efficient discovery of matching dependencies JF - ACM transactions on database systems : TODS N2 - Matching dependencies (MDs) are data profiling results that are often used for data integration, data cleaning, and entity matching. They are a generalization of functional dependencies (FDs) that match similar rather than identical elements. As their discovery is very difficult, existing profiling algorithms either find only small subsets of all MDs or are limited to small datasets. We focus on the efficient discovery of all interesting MDs in real-world datasets. For this purpose, we propose HyMD, a novel MD discovery algorithm that finds all minimal, non-trivial MDs within given similarity boundaries. The algorithm extracts the exact similarity thresholds for the individual MDs from the data instead of using predefined similarity thresholds. For this reason, it is the first approach to solve the MD discovery problem in an exact and truly complete way. If needed, the algorithm can, however, enforce certain properties on the reported MDs, such as disjointness and minimum support, to focus the discovery on the results that downstream use cases actually require. HyMD is technically a hybrid approach that combines the two most popular dependency discovery strategies in related work: lattice traversal and inference from record pairs.
Despite the additional effort of finding exact similarity thresholds for all MD candidates, the algorithm is still able to efficiently process large datasets, e.g., datasets larger than 3 GB. KW - matching dependencies KW - functional dependencies KW - dependency discovery KW - data profiling KW - data matching KW - entity resolution KW - similarity measures Y1 - 2020 U6 - https://doi.org/10.1145/3392778 SN - 0362-5915 SN - 1557-4644 VL - 45 IS - 3 PB - Association for Computing Machinery CY - New York ER -
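To make the notion of an "exact similarity threshold" in the HyMD abstract above concrete, the following is a naive illustration, not HyMD itself. For a single-attribute MD of the form sim(lhs) ≥ θ → sim(rhs) ≥ t, the largest t that still holds on a relation is the minimum right-hand-side similarity over all record pairs that satisfy the left-hand side. The normalized edit-distance similarity, the helper names (levenshtein, similarity, exact_rhs_threshold), and the toy records are hypothetical choices for illustration. The brute-force pairwise loop scales quadratically with the number of records, so it shows only the threshold semantics, not the efficiency the abstract reports for HyMD.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical similarity measure: normalized edit similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def exact_rhs_threshold(records, lhs, lhs_threshold, rhs):
    """Largest t such that the MD  sim(lhs) >= lhs_threshold -> sim(rhs) >= t
    holds on every record pair; None if no pair satisfies the left-hand side.
    Naive O(n^2) pair enumeration, for illustration only."""
    rhs_sims = [
        similarity(r1[rhs], r2[rhs])
        for r1, r2 in combinations(records, 2)
        if similarity(r1[lhs], r2[lhs]) >= lhs_threshold
    ]
    return min(rhs_sims) if rhs_sims else None


# Hypothetical toy relation
records = [
    {"name": "Jon Smith", "city": "Berlin"},
    {"name": "John Smith", "city": "Berlin-Mitte"},
    {"name": "Jane Doe", "city": "Potsdam"},
]

print(exact_rhs_threshold(records, "name", 0.8, "city"))  # → 0.5
```

Only the pair "Jon Smith"/"John Smith" (name similarity 0.9) meets the 0.8 left-hand threshold, and its city similarity is 0.5, so the tightest MD that holds here is sim(name) ≥ 0.8 → sim(city) ≥ 0.5. A predefined right-hand threshold such as 0.9 would simply have rejected this MD.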