Hasso-Plattner-Institut für Digital Engineering GmbH
Refine
Year of publication
Document Type
- Article (194)
- Doctoral Thesis (100)
- Other (84)
- Monograph/Edited Volume (42)
- Postprint (22)
- Conference Proceeding (4)
- Part of a Book (1)
- Habilitation Thesis (1)
- Report (1)
Keywords
- MOOC (42)
- digital education (37)
- e-learning (36)
- Digitale Bildung (34)
- online course creation (34)
- online course design (34)
- Kursdesign (33)
- Micro Degree (33)
- Online-Lehre (33)
- Onlinekurs (33)
This vision article outlines the main building blocks of what we term AI Compliance, an effort to bridge two complementary research areas: computer science and the law.
Such research has the goal to model, measure, and affect the quality of AI artifacts, such as data, models, and applications, to then facilitate adherence to legal standards.
DrDimont: explainable drug response prediction from differential analysis of multi-omics networks
(2022)
Motivation:
While it has been well established that drugs affect and help patients differently, personalized drug response predictions remain challenging.
Solutions based on single omics measurements have been proposed, and networks provide means to incorporate molecular interactions into reasoning.
However, how to integrate the wealth of information contained in multiple omics layers still poses a complex problem.
Results:
We present DrDimont, Drug response prediction from Differential analysis of multi-omics networks.
It allows for comparative conclusions between two conditions and translates them into differential drug response predictions.
DrDimont focuses on molecular interactions.
It establishes condition-specific networks from correlation within an omics layer that are then reduced and combined into heterogeneous, multi-omics molecular networks. A novel semi-local, path-based integration step ensures integrative conclusions. Differential predictions are derived from comparing the condition-specific integrated networks.
DrDimont's predictions are explainable, i.e. molecular differences that are the source of high differential drug scores can be retrieved. We predict differential drug response in breast cancer using transcriptomics, proteomics, phosphosite and metabolomics measurements and contrast estrogen receptor positive and receptor negative patients. DrDimont performs better than drug prediction based on differential protein expression or PageRank when evaluating it on ground truth data from cancer cell lines. We find proteomic and phosphosite layers to carry most information for distinguishing drug response.
Residential segregation is a wide-spread phenomenon that can be observed in almost every major city.
In these urban areas residents with different racial or socioeconomic background tend to form homogeneous clusters.
Schelling's famous agent-based model for residential segregation explains how such clusters can form even if all agents are tolerant, i.e., if they agree to live in mixed neighborhoods.
For segregation to occur, all it needs is a slight bias towards agents preferring similar neighbors.
Very recently, Schelling's model has been investigated from a game-theoretic point of view with selfish agents that strategically select their residential location.
In these games, agents can improve on their current location by performing a location swap with another agent who is willing to swap.
We significantly deepen these investigations by studying the influence of the underlying topology modeling the residential area on the existence of equilibria, the Price of Anarchy and on the dynamic properties of the resulting strategic multi-agent system. Moreover, as a new conceptual contribution, we also consider the influence of locality, i.e., if the location swaps are restricted to swaps of neighboring agents.
We give improved almost tight bounds on the Price of Anarchy for arbitrary underlying graphs and we present (almost) tight bounds for regular graphs, paths and cycles. Moreover, we give almost tight bounds for grids, which are commonly used in empirical studies.
For grids we also show that locality has a severe impact on the game dynamics.
"Bad" data has a direct impact on 88% of companies, with the average company losing 12% of its revenue due to it.
Duplicates - multiple but different representations of the same real-world entities are among the main reasons for poor data quality, so finding and configuring the right deduplication solution is essential.
Existing data matching benchmarks focus on the quality of matching results and neglect other important factors, such as business requirements. Additionally, they often do not support the exploration of data matching results.
To address this gap between the mere counting of record pairs vs. a comprehensive means to evaluate data matching solutions, we present the Frost platform.
It combines existing benchmarks, established quality metrics, cost and effort metrics, and exploration techniques, making it the first platform to allow systematic exploration to understand matching results.
Frost is implemented and published in the open-source application Snowman, which includes the visual exploration of matching results, as shown in Figure 1.
HPI Future SOC Lab
(2024)
The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
The HPI Future SOC Lab provides researchers with free of charge access to a complete infrastructure of state of the art hard and software. This infrastructure includes components, which might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from but not limited to the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
This technical report presents results of research projects executed in 2020. Selected projects have presented their results on April 21st and November 10th 2020 at the Future SOC Lab Day events.
Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection
(2022)
Background:
Emerging pathogens are a growing threat, but large data collections and approaches for predicting the risk associated with novel agents are limited to bacteria and viruses. Pathogenic fungi, which also pose a constant threat to public health, remain understudied.
Relevant data remain comparatively scarce and scattered among many different sources, hindering the development of sequencing-based detection workflows for novel fungal pathogens.
No prediction method working for agents across all three groups is available, even though the cause of an infection is often difficult to identify from symptoms alone.
Results:
We present a curated collection of fungal host range data, comprising records on human, animal and plant pathogens, as well as other plant-associated fungi, linked to publicly available genomes.
We show that it can be used to predict the pathogenic potential of novel fungal species directly from DNA sequences with either sequence homology or deep learning.
We develop learned, numerical representations of the collected genomes and visualize the landscape of fungal pathogenicity.
Finally, we train multi-class models predicting if next-generation sequencing reads originate from novel fungal, bacterial or viral threats.
Conclusions:
The neural networks trained using our data collection enable accurate detection of novel fungal pathogens.
A curated set of over 1400 genomes with host and pathogenicity metadata supports training of machine-learning models and sequence comparison, not limited to the pathogen detection task.
Um in der Schule bereits frühzeitig ein Verständnis für informatische Prozesse zu vermitteln wurde das neue Informatikfach Digitale Welt für die Klassenstufe 5 konzipiert mit der bundesweit einmaligen Verbindung von Informatik mit anwendungsbezogenen und gesellschaftlich relevanten Bezügen zur Ökologie und Ökonomie. Der Technische Report gibt eine Handreichung zur Einführung des neuen Faches.
When I started my PhD, I wanted to do something related to systems but I wasn't sure exactly what. I didn't consider data management systems initially, because I was unaware of the richness of the systems work that data management systems were build on. I thought the field was mainly about SQL. Luckily, that view changed quickly.
The Hasso Plattner Institute (HPI), academically structured as the independent Faculty of Digital Engineering at the University of Potsdam, unites computer science research and teaching with the advantages of a privately financed institute and a tuition-free study program. Founder and namesake of the institute is the SAP co-founder Hasso Plattner, who also heads the Enterprise Platform and Integration Concepts (EPIC) research center which focuses on the technical aspects of business software with a vision to provide the fastest way to get insights out of enterprise data. Founded in 2006, the EPIC combines three research groups comprising autonomous data management, enterprise software engineering, and data-driven decision support.
Schelling's classical segregation model gives a coherent explanation for the wide-spread phenomenon of residential segregation. We introduce an agent-based saturated open-city variant, the Flip Schelling Process (FSP), in which agents, placed on a graph, have one out of two types and, based on the predominant type in their neighborhood, decide whether to change their types; similar to a new agent arriving as soon as another agent leaves the vertex. We investigate the probability that an edge {u,v} is monochrome, i.e., that both vertices u and v have the same type in the FSP, and we provide a general framework for analyzing the influence of the underlying graph topology on residential segregation. In particular, for two adjacent vertices, we show that a highly decisive common neighborhood, i.e., a common neighborhood where the absolute value of the difference between the number of vertices with different types is high, supports segregation and, moreover, that large common neighborhoods are more decisive. As an application, we study the expected behavior of the FSP on two common random graph models with and without geometry: (1) For random geometric graphs, we show that the existence of an edge {u,v} makes a highly decisive common neighborhood for u and v more likely. Based on this, we prove the existence of a constant c>0 such that the expected fraction of monochrome edges after the FSP is at least 1/2+c. (2) For Erdős–Rényi graphs we show that large common neighborhoods are unlikely and that the expected fraction of monochrome edges after the FSP is at most 1/2+o(1). Our results indicate that the cluster structure of the underlying graph has a significant impact on the obtained segregation strength.