000 Computer science, information science, general works
RHEEMix in the data jungle
(2020)
Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.
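The idea of assigning subtasks to platforms while accounting for data movement can be illustrated with a small sketch. This is not Rheem's optimizer or its enumeration algebra: it is a plain dynamic program over a linear pipeline, and the platform names, cost numbers, and the function `cheapest_plan` are all hypothetical.

```python
def cheapest_plan(exec_cost, move_cost):
    """Pick one platform per subtask of a linear pipeline at minimum total cost.

    exec_cost: list with one dict per subtask, mapping platform -> execution cost.
    move_cost: dict mapping (source, target) -> cost of moving data between
               two different platforms. Staying on a platform costs nothing.
    Returns (total_cost, plan), where plan lists the chosen platform per subtask.
    """
    # best[p] = cheapest (cost, plan) for the subtasks so far, ending on platform p
    best = {p: (c, [p]) for p, c in exec_cost[0].items()}
    for costs in exec_cost[1:]:
        new_best = {}
        for p, c in costs.items():
            candidates = []
            for q, (prev_cost, prev_plan) in best.items():
                transfer = 0 if q == p else move_cost[(q, p)]
                candidates.append((prev_cost + transfer + c, prev_plan + [p]))
            new_best[p] = min(candidates)
        best = new_best
    return min(best.values())


# Hypothetical costs: the middle subtask is expensive on "java" but cheap on "spark".
exec_cost = [
    {"java": 1, "spark": 5},
    {"java": 50, "spark": 5},
    {"java": 1, "spark": 5},
]
move_cost = {("java", "spark"): 2, ("spark", "java"): 2}

total, plan = cheapest_plan(exec_cost, move_cost)
print(total, plan)
```

With these made-up numbers, the mixed plan is cheaper than running everything on either single platform (52 on "java" only, 15 on "spark" only), which mirrors the kind of gain the abstract reports.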
Stem cells are capable of sensing and processing environmental inputs, converting this information to output a specific cell lineage through signaling cascades. Despite the combinatorial nature of mechanical, thermal, and biochemical signals, these stimuli have typically been decoupled and applied independently, requiring continuous regulation by controlling units. We employ a programmable polymer actuator sheet to autonomously synchronize thermal and mechanical signals applied to mesenchymal stem cells (MSCs). Using a grid on its underside, the shape change of the polymer sheet, as well as cell morphology, calcium (Ca²⁺) influx, and focal adhesion assembly, could be visualized and quantified. This paper gives compelling evidence that the temperature sensing and mechanosensing of MSCs are interconnected via intracellular Ca²⁺. Up-regulated Ca²⁺ levels lead to a remarkable alteration of histone H3K9 acetylation and activation of osteogenic-related genes. The interplay of physical, thermal, and biochemical signaling was utilized to accelerate the cell differentiation toward osteogenic lineage. The approach of programmable bioinstructivity provides a fundamental principle for functional biomaterials exhibiting multifaceted stimuli on differentiation programs. Technological impact is expected in the tissue engineering of periosteum for treating bone defects.
Recent findings suggest a role of oxytocin in the tendency to spontaneously mimic the emotional facial expressions of others. Oxytocin-related increases of facial mimicry, however, seem to be dependent on contextual factors. Given previous literature showing that people preferentially mimic emotional expressions of individuals associated with high (vs. low) rewards, we examined whether the reward value of the mimicked agent is one factor influencing the oxytocin effects on facial mimicry. To test this hypothesis, 60 male adults received 24 IU of either intranasal oxytocin or placebo in a double-blind, between-subject experiment. Next, the value of male neutral faces was manipulated using an associative learning task with monetary rewards. After the reward associations were learned, participants watched videos of the same faces displaying happy and angry expressions. Facial reactions to the emotional expressions were measured with electromyography. We found that participants judged face identities associated with high reward values as more pleasant than those associated with low reward values. However, happy expressions of low-reward faces were mimicked more spontaneously than those of high-reward faces. Contrary to our expectations, we did not find a significant direct effect of intranasal oxytocin on facial mimicry, nor on the reward-driven modulation of mimicry. Our results support the notion that mimicry is a complex process that depends on contextual factors, but failed to provide conclusive evidence of a role of oxytocin in the modulation of facial mimicry.
Background:
Childhood and adolescence are critical stages of life for mental health and well-being. Schools are a key setting for mental health promotion and illness prevention. One in five children and adolescents have a mental disorder, and about half of mental disorders begin before the age of 14. Beneficial and explainable artificial intelligence can replace current paper-based and online approaches to school mental health surveys. This can enhance data acquisition, interoperability, data-driven analysis, trust and compliance. This paper presents a model for using chatbots for non-obtrusive data collection and supervised machine learning models for data analysis, and discusses ethical considerations pertaining to the use of these models.
Methods:
For data acquisition, the proposed model uses chatbots which interact with students. The conversation log acts as the source of raw data for the machine learning. Pre-processing of the data is automated by filtering for keywords and phrases.
Existing survey results, obtained through current paper-based data collection methods, are evaluated by domain experts (health professionals). These can be used to create a test dataset to validate the machine learning models. Supervised learning can then be deployed to classify specific behaviour and mental health patterns.
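The keyword-based pre-processing step described above could look roughly like the following sketch. The keyword list, the example messages, and the function name `filter_log` are assumptions made for illustration; the source does not say which terms or tools the project uses.

```python
import re

# Hypothetical keyword list; the actual terms are not given in the source.
KEYWORDS = {"sleep", "stress", "lonely", "worried"}


def filter_log(messages, keywords=KEYWORDS):
    """Keep messages that contain at least one keyword (case-insensitive).

    Returns a list of (message, matched_keywords) pairs, in the original order.
    """
    kept = []
    for message in messages:
        tokens = set(re.findall(r"[a-z']+", message.lower()))
        hits = sorted(tokens & keywords)
        if hits:
            kept.append((message, hits))
    return kept


# Made-up conversation log entries.
log = [
    "I could not sleep last night",
    "We had pizza for lunch",
    "Exams make me feel a lot of STRESS",
]
filtered = filter_log(log)
for message, hits in filtered:
    print(hits, message)
```

The filtered pairs would then serve as input to whatever supervised classifier is trained on the expert-labelled survey data.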
Results:
We present a model that can be used to improve upon current paper-based data collection and manual data analysis methods. An open-source GitHub repository contains necessary tools and components of this model. Privacy is respected through rigorous observance of confidentiality and data protection requirements. Critical reflection on these ethical and legal aspects is included in the project.
Conclusions:
This model strengthens mental health surveillance in schools. The same tools and components could be applied to other public health data. Future extensions of this model could also incorporate unsupervised learning to find clusters and patterns of unknown effects.
ganon
(2020)
Motivation:
The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing number of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers. Few methods address these issues thus far, and even though many can theoretically handle large amounts of references, time/memory requirements are prohibitive in practice. As a result, many studies that require sequence classification rely on indices that are often outdated and almost never truly up to date.
Results:
Motivated by those limitations, we created ganon, a k-mer-based read classification tool that uses Interleaved Bloom Filters in conjunction with a taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references and keeping them updated. It requires <55 min to index the complete RefSeq of bacteria, archaea, fungi and viruses. The tool can further keep these indices up-to-date in a fraction of the time necessary to create them. Ganon makes it possible to query against very large reference sets and therefore it classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices and hierarchical classification.
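To make the general idea of k-mer-based classification with Bloom filters concrete, here is a minimal sketch. It is not ganon's implementation: it uses one plain Bloom filter per reference instead of an Interleaved Bloom Filter, it has no taxonomic clustering, and the sequences, parameter values, and function names are all made up for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: may report false positives, never false negatives."""

    def __init__(self, n_bits=4096, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for readability

    def _positions(self, item):
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Build one Bloom filter of k-mers per reference sequence."""
    index = {}
    for name, seq in references.items():
        bloom = BloomFilter()
        for kmer in kmers(seq, k):
            bloom.add(kmer)
        index[name] = bloom
    return index


def classify(read, index, k, min_fraction=0.8):
    """Assign the read to the reference sharing the most k-mers, if enough match."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best_name, best_hits = None, 0
    for name, bloom in index.items():
        hits = sum(1 for kmer in read_kmers if kmer in bloom)
        if hits > best_hits:
            best_name, best_hits = name, hits
    if best_hits / len(read_kmers) >= min_fraction:
        return best_name
    return None


# Made-up reference sequences and a read taken from the first one.
references = {
    "taxon_a": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_b": "TTTTGGGGCCCCAAAATTTTGGGGCC",
}
index = build_index(references, k=5)
assignment = classify("CGTTAGCCGATAG", index, k=5)
print(assignment)
```

Updating such an index only requires adding the k-mers of new sequences to the affected filter, which hints at why incremental updates can be much cheaper than a full rebuild.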
Recently, substantial research effort has focused on how to apply CNNs or RNNs to better capture temporal patterns in videos, so as to improve the accuracy of video classification. In this paper, we investigate the potential of a purely attention-based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the output of multiple attention units applied in parallel, and introduce a shifting operation to capture more diverse signals. Experiments show that BAC can achieve excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for achieving a finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring the sequential relationships. To improve over BAC, we further propose the channel pyramid attention schema by splitting features into sub-features at multiple scales for coarse-to-fine sub-feature interaction modeling, and propose the temporal pyramid attention schema by dividing the feature sequences into ordered sub-sequences of multiple lengths to account for the sequential order. Our final model pyramid×pyramid attention clusters (PPAC) combines both channel pyramid attention and temporal pyramid attention to focus on the most important sub-features, while also preserving the temporal information of the video. We demonstrate the effectiveness of PPAC on seven real-world video classification datasets. Our model achieves competitive results across all of these, showing that our proposed framework can consistently outperform the existing local feature integration methods across a range of different scenarios.
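The core of BAC, concatenating the outputs of several attention units applied in parallel to the same local features, can be sketched as follows. The scoring function (a linear score followed by a softmax) is an assumption, the shifting operation and both pyramid schemas are left out, and all names and numbers are illustrative.

```python
import math


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_unit(features, w, b=0.0):
    """Weighted sum of local feature vectors, weights from a linear score + softmax."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(a * x[d] for a, x in zip(weights, features)) for d in range(dim)]


def attention_cluster(features, unit_params):
    """Apply several attention units in parallel and concatenate their outputs."""
    output = []
    for w, b in unit_params:
        output.extend(attention_unit(features, w, b))
    return output


# Two local features of dimension 2, and two attention units with made-up parameters.
features = [[1.0, 0.0], [3.0, 2.0]]
units = [
    ([0.0, 0.0], 0.0),  # zero weights: uniform attention, i.e. the mean
    ([5.0, 0.0], 0.0),  # strongly prefers features with a large first channel
]
cluster_output = attention_cluster(features, units)
print(cluster_output)
```

Because each unit sees the features as an unordered set, reordering `features` leaves the output unchanged, which is the limitation the temporal pyramid schema is meant to address.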
An independency (cliquy) tree of an n-vertex graph G is a spanning tree of G in which the set of leaves induces an independent set (clique). We study the problems of minimizing or maximizing the number of leaves of such trees, and fully characterize their parameterized complexity. We show that all four variants of deciding if an independency/cliquy tree with at least/most l leaves exists parameterized by l are either Para-NP- or W[1]-hard. We prove that minimizing the number of leaves of a cliquy tree parameterized by the number of internal vertices is Para-NP-hard too. However, we show that minimizing the number of leaves of an independency tree parameterized by the number k of internal vertices has an O*(4^k)-time algorithm and a 2k vertex kernel. Moreover, we prove that maximizing the number of leaves of an independency/cliquy tree parameterized by the number k of internal vertices both have an O*(18^k)-time algorithm and an O(k 2^k) vertex kernel, but no polynomial kernel unless the polynomial hierarchy collapses to the third level. Finally, we present an O(3^n · f(n))-time algorithm to find a spanning tree where the leaf set has a property that can be decided in f(n) time and has minimum or maximum size.
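The two definitions can be checked directly for a given spanning tree. The sketch below only verifies the definitions on small examples and is unrelated to the parameterized algorithms of the paper; it assumes a simple undirected graph with distinct vertices, and the function names are illustrative.

```python
from itertools import combinations


def is_spanning_tree(vertices, graph_edges, tree_edges):
    """Check that tree_edges form a spanning tree of the graph (vertices, graph_edges)."""
    graph = {frozenset(e) for e in graph_edges}
    tree = {frozenset(e) for e in tree_edges}
    if len(tree) != len(vertices) - 1 or not tree <= graph:
        return False
    adjacency = {v: set() for v in vertices}
    for u, v in tree_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # n - 1 edges plus connectivity implies the edge set is a tree.
    start = vertices[0]
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(vertices)


def tree_leaves(tree_edges):
    """Vertices of degree 1 in the tree."""
    degree = {}
    for u, v in tree_edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sorted(v for v, d in degree.items() if d == 1)


def leaf_pairs_adjacent(graph_edges, tree_edges):
    """For every pair of tree leaves, whether the pair is adjacent in the graph."""
    graph = {frozenset(e) for e in graph_edges}
    return [frozenset(p) in graph for p in combinations(tree_leaves(tree_edges), 2)]


def is_independency_tree(vertices, graph_edges, tree_edges):
    return (is_spanning_tree(vertices, graph_edges, tree_edges)
            and not any(leaf_pairs_adjacent(graph_edges, tree_edges)))


def is_cliquy_tree(vertices, graph_edges, tree_edges):
    return (is_spanning_tree(vertices, graph_edges, tree_edges)
            and all(leaf_pairs_adjacent(graph_edges, tree_edges)))


# A 4-cycle: every spanning tree is a path whose two end vertices are adjacent in G.
c4_vertices = [1, 2, 3, 4]
c4_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
c4_path = [(1, 2), (2, 3), (3, 4)]

# A path on three vertices: its only spanning tree has non-adjacent leaves.
p3_vertices = [1, 2, 3]
p3_edges = [(1, 2), (2, 3)]

print(is_cliquy_tree(c4_vertices, c4_edges, c4_path))
print(is_independency_tree(p3_vertices, p3_edges, p3_edges))
```

The 4-cycle example shows that a graph may admit cliquy trees but no independency tree at all, so existence is already a non-trivial question before any leaf count is optimized.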
Objective:
We propose a data-driven method to detect temporal patterns of disease progression in high-dimensional claims data based on gradient boosting with stability selection.
Materials and methods:
We identified patients with chronic obstructive pulmonary disease in a German health insurance claims database with 6.5 million individuals and divided them into a group of patients with the highest disease severity and a group of control patients with lower severity. We then used gradient boosting with stability selection to determine variables correlating with a chronic obstructive pulmonary disease diagnosis of highest severity and subsequently model the temporal progression of the disease using the selected variables.
Results:
We identified a network of 20 diagnoses (e.g. respiratory failure), medications (e.g. anticholinergic drugs) and procedures associated with a subsequent chronic obstructive pulmonary disease diagnosis of highest severity. Furthermore, the network successfully captured temporal patterns, such as disease progressions from lower to higher severity grades.
Discussion:
The temporal trajectories identified by our data-driven approach are compatible with existing knowledge about chronic obstructive pulmonary disease showing that the method can reliably select relevant variables in a high-dimensional context.
Conclusion:
We provide a generalizable approach for the automatic detection of disease trajectories in claims data. This could help to diagnose diseases early, identify unknown risk factors and optimize treatment plans.