Efficiently managing large state is a key challenge for data management systems. Traditionally, state is split into fast but volatile state in memory for processing and persistent but slow state on secondary storage for durability. Persistent memory (PMem), as a new technology in the storage hierarchy, blurs the lines between these states by offering both byte-addressability and low latency like DRAM as well as persistence like secondary storage. These characteristics have the potential to cause a major performance shift in database systems.
Driven by the potential impact that PMem has on data management systems, in this thesis we explore its use in such systems. We first evaluate the performance of real PMem hardware in the form of Intel Optane in a wide range of setups. To this end, we propose PerMA-Bench, a configurable benchmark framework that allows users to evaluate the performance of customizable database-related PMem access patterns. Based on experimental results obtained with PerMA-Bench, we discuss our findings and identify general and implementation-specific aspects that influence PMem performance and should be considered in future work to improve PMem-aware designs. We then propose Viper, a hybrid PMem-DRAM key-value store. Based on PMem-aware access patterns, we show how to leverage PMem and DRAM efficiently to design a key database component. Our evaluation shows that Viper outperforms existing key-value stores by 4–18x for inserts while offering full data persistence and achieving similar or better lookup performance. Next, we show which changes must be made to integrate PMem components into larger systems. Using stream processing engines as an example, we highlight limitations of current designs and propose a prototype engine that overcomes them, allowing it to fully leverage PMem's performance for its internal state management. Finally, in light of Optane's discontinuation, we discuss how insights from PMem research can be transferred to future multi-tier memory setups, using Compute Express Link (CXL) as an example.
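The hybrid DRAM-PMem split described above can be illustrated with a minimal sketch: a volatile in-memory index maps keys to offsets in a persistent append-only log. This is an illustrative toy, not Viper's actual design; a plain file stands in for PMem, and the record layout is a placeholder.

```python
# Toy hybrid key-value store: volatile dict as the "DRAM index",
# an append-only file as the "PMem" value log.
import os
import struct
import tempfile

class HybridKV:
    def __init__(self, path):
        self.index = {}               # volatile DRAM index: key -> log offset
        self.log = open(path, "ab+")  # persistent value log (stand-in for PMem)

    def put(self, key: bytes, value: bytes):
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        # record layout: [key_len][val_len][key][value]
        self.log.write(struct.pack("II", len(key), len(value)) + key + value)
        self.log.flush()              # on real PMem: a cache-line flush + fence
        self.index[key] = offset      # index update is volatile and cheap

    def get(self, key: bytes) -> bytes:
        offset = self.index[key]      # DRAM lookup, then a single "PMem" read
        self.log.seek(offset)
        klen, vlen = struct.unpack("II", self.log.read(8))
        return self.log.read(klen + vlen)[klen:]

path = os.path.join(tempfile.mkdtemp(), "kv.log")
kv = HybridKV(path)
kv.put(b"answer", b"42")
print(kv.get(b"answer").decode())  # prints "42"
```

After a crash, the volatile index is lost but the log survives; rebuilding the index by scanning the log is what makes the data fully persistent despite the DRAM-resident lookup structure.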
Overall, we show that PMem offers high performance for state management, bridging the gap between fast but volatile DRAM and persistent but slow secondary storage. Although Optane was discontinued, new memory technologies are continuously emerging in various forms and we outline how novel designs for them can build on insights from existing PMem research.
The digital transformation imposes new requirements on all classes of enterprise systems in companies. ERP systems in particular, which represent the dominant class of enterprise systems, are struggling to meet these requirements at all levels of their architecture. There is therefore an urgent need to reconsider the overall architecture of these systems and to address the root of the related issues. Given that many of the restrictions ERP systems pose on their adaptability are related to the standardization of data, this work addresses the database layer of ERP systems. Since databases serve as the foundation for data storage and retrieval, they limit the flexibility of enterprise systems and their ability to adapt to new requirements. To date, relational databases are the most widely used. Using a systematic literature approach, recent requirements for ERP systems were identified, and prominent database approaches were assessed against the 23 requirements identified. The results reveal the strengths and weaknesses of recent database approaches and highlight the demand to combine multiple database approaches to fulfill recent business requirements. From a conceptual point of view, this paper supports the idea of federated databases that are interoperable to fulfill future requirements and support business operations. This research forms the basis for a renewal of the current generation of ERP systems and proposes that ERP vendors use different database concepts in the future.
The amount of data stored in databases and the complexity of database workloads are ever-increasing. Database management systems (DBMSs) offer many configuration options, such as index creation or unique constraints, which must be adapted to the specific instance to efficiently process large volumes of data. Currently, such database optimization is complicated, manual work performed by highly skilled database administrators (DBAs). In cloud scenarios, manual database optimization even becomes infeasible: it exceeds the abilities of the best DBAs due to the enormous number of deployed DBMS instances (some providers maintain millions of instances), missing domain knowledge resulting from data privacy requirements, and the complexity of the configuration tasks.
Therefore, we investigate how to automate the configuration of DBMSs efficiently with the help of unsupervised database optimization. While there are numerous configuration options, in this thesis, we focus on automatic index selection and the use of data dependencies, such as functional dependencies, for query optimization. Both aspects have an extensive performance impact and complement each other by approaching unsupervised database optimization from different perspectives.
Our contributions are as follows: (1) we survey state-of-the-art automated index selection algorithms regarding various criteria, e.g., their support for index interaction. We contribute an extensible platform for evaluating the performance of such algorithms with industry-standard datasets and workloads. The platform is well-received by the community and has led to follow-up research. With our platform, we derive the strengths and weaknesses of the investigated algorithms. We conclude that existing solutions often have scalability issues and cannot quickly determine (near-)optimal solutions for large problem instances. (2) To overcome these limitations, we present two new algorithms. Extend determines (near-)optimal solutions with an iterative heuristic. It identifies the best index configurations for the evaluated benchmarks. Its selection runtimes are up to 10 times lower compared with other near-optimal approaches. SWIRL is based on reinforcement learning and delivers solutions instantly. These solutions perform within 3 % of the optimal ones. Extend and SWIRL are available as open-source implementations.
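To make the idea of iterative, heuristic index selection concrete, here is a toy greedy variant: candidates are picked by benefit-per-size ratio under a storage budget. This is a sketch of the general technique, not the actual Extend algorithm; the candidate names, sizes, and benefit estimates are hypothetical placeholders.

```python
# Toy greedy index selection under a storage budget.
def select_indexes(candidates, budget):
    """candidates: dict index_name -> (size, estimated_benefit)."""
    chosen, used = [], 0
    remaining = dict(candidates)
    while remaining:
        # keep only candidates that still fit into the budget
        fitting = {n: (s, b) for n, (s, b) in remaining.items()
                   if used + s <= budget}
        if not fitting:
            break
        # pick the candidate with the best benefit-per-size ratio
        best = max(fitting, key=lambda n: fitting[n][1] / fitting[n][0])
        size, benefit = remaining.pop(best)
        if benefit <= 0:
            break
        chosen.append(best)
        used += size
    return chosen, used

candidates = {
    "idx_orders_custkey": (120, 900),      # (size in MB, benefit estimate)
    "idx_lineitem_orderkey": (300, 1500),
    "idx_part_name": (80, 100),
}
print(select_indexes(candidates, budget=400))
# (['idx_orders_custkey', 'idx_part_name'], 200)
```

A real selector additionally has to model index interaction: the benefit of one index changes once another is present, which is exactly why naive single-pass ranking falls short of iterative heuristics.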
(3) Our index selection efforts are complemented by a mechanism that analyzes workloads to determine data dependencies for query optimization in an unsupervised fashion. We describe and classify 58 query optimization techniques based on functional, order, and inclusion dependencies as well as on unique column combinations. The unsupervised mechanism and three optimization techniques are implemented in our open-source research DBMS Hyrise. Our approach reduces the Join Order Benchmark’s runtime by 26 % and accelerates some TPC-DS queries by up to 58 times.
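One dependency-based optimization of the kind surveyed above can be sketched in a few lines: if the columns of a DISTINCT (or GROUP BY) cover a known unique column combination (UCC), the duplicate elimination is a no-op and can be dropped. The table and column names below are illustrative, not taken from Hyrise.

```python
# Sketch of a UCC-based rewrite check: DISTINCT is redundant if the
# projected columns contain a unique column combination of the table.
# (table, columns) pairs; hypothetical example metadata.
known_uccs = {("orders", ("o_orderkey",))}

def can_drop_distinct(table, distinct_columns):
    """True if some known UCC of `table` is covered by the DISTINCT columns."""
    return any(t == table and set(cols) <= set(distinct_columns)
               for t, cols in known_uccs)

print(can_drop_distinct("orders", ("o_orderkey", "o_custkey")))  # True
print(can_drop_distinct("orders", ("o_custkey",)))               # False
```

Analogous checks based on functional, order, and inclusion dependencies enable rewrites such as removing redundant joins or replacing sorts, which is where the reported speedups come from.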
Additionally, we have developed a cockpit for unsupervised database optimization that allows interactive experiments to build confidence in such automated techniques. In summary, our contributions improve the performance of DBMSs, support DBAs in their work, and enable them to contribute their time to other, less arduous tasks.
Genome-wide association analysis in humans links nucleotide metabolism to leukocyte telomere length
(2020)
Leukocyte telomere length (LTL) is a heritable biomarker of genomic aging. In this study, we perform a genome-wide meta-analysis of LTL by pooling densely genotyped and imputed association results across large-scale European-descent studies including up to 78,592 individuals. We identify 49 genomic regions at a false discovery rate (FDR) < 0.05 threshold and prioritize genes at 31, with five highlighting nucleotide metabolism as an important regulator of LTL. We report six genome-wide significant loci in or near SENP7, MOB1B, CARMIL1, PRRC2A, TERF2, and RFWD3, and our results support recently identified PARP1, POT1, ATM, and MPHOSPH6 loci. Phenome-wide analyses in >350,000 UK Biobank participants suggest that genetically shorter telomere length increases the risk of hypothyroidism and decreases the risk of thyroid cancer, lymphoma, and a range of proliferative conditions. Our results replicate previously reported associations with increased risk of coronary artery disease and lower risk for multiple cancer types. Our findings substantially expand current knowledge on genes that regulate LTL and their impact on human health and disease.
Botanic gardens have been exchanging seeds through seed catalogues for centuries. In many gardens, these catalogues remain an important source of plant material. Living collections have become more relevant for genetic analysis and derived research, since genomics of non-model organisms relies heavily on living material. The range of species made available annually across all seed lists combined provides an unsurpassed source of instantly accessible plant material for research collections. Still, the Index Seminum has received criticism in the past few decades. The current exchange model dictates that associated data be manually entered into each database. The amount of time involved and the human errors occurring in this process are difficult to justify when the data were initially produced as a report from another database. The authors propose that an online marketplace for seed exchange be established, with enhanced search possibilities and downloadable accession data in a standardised format. Such an online service should preferably be supervised and coordinated by Botanic Gardens Conservation International (BGCI). This manuscript is the outcome of a workshop held on July 9th, 2015, at the European botanic gardens congress "Eurogard VII" in Paris, where the first two authors invited members of the botanic garden community to discuss how the anachronistic Index Seminum can be transformed into an improved and modern tool for seed exchange.
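The "downloadable accession data in a standardised format" proposed above could be as simple as a machine-readable export produced once by the issuing garden, instead of being retyped into each receiving garden's database. A minimal sketch follows; the field names are illustrative and do not represent any official exchange standard.

```python
# Sketch: serialize seed-list accession records to CSV once, so receiving
# gardens can import them instead of manual re-entry. Records are invented
# examples; field names are hypothetical.
import csv
import io

accessions = [
    {"accession_id": "2015-0042", "taxon": "Silene dioica",
     "collection_site": "Vosges, France", "year": 2015},
    {"accession_id": "2015-0043", "taxon": "Dianthus carthusianorum",
     "collection_site": "Jura, France", "year": 2015},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["accession_id", "taxon",
                                         "collection_site", "year"])
writer.writeheader()
writer.writerows(accessions)
print(buf.getvalue())
```

In practice an agreed schema (and stable accession identifiers) matters more than the file format itself, since it is the manual re-entry step that introduces the errors criticized above.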
Every day, vast amounts of medical patient data are stored digitally in hospitals and medical practices. So far, these data have largely gone unused for research purposes. The aim of this thesis is to analyze the anonymized patient data accruing daily in a practice for holistic internal medicine. Due to a lack of cooperation from the vendor of the practice software, the patient data could not be extracted automatically. Therefore, a selection of diagnoses and anthropometric parameters was transferred manually into a database. Information about treatment was not included. Data mining methods enable research based on everyday patient data. Applying machine learning can support preventive medicine and the monitoring of treatment courses.
The potential of analyzing these otherwise largely unused data is illustrated by studies of comorbidity. These show that, on the one hand, the metabolic syndrome and its components form a cluster together with cancers and, on the other hand, psychosomatic disorders frequently co-occur with autoimmune diseases of the thyroid. In addition, a metabolic disorder not yet recognized by conventional medicine, hemopyrrollactamuria (HPU), is examined. It can be detected by an increased excretion of pyrroles in the urine. Of the patients for whom an HPU test is available, 84 % show an elevated titer. This observation contradicts the previous assumption that about 10 % of the population is affected by HPU.
Preventive action makes it possible to preserve health. To this end, diseases need to be detected as early as possible. In this study, decision tree models diagnose Hashimoto's thyroiditis in a patient with an accuracy of 87.5 %. The deficits caused by the missing information about medication are illustrated by the model for predicting hypothyroidism (accuracy of 60.9 %).
Using STATIS, which is based on an extension of principal component analysis that allows several tables to be compared simultaneously, the course of treatment of 20 patients was monitored over a period of five years. Using hypertension as an example, it is shown that the patients differ from one another with respect to their laboratory values and that disease patterns can be identified.
This thesis demonstrates the benefit that can be derived from the increased analysis of everyday high-dimensional and heterogeneous data.
Manganese (Mn) is an essential micronutrient for development and function of the nervous system. Deficiencies in Mn transport have been implicated in the pathogenesis of Huntington's disease (HD), an autosomal dominant neurodegenerative disorder characterized by loss of medium spiny neurons of the striatum. Brain Mn levels are highest in striatum and other basal ganglia structures, the most sensitive brain regions to Mn neurotoxicity. Mouse models of HD exhibit decreased striatal Mn accumulation and HD striatal neuron models are resistant to Mn cytotoxicity. We hypothesized that the observed modulation of Mn cellular transport is associated with compensatory metabolic responses to HD pathology. Here we use an untargeted metabolomics approach by performing ultraperformance liquid chromatography-ion mobility-mass spectrometry (UPLC-IM-MS) on control and HD immortalized mouse striatal neurons to identify metabolic disruptions under three Mn exposure conditions, low (vehicle), moderate (non-cytotoxic) and high (cytotoxic). Our analysis revealed lower metabolite levels of pantothenic acid, and glutathione (GSH) in HD striatal cells relative to control cells. HD striatal cells also exhibited lower abundance and impaired induction of isobutyryl carnitine in response to increasing Mn exposure. In addition, we observed induction of metabolites in the pentose shunt pathway in HD striatal cells after high Mn exposure. These findings provide metabolic evidence of an interaction between the HD genotype and biologically relevant levels of Mn in a striatal cell model with known HD by Mn exposure interactions. The metabolic phenotypes detected support existing hypotheses that changes in energetic processes underlie the pathobiology of both HD and Mn neurotoxicity.
Planetary research is often user-based and requires considerable skill, time, and effort. Unfortunately, self-defined boundary conditions, definitions, and rules are often not documented or not easy to comprehend due to the complexity of research. This makes a comparison to other studies, or an extension of the already existing research, complicated. Comparisons are often distorted, because results rely on different, not well defined, or even unknown boundary conditions. The purpose of this research is to develop a standardized analysis method for planetary surfaces, which is adaptable to several research topics. The method provides a consistent quality of results. This also includes achieving reliable and comparable results and reducing the time and effort of conducting such studies. A standardized analysis method is provided by automated analysis tools that focus on statistical parameters. Specific key parameters and boundary conditions are defined for the tool application. The analysis relies on a database in which all key parameters are stored. These databases can be easily updated and adapted to various research questions. This increases the flexibility, reproducibility, and comparability of the research. However, the quality of the database and the reliability of the definitions directly influence the results. To ensure a high quality of results, the rules and definitions need to be well defined and based on previously conducted case studies. The tools then produce parameters, which are obtained by defined geostatistical techniques (measurements, calculations, classifications). The idea of an automated statistical analysis is tested to demonstrate the benefits, but also the potential problems, of this method. In this study, I adapt automated tools for floor-fractured craters (FFCs) on Mars. These impact craters show a variety of surface features, occur in different Martian environments, and have different fracturing origins.
They provide a complex morphological and geological field of application. 433 FFCs are classified by the analysis tools according to their fracturing process. Spatial data, environmental context, and crater interior data are analyzed to distinguish between the processes involved in floor fracturing. Related geologic processes, such as glacial and fluvial activity, are too similar to be classified separately by the automated tools; glacial and fluvial fracturing processes are therefore merged for the classification. The automated tools provide probability values for each origin model. To guarantee the quality and reliability of the results, the classification tools need to achieve an origin probability above 50 %. This analysis method shows that 15 % of the FFCs are fractured by intrusive volcanism, 20 % by tectonic activity, and 43 % by water- and ice-related processes. In total, 78 % of the FFCs are assigned to an origin type. The remaining craters cannot be classified unambiguously, which can be explained by a combination of origin models, by superposition or erosion of key parameters, or by an unknown fracturing model. Those features have to be analyzed manually in detail. Another possibility would be to improve the key parameters and rules for the classification. This research shows that it is possible to conduct an automated statistical analysis of morphologic and geologic features based on analysis tools. The analysis tools provide additional information to the user and can therefore be considered assistance systems.
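The probability-based classification with a 50 % acceptance threshold can be sketched as a simple rule-based scorer: each origin model accumulates a score from weighted key parameters, and a crater is assigned an origin only if the best score exceeds the threshold. The parameter names and weights below are illustrative inventions, not those of the actual tools.

```python
# Toy rule-based origin classification with a 50 % acceptance threshold.
# Feature names and weights are hypothetical placeholders.
ORIGIN_RULES = {
    "intrusive volcanism": {"radial_fractures": 0.4, "domed_floor": 0.6},
    "tectonic activity":   {"linear_fractures": 0.7, "graben_nearby": 0.3},
    "water/ice processes": {"channels": 0.5, "lobate_debris": 0.5},
}

def classify(crater_features, threshold=0.5):
    """Score each origin model; assign one only above the threshold."""
    scores = {origin: sum(w for feat, w in rules.items()
                          if crater_features.get(feat))
              for origin, rules in ORIGIN_RULES.items()}
    origin, best = max(scores.items(), key=lambda kv: kv[1])
    return origin if best > threshold else "unclassified"

print(classify({"channels": True, "lobate_debris": True}))  # water/ice processes
print(classify({"radial_fractures": True}))                 # unclassified
```

The "unclassified" outcome corresponds to the craters described above whose key parameters are ambiguous, superposed, or eroded, and which therefore still require manual analysis.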