TY - THES A1 - Perscheid, Cindy T1 - Integrative biomarker detection using prior knowledge on gene expression data sets T1 - Integrative Biomarker-Erkennung auf Genexpressions-Daten mithilfe von biologischem Vorwissen N2 - Gene expression data is analyzed to identify biomarkers, e.g. relevant genes, which serve for diagnostic, predictive, or prognostic use. Traditional approaches for biomarker detection select distinctive features from the data based exclusively on the signals therein, facing multiple shortcomings in regards to overfitting, biomarker robustness, and actual biological relevance. Prior knowledge approaches are expected to address these issues by incorporating prior biological knowledge, e.g. on gene-disease associations, into the actual analysis. However, prior knowledge approaches are currently not widely applied in practice because they are often use-case specific and seldom applicable in a different scope. This leads to a lack of comparability of prior knowledge approaches, which in turn makes it currently impossible to assess their effectiveness in a broader context. Our work addresses the aforementioned issues with three contributions. Our first contribution provides formal definitions for both prior knowledge and the flexible integration thereof into the feature selection process. Central to these concepts is the automatic retrieval of prior knowledge from online knowledge bases, which allows for streamlining the retrieval process and agreeing on a uniform definition for prior knowledge. We subsequently describe novel and generalized prior knowledge approaches that are flexible regarding the used prior knowledge and applicable to varying use case domains. Our second contribution is the benchmarking platform Comprior. Comprior applies the aforementioned concepts in practice and allows for flexibly setting up comprehensive benchmarking studies for examining the performance of existing and novel prior knowledge approaches. It streamlines the retrieval of prior knowledge and allows for combining it with prior knowledge approaches. Comprior demonstrates the practical applicability of our concepts and further fosters the overall development and comparability of prior knowledge approaches. Our third contribution is a comprehensive case study on the effectiveness of prior knowledge approaches. For that, we used Comprior and tested a broad range of both traditional and prior knowledge approaches in combination with multiple knowledge bases on data sets from multiple disease domains. Ultimately, our case study constitutes a thorough assessment of a) the suitability of selected knowledge bases for integration, b) the impact of prior knowledge being applied at different integration levels, and c) the improvements in terms of classification performance, biological relevance, and overall robustness. In summary, our contributions demonstrate that generalized concepts for prior knowledge and a streamlined retrieval process improve the applicability of prior knowledge approaches. Results from our case study show that the integration of prior knowledge positively affects biomarker results, particularly regarding their robustness. Our findings provide the first in-depth insights on the effectiveness of prior knowledge approaches and build a valuable foundation for future research. N2 - Biomarker sind charakteristische biologische Merkmale mit diagnostischer oder prognostischer Aussagekraft. Auf der molekularen Ebene sind dies Gene mit einem krankheitsspezifischen Expressionsmuster, welche mittels der Analyse von Genexpressionsdaten identifiziert werden. Traditionelle Ansätze für diese Art von Biomarker Detection wählen Gene als Biomarker ausschließlich anhand der vorhandenen Signale im Datensatz aus. Diese Vorgehensweise zeigt jedoch Schwächen insbesondere in Bezug auf die Robustheit und tatsächliche biologische Relevanz der identifizierten Biomarker. Verschiedene Forschungsarbeiten legen nahe, dass die Berücksichtigung des biologischen Kontexts während des Selektionsprozesses diese Schwächen ausgleichen kann. Sogenannte wissensbasierte Ansätze für Biomarker Detection beziehen vorhandenes biologisches Wissen, beispielsweise über Zusammenhänge zwischen bestimmten Genen und Krankheiten, direkt in die Analyse mit ein. Die Anwendung solcher Verfahren ist in der Praxis jedoch derzeit nicht weit verbreitet, da existierende Methoden oft spezifisch für einen bestimmten Anwendungsfall entwickelt wurden und sich nur mit großem Aufwand auf andere Anwendungsgebiete übertragen lassen. Dadurch sind Vergleiche untereinander kaum möglich, was es wiederum nicht erlaubt die Effektivität von wissensbasierten Methoden in einem breiteren Kontext zu untersuchen. Die vorliegende Arbeit befasst sich mit den vorgenannten Herausforderungen für wissensbasierte Ansätze. In einem ersten Schritt legen wir formale und einheitliche Definitionen für vorhandenes biologisches Wissen sowie ihre flexible Integration in den Biomarker-Auswahlprozess fest. Der Kerngedanke unseres Ansatzes ist die automatisierte Beschaffung von biologischem Wissen aus im Internet frei verfügbaren Wissens-Datenbanken. Dies erlaubt eine Vereinfachung der Kuratierung sowie die Festlegung einer einheitlichen Definition für biologisches Wissen. Darauf aufbauend beschreiben wir generalisierte wissensbasierte Verfahren, welche flexibel auf verschiedene Anwendungsfalle anwendbar sind. In einem zweiten Schritt haben wir die Benchmarking-Plattform Comprior entwickelt, welche unsere theoretischen Konzepte in einer praktischen Anwendung realisiert. Comprior ermöglicht die schnelle Umsetzung von umfangreichen Experimenten für den Vergleich von wissensbasierten Ansätzen. Comprior übernimmt die Beschaffung von biologischem Wissen und ermöglicht dessen beliebige Kombination mit wissensbasierten Ansätzen. Comprior demonstriert damit die praktische Umsetzbarkeit unserer theoretischen Konzepte und unterstützt zudem die technische Realisierung und Vergleichbarkeit wissensbasierter Ansätze. In einem dritten Schritt untersuchen wir die Effektivität wissensbasierter Ansätze im Rahmen einer umfangreichen Fallstudie. Mithilfe von Comprior vergleichen wir die Ergebnisse traditioneller und wissensbasierter Ansätze im Kontext verschiedener Krankheiten, wobei wir für wissensbasierte Ansätze auch verschiedene Wissens-Datenbanken verwenden. Unsere Fallstudie untersucht damit a) die Eignung von ausgewählten Wissens-Datenbanken für deren Einsatz bei wissensbasierten Ansätzen, b) den Einfluss verschiedener Integrationskonzepte für biologisches Wissen auf den Biomarker-Auswahlprozess, und c) den Grad der Verbesserung in Bezug auf die Klassifikationsleistung, biologische Relevanz und allgemeine Robustheit der selektierten Biomarker. Zusammenfassend demonstriert unsere Arbeit, dass generalisierte Konzepte für biologisches Wissen und dessen vereinfachte Kuration die praktische Anwendbarkeit von wissensbasierten Ansätzen erleichtern. Die Ergebnisse unserer Fallstudie zeigen, dass die Integration von vorhandenem biologischen Wissen einen positiven Einfluss auf die selektierten Biomarker hat, insbesondere in Bezug auf ihre biologische Relevanz. Diese erstmals umfassenderen Erkenntnisse zur Effektivität von wissensbasierten Ansätzen bilden eine wertvolle Grundlage für zukünftige Forschungsarbeiten. KW - gene expression KW - biomarker detection KW - prior knowledge KW - feature selection KW - Biomarker-Erkennung KW - Merkmalsauswahl KW - Gen-Expression KW - biologisches Vorwissen Y1 - 2023 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-582418 ER - TY - GEN A1 - Perscheid, Cindy A1 - Faber, Lukas A1 - Kraus, Milena A1 - Arndt, Paul A1 - Janke, Michael A1 - Rehfeldt, Sebastian A1 - Schubotz, Antje A1 - Slosarek, Tamara A1 - Uflacker, Matthias T1 - A tissue-aware gene selection approach for analyzing multi-tissue gene expression data T2 - 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) N2 - High-throughput RNA sequencing (RNAseq) produces large data sets containing expression levels of thousands of genes. The analysis of RNAseq data leads to a better understanding of gene functions and interactions, which eventually helps to study diseases like cancer and develop effective treatments. Large-scale RNAseq expression studies on cancer comprise samples from multiple cancer types and aim to identify their distinct molecular characteristics. Analyzing samples from different cancer types implies analyzing samples from different tissue origin. Such multi-tissue RNAseq data sets require a meaningful analysis that accounts for the inherent tissue-related bias: The identified characteristics must not originate from the differences in tissue types, but from the actual differences in cancer types. However, current analysis procedures do not incorporate that aspect. As a result, we propose to integrate a tissue-awareness into the analysis of multi-tissue RNAseq data. We introduce an extension for gene selection that provides a tissue-wise context for every gene and can be flexibly combined with any existing gene selection approach. We suggest to expand conventional evaluation by additional metrics that are sensitive to the tissue-related bias. Evaluations show that especially low complexity gene selection approaches profit from introducing tissue-awareness. KW - RNAseq KW - gene selection KW - tissue-awareness KW - TCGA KW - GTEx Y1 - 2018 SN - 978-1-5386-5488-0 U6 - https://doi.org/10.1109/BIBM.2018.8621189 SN - 2156-1125 SN - 2156-1133 SP - 2159 EP - 2166 PB - IEEE CY - New York ER - TY - JOUR A1 - Perscheid, Cindy T1 - Integrative biomarker detection on high-dimensional gene expression data sets BT - a survey on prior knowledge approaches JF - Briefings in bioinformatics N2 - Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview on the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection. KW - gene selection KW - external knowledge bases KW - biomarker detection KW - gene KW - expression KW - prior knowledge Y1 - 2021 U6 - https://doi.org/10.1093/bib/bbaa151 SN - 1467-5463 SN - 1477-4054 VL - 22 IS - 3 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - Perscheid, Cindy A1 - Grasnick, Bastien A1 - Uflacker, Matthias T1 - Integrative Gene Selection on Gene Expression Data BT - Providing Biological Context to Traditional Approaches JF - Journal of Integrative Bioinformatics N2 - The advance of high-throughput RNA-Sequencing techniques enables researchers to analyze the complete gene activity in particular cells. From the insights of such analyses, researchers can identify disease-specific expression profiles, thus understand complex diseases like cancer, and eventually develop effective measures for diagnosis and treatment. The high dimensionality of gene expression data poses challenges to its computational analysis, which is addressed with measures of gene selection. Traditional gene selection approaches base their findings on statistical analyses of the actual expression levels, which implies several drawbacks when it comes to accurately identifying the underlying biological processes. In turn, integrative approaches include curated information on biological processes from external knowledge bases during gene selection, which promises to lead to better interpretability and improved predictive performance. Our work compares the performance of traditional and integrative gene selection approaches. Moreover, we propose a straightforward approach to integrate external knowledge with traditional gene selection approaches. We introduce a framework enabling the automatic external knowledge integration, gene selection, and evaluation. Evaluation results prove our framework to be a useful tool for evaluation and show that integration of external knowledge improves overall analysis results. KW - Gene Expression Data Analysis KW - Integrative Gene Selection KW - Pattern Recognition KW - Prior Knowledge KW - Knowledge Bases Y1 - 2019 U6 - https://doi.org/10.1515/jib-2018-0064 SN - 1613-4516 VL - 16 IS - 1 PB - De Gruyter CY - Berlin ER - TY - GEN A1 - Perscheid, Cindy A1 - Uflacker, Matthias T1 - Integrating Biological Context into the Analysis of Gene Expression Data T2 - Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference N2 - High-throughput RNA sequencing produces large gene expression datasets whose analysis leads to a better understanding of diseases like cancer. The nature of RNA-Seq data poses challenges to its analysis in terms of its high dimensionality, noise, and complexity of the underlying biological processes. Researchers apply traditional machine learning approaches, e. g. hierarchical clustering, to analyze this data. Until it comes to validation of the results, the analysis is based on the provided data only and completely misses the biological context. However, gene expression data follows particular patterns - the underlying biological processes. In our research, we aim to integrate the available biological knowledge earlier in the analysis process. We want to adapt state-of-the-art data mining algorithms to consider the biological context in their computations and deliver meaningful results for researchers. KW - Gene expression KW - Machine learning KW - Feature selection KW - Association rule mining KW - Biclustering KW - Knowledge bases Y1 - 2019 SN - 978-3-319-99608-0 SN - 978-3-319-99607-3 U6 - https://doi.org/10.1007/978-3-319-99608-0_41 SN - 2194-5357 SN - 2194-5365 VL - 801 SP - 339 EP - 343 PB - Springer CY - Cham ER - TY - JOUR A1 - Perscheid, Cindy T1 - Comprior BT - Facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets JF - BMC Bioinformatics N2 - Background Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated amongst each other, leaving open questions regarding their effectiveness. Results We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance. Conclusion Comprior allows reproducible benchmarking especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness KW - Feature selection KW - Prior knowledge KW - Gene expression KW - Reproducible benchmarking Y1 - 2021 U6 - https://doi.org/10.1186/s12859-021-04308-z SN - 1471-2105 VL - 22 SP - 1 EP - 15 PB - Springer Nature CY - London ER - TY - GEN A1 - Perscheid, Cindy T1 - Comprior: facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets T2 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät N2 - Background Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated amongst each other, leaving open questions regarding their effectiveness. Results We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance. Conclusion Comprior allows reproducible benchmarking especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness T3 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 010 KW - Feature selection KW - Prior knowledge KW - Gene expression KW - Reproducible benchmarking Y1 - 2022 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-548943 SP - 1 EP - 15 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - CHAP A1 - Kurbel, Karl A1 - Nowak, Dawid A1 - Azodi, Amir A1 - Jaeger, David A1 - Meinel, Christoph A1 - Cheng, Feng A1 - Sapegin, Andrey A1 - Gawron, Marian A1 - Morelli, Frank A1 - Stahl, Lukas A1 - Kerl, Stefan A1 - Janz, Mariska A1 - Hadaya, Abdulmasih A1 - Ivanov, Ivaylo A1 - Wiese, Lena A1 - Neves, Mariana A1 - Schapranow, Matthieu-Patrick A1 - Fähnrich, Cindy A1 - Feinbube, Frank A1 - Eberhardt, Felix A1 - Hagen, Wieland A1 - Plauth, Max A1 - Herscheid, Lena A1 - Polze, Andreas A1 - Barkowsky, Matthias A1 - Dinger, Henriette A1 - Faber, Lukas A1 - Montenegro, Felix A1 - Czachórski, Tadeusz A1 - Nycz, Monika A1 - Nycz, Tomasz A1 - Baader, Galina A1 - Besner, Veronika A1 - Hecht, Sonja A1 - Schermann, Michael A1 - Krcmar, Helmut A1 - Wiradarma, Timur Pratama A1 - Hentschel, Christian A1 - Sack, Harald A1 - Abramowicz, Witold A1 - Sokolowska, Wioletta A1 - Hossa, Tymoteusz A1 - Opalka, Jakub A1 - Fabisz, Karol A1 - Kubaczyk, Mateusz A1 - Cmil, Milena A1 - Meng, Tianhui A1 - Dadashnia, Sharam A1 - Niesen, Tim A1 - Fettke, Peter A1 - Loos, Peter A1 - Perscheid, Cindy A1 - Schwarz, Christian A1 - Schmidt, Christopher A1 - Scholz, Matthias A1 - Bock, Nikolai A1 - Piller, Gunther A1 - Böhm, Klaus A1 - Norkus, Oliver A1 - Clark, Brian A1 - Friedrich, Björn A1 - Izadpanah, Babak A1 - Merkel, Florian A1 - Schweer, Ilias A1 - Zimak, Alexander A1 - Sauer, Jürgen A1 - Fabian, Benjamin A1 - Tilch, Georg A1 - Müller, David A1 - Plöger, Sabrina A1 - Friedrich, Christoph M. A1 - Engels, Christoph A1 - Amirkhanyan, Aragats A1 - van der Walt, Estée A1 - Eloff, J. H. P. A1 - Scheuermann, Bernd A1 - Weinknecht, Elisa ED - Meinel, Christoph ED - Polze, Andreas ED - Oswald, Gerhard ED - Strotmann, Rolf ED - Seibold, Ulrich ED - Schulzki, Bernhard T1 - HPI Future SOC Lab BT - Proceedings 2015 N2 - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie. Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2015 vorgestellt. Ausgewählte Projekte stellten ihre Ergebnisse am 15. April 2015 und 4. November 2015 im Rahmen der Future SOC Lab Tag Veranstaltungen vor. KW - Future SOC Lab KW - Forschungsprojekte KW - Multicore Architekturen KW - In-Memory Technologie KW - Cloud Computing KW - maschinelles Lernen KW - künstliche Intelligenz Y1 - 2017 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-102516 ER -