TY  - THES
A1  - Perscheid, Cindy
T1  - Integrative biomarker detection using prior knowledge on gene expression data sets
T1  - Integrative Biomarker-Erkennung auf Genexpressions-Daten mithilfe von biologischem Vorwissen
N2  - Gene expression data is analyzed to identify biomarkers, e.g. relevant genes, which serve for diagnostic, predictive, or prognostic use. Traditional approaches for biomarker detection select distinctive features from the data based exclusively on the signals therein, facing multiple shortcomings in regards to overfitting, biomarker robustness, and actual biological relevance. Prior knowledge approaches are expected to address these issues by incorporating prior biological knowledge, e.g. on gene-disease associations, into the actual analysis. However, prior knowledge approaches are currently not widely applied in practice because they are often use-case specific and seldom applicable in a different scope. This leads to a lack of comparability of prior knowledge approaches, which in turn makes it currently impossible to assess their effectiveness in a broader context.

Our work addresses the aforementioned issues with three contributions. Our first contribution provides formal definitions for both prior knowledge and the flexible integration thereof into the feature selection process. Central to these concepts is the automatic retrieval of prior knowledge from online knowledge bases, which allows for streamlining the retrieval process and agreeing on a uniform definition for prior knowledge. We subsequently describe novel and generalized prior knowledge approaches that are flexible regarding the used prior knowledge and applicable to varying use case domains. Our second contribution is the benchmarking platform Comprior. Comprior applies the aforementioned concepts in practice and allows for flexibly setting up comprehensive benchmarking studies for examining the performance of existing and novel prior knowledge approaches. It streamlines the retrieval of prior knowledge and allows for combining it with prior knowledge approaches. Comprior demonstrates the practical applicability of our concepts and further fosters the overall development and comparability of prior knowledge approaches. Our third contribution is a comprehensive case study on the effectiveness of prior knowledge approaches. For that, we used Comprior and tested a broad range of both traditional and prior knowledge approaches in combination with multiple knowledge bases on data sets from multiple disease domains. Ultimately, our case study constitutes a thorough assessment of a) the suitability of selected knowledge bases for integration, b) the impact of prior knowledge being applied at different integration levels, and c) the improvements in terms of classification performance, biological relevance, and overall robustness.

In summary, our contributions demonstrate that generalized concepts for prior knowledge and a streamlined retrieval process improve the applicability of prior knowledge approaches. Results from our case study show that the integration of prior knowledge positively affects biomarker results, particularly regarding their robustness. Our findings provide the first in-depth insights on the effectiveness of prior knowledge approaches and build a valuable foundation for future research.
N2  - Biomarker sind charakteristische biologische Merkmale mit diagnostischer oder prognostischer Aussagekraft. Auf der molekularen Ebene sind dies Gene mit einem krankheitsspezifischen Expressionsmuster, welche mittels der Analyse von Genexpressionsdaten identifiziert werden. Traditionelle Ansätze für diese Art von Biomarker Detection wählen Gene als Biomarker ausschließlich anhand der vorhandenen Signale im Datensatz aus. Diese Vorgehensweise zeigt jedoch Schwächen insbesondere in Bezug auf die Robustheit und tatsächliche biologische Relevanz der identifizierten Biomarker. Verschiedene Forschungsarbeiten legen nahe, dass die Berücksichtigung des biologischen Kontexts während des Selektionsprozesses diese Schwächen ausgleichen kann. Sogenannte wissensbasierte Ansätze für Biomarker Detection beziehen vorhandenes biologisches Wissen, beispielsweise über Zusammenhänge zwischen bestimmten Genen und Krankheiten, direkt in die Analyse mit ein. Die Anwendung solcher Verfahren ist in der Praxis jedoch derzeit nicht weit verbreitet, da existierende Methoden oft spezifisch für einen bestimmten Anwendungsfall entwickelt wurden und sich nur mit großem Aufwand auf andere Anwendungsgebiete übertragen lassen. Dadurch sind Vergleiche untereinander kaum möglich, was es wiederum nicht erlaubt die Effektivität von wissensbasierten Methoden in einem breiteren Kontext zu untersuchen.

Die vorliegende Arbeit befasst sich mit den vorgenannten Herausforderungen für wissensbasierte Ansätze. In einem ersten Schritt legen wir formale und einheitliche Definitionen für vorhandenes biologisches Wissen sowie ihre flexible Integration in den Biomarker-Auswahlprozess fest. Der Kerngedanke unseres Ansatzes ist die automatisierte Beschaffung von biologischem Wissen aus im Internet frei verfügbaren Wissens-Datenbanken. Dies erlaubt eine Vereinfachung der Kuratierung sowie die Festlegung einer einheitlichen Definition für biologisches Wissen. Darauf aufbauend beschreiben wir generalisierte wissensbasierte Verfahren, welche flexibel auf verschiedene Anwendungsfalle anwendbar sind. In einem zweiten Schritt haben wir die Benchmarking-Plattform Comprior entwickelt, welche unsere theoretischen Konzepte in einer praktischen Anwendung realisiert. Comprior ermöglicht die schnelle Umsetzung von umfangreichen Experimenten für den Vergleich von wissensbasierten Ansätzen. Comprior übernimmt die Beschaffung von biologischem Wissen und ermöglicht dessen beliebige Kombination mit wissensbasierten Ansätzen. Comprior demonstriert damit die praktische Umsetzbarkeit unserer theoretischen Konzepte und unterstützt zudem die technische Realisierung und Vergleichbarkeit wissensbasierter Ansätze. In einem dritten Schritt untersuchen wir die Effektivität wissensbasierter Ansätze im Rahmen einer umfangreichen Fallstudie. Mithilfe von Comprior vergleichen wir die Ergebnisse traditioneller und wissensbasierter Ansätze im Kontext verschiedener Krankheiten, wobei wir für wissensbasierte Ansätze auch verschiedene Wissens-Datenbanken verwenden. Unsere Fallstudie untersucht damit a) die Eignung von ausgewählten Wissens-Datenbanken für deren Einsatz bei wissensbasierten Ansätzen, b) den Einfluss verschiedener Integrationskonzepte für biologisches Wissen auf den Biomarker-Auswahlprozess, und c) den Grad der Verbesserung in Bezug auf die Klassifikationsleistung, biologische Relevanz und allgemeine Robustheit der selektierten Biomarker.

Zusammenfassend demonstriert unsere Arbeit, dass generalisierte Konzepte für biologisches Wissen und dessen vereinfachte Kuration die praktische Anwendbarkeit von wissensbasierten Ansätzen erleichtern. Die Ergebnisse unserer Fallstudie zeigen, dass die Integration von vorhandenem biologischen Wissen einen positiven Einfluss auf die selektierten Biomarker hat, insbesondere in Bezug auf ihre biologische Relevanz. Diese erstmals umfassenderen Erkenntnisse zur Effektivität von wissensbasierten Ansätzen bilden eine wertvolle Grundlage für zukünftige Forschungsarbeiten.
KW  - gene expression
KW  - biomarker detection
KW  - prior knowledge
KW  - feature selection
KW  - Biomarker-Erkennung
KW  - Merkmalsauswahl
KW  - Gen-Expression
KW  - biologisches Vorwissen
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-582418
ER  - 
TY  - GEN
A1  - Perscheid, Cindy
A1  - Faber, Lukas
A1  - Kraus, Milena
A1  - Arndt, Paul
A1  - Janke, Michael
A1  - Rehfeldt, Sebastian
A1  - Schubotz, Antje
A1  - Slosarek, Tamara
A1  - Uflacker, Matthias
T1  - A tissue-aware gene selection approach for analyzing multi-tissue gene expression data
T2  - 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
N2  - High-throughput RNA sequencing (RNAseq) produces large data sets containing expression levels of thousands of genes. The analysis of RNAseq data leads to a better understanding of gene functions and interactions, which eventually helps to study diseases like cancer and develop effective treatments. Large-scale RNAseq expression studies on cancer comprise samples from multiple cancer types and aim to identify their distinct molecular characteristics. Analyzing samples from different cancer types implies analyzing samples from different tissue origin. Such multi-tissue RNAseq data sets require a meaningful analysis that accounts for the inherent tissue-related bias: The identified characteristics must not originate from the differences in tissue types, but from the actual differences in cancer types. However, current analysis procedures do not incorporate that aspect. As a result, we propose to integrate a tissue-awareness into the analysis of multi-tissue RNAseq data. We introduce an extension for gene selection that provides a tissue-wise context for every gene and can be flexibly combined with any existing gene selection approach. We suggest to expand conventional evaluation by additional metrics that are sensitive to the tissue-related bias. Evaluations show that especially low complexity gene selection approaches profit from introducing tissue-awareness.
KW  - RNAseq
KW  - gene selection
KW  - tissue-awareness
KW  - TCGA
KW  - GTEx
Y1  - 2018
SN  - 978-1-5386-5488-0
U6  - https://doi.org/10.1109/BIBM.2018.8621189
SN  - 2156-1125
SN  - 2156-1133
SP  - 2159
EP  - 2166
PB  - IEEE
CY  - New York
ER  - 
TY  - JOUR
A1  - Perscheid, Cindy
T1  - Integrative biomarker detection on high-dimensional gene expression data sets
BT  - a survey on prior knowledge approaches
JF  - Briefings in bioinformatics
N2  - Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview on the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.
KW  - gene selection
KW  - external knowledge bases
KW  - biomarker detection
KW  - gene
KW  - expression
KW  - prior knowledge
Y1  - 2021
U6  - https://doi.org/10.1093/bib/bbaa151
SN  - 1467-5463
SN  - 1477-4054
VL  - 22
IS  - 3
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Perscheid, Cindy
A1  - Grasnick, Bastien
A1  - Uflacker, Matthias
T1  - Integrative Gene Selection on Gene Expression Data
BT  - Providing Biological Context to Traditional Approaches
JF  - Journal of Integrative Bioinformatics
N2  - The advance of high-throughput RNA-Sequencing techniques enables researchers to analyze the complete gene activity in particular cells. From the insights of such analyses, researchers can identify disease-specific expression profiles, thus understand complex diseases like cancer, and eventually develop effective measures for diagnosis and treatment. The high dimensionality of gene expression data poses challenges to its computational analysis, which is addressed with measures of gene selection. Traditional gene selection approaches base their findings on statistical analyses of the actual expression levels, which implies several drawbacks when it comes to accurately identifying the underlying biological processes. In turn, integrative approaches include curated information on biological processes from external knowledge bases during gene selection, which promises to lead to better interpretability and improved predictive performance. Our work compares the performance of traditional and integrative gene selection approaches. Moreover, we propose a straightforward approach to integrate external knowledge with traditional gene selection approaches. We introduce a framework enabling the automatic external knowledge integration, gene selection, and evaluation. Evaluation results prove our framework to be a useful tool for evaluation and show that integration of external knowledge improves overall analysis results.
KW  - Gene Expression Data Analysis
KW  - Integrative Gene Selection
KW  - Pattern Recognition
KW  - Prior Knowledge
KW  - Knowledge Bases
Y1  - 2019
U6  - https://doi.org/10.1515/jib-2018-0064
SN  - 1613-4516
VL  - 16
IS  - 1
PB  - De Gruyter
CY  - Berlin
ER  - 
TY  - GEN
A1  - Perscheid, Cindy
A1  - Uflacker, Matthias
T1  - Integrating Biological Context into the Analysis of Gene Expression Data
T2  - Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference
N2  - High-throughput RNA sequencing produces large gene expression datasets whose analysis leads to a better understanding of diseases like cancer. The nature of RNA-Seq data poses challenges to its analysis in terms of its high dimensionality, noise, and complexity of the underlying biological processes. Researchers apply traditional machine learning approaches, e. g. hierarchical clustering, to analyze this data. Until it comes to validation of the results, the analysis is based on the provided data only and completely misses the biological context. However, gene expression data follows particular patterns - the underlying biological processes. In our research, we aim to integrate the available biological knowledge earlier in the analysis process. We want to adapt state-of-the-art data mining algorithms to consider the biological context in their computations and deliver meaningful results for researchers.
KW  - Gene expression
KW  - Machine learning
KW  - Feature selection
KW  - Association rule mining
KW  - Biclustering
KW  - Knowledge bases
Y1  - 2019
SN  - 978-3-319-99608-0
SN  - 978-3-319-99607-3
U6  - https://doi.org/10.1007/978-3-319-99608-0_41
SN  - 2194-5357
SN  - 2194-5365
VL  - 801
SP  - 339
EP  - 343
PB  - Springer
CY  - Cham
ER  - 
TY  - JOUR
A1  - Perscheid, Cindy
T1  - Comprior
BT  - Facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets
JF  - BMC Bioinformatics
N2  - Background
Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated amongst each other, leaving open questions regarding their effectiveness.

Results
We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance.

Conclusion
Comprior allows reproducible benchmarking especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness
KW  - Feature selection
KW  - Prior knowledge
KW  - Gene expression
KW  - Reproducible benchmarking
Y1  - 2021
U6  - https://doi.org/10.1186/s12859-021-04308-z
SN  - 1471-2105
VL  - 22
SP  - 1
EP  - 15
PB  - Springer Nature
CY  - London
ER  - 
TY  - GEN
A1  - Perscheid, Cindy
T1  - Comprior: facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - Background
Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated amongst each other, leaving open questions regarding their effectiveness.

Results
We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance.

Conclusion
Comprior allows reproducible benchmarking especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 010 
KW  - Feature selection
KW  - Prior knowledge
KW  - Gene expression
KW  - Reproducible benchmarking
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-548943
SP  - 1
EP  - 15
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - CHAP
A1  - Kurbel, Karl
A1  - Nowak, Dawid
A1  - Azodi, Amir
A1  - Jaeger, David
A1  - Meinel, Christoph
A1  - Cheng, Feng
A1  - Sapegin, Andrey
A1  - Gawron, Marian
A1  - Morelli, Frank
A1  - Stahl, Lukas
A1  - Kerl, Stefan
A1  - Janz, Mariska
A1  - Hadaya, Abdulmasih
A1  - Ivanov, Ivaylo
A1  - Wiese, Lena
A1  - Neves, Mariana
A1  - Schapranow, Matthieu-Patrick
A1  - Fähnrich, Cindy
A1  - Feinbube, Frank
A1  - Eberhardt, Felix
A1  - Hagen, Wieland
A1  - Plauth, Max
A1  - Herscheid, Lena
A1  - Polze, Andreas
A1  - Barkowsky, Matthias
A1  - Dinger, Henriette
A1  - Faber, Lukas
A1  - Montenegro, Felix
A1  - Czachórski, Tadeusz
A1  - Nycz, Monika
A1  - Nycz, Tomasz
A1  - Baader, Galina
A1  - Besner, Veronika
A1  - Hecht, Sonja
A1  - Schermann, Michael
A1  - Krcmar, Helmut
A1  - Wiradarma, Timur Pratama
A1  - Hentschel, Christian
A1  - Sack, Harald
A1  - Abramowicz, Witold
A1  - Sokolowska, Wioletta
A1  - Hossa, Tymoteusz
A1  - Opalka, Jakub
A1  - Fabisz, Karol
A1  - Kubaczyk, Mateusz
A1  - Cmil, Milena
A1  - Meng, Tianhui
A1  - Dadashnia, Sharam
A1  - Niesen, Tim
A1  - Fettke, Peter
A1  - Loos, Peter
A1  - Perscheid, Cindy
A1  - Schwarz, Christian
A1  - Schmidt, Christopher
A1  - Scholz, Matthias
A1  - Bock, Nikolai
A1  - Piller, Gunther
A1  - Böhm, Klaus
A1  - Norkus, Oliver
A1  - Clark, Brian
A1  - Friedrich, Björn
A1  - Izadpanah, Babak
A1  - Merkel, Florian
A1  - Schweer, Ilias
A1  - Zimak, Alexander
A1  - Sauer, Jürgen
A1  - Fabian, Benjamin
A1  - Tilch, Georg
A1  - Müller, David
A1  - Plöger, Sabrina
A1  - Friedrich, Christoph M.
A1  - Engels, Christoph
A1  - Amirkhanyan, Aragats
A1  - van der Walt, Estée
A1  - Eloff, J. H. P.
A1  - Scheuermann, Bernd
A1  - Weinknecht, Elisa
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Oswald, Gerhard
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Schulzki, Bernhard
T1  - HPI Future SOC Lab
BT  - Proceedings 2015
N2  - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie.
Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. 
In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2015 vorgestellt.  Ausgewählte Projekte stellten ihre Ergebnisse am 15. April 2015 und 4. November 2015 im Rahmen der Future SOC Lab Tag Veranstaltungen vor.
KW  - Future SOC Lab
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - künstliche Intelligenz
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-102516
ER  -