TY  - JOUR
A1  - Perscheid, Cindy
T1  - Integrative biomarker detection on high-dimensional gene expression data sets: a survey on prior knowledge approaches
JF  - Briefings in bioinformatics
N2  - Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview of the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.
KW  - gene selection
KW  - external knowledge bases
KW  - biomarker detection
KW  - gene expression
KW  - prior knowledge
Y1  - 2021
U6  - https://doi.org/10.1093/bib/bbaa151
SN  - 1467-5463
SN  - 1477-4054
VL  - 22
IS  - 3
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Perscheid, Cindy
T1  - Comprior: facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets
JF  - BMC Bioinformatics
N2  - Background
Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied to gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and offers a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated against each other, leaving open questions regarding their effectiveness.

Results
We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance.

Conclusion
Comprior allows reproducible benchmarking, especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness.
KW  - Feature selection
KW  - Prior knowledge
KW  - Gene expression
KW  - Reproducible benchmarking
Y1  - 2021
U6  - https://doi.org/10.1186/s12859-021-04308-z
SN  - 1471-2105
VL  - 22
SP  - 1
EP  - 15
PB  - Springer Nature
CY  - London
ER  -