Integrative biomarker detection on high-dimensional gene expression data sets
Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases.
With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview on the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.
Author details: | Cindy Perscheid |
---|---|
DOI: | https://doi.org/10.1093/bib/bbaa151 |
ISSN: | 1467-5463 |
ISSN: | 1477-4054 |
Pubmed ID: | https://pubmed.ncbi.nlm.nih.gov/32761115 |
Title of parent work (English): | Briefings in bioinformatics |
Subtitle (English): | a survey on prior knowledge approaches |
Publisher: | Oxford Univ. Press |
Place of publishing: | Oxford |
Publication type: | Article |
Language: | English |
Date of first publication: | 2021/08/06 |
Publication year: | 2021 |
Release date: | 2023/01/06 |
Tag: | biomarker detection; expression; external knowledge bases; gene; gene selection; prior knowledge |
Volume: | 22 |
Issue: | 3 |
Article number: | bbaa151 |
Number of pages: | 18 |
Funding institution: | Hasso Plattner Institute |
Organizational units: | Affiliated Institutes / Hasso-Plattner-Institut für Digital Engineering gGmbH |
DDC classification: | 0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science |
DDC classification: | 5 Natural sciences and mathematics / 57 Life sciences; biology / 570 Life sciences; biology |
Peer review: | Peer-reviewed |