TY  - JOUR
A1  - Steinfath, Matthias
A1  - Gärtner, Tanja
A1  - Lisec, Jan
A1  - Meyer, Rhonda Christiane
A1  - Altmann, Thomas
A1  - Willmitzer, Lothar
A1  - Selbig, Joachim
T1  - Prediction of hybrid biomass in Arabidopsis thaliana by selected parental SNP and metabolic markers
JF  - Theoretical and applied genetics : TAG ; international journal of plant breeding research
N2  - A recombinant inbred line (RIL) population, derived from two Arabidopsis thaliana accessions, and the corresponding testcrosses with these two original accessions were used for the development and validation of machine learning models to predict the biomass of hybrids. Genetic and metabolic information of the RILs served as predictors. Feature selection reduced the number of variables (genetic and metabolic markers) in the models by more than 80% without impairing the predictive power. Thus, potential biomarkers have been revealed. Metabolites were shown to bear information on inherited macroscopic phenotypes. This proof of concept could be interesting for breeders. The example population exhibits substantial mid-parent biomass heterosis. The results of feature selection could therefore be used to shed light on the origin of heterosis. In this respect, mainly dominance effects were detected.
KW  - Quantitative Trait Locus
KW  - feature selection
KW  - Partial Least Squares
KW  - recombinant inbred line
KW  - Quantitative Trait Locus analysis
Y1  - 2009
U6  - https://doi.org/10.1007/s00122-009-1191-2
SN  - 0040-5752
SN  - 1432-2242
VL  - 120
SP  - 239
EP  - 247
PB  - Springer
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Cope, Justin L.
A1  - Baukmann, Hannes A.
A1  - Klinger, Jörn E.
A1  - Ravarani, Charles N. J.
A1  - Böttinger, Erwin
A1  - Konigorski, Stefan
A1  - Schmidt, Marco F.
T1  - Interaction-based feature selection algorithm outperforms polygenic risk score in predicting Parkinson’s Disease status
JF  - Frontiers in genetics
N2  - Polygenic risk scores (PRS) aggregating results from genome-wide association studies are the state of the art in the prediction of susceptibility to complex traits or diseases, yet their predictive performance is limited for various reasons, not least of which is their failure to incorporate the effects of gene-gene interactions. Novel machine learning algorithms that use large amounts of data promise to find gene-gene interactions in order to build models with better predictive performance than PRS. Here, we present a data preprocessing step by using data-mining of contextual information to reduce the number of features, enabling machine learning algorithms to identify gene-gene interactions. We applied our approach to the Parkinson's Progression Markers Initiative (PPMI) dataset, an observational clinical study of 471 genotyped subjects (368 cases and 152 controls). With an AUC of 0.85 (95% CI = [0.72; 0.96]), the interaction-based prediction model outperforms the PRS (AUC of 0.58 (95% CI = [0.42; 0.81])). Furthermore, feature importance analysis of the model provided insights into the mechanism of Parkinson's disease. For instance, the model revealed an interaction of previously described drug target candidate genes TMEM175 and GAPDHP25. These results demonstrate that interaction-based machine learning models can improve genetic prediction models and might provide an answer to the missing heritability problem.
KW  - epistasis
KW  - machine learning
KW  - feature selection
KW  - Parkinson's disease
KW  - PPMI (Parkinson's Progression Markers Initiative)
Y1  - 2021
U6  - https://doi.org/10.3389/fgene.2021.744557
SN  - 1664-8021
VL  - 12
PB  - Frontiers Media
CY  - Lausanne
ER  -