TY - JOUR
A1 - Mbebi, Alain J.
A1 - Breitler, Jean-Christophe
A1 - Bordeaux, Mélanie
A1 - Sulpice, Ronan
A1 - McHale, Marcus
A1 - Tong, Hao
A1 - Toniutti, Lucile
A1 - Castillo, Jonny Alonso
A1 - Bertrand, Benoit
A1 - Nikoloski, Zoran
T1 - A comparative analysis of genomic and phenomic predictions of growth-related traits in 3-way coffee hybrids
JF - G3: Genes, genomes, genetics
N2 - Genomic prediction has revolutionized crop breeding despite remaining issues of transferability of models to unseen environmental conditions and environments. The use of endophenotypes rather than genomic markers makes it possible to build phenomic prediction models that can account, in part, for this challenge. Here, we compare and contrast genomic prediction and phenomic prediction models for 3 growth-related traits, namely, leaf count, tree height, and trunk diameter, from 2 coffee 3-way hybrid populations exposed to a series of treatment-inducing environmental conditions. The models are based on 7 different statistical methods built with genomic markers and chlorophyll a fluorescence (ChlF) data used as predictors. This comparative analysis demonstrates that the best-performing phenomic prediction models show higher predictability than the best genomic prediction models for the considered traits and environments in the vast majority of comparisons within 3-way hybrid populations. In addition, we show that phenomic prediction models are transferable between conditions but to a lesser extent between populations, and we conclude that chlorophyll a fluorescence data can serve as alternative predictors in statistical models of coffee hybrid performance. Future directions will explore their combination with other endophenotypes to further improve the prediction of growth-related traits for crops.
KW - genomic prediction
KW - phenomic prediction
KW - 3-way coffee hybrids
KW - chlorophyll a fluorescence
KW - GenPred
KW - Shared Data Resource
Y1 - 2022
U6 - https://doi.org/10.1093/g3journal/jkac170
SN - 2160-1836
VL - 12
IS - 9
PB - Genetics Soc. of America
CY - Pittsburgh, PA
ER -

TY - JOUR
A1 - Tong, Hao
A1 - Nankar, Amol N.
A1 - Liu, Jintao
A1 - Todorova, Velichka
A1 - Ganeva, Daniela
A1 - Grozeva, Stanislava
A1 - Tringovska, Ivanka
A1 - Pasev, Gancho
A1 - Radeva-Ivanova, Vesela
A1 - Gechev, Tsanko
A1 - Kostova, Dimitrina
A1 - Nikoloski, Zoran
T1 - Genomic prediction of morphometric and colorimetric traits in Solanaceous fruits
JF - Horticulture research
N2 - Selection of high-performance lines with respect to traits of interest is a key step in plant breeding. Genomic prediction makes it possible to determine the genomic estimated breeding values of unseen lines for traits of interest using genetic markers, e.g. single-nucleotide polymorphisms (SNPs), and machine learning approaches; it can therefore shorten breeding cycles, an approach referred to as genomic selection (GS). Here, we applied GS approaches in two populations of Solanaceous crops, i.e. tomato and pepper, to predict morphometric and colorimetric traits. The traits were measured using scoring-based conventional descriptors (CDs) as well as with the Tomato Analyzer (TA) tool using images of longitudinally and latitudinally cut fruits. The GS performance was assessed in cross-validations of classification-based and regression-based machine learning models for CD and TA traits, respectively. The results showed that the use of TA traits and tag SNPs provides a powerful combination to predict morphology and color-related traits of Solanaceous fruits. The highest predictability of 0.89 was achieved for fruit width in pepper, with an average predictability of 0.69 over all traits. The multi-trait GS models have slightly better predictability than single-trait models for some colorimetric traits in pepper.
While model validation performs poorly on wild tomato accessions, the use of as few as one accession per wild species in the training set can increase the transferability of models to unseen populations for some traits (e.g. fruit shape, for which predictability in the unseen scenario increased from zero to 0.6). Overall, GS approaches can assist the selection of high-performance Solanaceous fruits in crop breeding.
Y1 - 2022
U6 - https://doi.org/10.1093/hr/uhac072
SN - 2052-7276
VL - 9
PB - Oxford Univ. Press
CY - Cary
ER -

TY - JOUR
A1 - Tong, Hao
A1 - Küken, Anika
A1 - Razaghi-Moghadam, Zahra
A1 - Nikoloski, Zoran
T1 - Characterization of effects of genetic variants via genome-scale metabolic modelling
JF - Cellular and molecular life sciences : CMLS
N2 - Genome-scale metabolic networks for model plants and crops in combination with approaches from the constraint-based modelling framework have been used to predict metabolic traits and design metabolic engineering strategies for their manipulation. With the advances in technologies to generate large-scale genotyping data from natural diversity panels and other populations, genome-wide association and genomic selection have emerged as statistical approaches to determine genetic variants associated with and predictive of traits. Here, we review recent advances in constraint-based approaches that integrate genetic variants in genome-scale metabolic models to characterize their effects on reaction fluxes. Since some of these approaches have been applied in organisms other than plants, we provide a critical assessment of their applicability, particularly in crops. In addition, we further dissect the inferred effects of genetic variants with respect to reaction rate constants, abundances of enzymes, and concentrations of metabolites, as main determinants of reaction fluxes, and relate them to their combined effects on complex traits, like growth.
Through this systematic review, we also provide a roadmap for future research to increase the predictive power of statistical approaches by coupling them with mechanistic models of metabolism.
KW - Single-nucleotide polymorphisms
KW - Metabolic models
KW - Genome-wide association studies
KW - Genomic selection
Y1 - 2021
U6 - https://doi.org/10.1007/s00018-021-03844-4
SN - 1420-682X
SN - 1420-9071
VL - 78
IS - 12
SP - 5123
EP - 5138
PB - Springer International Publishing AG
CY - Cham
ER -

TY - JOUR
A1 - Mbebi, Alain J.
A1 - Tong, Hao
A1 - Nikoloski, Zoran
T1 - L2,1-norm regularized multivariate regression model with applications to genomic prediction
JF - Bioinformatics
N2 - Motivation: Genomic selection (GS) is currently deemed the most effective approach to speed up breeding of agricultural varieties. It has been recognized that consideration of multiple traits in GS can improve accuracy of prediction for traits of low heritability. However, since GS forgoes statistical testing with the idea of improving predictions, it does not facilitate mechanistic understanding of the contribution of particular single nucleotide polymorphisms (SNPs). Results: Here, we propose an L2,1-norm regularized multivariate regression model and devise a fast and efficient iterative optimization algorithm, called L2,1-joint, applicable in multi-trait GS. The usage of the L2,1-norm facilitates variable selection in a penalized multivariate regression that considers the relation between individuals, when the number of SNPs is much larger than the number of individuals. The capacity for variable selection allows us to define master regulators that can be used in a multi-trait GS setting to dissect the genetic architecture of the analyzed traits. Our comparative analyses demonstrate that the proposed model is a favorable candidate compared to existing state-of-the-art approaches.
Prediction and variable selection with datasets from Brassica napus, wheat and Arabidopsis thaliana diversity panels are conducted to further showcase the performance of the proposed model.
Y1 - 2021
U6 - https://doi.org/10.1093/bioinformatics/btab212
SN - 1367-4803
SN - 1460-2059
VL - 37
IS - 18
SP - 2896
EP - 2904
PB - Oxford Univ. Press
CY - Oxford
ER -

TY - JOUR
A1 - Tong, Hao
A1 - Küken, Anika
A1 - Nikoloski, Zoran
T1 - Integrating molecular markers into metabolic models improves genomic selection for Arabidopsis growth
JF - Nature Communications
N2 - The current trends of crop yield improvements are not expected to meet the projected rise in demand. Genomic selection uses molecular markers and machine learning to identify superior genotypes with improved traits, such as growth. Plant growth directly depends on rates of metabolic reactions which transform nutrients into the building blocks of biomass. Here, we predict growth of Arabidopsis thaliana accessions by employing genomic prediction of reaction rates estimated from accession-specific metabolic models. We demonstrate that, compared to classical genomic selection on the available data sets for 67 accessions, our approach improves the prediction accuracy for growth within and across nitrogen environments by 32.6% and 51.4%, respectively, and from optimal nitrogen to low carbon environment by 50.4%. Therefore, integration of molecular markers into metabolic models offers an approach to predict traits directly related to metabolism, and its usefulness in breeding can be examined by gathering matching datasets in crops. An increase in genomic selection (GS) accuracy can accelerate genetic gain by shortening the breeding cycles. Here, the authors introduce a network-based GS method that uses metabolic models and improves the prediction accuracy of Arabidopsis growth within and across environments.
Y1 - 2020
U6 - https://doi.org/10.1038/s41467-020-16279-5
SN - 2041-1723
VL - 11
IS - 1
PB - Nature Publishing Group UK
CY - London
ER -

TY - JOUR
A1 - Tong, Hao
A1 - Nikoloski, Zoran
T1 - Machine learning approaches for crop improvement
BT - leveraging phenotypic and genotypic big data
JF - Journal of plant physiology : biochemistry, physiology, molecular biology and biotechnology of plants
N2 - Highly efficient and accurate selection of elite genotypes can lead to dramatic shortening of the breeding cycle in major crops relevant for sustaining present demands for food, feed, and fuel. In contrast to classical approaches that emphasize the need for resource-intensive phenotyping at all stages of artificial selection, genomic selection dramatically reduces the need for phenotyping. Genomic selection relies on advances in machine learning and the availability of genotyping data to predict agronomically relevant phenotypic traits. Here we provide a systematic review of machine learning approaches applied for genomic selection of single and multiple traits in major crops in the past decade. We emphasize the need to gather data on intermediate phenotypes, e.g. metabolite, protein, and gene expression levels, along with developments of modeling techniques that can lead to further improvements of genomic selection. In addition, we provide a critical view of factors that affect genomic selection, with attention to transferability of models between different environments. Finally, we highlight the future aspects of integrating high-throughput molecular phenotypic data from omics technologies with biological networks for crop improvement.
KW - genomic selection
KW - genomic prediction
KW - machine learning
KW - multiple traits
KW - multi-omics
KW - GxE interaction
Y1 - 2020
U6 - https://doi.org/10.1016/j.jplph.2020.153354
SN - 0176-1617
SN - 1618-1328
VL - 257
PB - Elsevier
CY - München
ER -

TY - JOUR
A1 - Rodriguez Cubillos, Andres Eduardo
A1 - Tong, Hao
A1 - Alseekh, Saleh
A1 - de Abreu e Lima, Francisco Anastacio
A1 - Yu, Jing
A1 - Fernie, Alisdair R.
A1 - Nikoloski, Zoran
A1 - Laitinen, Roosa A. E.
T1 - Inheritance patterns in metabolism and growth in diallel crosses of Arabidopsis thaliana from a single growth habitat
JF - Heredity
N2 - Metabolism is a key determinant of plant growth and modulates plant adaptive responses. Increased metabolic variation due to heterozygosity may be beneficial for highly homozygous plants if their progeny is to respond to sudden changes in the habitat. Here, we investigate the extent to which heterozygosity contributes to the variation in metabolism and size of hybrids of Arabidopsis thaliana whose parents are from a single growth habitat. We created full diallel crosses among seven parents, originating from Southern Germany, and analysed the inheritance patterns in primary and secondary metabolism as well as in rosette size in situ. In comparison to primary metabolites, compounds from secondary metabolism were more variable and showed more pronounced non-additive inheritance patterns which could be attributed to epistasis. In addition, we showed that glucosinolates, among other secondary metabolites, were positively correlated with a proxy for plant size. Therefore, our study demonstrates that heterozygosity in a local A. thaliana population generates metabolic variation and may impact several tasks directly linked to metabolism.
Y1 - 2018
U6 - https://doi.org/10.1038/s41437-017-0030-5
SN - 0018-067X
SN - 1365-2540
VL - 120
IS - 5
SP - 463
EP - 473
PB - Nature Publ. Group
CY - London
ER -

TY - THES
A1 - Tong, Hao
T1 - Dissection of genetic architecture of intermediate phenotypes and predictions in plants
N2 - Determining the relationship between genotype and phenotype is key to understanding the plasticity and robustness of phenotypes in nature. While the directly observable plant phenotypes (e.g. agronomic, yield and stress resistance traits) have been well investigated, there is still a gap in our knowledge of the genetic basis of intermediate phenotypes, such as metabolic phenotypes. Dissecting the links between genotype and phenotype depends on suitable statistical models. The state-of-the-art models are developed for directly observable phenotypes, regardless of the characteristics of intermediate phenotypes. This thesis aims to fill the gaps in understanding the genetic architecture of intermediate phenotypes, and how they tie to composite traits, namely plant growth. The metabolite levels and reaction fluxes, as two aspects of metabolic phenotypes, are shaped by the interrelated chemical reactions that form the genome-scale metabolic network. Here, I attempt to answer the question: Can knowledge of the underlying genome-scale metabolic network improve model performance for the prediction of metabolic phenotypes and associated plant growth? To this end, two projects are investigated in this thesis. Firstly, we propose an approach that couples genomic selection with a genome-scale metabolic network and metabolic profiles in Arabidopsis thaliana to predict growth. This project is the first integration of genomic data with fluxes predicted based on the constraint-based modeling framework and data on biomass composition. We demonstrate that our approach leads to a considerable increase in prediction accuracy in comparison to the state-of-the-art methods in both within- and across-environment predictions.
Therefore, our work paves the way for combining knowledge on metabolic mechanisms in the statistical approach underlying genomic selection to increase the efficiency of future plant breeding approaches. Secondly, we investigate how reliable genomic selection is for metabolite levels, and which single nucleotide polymorphisms (SNPs), obtained from different neighborhoods of a given metabolic network, contribute most to the accuracy of prediction. The results show that the local structure of the first and second neighborhoods is not sufficient for predicting the genetic basis of metabolite levels in Zea mays. Furthermore, we find that the enzymatic SNPs can capture most of the genetic variance and that the contribution of non-enzymatic SNPs is in fact small. To comprehensively understand the genetic architecture of metabolic phenotypes, I extend my study to a local Arabidopsis thaliana population and their hybrids. We analyze the genetic architecture in primary and secondary metabolism as well as in growth. In comparison to primary metabolites, compounds from secondary metabolism were more variable and showed more non-additive inheritance patterns which could be attributed to epistasis. Therefore, our study demonstrates that heterozygosity in a local Arabidopsis thaliana population generates metabolic variation and may impact several tasks directly linked to metabolism. The studies in this thesis improve the knowledge of the genetic architecture of metabolic phenotypes in both inbred and hybrid populations. The approaches I proposed to integrate the genome-scale metabolic network with genomic data provide the opportunity to obtain mechanistic insights about the determinants of agronomically important polygenic traits.
Y1 - 2019
ER -