publish.UP Search

Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data (2020)

Razaghi-Moghadam, Zahra ; Nikoloski, Zoran

Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises support vector machines to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. By employing data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and DREAM5 network inference challenges, we demonstrate that GRADIS outperforms state-of-the-art supervised and unsupervised approaches. This holds both when predictions of target genes for individual transcription factors and when predictions for the entire network are considered. We use experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and to gain further insights into the performance of the proposed approach. GRADIS can also be used with other network-based representations of large-scale data, and it can readily be extended to help characterise other cellular networks, including protein-protein and protein-metabolite interactions.
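To make the idea of a graph-based distance profile concrete, here is a minimal sketch. It is not the GRADIS implementation: the construction (each sample becomes the 2-D point of TF and target expression, each point is linked to its k nearest neighbours, and the sorted shortest-path distances from the sample with the lowest TF level form the feature vector) is an assumption chosen for illustration, and `knn_graph`, `shortest_paths` and `distance_profile` are hypothetical names.

```python
import heapq
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph; edge weights are Euclidean distances."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            adj[i][j] = d
            adj[j][i] = d
    return adj


def shortest_paths(adj, source):
    """Dijkstra's algorithm from `source`; unreachable nodes keep math.inf."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def distance_profile(tf_expr, target_expr, k=2):
    """Hypothetical feature vector for one TF-target pair (one entry per sample)."""
    points = list(zip(tf_expr, target_expr))
    adj = knn_graph(points, k)
    source = min(range(len(points)), key=lambda i: tf_expr[i])
    return sorted(shortest_paths(adj, source).values())
```

Because every TF-target pair is profiled over the same samples, the vectors have a fixed length and could be fed to a classifier such as an SVM. A disconnected k-NN graph yields `inf` entries, which would need capping before training.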

Supervised learning of gene regulatory networks (2020)

Razaghi-Moghadam, Zahra ; Nikoloski, Zoran

Identifying the entirety of gene regulatory interactions in a biological system offers the possibility to determine the key molecular factors that affect important traits on the level of cells, tissues, and whole organisms. Despite the development of experimental approaches and technologies for identification of direct binding of transcription factors (TFs) to promoter regions of downstream target genes, computational approaches that utilize large compendia of transcriptomics data are still the predominant methods used to predict direct downstream targets of TFs, and thus reconstruct genome-wide gene-regulatory networks (GRNs). These approaches can broadly be categorized into unsupervised and supervised, based on whether data about known, experimentally verified gene-regulatory interactions are used in the process of reconstructing the underlying GRN. Here, we first describe the generic steps of supervised approaches for GRN reconstruction, since they have recently been shown to result in improved accuracy of the resulting networks. We also illustrate how they can be used with data from model organisms to obtain more accurate prediction of gene regulatory interactions.
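The generic supervised steps (label TF-gene pairs using known interactions, train a classifier on their features, then rank the remaining pairs) can be sketched as follows. As an assumption, the classifier is a linear SVM trained with the Pegasos stochastic sub-gradient method so that only the standard library is needed; the feature vectors, gene names and helper functions are hypothetical and not taken from the chapter.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}. A constant 1.0 feature
    is appended so a bias is learned (regularised too, for simplicity).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink weights (gradient of the regulariser) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and step along the hinge-loss sub-gradient if the margin is violated.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def score(w, x):
    """Signed decision value of the linear SVM for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def rank_candidates(features, positives, negatives, **svm_args):
    """1) label known interactions (+1) and non-interactions (-1),
    2) train a classifier on their features,
    3) rank all unlabelled pairs by decision value, most confident first."""
    labelled = [(p, 1) for p in positives] + [(p, -1) for p in negatives]
    w = train_linear_svm([features[p] for p, _ in labelled],
                         [lab for _, lab in labelled], **svm_args)
    unlabelled = [p for p in features if p not in positives and p not in negatives]
    return sorted(unlabelled, key=lambda p: score(w, features[p]), reverse=True)
```

In practice, negatives are often sampled from pairs without evidence of interaction, because experimentally confirmed non-interactions are rare.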

Machine learning approaches for crop improvement (2020)

Tong, Hao ; Nikoloski, Zoran

Highly efficient and accurate selection of elite genotypes can lead to dramatic shortening of the breeding cycle in major crops relevant for sustaining present demands for food, feed, and fuel. In contrast to classical approaches that emphasize the need for resource-intensive phenotyping at all stages of artificial selection, genomic selection dramatically reduces the need for phenotyping. Genomic selection relies on advances in machine learning and the availability of genotyping data to predict agronomically relevant phenotypic traits. Here we provide a systematic review of machine learning approaches applied to genomic selection of single and multiple traits in major crops in the past decade. We emphasize the need to gather data on intermediate phenotypes, e.g. metabolite, protein, and gene expression levels, along with developments of modeling techniques that can lead to further improvements of genomic selection. In addition, we provide a critical view of factors that affect genomic selection, with attention to transferability of models between different environments. Finally, we highlight the future prospects of integrating high-throughput molecular phenotypic data from omics technologies with biological networks for crop improvement.
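As a concrete instance of predicting a trait from genotyping data, the sketch below fits a ridge-regression marker model (the idea behind rrBLUP, a common genomic selection baseline) and scores accuracy as the Pearson correlation between predicted and observed values. It is an illustrative assumption, not a method from the review; the function names and the allele-count coding are hypothetical.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam=1.0):
    """Fit y ~ mu + sum_j (x_j - mean_j) * beta_j with L2 penalty lam on beta.

    X: genotypes x markers (e.g. allele counts 0/1/2); y: observed trait values.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    mu = sum(y) / n
    # Normal equations: (Xc^T Xc + lam * I) beta = Xc^T (y - mu)
    A = [[sum(Xc[r][i] * Xc[r][j] for r in range(n)) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(Xc[r][i] * (y[r] - mu) for r in range(n)) for i in range(p)]
    return mu, means, _solve(A, b)


def ridge_predict(model, X):
    """Predict trait values for new genotypes with a fitted model."""
    mu, means, beta = model
    return [mu + sum((row[j] - means[j]) * beta[j] for j in range(len(beta)))
            for row in X]


def pearson(a, b):
    """Prediction accuracy as the Pearson correlation of predicted vs observed."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

The penalty `lam` shrinks all marker effects towards zero, which keeps the model stable when markers far outnumber genotyped individuals, as is typical in breeding populations.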

Integrating molecular markers into metabolic models improves genomic selection for Arabidopsis growth (2020)

Tong, Hao ; Küken, Anika ; Nikoloski, Zoran

The current trends of crop yield improvements are not expected to meet the projected rise in demand. Genomic selection uses molecular markers and machine learning to identify superior genotypes with improved traits, such as growth. Plant growth directly depends on rates of metabolic reactions which transform nutrients into the building blocks of biomass. Here, we predict growth of Arabidopsis thaliana accessions by employing genomic prediction of reaction rates estimated from accession-specific metabolic models. We demonstrate that, compared with classical genomic selection on the available data sets for 67 accessions, our approach improves the prediction accuracy for growth within and across nitrogen environments by 32.6% and 51.4%, respectively, and from optimal nitrogen to low carbon environment by 50.4%. Therefore, integration of molecular markers into metabolic models offers an approach to predict traits directly related to metabolism, and its usefulness in breeding can be examined by gathering matching datasets in crops. An increase in genomic selection (GS) accuracy can accelerate genetic gain by shortening the breeding cycles. Here, the authors introduce a network-based GS method that uses metabolic models and improves the prediction accuracy of Arabidopsis growth within and across environments.

GeneReg (2020)

Razaghi-Moghadam, Zahra ; Nikoloski, Zoran

Motivation: Large-scale metabolic models are widely used to design metabolic engineering strategies for diverse biotechnological applications. However, existing computational approaches focus on altering reaction fluxes and often neglect the manipulations of gene expression needed to implement these strategies. Results: Here, we find that the association of genes with multiple reactions renders engineering strategies designed at the flux level infeasible, since they require contradictory manipulations of gene expression. Moreover, we find that none of the existing approaches for designing gene knockout strategies ensures that the resulting design does not also require other gene alterations, such as up- or downregulation, to match the desired flux distribution. To address these issues, we propose a constraint-based approach, termed GeneReg, that facilitates the design of feasible metabolic engineering strategies at the gene level and is readily applicable to large-scale metabolic networks. We show that GeneReg can identify feasible strategies to overproduce ethanol in Escherichia coli and lactate in Saccharomyces cerevisiae, whereas overproduction of TCA cycle intermediates is not feasible in five organisms used as cell factories under default growth conditions. Therefore, GeneReg points to the need to couple gene regulation and metabolism when designing rational metabolic engineering strategies.
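The infeasibility caused by genes associated with several reactions can be illustrated with a simple logical check. This is only a sketch under stated assumptions, not GeneReg itself (which is a constraint-based optimization): gene-reaction associations are treated as plain sets without AND/OR logic, reaction changes are just "up" or "down", and the gene and reaction IDs are hypothetical.

```python
def gene_level_strategy(gene_to_reactions, desired_change):
    """Translate reaction-level changes ("up"/"down") into gene-level ones.

    A gene can be manipulated in only one direction. If its associated
    reactions demand both "up" and "down", the flux-level strategy cannot be
    implemented through that gene, and the gene is reported as a conflict.
    Reactions absent from `desired_change` are treated as unconstrained here.
    """
    strategy, conflicts = {}, []
    for gene, reactions in sorted(gene_to_reactions.items()):
        wanted = {desired_change[r] for r in reactions if r in desired_change}
        if len(wanted) > 1:
            conflicts.append(gene)  # contradictory expression changes required
        elif wanted:
            strategy[gene] = wanted.pop()
    return strategy, conflicts
```

For example, if a hypothetical gene `g1` catalyses both `R1` (to be upregulated) and `R2` (to be downregulated), no single change in `g1` expression satisfies the design, even though the flux-level strategy looks valid.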

CytoSeg 2.0 (2020)

Nowak, Jacqueline ; Gennermann, Kristin ; Persson, Staffan ; Nikoloski, Zoran

Motivation: Actin filaments (AFs) are dynamic structures that substantially change their organization over time. Their dynamic behavior and the relatively low signal-to-noise ratio of live-cell imaging make quantifying actin organization a difficult task. Results: We developed an automated image-based framework that extracts AFs from fluorescence microscopy images and represents them as networks, which are automatically analyzed to identify and compare biologically relevant features. In addition to the freely available source code, we have now implemented the framework as a graphical user interface that can be installed as a Fiji plugin, making it easily accessible to the research community.
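One simple way to turn extracted filaments into a network is to treat each skeleton pixel as a node and connect 8-neighbouring pixels. The sketch below does this and computes a few network features; it is a generic illustration under that assumption, not the CytoSeg pipeline, and the text-grid input format and function names are hypothetical.

```python
def skeleton_to_graph(rows):
    """Pixels marked '#' become nodes; 8-connected neighbours become edges."""
    nodes = {(r, c) for r, line in enumerate(rows)
             for c, ch in enumerate(line) if ch == "#"}
    edges = set()
    for r, c in nodes:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nb = (r + dr, c + dc)
                if nb != (r, c) and nb in nodes:
                    edges.add(frozenset(((r, c), nb)))
    return nodes, edges


def network_features(nodes, edges):
    """Node/edge counts, mean degree and number of connected components."""
    adj = {n: set() for n in nodes}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1  # new filament fragment found; flood it with DFS
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    mean_degree = sum(len(adj[n]) for n in nodes) / len(nodes) if nodes else 0.0
    return {"nodes": len(nodes), "edges": len(edges),
            "mean_degree": mean_degree, "components": components}
```

Pixel-level graphs are large and contain small triangles at diagonal corners, so practical tools usually simplify them, for example by keeping only junctions and endpoints as nodes.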

Comparative analysis of ROS network genes in extremophile Eukaryotes (2020)

Lyall, Rafe ; Nikoloski, Zoran ; Gechev, Tsanko

The reactive oxygen species (ROS) gene network, consisting of both ROS-generating and detoxifying enzymes, adjusts ROS levels in response to various stimuli. We performed a cross-kingdom comparison of ROS gene networks to investigate how they have evolved across all Eukaryotes, including protists, fungi, plants and animals. We included the genomes of 16 extremotolerant Eukaryotes to gain insight into ROS gene evolution in organisms that experience extreme stress conditions. Our analysis focused on ROS genes found in all Eukaryotes (such as catalases, superoxide dismutases, glutathione reductases, peroxidases and glutathione peroxidase/peroxiredoxins) as well as those specific to certain groups, such as ascorbate peroxidases, dehydroascorbate/monodehydroascorbate reductases in plants and other photosynthetic organisms. ROS-producing NADPH oxidases (NOX) were found in most multicellular organisms, although several NOX-like genes were identified in unicellular or filamentous species. However, despite the extreme conditions experienced by extremophile species, we found no evidence for expansion of ROS-related gene families in these species compared to other Eukaryotes. Tardigrades and rotifers do show ROS gene expansions that could be related to their extreme lifestyles, although a high rate of lineage-specific horizontal gene transfer events, coupled with recent tetraploidy in rotifers, could explain this observation. This suggests that the basal Eukaryotic ROS scavenging systems are sufficient to maintain ROS homeostasis even under the most extreme conditions.

Characterization of maximal enzyme catalytic rates in central metabolism of Arabidopsis thaliana (2020)

Küken, Anika ; Gennermann, Kristin ; Nikoloski, Zoran

Plant-specific enzyme kinetic data are scarce, which limits the predictive power of metabolic models and precludes identification of genetic factors underlying enzyme properties. Enzyme kinetic data are measured in vitro, often under non-physiological conditions, so conclusions elicited from modeling warrant caution. Here we estimate maximal in vivo catalytic rates for 168 plant enzymes, including photosystems I and II, the cytochrome-b6f complex, ATP-citrate synthase, and sucrose-phosphate synthase, as well as enzymes from amino acid synthesis, with previously undocumented enzyme kinetic data in BRENDA. The estimates are obtained by integrating condition-specific quantitative proteomics data, maximal rates of selected enzymes, growth measurements from Arabidopsis thaliana rosettes, and fluxes through canonical pathways in a constraint-based model of leaf metabolism. Compared with findings in Escherichia coli, we demonstrate weaker concordance between plant-specific in vitro and in vivo enzyme catalytic rates, owing to a low degree of enzyme saturation. This is supported by the finding that concentrations of nicotinamide adenine dinucleotide (phosphate), adenosine triphosphate and uridine triphosphate calculated based on our maximal in vivo catalytic rates, as well as available quantitative metabolomics data, are below reported K_M values and therefore indicate undersaturation of the respective enzymes. Our findings show that genome-wide profiling of enzyme kinetic properties is feasible in plants, paving the way for understanding resource allocation.
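A common way to estimate a maximal in vivo catalytic rate from paired flux and proteomics data is to take the largest flux-to-enzyme ratio across conditions. The abstract does not spell out the exact procedure, so the sketch below assumes that ratio-based estimate and an irreversible single-substrate Michaelis-Menten rate law; the function names are hypothetical.

```python
def max_in_vivo_rate(fluxes, abundances):
    """Largest apparent catalytic rate v_c / E_c over conditions c with E_c > 0."""
    rates = [v / e for v, e in zip(fluxes, abundances) if e > 0]
    if not rates:
        raise ValueError("no condition with detectable enzyme")
    return max(rates)


def saturation(substrate, km):
    """Michaelis-Menten saturation S / (K_M + S); equals 0.5 at S = K_M."""
    return substrate / (km + substrate)


def implied_substrate(flux, enzyme, kmax, km):
    """Substrate level implied by v = kmax * E * S / (K_M + S), solved for S."""
    f = flux / (kmax * enzyme)  # fraction of the maximal rate actually used
    if not 0 <= f < 1:
        raise ValueError("flux must be below kmax * enzyme")
    return km * f / (1 - f)
```

Under this rate law, an implied substrate level below K_M corresponds to a saturation below 0.5, which is the sense in which the enzymes above are described as undersaturated.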
