Refine
Year of publication
Document Type
- Article (91)
- Postprint (8)
- Review (6)
- Other (2)
- Part of Periodical (1)
Language
- English (108)
Is part of the Bibliography
- yes (108)
Keywords
- Arabidopsis thaliana (9)
- Network clustering (5)
- Metabolic networks (3)
- Protein complexes (3)
- Species comparison (3)
- respiration (3)
- Ascophyllum nodosum (2)
- Coherent partition (2)
- Graph partitions (2)
- GxE interaction (2)
The integration of experimental data into genome-scale metabolic models can greatly improve flux predictions. This is achieved by restricting predictions to a more realistic context-specific domain, like a particular cell or tissue type. Several computational approaches to integrate data have been proposed D generally obtaining context-specific (sub) models or flux distributions. However, these approaches may lead to a multitude of equally valid but potentially different models or flux distributions, due to possible alternative optima in the underlying optimization problems. Although this issue introduces ambiguity in context-specific predictions, it has not been generally recognized, especially in the case of model reconstructions. In this study, we analyze the impact of alternative optima in four state-of-the-art context-specific data integration approaches, providing both flux distributions and/or metabolic models. To this end, we present three computational methods and apply them to two particular case studies: leaf-specific predictions from the integration of gene expression data in a metabolic model of Arabidopsis thaliana, and liver-specific reconstructions derived from a human model with various experimental data sources. The application of these methods allows us to obtain the following results: (i) we sample the space of alternative flux distributions in the leaf-and the liver-specific case and quantify the ambiguity of the predictions. In addition, we show how the inclusion of l(1)-regularization during data integration reduces the ambiguity in both cases. (ii) We generate sets of alternative leaf-and liver-specific models that are optimal to each one of the evaluated model reconstruction approaches. We demonstrate that alternative models of the same context contain a marked fraction of disparate reactions. Further, we show that a careful balance between model sparsity and metabolic functionality helps in reducing the discrepancies between alternative models. Finally, our findings indicate that alternative optima must be taken into account for rendering the context-specific metabolic model predictions less ambiguous.
Whole-genome duplications (WGDs) or polyploidy events have been studied extensively in plants. In a now widely cited paper, Jiao et al. presented evidence for two ancient, ancestral plant WGDs predating the origin of flowering and seed plants, respectively. This finding was based primarily on a bimodal age distribution of gene duplication events obtained from molecular dating of almost 800 phylogenetic gene trees. We reanalyzed the phylogenomic data of Jiao et al. and found that the strong bimodality of the age distribution may be the result of technical and methodological issues and may hence not be a "true" signal of two WGD events. By using a state-of-the-art molecular dating algorithm, we demonstrate that the reported bimodal age distribution is not robust and should be interpreted with caution. Thus, there exists little evidence for two ancient WGDs in plants from phylogenomic dating.
The actin cytoskeleton is an essential intracellular filamentous structure that underpins cellular transport and cytoplasmic streaming in plant cells. However, the system-level properties of actin-based cellular trafficking remain tenuous, largely due to the inability to quantify key features of the actin cytoskeleton. Here, we developed an automated image-based, network-driven framework to accurately segment and quantify actin cytoskeletal structures and Golgi transport. We show that the actin cytoskeleton in both growing and elongated hypocotyl cells has structural properties facilitating efficient transport. Our findings suggest that the erratic movement of Golgi is a stable cellular phenomenon that might optimize distribution efficiency of cell material. Moreover, we demonstrate that Golgi transport in hypocotyl cells can be accurately predicted from the actin network topology alone. Thus, our framework provides quantitative evidence for system-wide coordination of cellular transport in plant cells and can be readily applied to investigate cytoskeletal organization and transport in other organisms.
Background: Biological systems adapt to changing environments by reorganizing their cellula r and physiological program with metabolites representing one important response level. Different stresses lead to both conserved and specific responses on the metabolite level which should be reflected in the underl ying metabolic network. Methodology/Principal Findings: Starting from experimental data obtained by a GC-MS based high-throughput metabolic profiling technology we here develop an approach that: (1) extracts network representations from metabolic conditiondependent data by using pairwise correlations, (2) determines the sets of stable and condition-dependent correlations based on a combination of statistical significance and homogeneity tests, and (3) can identify metabolites related to the stress response, which goes beyond simple ob servation s about the changes of metabolic concentrations. The approach was tested with Escherichia colias a model organism observed under four different environmental stress conditions (cold stress, heat stress, oxidative stress, lactose diau xie) and control unperturbed conditions. By constructing the stable network component, which displays a scale free topology and small-world characteristics, we demonstrated that: (1) metabolite hubs in this reconstructed correlation networks are significantly enriched for those contained in biochemical networks such as EcoCyc, (2) particular components of the stable network are enriched for functionally related biochemical path ways, and (3) ind ependently of the response scale, based on their importance in the reorganization of the cor relation network a set of metabolites can be identified which represent hypothetical candidates for adjusting to a stress-specific response. Conclusions/Significance: Network-based tools allowed the identification of stress-dependent and general metabolic correlation networks. This correlation-network-ba sed approach does not rely on major changes in concentration to identify metabolites important for st ress adaptation, but rather on the changes in network properties with respect to metabolites. This should represent a useful complementary technique in addition to more classical approaches.
The reactive oxygen species (ROS) gene network, consisting of both ROS-generating and detoxifying enzymes, adjusts ROS levels in response to various stimuli. We performed a cross-kingdom comparison of ROS gene networks to investigate how they have evolved across all Eukaryotes, including protists, fungi, plants and animals. We included the genomes of 16 extremotolerant Eukaryotes to gain insight into ROS gene evolution in organisms that experience extreme stress conditions. Our analysis focused on ROS genes found in all Eukaryotes (such as catalases, superoxide dismutases, glutathione reductases, peroxidases and glutathione peroxidase/peroxiredoxins) as well as those specific to certain groups, such as ascorbate peroxidases, dehydroascorbate/monodehydroascorbate reductases in plants and other photosynthetic organisms. ROS-producing NADPH oxidases (NOX) were found in most multicellular organisms, although several NOX-like genes were identified in unicellular or filamentous species. However, despite the extreme conditions experienced by extremophile species, we found no evidence for expansion of ROS-related gene families in these species compared to other Eukaryotes. Tardigrades and rotifers do show ROS gene expansions that could be related to their extreme lifestyles, although a high rate of lineage-specific horizontal gene transfer events, coupled with recent tetraploidy in rotifers, could explain this observation. This suggests that the basal Eukaryotic ROS scavenging systems are sufficient to maintain ROS homeostasis even under the most extreme conditions.
Ribosome biogenesis is tightly associated to plant metabolism due to the usage of ribosomes in the synthesis of proteins necessary to drive metabolic pathways. Given the central role of ribosome biogenesis in cell physiology, it is important to characterize the impact of different components involved in this process on plant metabolism. Double mutants of the Arabidopsis thaliana cytosolic 60S maturation factors REIL1 and REIL2 do not resume growth after shift to moderate 10 degrees C chilling conditions. To gain mechanistic insights into the metabolic effects of this ribosome biogenesis defect on metabolism, we developed TC-iReMet2, a constraint-based modelling approach that integrates relative metabolomics and transcriptomics time-course data to predict differential fluxes on a genome-scale level. We employed TC-iReMet2 with metabolomics and transcriptomics data from the Arabidopsis Columbia 0 wild type and the reil1-1 reil2-1 double mutant before and after cold shift. We identified reactions and pathways that are highly altered in a mutant relative to the wild type. These pathways include the Calvin-Benson cycle, photorespiration, gluconeogenesis, and glycolysis. Our findings also indicated differential NAD(P)/NAD(P)H ratios after cold shift. TC-iReMet2 allows for mechanistic hypothesis generation and interpretation of system biology experiments related to metabolic fluxes on a genome-scale level.
Reaction lumping in metabolic networks for application with thermodynamic metabolic flux analysis
(2021)
Thermodynamic metabolic flux analysis (TMFA) can narrow down the space of steady-state flux distributions, but requires knowledge of the standard Gibbs free energy for the modelled reactions. The latter are often not available due to unknown Gibbs free energy change of formation ,Delta fG0, of metabolites. To optimize the usage of data on thermodynamics in constraining a model, reaction lumping has been proposed to eliminate metabolites with unknown Delta fG0. However, the lumping procedure has not been formalized nor implemented for systematic identification of lumped reactions. Here, we propose, implement, and test a combined procedure for reaction lumping, applicable to genome-scale metabolic models. It is based on identification of groups of metabolites with unknown Delta fG0 whose elimination can be conducted independently of the others via: (1) group implementation, aiming to eliminate an entire such group, and, if this is infeasible, (2) a sequential implementation to ensure that a maximal number of metabolites with unknown Delta fG0 are eliminated. Our comparative analysis with genome-scale metabolic models of Escherichia coli, Bacillus subtilis, and Homo sapiens shows that the combined procedure provides an efficient means for systematic identification of lumped reactions. We also demonstrate that TMFA applied to models with reactions lumped according to the proposed procedure lead to more precise predictions in comparison to the original models. The provided implementation thus ensures the reproducibility of the findings and their application with standard TMFA.
L-2,L-1-norm regularized multivariate regression model with applications to genomic prediction
(2021)
Motivation:
Genomic selection (GS) is currently deemed the most effective approach to speed up breeding of agricultural varieties. It has been recognized that consideration of multiple traits in GS can improve accuracy of prediction for traits of low heritability. However, since GS forgoes statistical testing with the idea of improving predictions, it does not facilitate mechanistic understanding of the contribution of particular single nucleotide polymorphisms (SNP).
Results:
Here, we propose a L-2,L-1-norm regularized multivariate regression model and devise a fast and efficient iterative optimization algorithm, called L-2,L-1-joint, applicable in multi-trait GS. The usage of the L-2,L-1-norm facilitates variable selection in a penalized multivariate regression that considers the relation between individuals, when the number of SNPs is much larger than the number of individuals. The capacity for variable selection allows us to define master regulators that can be used in a multi-trait GS setting to dissect the genetic architecture of the analyzed traits. Our comparative analyses demonstrate that the proposed model is a favorable candidate compared to existing state-of-the-art approaches. Prediction and variable selection with datasets from Brassica napus, wheat and Arabidopsis thaliana diversity panels are conducted to further showcase the performance of the proposed model.
Large-scale biochemical models are of increasing sizes due to the consideration of interacting organisms and tissues. Model reduction approaches that preserve the flux phenotypes can simplify the analysis and predictions of steady-state metabolic phenotypes. However, existing approaches either restrict functionality of reduced models or do not lead to significant decreases in the number of modelled metabolites. Here, we introduce an approach for model reduction based on the structural property of balancing of complexes that preserves the steady-state fluxes supported by the network and can be efficiently determined at genome scale. Using two large-scale mass-action kinetic models of Escherichia coli, we show that our approach results in a substantial reduction of 99% of metabolites. Applications to genome-scale metabolic models across kingdoms of life result in up to 55% and 85% reduction in the number of metabolites when arbitrary and mass-action kinetics is assumed, respectively. We also show that predictions of the specific growth rate from the reduced models match those based on the original models. Since steady-state flux phenotypes from the original model are preserved in the reduced, the approach paves the way for analysing other metabolic phenotypes in large-scale biochemical networks.
CytoSeg 2.0
(2020)
Motivation:
Actin filaments (AFs) are dynamic structures that substantially change their organization over time. The dynamic behavior and the relatively low signal-to-noise ratio during live-cell imaging have rendered the quantification of the actin organization a difficult task.
Results:
We developed an automated image-based framework that extracts AFs from fluorescence microscopy images and represents them as networks, which are automatically analyzed to identify and compare biologically relevant features. Although the source code is freely available, we have now implemented the framework into a graphical user interface that can be installed as a Fiji plugin, thus enabling easy access by the research community.
COMMIT
(2022)
Composition and functions of microbial communities affect important traits in diverse hosts, from crops to humans. Yet, mechanistic understanding of how metabolism of individual microbes is affected by the community composition and metabolite leakage is lacking. Here, we first show that the consensus of automatically generated metabolic reconstructions improves the quality of the draft reconstructions, measured by comparison to reference models. We then devise an approach for gap filling, termed COMMIT, that considers metabolites for secretion based on their permeability and the composition of the community. By applying COMMIT with two soil communities from the Arabidopsis thaliana culture collection, we could significantly reduce the gap-filling solution in comparison to filling gaps in individual reconstructions without affecting the genomic support. Inspection of the metabolic interactions in the soil communities allows us to identify microbes with community roles of helpers and beneficiaries. Therefore, COMMIT offers a versatile fully automated solution for large-scale modelling of microbial communities for diverse biotechnological applications. <br /> Author summaryMicrobial communities are important in ecology, human health, and crop productivity. However, detailed information on the interactions within natural microbial communities is hampered by the community size, lack of detailed information on the biochemistry of single organisms, and the complexity of interactions between community members. Metabolic models are comprised of biochemical reaction networks based on the genome annotation, and can provide mechanistic insights into community functions. Previous analyses of microbial community models have been performed with high-quality reference models or models generated using a single reconstruction pipeline. However, these models do not contain information on the composition of the community that determines the metabolites exchanged between the community members. In addition, the quality of metabolic models is affected by the reconstruction approach used, with direct consequences on the inferred interactions between community members. Here, we use fully automated consensus reconstructions from four approaches to arrive at functional models with improved genomic support while considering the community composition. We applied our pipeline to two soil communities from the Arabidopsis thaliana culture collection, providing only genome sequences. Finally, we show that the obtained models have 90% genomic support and demonstrate that the derived interactions are corroborated by independent computational predictions.
Characterization of maximal enzyme catalytic rates in central metabolism of Arabidopsis thaliana
(2020)
Availability of plant-specific enzyme kinetic data is scarce, limiting the predictive power of metabolic models and precluding identification of genetic factors of enzyme properties. Enzyme kinetic data are measuredin vitro, often under non-physiological conditions, and conclusions elicited from modeling warrant caution. Here we estimate maximalin vivocatalytic rates for 168 plant enzymes, including photosystems I and II, cytochrome-b6f complex, ATP-citrate synthase, sucrose-phosphate synthase as well as enzymes from amino acid synthesis with previously undocumented enzyme kinetic data in BRENDA. The estimations are obtained by integrating condition-specific quantitative proteomics data, maximal rates of selected enzymes, growth measurements fromArabidopsis thalianarosette with and fluxes through canonical pathways in a constraint-based model of leaf metabolism. In comparison to findings inEscherichia coli, we demonstrate weaker concordance between the plant-specificin vitroandin vivoenzyme catalytic rates due to a low degree of enzyme saturation. This is supported by the finding that concentrations of nicotinamide adenine dinucleotide (phosphate), adenosine triphosphate and uridine triphosphate, calculated based on our maximalin vivocatalytic rates, and available quantitative metabolomics data are below reportedKMvalues and, therefore, indicate undersaturation of respective enzymes. Our findings show that genome-wide profiling of enzyme kinetic properties is feasible in plants, paving the way for understanding resource allocation.
Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises support vector machine to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. By employing the data fromEscherichia coliandSaccharomyces cerevisiaeas well as synthetic networks from the DREAM4 and five network inference challenges, we demonstrate that our GRADIS approach outperforms the state-of-the-art supervised and unsupervided approaches. This holds when predictions about target genes for individual transcription factors as well as for the entire network are considered. We employ experimentally verified GRNs fromE. coliandS. cerevisiaeto validate the predictions and obtain further insights in the performance of the proposed approach. Our GRADIS approach offers the possibility for usage of other network-based representations of large-scale data, and can be readily extended to help the characterisation of other cellular networks, including protein-protein and protein-metabolite interactions.
The current trends of crop yield improvements are not expected to meet the projected rise in demand. Genomic selection uses molecular markers and machine learning to identify superior genotypes with improved traits, such as growth. Plant growth directly depends on rates of metabolic reactions which transform nutrients into the building blocks of biomass. Here, we predict growth of Arabidopsis thaliana accessions by employing genomic prediction of reaction rates estimated from accession-specific metabolic models. We demonstrate that, comparing to classical genomic selection on the available data sets for 67 accessions, our approach improves the prediction accuracy for growth within and across nitrogen environments by 32.6% and 51.4%, respectively, and from optimal nitrogen to low carbon environment by 50.4%. Therefore, integration of molecular markers into metabolic models offers an approach to predict traits directly related to metabolism, and its usefulness in breeding can be examined by gathering matching datasets in crops. An increase in genomic selection (GS) accuracy can accelerate genetic gain by shortening the breeding cycles. Here, the authors introduce a network-based GS method that uses metabolic models and improves the prediction accuracy of Arabidopsis growth within and across environments.
GeneReg
(2020)
Motivation
Large-scale metabolic models are widely used to design metabolic engineering strategies for diverse biotechnological applications. However, the existing computational approaches focus on alteration of reaction fluxes and often neglect the manipulations of gene expression to implement these strategies.
Results
Here, we find that the association of genes with multiple reactions leads to infeasibility of engineering strategies at the flux level, since they require contradicting manipulations of gene expression. Moreover, we identify that all of the existing approaches to design gene knockout strategies do not ensure that the resulting design may also require other gene alterations, such as up- or downregulations, to match the desired flux distribution. To address these issues, we propose a constraint-based approach, termed GeneReg, that facilitates the design of feasible metabolic engineering strategies at the gene level and that is readily applicable to large-scale metabolic networks. We show that GeneReg can identify feasible strategies to overproduce ethanol in Escherichia coli and lactate in Saccharomyces cerevisiae, but overproduction of the TCA cycle intermediates is not feasible in five organisms used as cell factories under default growth conditions. Therefore, GeneReg points at the need to couple gene regulation and metabolism to design rational metabolic engineering strategies.
Highly efficient and accurate selection of elite genotypes can lead to dramatic shortening of the breeding cycle in major crops relevant for sustaining present demands for food, feed, and fuel. In contrast to classical approaches that emphasize the need for resource-intensive phenotyping at all stages of artificial selection, genomic selection dramatically reduces the need for phenotyping. Genomic selection relies on advances in machine learning and the availability of genotyping data to predict agronomically relevant phenotypic traits. Here we provide a systematic review of machine learning approaches applied for genomic selection of single and multiple traits in major crops in the past decade. We emphasize the need to gather data on intermediate phenotypes, e.g. metabolite, protein, and gene expression levels, along with developments of modeling techniques that can lead to further improvements of genomic selection. In addition, we provide a critical view of factors that affect genomic selection, with attention to transferability of models between different environments. Finally, we highlight the future aspects of integrating high-throughput molecular phenotypic data from omics technologies with biological networks for crop improvement.
Understanding the complexity of metabolic networks has implications for manipulation of their functions. The complexity of metabolic networks can be characterized by identifying multireaction dependencies that are challenging to determine due to the sheer number of combinations to consider. Here, we propose the concept of concordant complexes that captures multireaction dependencies and can be efficiently determined from the algebraic structure and operational constraints of metabolic networks. The concordant complexes imply the existence of concordance modules based on which the apparent complexity of 12 metabolic networks of organisms from all kingdoms of life can be reduced by at least 78%. A comparative analysis against an ensemble of randomized metabolic networks shows that the metabolic network of Escherichia coli contains fewer concordance modules and is, therefore, more tightly coordinated than expected by chance. Together, our findings demonstrate that metabolic networks are considerably simpler than what can be perceived from their structure alone.
High-throughput proteomics approaches have resulted in large-scale protein–protein interaction (PPI) networks that have been employed for the prediction of protein complexes. However, PPI networks contain false-positive as well as false-negative PPIs that affect the protein complex prediction algorithms. To address this issue, here we propose an algorithm called CUBCO+ that: (1) employs GO semantic similarity to retain only biologically relevant interactions with a high similarity score, (2) based on link prediction approaches, scores the false-negative edges, and (3) incorporates the resulting scores to predict protein complexes. Through comprehensive analyses with PPIs from Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens, we show that CUBCO+ performs as well as the approaches that predict protein complexes based on recently introduced graph partitions into biclique spanned subgraphs and outperforms the other state-of-the-art approaches. Moreover, we illustrate that in combination with GO semantic similarity, CUBCO+ enables us to predict more accurate protein complexes in 36% of the cases in comparison to CUBCO as its predecessor.
As autotrophic organisms, plants capture light energy to convert carbon dioxide into ATP, nicotinamide adenine dinucleotide phosphate (NADPH), and sugars, which are essential for the biosynthesis of building blocks, storage, and growth. At night, metabolism and growth can be sustained by mobilizing carbon (C) reserves. In response to changing environmental conditions, such as light-dark cycles, the small-molecule regulation of enzymatic activities is critical for reprogramming cellular metabolism. We have recently demonstrated that proteogenic dipeptides, protein degradation products, act as metabolic switches at the interface of proteostasis and central metabolism in both plants and yeast. Dipeptides accumulate in response to the environmental changes and act via direct binding and regulation of critical enzymatic activities, enabling C flux distribution. Here, we provide evidence pointing to the involvement of dipeptides in the metabolic rewiring characteristics for the day-night cycle in plants. Specifically, we measured the abundance of 13 amino acids and 179 dipeptides over short- (SD) and long-day (LD) diel cycles, each with different light intensities. Of the measured dipeptides, 38 and eight were characterized by day-night oscillation in SD and LD, respectively, reaching maximum accumulation at the end of the day and then gradually falling in the night. Not only the number of dipeptides, but also the amplitude of the oscillation was higher in SD compared with LD conditions. Notably, rhythmic dipeptides were enriched in the glucogenic amino acids that can be converted into glucose. Considering the known role of Target of Rapamycin (TOR) signaling in regulating both autophagy and metabolism, we subsequently investigated whether diurnal fluctuations of dipeptides levels are dependent on the TOR Complex (TORC). The Raptor1b mutant (raptor1b), known for the substantial reduction of TOR kinase activity, was characterized by the augmented accumulation of dipeptides, which is especially pronounced under LD conditions. We were particularly intrigued by the group of 16 dipeptides, which, based on their oscillation under SD conditions and accumulation in raptor1b, can be associated with limited C availability or photoperiod. By mining existing protein-metabolite interaction data, we delineated putative protein interactors for a representative dipeptide Pro-Gln. The obtained list included enzymes of C and amino acid metabolism, which are also linked to the TORC-mediated metabolic network. Based on the obtained results, we speculate that the diurnal accumulation of dipeptides contributes to its metabolic adaptation in response to changes in C availability. We hypothesize that dipeptides would act as alternative respiratory substrates and by directly modulating the activity of the focal enzymes.