Refine
Year of publication
Document Type
- Article (9)
- Postprint (8)
- Doctoral Thesis (1)
Language
- English (18)
Is part of the Bibliography
- yes (18)
Keywords
- biomarker (3)
- machine learning (3)
- untargeted metabolomics (3)
- Gene Ontology (2)
- IDP (2)
- LEA protein (2)
- LysoPC(20:0) (2)
- MS neurodegeneration (2)
- PPMS (2)
- Phosphorylation Site (2)
Institute
Similar Yet Different
(2020)
The importance of intrinsically disordered late embryogenesis abundant (LEA) proteins in the tolerance to abiotic stresses involving cellular dehydration is undisputed. While structural transitions of LEA proteins in response to changes in water availability are commonly observed and several molecular functions have been suggested, a systematic, comprehensive and comparative study of possible underlying sequence-structure-function relationships is still lacking. We performed molecular dynamics (MD) simulations as well as spectroscopic and light scattering experiments to characterize six members of two distinct, lowly homologous clades of LEA_4 family proteins from Arabidopsis thaliana. We compared structural and functional characteristics to elucidate to what degree structure and function are encoded in LEA protein sequences and complemented these findings with physicochemical properties identified in a systematic bioinformatics study of the entire Arabidopsis thaliana LEA_4 family. Our results demonstrate that although the six experimentally characterized LEA_4 proteins have similar structural and functional characteristics, differences concerning their folding propensity and membrane stabilization capacity during a freeze/thaw cycle are obvious. These differences cannot be easily attributed to sequence conservation, simple physicochemical characteristics or the abundance of sequence motifs. Moreover, the folding propensity does not appear to be correlated with membrane stabilization capacity. Therefore, the refinement of LEA_4 structural and functional properties is likely encoded in specific patterns of their physicochemical characteristics.
Potato (Solanum tuberosum L.) is one of the most important food crops worldwide. Current potato varieties are highly susceptible to drought stress. In view of global climate change, selection of cultivars with improved drought tolerance and high yield potential is of paramount importance. Drought tolerance breeding of potato is currently based on direct selection according to yield and phenotypic traits and requires multiple trials under drought conditions. Marker-assisted selection (MAS) is cheaper, faster and reduces classification errors caused by noncontrolled environmental effects. We analysed 31 potato cultivars grown under optimal and reduced water supply in six independent field trials. Drought tolerance was determined as tuber starch yield. Leaf samples from young plants were screened for preselected transcript and nontargeted metabolite abundance using qRT-PCR and GC-MS profiling, respectively. Transcript marker candidates were selected from a published RNA-Seq data set. A Random Forest machine learning approach extracted metabolite and transcript markers for drought tolerance prediction with low error rates of 6% and 9%, respectively. Moreover, by combining transcript and metabolite markers, the prediction error was reduced to 4.3%. Feature selection from Random Forest models allowed model minimization, yielding a minimal combination of only 20 metabolite and transcript markers that were successfully tested for their reproducibility in 16 independent agronomic field trials. We demonstrate that a minimum combination of transcript and metabolite markers sampled at early cultivation stages predicts potato yield stability under drought largely independent of seasonal and regional agronomic conditions.
Similar Yet Different
(2020)
The importance of intrinsically disordered late embryogenesis abundant (LEA) proteins in the tolerance to abiotic stresses involving cellular dehydration is undisputed. While structural transitions of LEA proteins in response to changes in water availability are commonly observed and several molecular functions have been suggested, a systematic, comprehensive and comparative study of possible underlying sequence-structure-function relationships is still lacking. We performed molecular dynamics (MD) simulations as well as spectroscopic and light scattering experiments to characterize six members of two distinct, lowly homologous clades of LEA_4 family proteins from Arabidopsis thaliana. We compared structural and functional characteristics to elucidate to what degree structure and function are encoded in LEA protein sequences and complemented these findings with physicochemical properties identified in a systematic bioinformatics study of the entire Arabidopsis thaliana LEA_4 family. Our results demonstrate that although the six experimentally characterized LEA_4 proteins have similar structural and functional characteristics, differences concerning their folding propensity and membrane stabilization capacity during a freeze/thaw cycle are obvious. These differences cannot be easily attributed to sequence conservation, simple physicochemical characteristics or the abundance of sequence motifs. Moreover, the folding propensity does not appear to be correlated with membrane stabilization capacity. Therefore, the refinement of LEA_4 structural and functional properties is likely encoded in specific patterns of their physicochemical characteristics.
Potato (Solanum tuberosum L.) is one of the most important food crops worldwide. Current potato varieties are highly susceptible to drought stress. In view of global climate change, selection of cultivars with improved drought tolerance and high yield potential is of paramount importance. Drought tolerance breeding of potato is currently based on direct selection according to yield and phenotypic traits and requires multiple trials under drought conditions. Marker‐assisted selection (MAS) is cheaper, faster and reduces classification errors caused by noncontrolled environmental effects. We analysed 31 potato cultivars grown under optimal and reduced water supply in six independent field trials. Drought tolerance was determined as tuber starch yield. Leaf samples from young plants were screened for preselected transcript and nontargeted metabolite abundance using qRT‐PCR and GC‐MS profiling, respectively. Transcript marker candidates were selected from a published RNA‐Seq data set. A Random Forest machine learning approach extracted metabolite and transcript markers for drought tolerance prediction with low error rates of 6% and 9%, respectively. Moreover, by combining transcript and metabolite markers, the prediction error was reduced to 4.3%. Feature selection from Random Forest models allowed model minimization, yielding a minimal combination of only 20 metabolite and transcript markers that were successfully tested for their reproducibility in 16 independent agronomic field trials. We demonstrate that a minimum combination of transcript and metabolite markers sampled at early cultivation stages predicts potato yield stability under drought largely independent of seasonal and regional agronomic conditions.
Climate models predict an increased likelihood of seasonal droughts for many areas of the world. Breeding for drought tolerance could be accelerated by marker-assisted selection. As a basis for marker identification, we studied the genetic variance, predictability of field performance and potential costs of tolerance in potato (Solanum tuberosum L.). Potato produces high calories per unit of water invested, but is drought-sensitive. In 14 independent pot or field trials, 34 potato cultivars were grown under optimal and reduced water supply to determine starch yield. In an artificial dataset, we tested several stress indices for their power to distinguish tolerant and sensitive genotypes independent of their yield potential. We identified the deviation of relative starch yield from the experimental median (DRYM) as the most efficient index. DRYM corresponded qualitatively to the partial least square model-based metric of drought stress tolerance in a stress effect model. The DRYM identified significant tolerance variation in the European potato cultivar population to allow tolerance breeding and marker identification. Tolerance results from pot trials correlated with those from field trials but predicted field performance worse than field growth parameters. Drought tolerance correlated negatively with yield under optimal conditions in the field. The distribution of yield data versus DRYM indicated that tolerance can be combined with average yield potentials, thus circumventing potential yield penalties in tolerance breeding.
Parkinson's disease (PD) shows high heterogeneity with regard to the underlying molecular pathogenesis involving multiple pathways and mechanisms. Diagnosis is still challenging and rests entirely on clinical features. Thus, there is an urgent need for robust diagnostic biofluid markers. Untargeted metabolomics allows establishing low-molecular compound biomarkers in a wide range of complex diseases by the measurement of various molecular classes in biofluids such as blood plasma, serum, and cerebrospinal fluid (CSF). Here, we applied untargeted high-resolution mass spectrometry to determine plasma and CSF metabolite profiles. We semiquantitatively determined small-molecule levels (<= 1.5 kDa) in the plasma and CSF from early PD patients (disease duration 0-4 years; n = 80 and 40, respectively), and sex-and age-matched controls (n = 76 and 38, respectively). We performed statistical analyses utilizing partial least square and random forest analysis with a 70/30 training and testing split approach, leading to the identification of 20 promising plasma and 14 CSF metabolites. The semetabolites differentiated the test set with an AUC of 0.8 (plasma) and 0.9 (CSF). Characteristics of the metabolites indicate perturbations in the glycerophospholipid, sphingolipid, and amino acid metabolism in PD, which underscores the high power of metabolomic approaches. Further studies will enable to develop a potential metabolite-based biomarker panel specific for PD
The study of non-coding RNA genes has received increased attention in recent years fuelled by accumulating evidence that larger portions of genomes than previously acknowledged are transcribed into RNA molecules of mostly unknown function, as well as the discovery of novel non-coding RNA types and functional RNA elements. Here, we demonstrate that specific properties of graphs that represent the predicted RNA secondary structure reflect functional information. We introduce a computational algorithm and an associated web-based tool (GraPPLE) for classifying non-coding RNA molecules as functional and, furthermore, into Rfam families based on their graph properties. Unlike sequence-similarity-based methods and covariance models, GraPPLE is demonstrated to be more robust with regard to increasing sequence divergence, and when combined with existing methods, leads to a significant improvement of prediction accuracy. Furthermore, graph properties identified as most informative are shown to provide an understanding as to what particular structural features render RNA molecules functional. Thus, GraPPLE may offer a valuable computational filtering tool to identify potentially interesting RNA molecules among large candidate datasets.
Background: Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. The experimental identification as well as computational prediction of phosphorylation sites (P-sites) has proved to be a challenging problem. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites. Results: We characterized the spatial context of phosphorylation sites and assessed its usability for improved phosphorylation site predictions. We identified 750 non-redundant, experimentally verified sites with three-dimensional (3D) structural information available in the protein data bank (PDB) and grouped them according to their respective kinase family. We studied the spatial distribution of amino acids around phosphorserines, phosphothreonines, and phosphotyrosines to extract signature 3D-profiles. Characteristic spatial distributions of amino acid residue types around phosphorylation sites were indeed discernable, especially when kinase-family-specific target sites were analyzed. To test the added value of using spatial information for the computational prediction of phosphorylation sites, Support Vector Machines were applied using both sequence as well as structural information. When compared to sequence-only based prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information. Conclusion: While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites. A web-based service (Phos3D) implementing the developed structurebased P-site prediction method has been made available at http://phos3d.mpimp-golm.mpg.de.
ChlamyCyc : an integrative systems biology database and web-portal for Chlamydomonas reinhardtii
(2009)
Background: The unicellular green alga Chlamydomonas reinhardtii is an important eukaryotic model organism for the study of photosynthesis and plant growth. In the era of modern highthroughput technologies there is an imperative need to integrate large-scale data sets from highthroughput experimental techniques using computational methods and database resources to provide comprehensive information about the molecular and cellular organization of a single organism. Results: In the framework of the German Systems Biology initiative GoFORSYS, a pathway database and web-portal for Chlamydomonas (ChlamyCyc) was established, which currently features about 250 metabolic pathways with associated genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification. Conclusion: ChlamyCyc provides a curated and integrated systems biology repository that will enable and assist in systematic studies of fundamental cellular processes in Chlamydomonas. The ChlamyCyc database and web-portal is freely available under http://chlamycyc.mpimp-golm.mpg.de.
Primary progressive multiple sclerosis (PPMS) shows a highly variable disease progression with poor prognosis and a characteristic accumulation of disabilities in patients. These hallmarks of PPMS make it difficult to diagnose and currently impossible to efficiently treat. This study aimed to identify plasma metabolite profiles that allow diagnosis of PPMS and its differentiation from the relapsing remitting subtype (RRMS), primary neurodegenerative disease (Parkinson’s disease, PD), and healthy controls (HCs) and that significantly change during the disease course and could serve as surrogate markers of multiple sclerosis (MS)-associated neurodegeneration over time. We applied untargeted high-resolution metabolomics to plasma samples to identify PPMS-specific signatures, validated our findings in independent sex- and age-matched PPMS and HC cohorts and built discriminatory models by partial least square discriminant analysis (PLS-DA). This signature was compared to sex- and age-matched RRMS patients, to patients with PD and HC. Finally, we investigated these metabolites in a longitudinal cohort of PPMS patients over a 24-month period. PLS-DA yielded predictive models for classification along with a set of 20 PPMS-specific informative metabolite markers. These metabolites suggest disease-specific alterations in glycerophospholipid and linoleic acid pathways. Notably, the glycerophospholipid LysoPC(20:0) significantly decreased during the observation period. These findings show potential for diagnosis and disease course monitoring, and might serve as biomarkers to assess treatment efficacy in future clinical trials for neuroprotective MS therapies.