TY  - JOUR
A1  - Steuer, Ralf
A1  - Humburg, Peter
A1  - Selbig, Joachim
T1  - Validation and functional annotation of expression-based clusters based on gene ontology
JF  - BMC bioinformatics
N2  - Background: The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. In particular, placing the results in the context of other available functional genomics data, such as existing bio-ontologies, has already provided substantial improvement for detecting and categorizing genes of interest. One common approach is to look for functional annotations that are significantly enriched within a group or cluster of genes, as compared to a reference group. Results: In this work, we suggest the information-theoretic concept of mutual information to investigate the relationship between groups of genes, as given by data-driven clustering, and their respective functional categories. Drawing upon related approaches (Gibbons and Roth, Genome Research 12: 1574-1581, 2002), we seek to quantify to what extent individual attributes are sufficient to characterize a given group or cluster of genes. Conclusion: We show that the mutual information provides a systematic framework to assess the relationship between groups or clusters of genes and their functional annotations in a quantitative way. Within this framework, the mutual information allows us to address and incorporate several important issues, such as the interdependence of functional annotations and combinatorial combinations of attributes. It thus supplements and extends the conventional search for overrepresented attributes within a group or cluster of genes. In particular taking combinations of attributes into account, the mutual information opens the way to uncover specific functional descriptions of a group of genes or clustering result. All datasets and functional annotations used in this study are publicly available. All scripts used in the analysis are provided as additional files.
Y1  - 2006
U6  - https://doi.org/10.1186/1471-2105-7-380
SN  - 1471-2105
VL  - 7
IS  - 380
PB  - BioMed Central
CY  - London
ER  - 
TY  - JOUR
A1  - Rajasundaram, Dhivyaa
A1  - Runavot, Jean-Luc
A1  - Guo, Xiaoyuan
A1  - Willats, William G. T.
A1  - Meulewaeter, Frank
A1  - Selbig, Joachim
T1  - Understanding the relationship between cotton fiber properties and non-cellulosic cell wall polysaccharides
JF  - PLoS one
N2  - A detailed knowledge of cell wall heterogeneity and complexity is crucial for understanding plant growth and development. One key challenge is to establish links between polysaccharide-rich cell walls and their phenotypic characteristics. It is of particular interest for some plant material, like cotton fibers, which are of both biological and industrial importance. To this end, we attempted to study cotton fiber characteristics together with glycan arrays using regression based approaches. Taking advantage of the comprehensive microarray polymer profiling technique (CoMPP), 32 cotton lines from different cotton species were studied. The glycan array was generated by sequential extraction of cell wall polysaccharides from mature cotton fibers and screening samples against eleven extensively characterized cell wall probes. Also, phenotypic characteristics of cotton fibers such as length, strength, elongation and micronaire were measured. The relationship between the two datasets was established in an integrative manner using linear regression methods. In the conducted analysis, we demonstrated the usefulness of regression based approaches in establishing a relationship between glycan measurements and phenotypic traits. In addition, the analysis also identified specific polysaccharides which may play a major role during fiber development for the final fiber characteristics. Three different regression methods identified a negative correlation between micronaire and the xyloglucan and homogalacturonan probes. Moreover, homogalacturonan and callose were shown to be significant predictors for fiber length. The role of these polysaccharides was already pointed out in previous cell wall elongation studies. Additional relationships were predicted for fiber strength and elongation which will need further experimental validation.
Y1  - 2014
U6  - https://doi.org/10.1371/journal.pone.0112168
SN  - 1932-6203
VL  - 9
IS  - 11
PB  - PLoS
CY  - San Francisco
ER  - 
TY  - JOUR
A1  - Flöter, André
A1  - Nicolas, Jacques
A1  - Schaub, Torsten H.
A1  - Selbig, Joachim
T1  - Threshold extraction in metabolite concentration data
N2  - Motivation: Continued development of analytical techniques based on gas chromatography and mass spectrometry now facilitates the generation of larger sets of metabolite concentration data. An important step towards the understanding of metabolite dynamics is the recognition of stable states where metabolite concentrations exhibit a simple behaviour. Such states can be characterized through the identification of significant thresholds in the concentrations. But general techniques for finding discretization thresholds in continuous data prove to be practically insufficient for detecting states due to the weak conditional dependences in concentration data. Results: We introduce a method of recognizing states in the framework of decision tree induction. It is based upon a global analysis of decision forests where stability and quality are evaluated. It leads to the detection of thresholds that are both comprehensible and robust. Applied to metabolite concentration data, this method has led to the discovery of hidden states in the corresponding variables. Some of these reflect known properties of the biological experiments, and others point to putative new states.
Y1  - 2004
ER  - 
TY  - JOUR
A1  - Flöter, André
A1  - Nicolas, Jacques
A1  - Schaub, Torsten H.
A1  - Selbig, Joachim
T1  - Threshold extraction in metabolite concentration data
Y1  - 2003
UR  - http://www.cs.uni-potsdam.de/wv/pdfformat/floeterGCB2003.pdf
ER  - 
TY  - JOUR
A1  - Guo, Ke-Tai
A1  - Fu, Peng
A1  - Juerchott, Kathrin
A1  - Motaln, Helena
A1  - Selbig, Joachim
A1  - Lah, Tamara T.
A1  - Tonn, Jörg-Christian
A1  - Schichor, Christian
T1  - The expression of Wnt-inhibitor DKK1 (Dickkopf 1) is determined by intercellular crosstalk and hypoxia in human malignant gliomas
JF  - Journal of cancer research and clinical oncology : official organ of the Deutsche Krebsgesellschaft
N2  - Objective Wnt signalling pathways regulate proliferation, motility and survival in a variety of human cell types. Dickkopf 1 (DKK1) gene codes for a secreted Wnt inhibitory factor. It functions as tumour suppressor gene in breast cancer and as a pro-apoptotic factor in glioma cells. In this study, we aimed to demonstrate whether the different expression of DKK1 in human glioma-derived cells is dependent on microenvironmental factors like hypoxia and regulated by the intercellular crosstalk with bone-marrow-derived mesenchymal stem cells (bmMSCs).
 Methods Glioma cell line U87-MG, three cell lines from human glioblastoma grade IV (glioma-derived mesenchymal stem cells) and three bmMSCs were selected for the experiment. The expression of DKK1 in cell lines under normoxic/hypoxic environment or co-culture condition was measured using real-time PCR and enzyme-linked immunosorbent assay. The effect of DKK1 on cell migration and proliferation was evaluated by in vitro wound healing assays and sulphorhodamine assays, respectively.
 Results Glioma-derived cells U87-MG displayed lower DKK1 expression compared with bmMSCs. Hypoxia led to an overexpression of DKK1 in bmMSCs and U87-MG when compared to normoxic environment, whereas co-culture of U87-MG with bmMSCs induced the expression of DKK1 in both cell lines. Exogenous recombinant DKK1 inhibited cell migration on all cell lines, but did not have a significant effect on cell proliferation of bmMSCs and glioma cell lines.
 Conclusion In this study, we showed for the first time that the expression of DKK1 was hypoxia dependent in human malignant glioma cell lines. The induction of DKK1 by intracellular crosstalk or hypoxia stimuli sheds light on the intense adaption of glial tumour cells to environmental alterations.
KW  - Dickkopf 1
KW  - Intercellular crosstalk
KW  - Hypoxia
KW  - Gliomas
Y1  - 2014
U6  - https://doi.org/10.1007/s00432-014-1642-2
SN  - 0171-5216
SN  - 1432-1335
VL  - 140
IS  - 8
SP  - 1261
EP  - 1270
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Girbig, Dorothee
A1  - Grimbs, Sergio
A1  - Selbig, Joachim
T1  - Systematic analysis of stability patterns in plant primary metabolism
JF  - PLoS one
N2  - Metabolic networks are characterized by complex interactions and regulatory mechanisms between many individual components. These interactions determine whether a steady state is stable to perturbations. Structural kinetic modeling (SKM) is a framework to analyze the stability of metabolic steady states that allows the study of the system Jacobian without requiring detailed knowledge about individual rate equations. Stability criteria can be derived by generating a large number of structural kinetic models (SK-models) with randomly sampled parameter sets and evaluating the resulting Jacobian matrices. Until now, SKM experiments applied univariate tests to detect the network components with the largest influence on stability. In this work, we present an extended SKM approach relying on supervised machine learning to detect patterns of enzyme-metabolite interactions that act together in an orchestrated manner to ensure stability. We demonstrate its application on a detailed SK-model of the Calvin-Benson cycle and connected pathways. The identified stability patterns are highly complex reflecting that changes in dynamic properties depend on concerted interactions between several network components. In total, we find more patterns that reliably ensure stability than patterns ensuring instability. This shows that the design of this system is strongly targeted towards maintaining stability. We also investigate the effect of allosteric regulators revealing that the tendency to stability is significantly increased by including experimentally determined regulatory mechanisms that have not yet been integrated into existing kinetic models.
Y1  - 2012
U6  - https://doi.org/10.1371/journal.pone.0034686
SN  - 1932-6203
VL  - 7
IS  - 4
PB  - PLoS
CY  - San Francisco
ER  - 
TY  - JOUR
A1  - Steuer, Ralf
A1  - Gross, Thilo
A1  - Selbig, Joachim
A1  - Blasius, Bernd
T1  - Structural kinetic modeling of metabolic networks
JF  - Proceedings of the National Academy of Sciences of the United States of America
N2  - To develop and investigate detailed mathematical models of metabolic processes is one of the primary challenges in systems biology. However, despite considerable advance in the topological analysis of metabolic networks, kinetic modeling is still often severely hampered by inadequate knowledge of the enzyme-kinetic rate laws and their associated parameter values. Here we propose a method that aims to give a quantitative account of the dynamical capabilities of a metabolic system, without requiring any explicit information about the functional form of the rate equations. Our approach is based on constructing a local linear model at each point in parameter space, such that each element of the model is either directly experimentally accessible or amenable to a straightforward biochemical interpretation. This ensemble of local linear models, encompassing all possible explicit kinetic models, then allows for a statistical exploration of the comprehensive parameter space. The method is exemplified on two paradigmatic metabolic systems: the glycolytic pathway of yeast and a realistic-scale representation of the photosynthetic Calvin cycle.
KW  - systems biology
KW  - computational biochemistry
KW  - metabolomics
KW  - metabolic regulation
KW  - biological robustness
Y1  - 2006
U6  - https://doi.org/10.1073/pnas.0600013103
SN  - 0027-8424
SN  - 1091-6490
VL  - 103
IS  - 32
SP  - 11868
EP  - 11873
PB  - National Academy of Sciences
CY  - Washington
ER  - 
TY  - JOUR
A1  - Timmer, Marco
A1  - Theiss, Hans
A1  - Jürchott, Katrin
A1  - Ries, Christian
A1  - Paron, Igor
A1  - Franz, W.
A1  - Selbig, Joachim
A1  - Guo, Ketai
A1  - Tonn, Jörg
A1  - Schichor, Christian
T1  - Stromal-Derived Factor 1a (Sdf-1a), a Homing Factor for Mesenchymal Progenitor Cells, Is Elevated in Tumor Tissue and Plasma of Glioma Patients
N2  - Malignant gliomas are a fatal disease lacking sufficient possibilities for early diagnosis and chemical markers to detect remission or relapse. The recruitment of progenitor cells such as mesenchymal stem cells (MSC) is a main feature of gliomas. Stromal cell-derived factor-1 (SDF-1), a chemokine produced in glioma cell lines, enhances migration in MSC and has been associated with cell survival and apoptosis in gliomas. Therefore, this study was performed to evaluate (i) whether SDF-1 and its receptors are expressed in human malignant gliomas in situ and (ii) if SDF-1 might potentially play a role in recruiting MSCs into human glioma. In glioblastoma tissue, immunohistochemistry revealed that SDF-1 and its receptor CXCR4 are expressed in regions of angiogenesis and necrosis, and qPCR showed that SDF-1 is elevated. Public expression data indicated that CXCR4 was upregulated. The latter data also illustrate that SDF-1 could be up- or downregulated in glioma compared to normal brain in a transcript-specific manner. In plasma, SDF-1 is elevated in glioma patients. The level is reduced by both dexamethasone intake and surgery. Dexamethasone also decreased SDF-1 production in cells in vitro. The undirected migration of human MSC (hMSC) was not enhanced by the addition of SDF-1. However, SDF-1 stimulated directed invasion of hMSC in a dose-dependent manner. Taken together, we show that SDF-1 is a potent chemoattractant of progenitor cells such as hMSCs and that its expression is elevated in glioma tissue, which results in elevated SDF-1 levels in the patient's plasma samples with concomitant decrease after tumor resection. The fact that elevated SDF-1 plasma levels are significantly decreased after tumor surgery could be a first hint that SDF-1 might act as tumor marker for malignant gliomas in order to detect disease progression or remission, respectively.
Y1  - 2010
UR  - http://neuro-oncology.oxfordjournals.org/
SN  - 1522-8517
ER  - 
TY  - JOUR
A1  - Larhlimi, Abdelhalim
A1  - Basler, Georg
A1  - Grimbs, Sergio
A1  - Selbig, Joachim
A1  - Nikoloski, Zoran
T1  - Stoichiometric capacitance reveals the theoretical capabilities of metabolic networks
JF  - Bioinformatics
N2  - Motivation: Metabolic engineering aims at modulating the capabilities of metabolic networks by changing the activity of biochemical reactions. The existing constraint-based approaches for metabolic engineering have proven useful, but are limited only to reactions catalogued in various pathway databases.
 Results: We consider the alternative of designing synthetic strategies which can be used not only to characterize the maximum theoretically possible product yield but also to engineer networks with optimal conversion capability by using a suitable biochemically feasible reaction called 'stoichiometric capacitance'. In addition, we provide a theoretical solution for decomposing a given stoichiometric capacitance over a set of known enzymatic reactions. We determine the stoichiometric capacitance for genome-scale metabolic networks of 10 organisms from different kingdoms of life and examine its implications for the alterations in flux variability patterns. Our empirical findings suggest that the theoretical capacity of metabolic networks comes at a cost of dramatic system's changes.
Y1  - 2012
U6  - https://doi.org/10.1093/bioinformatics/bts381
SN  - 1367-4803
VL  - 28
IS  - 18
SP  - I502
EP  - I508
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Sulpice, Ronan
A1  - Pyl, Eva-Theresa
A1  - Ishihara, Hirofumi
A1  - Trenkamp, Sandra
A1  - Steinfath, Matthias
A1  - Witucka-Wall, Hanna
A1  - Gibon, Yves
A1  - Usadel, Björn
A1  - Poree, Fabien
A1  - Piques, Maria Conceicao
A1  - von Korff, Maria
A1  - Steinhauser, Marie Caroline
A1  - Keurentjes, Joost J. B.
A1  - Guenther, Manuela
A1  - Hoehne, Melanie
A1  - Selbig, Joachim
A1  - Fernie, Alisdair R.
A1  - Altmann, Thomas
A1  - Stitt, Mark
T1  - Starch as a major integrator in the regulation of plant growth
N2  - Rising demand for food and bioenergy makes it imperative to breed for increased crop yield. Vegetative plant growth could be driven by resource acquisition or developmental programs. Metabolite profiling in 94 Arabidopsis accessions revealed that biomass correlates negatively with many metabolites, especially starch. Starch accumulates in the light and is degraded at night to provide a sustained supply of carbon for growth. Multivariate analysis revealed that starch is an integrator of the overall metabolic response. We hypothesized that this reflects variation in a regulatory network that balances growth with the carbon supply. Transcript profiling in 21 accessions revealed coordinated changes of transcripts of more than 70 carbon-regulated genes and identified 2 genes (myo-inositol-1-phosphate synthase, a Kelch-domain protein) whose transcripts correlate with biomass. The impact of allelic variation at these 2 loci was shown by association mapping, identifying them as candidate lead genes with the potential to increase biomass production.
Y1  - 2009
UR  - http://www.pnas.org/
U6  - https://doi.org/10.1073/pnas.0903478106
SN  - 0027-8424
ER  - 
TY  - JOUR
A1  - Hummel, Jan
A1  - Keshvari, N.
A1  - Weckwerth, Wolfram
A1  - Selbig, Joachim
T1  - Species-specific analysis of protein sequence motifs using mutual information
N2  - Background: Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. Such profiles conveniently reflect functional necessities by pointing out proximity at conserved sequence positions as well as depicting distances at variable positions. Discovering significant conservation characteristics within the variable positions of profiles mirrors group-specific and, in particular, evolutionary features of the underlying sequences. Results: We describe the tool PROfile analysis based on Mutual Information (PROMI) that enables comparative analysis of user-classified protein sequences. PROMI is implemented as a web service using Perl and R as well as other publicly available packages and tools on the server-side. On the client-side platform-independence is achieved by generally applied internet delivery standards. As one possible application analysis of the zinc finger C2H2-type protein domain is introduced to illustrate the functionality of the tool. Conclusion: The web service PROMI should assist researchers to detect evolutionary correlations in protein profiles of defined biological sequences. It is available at http://promi.mpimpgolm.mpg.de where additional documentation can be found.
Y1  - 2005
SN  - 1471-2105
ER  - 
TY  - JOUR
A1  - Grimbs, Sergio
A1  - Arnold, Anne
A1  - Koseska, Aneta
A1  - Kurths, Jürgen
A1  - Selbig, Joachim
A1  - Nikoloski, Zoran
T1  - Spatiotemporal dynamics of the Calvin cycle: multistationarity and symmetry breaking instabilities
JF  - Biosystems : journal of biological and information processing sciences
N2  - The possibility of controlling the Calvin cycle has paramount implications for increasing the production of biomass. Multistationarity, as a dynamical feature of systems, is the first obvious candidate whose control could find biotechnological applications. Here we set out to resolve the debate on the multistationarity of the Calvin cycle. Unlike the existing simulation-based studies, our approach is based on a sound mathematical framework, chemical reaction network theory and algebraic geometry, which results in provable results for the investigated model of the Calvin cycle in which we embed a hierarchy of realistic kinetic laws. Our theoretical findings demonstrate that there is a possibility for multistationarity resulting from two sources, homogeneous and inhomogeneous instabilities, which partially settle the debate on multistability of the Calvin cycle. In addition, our tractable analytical treatment of the bifurcation parameters can be employed in the design of validation experiments.
KW  - Multistationarity
KW  - Calvin cycle
KW  - Algebraic geometry
KW  - Bifurcation parameters
KW  - Biomass
Y1  - 2011
U6  - https://doi.org/10.1016/j.biosystems.2010.10.015
SN  - 0303-2647
VL  - 103
IS  - 2
SP  - 212
EP  - 223
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Ryngajllo, Malgorzata
A1  - Childs, Liam H.
A1  - Lohse, Marc
A1  - Giorgi, Federico M.
A1  - Lude, Anja
A1  - Selbig, Joachim
A1  - Usadel, Björn
T1  - SLocX: predicting subcellular localization of Arabidopsis proteins leveraging gene expression data
JF  - Frontiers in plant science
N2  - Despite the growing volume of experimentally validated knowledge about the subcellular localization of plant proteins, a well performing in silico prediction tool is still a necessity. Existing tools, which employ information derived from protein sequence alone, offer limited accuracy and/or rely on full sequence availability. We explored whether gene expression profiling data can be harnessed to enhance prediction performance. To achieve this, we trained several support vector machines to predict the subcellular localization of Arabidopsis thaliana proteins using sequence derived information, expression behavior, or a combination of these data and compared their predictive performance through a cross-validation test. We show that gene expression carries information about the subcellular localization not available in sequence information, yielding dramatic benefits for plastid localization prediction, and some notable improvements for other compartments such as the mitochondrion, the Golgi, and the plasma membrane. Based on these results, we constructed a novel subcellular localization prediction engine, SLocX, combining gene expression profiling data with protein sequence-based information. We then validated the results of this engine using an independent test set of annotated proteins and a transient expression of GFP fusion proteins. Here, we present the prediction framework and a website of predicted localizations for Arabidopsis. The relatively good accuracy of our prediction engine, even in cases where only partial protein sequence is available (e.g., in sequences lacking the N-terminal region), offers a promising opportunity for similar application to non-sequenced or poorly annotated plant species. Although the prediction scope of our method is currently limited by the availability of expression information on the ATH1 array, we believe that the advances in measuring gene expression technology will make our method applicable for all Arabidopsis proteins.
KW  - subcellular localization
KW  - support vector machine
KW  - prediction
KW  - gene expression
Y1  - 2011
U6  - https://doi.org/10.3389/fpls.2011.00043
SN  - 1664-462X
VL  - 2
PB  - Frontiers Research Foundation
CY  - Lausanne
ER  - 
TY  - JOUR
A1  - Childs, Dorothee
A1  - Grimbs, Sergio
A1  - Selbig, Joachim
T1  - Refined elasticity sampling for Monte Carlo-based identification of stabilizing network patterns
JF  - Bioinformatics
N2  - Motivation: Structural kinetic modelling (SKM) is a framework to analyse whether a metabolic steady state remains stable under perturbation, without requiring detailed knowledge about individual rate equations. It provides a representation of the system's Jacobian matrix that depends solely on the network structure, steady state measurements, and the elasticities at the steady state. For a measured steady state, stability criteria can be derived by generating a large number of SKMs with randomly sampled elasticities and evaluating the resulting Jacobian matrices. The elasticity space can be analysed statistically in order to detect network positions that contribute significantly to the perturbation response. Here, we extend this approach by examining the kinetic feasibility of the elasticity combinations created during Monte Carlo sampling.
 Results: Using a set of small example systems, we show that the majority of sampled SKMs would yield negative kinetic parameters if they were translated back into kinetic models. To overcome this problem, a simple criterion is formulated that mitigates such infeasible models. After evaluating the small example pathways, the methodology was used to study two steady states of the neuronal TCA cycle and the intrinsic mechanisms responsible for their stability or instability. The findings of the statistical elasticity analysis confirm that several elasticities are jointly coordinated to control stability and that the main source for potential instabilities are mutations in the enzyme alpha-ketoglutarate dehydrogenase.
Y1  - 2015
U6  - https://doi.org/10.1093/bioinformatics/btv243
SN  - 1367-4803
SN  - 1460-2059
VL  - 31
IS  - 12
SP  - 214
EP  - 220
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Meyer, Rhonda Christiane
A1  - Kusterer, Barbara
A1  - Lisec, Jan
A1  - Steinfath, Matthias
A1  - Becher, Martina
A1  - Scharr, Hanno
A1  - Melchinger, Albrecht E.
A1  - Selbig, Joachim
A1  - Schurr, Ulrich
A1  - Willmitzer, Lothar
A1  - Altmann, Thomas
T1  - QTL analysis of early stage heterosis for biomass in Arabidopsis
JF  - Theoretical and applied genetics
N2  - The main objective of this study was to identify genomic regions involved in biomass heterosis using QTL, generation means, and mode-of-inheritance classification analyses. In a modified North Carolina Design III we backcrossed 429 recombinant inbred line and 140 introgression line populations to the two parental accessions, C24 and Col-0, whose F1 hybrid exhibited 44% heterosis for biomass. Mid-parent heterosis in the RILs ranged from −31 to 99% for dry weight and from −58 to 143% for leaf area. We detected ten genomic positions involved in biomass heterosis at an early developmental stage, individually explaining between 2.4 and 15.7% of the phenotypic variation. While overdominant gene action was prevalent in heterotic QTL, our results suggest that a combination of dominance, overdominance and epistasis is involved in biomass heterosis in this Arabidopsis cross.
KW  - Quantitative Trait Locus
KW  - recombinant inbred line
KW  - Quantitative Trait Locus analysis
KW  - dominance effect
KW  - recombinant inbred line population
Y1  - 2009
U6  - https://doi.org/10.1007/s00122-009-1074-6
SN  - 1432-2242
SN  - 0040-5752
VL  - 120
IS  - 2
SP  - 227
EP  - 237
PB  - Springer Nature
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Riaño-Pachón, Diego Mauricio
A1  - Kleessen, Sabrina
A1  - Neigenfind, Jost
A1  - Durek, Pawel
A1  - Weber, Elke
A1  - Engelsberger, Wolfgang R.
A1  - Walther, Dirk
A1  - Selbig, Joachim
A1  - Schulze, Waltraud X.
A1  - Kersten, Birgit
T1  - Proteome-wide survey of phosphorylation patterns affected by nuclear DNA polymorphisms in Arabidopsis thaliana
JF  - BMC Genomics
N2  - Background: Protein phosphorylation is an important post-translational modification influencing many aspects of dynamic cellular behavior. Site-specific phosphorylation of amino acid residues serine, threonine, and tyrosine can have profound effects on protein structure, activity, stability, and interaction with other biomolecules. Phosphorylation sites can be affected in diverse ways in members of any species, one such way is through single nucleotide polymorphisms (SNPs). The availability of large numbers of experimentally identified phosphorylation sites, and of natural variation datasets in Arabidopsis thaliana prompted us to analyze the effect of non-synonymous SNPs (nsSNPs) onto phosphorylation sites.

Results: From the analyses of 7,178 experimentally identified phosphorylation sites we found that: (i) Proteins with multiple phosphorylation sites occur more often than expected by chance. (ii) Phosphorylation hotspots show a preference to be located outside conserved domains. (iii) nsSNPs affected experimental phosphorylation sites as much as the corresponding non-phosphorylated amino acid residues. (iv) Losses of experimental phosphorylation sites by nsSNPs were identified in 86 A. thaliana proteins, among them receptor proteins were overrepresented.

These results were confirmed by similar analyses of predicted phosphorylation sites in A. thaliana. In addition, predicted threonine phosphorylation sites showed a significant enrichment of nsSNPs towards asparagines and a significant depletion of the synonymous substitution. Proteins in which predicted phosphorylation sites were affected by nsSNPs (loss and gain), were determined to be mainly receptor proteins, stress response proteins and proteins involved in nucleotide and protein binding. Proteins involved in metabolism, catalytic activity and biosynthesis were less affected.

Conclusions: We analyzed more than 7,100 experimentally identified phosphorylation sites in almost 4,300 protein-coding loci in silico, thus constituting the largest phosphoproteomics dataset for A. thaliana available to date. Our findings suggest a relatively high variability in the presence or absence of phosphorylation sites between different natural accessions in receptor and other proteins involved in signal transduction. Elucidating the effect of phosphorylation sites affected by nsSNPs on adaptive responses represents an exciting research goal for the future.
KW  - Gene Ontology
KW  - Phosphorylation Site
KW  - phosphorylated amino acid
KW  - slim term
KW  - single nucleotide polymorphism mapping
Y1  - 2010
U6  - https://doi.org/10.1186/1471-2164-11-411
SN  - 1471-2164
VL  - 11
PB  - BioMed Central
CY  - London
ER  - 
TY  - JOUR
A1  - Steinfath, Matthias
A1  - Gärtner, Tanja
A1  - Lisec, Jan
A1  - Meyer, Rhonda Christiane
A1  - Altmann, Thomas
A1  - Willmitzer, Lothar
A1  - Selbig, Joachim
T1  - Prediction of hybrid biomass in Arabidopsis thaliana by selected parental SNP and metabolic markers
JF  - Theoretical and applied genetics : TAG ; international journal of plant breeding research
N2  - A recombinant inbred line (RIL) population, derived from two Arabidopsis thaliana accessions, and the corresponding testcrosses with these two original accessions were used for the development and validation of machine learning models to predict the biomass of hybrids. Genetic and metabolic information of the RILs served as predictors. Feature selection reduced the number of variables (genetic and metabolic markers) in the models by more than 80% without impairing the predictive power. Thus, potential biomarkers have been revealed. Metabolites were shown to bear information on inherited macroscopic phenotypes. This proof of concept could be interesting for breeders. The example population exhibits substantial mid-parent biomass heterosis. The results of feature selection could therefore be used to shed light on the origin of heterosis. In this respect, mainly dominance effects were detected.
KW  - Quantitative Trait Locus
KW  - feature selection
KW  - Partial Least Squares
KW  - recombinant inbred line
KW  - Quantitative Trait Locus analysis
Y1  - 2009
U6  - https://doi.org/10.1007/s00122-009-1191-2
SN  - 0040-5752
SN  - 1432-2242
VL  - 120
SP  - 239
EP  - 247
PB  - Springer
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Moehlig, M.
A1  - Floeter, A.
A1  - Spranger, Joachim
A1  - Weickert, Martin O.
A1  - Schill, T.
A1  - Schloesser, H. W.
A1  - Brabant, G.
A1  - Pfeiffer, Andreas F. H.
A1  - Selbig, Joachim
A1  - Schoefl, C.
T1  - Predicting impaired glucose metabolism in women with polycystic ovary syndrome by decision tree modelling
JF  - Diabetologia : journal of the European Association for the Study of Diabetes (EASD)
N2  - Aims/hypothesis: Polycystic ovary syndrome (PCOS) is a risk factor for type 2 diabetes. Screening for impaired glucose metabolism (IGM) with an oral glucose tolerance test (OGTT) has been recommended, but this is relatively time-consuming and inconvenient. Thus, a strategy that could minimise the need for an OGTT would be beneficial. Materials and methods: Consecutive PCOS patients (n=118) with fasting glucose < 6.1 mmol/l were included in the study. Parameters derived from medical history, clinical examination and fasting blood samples were assessed by decision tree modelling for their ability to discriminate women with IGM (2-h OGTT value >= 7.8 mmol/l) from those with normal glucose tolerance (NGT). Results: According to the OGTT results, 93 PCOS women had NGT and 25 had IGM. The best decision tree consisted of HOMA-IR, the proinsulin:insulin ratio, proinsulin, 17-OH progesterone and the ratio of luteinising hormone:follicle-stimulating hormone. This tree identified 69 women with NGT. The remaining 49 women included all women with IGM (100% sensitivity, 74% specificity to detect IGM). Pruning this tree to three levels still identified 53 women with NGT (100% sensitivity, 57% specificity to detect IGM). Restricting the data matrix used for tree modelling to medical history and clinical parameters produced a tree using BMI, waist circumference and WHR. Pruning this tree to two levels separated 27 women with NGT (100% sensitivity, 29% specificity to detect IGM). The validity of both trees was tested by a leave-10%-out cross-validation. Conclusions/interpretation: Decision trees are useful tools for separating PCOS women with NGT from those with IGM. They can be used for stratifying the metabolic screening of PCOS women, whereby the number of OGTTs can be markedly reduced.
KW  - decision tree
KW  - HOMA
KW  - impaired glucose tolerance
KW  - insulin
KW  - insulin resistance
KW  - polycystic ovary syndrome
KW  - proinsulin
KW  - type 2 diabetes mellitus
Y1  - 2006
U6  - https://doi.org/10.1007/s00125-006-0395-0
SN  - 0012-186X
VL  - 49
SP  - 2572
EP  - 2579
PB  - Springer
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Edlich-Muth, Christian
A1  - Muraya, Moses M.
A1  - Altmann, Thomas
A1  - Selbig, Joachim
T1  - Phenomic prediction of maize hybrids
JF  - Biosystems : journal of biological and information processing sciences
N2  - Phenomic experiments are carried out in large-scale plant phenotyping facilities that acquire a large number of pictures of hundreds of plants simultaneously. With the aid of automated image processing, the data are converted into genotype-feature matrices that cover many consecutive days of development. Here, we explore the possibility of predicting the biomass of the fully grown plant from early developmental stage image-derived features. We performed phenomic experiments on 195 inbred and 382 hybrid maize varieties and followed their progress from 16 days after sowing (DAS) to 48 DAS with 129 image-derived features. By applying sparse regression methods, we show that 73% of the variance in hybrid fresh weight of fully-grown plants is explained by about 20 features at the three-leaf-stage or earlier. Dry weight prediction explained over 90% of the variance. When phenomic features of parental inbred lines were used as predictors of hybrid biomass, the proportion of variance explained was 42 and 45%, for fresh weight and dry weight models consisting of 35 and 36 features, respectively. These models were very robust, showing only a small amount of variation in performance over the time scale of the experiment. We also examined mid-parent heterosis in phenomic features. Feature heterosis displayed a large degree of variance which resulted in prediction performance that was less robust than models of either parental or hybrid predictors. Our results show that phenomic prediction is a viable alternative to genomic and metabolic prediction of hybrid performance. In particular, the utility of early-stage parental lines is very encouraging. (C) 2016 Elsevier Ireland Ltd. All rights reserved.
KW  - Hybrid prediction
KW  - LASSO
KW  - Regression
KW  - Maize
KW  - Phenomics
Y1  - 2016
U6  - https://doi.org/10.1016/j.biosystems.2016.05.008
SN  - 0303-2647
SN  - 1872-8324
VL  - 146
SP  - 102
EP  - 109
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Scholz, Matthias
A1  - Kaplan, F.
A1  - Guy, C. L.
A1  - Kopka, Joachim
A1  - Selbig, Joachim
T1  - Non-linear PCA : a missing data approach
N2  - Motivation: Visualizing and analysing the potential non-linear structure of a dataset is becoming an important task in molecular biology. This is even more challenging when the data have missing values. Results: Here, we propose an inverse model that performs non-linear principal component analysis (NLPCA) from incomplete datasets. Missing values are ignored while optimizing the model, but can be estimated afterwards. Results are shown for both artificial and experimental datasets. In contrast to linear methods, non-linear methods were able to give better missing value estimations for non-linear structured data. Application: We applied this technique to a time course of metabolite data from a cold stress experiment on the model plant Arabidopsis thaliana, and could approximate the mapping function from any time point to the metabolite responses. Thus, the inverse NLPCA provides greatly improved information for better understanding the complex response to cold stress.
Y1  - 2005
SN  - 1367-4803
ER  -