TY  - JOUR
A1  - Scholz, Matthias
A1  - Kaplan, F.
A1  - Guy, C. L.
A1  - Kopka, Joachim
A1  - Selbig, Joachim
T1  - Non-linear PCA : a missing data approach
N2  - Motivation: Visualizing and analysing the potential non-linear structure of a dataset is becoming an important task in molecular biology. This is even more challenging when the data have missing values. Results: Here, we propose an inverse model that performs non-linear principal component analysis (NLPCA) from incomplete datasets. Missing values are ignored while optimizing the model, but can be estimated afterwards. Results are shown for both artificial and experimental datasets. In contrast to linear methods, non-linear methods were able to give better missing value estimations for non-linear structured data. Application: We applied this technique to a time course of metabolite data from a cold stress experiment on the model plant Arabidopsis thaliana, and could approximate the mapping function from any time point to the metabolite responses. Thus, the inverse NLPCA provides greatly improved information for better understanding the complex response to cold stress
Y1  - 2005
SN  - 1367-4803
ER  - 
TY  - GEN
A1  - Hische, Manuela
A1  - Larhlimi, Abdelhalim
A1  - Schwarz, Franziska
A1  - Fischer-Rosinský, Antje
A1  - Bobbert, Thomas
A1  - Assmann, Anke
A1  - Catchpole, Gareth S.
A1  - Pfeiffer, Andreas F. H.
A1  - Willmitzer, Lothar
A1  - Selbig, Joachim
A1  - Spranger, Joachim
T1  - A distinct metabolic signature predicts development of fasting plasma glucose
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background

High blood glucose and diabetes are amongst the conditions causing the greatest losses in years of healthy life worldwide. Therefore, numerous studies aim to identify reliable risk markers for development of impaired glucose metabolism and type 2 diabetes. However, the molecular basis of impaired glucose metabolism is so far insufficiently understood. The development of so called 'omics' approaches in the recent years promises to identify molecular markers and to further understand the molecular basis of impaired glucose metabolism and type 2 diabetes. Although univariate statistical approaches are often applied, we demonstrate here that the application of multivariate statistical approaches is highly recommended to fully capture the complexity of data gained using high-throughput methods.

Methods

We took blood plasma samples from 172 subjects who participated in the prospective Metabolic Syndrome Berlin Potsdam follow-up study (MESY-BEPO Follow-up). We analysed these samples using Gas Chromatography coupled with Mass Spectrometry (GC-MS), and measured 286 metabolites. Furthermore, fasting glucose levels were measured using standard methods at baseline, and after an average of six years. We did correlation analysis and built linear regression models as well as Random Forest regression models to identify metabolites that predict the development of fasting glucose in our cohort.

Results

We found a metabolic pattern consisting of nine metabolites that predicted fasting glucose development with an accuracy of 0.47 in tenfold cross-validation using Random Forest regression. We also showed that adding established risk markers did not improve the model accuracy. However, external validation is eventually desirable. Although not all metabolites belonging to the final pattern are identified yet, the pattern directs attention to amino acid metabolism, energy metabolism and redox homeostasis.

Conclusions

We demonstrate that metabolites identified using a high-throughput method (GC-MS) perform well in predicting the development of fasting plasma glucose over several years. Notably, not single, but a complex pattern of metabolites propels the prediction and therefore reflects the complexity of the underlying molecular mechanisms. This result could only be captured by application of multivariate statistical approaches. Therefore, we highly recommend the usage of statistical methods that seize the complexity of the information given by high-throughput methods.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 850 
KW  - prediction
KW  - fasting glucose
KW  - type 2 diabetes
KW  - metabolomics
KW  - plasma
KW  - random forest
KW  - metabolite
KW  - regression
KW  - biomarker
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-427400
SN  - 1866-8372
IS  - 850
ER  - 
TY  - JOUR
A1  - Hische, Manuela
A1  - Luis-Dominguez, Olga
A1  - Pfeiffer, Andreas F. H.
A1  - Schwarz, Peter E.
A1  - Selbig, Joachim
A1  - Spranger, Joachim
T1  - Decision trees as a simple-to-use and reliable tool to identify individuals with impaired glucose metabolism or type 2 diabetes mellitus
N2  - Objective: The prevalence of unknown impaired fasting glucose (IFG), impaired glucose tolerance (IGT), or type 2 diabetes mellitus (T2DM) is high. Numerous studies demonstrated that IFG, IGT, or T2DM are associated with increased cardiovascular risk, therefore an improved identification strategy would be desirable. The objective of this study was to create a simple and reliable tool to identify individuals with impaired glucose metabolism (IGM). Design and methods: A cohort of 1737 individuals (1055 controls, 682 with previously unknown IGM) was screened by 75 g oral glucose tolerance test (OGTT). Supervised machine learning was used to automatically generate decision trees to identify individuals with IGM. To evaluate the accuracy of identification, a tenfold cross-validation was performed. Resulting trees were subsequently re-evaluated in a second, independent cohort of 1998 individuals (1253 controls, 745 unknown IGM). Results: A clinical decision tree included age and systolic blood pressure (sensitivity 89.3%, specificity 37.4%, and positive predictive value (PPV) 48.0%), while a tree based on clinical and laboratory data included fasting glucose and systolic blood pressure (sensitivity 89.7%, specificity 54.6%, and PPV 56.2%). The inclusion of additional parameters did not improve test quality. The external validation approach confirmed the presented decision trees. Conclusion: We proposed a simple tool to identify individuals with existing IGM. From a practical perspective, fasting blood glucose and blood pressure measurements should be regularly measured in all individuals presenting in outpatient clinics. An OGTT appears to be useful only if the subjects are older than 48 years or show abnormalities in fasting glucose or blood pressure.
Y1  - 2010
UR  - http://www.eje-online.org/
U6  - https://doi.org/10.1530/Eje-10-0649
SN  - 0804-4643
ER  - 
TY  - JOUR
A1  - Steinfath, Matthias
A1  - Strehmel, Nadine
A1  - Peters, Rolf
A1  - Schauer, Nicolas
A1  - Groth, Detlef
A1  - Hummel, Jan
A1  - Steup, Martin
A1  - Selbig, Joachim
A1  - Kopka, Joachim
A1  - Geigenberger, Peter
A1  - Dongen, Joost T. van
T1  - Discovering plant metabolic biomarkers for phenotype prediction using an untargeted approach
N2  - Biomarkers are used to predict phenotypical properties before these features become apparent and, therefore, are valuable tools for both fundamental and applied research. Diagnostic biomarkers have been discovered in medicine many decades ago and are now commonly applied. While this is routine in the field of medicine, it is of surprise that in agriculture this approach has never been investigated. Up to now, the prediction of phenotypes in plants was based on growing plants and assaying the organs of interest in a time intensive process. For the first time, we demonstrate in this study the application of metabolomics to predict agronomic important phenotypes of a crop plant that was grown in different environments. Our procedure consists of established techniques to screen untargeted for a large amount of metabolites in parallel, in combination with machine learning methods. By using this combination of metabolomics and biomathematical tools metabolites were identified that can be used as biomarkers to improve the prediction of traits. The predictive metabolites can be selected and used subsequently to develop fast, targeted and low-cost diagnostic biomarker assays that can be implemented in breeding programs or quality assessment analysis. The identified metabolic biomarkers allow for the prediction of crop product quality. Furthermore, marker-assisted selection can benefit from the discovery of metabolic biomarkers when other molecular markers come to its limitation. The described marker selection method was developed for potato tubers, but is generally applicable to any crop and trait as it functions independently of genomic information.
Y1  - 2010
UR  - http://www3.interscience.wiley.com/cgi-bin/issn?DESCRIPTOR=PRINTISSN&VALUE=1467-7644
U6  - https://doi.org/10.1111/j.1467-7652.2010.00516.x
SN  - 1467-7644
ER  - 
TY  - JOUR
A1  - Moehlig, M.
A1  - Floeter, A.
A1  - Spranger, Joachim
A1  - Weickert, Martin O.
A1  - Schill, T.
A1  - Schloesser, H. W.
A1  - Brabant, G.
A1  - Pfeiffer, Andreas F. H.
A1  - Selbig, Joachim
A1  - Schoefl, C.
T1  - Predicting impaired glucose metabolism in women with polycystic ovary syndrome by decision tree modelling
JF  - Diabetologia : journal of the European Association for the Study of Diabetes (EASD)
N2  - Aims/hypothesis Polycystic ovary syndrome (PCOS) is a risk factor of type 2 diabetes. Screening for impaired glucose metabolism (IGM) with an OGTT has been recommended, but this is relatively time-consuming and inconvenient. Thus, a strategy that could minimise the need for an OGTT would be beneficial. Materials and methods Consecutive PCOS patients (n=118) with fasting glucose < 6.1 mmol/l were included in the study. Parameters derived from medical history, clinical examination and fasting blood samples were assessed by decision tree modelling for their ability to discriminate women with IGM (2-h OGTT value >= 7.8 mmol/l) from those with NGT. Results According to the OGTT results, 93 PCOS women had NGT and 25 had IGM. The best decision tree consisted of HOMA-IR, the proinsulin:insulin ratio, proinsulin, 17-OH progesterone and the ratio of luteinising hormone:follicle-stimulating hormone. This tree identified 69 women with NGT. The remaining 49 women included all women with IGM (100% sensitivity, 74% specificity to detect IGM). Pruning this tree to three levels still identified 53 women with NGT (100% sensitivity, 57% specificity to detect IGM). Restricting the data matrix used for tree modelling to medical history and clinical parameters produced a tree using BMI, waist circumference and WHR. Pruning this tree to two levels separated 27 women with NGT (100% sensitivity, 29% specificity to detect IGM). The validity of both trees was tested by a leave-10%-out cross-validation. Conclusions/interpretation Decision trees are useful tools for separating PCOS women with NGT from those with IGM. They can be used for stratifying the metabolic screening of PCOS women, whereby the number of OGTTs can be markedly reduced.
KW  - decision tree
KW  - HOMA
KW  - impaired glucose tolerance
KW  - insulin
KW  - insulin resistance
KW  - polycystic ovary syndrome
KW  - proinsulin
KW  - type 2 diabetes mellitus
Y1  - 2006
U6  - https://doi.org/10.1007/s00125-006-0395-0
SN  - 0012-186X
VL  - 49
SP  - 2572
EP  - 2579
PB  - Springer
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Steuer, Ralf
A1  - Humburg, Peter
A1  - Selbig, Joachim
T1  - Validation and functional annotation of expression-based clusters based on gene ontology
JF  - BMC bioinformatics
N2  - Background: The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. In particular, placing the results in the context of other available functional genomics data, such as existing bio-ontologies, has already provided substantial improvement for detecting and categorizing genes of interest. One common approach is to look for functional annotations that are significantly enriched within a group or cluster of genes, as compared to a reference group. Results: In this work, we suggest the information-theoretic concept of mutual information to investigate the relationship between groups of genes, as given by data-driven clustering, and their respective functional categories. Drawing upon related approaches (Gibbons and Roth, Genome Research 12: 1574-1581, 2002), we seek to quantify to what extent individual attributes are sufficient to characterize a given group or cluster of genes. Conclusion: We show that the mutual information provides a systematic framework to assess the relationship between groups or clusters of genes and their functional annotations in a quantitative way. Within this framework, the mutual information allows us to address and incorporate several important issues, such as the interdependence of functional annotations and combinatorial combinations of attributes. It thus supplements and extends the conventional search for overrepresented attributes within a group or cluster of genes. In particular taking combinations of attributes into account, the mutual information opens the way to uncover specific functional descriptions of a group of genes or clustering result. All datasets and functional annotations used in this study are publicly available. All scripts used in the analysis are provided as additional files.
Y1  - 2006
U6  - https://doi.org/10.1186/1471-2105-7-380
SN  - 1471-2105
VL  - 7
IS  - 380
PB  - BioMed Central
CY  - London
ER  - 
TY  - GEN
A1  - Neigenfind, Jost
A1  - Gyetvai, Gabor
A1  - Basekow, Rico
A1  - Diehl, Svenja
A1  - Achenbach, Ute
A1  - Gebhardt, Christiane
A1  - Selbig, Joachim
A1  - Kersten, Birgit
T1  - Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: Haplotype inference based on unphased SNP markers is an important task in population genetics. Although there are different approaches to the inference of haplotypes in diploid species, the existing software is not suitable for inferring haplotypes from unphased SNP data in polyploid species, such as the cultivated potato (Solanum tuberosum). Potato species are tetraploid and highly heterozygous.

Results: Here we present the software SATlotyper which is able to handle polyploid and polyallelic data. SATlotyper uses the Boolean satisfiability problem to formulate Haplotype Inference by Pure Parsimony. The software excludes existing haplotype inferences, thus allowing for calculation of alternative inferences. As it is not known which of the multiple haplotype inferences are best supported by the given unphased data set, we use a bootstrapping procedure that allows for scoring of alternative inferences. Finally, by means of the bootstrapping scores, it is possible to optimise the phased genotypes belonging to a given haplotype inference. The program is evaluated with simulated and experimental SNP data generated for heterozygous tetraploid populations of potato. We show that, instead of taking the first haplotype inference reported by the program, we can significantly improve the quality of the final result by applying additional methods that include scoring of the alternative haplotype inferences and genotype optimisation. For a sub-population of nineteen individuals, the predicted results computed by SATlotyper were directly compared with results obtained by experimental haplotype inference via sequencing of cloned amplicons. Prediction and experiment gave similar results regarding the inferred haplotypes and phased genotypes.

Conclusion: Our results suggest that Haplotype Inference by Pure Parsimony can be solved efficiently by the SAT approach, even for data sets of unphased SNP from heterozygous polyploids. SATlotyper is freeware and is distributed as a Java JAR file. The software can be downloaded from the webpage of the GABI Primary Database at http://www.gabipd.org/projects/satlotyper/. The application of SATlotyper will provide haplotype information, which can be used in haplotype association mapping studies of polyploid plants.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 883 
KW  - linkage disequilibrium
KW  - pure parsimony
KW  - potato
KW  - resistance
KW  - efficient
KW  - solanum
KW  - Conjunctive Normal Form
KW  - Full Adder
KW  - Disjunctive Normal Form
KW  - Haplotype Inference
KW  - Genotype Inference
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-435011
SN  - 1866-8372
IS  - 883
ER  - 
TY  - JOUR
A1  - Grell, Susanne
A1  - Schaub, Torsten H.
A1  - Selbig, Joachim
T1  - Modelling biological networks by action languages via answer set programming
Y1  - 2006
UR  - http://www.cs.uni-potsdam.de/wv/pdfformat/gebsch06c.pdf
U6  - https://doi.org/10.1007/11799573
SN  - 0302-9743
ER  - 
TY  - JOUR
A1  - Beerenwinkel, Niko
A1  - Sing, Tobias
A1  - Lengauer, Thomas
A1  - Rahnenfuhrer, Joerg
A1  - Roomp, Kirsten
A1  - Savenkov, Igor
A1  - Fischer, Roman
A1  - Hoffmann, Daniel
A1  - Selbig, Joachim
A1  - Korn, Klaus
A1  - Walter, Hauke
A1  - Berg, Thomas
A1  - Braun, Patrick
A1  - Faetkenheuer, Gerd
A1  - Oette, Mark
A1  - Rockstroh, Juergen
A1  - Kupfer, Bernd
A1  - Kaiser, Rolf
A1  - Daeumer, Martin
T1  - Computational methods for the design of effective therapies against drug resistant HIV strains
N2  - The development of drug resistance is a major obstacle to successful treatment of HIV infection. The extraordinary replication dynamics of HIV facilitates its escape from selective pressure exerted by the human immune system and by combination drug therapy. We have developed several computational methods whose combined use can support the design of optimal antiretroviral therapies based on viral genomic data
Y1  - 2005
ER  - 
TY  - JOUR
A1  - Hummel, Jan
A1  - Keshvari, N.
A1  - Weckwerth, Wolfram
A1  - Selbig, Joachim
T1  - Species-specific analysis of protein sequence motifs using mutual information
N2  - Background: Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. Such profiles conveniently reflect functional necessities by pointing out proximity at conserved sequence positions as well as depicting distances at variable positions. Discovering significant conservation characteristics within the variable positions of profiles mirrors group-specific and, in particular, evolutionary features of the underlying sequences. Results: We describe the tool PROfile analysis based on Mutual Information (PROMI) that enables comparative analysis of user-classified protein sequences. PROMI is implemented as a web service using Perl and R as well as other publicly available packages and tools on the server-side. On the client-side platform-independence is achieved by generally applied internet delivery standards. As one possible application analysis of the zinc finger C2H2-type protein domain is introduced to illustrate the functionality of the tool. Conclusion: The web service PROMI should assist researchers to detect evolutionary correlations in protein profiles of defined biological sequences. It is available at http://promi.mpimp-golm.mpg.de where additional documentation can be found
Y1  - 2005
SN  - 1471-2105
ER  - 
TY  - JOUR
A1  - Cordes, Frank
A1  - Kaiser, Rolf
A1  - Selbig, Joachim
T1  - Bioinformatics approach to predicting HIV drug resistance
N2  - The emergence of drug resistance remains one of the most challenging issues in the treatment of HIV-1 infection. The extreme replication dynamics of HIV facilitates its escape from the selective pressure exerted by the human immune system and by the applied combination drug therapy. This article reviews computational methods whose combined use can support the design of optimal antiretroviral therapies based on viral genotypic and phenotypic data. Genotypic assays are based on the analysis of mutations associated with reduced drug susceptibility, but are difficult to interpret due to the numerous mutations and mutational patterns that confer drug resistance. Phenotypic resistance or susceptibility can be experimentally evaluated by measuring the inhibition of the viral replication in cell culture assays. However, this procedure is expensive and time consuming
Y1  - 2006
UR  - http://www.expert-reviews.com/loi/erm
U6  - https://doi.org/10.1586/14737159.6.2.207
SN  - 1473-7159
ER  - 
TY  - JOUR
A1  - Flöter, André
A1  - Selbig, Joachim
A1  - Schaub, Torsten H.
T1  - Finding metabolic pathways in decision forests
Y1  - 2004
SN  - 3-540-23221-4
ER  - 
TY  - JOUR
A1  - Flöter, André
A1  - Nicolas, Jacques
A1  - Schaub, Torsten H.
A1  - Selbig, Joachim
T1  - Threshold extraction in metabolite concentration data
N2  - Motivation: Continued development of analytical techniques based on gas chromatography and mass spectrometry now facilitates the generation of larger sets of metabolite concentration data. An important step towards the understanding of metabolite dynamics is the recognition of stable states where metabolite concentrations exhibit a simple behaviour. Such states can be characterized through the identification of significant thresholds in the concentrations. But general techniques for finding discretization thresholds in continuous data prove to be practically insufficient for detecting states due to the weak conditional dependences in concentration data. Results: We introduce a method of recognizing states in the framework of decision tree induction. It is based upon a global analysis of decision forests where stability and quality are evaluated. It leads to the detection of thresholds that are both comprehensible and robust. Applied to metabolite concentration data, this method has led to the discovery of hidden states in the corresponding variables. Some of these reflect known properties of the biological experiments, and others point to putative new states
Y1  - 2004
ER  - 
TY  - JOUR
A1  - Daub, Carsten O.
A1  - Steuer, Ralf
A1  - Selbig, Joachim
A1  - Kloska, Sebastian
T1  - Estimating mutual information using B-spline functions : an improved similarity measure for analysing gene expression data
N2  - Background: The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. Results: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: The significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. A C++ source code of our algorithm is available for non-commercial use from kloska@scienion.de upon request. Conclusion: The utilisation of mutual information as similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended
Y1  - 2004
SN  - 1471-2105
ER  - 
TY  - JOUR
A1  - Flöter, André
A1  - Nicolas, Jacques
A1  - Schaub, Torsten H.
A1  - Selbig, Joachim
T1  - Threshold extraction in metabolite concentration data
Y1  - 2003
UR  - http://www.cs.uni-potsdam.de/wv/pdfformat/floeterGCB2003.pdf
ER  - 
TY  - GEN
A1  - Szymanski, Jedrzej
A1  - Jozefczuk, Szymon
A1  - Nikoloski, Zoran
A1  - Selbig, Joachim
A1  - Nikiforova, Victoria
A1  - Catchpole, Gareth
A1  - Willmitzer, Lothar
T1  - Stability of metabolic correlations under changing environmental conditions in Escherichia coli : a systems approach
N2  - Background: Biological systems adapt to changing environments by reorganizing their cellular and physiological program with metabolites representing one important response level. Different stresses lead to both conserved and specific responses on the metabolite level which should be reflected in the underlying metabolic network. Methodology/Principal Findings: Starting from experimental data obtained by a GC-MS based high-throughput metabolic profiling technology we here develop an approach that: (1) extracts network representations from metabolic condition-dependent data by using pairwise correlations, (2) determines the sets of stable and condition-dependent correlations based on a combination of statistical significance and homogeneity tests, and (3) can identify metabolites related to the stress response, which goes beyond simple observations about the changes of metabolic concentrations. The approach was tested with Escherichia coli as a model organism observed under four different environmental stress conditions (cold stress, heat stress, oxidative stress, lactose diauxie) and control unperturbed conditions. By constructing the stable network component, which displays a scale free topology and small-world characteristics, we demonstrated that: (1) metabolite hubs in this reconstructed correlation networks are significantly enriched for those contained in biochemical networks such as EcoCyc, (2) particular components of the stable network are enriched for functionally related biochemical pathways, and (3) independently of the response scale, based on their importance in the reorganization of the correlation network a set of metabolites can be identified which represent hypothetical candidates for adjusting to a stress-specific response. Conclusions/Significance: Network-based tools allowed the identification of stress-dependent and general metabolic correlation networks. 
This correlation-network-based approach does not rely on major changes in concentration to identify metabolites important for stress adaptation, but rather on the changes in network properties with respect to metabolites. This should represent a useful complementary technique in addition to more classical approaches.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - paper 147 
KW  - Small-world networks
KW  - saccharomyces-cerevisiae
KW  - trehalose synthesis
KW  - gene-expression
KW  - stress-response
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-45253
ER  - 
TY  - GEN
A1  - Durek, Pawel
A1  - Schudoma, Christian
A1  - Weckwerth, Wolfram
A1  - Selbig, Joachim
A1  - Walther, Dirk
T1  - Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins
N2  - Background: Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. The experimental identification as well as computational prediction of phosphorylation sites (P-sites) has proved to be a challenging problem. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites. Results: We characterized the spatial context of phosphorylation sites and assessed its usability for improved phosphorylation site predictions. We identified 750 non-redundant, experimentally verified sites with three-dimensional (3D) structural information available in the protein data bank (PDB) and grouped them according to their respective kinase family. We studied the spatial distribution of amino acids around phosphorserines, phosphothreonines, and phosphotyrosines to extract signature 3D-profiles. Characteristic spatial distributions of amino acid residue types around phosphorylation sites were indeed discernable, especially when kinase-family-specific target sites were analyzed. To test the added value of using spatial information for the computational prediction of phosphorylation sites, Support Vector Machines were applied using both sequence as well as structural information. When compared to sequence-only based prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information. 
Conclusion: While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites. A web-based service (Phos3D) implementing the developed structure-based P-site prediction method has been made available at http://phos3d.mpimp-golm.mpg.de.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - paper 141 
KW  - Support vector machines
KW  - Microarray data
KW  - Docking interactions
KW  - Signal-transduction
KW  - Sequence alignment
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-45129
ER  - 
TY  - GEN
A1  - Rajasundaram, Dhivyaa
A1  - Selbig, Joachim
T1  - More effort — more results
BT  - recent advances in integrative ‘omics’ data analysis
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - The development of 'omics' technologies has progressed to address complex biological questions that underlie various plant functions thereby producing copious amounts of data. The need to assimilate large amounts of data into biologically meaningful interpretations has necessitated the development of statistical methods to integrate multidimensional information. Throughout this review, we provide examples of recent outcomes of 'omics' data integration together with an overview of available statistical methods and tools.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 923 
KW  - principal component
KW  - plant biology
KW  - package
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-442639
SN  - 1866-8372
IS  - 923
SP  - 57
EP  - 61
ER  - 
TY  - GEN
A1  - Köhl, Karin I.
A1  - Basler, Georg
A1  - Lüdemann, Alexander
A1  - Selbig, Joachim
A1  - Walther, Dirk
T1  - A plant resource and experiment management system based on the Golm Plant Database as a basic tool for omics research
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: For omics experiments, detailed characterisation of experimental material with respect to its genetic features, its cultivation history and its treatment history is a requirement for analyses by bioinformatics tools and for publication needs. Furthermore, meta-analysis of several experiments in systems biology based approaches make it necessary to store this information in a standardised manner, preferentially in relational databases. In the Golm Plant Database System, we devised a data management system based on a classical Laboratory Information Management System combined with web-based user interfaces for data entry and retrieval to collect this information in an academic environment.

Results: The database system contains modules representing the genetic features of the germplasm, the experimental conditions and the sampling details. In the germplasm module, genetically identical lines of biological material are generated by defined workflows, starting with the import workflow, followed by further workflows like genetic modification (transformation), vegetative or sexual reproduction. The latter workflows link lines and thus create pedigrees. For experiments, plant objects are generated from plant lines and united in so-called cultures, to which the cultivation conditions are linked. Materials and methods for each cultivation step are stored in a separate ACCESS database of the plant cultivation unit. For all cultures and thus every plant object, each cultivation site and the culture's arrival time at a site are logged by a barcode-scanner based system. Thus, for each plant object, all site-related parameters, e. g. automatically logged climate data, are available. These life history data and genetic information for the plant objects are linked to analytical results by the sampling module, which links sample components to plant object identifiers. This workflow uses controlled vocabulary for organs and treatments. Unique names generated by the system and barcode labels facilitate identification and management of the material. Web pages are provided as user interfaces to facilitate maintaining the system in an environment with many desktop computers and a rapidly changing user community. Web based search tools are the basis for joint use of the material by all researchers of the institute.

Conclusion: The Golm Plant Database system, which is based on a relational database, collects the genetic and environmental information on plant material during its production or experimental use at the Max-Planck-Institute of Molecular Plant Physiology. It thus provides information according to the MIAME standard for the component 'Sample' in a highly standardised format. The Plant Database system thus facilitates collaborative work and allows efficient queries in data analysis for systems biology research.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 830 
KW  - microarray data
KW  - arabidopsis
KW  - information
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-427595
IS  - 830
ER  - 
TY  - GEN
A1  - Dworschak, Steve
A1  - Grell, Susanne
A1  - Nikiforova, Victoria J.
A1  - Schaub, Torsten H.
A1  - Selbig, Joachim
T1  - Modeling biological networks by action languages via answer set programming
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - We describe an approach to modeling biological networks by action languages via answer set programming. To this end, we propose an action language for modeling biological networks, building on previous work by Baral et al. We introduce its syntax and semantics along with a translation into answer set programming, an efficient Boolean Constraint Programming Paradigm. Finally, we describe one of its applications, namely, the sulfur starvation response-pathway of the model plant Arabidopsis thaliana and sketch the functionality of our system and its usage.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 843 
KW  - biological network model
KW  - action language
KW  - answer set programming
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-429846
SN  - 1866-8372
IS  - 843
ER  - 
TY  - GEN
A1  - Repsilber, Dirk
A1  - Kern, Sabine
A1  - Telaar, Anna
A1  - Walzl, Gerhard
A1  - Black, Gillian F.
A1  - Selbig, Joachim
A1  - Parida, Shreemanta K.
A1  - Kaufmann, Stefan H. E.
A1  - Jacobsen, Marc
T1  - Biomarker discovery in heterogeneous tissue samples
BT  - taking the in-silico deconfounding approach
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues.

Results: Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available.

Conclusions: The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 854 
KW  - differential gene expression
KW  - quantile normalization
KW  - heterogeneous tissue
KW  - gene expression matrix
KW  - homogeneous cell population
KW  - selection
KW  - microdissection
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-429343
SN  - 1866-8372
IS  - 854
ER  - 
TY  - JOUR
A1  - Sulpice, Ronan
A1  - Nikoloski, Zoran
A1  - Tschoep, Hendrik
A1  - Antonio, Carla
A1  - Kleessen, Sabrina
A1  - Larhlimi, Abdelhalim
A1  - Selbig, Joachim
A1  - Ishihara, Hirofumi
A1  - Gibon, Yves
A1  - Fernie, Alisdair R.
A1  - Stitt, Mark
T1  - Impact of the Carbon and Nitrogen Supply on Relationships and Connectivity between Metabolism and Biomass in a Broad Panel of Arabidopsis Accessions
JF  - Plant physiology : an international journal devoted to physiology, biochemistry, cellular and molecular biology, biophysics and environmental biology of plants
N2  - Natural genetic diversity provides a powerful tool to study the complex interrelationship between metabolism and growth. Profiling of metabolic traits combined with network-based and statistical analyses allow the comparison of conditions and identification of sets of traits that predict biomass. However, it often remains unclear why a particular set of metabolites is linked with biomass and to what extent the predictive model is applicable beyond a particular growth condition. A panel of 97 genetically diverse Arabidopsis (Arabidopsis thaliana) accessions was grown in near-optimal carbon and nitrogen supply, restricted carbon supply, and restricted nitrogen supply and analyzed for biomass and 54 metabolic traits. Correlation-based metabolic networks were generated from the genotype-dependent variation in each condition to reveal sets of metabolites that show coordinated changes across accessions. The networks were largely specific for a single growth condition. Partial least squares regression from metabolic traits allowed prediction of biomass within and, slightly more weakly, across conditions (cross-validated Pearson correlations in the range of 0.27-0.58 and 0.21-0.51 and P values in the range of <0.001-<0.13 and <0.001-<0.023, respectively). Metabolic traits that correlate with growth or have a high weighting in the partial least squares regression were mainly condition specific and often related to the resource that restricts growth under that condition. Linear mixed-model analysis using the combined metabolic traits from all growth conditions as an input indicated that inclusion of random effects for the conditions improves predictions of biomass. Thus, robust prediction of biomass across a range of conditions requires condition-specific measurement of metabolic traits to take account of environment-dependent changes of the underlying networks.
Y1  - 2013
U6  - https://doi.org/10.1104/pp.112.210104
SN  - 0032-0889
SN  - 1532-2548
VL  - 162
IS  - 1
SP  - 347
EP  - 363
PB  - American Society of Plant Physiologists
CY  - Rockville
ER  - 
TY  - JOUR
A1  - Timmer, Marco
A1  - Theiss, Hans
A1  - Jürchott, Kathrin
A1  - Ries, Christian
A1  - Paron, Igor
A1  - Franz, W.
A1  - Selbig, Joachim
A1  - Guo, Ketai
A1  - Tonn, Jörg
A1  - Schichor, Christian
T1  - Stromal-Derived Factor 1a (Sdf-1a), a Homing Factor for Mesenchymal Progenitor Cells, Is Elevated in Tumor Tissue and Plasma of Glioma Patients
N2  - Malignant gliomas are a fatal disease lacking sufficient possibilities for early diagnosis and chemical markers to detect remission or relapse. The recruitment of progenitor cells such as mesenchymal stem cells (MSC) is a main feature of gliomas. Stromal cell-derived factor-1 (SDF-1), a chemokine produced in glioma cell lines, enhances migration in MSC and has been associated with cell survival and apoptosis in gliomas. Therefore, this study was performed to evaluate (i) whether SDF-1 and its receptors are expressed in human malignant gliomas in situ and (ii) if SDF-1 might potentially play a role in recruiting MSCs into human glioma. In glioblastoma tissue, immunohistochemistry revealed that SDF-1 and its receptor CXCR4 are expressed in regions of angiogenesis and necrosis, and qPCR showed that SDF-1 is elevated. Public expression data indicated that CXCR4 was upregulated. The latter data also illustrate that SDF-1 could be up- or downregulated in glioma compared to normal brain in a transcript-specific manner. In plasma, SDF-1 is elevated in glioma patients. The level is reduced by both dexamethasone intake and surgery. Dexamethasone also decreased SDF-1 production in cells in vitro. The undirected migration of human MSC (hMSC) was not enhanced by the addition of SDF-1. However, SDF-1 stimulated directed invasion of hMSC in a dose-dependent manner. Taken together, we show that SDF-1 is a potent chemoattractant of progenitor cells such as hMSCs and that its expression is elevated in glioma tissue, which results in elevated SDF-1 levels in the patient's plasma samples with concomitant decrease after tumor resection. The fact that elevated SDF-1 plasma levels are significantly decreased after tumor surgery could be a first hint that SDF-1 might act as tumor marker for malignant gliomas in order to detect disease progression or remission, respectively.
Y1  - 2010
UR  - http://neuro-oncology.oxfordjournals.org/
SN  - 1522-8517
ER  - 
TY  - JOUR
A1  - Nikoloski, Zoran
A1  - May, Patrick
A1  - Selbig, Joachim
T1  - A new network model explains the evolution of plant-specific metabolic networks
Y1  - 2009
UR  - http://www.sciencedirect.com/science/journal/10956433
U6  - https://doi.org/10.1016/j.cbpa.2009.04.567
SN  - 1095-6433
ER  - 
TY  - JOUR
A1  - Bulik, Sascha
A1  - Grimbs, Sergio
A1  - Huthmacher, Carola
A1  - Selbig, Joachim
A1  - Holzhutter, Hermann G.
T1  - Kinetic hybrid models composed of mechanistic and simplified enzymatic rate laws : a promising method for speeding up the kinetic modelling of complex metabolic networks
N2  - Kinetic modelling of complex metabolic networks - a central goal of computational systems biology - is currently hampered by the lack of reliable rate equations for the majority of the underlying biochemical reactions and membrane transporters. On the basis of biochemically substantiated evidence that metabolic control is exerted by a narrow set of key regulatory enzymes, we propose here a hybrid modelling approach in which only the central regulatory enzymes are described by detailed mechanistic rate equations, and the majority of enzymes are approximated by simplified (nonmechanistic) rate equations (e.g. mass action, LinLog, Michaelis-Menten and power law) capturing only a few basic kinetic features and hence containing only a small number of parameters to be experimentally determined. To check the reliability of this approach, we have applied it to two different metabolic networks, the energy and redox metabolism of red blood cells, and the purine metabolism of hepatocytes, using in both cases available comprehensive mechanistic models as reference standards. Identification of the central regulatory enzymes was performed by employing only information on network topology and the metabolic data for a single reference state of the network [Grimbs S, Selbig J, Bulik S, Holzhutter HG & Steuer R (2007) Mol Syst Biol 3, 146, doi:10.1038/msb4100186]. Calculations of stationary and temporary states under various physiological challenges demonstrate the good performance of the hybrid models. We propose the hybrid modelling approach as a means to speed up the development of reliable kinetic models for complex metabolic networks.
Y1  - 2009
UR  - http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291742-4658
U6  - https://doi.org/10.1111/j.1742-4658.2008.06784.x
SN  - 1742-464X
ER  - 
TY  - JOUR
A1  - Schichor, Christian
A1  - Albrecht, Valerie
A1  - Korte, Benjamin
A1  - Buchner, Alexander
A1  - Riesenberg, Rainer
A1  - Mysliwietz, Josef
A1  - Paron, Igor
A1  - Motaln, Helena
A1  - Turnsek, Tamara Lah
A1  - Juerchott, Kathrin
A1  - Selbig, Joachim
A1  - Tonn, Jörg-Christian
T1  - Mesenchymal stem cells and glioma cells form a structural as well as a functional syncytium in vitro
JF  - Experimental neurology
N2  - The interaction of human mesenchymal stem cells (hMSCs) and tumor cells has been investigated in various contexts. HMSCs are considered as cellular treatment vectors based on their capacity to migrate towards a malignant lesion. However, concerns about unpredictable behavior of transplanted hMSCs are accumulating. In malignant gliomas, the recruitment mechanism is driven by glioma-secreted factors which lead to accumulation of both tissue-specific stem cells and bone marrow-derived hMSCs within the tumor. The aim of the present work was to study specific cellular interactions between hMSCs and glioma cells in vitro. We show that glioma cells as well as hMSCs differentially express connexins and that they interact via gap-junctional coupling. Besides this so-called functional syncytium formation, we also provide evidence of cell fusion events (structural syncytium). These complex cellular interactions led to an enhanced migration and altered proliferation of both tumor and mesenchymal stem cell types in vitro. The presented work shows that glioma cells display signs of functional as well as structural syncytium formation with hMSCs in vitro. The described cellular phenomena provide new insight into the complexity of interaction patterns between tumor cells and host cells. Based on these findings, further studies are warranted to define the impact of a functional or structural syncytium formation on malignant tumors and cell based therapies in vivo.
KW  - Mesenchymal stem cell
KW  - Glioma
KW  - Syncytium
KW  - Gap junction
KW  - Fusion
Y1  - 2012
U6  - https://doi.org/10.1016/j.expneurol.2011.12.033
SN  - 0014-4886
VL  - 234
IS  - 1
SP  - 208
EP  - 219
PB  - Elsevier
CY  - San Diego
ER  - 
TY  - JOUR
A1  - Basler, Georg
A1  - Ebenhoeh, Oliver
A1  - Selbig, Joachim
A1  - Nikoloski, Zoran
T1  - Mass-balanced randomization of metabolic networks
JF  - Bioinformatics
N2  - Motivation: Network-centered studies in systems biology attempt to integrate the topological properties of biological networks with experimental data in order to make predictions and posit hypotheses. For any topology-based prediction, it is necessary to first assess the significance of the analyzed property in a biologically meaningful context. Therefore, devising network null models, carefully tailored to the topological and biochemical constraints imposed on the network, remains an important computational problem.
 Results: We first review the shortcomings of the existing generic sampling scheme, switch randomization, and explain its unsuitability for application to metabolic networks. We then devise a novel polynomial-time algorithm for randomizing metabolic networks under the (bio)chemical constraint of mass balance. The tractability of our method follows from the concept of mass equivalence classes, defined on the representation of compounds in the vector space over chemical elements. We finally demonstrate the uniformity of the proposed method on seven genome-scale metabolic networks, and empirically validate the theoretical findings. The proposed method allows a biologically meaningful estimation of significance for metabolic network properties.
Y1  - 2011
U6  - https://doi.org/10.1093/bioinformatics/btr145
SN  - 1367-4803
VL  - 27
IS  - 10
SP  - 1397
EP  - 1403
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Höhenwarter, Wolfgang
A1  - Larhlimi, Abdelhalim
A1  - Hummel, Jan
A1  - Egelhofer, Volker
A1  - Selbig, Joachim
A1  - van Dongen, Joost T.
A1  - Wienkoop, Stefanie
A1  - Weckwerth, Wolfram
T1  - MAPA distinguishes genotype-specific variability of highly similar regulatory protein isoforms in potato tuber
JF  - Journal of proteome research
N2  - Mass Accuracy Precursor Alignment is a fast and flexible method for comparative proteome analysis that allows the comparison of unprecedented numbers of shotgun proteomics analyses on a personal computer in a matter of hours. We compared 183 LC-MS analyses and more than 2 million MS/MS spectra and could define and separate the proteomic phenotypes of field grown tubers of 12 tetraploid cultivars of the crop plant Solanum tuberosum. Protein isoforms of patatin as well as other major gene families such as lipoxygenase and cysteine protease inhibitor that regulate tuber development were found to be the primary source of variability between the cultivars. This suggests that differentially expressed protein isoforms modulate genotype specific tuber development and the plant phenotype. We properly assigned the measured abundance of tryptic peptides to different protein isoforms that share extensive stretches of primary structure and thus inferred their abundance. Peptides unique to different protein isoforms were used to classify the remaining peptides assigned to the entire subset of isoforms based on a common abundance profile using multivariate statistical procedures. We identified nearly 4000 proteins, which we used for quantitative functional annotation, making this the most extensive study of the tuber proteome to date.
KW  - comparative proteomics
KW  - mass accuracy
KW  - protein isoforms
KW  - potato tuber
KW  - lipoxygenase
KW  - protease inhibitor
KW  - phenotype
KW  - genetic variability
Y1  - 2011
U6  - https://doi.org/10.1021/pr101109a
SN  - 1535-3893
VL  - 10
IS  - 7
SP  - 2979
EP  - 2991
PB  - American Chemical Society
CY  - Washington
ER  - 
TY  - JOUR
A1  - Juerchott, Kathrin
A1  - Guo, Ke-Tai
A1  - Catchpole, Gareth
A1  - Feher, Kristen
A1  - Willmitzer, Lothar
A1  - Schichor, Christian
A1  - Selbig, Joachim
T1  - Comparison of metabolite profiles in U87 glioma cells and mesenchymal stem cells
JF  - Biosystems : journal of biological and information processing sciences
N2  - Gas chromatography-mass spectrometry (GC-MS) profiles were generated from U87 glioma cells and human mesenchymal stem cells (hMSC). 37 metabolites representing glycolysis intermediates, TCA cycle metabolites, amino acids and lipids were selected for a detailed analysis. The concentrations of these metabolites were compared and Pearson correlation coefficients were used to calculate the relationship between pairs of metabolites. Metabolite profiles and correlation patterns differ significantly between the two cell lines. These profiles can be considered as a signature of the underlying biochemical system and provide snap-shots of the metabolism in mesenchymal stem cells and tumor cells.
KW  - Metabolite profiles
KW  - Correlation networks
KW  - U87 glioma cells
KW  - Human mesenchymal stem cells
Y1  - 2011
U6  - https://doi.org/10.1016/j.biosystems.2011.05.005
SN  - 0303-2647
VL  - 105
IS  - 2
SP  - 130
EP  - 139
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Childs, Dorothee
A1  - Grimbs, Sergio
A1  - Selbig, Joachim
T1  - Refined elasticity sampling for Monte Carlo-based identification of stabilizing network patterns
JF  - Bioinformatics
N2  - Motivation: Structural kinetic modelling (SKM) is a framework to analyse whether a metabolic steady state remains stable under perturbation, without requiring detailed knowledge about individual rate equations. It provides a representation of the system's Jacobian matrix that depends solely on the network structure, steady state measurements, and the elasticities at the steady state. For a measured steady state, stability criteria can be derived by generating a large number of SKMs with randomly sampled elasticities and evaluating the resulting Jacobian matrices. The elasticity space can be analysed statistically in order to detect network positions that contribute significantly to the perturbation response. Here, we extend this approach by examining the kinetic feasibility of the elasticity combinations created during Monte Carlo sampling.
 Results: Using a set of small example systems, we show that the majority of sampled SKMs would yield negative kinetic parameters if they were translated back into kinetic models. To overcome this problem, a simple criterion is formulated that mitigates such infeasible models. After evaluating the small example pathways, the methodology was used to study two steady states of the neuronal TCA cycle and the intrinsic mechanisms responsible for their stability or instability. The findings of the statistical elasticity analysis confirm that several elasticities are jointly coordinated to control stability and that the main source for potential instabilities are mutations in the enzyme alpha-ketoglutarate dehydrogenase.
Y1  - 2015
U6  - https://doi.org/10.1093/bioinformatics/btv243
SN  - 1367-4803
SN  - 1460-2059
VL  - 31
IS  - 12
SP  - 214
EP  - 220
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Girbig, Dorothee
A1  - Grimbs, Sergio
A1  - Selbig, Joachim
T1  - Systematic analysis of stability patterns in plant primary metabolism
JF  - PLoS one
N2  - Metabolic networks are characterized by complex interactions and regulatory mechanisms between many individual components. These interactions determine whether a steady state is stable to perturbations. Structural kinetic modeling (SKM) is a framework to analyze the stability of metabolic steady states that allows the study of the system Jacobian without requiring detailed knowledge about individual rate equations. Stability criteria can be derived by generating a large number of structural kinetic models (SK-models) with randomly sampled parameter sets and evaluating the resulting Jacobian matrices. Until now, SKM experiments applied univariate tests to detect the network components with the largest influence on stability. In this work, we present an extended SKM approach relying on supervised machine learning to detect patterns of enzyme-metabolite interactions that act together in an orchestrated manner to ensure stability. We demonstrate its application on a detailed SK-model of the Calvin-Benson cycle and connected pathways. The identified stability patterns are highly complex reflecting that changes in dynamic properties depend on concerted interactions between several network components. In total, we find more patterns that reliably ensure stability than patterns ensuring instability. This shows that the design of this system is strongly targeted towards maintaining stability. We also investigate the effect of allosteric regulators revealing that the tendency to stability is significantly increased by including experimentally determined regulatory mechanisms that have not yet been integrated into existing kinetic models.
Y1  - 2012
U6  - https://doi.org/10.1371/journal.pone.0034686
SN  - 1932-6203
VL  - 7
IS  - 4
PB  - PLoS
CY  - San Francisco
ER  - 
TY  - JOUR
A1  - Larhlimi, Abdelhalim
A1  - David, Laszlo
A1  - Selbig, Joachim
A1  - Bockmayr, Alexander
T1  - F2C2: a fast tool for the computation of flux coupling in genome-scale metabolic networks
JF  - BMC bioinformatics
N2  - Background: Flux coupling analysis (FCA) has become a useful tool in the constraint-based analysis of genome-scale metabolic networks. FCA allows detecting dependencies between reaction fluxes of metabolic networks at steady-state. On the one hand, this can help in the curation of reconstructed metabolic networks by verifying whether the coupling between reactions is in agreement with the experimental findings. On the other hand, FCA can aid in defining intervention strategies to knock out target reactions.
 Results: We present a new method F2C2 for FCA, which is orders of magnitude faster than previous approaches. As a consequence, FCA of genome-scale metabolic networks can now be performed in a routine manner.
 Conclusions: We propose F2C2 as a fast tool for the computation of flux coupling in genome-scale metabolic networks. F2C2 is freely available for non-commercial use at https://sourceforge.net/projects/f2c2/files/.
Y1  - 2012
U6  - https://doi.org/10.1186/1471-2105-13-57
SN  - 1471-2105
VL  - 13
PB  - BioMed Central
CY  - London
ER  - 
TY  - JOUR
A1  - Rajasundaram, Dhivyaa
A1  - Runavot, Jean-Luc
A1  - Guo, Xiaoyuan
A1  - Willats, William G. T.
A1  - Meulewaeter, Frank
A1  - Selbig, Joachim
T1  - Understanding the relationship between cotton fiber properties and non-cellulosic cell wall polysaccharides
JF  - PLoS one
N2  - A detailed knowledge of cell wall heterogeneity and complexity is crucial for understanding plant growth and development. One key challenge is to establish links between polysaccharide-rich cell walls and their phenotypic characteristics. It is of particular interest for some plant material, like cotton fibers, which are of both biological and industrial importance. To this end, we attempted to study cotton fiber characteristics together with glycan arrays using regression based approaches. Taking advantage of the comprehensive microarray polymer profiling technique (CoMPP), 32 cotton lines from different cotton species were studied. The glycan array was generated by sequential extraction of cell wall polysaccharides from mature cotton fibers and screening samples against eleven extensively characterized cell wall probes. Also, phenotypic characteristics of cotton fibers such as length, strength, elongation and micronaire were measured. The relationship between the two datasets was established in an integrative manner using linear regression methods. In the conducted analysis, we demonstrated the usefulness of regression based approaches in establishing a relationship between glycan measurements and phenotypic traits. In addition, the analysis also identified specific polysaccharides which may play a major role during fiber development for the final fiber characteristics. Three different regression methods identified a negative correlation between micronaire and the xyloglucan and homogalacturonan probes. Moreover, homogalacturonan and callose were shown to be significant predictors for fiber length. The role of these polysaccharides was already pointed out in previous cell wall elongation studies. Additional relationships were predicted for fiber strength and elongation which will need further experimental validation.
Y1  - 2014
U6  - https://doi.org/10.1371/journal.pone.0112168
SN  - 1932-6203
VL  - 9
IS  - 11
PB  - PLoS
CY  - San Francisco
ER  - 
TY  - JOUR
A1  - Kanzleiter, Timo
A1  - Jaehnert, Markus
A1  - Schulze, Gunnar
A1  - Selbig, Joachim
A1  - Hallahan, Nicole
A1  - Schwenk, Robert Wolfgang
A1  - Schürmann, Annette
T1  - Exercise training alters DNA methylation patterns in genes related to muscle growth and differentiation in mice
JF  - American journal of physiology : Endocrinology and metabolism
N2  - The adaptive response of skeletal muscle to exercise training is tightly controlled and therefore requires transcriptional regulation. DNA methylation is an epigenetic mechanism known to modulate gene expression, but its contribution to exercise-induced adaptations in skeletal muscle is not well studied. Here, we describe a genome-wide analysis of DNA methylation in muscle of trained mice (n = 3). Compared with sedentary controls, 2,762 genes exhibited differentially methylated CpGs (P < 0.05, meth diff >5%, coverage > 10) in their putative promoter regions. Alignment with gene expression data (n = 6) revealed 200 genes with a negative correlation between methylation and expression changes in response to exercise training. The majority of these genes were related to muscle growth and differentiation, and a minor fraction involved in metabolic regulation. Among the candidates were genes that regulate the expression of myogenic regulatory factors (Plexin A2) as well as genes that participate in muscle hypertrophy (Igfbp4) and motor neuron innervation (Dok7). Interestingly, a transcription factor binding site enrichment study discovered significantly enriched occurrence of CpG methylation in the binding sites of the myogenic regulatory factors MyoD and myogenin. These findings suggest that DNA methylation is involved in the regulation of muscle adaptation to regular exercise training.
KW  - DNA methylation
KW  - regular exercise training
KW  - muscle development
Y1  - 2015
U6  - https://doi.org/10.1152/ajpendo.00289.2014
SN  - 0193-1849
SN  - 1522-1555
VL  - 308
IS  - 10
SP  - E912
EP  - E920
PB  - American Physiological Society
CY  - Bethesda
ER  - 
TY  - JOUR
A1  - Andorf, Sandra
A1  - Meyer, Rhonda C.
A1  - Selbig, Joachim
A1  - Altmann, Thomas
A1  - Repsilber, Dirk
T1  - Integration of a systems biological network analysis and QTL results for biomass heterosis in Arabidopsis thaliana
JF  - PLoS one
N2  - To contribute to a further insight into heterosis we applied an integrative analysis to a systems biological network approach and a quantitative genetics analysis towards biomass heterosis in early Arabidopsis thaliana development. The study was performed on the parental accessions C24 and Col-0 and the reciprocal crosses. In an over-representation analysis it was tested if the overlap between the resulting gene lists of the two approaches is significantly larger than expected by chance. Top ranked genes in the results list of the systems biological analysis were significantly over-represented in the heterotic QTL candidate regions for either hybrid as well as regarding mid-parent and best-parent heterosis. This suggests that not only a few but rather several genes that influence biomass heterosis are located within each heterotic QTL region. Furthermore, the overlapping resulting genes of the two integrated approaches were particularly enriched in biomass related pathways. A chromosome-wise over-representation analysis gave rise to the hypothesis that chromosomes number 2 and 4 probably carry a majority of the genes involved in biomass heterosis in the early development of Arabidopsis thaliana.
Y1  - 2012
U6  - https://doi.org/10.1371/journal.pone.0049951
SN  - 1932-6203
VL  - 7
IS  - 11
PB  - PLoS
CY  - San Francisco
ER  - 
TY  - JOUR
A1  - Nikoloski, Zoran
A1  - Grimbs, Sergio
A1  - Klie, Sebastian
A1  - Selbig, Joachim
T1  - Complexity of automated gene annotation
JF  - Biosystems : journal of biological and information processing sciences
N2  - Integration of high-throughput data with functional annotation by graph-theoretic methods has been postulated as a promising way to unravel the function of unannotated genes. Here, we first review the existing graph-theoretic approaches for automated gene function annotation and classify them into two categories with respect to their relation to two instances of transductive learning on networks - with dynamic costs and with constant costs - depending on whether or not the ontological relationship between functional terms is employed. The determined categories allow us to characterize the computational complexity of the existing approaches and establish the relation to classical graph-theoretic problems, such as bisection and multiway cut. In addition, our results point out that the ontological form of the structured functional knowledge does not lower the complexity of the transductive learning with dynamic costs - one of the key problems in modern systems biology. The NP-hardness of automated gene annotation renders the development of heuristic or approximation algorithms a priority for additional research.
KW  - Complexity
KW  - Gene function prediction
KW  - External structural measures
KW  - Transductive learning
Y1  - 2011
U6  - https://doi.org/10.1016/j.biosystems.2010.12.003
SN  - 0303-2647
VL  - 104
IS  - 1
SP  - 1
EP  - 8
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Guo, Ke-Tai
A1  - Jürchott, Kathrin
A1  - Fu, Peng
A1  - Selbig, Joachim
A1  - Eigenbrod, Sabina
A1  - Tonn, Jörg-Christian
A1  - Schichor, Christian
T1  - Isolation and characterization of bone marrow-derived progenitor cells from malignant gliomas
JF  - Anticancer research : international journal of cancer research and treatment
N2  - Background: Malignant gliomas are highly-vascularised tumours. Neoangiogenesis is a crucial factor in the malignant behaviour of tumour and prognosis of patients. Several mechanisms are suspected to lead to neoangiogenesis; one of them is the recruitment of multipotent progenitor cells towards the tumour. Factors such as Vascular endothelial growth factor-A (VEGF-A) were described to recruit bone marrow-derived endothelial progenitor cells (EPCs) to the glioma stroma and vasculature. Little is known about isolating EPCs from normal or malignant tissues. Materials and Methods: In this study, we addressed the topic of characterization of tumour-isolated EPCs and re-defined the clonal relationship between EPCs and hematopoietic stem cells (HSCs) in gliomas. We first checked public gene expression data of glioma for putative marker expression, pointing towards a prevalence of EPCs and HSCs in glioma. Immunohistochemical staining of glioma tissue confirmed the higher expression of these progenitor markers in glioma tissue. EPCs and HSCs were consequently isolated and characterized at the phenotypic and functional levels. We applied a new isolation method, for the first time, to specimens from patients with high grade glioma including seven grade IV glioblastoma, five grade III astrocytoma, and three grade III oligoastrocytoma. Results: In all samples, we were able to isolate the tumour-derived EPCs, which were positive for characteristic markers: CD31, CD34 and VEGFR2. The EPCs formed capillary networks in vitro and had the ability to take up acetylated low-density lipoprotein. Glioma-derived HSCs were positive for CD34 and CD45, but they were unable to form a capillary network in vitro. These findings on tumour-derived EPCs/HSCs were in concordance with the results derived from peripheral blood of healthy volunteers.
Conclusion: In our study, we established a new method for EPC/HSC isolation from human gliomas, defined the contribution of EPCs and HSCs to the tumour tissue, and highlighted the intense in vivo tumour host interaction.
KW  - Glioma
KW  - endothelial progenitor cell
KW  - hematopoietic stem cell
Y1  - 2012
SN  - 0250-7005
VL  - 32
IS  - 11
SP  - 4971
EP  - 4982
PB  - International Institute of Anticancer Research
CY  - Athens
ER  - 
TY  - JOUR
A1  - Grimbs, Sergio
A1  - Arnold, Anne
A1  - Koseska, Aneta
A1  - Kurths, Jürgen
A1  - Selbig, Joachim
A1  - Nikoloski, Zoran
T1  - Spatiotemporal dynamics of the Calvin cycle : multistationarity and symmetry breaking instabilities
JF  - Biosystems : journal of biological and information processing sciences
N2  - The possibility of controlling the Calvin cycle has paramount implications for increasing the production of biomass. Multistationarity, as a dynamical feature of systems, is the first obvious candidate whose control could find biotechnological applications. Here we set out to resolve the debate on the multistationarity of the Calvin cycle. Unlike the existing simulation-based studies, our approach is based on a sound mathematical framework, chemical reaction network theory and algebraic geometry, which yields provable results for the investigated model of the Calvin cycle in which we embed a hierarchy of realistic kinetic laws. Our theoretical findings demonstrate that there is a possibility for multistationarity resulting from two sources, homogeneous and inhomogeneous instabilities, which partially settles the debate on multistability of the Calvin cycle. In addition, our tractable analytical treatment of the bifurcation parameters can be employed in the design of validation experiments.
KW  - Multistationarity
KW  - Calvin cycle
KW  - Algebraic geometry
KW  - Bifurcation parameters
KW  - Biomass
Y1  - 2011
U6  - https://doi.org/10.1016/j.biosystems.2010.10.015
SN  - 0303-2647
VL  - 103
IS  - 2
SP  - 212
EP  - 223
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Gruden, Kristina
A1  - Hren, Matjaz
A1  - Herman, Ana
A1  - Blejec, Andrej
A1  - Albrecht, Tanja
A1  - Selbig, Joachim
A1  - Bauer, Christian G.
A1  - Schuchardt, Johannes
A1  - Or-Guil, Michal
A1  - Zupancic, Klemen
A1  - Svajger, Urban
A1  - Stabuc, Borut
A1  - Ihan, Alojz
A1  - Kopitar, Andreja Natasa
A1  - Ravnikar, Maja
A1  - Knezevic, Miomir
A1  - Rozman, Primoz
A1  - Jeras, Matjaz
T1  - A "Crossomics" study analysing variability of different components in peripheral blood of healthy caucasoid individuals
JF  - PLoS one
N2  - Background: Different immunotherapy approaches for the treatment of cancer and autoimmune diseases are being developed and tested in clinical studies worldwide. Their resulting complex experimental data should be properly evaluated; therefore, reliable normal healthy control baseline values are indispensable.
 Methodology/Principal Findings: To assess intra- and inter-individual variability of various biomarkers, peripheral blood of 16 age and gender equilibrated healthy volunteers was sampled on 3 different days within a period of one month. Complex "crossomics" analyses of plasma metabolite profiles, antibody concentrations and lymphocyte subset counts as well as whole genome expression profiling in CD4+ T and NK cells were performed. Some of the observed age, gender and BMI dependences are in agreement with the existing knowledge, like negative correlation between sex hormone levels and age or BMI related increase in lipids and soluble sugars. Thus we can assume that the distribution of all 39,743 analysed markers is well representing the normal Caucasoid population. All lymphocyte subsets, 20% of metabolites and less than 10% of genes, were identified as highly variable in our dataset.
 Conclusions/Significance: Our study shows that the intra-individual variability was at least two-fold lower compared to the inter-individual one at all investigated levels, showing the importance of personalised medicine approach from yet another perspective.
Y1  - 2012
U6  - https://doi.org/10.1371/journal.pone.0028761
SN  - 1932-6203
VL  - 7
IS  - 1
PB  - PLoS
CY  - San Francisco
ER  - 
TY  - JOUR
A1  - Ryngajllo, Malgorzata
A1  - Childs, Liam H.
A1  - Lohse, Marc
A1  - Giorgi, Federico M.
A1  - Lude, Anja
A1  - Selbig, Joachim
A1  - Usadel, Björn
T1  - SLocX : predicting subcellular localization of Arabidopsis proteins leveraging gene expression data
JF  - Frontiers in plant science
N2  - Despite the growing volume of experimentally validated knowledge about the subcellular localization of plant proteins, a well performing in silico prediction tool is still a necessity. Existing tools, which employ information derived from protein sequence alone, offer limited accuracy and/or rely on full sequence availability. We explored whether gene expression profiling data can be harnessed to enhance prediction performance. To achieve this, we trained several support vector machines to predict the subcellular localization of Arabidopsis thaliana proteins using sequence derived information, expression behavior, or a combination of these data and compared their predictive performance through a cross-validation test. We show that gene expression carries information about the subcellular localization not available in sequence information, yielding dramatic benefits for plastid localization prediction, and some notable improvements for other compartments such as the mitochondrion, the Golgi, and the plasma membrane. Based on these results, we constructed a novel subcellular localization prediction engine, SLocX, combining gene expression profiling data with protein sequence-based information. We then validated the results of this engine using an independent test set of annotated proteins and a transient expression of GFP fusion proteins. Here, we present the prediction framework and a website of predicted localizations for Arabidopsis. The relatively good accuracy of our prediction engine, even in cases where only partial protein sequence is available (e.g., in sequences lacking the N-terminal region), offers a promising opportunity for similar application to non-sequenced or poorly annotated plant species.
Although the prediction scope of our method is currently limited by the availability of expression information on the ATH1 array, we believe that the advances in measuring gene expression technology will make our method applicable for all Arabidopsis proteins.
KW  - subcellular localization
KW  - support vector machine
KW  - prediction
KW  - gene expression
Y1  - 2011
U6  - https://doi.org/10.3389/fpls.2011.00043
SN  - 1664-462X
VL  - 2
PB  - Frontiers Research Foundation
CY  - Lausanne
ER  - 
TY  - JOUR
A1  - Catchpole, Gareth
A1  - Platzer, Alexander
A1  - Weikert, Cornelia
A1  - Kempkensteffen, Carsten
A1  - Johannsen, Manfred
A1  - Krause, Hans
A1  - Jung, Klaus
A1  - Miller, Kurt
A1  - Willmitzer, Lothar
A1  - Selbig, Joachim
A1  - Weikert, Steffen
T1  - Metabolic profiling reveals key metabolic features of renal cell carcinoma
JF  - Journal of cellular and molecular medicine : a journal of translational medicine
N2  - Recent evidence suggests that metabolic changes play a pivotal role in the biology of cancer and in particular renal cell carcinoma (RCC). Here, a global metabolite profiling approach was applied to characterize the metabolite pool of RCC and normal renal tissue. Advanced decision tree models were applied to characterize the metabolic signature of RCC and to explore features of metastasized tumours. The findings were validated in a second independent dataset. Vitamin E derivatives and metabolites of glucose, fatty acid, and inositol phosphate metabolism determined the metabolic profile of RCC. alpha-tocopherol, hippuric acid, myoinositol, fructose-1-phosphate and glucose-1-phosphate contributed most to the tumour/normal discrimination and all showed pronounced concentration changes in RCC. The identified metabolic profile was characterized by a low recognition error of only 5% for tumour versus normal samples. Data on metastasized tumours suggested a key role for metabolic pathways involving arachidonic acid, free fatty acids, proline, uracil and the tricarboxylic acid cycle. These results illustrate the potential of mass spectroscopy based metabolomics in conjunction with sophisticated data analysis methods to uncover the metabolic phenotype of cancer. Differentially regulated metabolites, such as vitamin E compounds, hippuric acid and myoinositol, provide leads for the characterization of novel pathways in RCC.
KW  - kidney cancer
KW  - metabolism
KW  - metabolomics
KW  - metastasis
Y1  - 2011
U6  - https://doi.org/10.1111/j.1582-4934.2009.00939.x
SN  - 1582-1838
VL  - 15
IS  - 1
SP  - 109
EP  - 118
PB  - Wiley-Blackwell
CY  - Malden
ER  - 
TY  - JOUR
A1  - Klie, Sebastian
A1  - Nikoloski, Zoran
A1  - Selbig, Joachim
T1  - Biological cluster evaluation for gene function prediction
JF  - Journal of computational biology
N2  - Recent advances in high-throughput omics techniques render it possible to decode the function of genes by using the "guilt-by-association" principle on biologically meaningful clusters of gene expression data. However, the existing frameworks for biological evaluation of gene clusters are hindered by two bottleneck issues: (1) the choice for the number of clusters, and (2) the external measures which do not take into consideration the structure of the analyzed data and the ontology of the existing biological knowledge. Here, we address the identified bottlenecks by developing a novel framework that allows not only for biological evaluation of gene expression clusters based on existing structured knowledge, but also for prediction of putative gene functions. The proposed framework facilitates propagation of statistical significance at each of the following steps: (1) estimating the number of clusters, (2) evaluating the clusters in terms of novel external structural measures, (3) selecting an optimal clustering algorithm, and (4) predicting gene functions. The framework also includes a method for evaluation of gene clusters based on the structure of the employed ontology. Moreover, our method for obtaining a probabilistic range for the number of clusters is demonstrated to be valid on synthetic data and available gene expression profiles from Saccharomyces cerevisiae. Finally, we propose a network-based approach for gene function prediction which relies on the clustering of optimal score and the employed ontology. Our approach effectively predicts gene function on the Saccharomyces cerevisiae data set and is also employed to obtain putative gene functions for an Arabidopsis thaliana data set.
KW  - algorithms
KW  - biochemical networks
KW  - combinatorics
KW  - computational molecular biology
KW  - databases
KW  - functional genomics
KW  - gene expression
KW  - NP-completeness
Y1  - 2014
U6  - https://doi.org/10.1089/cmb.2009.0129
SN  - 1066-5277
SN  - 1557-8666
VL  - 21
IS  - 6
SP  - 428
EP  - 445
PB  - Liebert
CY  - New Rochelle
ER  - 
TY  - JOUR
A1  - Larhlimi, Abdelhalim
A1  - Basler, Georg
A1  - Grimbs, Sergio
A1  - Selbig, Joachim
A1  - Nikoloski, Zoran
T1  - Stoichiometric capacitance reveals the theoretical capabilities of metabolic networks
JF  - Bioinformatics
N2  - Motivation: Metabolic engineering aims at modulating the capabilities of metabolic networks by changing the activity of biochemical reactions. The existing constraint-based approaches for metabolic engineering have proven useful, but are limited only to reactions catalogued in various pathway databases.
 Results: We consider the alternative of designing synthetic strategies which can be used not only to characterize the maximum theoretically possible product yield but also to engineer networks with optimal conversion capability by using a suitable biochemically feasible reaction called 'stoichiometric capacitance'. In addition, we provide a theoretical solution for decomposing a given stoichiometric capacitance over a set of known enzymatic reactions. We determine the stoichiometric capacitance for genome-scale metabolic networks of 10 organisms from different kingdoms of life and examine its implications for the alterations in flux variability patterns. Our empirical findings suggest that the theoretical capacity of metabolic networks comes at a cost of dramatic system changes.
Y1  - 2012
U6  - https://doi.org/10.1093/bioinformatics/bts381
SN  - 1367-4803
VL  - 28
IS  - 18
SP  - I502
EP  - I508
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Larhlimi, Abdelhalim
A1  - Blachon, Sylvain
A1  - Selbig, Joachim
A1  - Nikoloski, Zoran
T1  - Robustness of metabolic networks : a review of existing definitions
JF  - Biosystems : journal of biological and information processing sciences
N2  - Describing the determinants of robustness of biological systems has become one of the central questions in systems biology. Despite the increasing research efforts, it has proven difficult to arrive at a unifying definition for this important concept. We argue that this is due to the multifaceted nature of the concept of robustness and the possibility to formally capture it at different levels of systemic formalisms (e.g., topology and dynamic behavior). Here we provide a comprehensive review of the existing definitions of robustness pertaining to metabolic networks. As kinetic approaches have been excellently reviewed elsewhere, we focus on definitions of robustness proposed within graph-theoretic and constraint-based formalisms.
KW  - Robustness
KW  - Metabolic networks
KW  - Graph theory
KW  - Constraint-based approaches
Y1  - 2011
U6  - https://doi.org/10.1016/j.biosystems.2011.06.002
SN  - 0303-2647
VL  - 106
IS  - 1
SP  - 1
EP  - 8
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Bordag, Natalie
A1  - Klie, Sebastian
A1  - Jürchott, Kathrin
A1  - Vierheller, Janine
A1  - Schiewe, Hajo
A1  - Albrecht, Valerie
A1  - Tonn, Jörg-Christian
A1  - Schwartz, Christoph
A1  - Schichor, Christian
A1  - Selbig, Joachim
T1  - Glucocorticoid (dexamethasone)-induced metabolome changes in healthy males suggest prediction of response and side effects
JF  - Scientific reports
N2  - Glucocorticoids are indispensable anti-inflammatory and decongestant drugs with high prevalence of use at ~0.9% of the adult population. Better holistic insights into glucocorticoid-induced changes are crucial for effective use as concurrent medication and management of adverse effects. The profiles of 214 metabolites from plasma of 20 male healthy volunteers were recorded prior to and after ingestion of a single dose of 4 mg dexamethasone (+20 mg pantoprazole). Samples were drawn at three predefined time points per day: seven untreated (day 1 midday - day 3 midday) and four treated (day 3 evening - day 4 evening) per volunteer. Statistical analysis revealed tremendous impact of dexamethasone on the metabolome with 150 of 214 metabolites being significantly deregulated on at least one time point after treatment (ANOVA, Benjamini-Hochberg corrected, q < 0.05). Inter-person variability was high and remained uninfluenced by treatment. The clearly visible circadian rhythm prior to treatment was almost completely suppressed and deregulated by dexamethasone. The results draw a holistic picture of the severe metabolic deregulation induced by single-dose, short-term glucocorticoid application. The observed metabolic changes suggest a potential for early detection of severe side effects, raising hope for personalized early countermeasures increasing quality of life and reducing health care costs.
Y1  - 2015
U6  - https://doi.org/10.1038/srep15954
SN  - 2045-2322
VL  - 5
PB  - Nature Publ. Group
CY  - London
ER  - 
TY  - JOUR
A1  - Girbig, Dorothee
A1  - Selbig, Joachim
A1  - Grimbs, Sergio
T1  - A MATLAB toolbox for structural kinetic modeling
JF  - Bioinformatics
N2  - Structural kinetic modeling (SKM) enables the analysis of dynamical properties of metabolic networks solely based on topological information and experimental data. Current SKM-based experiments are hampered by the time-intensive process of assigning model parameters and choosing appropriate sampling intervals for Monte Carlo experiments. We introduce a toolbox for the automatic and efficient construction and evaluation of structural kinetic models (SK models). Quantitative and qualitative analyses of network stability properties are performed in an automated manner. We illustrate the model building and analysis process in detailed example scripts that provide toolbox implementations of previously published literature models.
Y1  - 2012
U6  - https://doi.org/10.1093/bioinformatics/bts473
SN  - 1367-4803
VL  - 28
IS  - 19
SP  - 2546
EP  - 2547
PB  - Oxford Univ. Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Basler, Georg
A1  - Grimbs, Sergio
A1  - Ebenhöh, Oliver
A1  - Selbig, Joachim
A1  - Nikoloski, Zoran
T1  - Evolutionary significance of metabolic network properties
JF  - Interface : journal of the Royal Society
N2  - Complex networks have been successfully employed to represent different levels of biological systems, ranging from gene regulation to protein-protein interactions and metabolism. Network-based research has mainly focused on identifying unifying structural properties, such as small average path length, large clustering coefficient, heavy-tail degree distribution and hierarchical organization, viewed as requirements for efficient and robust system architectures. However, for biological networks, it is unclear to what extent these properties reflect the evolutionary history of the represented systems. Here, we show that the salient structural properties of six metabolic networks from all kingdoms of life may be inherently related to the evolution and functional organization of metabolism by employing network randomization under mass balance constraints. Contrary to the results from the common Markov-chain switching algorithm, our findings suggest the evolutionary importance of the small-world hypothesis as a fundamental design principle of complex networks. The approach may help us to determine the biologically meaningful properties that result from evolutionary pressure imposed on metabolism, such as the global impact of local reaction knockouts. Moreover, the approach can be applied to test to what extent novel structural properties can be used to draw biologically meaningful hypotheses or predictions from structure alone.
KW  - metabolic networks
KW  - significance
KW  - randomization
KW  - null model
KW  - centrality
Y1  - 2012
U6  - https://doi.org/10.1098/rsif.2011.0652
SN  - 1742-5689
VL  - 9
IS  - 71
SP  - 1168
EP  - 1176
PB  - Royal Society
CY  - London
ER  - 
TY  - JOUR
A1  - Guo, Ke-Tai
A1  - Fu, Peng
A1  - Jürchott, Kathrin
A1  - Motaln, Helena
A1  - Selbig, Joachim
A1  - Lah, Tamara T.
A1  - Tonn, Jörg-Christian
A1  - Schichor, Christian
T1  - The expression of Wnt-inhibitor DKK1 (Dickkopf 1) is determined by intercellular crosstalk and hypoxia in human malignant gliomas
JF  - Journal of cancer research and clinical oncology : official organ of the Deutsche Krebsgesellschaft
N2  - Objective: Wnt signalling pathways regulate proliferation, motility and survival in a variety of human cell types. The Dickkopf 1 (DKK1) gene codes for a secreted Wnt inhibitory factor. It functions as a tumour suppressor gene in breast cancer and as a pro-apoptotic factor in glioma cells. In this study, we aimed to demonstrate whether the different expression of DKK1 in human glioma-derived cells is dependent on microenvironmental factors like hypoxia and regulated by the intercellular crosstalk with bone-marrow-derived mesenchymal stem cells (bmMSCs).
 Methods: Glioma cell line U87-MG, three cell lines from human glioblastoma grade IV (glioma-derived mesenchymal stem cells) and three bmMSCs were selected for the experiment. The expression of DKK1 in cell lines under normoxic/hypoxic environment or co-culture condition was measured using real-time PCR and enzyme-linked immunosorbent assay. The effect of DKK1 on cell migration and proliferation was evaluated by in vitro wound healing assays and sulphorhodamine assays, respectively.
 Results: Glioma-derived cells U87-MG displayed lower DKK1 expression compared with bmMSCs. Hypoxia led to an overexpression of DKK1 in bmMSCs and U87-MG when compared to normoxic environment, whereas co-culture of U87-MG with bmMSCs induced the expression of DKK1 in both cell lines. Exogenous recombinant DKK1 inhibited cell migration on all cell lines, but did not have a significant effect on cell proliferation of bmMSCs and glioma cell lines.
 Conclusion: In this study, we showed for the first time that the expression of DKK1 was hypoxia dependent in human malignant glioma cell lines. The induction of DKK1 by intercellular crosstalk or hypoxia stimuli sheds light on the intense adaptation of glial tumour cells to environmental alterations.
KW  - Dickkopf 1
KW  - Intercellular crosstalk
KW  - Hypoxia
KW  - Gliomas
Y1  - 2014
U6  - https://doi.org/10.1007/s00432-014-1642-2
SN  - 0171-5216
SN  - 1432-1335
VL  - 140
IS  - 8
SP  - 1261
EP  - 1270
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Jargosch, M.
A1  - Kroeger, S.
A1  - Gralinska, E.
A1  - Klotz, Ulrike
A1  - Fang, Z.
A1  - Chen, W.
A1  - Leser, U.
A1  - Selbig, Joachim
A1  - Groth, Detlef
A1  - Baumgrass, Ria
T1  - Data integration for identification of important transcription factors of STAT6-mediated cell fate decisions
JF  - Genetics and molecular research
N2  - Data integration has become a useful strategy for uncovering new insights into complex biological networks. We studied whether this approach can help to delineate the signal transducer and activator of transcription 6 (STAT6)-mediated transcriptional network driving T helper (Th) 2 cell fate decisions. To this end, we performed an integrative analysis of publicly available RNA-seq data of Stat6-knockout mouse studies together with STAT6 ChIP-seq data and our own gene expression time series data during Th2 cell differentiation. We focused on transcription factors (TFs), cytokines, and cytokine receptors and delineated 59 positively and 41 negatively STAT6-regulated genes, which were used to construct a transcriptional network around STAT6. The network illustrates that important and well-known TFs for Th2 cell differentiation are positively regulated by STAT6 and act either as activators for Th2 cells (e.g., Gata3, Atf3, Satb1, Nfil3, Maf, and Pparg) or as suppressors for other Th cell subpopulations such as Th1 (e.g., Ar), Th17 (e.g., Etv6), or iTreg (e.g., Stat3 and Hif1a) cells. Moreover, our approach reveals 11 TFs (e.g., Atf5, Creb3l2, and Asb2) with unknown functions in Th cell differentiation. This fact together with the observed enrichment of asthma risk genes among those regulated by STAT6 underlines the potential value of the data integration strategy used here. Thus, our results clearly support the opinion that data integration is a useful tool to delineate complex physiological processes.
KW  - Data integration
KW  - Th2 cells
KW  - Gene regulatory network
KW  - STAT6
KW  - Transcription factors
Y1  - 2016
U6  - https://doi.org/10.4238/gmr.15028493
SN  - 1676-5680
VL  - 15
PB  - FUNPEC
CY  - Ribeirao Preto
ER  - 
TY  - JOUR
A1  - Edlich-Muth, Christian
A1  - Muraya, Moses M.
A1  - Altmann, Thomas
A1  - Selbig, Joachim
T1  - Phenomic prediction of maize hybrids
JF  - Biosystems : journal of biological and information processing sciences
N2  - Phenomic experiments are carried out in large-scale plant phenotyping facilities that acquire a large number of pictures of hundreds of plants simultaneously. With the aid of automated image processing, the data are converted into genotype-feature matrices that cover many consecutive days of development. Here, we explore the possibility of predicting the biomass of the fully grown plant from early developmental stage image-derived features. We performed phenomic experiments on 195 inbred and 382 hybrid maize varieties and followed their progress from 16 days after sowing (DAS) to 48 DAS with 129 image-derived features. By applying sparse regression methods, we show that 73% of the variance in hybrid fresh weight of fully-grown plants is explained by about 20 features at the three-leaf-stage or earlier. Dry weight prediction explained over 90% of the variance. When phenomic features of parental inbred lines were used as predictors of hybrid biomass, the proportion of variance explained was 42 and 45%, for fresh weight and dry weight models consisting of 35 and 36 features, respectively. These models were very robust, showing only a small amount of variation in performance over the time scale of the experiment. We also examined mid-parent heterosis in phenomic features. Feature heterosis displayed a large degree of variance which resulted in prediction performance that was less robust than models of either parental or hybrid predictors. Our results show that phenomic prediction is a viable alternative to genomic and metabolic prediction of hybrid performance. In particular, the utility of early-stage parental lines is very encouraging.
KW  - Hybrid prediction
KW  - LASSO
KW  - Regression
KW  - Maize
KW  - Phenomics
Y1  - 2016
U6  - https://doi.org/10.1016/j.biosystems.2016.05.008
SN  - 0303-2647
SN  - 1872-8324
VL  - 146
SP  - 102
EP  - 109
PB  - Elsevier
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Rajasundaram, Dhivyaa
A1  - Selbig, Joachim
T1  - analysis
JF  - Current opinion in plant biology
N2  - The development of ‘omics’ technologies has progressed to address complex biological questions that underlie various plant functions thereby producing copious amounts of data. The need to assimilate large amounts of data into biologically meaningful interpretations has necessitated the development of statistical methods to integrate multidimensional information. Throughout this review, we provide examples of recent outcomes of ‘omics’ data integration together with an overview of available statistical methods and tools.
Y1  - 2016
U6  - https://doi.org/10.1016/j.pbi.2015.12.010
SN  - 1369-5266
SN  - 1879-0356
VL  - 30
SP  - 57
EP  - 61
PB  - Elsevier
CY  - London
ER  - 
TY  - JOUR
A1  - Steuer, Ralf
A1  - Gross, Thilo
A1  - Selbig, Joachim
A1  - Blasius, Bernd
T1  - Structural kinetic modeling of metabolic networks
JF  - Proceedings of the National Academy of Sciences of the United States of America
N2  - To develop and investigate detailed mathematical models of metabolic processes is one of the primary challenges in systems biology. However, despite considerable advance in the topological analysis of metabolic networks, kinetic modeling is still often severely hampered by inadequate knowledge of the enzyme-kinetic rate laws and their associated parameter values. Here we propose a method that aims to give a quantitative account of the dynamical capabilities of a metabolic system, without requiring any explicit information about the functional form of the rate equations. Our approach is based on constructing a local linear model at each point in parameter space, such that each element of the model is either directly experimentally accessible or amenable to a straightforward biochemical interpretation. This ensemble of local linear models, encompassing all possible explicit kinetic models, then allows for a statistical exploration of the comprehensive parameter space. The method is exemplified on two paradigmatic metabolic systems: the glycolytic pathway of yeast and a realistic-scale representation of the photosynthetic Calvin cycle.
KW  - systems biology
KW  - computational biochemistry
KW  - metabolomics
KW  - metabolic regulation
KW  - biological robustness
Y1  - 2006
U6  - https://doi.org/10.1073/pnas.0600013103
SN  - 0027-8424
SN  - 1091-6490
VL  - 103
IS  - 32
SP  - 11868
EP  - 11873
PB  - National Academy of Sciences
CY  - Washington
ER  - 
TY  - GEN
A1  - Gärtner, Tanja
A1  - Steinfath, Matthias
A1  - Andorf, Sandra
A1  - Lisec, Jan
A1  - Meyer, Rhonda C.
A1  - Altmann, Thomas
A1  - Willmitzer, Lothar
A1  - Selbig, Joachim
T1  - Improved heterosis prediction by combining information on DNA- and metabolic markers
N2  - Background: Hybrids represent a cornerstone in the success story of breeding programs. The fundamental principle underlying this success is the phenomenon of hybrid vigour, or heterosis. It describes an advantage of the offspring as compared to the two parental lines with respect to parameters such as growth and resistance against abiotic or biotic stress. Dominance, overdominance or epistasis based models are commonly used explanations. Conclusion/Significance: The heterosis level is clearly a function of the combination of the parents used for offspring production. This results in a major challenge for plant breeders, as usually several thousand combinations of parents have to be tested for identifying the best combinations. Thus, any approach to reliably predict heterosis levels based on properties of the parental lines would be highly beneficial for plant breeding. Methodology/Principal Findings: Recently, genetic data have been used to predict heterosis. Here we show that a combination of parental genetic and metabolic markers, identified via feature selection and minimum-description-length based regression methods, significantly improves the prediction of biomass heterosis in resulting offspring. These findings will help furthering our understanding of the molecular basis of heterosis, revealing, for instance, the presence of nonlinear genotype-phenotype relationships. In addition, we describe a possible approach for accelerated selection in plant breeding.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - paper 142 
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-45132
ER  - 
TY  - JOUR
A1  - Lisec, Jan
A1  - Römisch-Margl, Lilla
A1  - Nikoloski, Zoran
A1  - Piepho, Hans-Peter
A1  - Giavalisco, Patrick
A1  - Selbig, Joachim
A1  - Gierl, Alfons
A1  - Willmitzer, Lothar
T1  - Corn hybrids display lower metabolite variability and complex metabolite inheritance patterns
JF  - The plant journal
N2  - We conducted a comparative analysis of the root metabolome of six parental maize inbred lines and their 14 corresponding hybrids showing fresh weight heterosis. We demonstrated that the metabolic profiles not only exhibit distinct features for each hybrid line compared with its parental lines, but also separate reciprocal hybrids. Reconstructed metabolic networks, based on robust correlations between metabolic profiles, display a higher network density in most hybrids as compared with the corresponding inbred lines. With respect to metabolite level inheritance, additive, dominant and overdominant patterns are observed with no specific overrepresentation. Despite the observed complexity of the inheritance pattern, for the majority of metabolites the variance observed in all 14 hybrids is lower compared with inbred lines. Deviations of metabolite levels from the average levels of the hybrids correlate negatively with biomass, which could be applied for developing predictors of hybrid performance based on characteristics of metabolite patterns.
KW  - heterosis
KW  - Zea mays
KW  - metabolomics
Y1  - 2011
U6  - https://doi.org/10.1111/j.1365-313X.2011.04689.x
SN  - 0960-7412
VL  - 68
IS  - 2
SP  - 326
EP  - 336
PB  - Wiley-Blackwell
CY  - Malden
ER  - 
TY  - JOUR
A1  - Meyer, Rhonda C.
A1  - Witucka-Wall, Hanna
A1  - Becher, Martina
A1  - Blacha, Anna Maria
A1  - Boudichevskaia, Anastassia
A1  - Dörmann, Peter
A1  - Fiehn, Oliver
A1  - Friedel, Svetlana
A1  - von Korff, Maria
A1  - Lisec, Jan
A1  - Melzer, Michael
A1  - Repsilber, Dirk
A1  - Schmidt, Renate
A1  - Scholz, Matthias
A1  - Selbig, Joachim
A1  - Willmitzer, Lothar
A1  - Altmann, Thomas
T1  - Heterosis manifestation during early Arabidopsis seedling development is characterized by intermediate gene expression and enhanced metabolic activity in the hybrids
JF  - The plant journal
N2  - Heterosis-associated cellular and molecular processes were analyzed in seeds and seedlings of Arabidopsis thaliana accessions Col-0 and C24 and their heterotic hybrids. Microscopic examination revealed no advantages in terms of hybrid mature embryo organ sizes or cell numbers. Increased cotyledon sizes were detectable 4 days after sowing. Growth heterosis results from elevated cell sizes and numbers, and is well established at 10 days after sowing. The relative growth rates of hybrid seedlings were most enhanced between 3 and 4 days after sowing. Global metabolite profiling and targeted fatty acid analysis revealed maternal inheritance patterns for a large proportion of metabolites in the very early stages. During developmental progression, the distribution shifts to dominant, intermediate and heterotic patterns, with most changes occurring between 4 and 6 days after sowing. The highest incidence of heterotic patterns coincides with establishment of size differences at 4 days after sowing. In contrast, overall transcript patterns at 4, 6 and 10 days after sowing are characterized by intermediate to dominant patterns, with parental transcript levels showing the largest differences. Overall, the results suggest that, during early developmental stages, intermediate gene expression and higher metabolic activity in the hybrids compared to the parents lead to better resource efficiency, and therefore enhanced performance in the hybrids.
KW  - heterosis
KW  - seedlings
KW  - metabolite profiling
KW  - transcript profiling
KW  - morphological analysis
KW  - Arabidopsis thaliana
KW  - biomass
Y1  - 2012
U6  - https://doi.org/10.1111/j.1365-313X.2012.05021.x
SN  - 0960-7412
VL  - 71
IS  - 4
SP  - 669
EP  - 683
PB  - Wiley-Blackwell
CY  - Hoboken
ER  - 
TY  - JOUR
A1  - Feher, Kristen
A1  - Lisec, Jan
A1  - Roemisch-Margl, Lilla
A1  - Selbig, Joachim
A1  - Gierl, Alfons
A1  - Piepho, Hans-Peter
A1  - Nikoloski, Zoran
A1  - Willmitzer, Lothar
T1  - Deducing hybrid performance from parental metabolic profiles of young primary roots of maize by using a multivariate diallel approach
JF  - PLoS one
Y1  - 2014
U6  - https://doi.org/10.1371/journal.pone.0085435
SN  - 1932-6203
VL  - 9
IS  - 1
PB  - PLoS
CY  - San Francisco
ER  - 
TY  - GEN
A1  - Larhlimi, Abdelhalim
A1  - David, Laszlo
A1  - Selbig, Joachim
A1  - Bockmayr, Alexander
T1  - F2C2
BT  - a fast tool for the computation of flux coupling in genome-scale metabolic networks
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: Flux coupling analysis (FCA) has become a useful tool in the constraint-based analysis of genome-scale metabolic networks. FCA allows detecting dependencies between reaction fluxes of metabolic networks at steady-state. On the one hand, this can help in the curation of reconstructed metabolic networks by verifying whether the coupling between reactions is in agreement with the experimental findings. On the other hand, FCA can aid in defining intervention strategies to knock out target reactions.

Results: We present a new method F2C2 for FCA, which is orders of magnitude faster than previous approaches. As a consequence, FCA of genome-scale metabolic networks can now be performed in a routine manner.

Conclusions: We propose F2C2 as a fast tool for the computation of flux coupling in genome-scale metabolic networks. F2C2 is freely available for non-commercial use at https://sourceforge.net/projects/f2c2/files/.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 921 
KW  - balance analysis
KW  - reconstruction
KW  - pathways
KW  - models
KW  - metabolic network
KW  - couple reaction
KW  - reversible reaction
KW  - linear programming problem
KW  - coupling relationship
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-432431
SN  - 1866-8372
IS  - 921
ER  - 
TY  - GEN
A1  - Steinfath, Matthias
A1  - Gärtner, Tanja
A1  - Lisec, Jan
A1  - Meyer, Rhonda C.
A1  - Altmann, Thomas
A1  - Willmitzer, Lothar
A1  - Selbig, Joachim
T1  - Prediction of hybrid biomass in Arabidopsis thaliana by selected parental SNP and metabolic markers
T2  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - A recombinant inbred line (RIL) population, derived from two Arabidopsis thaliana accessions, and the corresponding testcrosses with these two original accessions were used for the development and validation of machine learning models to predict the biomass of hybrids. Genetic and metabolic information of the RILs served as predictors. Feature selection reduced the number of variables (genetic and metabolic markers) in the models by more than 80% without impairing the predictive power. Thus, potential biomarkers have been revealed. Metabolites were shown to bear information on inherited macroscopic phenotypes. This proof of concept could be interesting for breeders. The example population exhibits substantial mid-parent biomass heterosis. The results of feature selection could therefore be used to shed light on the origin of heterosis. In this respect, mainly dominance effects were detected.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1324 
KW  - Quantitative Trait Locus
KW  - feature selection
KW  - Partial Least Squares
KW  - recombinant inbred line
KW  - Quantitative Trait Locus analysis
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-431115
SN  - 1866-8372
IS  - 1324
ER  - 
TY  - JOUR
A1  - Riaño-Pachón, Diego Mauricio
A1  - Kleessen, Sabrina
A1  - Neigenfind, Jost
A1  - Durek, Pawel
A1  - Weber, Elke
A1  - Engelsberger, Wolfgang R.
A1  - Walther, Dirk
A1  - Selbig, Joachim
A1  - Schulze, Waltraud X.
A1  - Kersten, Birgit
T1  - Proteome-wide survey of phosphorylation patterns affected by nuclear DNA polymorphisms in Arabidopsis thaliana
JF  - BMC Genomics
N2  - Background: Protein phosphorylation is an important post-translational modification influencing many aspects of dynamic cellular behavior. Site-specific phosphorylation of amino acid residues serine, threonine, and tyrosine can have profound effects on protein structure, activity, stability, and interaction with other biomolecules. Phosphorylation sites can be affected in diverse ways in members of any species; one such way is through single nucleotide polymorphisms (SNPs). The availability of large numbers of experimentally identified phosphorylation sites, and of natural variation datasets in Arabidopsis thaliana prompted us to analyze the effect of non-synonymous SNPs (nsSNPs) on phosphorylation sites.

Results: From the analyses of 7,178 experimentally identified phosphorylation sites we found that: (i) Proteins with multiple phosphorylation sites occur more often than expected by chance. (ii) Phosphorylation hotspots show a preference to be located outside conserved domains. (iii) nsSNPs affected experimental phosphorylation sites as much as the corresponding non-phosphorylated amino acid residues. (iv) Losses of experimental phosphorylation sites by nsSNPs were identified in 86 A. thaliana proteins, among them receptor proteins were overrepresented.

These results were confirmed by similar analyses of predicted phosphorylation sites in A. thaliana. In addition, predicted threonine phosphorylation sites showed a significant enrichment of nsSNPs towards asparagines and a significant depletion of the synonymous substitution. Proteins in which predicted phosphorylation sites were affected by nsSNPs (loss and gain), were determined to be mainly receptor proteins, stress response proteins and proteins involved in nucleotide and protein binding. Proteins involved in metabolism, catalytic activity and biosynthesis were less affected.

Conclusions: We analyzed more than 7,100 experimentally identified phosphorylation sites in almost 4,300 protein-coding loci in silico, thus constituting the largest phosphoproteomics dataset for A. thaliana available to date. Our findings suggest a relatively high variability in the presence or absence of phosphorylation sites between different natural accessions in receptor and other proteins involved in signal transduction. Elucidating the effect of phosphorylation sites affected by nsSNPs on adaptive responses represents an exciting research goal for the future.
KW  - Gene Ontology
KW  - Phosphorylation Site
KW  - phosphorylated amino acid
KW  - slim term
KW  - single nucleotide polymorphism mapping
Y1  - 2010
U6  - https://doi.org/10.1186/1471-2164-11-411
SN  - 1471-2164
VL  - 11
PB  - BioMed Central
CY  - London
ER  - 
TY  - JOUR
A1  - Meyer, Rhonda Christiane
A1  - Kusterer, Barbara
A1  - Lisec, Jan
A1  - Steinfath, Matthias
A1  - Becher, Martina
A1  - Scharr, Hanno
A1  - Melchinger, Albrecht E.
A1  - Selbig, Joachim
A1  - Schurr, Ulrich
A1  - Willmitzer, Lothar
A1  - Altmann, Thomas
T1  - QTL analysis of early stage heterosis for biomass in Arabidopsis
JF  - Theoretical and applied genetics
N2  - The main objective of this study was to identify genomic regions involved in biomass heterosis using QTL, generation means, and mode-of-inheritance classification analyses. In a modified North Carolina Design III we backcrossed 429 recombinant inbred line and 140 introgression line populations to the two parental accessions, C24 and Col-0, whose F1 hybrid exhibited 44% heterosis for biomass. Mid-parent heterosis in the RILs ranged from −31 to 99% for dry weight and from −58 to 143% for leaf area. We detected ten genomic positions involved in biomass heterosis at an early developmental stage, individually explaining between 2.4 and 15.7% of the phenotypic variation. While overdominant gene action was prevalent in heterotic QTL, our results suggest that a combination of dominance, overdominance and epistasis is involved in biomass heterosis in this Arabidopsis cross.
KW  - Quantitative Trait Locus
KW  - recombinant inbred line
KW  - Quantitative Trait Locus analysis
KW  - dominance effect
KW  - recombinant inbred line population
Y1  - 2009
U6  - https://doi.org/10.1007/s00122-009-1074-6
SN  - 1432-2242
SN  - 0040-5752
VL  - 120
IS  - 2
SP  - 227
EP  - 237
PB  - Springer Nature
CY  - Berlin
ER  - 
TY  - GEN
A1  - Riaño-Pachón, Diego Mauricio
A1  - Kleessen, Sabrina
A1  - Neigenfind, Jost
A1  - Durek, Pawel
A1  - Weber, Elke
A1  - Engelsberger, Wolfgang R.
A1  - Walther, Dirk
A1  - Selbig, Joachim
A1  - Schulze, Waltraud X.
A1  - Kersten, Birgit
T1  - Proteome-wide survey of phosphorylation patterns affected by nuclear DNA polymorphisms in Arabidopsis thaliana
T2  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: Protein phosphorylation is an important post-translational modification influencing many aspects of dynamic cellular behavior. Site-specific phosphorylation of amino acid residues serine, threonine, and tyrosine can have profound effects on protein structure, activity, stability, and interaction with other biomolecules. Phosphorylation sites can be affected in diverse ways in members of any species; one such way is through single nucleotide polymorphisms (SNPs). The availability of large numbers of experimentally identified phosphorylation sites, and of natural variation datasets in Arabidopsis thaliana prompted us to analyze the effect of non-synonymous SNPs (nsSNPs) on phosphorylation sites.

Results: From the analyses of 7,178 experimentally identified phosphorylation sites we found that: (i) Proteins with multiple phosphorylation sites occur more often than expected by chance. (ii) Phosphorylation hotspots show a preference to be located outside conserved domains. (iii) nsSNPs affected experimental phosphorylation sites as much as the corresponding non-phosphorylated amino acid residues. (iv) Losses of experimental phosphorylation sites by nsSNPs were identified in 86 A. thaliana proteins, among them receptor proteins were overrepresented.

These results were confirmed by similar analyses of predicted phosphorylation sites in A. thaliana. In addition, predicted threonine phosphorylation sites showed a significant enrichment of nsSNPs towards asparagines and a significant depletion of the synonymous substitution. Proteins in which predicted phosphorylation sites were affected by nsSNPs (loss and gain), were determined to be mainly receptor proteins, stress response proteins and proteins involved in nucleotide and protein binding. Proteins involved in metabolism, catalytic activity and biosynthesis were less affected.

Conclusions: We analyzed more than 7,100 experimentally identified phosphorylation sites in almost 4,300 protein-coding loci in silico, thus constituting the largest phosphoproteomics dataset for A. thaliana available to date. Our findings suggest a relatively high variability in the presence or absence of phosphorylation sites between different natural accessions in receptor and other proteins involved in signal transduction. Elucidating the effect of phosphorylation sites affected by nsSNPs on adaptive responses represents an exciting research goal for the future.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1328 
KW  - Gene Ontology
KW  - Phosphorylation Site
KW  - phosphorylated amino acid
KW  - slim term
KW  - single nucleotide polymorphism mapping
Y1  - 2010
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-431181
SN  - 1866-8372
IS  - 1328
ER  - 
TY  - GEN
A1  - Meyer, Rhonda Christiane
A1  - Kusterer, Barbara
A1  - Lisec, Jan
A1  - Steinfath, Matthias
A1  - Becher, Martina
A1  - Scharr, Hanno
A1  - Melchinger, Albrecht E.
A1  - Selbig, Joachim
A1  - Schurr, Ulrich
A1  - Willmitzer, Lothar
A1  - Altmann, Thomas
T1  - QTL analysis of early stage heterosis for biomass in Arabidopsis
T2  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - The main objective of this study was to identify genomic regions involved in biomass heterosis using QTL, generation means, and mode-of-inheritance classification analyses. In a modified North Carolina Design III we backcrossed 429 recombinant inbred line and 140 introgression line populations to the two parental accessions, C24 and Col-0, whose F1 hybrid exhibited 44% heterosis for biomass. Mid-parent heterosis in the RILs ranged from −31 to 99% for dry weight and from −58 to 143% for leaf area. We detected ten genomic positions involved in biomass heterosis at an early developmental stage, individually explaining between 2.4 and 15.7% of the phenotypic variation. While overdominant gene action was prevalent in heterotic QTL, our results suggest that a combination of dominance, overdominance and epistasis is involved in biomass heterosis in this Arabidopsis cross.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1330 
KW  - Quantitative Trait Locus
KW  - recombinant inbred line
KW  - Quantitative Trait Locus analysis
KW  - dominance effect
KW  - recombinant inbred line population
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-431272
SN  - 1866-8372
IS  - 1330
ER  - 
TY  - JOUR
A1  - Steinfath, Matthias
A1  - Gärtner, Tanja
A1  - Lisec, Jan
A1  - Meyer, Rhonda Christiane
A1  - Altmann, Thomas
A1  - Willmitzer, Lothar
A1  - Selbig, Joachim
T1  - Prediction of hybrid biomass in Arabidopsis thaliana by selected parental SNP and metabolic markers
JF  - Theoretical and applied genetics : TAG ; international journal of plant breeding research
N2  - A recombinant inbred line (RIL) population, derived from two Arabidopsis thaliana accessions, and the corresponding testcrosses with these two original accessions were used for the development and validation of machine learning models to predict the biomass of hybrids. Genetic and metabolic information of the RILs served as predictors. Feature selection reduced the number of variables (genetic and metabolic markers) in the models by more than 80% without impairing the predictive power. Thus, potential biomarkers have been revealed. Metabolites were shown to bear information on inherited macroscopic phenotypes. This proof of concept could be interesting for breeders. The example population exhibits substantial mid-parent biomass heterosis. The results of feature selection could therefore be used to shed light on the origin of heterosis. In this respect, mainly dominance effects were detected.
KW  - Quantitative Trait Locus
KW  - feature selection
KW  - Partial Least Squares
KW  - recombinant inbred line
KW  - Quantitative Trait Locus analysis
Y1  - 2009
U6  - https://doi.org/10.1007/s00122-009-1191-2
SN  - 0040-5752
SN  - 1432-2242
VL  - 120
SP  - 239
EP  - 247
PB  - Springer
CY  - Berlin
ER  - 
TY  - JOUR
A1  - Sulpice, Ronan
A1  - Pyl, Eva-Theresa
A1  - Ishihara, Hirofumi
A1  - Trenkamp, Sandra
A1  - Steinfath, Matthias
A1  - Witucka-Wall, Hanna
A1  - Gibon, Yves
A1  - Usadel, Björn
A1  - Poree, Fabien
A1  - Piques, Maria Conceicao
A1  - von Korff, Maria
A1  - Steinhauser, Marie Caroline
A1  - Keurentjes, Joost J. B.
A1  - Guenther, Manuela
A1  - Hoehne, Melanie
A1  - Selbig, Joachim
A1  - Fernie, Alisdair R.
A1  - Altmann, Thomas
A1  - Stitt, Mark
T1  - Starch as a major integrator in the regulation of plant growth
N2  - Rising demand for food and bioenergy makes it imperative to breed for increased crop yield. Vegetative plant growth could be driven by resource acquisition or developmental programs. Metabolite profiling in 94 Arabidopsis accessions revealed that biomass correlates negatively with many metabolites, especially starch. Starch accumulates in the light and is degraded at night to provide a sustained supply of carbon for growth. Multivariate analysis revealed that starch is an integrator of the overall metabolic response. We hypothesized that this reflects variation in a regulatory network that balances growth with the carbon supply. Transcript profiling in 21 accessions revealed coordinated changes of transcripts of more than 70 carbon-regulated genes and identified 2 genes (myo-inositol-1-phosphate synthase, a Kelch-domain protein) whose transcripts correlate with biomass. The impact of allelic variation at these 2 loci was shown by association mapping, identifying them as candidate lead genes with the potential to increase biomass production.
Y1  - 2009
UR  - http://www.pnas.org/
U6  - https://doi.org/10.1073/pnas.0903478106
SN  - 0027-8424
ER  - 
TY  - JOUR
A1  - Lisec, Jan
A1  - Steinfath, Matthias
A1  - Meyer, Rhonda C.
A1  - Selbig, Joachim
A1  - Melchinger, Albrecht E.
A1  - Willmitzer, Lothar
A1  - Altmann, Thomas
T1  - Identification of heterotic metabolite QTL in Arabidopsis thaliana RIL and IL populations
N2  - Two mapping populations of a cross between the Arabidopsis thaliana accessions Col-0 and C24 were cultivated and analyzed with respect to the levels of 181 metabolites to elucidate the biological phenomenon of heterosis at the metabolic level. The relative mid-parent heterosis in the F1 hybrids was <20% for most metabolic traits. The first mapping population consisting of 369 recombinant inbred lines (RILs) and their test cross progeny with both parents allowed us to determine the position and effect of 147 quantitative trait loci (QTL) for metabolite absolute mid-parent heterosis (aMPH). Furthermore, we identified 153 and 83 QTL for augmented additive (Z(1)) and dominance effects (Z(2)), respectively. We identified putative candidate genes for these QTL using the ARACYC database (http://www.arabidopsis.org/biocyc), and calculated the average degree of dominance, which was within the dominance and over-dominance range for most metabolites. Analyzing a second population of 41 introgression lines (ILs) and their test crosses with the recurrent parent, we identified 634 significant differences in metabolite levels. Nine per cent of these effects were classified as over-dominant, according to the mode of inheritance. A comparison of both approaches suggested epistasis as a major contributor to metabolite heterosis in Arabidopsis. A linear combination of metabolite levels was shown to significantly correlate with biomass heterosis (r = 0.62).
Y1  - 2009
UR  - http://www3.interscience.wiley.com/cgi-bin/issn?DESCRIPTOR=PRINTISSN&VALUE=0960-7412
U6  - https://doi.org/10.1111/j.1365-313X.2009.03910.x
SN  - 0960-7412
ER  - 
TY  - JOUR
A1  - Hartmann, Stefanie
A1  - Helm, Conrad
A1  - Nickel, Birgit
A1  - Meyer, Matthias
A1  - Struck, Torsten H.
A1  - Tiedemann, Ralph
A1  - Selbig, Joachim
A1  - Bleidorn, Christoph
T1  - Exploiting gene families for phylogenomic analysis of myzostomid transcriptome data
JF  - PLoS one
N2  - Background: In trying to understand the evolutionary relationships of organisms, the current flood of sequence data offers great opportunities, but also reveals new challenges with regard to data quality, the selection of data for subsequent analysis, and the automation of steps that were once done manually for single-gene analyses. Even though genome or transcriptome data is available for representatives of most bilaterian phyla, some enigmatic taxa still have an uncertain position in the animal tree of life. This is especially true for myzostomids, a group of symbiotic (or parasitic) protostomes that are either placed with annelids or flatworms.
Methodology: Based on similarity criteria, Illumina-based transcriptome sequences of one myzostomid were compared to protein sequences of one additional myzostomid and 29 reference metazoa and clustered into gene families. These families were then used to investigate the phylogenetic position of Myzostomida using different approaches: Alignments of 989 sequence families were concatenated, and the resulting superalignment was analyzed under a Maximum Likelihood criterion. We also used all 1,878 gene trees with at least one myzostomid sequence for a supertree approach: the individual gene trees were computed and then reconciled into a species tree using gene tree parsimony.
Conclusions: Superalignments require strictly orthologous genes, and both the gene selection and the widely varying amount of data available for different taxa in our dataset may cause anomalous placements and low bootstrap support. In contrast, gene tree parsimony is designed to accommodate multilocus gene families and therefore allows a much more comprehensive data set to be analyzed. Results of this supertree approach showed a well-resolved phylogeny, in which myzostomids were part of the annelid radiation, and major bilaterian taxa were found to be monophyletic.
Y1  - 2012
U6  - https://doi.org/10.1371/journal.pone.0029843
SN  - 1932-6203
VL  - 7
IS  - 1
PB  - PLoS
CY  - San Francisco
ER  - 
TY  - JOUR
A1  - Hill, Natascha
A1  - Leow, Alexander
A1  - Bleidorn, Christoph
A1  - Groth, Detlef
A1  - Tiedemann, Ralph
A1  - Selbig, Joachim
A1  - Hartmann, Stefanie
T1  - Analysis of phylogenetic signal in protostomial intron patterns using Mutual Information
JF  - Theory in biosciences
N2  - Many deep evolutionary divergences still remain unresolved, such as those among major taxa of the Lophotrochozoa. As alternative phylogenetic markers, the intron-exon structure of eukaryotic genomes and the patterns of absence and presence of spliceosomal introns appear to be promising. However, given the potential homoplasy of intron presence, the phylogenetic analysis of this data using standard evolutionary approaches has remained a challenge. Here, we used Mutual Information (MI) to estimate the phylogeny of Protostomia using gene structure data, and we compared these results with those obtained with Dollo Parsimony. Using full genome sequences from nine Metazoa, we identified 447 groups of orthologous sequences with 21,732 introns in 4,870 unique intron positions. We determined the shared absence and presence of introns in the corresponding sequence alignments and have made this data available in "IntronBase", a web-accessible and downloadable SQLite database. Our results obtained using Dollo Parsimony are obviously misled through systematic errors that arise from multiple intron loss events, but extensive filtering of data improved the quality of the estimated phylogenies. Mutual Information, in contrast, performs better with larger datasets, but at the same time it requires a complete data set, which is difficult to obtain for orthologs from a large number of taxa. Nevertheless, Mutual Information-based distances proved to be useful in analyzing this kind of data, also because the estimation of MI-based distances is independent of evolutionary models and therefore no pre-definitions of ancestral and derived character states are necessary.
KW  - Mutual Information
KW  - Evolution
KW  - Gene structure
Y1  - 2013
U6  - https://doi.org/10.1007/s12064-012-0173-0
SN  - 1431-7613
VL  - 132
IS  - 2
SP  - 93
EP  - 104
PB  - Springer
CY  - New York
ER  - 
TY  - BOOK
A1  - Hartmann, Stefanie
A1  - Selbig, Joachim
T1  - Introductory Bioinformatics
Y1  - 2009
SN  - 978-3-8370-5189-6
PB  - Books on Demand
CY  - Norderstedt
ER  -