publish.UP Search

Non-linear PCA: a missing data approach (2005)

Scholz, Matthias ; Kaplan, F. ; Guy, C. L. ; Kopka, Joachim ; Selbig, Joachim

Motivation: Visualizing and analysing the potential non-linear structure of a dataset is becoming an important task in molecular biology. This is even more challenging when the data have missing values. Results: Here, we propose an inverse model that performs non-linear principal component analysis (NLPCA) on incomplete datasets. Missing values are ignored while optimizing the model, but can be estimated afterwards. Results are shown for both artificial and experimental datasets. For data with non-linear structure, non-linear methods gave better estimates of the missing values than linear methods. Application: We applied this technique to a time course of metabolite data from a cold stress experiment on the model plant Arabidopsis thaliana, and could approximate the mapping function from any time point to the metabolite responses. Thus, the inverse NLPCA provides greatly improved information for better understanding the complex response to cold stress.
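The central idea can be sketched in NumPy: the component values are free inputs to a decoder network, and missing entries are left out of the error. This is a hypothetical minimal version, not the authors' implementation. The single non-linear component, the one tanh hidden layer, the plain gradient descent, the function name `inverse_nlpca` and its parameters are all assumptions. The synthetic curve in the usage part is illustrative only.

```python
import numpy as np

def inverse_nlpca(X, mask, n_hidden=5, lr=5e-4, n_iter=2000, seed=0):
    """Fit a one-component inverse (decoder-only) NLPCA sketch.

    X    : (n, d) data; entries where mask is False are never used.
    mask : (n, d) boolean, True where a value was observed.
    Returns component values Z, the full reconstruction Xhat, and the
    masked squared error before and after fitting.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    X0 = np.where(mask, X, 0.0)                # placeholders for missing entries
    Z = rng.normal(scale=0.1, size=(n, 1))     # component values are learned too
    W1 = rng.normal(scale=0.5, size=(1, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, d))
    b2 = np.zeros(d)

    def reconstruct():
        H = np.tanh(Z @ W1 + b1)
        return H, H @ W2 + b2

    def masked_error(Xhat):
        return 0.5 * np.sum(((Xhat - X0) * mask) ** 2)

    loss_before = masked_error(reconstruct()[1])
    for _ in range(n_iter):
        H, Xhat = reconstruct()
        E = (Xhat - X0) * mask                 # residuals on observed entries only
        dA = (E @ W2.T) * (1.0 - H ** 2)       # back-propagate through tanh
        dZ = dA @ W1.T                         # gradient w.r.t. the free inputs
        W2 -= lr * (H.T @ E)
        b2 -= lr * E.sum(axis=0)
        W1 -= lr * (Z.T @ dA)
        b1 -= lr * dA.sum(axis=0)
        Z -= lr * dZ
    Xhat = reconstruct()[1]
    return Z, Xhat, loss_before, masked_error(Xhat)

# Hypothetical data: a noisy 1-D curve in 3-D with about 10% of entries hidden
rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([t, t ** 2, np.sin(np.pi * t)]) + 0.05 * rng.normal(size=(200, 3))
mask = rng.random(X.shape) > 0.1
Z, Xhat, before, after = inverse_nlpca(X, mask)
X_imputed = np.where(mask, X, Xhat)            # estimates replace only missing values
```

Because only observed entries enter the error, the same fitted decoder yields estimates for the missing entries afterwards, matching the two-step use described in the abstract.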

Decision trees as a simple-to-use and reliable tool to identify individuals with impaired glucose metabolism or type 2 diabetes mellitus (2010)

Hische, Manuela ; Luis-Dominguez, Olga ; Pfeiffer, Andreas F. H. ; Schwarz, Peter E. ; Selbig, Joachim ; Spranger, Joachim

Objective: The prevalence of unknown impaired fasting glucose (IFG), impaired glucose tolerance (IGT), or type 2 diabetes mellitus (T2DM) is high. Numerous studies have demonstrated that IFG, IGT, or T2DM are associated with increased cardiovascular risk; therefore, an improved identification strategy would be desirable. The objective of this study was to create a simple and reliable tool to identify individuals with impaired glucose metabolism (IGM). Design and methods: A cohort of 1737 individuals (1055 controls, 682 with previously unknown IGM) was screened by 75 g oral glucose tolerance test (OGTT). Supervised machine learning was used to automatically generate decision trees to identify individuals with IGM. To evaluate the accuracy of identification, a tenfold cross-validation was performed. Resulting trees were subsequently re-evaluated in a second, independent cohort of 1998 individuals (1253 controls, 745 unknown IGM). Results: A clinical decision tree included age and systolic blood pressure (sensitivity 89.3%, specificity 37.4%, and positive predictive value (PPV) 48.0%), while a tree based on clinical and laboratory data included fasting glucose and systolic blood pressure (sensitivity 89.7%, specificity 54.6%, and PPV 56.2%). The inclusion of additional parameters did not improve test quality. The external validation approach confirmed the presented decision trees. Conclusion: We proposed a simple tool to identify individuals with existing IGM. From a practical perspective, fasting blood glucose and blood pressure should be measured regularly in all individuals presenting in outpatient clinics. An OGTT appears to be useful only if the subjects are older than 48 years or show abnormalities in fasting glucose or blood pressure.
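Sensitivity and specificity describe the tree itself, but the PPV also depends on how common IGM is in the screened population. The sketch below assumes the reported figures refer to the first cohort, where 682 of 1737 individuals had IGM. Under that assumption, Bayes' rule reproduces the clinical tree's PPV from its sensitivity and specificity. The helper name is illustrative.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(IGM | test positive) from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# First cohort: 682 of 1737 screened individuals had previously unknown IGM.
prevalence = 682 / 1737
ppv_clinical = positive_predictive_value(0.893, 0.374, prevalence)
print(f"{ppv_clinical:.1%}")  # prints 48.0%
```

The same function shows that the PPV would fall in a population where undiagnosed IGM is rarer, even with unchanged sensitivity and specificity.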

Discovering plant metabolic biomarkers for phenotype prediction using an untargeted approach (2010)

Steinfath, Matthias ; Strehmel, Nadine ; Peters, Rolf ; Schauer, Nicolas ; Groth, Detlef ; Hummel, Jan ; Steup, Martin ; Selbig, Joachim ; Kopka, Joachim ; Geigenberger, Peter ; Dongen, Joost T. van

Biomarkers are used to predict phenotypical properties before these features become apparent and are therefore valuable tools for both fundamental and applied research. Diagnostic biomarkers were discovered in medicine many decades ago and are now commonly applied. While this is routine in medicine, it is surprising that this approach has never been investigated in agriculture. Up to now, the prediction of phenotypes in plants was based on growing plants and assaying the organs of interest in a time-intensive process. For the first time, we demonstrate in this study the application of metabolomics to predict agronomically important phenotypes of a crop plant that was grown in different environments. Our procedure combines established techniques for screening a large number of metabolites in parallel in an untargeted manner with machine learning methods. Using this combination of metabolomics and biomathematical tools, we identified metabolites that can be used as biomarkers to improve the prediction of traits. The predictive metabolites can then be selected and used to develop fast, targeted and low-cost diagnostic biomarker assays that can be implemented in breeding programs or quality assessment analysis. The identified metabolic biomarkers allow for the prediction of crop product quality. Furthermore, marker-assisted selection can benefit from the discovery of metabolic biomarkers when other molecular markers reach their limits. The described marker selection method was developed for potato tubers, but is generally applicable to any crop and trait, as it functions independently of genomic information.

Predicting impaired glucose metabolism in women with polycystic ovary syndrome by decision tree modelling (2006)

Moehlig, M. ; Floeter, A. ; Spranger, Joachim ; Weickert, Martin O. ; Schill, T. ; Schloesser, H. W. ; Brabant, G. ; Pfeiffer, Andreas F. H. ; Selbig, Joachim ; Schoefl, C.

Aims/hypothesis: Polycystic ovary syndrome (PCOS) is a risk factor for type 2 diabetes. Screening for impaired glucose metabolism (IGM) with an OGTT has been recommended, but this is relatively time-consuming and inconvenient. Thus, a strategy that could minimise the need for an OGTT would be beneficial. Materials and methods: Consecutive PCOS patients (n=118) with fasting glucose < 6.1 mmol/l were included in the study. Parameters derived from medical history, clinical examination and fasting blood samples were assessed by decision tree modelling for their ability to discriminate women with IGM (2-h OGTT value >= 7.8 mmol/l) from those with normal glucose tolerance (NGT). Results: According to the OGTT results, 93 PCOS women had NGT and 25 had IGM. The best decision tree consisted of HOMA-IR, the proinsulin:insulin ratio, proinsulin, 17-OH progesterone and the ratio of luteinising hormone:follicle-stimulating hormone. This tree identified 69 women with NGT. The remaining 49 women included all women with IGM (100% sensitivity, 74% specificity to detect IGM). Pruning this tree to three levels still identified 53 women with NGT (100% sensitivity, 57% specificity to detect IGM). Restricting the data matrix used for tree modelling to medical history and clinical parameters produced a tree using BMI, waist circumference and waist-to-hip ratio (WHR). Pruning this tree to two levels separated 27 women with NGT (100% sensitivity, 29% specificity to detect IGM). The validity of both trees was tested by a leave-10%-out cross-validation. Conclusions/interpretation: Decision trees are useful tools for separating PCOS women with NGT from those with IGM. They can be used for stratifying the metabolic screening of PCOS women, whereby the number of OGTTs can be markedly reduced.
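The reported percentages follow directly from the counts. Every woman with IGM stays in the group that still needs an OGTT, so sensitivity is 25/25. Specificity is the fraction of the 93 NGT women that a tree sets aside. The helper name below is illustrative.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity for detecting IGM."""
    return tp / (tp + fn), tn / (tn + fp)

N_NGT, N_IGM = 93, 25  # OGTT results for the 118 women

# Women each tree classifies as NGT; all IGM women remain in the other group.
trees = {"full tree": 69, "pruned to three levels": 53, "clinical tree, two levels": 27}
for name, classified_ngt in trees.items():
    sens, spec = sensitivity_specificity(
        tp=N_IGM, fn=0, tn=classified_ngt, fp=N_NGT - classified_ngt
    )
    still_need_ogtt = N_NGT + N_IGM - classified_ngt
    print(f"{name}: sensitivity {sens:.0%}, specificity {spec:.0%}, OGTTs needed {still_need_ogtt}")
```

For the full tree this gives 69/93, about 74% specificity, with 49 women still requiring an OGTT. This is the reduction in OGTTs that the conclusion refers to.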

Validation and functional annotation of expression-based clusters based on gene ontology (2006)

Steuer, Ralf ; Humburg, Peter ; Selbig, Joachim

Background: The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. In particular, placing the results in the context of other available functional genomics data, such as existing bio-ontologies, has already provided substantial improvement for detecting and categorizing genes of interest. One common approach is to look for functional annotations that are significantly enriched within a group or cluster of genes, as compared to a reference group. Results: In this work, we propose using the information-theoretic concept of mutual information to investigate the relationship between groups of genes, as given by data-driven clustering, and their respective functional categories. Drawing upon related approaches (Gibbons and Roth, Genome Research 12: 1574-1581, 2002), we seek to quantify to what extent individual attributes are sufficient to characterize a given group or cluster of genes. Conclusion: We show that the mutual information provides a systematic framework to assess the relationship between groups or clusters of genes and their functional annotations in a quantitative way. Within this framework, the mutual information allows us to address and incorporate several important issues, such as the interdependence of functional annotations and combinations of attributes. It thus supplements and extends the conventional search for overrepresented attributes within a group or cluster of genes. In particular, by taking combinations of attributes into account, the mutual information opens the way to uncover specific functional descriptions of a group of genes or clustering result. All datasets and functional annotations used in this study are publicly available. All scripts used in the analysis are provided as additional files.
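A plug-in estimate of the mutual information between cluster membership and annotation labels takes only a few lines. The gene labels below are hypothetical, and this is a generic estimator rather than the paper's full framework. The example also illustrates the point about combinations of attributes. Two annotations that are each uninformative about the clusters can jointly determine them, which a search for single over-represented attributes would miss.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two paired label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Hypothetical: eight genes in two clusters and two binary annotations.
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
attr1 = [0, 1, 0, 1, 0, 1, 0, 1]
attr2 = [0, 1, 0, 1, 1, 0, 1, 0]

print(mutual_information(clusters, attr1))                    # 0.0
print(mutual_information(clusters, attr2))                    # 0.0
print(mutual_information(clusters, list(zip(attr1, attr2))))  # 1.0
```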

Modelling biological networks by action languages via set programming (2006)

Grell, Susanne ; Schaub, Torsten H. ; Selbig, Joachim

Computational methods for the design of effective therapies against drug resistant HIV strains (2005)

Beerenwinkel, Niko ; Sing, Tobias ; Lengauer, Thomas ; Rahnenfuhrer, Joerg ; Roomp, Kirsten ; Savenkov, Igor ; Fischer, Roman ; Hoffmann, Daniel ; Selbig, Joachim ; Korn, Klaus ; Walter, Hauke ; Berg, Thomas ; Braun, Patrick ; Faetkenheuer, Gerd ; Oette, Mark ; Rockstroh, Juergen ; Kupfer, Bernd ; Kaiser, Rolf ; Daeumer, Martin

The development of drug resistance is a major obstacle to successful treatment of HIV infection. The extraordinary replication dynamics of HIV facilitate its escape from selective pressure exerted by the human immune system and by combination drug therapy. We have developed several computational methods whose combined use can support the design of optimal antiretroviral therapies based on viral genomic data.

Species-specific analysis of protein sequence motifs using mutual information (2005)

Hummel, Jan ; Keshvari, N. ; Weckwerth, Wolfram ; Selbig, Joachim

Background: Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly, protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. Such profiles conveniently reflect functional necessities by pointing out proximity at conserved sequence positions as well as depicting distances at variable positions. Discovering significant conservation characteristics within the variable positions of profiles mirrors group-specific and, in particular, evolutionary features of the underlying sequences. Results: We describe the tool PROfile analysis based on Mutual Information (PROMI), which enables comparative analysis of user-classified protein sequences. PROMI is implemented as a web service using Perl and R as well as other publicly available packages and tools on the server side. On the client side, platform independence is achieved by generally applied internet delivery standards. As one possible application, an analysis of the zinc finger C2H2-type protein domain is presented to illustrate the functionality of the tool. Conclusion: The web service PROMI should assist researchers in detecting evolutionary correlations in protein profiles of defined biological sequences. It is available at http://promi.mpimp-golm.mpg.de, where additional documentation can be found.

Bioinformatics approach to predicting HIV drug resistance (2006)

Cordes, Frank ; Kaiser, Rolf ; Selbig, Joachim

The emergence of drug resistance remains one of the most challenging issues in the treatment of HIV-1 infection. The extreme replication dynamics of HIV facilitate its escape from the selective pressure exerted by the human immune system and by the applied combination drug therapy. This article reviews computational methods whose combined use can support the design of optimal antiretroviral therapies based on viral genotypic and phenotypic data. Genotypic assays are based on the analysis of mutations associated with reduced drug susceptibility, but are difficult to interpret due to the numerous mutations and mutational patterns that confer drug resistance. Phenotypic resistance or susceptibility can be experimentally evaluated by measuring the inhibition of viral replication in cell culture assays. However, this procedure is expensive and time-consuming.

Finding metabolic pathways in decision forests (2004)

Flöter, André ; Selbig, Joachim ; Schaub, Torsten H.

Threshold extraction in metabolite concentration data (2004)

Flöter, André ; Nicolas, Jacques ; Schaub, Torsten H. ; Selbig, Joachim

Motivation: Continued development of analytical techniques based on gas chromatography and mass spectrometry now facilitates the generation of larger sets of metabolite concentration data. An important step towards the understanding of metabolite dynamics is the recognition of stable states where metabolite concentrations exhibit a simple behaviour. Such states can be characterized through the identification of significant thresholds in the concentrations. However, general techniques for finding discretization thresholds in continuous data prove insufficient in practice for detecting states, due to the weak conditional dependences in concentration data. Results: We introduce a method of recognizing states in the framework of decision tree induction. It is based upon a global analysis of decision forests in which stability and quality are evaluated. It leads to the detection of thresholds that are both comprehensible and robust. Applied to metabolite concentration data, this method has led to the discovery of hidden states in the corresponding variables. Some of these reflect known properties of the biological experiments, and others point to putative new states.
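The paper's method relies on a global analysis of stability and quality across a decision forest, which is not reproduced here. The sketch below shows only the elementary step of decision tree induction that such an analysis builds on. It picks the threshold on a single concentration variable that maximizes information gain about a class label. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, information gain) of the best single binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([lab for _, lab in pairs])
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best

# Hypothetical concentrations of one metabolite under two conditions
conc = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
cond = ["control"] * 3 + ["stress"] * 3
threshold, gain = best_threshold(conc, cond)
print(round(threshold, 3), gain)  # 0.7 1.0
```

A single split like this can be unstable when the conditional dependences are weak. That instability is what motivates evaluating thresholds across many trees rather than trusting one.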

Estimating mutual information using B-spline functions: an improved similarity measure for analysing gene expression data (2004)

Daub, Carsten O. ; Steuer, Ralf ; Selbig, Joachim ; Kloska, Sebastian

Background: The information-theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of clustering genes with similar patterns of expression, it has been suggested as a general measure of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. Results: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: the significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets, and the results are compared to those obtained using other similarity measures. C++ source code of our algorithm is available for non-commercial use from kloska@scienion.de upon request. Conclusion: The utilisation of mutual information as a similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad hoc basis without further justification, are thereby extended.
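Daub et al. replace hard binning with B-spline weights, so that each value contributes fractionally to several neighbouring bins. The NumPy sketch below follows that general idea: a clamped uniform knot vector, the Cox-de Boor recursion, and entropies in bits. It is not the authors' C++ code. The function names, the defaults of 10 bins and spline order 3, and the synthetic data are all assumptions.

```python
import numpy as np

def bspline_weights(x, n_bins=10, order=3):
    """Fractional bin memberships of each sample from B-spline basis functions.

    order=1 reduces to ordinary histogram binning; higher orders spread each
    sample over up to `order` neighbouring bins. Rows sum to 1.
    """
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    upper = n_bins - order + 1                       # domain is [0, upper]
    z = (x - x.min()) / span * upper if span > 0 else np.zeros_like(x)
    # Clamped uniform knot vector with n_bins + order knots
    knots = np.array([0.0 if i < order else float(min(i - order + 1, upper))
                      for i in range(n_bins + order)])
    n_int = len(knots) - 1
    B = np.zeros((x.size, n_int))
    for i in range(n_int):                           # order-1 (indicator) functions
        B[:, i] = (knots[i] <= z) & (z < knots[i + 1])
    last = max(i for i in range(n_int) if knots[i] < knots[i + 1])
    B[z >= knots[-1], last] = 1.0                    # right end of the domain
    for p in range(2, order + 1):                    # Cox-de Boor recursion
        nxt = np.zeros((x.size, B.shape[1] - 1))
        for i in range(nxt.shape[1]):
            left = knots[i + p - 1] - knots[i]
            right = knots[i + p] - knots[i + 1]
            if left > 0:
                nxt[:, i] += (z - knots[i]) / left * B[:, i]
            if right > 0:
                nxt[:, i] += (knots[i + p] - z) / right * B[:, i + 1]
        B = nxt
    return B                                         # shape (n_samples, n_bins)

def entropy_bits(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def bspline_mutual_information(x, y, n_bins=10, order=3):
    wx = bspline_weights(x, n_bins, order)
    wy = bspline_weights(y, n_bins, order)
    p_x, p_y = wx.mean(axis=0), wy.mean(axis=0)
    p_xy = wx.T @ wy / wx.shape[0]                   # joint bin probabilities
    return entropy_bits(p_x) + entropy_bits(p_y) - entropy_bits(p_xy.ravel())

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y_dep = x ** 2 + 0.1 * rng.normal(size=500)   # non-linear dependence
y_ind = rng.normal(size=500)                  # independent of x
print(bspline_mutual_information(x, y_dep), bspline_mutual_information(x, y_ind))
```

With `order=1` the weights collapse to ordinary histogram counts. This makes it easy to compare the spline estimate with simple binning on the same data.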

Threshold extraction in metabolite concentration data (2003)

Flöter, André ; Nicolas, Jacques ; Schaub, Torsten H. ; Selbig, Joachim

Stromal-Derived Factor 1a (Sdf-1a), a Homing Factor for Mesenchymal Progenitor Cells, Is Elevated in Tumor Tissue and Plasma of Glioma Patients (2010)

Timmer, Marco ; Theiss, Hans ; Jürchott, Katrin ; Ries, Christian ; Paron, Igor ; Franz, W. ; Selbig, Joachim ; Guo, Ketai ; Tonn, Jörg ; Schichor, Christian

Malignant gliomas are a fatal disease lacking sufficient possibilities for early diagnosis and chemical markers to detect remission or relapse. The recruitment of progenitor cells such as mesenchymal stem cells (MSC) is a main feature of gliomas. Stromal cell-derived factor-1 (SDF-1), a chemokine produced in glioma cell lines, enhances migration in MSC and has been associated with cell survival and apoptosis in gliomas. Therefore, this study was performed to evaluate (i) whether SDF-1 and its receptors are expressed in human malignant gliomas in situ and (ii) whether SDF-1 might play a role in recruiting MSCs into human glioma. In glioblastoma tissue, immunohistochemistry revealed that SDF-1 and its receptor CXCR4 are expressed in regions of angiogenesis and necrosis, and qPCR showed that SDF-1 is elevated. Public expression data indicated that CXCR4 was upregulated. The latter data also illustrate that SDF-1 could be up- or downregulated in glioma compared to normal brain in a transcript-specific manner. In plasma, SDF-1 is elevated in glioma patients. The level is reduced by both dexamethasone intake and surgery. Dexamethasone also decreased SDF-1 production in cells in vitro. The undirected migration of human MSC (hMSC) was not enhanced by the addition of SDF-1. However, SDF-1 stimulated directed invasion of hMSC in a dose-dependent manner. Taken together, we show that SDF-1 is a potent chemoattractant of progenitor cells such as hMSCs and that its expression is elevated in glioma tissue, which results in elevated SDF-1 levels in the patients' plasma samples, with a concomitant decrease after tumor resection. The fact that elevated SDF-1 plasma levels are significantly decreased after tumor surgery could be a first hint that SDF-1 might act as a tumor marker for malignant gliomas to detect disease progression or remission.

Kinetic hybrid models composed of mechanistic and simplified enzymatic rate laws: a promising method for speeding up the kinetic modelling of complex metabolic networks (2009)

Bulik, Sascha ; Grimbs, Sergio ; Huthmacher, Carola ; Selbig, Joachim ; Holzhutter, Hermann G.

Kinetic modelling of complex metabolic networks - a central goal of computational systems biology - is currently hampered by the lack of reliable rate equations for the majority of the underlying biochemical reactions and membrane transporters. On the basis of biochemically substantiated evidence that metabolic control is exerted by a narrow set of key regulatory enzymes, we propose here a hybrid modelling approach in which only the central regulatory enzymes are described by detailed mechanistic rate equations, and the majority of enzymes are approximated by simplified (nonmechanistic) rate equations (e.g. mass action, LinLog, Michaelis-Menten and power law) capturing only a few basic kinetic features and hence containing only a small number of parameters to be experimentally determined. To check the reliability of this approach, we have applied it to two different metabolic networks, the energy and redox metabolism of red blood cells, and the purine metabolism of hepatocytes, using in both cases available comprehensive mechanistic models as reference standards. Identification of the central regulatory enzymes was performed by employing only information on network topology and the metabolic data for a single reference state of the network [Grimbs S, Selbig J, Bulik S, Holzhutter HG & Steuer R (2007) Mol Syst Biol 3, 146, doi:10.1038/msb4100186]. Calculations of stationary and temporary states under various physiological challenges demonstrate the good performance of the hybrid models. We propose the hybrid modelling approach as a means to speed up the development of reliable kinetic models for complex metabolic networks.
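For reference, the four simplified rate-law families named above can be written as plain functions. Each needs only a handful of parameters, which is what makes them cheap to parameterize compared with full mechanistic equations. These are textbook forms with hypothetical parameter names. The exact parameterization used in the hybrid models is not given in the abstract.

```python
import numpy as np

def mass_action(k, substrates):
    """v = k * prod(S_i)."""
    return k * np.prod(substrates)

def michaelis_menten(vmax, km, s):
    """v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)

def power_law(alpha, concentrations, kinetic_orders):
    """v = alpha * prod(X_i ** g_i)."""
    return alpha * np.prod(np.power(concentrations, kinetic_orders))

def linlog(v_ref, e_rel, elasticities, concentrations, ref_concentrations):
    """v = v_ref * (e / e_ref) * (1 + sum_i eps_i * ln(X_i / X_i_ref))."""
    ratio = np.asarray(concentrations, float) / np.asarray(ref_concentrations, float)
    return v_ref * e_rel * (1.0 + np.dot(elasticities, np.log(ratio)))
```

The LinLog form is anchored at a reference state: at the reference concentrations it returns `v_ref * e_rel`.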

Mesenchymal stem cells and glioma cells form a structural as well as a functional syncytium in vitro (2012)

Schichor, Christian ; Albrecht, Valerie ; Korte, Benjamin ; Buchner, Alexander ; Riesenberg, Rainer ; Mysliwietz, Josef ; Paron, Igor ; Motaln, Helena ; Turnsek, Tamara Lah ; Juerchott, Kathrin ; Selbig, Joachim ; Tonn, Jörg-Christian

The interaction of human mesenchymal stem cells (hMSCs) and tumor cells has been investigated in various contexts. hMSCs are regarded as cellular treatment vectors based on their capacity to migrate towards a malignant lesion. However, concerns about unpredictable behavior of transplanted hMSCs are accumulating. In malignant gliomas, the recruitment mechanism is driven by glioma-secreted factors, which lead to accumulation of both tissue-specific stem cells and bone marrow-derived hMSCs within the tumor. The aim of the present work was to study specific cellular interactions between hMSCs and glioma cells in vitro. We show that glioma cells as well as hMSCs differentially express connexins and that they interact via gap-junctional coupling. Besides this so-called functional syncytium formation, we also provide evidence of cell fusion events (structural syncytium). These complex cellular interactions led to enhanced migration and altered proliferation of both tumor and mesenchymal stem cell types in vitro. The presented work shows that glioma cells display signs of functional as well as structural syncytium formation with hMSCs in vitro. The described cellular phenomena provide new insight into the complexity of interaction patterns between tumor cells and host cells. Based on these findings, further studies are warranted to define the impact of a functional or structural syncytium formation on malignant tumors and cell-based therapies in vivo.

MAPA distinguishes genotype-specific variability of highly similar regulatory protein isoforms in potato tuber (2011)

Höhenwarter, Wolfgang ; Larhlimi, Abdelhalim ; Hummel, Jan ; Egelhofer, Volker ; Selbig, Joachim ; van Dongen, Joost T. ; Wienkoop, Stefanie ; Weckwerth, Wolfram

Mass Accuracy Precursor Alignment (MAPA) is a fast and flexible method for comparative proteome analysis that allows the comparison of unprecedented numbers of shotgun proteomics analyses on a personal computer in a matter of hours. We compared 183 LC-MS analyses and more than 2 million MS/MS spectra and could define and separate the proteomic phenotypes of field-grown tubers of 12 tetraploid cultivars of the crop plant Solanum tuberosum. Protein isoforms of patatin, as well as of other major gene families that regulate tuber development, such as lipoxygenase and cysteine protease inhibitor, were found to be the primary source of variability between the cultivars. This suggests that differentially expressed protein isoforms modulate genotype-specific tuber development and the plant phenotype. We properly assigned the measured abundance of tryptic peptides to different protein isoforms that share extensive stretches of primary structure and thus inferred their abundance. Peptides unique to different protein isoforms were used, together with multivariate statistical procedures, to classify the remaining peptides (those assigned to the entire subset of isoforms) by a common abundance profile. We identified nearly 4000 proteins, which we used for quantitative functional annotation, making this the most extensive study of the tuber proteome to date.

Comparison of metabolite profiles in U87 glioma cells and mesenchymal stem cells (2011)

Juerchott, Kathrin ; Guo, Ke-Tai ; Catchpole, Gareth ; Feher, Kristen ; Willmitzer, Lothar ; Schichor, Christian ; Selbig, Joachim

Gas chromatography-mass spectrometry (GC-MS) profiles were generated from U87 glioma cells and human mesenchymal stem cells (hMSC). Thirty-seven metabolites representing glycolysis intermediates, TCA cycle metabolites, amino acids and lipids were selected for a detailed analysis. The concentrations of these metabolites were compared, and Pearson correlation coefficients were used to quantify the relationship between pairs of metabolites. Metabolite profiles and correlation patterns differ significantly between the two cell lines. These profiles can be considered a signature of the underlying biochemical system and provide snapshots of the metabolism in mesenchymal stem cells and tumor cells.
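The pairwise correlation step can be sketched with `numpy.corrcoef`. Computing it separately for each cell line yields the two correlation patterns that are then compared. The random numbers stand in for real GC-MS concentrations, and the helper name and metabolite list are hypothetical.

```python
import numpy as np
from itertools import combinations

def pairwise_pearson(profiles, names):
    """Pearson r for every metabolite pair; profiles has shape (samples, metabolites)."""
    corr = np.corrcoef(profiles, rowvar=False)
    return {(names[i], names[j]): corr[i, j]
            for i, j in combinations(range(len(names)), 2)}

# Hypothetical replicate measurements for one cell line (rows = samples)
rng = np.random.default_rng(0)
names = ["glucose", "lactate", "citrate", "glutamate"]
profiles = rng.lognormal(size=(6, len(names)))
pairs = pairwise_pearson(profiles, names)
for (m1, m2), r in pairs.items():
    print(f"{m1}-{m2}: r = {r:+.2f}")
```

With 37 metabolites, the same call yields 37 * 36 / 2 = 666 pairs per cell line.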

Refined elasticity sampling for Monte Carlo-based identification of stabilizing network patterns (2015)

Childs, Dorothee ; Grimbs, Sergio ; Selbig, Joachim

Motivation: Structural kinetic modelling (SKM) is a framework to analyse whether a metabolic steady state remains stable under perturbation, without requiring detailed knowledge about individual rate equations. It provides a representation of the system's Jacobian matrix that depends solely on the network structure, steady state measurements, and the elasticities at the steady state. For a measured steady state, stability criteria can be derived by generating a large number of SKMs with randomly sampled elasticities and evaluating the resulting Jacobian matrices. The elasticity space can be analysed statistically in order to detect network positions that contribute significantly to the perturbation response. Here, we extend this approach by examining the kinetic feasibility of the elasticity combinations created during Monte Carlo sampling. Results: Using a set of small example systems, we show that the majority of sampled SKMs would yield negative kinetic parameters if they were translated back into kinetic models. To overcome this problem, we formulate a simple criterion that mitigates the occurrence of such infeasible models. After evaluating the small example pathways, the methodology was used to study two steady states of the neuronal TCA cycle and the intrinsic mechanisms responsible for their stability or instability. The findings of the statistical elasticity analysis confirm that several elasticities are jointly coordinated to control stability and that potential instabilities arise mainly from mutations in the enzyme alpha-ketoglutarate dehydrogenase.
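The basic Monte Carlo scheme described above can be sketched for a toy three-metabolite chain with end-product inhibition. Every specific choice here is hypothetical: the pathway, the unit steady-state fluxes and concentrations, the elasticities of the three consuming steps drawn uniformly from [0, 1), and the fixed feedback elasticity `-h`. The kinetic-feasibility criterion that the paper adds is not implemented.

```python
import numpy as np

def skm_jacobian(N, v0, S0, theta):
    """Structural kinetic model Jacobian J = Lambda @ theta.

    Lambda_ij = N_ij * v0_j / S0_i (stoichiometry scaled by steady-state
    fluxes and concentrations); theta_ji = d ln v_j / d ln S_i are the
    elasticities of reaction j with respect to metabolite i.
    """
    Lam = N * v0[np.newaxis, :] / S0[:, np.newaxis]
    return Lam @ theta

def is_stable(J):
    """Asymptotically stable if all eigenvalues have a negative real part."""
    return bool(np.all(np.linalg.eigvals(J).real < 0))

# Hypothetical pathway: -> S1 -> S2 -> S3 ->, with S3 inhibiting the first step.
N = np.array([[1, -1, 0, 0],
              [0, 1, -1, 0],
              [0, 0, 1, -1]], dtype=float)
v0 = np.ones(4)   # steady-state fluxes (equal along the chain)
S0 = np.ones(3)   # steady-state concentrations

def elasticity_matrix(a, b, c, h):
    """Rows: reactions v1..v4; columns: S1..S3; h = feedback strength."""
    return np.array([[0.0, 0.0, -h],
                     [a, 0.0, 0.0],
                     [0.0, b, 0.0],
                     [0.0, 0.0, c]])

def fraction_stable(h, n_samples=1000, seed=0):
    """Monte Carlo: sample saturable elasticities in [0, 1) and count stable models."""
    rng = np.random.default_rng(seed)
    stable = 0
    for _ in range(n_samples):
        a, b, c = rng.uniform(0.0, 1.0, size=3)
        stable += is_stable(skm_jacobian(N, v0, S0, elasticity_matrix(a, b, c, h)))
    return stable / n_samples

print(fraction_stable(h=0.0), fraction_stable(h=20.0))
```

Without feedback, every sampled model is stable. Strong feedback inhibition makes part of the samples unstable. Recording which elasticity combinations fall into each group is the starting point for the statistical elasticity analysis.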

Systematic analysis of stability patterns in plant primary metabolism (2012)

Girbig, Dorothee ; Grimbs, Sergio ; Selbig, Joachim

Metabolic networks are characterized by complex interactions and regulatory mechanisms between many individual components. These interactions determine whether a steady state is stable to perturbations. Structural kinetic modeling (SKM) is a framework to analyze the stability of metabolic steady states that allows the study of the system Jacobian without requiring detailed knowledge about individual rate equations. Stability criteria can be derived by generating a large number of structural kinetic models (SK-models) with randomly sampled parameter sets and evaluating the resulting Jacobian matrices. Until now, SKM experiments applied univariate tests to detect the network components with the largest influence on stability. In this work, we present an extended SKM approach relying on supervised machine learning to detect patterns of enzyme-metabolite interactions that act together in an orchestrated manner to ensure stability. We demonstrate its application on a detailed SK-model of the Calvin-Benson cycle and connected pathways. The identified stability patterns are highly complex, reflecting that changes in dynamic properties depend on concerted interactions between several network components. In total, we find more patterns that reliably ensure stability than patterns ensuring instability. This shows that the design of this system is strongly targeted towards maintaining stability. We also investigate the effect of allosteric regulators, revealing that the tendency to stability is significantly increased by including experimentally determined regulatory mechanisms that have not yet been integrated into existing kinetic models.
