TY  - GEN
A1  - Durek, Pawel
A1  - Schudoma, Christian
A1  - Weckwerth, Wolfram
A1  - Selbig, Joachim
A1  - Walther, Dirk
T1  - Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins
N2  - Background: Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. The experimental identification as well as computational prediction of phosphorylation sites (P-sites) has proved to be a challenging problem. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites. Results: We characterized the spatial context of phosphorylation sites and assessed its usability for improved phosphorylation site predictions. We identified 750 non-redundant, experimentally verified sites with three-dimensional (3D) structural information available in the protein data bank (PDB) and grouped them according to their respective kinase family. We studied the spatial distribution of amino acids around phosphorserines, phosphothreonines, and phosphotyrosines to extract signature 3D-profiles. Characteristic spatial distributions of amino acid residue types around phosphorylation sites were indeed discernable, especially when kinase-family-specific target sites were analyzed. To test the added value of using spatial information for the computational prediction of phosphorylation sites, Support Vector Machines were applied using both sequence as well as structural information. When compared to sequence-only based prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information. 
Conclusion: While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites. A web-based service (Phos3D) implementing the developed structure-based P-site prediction method has been made available at http://phos3d.mpimp-golm.mpg.de.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - paper 141 
KW  - Support vector machines
KW  - Microarray data
KW  - Docking interactions
KW  - Signal-transduction
KW  - Sequence alignment
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-45129
ER  - 
TY  - GEN
A1  - Dworschak, Steve
A1  - Grell, Susanne
A1  - Nikiforova, Victoria J.
A1  - Schaub, Torsten H.
A1  - Selbig, Joachim
T1  - Modeling biological networks by action languages via answer set programming
T2  - Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe
N2  - We describe an approach to modeling biological networks by action languages via answer set programming. To this end, we propose an action language for modeling biological networks, building on previous work by Baral et al. We introduce its syntax and semantics along with a translation into answer set programming, an efficient Boolean Constraint Programming Paradigm. Finally, we describe one of its applications, namely, the sulfur starvation response-pathway of the model plant Arabidopsis thaliana and sketch the functionality of our system and its usage.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 843 
KW  - biological network model
KW  - action language
KW  - answer set programming
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-429846
SN  - 1866-8372
IS  - 843
ER  - 
TY  - GEN
A1  - Gärtner, Tanja
A1  - Steinfath, Matthias
A1  - Andorf, Sandra
A1  - Lisec, Jan
A1  - Meyer, Rhonda C.
A1  - Altmann, Thomas
A1  - Willmitzer, Lothar
A1  - Selbig, Joachim
T1  - Improved heterosis prediction by combining information on DNA- and metabolic markers
N2  - Background: Hybrids represent a cornerstone in the success story of breeding programs. The fundamental principle underlying this success is the phenomenon of hybrid vigour, or heterosis. It describes an advantage of the offspring as compared to the two parental lines with respect to parameters such as growth and resistance against abiotic or biotic stress. Dominance, overdominance or epistasis based models are commonly used explanations. Conclusion/Significance: The heterosis level is clearly a function of the combination of the parents used for offspring production. This results in a major challenge for plant breeders, as usually several thousand combinations of parents have to be tested for identifying the best combinations. Thus, any approach to reliably predict heterosis levels based on properties of the parental lines would be highly beneficial for plant breeding. Methodology/Principal Findings: Recently, genetic data have been used to predict heterosis. Here we show that a combination of parental genetic and metabolic markers, identified via feature selection and minimum-description-length based regression methods, significantly improves the prediction of biomass heterosis in resulting offspring. These findings will help furthering our understanding of the molecular basis of heterosis, revealing, for instance, the presence of nonlinear genotype-phenotype relationships. In addition, we describe a possible approach for accelerated selection in plant breeding.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - paper 142 
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-45132
ER  - 
TY  - GEN
A1  - Hische, Manuela
A1  - Larhlimi, Abdelhalim
A1  - Schwarz, Franziska
A1  - Fischer-Rosinský, Antje
A1  - Bobbert, Thomas
A1  - Assmann, Anke
A1  - Catchpole, Gareth S.
A1  - Pfeiffer, Andreas F. H.
A1  - Willmitzer, Lothar
A1  - Selbig, Joachim
A1  - Spranger, Joachim
T1  - A distinct metabolic signature predicts development of fasting plasma glucose
T2  - Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe
N2  - Background

High blood glucose and diabetes are amongst the conditions causing the greatest losses in years of healthy life worldwide. Therefore, numerous studies aim to identify reliable risk markers for development of impaired glucose metabolism and type 2 diabetes. However, the molecular basis of impaired glucose metabolism is so far insufficiently understood. The development of so called 'omics' approaches in the recent years promises to identify molecular markers and to further understand the molecular basis of impaired glucose metabolism and type 2 diabetes. Although univariate statistical approaches are often applied, we demonstrate here that the application of multivariate statistical approaches is highly recommended to fully capture the complexity of data gained using high-throughput methods.

Methods

We took blood plasma samples from 172 subjects who participated in the prospective Metabolic Syndrome Berlin Potsdam follow-up study (MESY-BEPO Follow-up). We analysed these samples using Gas Chromatography coupled with Mass Spectrometry (GC-MS), and measured 286 metabolites. Furthermore, fasting glucose levels were measured using standard methods at baseline, and after an average of six years. We did correlation analysis and built linear regression models as well as Random Forest regression models to identify metabolites that predict the development of fasting glucose in our cohort.

Results

We found a metabolic pattern consisting of nine metabolites that predicted fasting glucose development with an accuracy of 0.47 in tenfold cross-validation using Random Forest regression. We also showed that adding established risk markers did not improve the model accuracy. However, external validation is eventually desirable. Although not all metabolites belonging to the final pattern are identified yet, the pattern directs attention to amino acid metabolism, energy metabolism and redox homeostasis.

Conclusions

We demonstrate that metabolites identified using a high-throughput method (GC-MS) perform well in predicting the development of fasting plasma glucose over several years. Notably, not single, but a complex pattern of metabolites propels the prediction and therefore reflects the complexity of the underlying molecular mechanisms. This result could only be captured by application of multivariate statistical approaches. Therefore, we highly recommend the usage of statistical methods that seize the complexity of the information given by high-throughput methods.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 850 
KW  - prediction
KW  - fasting glucose
KW  - type 2 diabetes
KW  - metabolomics
KW  - plasma
KW  - random forest
KW  - metabolite
KW  - regression
KW  - biomarker
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-427400
SN  - 1866-8372
IS  - 850
ER  - 
TY  - GEN
A1  - Köhl, Karin I.
A1  - Basler, Georg
A1  - Lüdemann, Alexander
A1  - Selbig, Joachim
A1  - Walther, Dirk
T1  - A plant resource and experiment management system based on the Golm Plant Database as a basic tool for omics research
T2  - Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe
N2  - Background: For omics experiments, detailed characterisation of experimental material with respect to its genetic features, its cultivation history and its treatment history is a requirement for analyses by bioinformatics tools and for publication needs. Furthermore, meta-analysis of several experiments in systems biology based approaches make it necessary to store this information in a standardised manner, preferentially in relational databases. In the Golm Plant Database System, we devised a data management system based on a classical Laboratory Information Management System combined with web-based user interfaces for data entry and retrieval to collect this information in an academic environment.

Results: The database system contains modules representing the genetic features of the germplasm, the experimental conditions and the sampling details. In the germplasm module, genetically identical lines of biological material are generated by defined workflows, starting with the import workflow, followed by further workflows like genetic modification (transformation), vegetative or sexual reproduction. The latter workflows link lines and thus create pedigrees. For experiments, plant objects are generated from plant lines and united in so-called cultures, to which the cultivation conditions are linked. Materials and methods for each cultivation step are stored in a separate ACCESS database of the plant cultivation unit. For all cultures and thus every plant object, each cultivation site and the culture's arrival time at a site are logged by a barcode-scanner based system. Thus, for each plant object, all site-related parameters, e. g. automatically logged climate data, are available. These life history data and genetic information for the plant objects are linked to analytical results by the sampling module, which links sample components to plant object identifiers. This workflow uses controlled vocabulary for organs and treatments. Unique names generated by the system and barcode labels facilitate identification and management of the material. Web pages are provided as user interfaces to facilitate maintaining the system in an environment with many desktop computers and a rapidly changing user community. Web based search tools are the basis for joint use of the material by all researchers of the institute.

Conclusion: The Golm Plant Database system, which is based on a relational database, collects the genetic and environmental information on plant material during its production or experimental use at the Max-Planck-Institute of Molecular Plant Physiology. It thus provides information according to the MIAME standard for the component 'Sample' in a highly standardised format. The Plant Database system thus facilitates collaborative work and allows efficient queries in data analysis for systems biology research.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 830 
KW  - microarray data
KW  - arabidopsis
KW  - information
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-427595
IS  - 830
ER  - 
TY  - GEN
A1  - Larhlimi, Abdelhalim
A1  - David, Laszlo
A1  - Selbig, Joachim
A1  - Bockmayr, Alexander
T1  - F2C2
BT  - a fast tool for the computation of flux coupling in genome-scale metabolic networks
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: Flux coupling analysis (FCA) has become a useful tool in the constraint-based analysis of genome-scale metabolic networks. FCA allows detecting dependencies between reaction fluxes of metabolic networks at steady-state. On the one hand, this can help in the curation of reconstructed metabolic networks by verifying whether the coupling between reactions is in agreement with the experimental findings. On the other hand, FCA can aid in defining intervention strategies to knock out target reactions.

Results: We present a new method F2C2 for FCA, which is orders of magnitude faster than previous approaches. As a consequence, FCA of genome-scale metabolic networks can now be performed in a routine manner.

Conclusions: We propose F2C2 as a fast tool for the computation of flux coupling in genome-scale metabolic networks. F2C2 is freely available for non-commercial use at https://sourceforge.net/projects/f2c2/files/.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 921 
KW  - balance analysis
KW  - reconstruction
KW  - pathways
KW  - models
KW  - metabolic network
KW  - couple reaction
KW  - reversible reaction
KW  - linear programming problem
KW  - coupling relationship
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-432431
SN  - 1866-8372
IS  - 921
ER  - 
TY  - GEN
A1  - Meyer, Rhonda Christiane
A1  - Kusterer, Barbara
A1  - Lisec, Jan
A1  - Steinfath, Matthias
A1  - Becher, Martina
A1  - Scharr, Hanno
A1  - Melchinger, Albrecht E.
A1  - Selbig, Joachim
A1  - Schurr, Ulrich
A1  - Willmitzer, Lothar
A1  - Altmann, Thomas
T1  - QTL analysis of early stage heterosis for biomass in Arabidopsis
T2  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - The main objective of this study was to identify genomic regions involved in biomass heterosis using QTL, generation means, and mode-of-inheritance classification analyses. In a modified North Carolina Design III we backcrossed 429 recombinant inbred line and 140 introgression line populations to the two parental accessions, C24 and Col-0, whose F1 hybrid exhibited 44% heterosis for biomass. Mid-parent heterosis in the RILs ranged from −31 to 99% for dry weight and from −58 to 143% for leaf area. We detected ten genomic positions involved in biomass heterosis at an early developmental stage, individually explaining between 2.4 and 15.7% of the phenotypic variation. While overdominant gene action was prevalent in heterotic QTL, our results suggest that a combination of dominance, overdominance and epistasis is involved in biomass heterosis in this Arabidopsis cross.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1330 
KW  - Quantitative Trait Locus
KW  - recombinant inbred line
KW  - Quantitative Trait Locus analysis
KW  - dominance effect
KW  - recombinant inbred line population
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-431272
SN  - 1866-8372
IS  - 1330
ER  - 
TY  - GEN
A1  - Neigenfind, Jost
A1  - Gyetvai, Gabor
A1  - Basekow, Rico
A1  - Diehl, Svenja
A1  - Achenbach, Ute
A1  - Gebhardt, Christiane
A1  - Selbig, Joachim
A1  - Kersten, Birgit
T1  - Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT
T2  - Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe
N2  - Background: Haplotype inference based on unphased SNP markers is an important task in population genetics. Although there are different approaches to the inference of haplotypes in diploid species, the existing software is not suitable for inferring haplotypes from unphased SNP data in polyploid species, such as the cultivated potato (Solanum tuberosum). Potato species are tetraploid and highly heterozygous.

Results: Here we present the software SATlotyper which is able to handle polyploid and polyallelic data. SATlotyper uses the Boolean satisfiability problem to formulate Haplotype Inference by Pure Parsimony. The software excludes existing haplotype inferences, thus allowing for calculation of alternative inferences. As it is not known which of the multiple haplotype inferences are best supported by the given unphased data set, we use a bootstrapping procedure that allows for scoring of alternative inferences. Finally, by means of the bootstrapping scores, it is possible to optimise the phased genotypes belonging to a given haplotype inference. The program is evaluated with simulated and experimental SNP data generated for heterozygous tetraploid populations of potato. We show that, instead of taking the first haplotype inference reported by the program, we can significantly improve the quality of the final result by applying additional methods that include scoring of the alternative haplotype inferences and genotype optimisation. For a sub-population of nineteen individuals, the predicted results computed by SATlotyper were directly compared with results obtained by experimental haplotype inference via sequencing of cloned amplicons. Prediction and experiment gave similar results regarding the inferred haplotypes and phased genotypes.

Conclusion: Our results suggest that Haplotype Inference by Pure Parsimony can be solved efficiently by the SAT approach, even for data sets of unphased SNP from heterozygous polyploids. SATlotyper is freeware and is distributed as a Java JAR file. The software can be downloaded from the webpage of the GABI Primary Database at http://www.gabipd.org/projects/satlotyper/. The application of SATlotyper will provide haplotype information, which can be used in haplotype association mapping studies of polyploid plants.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 883 
KW  - linkage disequilibrium
KW  - pure parsimony
KW  - potato
KW  - resistance
KW  - efficient
KW  - solanum
KW  - Conjunctive Normal Form
KW  - Full Adder
KW  - Disjunctive Normal Form
KW  - Haplotype Inference
KW  - Genotype Inference
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-435011
SN  - 1866-8372
IS  - 883
ER  - 
TY  - GEN
A1  - Rajasundaram, Dhivyaa
A1  - Selbig, Joachim
T1  - More effort — more results
BT  - recent advances in integrative ‘omics’ data analysis
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - The development of 'omics' technologies has progressed to address complex biological questions that underlie various plant functions thereby producing copious amounts of data. The need to assimilate large amounts of data into biologically meaningful interpretations has necessitated the development of statistical methods to integrate multidimensional information. Throughout this review, we provide examples of recent outcomes of 'omics' data integration together with an overview of available statistical methods and tools.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 923 
KW  - principal component
KW  - plant biology
KW  - package
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-442639
SN  - 1866-8372
IS  - 923
SP  - 57
EP  - 61
ER  - 
TY  - GEN
A1  - Repsilber, Dirk
A1  - Kern, Sabine
A1  - Telaar, Anna
A1  - Walzl, Gerhard
A1  - Black, Gillian F.
A1  - Selbig, Joachim
A1  - Parida, Shreemanta K.
A1  - Kaufmann, Stefan H. E.
A1  - Jacobsen, Marc
T1  - Biomarker discovery in heterogeneous tissue samples
BT  - taking the in-silico deconfounding approach
T2  - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe
N2  - Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues.

Results: Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available.

Conclusions: The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes.
T3  - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 854 
KW  - differential gene expression
KW  - quantile normalization
KW  - heterogeneous tissue
KW  - gene expression matrix
KW  - homogeneous cell population
KW  - selection
KW  - microdissection
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-429343
SN  - 1866-8372
IS  - 854
ER  -