Refine
Has Fulltext
- no (3)
Document Type
- Article (3)
Language
- English (3)
Is part of the Bibliography
- yes (3)
Keywords
- biological robustness (1)
- computational biochemistry (1)
- metabolic regulation (1)
- metabolomics (1)
- systems biology (1)
Background: The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. In particular, placing the results in the context of other available functional genomics data, such as existing bio-ontologies, has already provided substantial improvement for detecting and categorizing genes of interest. One common approach is to look for functional annotations that are significantly enriched within a group or cluster of genes, as compared to a reference group. Results: In this work, we suggest the information-theoretic concept of mutual information to investigate the relationship between groups of genes, as given by data-driven clustering, and their respective functional categories. Drawing upon related approaches (Gibbons and Roth, Genome Research 12: 1574-1581, 2002), we seek to quantify to what extent individual attributes are sufficient to characterize a given group or cluster of genes. Conclusion: We show that the mutual information provides a systematic framework to assess the relationship between groups or clusters of genes and their functional annotations in a quantitative way. Within this framework, the mutual information allows us to address and incorporate several important issues, such as the interdependence of functional annotations and combinatorial combinations of attributes. It thus supplements and extends the conventional search for overrepresented attributes within a group or cluster of genes. In particular taking combinations of attributes into account, the mutual information opens the way to uncover specific functional descriptions of a group of genes or clustering result. All datasets and functional annotations used in this study are publicly available. All scripts used in the analysis are provided as additional files.
Background: The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. Results: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: The significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. A C++ source code of our algorithm is available for non- commercial use from kloska@scienion.de upon request. Conclusion: The utilisation of mutual information as similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended
To develop and investigate detailed mathematical models of metabolic processes is one of the primary challenges in systems biology. However, despite considerable advance in the topological analysis of metabolic networks, kinetic modeling is still often severely hampered by inadequate knowledge of the enzyme-kinetic rate laws and their associated parameter values. Here we propose a method that aims to give a quantitative account of the dynamical capabilities of a metabolic system, without requiring any explicit information about the functional form of the rate equations. Our approach is based on constructing a local linear model at each point in parameter space, such that each element of the model is either directly experimentally accessible or amenable to a straightforward biochemical interpretation. This ensemble of local linear models, encompassing all possible explicit kinetic models, then allows for a statistical exploration of the comprehensive parameter space. The method is exemplified on two paradigmatic metabolic systems: the glycolytic pathway of yeast and a realistic-scale representation of the photosynthetic Calvin cycle.