The St. Nicolas House Algorithm (SNHA) finds association chains of directly dependent variables in a data set. Dependency is measured by the correlation coefficient, and the resulting associations are visualized as an undirected graph. A bootstrap routine improves the network prediction and enables the computation of empirical p-values, which are used to evaluate the significance of the predicted edges. Synthetic data generated with the Monte Carlo method were used, first, to compare the Python package with the original R package and, second, to evaluate the predicted networks using sensitivity, specificity, balanced classification rate, and the Matthews correlation coefficient (MCC). The Python implementation yields the same results as the R package, indicating that the algorithm was ported correctly. The SNHA achieves high specificity for all tested graphs. For graphs with high edge densities, the other evaluation metrics decrease because of lower sensitivity, which could be partially improved by bootstrapping, whereas for graphs with low edge densities the algorithm achieves high evaluation scores. The empirical p-values indicated that the predicted edges are indeed significant.
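The four evaluation metrics named in this abstract can be computed from the overlap between a known and a predicted undirected edge set. The following is a minimal sketch, not the SNHA package's own evaluation code; the function name `edge_metrics` and its arguments are hypothetical and chosen only for illustration.

```python
import math


def edge_metrics(true_edges, predicted_edges, n_nodes):
    """Score a predicted undirected graph against a known one.

    Hypothetical helper for illustration; edges are (u, v) pairs and
    direction is ignored by storing each edge as a frozenset.
    """
    true_set = {frozenset(edge) for edge in true_edges}
    pred_set = {frozenset(edge) for edge in predicted_edges}

    # Every unordered node pair is either an edge or a non-edge.
    n_pairs = n_nodes * (n_nodes - 1) // 2

    tp = len(true_set & pred_set)   # edges found correctly
    fp = len(pred_set - true_set)   # edges predicted but not present
    fn = len(true_set - pred_set)   # edges present but missed
    tn = n_pairs - tp - fp - fn     # non-edges correctly left out

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    bcr = (sensitivity + specificity) / 2

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "bcr": bcr,
        "mcc": mcc,
    }


if __name__ == "__main__":
    truth = [(0, 1), (1, 2), (2, 3)]
    prediction = [(0, 1), (1, 2), (0, 3)]
    print(edge_metrics(truth, prediction, n_nodes=4))
```

In sparse graphs most node pairs are true negatives, so specificity alone can look high even when many edges are missed; this is one reason to report the balanced classification rate and the MCC alongside it.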
Background: We investigated the height of Norwegian conscripts in view of the hypothesis of a "community effect on height", using autocorrelation analysis of district heights within a time span of 20 years at the end of the 19th century and correlations between neighboring districts at this time. Material and methods: After digitizing available body height data of Norwegian draftees in 1877-1878, 1880 (averaged as 1878), and 1895-1897 (averaged as 1896), we calculated the magnitude of autocorrelation of body height within the same municipality at different time points. Furthermore, we generated three different neighborhood networks: (1) a network based on Euclidean distances, (2) a minimum spanning tree built on those distances, and (3) a network founded on real-world road connections. The networks were used to determine the correlation between body height of neighboring districts depending on the number of edges required to connect two municipalities. Results: The autocorrelation value for body heights was around r = 0.5 (for all p < 0.001) in the years 1878 and 1896. In the Euclidean distance based network, the correlation between neighboring districts varied between approximately 0.47 and 0.27 for both years, descending in order from nearest (0-50 km) to farthest (150-200 km, for all p < 0.001). The correlation between first-order neighbors in the minimum spanning tree network was 0.36 in 1878 and 0.42 in 1896 (for all p < 0.001). The values of neighbor correlation in the road connection based network ranged in 1878 from 0.42 (first-order neighbors) to 0.17 (fourth-order neighbors, for all p < 0.01) and in 1896 from 0.46 (first-order neighbors) to 0.12 (fourth-order neighbors, for all p < 0.05).
Conclusion: This initial study of Norwegian conscript height data from the 19th century showed significant medium-sized effects for the within-district autocorrelation between 1878 and 1896 as well as medium neighborhood correlation, slightly lower than in a recent study of Swiss conscripts. Digitizing more data from other years in this and later time spans, as well as using older road and ship connections instead of the present-day road data, might stabilize and improve those findings.
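The neighbor-order correlation described in this abstract, where districts are paired according to the number of edges needed to connect them, can be sketched as follows. This is an illustrative reconstruction under stated assumptions and not the authors' code: the names `hop_distances`, `pearson`, and `neighbor_correlation` are hypothetical, the network is assumed to be given as an adjacency dictionary, and each district pair is counted in both directions so that the resulting coefficient is symmetric.

```python
import math
from collections import deque


def hop_distances(adjacency, start):
    """Breadth-first search: number of edges from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pearson(xs, ys):
    """Pearson correlation coefficient; returns NaN if either side is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def neighbor_correlation(adjacency, height, order):
    """Correlate heights of all district pairs exactly `order` edges apart.

    Assumption for this sketch: each pair enters twice, as (a, b) and (b, a).
    """
    xs, ys = [], []
    for a in adjacency:
        for b, d in hop_distances(adjacency, a).items():
            if d == order:
                xs.append(height[a])
                ys.append(height[b])
    return pearson(xs, ys)


if __name__ == "__main__":
    # Toy chain of four districts with made-up mean heights (not study data).
    roads = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    heights = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
    print(neighbor_correlation(roads, heights, order=1))
```

The same function can be applied to any of the three network definitions (distance threshold, minimum spanning tree, road connections) by changing only the adjacency dictionary.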
Background: We investigated average body height in the central provinces of the Russian Empire in the middle of the 19th century in view of the concept of "community effects on height". We analyzed body height correlations between neighboring districts at this time and added information about secular changes in body height in this territory during the 19th century. Material and methods: The study used height data of conscripts who were born in the years 1853-1863 and were aged 21 at the time of measurement. The territory of seven provinces was considered as a network with 105 nodes, each node representing one district with information on average male body height. Three different approaches were used to define neighboring districts: the "common borders" method, Euclidean distances (from 60 to 120 km), and real road connections. Results: Small but significant correlation coefficients were observed between 1st order districts in the network based on a Euclidean distance of 100 km (r = 0.256, p-value = 0.006) and in the network based on the "common borders" approach (r = 0.25, p-value = 0.02). No significant correlations were observed in the network based on road connections, nor between second order neighbors regardless of the method. Conclusion: Height correlation coefficients between 1st order neighboring districts observed in the Russian districts were very similar to values observed in the Polish study (r = 0.24). The considered Russian territory and the territory of Poland have a lot in common: both consist of plains without mountains. In contrast to Poland, the transport infrastructure in Russia was weakly developed in the middle of the 19th century. In addition, the mobility of people was limited by serfdom. In this context, the absence of significant correlation between second order neighbors can be explained by low population density and a lack of migration and communication between the districts.
Land-use intensification is a key driver of biodiversity change. However, little is known about how it alters relationships between the diversities of different taxonomic groups, which are often correlated due to shared environmental drivers and trophic interactions. Using data from 150 grassland sites, we examined how land-use intensification (increased fertilization, higher livestock densities, and increased mowing frequency) altered correlations between the species richness of 15 plant, invertebrate, and vertebrate taxa. We found that 54% of pairwise correlations between taxonomic groups were significant and positive among all grasslands, while only one was negative. Higher land-use intensity substantially weakened these correlations (35% decrease in r and 43% fewer significant pairwise correlations at high intensity), a pattern which may emerge as a result of biodiversity declines and the breakdown of specialized relationships in these conditions. Nevertheless, some groups (Coleoptera, Heteroptera, Hymenoptera and Orthoptera) were consistently correlated with multidiversity, an aggregate measure of total biodiversity composed of the standardized diversities of multiple taxa, at both high and low land-use intensity. The form of intensification was also important; increased fertilization and mowing frequency typically weakened plant-plant and plant-primary consumer correlations, whereas grazing intensification did not. This may reflect decreased habitat heterogeneity under mowing and fertilization and increased habitat heterogeneity under grazing. While these results urge caution in using certain taxonomic groups to monitor impacts of agricultural management on biodiversity, they also suggest that the diversities of some groups are reasonably robust indicators of total biodiversity across a range of conditions.
The past decades have been characterized by various efforts to provide complete genome sequence information for a range of organisms. The availability of full genome data triggered the development of multiplex high-throughput assays allowing simultaneous measurement of transcripts, proteins and metabolites. With genome information and profiling technologies now in hand, a highly parallel experimental biology offers opportunities to explore and discover novel principles governing biological systems. Understanding biological complexity through modelling cellular systems is the driving force that today allows a shift from a component-centric focus to integrative, systems-level investigations. The emerging field of systems biology integrates discovery-driven and hypothesis-driven science to provide comprehensive knowledge via computational models of biological systems. Within the context of evolving systems biology, large-scale computational analyses of transcript co-response data were carried out for selected prokaryotic and plant model organisms. CSB.DB - a comprehensive systems-biology database - (http://csbdb.mpimp-golm.mpg.de/) was initiated to provide public and open access to the results of biostatistical analyses in conjunction with additional biological knowledge. The database tool CSB.DB enables potential users to infer hypotheses about the functional interrelation of genes of interest and may serve as a future basis for more sophisticated means of elucidating gene function. The co-response concept and the CSB.DB database tool were successfully applied to predict operons in Escherichia coli by using chromosomal distance and transcriptional co-responses. Moreover, examples were shown which indicate that transcriptional co-response analysis allows identification of differential promoter activities under different experimental conditions.
The co-response concept was successfully transferred to complex organisms, with a focus on the eukaryotic plant model organism Arabidopsis thaliana. These investigations enabled the discovery of novel genes involved in particular physiological processes and, beyond that, allowed the annotation of gene functions that cannot be accessed by sequence homology. GMD - the Golm Metabolome Database - was initiated and implemented in CSB.DB to integrate metabolite information and metabolite profiles. This novel module will allow complex biological questions about transcriptional interrelation to be addressed and will extend the recent systems-level quest towards phenotyping.