The genome can be considered the blueprint for an organism. Composed of DNA, it harbours all organism-specific instructions for the synthesis of all structural components and their associated functions. The role of carrying actual molecular structure and function was long believed to be assumed exclusively by proteins encoded in particular segments of the genome, the genes. In the process of converting the information stored in genes into functional proteins, RNA, a third major class of molecule, was discovered early on to act as a messenger by copying the genomic information and relaying it to the protein-synthesizing machinery. Furthermore, RNA molecules were found to assist in the assembly of amino acids into native proteins. For a long time, these rather passive roles were thought to be the sole purpose of RNA. However, in recent years, new discoveries have led to a radical revision of this view. First, RNA molecules with catalytic functions, previously thought to be the exclusive domain of proteins, were discovered. Then, scientists realized that much more of the genomic sequence is transcribed into RNA molecules than there are proteins in cells, raising the question of what the functions of all these molecules are. Furthermore, very short and altogether new types of RNA molecules that seemingly play a critical role in orchestrating cellular processes were discovered. Thus, RNA has become a central research topic in molecular biology, even to the extent that some researchers dub cells “RNA machines”. This thesis aims to contribute to our understanding of RNA-related phenomena by applying bioinformatics methods. First, we performed a genome-wide screen to identify sites at which the chemical composition of DNA (the genotype) critically influences phenotypic traits (the phenotype) of the model plant Arabidopsis thaliana. Whole-genome hybridisation arrays were used, and an informatics strategy was developed, to identify polymorphic sites from hybridisation to genomic DNA.
Following this approach, not only were genotype-phenotype associations discovered across the entire Arabidopsis genome, but regions not currently known to encode proteins were also identified, representing candidate sites for novel functional RNA molecules. By statistically associating these regions with phenotypic traits, clues as to their particular functions were obtained. Furthermore, the candidate regions were subjected to a novel method for predicting the functional class of RNA molecules, developed as part of this thesis. While determining the chemical structure (the sequence) of candidate RNA molecules is relatively straightforward, elucidating their structure-function relationships is much more challenging. To this end, we devised and implemented a novel algorithmic approach to predict the structural and, thereby, functional class of RNA molecules. In this algorithm, the concept of treating RNA molecule structures as graphs was introduced. We demonstrate that this abstraction of the actual structure leads to meaningful results that may greatly assist in the characterization of novel RNA molecules. Furthermore, by using graph-theoretic properties as descriptors of structure, we identified particular structural features of RNA molecules that may determine their function, thus providing new insights into the structure-function relationships of RNA. The method (termed Grapple) has been made available to the scientific community as a web-based service. RNA has taken centre stage in molecular biology research, and novel discoveries can be expected to further solidify the central role of RNA in the origin and support of life on Earth. As illustrated by this thesis, bioinformatics methods will continue to play an essential role in these discoveries.
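The idea of treating an RNA structure as a graph and describing it by graph-theoretic properties can be illustrated with a minimal sketch. The abstract does not specify Grapple's actual graph representation or descriptors, so everything below is an assumption made for illustration: the input is taken to be a secondary structure in dot-bracket notation, each nucleotide becomes a node, backbone neighbours and base pairs become edges, and the function names `rna_graph` and `graph_descriptors` are hypothetical.

```python
from collections import defaultdict


def rna_graph(structure):
    """Build a graph from a dot-bracket string (illustrative assumption).

    Nodes are nucleotide positions; edges join backbone neighbours and
    paired bases. Returns (adjacency map, list of base pairs).
    """
    adjacency = defaultdict(set)
    pairs = []
    open_positions = []  # stack of unmatched '(' positions
    for i, symbol in enumerate(structure):
        adjacency[i]  # make sure every nucleotide exists as a node
        if i > 0:  # backbone edge to the previous nucleotide
            adjacency[i].add(i - 1)
            adjacency[i - 1].add(i)
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()
            pairs.append((j, i))
            adjacency[i].add(j)
            adjacency[j].add(i)
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return dict(adjacency), pairs


def graph_descriptors(adjacency, pairs):
    """Compute a few simple graph-theoretic descriptors of the structure."""
    n_nodes = len(adjacency)
    n_edges = sum(len(nb) for nb in adjacency.values()) // 2
    degrees = [len(nb) for nb in adjacency.values()]
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "base_pairs": len(pairs),
        "max_degree": max(degrees, default=0),
        "mean_degree": 2 * n_edges / n_nodes if n_nodes else 0.0,
        # Cyclomatic number of a connected graph: edges - nodes + 1.
        "independent_cycles": n_edges - n_nodes + 1 if n_nodes else 0,
    }


if __name__ == "__main__":
    adjacency, pairs = rna_graph("((...))")  # a small hairpin
    print(graph_descriptors(adjacency, pairs))
```

Descriptor vectors of this kind could, in principle, serve as input features for a classifier such as the support vector machine listed among the thesis keywords; whether Grapple uses these particular features is not stated here.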
Background: Natural accessions of Arabidopsis thaliana are characterized by a high level of phenotypic variation that can be used to investigate the extent and mode of selection on primary metabolic traits. A collection of 54 A. thaliana natural accession-derived lines was subjected to deep genotyping through Single Feature Polymorphism (SFP) detection via genomic DNA hybridization to Arabidopsis Tiling 1.0 Arrays, with the aim of detecting selective sweeps and identifying associations between sweep regions and growth-related metabolic traits. Results: A total of 1,072,557 high-quality SFPs were detected, and indications for 3,943 deletions and 1,007 duplications were obtained. A significantly lower than expected SFP frequency was observed in protein-, rRNA-, and tRNA-coding regions and in non-repetitive intergenic regions, while pseudogenes, transposons, and non-coding RNA genes were enriched in SFPs. Gene families involved in plant defence or in signalling were identified as highly polymorphic, while several other families, including transcription factors, were depleted of SFPs. A total of 198 significant associations between metabolic genes and 9 metabolic and growth-related phenotypic traits were detected, with annotation hinting at the nature of the relationship. Five significant selective sweep regions were also detected, one of which was significantly associated with a metabolic trait. Conclusions: We generated a high-density polymorphism map for 54 A. thaliana accessions that highlights the variability of resistance genes across geographic ranges, and used it to identify selective sweeps and associations between metabolic genes and metabolic phenotypes. Several associations show a clear biological relationship, while many others require further investigation.