Over the last decade, the modeling and storage of biological data have been topics of wide interest for scientists working in biological and biomedical research. Currently, most data are still stored in text files, which leads to data redundancy and file chaos. In this paper we show how to use relational modeling techniques and relational database technology to model and store biological sequence data, i.e. data maintained in collections such as EMBL or SWISS-PROT, to better serve the needs of these application domains. To this end, we propose a two-step approach. First, we model the structure (and therefore the meaning) of the data using an Entity-Relationship (ER) approach. Second, we derive from the ER model a clean relational database schema for storing and retrieving the DNA and protein data extracted from various sources. Our approach provides a clean basis for building complex biological applications that are more amenable to changes and software ports than their file-based counterparts.
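To make the step from an ER model to a relational schema more concrete, the following is a minimal sketch using Python's built-in sqlite3 module. It is not the schema from the paper: the entities (source_db, sequence_entry, feature), their columns, the helper functions, and the accession 'X00001' with its residues are all hypothetical, chosen only to illustrate how entities become tables and relationships become foreign keys.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative
# assumption, not the design described in the abstract above.
SCHEMA = """
CREATE TABLE source_db (
    source_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE sequence_entry (
    entry_id  INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES source_db(source_id),
    accession TEXT NOT NULL,
    molecule  TEXT NOT NULL CHECK (molecule IN ('DNA', 'protein')),
    residues  TEXT NOT NULL,
    UNIQUE (source_id, accession)
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    entry_id   INTEGER NOT NULL REFERENCES sequence_entry(entry_id),
    kind       TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL,
    CHECK (start_pos >= 1 AND end_pos >= start_pos)
);
"""


def build_demo_db():
    """Create an in-memory database and load one invented demo record."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO source_db (source_id, name) VALUES (1, 'EMBL')")
    conn.execute(
        "INSERT INTO sequence_entry "
        "(entry_id, source_id, accession, molecule, residues) "
        "VALUES (1, 1, 'X00001', 'DNA', 'ATGGCCTAA')"  # invented demo data
    )
    conn.execute(
        "INSERT INTO feature (entry_id, kind, start_pos, end_pos) "
        "VALUES (1, 'CDS', 1, 9)"
    )
    conn.commit()
    return conn


def features_for(conn, accession):
    """Return (kind, subsequence) pairs for one accession, ordered by start."""
    return conn.execute(
        """
        SELECT f.kind,
               substr(e.residues, f.start_pos, f.end_pos - f.start_pos + 1)
        FROM feature AS f
        JOIN sequence_entry AS e ON e.entry_id = f.entry_id
        WHERE e.accession = ?
        ORDER BY f.start_pos
        """,
        (accession,),
    ).fetchall()
```

Calling features_for(build_demo_db(), "X00001") joins the two tables and slices the stored sequence by the feature coordinates. Because each entry is stored once and referenced by key, the redundancy that arises from copying records between text files is avoided by construction, and constraints such as the foreign key on source_id reject entries that point at an unknown source.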
Coccolithophores have influenced the global climate for over 200 million years(1). These marine phytoplankton can account for 20 per cent of total carbon fixation in some systems(2). They form blooms that can occupy hundreds of thousands of square kilometres and are distinguished by their elegantly sculpted calcium carbonate exoskeletons (coccoliths), rendering them visible from space(3). Although coccolithophores export carbon in the form of organic matter and calcite to the sea floor, they also release CO2 in the calcification process. Hence, they have a complex influence on the carbon cycle, driving either CO2 production or uptake, sequestration and export to the deep ocean(4). Here we report the first haptophyte reference genome, from the coccolithophore Emiliania huxleyi strain CCMP1516, and sequences from 13 additional isolates. Our analyses reveal a pan genome (core genes plus genes distributed variably between strains) probably supported by an atypical complement of repetitive sequence in the genome. Comparisons across strains demonstrate that E. huxleyi, which has long been considered a single species, harbours extensive genome variability reflected in different metabolic repertoires. Genome variability within this species complex seems to underpin its capacity both to thrive in habitats ranging from the equator to the subarctic and to form large-scale episodic blooms under a wide variety of environmental conditions.