TY - GEN A1 - Barlow, Axel A1 - Hartmann, Stefanie A1 - Gonzalez, Javier A1 - Hofreiter, Michael A1 - Paijmans, Johanna L. A. T1 - Consensify BT - a method for generating pseudohaploid genome sequences from palaeogenomic datasets with reduced error rates T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - A standard practice in palaeogenome analysis is the conversion of mapped short read data into pseudohaploid sequences, frequently by selecting a single high-quality nucleotide at random from the stack of mapped reads. This controls for biases due to differential sequencing coverage, but it does not control for differential rates and types of sequencing error, which are frequently large and variable in datasets obtained from ancient samples. These errors have the potential to distort phylogenetic and population clustering analyses, and to mislead tests of admixture using D statistics. We introduce Consensify, a method for generating pseudohaploid sequences, which controls for biases resulting from differential sequencing coverage while greatly reducing error rates. The error correction is derived directly from the data itself, without the requirement for additional genomic resources or simplifying assumptions such as contemporaneous sampling. For phylogenetic and population clustering analysis, we find that Consensify is less affected by artefacts than methods based on single read sampling. For D statistics, Consensify is more resistant to false positives and appears to be less affected by biases resulting from different laboratory protocols than other frequently used methods. Although Consensify is developed with palaeogenomic data in mind, it is applicable for any low to medium coverage short read datasets. We predict that Consensify will be a useful tool for future studies of palaeogenomes. 
T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1033 KW - palaeogenomics KW - ancient DNA KW - sequencing error KW - error reduction KW - D statistics KW - bioinformatics Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-472521 SN - 1866-8372 IS - 1033 ER - TY - JOUR A1 - Westbury, Michael V. A1 - Baleka, Sina Isabelle A1 - Barlow, Axel A1 - Hartmann, Stefanie A1 - Paijmans, Johanna L. A. A1 - Kramarz, Alejandro A1 - Forasiepi, Analia M. A1 - Bond, Mariano A1 - Gelfo, Javier N. A1 - Reguero, Marcelo A. A1 - Lopez-Mendoza, Patricio A1 - Taglioretti, Matias A1 - Scaglia, Fernando A1 - Rinderknecht, Andres A1 - Jones, Washington A1 - Mena, Francisco A1 - Billet, Guillaume A1 - de Muizon, Christian A1 - Luis Aguilar, Jose A1 - MacPhee, Ross D. E. A1 - Hofreiter, Michael T1 - A mitogenomic timetree for Darwin’s enigmatic South American mammal Macrauchenia patachonica JF - Nature Communications N2 - The unusual mix of morphological traits displayed by extinct South American native ungulates (SANUs) confounded both Charles Darwin, who first discovered them, and Richard Owen, who tried to resolve their relationships. Here we report an almost complete mitochondrial genome for the litoptern Macrauchenia. Our dated phylogenetic tree places Macrauchenia as sister to Perissodactyla, but close to the radiation of major lineages within Laurasiatheria. This position is consistent with a divergence estimate of ~66 Ma (95% credibility interval, 56.64-77.83 Ma) obtained for the split between Macrauchenia and other Panperissodactyla. Combined with their morphological distinctiveness, this evidence supports the positioning of Litopterna (possibly in company with other SANU groups) as a separate order within Laurasiatheria. 
We also show that, when using strict criteria, extinct taxa marked by deep divergence times and a lack of close living relatives may still be amenable to palaeogenomic analysis through iterative mapping against more distant relatives. Y1 - 2017 U6 - https://doi.org/10.1038/ncomms15951 SN - 2041-1723 VL - 8 PB - Nature Publ. Group CY - London ER - TY - GEN A1 - Dennis, Alice B. A1 - Ballesteros, Gabriel I. A1 - Robin, Stéphanie A1 - Schrader, Lukas A1 - Bast, Jens A1 - Berghöfer, Jan A1 - Beukeboom, Leo W. A1 - Belghazi, Maya A1 - Bretaudeau, Anthony A1 - Buellesbach, Jan A1 - Cash, Elizabeth A1 - Colinet, Dominique A1 - Dumas, Zoé A1 - Errbii, Mohammed A1 - Falabella, Patrizia A1 - Gatti, Jean-Luc A1 - Geuverink, Elzemiek A1 - Gibson, Joshua D. A1 - Hertaeg, Corinne A1 - Hartmann, Stefanie A1 - Jacquin-Joly, Emmanuelle A1 - Lammers, Mark A1 - Lavandero, Blas I. A1 - Lindenbaum, Ina A1 - Massardier-Galata, Lauriane A1 - Meslin, Camille A1 - Montagné, Nicolas A1 - Pak, Nina A1 - Poirié, Marylène A1 - Salvia, Rosanna A1 - Smith, Chris R. A1 - Tagu, Denis A1 - Tares, Sophie A1 - Vogel, Heiko A1 - Schwander, Tanja A1 - Simon, Jean-Christophe A1 - Figueroa, Christian C. A1 - Vorburger, Christoph A1 - Legeai, Fabrice A1 - Gadau, Jürgen T1 - Functional insights from the GC-poor genomes of two aphid parasitoids, Aphidius ervi and Lysiphlebus fabarum T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Background Parasitoid wasps have fascinating life cycles and play an important role in trophic networks, yet little is known about their genome content and function. Parasitoids that infect aphids are an important group with the potential for biological control. Their success depends on adapting to develop inside aphids and overcoming both host aphid defenses and their protective endosymbionts. 
Results We present the de novo genome assemblies, detailed annotation, and comparative analysis of two closely related parasitoid wasps that target pest aphids: Aphidius ervi and Lysiphlebus fabarum (Hymenoptera: Braconidae: Aphidiinae). The genomes are small (139 and 141 Mbp) and the most AT-rich reported thus far for any arthropod (GC content: 25.8 and 23.8%). This nucleotide bias is accompanied by skewed codon usage and is stronger in genes with adult-biased expression. AT-richness may be the consequence of reduced genome size, a near absence of DNA methylation, and energy efficiency. We identify missing desaturase genes, whose absence may underlie mimicry in the cuticular hydrocarbon profile of L. fabarum. We highlight key gene groups including those underlying venom composition, chemosensory perception, and sex determination, as well as potential losses in immune pathway genes. Conclusions These findings are of fundamental interest for insect evolution and biological control applications. They provide a strong foundation for further functional studies into coevolution between parasitoids and their hosts. Both genomes are available at https://bipaa.genouest.org. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 989 KW - Parasitoid wasp KW - Aphid host KW - Aphidius ervi KW - GC content KW - de novo genome assembly KW - DNA methylation loss KW - Chemosensory genes KW - Toll and Imd pathways KW - Venom proteins KW - Lysiphlebus fabarum Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-476129 SN - 1866-8372 IS - 989 ER - TY - GEN A1 - Paraskevopoulou, Sofia A1 - Dennis, Alice B. 
A1 - Weithoff, Guntram A1 - Hartmann, Stefanie A1 - Tiedemann, Ralph T1 - Within species expressed genetic variability and gene expression response to different temperatures in the rotifer Brachionus calyciflorus sensu stricto T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Genetic divergence is impacted by many factors, including phylogenetic history, gene flow, genetic drift, and divergent selection. Rotifers are an important component of aquatic ecosystems, and genetic variation is essential to their ongoing adaptive diversification and local adaptation. In addition to coding sequence divergence, variation in gene expression may relate to variable heat tolerance, and can impose ecological barriers within species. Temperature plays a significant role in aquatic ecosystems by affecting species abundance, spatio-temporal distribution, and habitat colonization. Recently described (formerly cryptic) species of the Brachionus calyciflorus complex exhibit different temperature tolerance both in natural and in laboratory studies, and show that B. calyciflorus sensu stricto (s.s.) is a thermotolerant species. Even within B. calyciflorus s.s., there is a tendency for further temperature specializations. Comparison of expressed genes allows us to assess the impact of stressors on both expression and sequence divergence among disparate populations within a single species. Here, we have used RNA-seq to explore expressed genetic diversity in B. calyciflorus s.s. in two mitochondrial DNA lineages with different phylogenetic histories and differences in thermotolerance. We identify a suite of candidate genes that may underlie local adaptation, with a particular focus on the response to sustained high or low temperatures. We do not find adaptive divergence in established candidate genes for thermal adaptation. 
Rather, we detect divergent selection among our two lineages in genes related to metabolism (lipid metabolism, metabolism of xenobiotics). T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 796 Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-441050 SN - 1866-8372 IS - 796 ER - TY - JOUR A1 - Autenrieth, Marijke A1 - Hartmann, Stefanie A1 - Lah, Ljerka A1 - Roos, Anna A1 - Dennis, Alice B. A1 - Tiedemann, Ralph T1 - High-quality whole-genome sequence of an abundant Holarctic odontocete, the harbour porpoise (Phocoena phocoena) JF - Molecular ecology resources N2 - The harbour porpoise (Phocoena phocoena) is a highly mobile cetacean found across the Northern hemisphere. It occurs in coastal waters and inhabits basins that vary broadly in salinity, temperature and food availability. These diverse habitats could drive subtle differentiation among populations, but examination of this would be best conducted with a robust reference genome. Here, we report the first harbour porpoise genome, assembled de novo from an individual originating in the Kattegat Sea (Sweden). The genome is one of the most complete cetacean genomes currently available, with a total size of 2.39 Gb and 50% of the total length found in just 34 scaffolds. Using 122 of the longest scaffolds, we were able to show high levels of synteny with the genome of the domestic cattle (Bos taurus). Our draft annotation comprises 22,154 predicted genes, which we further annotated through matches to the NCBI nucleotide database, GO categorization and motif prediction. Within the predicted genes, we have confirmed the presence of >20 genes or gene families that have been associated with adaptive evolution in other cetaceans. Overall, this genome assembly and draft annotation represent a crucial addition to the genomic resources currently available for the study of porpoises and Phocoenidae evolution, phylogeny and conservation. 
KW - cetaceans KW - genomics/proteomics KW - mammals KW - molecular evolution Y1 - 2018 U6 - https://doi.org/10.1111/1755-0998.12932 SN - 1755-098X SN - 1755-0998 VL - 18 IS - 6 SP - 1469 EP - 1481 PB - Wiley CY - Hoboken ER - TY - JOUR A1 - Paraskevopoulou, Sofia A1 - Dennis, Alice B. A1 - Weithoff, Guntram A1 - Hartmann, Stefanie A1 - Tiedemann, Ralph T1 - Within species expressed genetic variability and gene expression response to different temperatures in the rotifer Brachionus calyciflorus sensu stricto JF - PLoS ONE N2 - Genetic divergence is impacted by many factors, including phylogenetic history, gene flow, genetic drift, and divergent selection. Rotifers are an important component of aquatic ecosystems, and genetic variation is essential to their ongoing adaptive diversification and local adaptation. In addition to coding sequence divergence, variation in gene expression may relate to variable heat tolerance, and can impose ecological barriers within species. Temperature plays a significant role in aquatic ecosystems by affecting species abundance, spatio-temporal distribution, and habitat colonization. Recently described (formerly cryptic) species of the Brachionus calyciflorus complex exhibit different temperature tolerance both in natural and in laboratory studies, and show that B. calyciflorus sensu stricto (s.s.) is a thermotolerant species. Even within B. calyciflorus s.s., there is a tendency for further temperature specializations. Comparison of expressed genes allows us to assess the impact of stressors on both expression and sequence divergence among disparate populations within a single species. Here, we have used RNA-seq to explore expressed genetic diversity in B. calyciflorus s.s. in two mitochondrial DNA lineages with different phylogenetic histories and differences in thermotolerance. We identify a suite of candidate genes that may underlie local adaptation, with a particular focus on the response to sustained high or low temperatures. 
We do not find adaptive divergence in established candidate genes for thermal adaptation. Rather, we detect divergent selection among our two lineages in genes related to metabolism (lipid metabolism, metabolism of xenobiotics). Y1 - 2019 U6 - https://doi.org/10.1371/journal.pone.0223134 SN - 1932-6203 VL - 14 IS - 9 PB - PLoS ONE CY - San Francisco, California ER - TY - JOUR A1 - Hartmann, Stefanie A1 - Preick, Michaela A1 - Abelt, Silke A1 - Scheffel, André A1 - Hofreiter, Michael T1 - Annotated genome sequences of the carnivorous plant Roridula gorgonias and a non-carnivorous relative, Clethra arborea JF - BMC Research Notes N2 - Objective Plant carnivory is distributed across the tree of life and has evolved at least six times independently, but sequenced and annotated nuclear genomes of carnivorous plants are currently lacking. We have sequenced and structurally annotated the nuclear genome of the carnivorous Roridula gorgonias and that of a non-carnivorous relative, Madeira’s lily-of-the-valley-tree, Clethra arborea, both within the Ericales. This data adds an important resource to study the evolutionary genetics of plant carnivory across angiosperm lineages and also for functional and systematic aspects of plants within the Ericales. Results Our assemblies have total lengths of 284 Mbp (R. gorgonias) and 511 Mbp (C. arborea) and show high BUSCO scores of 84.2% and 89.5%, respectively. We used their predicted genes together with publicly available data from other Ericales’ genomes and transcriptomes to assemble a phylogenomic data set for the inference of a species tree. However, groups of orthologs showed a marked absence of species represented by a transcriptome. We discuss possible reasons and caution against combining predicted genes from genome- and transcriptome-based assemblies. 
KW - Carnivorous plant KW - Roridula gorgonias KW - Clethra arborea KW - Genome assembly KW - Transcriptome assembly KW - Phylogenomics KW - Orthologous Matrix (OMA) Project Y1 - 2020 U6 - https://doi.org/10.1186/s13104-020-05254-4 SN - 1756-0500 VL - 13 PB - Biomed Central CY - London ER - TY - JOUR A1 - Barlow, Axel A1 - Hartmann, Stefanie A1 - Gonzalez, Javier A1 - Hofreiter, Michael A1 - Paijmans, Johanna L. A. T1 - Consensify BT - a method for generating pseudohaploid genome sequences from palaeogenomic datasets with reduced error rates JF - Genes / Molecular Diversity Preservation International N2 - A standard practice in palaeogenome analysis is the conversion of mapped short read data into pseudohaploid sequences, frequently by selecting a single high-quality nucleotide at random from the stack of mapped reads. This controls for biases due to differential sequencing coverage, but it does not control for differential rates and types of sequencing error, which are frequently large and variable in datasets obtained from ancient samples. These errors have the potential to distort phylogenetic and population clustering analyses, and to mislead tests of admixture using D statistics. We introduce Consensify, a method for generating pseudohaploid sequences, which controls for biases resulting from differential sequencing coverage while greatly reducing error rates. The error correction is derived directly from the data itself, without the requirement for additional genomic resources or simplifying assumptions such as contemporaneous sampling. For phylogenetic and population clustering analysis, we find that Consensify is less affected by artefacts than methods based on single read sampling. For D statistics, Consensify is more resistant to false positives and appears to be less affected by biases resulting from different laboratory protocols than other frequently used methods. 
Although Consensify is developed with palaeogenomic data in mind, it is applicable for any low to medium coverage short read datasets. We predict that Consensify will be a useful tool for future studies of palaeogenomes. KW - palaeogenomics KW - ancient DNA KW - sequencing error KW - error reduction KW - D statistics KW - bioinformatics Y1 - 2020 U6 - https://doi.org/10.3390/genes11010050 SN - 2073-4425 VL - 11 IS - 1 PB - MDPI CY - Basel ER - TY - GEN A1 - Zulawski, Monika A1 - Schulze, Gunnar A1 - Braginets, Rostyslav A1 - Hartmann, Stefanie A1 - Schulze, Waltraud X T1 - The Arabidopsis Kinome BT - phylogeny and evolutionary insights into functional diversification T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Background Protein kinases constitute a particularly large protein family in Arabidopsis with important functions in cellular signal transduction networks. At the same time Arabidopsis is a model plant with high frequencies of gene duplications. Here, we have conducted a systematic analysis of the Arabidopsis kinase complement, the kinome, with particular focus on gene duplication events. We matched Arabidopsis proteins to a Hidden-Markov Model of eukaryotic kinases and computed a phylogeny of 942 Arabidopsis protein kinase domains and mapped their origin by gene duplication. Results The phylogeny showed two major clades of receptor kinases and soluble kinases, each of which was divided into functional subclades. Based on this phylogeny, association of yet uncharacterized kinases to families was possible which extended functional annotation of unknowns. Classification of gene duplications within these protein kinases revealed that representatives of cytosolic subfamilies showed a tendency to maintain segmentally duplicated genes, while some subfamilies of the receptor kinases were enriched for tandem duplicates. 
Although functional diversification is observed throughout most subfamilies, some instances of functional conservation among genes transposed from the same ancestor were observed. In general, a significant enrichment of essential genes was found among genes encoding protein kinases. Conclusions The inferred phylogeny allowed classification and annotation of yet uncharacterized kinases. The prediction and analysis of syntenic blocks and duplication events within gene families of interest can be used to link functional biology to insights from an evolutionary viewpoint. The approach undertaken here can be applied to any gene family in any organism with an annotated genome. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 861 KW - Hidden Markov Model KW - Duplication Event KW - Kinase Family KW - Tandem Duplication KW - Segmental Duplication Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-432907 SN - 1866-8372 IS - 861 ER - TY - GEN A1 - Hartmann, Stefanie A1 - Vision, Todd J. T1 - Using ESTs for phylogenomics BT - can one accurately infer a phylogenetic tree from a gappy alignment? T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Background While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. 
We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets. Results We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences. Conclusion These results demonstrate that partial gene sequences and gappy multiple sequence alignments can pose a major problem for phylogenetic analysis. The concern will be greatest for high-throughput phylogenomic analyses, in which Neighbor Joining is often the preferred method due to its computational efficiency. Both approaches can be used to increase the accuracy of phylogenetic inference from a gappy alignment. 
The choice between the two approaches will depend upon how robust the application is to the loss of sequences from the input set, with alignment masking generally giving a much greater improvement in accuracy but at the cost of discarding a larger number of the input sequences. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 889 KW - Maximum Parsimony KW - pairwise distance KW - phylogenetic inference KW - alignment error KW - Maximum Parsimony tree Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-436670 SN - 1866-8372 IS - 889 ER -