TY - JOUR A1 - Cheng, Fuxia A1 - Hartmann, Stefanie A1 - Gupta, Mayetri A1 - Ibrahim, Joseph G. A1 - Vision, Todd J. T1 - A hierarchical model for incomplete alignments in phylogenetic inference N2 - Motivation: Full-length DNA and protein sequences that span the entire length of a gene are ideally used for multiple sequence alignments (MSAs) and the subsequent inference of their relationships. Frequently, however, MSAs contain a substantial amount of missing data. For example, expressed sequence tags (ESTs), which are partial sequences of expressed genes, are the predominant source of sequence data for many organisms. The patterns of missing data typical for EST-derived alignments greatly compromise the accuracy of estimated phylogenies. Results: We present a statistical method for inferring phylogenetic trees from EST-based incomplete MSA data. We propose a class of hierarchical models for modeling pairwise distances between the sequences, and develop a fully Bayesian approach for estimation of the model parameters. Once the distance matrix is estimated, the phylogenetic tree may be constructed by applying neighbor-joining (or any other algorithm of choice). We also show that maximizing the marginal likelihood from the Bayesian approach yields results similar to those of profile likelihood estimation. The proposed methods are illustrated using simulated protein families, for which the true phylogeny is known, and one real protein family. Y1 - 2009 UR - http://bioinformatics.oxfordjournals.org/ U6 - https://doi.org/10.1093/bioinformatics/btp015 SN - 1367-4803 ER - TY - JOUR A1 - Schröder, Christiane A1 - Bleidorn, Christoph A1 - Hartmann, Stefanie A1 - Tiedemann, Ralph T1 - Occurrence of Can-SINEs and intron sequence evolution supports robust phylogeny of pinniped carnivores and their terrestrial relatives N2 - Investigating the dog genome, we found 178965 introns with a moderate length of 200-1000 bp. 
A screening of these sequences against 23 different repeat libraries to find insertions of short interspersed elements (SINEs) detected 45276 SINEs. Virtually all of these SINEs (98%) belong to the tRNA-derived Can-SINE family. Can-SINEs arose about 55 million years ago before Carnivora split into two basal groups, the Caniformia (dog-like carnivores) and the Feliformia (cat-like carnivores). Genome comparisons of dog and cat recovered 506 putatively informative SINE loci for caniformian phylogeny. In this study we show how to use such genome information of model organisms to research the phylogeny of related non-model species of interest. Investigating a dataset including representatives of all major caniformian lineages, we analysed 24 randomly chosen loci for 22 taxa. All loci were amplifiable and revealed 17 parsimony-informative SINE insertions. The screening for informative SINE insertions yields a large amount of sequence information, in particular of introns, which contain reliable phylogenetic information as well. A phylogenetic analysis of intron- and SINE sequence data provided a statistically robust phylogeny which is congruent with the absence/presence pattern of our SINE markers. This phylogeny strongly supports a sistergroup relationship of Musteloidea and Pinnipedia. Within Pinnipedia, we see strong support from bootstrapping and the presence of a SINE insertion for a sistergroup relationship of the walrus with the Otariidae. Y1 - 2009 UR - http://www.sciencedirect.com/science/journal/03781119 U6 - https://doi.org/10.1016/j.gene.2009.06.012 SN - 0378-1119 ER - TY - JOUR A1 - Struck, Torsten H. 
A1 - Paul, Christiane A1 - Hill, Natascha A1 - Hartmann, Stefanie A1 - Hoesel, Christoph A1 - Kube, Michael A1 - Lieb, Bernhard A1 - Meyer, Achim A1 - Tiedemann, Ralph A1 - Purschke, Guenter A1 - Bleidorn, Christoph T1 - Phylogenomic analyses unravel annelid evolution JF - Nature : the international weekly journal of science N2 - Annelida, the ringed worms, is a highly diverse animal phylum that includes more than 15,000 described species and constitutes the dominant benthic macrofauna from the intertidal zone down to the deep sea. A robust annelid phylogeny would shape our understanding of animal body-plan evolution and shed light on the bilaterian ground pattern. Traditionally, Annelida has been split into two major groups: Clitellata (earthworms and leeches) and polychaetes (bristle worms), but recent evidence suggests that other taxa that were once considered to be separate phyla (Sipuncula, Echiura and Siboglinidae (also known as Pogonophora)) should be included in Annelida(1-4). However, the deep-level evolutionary relationships of Annelida are still poorly understood, and a robust reconstruction of annelid evolutionary history is needed. Here we show that phylogenomic analyses of 34 annelid taxa, using 47,953 amino acid positions, recovered a well-supported phylogeny with strong support for major splits. Our results recover chaetopterids, myzostomids and sipunculids in the basal part of the tree, although the position of Myzostomida remains uncertain owing to its long branch. The remaining taxa are split into two clades: Errantia (which includes the model annelid Platynereis), and Sedentaria (which includes Clitellata). Ancestral character trait reconstructions indicate that these clades show adaptation to either an errant or a sedentary lifestyle, with alteration of accompanying morphological traits such as peristaltic movement, parapodia and sensory perception. Finally, life history characters in Annelida seem to be phylogenetically informative. 
Y1 - 2011 U6 - https://doi.org/10.1038/nature09864 SN - 0028-0836 VL - 471 IS - 7336 SP - 95 EP - 98 PB - Nature Publ. Group CY - London ER - TY - JOUR A1 - Burleigh, J. Gordon A1 - Bansal, Mukul S. A1 - Eulenstein, Oliver A1 - Hartmann, Stefanie A1 - Wehe, Andre A1 - Vision, Todd J. T1 - Genome-Scale Phylogenetics: inferring the plant tree of life from 18,896 gene trees JF - Systematic biology N2 - Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life. 
KW - Gene tree-species tree reconciliation KW - gene tree parsimony KW - plant phylogeny KW - phylogenomics Y1 - 2011 U6 - https://doi.org/10.1093/sysbio/syq072 SN - 1063-5157 VL - 60 IS - 2 SP - 117 EP - 125 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - Bonizzoni, Mariangela A1 - Bourjea, Jerome A1 - Chen, Bin A1 - Crain, B. J. A1 - Cui, Liwang A1 - Fiorentino, V. A1 - Hartmann, Stefanie A1 - Hendricks, S. A1 - Ketmaier, Valerio A1 - Ma, Xiaoguang A1 - Muths, Delphine A1 - Pavesi, Laura A1 - Pfautsch, Simone A1 - Rieger, M. A. A1 - Santonastaso, T. A1 - Sattabongkot, Jetsumon A1 - Taron, C. H. A1 - Taron, D. J. A1 - Tiedemann, Ralph A1 - Yan, Guiyun A1 - Zheng, Bin A1 - Zhong, Daibin T1 - Permanent genetic resources added to molecular ecology resources database 1 April 2011-31 May 2011 JF - Molecular ecology resources N2 - This article documents the addition of 92 microsatellite marker loci to the Molecular Ecology Resources Database. Loci were developed for the following species: Anopheles minimus, An. sinensis, An. dirus, Calephelis mutica, Lutjanus kasmira, Murella muralis and Orchestia montagui. These loci were cross-tested on the following species: Calephelis arizonensis, Calephelis borealis, Calephelis nemesis, Calephelis virginiensis and Lutjanus bengalensis. Y1 - 2011 U6 - https://doi.org/10.1111/j.1755-0998.2011.03046.x SN - 1755-098X VL - 11 IS - 5 SP - 935 EP - 936 PB - Wiley-Blackwell CY - Malden ER - TY - JOUR A1 - Hartmann, Stefanie A1 - Helm, Conrad A1 - Nickel, Birgit A1 - Meyer, Matthias A1 - Struck, Torsten H. 
A1 - Tiedemann, Ralph A1 - Selbig, Joachim A1 - Bleidorn, Christoph T1 - Exploiting gene families for phylogenomic analysis of myzostomid transcriptome data JF - PLoS one N2 - Background: In trying to understand the evolutionary relationships of organisms, the current flood of sequence data offers great opportunities, but also reveals new challenges with regard to data quality, the selection of data for subsequent analysis, and the automation of steps that were once done manually for single-gene analyses. Even though genome or transcriptome data is available for representatives of most bilaterian phyla, some enigmatic taxa still have an uncertain position in the animal tree of life. This is especially true for myzostomids, a group of symbiotic (or parasitic) protostomes that are placed either with annelids or with flatworms. Methodology: Based on similarity criteria, Illumina-based transcriptome sequences of one myzostomid were compared to protein sequences of one additional myzostomid and 29 reference metazoa and clustered into gene families. These families were then used to investigate the phylogenetic position of Myzostomida using different approaches: Alignments of 989 sequence families were concatenated, and the resulting superalignment was analyzed under a Maximum Likelihood criterion. We also used all 1,878 gene trees with at least one myzostomid sequence for a supertree approach: the individual gene trees were computed and then reconciled into a species tree using gene tree parsimony. Conclusions: Superalignments require strictly orthologous genes, and both the gene selection and the widely varying amount of data available for different taxa in our dataset may cause anomalous placements and low bootstrap support. In contrast, gene tree parsimony is designed to accommodate multilocus gene families and therefore allows a much more comprehensive data set to be analyzed. 
Results of this supertree approach showed a well-resolved phylogeny, in which myzostomids were part of the annelid radiation, and major bilaterian taxa were found to be monophyletic. Y1 - 2012 U6 - https://doi.org/10.1371/journal.pone.0029843 SN - 1932-6203 VL - 7 IS - 1 PB - PLoS CY - San Francisco ER - TY - JOUR A1 - Bartel, Manuela A1 - Hartmann, Stefanie A1 - Lehmann, Karola A1 - Postel, Kai A1 - Quesada, Humberto A1 - Philipp, Eva E. R. A1 - Heilmann, Katja A1 - Micheel, Burkhard A1 - Stuckas, Heiko T1 - Identification of sperm proteins as candidate biomarkers for the analysis of reproductive isolation in Mytilus: a case study for the enkurin locus JF - Marine biology : international journal on life in oceans and coastal waters N2 - Sperm proteins of the marine sessile mussels of the Mytilus edulis species complex are models to investigate reproductive isolation and speciation. This study aimed at identifying sperm proteins and their corresponding genes. This was aided by the use of monoclonal antibodies that preferentially bind to yet unknown sperm molecules. By identifying their target molecules, this approach identified proteins with relevance to Mytilus sperm function. This procedure identified 16 proteins, for example, enkurin, laminin, porin and heat shock proteins. The potential use of these proteins as genetic markers to study reproductive isolation is exemplified by analysing the enkurin locus. Enkurin evolution is driven by purifying selection; the locus displays high levels of intraspecific variation, and species-specific alleles group in distinct phylogenetic clusters. These findings characterize enkurin as an informative candidate biomarker for analyses of clinal variation and differential introgression in hybrid zones, for example, to understand determinants of reproductive isolation in Baltic Mytilus populations. 
Y1 - 2012 U6 - https://doi.org/10.1007/s00227-012-2005-7 SN - 0025-3162 VL - 159 IS - 10 SP - 2195 EP - 2207 PB - Springer CY - New York ER - TY - JOUR A1 - Hill, Natascha A1 - Leow, Alexander A1 - Bleidorn, Christoph A1 - Groth, Detlef A1 - Tiedemann, Ralph A1 - Selbig, Joachim A1 - Hartmann, Stefanie T1 - Analysis of phylogenetic signal in protostomial intron patterns using Mutual Information JF - Theory in biosciences N2 - Many deep evolutionary divergences still remain unresolved, such as those among major taxa of the Lophotrochozoa. As alternative phylogenetic markers, the intron-exon structure of eukaryotic genomes and the patterns of absence and presence of spliceosomal introns appear to be promising. However, given the potential homoplasy of intron presence, the phylogenetic analysis of this data using standard evolutionary approaches has remained a challenge. Here, we used Mutual Information (MI) to estimate the phylogeny of Protostomia using gene structure data, and we compared these results with those obtained with Dollo Parsimony. Using full genome sequences from nine Metazoa, we identified 447 groups of orthologous sequences with 21,732 introns in 4,870 unique intron positions. We determined the shared absence and presence of introns in the corresponding sequence alignments and have made this data available in "IntronBase", a web-accessible and downloadable SQLite database. Our results obtained using Dollo Parsimony are obviously misled through systematic errors that arise from multiple intron loss events, but extensive filtering of data improved the quality of the estimated phylogenies. Mutual Information, in contrast, performs better with larger datasets, but at the same time it requires a complete data set, which is difficult to obtain for orthologs from a large number of taxa. 
Nevertheless, Mutual Information-based distances proved to be useful in analyzing this kind of data, also because the estimation of MI-based distances is independent of evolutionary models and therefore no pre-definitions of ancestral and derived character states are necessary. KW - Mutual Information KW - Evolution KW - Gene structure Y1 - 2013 U6 - https://doi.org/10.1007/s12064-012-0173-0 SN - 1431-7613 VL - 132 IS - 2 SP - 93 EP - 104 PB - Springer CY - New York ER - TY - JOUR A1 - Schedina, Ina-Maria A1 - Pfautsch, Simone A1 - Hartmann, Stefanie A1 - Dolgener, N. A1 - Polgar, Anika A1 - Bianco, Pier Giorgio A1 - Tiedemann, Ralph A1 - Ketmaier, Valerio T1 - Isolation and characterization of eight microsatellite loci in the brook lamprey Lampetra planeri (Petromyzontiformes) using 454 sequence data JF - Journal of fish biology N2 - Eight polymorphic microsatellite loci were developed for the brook lamprey Lampetra planeri through 454 sequencing and their usefulness was tested in 45 individuals of both L. planeri and the river lamprey Lampetra fluviatilis. The number of alleles per locus ranged between two and five; the Italian and Irish populations had a mean expected heterozygosity of 0.388 and 0.424 and a mean observed heterozygosity of 0.418 and 0.411, respectively. (C) 2014 The Fisheries Society of the British Isles KW - conservation KW - population structure KW - species pair Y1 - 2014 U6 - https://doi.org/10.1111/jfb.12470 SN - 0022-1112 SN - 1095-8649 VL - 85 IS - 3 SP - 960 EP - 964 PB - Wiley-Blackwell CY - Hoboken ER - TY - JOUR A1 - Zulawski, Monika A1 - Schulze, Gunnar A1 - Braginets, Rostyslav A1 - Hartmann, Stefanie A1 - Schulze, Waltraud X. T1 - The Arabidopsis Kinome: phylogeny and evolutionary insights into functional diversification JF - BMC genomics N2 - Background: Protein kinases constitute a particularly large protein family in Arabidopsis with important functions in cellular signal transduction networks. 
At the same time, Arabidopsis is a model plant with high frequencies of gene duplications. Here, we have conducted a systematic analysis of the Arabidopsis kinase complement, the kinome, with particular focus on gene duplication events. We matched Arabidopsis proteins to a hidden Markov model of eukaryotic kinases, computed a phylogeny of 942 Arabidopsis protein kinase domains and mapped their origin by gene duplication. Results: The phylogeny showed two major clades of receptor kinases and soluble kinases, each of which was divided into functional subclades. Based on this phylogeny, association of yet uncharacterized kinases to families was possible, which extended functional annotation of unknowns. Classification of gene duplications within these protein kinases revealed that representatives of cytosolic subfamilies showed a tendency to maintain segmentally duplicated genes, while some subfamilies of the receptor kinases were enriched for tandem duplicates. Although functional diversification is observed throughout most subfamilies, some instances of functional conservation among genes transposed from the same ancestor were observed. In general, a significant enrichment of essential genes was found among genes encoding protein kinases. Conclusions: The inferred phylogeny allowed classification and annotation of yet uncharacterized kinases. The prediction and analysis of syntenic blocks and duplication events within gene families of interest can be used to link functional biology to insights from an evolutionary viewpoint. The approach undertaken here can be applied to any gene family in any organism with an annotated genome. Y1 - 2014 U6 - https://doi.org/10.1186/1471-2164-15-548 SN - 1471-2164 VL - 15 PB - BioMed Central CY - London ER -