Using ESTs for phylogenomics

  • Background
    While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data for the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets.

    Results
    We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy by up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data are statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences.

    Conclusion
    These results demonstrate that partial gene sequences and gappy multiple sequence alignments can pose a major problem for phylogenetic analysis. The concern will be greatest for high-throughput phylogenomic analyses, in which Neighbor Joining is often the preferred method due to its computational efficiency. Both approaches can be used to increase the accuracy of phylogenetic inference from a gappy alignment. The choice between the two approaches will depend on how robust the application is to the loss of sequences from the input set. Alignment masking generally gives a much greater improvement in accuracy, but at the cost of discarding a larger number of the input sequences.
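The "incorrect quartets" figure compares each four-taxon subtree of an inferred tree with the corresponding subtree of the known true tree used in the simulation. The sketch below illustrates that kind of comparison under assumptions that are not taken from the paper. Each unrooted tree is supplied directly as a list of its nontrivial splits, where each split is the set of taxa on one side of an edge, rather than parsed from Newick. A quartet that the inferred tree leaves unresolved counts as incorrect. The function names `quartet_topology` and `incorrect_quartet_fraction` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Optional, Sequence

# Hypothetical representation: a split is the set of taxa on one side of a tree edge.
Split = FrozenSet[str]
Quartet = FrozenSet[FrozenSet[str]]


def quartet_topology(splits: Iterable[Split], a: str, b: str, c: str, d: str) -> Optional[Quartet]:
    """Return the topology a tree induces on {a, b, c, d}, e.g. {{a, b}, {c, d}} for ab|cd.

    Returns None when no split separates the four taxa two against two,
    i.e. the tree leaves this quartet unresolved.
    """
    four = {a, b, c, d}
    for split in splits:
        inside = four & split
        # Splits of one tree are pairwise compatible, so at most one 2|2 topology exists.
        if len(inside) == 2:
            return frozenset({frozenset(inside), frozenset(four - inside)})
    return None


def incorrect_quartet_fraction(
    true_splits: Sequence[Split], inferred_splits: Sequence[Split], taxa: Iterable[str]
) -> float:
    """Fraction of four-taxon subsets whose inferred topology differs from the true one.

    An unresolved inferred quartet (None) counts as incorrect when the true one is resolved.
    """
    quartets = list(combinations(sorted(taxa), 4))
    if not quartets:
        raise ValueError("need at least four taxa")
    wrong = sum(
        1
        for q in quartets
        if quartet_topology(inferred_splits, *q) != quartet_topology(true_splits, *q)
    )
    return wrong / len(quartets)


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D", "E"]
    true_tree = [frozenset({"A", "B"}), frozenset({"D", "E"})]      # ((A,B),C,(D,E))
    inferred_tree = [frozenset({"A", "C"}), frozenset({"D", "E"})]  # ((A,C),B,(D,E))
    print(incorrect_quartet_fraction(true_tree, inferred_tree, taxa))  # → 0.4
```

With five taxa there are five quartets. The two example trees disagree only on ABCD and ABCE, which gives 2/5 = 0.4.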

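Alignment masking, as summarized in the Results, excludes potentially problematic columns and input sequences. The following is a minimal sketch of that idea. It assumes a simple two-step threshold rule, which is not the masking procedure used in the study. First, columns in which more than `max_col_gap` of the sequences are missing are dropped. Then, sequences whose coverage over the retained columns falls below `min_seq_coverage` are discarded. Both thresholds, the treatment of `-` and `?` as missing data, and the helper names are hypothetical. The toy alignment mimics the staggered gap pattern produced by aligning partial gene sequences.

```python
from typing import Dict, List, Tuple

# Assumption: '-' (alignment gap) and '?' (unknown) both count as missing data.
MISSING = frozenset("-?")


def missing_fraction(alignment: Dict[str, str]) -> float:
    """Fraction of all cells in the alignment that are missing."""
    total = sum(len(seq) for seq in alignment.values())
    missing = sum(ch in MISSING for seq in alignment.values() for ch in seq)
    return missing / total if total else 0.0


def mask_alignment(
    alignment: Dict[str, str],
    max_col_gap: float = 0.5,       # hypothetical threshold
    min_seq_coverage: float = 0.5,  # hypothetical threshold
) -> Tuple[Dict[str, str], List[int]]:
    """Drop gappy columns, then drop sequences with too little remaining data.

    Returns the masked alignment and the indices of the retained columns.
    """
    if not alignment:
        return {}, []
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    # Step 1: keep a column only if at most max_col_gap of the sequences are missing there.
    keep_cols = [
        i for i in range(length)
        if sum(alignment[n][i] in MISSING for n in names) / len(names) <= max_col_gap
    ]

    # Step 2: keep a sequence only if enough of its retained columns hold data.
    masked: Dict[str, str] = {}
    for n in names:
        seq = "".join(alignment[n][i] for i in keep_cols)
        coverage = sum(ch not in MISSING for ch in seq) / len(seq) if seq else 0.0
        if coverage >= min_seq_coverage:
            masked[n] = seq
    return masked, keep_cols


if __name__ == "__main__":
    # Toy alignment with the staggered gap pattern typical of partial gene sequences.
    aln = {
        "t1": "ACGTACGTAC",
        "t2": "ACGTAC----",
        "t3": "----ACGTAC",
        "t4": "--GTACGT--",
        "t5": "AC--------",
    }
    masked, cols = mask_alignment(aln)
    print(sorted(masked))                                   # → ['t1', 't2', 't3', 't4']
    print(missing_fraction(aln), missing_fraction(masked))  # → 0.4 0.25
```

Even on this toy example the trade-off described in the Conclusion appears. Masking lowers the fraction of missing data from 40% to 25%, but it discards one of the five input sequences.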
Download full-text files

  • pmnr889.pdf (English, 1345 KB)

    SHA-1: 8cb619ace560efde20670aac4292cdfb1f1ca5c7

Metadata
Authors: Stefanie Hartmann, Todd J. Vision
URN: urn:nbn:de:kobv:517-opus4-436670
DOI: https://doi.org/10.25932/publishup-43667
ISSN: 1866-8372
Title of parent work (German): Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe
Subtitle (English): can one accurately infer a phylogenetic tree from a gappy alignment?
Series (volume number): Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe (889)
Publication type: Postprint
Language: English
Date of first publication: 2020-04-21
Year of publication: 2008
Publishing institution: Universität Potsdam
Release date: 2020-04-21
Free keywords / tags: Maximum Parsimony; Maximum Parsimony tree; alignment error; pairwise distance; phylogenetic inference
Issue: 889
Number of pages: 15
Source: BMC Evolutionary Biology 8 (2008) 95, DOI: 10.1186/1471-2148-8-95
Organizational units: Mathematisch-Naturwissenschaftliche Fakultät
DDC classification: 5 Natural sciences and mathematics / 57 Life sciences; biology / 570 Life sciences; biology
6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
Peer review: Refereed
Publication route: Open Access
License (English): Creative Commons Attribution 2.0 Generic