SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Computational phylogenetics

180

Figures of phylogenetic trees are widely used to illustrate the result of evolutionary analyses. However, one cannot easily extract a machine-readable representation from such images. Therefore, new software emerges that helps to preserve phylogenies digitally for future research.

Concepts: Horizontal gene transfer, Biology, Cladistics, Species, Computational phylogenetics, Evolution, Phylogenetics, Phylogenetic tree

177

The Ginglymodi is one of the most common, though poorly understood, groups of neopterygians, which includes gars, macrosemiiforms, and “semionotiforms.” In particular, the phylogenetic relationships between the widely distributed “semionotiforms,” and between them and other ginglymodians have been enigmatic. Here, the phylogenetic relationships between eight of the 11 “semionotiform” genera, five genera of living and fossil gars and three macrosemiid genera, are analysed through cladistic analysis, based on 90 morphological characters and 37 taxa, including 7 out-group taxa. The results of the analysis show that the Ginglymodi includes two main lineages: Lepisosteiformes and †Semionotiformes. The genera †Pliodetes, †Araripelepidotes, †Lepidotes, †Scheenstia, and †Isanichthys are lepisosteiforms, and not semionotiforms, as previously thought, and these taxa extend the stratigraphic range of the lineage leading to gars back to the Early Jurassic. A monophyletic †Lepidotes is restricted to the Early Jurassic species, whereas the strongly tritoral species previously referred to †Lepidotes are referred to †Scheenstia. Other species previously referred to †Lepidotes represent other genera or new taxa. The macrosemiids are well nested within semionotiforms, together with †Semionotidae, here restricted to †Semionotus, and a new family including †Callipurbeckia n. gen. minor (previously referred to †Lepidotes), †Macrosemimimus, †Tlayuamichin, †Paralepidotus, and †Semiolepis. Due to the numerous taxonomic changes needed according to the phylogenetic analysis, this article also includes formal taxonomic definitions and diagnoses for all generic and higher taxa, which are new or modified. The study of Mesozoic ginglymodians confirms Patterson’s observation that these fishes show morphological affinities with both halecomorphs and teleosts. Therefore, the compilation of large data sets including the Mesozoic ginglymodians and the re-evaluation of several hypotheses of homology are essential to test the hypotheses of the Halecostomi vs. the Holostei, which is one of the major topics in the evolution of Mesozoic vertebrates and the origin of modern fish faunas.

Concepts: Evolution, Species, Clade, Computational phylogenetics, Phylogenetic tree, Actinopterygii, Cladistics, Phylogenetics

174

BACKGROUND: Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great “Tree of Life” (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user’s needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. RESULTS: With the aim of building such a “phylotastic” system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (www.phylotastic.org), and a server image. 
CONCLUSIONS: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.
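
The pruning step described in this abstract (cutting a large expert phylogeny down to the taxa in a user's query) can be sketched as follows. This is a minimal illustration, not the Phylotastic implementation: the nested-tuple tree representation, the function name `prune`, and the example taxa are all assumptions made for the example, and branch lengths and annotations are ignored.

```python
from typing import Optional, Set, Tuple, Union

# Hypothetical tree representation (an assumption for this sketch):
# a leaf is a taxon name, an internal node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Return the subtree induced by the taxa in `keep`, or None if none remain."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    # Prune every child and drop the ones that contain no requested taxon.
    kept = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # An internal node left with a single child is collapsed into that child.
        return kept[0]
    return tuple(kept)


# Toy "expert" tree; the names are illustrative only.
expert_tree = ((("Quercus", "Fagus"), "Betula"), ("Pinus", "Picea"))
print(prune(expert_tree, {"Quercus", "Betula", "Picea"}))
# prints (('Quercus', 'Betula'), 'Picea')
```

A real pruner would also need to sum branch lengths when collapsing single-child nodes and to resolve non-standard names before matching, both of which the abstract assigns to other components.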

Concepts: Computational phylogenetics, Semantic Web, Phylogenetic comparative methods, Phylogenetics, Phylogenetic tree, Phylogenetic nomenclature, Cladistics

171

SUMMARY: Two methods to add unaligned sequences into an existing multiple sequence alignment have been implemented as the “--add” and “--addfragments” options in the MAFFT package. The former option is a basic one and applicable only to full-length sequences, while the latter option is applicable even when the unaligned sequences are short and fragmentary. These methods internally infer the phylogenetic relationship among the sequences in the existing alignment, as well as the phylogenetic positions of unaligned sequences. Benchmarks based on two independent simulations consistently suggest that the “--addfragments” option outperforms recent methods, PaPaRa and PAGAN, in accuracy for difficult problems and that these three methods appropriately handle easy problems. AVAILABILITY: http://mafft.cbrc.jp/alignment/software/ CONTACT: katoh@ifrec.osaka-u.ac.jp SUPPLEMENTARY INFORMATION: Available at Bioinformatics online.
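
For orientation, the two options (written with a double hyphen on the command line, `--add` and `--addfragments`) might be invoked as sketched below. The helper names `mafft_add_command` and `run_mafft_add` and the file names are hypothetical; the sketch assumes a `mafft` executable on the PATH that writes the resulting alignment to standard output, and it uses MAFFT's `--reorder` option alongside the two options named in the abstract.

```python
import subprocess
from typing import List


def mafft_add_command(new_seqs: str, existing_aln: str, fragments: bool = False) -> List[str]:
    """Build the argument list for adding sequences to an existing alignment."""
    # --addfragments is meant for short, fragmentary sequences; --add for full-length ones.
    option = "--addfragments" if fragments else "--add"
    return ["mafft", option, new_seqs, "--reorder", existing_aln]


def run_mafft_add(new_seqs: str, existing_aln: str, out_path: str, fragments: bool = False) -> None:
    """Run MAFFT (assumed to be installed) and save its standard output to out_path."""
    cmd = mafft_add_command(new_seqs, existing_aln, fragments)
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Only the command line is printed here; nothing is executed.
print(" ".join(mafft_add_command("reads.fasta", "reference.aln", fragments=True)))
# prints: mafft --addfragments reads.fasta --reorder reference.aln
```

Building the argument list separately from running it keeps the command easy to inspect or log before any alignment is started.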

Concepts: Sequence, Multiple sequence alignment, DNA, Sequence alignment, Clustal, Phylogenetic tree, Bioinformatics, Computational phylogenetics

167

With the advent of high-throughput sequencing technologies, the rapid generation and accumulation of large amounts of sequencing data pose an insurmountable demand for efficient algorithms for constructing whole-genome phylogenies. The existing phylogenomic methods all use assembled sequences, which are often not available owing to the difficulty of assembling short reads; this obstructs phylogenetic investigations on species without a reference genome. In this report, we present co-phylog, an assembly-free phylogenomic approach that creates a ‘micro-alignment’ at each ‘object’ in the sequence using the ‘context’ of the object and calculates pairwise distances before reconstructing the phylogenetic tree based on those distances. We explored parameter usage and the optimal working range of co-phylog, and assessed co-phylog using simulated next-generation sequencing (NGS) data and real raw NGS data. We also compared the co-phylog method with traditional alignment and alignment-free methods and illustrated the advantages and limitations of the co-phylog method. In conclusion, we demonstrated that co-phylog is an efficient algorithm and that it delivers high-resolution and accurate phylogenies using whole-genome unassembled sequencing data, especially in the case of closely related organisms, thereby significantly alleviating the computational burden in the genomic era.
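
The context/object idea can be illustrated with a deliberately simplified sketch. It is not the co-phylog implementation: here the context is assumed to be the `half` bases on either side of a single middle base (the object), contexts seen with conflicting objects are discarded, and the distance is assumed to be the fraction of shared contexts whose objects differ. The function names, the parameter `half`, and the toy sequences are all hypothetical, and reverse complements are ignored.

```python
from typing import Dict, Iterable


def context_objects(seqs: Iterable[str], half: int = 2) -> Dict[str, str]:
    """Map each context (flanking bases) to its object (the middle base)."""
    table: Dict[str, str] = {}
    ambiguous = set()
    width = 2 * half + 1
    for seq in seqs:
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            context = window[:half] + window[half + 1:]
            obj = window[half]
            if context in ambiguous:
                continue
            # A context that appears with two different objects is dropped.
            if table.setdefault(context, obj) != obj:
                ambiguous.add(context)
                del table[context]
    return table


def context_object_distance(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Fraction of shared contexts whose objects differ between two samples."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no shared contexts")
    return sum(1 for c in shared if a[c] != b[c]) / len(shared)


# Two toy sequences that differ at a single position.
sample_a = context_objects(["ACGTTGCATCGGA"])
sample_b = context_objects(["ACGTTGTATCGGA"])
print(context_object_distance(sample_a, sample_b))  # prints 0.2
```

Because each context acts as its own anchor, the comparison needs neither an assembly nor a global alignment, which is the property the abstract emphasises for unassembled reads.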

Concepts: Algorithm, Organism, Evolution, Phylogenetic tree, Biology, Computational phylogenetics, Species, Phylogenetics

140

Vulvovaginal candidiasis (VVC) is an important problem due to Candida spp. The aim of this study was molecular identification, phylogenetic analysis, and evaluation of antifungal susceptibility of non-albicans Candida isolates from VVC.

Concepts: Biology, Candida, Computational phylogenetics, Candidiasis

113

A new small-bodied ornithopod dinosaur, Diluvicursor pickeringi, gen. et sp. nov., is named from the lower Albian of the Eumeralla Formation in southeastern Australia and helps shed new light on the anatomy and diversity of Gondwanan ornithopods. Comprising an almost complete tail and partial lower right hindlimb, the holotype (NMV P221080) was deposited as a carcass or body-part in a log-filled scour near the base of a deep, high-energy river that incised a faunally rich, substantially forested riverine floodplain within the Australian-Antarctic rift graben. The deposit is termed the ‘Eric the Red West Sandstone.’ The holotype, interpreted as an older juvenile ∼1.2 m in total length, appears to have endured antemortem trauma to the pes. A referred, isolated posterior caudal vertebra (NMV P229456) from the holotype locality suggests D. pickeringi grew to at least 2.3 m in length. D. pickeringi is characterised by 10 potential autapomorphies, among which dorsoventrally low neural arches and transversely broad caudal ribs on the anterior-most caudal vertebrae are a visually defining combination of features. These features suggest D. pickeringi had robust anterior caudal musculature and strong locomotor abilities. Another isolated anterior caudal vertebra (NMV P228342) from the same deposit suggests that the fossil assemblage hosts at least two ornithopod taxa. D. pickeringi and two stratigraphically younger, indeterminate Eumeralla Formation ornithopods from Dinosaur Cove, NMV P185992/P185993 and NMV P186047, are closely related. However, the tail of D. pickeringi is far shorter than that of NMV P185992/P185993 and its pes more robust than that of NMV P186047. Preliminary cladistic analysis, utilising three existing datasets, failed to resolve D. pickeringi beyond a large polytomy of Ornithopoda. However, qualitative assessment of shared anatomical features suggests that the Eumeralla Formation ornithopods, South American Anabisetia saldiviai and Gasparinisaura cincosaltensis, Afro-Laurasian dryosaurids and possibly Antarctic Morrosaurus antarcticus share a close phylogenetic progenitor. Future phylogenetic analysis with improved data on Australian ornithopods will help to test these suggested affinities.

Concepts: Anabisetia, Computational phylogenetics, Cretaceous, Cladistics, Phylogenetics, Dinosaur, Vertebra, Ornithopod

97

Genomic data is increasingly being used to understand infectious disease epidemiology. Isolates from a given outbreak are sequenced, and the patterns of shared variation are used to infer which isolates within the outbreak are most closely related to each other. Unfortunately, the phylogenetic trees typically used to represent this variation are not directly informative about who infected whom - a phylogenetic tree is not a transmission tree. However, a transmission tree can be inferred from a phylogeny while accounting for within-host genetic diversity by colouring the branches of a phylogeny according to which host those branches were in. Here we extend this approach and show that it can be applied to partially sampled and ongoing outbreaks. This requires computing the correct probability of an observed transmission tree and we herein demonstrate how to do this for a large class of epidemiological models. We also demonstrate how the branch colouring approach can incorporate a variable number of unique colours to represent unsampled intermediates in transmission chains. The resulting algorithm is a reversible jump Monte-Carlo Markov Chain, which we apply to both simulated data and real data from an outbreak of tuberculosis. By accounting for unsampled cases and an outbreak which may not have reached its end, our method is uniquely suited to use in a public health environment during real-time outbreak investigations. We implemented this transmission tree inference methodology in an R package called TransPhylo, which is freely available from https://github.com/xavierdidelot/TransPhylo.

Concepts: Horizontal gene transfer, Computational phylogenetics, Cladistics, Phylogenetics, Phylogenetic tree, Evolution, Infectious disease, Epidemiology

77

We compared 31 complete and nearly complete globally derived HSV-1 genomic sequences using HSV-2 HG52 as an outgroup to investigate their phylogenetic relationships and look for evidence of recombination. The sequences were retrieved from NCBI and were then aligned using Clustal W. The generation of a maximum likelihood tree resulted in a six-clade structure that corresponded with the timing and routes of past human migration. The East African derived viruses contained the greatest amount of genetic diversity and formed four of the six clades. The East Asian and European/North American derived viruses formed separate clades. HSV-1 strains E07, E22 and E03 were highly divergent and may each represent an individual clade. Possible recombination was analyzed by partitioning the alignment into 5 kb segments, performing individual phylogenetic analysis on each partition and generating a phylogenetic network from the results. However, most evidence for recombination was spread at the base of the tree, suggesting that recombination did not significantly disrupt the clade structure. Examination of previous estimates of HSV-1 mutation rates in conjunction with the phylogenetic data presented here suggests that the substitution rate for HSV-1 is approximately 1.38×10⁻⁷ subs/site/year. In conclusion, this study expands the previously described HSV-1 three-clade phylogenetic structure to a minimum of six clades and shows that the clade structure also mirrors global human migrations. Given that HSV-1 has co-evolved with its host, sequencing HSV-1 isolated from various populations could serve as a surrogate biomarker to study human population structure and migration patterns.
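
The partitioning step (splitting the alignment into 5 kb segments before analysing each one separately) might look like the sketch below. It is an illustration under stated assumptions rather than the authors' pipeline: the alignment is assumed to be held as a dictionary of equal-length aligned strings, the function name `partition_alignment` is hypothetical, and the per-segment tree building and network construction are not shown.

```python
from typing import Dict, Iterator, Tuple


def partition_alignment(
    alignment: Dict[str, str], size: int = 5000
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (start_column, sub_alignment) for consecutive windows of `size` columns."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    (length,) = lengths
    for start in range(0, length, size):
        # The final window may be shorter than `size`.
        yield start, {name: seq[start:start + size] for name, seq in alignment.items()}


# Toy alignment with 12 columns, cut into windows of 5 columns.
toy = {"strain_1": "ACGTACGTACGT", "strain_2": "ACGTTCGTACGA"}
for start, segment in partition_alignment(toy, size=5):
    print(start, segment)
```

Each yielded segment keeps every strain, so the same tree-building step can be run on each window and the resulting topologies compared for conflicts that would hint at recombination.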

Concepts: Virus, Cladistics, Globalization, Phylogenetic nomenclature, Computational phylogenetics, Population, Phylogenetics, Human migration

42

The pterosaurs were a diverse group of Mesozoic flying reptiles that underwent a body plan reorganization, adaptive radiation, and replacement of earlier forms midway through their long history, resulting in the origin of the Pterodactyloidea, a highly specialized clade containing the largest flying organisms. The sudden appearance and large suite of morphological features of this group were suggested to be the result of it originating in terrestrial environments, where the pterosaur fossil record has traditionally been poor [1, 2], and its many features suggested to be adaptations to those environments [1, 2]. However, little evidence has been available to test this hypothesis, and it has not been supported by previous phylogenies or early pterodactyloid discoveries. We report here the earliest pterosaur with the diagnostic elongate metacarpus of the Pterodactyloidea, Kryptodrakon progenitor, gen. et sp. nov., from the terrestrial Middle-Upper Jurassic boundary of Northwest China. Phylogenetic analysis confirms this species as the basalmost pterodactyloid and reconstructs a terrestrial origin and a predominantly terrestrial history for the Pterodactyloidea. Phylogenetic comparative methods support this reconstruction by means of a significant correlation between wing shape and environment also found in modern flying vertebrates, indicating that pterosaurs lived in or were at least adapted to the environments in which they were preserved.

Concepts: Computational phylogenetics, Cladistics, Reptile, Pterodactyloidea, Phylogenetic comparative methods, Evolution, Pterosaur, Phylogenetics