Journal: Molecular Biology and Evolution
Wild mammalian species, including bats, constitute the natural reservoir of Betacoronavirus (including SARS, MERS, and the deadly SARS-CoV-2). Different hosts or host tissues provide different cellular environments, especially different antiviral and RNA modification activities that can alter RNA modification signatures observed in the viral RNA genome. The zinc finger antiviral protein (ZAP) binds specifically to CpG dinucleotides and recruits other proteins to degrade a variety of viral RNA genomes. Many mammalian RNA viruses have evolved CpG deficiency. Increasing CpG dinucleotides in these low-CpG viral genomes in the presence of ZAP consistently leads to decreased viral replication and virulence. Because ZAP exhibits tissue-specific expression, viruses infecting different tissues are expected to have different CpG signatures, suggesting a means to identify viral tissue-switching events. I show that SARS-CoV-2 has the most extreme CpG deficiency in all known Betacoronavirus genomes. This suggests that SARS-CoV-2 may have evolved in a new host (or new host tissue) with high ZAP expression. A survey of CpG deficiency in viral genomes identified a virulent canine coronavirus (Alphacoronavirus) as possessing the most extreme CpG deficiency, comparable to that observed in SARS-CoV-2. This suggests that the canine tissue infected by the canine coronavirus may provide a cellular environment strongly selecting against CpG. Thus, viral surveys focused on decreasing CpG in viral RNA genomes may provide important clues about the selective environments and viral defenses in the original hosts.
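As an illustration of how CpG deficiency can be quantified, the following minimal sketch computes an observed/expected CpG ratio for a single sequence. The abstract does not state the exact index used, so the formula here, P(CG) / (P(C) × P(G)), is an assumption based on a commonly used definition, and the function name `cpg_index` is hypothetical. Values well below 1 indicate CpG deficiency.

```python
def cpg_index(seq: str) -> float:
    """Observed/expected CpG ratio: P(CG) / (P(C) * P(G)).

    Assumed definition (not taken from the abstract). RNA input is
    accepted by mapping U to T before counting.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")

    # Single-nucleotide frequencies of C and G.
    p_c = seq.count("C") / n
    p_g = seq.count("G") / n
    if p_c == 0 or p_g == 0:
        raise ValueError("sequence must contain both C and G")

    # Frequency of the CG dinucleotide among the n - 1 adjacent pairs.
    n_cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    p_cg = n_cg / (n - 1)

    return p_cg / (p_c * p_g)


if __name__ == "__main__":
    # Toy sequences only; real use would read whole viral genomes.
    for name, s in [("toy_with_CpG", "ACGTACGTTT"), ("toy_no_CpG", "GGCCTTAAGC")]:
        print(name, round(cpg_index(s), 3))
```

Comparing this ratio across genomes, rather than raw CpG counts, controls for differences in base composition between viruses.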
Long-chain polyunsaturated fatty acids (LCPUFA) are bioactive components of membrane phospholipids and serve as substrates for signaling molecules. LCPUFA can be obtained directly from animal foods or synthesized endogenously from 18-carbon precursors via the FADS2-encoded enzyme. Vegans rely almost exclusively on endogenous synthesis to generate LCPUFA, and we hypothesized that an adaptive genetic polymorphism would confer an advantage. The rs66698963 polymorphism, a 22-bp insertion-deletion within FADS2, is associated with basal FADS1 expression and with coordinated induction of FADS1 and FADS2 in vitro. Here, we determined rs66698963 genotype frequencies from 234 individuals of a primarily vegetarian Indian population and 311 individuals from the US. A much higher I/I genotype frequency was found in Indians (68%) than in the US (18%). Analysis using 1000 Genomes Project data confirmed our observation, revealing a global I/I genotype frequency of 70% in South Asians, 53% in Africans, 29% in East Asians, and 17% in Europeans. Tests based on population divergence, site frequency spectrum, and long-range haplotype consistently point to positive selection encompassing rs66698963 in South Asian, African, and some East Asian populations. Basal plasma phospholipid arachidonic acid (ARA) status was 8% greater in I/I compared with D/D individuals. The biochemical pathway product-precursor difference, ARA minus linoleic acid, was 31% and 13% greater for I/I and I/D compared with D/D, respectively. This study is consistent with previous in vitro data suggesting that the insertion allele enhances n-6 LCPUFA synthesis and may confer an adaptive advantage in South Asians because of the traditional plant-based diet practice.
The sea slug Elysia chlorotica offers a unique opportunity to study the evolution of a novel function (photosynthesis) in a complex multicellular host. E. chlorotica harvests plastids (without the algal nuclei) from its heterokont algal prey, Vaucheria litorea. The ‘stolen’ plastids are maintained for several months in cells of the digestive tract and are essential for animal development. The basis of long-term maintenance of photosynthesis in this sea slug was thought to be explained by extensive horizontal gene transfer (HGT) from the nucleus of the alga to the animal nucleus, followed by expression of algal genes in the gut to provide essential plastid-destined proteins. Early studies of target genes and proteins supported the HGT hypothesis, but more recent genome-wide data provide conflicting results. Here we generated significant genome data from the E. chlorotica germ line (egg DNA) and from V. litorea to test the HGT hypothesis. Our comprehensive analyses fail to provide evidence for alga-derived HGT into the germ line of the sea slug. PCR analyses of genomic DNA and cDNA from different individual E. chlorotica suggest, however, that algal nuclear genes (or gene fragments) are present in the adult slug. We suggest these nucleic acids may derive from and/or reside in extra-chromosomal DNAs that are made available to the animal through contact with the alga. These data resolve a long-standing issue and suggest that HGT is not the primary reason underlying long-term maintenance of photosynthesis in E. chlorotica. Therefore, sea slug photosynthesis is sustained in as yet unexplained ways that do not appear to endanger the animal germ line through the introduction of dozens of foreign genes.
Black widow spiders (members of the genus Latrodectus) are widely feared because of their potent neurotoxic venom. α-Latrotoxin is the vertebrate-specific toxin responsible for the dramatic effects of black widow envenomation. The evolution of this toxin is enigmatic because only two α-latrotoxin sequences are known. In this study, ~4 kb α-latrotoxin sequences and their homologs were characterized from a diversity of Latrodectus species, and representatives of Steatoda and Parasteatoda, establishing the wide distribution of latrotoxins across the mega-diverse spider family Theridiidae. Across black widow species, α-latrotoxin shows ≥94% nucleotide identity and variability consistent with purifying selection. Multiple codon and branch-specific estimates of the nonsynonymous/synonymous substitution rate ratio also suggest that a long history of purifying selection has acted on α-latrotoxin across Latrodectus and Steatoda. However, α-latrotoxin is highly divergent in amino acid sequence between these genera, with 68.7% of protein differences involving non-conservative substitutions, evidence for positive selection on its physicochemical properties and particular codons, and an elevated rate of nonsynonymous substitutions along α-latrotoxin’s Latrodectus branch. Such variation likely explains the efficacy of red-back spider, L. hasselti, antivenom in treating bites from other Latrodectus species, and the weaker neurotoxic symptoms associated with Steatoda and Parasteatoda bites. Long-term purifying selection on α-latrotoxin indicates its functional importance in black widow venom, even though vertebrates are a small fraction of their diet. The greater differences between Latrodectus and Steatoda α-latrotoxin, and their relationships to invertebrate-specific latrotoxins, suggest a shift in α-latrotoxin towards increased vertebrate toxicity coincident with the evolution of widow spiders.
Patterns of genetic diversity in parasite antigen gene families hold important information about their potential to generate antigenic variation within and between hosts. The evolution of such gene families is typically driven by gene duplication, followed by point mutation and gene conversion. There is great interest in estimating the rates of these processes from molecular sequences for understanding the evolution of the pathogen and its significance for infection processes. In this study, a series of models is constructed to investigate hypotheses about the nucleotide diversity patterns between closely related gene sequences from the antigen gene archive of the African trypanosome, the protozoan parasite that causes human sleeping sickness in Equatorial Africa. We use a hidden Markov model approach to identify two scales of diversification: clustering of sequence mismatches, a putative indicator of gene conversion events with other lower-identity donor genes in the archive, and at a sparser scale, isolated mismatches, likely arising from independent point mutations. In addition to quantifying the respective probabilities of occurrence of these two processes, our approach yields estimates for the gene conversion tract length distribution and the average diversity contributed locally by conversion events. Model fitting is conducted using a Bayesian framework. We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and the average imported conversion tract is between 14 and 25 nucleotides long. However, because of the high diversity introduced by gene conversion, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair.
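The idea of separating clustered mismatches (candidate conversion tracts) from isolated mismatches (candidate point mutations) can be sketched with a two-state hidden Markov model decoded by the Viterbi algorithm. This is only an illustration, not the authors' model: the study fits its parameters in a Bayesian framework, whereas the sketch below decodes with fixed values. The function name and all default probabilities (`p_mis`, `p_enter`, `p_exit`) are hypothetical.

```python
import math


def viterbi_mismatch_states(mismatches, p_mis=(0.01, 0.3),
                            p_enter=0.005, p_exit=0.05):
    """Label each aligned site as background (0) or conversion tract (1).

    mismatches: sequence of 0 (match) / 1 (mismatch), one per aligned site.
    p_mis:      assumed mismatch probability in state 0 and in state 1.
    p_enter:    assumed probability of entering a tract from background.
    p_exit:     assumed probability of leaving a tract.
    All parameter values are illustrative, not estimates from the study.
    """
    if not mismatches:
        return []
    log = math.log
    # emit[state][obs]: log-probability of a match (0) or mismatch (1).
    emit = [(log(1 - p), log(p)) for p in p_mis]
    # trans[prev][next]: log transition probabilities.
    trans = [(log(1 - p_enter), log(p_enter)),
             (log(p_exit), log(1 - p_exit))]
    start = (log(1 - p_enter), log(p_enter))

    score = [start[s] + emit[s][mismatches[0]] for s in (0, 1)]
    back = []  # back[t][s] = best predecessor of state s at site t + 1
    for obs in mismatches[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[r] + trans[r][s] for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + emit[s][obs])
        score = new_score
        back.append(ptr)

    # Trace back the most probable state path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy alignment: one isolated mismatch, then a dense cluster.
    sites = [0] * 10 + [1] + [0] * 20 + [1, 1, 0, 1, 1, 1, 0, 1] + [0] * 10
    print("".join(str(s) for s in viterbi_mismatch_states(sites)))
```

Under these assumed parameters, runs of state 1 give candidate tract positions and lengths, while mismatches left in state 0 are treated as point mutations.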
Principal component (PC) maps, which plot the values of a given PC estimated on the basis of allele frequency variation at the geographic sampling locations of a set of populations, are often used to investigate the properties of past range expansions. Some studies have argued that in a range expansion, the axis of greatest variation (i.e., the first PC) is parallel to the axis of expansion. In contrast, others have identified a pattern in which the axis of greatest variation is perpendicular to the axis of expansion. Here, we seek to understand this difference in outcomes by investigating the effect of the geographic sampling scheme on the direction of the axis of greatest variation under a two-dimensional range expansion model. From datasets simulated using each of two different schemes for the geographic sampling of populations under the model, we create PC maps for the first PC. We find that depending on the geographic sampling scheme, the axis of greatest variation can be either parallel or perpendicular to the axis of expansion. We provide an explanation for this result in terms of intra- and inter-population coalescence times.
Recombination suppression leads to the structural and functional differentiation of sex chromosomes, and is thus a crucial step in the process of sex chromosome evolution. Despite extensive theoretical work, the exact processes and mechanisms of recombination suppression and differentiation are not well understood. In threespine sticklebacks (Gasterosteus aculeatus), a different sex chromosome system has recently evolved by a fusion between the Y chromosome and an autosome in the Japan Sea lineage, which diverged from the ancestor of other lineages about two million years ago. We investigated the evolutionary dynamics and differentiation processes of sex chromosomes based on comparative analyses of these divergent lineages using 63 microsatellite loci. Both chromosome-wide differentiation patterns and phylogenetic inferences with X and Y alleles indicated that the ancestral sex chromosomes were extensively differentiated before the divergence of these lineages. In contrast, genetic differentiation appeared to have proceeded only in a small region of the neo-sex chromosomes. The recombination maps constructed for the Japan Sea lineage indicated that recombination has been suppressed or reduced over a large region spanning the ancestral and neo-sex chromosomes. Chromosomal regions exhibiting genetic differentiation and suppressed or reduced recombination were detected continuously and sequentially in the neo-sex chromosomes, suggesting that differentiation has gradually spread from the fusion point following the extension of recombination suppression. Our study illustrates an ongoing process of sex chromosome differentiation, providing empirical support for the theoretical model postulating that recombination suppression and differentiation proceed in a gradual manner in the very early stage of sex chromosome evolution.
Identifying the evolutionary genetic novelties that helped shape human-specific traits, such as the use of a complex language, long-term planning, and exceptional learning abilities, is one of the ultimate frontiers of modern biology. Evolutionary signatures of functional shifts can be detected by searching for non-coding regions that are highly conserved across mammals or primates but that rapidly accumulated nucleotide substitutions only in the lineage leading to humans. Since gene loci densely populated with human accelerated elements (HAEs) are more likely to have contributed to human-specific novelties, we sought to identify the transcriptional units and genomic 1 Mb intervals of the entire human genome carrying the highest number of HAEs. To this end, we took advantage of four available datasets of human genomic accelerated regions obtained through different comparisons and algorithms and performed a meta-analysis of the combined data. We found that the brain developmental transcription factor NPAS3 contains the largest cluster of non-coding accelerated regions in the human genome, with up to 14 elements that are highly conserved in mammals, including primates, but carry human-specific nucleotide substitutions. We then tested the ability of the 14 HAEs identified at the NPAS3 locus to act as transcriptional regulatory sequences in a reporter expression assay performed in transgenic zebrafish. We found that 11 out of the 14 HAEs present in NPAS3 act as transcriptional enhancers during development, particularly within the nervous system. Since NPAS3 is known to play a crucial role during mammalian brain development, our results indicate that the high density of HAEs present in the human NPAS3 locus could have modified the spatio-temporal expression pattern of NPAS3 in the developing human brain and, therefore, contributed to human brain evolution.
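The step of ranking 1 Mb genomic intervals by the number of accelerated elements they carry can be sketched as a simple binning exercise. The sketch below assumes non-overlapping, fixed windows and assigns each element by its start coordinate; the abstract does not say whether the intervals were fixed or sliding, so both choices are assumptions. The function name and the coordinates in the usage example are hypothetical.

```python
from collections import Counter


def count_per_window(elements, window=1_000_000):
    """Count elements per non-overlapping genomic window.

    elements: iterable of (chromosome, start) pairs, 0-based coordinates.
    window:   window size in base pairs (1 Mb by default).
    Returns a Counter keyed by (chromosome, window_index).
    """
    counts = Counter()
    for chrom, start in elements:
        counts[(chrom, start // window)] += 1
    return counts


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    toy = [("chr14", 100), ("chr14", 999_999), ("chr14", 1_000_000), ("chr2", 5)]
    for (chrom, idx), n in count_per_window(toy).most_common():
        print(chrom, idx * 1_000_000, (idx + 1) * 1_000_000, n)
```

When several datasets are combined, as in the meta-analysis described, overlapping elements would need to be merged before counting so that the same region is not counted more than once.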
Although Siberia was inhabited by modern humans at an early stage, there is still debate over whether it remained habitable during the extreme cold of the Last Glacial Maximum or whether it was subsequently repopulated by peoples with recent shared ancestry. Previous studies of the genetic history of Siberian populations were hampered by the extensive admixture that appears to have taken place among these populations, since commonly used methods assume a tree-like population history and at most single admixture events. Here we analyze geogenetic maps and use other approaches to distinguish the effects of shared ancestry from prehistoric migrations and contact, and develop a new method based on the covariance of ancestry components to investigate the potentially complex admixture history. We furthermore adapt a previously devised method of admixture dating for use with multiple events of gene flow, and apply these methods to whole-genome genotype data from over 500 individuals belonging to 20 different Siberian ethnolinguistic groups. The results of these analyses indicate that there have been multiple layers of admixture detectable in most of the Siberian populations, with considerable differences in the admixture histories of individual populations. Furthermore, most of the populations of Siberia included here, even those settled far to the north, appear to have a southern origin, with the northward expansions of different populations possibly being driven partly by the advent of pastoralism, especially reindeer domestication. These newly developed methods to analyze multiple admixture events should aid in the investigation of similarly complex population histories elsewhere.
The hormone progesterone is important for preparing the uterine lining for egg implantation and for maintaining the early stages of pregnancy. The gene encoding the progesterone receptor (PGR) carries introgressed Neandertal haplotypes with two non-synonymous substitutions and a mobile Alu element. These haplotypes have reached nearly 20% frequency in non-Africans and have been associated with preterm birth. Here we show that whereas one of the missense substitutions appears fixed among Neandertals, the other substitution as well as the Alu insertion were polymorphic among Neandertals. We show that two Neandertal haplotypes carrying the PGR gene entered the modern human population and that present-day carriers of the Neandertal haplotypes express higher levels of the receptor. In a cohort of present-day Britons, these carriers have more siblings, fewer miscarriages, and less bleeding during early pregnancy, suggesting that the Neandertal haplotypes promote fertility. This may explain the high frequency of the Neandertal progesterone receptor alleles in modern human populations.