Journal: Genome Biology and Evolution
A recent slew of ENCODE Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is under 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 - 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly (1) by employing the seldom-used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree: many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
The question of Jewish ancestry has been the subject of controversy for over two centuries and has yet to be resolved. The “Rhineland Hypothesis” depicts Eastern European Jews as a “population isolate” that emerged from a small group of German Jews who migrated eastward and expanded rapidly. Alternatively, the “Khazarian Hypothesis” suggests that Eastern European Jews descended from the Khazars, an amalgam of Turkic clans that settled the Caucasus in the early centuries CE and converted to Judaism in the 8th century. Mesopotamian and Greco-Roman Jews continuously reinforced the Judaized Empire until the 13th century. Following the collapse of their empire, the Judeo-Khazars fled to Eastern Europe. The rise of European Jewry is therefore explained by the contribution of the Judeo-Khazars. Thus far, however, the Khazars’ contribution has been estimated only empirically, as the absence of genome-wide data from Caucasus populations precluded testing the Khazarian Hypothesis. Recent sequencing of modern Caucasus populations prompted us to revisit the Khazarian Hypothesis and compare it with the Rhineland Hypothesis. We applied a wide range of population genetic analyses to compare these two hypotheses. Our findings support the Khazarian Hypothesis and portray the European Jewish genome as a mosaic of Caucasus, European, and Semitic ancestries, thereby consolidating previous contradictory reports of Jewish ancestry. We further describe a major difference among Caucasus populations explained by the early presence of Judeans in the Southern and Central Caucasus. Our results have important implications for the demographic forces that shaped the genetic diversity in the Caucasus and for medical studies.
The Yiddish language is over one thousand years old and incorporates German, Slavic, and Hebrew elements. The prevalent view claims Yiddish has a German origin, whereas the opposing view posits a Slavic origin with strong Iranian and weak Turkic substrata. One of the major difficulties in deciding between these hypotheses is the unknown geographical origin of Yiddish-speaking Ashkenazic Jews (AJs). An analysis of 393 Ashkenazic, Iranian, and mountain Jews and over 600 non-Jewish genomes demonstrated that Greeks, Romans, Iranians, and Turks exhibit the highest genetic similarity with AJs. The Geographic Population Structure (GPS) analysis localized most AJs along major primeval trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from “Ashkenaz.” Iranian and mountain Jews were localized along trade routes on Turkey’s eastern border. Loss of maternal haplogroups was evident in non-Yiddish-speaking AJs. Our results suggest that AJs originated from a Slavo-Iranian confederation, which the Jews call “Ashkenazic” (i.e., “Scythian”), though these Jews probably spoke Persian and/or Ossete. This is compatible with linguistic evidence suggesting that Yiddish is a Slavic language created by Irano-Turko-Slavic Jewish merchants along the Silk Roads as a cryptic trade language, spoken only by its originators to gain an advantage in trade. Later, in the 9th century, Yiddish underwent relexification by adopting a new vocabulary that consists of a minority of German and Hebrew and a majority of newly coined Germanoid and Hebroid elements that replaced most of the original Eastern Slavic and Sorbian vocabularies, while keeping the original grammars intact.
Whole genome duplication has played an important role in plant evolution and diversification. Sugarcane is an important crop with a complex hybrid polyploid genome, for which the process of adaptation to polyploidy is still poorly understood. In order to improve our knowledge about sugarcane genome evolution and the homo/homeologous gene expression balance, we sequenced and analyzed 27 BACs (bacterial artificial chromosomes) of the sugarcane cultivar R570, containing the putative single-copy genes LFY (seven haplotypes), PHYC (four haplotypes) and TOR (seven haplotypes). Comparative genomic approaches showed that these sugarcane loci presented a high degree of conservation of gene content and collinearity (synteny) with sorghum and rice orthologous regions, but were invaded by transposable elements (TE). All the homo/homeologous haplotypes of LFY, PHYC and TOR are likely to be functional, since they are all under purifying selection (dN/dS ≪ 1). However, they were found to contribute non-equivalently to the overall expression of the corresponding gene. SNPs, indels and amino acid substitutions allowed inferring the S. officinarum or S. spontaneum origin of the TOR haplotypes, which further led to the estimation that these two sugarcane ancestral species diverged between 2.5 and 3.5 million years ago. In addition, analysis of shared TE insertions in TOR haplotypes suggested that two autopolyploidization events may have occurred in the lineage that gave rise to S. officinarum, after its divergence from S. spontaneum.
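The dN/dS criterion mentioned in the sugarcane abstract above can be illustrated with a minimal sketch. The function name, the tolerance band around 1, and the input values are hypothetical and are not taken from the study; real analyses typically rely on codon-substitution models and likelihood-ratio tests rather than a fixed cutoff.

```python
def classify_selection(dn: float, ds: float, tolerance: float = 0.1) -> str:
    """Label a gene by its dN/dS ratio (omega).

    dn: nonsynonymous substitutions per nonsynonymous site
    ds: synonymous substitutions per synonymous site
    tolerance: illustrative band around omega = 1 treated as neutral
               (an assumption made for this sketch only)
    """
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    omega = dn / ds
    if omega < 1 - tolerance:
        return "purifying"   # amino acid changes are mostly removed
    if omega > 1 + tolerance:
        return "positive"    # amino acid changes are favored
    return "neutral"         # changes accumulate at roughly the synonymous rate


# Hypothetical values, chosen only to show the three outcomes.
print(classify_selection(0.01, 0.20))
print(classify_selection(0.30, 0.20))
print(classify_selection(0.20, 0.20))
```

A ratio far below 1, as reported for the LFY, PHYC and TOR haplotypes, falls in the first category.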
The germline definition in metazoans was first based on a few bilaterian models. As a result, gene function interpretations were often based on phenotypes observed in those models and led to the definition of a set of genes, considered specific to the germline, named the “germline core”. However, some of these genes were shown to also be involved in somatic stem cells, thus leading to the notion of a germline multipotency program (GMP). Because Porifera and Ctenophora are currently the best candidates to be the sister-group to all other animals, the comparative analysis of gene contents and functions between these phyla, Cnidaria and Bilateria is expected to provide clues on early animal evolution and on the links between somatic and germ lineages. Our present bioinformatic analyses at the metazoan scale show that a set of 18 GMP genes was already present in the last common ancestor of metazoans and indicate more precisely the evolution of some of them in the animal lineage. The expression patterns and levels of 11 of these genes in the homoscleromorph sponge Oscarella lobularis show that they are expressed throughout the life cycle, in pluri/multipotent progenitors, during gametogenesis, embryogenesis and during wound healing. This new study in a non-bilaterian species reinforces the hypothesis of an ancestral multipotency program.
In a previous analysis of the phylogenetic relationships of coelacanths, lungfishes and tetrapods, using cartilaginous fish as the outgroup, the sister relationship of lungfishes and tetrapods was reconstructed with high statistical support. However, when ray-finned fish, which are taxonomically more closely related to the three lineages than cartilaginous fish, were used as the outgroup, the sister relationship of coelacanths and tetrapods was most often reconstructed, depending on the methods and the data sets, but the statistical support was generally low except when data sets including a small number of species were analyzed. In this study, we replaced the fast-evolving ray-finned fish (teleost fish) of the previous data sets with two slowly evolving ray-finned fish, gar and bowfin, as the outgroup, and showed that the sister relationship of lungfishes and tetrapods was reconstructed with high statistical support. In our analysis the evolutionary rates of gar and bowfin were similar to each other and one third to one half of those of teleost fish. The differences in amino acid frequencies between the two species and other lineages were larger than those for teleost fish. This study provides strong support for lungfishes as the closest relative of tetrapods and indicates the importance of using an appropriate outgroup with small divergence in phylogenetic reconstruction.
Polar bears (Ursus maritimus) face extremely cold temperatures and periods of fasting, which might result in more severe energetic challenges than those experienced by their sister species, the brown bear (U. arctos). We have examined the mitochondrial and nuclear genomes of polar and brown bears to investigate whether polar bears demonstrate lineage-specific signals of molecular adaptation in genes associated with cellular respiration/energy production. We observed increased evolutionary rates in the mitochondrial cytochrome c oxidase I gene in polar but not brown bears. An amino acid substitution occurred near the interaction site with a nuclear-encoded subunit of the cytochrome c oxidase complex, and was predicted to lead to a functional change, although the significance of this remains unclear. The nuclear genomes of brown and polar bears demonstrate different adaptations related to cellular respiration. Analyses of brown bear genomes revealed substitutions that may alter the function of proteins that regulate glucose uptake, which could be beneficial when feeding on carbohydrate-dominated diets during hyperphagia, followed by fasting during hibernation. In polar bears, genes demonstrating signatures of functional divergence and those potentially under positive selection were enriched in functions related to production of nitric oxide, which can regulate energy production in several different ways. This suggests that polar bears may be able to fine-tune intracellular levels of nitric oxide as an adaptive response to control trade-offs between energy production in the form of ATP versus generation of heat (thermogenesis).
Cyanobacteria forged two major evolutionary transitions with the invention of oxygenic photosynthesis and the bestowal of a photosynthetic lifestyle upon eukaryotes through endosymbiosis. Information germane to understanding those transitions is imprinted in cyanobacterial genomes, but deciphering it is complicated by lateral gene transfer (LGT). Here we report genome sequences for the morphologically most complex true-branching cyanobacteria, and for Scytonema hofmanni PCC 7110, which with 12,356 proteins is the most gene-rich prokaryote currently known. We investigated components of cyanobacterial evolution that have been vertically inherited, horizontally transferred, and donated to eukaryotes at plastid origin. The vertical component indicates a freshwater origin for water-splitting photosynthesis. Networks of the horizontal component reveal that 60% of cyanobacterial gene families have been affected by LGT. Plant nuclear genes acquired from cyanobacteria define a lower bound frequency of 611 multigene families that, in turn, specify diazotrophic cyanobacterial lineages as having a gene collection most similar to that possessed by the plastid ancestor.
Placental mammals comprise three principal clades: Afrotheria (e.g. elephants and tenrecs), Xenarthra (e.g. armadillos and sloths) and Boreoeutheria (all other placental mammals), the relationships among which are the subject of controversy and a touchstone for debate on the limits of phylogenetic inference. Previous analyses have found support for all three hypotheses, leading some to conclude that this phylogenetic problem might be impossible to resolve, due to the compounded effects of Incomplete Lineage Sorting (ILS) and a rapid radiation. Here we show, using a genome-scale nucleotide dataset, microRNAs, and the reanalysis of the three largest previously published amino-acid datasets, that the root of Placentalia lies between Atlantogenata and Boreoeutheria. Although we found evidence for ILS in early placental evolution, we are able to reject previous conclusions that the placental root is a hard polytomy that cannot be resolved. Reanalyses of previous datasets recover Atlantogenata + Boreoeutheria and show that contradictory results are a consequence of poorly fitting evolutionary models; when the evolutionary process is better modelled, all datasets converge on Atlantogenata. Our Bayesian molecular clock analysis estimates that marsupials diverged from placentals 157-170 Ma, crown Placentalia diverged 86-100 Ma, and crown Atlantogenata diverged 84-97 Ma. Our results are compatible with placental diversification being driven by dispersal rather than vicariance mechanisms, postdating early phases in the protracted opening of the Atlantic Ocean.
Despite 400-450 million years of independent evolution, a strong phenotypic convergence has occurred between two groups of fish: tunas and lamnid sharks. This convergence is characterised by centralisation of red muscle, a distinctive swimming style (stiffened body powered through tail movements) and elevated body temperature (endothermy). Furthermore, both groups demonstrate elevated white muscle metabolic capacities. All these traits are unusual in fish and likely evolved to support their fast-swimming, pelagic, predatory behaviour. Here we tested the hypothesis that their convergent evolution was driven by selection on a set of metabolic genes. We sequenced white muscle transcriptomes of six tuna, one mackerel and three shark species, and supplemented this data set with previously published RNA-seq data. Using 26 species in total (including 7,032 tuna genes plus 1,719 shark genes), we constructed phylogenetic trees and carried out maximum-likelihood analyses of gene selection. We inferred several genes relating to metabolism to be under selection. We also found that one gene, glycogenin-1, evolved under positive selection independently in tunas and lamnid sharks, providing evidence of convergent selective pressures at the gene level possibly underlying shared physiology.