Journal: American Journal of Human Genetics
Pathogens and the diseases they cause have been among the most important selective forces experienced by humans during their evolutionary history. Although adaptive alleles generally arise by mutation, introgression can also be a valuable source of beneficial alleles. Archaic humans, who lived in Europe and Western Asia for more than 200,000 years, were probably well adapted to this environment and its local pathogens. It is therefore conceivable that modern humans entering Europe and Western Asia who admixed with them obtained a substantial immune advantage from the introgression of archaic alleles. Here we document a cluster of three Toll-like receptors (TLR6-TLR1-TLR10) in modern humans that carries three distinct archaic haplotypes, indicating repeated introgression from archaic humans. Two of these haplotypes are most similar to the Neandertal genome, and the third haplotype is most similar to the Denisovan genome. The Toll-like receptors are key components of innate immunity and provide an important first line of immune defense against bacteria, fungi, and parasites. The unusually high allele frequencies and unexpected levels of population differentiation indicate that there has been local positive selection on multiple haplotypes at this locus. We show that the introgressed alleles have clear functional effects in modern humans; archaic-like alleles underlie differences in the expression of the TLR genes and are associated with reduced microbial resistance and increased allergic disease in large cohorts. This provides strong evidence for recurrent adaptive introgression at the TLR6-TLR1-TLR10 locus, resulting in differences in disease phenotypes in modern humans.
Sequencing the genomes of extinct hominids has reshaped our understanding of modern human origins. Here, we analyze ∼120 kb of exome-captured Y-chromosome DNA from a Neandertal individual from El Sidrón, Spain. We investigate its divergence from orthologous chimpanzee and modern human sequences and find strong support for a model that places the Neandertal lineage as an outgroup to modern human Y chromosomes, including A00, the highly divergent basal haplogroup. We estimate that the time to the most recent common ancestor (TMRCA) of Neandertal and modern human Y chromosomes is ∼588 thousand years ago (kya) (95% confidence interval [CI]: 447-806 kya). This is ∼2.1 (95% CI: 1.7-2.9) times longer than the TMRCA of A00 and other extant modern human Y-chromosome lineages. This estimate suggests that the Y-chromosome divergence mirrors the population divergence of Neandertals and modern human ancestors, and it refutes alternative scenarios of a relatively recent or super-archaic origin of Neandertal Y chromosomes. The fact that the Neandertal Y we describe has never been observed in modern humans suggests that the lineage is most likely extinct. We identify protein-coding differences between Neandertal and modern human Y chromosomes, including potentially damaging changes to PCDH11Y, TMSB4Y, USP9Y, and KDM5D. Three of these changes are missense mutations in genes that produce male-specific minor histocompatibility (H-Y) antigens. Antigens derived from KDM5D, for example, are thought to elicit a maternal immune response during gestation. It is possible that incompatibilities at one or more of these genes played a role in the reproductive isolation of the two groups.
Men have a shorter life expectancy than women, but the underlying factors are not clear. Late-onset, sporadic Alzheimer disease (AD) is a common and lethal neurodegenerative disorder, and many germline inherited variants have been found to influence the risk of developing AD. Our previous results show that a fundamentally different genetic variant, i.e., lifetime-acquired loss of chromosome Y (LOY) in blood cells, is associated with all-cause mortality and an increased risk of non-hematological tumors and that LOY could be induced by tobacco smoking. Here we tested the hypothesis that men with LOY are more susceptible to AD and show that LOY is associated with AD in three independent studies of different types. In a case-control study, males with an AD diagnosis had a higher degree of LOY mosaicism (adjusted odds ratio = 2.80, p = 0.0184, AD events = 606). Furthermore, in two prospective studies, men with LOY at blood sampling had a greater risk of incident AD diagnosis during follow-up (hazard ratio [HR] = 6.80, 95% confidence interval [95% CI] = 2.16-21.43, AD events = 140, p = 0.0011). Thus, LOY in blood is associated with risks of both AD and cancer, suggesting a role for LOY in blood cells in disease processes in other tissues, possibly via defective immunosurveillance. As a male-specific risk factor, LOY might explain why males on average live shorter lives than females.
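As a rough consistency check on the prospective-study figures above, a hazard ratio and its 95% CI can be converted to an approximate two-sided p value. The sketch below assumes a Wald-type interval that is symmetric on the log scale (an assumption; the abstract does not state how the interval was computed), and the function name `p_from_hr_ci` is hypothetical.

```python
import math

def p_from_hr_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p value from a hazard ratio and its 95% CI.

    Assumption: the CI is a Wald interval, symmetric around log(hr).
    """
    # Standard error of log(HR), recovered from the CI width on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    # Wald z statistic for the null hypothesis HR = 1.
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Values taken from the abstract: HR = 6.80, 95% CI = 2.16-21.43.
p = p_from_hr_ci(6.80, 2.16, 21.43)
print(f"approximate p = {p:.4f}")
```

Under this approximation the reported HR and interval give a p value close to the reported p = 0.0011, which suggests the three figures are mutually consistent.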
The predominantly African origin of all modern human populations is well established, but the route taken out of Africa is still unclear. Two alternative routes, via Egypt and Sinai or across the Bab el Mandeb strait into Arabia, have traditionally been proposed as feasible gateways in light of geographic, paleoclimatic, archaeological, and genetic evidence. Distinguishing among these alternatives has been difficult. We generated 225 whole-genome sequences (225 at 8× depth, of which 8 were increased to 30×; Illumina HiSeq 2000) from six modern Northeast African populations (100 Egyptians and five Ethiopian populations each represented by 25 individuals). West Eurasian components were masked out, and the remaining African haplotypes were compared with a panel of sub-Saharan African and non-African genomes. We showed that masked Northeast African haplotypes overall were more similar to non-African haplotypes and more frequently present outside Africa than were any sets of haplotypes derived from a West African population. Furthermore, the masked Egyptian haplotypes showed these properties more markedly than the masked Ethiopian haplotypes, pointing to Egypt as the more likely gateway in the exodus to the rest of the world. Using five Ethiopian and three Egyptian high-coverage masked genomes and the multiple sequentially Markovian coalescent (MSMC) approach, we estimated the genetic split times of Egyptians and Ethiopians from non-African populations at 55,000 and 65,000 years ago, respectively, whereas that of West Africans was estimated to be 75,000 years ago. Both the haplotype and MSMC analyses thus suggest a predominant northern route out of Africa via Egypt.
We report the discovery of an African American Y chromosome that carries the ancestral state of all SNPs that defined the basal portion of the Y chromosome phylogenetic tree. We sequenced ∼240 kb of this chromosome to identify private, derived mutations on this lineage, which we named A00. We then estimated the time to the most recent common ancestor (TMRCA) for the Y tree as 338 thousand years ago (kya) (95% confidence interval = 237-581 kya). Remarkably, this exceeds current estimates of the mtDNA TMRCA, as well as those of the age of the oldest anatomically modern human fossils. The extremely ancient age, combined with the rarity of the A00 lineage, which we also find at very low frequency in central Africa, points to the importance of considering more complex models for the origin of Y chromosome diversity. These models include ancient population structure and the possibility of archaic introgression of Y chromosomes into anatomically modern humans. The A00 lineage was discovered in a large database of consumer samples of African Americans and has not been identified in traditional hunter-gatherer populations from sub-Saharan Africa. This underscores how the stochastic nature of the genealogical process can affect inference from a single locus and warrants caution during the interpretation of the geographic location of divergent branches of the Y chromosome phylogenetic tree for the elucidation of human origins.
The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries, such as “Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?”, with either “yes” or “no.” Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
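The idea of a likelihood-ratio membership test can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: it assumes each query is made at a position where the target genome carries the queried allele, that responses are independent, and that the per-query probability of a "yes" under each hypothesis is supplied by the caller. The names `membership_llr`, `p_yes_absent`, and `p_yes_present` are hypothetical, and the probabilities used in the example are made up for illustration.

```python
import math

def log_likelihood(responses, p_yes):
    """Log-likelihood of independent yes/no beacon responses given P(yes)."""
    if not 0.0 < p_yes < 1.0:
        raise ValueError("p_yes must lie strictly between 0 and 1")
    return sum(math.log(p_yes if r else 1.0 - p_yes) for r in responses)

def membership_llr(responses, p_yes_absent, p_yes_present):
    """Log-likelihood ratio: target in beacon versus target not in beacon.

    responses     -- booleans, True for a "yes" answer to a query
    p_yes_absent  -- assumed P(yes) per query if the target is NOT in the beacon
    p_yes_present -- assumed P(yes) per query if the target IS in the beacon
                     (below 1 to allow for sequencing or calling errors)
    Positive values favor membership; negative values favor absence.
    """
    return (log_likelihood(responses, p_yes_present)
            - log_likelihood(responses, p_yes_absent))

# Illustrative inputs only: 100 queries answered "yes" every time,
# versus 100 queries with 40 "no" answers.
all_yes = [True] * 100
mixed = [True] * 60 + [False] * 40
print(membership_llr(all_yes, 0.60, 0.99))
print(membership_llr(mixed, 0.60, 0.99))
```

In practice the statistic would be compared against a threshold chosen for a specified false-positive rate. The sketch leaves that calibration out.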
Spontaneous dizygotic (DZ) twinning occurs in 1%-4% of women, with familial clustering and unknown physiological pathways and genetic origin. DZ twinning might index increased fertility and has distinct health implications for mother and child. We performed a GWAS in 1,980 mothers of spontaneous DZ twins and 12,953 control subjects. Findings were replicated in a large Icelandic cohort and tested for association across a broad range of fertility traits in women. Two SNPs were identified (rs11031006 near FSHB, p = 1.54 × 10⁻⁹, and rs17293443 in SMAD3, p = 1.57 × 10⁻⁸) and replicated (p = 3 × 10⁻³ and p = 1.44 × 10⁻⁴, respectively). Based on ∼90,000 births in Iceland, the risk of a mother delivering twins increased by 18% for each copy of allele rs11031006-G and 9% for rs17293443-C. A higher polygenic risk score (PRS) for DZ twinning, calculated based on the results of the DZ twinning GWAS, was significantly associated with DZ twinning in Iceland (p = 0.001). A higher PRS was also associated with having children (p = 0.01), greater lifetime parity (p = 0.03), and earlier age at first child (p = 0.02). Allele rs11031006-G was associated with higher serum FSH levels, earlier age at menarche, earlier age at first child, higher lifetime parity, lower PCOS risk, and earlier age at menopause. Conversely, rs17293443-C was associated with later age at last child. We identified robust genetic risk variants for DZ twinning: one near FSHB and a second within SMAD3, the product of which plays an important role in gonadal responsiveness to FSH. These loci contribute to crucial aspects of reproductive capacity and health.
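To illustrate what the per-allele effects imply, the sketch below combines the two reported risk increases (18% per copy of rs11031006-G, 9% per copy of rs17293443-C) into a relative risk and a two-SNP score on the log scale. It assumes the increases are relative risks that act multiplicatively per allele copy and independently across the two loci, which the abstract does not state. The function names are hypothetical, and this is not the polygenic risk score used in the study.

```python
import math

# Per-allele relative risks taken from the abstract (18% and 9% increases).
PER_ALLELE_RR = {"rs11031006-G": 1.18, "rs17293443-C": 1.09}

def relative_risk(allele_counts):
    """Relative risk versus a non-carrier, assuming multiplicative effects.

    allele_counts maps an allele name to the number of copies carried (0, 1, or 2).
    """
    rr = 1.0
    for allele, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError("allele count must be 0, 1, or 2")
        rr *= PER_ALLELE_RR[allele] ** count
    return rr

def log_risk_score(allele_counts):
    """Weighted allele count, with log relative risks as weights."""
    return sum(count * math.log(PER_ALLELE_RR[allele])
               for allele, count in allele_counts.items())

# Hypothetical genotype: two copies of rs11031006-G, one copy of rs17293443-C.
genotype = {"rs11031006-G": 2, "rs17293443-C": 1}
print(f"relative risk = {relative_risk(genotype):.3f}")
print(f"log risk score = {log_risk_score(genotype):.3f}")
```

The score is the logarithm of the relative risk, so summing weighted allele counts and multiplying per-allele risks are two views of the same assumed model.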
Human genes governing innate immunity provide a valuable tool for the study of the selective pressure imposed by microorganisms on host genomes. A comprehensive, genome-wide study of how selective constraints and adaptations have driven the evolution of innate immunity genes is missing. Using full-genome sequence variation from the 1000 Genomes Project, we first show that innate immunity genes have globally evolved under stronger purifying selection than the remainder of protein-coding genes. We identify a gene set under the strongest selective constraints, mutations in which are likely to predispose individuals to life-threatening disease, as illustrated by STAT1 and TRAF3. We then evaluate the occurrence of local adaptation and detect 57 high-scoring signals of positive selection at innate immunity genes, variation in which has been associated with susceptibility to common infectious or autoimmune diseases. Furthermore, we show that most adaptations targeting coding variation have occurred in the last 6,000-13,000 years, the period at which populations shifted from hunting and gathering to farming. Finally, we show that innate immunity genes present higher Neandertal introgression than the remainder of the coding genome. Notably, among the genes presenting the highest Neandertal ancestry, we find the TLR6-TLR1-TLR10 cluster, which also contains functional adaptive variation in Europeans. This study identifies highly constrained genes that fulfill essential, non-redundant functions in host survival and reveals others that are more permissive to change (containing variation acquired from archaic hominins or adaptive variants in specific populations), improving our understanding of the relative biological importance of innate immunity pathways in natural conditions.
The past five years have seen many scientific and biological discoveries made through the experimental design of genome-wide association studies (GWASs). These studies were aimed at detecting variants at genomic loci that are associated with complex traits in the population and, in particular, at detecting associations between common single-nucleotide polymorphisms (SNPs) and common diseases such as heart disease, diabetes, auto-immune diseases, and psychiatric disorders. We start by giving a number of quotes from scientists and journalists about perceived problems with GWASs. We will then briefly give the history of GWASs and focus on the discoveries made through this experimental design, what those discoveries tell us and do not tell us about the genetics and biology of complex traits, and what immediate utility has come out of these studies. Rather than giving an exhaustive review of all reported findings for all diseases and other complex traits, we focus on the results for auto-immune diseases and metabolic diseases. We return to the perceived failure or disappointment about GWASs in the concluding section.
The magnitude of the human antibody response to viral antigens is highly variable. To explore the human genetic contribution to this variability, we performed genome-wide association studies of the immunoglobulin G response to 14 pathogenic viruses in 2,363 immunocompetent adults. Significant associations were observed in the major histocompatibility complex region on chromosome 6 for influenza A virus, Epstein-Barr virus, JC polyomavirus, and Merkel cell polyomavirus. Using local imputation and fine mapping, we identified specific amino acid residues in human leucocyte antigen (HLA) class II proteins as the most probable causal variants underlying these association signals. Common HLA-DRβ1 haplotypes showed virus-specific patterns of humoral-response regulation. We observed an overlap between variants affecting the humoral response to influenza A and EBV and variants previously associated with autoimmune diseases related to these viruses. The results of this study emphasize the central and pathogen-specific role of HLA class II variation in the modulation of humoral immune response to viral antigens in humans.