Concept: Genetic genealogy
This study examines genetic diversity among 102 registered English Bulldogs used for breeding based on maternal and paternal haplotypes, allele frequencies in 33 highly polymorphic short tandem repeat (STR) loci on 25 chromosomes, STR-linked dog leukocyte antigen (DLA) class I and II haplotypes, and the number and size of genome-wide runs of homozygosity (ROH) determined from high density SNP arrays. The objective was to assess whether the breed retains enough genetic diversity to correct the genotypic and phenotypic abnormalities associated with poor health, to allow for the elimination of deleterious recessive mutations, or to make further phenotypic changes in body structure or coat. An additional 37 English bulldogs presented to the UC Davis Veterinary Clinical Services for health problems were also genetically compared with the 102 registered dogs based on the perception that sickly English bulldogs are products of commercial breeders or puppy-mills and genetically different and inferior.
Pathogens and the diseases they cause have been among the most important selective forces experienced by humans during their evolutionary history. Although adaptive alleles generally arise by mutation, introgression can also be a valuable source of beneficial alleles. Archaic humans, who lived in Europe and Western Asia for more than 200,000 years, were probably well adapted to this environment and its local pathogens. It is therefore conceivable that modern humans entering Europe and Western Asia who admixed with them obtained a substantial immune advantage from the introgression of archaic alleles. Here we document a cluster of three Toll-like receptors (TLR6-TLR1-TLR10) in modern humans that carries three distinct archaic haplotypes, indicating repeated introgression from archaic humans. Two of these haplotypes are most similar to the Neandertal genome, and the third haplotype is most similar to the Denisovan genome. The Toll-like receptors are key components of innate immunity and provide an important first line of immune defense against bacteria, fungi, and parasites. The unusually high allele frequencies and unexpected levels of population differentiation indicate that there has been local positive selection on multiple haplotypes at this locus. We show that the introgressed alleles have clear functional effects in modern humans; archaic-like alleles underlie differences in the expression of the TLR genes and are associated with reduced microbial resistance and increased allergic disease in large cohorts. This provides strong evidence for recurrent adaptive introgression at the TLR6-TLR1-TLR10 locus, resulting in differences in disease phenotypes in modern humans.
Previous studies that pooled Indian populations from a wide variety of geographical locations, have obtained contradictory conclusions about the processes of the establishment of the Varna caste system and its genetic impact on the origins and demographic histories of Indian populations. To further investigate these questions we took advantage that both Y chromosome and caste designation are paternally inherited, and genotyped 1,680 Y chromosomes representing 12 tribal and 19 non-tribal (caste) endogamous populations from the predominantly Dravidian-speaking Tamil Nadu state in the southernmost part of India. Tribes and castes were both characterized by an overwhelming proportion of putatively Indian autochthonous Y-chromosomal haplogroups (H-M69, F-M89, R1a1-M17, L1-M27, R2-M124, and C5-M356; 81% combined) with a shared genetic heritage dating back to the late Pleistocene (10-30 Kya), suggesting that more recent Holocene migrations from western Eurasia contributed <20% of the male lineages. We found strong evidence for genetic structure, associated primarily with the current mode of subsistence. Coalescence analysis suggested that the social stratification was established 4-6 Kya and there was little admixture during the last 3 Kya, implying a minimal genetic impact of the Varna (caste) system from the historically-documented Brahmin migrations into the area. In contrast, the overall Y-chromosomal patterns, the time depth of population diversifications and the period of differentiation were best explained by the emergence of agricultural technology in South Asia. These results highlight the utility of detailed local genetic studies within India, without prior assumptions about the importance of Varna rank status for population grouping, to obtain new insights into the relative influences of past demographic events for the population structure of the whole of modern India.
Autoimmune thyroid disease (AITD), including Graves' disease (GD) and Hashimoto’s thyroiditis (HT), is one of the most common of the immune-mediated diseases. To further investigate the genetic determinants of AITD, we conducted an association study using a custom-made single-nucleotide polymorphism (SNP) array, the ImmunoChip. The SNP array contains all known and genotype-able SNPs across 186 distinct susceptibility loci associated with one or more immune-mediated diseases. After stringent quality control, we analysed 103 875 common SNPs (minor allele frequency >0.05) in 2285 GD and 462 HT patients and 9364 controls. We found evidence for seven new AITD risk loci (P < 1.12 × 10(-6); a permutation test derived significance threshold), five at locations previously associated and two at locations awaiting confirmation, with other immune-mediated diseases.
Although the concept of genomic selection relies on linkage disequilibrium (LD) between quantitative trait loci and markers, reliability of genomic predictions is strongly influenced by family relationships. In this study, we investigated the effects of LD and family relationships on reliability of genomic predictions and the potential of deterministic formulas to predict reliability using population parameters in populations with complex family structures. Five groups of selection candidates were simulated taking different information sources from the reference population into account: 1) allele frequencies; 2) LD pattern; 3) haplotypes; 4) haploid chromosomes; 5) individuals from the reference population, thereby having real family relationships with reference individuals. Reliabilities were predicted using genomic relationships among 529 reference individuals and their relationships with selection candidates and with a deterministic formula where the number of effective chromosome segments (M(e)) was estimated based on genomic and additive relationship matrices for each scenario. At a heritability of 0.6, reliabilities based on genomic relationships were 0.002±0.0001 (allele frequencies), 0.015±0.001 (LD pattern), 0.018±0.001 (haplotypes), 0.100±0.008 (haploid chromosomes) and 0.318±0.077 (family relationships). At a heritability of 0.1, relative differences among groups were similar. For all scenarios, reliabilities were similar to predictions with a deterministic formula using estimated M(e). So, reliabilities can be predicted accurately using empirically estimated M(e) and level of relationship with reference individuals has a much higher effect on the reliability than linkage disequilibrium per se. Furthermore, accumulated length of shared haplotypes is more important in determining the reliability of genomic prediction than the individual shared haplotype length.
Sex chromosomes are an ideal system to study processes connected with suppressed recombination. We found evidence of microsatellite expansion, on the relatively young Y chromosome of the dioecious plant sorrel (Rumex acetosa, XY1Y2 system), but no such expansion on the more ancient Y chromosomes of liverwort (Marchantia polymorpha) and human. The most expanding motifs were AC and AAC, which also showed periodicity of array length, indicating the importance of beginnings and ends of arrays. Our data indicate that abundance of microsatellites in genomes depends on the inherent expansion potential of specific motifs, which could be related to their stability and ability to adopt unusual DNA conformations. We also found that the abundance of microsatellites is higher in the neighborhood of transposable elements (TEs) suggesting that microsatellites are probably targets for TE insertions. This evidence suggests that microsatellite expansion is an early event shaping the Y chromosome where this process is not opposed by recombination, while accumulation of TEs and chromosome shrinkage predominate later.
Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back with high probability the identities of multiple participants in public sequencing projects.
The human Y-chromosome does not recombine across its male-specific part and is therefore an excellent marker of human migrations. It also plays an important role in male fertility. However, its evolution is difficult to fully understand because of repetitive sequences, inverted repeats and the potentially large role of gene conversion. Here we perform an evolutionary analysis of 62 Y-chromosomes of Danish descent sequenced using a wide range of library insert sizes and high coverage, thus allowing large regions of these chromosomes to be well assembled. These include 17 father-son pairs, which we use to validate variation calling. Using a recent method that can integrate variants based on both mapping and de novo assembly, we genotype 10898 SNVs and 2903 indels (max length of 27241 bp) in our sample and show by father-son concordance and experimental validation that the non-recurrent SNP and indel variation on the Y chromosome tree is called very accurately. This includes variation called in a 0.9 Mb centromeric heterochromatic region, which is by far the most variable in the Y chromosome. Among the variation is also longer sequence-stretches not present in the reference genome but shared with the chimpanzee Y chromosome. We analyzed 2.7 Mb of large inverted repeats (palindromes) for variation patterns among the two palindrome arms and identified 603 mutation and 416 gene conversions events. We find clear evidence for GC-biased gene conversion in the palindromes (and a balancing AT mutation bias), but irrespective of this, also a strong bias towards gene conversion towards the ancestral state, suggesting that palindromic gene conversion may alleviate Muller’s ratchet. Finally, we also find a large number of large-scale gene duplications and deletions in the palindromic regions (at least 24) and find that such events can consist of complex combinations of simultaneous insertions and deletions of long stretches of the Y chromosome.
High-frequency microsatellite haplotypes of the male-specific Y-chromosome can signal past episodes of high reproductive success of particular men and their patrilineal descendants. Previously, two examples of such successful Y-lineages have been described in Asia, both associated with Altaic-speaking pastoral nomadic societies, and putatively linked to dynasties descending, respectively, from Genghis Khan and Giocangga. Here we surveyed a total of 5321 Y-chromosomes from 127 Asian populations, including novel Y-SNP and microsatellite data on 461 Central Asian males, to ask whether additional lineage expansions could be identified. Based on the most frequent eight-microsatellite haplotypes, we objectively defined 11 descent clusters (DCs), each within a specific haplogroup, that represent likely past instances of high male reproductive success, including the two previously identified cases. Analysis of the geographical patterns and ages of these DCs and their associated cultural characteristics showed that the most successful lineages are found both among sedentary agriculturalists and pastoral nomads, and expanded between 2100 BCE and 1100 CE. However, those with recent origins in the historical period are almost exclusively found in Altaic-speaking pastoral nomadic populations, which may reflect a shift in political organisation in pastoralist economies and a greater ease of transmission of Y-chromosomes through time and space facilitated by the use of horses.
It is commonly thought that human genetic diversity in non-African populations was shaped primarily by an out-of-Africa dispersal 50-100 thousand yr ago (kya). Here, we present a study of 456 geographically diverse high-coverage Y chromosome sequences, including 299 newly reported samples. Applying ancient DNA calibration, we date the Y-chromosomal most recent common ancestor (MRCA) in Africa at 254 (95% CI 192-307) kya and detect a cluster of major non-African founder haplogroups in a narrow time interval at 47-52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck. In contrast to demographic reconstructions based on mtDNA, we infer a second strong bottleneck in Y-chromosome lineages dating to the last 10 ky. We hypothesize that this bottleneck is caused by cultural changes affecting variance of reproductive success among males.