Concept: International HapMap Project
BACKGROUND: Genotyping and massively-parallel sequencing projects result in a vast amount of diploid data that is only rarely resolved into its constituent haplotypes. It is nevertheless this phased information that is transmitted from one generation to the next and is most directly associated with biological function and the genetic causes of biological effects. Despite progress made in genome-wide sequencing and phasing algorithms and methods, problems assembling (and reconstructing linear haplotypes in) regions of repetitive DNA and structural variation remain. These dynamic and structurally complex regions are often poorly understood from a sequence point of view. Regions such as these that are highly similar in their sequence tend to be collapsed onto the genome assembly. This in turn means that downstream determination of the true sequence haplotype in these regions poses a particular challenge. For structurally complex regions, a more focussed approach to assembling haplotypes may be required. RESULTS: In order to investigate reconstruction of spatial information at structurally complex regions, we have used an emulsion haplotype fusion PCR approach to reproducibly link sequences of up to 1kb in length to allow phasing of multiple variants from neighbouring loci, using allele-specific PCR and sequencing to detect the phase. Using emulsion systems that link flanking regions to amplicons within the CNV, we reconstructed a 59kb haplotype across the DEFA1A3 CNV in HapMap individuals. CONCLUSION: This study has demonstrated a novel use for emulsion haplotype fusion PCR in addressing the issue of reconstructing structural haplotypes at multiallelic copy variable regions, using the DEFA1A3 locus as an example.
Haplotypes formed by polymorphisms (T-786C, rs2070744; a variable number of tandem repeats in intron 4; and Glu298Asp, rs1799983) of the eNOS gene were associated previously with gestational hypertension (GH) and preeclampsia (PE). However, no study has explored the Tag SNPs rs743506 and rs7830 in these disorders. The aim of the current study was to compare the distribution of the genotypes and haplotypes formed by the five eNOS polymorphisms mentioned among healthy pregnant women (HP, n=122), women with GH (n=138), and women with PE (n=157). The haplotype formed by “C b G G C” was more frequent in HP compared to GH and PE (p=0.0071), which is supported by previous findings that demonstrated the association of the combination “C b G” with a higher level of nitrite (NO marker). Our results suggest a protective effect of the haplotype “C b G G C” against the development of hypertensive disorders of pregnancy.
There is growing recognition that estimating haplotypes from high coverage sequencing of single samples in clinical settings is an important problem. At the same time very large datasets consisting of tens and hundreds of thousands of high-coverage sequenced samples will soon be available. We describe a method that takes advantage of these huge human genetic variation resources and rare variant sharing patterns to estimate haplotypes on single sequenced samples. Sharing rare variants between two individuals is more likely to arise from a recent common ancestor and, hence, also more likely to indicate similar shared haplotypes over a substantial flanking region of sequence.
Characterization of genetic variations in maize has been challenging, mainly due to deterioration of collinearity between individual genomes in the species. An international consortium of maize research groups combined resources to develop the maize haplotype version 3 (HapMap 3), built from whole genome sequencing data from 1,218 maize lines, covering pre-domestication and domesticated Zea mays varieties across the world.
Precise characterization of NAHR breakpoints is key to identifying those features that influence NAHR frequency. Until now, analysis of NAHR-mediated rearrangements has generally been performed by comparison of the breakpoint-spanning sequences with the human genome reference sequence. We show here that the haplotype diversity of NAHR hotspots may interfere with breakpoint-mapping. We studied the transmitting parents of individuals with germline type-1 NF1 deletions mediated by NAHR within the PRS1 or PRS2 hotspots. Several parental wildtype PRS1 and PRS2 haplotypes were identified that exhibited considerable sequence diversity with respect to the reference sequence which also affected the number of predicted PRDM9-binding sites. Sequence comparisons between the parental wildtype PRS1 or PRS2 haplotypes and the deletion breakpoint-spanning sequences from the patients (method #2) turned out to be an accurate means to assign NF1 deletion breakpoints and proved superior to crude reference sequence comparisons that neglect to consider haplotype diversity (method #1). The mean length of the deletion breakpoint regions assigned by method #2 was 269-bp in contrast to 502-bp by method #1. Our findings imply that paralog-specific haplotype diversity of NAHR hotspots (such as PRS2) and population-specific haplotype diversity must be taken into account in order to accurately ascertain NAHR-mediated rearrangement breakpoints.
The detection of genomic regions involved in local adaptation is an important topic in current population genetics. There are several detection strategies available depending on the kind of genetic and demographic information at hand. A common drawback is the high risk of false positives. In this study we introduce two complementary methods for the detection of divergent selection from populations connected by migration. Both methods have been developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Evidence of divergent selection is concluded only when both the haplotype pattern and the FST value support it. The second method is developed for independently segregating markers i.e. there is no haplotype information. In this case, the power to detect selection is attained by developing a new outlier test based on detecting a bimodal distribution. The test computes the FST outliers and then assumes that those of interest would have a different mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. The simulation results showed power ranging from 60-95% in several of the scenarios whilst the false positive rate was controlled below the nominal level. The analysis of real samples consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods explained in this manuscript.
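The inter-population differentiation statistic (FST) that the abstract above relies on can be illustrated for a single biallelic marker. The following is only a minimal sketch, assuming two equally weighted populations and Wright's heterozygosity-based definition, FST = (H_T - H_S) / H_T; it is not the estimator or the outlier test implemented in HacDivSel, and the function name `wright_fst` is hypothetical.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in population 1 and 2.
    (Illustrative assumption: no sample-size correction is applied.)
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity in the pooled (total) population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # The locus is monomorphic overall, so there is no differentiation to measure.
        return 0.0
    # Mean expected heterozygosity within the two subpopulations.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    for freqs in [(0.5, 0.5), (0.2, 0.8), (1.0, 0.0)]:
        print(freqs, round(wright_fst(*freqs), 3))
```

Identical allele frequencies give an FST of 0 and fixed alternative alleles give 1; a scan for divergent selection would compute such a value per marker and then look for outliers.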
Acute myeloid leukemia (AML) is a cancer of the myeloid line of blood cells, and is generally considered to be caused by environmental and genetic factors. In this study, we combined a genome-wide haplotype association study (GWHAS) and a gene prioritization strategy to mine AML-related genetic factors and understand its pathogenesis. Data for a total of 175 AML patients were downloaded from the public GEO database (GSE32462), and 218 matched Caucasian controls were taken from the HapMap Project. We first identified the linkage disequilibrium (LD) blocks and performed a GWHAS to scan for AML-related haplotypes. We then mapped these haplotypes to the corresponding genes as candidates. Finally, we prioritized all the AML candidate genes based on their similarity to 38 known AML susceptibility genes. The results showed that 1754 haplotypes were significantly associated with AML (P<1E-5) and mapped to 591 candidate genes. After prioritizing all 591 AML candidate genes, we obtained four top-ranked genes as AML risk genes: RUNX1, JAK1, PDGFRA, and FGFR2. Among them, RUNX1, JAK1 and PDGFRA had been confirmed as AML risk genes. In particular, we found that the gene FGFR2 was a novel AML susceptibility gene, with a haplotype TT (rs7090018 and rs2912759) showing significant association with AML (P-value = 7.07E-06). In summary, our findings might provide a new perspective for understanding the pathogenesis of AML.
X-linked cone dysfunction disorders such as Blue Cone Monochromacy and X-linked Cone Dystrophy are characterized by complete loss of, or reduction in, L- and M-cone function due to defects in the OPN1LW/OPN1MW gene cluster. Here we investigated 24 affected males from 16 families with either a structurally intact gene cluster or at least one intact single (hybrid) gene but harbouring rare combinations of common SNPs in exon 3 in single or multiple OPN1LW and OPN1MW gene copies. We assessed twelve different OPN1LW/MW exon 3 haplotypes by semi-quantitative minigene splicing assay. Nine haplotypes resulted in aberrant splicing of ≥20% of transcripts including the known pathogenic haplotypes (i.e. ‘LIAVA’, ‘LVAVA’) with absent or minute amounts of correctly spliced transcripts, respectively. De novo formation of the ‘LIAVA’ haplotype derived from an ancestral less deleterious ‘LIAVS’ haplotype was observed in one family with strikingly different phenotypes among affected family members. We could establish intrachromosomal gene conversion in the male germline as the underlying mechanism. Gene conversion in the OPN1LW/OPN1MW genes has been postulated; however, we are the first to demonstrate a de novo gene conversion within the lineage of a pedigree.
Ancestry informative markers (AIMs) can be used to determine population affiliation of the donors of forensic samples. In order to examine ancestry evaluations of the four major populations in the USA, 23 highly informative AIMs were identified from the International HapMap project. However, the efficacy of these 23 AIMs could not be fully evaluated in silico. In this study, these 23 SNPs were multiplexed to test their actual performance in ancestry evaluations. Genotype data were obtained from 189 individuals collected from four American populations. One SNP (rs12149261) on chromosome 16 was removed from this panel because it was duplicated on chromosome 1. The resultant 22-AIMs panel was able to empirically resolve the four major populations as in the in silico study. Eight individuals were assigned to a different group than indicated on their samples. The assignments of the 22 AIMs for these samples were consistent with AIMs results from the ForenSeq™ panel. Neither departures from Hardy-Weinberg equilibrium (HWE) nor linkage disequilibrium (LD) were detected for any of the 22 SNPs in the four US populations (after removing the eight problematic samples). The principal component analysis (PCA) results indicated that 181 individuals from these populations were assigned to the expected groups. These 22 SNPs can contribute to the candidate AIMs pool for potential forensic identification purposes in major US populations.
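A check for departure from Hardy-Weinberg equilibrium, as mentioned in the abstract above, is often done per SNP with a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The sketch below shows that textbook test only; the abstract does not state which HWE test the authors used, and the function name `hwe_chi_square` and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test (1 d.f.) for Hardy-Weinberg equilibrium at a biallelic SNP.

    Arguments are the observed counts of the AA, AB and BB genotypes.
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic marker: the test is uninformative.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # counts that match HWE exactly
    print(hwe_chi_square(50, 0, 50))   # complete absence of heterozygotes
```

For small samples or rare alleles an exact test is generally preferred over this chi-square approximation.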
The length of ancestral tracks decays with the passing of generations, a property that can be used to infer population admixture histories. Previous studies have shown the power of recovering the histories of admixed populations via the length distributions of ancestral tracks even under simple models. We believe that the deduction of length distributions under a general model will greatly elevate the power. Here we first deduced the length distributions under a general model and proposed general principles in parameter estimation and model selection with the deduced length distributions. Next, we focused on studying the length distributions and their applications under three typical special cases. Extensive simulations showed that the length distributions of ancestral tracks were well predicted by our theoretical framework. We further developed a new method, AdmixInfer, based on the length distributions, and good performance was observed when it was applied to infer population histories under the three typical models. Notably, our method was insensitive to demographic history, sample size and the threshold used to discard short tracks. Finally, good performance was also observed when it was applied to some real datasets of African Americans, Mexicans and South Asian populations from the HapMap project and the Human Genome Diversity Project.
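The decay of ancestral track lengths described above can be made concrete under the simplest scenario. The sketch below assumes a single-pulse admixture model in which track lengths (in Morgans) from a source population with ancestry proportion m are approximately exponential with rate (1 - m) * T after T generations; this is a common approximation and an assumption here, not the general model or the AdmixInfer implementation from the abstract. The function name `estimate_generations` and the example values are hypothetical.

```python
def estimate_generations(track_lengths_morgans, ancestry_proportion, min_length=0.0):
    """Estimate generations since a single admixture pulse from track lengths.

    Assumes lengths ~ Exponential(rate = (1 - m) * T), with m the ancestry
    proportion of the source population. Tracks shorter than min_length are
    discarded; because the exponential distribution is memoryless, the excess
    of the kept tracks over min_length follows the same distribution.
    """
    if not 0.0 <= ancestry_proportion < 1.0:
        raise ValueError("ancestry_proportion must be in [0, 1)")
    kept = [x for x in track_lengths_morgans if x >= min_length]
    if not kept:
        raise ValueError("no tracks remain after applying min_length")
    mean_excess = sum(kept) / len(kept) - min_length
    if mean_excess <= 0.0:
        raise ValueError("kept tracks must exceed min_length on average")
    rate = 1.0 / mean_excess  # maximum-likelihood estimate of the exponential rate
    return rate / (1.0 - ancestry_proportion)


if __name__ == "__main__":
    lengths = [0.1, 0.2, 0.3]  # hypothetical track lengths in Morgans
    print(estimate_generations(lengths, ancestry_proportion=0.5))
    print(estimate_generations(lengths, ancestry_proportion=0.5, min_length=0.15))
```

With so few tracks the two calls give very different estimates; the example only illustrates the arithmetic, and real inference would use many tracks and compare competing admixture models.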