Concept: Single-nucleotide polymorphism
Data from the 1000 Genomes Project (1KGP) and Complete Genomics (CG) have dramatically increased the number of known genetic variants and challenge several assumptions about the reference genome and its uses in both clinical and research settings. Specifically, 34% of published array-based GWAS for a variety of diseases use probes that overlap unanticipated single nucleotide polymorphisms (SNPs), indels, or structural variants. Linkage disequilibrium (LD) block length depends on the number of markers used: the mean LD block size decreases from 16 kb to 7 kb when HapMap-based calculations are compared with blocks computed from 1KGP data. Additionally, when 1KGP and CG variants are compared, 19% of the single nucleotide variants (SNVs) reported for genomes common to both datasets are unique to one dataset, likely as a result of differences in data-collection methodology, alignment of reads to the reference genome, and variant-calling algorithms. Together, these observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population, and that significant effort is needed to create resources that can accurately assess personal genomes for health and disease and predict treatment outcomes.
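The 19% figure above is a callset-concordance statistic. The abstract does not say exactly how variants were matched or which denominator was used, so the following is only an illustrative sketch under stated assumptions: SNVs are keyed by (chromosome, position, reference allele, alternate allele) after a hypothetical minimal normalization, and the discordant fraction is taken over the union of both callsets. The function names are illustrative, not from the study.

```python
from typing import Iterable, Tuple

# Assumed variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def normalize(chrom: str, pos: int, ref: str, alt: str) -> Variant:
    """Hypothetical minimal normalization: drop a leading 'chr' and upper-case alleles."""
    chrom = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (chrom, pos, ref.upper(), alt.upper())


def unique_fraction(a: Iterable[Variant], b: Iterable[Variant]) -> float:
    """Fraction of distinct SNVs (union of both callsets) present in only one callset.

    The union denominator is an assumption; other definitions are possible.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    # Symmetric difference = variants called by exactly one of the two pipelines.
    return len(set_a ^ set_b) / len(union)
```

Even this toy normalization matters: without stripping the `chr` prefix, the same SNV reported by two pipelines would be counted as discordant. Comparing indels would additionally require left-alignment, which this sketch does not attempt.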
We performed a genome-wide association study (GWAS) and a multistage meta-analysis of type 2 diabetes (T2D) in Punjabi Sikhs from India. Our discovery GWAS in 1,616 individuals (842 case subjects) was followed by in silico replication of the top 513 independent SNPs (P < 10⁻³) in Punjabi Sikhs (n = 2,819; 801 case subjects). We further replicated 66 single nucleotide polymorphisms (SNPs) (P < 10⁻⁴) through genotyping in a Punjabi Sikh sample (n = 2,894; 1,711 case subjects). On combined meta-analysis in Sikh populations (n = 7,329; 3,354 case subjects), we identified a novel locus associated with T2D at 13q12, represented by a directly genotyped intronic SNP (rs9552911, P = 1.82 × 10⁻⁸) in the SGCG gene. Next, we undertook in silico replication (stage 2b) of the top 513 signals (P < 10⁻³) in 29,157 non-Sikh South Asians (10,971 case subjects) and de novo genotyping of up to 31 top signals (P < 10⁻⁴) in 10,817 South Asians (5,157 case subjects) (stage 3b). In the combined South Asian meta-analysis, we observed six suggestive associations (P < 10⁻⁵ to P < 10⁻⁷), including SNPs at HMG1L1/CTCFL, PLXNA4, SCAP, and chr5p11. Further evaluation of 31 top SNPs in 33,707 East Asians (16,746 case subjects) (stage 3c) and 47,117 Europeans (8,130 case subjects) (stage 3d), and joint meta-analysis of 128,127 individuals (44,358 case subjects) from 27 multiethnic studies, did not reveal any additional loci, nor was there any evidence of replication for the new variant. Our findings provide new evidence of a population-specific signal in relation to T2D, which may provide additional insights into T2D pathogenesis.
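The abstract does not state which meta-analysis model combined the stages. As an illustration only, the sketch below assumes a fixed-effect, inverse-variance-weighted meta-analysis, a common way to pool per-stage effect estimates (for example, log odds ratios with their standard errors). The function name `inverse_variance_meta` and any input values are hypothetical.

```python
import math
from typing import List, Tuple


def inverse_variance_meta(estimates: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance meta-analysis (assumed model, not the study's stated one).

    estimates: (beta, standard_error) pairs, e.g. one log odds ratio per stage.
    Returns (pooled_beta, pooled_se, two_sided_p) using a normal approximation.
    """
    # Each stage is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p
```

Pooling shrinks the standard error as stages are added, which is why a signal that is only suggestive in each stage can cross a genome-wide threshold in the combined analysis. A random-effects model would be the usual alternative when effects are expected to differ between populations.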
Autoimmune thyroid disease (AITD), including Graves' disease (GD) and Hashimoto’s thyroiditis (HT), is one of the most common immune-mediated diseases. To further investigate the genetic determinants of AITD, we conducted an association study using a custom-made single-nucleotide polymorphism (SNP) array, the ImmunoChip. The array contains all known and genotype-able SNPs across 186 distinct susceptibility loci associated with one or more immune-mediated diseases. After stringent quality control, we analysed 103 875 common SNPs (minor allele frequency > 0.05) in 2285 GD and 462 HT patients and 9364 controls. We found evidence for seven new AITD risk loci (P < 1.12 × 10⁻⁶, a permutation-test-derived significance threshold): five at locations previously associated with other immune-mediated diseases, and two at locations whose association with other immune-mediated diseases awaits confirmation.
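A permutation-derived threshold such as the one above is typically obtained by shuffling case/control labels, recording the smallest p-value across all SNPs for each shuffle, and taking a low quantile of those minima. The sketch below illustrates that idea under assumptions of our own: a simple allelic 2×2 chi-square test on 0/1/2 genotype counts and an empirical 5% quantile. It does not reproduce the study's exact procedure or its 1.12 × 10⁻⁶ value, and the function names are hypothetical.

```python
import math
import random
from typing import Sequence


def allelic_chi2_p(genotypes: Sequence[int], is_case: Sequence[bool]) -> float:
    """Pearson allelic 2x2 chi-square test (1 df, no continuity correction).

    genotypes: minor-allele counts (0, 1 or 2) per individual.
    """
    case_minor = sum(g for g, c in zip(genotypes, is_case) if c)
    ctrl_minor = sum(g for g, c in zip(genotypes, is_case) if not c)
    n_case = 2 * sum(1 for c in is_case if c)  # allele count in cases
    n_ctrl = 2 * len(is_case) - n_case         # allele count in controls
    table = [[case_minor, n_case - case_minor], [ctrl_minor, n_ctrl - ctrl_minor]]
    row = [n_case, n_ctrl]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    total = n_case + n_ctrl
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            if expected > 0:  # monomorphic SNPs contribute nothing (p = 1)
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def permutation_threshold(genotype_matrix: Sequence[Sequence[int]], is_case: Sequence[bool],
                          n_perm: int = 1000, alpha: float = 0.05, seed: int = 1) -> float:
    """Empirical study-wide threshold: alpha-quantile of per-permutation minimum p-values."""
    rng = random.Random(seed)
    labels = list(is_case)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        min_ps.append(min(allelic_chi2_p(snp, labels) for snp in genotype_matrix))
    min_ps.sort()
    k = max(0, int(alpha * n_perm) - 1)
    return min_ps[k]
```

Because every permutation scans all SNPs, the threshold automatically reflects the LD structure of the array, which is why it can be less stringent than a Bonferroni correction over the same number of tests.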
Several studies have identified nearly 40 type 2 diabetes susceptibility loci, mainly in European populations, but few of them have been evaluated in the Mexican population. The aim of this study was to examine the extent to which 24 common genetic variants previously associated with type 2 diabetes are also associated with the disease in Mexican Mestizos. Twenty-four single nucleotide polymorphisms (SNPs) in or near genes (KCNJ11, PPARG, TCF7L2, SLC30A8, HHEX, CDKN2A/2B, CDKAL1, IGF2BP2, ARHGEF11, JAZF1, CDC123/CAMK1D, FTO, TSPAN8/LGR5, KCNQ1, THADA, ADAMTS9, NOTCH2, NXPH1, RORA, UBQLNL, and RALGPS2) were genotyped in Mexican Mestizos. A case-control association study comprising 1,027 type 2 diabetic individuals and 990 control individuals was conducted. To account for population stratification, a panel of 104 ancestry-informative markers was analyzed. Association with type 2 diabetes was found for rs13266634 (SLC30A8), rs7923837 (HHEX), rs10811661 (CDKN2A/2B), rs4402960 (IGF2BP2), rs12779790 (CDC123/CAMK1D), and rs2237892 (KCNQ1). In addition, rs7754840 (CDKAL1) was associated in the nonobese type 2 diabetes subgroup, and rs7903146 (TCF7L2) was associated with early-onset type 2 diabetes. Lack of association for the remaining variants may have resulted from insufficient power to detect smaller allele effects.
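The basic effect measure in a case-control SNP study is the allelic odds ratio. The sketch below computes it from a 2×2 table of allele counts with a Woolf (log-scale) 95% confidence interval; the 0.5 continuity correction for zero cells is our assumption. This crude estimate does not adjust for ancestry, which the study addressed with 104 ancestry-informative markers, and the function name and counts are hypothetical.

```python
import math
from typing import Tuple


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int) -> Tuple[float, float, float]:
    """Crude allelic odds ratio with a Woolf 95% confidence interval.

    Arguments are allele counts (each individual contributes two alleles).
    Assumption: add 0.5 to every cell if any cell is zero.
    """
    cells = [case_risk, case_other, ctrl_risk, ctrl_other]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    lo = math.exp(math.log(odds_ratio) - z * se_log)
    hi = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lo, hi
```

The interval width shrinks roughly with the square root of the allele counts, which is the power limitation the authors invoke for variants with smaller effects.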
High-throughput genotyping arrays provide a standardized resource for plant breeding communities and are useful for a breadth of applications, including high-density genetic mapping, genome-wide association studies (GWAS), genomic selection (GS), complex trait dissection, and studying patterns of genomic diversity among cotton cultivars and wild accessions. We have developed the CottonSNP63K, an Illumina Infinium array containing assays for 45,104 putative intra-specific single nucleotide polymorphism (SNP) markers for use within the cultivated cotton species Gossypium hirsutum L. and 17,954 putative inter-specific SNP markers for use with crosses of other cotton species with G. hirsutum. The SNPs on the array were developed from 13 different discovery sets that represent a diverse range of G. hirsutum germplasm and five other species: G. barbadense L., G. tomentosum Nuttal ex Seemann, G. mustelinum Miers ex Watt, G. armourianum Kearny, and G. longicalyx J.B. Hutchinson & Lee. The array was validated with 1,156 samples to generate cluster positions that facilitate automated analysis of 38,822 polymorphic markers. Two high-density genetic maps containing a total of 22,829 SNPs were generated for two F2 mapping populations, one intra-specific and one inter-specific; 3,533 SNP markers occurred in both maps. The intra-specific genetic map is the first saturated map for a cross between two G. hirsutum lines that resolves into 26 linkage groups, corresponding to the number of cotton chromosomes. The linkage maps showed high levels of collinearity with the JGI G. raimondii Ulbrich reference genome sequence. The CottonSNP63K array and cluster file, along with the marker sequences, are a valuable new resource for the global cotton research community.
Sequencing of the human genome and decades of genetic association and linkage studies have dramatically improved our understanding of the etiology of many diseases. However, the multiple causes of complex diseases are still not well understood, in part because genetic and sociocultural risk factors are not typically investigated concurrently. Hypertension is a leading risk factor for cardiovascular disease and afflicts more African Americans than any other racially defined group in the US. Few genetic loci for hypertension have been replicated across populations, which may reflect population-specific differences in genetic variants and/or inattention to relevant sociocultural factors. Discrimination is a salient sociocultural risk factor for poor health and has been associated with hypertension. Here we use a biocultural approach to study blood pressure (BP) variation in African Americans living in Tallahassee, Florida (n = 157), by genotyping over 30,000 single nucleotide polymorphisms (SNPs) and capturing experiences of discrimination using novel measures of unfair treatment of self and others. We perform a joint admixture and genetic association analysis for BP that prioritizes regions of the genome with African ancestry. We report only significant SNPs that were confirmed by simulation analyses performed to estimate the false-positive rate. We identify eight significant SNPs in five genes that were previously associated with cardiovascular diseases. When we include measures of unfair treatment and test for interactions between SNPs and unfair treatment, we identify a new class of genes involved in multiple phenotypes, including psychosocial distress and mood disorders.
Our results suggest that inclusion of culturally relevant stress measures, like unfair treatment in African Americans, may reveal new genes and biological pathways relevant to the etiology of hypertension, and may also improve our understanding of the complexity of gene-environment interactions that underlie complex diseases.
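A SNP-by-environment interaction is usually tested by adding a product term to a regression model. The sketch below assumes the simplest such model, BP = b0 + b1·genotype + b2·unfair + b3·genotype·unfair, fitted by ordinary least squares via the normal equations; the study's actual model, covariates, admixture component and simulation-based false-positive control are not reproduced. It returns coefficients only (no standard errors or p-values), and all names are hypothetical.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_interaction(bp: Sequence[float], genotype: Sequence[float],
                    unfair: Sequence[float]) -> List[float]:
    """OLS fit of bp = b0 + b1*genotype + b2*unfair + b3*genotype*unfair.

    Returns [b0, b1, b2, b3]; b3 is the interaction (effect modification) term.
    """
    rows = [[1.0, g, e, g * e] for g, e in zip(genotype, unfair)]
    p = 4
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, bp)) for i in range(p)]
    return _solve(xtx, xty)
```

A nonzero b3 means the per-allele effect on BP changes with the level of reported unfair treatment, which is how a gene with no marginal association can still surface once the sociocultural measure is included.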
Ancient DNA (aDNA) recovered from victims of the second plague pandemic (14th to 17th century), excavated from two burial sites in Germany and spanning a period of more than 300 years, was characterized using single nucleotide polymorphism (SNP) analysis. Of 30 skeletons tested, 8 were positive for Yersinia pestis-specific nucleic acid, as determined by qPCR targeting the pla gene. In one individual (MP-19-II), the pla copy number in DNA extracted from tooth pulp was as high as 700 gene copies/μl, indicating severe generalized infection. All positive individuals were identical at all 16 SNP positions, which separate phylogenetic branches within nodes N07_N10 (14 SNPs), N07_N08 (SNP s19) and N06_N07 (s545), and were highly similar to previously investigated plague victims from other European countries. Thus, besides the assumed continuous reintroduction of Y. pestis from central Asia in multiple waves during the second pandemic, long-term persistence of Y. pestis in Europe in a yet unknown reservoir host must also be considered.
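Absolute copy numbers from qPCR, such as the 700 pla copies/μl above, are normally read off a standard curve of the form Ct = slope · log10(copies) + intercept, fitted to a dilution series of known standards. The sketch below shows only that conversion; the slope and intercept are hypothetical parameters supplied by the caller, since the abstract does not report the curve.

```python
import math


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert a standard curve Ct = slope * log10(copies) + intercept.

    slope and intercept are assumed to come from a dilution series of standards;
    the result is in whatever unit the standards were expressed in (e.g. copies/ul).
    """
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """Efficiency implied by the slope: E = 10^(-1/slope) - 1 (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def ideal_slope() -> float:
    """Slope expected at 100% efficiency: -1 / log10(2), about -3.32 cycles per decade."""
    return -1.0 / math.log10(2)
```

Because Ct is logarithmic in template amount, an earlier Ct by about 3.3 cycles corresponds to roughly ten times more template at ideal efficiency. Checking the efficiency implied by the fitted slope is a routine sanity check before trusting absolute copy numbers from degraded aDNA.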
Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium, which makes it difficult to identify precisely the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when multiple sources of functional information are integrated and when the highest-confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify “functional SNPs” that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.
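The core idea above, looking past the reported SNP to its LD proxies and favouring the one with the most experimental support, can be illustrated with a deliberately simplified scoring rule. The sketch assumes an r² ≥ 0.8 proxy cutoff and scores each candidate by the number of distinct functional annotation types it overlaps; the authors' actual integration of ENCODE data is more elaborate, and the class, function, annotation labels and rsIDs here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Candidate:
    rsid: str
    r2_with_lead: float  # LD (r^2) with the reported GWAS SNP; 1.0 for the lead itself
    annotations: Set[str] = field(default_factory=set)  # e.g. {"DNase", "TF_ChIP"}


def prioritize(lead: str, candidates: List[Candidate],
               min_r2: float = 0.8) -> Tuple[str, int]:
    """Return the SNP (lead or LD proxy) overlapping the most distinct annotation types.

    Assumptions: proxies need r^2 >= min_r2; ties go to the SNP with higher r^2.
    The lead SNP must be present in `candidates`.
    """
    pool = [c for c in candidates if c.rsid == lead or c.r2_with_lead >= min_r2]
    best = max(pool, key=lambda c: (len(c.annotations), c.r2_with_lead))
    return best.rsid, len(best.annotations)
```

Counting annotation types rather than raw peaks rewards convergent evidence from independent assays, in line with the observation that enrichment is strongest when multiple sources of functional information are combined.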
People’s differences in cognitive functions are partly heritable and are associated with important life outcomes. Previous genome-wide association (GWA) studies of cognitive functions have found evidence for polygenic effects, yet to date there are few replicated genetic associations. Here we use data from the UK Biobank sample to investigate the genetic contributions to variation in tests of three cognitive functions and in educational attainment. GWA analyses were performed for verbal-numerical reasoning (N=36 035), memory (N=112 067), reaction time (N=111 483) and the attainment of a college or university degree (N=111 114). We report genome-wide significant single-nucleotide polymorphism (SNP)-based associations in 20 genomic regions, and significant gene-based findings in 46 regions. These include findings in the ATXN2, CYP2DG, APBA1 and CADM2 genes. We report replication of these hits in published GWA studies of cognitive function, educational attainment and childhood intelligence. There is also replication, in UK Biobank, of SNP hits reported previously in GWA studies of educational attainment and cognitive function. GCTA-GREML analyses, using common SNPs (minor allele frequency > 0.01), indicated significant SNP-based heritabilities of 31% (s.e.m.=1.8%) for verbal-numerical reasoning, 5% (s.e.m.=0.6%) for memory, 11% (s.e.m.=0.6%) for reaction time and 21% (s.e.m.=0.6%) for educational attainment. Polygenic score analyses indicate that up to 5% of the variance in cognitive test scores can be predicted in an independent cohort. The genomic regions identified include several novel loci, some of which have been associated with intracranial volume, neurodegeneration, Alzheimer’s disease and schizophrenia. Molecular Psychiatry advance online publication, 5 April 2016; doi:10.1038/mp.2016.45.
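A polygenic score is a weighted sum of effect-allele dosages, with weights taken from a discovery GWAS; "variance predicted" is then the R² between the score and the phenotype in an independent cohort. The sketch below shows that arithmetic only. The SNP selection (e.g. p-value thresholds or LD clumping) and any covariate adjustment used in the study are not reproduced, and the function names, SNP IDs and weights are hypothetical.

```python
from typing import Dict, Sequence


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum over SNPs of effect-allele dosage (0-2) times per-SNP weight.

    SNPs missing from an individual's dosages are skipped (a simplifying assumption).
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def variance_explained(scores: Sequence[float], phenotype: Sequence[float]) -> float:
    """Squared Pearson correlation, i.e. R^2 of a simple regression of phenotype on score."""
    n = len(scores)
    ms, mp = sum(scores) / n, sum(phenotype) / n
    cov = sum((s - ms) * (p - mp) for s, p in zip(scores, phenotype))
    vs = sum((s - ms) ** 2 for s in scores)
    vp = sum((p - mp) ** 2 for p in phenotype)
    return cov * cov / (vs * vp)
```

Evaluating the score in a cohort that played no part in estimating the weights is what keeps the reported R² (up to 5% here) from being inflated by overfitting.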
We report a genome-wide association scan for facial features in ∼6,000 Latin Americans. We evaluated 14 traits on an ordinal scale and found significant association (P < 5 × 10⁻⁸) at single-nucleotide polymorphisms (SNPs) in four genomic regions for three nose-related traits: columella inclination (4q31), nose bridge breadth (6p21) and nose wing breadth (7p13 and 20p11). In a subsample of ∼3,000 individuals, we obtained quantitative traits related to 9 of the ordinal phenotypes, as well as a measure of nasion position. Quantitative analyses confirmed the ordinal-based associations, identified SNPs in 2q12 associated with chin protrusion, and replicated the reported association of nasion position with SNPs in PAX3. The strongest associations in 2q12, 4q31, 6p21 and 7p13 were observed for SNPs in the EDAR, DCHS2, RUNX2 and GLI3 genes, respectively. Associated SNPs in 20p11 extend to PAX1. Consistent with the effect of EDAR on chin protrusion, we documented alterations of mandible length in mice with modified Edar function.
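For the quantitative traits above, a single-SNP test can be sketched as a regression of the trait on additive allele dosage (0, 1, 2), compared against the conventional genome-wide threshold of 5 × 10⁻⁸ (roughly 0.05 divided by about one million independent common-variant tests). The sketch omits the covariates and ancestry adjustment a real scan would include, uses a normal approximation to the t statistic (reasonable for thousands of individuals), and its names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Conventional genome-wide significance level (about 0.05 / 1e6 tests).
GENOME_WIDE_ALPHA = 5e-8


def additive_association(dosage: Sequence[float], trait: Sequence[float]) -> Tuple[float, float]:
    """Simple linear regression of a trait on allele dosage under an additive model.

    Returns (beta, two_sided_p); p uses a normal approximation to the t statistic.
    Requires some residual variance and at least two distinct dosages.
    """
    n = len(dosage)
    mx, my = sum(dosage) / n, sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, trait))
    beta = sxy / sxx  # per-allele effect on the trait
    intercept = my - beta * mx
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosage, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    return beta, math.erfc(abs(z) / math.sqrt(2.0))
```

The additive coding assumes each copy of the allele shifts the trait by the same amount; ordinal traits such as the 14 scored on a categorical scale would need an ordinal model instead of this linear one.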