We performed a genome-wide association study (GWAS) and a multistage meta-analysis of type 2 diabetes (T2D) in Punjabi Sikhs from India. Our discovery GWAS in 1,616 individuals (842 case subjects) was followed by in silico replication of the top 513 independent single nucleotide polymorphisms (SNPs) (P < 10⁻³) in Punjabi Sikhs (n = 2,819; 801 case subjects). We further replicated 66 SNPs (P < 10⁻⁴) through genotyping in a Punjabi Sikh sample (n = 2,894; 1,711 case subjects). On combined meta-analysis in Sikh populations (n = 7,329; 3,354 case subjects), we identified a novel locus in association with T2D at 13q12 represented by a directly genotyped intronic SNP (rs9552911, P = 1.82 × 10⁻⁸) in the SGCG gene. Next, we undertook in silico replication (stage 2b) of the top 513 signals (P < 10⁻³) in 29,157 non-Sikh South Asians (10,971 case subjects) and de novo genotyping of up to 31 top signals (P < 10⁻⁴) in 10,817 South Asians (5,157 case subjects) (stage 3b). In combined South Asian meta-analysis, we observed six suggestive associations (P < 10⁻⁵ to < 10⁻⁷), including SNPs at HMG1L1/CTCFL, PLXNA4, SCAP, and chr5p11. Further evaluation of 31 top SNPs in 33,707 East Asians (16,746 case subjects) (stage 3c) and 47,117 Europeans (8,130 case subjects) (stage 3d), and joint meta-analysis of 128,127 individuals (44,358 case subjects) from 27 multiethnic studies, did not reveal any additional loci, nor was there any evidence of replication for the new variant. Our findings provide new evidence on the presence of a population-specific signal in relation to T2D, which may provide additional insights into T2D pathogenesis.
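As a quick consistency check on the staged design described above, the three Sikh sample sets can be summed and compared with the combined meta-analysis totals quoted in the abstract. This is only an arithmetic sketch; the dictionary keys and variable names are illustrative labels, not terms taken from the study.

```python
# Stage-wise Sikh sample sizes quoted in the abstract: (total individuals, case subjects).
# The stage labels are illustrative names, not the authors' own stage identifiers.
sikh_stages = {
    "discovery GWAS": (1616, 842),
    "in silico replication": (2819, 801),
    "genotyping replication": (2894, 1711),
}

# Sum individuals and case subjects across the three stages.
total_n = sum(n for n, _ in sikh_stages.values())
total_cases = sum(cases for _, cases in sikh_stages.values())

# Combined Sikh meta-analysis totals as reported in the abstract.
reported_n, reported_cases = 7329, 3354

print(f"summed individuals: {total_n} (reported {reported_n})")
print(f"summed case subjects: {total_cases} (reported {reported_cases})")
```

The stage totals agree with the reported combined sample, which suggests the three Sikh stages are non-overlapping.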
Highly parallel SNP genotyping platforms have been developed for some important crop species, but these platforms typically carry a high cost per sample for first-time or small-scale users. In contrast, recently developed genotyping by sequencing (GBS) approaches offer a highly cost-effective alternative for simultaneous SNP discovery and genotyping. In the present investigation, we have explored the use of GBS in soybean. In addition to developing a novel analysis pipeline to call SNPs and indels from the resulting sequence reads, we have devised a modified library preparation protocol to alter the degree of complexity reduction. We used a set of eight diverse soybean genotypes to conduct a pilot-scale test of the protocol and pipeline. Using ApeKI for GBS library preparation and sequencing on an Illumina GAIIx machine, we obtained 5.5 M reads, which were processed using our pipeline. A total of 10,120 high-quality SNPs were obtained, and the distribution of these SNPs closely mirrored the distribution of gene-rich regions in the soybean genome. A total of 39.5% of the SNPs were present in genic regions and 52.5% of these were located in the coding sequence. Validation of over 400 genotypes at a set of randomly selected SNPs using Sanger sequencing showed a 98% success rate. We then explored the use of selective primers to achieve a greater complexity reduction during GBS library preparation. The number of SNP calls could be increased by almost 40% and their depth of coverage was more than doubled, thus opening the door to an increase in throughput and a significant decrease in the per-sample cost. The approach to obtaining high-quality SNPs developed here will be helpful for marker-assisted genomics as well as for assessment of available genetic resources for effective utilisation in a wide range of species.
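The percentages reported above can be converted into approximate SNP counts. The sketch below assumes that "52.5% of these" refers to the genic SNPs rather than to all SNPs; the resulting counts are back-calculated estimates, not figures stated in the abstract.

```python
# Figures quoted in the abstract.
total_snps = 10_120
genic_fraction = 0.395    # share of all SNPs located in genic regions
coding_fraction = 0.525   # ASSUMPTION: share of the genic SNPs located in coding sequence

# Back-calculated approximate counts (rounded to whole SNPs).
genic_snps = total_snps * genic_fraction
coding_snps = genic_snps * coding_fraction

genic_count = round(genic_snps)
coding_count = round(coding_snps)

print(f"approx. genic SNPs:  {genic_count}")
print(f"approx. coding SNPs: {coding_count}")
print(f"coding SNPs as share of all SNPs: {coding_snps / total_snps:.1%}")
```

Under that reading, roughly one in five of the called SNPs falls in coding sequence.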
High-throughput genotyping arrays provide a standardized resource for plant breeding communities that is useful for a breadth of applications including high-density genetic mapping, genome-wide association studies (GWAS), genomic selection (GS), complex trait dissection and studying patterns of genomic diversity among cotton cultivars and wild accessions. We have developed the CottonSNP63K, an Illumina Infinium array containing assays for 45,104 putative intra-specific single nucleotide polymorphism (SNP) markers for use within the cultivated cotton species Gossypium hirsutum L. and 17,954 putative inter-specific SNP markers for use with crosses of other cotton species with G. hirsutum. The SNPs on the array were developed from 13 different discovery sets that represent a diverse range of G. hirsutum germplasm and five other species: G. barbadense L., G. tomentosum Nuttall ex Seemann, G. mustelinum Miers ex Watt, G. armourianum Kearney, and G. longicalyx J.B. Hutchinson & Lee. The array was validated with 1,156 samples to generate cluster positions that facilitate automated analysis of 38,822 polymorphic markers. Two high-density genetic maps containing a total of 22,829 SNPs were generated for two F2 mapping populations, one intra-specific and one inter-specific; 3,533 SNP markers co-occurred in both maps. The resulting intra-specific genetic map is the first saturated map for a cross between two G. hirsutum lines that resolves into 26 linkage groups, corresponding to the number of cotton chromosomes. The linkage maps showed high levels of collinearity with the JGI G. raimondii Ulbrich reference genome sequence. The CottonSNP63K array and cluster file, along with the marker sequences, are a valuable new resource for the global cotton research community.
The use of circulating cell-free DNA (cfDNA) as a biomarker in transplant recipients offers advantages over invasive tissue biopsy as a quantitative measure for detection of transplant rejection and immunosuppression optimization. However, the fraction of donor-derived cfDNA (dd-cfDNA) in transplant recipient plasma is low and challenging to quantify. Previously reported methods to measure dd-cfDNA require donor and recipient genotyping, which is impractical in clinical settings and adds cost. We developed a targeted next-generation sequencing assay that uses 266 single-nucleotide polymorphisms to accurately quantify dd-cfDNA in transplant recipients without separate genotyping. Analytical performance of the assay was characterized and validated using 1117 samples comprising the National Institute of Standards and Technology Genome in a Bottle human reference genome, independently validated reference materials, and clinical samples. The assay quantifies the fraction of dd-cfDNA in both unrelated and related donor-recipient pairs. The dd-cfDNA assay can reliably measure dd-cfDNA (limit of blank, 0.10%; limit of detection, 0.16%; limit of quantification, 0.20%) across the linear quantifiable range (0.2% to 16%) with across-run CVs of 6.8%. Precision was also evaluated for independently processed clinical sample replicates and was similar to across-run precision. Application of the assay to clinical samples from heart transplant recipients demonstrated increased levels of dd-cfDNA in patients with biopsy-confirmed rejection and decreased levels of dd-cfDNA after successful rejection treatment. This noninvasive clinical-grade sequencing assay can be completed within 3 days, providing the practical turnaround time preferred for transplanted organ surveillance.
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome-wide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2907 cases with AN from 14 countries (15 sites) and 14 860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery data sets. Seventy-six (72 independent) single nucleotide polymorphisms were taken forward for in silico (two data sets) or de novo (13 data sets) replication genotyping in 2677 independent AN cases and 8629 European ancestry controls along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication data sets comprised 5551 AN cases and 21 080 controls. AN subtype analyses (1606 AN restricting; 1445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01 × 10⁻⁷) in SOX2OT and rs17030795 (P=5.84 × 10⁻⁶) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76 × 10⁻⁶) between CUL3 and FAM124B and rs1886797 (P=8.05 × 10⁻⁶) near SPATA13. Comparing discovery with replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4 × 10⁻⁶), strongly suggesting that true findings exist but our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field.
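The direction-of-effect comparison described above is the kind of result a sign test evaluates. A minimal sketch follows, assuming a one-sided exact binomial test and assuming the comparison was made over the 72 independent SNPs; the abstract states neither the test used nor the exact count, so `n_snps`, `k_same`, and `sign_test_p` are hypothetical, and the output need not reproduce the reported P value.

```python
from math import comb


def sign_test_p(k: int, n: int) -> float:
    """One-sided exact binomial P(X >= k) for n trials with success probability 0.5."""
    # Count all outcomes with at least k concordant directions, divide by 2^n outcomes.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


n_snps = 72                    # ASSUMPTION: independent SNPs compared across stages
k_same = round(0.76 * n_snps)  # ASSUMPTION: 76% concordance converted to a whole count

p_value = sign_test_p(k_same, n_snps)
print(f"{k_same}/{n_snps} effects in the same direction, one-sided sign-test P = {p_value:.2e}")
```

Under the null hypothesis that replication directions are independent of discovery directions, each SNP agrees with probability 0.5, so a concordance this far above 50% is very unlikely by chance even when no single SNP reaches genome-wide significance.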
Neisseria gonorrhoeae multilocus sequence typing (MLST) is a key tool used to investigate the macroepidemiology of gonococci exhibiting antimicrobial resistance (AMR). However, the utility of MLST is undermined by the high workload and cost associated with DNA sequencing of seven housekeeping genes. In this study, we investigated single nucleotide polymorphism (SNP)-based profiling as a means of circumventing these problems.
Objective: The identification of high-risk individuals can help to improve early cancer detection and patient survival. Risk assessment, however, can only be accomplished if the risk factors are known. To date, the genetic risk factors for ovarian cancer, other than mutations in the BRCA1/2 genes, have never been systematically explored in Malaysia. The present study aims to identify, from a panel of cancer-associated single-nucleotide polymorphisms (SNPs), those associated with ovarian cancer risk in Malaysia. Methods: A total of 768 SNPs associated with various cancers among Asians were identified through a search of the relevant literature, and these SNPs were then screened for their association with ovarian cancer. A total of 160 Malaysian subjects were recruited for the study, including both ovarian cancer patients and controls. Genotyping was carried out using the Illumina BeadArray platform. Results: A panel of 45 SNPs significantly (p<0.05) associated with ovarian cancer risk was identified. These ovarian cancer-associated SNPs were located in genes implicated in various pathways of carcinogenesis. Of these 45 SNPs, 5 have been previously associated with either ovarian cancer risk or survival. Conclusion: This study has identified a panel of 45 SNPs that are significantly associated with ovarian cancer in a Malaysian population.
Vascular endothelial growth factor (VEGF) and its receptor kinase insert domain-containing receptor (KDR) play crucial roles in angiogenesis, which contributes to the development and progression of solid tumors. The aim of this study was to investigate the associations of VEGF (-2578C > A, -1154G > A, -634G > C, and 936C > T) and KDR (-604T > C and 1192G > A) polymorphisms with the development of colorectal cancer (CRC). A total of 882 participants (390 CRC patients and 492 controls) were enrolled in the study. The genotyping of VEGF and KDR polymorphisms was performed by polymerase chain reaction-restriction fragment length polymorphism assay. We found that the CT and TT genotypes of VEGF 936C > T were associated with an increased risk of CRC compared with the CC genotype, under a dominant model for the T allele. In addition, we found an increased CRC risk with the TC + CC genotypes of KDR -604T > C compared with the TT genotype. Similarly, KDR 1192G > A variants were significantly associated with the risk of CRC. In the haplotype analyses, haplotype -2578A/-1154A/-634G/936T of VEGF polymorphisms and haplotypes -604C/1192G and -604C/1192A of KDR polymorphisms were associated with an increased susceptibility to CRC. Our results suggest that the VEGF 936C > T, KDR -604T > C, and KDR 1192G > A polymorphisms may contribute to CRC risk in the Korean population. © 2012 Wiley Periodicals, Inc.
Objective: Custom genotyping of markers in families with familial idiopathic scoliosis was used to fine-map candidate regions on chromosomes 9 and 16 in order to identify candidate genes that contribute to this disorder and prioritize them for next-generation sequence analysis. Methods: Candidate regions on 9q and 16p-16q, previously identified as linked to familial idiopathic scoliosis in a study of 202 families, were genotyped with a high-density map of single nucleotide polymorphisms. Tests of linkage for fine-mapping and intra-familial tests of association, including tiled regression, were performed on scoliosis as both a qualitative and quantitative trait. Results and Conclusions: Nominally significant linkage results were found for markers in both candidate regions. Results from intra-familial tests of association and tiled regression corroborated the linkage findings and identified possible candidate genes suitable for follow-up with next-generation sequencing in these same families. Candidate genes that met our prioritization criteria included FAM129B and CERCAM on chromosome 9 and SYT1, GNAO1, and CDH3 on chromosome 16.