Concept: Personal genomics
Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ∼100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.
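The false-positive filter and the error-rate figure above can be illustrated with a minimal sketch. It is not the published LFR pipeline. It assumes a hypothetical data model in which each independent LFR haplotype source (for example, one LFR library) is a set of variant IDs that it placed into a haplotype, and it reads "phased by multiple LFR haplotypes" as "phased by at least two sources". The function names `filter_by_phasing_support` and `errors_per_megabase` are illustrative only.

```python
from collections import Counter
from typing import Dict, Set


def filter_by_phasing_support(
    candidates: Set[str],
    phased_by_source: Dict[str, Set[str]],
    min_sources: int = 2,
) -> Set[str]:
    """Keep candidate SNVs phased by at least `min_sources` independent sources.

    Hypothetical data model: each source (e.g. one LFR library) maps to the set
    of variant IDs it placed into a haplotype.
    """
    support: Counter = Counter()
    for phased in phased_by_source.values():
        # Each source counts at most once per variant; non-candidates are ignored.
        support.update(phased & candidates)
    return {v for v in candidates if support[v] >= min_sources}


def errors_per_megabase(n_errors: int, bases_assessed: int) -> float:
    """Express an error count per megabase (1 error in 10 Mb == 0.1 per Mb)."""
    return n_errors / (bases_assessed / 1_000_000)


if __name__ == "__main__":
    candidates = {"snv1", "snv2", "snv3"}
    phased = {
        "lib1": {"snv1", "snv2"},
        "lib2": {"snv1"},
        "lib3": {"snv1", "snv2", "snv3"},
    }
    print(sorted(filter_by_phasing_support(candidates, phased)))  # ['snv1', 'snv2']
    print(errors_per_megabase(1, 10_000_000))  # 0.1
```

Raising `min_sources` trades sensitivity for specificity: a real variant that happens to be phased in only one source is discarded along with the false positives.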
The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants
- CMAJ : Canadian Medical Association journal = journal de l'Association medicale canadienne
- Published over 1 year ago
The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers.
BACKGROUND: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be. METHODS: We sequenced 15 exomes from four families using the Illumina HiSeq 2000 platform and the Agilent SureSelect v.2 capture kit, with ~120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMTools). We additionally sequenced a single whole genome using the Complete Genomics (CG) sequencing and analysis pipeline, with 95% of the exome region covered by 20 or more reads per base. Finally, we attempted to validate 919 SNVs and 841 indels, including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with ~5000X average coverage. RESULTS: SNV concordance between the five Illumina pipelines across all 15 exomes is 57.4%, while 0.5-5.1% of variants were called uniquely by each pipeline. Indel concordance is only 26.8% between three indel calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. Eleven percent of CG variants that fall within the targeted exome regions were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2% and 99.1% of the GATK-only, SOAP-only and shared SNVs can be validated, but only 54.0%, 44.6% and 78.1% of the GATK-only, SOAP-only and shared indels can be validated.
Additionally, our analysis of two families, one containing four individuals and the other containing seven, demonstrates additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family. CONCLUSIONS: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families, so as to increase the overall accuracy of whole genomes.
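The concordance and "unique to each pipeline" figures above can be sketched for SNVs as set operations over per-pipeline call sets. The abstract does not define either statistic precisely, so this sketch assumes concordance means the share of the union of all calls made by every pipeline, and the unique fraction is measured against that same union. Variants are keyed by a hypothetical `(chrom, pos, ref, alt)` tuple; indels would additionally need left-normalization and the 20 bp window matching described above, which this sketch does not implement.

```python
from typing import Dict, Set, Tuple

# Hypothetical SNV key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Variant]]) -> float:
    """Fraction of the union of all calls that every pipeline made (assumed definition)."""
    union = set().union(*calls.values())
    if not union:
        return 0.0
    shared = set.intersection(*calls.values())
    return len(shared) / len(union)


def unique_fraction(calls: Dict[str, Set[Variant]], pipeline: str) -> float:
    """Fraction of the union called by `pipeline` and by no other pipeline."""
    union = set().union(*calls.values())
    if not union:
        return 0.0
    others = set().union(*(v for name, v in calls.items() if name != pipeline))
    return len(calls[pipeline] - others) / len(union)


if __name__ == "__main__":
    v1 = ("chr1", 100, "A", "G")
    v2 = ("chr1", 250, "C", "T")
    v3 = ("chr2", 75, "G", "A")
    v4 = ("chr3", 10, "T", "C")
    calls = {"BWA-GATK": {v1, v2, v3}, "SOAP": {v1, v2}, "BWA-SAMTools": {v1, v2, v4}}
    print(concordance(calls))  # 0.5
    print(unique_fraction(calls, "BWA-GATK"))  # 0.25
```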
Background. In recent years, there has been an explosion in the number of technical and medical diagnostic platforms being developed. This has greatly improved our ability to explore and characterize human biological systems more accurately and more comprehensively at the individual level. Large quantities of biomedical data are now being generated and archived in many separate research and clinical activities, but few studies integrate the areas of clinical neuropsychiatry, personal genomics and brain-machine interfaces. Methods. A single person with severe mental illness was implanted with the Medtronic Reclaim® Deep Brain Stimulation (DBS) Therapy device for Obsessive Compulsive Disorder (OCD), targeting his nucleus accumbens/anterior limb of the internal capsule. Programming of the device and psychiatric assessments occurred in an outpatient setting for over two years. His genome was sequenced and variants were detected in the Illumina Whole Genome Sequencing Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. Results. We report here the detailed phenotypic characterization, clinical-grade whole genome sequencing (WGS), and two-year outcome of a man with severe OCD treated with DBS. Since implantation, this man has reported steady improvement, highlighted by a decline in his Yale-Brown Obsessive Compulsive Scale (YBOCS) score from ∼38 to ∼25. A rechargeable Activa RC neurostimulator battery has been of major benefit in providing a degree of stability and control over the stimulation. His psychiatric symptoms reliably worsen within hours of the battery becoming depleted, providing confirmatory evidence for the efficacy of DBS for OCD in this person. WGS revealed that he is heterozygous for the p.Val66Met variant in BDNF (which encodes a member of the nerve growth factor family), a variant that has been found to predispose carriers to various psychiatric illnesses.
He carries the p.Glu429Ala allele in methylenetetrahydrofolate reductase (MTHFR) and the p.Asp7Asn allele in ChAT, encoding choline O-acetyltransferase, with both alleles having been shown to confer an elevated susceptibility to psychoses. We have found thousands of other variants in his genome, including pharmacogenetic and copy number variants. This information has been archived and offered to this person alongside the clinical sequencing data, so that he and others can re-analyze his genome for years to come. Conclusions. To our knowledge, this is the first study in the clinical neurosciences that integrates detailed neuropsychiatric phenotyping, deep brain stimulation for OCD and clinical-grade WGS with management of genetic results in the medical treatment of one person with severe mental illness. We offer this as an example of precision medicine in neuropsychiatry including brain-implantable devices and genomics-guided preventive health care.
- Genetics in medicine : official journal of the American College of Medical Genetics
- Published about 6 years ago
Purpose: The promise of personalized genomics for common complex diseases depends, in part, on the ability to predict genetic risks on the basis of single nucleotide polymorphisms. We examined and compared the methods of three companies (23andMe, deCODEme, and Navigenics) that have offered direct-to-consumer personal genome testing. Methods: We simulated genotype data for 100,000 individuals on the basis of published genotype frequencies and predicted disease risks using the methods of the companies. Predictive ability for six diseases was assessed by the area under the receiver operating characteristic curve (AUC). Results: AUC values differed among the diseases and among the companies. The highest AUC values were observed for age-related macular degeneration, celiac disease, and Crohn disease. The largest difference among the companies was found for celiac disease: the AUC was 0.73 for 23andMe and 0.82 for deCODEme. Predicted risks differed substantially among the companies as a result of differences in the sets of single nucleotide polymorphisms selected, the average population risks selected, and the formulas used for the calculation of risks. Conclusion: Future efforts to design predictive models for the genomics of common complex diseases may benefit from understanding the strengths and limitations of the predictive algorithms designed by these early companies. Genet Med advance online publication 27 June 2013. Genetics in Medicine (2013); doi:10.1038/gim.2013.80.
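The two ingredients of the comparison above, a risk formula that starts from an average population risk and an AUC computed over predicted risks, can be sketched as follows. `predicted_risk` shows one common multiplicative model (population odds multiplied by per-SNP likelihood ratios); it is an assumption for illustration and is not the formula of 23andMe, deCODEme, or Navigenics. `auc` uses the rank-based definition: the probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, with ties counted as one half. Both names are hypothetical.

```python
from typing import Sequence


def predicted_risk(population_risk: float, likelihood_ratios: Sequence[float]) -> float:
    """Update an average population risk (0 < risk < 1) with per-SNP likelihood ratios.

    Assumed multiplicative model on the odds scale; not any company's actual formula.
    """
    odds = population_risk / (1.0 - population_risk)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


def auc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC = P(random case outranks random control); ties count 0.5."""
    cases = [r for r, y in zip(risks, outcomes) if y]
    controls = [r for r, y in zip(risks, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))


if __name__ == "__main__":
    print(round(predicted_risk(0.1, [2.0]), 4))  # 0.1818
    print(auc([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic, which is fine for illustration; for 100,000 simulated individuals, a sort-based rank-sum computation would be the practical choice.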
In August 2013, the genetic-testing company 23andMe began running a compelling national television commercial, in which attractive young people said that for $99 you could learn “hundreds of things about your health,” including that you “might have an increased risk of heart disease, arthritis, gallstones, [or] hemochromatosis” (www.ispot.tv/ad/7qoF/23-and-me). It was the centerpiece of the company’s campaign to sign up 1 million consumers. On November 22, the Food and Drug Administration (FDA) sent 23andMe a warning letter ordering it to “immediately discontinue marketing the PGS [Saliva Collection Kit and Personal Genome Service] until such time as it receives FDA marketing authorization . . .
One of the most overlooked, yet critical, components of a whole genome sequencing (WGS) project is the submission and curation of the data to a genomic repository, most commonly the National Center for Biotechnology Information (NCBI). While large genome centers or genome groups have developed software tools for post-annotation assembly filtering, annotation, and conversion into the NCBI’s annotation table format, these tools typically require back-end setup and connection to a Structured Query Language (SQL) database and/or some knowledge of programming (Perl, Python) to implement. With WGS becoming commonplace, genome sequencing projects are moving away from the genome centers and into the ecology or biology lab, where fewer resources are available to support genome assembly curation. To fill this gap, we developed software to assess, filter, and transfer annotation and to convert a draft genome assembly and annotation set into the NCBI annotation table (.tbl) format, facilitating submission to the NCBI Genome Assembly database. This software has no dependencies, is compatible across platforms, and uses a simple command to perform a variety of simple and complex post-analysis, pre-NCBI-submission WGS project tasks.
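To show what the conversion target looks like, here is a minimal sketch that renders annotated features into the five-column, tab-delimited NCBI feature table (.tbl) layout: a `>Feature` header per sequence, one line per feature with start, stop and feature key, and qualifier lines whose first three columns are empty. This is not the software described above; the `Feature` class and `to_tbl` function are hypothetical, and partial-feature markers and multi-interval features (e.g. spliced CDSs) are not handled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Feature:
    start: int   # 1-based, inclusive; assumed start <= end
    end: int     # 1-based, inclusive
    key: str     # feature key, e.g. "gene", "CDS"
    strand: str = "+"
    qualifiers: List[Tuple[str, str]] = field(default_factory=list)


def to_tbl(seq_id: str, features: List[Feature]) -> str:
    """Render features for one sequence in the five-column feature table layout."""
    lines = [f">Feature {seq_id}"]
    for f in features:
        # Minus-strand features are written with the start/stop columns reversed.
        first, second = (f.end, f.start) if f.strand == "-" else (f.start, f.end)
        lines.append(f"{first}\t{second}\t{f.key}")
        for name, value in f.qualifiers:
            # Qualifier lines leave the first three columns empty.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    feats = [
        Feature(1, 900, "gene", "-", [("locus_tag", "ABC_0001")]),
        Feature(1, 900, "CDS", "-", [("product", "hypothetical protein")]),
    ]
    print(to_tbl("contig_1", feats), end="")
```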
To address the need for more effective genomics training, beginning in 2012 the Icahn School of Medicine at Mount Sinai has offered a unique laboratory-style graduate genomics course, “Practical Analysis of Your Personal Genome” (PAPG), in which students optionally sequence and analyze their own whole genome. We hypothesized that incorporating personal genome sequencing (PGS) into the course pedagogy could improve educational outcomes by increasing student motivation and engagement. Here we extend our initial study of the pilot PAPG cohort with a report on student attitudes towards genome sequencing, decision-making, psychological wellbeing, genomics knowledge and pedagogical engagement across three course years.
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving, and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a cloud computing solution with a graphical interface that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and takes 5-10 hours to process a full exome sequence, and costs approximately $30 and takes 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface on Amazon EC2.
Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.
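Consensus SV calling of the kind described above generally needs a rule for deciding when calls from different methods describe the same event. The abstract does not state Parliament's merging logic, so the sketch below swaps in a generic, commonly used technique: greedy clustering by 50% reciprocal overlap, keeping clusters supported by at least two distinct callers. The `SVCall` type, the thresholds, and the choice to report the outer span of each cluster are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SVCall:
    chrom: str
    start: int   # 0-based, half-open interval [start, end)
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"
    caller: str  # detection method or data type that produced the call


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Smaller of the two overlap fractions; 0.0 if chromosome or SV type differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:  # also covers zero-length calls, avoiding division by zero
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def consensus(
    calls: List[SVCall], min_overlap: float = 0.5, min_callers: int = 2
) -> List[Tuple[str, int, int, str, List[str]]]:
    """Greedily cluster calls against each cluster's first member, then keep
    clusters supported by at least `min_callers` distinct callers."""
    clusters: List[List[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], call) >= min_overlap:
                cluster.append(call)
                break
        else:
            clusters.append([call])
    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_callers:
            head = cluster[0]
            merged.append((head.chrom, min(c.start for c in cluster),
                           max(c.end for c in cluster), head.svtype, callers))
    return merged


if __name__ == "__main__":
    calls = [
        SVCall("chr1", 1000, 2000, "DEL", "short-read"),
        SVCall("chr1", 1050, 2010, "DEL", "long-read"),
        SVCall("chr1", 5000, 5100, "DEL", "short-read"),
    ]
    print(consensus(calls))  # [('chr1', 1000, 2010, 'DEL', ['long-read', 'short-read'])]
```

Requiring support from multiple callers or data types trades sensitivity for precision, which mirrors the motivation for merging several SV detection methods in the first place.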