Journal: American Journal of Human Genetics
The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries, such as “Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?”, with either “yes” or “no.” Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
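To make the idea of a likelihood-ratio membership test on yes/no beacon responses concrete, the following is a minimal sketch and not the authors' implementation. The function name `beacon_llr` and the parameters `d_n`, `d_n_minus_1`, and `delta` are assumptions introduced for illustration: `d_n` stands for the probability that none of N unrelated genomes carries a queried allele, `d_n_minus_1` for the same probability with N-1 genomes, and `delta` for the chance that the target's own record in the beacon disagrees with the query genome (for example, through sequencing error). The abstract does not give the test statistic, so the exact form may differ.

```python
import math

def beacon_llr(responses, d_n, d_n_minus_1, delta):
    """Log-likelihood ratio: target in beacon (H1) vs. not in beacon (H0).

    responses   -- iterable of bools, one beacon answer per allele the target carries
    d_n         -- assumed P(no genome among N carries the allele)
    d_n_minus_1 -- assumed P(no genome among N-1 carries the allele)
    delta       -- assumed mismatch rate between query genome and beacon copy
    """
    p_yes_h0 = 1.0 - d_n                    # someone else must carry the allele
    p_yes_h1 = 1.0 - delta * d_n_minus_1    # "no" needs a mismatch and no other carrier
    llr = 0.0
    for yes in responses:
        if yes:
            llr += math.log(p_yes_h1 / p_yes_h0)
        else:
            llr += math.log((delta * d_n_minus_1) / d_n)
    return llr

# Hypothetical values: every "yes" nudges the statistic toward membership,
# while a single "no" counts strongly against it when delta is small.
all_yes = beacon_llr([True] * 100, d_n=0.5, d_n_minus_1=0.51, delta=0.01)
some_no = beacon_llr([False] * 10, d_n=0.5, d_n_minus_1=0.51, delta=0.01)
```

In this sketch a positive value favors membership. A decision threshold would have to be calibrated to the chosen false-positive rate, which is not shown here.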
Spontaneous dizygotic (DZ) twinning occurs in 1%-4% of women, with familial clustering and unknown physiological pathways and genetic origin. DZ twinning might index increased fertility and has distinct health implications for mother and child. We performed a GWAS in 1,980 mothers of spontaneous DZ twins and 12,953 control subjects. Findings were replicated in a large Icelandic cohort and tested for association across a broad range of fertility traits in women. Two SNPs were identified (rs11031006 near FSHB, p = 1.54 × 10⁻⁹, and rs17293443 in SMAD3, p = 1.57 × 10⁻⁸) and replicated (p = 3 × 10⁻³ and p = 1.44 × 10⁻⁴, respectively). Based on ∼90,000 births in Iceland, the risk of a mother delivering twins increased by 18% for each copy of allele rs11031006-G and 9% for rs17293443-C. A higher polygenic risk score (PRS) for DZ twinning, calculated based on the results of the DZ twinning GWAS, was significantly associated with DZ twinning in Iceland (p = 0.001). A higher PRS was also associated with having children (p = 0.01), greater lifetime parity (p = 0.03), and earlier age at first child (p = 0.02). Allele rs11031006-G was associated with higher serum FSH levels, earlier age at menarche, earlier age at first child, higher lifetime parity, lower PCOS risk, and earlier age at menopause. Conversely, rs17293443-C was associated with later age at last child. We identified robust genetic risk variants for DZ twinning: one near FSHB and a second within SMAD3, the product of which plays an important role in gonadal responsiveness to FSH. These loci contribute to crucial aspects of reproductive capacity and health.
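As a worked illustration of the per-allele risk figures in the twinning abstract above, the sketch below combines them into a relative risk for a given genotype. It assumes, hypothetically, that the effects are multiplicative per allele copy and independent across the two loci; the abstract does not state the model, and the names `PER_ALLELE_RR` and `combined_relative_risk` are illustrative only.

```python
import math

# Per-allele relative risks taken from the abstract (18% and 9% increases).
PER_ALLELE_RR = {"rs11031006-G": 1.18, "rs17293443-C": 1.09}

def combined_relative_risk(allele_counts):
    """Relative risk versus a mother carrying no copies of either risk allele.

    allele_counts maps an allele name to the number of copies carried (0, 1, or 2).
    Assumption: multiplicative per-allele effects and independent loci.
    """
    log_rr = 0.0
    for allele, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"allele count must be 0, 1, or 2, got {count}")
        # Summing weighted log effects is the same additive form used by a simple PRS.
        log_rr += count * math.log(PER_ALLELE_RR[allele])
    return math.exp(log_rr)

# Two copies of rs11031006-G: 1.18 squared under the multiplicative assumption.
two_copies = combined_relative_risk({"rs11031006-G": 2})
# One copy of each allele: 1.18 times 1.09.
one_of_each = combined_relative_risk({"rs11031006-G": 1, "rs17293443-C": 1})
```

Under these assumptions, two copies of rs11031006-G correspond to a relative risk of about 1.39, and one copy of each allele to about 1.29.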
Human genes governing innate immunity provide a valuable tool for the study of the selective pressure imposed by microorganisms on host genomes. A comprehensive, genome-wide study of how selective constraints and adaptations have driven the evolution of innate immunity genes is missing. Using full-genome sequence variation from the 1000 Genomes Project, we first show that innate immunity genes have globally evolved under stronger purifying selection than the remainder of protein-coding genes. We identify a gene set under the strongest selective constraints, mutations in which are likely to predispose individuals to life-threatening disease, as illustrated by STAT1 and TRAF3. We then evaluate the occurrence of local adaptation and detect 57 high-scoring signals of positive selection at innate immunity genes, variation in which has been associated with susceptibility to common infectious or autoimmune diseases. Furthermore, we show that most adaptations targeting coding variation have occurred in the last 6,000-13,000 years, the period at which populations shifted from hunting and gathering to farming. Finally, we show that innate immunity genes present higher Neandertal introgression than the remainder of the coding genome. Notably, among the genes presenting the highest Neandertal ancestry, we find the TLR6-TLR1-TLR10 cluster, which also contains functional adaptive variation in Europeans. This study identifies highly constrained genes that fulfill essential, non-redundant functions in host survival and reveals others that are more permissive to change (containing variation acquired from archaic hominins or adaptive variants in specific populations), improving our understanding of the relative biological importance of innate immunity pathways in natural conditions.
The past five years have seen many scientific and biological discoveries made through the experimental design of genome-wide association studies (GWASs). These studies were aimed at detecting variants at genomic loci that are associated with complex traits in the population and, in particular, at detecting associations between common single-nucleotide polymorphisms (SNPs) and common diseases such as heart disease, diabetes, auto-immune diseases, and psychiatric disorders. We start by giving a number of quotes from scientists and journalists about perceived problems with GWASs. We will then briefly give the history of GWASs and focus on the discoveries made through this experimental design, what those discoveries tell us and do not tell us about the genetics and biology of complex traits, and what immediate utility has come out of these studies. Rather than giving an exhaustive review of all reported findings for all diseases and other complex traits, we focus on the results for auto-immune diseases and metabolic diseases. We return to the perceived failure or disappointment about GWASs in the concluding section.
The magnitude of the human antibody response to viral antigens is highly variable. To explore the human genetic contribution to this variability, we performed genome-wide association studies of the immunoglobulin G response to 14 pathogenic viruses in 2,363 immunocompetent adults. Significant associations were observed in the major histocompatibility complex region on chromosome 6 for influenza A virus, Epstein-Barr virus, JC polyomavirus, and Merkel cell polyomavirus. Using local imputation and fine mapping, we identified specific amino acid residues in human leucocyte antigen (HLA) class II proteins as the most probable causal variants underlying these association signals. Common HLA-DRβ1 haplotypes showed virus-specific patterns of humoral-response regulation. We observed an overlap between variants affecting the humoral response to influenza A and EBV and variants previously associated with autoimmune diseases related to these viruses. The results of this study emphasize the central and pathogen-specific role of HLA class II variation in the modulation of humoral immune response to viral antigens in humans.
Uncombable hair syndrome (UHS), also known as “spun glass hair syndrome,” “pili trianguli et canaliculi,” or “cheveux incoiffables,” is a rare anomaly of the hair shaft that occurs in children and improves with age. UHS is characterized by dry, frizzy, spangly, and often fair hair that is resistant to being combed flat. Until now, both simplex and familial UHS-affected case subjects with autosomal-dominant as well as -recessive inheritance have been reported. However, none of these case subjects were linked to a molecular genetic cause. Here, we report the identification of UHS-causative mutations located in the three genes PADI3 (peptidylarginine deiminase 3), TGM3 (transglutaminase 3), and TCHH (trichohyalin) in a total of 11 children. All of these individuals carry homozygous or compound heterozygous mutations in one of these three genes, indicating an autosomal-recessive inheritance pattern in the majority of UHS case subjects. The two enzymes PADI3 and TGM3, responsible for posttranslational protein modifications, and their target structural protein TCHH are all involved in hair shaft formation. Elucidation of the molecular outcomes of the disease-causing mutations by cell culture experiments and tridimensional protein models demonstrated clear differences in the structural organization and activity of mutant and wild-type proteins. Scanning electron microscopy observations revealed morphological alterations in the hair coat of Padi3 knockout mice. Taken together, these findings elucidate the molecular genetic causes of UHS and shed light on its pathophysiology and hair physiology in general.
During neurotransmission, synaptic vesicles undergo multiple rounds of exo-endocytosis, involving recycling and/or degradation of synaptic proteins. While ubiquitin signaling at synapses is essential for neural function, it has been assumed that synaptic proteostasis requires the ubiquitin-proteasome system (UPS). We demonstrate here that turnover of synaptic membrane proteins via the endolysosomal pathway is essential for synaptic function. In both human and mouse, hypomorphic mutations in the ubiquitin adaptor protein PLAA cause an infantile-lethal neurodysfunction syndrome with seizures. Resulting from perturbed endolysosomal degradation, Plaa mutant neurons accumulate K63-polyubiquitylated proteins and synaptic membrane proteins, disrupting synaptic vesicle recycling and neurotransmission. Through characterization of this neurological intracellular trafficking disorder, we establish the importance of ubiquitin-mediated endolysosomal trafficking at the synapse.
In 2014, the United States granted individuals a right of access to their own laboratory test results, including genomic data. Many observers feel that this right is in tension with regulatory and bioethical standards designed to protect the safety of people who undergo genomic testing. This commentary attributes this tension to growing pains within an expanding federal regulatory program for genetic and genomic testing. The Genetic Information Nondiscrimination Act of 2008 expanded the regulatory agenda to encompass civil rights and consumer safety. The individual access right, as it applies to genomic data, is best understood as a civil-rights regulation. Competing regulatory objectives (safety and civil rights) were not successfully integrated during the initial rollout of genomic civil-rights regulations after 2008. Federal law clarifies how to prioritize safety and civil rights when the two come into conflict, although with careful policy design, the two need not collide. This commentary opens a dialog about possible solutions to advance safety and civil rights together.
With recent rapid advances in genomic technologies, precise delineation of structural chromosome rearrangements at the nucleotide level is becoming increasingly feasible. In this era of “next-generation cytogenetics” (i.e., an integration of traditional cytogenetic techniques and next-generation sequencing), a consensus nomenclature is essential for accurate communication and data sharing. Currently, nomenclature for describing the sequencing data of these aberrations is lacking. Herein, we present a system called Next-Gen Cytogenetic Nomenclature, which is concordant with the International System for Human Cytogenetic Nomenclature (2013). This system starts with the alignment of rearrangement sequences by BLAT or BLAST (alignment tools) and arrives at a concise and detailed description of chromosomal changes. To facilitate usage and implementation of this nomenclature, we are developing a program designated BLA(S)T Output Sequence Tool of Nomenclature (BOSToN), a demonstrative version of which is accessible online. A standardized characterization of structural chromosomal rearrangements is essential both for research analyses and for application in the clinical setting.
Nemaline myopathy (NEM) is a common congenital myopathy. At the very severe end of the NEM clinical spectrum are genetically unresolved cases of autosomal-recessive fetal akinesia sequence. We studied a multinational cohort of 143 severe-NEM-affected families lacking genetic diagnosis. We performed whole-exome sequencing of six families and targeted gene sequencing of additional families. We identified 19 mutations in KLHL40 (kelch-like family member 40) in 28 apparently unrelated NEM kindreds of various ethnicities. Accounting for up to 28% of the tested individuals in the Japanese cohort, KLHL40 mutations were found to be the most common cause of this severe form of NEM. Clinical features of affected individuals were severe and distinctive and included fetal akinesia or hypokinesia and contractures, fractures, respiratory failure, and swallowing difficulties at birth. Molecular modeling suggested that the missense substitutions would destabilize the protein. Protein studies showed that KLHL40 is a striated-muscle-specific protein that is absent in KLHL40-associated NEM skeletal muscle. In zebrafish, klhl40a and klhl40b expression is largely confined to the myotome and skeletal muscle, and knockdown of these isoforms results in disruption of muscle structure and loss of movement. We identified KLHL40 mutations as a frequent cause of severe autosomal-recessive NEM and showed that KLHL40 plays a key role in muscle development and function. Screening of KLHL40 should be a priority in individuals who are affected by autosomal-recessive NEM and who present with prenatal symptoms and/or contractures and in all Japanese individuals with severe NEM.