Concept: Classical genetics


Epidemiological and genetic association studies show that genetics plays an important role in educational attainment. Here, we investigate the effect of this genetic component on the reproductive history of 109,120 Icelanders and the consequent impact on the gene pool over time. We show that an educational attainment polygenic score, POLYEDU, constructed from the results of a recent study, is associated with delayed reproduction (P < 10⁻¹⁰⁰) and fewer children overall. The effect is stronger for women and remains highly significant after adjusting for educational attainment. Based on 129,808 Icelanders born between 1910 and 1990, we find that the average POLYEDU has been declining at a rate of ∼0.010 standard units per decade, which is substantial on an evolutionary timescale. Most importantly, because POLYEDU captures only a fraction of the overall underlying genetic component, the latter could be declining at a rate two to three times faster.
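The abstract does not spell out how POLYEDU or its trend were computed. The following is a minimal sketch that assumes the common construction of a polygenic score as a weighted sum of effect-allele dosages. It also assumes the per-decade rate can be read off an ordinary least-squares slope of the standardized score on birth year. The function names, weights, genotypes and birth years are all hypothetical.

```python
# Hypothetical sketch: polygenic score as a weighted allele count, then the
# per-decade trend of the standardized score. Not the study's pipeline.
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 ("standard units")."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def slope_per_decade(birth_years, scores):
    """Ordinary least-squares slope of score on birth year, scaled to 10 years."""
    xm, ym = mean(birth_years), mean(scores)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(birth_years, scores))
    sxx = sum((x - xm) ** 2 for x in birth_years)
    return 10 * sxy / sxx


# Usage with made-up numbers: three variants, four people.
weights = [0.02, -0.01, 0.03]  # hypothetical per-allele effect weights
genotypes = [[0, 1, 2], [1, 1, 1], [2, 0, 1], [0, 2, 0]]
birth_years = [1920, 1940, 1960, 1980]
z = standardize([polygenic_score(g, weights) for g in genotypes])
print(slope_per_decade(birth_years, z))  # a negative value means a declining score
```

Standardizing before fitting the slope means the result is expressed in standard units per decade, which is the unit used for the ∼0.010 figure above.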

Concepts: DNA, Gene, Genetics, Biology, Organism, Genome, Classical genetics, Genetic association


Next-generation sequencing is revolutionizing genomic analysis, but this analysis can be compromised by high rates of missed true variants. To develop a robust statistical method capable of identifying variants that would otherwise not be called, we conducted sequence data simulations and analyzed both whole-genome and targeted sequencing data from 28 families. Our method (Family-Based Sequencing Program, FamSeq) integrates Mendelian transmission information with raw sequencing reads. Sequence analysis using FamSeq reduced the number of false-negative variant calls by 14-33%, as assessed by genotype confirmation in HapMap samples. In a large family affected by Wilms tumor, 84% of variants uniquely identified by FamSeq were confirmed by Sanger sequencing. In children with early-onset neurodevelopmental disorders from 26 families, FamSeq reclassified de novo variant calls in disease candidate genes as Mendelian (inherited) variants, and the number of uniquely identified variants in affected individuals increased proportionally as additional family members were included in the analysis. To gain insight into maximizing variant detection, we studied factors affecting the improvement achieved by family-based calling, including pedigree structure, allele frequency (common vs. rare variants), prior settings of minor allele frequency, sequence signal-to-noise ratio, and coverage depth (∼20× to >200×). These data will help guide the design, analysis, and interpretation of family-based sequencing studies and improve the ability to identify new disease-associated genes.
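FamSeq's actual model is not described in this abstract. The following is a minimal sketch of the general idea of combining Mendelian transmission with read evidence. It assumes a parent-child trio, a single biallelic site, Hardy-Weinberg priors for the parents, and a simple read model with a fixed error rate. All function names and parameters, including `q` and `error`, are hypothetical.

```python
# Hypothetical sketch of family-based genotype calling for a parent-child trio
# at one biallelic site. It is NOT FamSeq's implementation; it only shows how
# Mendelian transmission and read evidence can be combined.
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried


def mendel(child, mother, father):
    """P(child genotype | parental genotypes) under Mendelian segregation."""
    pm, pf = mother / 2, father / 2  # P(parent transmits the alternate allele)
    probs = {0: (1 - pm) * (1 - pf),
             1: pm * (1 - pf) + (1 - pm) * pf,
             2: pm * pf}
    return probs[child]


def hwe_prior(g, q):
    """Founder genotype prior from alternate-allele frequency q (Hardy-Weinberg)."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}[g]


def read_likelihood(g, ref_reads, alt_reads, error=0.01):
    """P(reads | genotype) up to a constant, assuming independent reads and a
    fixed sequencing error rate (the binomial coefficient cancels later)."""
    p_alt = {0: error, 1: 0.5, 2: 1 - error}[g]
    return p_alt ** alt_reads * (1 - p_alt) ** ref_reads


def singleton_posterior(reads, q=0.01):
    """Genotype posterior for one person using only their own reads."""
    w = {g: hwe_prior(g, q) * read_likelihood(g, *reads) for g in GENOTYPES}
    total = sum(w.values())
    return {g: v / total for g, v in w.items()}


def trio_child_posterior(child, mother, father, q=0.01):
    """Child genotype posterior, summing over all parental genotypes."""
    joint = {g: 0.0 for g in GENOTYPES}
    for m, f, c in product(GENOTYPES, repeat=3):
        joint[c] += (hwe_prior(m, q) * read_likelihood(m, *mother)
                     * hwe_prior(f, q) * read_likelihood(f, *father)
                     * mendel(c, m, f) * read_likelihood(c, *child))
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}


# Usage: (ref_reads, alt_reads) per person; the child has low coverage.
child, mother, father = (3, 2), (10, 10), (20, 0)
print(singleton_posterior(child)[1])                   # child alone
print(trio_child_posterior(child, mother, father)[1])  # with parents
```

Suppose a heterozygous parent is well supported by reads. Then the rare-variant prior on the child is replaced by a transmission probability of 0.5, and the child's heterozygous call becomes more confident. This is the kind of gain for low-coverage, rare variants that the abstract attributes to family-based calling.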

Concepts: Family, Gene, Genetics, Allele, Evolution, Classical genetics, Data, Sequence


BACKGROUND: Complex binary traits are influenced by many factors, including the main effects of many quantitative trait loci (QTLs), epistatic effects involving more than one QTL, environmental effects, and gene-environment interactions. Although a number of QTL mapping methods for binary traits have been developed, there is still no efficient and powerful method that can handle both the main and epistatic effects of a relatively large number of possible QTLs. RESULTS: In this paper, we use a Bayesian logistic regression model that includes both main and epistatic effects as the QTL model for binary traits. Our logistic regression model employs hierarchical priors for the regression coefficients, similar to those used in the Bayesian LASSO linear model for multiple-QTL mapping of continuous traits. We develop efficient empirical Bayesian algorithms to infer the logistic regression model. Our simulation study shows that our algorithms can easily handle a QTL model with a large number of main and epistatic effects on a personal computer. It also shows that they outperform five other methods examined, namely the LASSO, HyperLasso, BhGLM, RVM, and the single-QTL mapping method based on logistic regression, in terms of detection power and false positive rate. The utility of our algorithms is also demonstrated through the analysis of a real data set. A software package implementing the empirical Bayesian algorithms in this paper is freely available upon request. CONCLUSIONS: The empirical Bayesian LASSO (EBLASSO) logistic regression method can handle a large number of effects, potentially including main and epistatic QTL effects, environmental effects, and gene-environment interactions. It will be a very useful tool for mapping multiple QTLs underlying complex binary traits.
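The model class described above writes the log-odds of the trait as an intercept, plus one coefficient per marker (main effects), plus one coefficient per marker pair (epistatic effects). The following minimal sketch shows only that linear predictor, its logistic link, and the expanded design row. It does not implement the hierarchical priors or the empirical Bayes fitting. The function names and the -1/1 marker coding are assumptions made for illustration.

```python
# Hypothetical sketch of the model class: logistic regression with main effects
# and pairwise epistatic effects. It omits the hierarchical priors and the
# empirical Bayes fitting described in the abstract.
import math
from itertools import combinations


def linear_predictor(x, beta0, main, epistatic):
    """eta = beta0 + sum_i b_i*x_i + sum_{i<j} g_ij*x_i*x_j."""
    eta = beta0 + sum(b * xi for b, xi in zip(main, x))
    for (i, j), g in epistatic.items():
        eta += g * x[i] * x[j]
    return eta


def prob_affected(x, beta0, main, epistatic):
    """Logistic link P(y = 1 | x), written to avoid overflow in math.exp."""
    eta = linear_predictor(x, beta0, main, epistatic)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def design_row(x):
    """Main-effect columns followed by every pairwise-interaction column."""
    return list(x) + [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]


# Usage: two markers coded -1/1 (an assumed coding), one epistatic pair.
x = [1, -1]
print(prob_affected(x, beta0=-0.5, main=[0.8, 0.3], epistatic={(0, 1): 1.2}))
print(len(design_row([0] * 1000)))  # 1000 main + 499500 pairwise columns
```

The last line shows why shrinkage priors matter here. With k markers the model has k + k(k - 1)/2 candidate effects, far more than the number of individuals in a typical mapping population.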

Concepts: Regression analysis, Logistic regression, Genetics, Classical genetics, Quantitative trait locus, Amplified fragment length polymorphism, Epistasis, Statistical genetics


High-density linkage maps can improve the precision of QTL localization. A high-density SNP-based linkage map containing 3207 markers covering 3072.7 cM of the Brassica napus genome was constructed in the KenC-8 × N53-2 (KNDH) population. A total of 67 QTLs for seed oil content (SOC) and 38 for seed protein content (SPC) were identified, with average confidence intervals of 5.26 and 4.38 cM, explaining up to 22.24% and 27.48% of the phenotypic variation, respectively. Thirty-eight associated genomic regions from bulked segregant analysis (BSA) overlapped with and/or narrowed the SOC-QTLs, further confirming the QTL mapping results based on the high-density linkage map. Potential candidate genes related to acyl-lipid metabolism and seed storage underlying SOC and SPC, respectively, were identified and analyzed. Six of them were examined and showed expression differences between the two parents at different embryonic developmental stages. A large primary carbohydrate pathway based on potential candidates underlying SOC- and SPC-QTLs, and interaction networks based on potential candidates underlying SOC-QTLs, were constructed to dissect the complex mechanism in terms of metabolic and gene regulatory features, respectively. Accurate QTL mapping and the potential candidates identified from the high-density linkage map and BSA analyses provide new insights into the complex genetic mechanism of oil and protein accumulation in rapeseed seeds.
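The abstract does not say which statistic underlies the "percentage of phenotypic variation explained". Multi-QTL software usually reports model-specific R² values. The following is a simplified single-marker sketch of that idea. It assumes, as the population name suggests, a doubled-haploid (DH) population in which every line carries one of the two parental alleles at each marker. The function name, marker labels and oil-content values are hypothetical.

```python
# Hypothetical single-marker sketch of "phenotypic variation explained" (R^2)
# in a doubled-haploid population; multi-QTL software reports related but
# model-specific statistics.
from statistics import mean


def variance_explained(genotypes, phenotypes):
    """R^2 = between-genotype-class sum of squares / total sum of squares."""
    grand = mean(phenotypes)
    ss_total = sum((y - grand) ** 2 for y in phenotypes)
    ss_between = 0.0
    for label in set(genotypes):
        group = [y for g, y in zip(genotypes, phenotypes) if g == label]
        ss_between += len(group) * (mean(group) - grand) ** 2
    return ss_between / ss_total


# Usage: each DH line carries either the parent "A" or parent "B" allele at the marker.
marker = ["A", "A", "A", "B", "B", "B"]
oil = [41.2, 42.0, 40.8, 44.1, 43.5, 44.9]  # made-up seed oil content (%)
print(f"{100 * variance_explained(marker, oil):.1f}% of variation explained")
```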

Concepts: DNA, Gene, Genetics, Classical genetics, Quantitative trait locus, Genetic linkage, Rapeseed, Seed


Muffs and beard (Mb) is a phenotype in chickens in which groups of elongated feathers cluster on both sides of the face (muffs) and below the beak (beard). It is an autosomal, incompletely dominant phenotype encoded by the Muffs and beard (Mb) locus. Here we use genome-wide association (GWA) analysis, linkage analysis, identity-by-descent (IBD) mapping, array-CGH, genome re-sequencing and expression analysis to show that the Mb allele causing the Mb phenotype is a derived allele. In this allele, a complex structural variation (SV) on GGA27 leads to altered expression of the gene HOXB8. This Mb allele was shown to be completely associated with the Mb phenotype in nine other independent Mb chicken breeds. The Mb allele differs from the wild-type mb allele by three duplications: one in tandem and two translocated to the region of the tandem repeat at around 1.70 Mb on GGA27. The duplications contain a total of seven annotated genes, and their expression was tested during distinct stages of Mb morphogenesis. Continuous high ectopic expression of HOXB8 was found in the facial skin of Mb chickens, strongly suggesting that HOXB8 directs this regional feather development. In conclusion, our results provide an interesting example of how genomic structural rearrangements alter the regulation of genes, leading to novel phenotypes. Furthermore, they again illustrate the value of using derived phenotypes in domestic animals to dissect the genetic basis of developmental traits, here providing novel insights into the likely role of HOXB8 in feather development and differentiation.

Concepts: DNA, Gene, Genetics, Genotype, Evolution, Genome, Chromosome, Classical genetics


The role of sex in biomedical studies has often been overlooked, despite evidence of sexually dimorphic effects in some biological studies. Here, we used high-throughput phenotype data from 14,250 wildtype and 40,192 mutant mice (representing 2,186 knockout lines), analysed for up to 234 traits. We found that a large proportion of mammalian traits, in both wildtype and mutant mice, are influenced by sex. This result has implications for interpreting disease phenotypes in animal models and humans.

Concepts: Gene, Natural selection, Genotype, Evolution, Sexual dimorphism, Classical genetics, Phenotype, Heredity


In morphological terms, “form” is used to describe an object’s shape and size. In dogs, facial form is stunningly diverse. Facial retrusion, the proximodistal shortening of the snout and widening of the hard palate, is common to brachycephalic dogs. It is a welfare concern, as the incidence of respiratory distress and ocular trauma observed in this class of dogs is highly correlated with their skull form. Progress in identifying the molecular underpinnings of facial retrusion has been limited to the association of a missense mutation in BMP3 among small brachycephalic dogs. Here, we used morphometrics of skull isosurfaces derived from 374 pedigree and mixed-breed dogs to dissect the genetics of skull form. Through deconvolution of facial forms, we identified quantitative trait loci that are responsible for canine facial shapes and sizes. Our novel insights include the recognition that the FGF4 retrogene insertion, previously associated with appendicular chondrodysplasia, also reduces neurocranium size. Focusing on facial shape, we resolved a quantitative trait locus on canine chromosome 1 to a 188-kb critical interval that encompasses SMOC2. An intronic transposable element within SMOC2 promotes the use of cryptic splice sites, causing its incorporation into transcripts, and drastically reduces SMOC2 gene expression in brachycephalic dogs. SMOC2 disruption affects the facial skeleton in a dose-dependent manner. The size effects of the associated SMOC2 haplotype are profound, accounting for 36% of facial length variation in the dogs we tested. Our data bring new focus to SMOC2 by highlighting its clinical implications in both human and veterinary medicine.

Concepts: Gene, Genetics, Mutation, Evolution, Classical genetics, Quantitative trait locus


Background: Whole-exome sequencing can provide insight into the relationship between observed clinical phenotypes and underlying genotypes. Methods: We conducted a retrospective analysis of data from a series of 7374 consecutive unrelated patients who had been referred to a clinical diagnostic laboratory for whole-exome sequencing. Our goal was to determine the frequency and clinical characteristics of patients for whom more than one molecular diagnosis was reported. The phenotypic similarity between molecularly diagnosed pairs of diseases was calculated using terms from the Human Phenotype Ontology. Results: A molecular diagnosis was rendered for 2076 of 7374 patients (28.2%); among these patients, 101 (4.9%) had diagnoses that involved two or more disease loci. We also analyzed parental samples, when available. De novo variants accounted for 67.8% (61 of 90) of pathogenic variants in autosomal dominant disease genes and 51.7% (15 of 29) of pathogenic variants in X-linked disease genes. Both variants were de novo in 44.7% (17 of 38) of patients with two monoallelic variants. Causal copy-number variants were found in 12 of the 101 patients (11.9%) with multiple diagnoses. Phenotypic similarity scores were significantly lower among patients whose phenotype resulted from two distinct mendelian disorders affecting different organ systems (50 patients) than among patients with disorders that had overlapping phenotypic features (30 patients) (median score, 0.21 vs. 0.36; P = 1.77×10⁻⁷). Conclusions: In our study, we found multiple molecular diagnoses in 4.9% of cases in which whole-exome sequencing was informative. Our results show that structured clinical ontologies can be used to determine the degree of overlap between two mendelian diseases in the same patient; the diseases can be distinct or overlapping. Distinct disease phenotypes affect different organ systems, whereas overlapping disease phenotypes are more likely to be caused by two genes encoding proteins that interact within the same pathway. (Funded by the National Institutes of Health and the Ting Tsung and Wei Fong Chao Foundation.)
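The abstract says similarity scores were computed from Human Phenotype Ontology terms but does not give the formula. The following sketch swaps in a simplified stand-in, not the study's scoring method. It expands each disease's terms to include every ancestor term, then takes the Jaccard index of the two sets. The toy ontology, term names and function names are hypothetical and are not real HPO identifiers.

```python
# Simplified stand-in for ontology-based phenotype similarity: Jaccard index of
# ancestor-closed term sets. The toy ontology and term names are hypothetical,
# and this is not the scoring method used in the study.
def ancestors(term, parents):
    """The term itself plus all ancestors reachable through `parents`."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def closure(terms, parents):
    """Union of ancestor sets for all terms annotated to a disease."""
    result = set()
    for t in terms:
        result |= ancestors(t, parents)
    return result


def phenotype_similarity(terms_a, terms_b, parents):
    """Jaccard index in [0, 1]; 0 if neither disease has any terms."""
    a, b = closure(terms_a, parents), closure(terms_b, parents)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy ontology: child term -> list of parent terms.
PARENTS = {
    "nervous_system": ["root"],
    "seizure": ["nervous_system"],
    "ataxia": ["nervous_system"],
    "skeletal_system": ["root"],
    "short_stature": ["skeletal_system"],
}
print(phenotype_similarity({"seizure"}, {"ataxia"}, PARENTS))         # same organ system
print(phenotype_similarity({"seizure"}, {"short_stature"}, PARENTS))  # different systems
```

Propagating terms to their ancestors lets two diseases annotated with different but related terms, such as two nervous-system findings, share credit. This matches the qualitative pattern reported above: disorders in different organ systems score lower than overlapping ones.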

Concepts: Gene, Genetics, Genetic disorder, Genotype, Evolution, Classical genetics, Phenotype, Genotype-phenotype distinction


Height is a highly heritable, classic polygenic trait, with approximately 700 common associated variants identified through genome-wide association studies so far. Here, we report 83 height-associated coding variants with lower minor-allele frequencies (in the range of 0.1-4.8%) and effects of up to 2 centimetres per allele (such as those in IHH, STC2, AR and CRISPLD2), more than ten times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (giving an increase of 1-2 centimetres per allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes that are mutated in monogenic growth disorders. They also highlight new biological candidates (such as ADAMTS3, IL11RA and NOX4) and pathways (such as proteoglycan and glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate-to-large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.
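The per-allele effects quoted above are additive effects: each copy of the effect allele shifts expected height by β. The following minimal sketch assumes an additive model and Hardy-Weinberg proportions. It computes the expected shift for a given allele count and the variance a single variant contributes, 2p(1 - p)β², which weighs allele frequency against effect size. The function names are hypothetical. The 2 cm effect and the 0.1-4.8% frequency range come from the text.

```python
# Hypothetical sketch under an additive model with Hardy-Weinberg proportions.
def expected_shift(allele_count, beta_cm):
    """Expected change in height (cm) for 0, 1 or 2 copies of the effect allele."""
    return allele_count * beta_cm


def variance_contribution(maf, beta_cm):
    """Height variance (cm^2) from one biallelic variant: 2 * p * (1 - p) * beta^2."""
    return 2 * maf * (1 - maf) * beta_cm ** 2


# Usage: a 2 cm-per-allele effect (upper end quoted above) at a few
# frequencies inside the reported 0.1-4.8% range.
for maf in (0.001, 0.01, 0.048):
    print(maf, expected_shift(2, 2.0), round(variance_contribution(maf, 2.0), 4))
```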

Concepts: DNA, Gene, Genetics, Genotype, Allele, Evolution, Biology, Classical genetics


Genetic similarity of spouses can reflect factors influencing mate choice, such as physical/behavioral characteristics, and patterns of social endogamy. Spouse correlations for both genetic ancestry and measured traits may affect genotype distributions (Hardy-Weinberg and linkage equilibrium) and therefore genetic association studies. Here we evaluate white spouse pairs from the Framingham Heart Study (FHS) original and offspring cohorts (N = 124 and 755, respectively) to explore spousal genetic similarity and its consequences. Two principal components (PCs) of the genome-wide association (GWA) data were identified. The first (PC1) delineates clines of Northern/Western to Southern European ancestry, and the second (PC2) delineates clines of Ashkenazi Jewish ancestry. In the original (older) cohort, there was a striking positive correlation between spouses for PC1 (r = 0.73, P = 3×10⁻²²) and for PC2 (r = 0.80, P = 7×10⁻²⁹). In the offspring cohort, the spouse correlations were lower but still highly significant for PC1 (r = 0.38, P = 7×10⁻²⁸) and PC2 (r = 0.45, P = 2×10⁻³⁹). We observed significant Hardy-Weinberg disequilibrium for single nucleotide polymorphisms (SNPs) loading heavily on PC1 and PC2 across 3 generations, as well as significant linkage disequilibrium between unlinked SNPs. Both decreased over time, consistent with reduced ancestral endogamy over generations and congruent with theoretical calculations. When ancestry is ignored, estimates of spouse kinship have a mean significantly greater than 0, more so in the earlier generations. Adjusting kinship estimates for genetic ancestry using PCs yielded a mean spouse kinship not different from 0, demonstrating that spouse genetic similarity could be fully attributed to ancestral assortative mating. These findings also have significance for studies of heritability based on distantly related individuals (kinship less than 0.05). We also demonstrate that, in that range, kinship estimates computed with and without adjustment for ancestry are poorly correlated.
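The following minimal sketch covers two quantities the abstract reports. The first is a Pearson chi-square test of Hardy-Weinberg proportions, with the inbreeding coefficient F as a measure of heterozygote deficit. The second is the spouse correlation r between paired principal-component scores. It assumes biallelic SNP genotype counts at a polymorphic site and one PC score per spouse. Function names, counts and scores are hypothetical. Assortative mating on ancestry is expected to show up as a heterozygote deficit (F > 0) at ancestry-informative SNPs.

```python
# Hypothetical sketch: HWE test and spouse correlation on made-up data.
from math import sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square (1 df) of observed genotype counts versus Hardy-Weinberg
    expectations p^2, 2pq, q^2 estimated from the same sample (polymorphic site)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))


def inbreeding_f(n_aa, n_ab, n_bb):
    """F = 1 - observed/expected heterozygosity; F > 0 means a heterozygote deficit,
    the pattern expected when mates are similar in ancestry."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return 1 - (n_ab / n) / (2 * p * (1 - p))


def pearson_r(xs, ys):
    """Pearson correlation, e.g. between husbands' and wives' PC1 scores."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(hwe_chi_square(40, 20, 40), inbreeding_f(40, 20, 40))
print(pearson_r([0.1, -0.2, 0.3, 0.0], [0.2, -0.1, 0.2, 0.05]))
```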

Concepts: DNA, Bioinformatics, Classical genetics, Genetic linkage, Population genetics, Genetic association, Linkage disequilibrium, Ashkenazi Jews