Journal: G3 (Bethesda, Md.)
Variation in gene copy number can significantly affect organism fitness. When one allele is missing in a diploid, the phenotype can be compromised due to haploinsufficiency. In this work we identified associations between Saccharomyces cerevisiae gene properties and genome-scale haploinsufficiency phenotypes from earlier work. We compared the haploinsufficiency profiles against 23 gene properties and found that genes with (i) higher connectivity (degree) in a protein-protein interaction network, (ii) higher genetic interaction degree, (iii) greater gene sequence conservation, and (iv) higher protein expression were significantly more likely to be haploinsufficient. Additionally, haploinsufficiency showed negative relationships with (v) cell cycle regulation and (vi) promoter sequence conservation. We exploited these associations using Linear Discriminant Analysis to predict haploinsufficiency in existing data and to guide experimental identification of 6 novel haploinsufficient phenotypes that were previously undetected in genome-scale screens. Using a similar approach we identified significant relationships between haploinsufficiency and two gene properties in Schizosaccharomyces pombe, relationships that hold despite the lack of conserved haploinsufficiency between S. cerevisiae and Sz. pombe orthologue pairs. These data suggest that associations between haploinsufficiency and gene properties are conserved among hemiascomycetes yeasts. The relationships and predictive model presented here are a useful step towards understanding haploinsufficiency and its underlying mechanisms.
High-throughput genotyping arrays provide plant breeding communities with a standardized resource that is useful for a breadth of applications, including high-density genetic mapping, genome-wide association studies (GWAS), genomic selection (GS), complex trait dissection, and studying patterns of genomic diversity among cotton cultivars and wild accessions. We have developed the CottonSNP63K, an Illumina Infinium array containing assays for 45,104 putative intra-specific single nucleotide polymorphism (SNP) markers for use within the cultivated cotton species Gossypium hirsutum L. and 17,954 putative inter-specific SNP markers for use with crosses of other cotton species with G. hirsutum. The SNPs on the array were developed from 13 different discovery sets that represent a diverse range of G. hirsutum germplasm and five other species: G. barbadense L., G. tomentosum Nuttal ex Seemann, G. mustelinum Miers x Watt, G. armourianum Kearny, and G. longicalyx J.B. Hutchinson & Lee. The array was validated with 1,156 samples to generate cluster positions that facilitate automated analysis of 38,822 polymorphic markers. Two high-density genetic maps containing a total of 22,829 SNPs were generated for two F2 mapping populations, one intra-specific and one inter-specific; 3,533 SNP markers occurred in both maps. The intra-specific genetic map is the first saturated map for a cross between two G. hirsutum lines that resolves into 26 linkage groups, corresponding to the number of cotton chromosomes. The linkage maps showed high levels of collinearity with the JGI G. raimondii Ulbrich reference genome sequence. The CottonSNP63K array and cluster file, along with the marker sequences, are a valuable new resource for the global cotton research community.
HeLa is the most widely used model cell line for studying human cellular and molecular biology. To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome. Effective design and interpretation of molecular genetic studies using HeLa cells require accurate genomic information. Here we present a detailed genomic and transcriptomic characterization of a HeLa cell line. We performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile. Segmentation of the genome according to copy number revealed a remarkably high level of aneuploidy and numerous large structural variants at unprecedented resolution. The extensive genomic rearrangements are indicative of catastrophic chromosome shattering, known as chromothripsis. Our analysis of the HeLa gene expression profile revealed that several pathways, including cell cycle and DNA repair, exhibit significantly different expression patterns from those in normal human tissues. Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins. This study underscores the importance of accounting for the strikingly aberrant characteristics of HeLa cells when designing and interpreting experiments, and has implications for the use of HeLa as a model of human biology.
The Muller F element (4.2 Mb, ~80 protein-coding genes) is an unusual autosome of Drosophila melanogaster; it is mostly heterochromatic with a low recombination rate. To investigate how these properties impact the evolution of repeats and genes, we manually improved the sequence and annotated the genes on the D. erecta, D. mojavensis, and D. grimshawi F elements and euchromatic domains from the Muller D element. We find that F elements have higher transposon density (25%-50%) than euchromatic reference regions (3%-11%). Among the F elements, D. grimshawi has the lowest transposon density (particularly DINE-1: 2% versus 11%-27%). F element genes have larger coding spans, more coding exons, larger introns, and lower codon bias. Comparison of the Effective Number of Codons with the Codon Adaptation Index shows that, in contrast to the other species, codon bias in D. grimshawi F element genes can be attributed primarily to selection instead of mutational biases, suggesting that density and types of transposons affect the degree of local heterochromatin formation. F element genes have lower estimated DNA melting temperatures than D element genes, potentially facilitating transcription through heterochromatin. Most F element genes (~90%) have remained on that element, but the F element has smaller syntenic blocks than genome averages (3.4-3.6 versus 8.4-8.8 genes per block), indicating higher rates of inversion despite lower rates of recombination. Overall, the F element has maintained characteristics that are distinct from other autosomes in the Drosophila lineage, illuminating the constraints imposed by a heterochromatic milieu.
Environmental adaptation is one of the most fundamental features of organisms. Modern genome science has identified some genes associated with adaptive traits of organisms, and has provided insights into environmental adaptation and evolution. However, how genes contribute to adaptive traits and how traits are selected under an environment in the course of evolution remain mostly unclear. To approach these issues, we utilize “Dark-fly”, a Drosophila melanogaster line maintained in a constant dark condition for more than 60 years. Our previous analysis identified 220,000 single nucleotide polymorphisms (SNPs) in the Dark-fly genome, but did not clarify which SNPs of Dark-fly are truly adaptive for living in the dark. We found here that Dark-fly dominated over the wild-type fly in a mixed population under dark conditions, and based on this domination we designed an experiment for genome re-selection to identify adaptive genes of Dark-fly. For this experiment, large mixed populations of Dark-fly and the wild-type fly were maintained in light conditions or in dark conditions, and the frequencies of Dark-fly SNPs were compared between these populations across the whole genome. We thereby detected condition-dependent selections towards approximately 6% of the genome. In addition, we observed the time-course trajectory of SNP frequency in the mixed populations through generations 0, 22, and 49, which resulted in notable categorization of the selected SNPs into three types with different combinations of positive and negative selections. Our data provided a list of about 100 strong candidate genes associated with the adaptive traits of Dark-fly.
ChIP-seq has become the primary method for identifying in vivo protein-DNA interactions on a genome-wide scale, with nearly 800 publications involving the technique in PubMed as of December 2012. Individually and in aggregate these data are an important and information-rich resource. However, uncertainties about data quality confound their use by the wider research community. Recently, the Encyclopedia Of DNA Elements (ENCODE) project developed and applied metrics to objectively measure ChIP-seq data quality. The ENCODE quality analysis was useful for flagging datasets for closer inspection, eliminating or replacing poor data, and driving changes in experimental pipelines. There had been no similarly systematic quality analysis of the large and disparate body of published ChIP-seq profiles. Here we report a uniform analysis of vertebrate transcription factor ChIP-seq datasets in the Gene Expression Omnibus (GEO) repository as of April 1, 2012. The majority (55%) of datasets scored as highly successful, but a substantial minority (20%) were of apparently poor quality, and another ~25% were of intermediate quality. We discuss how different uses of ChIP-seq data are affected by specific aspects of data quality, and we highlight exceptional instances for which the metric values should not be taken at face value. Unexpectedly, we discovered that a significant subset of control datasets (i.e., no-immunoprecipitation and mock-immunoprecipitation samples) display an enrichment structure similar to successful ChIP-seq data. This can, in turn, affect peak calling and data interpretation. Published datasets identified here as high quality comprise a large group that users can draw on for large-scale integrated analysis. In the future, ChIP-seq quality assessment similar to that used here could guide experimentalists at early stages in a study, provide useful input in the publication process, and be used to stratify ChIP-seq data for different community-wide uses.
Domesticated species exhibit a suite of behavioral, endocrinological, and morphological changes referred to as “domestication syndrome.” These changes may include a reduction in reactivity of the hypothalamic-pituitary-adrenal (HPA) axis, specifically reduced adrenocorticotropic hormone release from the anterior pituitary. To investigate the biological mechanisms targeted during domestication, we investigated gene expression in the pituitaries of experimentally domesticated foxes (Vulpes vulpes). RNA was sequenced from the anterior pituitary of six foxes selectively bred for tameness (“tame foxes”) and six foxes selectively bred for aggression (“aggressive foxes”). Expression, splicing, and network differences identified between the two lines indicated the importance of genes related to regulation of exocytosis, specifically mediated by cAMP, organization of pseudopodia, and cell motility. These findings provide new insights into biological mechanisms that may have been targeted when these lines of foxes were selected for behavior, and suggest new directions for research into HPA axis regulation and the biological underpinnings of domestication.
The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected, moderate-coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size, estimated by k-mer analysis, is 2.7 Gb. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruency against an independent short-read assembly. We scaffolded the assembly using the exon-gene models derived from 9,805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with a contig N50 of 278,680 bp and a scaffold N50 of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly were demonstrated by the complete and precise exon placement for 91.1% of the 9,805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Well represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology.
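The N50 statistic quoted above can be illustrated with a minimal sketch. It assumes the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length); the function name `n50` and the toy lengths are hypothetical and are not taken from the beaver assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumed definition: sort lengths from longest to shortest and
    accumulate them; the N50 is the length at which the running sum
    first reaches at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths in bp, not real data).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Applied to the contig lengths and then to the scaffold lengths of an assembly, the same function would yield the contig N50 and the scaffold N50, respectively.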
Comparing genomes of closely related genotypes from populations with distinct demographic histories can help reveal the impact of effective population size on genome evolution. For this purpose, we present a high-quality genome assembly of Daphnia pulex (PA42), and compare this with the first sequenced genome of this species (TCO), which was derived from an isolate from a population with >90% reduction in nucleotide diversity. PA42 has numerous similarities to TCO at the gene level, with an average amino-acid sequence identity of 98.8% and >60% of orthologous proteins identical. Nonetheless, there is a highly elevated number of genes in the TCO genome annotation, with ~7,000 excess genes appearing to be false positives. This view is supported by the high GC content, lack of introns, and short length of these suspicious gene annotations. Consistent with the view that reduced effective population size can facilitate the accumulation of slightly deleterious genomic features, we observe more proliferation of transposable elements and a higher frequency of gained introns in the TCO genome.
Drosophila melanogaster is a powerful model organism for biological research. The essential and common instrument of fly research is genetics, the art of applying Mendelian rules in the specific context of Drosophila with its unique classical genetic tools and the breadth of modern genetic tools and strategies brought in by molecular biology, transgenic technologies and the use of recombinases. Training newcomers to fly genetics is a complex and time-consuming task but too important to be left to chance. Surprisingly, suitable training resources for beginners are currently not available. Here we provide a training package for basic Drosophila genetics, designed to ensure that basic knowledge of all key areas is covered while reducing the time invested by trainers. First, a manual introduces fly history, the rationale for mating schemes, fly handling, Mendelian rules in flies, markers and balancers, mating scheme design, and transgenic technologies. Self-study of the manual is followed by a practical training session on gender and marker selection, introducing real flies under the dissecting microscope. Next, through self-study of a PowerPoint presentation, trainees are guided step-by-step through a mating scheme. Finally, to consolidate knowledge, trainees are asked to design similar mating schemes reflecting routine tasks in a fly laboratory. This exercise requires individual feedback but also provides unique opportunities for trainers to spot weaknesses and strengths of each trainee and take remedial action. This training package is being successfully applied at the Manchester fly facility and may serve as a model for further training resources covering other aspects of fly research.