Concept: Mathematical analysis
Metagenomes are often characterized by high levels of unknown sequences. Reads derived from known microorganisms can easily be identified and analyzed using fast homology search algorithms and a suitable reference database, but the unknown sequences are often ignored in further analyses, biasing conclusions. Nevertheless, it is possible to use more data in a comparative metagenomic analysis by creating a cross-assembly of all reads, i.e. a single assembly of reads from different samples. Comparative metagenomics studies the interrelationships between metagenomes from different samples. Using an assembly algorithm is a fast and intuitive way to link (partially) homologous reads without requiring a database of reference sequences.
We present an approach for genome-wide association analysis with improved power on the Wellcome Trust data consisting of seven common phenotypes and shared controls. We achieved improved power by expanding the control set to include other disease cohorts, multiple races, and closely related individuals. Within this setting, we conducted exhaustive univariate and epistatic interaction association analyses. Use of the expanded control set identified more known associations with Crohn’s disease and potential new biology, including several plausible epistatic interactions in several diseases. Our work suggests that carefully combining data from large repositories could reveal many new biological insights through increased power. As a community resource, all results have been made available through an interactive web server.
Because common complex diseases are affected by multiple genes and environmental factors, it is essential to investigate gene-gene and/or gene-environment interactions to understand genetic architecture of complex diseases. After the great success of large scale genome-wide association (GWA) studies using the high density single nucleotide polymorphism (SNP) chips, the study of gene-gene interaction becomes a next challenge. Multifactor dimensionality reduction (MDR) analysis has been widely used for the gene-gene interaction analysis. In practice, however, it is not easy to perform high order gene-gene interaction analyses via MDR in genome-wide level because it requires exploring a huge search space and suffers from a computational burden due to high dimensionality.
How reliable are results on spatial distribution of biodiversity based on databases? Many studies have evidenced the uncertainty related to this kind of analysis due to sampling effort bias and the need for its quantification. Despite that a number of methods are available for that, little is known about their statistical limitations and discrimination capability, which could seriously constrain their use. We assess for the first time the discrimination capacity of two widely used methods and a proposed new one (FIDEGAM), all based on species accumulation curves, under different scenarios of sampling exhaustiveness using Receiver Operating Characteristic (ROC) analyses. Additionally, we examine to what extent the output of each method represents the sampling completeness in a simulated scenario where the true species richness is known. Finally, we apply FIDEGAM to a real situation and explore the spatial patterns of plant diversity in a National Park. FIDEGAM showed an excellent discrimination capability to distinguish between well and poorly sampled areas regardless of sampling exhaustiveness, whereas the other methods failed. Accordingly, FIDEGAM values were strongly correlated with the true percentage of species detected in a simulated scenario, whereas sampling completeness estimated with other methods showed no relationship due to null discrimination capability. Quantifying sampling effort is necessary to account for the uncertainty in biodiversity analyses, however, not all proposed methods are equally reliable. Our comparative analysis demonstrated that FIDEGAM was the most accurate discriminator method in all scenarios of sampling exhaustiveness, and therefore, it can be efficiently applied to most databases in order to enhance the reliability of biodiversity analyses.
Sensitivity analyses refer to investigations of the degree to which the results of a meta-analysis remain stable when conditions of the data or the analysis change. To the extent that results remain stable, one can refer to them as robust. Sensitivity analyses are rarely conducted in the organizational science literature. Despite conscientiousness being a valued predictor in employment selection, sensitivity analyses have not been conducted with respect to meta-analytic estimates of the correlation (i.e., validity) between conscientiousness and job performance.
Most imaging studies in the biological sciences rely on analyses that are relatively simple. However, manual repetition of analysis tasks across multiple regions in many images can complicate even the simplest analysis, making record keeping difficult, increasing the potential for error, and limiting reproducibility. While fully automated solutions are necessary for very large data sets, they are sometimes impractical for the small- and medium-sized data sets common in biology. Here we present the Slide Set plugin for ImageJ, which provides a framework for reproducible image analysis and batch processing. Slide Set organizes data into tables, associating image files with regions of interest and other relevant information. Analysis commands are automatically repeated over each image in the data set, and multiple commands can be chained together for more complex analysis tasks. All analysis parameters are saved, ensuring transparency and reproducibility. Slide Set includes a variety of built-in analysis commands and can be easily extended to automate other ImageJ plugins, reducing the manual repetition of image analysis without the set-up effort or programming expertise required for a fully automated solution.
Nucleosome positioning plays significant roles in proper genome packing and its accessibility to execute transcription regulation. Despite a multitude of nucleosome positioning resources available on line including experimental datasets of genome-wide nucleosome occupancy profiles and computational tools to the analysis on these data, the complex language of eukaryotic Nucleosome positioning remains incompletely understood.
Since the emergence of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East Respiratory Syndrom Coronavirus (MERS-CoV) it has become increasingly clear that bats are important reservoirs of CoVs. Despite this, only 6% of all CoV sequences in GenBank are from bats. The remaining 94% largely consist of known pathogens of public health or agricultural significance, indicating that current research effort is heavily biased towards describing known diseases rather than the ‘pre-emergent’ diversity in bats. Our study addresses this critical gap, and focuses on resource poor countries where the risk of zoonotic emergence is believed to be highest. We surveyed the diversity of CoVs in multiple host taxa from twenty countries to explore the factors driving viral diversity at a global scale. We identified sequences representing 100 discrete phylogenetic clusters, ninety-one of which were found in bats, and used ecological and epidemiologic analyses to show that patterns of CoV diversity correlate with those of bat diversity. This cements bats as the major evolutionary reservoirs and ecological drivers of CoV diversity. Co-phylogenetic reconciliation analysis was also used to show that host switching has contributed to CoV evolution, and a preliminary analysis suggests that regional variation exists in the dynamics of this process. Overall our study represents a model for exploring global viral diversity and advances our fundamental understanding of CoV biodiversity and the potential risk factors associated with zoonotic emergence.
Serious and underappreciated sources of bias mean that extreme caution should be applied when using or interpreting functional enrichment analysis to validate findings from global RNA- or protein-expression analyses.
Although there has been much interest in estimating histories of divergence and admixture from genomic data, it has proven difficult to distinguish recent admixture from long-term structure in the ancestral population. Thus, recent genome-wide analyses based on summary statistics have sparked controversy about the possibility of interbreeding between Neandertals and modern humans in Eurasia. Here we derive the probability of full mutational configurations in non-recombining sequence blocks under both admixture and ancestral structure scenarios. Dividing the genome into short blocks gives an efficient way to compute maximum likelihood estimates of parameters. We apply this likelihood scheme to triplets of human and Neandertal genomes and compare the relative support for a model of admixture from Neandertals into Eurasian populations after their expansion out of Africa against a history of persistent structure in their common ancestral population in Africa. Our analysis allows us to conclusively reject a model of ancestral structure in Africa and instead reveals strong support for Neandertal admixture in Eurasia at a higher rate (3.4%-7.3%) than suggested previously. Using analysis and simulations we show that our inference is more powerful than previous summary statistics and robust to realistic levels of recombination.