Conspiratorial ideation is the tendency of individuals to believe that events and power relations are secretly manipulated by certain clandestine groups and organisations. Many of these ostensibly explanatory conjectures are non-falsifiable, lacking in evidence or demonstrably false, yet public acceptance remains high. Efforts to convince the general public of the validity of medical and scientific findings can be hampered by such narratives, which can create the impression of doubt or disagreement in areas where the science is well established. Conversely, historical examples of exposed conspiracies do exist and it may be difficult for people to differentiate between reasonable and dubious assertions. In this work, we establish a simple mathematical model for conspiracies involving multiple actors with time, which yields failure probability for any given conspiracy. Parameters for the model are estimated from literature examples of known scandals, and the factors influencing conspiracy success and failure are explored. The model is also used to estimate the likelihood of claims from some commonly-held conspiratorial beliefs; these are namely that the moon-landings were faked, climate-change is a hoax, vaccination is dangerous and that a cure for cancer is being suppressed by vested interests. Simulations of these claims predict that intrinsic failure would be imminent even with the most generous estimates for the secret-keeping ability of active participants-the results of this model suggest that large conspiracies (≥1000 agents) quickly become untenable and prone to failure. The theory presented here might be useful in counteracting the potentially deleterious consequences of bogus and anti-science narratives, and examining the hypothetical conditions under which sustainable conspiracy might be possible.
Most researchers acknowledge an intrinsic hierarchy in the scholarly journals (“journal rank”) that they submit their work to, and adjust not only their submission but also their reading strategies accordingly. On the other hand, much has been written about the negative effects of institutionalizing journal rank as an impact measure. So far, contributions to the debate concerning the limitations of journal rank as a scientific impact assessment tool have either lacked data, or relied on only a few studies. In this review, we present the most recent and pertinent data on the consequences of our current scholarly communication system with respect to various measures of scientific quality (such as utility/citations, methodological soundness, expert ratings or retractions). These data corroborate previous hypotheses: using journal rank as an assessment tool is bad scientific practice. Moreover, the data lead us to argue that any journal rank (not only the currently-favored Impact Factor) would have this negative impact. Therefore, we suggest that abandoning journals altogether, in favor of a library-based scholarly communication system, will ultimately be necessary. This new system will use modern information technology to vastly improve the filter, sort and discovery functions of the current journal system.
Here, I argue that computational thinking and techniques are so central to the quest of understanding life that today all biology is computational biology. Computational biology brings order into our understanding of life, it makes biological concepts rigorous and testable, and it provides a reference map that holds together individual insights. The next modern synthesis in biology will be driven by mathematical, statistical, and computational methods being absorbed into mainstream biological training, turning biology into a quantitative science.
There is a popular belief in neuroscience that we are primarily data limited, and that producing large, multimodal, and complex datasets will, with the help of advanced data analysis algorithms, lead to fundamental insights into the way the brain processes information. These datasets do not yet exist, and if they did we would have no way of evaluating whether or not the algorithmically-generated insights were sufficient or even correct. To address this, here we take a classical microprocessor as a model organism, and use our ability to perform arbitrary experiments on it to see if popular data analysis methods from neuroscience can elucidate the way it processes information. Microprocessors are among those artificial information processing systems that are both complex and that we understand at all levels, from the overall logical flow, via logical gates, to the dynamics of transistors. We show that the approaches reveal interesting structure in the data but do not meaningfully describe the hierarchy of information processing in the microprocessor. This suggests current analytic approaches in neuroscience may fall short of producing meaningful understanding of neural systems, regardless of the amount of data. Additionally, we argue for scientists using complex non-linear dynamical systems with known ground truth, such as the microprocessor as a validation platform for time-series and structure discovery methods.
Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.
Progress in regenerative medicine requires reverse-engineering cellular control networks to infer perturbations with desired systems-level outcomes. Such dynamic models allow phenotypic predictions for novel perturbations to be rapidly assessed in silico. Here, we analyzed a Xenopus model of conversion of melanocytes to a metastatic-like phenotype only previously observed in an all-or-none manner. Prior in vivo genetic and pharmacological experiments showed that individual animals either fully convert or remain normal, at some characteristic frequency after a given perturbation. We developed a Machine Learning method which inferred a model explaining this complex, stochastic all-or-none dataset. We then used this model to ask how a new phenotype could be generated: animals in which only some of the melanocytes converted. Systematically performing in silico perturbations, the model predicted that a combination of altanserin (5HTR2 inhibitor), reserpine (VMAT inhibitor), and VP16-XlCreb1 (constitutively active CREB) would break the all-or-none concordance. Remarkably, applying the predicted combination of three reagents in vivo revealed precisely the expected novel outcome, resulting in partial conversion of melanocytes within individuals. This work demonstrates the capability of automated analysis of dynamic models of signaling networks to discover novel phenotypes and predictively identify specific manipulations that can reach them.
As the volume, complexity and diversity of the information that scientists work with on a daily basis continues to rise, so too does the requirement for new analytic software. The analytic software must solve the dichotomy that exists between the need to allow for a high level of scientific reasoning, and the requirement to have an intuitive and easy to use tool which does not require specialist, and often arduous, training to use. Information visualization provides a solution to this problem, as it allows for direct manipulation and interaction with diverse and complex data. The challenge addressing bioinformatics researches is how to apply this knowledge to data sets that are continually growing in a field that is rapidly changing.
Proposals to improve the reproducibility of biomedical research have emphasized scientific rigor. Although the word “rigor” is widely used, there has been little specific discussion as to what it means and how it can be achieved. We suggest that scientific rigor combines elements of mathematics, logic, philosophy, and ethics. We propose a framework for rigor that includes redundant experimental design, sound statistical analysis, recognition of error, avoidance of logical fallacies, and intellectual honesty. These elements lead to five actionable recommendations for research education.
Many important questions in biology are, fundamentally, comparative, and this extends to our analysis of a growing number of sequenced genomes. Existing genomic analysis tools are often organized around literal views of genomes as linear strings. Even when information is highly condensed, these views grow cumbersome as larger numbers of genomes are added. Data aggregation and summarization methods from the field of visual analytics can provide abstracted comparative views, suitable for sifting large multi-genome datasets to identify critical similarities and differences. We introduce a software system for visual analysis of comparative genomics data. The system automates the process of data integration, and provides the analysis platform to identify and explore features of interest within these large datasets. GenoSets borrows techniques from business intelligence and visual analytics to provide a rich interface of interactive visualizations supported by a multi-dimensional data warehouse. In GenoSets, visual analytic approaches are used to enable querying based on orthology, functional assignment, and taxonomic or user-defined groupings of genomes. GenoSets links this information together with coordinated, interactive visualizations for both detailed and high-level categorical analysis of summarized data. GenoSets has been designed to simplify the exploration of multiple genome datasets and to facilitate reasoning about genomic comparisons. Case examples are included showing the use of this system in the analysis of 12 Brucella genomes. GenoSets software and the case study dataset are freely available at http://genosets.uncc.edu. We demonstrate that the integration of genomic data using a coordinated multiple view approach can simplify the exploration of large comparative genomic data sets, and facilitate reasoning about comparisons and features of interest.
How reliable are results on spatial distribution of biodiversity based on databases? Many studies have evidenced the uncertainty related to this kind of analysis due to sampling effort bias and the need for its quantification. Despite that a number of methods are available for that, little is known about their statistical limitations and discrimination capability, which could seriously constrain their use. We assess for the first time the discrimination capacity of two widely used methods and a proposed new one (FIDEGAM), all based on species accumulation curves, under different scenarios of sampling exhaustiveness using Receiver Operating Characteristic (ROC) analyses. Additionally, we examine to what extent the output of each method represents the sampling completeness in a simulated scenario where the true species richness is known. Finally, we apply FIDEGAM to a real situation and explore the spatial patterns of plant diversity in a National Park. FIDEGAM showed an excellent discrimination capability to distinguish between well and poorly sampled areas regardless of sampling exhaustiveness, whereas the other methods failed. Accordingly, FIDEGAM values were strongly correlated with the true percentage of species detected in a simulated scenario, whereas sampling completeness estimated with other methods showed no relationship due to null discrimination capability. Quantifying sampling effort is necessary to account for the uncertainty in biodiversity analyses, however, not all proposed methods are equally reliable. Our comparative analysis demonstrated that FIDEGAM was the most accurate discriminator method in all scenarios of sampling exhaustiveness, and therefore, it can be efficiently applied to most databases in order to enhance the reliability of biodiversity analyses.