Concept: Bayes factor


A geographically-resolved, multi-level Bayesian model is used to analyze the data presented in the U.S. Police-Shooting Database (USPSD) in order to investigate the extent of racial bias in the shooting of American civilians by police officers in recent years. In contrast to previous work that relied on the FBI’s Supplemental Homicide Reports that were constructed from self-reported cases of police-involved homicide, this data set is less likely to be biased by police reporting practices. County-specific relative risk outcomes of being shot by police are estimated as a function of the interaction of: 1) whether suspects/civilians were armed or unarmed, and 2) the race/ethnicity of the suspects/civilians. The results provide evidence of a significant bias in the killing of unarmed black Americans relative to unarmed white Americans, in that the probability of being {black, unarmed, and shot by police} is about 3.49 times the probability of being {white, unarmed, and shot by police} on average. Furthermore, the results of multi-level modeling show that there exists significant heterogeneity across counties in the extent of racial bias in police shootings, with some counties showing relative risk ratios of 20 to 1 or more. Finally, analysis of police shooting data as a function of county-level predictors suggests that racial bias in police shootings is most likely to emerge in police departments in larger metropolitan counties with low median incomes and a sizable portion of black residents, especially when there is high financial inequality in that county. There is no relationship between county-level racial bias in police shootings and crime rates (even race-specific crime rates), meaning that the racial bias observed in police shootings in this data set is not explainable as a response to local-level crime rates.

Concepts: Scientific method, United States, Relative risk, Crime, White American, Police, Bayes factor, Constable


We examined whether immaturity relative to peers, as reflected in birth month, increases the likelihood of ADHD diagnosis and treatment.

Concepts: Fundamental physics concepts, Bayes factor


Recent arguments connecting Na-Dene languages of North America with Yeniseian languages of Siberia have been used to assert proof for the origin of Native Americans in central or western Asia. We apply phylogenetic methods to test support for this hypothesis against an alternative hypothesis that Yeniseian represents a back-migration to Asia from a Beringian ancestral population. We coded a linguistic dataset of typological features and used neighbor-joining network algorithms and Bayesian model comparison based on Bayes factors to test the fit between the data and the linguistic phylogenies modeling two dispersal hypotheses. Our results support that a Dene-Yeniseian connection more likely represents radiation out of Beringia with back-migration into central Asia than a migration from central or western Asia to North America.

Concepts: United States, Europe, Asia, North America, Bayesian inference, Americas, Bayes factor, Dené-Yeniseian languages


Primates have long been a test case for the development of phylogenetic methods for divergence time estimation. Despite a large number of studies, however, the timing of origination of crown Primates relative to the K-Pg boundary and the timing of diversification of the main crown groups remain controversial. Here we analysed a dataset of 372 taxa (367 Primates and 5 outgroups, 3.4 million aligned base pairs) that includes nine primate genomes. We systematically explore the effect of different interpretations of fossil calibrations and molecular clock models on primate divergence time estimates. We find that even small differences in the construction of fossil calibrations can have a noticeable impact on estimated divergence times, especially for the oldest nodes in the tree. Notably, choice of molecular rate model (auto-correlated or independently distributed rates) has an especially strong effect on estimated times, with the independent rates model producing considerably more ancient age estimates for the deeper nodes in the phylogeny. We implement thermodynamic integration, combined with Gaussian quadrature, in the program MCMCTree, and use it to calculate Bayes factors for clock models. Bayesian model selection indicates that the auto-correlated rates model fits the primate data substantially better, and we conclude that time estimates under this model should be preferred. We show that for eight core nodes in the phylogeny, uncertainty in time estimates is close to the theoretical limit imposed by fossil uncertainties. Thus, these estimates are unlikely to be improved by collecting additional molecular sequence data. All analyses place the origin of Primates close to the K-Pg boundary, either in the Cretaceous or straddling the boundary into the Palaeogene.

Concepts: Molecular biology, Mathematics, Phylogenetics, Bayesian inference, Bayesian statistics, Bayes factor, Model selection, Cretaceous–Tertiary extinction event
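
The thermodynamic integration step named in the primate dating abstract above can be illustrated with a minimal sketch. This is not the MCMCTree implementation. It assumes a toy conjugate normal model (the names `sigma`, `tau` and the simulated data are hypothetical), chosen so that the expected log-likelihood under each power posterior has a closed form instead of being estimated by MCMC. The log marginal likelihood is the integral of that expectation over the power beta in [0, 1], approximated here with Gauss-Legendre quadrature, and the log Bayes factor is the difference of two log marginal likelihoods.

```python
import numpy as np


def expected_loglik(beta, y, sigma, tau):
    """E[log L(theta)] under the power posterior p_beta ∝ L^beta * prior.

    Toy model (an assumption of this sketch): y_i ~ N(theta, sigma^2),
    prior theta ~ N(0, tau^2), so the power posterior is normal.
    """
    n = y.size
    var = 1.0 / (1.0 / tau**2 + beta * n / sigma**2)   # power-posterior variance
    mean = beta * y.sum() / sigma**2 * var             # power-posterior mean
    # E[sum (y_i - theta)^2] = sum (y_i - mean)^2 + n * var
    sq = np.sum((y - mean) ** 2) + n * var
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * sq / sigma**2


def log_evidence_ti(y, sigma, tau, n_nodes=32):
    """Thermodynamic integration: log Z = integral_0^1 E_beta[log L] d(beta)."""
    x, w = np.polynomial.legendre.leggauss(n_nodes)    # nodes/weights on [-1, 1]
    betas, weights = 0.5 * (x + 1.0), 0.5 * w          # rescale to [0, 1]
    vals = np.array([expected_loglik(b, y, sigma, tau) for b in betas])
    return float(np.dot(weights, vals))


def log_evidence_exact(y, sigma, tau):
    """Closed-form log marginal likelihood of the toy model, for checking."""
    n = y.size
    s = sigma**2 + n * tau**2
    logdet = (n - 1) * np.log(sigma**2) + np.log(s)
    quad = (np.sum(y**2) - tau**2 * y.sum() ** 2 / s) / sigma**2
    return float(-0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad)


def log_evidence_null(y, sigma):
    """Log likelihood of the competing model with theta fixed at 0."""
    return float(-0.5 * y.size * np.log(2 * np.pi * sigma**2)
                 - 0.5 * np.sum(y**2) / sigma**2)


rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=20)                      # simulated data
log_bf = log_evidence_ti(y, 1.0, 1.0) - log_evidence_null(y, 1.0)
print("log Bayes factor (free mean vs. mean fixed at 0):", log_bf)
```

In a real phylogenetic analysis the expectation at each quadrature node would come from a separate MCMC run on the corresponding power posterior; the closed form is used here only so that the quadrature result can be compared with `log_evidence_exact`.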


We present a rapid and powerful inference procedure for identifying loci associated with rare hereditary disorders using Bayesian model comparison. Under a baseline model, disease risk is fixed across all individuals in a study. Under an association model, disease risk depends on a latent bipartition of rare variants into pathogenic and non-pathogenic variants, the number of pathogenic alleles that each individual carries, and the mode of inheritance. A parameter indicating presence of an association and the parameters representing the pathogenicity of each variant and the mode of inheritance can be inferred in a Bayesian framework. Variant-specific prior information derived from allele frequency databases, consequence prediction algorithms, or genomic datasets can be integrated into the inference. Association models can be fitted to different subsets of variants in a locus and compared using a model selection procedure. This procedure can improve inference if only a particular class of variants confers disease risk and can suggest particular disease etiologies related to that class. We show that our method, called BeviMed, is more powerful and informative than existing rare variant association methods in the context of dominant and recessive disorders. The high computational efficiency of our algorithm makes it feasible to test for associations in the large non-coding fraction of the genome. We have applied BeviMed to whole-genome sequencing data from 6,586 individuals with diverse rare diseases. We show that it can identify multiple loci involved in rare diseases, while correctly inferring the modes of inheritance, the likely pathogenic variants, and the variant classes responsible.

Concepts: Genetics, Allele, Logic, Reasoning, Statistical inference, Object-oriented programming, Inference, Bayes factor


Dopamine plays a key role in learning; however, its exact function in decision making and choice remains unclear. Recently, we proposed a generic model based on active (Bayesian) inference wherein dopamine encodes the precision of beliefs about optimal policies. Put simply, dopamine discharges reflect the confidence that a chosen policy will lead to desired outcomes. We designed a novel task to test this hypothesis, where subjects played a “limited offer” game in a functional magnetic resonance imaging experiment. Subjects had to decide how long to wait for a high offer before accepting a low offer, with the risk of losing everything if they waited too long. Bayesian model comparison showed that behavior strongly supported active inference, based on surprise minimization, over classical utility maximization schemes. Furthermore, midbrain activity, encompassing dopamine projection neurons, was accurately predicted by trial-by-trial variations in model-based estimates of precision. Our findings demonstrate that human subjects infer both optimal policies and the precision of those inferences, and thus support the notion that humans perform hierarchical probabilistic Bayesian inference. In other words, subjects have to infer both what they should do as well as how confident they are in their choices, where confidence may be encoded by dopaminergic firing.

Concepts: Game theory, Scientific method, Magnetic resonance imaging, Decision theory, Logic, Statistical inference, Bayesian statistics, Bayes factor


In previous papers, we introduced a normative scheme for scene construction and epistemic (visual) searches based upon active inference. This scheme provides a principled account of how people decide where to look, when categorising a visual scene based on its contents. In this paper, we use active inference to explain the visual searches of normal human subjects; enabling us to answer some key questions about visual foraging and salience attribution. First, we asked whether there is any evidence for ‘epistemic foraging’; i.e. exploration that resolves uncertainty about a scene. In brief, we used Bayesian model comparison to compare Markov decision process (MDP) models of scan-paths that did, and did not, contain the epistemic, uncertainty-resolving imperatives for action selection. In the course of this model comparison, we discovered that it was necessary to include non-epistemic (heuristic) policies to explain observed behaviour (e.g., a reading-like strategy that involved scanning from left to right). Despite this use of heuristic policies, model comparison showed that there is substantial evidence for epistemic foraging in the visual exploration of even simple scenes. Second, we compared MDP models that did, and did not, allow for changes in prior expectations over successive blocks of the visual search paradigm. We found that implicit prior beliefs about the speed and accuracy of visual searches changed systematically with experience. Finally, we characterised intersubject variability in terms of subject-specific prior beliefs. Specifically, we used canonical correlation analysis to see if there were any mixtures of prior expectations that could predict between-subject differences in performance; thereby establishing a quantitative link between different behavioural phenotypes and Bayesian belief updating. We demonstrated that better scene categorisation performance is consistently associated with lower reliance on heuristics; i.e., a greater use of a generative model of the scene to direct its exploration.

Concepts: Scientific method, Belief, Decision theory, Logic, Statistical inference, Canonical correlation, Bayesian statistics, Bayes factor


This study aimed to examine the associations between serious illness in earlier life and risk of pain in old age using data from a large nationally representative British birth cohort, the Medical Research Council (MRC) National Survey of Health and Development (NSHD). Serious illness was defined as any experience of illness before age 25 requiring hospital admission of ≥28 days. Pain was self-reported at age 68, with chronic widespread pain (CWP) defined according to American College of Rheumatology criteria. Multinomial logistic regression was used to test associations of serious illness in early life with CWP, chronic regional pain (CRP), and other pain, with no pain as the referent category. Adjustment was made for sex, socioeconomic position, adult health status, health behaviours, and psychosocial factors. Of 2401 NSHD participants with complete data, 10.5% reported CWP (13.2% of women and 7.7% of men), 30.2% reported CRP, and 14.8% other pain. Compared with those with no history of serious illness, those who experienced serious illness in early life had a higher likelihood of CWP (relative risk ratio [RRR] = 1.62 [95% CI: 1.21-2.17]) and of CRP (RRR = 1.25 [95% CI: 1.01-1.54]) after adjusting for sex. In fully adjusted models, serious illness in early life remained associated with CWP (RRR = 1.43 [95% CI: 1.05-1.95]), but associations with CRP were attenuated (RRR = 1.19 [95% CI: 0.96-1.48]). There were no associations with other pain. These findings suggest that those who have experienced serious illness in earlier life may require more support than others to minimise their risk of CWP in later life. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Concepts: Cohort study, Medicine, Epidemiology, Relative risk, Multinomial logit, Creative Commons, Bayes factor, Creative Commons licenses


The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analysis of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win when the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor behind the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results for the application of Bayesian model selection to evaluate opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.

Concepts: Scientific method, Evolution, Phylogenetic tree, Phylogenetics, Bayesian inference, Bayes' theorem, Bayesian statistics, Bayes factor
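
The polarized behaviour described in the abstract above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: the data are drawn from N(0, 1), and the two competing models are the hypothetical point models N(+delta, 1) and N(-delta, 1), which are equally wrong by symmetry. With equal prior model probabilities, the posterior probability of the first model is the logistic function of the log Bayes factor, which here reduces to 2 * delta * sum(y).

```python
import numpy as np


def posterior_prob_model_a(y, delta):
    """Posterior probability of model A = N(+delta, 1) against B = N(-delta, 1).

    Assumes equal prior model probabilities. The log Bayes factor is
    sum_i [log N(y_i; +delta, 1) - log N(y_i; -delta, 1)] = 2 * delta * sum(y).
    """
    log_bf = 2.0 * delta * np.sum(y, axis=-1)
    # Numerically stable logistic function of the log Bayes factor.
    return np.exp(-np.logaddexp(0.0, -log_bf))


def polarized_fraction(n, delta=0.5, reps=200, threshold=0.99, seed=0):
    """Share of replicate datasets of size n in which one of the two
    equally wrong models receives posterior probability above `threshold`."""
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, 1.0, size=(reps, n))   # true mean is 0; neither model is right
    p = posterior_prob_model_a(y, delta)
    return float(np.mean((p > threshold) | (p < 1.0 - threshold)))


fractions = {n: polarized_fraction(n) for n in (10, 100, 1000, 10000)}
for n, frac in fractions.items():
    print(f"n = {n:>5}: fraction of strongly polarized replicates = {frac:.2f}")
```

Because the log Bayes factor in this toy setting is a zero-mean random walk whose spread grows with the square root of the sample size, larger datasets push the posterior probability toward 0 or 1 even though neither model is closer to the truth.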


Whole plastid genomes are being sequenced rapidly from across the green plant tree of life, and phylogenetic analyses of these are increasing resolution and support for relationships that have varied among or been unresolved in earlier single and multi-gene studies. Pooideae, the cool-season grass lineage, is the largest of the 12 grass subfamilies and includes important temperate cereals, turf grasses and forage species. Although numerous studies of the phylogeny of the subfamily have been undertaken, relationships among some “early-diverging” tribes conflict among studies, and some relationships among subtribes of Poeae have not yet been resolved. To address these issues, we newly sequenced 25 whole plastomes, which showed rearrangements typical of Poaceae. These plastomes represent nine tribes and 11 subtribes of Pooideae, and were analysed with 20 existing plastomes for the subfamily. Maximum likelihood, maximum parsimony and Bayesian inference robustly resolve most deep relationships in the subfamily. Complete plastome data provide increased nodal support compared to protein coding data alone at nodes that are not maximally supported. Following the divergence of Brachyelytrum, Phaenospermateae, Brylkinieae-Meliceae and Ampelodesmeae-Stipeae are the successive sister groups of the rest of the subfamily. Ampelodesmeae are nested within Stipeae in the plastome trees, consistent with its hybrid origin between a phaenospermatoid and a stipoid grass (the maternal parent). The core Pooideae are strongly supported and include Brachypodieae, a Bromeae-Triticeae clade and Poeae. Within Poeae, a novel sister-group relationship between Phalaridinae and Torreyochloinae is found, and the relative branching order of this clade and Aveninae, with respect to an Agrostidinae-Brizinae clade, is discordant between maximum parsimony and maximum likelihood/Bayesian inference trees. Maximum likelihood and Bayesian analyses strongly support Airinae and Holcinae as the successive sister groups of a Dactylidinae-Loliinae clade.

Concepts: Poaceae, Phylogenetics, Cladistics, Statistical inference, Bayesian inference, Bayes' theorem, Likelihood function, Bayes factor