How does network structure affect diffusion? Recent studies suggest that the answer depends on the type of contagion. Complex contagions, unlike infectious diseases (simple contagions), are affected by social reinforcement and homophily. Hence, the spread within highly clustered communities is enhanced, while diffusion across communities is hampered. A common hypothesis is that memes and behaviors are complex contagions. We show that, while most memes indeed spread like complex contagions, a few viral memes spread across many communities, like diseases. We demonstrate that the future popularity of a meme can be predicted by quantifying its early spreading pattern in terms of community concentration. The more communities a meme permeates, the more viral it is. We present a practical method to translate data about community structure into predictive knowledge about what information will spread widely. This connection contributes to our understanding in computational social science, social media analytics, and marketing applications.
Feelings of loneliness are common among young adults, and are hypothesized to impair the quality of sleep. In the present study, we tested associations between loneliness and sleep quality in a nationally representative sample of young adults. Further, based on the hypothesis that sleep problems in lonely individuals are driven by increased vigilance for threat, we tested whether past exposure to violence exacerbated this association.
Secondary use of electronic health records (EHRs) promises to advance clinical research and better inform clinical decision making. Challenges in summarizing and representing patient data prevent widespread practice of predictive modeling using EHRs. Here we present a novel unsupervised deep feature learning method to derive a general-purpose patient representation from EHR data that facilitates clinical predictive modeling. In particular, a three-layer stack of denoising autoencoders was used to capture hierarchical regularities and dependencies in the aggregated EHRs of about 700,000 patients from the Mount Sinai data warehouse. The result is a representation we name “deep patient”. We evaluated this representation as broadly predictive of health states by assessing the probability of patients to develop various diseases. We performed evaluation using 76,214 test patients comprising 78 diseases from diverse clinical domains and temporal windows. Our results significantly outperformed those achieved using representations based on raw EHR data and alternative feature learning strategies. Prediction performance for severe diabetes, schizophrenia, and various cancers were among the top performing. These findings indicate that deep learning applied to EHRs can derive patient representations that offer improved clinical predictions, and could provide a machine learning framework for augmenting clinical decision systems.
To investigate cognitive operations underlying sequential problem solving, we confronted ten Goffin’s cockatoos with a baited box locked by five different inter-locking devices. Subjects were either naïve or had watched a conspecific demonstration, and either faced all devices at once or incrementally. One naïve subject solved the problem without demonstration and with all locks present within the first five sessions (each consisting of one trial of up to 20 minutes), while five others did so after social demonstrations or incremental experience. Performance was aided by species-specific traits including neophilia, a haptic modality and persistence. Most birds showed a ratchet-like progress, rarely failing to solve a stage once they had done it once. In most transfer tests subjects reacted flexibly and sensitively to alterations of the locks' sequencing and functionality, as expected from the presence of predictive inferences about mechanical interactions between the locks.
Standard theories of decision-making involving delayed outcomes predict that people should defer a punishment, whilst advancing a reward. In some cases, such as pain, people seem to prefer to expedite punishment, implying that its anticipation carries a cost, often conceptualized as ‘dread’. Despite empirical support for the existence of dread, whether and how it depends on prospective delay is unknown. Furthermore, it is unclear whether dread represents a stable component of value, or is modulated by biases such as framing effects. Here, we examine choices made between different numbers of painful shocks to be delivered faithfully at different time points up to 15 minutes in the future, as well as choices between hypothetical painful dental appointments at time points of up to approximately eight months in the future, to test alternative models for how future pain is disvalued. We show that future pain initially becomes increasingly aversive with increasing delay, but does so at a decreasing rate. This is consistent with a value model in which moment-by-moment dread increases up to the time of expected pain, such that dread becomes equivalent to the discounted expectation of pain. For a minority of individuals pain has maximum negative value at intermediate delay, suggesting that the dread function may itself be prospectively discounted in time. Framing an outcome as relief reduces the overall preference to expedite pain, which can be parameterized by reducing the rate of the dread-discounting function. Our data support an account of disvaluation for primary punishments such as pain, which differs fundamentally from existing models applied to financial punishments, in which dread exerts a powerful but time-dependent influence over choice.
Use of socially generated “big data” to access information about collective states of the minds in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be the prediction of the society’s reaction to a new product in the sense of popularity and adoption rate. However, bridging the gap between “real time monitoring” and “early predicting” remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted much before its release by measuring and analyzing the activity level of editors and viewers of the corresponding entry to the movie in Wikipedia, the well-known online encyclopedia.
Correctly assessing a scientist’s past research impact and potential for future impact is key in recruitment decisions and other evaluation processes. While a candidate’s future impact is the main concern for these decisions, most measures only quantify the impact of previous work. Recently, it has been argued that linear regression models are capable of predicting a scientist’s future impact. By applying that future impact model to 762 careers drawn from three disciplines: physics, biology, and mathematics, we identify a number of subtle, but critical, flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic autocorrelation, resulting in significant overestimation of their “predictive power”. Moreover, the predictive power of these models depend heavily upon scientists' career age, producing least accurate estimates for young researchers. Our results place in doubt the suitability of such models, and indicate further investigation is required before they can be used in recruiting decisions.
When we look at the rapid growth of scientific databases on the Internet in the past decade, we tend to take the accessibility and provenance of the data for granted. As we see a future of increased database integration, the licensing of the data may be a hurdle that hampers progress and usability. We have formulated four rules for licensing data for open drug discovery, which we propose as a starting point for consideration by databases and for their ultimate adoption. This work could also be extended to the computational models derived from such data. We suggest that scientists in the future will need to consider data licensing before they embark upon re-using such content in databases they construct themselves.
A genome-wide polygenic score (GPS), derived from a 2013 genome-wide association study (N=127,000), explained 2% of the variance in total years of education (EduYears). In a follow-up study (N=329,000), a new EduYears GPS explains up to 4%. Here, we tested the association between this latest EduYears GPS and educational achievement scores at ages 7, 12 and 16 in an independent sample of 5825 UK individuals. We found that EduYears GPS explained greater amounts of variance in educational achievement over time, up to 9% at age 16, accounting for 15% of the heritable variance. This is the strongest GPS prediction to date for quantitative behavioral traits. Individuals in the highest and lowest GPS septiles differed by a whole school grade at age 16. Furthermore, EduYears GPS was associated with general cognitive ability (~3.5%) and family socioeconomic status (~7%). There was no evidence of an interaction between EduYears GPS and family socioeconomic status on educational achievement or on general cognitive ability. These results are a harbinger of future widespread use of GPS to predict genetic risk and resilience in the social and behavioral sciences.Molecular Psychiatry advance online publication, 19 July 2016; doi:10.1038/mp.2016.107.
Thousands of lives are lost every year in developing countries for failing to detect epidemics early because of the lack of real-time disease surveillance data. We present results from a large-scale deployment of a telephone triage service as a basis for dengue forecasting in Pakistan. Our system uses statistical analysis of dengue-related phone calls to accurately forecast suspected dengue cases 2 to 3 weeks ahead of time at a subcity level (correlation of up to 0.93). Our system has been operational at scale in Pakistan for the past 3 years and has received more than 300,000 phone calls. The predictions from our system are widely disseminated to public health officials and form a critical part of active government strategies for dengue containment. Our work is the first to demonstrate, with significant empirical evidence, that an accurate, location-specific disease forecasting system can be built using analysis of call volume data from a public health hotline.