
Journal: npj Digital Medicine


We investigated how intelligent virtual assistants (IVAs), including Amazon’s Alexa, Apple’s Siri, Google Assistant, Microsoft’s Cortana, and Samsung’s Bixby, responded to addiction help-seeking queries. We recorded whether IVAs provided a singular response and, if so, whether they linked users to treatment or treatment referral services. Only 4 of the 70 help-seeking queries presented to the five IVAs returned singular responses, with the remainder prompting confusion (e.g., “did I say something wrong?”). When asked “help me quit drugs”, Alexa responded with a definition of the word drugs. “Help me quit…smoking” or “tobacco” on Google Assistant returned Dr. QuitNow (a cessation app), while on Siri “help me quit pot” promoted a marijuana retailer. IVAs should be revised to promote free, remote, federally sponsored addiction services, such as SAMHSA’s 1-800-662-HELP helpline. This would benefit millions of IVA users now, and more to come as IVAs displace existing information-seeking engines.


Patients with chronic pain commonly believe their pain is related to the weather. Scientific evidence to support their beliefs is inconclusive, in part due to the difficulty of obtaining a large dataset of patients frequently recording their pain symptoms during a variety of weather conditions. Smartphones offer the opportunity to collect data that overcome these difficulties. Our study, Cloudy with a Chance of Pain, analysed daily data from 2658 patients collected over a 15-month period. The analysis demonstrated significant yet modest relationships between pain and relative humidity, pressure and wind speed, with correlations remaining even when accounting for mood and physical activity. This research highlights how citizen-science experiments can collect large datasets on real-world populations to address long-standing health questions. These results will act as a starting point for a future system in which patients can better manage their health through pain forecasts.


The use of apps that record detailed menstrual cycle data presents a new opportunity to study the menstrual cycle. The aim of this study was to describe menstrual cycle characteristics observed from a large database of cycles collected through an app and to investigate associations of menstrual cycle characteristics with cycle length, age and body mass index (BMI). Menstrual cycle parameters, including menstruation, basal body temperature (BBT) and luteinising hormone (LH) tests, as well as age and BMI, were collected anonymously from real-world users of the Natural Cycles app. We analysed 612,613 ovulatory cycles with a mean length of 29.3 days from 124,648 users. The mean follicular phase length was 16.9 days (95% CI: 10-30) and the mean luteal phase length was 12.4 days (95% CI: 7-17). Mean cycle length decreased by 0.18 days (95% CI: 0.17-0.18, R2 = 0.99) and mean follicular phase length decreased by 0.19 days (95% CI: 0.19-0.20, R2 = 0.99) per year of age from 25 to 45 years. Mean variation of cycle length per woman was 0.4 days (14%) higher in women with a BMI of over 35 relative to women with a BMI of 18.5-25. This analysis details variations in menstrual cycle characteristics that are not widely known, yet have significant implications for health and well-being. Clinically, women who wish to plan a pregnancy need to have intercourse on their fertile days. In order to identify the fertile period, it is important to track physiological parameters such as basal body temperature and not just cycle length.
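The reported age trend is a simple linear model: a slope of -0.18 days per year over ages 25-45 implies roughly 0.18 × 20 = 3.6 days of total cycle shortening. A back-of-the-envelope sketch (not the study's code; the anchor value `CYCLE_LEN_AT_25` is hypothetical, since the abstract reports only the overall mean and the slope):

```python
# Illustrative sketch of the reported linear age trend in mean cycle length.
# SLOPE_PER_YEAR comes from the abstract; CYCLE_LEN_AT_25 is an invented
# anchor for illustration only.

SLOPE_PER_YEAR = -0.18      # days of mean cycle length lost per year of age
CYCLE_LEN_AT_25 = 30.0      # hypothetical mean cycle length at age 25 (days)

def expected_cycle_length(age: float) -> float:
    """Linear projection of mean cycle length between ages 25 and 45."""
    if not 25 <= age <= 45:
        raise ValueError("trend was estimated only for ages 25-45")
    return CYCLE_LEN_AT_25 + SLOPE_PER_YEAR * (age - 25)

# Total implied shortening across the studied age range: 0.18 * 20 = 3.6 days.
total_change = SLOPE_PER_YEAR * (45 - 25)
```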


We developed a digitally enabled care pathway for acute kidney injury (AKI) management incorporating a mobile detection application, a specialist clinical response team and a care protocol. Clinical outcome data were collected from adults with AKI on emergency admission before (May 2016 to January 2017) and after (May to September 2017) deployment, at the intervention site and at a comparator site not receiving the intervention. Changes in the primary outcome (serum creatinine recovery to ≤120% baseline at hospital discharge) and secondary outcomes (30-day survival, renal replacement therapy, renal or intensive care unit (ICU) admission, worsening AKI stage and length of stay) were measured using interrupted time-series regression. Processes-of-care data (time to AKI recognition, time to treatment) were extracted from case notes and compared over two 9-month periods before and after implementation (January to September 2016 and 2017, respectively) using pre-post analysis. There was no step change in renal recovery or any of the secondary outcomes. Trends for creatinine recovery rates (estimated odds ratio (OR) = 1.04, 95% confidence interval (95% CI): 1.00-1.08, p = 0.038) and renal or ICU admission (OR = 0.95, 95% CI: 0.90-1.00, p = 0.044) improved significantly at the intervention site. However, difference-in-difference analyses between sites for creatinine recovery (estimated OR = 0.95, 95% CI: 0.90-1.00, p = 0.053) and renal or ICU admission (OR = 1.06, 95% CI: 0.98-1.16, p = 0.140) were not significant. Among process measures, time to AKI recognition and treatment of nephrotoxicity improved significantly (p < 0.001 and p = 0.047, respectively).
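Interrupted time-series regression of this kind is typically fit with a segmented design matrix: an intercept, the underlying time trend, a step change at the intervention, and a post-intervention slope change. A minimal sketch on synthetic monthly data (not the study's model or data; all values below are invented):

```python
import numpy as np

# Segmented interrupted time-series regression on simulated monthly
# outcome rates. Everything here is synthetic, for illustration only.

rng = np.random.default_rng(0)
months = np.arange(24)                              # 24 monthly observations
intervention_month = 12
post = (months >= intervention_month).astype(float)  # 1 after deployment
time_since = post * (months - intervention_month)    # months since deployment

# Synthetic outcome: 60% baseline rate, slow drift, small step + slope change.
y = (60 + 0.1 * months + 2.0 * post + 0.3 * time_since
     + rng.normal(0, 0.05, size=months.size))

# Design matrix: [intercept, trend, step at intervention, post-slope change].
X = np.column_stack([np.ones(months.size), months, post, time_since])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, trend, step, slope_change = coef
```

The fitted `step` coefficient is the immediate level change attributable to the intervention, while `slope_change` captures a gradual shift in trend, which is the distinction behind the abstract's "no step change but improved trends" finding.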


While smartphone usage is ubiquitous, and the market for apps targeted at mental health is growing rapidly, the evidence for standalone apps for treating mental health symptoms is still unclear. This meta-analysis investigated the efficacy of standalone smartphone apps for mental health. A comprehensive literature search was conducted in February 2018 for randomized controlled trials investigating the effects of standalone apps for mental health in adults with heightened symptom severity, compared to a control group. A random-effects model was employed. When insufficient comparisons were available, data were presented in a narrative synthesis. Outcomes included assessments of the severity of the mental health disorder symptoms specifically targeted by the app. In total, 5945 records were identified and 165 full-text articles were screened for inclusion by two independent researchers. Nineteen trials with 3681 participants were included in the analysis: depression (k = 6), anxiety (k = 4), substance use (k = 5), self-injurious thoughts and behaviors (k = 4), PTSD (k = 2), and sleep problems (k = 2). Effects on depression (Hedges' g = 0.33, 95%CI 0.10-0.57, P = 0.005, NNT = 5.43, I2 = 59%) and on smoking behavior (g = 0.39, 95%CI 0.21-0.57, NNT = 4.59, P ≤ 0.001, I2 = 0%) were significant. No significant pooled effects were found for anxiety, suicidal ideation, self-injury, or alcohol use (g = -0.14 to 0.18). Effect sizes for single trials ranged from g = -0.05 to 0.14 for PTSD and from g = 0.72 to 0.84 for insomnia. Although some trials showed the potential of apps targeting mental health symptoms, using smartphone apps as standalone psychological interventions cannot be recommended based on the current level of evidence.


Current healthcare practices are reactive and based on limited physiological information collected months or years apart. By giving patients and healthy consumers access to continuous measurements of health, wearable devices and digital medicine stand to realize highly personalized and preventative care. However, most current digital technologies provide information on a limited set of physiological traits, such as heart rate and step count, which alone offer little insight into the etiology of most diseases. Here we propose to integrate data from biohealth smartphone applications with continuous metabolic phenotypes derived from urine metabolites. This combination of molecular phenotypes with quantitative measurements of lifestyle reflects the biological consequences of human behavior in real time. We present data from an observational study involving two healthy subjects and discuss the challenges, opportunities, and implications of integrating this new layer of physiological information into digital medicine. Though our dataset is limited to two subjects, our analysis (also available through an interactive web-based visualization tool) provides an initial framework to monitor lifestyle factors, such as nutrition, drug metabolism, exercise, and sleep, using urine metabolites.


For most women of reproductive age, assessing menstrual health and fertility typically involves regular visits to a gynecologist or another clinician. While these evaluations provide critical information on an individual’s reproductive health status, they typically rely on memory-based self-reports, and the results are rarely, if ever, assessed at the population level. In recent years, mobile apps for menstrual tracking have become very popular, allowing us to evaluate the reliability and tracking frequency of millions of self-observations, thereby providing an unparalleled view, both in detail and scale, of menstrual health and its evolution for large populations. In particular, the primary aim of this study was to describe the tracking behavior of the app users and their overall observation patterns in an effort to understand whether they were consistent with previous small-scale medical studies. The secondary aim was to investigate whether their precision allowed the detection and estimation of ovulation timing, which is critical for reproductive and menstrual health. Retrospective self-observation data were acquired from two mobile apps dedicated to the application of the sympto-thermal fertility awareness method, resulting in a dataset of more than 30 million days of observations from over 2.7 million cycles logged by 200,000 users. The analysis of the data showed that up to 40% of the cycles in which users were seeking pregnancy had recordings every single day. With a modeling approach using Hidden Markov Models to describe the collected data and estimate ovulation timing, it was found that the average duration and range of the follicular phase were larger than previously reported, with only 24% of ovulations occurring on cycle days 14 to 15, while the luteal phase duration and range were in line with previous reports, although short luteal phases (10 days or less) were observed more frequently (in up to 20% of cycles).
The digital epidemiology approach presented here can lead to a better understanding of menstrual health and its connection to women’s health overall, which has historically been severely understudied.
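To illustrate the modeling idea (not the authors' actual model), ovulation detection from sympto-thermal data can be framed as a two-state hidden Markov model, with a low-temperature follicular state and a high-temperature luteal state, and the most likely switch point recovered with the Viterbi algorithm. The states, probabilities, and observation coding below are all hypothetical:

```python
import numpy as np

# Two hidden states: 0 = follicular (pre-ovulation), 1 = luteal (post-ovulation).
# Observations: daily BBT discretized as 0 = low, 1 = high (hypothetical coding).
# Transitions are left-to-right: once in the luteal state, the cycle stays there.

log = np.log
start = log(np.array([0.99, 0.01]))               # cycles start follicular
trans = log(np.array([[0.9, 0.1],                  # follicular -> luteal switch
                      [1e-12, 1.0 - 1e-12]]))      # luteal is (nearly) absorbing
emit = log(np.array([[0.8, 0.2],                   # follicular: mostly low temps
                     [0.2, 0.8]]))                 # luteal: mostly high temps

def viterbi(obs):
    """Most likely state sequence for a discretized BBT series."""
    T = len(obs)
    dp = np.zeros((T, 2))
    back = np.zeros((T, 2), dtype=int)
    dp[0] = start + emit[:, obs[0]]
    for t in range(1, T):
        for s in (0, 1):
            scores = dp[t - 1] + trans[:, s]
            back[t, s] = np.argmax(scores)
            dp[t, s] = scores[back[t, s]] + emit[s, obs[t]]
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

# Synthetic 28-day cycle: low temps, then a sustained rise from day 15.
bbt = [0] * 14 + [1] * 14
states = viterbi(bbt)
ovulation_day = states.index(1) + 1   # first luteal day (1-indexed)
```

The appeal of the HMM framing is that noisy single-day readings do not flip the estimate: the switch point is chosen by the whole sequence, not by any one measurement.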


The global burden of diabetic retinopathy (DR) continues to worsen, and DR remains a leading cause of vision loss worldwide. Here, we describe an algorithm to predict DR progression by means of deep learning (DL), using as input color fundus photographs (CFPs) acquired at a single visit from a patient with DR. The proposed DL models were designed to predict future DR progression, defined as 2-step worsening on the Early Treatment Diabetic Retinopathy Study Diabetic Retinopathy Severity Scale, and were trained against DR severity scores assessed 6, 12, and 24 months after the baseline visit by masked, well-trained, human reading center graders. One of these models (prediction at month 12) achieved an area under the curve of 0.79. Interestingly, our results highlight the importance of the predictive signal located in the peripheral retinal fields, not routinely collected for DR assessments, and the importance of microvascular abnormalities. Our findings show the feasibility of predicting future DR progression by leveraging CFPs of a patient acquired at a single visit. Upon further development on larger and more diverse datasets, such an algorithm could enable early diagnosis and referral to a retina specialist for more frequent monitoring, and even consideration of early intervention. Moreover, it could also improve patient recruitment for clinical trials targeting DR.
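An AUC of 0.79 has a direct probabilistic reading: it is the chance that a randomly chosen progressor receives a higher predicted risk than a randomly chosen non-progressor. That equivalence makes AUC computable from pairwise rank comparisons alone, as in this generic sketch (synthetic scores and labels, unrelated to the study's data):

```python
import numpy as np

# AUC as the Mann-Whitney statistic: the fraction of (positive, negative)
# pairs that the score orders correctly, with ties counting half.

def auc(scores, labels):
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0, 0]                        # synthetic outcomes
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]       # synthetic model scores
observed_auc = auc(scores, labels)   # 13 of 15 pairs ordered correctly
```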


Human-in-the-loop (HITL) AI may enable an ideal symbiosis of human experts and AI models, harnessing the advantages of both while overcoming their respective limitations. The purpose of this study was to investigate a novel collective intelligence technology designed to amplify the diagnostic accuracy of networked human groups by forming real-time systems modeled on biological swarms. Using small groups of radiologists, the swarm-based technology was applied to the diagnosis of pneumonia on chest radiographs and compared against human experts alone, as well as against two state-of-the-art deep learning AI models. Our work demonstrates that both the swarm-based technology and the deep-learning technology achieved higher diagnostic accuracy than the human experts alone. It further demonstrates that, when used in combination, the swarm-based technology and the deep-learning technology outperformed either method alone. The superior diagnostic accuracy of the combined HITL AI solution compared to radiologists and AI alone has broad implications for clinical AI deployment and implementation strategies as adoption surges in future practice.


As wearable technologies are increasingly used for clinical research and healthcare, it is critical to understand their accuracy and determine how measurement errors may affect research conclusions and impact healthcare decision-making. The accuracy of wearable technologies has been a hotly debated topic in both the research and popular science literature. Currently, wearable technology companies are responsible for assessing and reporting the accuracy of their products, but little information about the evaluation method is made publicly available. Heart rate measurements from wearables are derived from photoplethysmography (PPG), an optical method for measuring changes in blood volume under the skin. Potential inaccuracies in PPG stem from three major areas: (1) diverse skin types, (2) motion artifacts, and (3) signal crossover. To date, no study has systematically explored the accuracy of wearables across the full range of skin tones. Here, we explored heart rate and PPG data from consumer- and research-grade wearables under multiple circumstances to test whether and to what extent these inaccuracies exist. We saw no statistically significant difference in accuracy across skin tones, but we saw significant differences between devices and between activity types; notably, absolute error during activity was, on average, 30% higher than during rest. Our conclusions indicate that different wearables are all reasonably accurate at rest and at prolonged elevated heart rate, but that differences exist between devices in responding to changes in activity. This has implications for researchers, clinicians, and consumers in drawing study conclusions, combining study results, and making health-related decisions using these devices.
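Device error of this kind is typically summarized as mean absolute error of the wearable's readings against a reference monitor (e.g., ECG), computed separately for rest and activity so the two conditions can be compared. A minimal sketch with synthetic readings (device traces and values are invented, not the study's data):

```python
import numpy as np

# Synthetic heart-rate traces (bpm): a reference monitor and one
# hypothetical wearable, at rest and during activity.

reference_rest = np.array([62, 63, 61, 64, 62], dtype=float)
device_rest    = np.array([63, 62, 61, 66, 61], dtype=float)
reference_move = np.array([120, 135, 150, 142, 128], dtype=float)
device_move    = np.array([115, 140, 141, 150, 131], dtype=float)

def mean_abs_error(device, reference):
    """Mean absolute error of device readings vs. the reference (bpm)."""
    return float(np.mean(np.abs(device - reference)))

mae_rest = mean_abs_error(device_rest, reference_rest)    # 1.0 bpm
mae_move = mean_abs_error(device_move, reference_move)    # 6.0 bpm
# Relative inflation of error during activity vs. rest:
relative_increase = (mae_move - mae_rest) / mae_rest
```

Comparing `mae_rest` and `mae_move` per device, and per activity type, is the shape of analysis behind the abstract's finding that error during activity was on average 30% higher than during rest.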