Concept: Data analysis


There is controversy on the proposed benefits of publishing mortality rates for individual surgeons. In some procedures, analysis at the level of an individual surgeon may lack statistical power. The aim was to determine the likelihood that variation in surgeon performance will be detected using published outcome data.

Concepts: Type I and type II errors, Data analysis
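
The power concern in the abstract above can be illustrated with a small calculation. This is a minimal sketch, not the study's method: it assumes a one-sided exact binomial test of one surgeon's deaths against a known baseline mortality rate. The function names and the rates and case counts in the usage loop are hypothetical values chosen for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def power_to_detect(n, p0, p1, alpha=0.05):
    """Power of a one-sided exact binomial test.

    n  : number of procedures performed by the surgeon (assumed known)
    p0 : baseline mortality rate (assumption)
    p1 : the surgeon's true, elevated mortality rate (assumption)
    """
    # Smallest death count k whose false-alarm probability under p0
    # does not exceed alpha; k = n + 1 always qualifies (probability 0).
    for k in range(n + 2):
        if upper_tail(k, n, p0) <= alpha:
            break
    # Probability of reaching that count when the true rate is p1.
    return upper_tail(k, n, p1)


if __name__ == "__main__":
    # Illustrative values only: baseline 5%, poor performer 10%.
    for cases in (50, 200, 1000):
        print(cases, round(power_to_detect(cases, 0.05, 0.10), 3))
```

Under these assumptions, power depends strongly on the number of cases per surgeon, which is why low-volume procedures are the difficult case.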


BACKGROUND: Severe eczema in young children is associated with an increased risk of developing asthma and rhino-conjunctivitis. In the general population, however, most cases of eczema are mild to moderate. In an unselected cohort, we studied the risk of current asthma and the co-existence of allergy-related diseases at 6 years of age among children with and without eczema at 2 years of age. METHODS: Questionnaires assessing various environmental exposures and health variables were administered at 2 years of age. An identical health questionnaire was completed at 6 years of age. The clinical investigation of a random subsample ascertained eczema diagnoses, and missing data were handled by multiple imputation analyses. RESULTS: The estimate for the association between eczema at 2 years and current asthma at 6 years was OR=1.80 (95% CI 1.10-2.96). Four of ten children with eczema at 6 years had the onset of eczema after the age of 2 years, but the co-existence of different allergy-related diseases at 6 years was higher among those with the onset of eczema before 2 years of age. CONCLUSIONS: Although most cases of eczema in the general population were mild to moderate, early eczema was associated with an increased risk of developing childhood asthma. These findings support the hypothesis of an atopic march in the general population. TRIAL REGISTRATION: The Prevention of Allergy among Children in Trondheim study has been identified as ISRCTN28090297 in the international Current Controlled Trials database.

Concepts: Data analysis, Epidemiology, Allergen, Expectation-maximization algorithm, Clinical trial, Asthma, Allergy, Eczema
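
The reported estimate, OR=1.80 with a 95% CI of 1.10-2.96, can be sanity-checked with a short calculation. The sketch below assumes the interval is a Wald-type interval that is symmetric on the log-odds scale. The abstract does not state this, so the recovered standard error and p-value are approximations, and `wald_summary` is a hypothetical helper name.

```python
from math import erfc, log, sqrt


def wald_summary(or_hat, ci_low, ci_high, z_crit=1.96):
    """Recover log-scale SE, z statistic and two-sided p-value from an OR and its CI.

    Assumes the CI was built as exp(log(OR) +/- z_crit * SE).
    """
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(or_hat) / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p_two_sided


if __name__ == "__main__":
    se, z, p = wald_summary(1.80, 1.10, 2.96)
    # Under the symmetry assumption, the geometric mean of the CI limits
    # should land close to the point estimate.
    print("geometric midpoint of CI:", sqrt(1.10 * 2.96))
    print("SE(log OR):", se, "z:", z, "p:", p)
```

The interval excludes 1, which is consistent with the abstract's conclusion that early eczema was associated with increased asthma risk.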


BACKGROUND: Experimental datasets are becoming larger and increasingly complex, spanning different data domains, thereby expanding the requirements for respective tool support for their analysis. Networks provide a basis for the integration, analysis and visualization of multi-omics experimental datasets. RESULTS: Here we present VANTED (version 2), a framework for systems biology applications, which comprises a comprehensive set of seven main tasks. These range from network reconstruction, data visualization, integration of various data types, network simulation to data exploration combined with a manifold support of systems biology standards for visualization and data exchange. The offered set of functionalities is instantiated by combining several tasks in order to enable users to view and explore a comprehensive dataset from different perspectives. We describe the system as well as an exemplary workflow. CONCLUSIONS: VANTED is a stand-alone framework which supports scientists during the data analysis and interpretation phase. It is available as a Java open source tool from

Concepts: Real analysis, Statistics, Data mining, Mathematics, Open source, Data analysis, Data set, Data


BACKGROUND: Treatment burden can be defined as the self-care practices that patients with chronic illness must perform to respond to the requirements of their healthcare providers, as well as the impact that these practices have on patient functioning and well-being. Increasing levels of treatment burden may lead to suboptimal adherence and negative outcomes. Systematic review of the qualitative literature is a useful method for exploring the patient experience of care, in this case the experience of treatment burden. There is no consensus on methods for qualitative systematic review. This paper describes the methodology used for qualitative systematic reviews of the treatment burdens identified in three different common chronic conditions, using stroke as our exemplar. METHODS: Qualitative studies in peer-reviewed journals seeking to understand the patient experience of stroke management were sought. Limitations of English language and year of publication 2000 onwards were set. An exhaustive search strategy was employed, consisting of a scoping search, database searches (Scopus, CINAHL, Embase, Medline & PsycINFO) and reference, footnote and citation searching. Papers were screened, data extracted, quality appraised and analysed by two individuals, with a third party for disagreements. Data analysis was carried out using a coding framework underpinned by Normalization Process Theory (NPT). RESULTS: A total of 4364 papers were identified; 54 were included in the review. Of these, 51 (94%) were retrieved from our database search. Methodological issues included: creating an appropriate search strategy; investigating a topic not previously conceptualised; sorting through irrelevant data within papers; the quality appraisal of qualitative research; and the use of NPT as a novel method of data analysis, shown to be a useful method for the purposes of this review.
CONCLUSION: The creation of our search strategy may be of particular interest to other researchers carrying out synthesis of qualitative studies. Importantly, the successful use of NPT to inform a coding frame for data analysis involving qualitative data that describes processes relating to self management highlights the potential of a new method for analyses of qualitative data within systematic reviews.

Concepts: Patient, Methodology, Peer review, Normalization Process Theory, Medicine, Data analysis, Qualitative research, Scientific method


Hierarchical classification (HC) stratifies and classifies data from broad classes into more specific classes. Unlike commonly used data classification strategies, this enables the probabilistic prediction of unknown classes at different levels, minimizing the burden of incomplete databases. Despite these advantages, its translational application in biomedical sciences has been limited. We describe and demonstrate the implementation of an HC approach for “omics-driven” classification of 15 bacterial species at various taxonomic levels achieving 90-100% accuracy, and 9 cancer types into morphological types and 35 subtypes with 99% and 76% accuracy, respectively. Unknown bacterial species were probabilistically assigned with 100% accuracy to their respective genus or family using mass spectra (n = 284). Cancer types were predicted by mRNA data (n = 1960) for most subtypes with 95-100% accuracy. This has high relevance in clinical practice where complete datasets are difficult to compile with the continuous evolution of diseases and emergence of new strains, yet prediction of unknown classes, such as bacterial species, at upper hierarchy levels may be sufficient to initiate antimicrobial therapy. The algorithms presented here can be directly translated into clinical use with any quantitative data, and have broad application potential, from unlabeled sample identification, to hierarchical feature selection, and discovery of new taxonomic variants.

Concepts: Archaea, Data analysis, Biological classification, Organism, Species, Hierarchy, Scientific method, Taxonomic rank


Many PhD programs incorporate boot camps and summer bridge programs to accelerate the development of doctoral students' research skills and acculturation into their respective disciplines. These brief, high-intensity experiences span no more than several weeks and are typically designed to expose graduate students to data analysis techniques, to develop scientific writing skills, and to better embed incoming students into the scholarly community. However, there is no previous study that directly measures the outcomes of PhD students who participate in such programs and compares them to the outcomes of students who did not participate. Likewise, no previous study has used a longitudinal design to assess these outcomes over time. Here we show that participation in such programs is not associated with detectable benefits related to skill development, socialization into the academic community, or scholarly productivity for students in our sample. Analyzing data from 294 PhD students in the life sciences from 53 US institutions, we found no statistically significant differences in outcomes between participants and nonparticipants across 115 variables. These results stand in contrast to prior studies presenting boot camps as effective interventions based on participant satisfaction and perceived value. Many universities and government agencies (e.g., National Institutes of Health and National Science Foundation) invest substantial resources in boot camp and summer bridge activities in the hopes of better supporting scientific workforce development. Our findings do not reveal any measurable benefits to students, indicating that an allocation of limited resources to alternative strategies with stronger empirical foundations warrants consideration.

Concepts: Postgraduate education, Scientific method, Science, Participation, Data analysis, Academia, University, Doctorate


Cluster analysis is aimed at classifying elements into categories on the basis of their similarity. Its applications range from astronomy to bioinformatics, bibliometrics, and pattern recognition. We propose an approach based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. This idea forms the basis of a clustering procedure in which the number of clusters arises intuitively, outliers are automatically spotted and excluded from the analysis, and clusters are recognized regardless of their shape and of the dimensionality of the space in which they are embedded. We demonstrate the power of the algorithm on several test cases.

Concepts: Mathematics, Uranium, Space exploration, Data analysis, Cluster analysis, Pattern recognition, Bioinformatics, Machine learning
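
The clustering idea in the abstract above (centers have higher density than their neighbors and lie far from any point of higher density) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. Density is taken as the count of points within a cutoff distance `d_c`, ties in density are broken by index, and centers are picked by ranking the product of density and distance with a caller-supplied `n_clusters`. The abstract instead describes the number of clusters as arising from the data, and the outlier handling it mentions is omitted here. All names are hypothetical.

```python
from math import dist


def density_peaks(points, d_c, n_clusters):
    """Cluster points by density peaks; returns (labels, rho, delta)."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]

    # rho[i]: local density, the number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    top = order[0]

    # delta[i]: distance to the nearest point of higher density;
    # parent[i]: that point. The densest point gets its largest distance.
    delta = [0.0] * n
    parent = [-1] * n
    delta[top] = max(d[top])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda m: d[i][m])
        delta[i], parent[i] = d[i][j], j

    # Centers: large rho and large delta. The densest point is always kept.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (i != top, -gamma[i]))[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining point inherits the label of its denser nearest neighbor,
    # which has already been labeled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta


if __name__ == "__main__":
    blob_a = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05)]
    blob_b = [(5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
    labels, rho, delta = density_peaks(blob_a + blob_b, d_c=0.2, n_clusters=2)
    print(labels)
```

Because labels propagate from denser to sparser points rather than from distance to a centroid, this assignment step does not assume any particular cluster shape.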


Parkland Hospital’s Ruben Amarasingham built a model to predict patients at high risk for readmission and now leads efforts to extend the benefits of health information to the nation’s most vulnerable.

Concepts: University of Texas Southwestern Medical Center, Information theory, Hospital, The Nation, Parkland Memorial Hospital, Data analysis, Data, Health care


Many digital imaging devices operate by successive photon-to-electron, electron-to-voltage, and voltage-to-digit conversions. These processes are subject to various signal-dependent errors, which are typically modeled as Poisson-Gaussian noise. The removal of such noise can be effected indirectly by applying a variance-stabilizing transformation (VST) to the noisy data, denoising the stabilized data with a Gaussian denoising algorithm, and finally applying an inverse VST to the denoised data. The generalized Anscombe transformation (GAT) is often used for variance stabilization, but its unbiased inverse transformation has not been rigorously studied in the past. We introduce the exact unbiased inverse of the GAT and show that it plays an integral part in ensuring accurate denoising results. We demonstrate that this exact inverse leads to state-of-the-art results without any notable increase in the computational complexity compared to the other inverses. We also show that this inverse is optimal in the sense that it can be interpreted as a maximum likelihood inverse. Moreover, we thoroughly analyze the behavior of the proposed inverse, which also enables us to derive a closed-form approximation for it. This paper generalizes our work on the exact unbiased inverse of the Anscombe transformation, which we have presented earlier for the removal of pure Poisson noise.

Concepts: Variance-stabilizing transformation, Maximum likelihood, Data analysis, Anscombe transform, Inverse function, Computational complexity theory, Normal distribution, Poisson distribution
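
The variance-stabilization step described in the abstract above can be sketched as follows. The sketch assumes the usual Poisson-Gaussian observation model z = alpha * p + n, with p Poisson-distributed and n Gaussian with mean `mu` and standard deviation `sigma`, and the commonly cited form of the generalized Anscombe transformation. It implements only the forward transform and the plain algebraic inverse. The exact unbiased inverse introduced in the paper is not reproduced here. Function names and the simulation parameters are hypothetical.

```python
import math
import random


def gat(z, alpha, sigma, mu=0.0):
    """Generalized Anscombe transformation (forward), clipped to 0."""
    arg = alpha * z + 0.375 * alpha**2 + sigma**2 - alpha * mu
    return (2.0 / alpha) * math.sqrt(arg) if arg > 0 else 0.0


def gat_algebraic_inverse(d, alpha, sigma, mu=0.0):
    """Plain algebraic inverse of gat(); not the exact unbiased inverse."""
    return 0.25 * alpha * d**2 - 0.375 * alpha - sigma**2 / alpha + mu


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def stabilized_variance(y, alpha, sigma, n=20000, seed=0):
    """Empirical variance of gat(z) for simulated Poisson-Gaussian data."""
    rng = random.Random(seed)
    vals = [
        gat(alpha * sample_poisson(y, rng) + rng.gauss(0.0, sigma), alpha, sigma)
        for _ in range(n)
    ]
    mean = sum(vals) / n
    return sum((v - mean) ** 2 for v in vals) / n


if __name__ == "__main__":
    # After the transform, the variance should be close to 1 for a range
    # of intensities, so a Gaussian denoiser can then be applied.
    for intensity in (5, 20, 50):
        print(intensity, stabilized_variance(intensity, alpha=1.0, sigma=1.0))
```

Because the forward transform is concave, applying the algebraic inverse to an averaged (denoised) value tends to underestimate the true intensity, especially at low counts. That bias is the motivation for an unbiased inverse.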


To try to lower patient re-identification risks for biomedical research databases containing laboratory test results while also minimizing changes in clinical data interpretation.

Concepts: Data analysis, Decision theory, Orphan drug