SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Computer data

892

Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.

Concepts: Sample size, Science, Bar chart, Computer data

8

Background:The risk of cancer with hypercalcaemia in primary care is unknown.Methods:This was a cohort study using calcium results in patients aged ⩾40 years in a primary care electronic data set. Diagnoses of cancer in the following year were identified.Results:Participants (54 267) had calcium results: 1674 (3%) were ⩾2.6 mmol l(-1). Hypercalcaemia was strongly associated with cancer, especially in males: OR 2.92, 95% CI 2.17-3.93, P=<0.001; positive predictive value (PPV) 11.5%; females: OR 1.86, 95% CI 1.39-2.50, P<0.001: PPV 4.1%.Conclusions:Hypercalcaemia is strongly associated with cancer in primary care, with men at most risk, despite hypercalcaemia being more common in women.British Journal of Cancer advance online publication, 5 August 2014; doi:10.1038/bjc.2014.433 www.bjcancer.com.

Concepts: Cohort study, Vitamin D, Epidemiology, Male, Following, Gender, Primary care, Computer data

4

Physical activity is widely known to be one of the key elements of a healthy life. The many benefits of physical activity described in the medical literature include weight loss and reductions in the risk factors for chronic diseases. With the recent advances in wearable devices, such as smartwatches or physical activity wristbands, motion tracking sensors are becoming pervasive, which has led to an impressive growth in the amount of physical activity data available and an increasing interest in recognizing which specific activity a user is performing. Moreover, big data and machine learning are now cross-fertilizing each other in an approach called “deep learning”, which consists of massive artificial neural networks able to detect complicated patterns from enormous amounts of input data to learn classification models. This work compares various state-of-the-art classification techniques for automatic cross-person activity recognition under different scenarios that vary widely in how much information is available for analysis. We have incorporated deep learning by using Google’s TensorFlow framework. The data used in this study were acquired from PAMAP2 (Physical Activity Monitoring in the Ageing Population), a publicly available dataset containing physical activity data. To perform cross-person prediction, we used the leave-one-subject-out (LOSO) cross-validation technique. When working with large training sets, the best classifiers obtain very high average accuracies (e.g., 96% using extra randomized trees). However, when the data volume is drastically reduced (where available data are only 0.001% of the continuous data), deep neural networks performed the best, achieving 60% in overall prediction accuracy. We found that even when working with only approximately 22.67% of the full dataset, we can statistically obtain the same results as when working with the full dataset. This finding enables the design of more energy-efficient devices and facilitates cold starts and big data processing of physical activity records.

Concepts: Statistics, Data, Artificial intelligence, Machine learning, Neural network, Artificial neural network, Unsupervised learning, Computer data

1

BACKGROUND: Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. RESULTS: We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user’s query, advanced data searching based on the specified user’s query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. CONCLUSIONS: search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/.

Concepts: Computer program, Search engine optimization, Internet, Searching, Computer software, Application software, Web service, Computer data

0

Here, we briefly describe the real-time fMRI data that is provided for testing the functionality of the open-source Python/Matlab framework for neurofeedback, termed Open NeuroFeedback Training (OpenNFT, Koush et al. [1]). The data set contains real-time fMRI runs from three anonymized participants (i.e., one neurofeedback run per participant), their structural scans and pre-selected ROIs/masks/weights. The data allows for simulating the neurofeedback experiment without an MR scanner, exploring the software functionality, and measuring data processing times on the local hardware. In accordance with the descriptions in our main article, we provide data of (1) periodically displayed (intermittent) activation-based feedback; (2) intermittent effective connectivity feedback, based on dynamic causal modeling (DCM) estimations; and (3) continuous classification-based feedback based on support-vector-machine (SVM) estimations. The data is available on our public GitHub repository: https://github.com/OpenNFT/OpenNFT_Demo/releases.

Concepts: Statistics, Participation, E-participation, Data, Electroencephalography, Neurofeedback, Computer data

0

Benefiting from global rank constraints, the lowrank representation (LRR) method has been shown to be an effective solution to subspace learning. However, the global mechanism also means that the LRR model is not suitable for handling large-scale data or dynamic data. For large-scale data, the LRR method suffers from high time complexity, and for dynamic data, it has to recompute a complex rank minimization for the entire data set whenever new samples are dynamically added, making it prohibitively expensive. Existing attempts to online LRR either take a stochastic approach or build the representation purely based on a small sample set and treat new input as out-of-sample data. The former often requires multiple runs for good performance and thus takes longer time to run, and the latter formulates online LRR as an out-ofsample classification problem and is less robust to noise. In this paper, a novel online low-rank representation subspace learning method is proposed for both large-scale and dynamic data. The proposed algorithm is composed of two stages: static learning and dynamic updating. In the first stage, the subspace structure is learned from a small number of data samples. In the second stage, the intrinsic principal components of the entire data set are computed incrementally by utilizing the learned subspace structure, and the low-rank representation matrix can also be incrementally solved by an efficient online singular value decomposition (SVD) algorithm. The time complexity is reduced dramatically for large-scale data, and repeated computation is avoided for dynamic problems. We further perform theoretical analysis comparing the proposed online algorithm with the batch LRR method. Finally, experimental results on typical tasks of subspace recovery and subspace clustering show that the proposed algorithm performs comparably or better than batch methods including the batch LRR, and significantly outperforms state-of-the-art online methods.

Concepts: Principal component analysis, Machine learning, Computer, Singular value decomposition, Computational complexity theory, Matrix, Singular value, Computer data

0

Results of planetary mapping are often shared openly for use in scientific research and mission planning. In its raw format, however, the data is not accessible to non-experts due to the difficulty in grasping the context and the intricate acquisition process. We present work on tailoring and integration of multiple data processing and visualization methods to interactively contextualize geospatial surface data of celestial bodies for use in science communication. As our approach handles dynamic data sources, streamed from online repositories, we are significantly shortening the time between discovery and dissemination of data and results. We describe the image acquisition pipeline, the pre-processing steps to derive a 2.5D terrain, and a chunked level-of-detail, out-of-core rendering approach to enable interactive exploration of global maps and high-resolution digital terrain models. The results are demonstrated for three different celestial bodies. The first case addresses high-resolution map data on the surface of Mars. A second case is showing dynamic processes, such as concurrent weather conditions on Earth that require temporal datasets. As a final example we use data from the New Horizons spacecraft which acquired images during a single flyby of Pluto. We visualize the acquisition process as well as the resulting surface data. Our work has been implemented in the OpenSpace software [8], which enables interactive presentations in a range of environments such as immersive dome theaters, interactive touch tables, and virtual reality headsets.

Concepts: Data, Mars, Computer graphics, Virtual reality, Topography, Pluto, Computer data, New Horizons

0

PET using O-(2-[(18)F]fluoroethyl)-L-tyrosine ((18)F-FET) is an established method for brain tumour diagnostics, but data processing varies in different centres. This study analyses the influence of methodological differences between two centres for tumour characterization with (18)F-FET PET using the same PET scanner. Methodological differences between centres A and B in the evaluation of (18)F-FET PET data were identified for (1) framing of PET dynamic data, (2) data reconstruction, (3) cut-off values for tumour delineation to determine tumour-to-brain ratios (TBR) and tumour volume (Tvol) and (4) ROI definition to determine time activity curves (TACs) in the tumour. Based on the (18)F-FET PET data of 40 patients with untreated cerebral gliomas (20 WHO grade II, 10 WHO grade III, 10 WHO grade IV), the effect of different data processing in the two centres on TBRmean, TBRmax, Tvol, time-to-peak (TTP) and slope of the TAC was compared. Further, the effect on tumour grading was evaluated by ROC analysis.

Concepts: Cancer, Oncology, Grade, Brain tumor, Astrocytoma, Methodology, Computer data

0

Big data is a term used for any collection of datasets whose size and complexity exceeds the capabilities of traditional data processing applications. Big data repositories, including those for molecular, clinical, and epidemiology data, offer unprecedented research opportunities to help guide scientific advancement. Advantages of big data can include ease and low cost of collection, ability to approach prospectively and retrospectively, utility for hypothesis generation in addition to hypothesis testing, and the promise of precision medicine. Limitations include cost and difficulty of storing and processing data; need for advanced techniques for formatting and analysis; and concerns about accuracy, reliability, and security. We discuss sources of big data and tools for its analysis to help inform the treatment and management of dermatologic diseases.

Concepts: Scientific method, Medicine, Epidemiology, Statistics, Data, Accuracy and precision, Dermatology, Computer data

0

To improve the practical use of the short forms (SFs) developed from the item bank, we compared the measurement precision of the 4- and 8-item SFs generated from a motor item bank composed of the Functional Independence Measure (FIM™) and the Minimum Data Set (MDS).

Concepts: United States, Measurement, Data, Psychometrics, Accuracy and precision, Human Development Index, Computer data