Concept: Computer data
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
Background:The risk of cancer with hypercalcaemia in primary care is unknown.Methods:This was a cohort study using calcium results in patients aged ⩾40 years in a primary care electronic data set. Diagnoses of cancer in the following year were identified.Results:Participants (54 267) had calcium results: 1674 (3%) were ⩾2.6 mmol l(-1). Hypercalcaemia was strongly associated with cancer, especially in males: OR 2.92, 95% CI 2.17-3.93, P=<0.001; positive predictive value (PPV) 11.5%; females: OR 1.86, 95% CI 1.39-2.50, P<0.001: PPV 4.1%.Conclusions:Hypercalcaemia is strongly associated with cancer in primary care, with men at most risk, despite hypercalcaemia being more common in women.British Journal of Cancer advance online publication, 5 August 2014; doi:10.1038/bjc.2014.433 www.bjcancer.com.
Physical activity is widely known to be one of the key elements of a healthy life. The many benefits of physical activity described in the medical literature include weight loss and reductions in the risk factors for chronic diseases. With the recent advances in wearable devices, such as smartwatches or physical activity wristbands, motion tracking sensors are becoming pervasive, which has led to an impressive growth in the amount of physical activity data available and an increasing interest in recognizing which specific activity a user is performing. Moreover, big data and machine learning are now cross-fertilizing each other in an approach called “deep learning”, which consists of massive artificial neural networks able to detect complicated patterns from enormous amounts of input data to learn classification models. This work compares various state-of-the-art classification techniques for automatic cross-person activity recognition under different scenarios that vary widely in how much information is available for analysis. We have incorporated deep learning by using Google’s TensorFlow framework. The data used in this study were acquired from PAMAP2 (Physical Activity Monitoring in the Ageing Population), a publicly available dataset containing physical activity data. To perform cross-person prediction, we used the leave-one-subject-out (LOSO) cross-validation technique. When working with large training sets, the best classifiers obtain very high average accuracies (e.g., 96% using extra randomized trees). However, when the data volume is drastically reduced (where available data are only 0.001% of the continuous data), deep neural networks performed the best, achieving 60% in overall prediction accuracy. We found that even when working with only approximately 22.67% of the full dataset, we can statistically obtain the same results as when working with the full dataset. This finding enables the design of more energy-efficient devices and facilitates cold starts and big data processing of physical activity records.
BACKGROUND: Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. RESULTS: We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user’s query, advanced data searching based on the specified user’s query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. CONCLUSIONS: search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/.
The use of Personal Mobile Terrestrial System (PMTS) has increased considerably for mobile mapping applications because these systems offer dynamic data acquisition with ground perspective in places where the use of wheeled platforms is unfeasible, such as forests and indoor buildings. PMTS has become more popular with emerging technologies, such as miniaturized navigation sensors and off-the-shelf omnidirectional cameras, which enable low-cost mobile mapping approaches. However, most of these sensors have not been developed for high-accuracy metric purposes and therefore require rigorous methods of data acquisition and data processing to obtain satisfactory results for some mapping applications. To contribute to the development of light, low-cost PMTS and potential applications of these off-the-shelf sensors for forest mapping, this paper presents a low-cost PMTS approach comprising an omnidirectional camera with off-the-shelf navigation systems and its evaluation in a forest environment. Experimental assessments showed that the integrated sensor orientation approach using navigation data as the initial information can increase the trajectory accuracy, especially in covered areas. The point cloud generated with the PMTS data had accuracy consistent with the Ground Sample Distance (GSD) range of omnidirectional images (3.5-7 cm). These results are consistent with those obtained for other PMTS approaches.
Power lines are extending to complex environments (e.g., lakes and forests), and the distribution of power lines in a tower is becoming complicated (e.g., multi-loop and multi-bundle). Additionally, power line inspection is becoming heavier and more difficult. Advanced LiDAR technology is increasingly being used to solve these difficulties. Based on precise cable inspection robot (CIR) LiDAR data and the distinctive position and orientation system (POS) data, we propose a novel methodology to detect inspection objects surrounding power lines. The proposed method mainly includes four steps: firstly, the original point cloud is divided into single-span data as a processing unit; secondly, the optimal elevation threshold is constructed to remove ground points without the existing filtering algorithm, improving data processing efficiency and extraction accuracy; thirdly, a single power line and its surrounding data can be respectively extracted by a structured partition based on a POS data (SPPD) algorithm from “layer” to “block” according to power line distribution; finally, a partition recognition method is proposed based on the distribution characteristics of inspection objects, highlighting the feature information and improving the recognition effect. The local neighborhood statistics and the 3D region growing method are used to recognize different inspection objects surrounding power lines in a partition. Three datasets were collected by two CIR LIDAR systems in our study. The experimental results demonstrate that an average 90.6% accuracy and average 98.2% precision at the point cloud level can be achieved. The successful extraction indicates that the proposed method is feasible and promising. Our study can be used to obtain precise dimensions of fittings for modeling, as well as automatic detection and location of security risks, so as to improve the intelligence level of power line inspection.
De-identification is the first step to use these records for data processing or further medical investigations in electronic medical records. Consequently, a reliable automated de-identification system would be of high value.
Due to its wide occurrence in water resources and toxicity, pharmaceuticals and personal care products are becoming an emerging concern throughout the world. Application of residual/waste materials for water remediation can be a good strategy in waste management as well as in waste valorization. Herein, this dataset provides information on biochar application for the removal of emerging contaminant, diclofenac from water matrices. The data presented here is an extension of the research article explaining the mechanisms of adsorption diclofenac on biochars (Lonappan et al., 2017 ). This data article provides general information on the surface features of pine wood and pig manure biochar with the help of SEM and FTIR data. This dataset also provides information on XRD profiles of pine wood and pig manure biochars. In addition, different amounts of biochars were used to study the removal of a fixed concentration of diclofenac and the data is provided with this data set.
Recently released large-scale neuron morphological data has greatly facilitated the research in neuroinformatics. However, the sheer volume and complexity of these data pose significant challenges for efficient and accurate neuron exploration. In this paper, we propose an effective retrieval framework to address these problems, based on frontier techniques of deep learning and binary coding. For the first time, we develop a deep learning based feature representation method for the neuron morphological data, where the 3D neurons are first projected into binary images and then learned features using an unsupervised deep neural network, i.e., stacked convolutional autoencoders (SCAEs). The deep features are subsequently fused with the hand-crafted features for more accurate representation. Considering the exhaustive search is usually very time-consuming in large-scale databases, we employ a novel binary coding method to compress feature vectors into short binary codes. Our framework is validated on a public data set including 58,000 neurons, showing promising retrieval precision and efficiency compared with state-of-the-art methods. In addition, we develop a novel neuron visualization program based on the techniques of augmented reality (AR), which can help users take a deep exploration of neuron morphologies in an interactive and immersive manner.
High throughput sequencing makes it possible to evaluate thousands of genetic markers across genomes and populations. Reduced-representation sequencing approaches, like ddRADseq (double digest restriction site associated DNA sequencing), are frequently applied to screen for genetic variation. In particular in non-model organisms where whole-genome sequencing is not yet feasible, ddRADseq has become popular as it allows genome-wide assessment of variation patterns even in the absence of other genomic resources. However, while many tools are available for the analysis of ddRADseq data, few options exist to simulate ddRADseq data in order to evaluate the accuracy of downstream tools. The available tools either focus on the optimization of ddRAD experiment design or do not provide the information necessary for a detailed evaluation of different ddRAD analysis tools. For this task a ground truth, i.e. the underlying information of all effects in the data set, is required. Therefore, we here present DDRAGE, the ddRAD Dataset Generator, that allows both developers and users to evaluate their ddRAD analysis software. ddRAGE allows the user to adjust many parameters such as coverage and rates of mutations, sequencing errors or allelic dropouts, in order to generate a realistic simulated ddRADseq dataset for given experimental scenarios and organisms. The simulated reads can be easily processed with available analysis software such as STACKS or pyRAD and evaluated against the underlying parameters used to generate the data to gauge the impact of different parameter values used during downstream data processing This article is protected by copyright. All rights reserved.