Concept: Business intelligence
Many important questions in biology are, fundamentally, comparative, and this extends to our analysis of a growing number of sequenced genomes. Existing genomic analysis tools are often organized around literal views of genomes as linear strings. Even when information is highly condensed, these views grow cumbersome as larger numbers of genomes are added. Data aggregation and summarization methods from the field of visual analytics can provide abstracted comparative views, suitable for sifting large multi-genome datasets to identify critical similarities and differences. We introduce a software system for visual analysis of comparative genomics data. The system automates the process of data integration, and provides the analysis platform to identify and explore features of interest within these large datasets. GenoSets borrows techniques from business intelligence and visual analytics to provide a rich interface of interactive visualizations supported by a multi-dimensional data warehouse. In GenoSets, visual analytic approaches are used to enable querying based on orthology, functional assignment, and taxonomic or user-defined groupings of genomes. GenoSets links this information together with coordinated, interactive visualizations for both detailed and high-level categorical analysis of summarized data. GenoSets has been designed to simplify the exploration of multiple genome datasets and to facilitate reasoning about genomic comparisons. Case examples are included showing the use of this system in the analysis of 12 Brucella genomes. GenoSets software and the case study dataset are freely available at http://genosets.uncc.edu. We demonstrate that the integration of genomic data using a coordinated multiple view approach can simplify the exploration of large comparative genomic data sets, and facilitate reasoning about comparisons and features of interest.
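The kind of query described in this abstract, selecting features by orthology and by user-defined groupings of genomes, can be illustrated with a small sketch. This is not GenoSets code: the fact rows, genome names, group names and helper functions below are all hypothetical, and a real data warehouse would answer the same question with a query against its fact and dimension tables.

```python
from collections import defaultdict

# Hypothetical fact rows: (genome, ortholog_cluster). Invented for illustration.
facts = [
    ("genome_A1", "OG1"), ("genome_A1", "OG2"),
    ("genome_A2", "OG1"), ("genome_A2", "OG2"), ("genome_A2", "OG3"),
    ("genome_B1", "OG1"), ("genome_B1", "OG3"),
]

# Hypothetical user-defined groupings of genomes.
groups = {
    "group_A": ["genome_A1", "genome_A2"],
    "group_B": ["genome_B1"],
}


def clusters_by_genome(rows):
    """Index the fact rows as genome -> set of ortholog clusters."""
    index = defaultdict(set)
    for genome, cluster in rows:
        index[genome].add(cluster)
    return index


def core_clusters(index, genomes):
    """Clusters present in every genome of the group."""
    sets = [index[g] for g in genomes]
    return set.intersection(*sets) if sets else set()


def pan_clusters(index, genomes):
    """Clusters present in at least one genome of the group."""
    return set().union(*(index[g] for g in genomes))


index = clusters_by_genome(facts)

# Clusters shared by all of group_A and absent from all of group_B.
unique_to_a = core_clusters(index, groups["group_A"]) - pan_clusters(index, groups["group_B"])
print(sorted(unique_to_a))
```

Summarizing by sets of clusters rather than by genome coordinates is what keeps the comparison compact as more genomes are added to a group.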
The first research conducted on violence against women in the university context in Spain reveals that 62% of the students know of or have experienced situations of this kind within the university institutions, but only 13% identify these situations in the first place. Two main interrelated aspects arise from the data analysis: not identifying and acknowledging violent situations, and the lack of reporting them. Policies and actions developed by Spanish universities need to be grounded in two goals: intransigence toward any kind of violence against women, and bystander intervention, support, and solidarity with the victims and with the people supporting the victims.
Protecting Your Patients' Interests in the Era of Big Data, Artificial Intelligence, and Predictive Analytics
- Journal of the American College of Radiology : JACR
- Published almost 2 years ago
The Hippocratic oath and the Belmont report articulate foundational principles for how physicians interact with patients and research subjects. The increasing use of big data and artificial intelligence techniques demands a re-examination of these principles in light of the potential issues surrounding privacy, confidentiality, data ownership, informed consent, epistemology, and inequities. Patients have strong opinions about these issues. Radiologists have a fiduciary responsibility to protect the interest of their patients. As such, the community of radiology leaders, ethicists, and informaticists must have a conversation about the appropriate way to deal with these issues and help lead the way in developing capabilities in the most just, ethical manner possible.
A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data.
The current rapid growth of the Internet of Things (IoT) in various commercial and non-commercial sectors has led to the accumulation of large-scale IoT data, for which the time-critical analysis and clustering of knowledge granules represent highly thought-provoking application possibilities. The objective of the present work is to examine the structural analysis and clustering of complex knowledge granules in an IoT big-data environment. In this work, we propose a knowledge granule analytic and clustering (KGAC) framework that explores and assembles knowledge granules from IoT big-data arrays for a business intelligence (BI) application. Our work implements a neuro-fuzzy analytic architecture rather than a standard fuzzified approach to discover the complex knowledge granules. Furthermore, we implement an enhanced knowledge granule clustering (e-KGC) mechanism that is more elastic than previous techniques when assembling the tactical and explicit complex knowledge granules from IoT big-data arrays. The analysis and discussion presented here show that the proposed framework and mechanism can be implemented to extract knowledge granules from an IoT big-data array in such a way as to present knowledge of strategic value to executives and enable knowledge users to perform further BI actions.
For the past decade, Maternal Mortality Reports, published in the United Kingdom every three years, have consistently raised concerns about maternal observations in maternity care. The reports identify that observations are not being done, are not being completed fully, are not recorded on Early Warning Score systems, and/or are not escalated appropriately. This has resulted in delays in referral and intervention, and increases the risk of maternal morbidity or mortality. However, there has been little exploration of the possible reasons for non-completion of maternal observations.
Research is of little use if its results are not effectively communicated. Data visualised in tables (and graphs) are key components in any scientific report, but their design leaves much to be desired. This article focuses on table design, following two general principles: clear vision and clear understanding. Clear vision is achieved by maximising the signal to noise ratio. In a table, the signal is the data in the form of numbers, and the noise is the support structure necessary to interpret the numbers. Clear understanding is achieved when the story in the data is told effectively, through organisation of the data and use of text. These principles are illustrated by original and improved tables from recent publications. Two special cases are discussed separately: tables produced by the pharmaceutical industry (in clinical study reports and reports to data safety monitoring boards), and study flow diagrams as proposed by the Consolidated Standards of Reporting Trials and Preferred Reporting Items for Systematic Reviews and Meta-Analyses initiatives.
Phenotyping is a critical component of plant research. Accurate and precise trait collection, when integrated with genetic tools, can greatly accelerate the rate of genetic gain in crop improvement. However, efficient and automatic phenotyping of traits across large populations is a challenge, which is further exacerbated by the necessity of sampling multiple environments and growing replicated trials. A promising approach is to leverage current advances in imaging technology, data analytics and machine learning to enable automated and fast phenotyping and subsequent decision support. In this context, the workflow for phenotyping (image capture → data storage and curation → trait extraction → machine learning/classification → models/apps for decision support) has to be carefully designed and efficiently executed to minimize resource usage and maximize utility. We illustrate such an end-to-end phenotyping workflow for the case of plant stress severity phenotyping in soybean, with a specific focus on the rapid and automatic assessment of iron deficiency chlorosis (IDC) severity on thousands of field plots. We showcase this analytics framework by extracting IDC features from a set of ~4500 unique canopies representing a diverse germplasm base that have different levels of IDC, and subsequently training a variety of classification models to predict plant stress severity. The best classifier is then deployed as a smartphone app for rapid and real-time severity rating in the field.
To design, develop and prototype clinical dashboards to integrate high-frequency health and wellness data streams using interactive and real-time data visualisation and analytics modalities.
With the accumulation of large amounts of health-related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized Medicine (PPPM), ultimately affecting both cost and quality of care. However, the high dimensionality and high complexity of the data involved prevent data-driven methods from being easily translated into clinically relevant models. Additionally, the application of cutting-edge predictive methods and data manipulation requires substantial programming skills, limiting their direct exploitation by medical domain experts. This leaves a gap between potential and actual data usage. In this study, we address this problem by focusing on open, visual environments, suited to be applied by the medical community. Moreover, we review code-free applications of big data technologies. As a showcase, a framework was developed for the meaningful use of data from critical care patients by integrating the MIMIC-II database in a data mining environment (RapidMiner) supporting scalable predictive analytics using visual tools (RapidMiner’s Radoop extension). Guided by the CRoss-Industry Standard Process for Data Mining (CRISP-DM), the ETL process (Extract, Transform, Load) was initiated by retrieving data from the MIMIC-II tables of interest. As a use case, the correlation of platelet count and ICU survival was quantitatively assessed. Using visual tools for ETL on Hadoop and predictive modeling in RapidMiner, we developed robust processes for automatic building, parameter optimization and evaluation of various predictive models, under different feature selection schemes. Because these processes can be easily adopted in other projects, this environment is attractive for scalable predictive analytics in health research.
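For readers who want to see what an extract-transform step followed by a correlation check might look like in code, here is a minimal sketch. It is not the authors' RapidMiner process: the records, field names and helper functions are hypothetical, the values are invented and are not MIMIC-II data, and a plain Pearson coefficient between a numeric variable and a 0/1 outcome (the point-biserial correlation) is assumed here as one simple way to quantify the association.

```python
from math import sqrt

# Hypothetical toy records: platelet count (10^9/L) and ICU survival flag (1 = survived).
# These values are invented for illustration and are NOT taken from MIMIC-II.
records = [
    {"platelets": 250, "survived": 1},
    {"platelets": 310, "survived": 1},
    {"platelets": 180, "survived": 1},
    {"platelets": 220, "survived": 1},
    {"platelets": 90, "survived": 0},
    {"platelets": 120, "survived": 0},
    {"platelets": 60, "survived": 0},
    {"platelets": None, "survived": 1},  # incomplete row, dropped during extraction
]


def extract(rows):
    """Extract: keep only rows where both fields are present."""
    return [r for r in rows
            if r.get("platelets") is not None and r.get("survived") is not None]


def transform(rows):
    """Transform: split the rows into two aligned numeric columns."""
    xs = [float(r["platelets"]) for r in rows]
    ys = [int(r["survived"]) for r in rows]
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 column this is the point-biserial coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


xs, ys = transform(extract(records))
r = pearson(xs, ys)
print(f"rows used: {len(xs)}, correlation: {r:.3f}")
```

A visual environment such as the one described above wraps steps like these in configurable operators, so that domain experts can assemble the same pipeline without writing code.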