SciCombinator

Concept: Relational database

There is an ever-growing number of published molecular phylogenetic studies, due in part to new techniques that allow cheap and quick DNA sequencing. Hence, there is increasing demand for relational databases with which to manage and annotate the accumulating DNA sequences, genes, voucher specimens and associated biological data. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically designed as organizational tools for researchers working in phylogenetic inference. Here we report new software that facilitates easy management of voucher and sequence data, consisting of a relational database back-end for a graphical user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready for use in widely used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publication. It also has built-in BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data dumps that can be processed with GBIF’s Integrated Publishing Toolkit).
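The core idea of a relational voucher-and-sequence back-end can be sketched with Python's built-in sqlite3 module. The schema below (tables `vouchers` and `sequences`) and the function `build_fasta` are hypothetical illustrations, not VoSeq's actual schema or code; the sketch only shows how linking each sequence to its voucher record lets a single-gene dataset be exported in FASTA, a widely used plain-text sequence format. The sequences in the usage are made up.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical voucher/sequence schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vouchers (
            code    TEXT PRIMARY KEY,   -- voucher specimen code
            genus   TEXT NOT NULL,
            species TEXT NOT NULL
        );
        CREATE TABLE sequences (
            voucher_code TEXT NOT NULL REFERENCES vouchers(code),
            gene         TEXT NOT NULL,  -- gene or marker name
            seq          TEXT NOT NULL,
            PRIMARY KEY (voucher_code, gene)
        );
    """)
    return conn


def build_fasta(conn: sqlite3.Connection, gene: str) -> str:
    """Join sequences to their vouchers and emit one FASTA record per specimen."""
    rows = conn.execute(
        """
        SELECT v.code, v.genus, v.species, s.seq
        FROM sequences AS s
        JOIN vouchers AS v ON v.code = s.voucher_code
        WHERE s.gene = ?
        ORDER BY v.code
        """,
        (gene,),
    ).fetchall()
    # Header line ">code_Genus_species", then the sequence on a single line.
    return "".join(f">{code}_{genus}_{species}\n{seq}\n" for code, genus, species, seq in rows)


if __name__ == "__main__":
    conn = make_db()
    conn.executemany("INSERT INTO vouchers VALUES (?, ?, ?)",
                     [("V001", "Melitaea", "cinxia"), ("V002", "Melitaea", "athalia")])
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)",
                     [("V001", "COI", "ACGTTA"), ("V002", "COI", "ACGTTG"),
                      ("V001", "EF1a", "GGCATT")])
    print(build_fasta(conn, "COI"), end="")
```

Keeping genes in their own table (rather than one column per gene) means adding a new marker requires no schema change.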

Concepts: DNA, Molecular biology, Biology, Database, Relational database, Microsoft, SQL, Relational model

Predictive modeling is fundamental for extracting value from large clinical data sets, or “big clinical data,” for advancing clinical research, and for improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before a machine learning model is trained, the values of one or more model parameters, called hyper-parameters, must typically be specified. Because many healthcare researchers have limited experience with machine learning, it is hard for them to choose an appropriate algorithm and hyper-parameter values. Second, much clinical data is stored in a special format and must be iteratively transformed into relational table format before predictive modeling can be conducted. This transformation is time-consuming and requires computing expertise.
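The abstract does not name the "special format". As an assumption, the sketch below uses the entity-attribute-value (EAV) layout, where each fact is stored as a (patient, attribute, value) triple, and shows the kind of pivot needed to reach a relational table with one row per patient and one column per attribute. The function name `eav_to_table` and the attribute names are hypothetical.

```python
from typing import Iterable

# Assumed EAV record layout: (patient_id, attribute, value)
Record = tuple[str, str, float]


def eav_to_table(records: Iterable[Record]) -> tuple[list[str], list[list]]:
    """Pivot EAV records into a header and one row per patient.

    Columns are attribute names in sorted order; attributes a patient lacks
    become None. If an attribute repeats for a patient, the last value wins
    (a simplifying assumption; real pipelines must pick an aggregation rule).
    """
    by_patient: dict[str, dict[str, float]] = {}
    attributes: set[str] = set()
    for patient, attr, value in records:
        by_patient.setdefault(patient, {})[attr] = value
        attributes.add(attr)
    columns = sorted(attributes)
    header = ["patient_id", *columns]
    rows = [[pid, *(by_patient[pid].get(c) for c in columns)] for pid in sorted(by_patient)]
    return header, rows


if __name__ == "__main__":
    header, rows = eav_to_table([("p1", "age", 64.0), ("p1", "hba1c", 7.1), ("p2", "age", 58.0)])
    print(header)
    for row in rows:
        print(row)
```

The repeated-value rule is where the "iterative" part of the transformation usually shows up: choosing latest, mean, or first value per patient is a modeling decision, not a mechanical one.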

Concepts: Scientific method, Medicine, Clinical trial, Value, Computer, Morality, Predictive analytics, Relational database

The value of metabolomics in translational research is undeniable, and metabolomics data are increasingly generated in large cohorts. However, the functional interpretation of disease-associated metabolites is difficult, and the biological mechanisms that underlie cell-type- or disease-specific metabolomics profiles are often unknown. To fully exploit metabolomics data and aid their interpretation, it helps to analyze them together with complementary omics data such as transcriptomics. To facilitate such analyses at the pathway level, we have developed RaMP (Relational database of Metabolomics Pathways), which combines biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, WikiPathways, and the Human Metabolome DataBase (HMDB). To the best of our knowledge, there is currently no off-the-shelf public database that maps genes and metabolites to biochemical/disease pathways and can readily be integrated into other existing software. For consistent and comprehensive analysis, RaMP enables batch and complex queries (e.g., list all metabolites involved in glycolysis and lung cancer), can readily be integrated into pathway analysis tools, and supports pathway overrepresentation analysis given a list of genes and/or metabolites of interest. For usability, we have developed a RaMP R package (https://github.com/Mathelab/RaMP-DB), including a user-friendly RShiny web application, that supports simple and batch queries, pathway overrepresentation analysis given a list of genes or metabolites of interest, and network visualization of gene-metabolite relationships. The package also includes the raw database file (a MySQL dump), thereby providing a stand-alone downloadable framework for public use and integration with other tools. In addition, the Python code needed to recreate the database on another system is publicly available (https://github.com/Mathelab/RaMP-BackEnd).
The source databases in RaMP will be checked for updates several times a year, and RaMP will be updated accordingly.
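The abstract does not state which statistic RaMP uses for pathway overrepresentation analysis. A common choice for this kind of analysis is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched below in plain Python. The function names are hypothetical and the analyte IDs in the usage are toy values, not RaMP data; in practice the pathway and background sets would come from queries against the relational database.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def overrepresentation_p(query: set[str], pathway: set[str], background: set[str]) -> float:
    """One-sided p-value that `query` overlaps `pathway` more than chance predicts.

    All sets are restricted to the background universe before counting.
    """
    query = query & background
    pathway = pathway & background
    k = len(query & pathway)  # observed overlap
    return hypergeom_sf(k, len(background), len(pathway), len(query))


if __name__ == "__main__":
    background = {f"m{i}" for i in range(10)}
    glycolysis_like = {"m0", "m1", "m2"}  # toy pathway membership
    print(overrepresentation_p({"m0", "m1", "m2"}, glycolysis_like, background))
```

With 10 background analytes, a 3-member pathway and a 3-analyte query that hits all three, the p-value is C(3,3)·C(7,0)/C(10,3) = 1/120.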

Concepts: Database, Relational database, Relational algebra, Databases, SQL, Relational model, Database theory, Relation

With the advancement of pharmaceutical development, drug interactions have become increasingly complex. As a result, a computer-based drug interaction search system is required to organize all drug interaction data. To overcome problems faced by existing systems, we developed a drug interaction search system using a hash table, which offers higher processing speeds and easier maintenance than relational databases (RDB). To compare the search speed of our system with that of a MySQL RDB, drug interaction searches were repeated for all 45 possible pairs drawn from a group of 10 drugs, for two dataset sizes: 5,604 and 56,040 drug interaction records. As the principal result, our system processed the searches approximately 19 times faster than the system using the MySQL RDB. Our system also has other advantages; for example, drug interaction data can be created in comma-separated values (CSV) format, which facilitates data maintenance. Although our system relies on a well-known technique, the hash table, it is expected to resolve problems common to existing systems and to be an effective system that enables the safe management of drugs.
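The general technique can be illustrated with a Python dict (a hash table) keyed by an unordered drug pair, loaded from CSV and queried for every pair in a prescription. This is not the authors' implementation: the CSV layout (`drug_a,drug_b,description`), the function names and the placeholder drug names are all hypothetical. For 10 drugs there are C(10, 2) = 10 × 9 / 2 = 45 pairs, matching the benchmark's 45 combinations.

```python
import csv
import io
from itertools import combinations


def load_interactions(csv_text: str) -> dict[frozenset[str], str]:
    """Build a hash table keyed by an unordered pair of drug names.

    Assumes a hypothetical CSV layout with a header row: drug_a,drug_b,description.
    Using frozenset makes (A, B) and (B, A) map to the same key.
    """
    table: dict[frozenset[str], str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = frozenset((row["drug_a"].strip().lower(), row["drug_b"].strip().lower()))
        table[key] = row["description"].strip()
    return table


def check_prescription(table: dict[frozenset[str], str],
                       drugs: list[str]) -> list[tuple[str, str, str]]:
    """Look up every pair of drugs; each lookup is an average O(1) hash probe."""
    hits = []
    for a, b in combinations(drugs, 2):
        description = table.get(frozenset((a.lower(), b.lower())))
        if description is not None:
            hits.append((a, b, description))
    return hits


if __name__ == "__main__":
    data = ("drug_a,drug_b,description\n"
            "drug1,drug2,example interaction A\n"
            "drug9,drug3,example interaction B\n")
    table = load_interactions(data)
    drugs = [f"drug{i}" for i in range(1, 11)]
    print(len(list(combinations(drugs, 2))))  # number of pairs searched
    for hit in check_prescription(table, drugs):
        print(hit)
```

Because the key is unordered, each interaction needs to be stored only once in the CSV, regardless of which drug is listed first.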

Concepts: Pharmacology, Drugs, Pharmaceutical drug, Relational database, Searching, SQL, Relational model, Relation

Purpose: Within supervised rehabilitation programs, Lent and Lopez (2002) proposed that clients and therapists develop a “tripartite” network of efficacy beliefs, comprising their confidence in their own ability, their confidence in the other person’s ability, and their estimation of the other person’s confidence in them. To date, researchers have yet to explore the potential relational outcomes associated with this model in rehabilitation contexts. Method: In Study 1, we recruited 170 exercise clients (mean age = 63.73, SD = 6.46) who were enrolled in a one-to-one aerobic exercise program with a therapist as a result of a lower-limb musculoskeletal disorder. Clients reported their tripartite efficacy beliefs and perceptions about the quality of their relationship with their therapist, and their respective therapists rated each client’s engagement in his or her exercise program. In Study 2, we recruited 68 different exercise clients (mean age = 65.93, SD = 5.80) along with their therapists (n = 68, mean age = 31.89, SD = 4.79) from the same program, to examine whether individuals’ efficacy perceptions were related to their own and/or the other person’s relationship quality perceptions. Results: In Study 1, each of the tripartite efficacy constructs displayed positive direct effects on clients’ relationship quality appraisals, as well as indirect effects on program engagement. Actor-partner interdependence modeling in Study 2 demonstrated that clients and therapists reported more adaptive relationship perceptions when they themselves held strong tripartite efficacy beliefs (i.e., actor effects), and that clients viewed their relationship in a more positive light when their therapist was highly confident in the client’s ability (i.e., partner effect). Conclusion: These findings underscore the potential utility of the tripartite efficacy framework for understanding motivational and relational processes within supervised exercise programs.
(PsycINFO Database Record © 2012 APA, all rights reserved).

Concepts: Effectiveness, Strength training, All rights reserved, Aerobic exercise, Tuple, Psychoanalysis, Relational database, Rehabilitation counseling

Modern approaches to biomedical research and diagnostics targeted towards precision medicine are generating ‘big data’ across a range of high-throughput experimental and analytical platforms. Integrative analysis of this rich clinical, pathological, molecular and imaging data represents one of the greatest bottlenecks in biomarker discovery research in cancer and other diseases. Following on from the publication of our successful framework for multimodal data amalgamation and integrative analysis, Pathology Integromics in Cancer (PICan), this article will explore the essential elements of assembling an integromics framework from a more detailed perspective. PICan, built around a relational database storing curated multimodal data, is the research tool sitting at the heart of our interdisciplinary efforts to streamline biomarker discovery and validation. While recognizing that every institution has a unique set of priorities and challenges, we will use our experiences with PICan as a case study and starting point, rationalizing the design choices we made within the context of our local infrastructure and specific needs, but also highlighting alternative approaches that may better suit other programmes of research and discovery. Along the way, we stress that integromics is not just a set of tools, but rather a cohesive paradigm for how modern bioinformatics can be enhanced. Successful implementation of an integromics framework is a collaborative team effort that is built with an eye to the future and greatly accelerates the processes of biomarker discovery, validation and translation into clinical practice.

Concepts: Scientific method, Pathology, Academic publishing, Research, Medical research, Relational database, Research and development, Relational model

Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high-quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency, because queries that traverse highly interconnected data perform poorly in relational databases. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j), as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data through object-oriented queries, and also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology, we are providing a high-performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
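To see why traversal is awkward relationally, compare two ways of finding everything reachable from a pathway node. In SQL, each hop is another join of the edge table against the rows found so far, expressed here as a recursive common table expression in SQLite; a graph store instead keeps each node's neighbours directly, so traversal is a simple breadth-first walk. This is a toy sketch with hypothetical node IDs; it uses neither Reactome's data model nor Neo4j/Cypher, and it makes no performance claim of its own.

```python
import sqlite3
from collections import deque

# Hypothetical parent -> child edges (e.g. pathway -> sub-pathway -> reaction -> molecule).
EDGES = [("P1", "P2"), ("P1", "P3"), ("P3", "R1"), ("P3", "R2"), ("R2", "M1")]


def descendants_sql(root: str) -> set[str]:
    """Relational version: a recursive CTE joins the edge table once per hop."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (parent TEXT, child TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", EDGES)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT child FROM edge WHERE parent = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT e.child FROM edge AS e JOIN reach AS r ON e.parent = r.node
        )
        SELECT node FROM reach
        """,
        (root,),
    ).fetchall()
    conn.close()
    return {node for (node,) in rows}


def descendants_graph(root: str) -> set[str]:
    """Graph-style version: breadth-first search over an adjacency list."""
    adjacency: dict[str, list[str]] = {}
    for parent, child in EDGES:
        adjacency.setdefault(parent, []).append(child)
    seen: set[str] = set()
    queue = deque([root])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(sorted(descendants_sql("P1")))
    print(sorted(descendants_graph("P1")))
```

Both functions return the same set; the difference is that the relational version must rediscover neighbours through joins at every hop, which is the cost the abstract attributes to highly interconnected data.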

Concepts: Database, Relational database, Data management, Relational algebra, Databases, SQL, Relational model, Relation

Ecological research relies increasingly on previously collected data. Using existing datasets allows questions to be addressed more quickly, more generally, and at larger scales than would otherwise be possible. As a result of large-scale data collection efforts, and an increasing emphasis on data publication by journals and funding agencies, a large and ever-increasing amount of ecological data is now publicly available via the internet. However, most ecological datasets do not adhere to any agreed-upon standards in format, data structure or method of access. Some may be broken up across multiple files, stored in compressed archives, or violate basic principles of data structure. As a result, acquiring and using available datasets can be a time-consuming and error-prone process. The EcoData Retriever is an extensible software framework that automates the tasks of discovering, downloading, and reformatting ecological data files for storage in a local data file or relational database. Automating these tasks saves researchers significant time and substantially reduces the likelihood of errors resulting from manual data manipulation and unfamiliarity with the complexities of individual datasets.
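One typical reformatting step is turning a "wide" spreadsheet-style file into a normalized table before loading it into a relational database. The sketch below assumes a hypothetical site-by-species count matrix (one column per species) and loads it into SQLite as (site, species, count) rows. It is not the EcoData Retriever's code or API; the function names and the toy data are illustrative only.

```python
import csv
import io
import sqlite3
from typing import Iterator


def wide_to_long(csv_text: str) -> Iterator[tuple[str, str, int]]:
    """Convert a site-by-species matrix into (site, species, count) rows.

    Assumes the first column holds site IDs and every other column is a species.
    Empty cells are treated as missing and skipped (an assumption about this toy format).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    species = [name.strip() for name in header[1:]]
    for row in reader:
        if not row:  # ignore blank lines
            continue
        site = row[0].strip()
        for name, cell in zip(species, row[1:]):
            if cell.strip():
                yield site, name, int(cell)


def load_into_db(csv_text: str) -> sqlite3.Connection:
    """Create a normalized counts table and bulk-insert the reshaped rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE counts (site TEXT, species TEXT, count INTEGER, "
        "PRIMARY KEY (site, species))"
    )
    conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", wide_to_long(csv_text))
    conn.commit()
    return conn


if __name__ == "__main__":
    raw = "site,Sp_a,Sp_b\nS1,3,\nS2,0,5\n"
    conn = load_into_db(raw)
    for row in conn.execute("SELECT site, species, count FROM counts ORDER BY site, species"):
        print(row)
```

The long format keeps a true zero (S2, Sp_a) distinct from a missing observation (S1, Sp_b), a distinction that is easy to lose when wide files are edited by hand.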

Concepts: Database, Error, Subroutine, Relational database, Files, Data element, SQL, Relational model

This study investigated the relations between television exposure during the preschool years and the development of executive function (EF). Data were gathered from 107 parents of preschoolers who provided information on children’s television viewing, background television exposure, exposure to specific televised content, and the age at which children began watching television. Preschoolers' EF was assessed via one-on-one interviews. We found that several indicators of television exposure were significantly related to EF. These findings suggest that EF may be an important construct for continued research on the effects of media on young children. (PsycINFO Database Record © 2014 APA, all rights reserved).

Concepts: All rights reserved, Tuple, Relational database, Table

Grit has been presented as a higher-order personality trait that is highly predictive of both success and performance and distinct from other traits such as conscientiousness. This paper provides a meta-analytic review of the grit literature, with a particular focus on the structure of grit and the relation between grit and performance, retention, conscientiousness, cognitive ability, and demographic variables. Our results, based on 584 effect sizes from 88 independent samples representing 66,807 individuals, indicate that the higher-order structure of grit is not confirmed, that grit is only moderately correlated with performance and retention, and that grit is very strongly correlated with conscientiousness. We also find that the perseverance of effort facet has significantly stronger criterion validities than the consistency of interest facet, and that perseverance of effort explains variance in academic performance even after controlling for conscientiousness. In aggregate, our results suggest that interventions designed to enhance grit may have only weak effects on performance and success, that the construct validity of grit is in question, and that the primary utility of the grit construct may lie in the perseverance facet. (PsycINFO Database Record

Concepts: Psychology, Database, Personality psychology, Psychometrics, Trait theory, Relational database, 16 Personality Factors, Conscientiousness