SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Database

221

Sole-source business models for genetic testing can create private databases containing information vital to interpreting the clinical significance of human genetic variations. But incomplete access to those databases threatens to impede the clinical interpretation of genomic medicine. National health systems and insurers, regulators, researchers, providers and patients all have a strong interest in ensuring broad access to information about the clinical significance of variants discovered through genetic testing. They can create incentives for sharing data and interpretive algorithms in several ways, including: promoting voluntary sharing; requiring laboratories to share as a condition of payment for or regulatory approval of laboratory services; establishing - and compelling participation in - resources that capture the information needed to interpret the data independent of company policies; and paying for sharing and interpretation in addition to paying for the test itself. US policies have failed to address the data-sharing issue. The entry of new and established firms into the European genetic testing market presents an opportunity to correct this failure. European Journal of Human Genetics advance online publication, 14 November 2012; doi:10.1038/ejhg.2012.217.

Concepts: Genetics, Statistics, Human genome, Database, Interpretation, Medical genetics, Language interpretation, Human evolutionary genetics

190

When we look at the rapid growth of scientific databases on the Internet in the past decade, we tend to take the accessibility and provenance of the data for granted. As we see a future of increased database integration, the licensing of the data may be a hurdle that hampers progress and usability. We have formulated four rules for licensing data for open drug discovery, which we propose as a starting point for consideration by databases and for their ultimate adoption. This work could also be extended to the computational models derived from such data. We suggest that scientists in the future will need to consider data licensing before they embark upon re-using such content in databases they construct themselves.

Concepts: Time, Mathematics, Database, Future, Science, Past, Need, Accessibility

183

An ever-growing number of molecular phylogenetic studies is being published, due in part to the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the accumulating DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed as organizational tools for researchers working in phylogenetic inference. Here we report new software that facilitates easy management of voucher and sequence data, consisting of a relational database as the back-end for a graphical user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready for use in common phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has built-in BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data-dumps that can be processed with GBIF’s Integrated Publishing Toolkit).
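To make the idea in the abstract above concrete, here is a minimal sketch of a relational back-end that links voucher specimens to their sequences and exports one gene as FASTA for downstream phylogenetic software. It is not VoSeq's actual schema or code: the table names, column names, the `export_fasta` helper and the toy records are all assumptions made for illustration, and SQLite stands in for whatever database engine the real application uses.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical voucher/sequence schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE vouchers (
            code    TEXT PRIMARY KEY,
            genus   TEXT NOT NULL,
            species TEXT NOT NULL
        );
        CREATE TABLE sequences (
            voucher_code TEXT NOT NULL REFERENCES vouchers(code),
            gene         TEXT NOT NULL,
            residues     TEXT NOT NULL,
            PRIMARY KEY (voucher_code, gene)
        );
        """
    )
    # Toy records, invented for this example only.
    conn.executemany(
        "INSERT INTO vouchers VALUES (?, ?, ?)",
        [("V001", "Danaus", "plexippus"), ("V002", "Vanessa", "cardui")],
    )
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        [("V001", "COI", "ATGGCA"), ("V002", "COI", "ATGGCT"), ("V001", "EF1a", "GGTACC")],
    )
    return conn


def export_fasta(conn, gene):
    """Return all sequences of one gene as FASTA text, one record per voucher."""
    rows = conn.execute(
        """
        SELECT v.code, v.genus, v.species, s.residues
        FROM sequences AS s
        JOIN vouchers AS v ON v.code = s.voucher_code
        WHERE s.gene = ?
        ORDER BY v.code
        """,
        (gene,),
    )
    return "".join(
        f">{code}_{genus}_{species}\n{residues}\n"
        for code, genus, species, residues in rows
    )


if __name__ == "__main__":
    print(export_fasta(build_demo_db(), "COI"), end="")
```

Keeping vouchers and sequences in separate tables means a specimen is described once, however many genes are sequenced from it, which is the usual argument for a relational layout in this setting.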

Concepts: DNA, Molecular biology, Biology, Database, Relational database, Microsoft, SQL, Relational model

173

The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schemas. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications or for embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new bioinformatics tools with which to access, analyse and visualize Ensembl data.

Concepts: Bioinformatics, Database, Computer program, C, Application programming interface, Graphical user interface, Computer software, Application software

173

Bicycle trauma is very common, and neurological complications in particular lead to disability and death at all stages of life. This review assembles the most recent research findings on bicycle trauma in combination with bicycle helmet use. Bicycle trauma research is by nature multidisciplinary and relevant not only to physicians but also to experts with educational, engineering, judicial, rehabilitative or public health functions. Given this plurality of global publications and specialist subjects, reviews covering short time periods help to detect recent research directions and also give researchers information from neighbouring disciplines. Although a huge amount of research has been conducted in this area to date, more studies are needed to evaluate and improve specific conditions and needs in different regions, age groups and nationalities, and to create successful programs for preventing severe head and face injuries while cycling. The focus was explicitly on bicycle helmet use; sledding, ski and snowboard studies were therefore excluded, and only one study concerning electric bicycles was retained because of its similar motion structures. The studies considered were all published between January 2010 and August 2011 and were identified via the online databases Medline PubMed and ISI Web of Science.

Concepts: Database, Academic publishing, Research, Cycling, Publishing, Publication, Bicycle, Bicycle helmet

161

Artificial neural network (ANN)-based bone scan index (BSI), a marker of the amount of bone metastasis, has been shown to enhance diagnostic accuracy and reproducibility but is potentially affected by training databases. The aims of this study were to revise the software using a large number of Japanese databases and to validate its diagnostic accuracy compared with the original Swedish training database.

Concepts: Database, Artificial intelligence, Neural network, Artificial neural network, Computer software, Application software, Connectionism

154

Alveolar echinococcosis (AE) is an endemic zoonosis in France due to the cestode Echinococcus multilocularis. The French National Reference Centre for Alveolar Echinococcosis (CNR-EA), connected to the FrancEchino network, is responsible for recording all AE cases diagnosed in France. Administrative, epidemiological and medical information on the French AE cases can currently be considered exhaustive only for the time of diagnosis. To constitute a reference data set, an information system (IS) was developed using a relational database management system (MySQL). The current data set will evolve towards a dynamic surveillance system, including follow-up data (e.g. imaging, serology), and will be connected to environmental and parasitological data relating to E. multilocularis to better understand the pathogen transmission pathway. A particularly important goal is the possible interoperability of the IS with similar European and other databases abroad; this new IS could play a supporting role in the creation of new AE registries.

Concepts: Database, Cestoda, Databases, SQL, Echinococcus multilocularis, Database management system, Relational database management system, Database model

150

Whole genome sequencing has become one of the routine methods in molecular epidemiological practice. In this study, we present BacWGSTdb (http://bacdb.org/BacWGSTdb), a bacterial whole genome sequence typing database which is designed for clinicians, clinical microbiologists and hospital epidemiologists. This database borrows the population structure from the current multi-locus sequence typing (MLST) scheme and adopts a hierarchical data structure: species, clonal complex and isolates. When users upload the pre-assembled genome sequences to BacWGSTdb, it offers the functionality of bacterial genotyping at both traditional MLST and whole-genome levels. More importantly, users are told which isolates in the public database are phylogenetically close to the query isolate, along with their clinical information such as host, isolation source, disease, collection time and geographical location. In this way, BacWGSTdb offers a rapid and convenient platform for worldwide users to address a variety of clinical microbiological issues such as source tracking bacterial pathogens.
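The MLST-style lookup described above can be sketched in a few lines: an allele profile maps to a sequence type and clonal complex, and stored isolates are ranked by how many loci differ from the query. This is an illustration only and not BacWGSTdb's implementation; the profiles, isolate names and function names are invented, and counting allele differences is a simple stand-in for the whole-genome comparison the database actually performs.

```python
# Hypothetical three-locus scheme: profile (one allele number per locus)
# -> (sequence type, clonal complex). All values are invented.
PROFILES = {
    (1, 1, 1): ("ST1", "CC1"),
    (1, 1, 2): ("ST2", "CC1"),
    (3, 4, 5): ("ST9", "CC9"),
}


def assign_sequence_type(profile):
    """Return (sequence type, clonal complex) for a known profile, else None."""
    return PROFILES.get(tuple(profile))


def allele_distance(a, b):
    """Number of loci at which two allele profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_isolates(query, isolates, top=2):
    """Rank stored isolates by allele distance to the query profile."""
    ranked = sorted(
        isolates,
        key=lambda iso: (allele_distance(query, iso["profile"]), iso["name"]),
    )
    return [(iso["name"], allele_distance(query, iso["profile"])) for iso in ranked[:top]]


if __name__ == "__main__":
    stored = [
        {"name": "isolate_A", "profile": (1, 1, 2), "host": "human"},
        {"name": "isolate_B", "profile": (1, 1, 1), "host": "human"},
        {"name": "isolate_C", "profile": (3, 4, 5), "host": "environment"},
    ]
    print(assign_sequence_type((1, 1, 2)))
    print(closest_isolates((1, 1, 2), stored))
```

In a real system each returned isolate would carry its metadata (host, isolation source, collection time, location) so that the ranking can support source tracking.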

Concepts: DNA, Bacteria, Molecular biology, Database, Microbiology, Virus, Genomics, Biotechnology

138

The value of metabolomics in translational research is undeniable, and metabolomics data are increasingly generated in large cohorts. The functional interpretation of disease-associated metabolites though is difficult, and the biological mechanisms that underlie cell type or disease-specific metabolomics profiles are oftentimes unknown. To help fully exploit metabolomics data and to aid in its interpretation, analysis of metabolomics data with other complementary omics data, including transcriptomics, is helpful. To facilitate such analyses at a pathway level, we have developed RaMP (Relational database of Metabolomics Pathways), which combines biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, WikiPathways, and the Human Metabolome DataBase (HMDB). To the best of our knowledge, an off-the-shelf, public database that maps genes and metabolites to biochemical/disease pathways and can readily be integrated into other existing software is currently lacking. For consistent and comprehensive analysis, RaMP enables batch and complex queries (e.g., list all metabolites involved in glycolysis and lung cancer), can readily be integrated into pathway analysis tools, and supports pathway overrepresentation analysis given a list of genes and/or metabolites of interest. For usability, we have developed a RaMP R package (https://github.com/Mathelab/RaMP-DB), including a user-friendly RShiny web application, that supports basic simple and batch queries, pathway overrepresentation analysis given a list of genes or metabolites of interest, and network visualization of gene-metabolite relationships. The package also includes the raw database file (mysql dump), thereby providing a stand-alone downloadable framework for public use and integration with other tools. In addition, the Python code needed to recreate the database on another system is also publicly available (https://github.com/Mathelab/RaMP-BackEnd). Updates for databases in RaMP will be checked multiple times a year and RaMP will be updated accordingly.
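Pathway overrepresentation analysis, mentioned in the abstract above, asks whether a pathway's members appear in a list of interest more often than chance would predict. The sketch below uses a one-sided hypergeometric test, which is a common choice for this kind of analysis; the abstract does not say which statistic RaMP applies, so this should be read as an assumption. The function names and the toy identifiers are hypothetical.

```python
from math import comb


def hypergeom_tail(population, in_pathway, drawn, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, in_pathway, drawn)."""
    upper = min(in_pathway, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(in_pathway, i) * comb(population - in_pathway, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


def pathway_enrichment(query, pathway, universe):
    """Overlap size and one-sided p-value for one pathway.

    query, pathway and universe are sets of identifiers; anything outside
    the universe is ignored.
    """
    query = set(query) & set(universe)
    pathway = set(pathway) & set(universe)
    overlap = len(query & pathway)
    p_value = hypergeom_tail(len(universe), len(pathway), len(query), overlap)
    return overlap, p_value


if __name__ == "__main__":
    universe = {f"m{i}" for i in range(10)}   # ten toy metabolites
    pathway = {"m0", "m1", "m2", "m3"}        # four of them in the pathway
    query = {"m0", "m1", "m7"}                # list of interest
    print(pathway_enrichment(query, pathway, universe))
```

When many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing before interpretation.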

Concepts: Database, Relational database, Relational algebra, Databases, SQL, Relational model, Database theory, Relation

137

Various indexing techniques have been applied by next generation sequencing read mapping tools. The choice of a particular data structure is a trade-off between memory consumption, mapping throughput, and construction time.
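As one illustration of the trade-off described above, the sketch below builds a hash-table index of all k-mers in a reference and uses it to seed exact read mapping. It is a toy example and does not correspond to any particular mapping tool; the function names and sequences are invented. A hash index of this kind is quick to build and to query but stores every k-mer position, so it tends to use more memory than compressed alternatives such as suffix-array or FM-index structures.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read_exact(read, reference, index, k):
    """Return reference positions where the read matches exactly.

    The first k bases of the read are used as a seed; each seed hit is then
    verified against the full read.
    """
    seed = read[:k]
    hits = []
    for pos in index.get(seed, []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTGA"
    k = 3
    index = build_kmer_index(reference, k)
    print(map_read_exact("ACGTG", reference, index, k))
```

Larger values of k reduce the number of spurious seed hits but increase the number of distinct keys in the index, which is one face of the memory versus throughput trade-off.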

Concepts: Database, Array data structure, Hash table, Data structures