Concept: Software engineering
Software produced for research, published and otherwise, suffers from a number of common problems that make it difficult or impossible to run outside the original institution or even off the primary developer’s computer. We present ten simple rules to make such software robust enough to be run by anyone, anywhere, and thereby delight your users and collaborators.
The Raven-II is a platform for collaborative research on advances in surgical robotics. Seven Universities have begun research using this platform. The Raven-II system has two three DOF spherical positioning mechanisms capable of attaching interchangeable four DOF instruments. The Raven-II software is based on open standards such as Linux and ROS to maximally facilitate software development. The mechnism is robust enough for repeated experiments and animal surgery experiments, but is not engineered to sufficient safety standards for human use. Mechanisms in place for interaction among the user community and dissemination of results include an electronic forum, an online software SVN repository, and meetings and workshops at major robotics conferences.
Flux balance analysis (FBA) is a widely used computational method for characterizing and engineering intrinsic cellular metabolism. The increasing number of its successful applications and growing popularity are possibly attributable to the availability of specific software tools for FBA. Each tool has its unique features and limitations with respect to operational environment, user-interface and supported analysis algorithms. Presented herein is an in-depth evaluation of currently available FBA applications, focusing mainly on usability, functionality, graphical representation and inter-operability. Overall, most of the applications are able to perform basic features of model creation and FBA simulation. COBRA toolbox, OptFlux and FASIMU are versatile to support advanced in silico algorithms to identify environmental and genetic targets for strain design. SurreyFBA, WEbcoli, Acorn, FAME, GEMSiRV and MetaFluxNet are the distinct tools which provide the user friendly interfaces in model handling. In terms of software architecture, FBA-SimVis and OptFlux have the flexible environments as they enable the plug-in/add-on feature to aid prospective functional extensions. Notably, an increasing trend towards the implementation of more tailored e-services such as central model repository and assistance to collaborative efforts was observed among the web-based applications with the help of advanced web-technologies. Furthermore, most recent applications such as the Model SEED, FAME, MetaFlux and MicrobesFlux have even included several routines to facilitate the reconstruction of genome-scale metabolic models. Finally, a brief discussion on the future directions of FBA applications was made for the benefit of potential tool developers.
With Next Generation Sequencing data being routinely used, evolutionary biology is transforming into a computational science. Thus, researchers have to rely on a growing number of increasingly complex software. All widely used core tools in the field have grown considerably, in terms of the number of features as well as lines of code and consequently, also with respect to software complexity. A topic that has received little attention is the software engineering quality of widely used core analysis tools. Software developers appear to rarely assess the quality of their code, and this can have potential negative consequences for end-users. To this end, we assessed the code quality of 16 highly cited and compute-intensive tools mainly written in C/C ++ (e.g., MrBayes, MAFFT, SweepFinder etc.) and JAVA (BEAST) from the broader area of evolutionary biology that are being routinely used in current data analysis pipelines. Since, the software engineering quality of the tools we analyzed is rather unsatisfying, we provide a list of best practices for improving the quality of existing tools and list techniques that can be deployed for developing reliable, high quality scientific software from scratch. Finally, we also discuss journal as well as science policy and, more importantly, funding issues that need to be addressed for improving software engineering quality as well as ensuring support for developing new and maintaining existing software. Our intention is to raise the awareness of the community regarding software engineering quality issues and to emphasize the substantial lack of funding for scientific software development.
The gender gap in science, technology, engineering, and math (STEM) engagement is large and persistent. This gap is significantly larger in technological fields such as computer science and engineering than in math and science. Gender gaps begin early; young girls report less interest and self-efficacy in technology compared with boys in elementary school. In the current study (N=96), we assessed 6-year-old children’s stereotypes about STEM fields and tested an intervention to develop girls' STEM motivation despite these stereotypes. First-grade children held stereotypes that boys were better than girls at robotics and programming but did not hold these stereotypes about math and science. Girls with stronger stereotypes about robotics and programming reported lower interest and self-efficacy in these domains. We experimentally tested whether positive experience with programming robots would lead to greater interest and self-efficacy among girls despite these stereotypes. Children were randomly assigned either to a treatment group that was given experience in programming a robot using a smartphone or to control groups (no activity or other activity). Girls given programming experience reported higher technology interest and self-efficacy compared with girls without this experience and did not exhibit a significant gender gap relative to boys' interest and self-efficacy. These findings show that children’s views mirror current American cultural messages about who excels at computer science and engineering and show the benefit of providing young girls with chances to experience technological activities.
Almost all research work in computational neuroscience involves software. As researchers try to understand ever more complex systems, there is a continual need for software with new capabilities. Because of the wide range of questions being investigated, new software is often developed rapidly by individuals or small groups. In these cases, it can be hard to demonstrate that the software gives the right results. Software developers are often open about the code they produce and willing to share it, but there is little appreciation among potential users of the great diversity of software development practices and end results, and how this affects the suitability of software tools for use in research projects. To help clarify these issues, we have reviewed a range of software tools and asked how the culture and practice of software development affects their validity and trustworthiness. We identified four key questions that can be used to categorize software projects and correlate them with the type of product that results. The first question addresses what is being produced. The other three concern why, how, and by whom the work is done. The answers to these questions show strong correlations with the nature of the software being produced, and its suitability for particular purposes. Based on our findings, we suggest ways in which current software development practice in computational neuroscience can be improved and propose checklists to help developers, reviewers, and scientists to assess the quality of software and whether particular pieces of software are ready for use in research.
The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily-accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; © taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics “Contribution Fest” undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting edge metaproteomics software.
In this commentary, we discuss smartphone apps for psychiatry and the lack of resources to assist clinicians in evaluating the utility, safety, and efficacy of apps. Evaluating an app requires new considerations that are beyond those employed in evaluating a medication or typical clinical intervention. Based on our software engineering, informatics, and clinical knowledge and experiences, we propose an evaluation framework, “ASPECTS,” to spark discussion about apps and aid clinicians in determining whether an app is Actionable, Secure, Professional, Evidence-based, Customizable, and TranSparent. Clinicians who use the ASPECTS guide will be more informed and able to make more thorough evaluations of apps.
The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.
High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework in biological and medical sciences for all basic and translational research. Processing and analyzing NGS data is challenging. NGS data are big, heterogeneous, sparse, and error prone. Although a plethora of tools for NGS data analysis has emerged in the past decade, (i) software development is still lagging behind data generation capabilities, and (ii) there is a ‘cultural’ gap between the end user and the developer.