Wikipedia is a collaboratively edited encyclopedia. One of the most popular websites on the Internet, it is known to be a frequently used source of health care information by both professionals and the lay public.
Wikipedia has quickly become one of the most frequently accessed encyclopedic references, despite the ease with which content can be changed and the potential for ‘edit wars’ surrounding controversial topics. Little is known about how this potential for controversy affects the accuracy and stability of information on scientific topics, especially those with associated political controversy. Here we present an analysis of the Wikipedia edit histories for seven scientific articles and show that topics we consider politically but not scientifically “controversial” (such as evolution and global warming) experience more frequent edits with more words changed per day than pages we consider “noncontroversial” (such as the standard model in physics or heliocentrism). For example, over the period we analyzed, the global warming page was edited on average (geometric mean ±SD) 1.9±2.7 times resulting in 110.9±10.3 words changed per day, while the standard model in physics was only edited 0.2±1.4 times resulting in 9.4±5.0 words changed per day. The high rate of change observed in these pages makes it difficult for experts to monitor accuracy and contribute time-consuming corrections, to the possible detriment of scientific accuracy. As our society turns to Wikipedia as a primary source of scientific information, it is vital we read it critically and with the understanding that the content is dynamic and vulnerable to vandalism and other shenanigans.
Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will be reached by increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in quantity of sequence data. The current resources for NGS informatics are extremely fragmented. Finding a centralized synthesis is difficult. A multitude of tools exist for NGS data analysis; however, none of these satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods in customized pipelines, an approach helped by the open-source nature of many NGS programmes. Drawing from community spirit and with the use of the Wikipedia framework, we have initiated a collaborative NGS resource: The NGS WikiBook. We have collected a sufficient amount of text to incentivize a broader community to contribute to it. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style for this dynamic material is designed for the bench biologists and non-bioinformaticians. The flexibility of online material allows the readers to ignore details in a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so readers may familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.
We describe the first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks. The newly developed architecture overcomes drawbacks of existing read-only methods that require decoding the whole file in order to read one data fragment. Our system is based on new constrained coding techniques and accompanying DNA editing methods that ensure data reliability, specificity and sensitivity of access, and at the same time provide exceptionally high data storage capacity. As a proof of concept, we encoded parts of the Wikipedia pages of six universities in the USA, and selected and edited parts of the text written in DNA corresponding to three of these schools. The results suggest that DNA is a versatile media suitable for both ultrahigh density archival and rewritable storage applications.
- Database : the journal of biological databases and curation
- Published over 3 years ago
Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59 721 human genes and 73 355 mouse genes have been imported from NCBI and 27 306 human proteins and 16 728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists.In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web.Database URL: https://www.wikidata.org/.
As one of the most commonly read online sources of medical information, Wikipedia is an influential public health platform. Its medical content, community, collaborations and challenges have been evolving since its creation in 2001, and engagement by the medical community is vital for ensuring its accuracy and completeness. Both the encyclopaedia’s internal metrics as well as external assessments of its quality indicate that its articles are highly variable, but improving. Although content can be edited by anyone, medical articles are primarily written by a core group of medical professionals. Diverse collaborative ventures have enhanced medical article quality and reach, and opportunities for partnerships are more available than ever. Nevertheless, Wikipedia’s medical content and community still face significant challenges, and a socioecological model is used to structure specific recommendations. We propose that the medical community should prioritise the accuracy of biomedical information in the world’s most consulted encyclopaedia.
We report the outcomes of BioMed Central’s public consultation on implementing open data-compliant licensing in peer-reviewed open access journals. Respondents (42) to the 2012 consultation were six to one in favor (29 in support; 5 against; 8 abstentions) of changing our authors' default open access copyright license agreement, to introduce the Creative Commons CC0 public domain waiver for data published in BioMed Central’s journals. We summarize the different questions we received in response to the consultation and our responses to them - matters such as citation, plagiarism, patient privacy, and commercial use were raised. In light of the support for open data in our journals we outline our plans to implement, in September 2013, a combined Creative Commons Attribution license for published articles (papers) and Creative Commons CC0 waiver for published data.
We describe the pioneering experience of a Spanish family pursuing the goal of understanding their own personal genetic data to the fullest possible extent using Direct to Consumer (DTC) tests. With full informed consent from the Corpas family, all genotype, exome and metagenome data from members of this family, are publicly available under a public domain Creative Commons 0 (CC0) license waiver. All scientists or companies analysing these data (“the Corpasome”) were invited to return results to the family.
Circulating levels of both seasonal and pandemic influenza require constant surveillance to ensure the health and safety of the population. While up-to-date information is critical, traditional surveillance systems can have data availability lags of up to two weeks. We introduce a novel method of estimating, in near-real time, the level of influenza-like illness (ILI) in the United States (US) by monitoring the rate of particular Wikipedia article views on a daily basis. We calculated the number of times certain influenza- or health-related Wikipedia articles were accessed each day between December 2007 and August 2013 and compared these data to official ILI activity levels provided by the Centers for Disease Control and Prevention (CDC). We developed a Poisson model that accurately estimates the level of ILI activity in the American population, up to two weeks ahead of the CDC, with an absolute average difference between the two estimates of just 0.27% over 294 weeks of data. Wikipedia-derived ILI models performed well through both abnormally high media coverage events (such as during the 2009 H1N1 pandemic) as well as unusually severe influenza seasons (such as the 2012-2013 influenza season). Wikipedia usage accurately estimated the week of peak ILI activity 17% more often than Google Flu Trends data and was often more accurate in its measure of ILI intensity. With further study, this method could potentially be implemented for continuous monitoring of ILI activity in the US and to provide support for traditional influenza surveillance tools.
Wikipedia is a gateway to knowledge. However, the extent to which this gateway ends at Wikipedia or continues via supporting citations is unknown. Wikipedia’s gateway functionality has implications for information design and education, notably in medicine. This study aims to establish benchmarks for the relative distribution and referral (click) rate of citations-as indicated by presence of a Digital Object Identifier (DOI)-from Wikipedia, with a focus on medical citations. DOIs referred from the English Wikipedia in August 2016 were obtained from Crossref.org. Next, based on a DOI’s presence on a WikiProject Medicine page, all DOIs in Wikipedia were categorized as medical (WP:MED) or non-medical (non-WP:MED). Using this categorization, referred DOIs were classified as WP:MED, non-WP:MED, or BOTH, meaning the DOI may have been referred from either category. Data were analyzed using descriptive and inferential statistics. Out of 5.2 million Wikipedia pages, 4.42% (n = 229,857) included at least one DOI. 68,870 were identified as WP:MED, with 22.14% (n = 15,250) featuring one or more DOIs. WP:MED pages featured on average 8.88 DOI citations per page, whereas non-WP:MED pages had on average 4.28 DOI citations. For DOIs only on WP:MED pages, a DOI was referred every 2,283 pageviews and for non-WP:MED pages every 2,467 pageviews. DOIs from BOTH pages accounted for 12% (n = 58,475). The referral of DOI citations found in BOTH could not be assigned to WP:MED or non-WP:MED, as the page from which the referral was made was not provided with the data. While these results cannot provide evidence of greater citation referral from WP:MED than non-WP:MED, they do provide benchmarks to assess strategies for changing referral patterns. These changes might include editors adopting new methods for designing and presenting citations or the introduction of teaching strategies that address the value of consulting citations as a tool for extending learning.