SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Uniform Resource Identifier

40

The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. However, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find that one out of five STM articles suffers from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten. We suggest that, in order to safeguard the long-term integrity of the web-based scholarly record, robust solutions to combat the reference rot problem are required. In conclusion, we provide a brief insight into the directions that are explored in this regard in the context of the Hiberlink project.

Concepts: Philosophical logic, Link rot, Web archiving, Semantics, Reference, Hypertext Transfer Protocol, Uniform Resource Identifier, World Wide Web
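The two checks described in this abstract, whether a referenced URI still responds on the live web and whether an archive holds a snapshot close to the date it was referenced, can be sketched in Python. This is a minimal illustration, not the study's method: the 14-day representativeness window and the rule that treats a missing response or any status outside the 2xx/3xx range as link rot are assumptions, and the helper names `closest_snapshot` and `is_link_rot` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Hypothetical window within which a snapshot counts as "representative".
REPRESENTATIVE_WINDOW = timedelta(days=14)


def closest_snapshot(referenced_at: datetime,
                     snapshots: Sequence[datetime],
                     window: timedelta = REPRESENTATIVE_WINDOW) -> Optional[datetime]:
    """Return the snapshot closest to the reference date, or None if no
    snapshot lies within the window on either side of that date."""
    in_window = [s for s in snapshots if abs(s - referenced_at) <= window]
    return min(in_window, key=lambda s: abs(s - referenced_at), default=None)


def is_link_rot(status_code: Optional[int]) -> bool:
    """Assumed rule: no response (None) or a status outside 2xx/3xx is link rot."""
    return status_code is None or not 200 <= status_code < 400


ref = datetime(2010, 5, 1)
snaps = [datetime(2009, 1, 1), datetime(2010, 5, 10), datetime(2010, 4, 25)]
print(closest_snapshot(ref, snaps))  # 2010-04-25 00:00:00
print(is_link_rot(404), is_link_rot(200))  # True False
```

Picking the nearest snapshot inside a fixed window is only one way to operationalise "representative"; a stricter variant could accept only snapshots taken before the reference date.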

37

Increasingly, scholarly articles contain URI references to “web at large” resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they report on. A reader who visits a web at large resource by following a URI reference in an article, some time after its publication, is led to believe that the resource’s content is representative of what the author originally referenced. However, due to the dynamic nature of the web, that may very well not be the case. We reuse a dataset from a previous study in which several authors of this paper were involved, and investigate to what extent the textual content of web at large resources referenced in a vast collection of Science, Technology, and Medicine (STM) articles published between 1997 and 2012 has remained stable since the publication of the referencing article. We do so using a two-step approach that relies on various well-established similarity measures to compare textual content. In the first step, we use 19 web archives to find snapshots of referenced web at large resources that have textual content that is representative of the state of the resource around the time of publication of the referencing paper. We find that representative snapshots exist for about 30% of all URI references. In the second step, we compare the textual content of representative snapshots with that of their live web counterparts. We find that for over 75% of references the content has drifted away from what it was when referenced. These results raise significant concerns regarding the long-term integrity of the web-based scholarly record and call for the deployment of techniques to combat these problems.

Concepts: Hypertext Transfer Protocol, Semantic Web, Website, Internet, Citation, Reference, World Wide Web, Uniform Resource Identifier
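One way to quantify content drift is to compare the text of an archived snapshot with the live page using standard similarity measures. The sketch below uses Jaccard similarity over word sets and cosine similarity over term frequencies; these are common choices, but whether they match the measures used in the study is an assumption, as is the hypothetical 0.9 threshold in `has_drifted`.

```python
import math
import re
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    """Lower-case alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a: str, b: str) -> float:
    """Size of the intersection of the two word sets over the size of their union."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    if not ca and not cb:
        return 1.0
    if not ca or not cb:
        return 0.0
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)


def has_drifted(snapshot_text: str, live_text: str, threshold: float = 0.9) -> bool:
    """Hypothetical rule: drift if either similarity falls below the threshold."""
    return min(jaccard(snapshot_text, live_text), cosine(snapshot_text, live_text)) < threshold


print(has_drifted("Project home page, version one", "This domain is for sale"))
```

Jaccard ignores how often a word occurs while cosine weights by frequency, so combining them with `min` flags a page as drifted if either view shows substantial change.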

23

The diversity of online resources storing biological data in different formats poses a challenge for bioinformaticians who need to integrate and analyse their biological data. The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint at http://sparql.wikipathways.org. Having biological pathways in the semantic web allows them to be rapidly integrated, using SPARQL queries, with data from other resources that contain information about elements present in pathways. In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web. WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API (https://dev.openphacts.org/docs) to be used in various tools for drug development. We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web.

Concepts: Biology, Uniform Resource Identifier, Web 2.0, Semantics, World Wide Web Consortium, Integral, SPARQL, Semantic Web
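Because WikiPathways is exposed through a SPARQL endpoint, a query can be sent as an HTTP GET request carrying a `query` parameter, as the SPARQL 1.1 Protocol allows. The sketch below assumes the endpoint URL given in the abstract still accepts queries at its root (it may have moved), and it assumes the `wp:` vocabulary namespace, the `wp:Pathway` class, and the `dc:title` property; check these against the published vocabularies before relying on them.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint as given in the abstract; the service may have moved since publication.
ENDPOINT = "http://sparql.wikipathways.org"

# Assumed vocabulary terms: wp:Pathway and dc:title.
QUERY = """PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?pathway ?title WHERE {
  ?pathway a wp:Pathway ;
           dc:title ?title .
} LIMIT 10"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a SPARQL 1.1 GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_query(endpoint: str, query: str) -> list:
    """Send the query and return each result row as a {variable: value} dict."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in results["results"]["bindings"]]


# Requires network access:
# for row in run_query(ENDPOINT, QUERY):
#     print(row["title"])
```

The same `run_query` helper would work against any endpoint that returns the standard SPARQL JSON results format, which is what makes mixing WikiPathways with other semantic web resources straightforward.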

15

Background: Making data available as Linked Data using the Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis.

Results: This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferenceable linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying.

Conclusions: We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the presented decision support.

Concepts: Uniform Resource Locator, Linked Data, World Wide Web, Uniform Resource Identifier, Resource Description Framework, Semantic Web
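Converting database records into RDF amounts to minting an IRI for each record and emitting one triple per property. The sketch below serialises such triples as N-Triples; the base IRI `http://example.org/chembl/` is a placeholder rather than ChEMBL-RDF's actual namespace, `CHEMBL25` is used only as an example record identifier, and only the standard `rdfs:label` property appears, so the CHEMINF and CiTO terms the paper mentions are not modelled here.

```python
from typing import Iterable, List, Tuple


class IRI(str):
    """Marks a string that must be serialised as an IRI rather than a literal."""


RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")
BASE = "http://example.org/chembl/"  # placeholder namespace, not ChEMBL-RDF's


def term(value: str) -> str:
    """Serialise an IRI as <...> and anything else as an escaped plain literal."""
    if isinstance(value, IRI):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


def molecule_triples(chembl_id: str, name: str) -> List[Tuple[str, str, str]]:
    """Mint an IRI for one molecule record and attach its name as a label."""
    subject = IRI(f"{BASE}molecule/{chembl_id}")
    return [(subject, RDFS_LABEL, name)]


print(to_ntriples(molecule_triples("CHEMBL25", "ASPIRIN")))
```

Making the data dereferenceable additionally requires a web server that answers an HTTP request for each minted IRI with that record's triples; that part is outside this sketch.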

8

DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380 000 associations between more than 16 000 genes and 13 000 diseases, which makes it one of the largest repositories of its kind currently available. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, through a Cytoscape plugin, and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/.

Concepts: Web browser, Disease, Uniform Resource Locator, Web 2.0, Resource Description Framework, Uniform Resource Identifier, World Wide Web, Semantic Web

3

Increasingly, older adults and their informal caregivers are using the Internet to search for health-related information. There is a proliferation of health information online, but its quality varies: much of it is based on exaggerated or dramatic findings and is not easily understood by consumers. The McMaster Optimal Aging Portal (Portal) was developed to provide Internet users with high-quality evidence about aging and to address some of these limitations of health information posted online. The Portal includes content for health professionals drawn from three best-in-class resources (MacPLUS, Health Evidence, and Health Systems Evidence) and four types of content specifically prepared for the general public (Evidence Summaries, Web Resource Ratings, Blog Posts, and Twitter messages).

Concepts: IP address, Uniform Resource Locator, Blog, Website, Uniform Resource Identifier, Twitter, World Wide Web, Internet

3

On the Semantic Web, in the life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use its own IRI (Internationalized Resource Identifier) for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data.

Concepts: Biology, Life, Ecology, Semantic Web, Uniform Resource Identifier
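A common workaround for mismatched identifiers is to rewrite each source-specific IRI into one canonical form before joining query results. The mapping table below is an illustrative assumption (the UniProt and Bio2RDF namespace patterns and the accession `P12345` are examples, not a curated list), and `canonical_iri` and `same_record` are hypothetical helpers.

```python
from typing import Dict, Optional

# Illustrative mapping from source-specific namespaces to one canonical namespace.
NAMESPACE_MAP: Dict[str, str] = {
    "http://purl.uniprot.org/uniprot/": "http://identifiers.org/uniprot/",
    "http://bio2rdf.org/uniprot:": "http://identifiers.org/uniprot/",
    "http://identifiers.org/uniprot/": "http://identifiers.org/uniprot/",
}


def canonical_iri(iri: str, mapping: Dict[str, str] = NAMESPACE_MAP) -> Optional[str]:
    """Rewrite a known source IRI into canonical form; None if no prefix matches."""
    # Try longer prefixes first so a specific namespace wins over a general one.
    for source_ns in sorted(mapping, key=len, reverse=True):
        if iri.startswith(source_ns):
            return mapping[source_ns] + iri[len(source_ns):]
    return None


def same_record(a: str, b: str) -> bool:
    """True when two IRIs from different sources map to the same canonical IRI."""
    ca, cb = canonical_iri(a), canonical_iri(b)
    return ca is not None and ca == cb


print(same_record("http://purl.uniprot.org/uniprot/P12345",
                  "http://bio2rdf.org/uniprot:P12345"))
```

Prefix rewriting only works when sources share the same local accession; records whose local identifiers differ need an explicit identifier-mapping table instead.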

1

Contrary to the assumption that web browsers are designed to support the user, an examination of 900,000 distinct PCs shows that web browsers form a complex ecosystem in which millions of addons collaborate and compete with each other. Addons can “sneak in” through third-party installations or get “kicked out” by their competitors without user involvement. This study examines that ecosystem quantitatively by constructing a large-scale graph whose nodes correspond to users, addons, and words (terms) that describe addon functionality. Analyzing addon interactions at the user level with the Personalized PageRank (PPR) random walk measure shows that the graph demonstrates ecological resilience. Adapting the PPR model to analyze the browser ecosystem at the level of addon manufacturers, and examining the behavior of 18 prominent manufacturers, the study shows that some addon companies are in symbiosis while others clash with each other. The results may offer insight into how other evolving internet ecosystems behave and suggest a methodology for measuring this behavior. Specifically, applying such a methodology could transform the addon market.

Concepts: Netscape, Hypertext Transfer Protocol, WorldWideWeb, Web page, Uniform Resource Identifier, Google, Web browser, World Wide Web
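Personalized PageRank scores every node by the stationary probability of a random walk that, at each step, restarts at a chosen seed node with probability `alpha` and otherwise follows a random outgoing edge. The power-iteration sketch below runs on a toy adjacency list; the graph, the restart probability of 0.15, the fixed iteration count, and the choice to send a dangling node's mass back to the seed are assumptions for illustration, not details taken from the study.

```python
from typing import Dict, List


def personalized_pagerank(graph: Dict[str, List[str]], seed: str,
                          alpha: float = 0.15, iterations: int = 100) -> Dict[str, float]:
    """Power iteration for PPR: with probability alpha the walk restarts at
    `seed`, otherwise it moves to a uniformly chosen out-neighbour."""
    nodes = set(graph)
    for neighbours in graph.values():
        nodes.update(neighbours)
    if seed not in nodes:
        raise ValueError(f"unknown seed node: {seed}")

    rank = {n: 0.0 for n in nodes}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] += alpha  # restart mass (total rank stays 1)
        for node, score in rank.items():
            neighbours = graph.get(node, [])
            if neighbours:
                share = (1 - alpha) * score / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                nxt[seed] += (1 - alpha) * score  # dangling node: jump to seed
        rank = nxt
    return rank


# Toy user-addon graph (hypothetical), with edges in both directions.
graph = {"u1": ["a1", "a2"], "u2": ["a2"], "a1": ["u1"], "a2": ["u1", "u2"]}
scores = personalized_pagerank(graph, "u1")
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

Seeding the walk at one user ranks the addons most reachable from that user; seeding it at a set of nodes belonging to one manufacturer would be one way to look at manufacturer-level interactions.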

1

Mental disorders (MDs) affect almost 1 in 4 adults at some point during their lifetime and, together with substance use disorders, are the fifth leading cause of disability-adjusted life years worldwide. People with these disorders often use the Web as an informational resource, a platform for convenient self-directed treatment, and a means for many other kinds of support. However, some features of the Web can potentially erect barriers that limit this group's access to these benefits, and little research has examined this possibility. Therefore, it is important to identify gaps in knowledge about “what” barriers exist and “how” they could be addressed, so that this knowledge can inform Web professionals who aim to ensure the Web is inclusive of this population.

Concepts: Uniform Resource Identifier, Disability, World Wide Web, Mental disorder

1

With the call for more rigorous scientific reporting, authentication, and transparency from the scientific community and funding agencies, one critical step is to make finding and identifying key resources in the published literature tractable. We discuss here the use of Research Resource Identifiers (RRIDs) as one tool to help resolve this tricky problem in reproducibility.

Concepts: Rigour, Science, Social sciences, Critical thinking, Pseudoscience, Uniform Resource Identifier, Research, Scientific method
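RRIDs appear in text as `RRID:` followed by an identifier, which makes them easy to pull out of manuscripts mechanically. The regular expression below approximates the common `PREFIX_accession` form (for example `RRID:AB_123456`, a made-up identifier); it is an assumption rather than the official RRID syntax, and it will miss identifiers that do not contain an underscore.

```python
import re
from typing import List

# Approximate pattern: "RRID:", optional space, then PREFIX_accession.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9_:-]*[A-Za-z0-9])")


def extract_rrids(text: str) -> List[str]:
    """Return unique RRIDs in order of appearance, normalised to 'RRID:<id>'."""
    seen: List[str] = []
    for rrid in RRID_PATTERN.findall(text):
        normalised = f"RRID:{rrid}"
        if normalised not in seen:
            seen.append(normalised)
    return seen


sentence = "Cells were stained (RRID:AB_123456) and imaged with software (RRID: SCR_000001)."
print(extract_rrids(sentence))
```

Requiring the identifier to end in a letter or digit keeps trailing punctuation such as a closing parenthesis or full stop out of the match.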