SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Web 2.0

624

The wide availability of user-provided content in online social media facilitates the aggregation of people around common interests, worldviews, and narratives. However, the World Wide Web (WWW) also allows for the rapid dissemination of unsubstantiated rumors and conspiracy theories that often elicit rapid, large, but naive social responses, as in the recent case of Jade Helm 15, where a simple military exercise came to be perceived as the beginning of a new civil war in the United States. In this work, we address the determinants governing misinformation spreading through a thorough quantitative analysis. In particular, we focus on how Facebook users consume information related to two distinct narratives: scientific and conspiracy news. We find that, although consumers of scientific and conspiracy stories present similar consumption patterns with respect to content, their cascade dynamics differ. Selective exposure to content is the primary driver of content diffusion and generates the formation of homogeneous clusters, i.e., “echo chambers.” Indeed, homogeneity appears to be the primary driver of content diffusion, and each echo chamber has its own cascade dynamics. Finally, we introduce a data-driven percolation model mimicking rumor spreading and show that homogeneity and polarization are the main determinants for predicting cascade size.
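
As a rough illustration of the kind of mechanism such a model captures (not the paper's fitted model), the sketch below runs a percolation-style cascade on a random network in which a rumor crosses an edge with a probability that rises with the polarization similarity, i.e., "homogeneity", of the two users. The graph size, base probability, and homogeneity weighting are invented assumptions.

```python
# Minimal sketch (not the authors' model): a percolation-style cascade on a
# random graph in which a rumor spreads across an edge with a probability
# that grows with the "homogeneity" of the two endpoints. All parameters
# below are illustrative assumptions.
import random

random.seed(0)

N, AVG_DEGREE, BASE_P = 1000, 8, 0.05

# Erdos-Renyi-style random graph, standard library only.
edges = {i: set() for i in range(N)}
p_edge = AVG_DEGREE / (N - 1)
for i in range(N):
    for j in range(i + 1, N):
        if random.random() < p_edge:
            edges[i].add(j)
            edges[j].add(i)

# Each user gets a polarization score in [-1, 1]; an edge is "homogeneous"
# when its two endpoints have similar polarization.
polarization = {i: random.uniform(-1, 1) for i in range(N)}

def transmission_prob(u, v):
    homogeneity = 1 - abs(polarization[u] - polarization[v]) / 2  # in [0, 1]
    return BASE_P + (1 - BASE_P) * homogeneity ** 4

def cascade_size(seed):
    """Run one independent-cascade realization and return how many users shared."""
    active, frontier = {seed}, [seed]
    while frontier:
        nxt = []
        for u in frontier:
            for v in edges[u]:
                if v not in active and random.random() < transmission_prob(u, v):
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

sizes = [cascade_size(random.randrange(N)) for _ in range(200)]
print("mean cascade size:", sum(sizes) / len(sizes))
```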

Concepts: Scientific method, United States, World Wide Web, Web 2.0, Oregon, Social media, Facebook, Conspiracy theory

301

The wide availability of user-provided content on online social media facilitates the aggregation of people around shared beliefs, interests, worldviews, and narratives. In spite of the enthusiastic rhetoric about so-called collective intelligence, unsubstantiated rumors and conspiracy theories, e.g., chemtrails, reptilians, or the Illuminati, are pervasive in online social networks (OSN). In this work we study, on a sample of 1.2 million individuals, how information related to very distinct narratives, i.e., mainstream scientific and conspiracy news, is consumed and shapes communities on Facebook. Our results show that polarized communities emerge around distinct types of content, and that habitual consumers of conspiracy news tend to be more focused on and confined to their specific content. To test potential biases induced by continued exposure to unsubstantiated rumors on users' content selection, we conclude our analysis by measuring how users respond to 4,709 troll posts, i.e., parodic and sarcastic imitations of conspiracy theories. We find that 77.92% of likes and 80.86% of comments come from users who usually interact with conspiracy stories.

Concepts: Scientific method, Sociology, Web 2.0, Social information processing, Social media, Social network aggregation, Facebook, Conspiracy theory

287

Altmetric measurements derived from the social web are increasingly advocated and used as early indicators of article impact and usefulness. Nevertheless, there is a lack of systematic scientific evidence that altmetrics are valid proxies of either impact or utility, although a few case studies have reported medium correlations between specific altmetrics and citation rates for individual journals or fields. To fill this gap, this study compares 11 altmetrics with Web of Science citations for samples of 76 to 208,739 PubMed articles, each with at least one altmetric mention, covering up to 1,891 journals per metric. It also introduces a simple sign test to overcome biases caused by different citation and usage windows. Statistically significant associations were found between higher metric scores and higher citations for articles with positive altmetric scores in all cases with sufficient evidence (Twitter, Facebook wall posts, research highlights, blogs, mainstream media, and forums), except perhaps for Google+ posts. Evidence was insufficient for LinkedIn, Pinterest, question-and-answer sites, and Reddit, and no conclusions should be drawn about articles with zero altmetric scores or about the strength of any correlation between altmetrics and citations. Nevertheless, comparisons between citations and metric values for articles published at different times, even within the same year, can remove or reverse this association, so publishers and scientometricians should consider the effect of time when using altmetrics to rank articles. Finally, the coverage of all the altmetrics except Twitter seems to be low, so it is not clear whether they are prevalent enough to be useful in practice.
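
To illustrate the general idea of such a sign test (not necessarily the paper's exact procedure), the sketch below pairs articles published in the same period, counts how often the article with the higher altmetric score also has more citations, and tests that count against chance with an exact binomial test. All scores and citation counts are invented.

```python
# Illustrative paired sign test: within each pair of articles published in
# the same month, does the article with the higher altmetric score also have
# more citations? Test the agreement count against chance (p = 0.5).
from math import comb

def sign_test_p_value(successes, trials):
    """Two-sided exact binomial (sign) test against p = 0.5."""
    tail = min(successes, trials - successes)
    p = sum(comb(trials, k) for k in range(tail + 1)) / 2 ** trials
    return min(1.0, 2 * p)

# Hypothetical pairs: (altmetric_a, citations_a, altmetric_b, citations_b),
# where both articles of a pair share the same publication month.
pairs = [
    (12, 30, 3, 11),
    (5, 4, 9, 20),
    (7, 15, 2, 2),
    (1, 8, 6, 25),
    (14, 40, 4, 9),
]

successes = trials = 0
for alt_a, cit_a, alt_b, cit_b in pairs:
    if alt_a == alt_b or cit_a == cit_b:
        continue  # ties carry no sign information
    trials += 1
    successes += (alt_a > alt_b) == (cit_a > cit_b)

print(f"{successes}/{trials} pairs agree, p = {sign_test_p_value(successes, trials):.3f}")
```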

Concepts: Scientific method, Utility, Web 2.0, Social network service, Mass media, Social media, Online social networking, Style guide

240

The traditional vertical system, in which information passes from sources of scientific authority down to the public through local health authorities and clinicians, risks being made obsolete by emerging technologies that facilitate rapid horizontal information sharing. The rise of Public Health 2.0 requires professional acknowledgment that a new and substantive forum for public discourse about public health exists on social media such as forums, blogs, Facebook, and Twitter.

Concepts: Public health, Health, Science, Web 2.0, Political philosophy, Social information processing, Social media, Legitimacy

207

Health care professionals, patients, caregivers, family, friends, and other supporters are increasingly joining online health communities to share information and find support. But social Web (Web 2.0) technology alone does not create a successful online community. Building and sustaining a successful community requires an enabler and strategic community management. Community management is more than moderation. The developmental life cycle of a community has four stages: inception, establishment, maturity, and mitosis. Each stage presents distinct characteristics and management needs. This paper describes the community management strategies, resources, and expertise needed to build and maintain a thriving online health community; introduces some of the challenges; and provides a guide for health organizations considering this undertaking. The paper draws on insights from an ongoing study and observation of online communities, as well as experience managing and consulting for a variety of online health communities. The discussion covers effective community-building practices relevant to each stage, such as outreach and relationship building, data collection, content creation, and other proven techniques that help ensure the survival and steady growth of an online health community.

Concepts: Health care, Health care provider, Community, Web 2.0, Community building, Health informatics, Virtual community, Social media

202

We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated in 2011 on the social network service Twitter and (2) annually surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may potentially be used to estimate real-time levels and changes in population-scale measures such as obesity rates.
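
A minimal sketch of this style of analysis is given below: a per-state happiness score computed as the mean valence of the words in its tweets, correlated with a demographic variable such as obesity rate. The word-valence scores, state samples, and rates are invented placeholders, not the study's word list or survey data.

```python
# Minimal illustration (made-up data, not the study's word list or results):
# estimate a per-state "happiness" score as the average valence of the words
# used in its tweets, then correlate it with a demographic variable.
from statistics import correlation  # Python 3.10+

# Hypothetical word-valence scores on a 1-9 scale.
valence = {"happy": 8.2, "love": 8.0, "good": 7.5, "tired": 3.5, "sad": 2.4, "hate": 2.1}

def happiness(words):
    scored = [valence[w] for w in words if w in valence]
    return sum(scored) / len(scored) if scored else None

# Hypothetical per-state word samples and obesity rates (percent).
states = {
    "A": (["happy", "love", "good", "good"], 24.0),
    "B": (["good", "tired", "sad"], 29.5),
    "C": (["love", "happy", "tired"], 26.1),
    "D": (["sad", "hate", "tired", "good"], 33.2),
}

happy_scores, obesity_rates = zip(
    *[(happiness(words), rate) for words, rate in states.values()]
)
print("Pearson r:", round(correlation(happy_scores, obesity_rates), 3))
```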

Concepts: Demography, United States, U.S. state, Demographics, Web 2.0, Social network, Social network service, Facebook

182

Web 2.0 media (e.g., Facebook, Wikipedia) are considered very valuable for communicating with citizens in times of crisis. However, in the case of infectious disease outbreaks, their value has not been determined empirically. In order to take full advantage of Web 2.0 media in such a situation, the link between these media, citizens' information behavior, and citizens' information needs has to be investigated.

Concepts: Epidemiology, Web 2.0

173

MOTIVATION: Since 2011, The Cancer Genome Atlas (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to and reproduced using their source data. However, to realize this possibility, a continually updated road map of files in the TCGA is required. Creation of such a road map represents a significant data modeling challenge, due to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of data files available doubles approximately every 7 months.

RESULTS: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium's (W3C) Resource Description Framework (RDF) and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index may be queried using SPARQL and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web-standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. The development of the TCGA Roadmap engine provided specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining, and reproducible research. These design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as CaBIG. They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals.

AVAILABILITY: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at http://bit.ly/TCGARoadmap. A video tutorial is available at http://bit.ly/TCGARoadmapTutorial.

CONTACT: robbinsd@uab.edu.
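
Because the index is exposed through a SPARQL endpoint, a client can retrieve file metadata with an ordinary SPARQL query over HTTP. The sketch below shows what such a query might look like from Python; the endpoint URL and the tcga: predicates are placeholders rather than the actual Roadmap schema, which is documented via the dashboard linked above.

```python
# Sketch of querying a SPARQL endpoint such as the one exposed by the TCGA
# Roadmap. The endpoint URL and the vocabulary (tcga:diseaseType,
# tcga:platform) are placeholders, NOT the actual Roadmap schema.
import requests

ENDPOINT = "https://example.org/tcga-roadmap/sparql"  # placeholder endpoint

QUERY = """
PREFIX tcga: <http://example.org/tcga#>   # placeholder vocabulary
SELECT ?file ?platform
WHERE {
  ?file tcga:diseaseType "OV" ;           # e.g. files for one cancer type
        tcga:platform    ?platform .
}
LIMIT 25
"""

resp = requests.post(
    ENDPOINT,
    data={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["file"]["value"], row["platform"]["value"])
```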

Concepts: Data, World Wide Web, Semantic Web, Web 2.0, Reproducibility, Resource Description Framework, The Cancer Genome Atlas, World Wide Web Consortium

170

BACKGROUND: BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research.

RESULTS: The theme of BioHackathon 2010 was the ‘Semantic Web’, and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and the interoperability of resources, and we developed tools and clients for analysis and visualization.

CONCLUSION: We provide a meeting report from BioHackathon 2010, describing the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies, from source provider, through middleware, to the end consumer.
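
As a generic illustration of what producing Semantic Web data from an existing resource can look like in practice (not a DBCLS or BioHackathon artifact), the sketch below serializes a single database record as RDF with the rdflib library; the namespace and predicates are invented placeholders.

```python
# Generic illustration of exposing a database record as Semantic Web data
# with rdflib; the namespace and predicates are made-up placeholders, not any
# DBCLS or BioHackathon vocabulary.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/lifesci#")  # placeholder namespace

g = Graph()
g.bind("ex", EX)

gene = URIRef("http://example.org/lifesci/gene/BRCA1")
g.add((gene, RDF.type, EX.Gene))
g.add((gene, EX.symbol, Literal("BRCA1")))
g.add((gene, EX.organism, Literal("Homo sapiens")))

print(g.serialize(format="turtle"))
```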

Concepts: Biology, Organism, Life, Species, Science, World Wide Web, Semantic Web, Web 2.0

168

A high-quality gold standard is vital for supervised, machine learning-based clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of crowdsourced biomedical NLP corpora has never been exceptional when compared to traditionally developed gold standards. Previously reported results on a medical named entity annotation task showed 0.68 F-measure agreement between crowdsourced and traditionally developed corpora.
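
For reference, the sketch below shows how such span-level F-measure agreement is commonly computed, treating the expert annotations as the reference set; the annotation spans are invented, and the exact matching criteria used in the cited studies may differ.

```python
# Illustration of span-level F-measure agreement between a crowdsourced and a
# traditionally developed (expert) annotation set; the spans below are made up.
def f_measure(crowd, expert):
    """Exact-match F1: crowd spans as predictions, expert spans as reference."""
    crowd, expert = set(crowd), set(expert)
    tp = len(crowd & expert)
    if not crowd or not expert or tp == 0:
        return 0.0
    precision = tp / len(crowd)
    recall = tp / len(expert)
    return 2 * precision * recall / (precision + recall)

# Hypothetical (start, end, label) spans for one clinical note.
expert_spans = {(0, 9, "Drug"), (15, 27, "Disease"), (40, 52, "Symptom")}
crowd_spans = {(0, 9, "Drug"), (15, 27, "Disease"), (33, 38, "Symptom")}

print(f"F-measure: {f_measure(crowd_spans, expert_spans):.2f}")
```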

Concepts: Costs, Linguistics, Gold, Cost, Task, Gold standard, Web 2.0, Natural language processing