We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase ‘sick of’ and the word ‘depressed’), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive ‘my’ when mentioning their ‘wife’ or ‘girlfriend’ more often than females use ‘my’ with ‘husband’ or ‘boyfriend’). To date, this represents the largest study, by an order of magnitude, of language and personality.
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 3 years ago
The wide availability of user-provided content in online social media facilitates the aggregation of people around common interests, worldviews, and narratives. However, the World Wide Web (WWW) also allows for the rapid dissemination of unsubstantiated rumors and conspiracy theories that often elicit rapid, large, but naive social responses such as the recent case of Jade Helm 15–where a simple military exercise turned out to be perceived as the beginning of a new civil war in the United States. In this work, we address the determinants governing misinformation spreading through a thorough quantitative analysis. In particular, we focus on how Facebook users consume information related to two distinct narratives: scientific and conspiracy news. We find that, although consumers of scientific and conspiracy stories present similar consumption patterns with respect to content, cascade dynamics differ. Selective exposure to content is the primary driver of content diffusion and generates the formation of homogeneous clusters, i.e., “echo chambers.” Indeed, homogeneity appears to be the primary driver for the diffusion of contents and each echo chamber has its own cascade dynamics. Finally, we introduce a data-driven percolation model mimicking rumor spreading and we show that homogeneity and polarization are the main determinants for predicting cascades' size.
The large availability of user provided contents on online social media facilitates people aggregation around shared beliefs, interests, worldviews and narratives. In spite of the enthusiastic rhetoric about the so called collective intelligence unsubstantiated rumors and conspiracy theories-e.g., chemtrails, reptilians or the Illuminati-are pervasive in online social networks (OSN). In this work we study, on a sample of 1.2 million of individuals, how information related to very distinct narratives-i.e. main stream scientific and conspiracy news-are consumed and shape communities on Facebook. Our results show that polarized communities emerge around distinct types of contents and usual consumers of conspiracy news result to be more focused and self-contained on their specific contents. To test potential biases induced by the continued exposure to unsubstantiated rumors on users' content selection, we conclude our analysis measuring how users respond to 4,709 troll information-i.e. parodistic and sarcastic imitation of conspiracy theories. We find that 77.92% of likes and 80.86% of comments are from users usually interacting with conspiracy stories.
Using a large social media dataset and open-vocabulary methods from computational linguistics, we explored differences in language use across gender, affiliation, and assertiveness. In Study 1, we analyzed topics (groups of semantically similar words) across 10 million messages from over 52,000 Facebook users. Most language differed little across gender. However, topics most associated with self-identified female participants included friends, family, and social life, whereas topics most associated with self-identified male participants included swearing, anger, discussion of objects instead of people, and the use of argumentative language. In Study 2, we plotted male- and female-linked language topics along two interpersonal dimensions prevalent in gender research: affiliation and assertiveness. In a sample of over 15,000 Facebook users, we found substantial gender differences in the use of affiliative language and slight differences in assertive language. Language used more by self-identified females was interpersonally warmer, more compassionate, polite, and-contrary to previous findings-slightly more assertive in their language use, whereas language used more by self-identified males was colder, more hostile, and impersonal. Computational linguistic analysis combined with methods to automatically label topics offer means for testing psychological theories unobtrusively at large scale.
We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated in 2011 on the social network service Twitter and (2) annually-surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly-resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may potentially be used to estimate real-time levels and changes in population-scale measures such as obesity rates.
Research studies show that social media may be valuable tools in the disease surveillance toolkit used for improving public health professionals' ability to detect disease outbreaks faster than traditional methods and to enhance outbreak response. A social media work group, consisting of surveillance practitioners, academic researchers, and other subject matter experts convened by the International Society for Disease Surveillance, conducted a systematic primary literature review using the PRISMA framework to identify research, published through February 2013, answering either of the following questions: Can social media be integrated into disease surveillance practice and outbreak management to support and improve public health?Can social media be used to effectively target populations, specifically vulnerable populations, to test an intervention and interact with a community to improve health outcomes?Examples of social media included are Facebook, MySpace, microblogs (e.g., Twitter), blogs, and discussion forums. For Question 1, 33 manuscripts were identified, starting in 2009 with topics on Influenza-like Illnesses (n = 15), Infectious Diseases (n = 6), Non-infectious Diseases (n = 4), Medication and Vaccines (n = 3), and Other (n = 5). For Question 2, 32 manuscripts were identified, the first in 2000 with topics on Health Risk Behaviors (n = 10), Infectious Diseases (n = 3), Non-infectious Diseases (n = 9), and Other (n = 10).
Ambient awareness refers to the awareness social media users develop of their online network in result of being constantly exposed to social information, such as microblogging updates. Although each individual bit of information can seem like random noise, their incessant reception can amass to a coherent representation of social others. Despite its growing popularity and important implications for social media research, ambient awareness on public social media has not been studied empirically. We provide evidence for the occurrence of ambient awareness and examine key questions related to its content and functions. A diverse sample of participants reported experiencing awareness, both as a general feeling towards their network as a whole, and as knowledge of individual members of the network, whom they had not met in real life. Our results indicate that ambient awareness can develop peripherally, from fragmented information and in the relative absence of extensive one-to-one communication. We report the effects of demographics, media use, and network variables and discuss the implications of ambient awareness for relational and informational processes online.
Publish/subscribe is a communication paradigm where loosely-coupled clients communicate in an asynchronous fashion. Publish/subscribe supports the flexible development of large-scale, event-driven and ubiquitous systems. Publish/subscribe is prevalent in a number of application domains such as social networking, distributed business processes and real-time mission-critical systems. Many publish/subscribe applications are sensitive to message loss and violation of privacy. To overcome such issues, we propose a novel method of using secret sharing and replication techniques. This is to reliably and confidentially deliver decryption keys along with encrypted publications even under the presence of several Byzantine brokers across publish/subscribe overlay networks. We also propose a framework for dynamically and strategically allocating broker replicas based on flexibly definable criteria for reliability and performance. Moreover, a thorough evaluation is done through a case study on social networks using the real trace of interactions among Facebook users.
Exposure to news, opinion and civic information increasingly occurs through social media. How do these online networks influence exposure to perspectives that cut across ideological lines? Using de-identified data, we examined how 10.1 million U.S. Facebook users interact with socially shared news. We directly measured ideological homophily in friend networks, and examine the extent to which heterogeneous friends could potentially expose individuals to cross-cutting content. We then quantified the extent to which individuals encounter comparatively more or less diverse content while interacting via Facebook’s algorithmically ranked News Feed, and further studied users' choices to click through to ideologically discordant content. Compared to algorithmic ranking, individuals' choices about what to consume had a stronger effect limiting exposure to cross-cutting content.
Face-to-face social interactions enhance well-being. With the ubiquity of social media, important questions have arisen about the impact of online social interactions. In the present study, we assessed the associations of both online and offline social networks with several subjective measures of well-being. We used 3 waves (2013, 2014, and 2015) of data from 5,208 subjects in the nationally representative Gallup Panel Social Network Study survey, including social network measures, in combination with objective measures of Facebook use. We investigated the associations of Facebook activity and real-world social network activity with self-reported physical health, self-reported mental health, self-reported life satisfaction, and body mass index. Our results showed that overall, the use of Facebook was negatively associated with well-being. For example, a 1-standard-deviation increase in “likes clicked” (clicking “like” on someone else’s content), “links clicked” (clicking a link to another site or article), or “status updates” (updating one’s own Facebook status) was associated with a decrease of 5%-8% of a standard deviation in self-reported mental health. These associations were robust to multivariate cross-sectional analyses, as well as to 2-wave prospective analyses. The negative associations of Facebook use were comparable to or greater in magnitude than the positive impact of offline interactions, which suggests a possible tradeoff between offline and online relationships.