Using a large social media dataset and open-vocabulary methods from computational linguistics, we explored differences in language use across gender, affiliation, and assertiveness. In Study 1, we analyzed topics (groups of semantically similar words) across 10 million messages from over 52,000 Facebook users. Most language differed little across gender. However, topics most associated with self-identified female participants included friends, family, and social life, whereas topics most associated with self-identified male participants included swearing, anger, discussion of objects instead of people, and the use of argumentative language. In Study 2, we plotted male- and female-linked language topics along two interpersonal dimensions prevalent in gender research: affiliation and assertiveness. In a sample of over 15,000 Facebook users, we found substantial gender differences in the use of affiliative language and slight differences in assertive language. Language used more by self-identified females was interpersonally warmer, more compassionate, polite, and (contrary to previous findings) slightly more assertive, whereas language used more by self-identified males was colder, more hostile, and impersonal. Computational linguistic analysis combined with methods to automatically label topics offers a means for testing psychological theories unobtrusively at large scale.
We contrasted the predictive power of three measures of semantic richness (number of features [NF], contextual dispersion [CD], and a novel measure, number of semantic neighbors [NSN]) for a large set of concrete and abstract concepts on lexical decision and naming tasks. NSN (but not NF) facilitated processing for abstract concepts, while NF (but not NSN) facilitated processing for the most concrete concepts, consistent with claims that linguistic information is more relevant for abstract concepts in early processing. Additionally, converging evidence from two datasets suggests that when NSN and CD are controlled for, the features that most facilitate processing are those associated with a concept’s physical characteristics and real-world contexts. These results suggest that rich linguistic contexts (many semantic neighbors) facilitate early activation of abstract concepts, whereas concrete concepts benefit more from rich physical contexts (many associated objects and locations).
Adequate normalization minimizes the effects of systematic technical variations and is a prerequisite for detecting meaningful biological changes. However, reports on the performance of miRNA normalization methods, and the recommendations drawn from them, are inconsistent. Thus, we investigated the impact of seven different normalization methods (reference gene index, global geometric mean, quantile, invariant selection, loess, loessM, and generalized procrustes analysis) on intra- and inter-platform performance of two distinct and commonly used miRNA profiling platforms.
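To make one of the listed methods concrete, the following is a minimal sketch of generic quantile normalization. It is not the study's implementation: the function name `quantile_normalize`, the list-of-samples layout, and the tie handling (ties broken by position) are assumptions made for illustration.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Hypothetical helper for illustration; ties are broken by position,
    which is a simplification of what statistical packages usually do.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of features")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)
    ]

    normalized = []
    for s in samples:
        # Indices of this sample's values, ordered from smallest to largest.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each value by the reference value of the same rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization every sample has the same set of values (the reference distribution), so only the within-sample ranking of each feature is retained.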
The Food Choice Questionnaire (FCQ) assesses the importance that subjects attribute to nine factors related to food choices: health, mood, convenience, sensory appeal, natural content, price, weight control, familiarity and ethical concern. This study sought to assess the applicability of the FCQ in Brazil; it describes the translation and cultural adaptation from English into Portuguese of the FCQ via the following steps: independent translations, consensus, back-translation, evaluation by a committee of experts, semantic validation and pre-test. The pre-test was run with a randomly sampled group of 86 male and female college students from different courses with a median age of 19. Slight differences between the versions were observed and adjustments were made. After minor changes in the translation process, the committee of experts considered that the Brazilian Portuguese version was semantically and conceptually equivalent to the English original. Semantic validation showed that the questionnaire is easily understood. The instrument presented a high degree of internal consistency. The study is the first stage in the process of validating an instrument, which consists of face and content validity. Further stages, already underway, are needed before other researchers can use it.
The aim of this research was to study semantic abilities and their loss in mild cognitive impairment (MCI) and in dementia, while analyzing efficiency in the use of associative relations, within verbal and visuoperceptual modalities. Participants were split into 4 groups: 19 participants with amnestic MCI, 16 patients with mild Alzheimer disease (AD), 20 patients with moderate AD, and 20 healthy controls (HCs). All participants performed standardized neuropsychological tests and experimental (naming and semantic associations) tasks to evaluate verbal and visuoperceptual semantic abilities. We analyzed 4 associative relations (part/whole, function, superordinate, and contiguity) in both verbal and visuoperceptual code. Our results suggest a progressive impairment in semantic categorization knowledge, with worse performance in the AD groups relative to the MCI and HC groups. Our data show a different pattern in the 4 associative relations and the involvement of associative semantic relations already in the early stage of disease, as well as a different pattern of deterioration between verbal and visuoperceptual modalities. Our data indicate that the visuoperceptual semantic network appears to be less deteriorated than the verbal network in AD. The verbal semantic network may be more sensitive in detecting patients at an early stage of the disease.
A number of studies on network analysis have focused on language networks based on free word association, which reflects human lexical knowledge, and have demonstrated the small-world and scale-free properties of the word association network. Nevertheless, there have been very few attempts at applying network analysis to distributional semantic models, despite the fact that these models have been studied extensively as computational or cognitive models of human lexical knowledge. In this paper, we analyze three network properties, namely, small-world, scale-free, and hierarchical properties, of semantic networks created by distributional semantic models. We demonstrate that the created networks generally exhibit the same properties as word association networks. In particular, we show that the distribution of the number of connections in these networks follows a truncated power law, which is also observed in an association network. This indicates that distributional semantic models can provide a plausible model of lexical knowledge. Additionally, the observed differences in the network properties of various implementations of distributional semantic models are consistently explained or predicted by considering the intrinsic semantic features of a word-context matrix and the functions of matrix weighting and smoothing. Furthermore, to simulate a semantic network with the observed network properties, we propose a new growing network model based on the model of Steyvers and Tenenbaum. The idea underlying the proposed model is that both preferential and random attachments are required to reflect different types of semantic relations in the network growth process. We demonstrate that this model provides a better explanation of network behaviors generated by distributional semantic models.
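The idea of mixing preferential and random attachment can be illustrated with a generic growth process. The sketch below is not the model proposed in the paper, nor the Steyvers and Tenenbaum model; the function `grow_network` and its parameters `m` (links per new node) and `p_pref` (probability of choosing a target preferentially) are hypothetical names chosen for illustration.

```python
import random


def grow_network(n_nodes, m=2, p_pref=0.5, seed=0):
    """Grow an undirected network by mixing two attachment rules.

    Illustrative only: each new node makes m links; each link target is
    chosen in proportion to degree with probability p_pref, and uniformly
    at random otherwise. Requires n_nodes > m.
    """
    rng = random.Random(seed)

    # Start from a fully connected seed of m + 1 nodes (each has degree m).
    edges = {(j, i) for i in range(m + 1) for j in range(i)}
    degree = {i: m for i in range(m + 1)}

    for new in range(m + 1, n_nodes):
        existing = list(range(new))
        targets = set()
        while len(targets) < m:
            if rng.random() < p_pref:
                # Preferential attachment: favor well-connected nodes.
                weights = [degree[v] for v in existing]
                target = rng.choices(existing, weights=weights, k=1)[0]
            else:
                # Random attachment: every existing node is equally likely.
                target = rng.choice(existing)
            targets.add(target)
        for target in targets:
            edges.add((target, new))
            degree[target] += 1
        degree[new] = m
    return edges, degree
```

Setting `p_pref=1.0` gives purely preferential growth and `p_pref=0.0` purely random growth; comparing the resulting degree distributions is one way to explore how the mixture shapes the network.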
Why do people self-report an aversion to words like “moist”? The present studies represent an initial scientific exploration into the phenomenon of word aversion by investigating its prevalence and cause. Results of five experiments indicate that about 10-20% of the population is averse to the word “moist.” This population often speculates that phonological properties of the word are the cause of their displeasure. However, data from the current studies point to semantic features of the word (namely, associations with disgusting bodily functions) as a more prominent source of people's unpleasant experience. “Moist,” for averse participants, was notable for its valence and personal use, rather than imagery or arousal, a finding that was confirmed by an experiment designed to induce an aversion to the word. Analyses of individual difference measures suggest that word aversion is more prevalent among younger, more educated, and more neurotic people, and is more commonly reported by females than males.
The meaning of language is represented in regions of the cerebral cortex collectively known as the ‘semantic system’. However, little of the semantic system has been mapped comprehensively, and the semantic selectivity of most regions is unknown. Here we systematically map semantic selectivity across the cortex using voxel-wise modelling of functional MRI (fMRI) data collected while subjects listened to hours of narrative stories. We show that the semantic system is organized into intricate patterns that seem to be consistent across individuals. We then use a novel generative model to create a detailed semantic atlas. Our results suggest that most areas within the semantic system represent information about specific semantic domains, or groups of related concepts, and our atlas shows which domains are represented in each area. This study demonstrates that data-driven methods–commonplace in studies of human neuroanatomy and functional connectivity–provide a powerful and efficient means for mapping functional representations in the brain.
Causal inference is a core task of science. However, authors and editors often refrain from explicitly acknowledging the causal goal of research projects; they refer to causal effect estimates as associational estimates. This commentary argues that using the term “causal” is necessary to improve the quality of observational research. Specifically, being explicit about the causal objective of a study reduces ambiguity in the scientific question, errors in the data analysis, and excesses in the interpretation of the results. (Am J Public Health. Published online ahead of print March 22, 2018: e1-e4. doi:10.2105/AJPH.2018.304337).
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 3 years ago
How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides indirect access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here, we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries to translate words to and from languages carefully selected to be representative of worldwide diversity. These translations reveal cases where a particular language uses a single “polysemous” word to express multiple concepts that another language represents using distinct words. We use the frequency of such polysemies linking two concepts as a measure of their semantic proximity and represent the pattern of these linkages by a weighted network. This network is highly structured: Certain concepts are far more prone to polysemy than others, and naturally interpretable clusters of closely related concepts emerge. Statistical analysis of the polysemies observed in a subset of the basic vocabulary shows that these structural properties are consistent across different language groups, and largely independent of geography, environment, and the presence or absence of a literary tradition. The methods developed here can be applied to any semantic domain to reveal the extent to which its conceptual structure is, similarly, a universal attribute of human cognition and language use.
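The weighted network described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function `polysemy_network` and the input layout (a mapping from a hypothetical `(language, word)` pair to the set of concepts that word expresses) are invented for the example.

```python
from collections import Counter
from itertools import combinations


def polysemy_network(lexicon):
    """Count, for each pair of concepts, how many words express both.

    lexicon maps a (language, word) pair to the set of concepts that the
    word can express. This layout is an assumption for illustration.
    Returns a Counter keyed by alphabetically ordered concept pairs.
    """
    weights = Counter()
    for concepts in lexicon.values():
        # A word covering k concepts links every pair among those k concepts.
        for a, b in combinations(sorted(concepts), 2):
            weights[(a, b)] += 1
    return weights
```

Here a higher edge weight means that more words across the sampled languages cover both concepts, which is the sense in which polysemy frequency serves as a proxy for semantic proximity.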