We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes, yielding results that are face valid (e.g., subjects living at high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase ‘sick of’ and the word ‘depressed’), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (e.g., males use the possessive ‘my’ when mentioning their ‘wife’ or ‘girlfriend’ more often than females use ‘my’ with ‘husband’ or ‘boyfriend’). To date, this represents the largest study, by an order of magnitude, of language and personality.
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 5 years ago
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
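As a rough illustration of how per-word evaluations could feed an instrument that measures the emotional content of a text, the sketch below computes a frequency-weighted mean rating. It is a minimal sketch, not the authors' instrument: the function name `mean_text_score`, the whitespace tokenization, and the toy ratings (with their 1 to 9 scale) are all assumptions made for the example.

```python
from collections import Counter

def mean_text_score(text, word_scores):
    """Frequency-weighted mean rating of the rated words in `text`.

    Words without a rating are ignored. Returns None when no word
    in the text has a rating.
    """
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    counts = Counter(text.lower().split())
    rated = {word: n for word, n in counts.items() if word in word_scores}
    total = sum(rated.values())
    if total == 0:
        return None
    return sum(word_scores[word] * n for word, n in rated.items()) / total

# Hypothetical ratings on an assumed 1-9 scale, for illustration only.
toy_scores = {"happy": 8.0, "love": 8.5, "rain": 5.0, "sad": 2.5}

if __name__ == "__main__":
    print(mean_text_score("happy happy sad", toy_scores))
```

Averaging over word occurrences rather than distinct words means that frequently used words dominate the score, which is one plausible design choice among several.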
The interaction between language and motor action was studied by examining the effect of action verbs, kinaesthetic imagery and mental subtraction on the performance of a complex movement, the squat vertical jump (SVJ). The height of the SVJ was derived from the time of flight, which was measured with Optojump® and Myotest® devices. SVJ performance improved significantly after subjects loudly or silently pronounced, heard or read the verb saute ("jump" in French). Action verbs specific to other motor actions (pince = pinch, lèche = lick) or non-specific (bouge = move) showed little or no effect. A verb that was meaningless to the French subjects (tiáo = jump in Chinese) showed no effect, as did rêve (dream), tombe (fall) and stop. The verb gagne (win) significantly improved SVJ height, as did its antonym perds (lose), suggesting a possible influence of affect on the subjects' performance. The effect of the specific action verb jump on SVJ height was similar to that obtained after kinaesthetic imagery and after mental subtraction of two-digit numbers from three-digit ones; in the latter case, possibly because language is involved in calculation. The specific action verb jump thus appears to be effective, but not exclusively so, in enhancing SVJ performance. The results imply an interaction between language and motor brain areas in the performance of a complex movement, with a clear specificity of the corresponding action verb. The effect on performance is probably influenced by the subjects' intention, increased attention and emotion produced by cognitive stimuli, among them action verbs.
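The step from time of flight to jump height can be sketched with elementary projectile motion. Assuming the body's centre of mass leaves and returns to the same height and air resistance is negligible, the rise lasts half the flight time t, so h = g·(t/2)²/2 = g·t²/8. The function name and the value of g below are assumptions for this example; the abstract does not state how the devices compute height.

```python
G = 9.81  # standard gravitational acceleration in m/s^2 (assumed value)

def jump_height_from_flight_time(flight_time_s, g=G):
    """Jump height in metres from time of flight in seconds.

    Assumes take-off and landing occur at the same centre-of-mass
    height, so the rise takes half the flight time: h = g * t**2 / 8.
    """
    if flight_time_s < 0:
        raise ValueError("flight time must be non-negative")
    return g * flight_time_s ** 2 / 8.0

if __name__ == "__main__":
    # A flight time of 0.5 s corresponds to roughly 0.31 m.
    print(round(jump_height_from_flight_time(0.5), 3))
```

Because height grows with the square of flight time, small timing errors at take-off or landing translate into proportionally larger errors in the estimated height.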
In contrast with animal communication systems, diversity is characteristic of almost every aspect of human language. Languages variously employ tones, clicks, or manual signs to signal differences in meaning; some languages lack the noun-verb distinction (e.g., Straits Salish), whereas others have a proliferation of fine-grained syntactic categories (e.g., Tzeltal); and some languages do without morphology (e.g., Mandarin), while others pack a whole sentence into a single word (e.g., Cayuga). A challenge for evolutionary biology is to reconcile the diversity of languages with the high degree of biological uniformity of their speakers. Here, we model processes of language change and geographical dispersion and find a consistent pressure for flexible learning, irrespective of the language being spoken. This pressure arises because flexible learners can best cope with the observed high rates of linguistic change associated with divergent cultural evolution following human migration. Thus, rather than genetic adaptations for specific aspects of language, such as recursion, the coevolution of genes and fast-changing linguistic structure provides the biological basis for linguistic diversity. Only biological adaptations for flexible learning combined with cultural evolution can explain how each child has the potential to learn any human language.
We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use, which show a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This “cooling pattern” forms the basis of a third statistical regularity, which, unlike the Zipf and Heaps laws, is dynamical in nature.
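The two static regularities mentioned here can be illustrated on a token list. Zipf's law relates a word's frequency to its frequency rank (roughly a power law), and Heaps' law relates vocabulary size V to corpus size N as V ∝ N^β, where β < 1 corresponds to a decreasing marginal need for new words. The sketch below is a toy illustration under those textbook definitions, not the authors' pipeline; the function names and the ordinary least-squares fit in log-log space are assumptions made for the example.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Word counts sorted from most to least frequent (Zipf curve)."""
    return sorted(Counter(tokens).values(), reverse=True)

def vocabulary_growth(tokens):
    """(corpus size, vocabulary size) after each token (Heaps curve)."""
    seen = set()
    growth = []
    for size, token in enumerate(tokens, start=1):
        seen.add(token)
        growth.append((size, len(seen)))
    return growth

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x).

    Applied to a Heaps curve this estimates the exponent beta;
    requires positive values and at least two distinct x values.
    """
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

if __name__ == "__main__":
    tokens = "the cat saw the dog and the dog saw the cat".split()
    print(rank_frequency(tokens))
    print(loglog_slope(vocabulary_growth(tokens)))
```

A single straight-line fit is a simplification: the abstract reports two scaling regimes, so a faithful analysis would fit the common and rare words separately.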
Cognitive models claim that spoken words are recognized by an optimally efficient sequential analysis process. Evidence for this is the finding that nonwords are recognized as soon as they deviate from all real words (Marslen-Wilson 1984), reflecting continuous evaluation of speech inputs against lexical representations. Here, we investigate the brain mechanisms supporting this core aspect of word recognition and examine the processes of competition and selection among multiple word candidates. Based on new behavioral support for optimal efficiency in lexical access from speech, a functional magnetic resonance imaging study showed that words with later nonword points generated increased activation in the left superior and middle temporal gyrus (Brodmann area [BA] 21/22), implicating these regions in dynamic sound-meaning mapping. We investigated competition and selection by manipulating the number of initially activated word candidates (competition) and their later drop-out rate (selection). Increased lexical competition enhanced activity in bilateral ventral inferior frontal gyrus (BA 47/45), while increased lexical selection demands activated bilateral dorsal inferior frontal gyrus (BA 44/45). These findings indicate functional differentiation of the fronto-temporal systems for processing spoken language, with left middle temporal gyrus (MTG) and superior temporal gyrus (STG) involved in mapping sounds to meaning, bilateral ventral inferior frontal gyrus (IFG) engaged in less constrained early competition processing, and bilateral dorsal IFG engaged in later, more fine-grained selection processes.
- Philosophical transactions of the Royal Society of London. Series B, Biological sciences
- Published over 6 years ago
Many children with specific language impairment (SLI) have persisting problems in the correct use of verb tense, but there has been disagreement as to the underlying reason. When we take into account studies using receptive as well as expressive language tasks, the data suggest that the difficulty for children with SLI is in knowing when to inflect verbs for tense, rather than how to do so. This is perhaps not surprising when we consider that tense does not have a transparent semantic interpretation, but depends on complex relationships between inflections and hierarchically organized clauses. An explanation in terms of syntactic limitations contrasts with a popular morpho-phonological account, the Words and Rules model. This model, which attributes problems to difficulties with applying a rule to generate regular inflected forms, has been widely applied to adult-acquired disorders. There are striking similarities in the pattern of errors in adults with anterior aphasia and children with SLI, suggesting that impairments in appreciation of when to mark tense may apply to acquired as well as developmental disorders.
During speech processing, human listeners can separately analyze lexical and intonational cues to arrive at a unified representation of communicative content. The evolution of this capacity can be best investigated by comparative studies. Using functional magnetic resonance imaging, we explored whether and how dog brains segregate and integrate lexical and intonational information. We found a left-hemisphere bias for processing meaningful words, independently of intonation; a right auditory brain region for distinguishing intonationally marked and unmarked words; and increased activity in primary reward regions only when both lexical and intonational information were consistent with praise. Neural mechanisms to separately analyze and integrate word meaning and intonation in dogs suggest that this capacity can evolve in the absence of language.
Mastering multiple languages is an increasingly important ability in the modern world; furthermore, multilingualism may affect human learning abilities. Here, we test how the brain’s capacity to rapidly form new representations for spoken words is affected by prior individual experience in non-native language acquisition. Formation of new word memory traces is reflected in a neurophysiological response increase during a short exposure to a novel lexicon. Therefore, we recorded changes in electrophysiological responses to phonologically native and non-native novel word-forms during a perceptual learning session, in which novel stimuli were repetitively presented to healthy adults in either ignore or attend conditions. We found that a larger number of previously acquired languages and an earlier average age of acquisition (AoA) predicted a greater response increase to novel non-native word-forms. This suggests that early and extensive language experience is associated with greater neural flexibility for acquiring novel words with unfamiliar phonology. Conversely, later AoA was associated with a stronger response increase for phonologically native novel word-forms, indicating better tuning of neural linguistic circuits to native phonology. The results suggest that individual language experience has a strong effect on the neural mechanisms of word learning, and that it interacts with the phonological familiarity of the novel lexicon.
Although most studies of language learning take place in quiet laboratory settings, everyday language learning occurs under noisy conditions. The current research investigated the effects of background speech on word learning. Both younger (22- to 24-month-olds; n = 40) and older (28- to 30-month-olds; n = 40) toddlers successfully learned novel label-object pairings when target speech was 10 dB louder than background speech but not when the signal-to-noise ratio (SNR) was 5 dB. Toddlers (28- to 30-month-olds; n = 26) successfully learned novel words with a 5-dB SNR when they initially heard the labels embedded in fluent speech without background noise, before they were mapped to objects. The results point to both challenges and protective factors that may impact language learning in complex auditory environments.
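To give a sense of scale for the two signal-to-noise ratios compared here, the sketch below converts decibels to linear ratios using the standard definitions (power ratio 10^(dB/10), amplitude ratio 10^(dB/20)). The function names are assumptions for the example, and the study's own calibration procedure is not described in the abstract.

```python
def db_to_power_ratio(db):
    """Linear power ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Linear amplitude (sound pressure) ratio for a difference in dB."""
    return 10 ** (db / 20)

if __name__ == "__main__":
    for snr_db in (10, 5):
        print(
            snr_db,
            round(db_to_power_ratio(snr_db), 2),
            round(db_to_amplitude_ratio(snr_db), 2),
        )
```

Under these definitions, a 10 dB SNR means the target speech carries ten times the power of the background speech, while a 5 dB SNR corresponds to roughly three times the power.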