We report here trends in the usage of “mood” words, that is, words carrying emotional content, in 20th century English language books, using the data set provided by Google that includes word frequencies in roughly 4% of all books published up to the year 2008. We find evidence for distinct historical periods of positive and negative moods, underlain by a general decrease in the use of emotion-related words through time. Finally, we show that, in books, American English has become decidedly more “emotional” than British English in the last half-century, as a part of a more general increase of the stylistic divergence between the two variants of English language.
Crowdsourcing linguistic phenomena with smartphone applications is relatively new. In linguistics, apps have predominantly been developed to create pronunciation dictionaries, to train acoustic models, and to archive endangered languages. This paper presents the first account of how apps can be used to collect data suitable for documenting language change: we created an app, Dialäkt Äpp (DÄ), which predicts users' dialects. For 16 linguistic variables, users select a dialectal variant from a drop-down menu. DÄ then geographically locates the user’s dialect by suggesting a list of communes where dialect variants most similar to their choices are used. Underlying this prediction are 16 maps from the historical Linguistic Atlas of German-speaking Switzerland, which documents the linguistic situation around 1950. Where users disagree with the prediction, they can indicate what they consider to be their dialect’s location. With this information, the 16 variables can be assessed for language change. Thanks to the playfulness of its functionality, DÄ has reached many users; our linguistic analyses are based on data from nearly 60,000 speakers. Results reveal a relative stability for phonetic variables, while lexical and morphological variables seem more prone to change. Crowdsourcing large amounts of dialect data with smartphone apps has the potential to complement existing data collection techniques and to provide evidence that traditional methods cannot, with normal resources, hope to gather. Nonetheless, it is important to emphasize a range of methodological caveats, including sparse knowledge of users' linguistic backgrounds (users only indicate age, sex) and users' self-declaration of their dialect. These are discussed and evaluated in detail here. Findings remain intriguing nevertheless: as a means of quality control, we report that traditional dialectological methods have revealed trends similar to those found by the app. This underlines the validity of the crowdsourcing method. We are presently extending DÄ architecture to other languages.
While it is recognized that language can pose a barrier to the transfer of scientific knowledge, the convergence on English as the global language of science may suggest that this problem has been resolved. However, our survey searching Google Scholar in 16 languages revealed that 35.6% of 75,513 scientific documents on biodiversity conservation published in 2014 were not in English. Ignoring such non-English knowledge can cause biases in our understanding of study systems. Furthermore, as publication in English has become prevalent, scientific knowledge is often unavailable in local languages. This hinders its use by field practitioners and policy makers for local environmental issues; 54% of protected area directors in Spain identified languages as a barrier. We urge scientific communities to make a more concerted effort to tackle this problem and propose potential approaches both for compiling non-English scientific knowledge effectively and for enhancing the multilingualization of new and existing knowledge available only in English for the users of such knowledge.
For the 20(th) century since the Depression, we find a strong correlation between a ‘literary misery index’ derived from English language books and a moving average of the previous decade of the annual U.S. economic misery index, which is the sum of inflation and unemployment rates. We find a peak in the goodness of fit at 11 years for the moving average. The fit between the two misery indices holds when using different techniques to measure the literary misery index, and this fit is significantly better than other possible correlations with different emotion indices. To check the robustness of the results, we also analysed books written in German language and obtained very similar correlations with the German economic misery index. The results suggest that millions of books published every year average the authors' shared economic experiences over the past decade.
- Proceedings of the National Academy of Sciences of the United States of America
- Published almost 5 years ago
Languages vary enormously in global importance because of historical, demographic, political, and technological forces. However, beyond simple measures of population and economic power, there has been no rigorous quantitative way to define the global influence of languages. Here we use the structure of the networks connecting multilingual speakers and translated texts, as expressed in book translations, multiple language editions of Wikipedia, and Twitter, to provide a concept of language importance that goes beyond simple economic or demographic measures. We find that the structure of these three global language networks (GLNs) is centered on English as a global hub and around a handful of intermediate hub languages, which include Spanish, German, French, Russian, Portuguese, and Chinese. We validate the measure of a language’s centrality in the three GLNs by showing that it exhibits a strong correlation with two independent measures of the number of famous people born in the countries associated with that language. These results suggest that the position of a language in the GLN contributes to the visibility of its speakers and the global popularity of the cultural content they produce.
The Spanish trill /r/ is typically described as having a single realization-that is, as the result of aerodynamic forces operating on a tongue tip/blade constriction placed midsagittally in the dental, alveolar, or postalveolar place of articulation, in such a way that air channels along the sides of the tongue open and close for multiple cycles of vibration. As with American English /ɹ/, this sound is acquired late by typically developing children and is frequently an element in articulatory disorders. As with American English /ɹ/, perceptually equivalent “correct” trill /r/’s may be realized differently by different speakers. Knowledge of these alternate “correct” realizations would clearly be helpful to clinicians and learners of Spanish. In this preliminary study, we report data from ultrasound images of individuals who speak different dialects of Spanish. Preliminary data suggests there are at least two different articulatory postures used when producing the Spanish trill /r/, one of which involves lateralization. These articulatory differences do not affect what native listeners categorize as perceptually correct Spanish trills.
This study compares the duration and first two formants (F1 and F2) of 11 nominal monophthongs and five nominal diphthongs in Standard Southern British English (SSBE) and a Northern English dialect. F1 and F2 trajectories were fitted with parametric curves using the discrete cosine transform (DCT) and the zeroth DCT coefficient represented formant trajectory means and the first DCT coefficient represented the magnitude and direction of formant trajectory change to characterize vowel inherent spectral change (VISC). Cross-dialectal comparisons involving these measures revealed significant differences for the phonologically back monophthongs /ɒ, ɔː, ʊ, uː/ and also /зː/ and the diphthongs /eɪ, əʊ, aɪ, ɔɪ/. Most cross-dialectal differences are in zeroth DCT coefficients, suggesting formant trajectory means tend to characterize such differences, while first DCT coefficient differences were more numerous for diphthongs. With respect to VISC, the most striking differences are that /uː/ is considerably more diphthongized in the Northern dialect and that the F2 trajectory of /əʊ/ proceeds in opposite directions in the two dialects. Cross-dialectal differences were found to be largely unaffected by the consonantal context in which the vowels were produced. The implications of the results are discussed in relation to VISC, consonantal context effects and speech perception.
Pitch processing at cortical and subcortical stages of processing is shaped by language experience. We recently demonstrated that specific components of the cortical pitch response (CPR) index the more rapidly-changing portions of the high rising Tone 2 of Mandarin Chinese, in addition to marking pitch onset and sound offset. In this study, we examine how language experience (Mandarin vs. English) shapes the processing of different temporal attributes of pitch reflected in the CPR components using stimuli representative of within-category variants of Tone 2. Results showed that the magnitude of CPR components (Na-Pb and Pb-Nb) and the correlation between these two components and pitch acceleration were stronger for the Chinese listeners compared to English listeners for stimuli that fell within the range of Tone 2 citation forms. Discriminant function analysis revealed that the Na-Pb component was more than twice as important as Pb-Nb in grouping listeners by language affiliation. In addition, a stronger stimulus-dependent, rightward asymmetry was observed for the Chinese group at the temporal, but not frontal, electrode sites. This finding may reflect selective recruitment of experience-dependent, pitch-specific mechanisms in right auditory cortex to extract more complex, time-varying pitch patterns. Taken together, these findings suggest that long-term language experience shapes early sensory level processing of pitch in the auditory cortex, and that the sensitivity of the CPR may vary depending on the relative linguistic importance of specific temporal attributes of dynamic pitch.
The purpose of this study was to determine whether there are dialectal and gender related differences in nasalance of main Mandarin vowels and three sentences in 400 Chinese normal adults. The mean nasalance score difference for dialect and gender was significant (p < .001) in all speech materials. For different dialects, the average nasalance scores show that Chongqing > Beijing > Shanghai > Guangzhou for the nasal sentence, oro-nasal sentence, /a/, /i/ and /u/. In addition, the average nasalance scores of females were higher than those of males for all speech materials in all dialects. The clinical significance of this study can be helpful in making nasalance clinical decisions for Chinese people with cleft palate, hearing disorders and dysarthria with resonance disorders. It also shows the theoretical and socio-cultural features for linguists considering dialects and gender.
We perform a large-scale analysis of language diatopic variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well defined macroregions sharing common lexical properties. Remarkably enough, we find that Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish citites and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.