- Proceedings of the National Academy of Sciences of the United States of America
- Published over 6 years ago
The search for ever deeper relationships among the world’s languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y. On the other hand, quantitative modeling indicates that some “ultraconserved” words exist that might be used to find evidence for deep linguistic relationships beyond that time barrier. Here we use a statistical model, which takes into account the frequency with which words are used in common everyday speech, to predict the existence of a set of such highly conserved words among seven language families of Eurasia postulated to form a linguistic superfamily that evolved from a common ancestor around 15,000 y ago. We derive a dated phylogenetic tree of this proposed superfamily with a time depth of ∼14,450 y, implying that some frequently used words have been retained in related forms since the end of the last ice age. Words used more than once per 1,000 in everyday speech were 7 to 10 times more likely to show deep ancestry on this tree. Our results suggest a remarkable fidelity in the transmission of some words and give theoretical justification to the search for features of language that might be preserved across wide spans of time and geography.
It is usually assumed that modern language is a recent phenomenon, coinciding with the emergence of modern humans themselves. Many assume as well that this is the result of a single, sudden mutation giving rise to the full “modern package.” However, we argue here that recognizably modern language is likely an ancient feature of our genus, pre-dating at least the common ancestor of modern humans and Neandertals about half a million years ago. To this end, we adduce a broad range of evidence from linguistics, genetics, paleontology, and archaeology clearly suggesting that Neandertals shared with us something like modern speech and language. This reassessment of the antiquity of modern language, from the usually quoted 50,000–100,000 years to half a million years, has profound consequences for our understanding of our own evolution in general and especially for the sciences of speech and language. As such, it argues against a saltationist scenario for the evolution of language and in favor of a gradual process of culture-gene co-evolution extending to the present day. Another consequence is that present-day linguistic diversity might better reflect the properties of the design space for language and not just the vagaries of history, and could also contain traces of the languages spoken by other human forms such as the Neandertals.
Using a large social media dataset and open-vocabulary methods from computational linguistics, we explored differences in language use across gender, affiliation, and assertiveness. In Study 1, we analyzed topics (groups of semantically similar words) across 10 million messages from over 52,000 Facebook users. Most language differed little across gender. However, topics most associated with self-identified female participants included friends, family, and social life, whereas topics most associated with self-identified male participants included swearing, anger, discussion of objects instead of people, and the use of argumentative language. In Study 2, we plotted male- and female-linked language topics along two interpersonal dimensions prevalent in gender research: affiliation and assertiveness. In a sample of over 15,000 Facebook users, we found substantial gender differences in the use of affiliative language and slight differences in assertive language. Language used more by self-identified females was interpersonally warmer, more compassionate, polite, and, contrary to previous findings, slightly more assertive, whereas language used more by self-identified males was colder, more hostile, and more impersonal. Computational linguistic analysis, combined with methods to automatically label topics, offers a means of testing psychological theories unobtrusively and at large scale.
- Proceedings of the National Academy of Sciences of the United States of America
- Published about 2 years ago
Using footage from body-worn cameras, we analyze the respectfulness of police officer language toward white and black community members during routine traffic stops. We develop computational linguistic methods that extract levels of respect automatically from transcripts, informed by a thin-slicing study of participant ratings of officer utterances. We find that officers speak with consistently less respect toward black versus white community members, even after controlling for the race of the officer, the severity of the infraction, the location of the stop, and the outcome of the stop. Such disparities in common, everyday interactions between police and the communities they serve have important implications for procedural justice and the building of police-community trust.
The Voynich manuscript has so far remained a mystery for linguists and cryptologists. While the text, written on medieval parchment in an unknown script system, shows basic statistical patterns that bear resemblance to those of real languages, certain features have suggested to some researchers that the manuscript was a forgery intended as a hoax. Here we analyse the long-range structure of the manuscript using methods from information theory. We show that the Voynich manuscript presents a complex organization in the distribution of words that is compatible with that found in real language sequences. We are also able to extract some of the most significant semantic word networks in the text. These results, together with some previously known statistical features of the Voynich manuscript, support the presence of a genuine message inside the book.
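The long-range structure mentioned in this abstract can be illustrated with a toy version of the underlying idea: in real texts, content words cluster in the passages that discuss them, so the gaps between a word's occurrences are far more variable than in a shuffled text, where any long-range organization is destroyed. The sketch below is not the authors' actual method, just a minimal, self-contained illustration of that contrast; the function names and the shuffle baseline are my own assumptions.

```python
import math
import random

def gap_cv(tokens, word):
    """Coefficient of variation of gaps between occurrences of `word`.
    Randomly scattered words give CV near 1; words that cluster in
    topical passages give CV well above 1."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    if len(positions) < 3:
        return None  # too few occurrences to measure
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return math.sqrt(var) / mean

def clustering_score(tokens, word, shuffles=50, seed=0):
    """Compare the observed gap CV against shuffled versions of the
    text, in which long-range structure has been destroyed."""
    rng = random.Random(seed)
    observed = gap_cv(tokens, word)
    baseline = []
    shuffled = list(tokens)
    for _ in range(shuffles):
        rng.shuffle(shuffled)
        cv = gap_cv(shuffled, word)
        if cv is not None:
            baseline.append(cv)
    return observed, sum(baseline) / len(baseline)
```

A text whose occurrences of a word come in bursts separated by long stretches will score well above its shuffled baseline, which is the kind of signature a meaningless character sequence would be expected to lack.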
We present evidence that the geographic context in which a language is spoken may directly impact its phonological form. We examined the geographic coordinates and elevations of 567 language locations represented in a worldwide phonetic database. Languages with phonemic ejective consonants were found to occur closer to inhabitable regions of high elevation, when contrasted to languages without this class of sounds. In addition, the mean and median elevations of the locations of languages with ejectives were found to be comparatively high. The patterns uncovered surface on all major world landmasses, and are not the result of the influence of particular language families. They reflect a significant and positive worldwide correlation between elevation and the likelihood that a language employs ejective phonemes. In addition to documenting this correlation in detail, we offer two plausible motivations for its existence. We suggest that ejective sounds might be facilitated at higher elevations due to the associated decrease in ambient air pressure, which reduces the physiological effort required for the compression of air in the pharyngeal cavity, a unique articulatory component of ejective sounds. In addition, we hypothesize that ejective sounds may help to mitigate rates of water vapor loss through exhaled air. These explanations demonstrate how a reduction in ambient air density could promote the usage of ejective phonemes in a given language. Our results reveal the direct influence of a geographic factor on the basic sound inventories of human languages.
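The correlation reported here relates a binary trait (whether a language has ejectives) to a continuous one (elevation). One standard way to quantify such an association is the point-biserial correlation coefficient, sketched below on toy data; this is an illustration of the statistic, not a reconstruction of the paper's actual analysis.

```python
import math

def point_biserial(has_ejectives, elevations):
    """Point-biserial correlation between a 0/1 trait and a continuous
    variable: (mean difference between groups) / (overall SD) * sqrt(p*q),
    where p and q are the group proportions."""
    n = len(elevations)
    mean = sum(elevations) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in elevations) / n)
    group1 = [e for h, e in zip(has_ejectives, elevations) if h]
    group0 = [e for h, e in zip(has_ejectives, elevations) if not h]
    p = len(group1) / n
    q = 1 - p
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    return (m1 - m0) / sd * math.sqrt(p * q)
```

A positive coefficient means languages with the trait sit at systematically higher elevations, which is the direction of the association the abstract describes.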
Human language can express limitless meanings from a finite set of words based on combinatorial rules (i.e., compositional syntax). Although animal vocalizations may be composed of different basic elements (notes), it remains unknown whether compositional syntax has also evolved in animals. Here we report the first experimental evidence for compositional syntax in a wild animal species, the Japanese great tit (Parus minor). Tits have over ten different notes in their vocal repertoire and use them either solely or in combination with other notes. Experiments reveal that receivers extract different meanings from ‘ABC’ (scan for danger) and ‘D’ notes (approach the caller), and a compound meaning from ‘ABC-D’ combinations. However, receivers rarely scan and approach when note ordering is artificially reversed (‘D-ABC’). Thus, compositional syntax is not unique to human language but may have evolved independently in animals as one of the basic mechanisms of information transmission.
This study addressed two questions: (1) Is children’s irony appreciation and processing related to their empathy skills? And (2) is children’s processing of a speaker’s ironic meaning best explained by a modular or an interactive theory? Participants were thirty-one 8- and 9-year-old children. We used a variant of the visual world paradigm to assess children’s processing of ironic and literal evaluative remarks; in this paradigm, children’s cognition is revealed through their actions and eye gaze. Results showed that children’s irony appreciation and processing were correlated with their empathy development, suggesting that empathy, or emotional perspective taking, may be important for the development of irony comprehension. Further, children’s processing of irony was consistent with an interactive framework, in which children consider ironic meanings in the earliest moments, as speech unfolds. These results provide important new insights into the development of this complex aspect of emotion recognition.
Crowdsourcing linguistic phenomena with smartphone applications is relatively new. In linguistics, apps have predominantly been developed to create pronunciation dictionaries, to train acoustic models, and to archive endangered languages. This paper presents the first account of how apps can be used to collect data suitable for documenting language change: we created an app, Dialäkt Äpp (DÄ), which predicts users' dialects. For 16 linguistic variables, users select a dialectal variant from a drop-down menu. DÄ then geographically locates the user’s dialect by suggesting a list of communes where dialect variants most similar to their choices are used. Underlying this prediction are 16 maps from the historical Linguistic Atlas of German-speaking Switzerland, which documents the linguistic situation around 1950. Where users disagree with the prediction, they can indicate what they consider to be their dialect’s location. With this information, the 16 variables can be assessed for language change. Thanks to the playfulness of its functionality, DÄ has reached many users; our linguistic analyses are based on data from nearly 60,000 speakers. Results reveal relative stability for phonetic variables, while lexical and morphological variables seem more prone to change. Crowdsourcing large amounts of dialect data with smartphone apps has the potential to complement existing data collection techniques and to provide evidence that traditional methods cannot, with normal resources, hope to gather. Nonetheless, it is important to emphasize a range of methodological caveats, including sparse knowledge of users' linguistic backgrounds (users indicate only age and sex) and users' self-declaration of their dialect. These are discussed and evaluated in detail here. Findings remain intriguing nevertheless: as a means of quality control, we report that traditional dialectological methods have revealed trends similar to those found by the app. This underlines the validity of the crowdsourcing method. We are presently extending the DÄ architecture to other languages.
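The prediction step the DÄ abstract describes, suggesting communes whose recorded variants best match a user's 16 choices, can be sketched as a simple agreement ranking. The data layout below (a per-commune mapping from variable to atlas variant) is an assumption for illustration, not the app's actual implementation.

```python
def predict_communes(user_choices, atlas, top_n=3):
    """Rank communes by how many of the user's chosen variants match
    the variant recorded for that commune in the atlas.

    user_choices: dict variable -> chosen variant
    atlas: dict commune -> dict variable -> recorded variant
    Returns up to top_n (commune, match_count) pairs, best first.
    """
    scores = []
    for commune, variants in atlas.items():
        matches = sum(
            1 for var, choice in user_choices.items()
            if variants.get(var) == choice
        )
        scores.append((matches, commune))
    # Best match first; break ties alphabetically for a stable list.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(commune, m) for m, commune in scores[:top_n]]
```

With 16 variables, exact ties are common between neighboring communes, which is one reason an app like this would present a ranked list rather than a single answer.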
In contrast with animal communication systems, diversity is characteristic of almost every aspect of human language. Languages variously employ tones, clicks, or manual signs to signal differences in meaning; some languages lack the noun-verb distinction (e.g., Straits Salish), whereas others have a proliferation of fine-grained syntactic categories (e.g., Tzeltal); and some languages do without morphology (e.g., Mandarin), while others pack a whole sentence into a single word (e.g., Cayuga). A challenge for evolutionary biology is to reconcile the diversity of languages with the high degree of biological uniformity of their speakers. Here, we model processes of language change and geographical dispersion and find a consistent pressure for flexible learning, irrespective of the language being spoken. This pressure arises because flexible learners can best cope with the observed high rates of linguistic change associated with divergent cultural evolution following human migration. Thus, rather than genetic adaptations for specific aspects of language, such as recursion, the coevolution of genes and fast-changing linguistic structure provides the biological basis for linguistic diversity. Only biological adaptations for flexible learning combined with cultural evolution can explain how each child has the potential to learn any human language.