Crowdsourcing linguistic phenomena with smartphone applications is relatively new. In linguistics, apps have predominantly been developed to create pronunciation dictionaries, to train acoustic models, and to archive endangered languages. This paper presents the first account of how apps can be used to collect data suitable for documenting language change: we created an app, Dialäkt Äpp (DÄ), which predicts users' dialects. For 16 linguistic variables, users select a dialectal variant from a drop-down menu. DÄ then geographically locates the user’s dialect by suggesting a list of communes where dialect variants most similar to their choices are used. Underlying this prediction are 16 maps from the historical Linguistic Atlas of German-speaking Switzerland, which documents the linguistic situation around 1950. Where users disagree with the prediction, they can indicate what they consider to be their dialect’s location. With this information, the 16 variables can be assessed for language change. Thanks to the playfulness of its functionality, DÄ has reached many users; our linguistic analyses are based on data from nearly 60,000 speakers. Results reveal a relative stability for phonetic variables, while lexical and morphological variables seem more prone to change. Crowdsourcing large amounts of dialect data with smartphone apps has the potential to complement existing data collection techniques and to provide evidence that traditional methods cannot, with normal resources, hope to gather. Nonetheless, it is important to emphasize a range of methodological caveats, including sparse knowledge of users' linguistic backgrounds (users only indicate age, sex) and users' self-declaration of their dialect. These are discussed and evaluated in detail here. Findings remain intriguing nevertheless: as a means of quality control, we report that traditional dialectological methods have revealed trends similar to those found by the app. This underlines the validity of the crowdsourcing method. We are presently extending DÄ architecture to other languages.
In contrast with animal communication systems, diversity is characteristic of almost every aspect of human language. Languages variously employ tones, clicks, or manual signs to signal differences in meaning; some languages lack the noun-verb distinction (e.g., Straits Salish), whereas others have a proliferation of fine-grained syntactic categories (e.g., Tzeltal); and some languages do without morphology (e.g., Mandarin), while others pack a whole sentence into a single word (e.g., Cayuga). A challenge for evolutionary biology is to reconcile the diversity of languages with the high degree of biological uniformity of their speakers. Here, we model processes of language change and geographical dispersion and find a consistent pressure for flexible learning, irrespective of the language being spoken. This pressure arises because flexible learners can best cope with the observed high rates of linguistic change associated with divergent cultural evolution following human migration. Thus, rather than genetic adaptations for specific aspects of language, such as recursion, the coevolution of genes and fast-changing linguistic structure provides the biological basis for linguistic diversity. Only biological adaptations for flexible learning combined with cultural evolution can explain how each child has the potential to learn any human language.
We contrasted the predictive power of three measures of semantic richness-number of features (NFs), contextual dispersion (CD), and a novel measure of number of semantic neighbors (NSN)-for a large set of concrete and abstract concepts on lexical decision and naming tasks. NSN (but not NF) facilitated processing for abstract concepts, while NF (but not NSN) facilitated processing for the most concrete concepts, consistent with claims that linguistic information is more relevant for abstract concepts in early processing. Additionally, converging evidence from two datasets suggests that when NSN and CD are controlled for, the features that most facilitate processing are those associated with a concept’s physical characteristics and real-world contexts. These results suggest that rich linguistic contexts (many semantic neighbors) facilitate early activation of abstract concepts, whereas concrete concepts benefit more from rich physical contexts (many associated objects and locations).
A word like Huh?-used as a repair initiator when, for example, one has not clearly heard what someone just said- is found in roughly the same form and function in spoken languages across the globe. We investigate it in naturally occurring conversations in ten languages and present evidence and arguments for two distinct claims: that Huh? is universal, and that it is a word. In support of the first, we show that the similarities in form and function of this interjection across languages are much greater than expected by chance. In support of the second claim we show that it is a lexical, conventionalised form that has to be learnt, unlike grunts or emotional cries. We discuss possible reasons for the cross-linguistic similarity and propose an account in terms of convergent evolution. Huh? is a universal word not because it is innate but because it is shaped by selective pressures in an interactional environment that all languages share: that of other-initiated repair. Our proposal enhances evolutionary models of language change by suggesting that conversational infrastructure can drive the convergent cultural evolution of linguistic items.
- Proceedings of the National Academy of Sciences of the United States of America
- Published almost 2 years ago
How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides indirect access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here, we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries to translate words to and from languages carefully selected to be representative of worldwide diversity. These translations reveal cases where a particular language uses a single “polysemous” word to express multiple concepts that another language represents using distinct words. We use the frequency of such polysemies linking two concepts as a measure of their semantic proximity and represent the pattern of these linkages by a weighted network. This network is highly structured: Certain concepts are far more prone to polysemy than others, and naturally interpretable clusters of closely related concepts emerge. Statistical analysis of the polysemies observed in a subset of the basic vocabulary shows that these structural properties are consistent across different language groups, and largely independent of geography, environment, and the presence or absence of a literary tradition. The methods developed here can be applied to any semantic domain to reveal the extent to which its conceptual structure is, similarly, a universal attribute of human cognition and language use.
There would be little adaptive value in a complex communication system like human language if there were no ways to detect and correct problems. A systematic comparison of conversation in a broad sample of the world’s languages reveals a universal system for the real-time resolution of frequent breakdowns in communication. In a sample of 12 languages of 8 language families of varied typological profiles we find a system of ‘other-initiated repair’, where the recipient of an unclear message can signal trouble and the sender can repair the original message. We find that this system is frequently used (on average about once per 1.4 minutes in any language), and that it has detailed common properties, contrary to assumptions of radical cultural variation. Unrelated languages share the same three functionally distinct types of repair initiator for signalling problems and use them in the same kinds of contexts. People prefer to choose the type that is the most specific possible, a principle that minimizes cost both for the sender being asked to fix the problem and for the dyad as a social unit. Disruption to the conversation is kept to a minimum, with the two-utterance repair sequence being on average no longer that the single utterance which is being fixed. The findings, controlled for historical relationships, situation types and other dependencies, reveal the fundamentally cooperative nature of human communication and offer support for the pragmatic universals hypothesis: while languages may vary in the organization of grammar and meaning, key systems of language use may be largely similar across cultural groups. They also provide a fresh perspective on controversies about the core properties of language, by revealing a common infrastructure for social interaction which may be the universal bedrock upon which linguistic diversity rests.
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 1 year ago
It is widely assumed that one of the fundamental properties of spoken language is the arbitrary relation between sound and meaning. Some exceptions in the form of nonarbitrary associations have been documented in linguistics, cognitive science, and anthropology, but these studies only involved small subsets of the 6,000+ languages spoken in the world today. By analyzing word lists covering nearly two-thirds of the world’s languages, we demonstrate that a considerable proportion of 100 basic vocabulary items carry strong associations with specific kinds of human speech sounds, occurring persistently across continents and linguistic lineages (linguistic families or isolates). Prominently among these relations, we find property words (“small” and i, “full” and p or b) and body part terms (“tongue” and l, “nose” and n). The areal and historical distribution of these associations suggests that they often emerge independently rather than being inherited or borrowed. Our results therefore have important implications for the language sciences, given that nonarbitrary associations have been proposed to play a critical role in the emergence of cross-modal mappings, the acquisition of language, and the evolution of our species' unique communication system.
The claim that Eskimo languages have words for different types of snow is well-known among the public, but has been greatly exaggerated through popularization and is therefore viewed with skepticism by many scholars of language. Despite the prominence of this claim, to our knowledge the line of reasoning behind it has not been tested broadly across languages. Here, we note that this reasoning is a special case of the more general view that language is shaped by the need for efficient communication, and we empirically test a variant of it against multiple sources of data, including library reference works, Twitter, and large digital collections of linguistic and meteorological data. Consistent with the hypothesis of efficient communication, we find that languages that use the same linguistic form for snow and ice tend to be spoken in warmer climates, and that this association appears to be mediated by lower communicative need to talk about snow and ice. Our results confirm that variation in semantic categories across languages may be traceable in part to local communicative needs. They suggest moreover that despite its awkward history, the topic of “words for snow” may play a useful role as an accessible instance of the principle that language supports efficient communication.
In this paper we explore the results of a large-scale online game called ‘the Great Language Game’, in which people listen to an audio speech sample and make a forced-choice guess about the identity of the language from 2 or more alternatives. The data include 15 million guesses from 400 audio recordings of 78 languages. We investigate which languages are confused for which in the game, and if this correlates with the similarities that linguists identify between languages. This includes shared lexical items, similar sound inventories and established historical relationships. Our findings are, as expected, that players are more likely to confuse two languages that are objectively more similar. We also investigate factors that may affect players' ability to accurately select the target language, such as how many people speak the language, how often the language is mentioned in written materials and the economic power of the target language community. We see that non-linguistic factors affect players' ability to accurately identify the target. For example, languages with wider ‘global reach’ are more often identified correctly. This suggests that both linguistic and cultural knowledge influence the perception and recognition of languages and their similarity.
Recent studies have shown that concurrent physical activity enhances learning a completely unfamiliar L2 vocabulary as compared to learning it in a static condition. In this paper we report a study whose aim is twofold: to test for possible positive effects of physical activity when L2 learning has already reached some level of proficiency, and to test whether the assumed better performance when engaged in physical activity is limited to the linguistic level probed at training (i.e. L2 vocabulary tested by means of a Word-Picture Verification task), or whether it extends also to the sentence level (which was tested by means of a Sentence Semantic Judgment Task). The results show that Chinese speakers with basic knowledge of English benefited from physical activity while learning a set of new words. Furthermore, their better performance emerged also at the sentential level, as shown by their performance in a Semantic Judgment task. Finally, an interesting temporal asymmetry between the lexical and the sentential level emerges, with the difference between the experimental and control group emerging from the 1st testing session at the lexical level but after several weeks at the sentential level.