Concept: Indo-European languages


The Slavic branch of the Balto-Slavic sub-family of Indo-European languages underwent rapid divergence as a result of the spatial expansion of its speakers from Central-East Europe in early medieval times. This expansion, mainly to East Europe and the northern Balkans, resulted in the incorporation of genetic components from numerous autochthonous populations into the Slavic gene pools. Here, we characterize genetic variation in all extant ethnic groups speaking Balto-Slavic languages by analyzing mitochondrial DNA (n = 6,876), Y-chromosomes (n = 6,079) and genome-wide SNP profiles (n = 296), within the context of other European populations. We also reassess the phylogeny of Slavic languages within the Balto-Slavic branch of Indo-European. We find that genetic distances among Balto-Slavic populations, based on autosomal and Y-chromosomal loci, show a high correlation (0.9) both with each other and with geography, but a slightly lower correlation (0.7) with mitochondrial DNA and linguistic affiliation. The data suggest that genetic diversity of the present-day Slavs was predominantly shaped in situ, and we detect two different substrata: ‘central-east European’ for West and East Slavs, and ‘south-east European’ for South Slavs. A pattern of distribution of segments identical by descent between groups of East-West and South Slavs suggests shared ancestry or a modest gene flow between those two groups, which might derive from the historic spread of Slavic people.
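The correlations between distance matrices reported above (for example, genetic versus geographic distance) can be illustrated with a Mantel-style permutation test. The abstract does not say which procedure the authors used, so this is only a minimal sketch of one common approach. The function names and the toy matrices are hypothetical and do not come from the study.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mantel(a, b, permutations=999, seed=0):
    """Correlate two distance matrices; estimate a one-sided p-value
    by jointly permuting the rows and columns of the second matrix."""
    rng = random.Random(seed)
    n = len(a)
    a_flat = upper_triangle(a)
    observed = pearson(a_flat, upper_triangle(b))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(a_flat, upper_triangle(permuted)) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


def toy_matrices():
    """Hypothetical data: four populations on a line, with 'genetic'
    distance constructed as an exact linear function of geography."""
    positions = [0, 1, 3, 6]
    n = len(positions)
    geo = [[abs(positions[i] - positions[j]) for j in range(n)] for i in range(n)]
    gen = [[0 if i == j else 2 * geo[i][j] + 1 for j in range(n)] for i in range(n)]
    return geo, gen


if __name__ == "__main__":
    geo, gen = toy_matrices()
    r, p = mantel(geo, gen)
    print(f"r = {r:.3f}, p = {p:.3f}")
```

Permuting rows and columns together, rather than shuffling individual entries, keeps each permuted matrix a valid distance matrix, which is why the test is usually set up this way.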

Concepts: Ukraine, Bulgaria, Indo-European languages, Slavic peoples, Slavic languages, Baltic languages, Old Church Slavonic, Balto-Slavic languages


Zipf’s law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages like Chinese, Japanese and Korean. These languages consist of characters and have very limited dictionary sizes. Extensive experiments show that: (i) The character frequency distribution follows a power law with exponent close to one, at which the corresponding Zipf’s exponent diverges. Indeed, the character frequency decays exponentially in the Zipf plot. (ii) The number of distinct characters grows with the text length in three stages: It grows linearly in the beginning, then turns to a logarithmic form, and eventually saturates. A theoretical model for the writing process is proposed, which embodies the rich-get-richer mechanism and the effects of limited dictionary size. Experiments, simulations and analytical solutions agree well with each other. This work refines the understanding of Zipf’s and Heaps' laws in human language systems.
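The two quantities discussed above, the rank-frequency (Zipf) curve and the growth of distinct symbols with text length (Heaps curve), can be computed from any token sequence. The sketch below also includes a toy generator that combines a rich-get-richer rule with a fixed dictionary size. That generator is an assumption made for illustration (a simple urn scheme), not the model proposed in the paper, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def zipf_frequencies(tokens):
    """Token counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def heaps_curve(tokens):
    """Number of distinct tokens seen after each position in the text."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); for a power law
    y ~ x**(-alpha) this estimates -alpha."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def simulate_text(length, dictionary_size, seed=0):
    """Toy rich-get-richer process over a finite dictionary (assumed
    rule): each symbol is drawn with weight (times used so far + 1)."""
    rng = random.Random(seed)
    counts = [0] * dictionary_size
    text = []
    for _ in range(length):
        weights = [c + 1 for c in counts]
        symbol = rng.choices(range(dictionary_size), weights=weights)[0]
        counts[symbol] += 1
        text.append(symbol)
    return text


if __name__ == "__main__":
    text = simulate_text(length=5000, dictionary_size=200)
    freqs = zipf_frequencies(text)
    ranks = range(1, len(freqs) + 1)
    print("log-log slope of rank-frequency curve:", loglog_slope(ranks, freqs))
    curve = heaps_curve(text)
    print("distinct symbols after 100, 1000, 5000 tokens:",
          curve[99], curve[999], curve[4999])
```

Because the dictionary is finite, the Heaps curve in this toy process can never exceed `dictionary_size`, which mirrors the saturation stage described in the abstract. The intermediate growth regimes depend on the model details and are not claimed to match the paper.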

Concepts: Language, Exponential growth, Exponential function, Zipf's law, Historical linguistics, Exponentials, Language family, Indo-European languages


India is a patchwork of tribal and non-tribal populations that speak many different languages from various language families. Indo-European, spoken across northern and central India, and also in Pakistan and Bangladesh, has been frequently connected to the so-called “Indo-Aryan invasions” from Central Asia ~3.5 ka and the establishment of the caste system, but the extent of immigration at this time remains extremely controversial. South India, on the other hand, is dominated by Dravidian languages. India displays a high level of endogamy due to its strict social boundaries, and high genetic drift as a result of long-term isolation which, together with a very complex history, makes the genetic study of Indian populations challenging.

Concepts: India, Sri Lanka, South Asia, Urdu, Sanskrit, Language family, Indo-European languages, Dravidian languages


There are two competing hypotheses for the origin of the Indo-European language family. The conventional view places the homeland in the Pontic steppes about 6000 years ago. An alternative hypothesis claims that the languages spread from Anatolia with the expansion of farming 8000 to 9500 years ago. We used Bayesian phylogeographic approaches, together with basic vocabulary data from 103 ancient and contemporary Indo-European languages, to explicitly model the expansion of the family and test these hypotheses. We found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. These results highlight the critical role that phylogeographic inference can play in resolving debates about human prehistory.

Concepts: Europe, German language, Historical linguistics, Language family, Indo-European languages, Proto-Indo-European language, Tarim mummies, Celtic languages


The negative bias accompanying the terms left and left-handers has long interested researchers. This paper examines a large number of languages of Indo-European and non-Indo-European origin for such biasing. One surprising outcome is that, within the Indo-European language family, the terms for right and left do not go back to one set of antonyms but have their etymological roots in a number of different core semantic concepts. As in the non-Indo-European languages, right is almost always thought of positively, whereas left carries negative connotations. This is interpreted as the outcome of a universal human evaluation process, partly based on the principle of embodiment. The terms for right never have a negative bias in any of the examined languages; the words for left, which are almost never positively biased, were turned into euphemisms in three language groups (Scandinavian, Greek, and Avestan). On one interpretation, this seems to be an act of historical political correctness, corroborating the negative attitude cultures have toward left-handers, very likely an outcome of discrimination against minorities.

Concepts: Linguistics, Language, Semantics, Greek language, Historical linguistics, Language family, Indo-European languages, Ancient Greek


We generated genome-wide data from 69 Europeans who lived between 8,000-3,000 years ago by enriching ancient DNA libraries for a target set of almost 400,000 polymorphisms. Enrichment of these positions decreases the sequencing required for genome-wide ancient DNA analysis by a median of around 250-fold, allowing us to study an order of magnitude more individuals than previous studies and to obtain new insights about the past. We show that the populations of Western and Far Eastern Europe followed opposite trajectories between 8,000-5,000 years ago. At the beginning of the Neolithic period in Europe, ∼8,000-7,000 years ago, closely related groups of early farmers appeared in Germany, Hungary and Spain, different from indigenous hunter-gatherers, whereas Russia was inhabited by a distinctive population of hunter-gatherers with high affinity to a ∼24,000-year-old Siberian. By ∼6,000-5,000 years ago, farmers throughout much of Europe had more hunter-gatherer ancestry than their predecessors, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but also from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ∼4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ∼75% of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ∼3,000 years ago, and is ubiquitous in present-day Europeans. These results provide support for a steppe origin of at least some of the Indo-European languages of Europe.

Concepts: Europe, Eastern Europe, Turkey, Ukraine, Greek language, Russia, Indo-European languages, Moldova


In a recent interdisciplinary study, Das and co-authors have attempted to trace the homeland of Ashkenazi Jews and of their historical language, Yiddish (Das et al. 2016. Localizing Ashkenazic Jews to Primeval Villages in the Ancient Iranian Lands of Ashkenaz. Genome Biology and Evolution). Das and co-authors applied the geographic population structure (GPS) method to autosomal genotyping data and inferred geographic coordinates of populations supposedly ancestral to Ashkenazi Jews, placing them in Eastern Turkey. They argued that this unexpected genetic result goes against the widely accepted notion of Ashkenazi origin in the Levant, and speculated that Yiddish was originally a Slavic language strongly influenced by Iranian and Turkic languages, and later remodeled completely under Germanic influence. In our view, there are major conceptual problems with both the genetic and linguistic parts of the work. We argue that GPS is a provenancing tool suited to inferring the geographic region where a modern and recently unadmixed genome is most likely to arise, but is hardly suitable for admixed populations and for tracing ancestry up to 1000 years before present, as its authors have previously claimed. Moreover, all methods of historical linguistics concur that Yiddish is a Germanic language, with no reliable evidence for Slavic, Iranian, or Turkic substrata.

Concepts: Jews, Judaism, Jewish ethnic divisions, Language family, Indo-European languages, Hebrew language, Ashkenazi Jews, Yiddish language


The small alpine district of East Tyrol (Austria) has an exceptional demographic history. It was contemporaneously inhabited by members of the Romance, the Slavic and the Germanic language groups for centuries. Since the Late Middle Ages, however, the population of the principally agrarian-oriented area has been solely Germanic-speaking. Historic facts about East Tyrol’s colonization are rare, but spatial density-distribution analysis based on the etymology of place-names has facilitated accurate spatial mapping of the various language groups' former settlement regions. To test for present-day Y chromosome population substructure, molecular genetic data were compared to the information obtained by the linguistic analysis of pasture names. The linguistic data were used for subdividing East Tyrol into two regions of former Romance (A) and Slavic (B) settlement. Samples from 270 East Tyrolean men were genotyped for 17 Y-chromosomal microsatellites (Y-STRs) and 27 single nucleotide polymorphisms (Y-SNPs). Analysis of the probands' surnames revealed no evidence for spatial genetic structuring. Also, spatial autocorrelation analysis did not indicate significant correlation between genetic (Y-STR haplotypes) and geographic distance. Haplogroup R-M17 chromosomes, however, were absent in region A, but constituted one of the most frequent haplogroups in region B. The R-M343 (R1b) clade showed a marked and complementary frequency distribution pattern in these two regions. To further test East Tyrol’s modern Y-chromosomal landscape for geographic patterning attributable to the early history of settlement in this alpine area, principal coordinates analysis was performed. The Y-STR haplotypes from region A clearly clustered with those of Romance reference populations and the samples from region B matched best with Germanic-speaking reference populations. The combined use of onomastic and molecular genetic data revealed and mapped the marked structuring of the distribution of Y chromosomes in an alpine region that has been culturally homogeneous for centuries.

Concepts: DNA, Gene, Genetic genealogy, Y chromosome, Genealogical DNA test, Language family, Indo-European languages, Germanic languages


Previous empirical studies have suggested that language is primarily used to exchange social information, but our evidence on this derives mainly from English speakers. We present data from a study of natural conversations among Farsi (Persian) speakers in Iran and show that not only are conversation groups the same size as those observed in Europe and North America, but people also talk predominantly about social topics. We argue that these results reinforce the suggestion that language most likely evolved for the transmission of information about the social world. We also explore sex differences in conversational behavior: while the pattern is broadly similar between the sexes, men may be more sensitive than women are to discussing some topics in the presence of many other people.

Concepts: Male, Sexual dimorphism, Gender, Sociology, Sex, Iran, Conversation, Indo-European languages


We aimed to study narrative skills in Mandarin-speaking children with language impairment (LI) and to compare them with those of children with LI who speak Indo-European languages.

Concepts: Language, Programming language, German language, Speech, Historical linguistics, Language family, Indo-European languages, August Schleicher