In complex animal vocalizations, such as bird or whale song, a great variety of songs can be produced by rearranging a smaller set of ‘syllables’, a capacity known as ‘phonological syntax’ or ‘phonocoding’. However, food or alarm calls, which function as referential signals, were previously thought to lack such combinatorial structure. A new study of calls in the banded mongoose Mungos mungo provides the first evidence of phonocoding at the level of single calls. The first portion of the call provides cues to the identity of the caller, and the second part encodes its current activity. This is the first known example in animals of something akin to the consonants and vowels of human speech. See research article http://www.biomedcentral.com/1741-7007/10/97.
Language is a distinguishing characteristic of our species, and the course of its evolution is one of the hardest problems in science. It has long been held that human speech requires a low larynx, and that the high larynx of nonhuman primates should preclude their producing the vowel systems universally found in human language. Examining baboon vocalizations through acoustic analyses, tongue anatomy, and modeling of acoustic potential, we found that baboons (Papio papio) produce sounds sharing the F1/F2 formant structure of the human [ɨ æ ɑ ɔ u] vowels, and that, as in humans, those vocalic qualities are organized as a system along two acoustic-anatomic axes. This confirms that hominoids can produce contrasting vowel qualities despite a high larynx. It suggests that spoken languages evolved from ancient articulatory skills already present in our last common ancestor with Cercopithecoidea, about 25 MYA.
Two calling melodies of Polish were investigated, the routine call, used to call someone for an everyday reason, and the urgent call, which conveys disapproval of the addressee’s actions. A Discourse Completion Task was used to elicit the two melodies from Polish speakers using twelve names from one to four syllables long; there were three names per syllable count, and speakers produced three tokens of each name with each melody. The results, based on eleven speakers, show that the routine calling melody consists of a low F0 stretch followed by a rise-fall-rise; the urgent calling melody, on the other hand, is a simple rise-fall. Systematic differences were found in the scaling and alignment of tonal targets: the routine call showed late alignment of the accentual pitch peak, and in most instances lower scaling of targets. The accented vowel was also affected, being overall louder in the urgent call. Based on the data and comparisons with other Polish melodies, we analyze the routine call as LH* !H-H% and the urgent call as H* L-L%. We discuss the results and our analysis in light of recent findings on calling melodies in other languages, and explore their repercussions for intonational phonology and the modeling of intonation.
Worldwide distribution of the DCDC2 READ1 regulatory element and its relationship with phoneme variation across languages
- Proceedings of the National Academy of Sciences of the United States of America
- Published about 1 year ago
DCDC2 is a gene strongly associated with components of the phonological processing system in animal models and in multiple independent studies of populations and languages. We propose that it may also influence population-level variation in language component usage. To test this hypothesis, we investigated the evolution and worldwide distribution of the READ1 regulatory element within DCDC2, and compared its distribution with variation in different language properties. The mutational history of READ1 was estimated by examining primate and archaic hominin sequences. This identified duplication and expansion events, which created a large number of polymorphic alleles based on internal repeat units (RU1 and RU2). Association of READ1 alleles was studied with respect to the numbers of consonants and vowels for languages in 43 human populations distributed across five continents. Using population-based approaches with multivariate ANCOVA and linear mixed effects analyses, we found that the RU1-1 allele group of READ1 is significantly associated with the number of consonants within languages independent of genetic relatedness, geographic proximity, and language family. We propose that allelic variation in READ1 helped create a subtle cognitive bias that was amplified by cultural transmission, and ultimately shaped consonant use by different populations over time.
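The population-level association test can be illustrated with a minimal ANCOVA-style sketch: regress per-language consonant counts on RU1-1 allele frequency while controlling for a covariate. Everything below (the allele-frequency values, the latitude covariate, the effect sizes) is synthetic stand-in data for illustration only, not the study's dataset or its full mixed-effects analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pop = 43  # number of populations, mirroring the study design

# Synthetic stand-ins (NOT the study's data): RU1-1 allele frequency,
# a geographic covariate, and per-language consonant inventory size.
ru1_freq = rng.uniform(0.1, 0.9, n_pop)
latitude = rng.uniform(-40.0, 60.0, n_pop)
consonants = 22.0 + 8.0 * ru1_freq + 0.02 * latitude + rng.normal(0.0, 1.5, n_pop)

# ANCOVA-style linear model: consonants ~ intercept + allele frequency + covariate
X = np.column_stack([np.ones(n_pop), ru1_freq, latitude])
beta, *_ = np.linalg.lstsq(X, consonants, rcond=None)
print(f"estimated RU1-1 effect: {beta[1]:.2f} consonants per unit frequency")
```

The study additionally controls for genetic relatedness, geographic proximity, and language family via linear mixed-effects models; those random-effect terms are omitted here for brevity.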
AIMS: To test the hypothesis that exposure to ambient language in the womb alters phonetic perception shortly after birth. This two-country study asked whether neonates show evidence of prenatal learning in their responses to vowels from their native language and from a nonnative language, regardless of the amount of postnatal experience. METHOD: A counterbalanced experiment was conducted in Sweden (n=40) and the USA (n=40) using Swedish and English vowel sounds. The neonates (mean postnatal age = 33 hrs) controlled audio presentation of either native or nonnative vowels by sucking on a pacifier, with the number of sucks serving as a measure of which vowel sounds attracted their attention. The vowels were either the English /i/ or the Swedish /y/, in the form of a prototype plus 16 variants of the prototype. RESULTS: The infants in the native and nonnative groups responded differently. As predicted, the infants responded to the unfamiliar nonnative language with higher mean sucks, and they also sucked more to the nonnative prototype. Time since birth (range: 7-75 hours) did not affect the outcome. CONCLUSION: The ambient language to which foetuses are exposed in the womb begins to affect their perception of their native language at the phonetic level. This can be measured shortly after birth as differences in responding to familiar versus unfamiliar vowels.
Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer.
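The articulatory-to-acoustic mapping idea can be sketched with a far simpler stand-in than the paper's DNN: a linear least-squares map from articulatory channels to acoustic parameters, fit "offline" and then applied frame-by-frame as it would be in an online loop. All dimensions and data below are invented for illustration, not the paper's EMA recordings or trained network.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins (NOT the study's EMA data): 12 articulatory channels
# (e.g. x/y coordinates of tongue, jaw, velum, and lip sensors) mapped to
# 4 acoustic parameters.
n_frames, n_artic, n_acoust = 2000, 12, 4
A_true = rng.normal(size=(n_artic, n_acoust))   # hidden "ground-truth" map
artic = rng.normal(size=(n_frames, n_artic))    # articulator trajectories
acoust = artic @ A_true + 0.05 * rng.normal(size=(n_frames, n_acoust))

# Offline "training": linear least squares in place of the paper's DNN.
A_hat, *_ = np.linalg.lstsq(artic, acoust, rcond=None)

# Online use: map one incoming articulatory frame to acoustic parameters;
# a vocoder would then convert these into an audio signal.
frame = rng.normal(size=n_artic)
predicted = frame @ A_hat
print(predicted.shape)  # (4,)
```

The paper's DNN replaces the linear map with a nonlinear one, and its calibration step for new speakers corresponds to re-estimating this mapping from a short adaptation recording.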
- Journal of voice : official journal of the Voice Foundation
- Published over 6 years ago
The present study examines the extent to which increased nasal coupling affects estimates of glottal parameters derived from inverse filtering based on an all-pole assumption of the vocal tract. A series of steady-state tokens for five Swedish vowels were synthesized using the HLsyn quasi-articulatory synthesizer (Sensimetrics, Malden, MA). For each vowel, the parameter controlling the cross-sectional area of the nasal aperture, an, was systematically varied, while the other HLsyn parameters were kept constant. The resultant pressure signal for each utterance was subsequently inverse filtered, and estimates were made of five glottal source parameters (EE, RG, RA, RK, and OQ) derived from fitting the Liljencrants and Fant source model to the inverse filtered signal. The results show that when analyzing nasalized vowels using inverse filtering based on an all-pole assumption of the vocal tract, the RA parameter estimate, a main determinant of the source spectral slope, can be adversely affected by nasal coupling. The errors in our estimates were particularly high for the high vowels: this was true not only for RA, but for all the parameters measured. However, with the exception of the distortion in the RA estimate, the effects were relatively small, regardless of the degree of nasal coupling.
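The all-pole inverse-filtering step at the heart of this method can be sketched generically: fit LPC coefficients to the signal (autocorrelation method, Levinson-Durbin recursion) and convolve the signal with the resulting FIR inverse filter to estimate the source. This is a minimal illustration of the technique, not the study's HLsyn synthesis or LF-model fitting pipeline; the formant frequencies, bandwidths, and LPC order below are arbitrary choices.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1 : i + 1] = prev[1 : i + 1] + k * prev[i - 1 :: -1]
        err *= 1.0 - k * k
    return a

# Synthesize a crude steady-state vowel: a 100 Hz glottal pulse train passed
# through two resonators (formant-like poles near 500 Hz and 1500 Hz).
fs = 8000
excitation = np.zeros(2000)
excitation[::80] = 1.0
sig = excitation.copy()
for f, bw in [(500.0, 60.0), (1500.0, 90.0)]:
    r_pole = np.exp(-np.pi * bw / fs)
    b1 = 2.0 * r_pole * np.cos(2.0 * np.pi * f / fs)
    b2 = -r_pole * r_pole
    y = np.zeros_like(sig)
    for n in range(len(sig)):
        y[n] = sig[n]
        if n >= 1:
            y[n] += b1 * y[n - 1]
        if n >= 2:
            y[n] += b2 * y[n - 2]
    sig = y

# Inverse filter: the LPC polynomial is the FIR inverse of the all-pole model,
# so convolving with it yields an estimate of the (glottal) source signal.
a = lpc(sig, order=8)
residual = np.convolve(sig, a)[: len(sig)]
```

The study's point is precisely that a nasal branch adds poles and zeros that this all-pole model cannot represent, biasing source parameters (especially RA) derived from the residual.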
In electroencephalogram (EEG) recordings, there is a characteristic P1-N1-P2 complex after the onset of a sound, and a related complex, called the Acoustic Change Complex (ACC), when there is a change within a sound (e.g., a formant transition between two vowels). In the present study, the ACC was measured for all possible pairs of eight sustained voiced and voiceless English fricatives, in EEG recordings from native speakers of British English. The magnitude of the ACC was used as a similarity measure for multidimensional scaling (MDS), producing a two-dimensional perceptual space that related to both voicing and place of articulation. The results thus demonstrate that this combination of ACC and MDS can be effective for mapping multidimensional phonetic spaces at relatively early levels of auditory processing, which may be useful for evaluating the effects of language experience in adults and infants.
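The mapping from pairwise ACC magnitudes to a low-dimensional perceptual space can be sketched with classical (Torgerson) MDS: double-center the squared dissimilarity matrix and take the top two eigenvectors. The toy matrix below is invented (four hypothetical fricatives forming two clusters), with larger ACC magnitude treated as greater dissimilarity; it is not the study's data, and the study may have used a different MDS variant.

```python
import numpy as np

def classical_mds(d, n_dims=2):
    """Classical (Torgerson) MDS from a symmetric dissimilarity matrix d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_dims]      # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Toy ACC-magnitude matrix for 4 hypothetical fricatives: items 0-1 and 2-3
# evoke small ACCs for each other (similar), large ACCs across clusters.
acc = np.array([[0.0, 1.0, 4.0, 4.1],
                [1.0, 0.0, 4.2, 4.0],
                [4.0, 4.2, 0.0, 1.1],
                [4.1, 4.0, 1.1, 0.0]])
coords = classical_mds(acc)
print(coords.shape)  # (4, 2): a two-dimensional perceptual space
```

In the study, dimensions of the recovered space were interpretable as voicing and place of articulation.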
Auditory cortical activity in normal hearing subjects to consonant vowels presented in quiet and in noise
- Clinical neurophysiology : official journal of the International Federation of Clinical Neurophysiology
- Published over 6 years ago
OBJECTIVE: Compare brain potentials to consonant vowels (CVs) as a function of both voice onset times (VOTs) and consonant position; initial (CV) versus second (VCV). METHODS: Auditory cortical potentials (N100, P200, N200, and a late slow negativity (SN)) were recorded from scalp electrodes in twelve normal-hearing subjects to consonant vowels in initial position (CVs: /du/ and /tu/), in second position (VCVs: /udu/ and /utu/), and to vowels alone (V: /u/) and paired (VVs: /uu/) separated in time to simulate consonant voice onset times (VOTs). RESULTS: CVs evoked “acoustic onset” N100s of similar latency but larger amplitudes to /du/ than /tu/. CVs preceded by a vowel (VCVs) evoked “acoustic change” N100s with longer latencies to /utu/ than /udu/. Their absolute latency difference was less than the corresponding VOT difference. The SN following N100 to VCVs was larger to /utu/ than /udu/. Paired vowels (/uu/) separated by intervals corresponding to consonant VOTs evoked N100s with latency differences equal to the simulated VOT differences and SNs of similar amplitudes. Noise masking resulted in VCV N100 latency differences that were now equal to consonant VOT differences. Brain activations by CVs, VCVs, and VVs were maximal in the right temporal lobe. CONCLUSION: Auditory cortical activities to CVs are sensitive to: (1) position of the CV in the utterance; (2) VOTs of consonants; and (3) noise masking. SIGNIFICANCE: VOTs of stop consonants affect auditory cortical activities differently as a function of the position of the consonant in the utterance.
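The basic component-latency measurement behind results like these can be sketched simply: search a fixed post-stimulus window for the extremum of the averaged waveform. The waveform below is a synthetic two-component ERP (an N100-like negativity plus a P200-like positivity), not recorded data, and the window bounds are illustrative conventions.

```python
import numpy as np

fs = 1000  # sampling rate in Hz
t = np.arange(0.0, 0.5, 1.0 / fs)

# Synthetic averaged ERP: N100-like negativity at 100 ms plus a P200-like
# positivity at 200 ms (Gaussian components, arbitrary amplitudes).
erp = (-2.0 * np.exp(-((t - 0.100) ** 2) / (2 * 0.015 ** 2))
       + 1.5 * np.exp(-((t - 0.200) ** 2) / (2 * 0.025 ** 2)))

# N100 peak latency: most negative sample in a 70-150 ms search window.
win = (t >= 0.070) & (t <= 0.150)
n100_latency = t[win][np.argmin(erp[win])]
print(round(n100_latency * 1000), "ms")  # → 100 ms
```

Comparing such latencies across conditions (e.g. /udu/ versus /utu/) is what yields the VOT-dependent latency differences the study reports.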
This study compares the duration and first two formants (F1 and F2) of 11 nominal monophthongs and five nominal diphthongs in Standard Southern British English (SSBE) and a Northern English dialect. F1 and F2 trajectories were fitted with parametric curves using the discrete cosine transform (DCT); the zeroth DCT coefficient represented formant trajectory means, and the first DCT coefficient represented the magnitude and direction of formant trajectory change, characterizing vowel inherent spectral change (VISC). Cross-dialectal comparisons involving these measures revealed significant differences for the phonologically back monophthongs /ɒ, ɔː, ʊ, uː/ and also /ɜː/ and the diphthongs /eɪ, əʊ, aɪ, ɔɪ/. Most cross-dialectal differences are in zeroth DCT coefficients, suggesting formant trajectory means tend to characterize such differences, while first DCT coefficient differences were more numerous for diphthongs. With respect to VISC, the most striking differences are that /uː/ is considerably more diphthongized in the Northern dialect and that the F2 trajectory of /əʊ/ proceeds in opposite directions in the two dialects. Cross-dialectal differences were found to be largely unaffected by the consonantal context in which the vowels were produced. The implications of the results are discussed in relation to VISC, consonantal context effects and speech perception.
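The DCT parametrization can be illustrated directly: for a formant trajectory sampled at N points, the zeroth DCT-II coefficient is proportional to the trajectory mean, and the first coefficient captures the magnitude and direction of its overall change. The F2 values below are invented for a rising (diphthongized) trajectory, not the study's measurements, and the unnormalized DCT-II convention here is one of several in use.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II: c_k = sum_n x_n * cos(pi * k * (2n + 1) / (2N))."""
    n = len(x)
    ns = np.arange(n)
    return np.array([np.dot(x, np.cos(np.pi * k * (2 * ns + 1) / (2 * n)))
                     for k in range(n)])

# Invented F2 trajectory (Hz) for a diphthongized vowel, rising over time.
f2 = np.array([1100.0, 1200.0, 1350.0, 1500.0, 1600.0])
c = dct2(f2)

mean_from_c0 = c[0] / len(f2)  # zeroth coefficient encodes the trajectory mean
print(mean_from_c0, c[1])      # c[1] < 0 under this convention: rising trajectory
```

In the study's terms, cross-dialectal differences in `c[0]` reflect different trajectory means, while differences in `c[1]` reflect different degrees or directions of VISC.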