This study investigated how forty-six mothers modified their talk about familiar and unfamiliar nouns and verbs when interacting with their children with Down syndrome (DS), language impairment (LI), or typical development (TD). Children (MLUs < 2·7) were group-matched on expressive vocabulary size. Mother-child dyads were recorded playing with toy animals (noun task) and action boxes (verb task). Mothers of children with DS used shorter utterances and more verb labels in salient positions than mothers in the other two groups. All mothers produced unfamiliar target nouns in short utterances, in utterance-final position, and with the referent perceptually available. Mothers also talked more about familiar nouns and verbs than unfamiliar ones, and labelled them more often and more consistently. These findings suggest that mothers of children in the early period of language development fine-tune their input in ways that reflect their children's vocabulary knowledge, but do so differently for nouns and verbs.
The ability to understand speech is significantly degraded by aging, particularly in noisy environments. One way that older adults cope with this hearing difficulty is through the use of contextual cues. Several behavioral studies have shown that older adults are better at following a conversation when the target speech signal has high contextual content or when the background distractor is not meaningful. Specifically, older adults gain significant benefit in focusing on and understanding speech if the background is spoken by a talker in a language that is not comprehensible to them (i.e., a foreign language). To better understand the neural mechanisms underlying this benefit in older adults, we investigated aging effects on midbrain and cortical encoding of speech in the presence of a single competing talker speaking a language that is either meaningful or meaningless to the listener (i.e., English vs. Dutch). Our results suggest that neural processing is strongly affected by the informational content of noise. Specifically, older listeners' cortical responses to the attended speech signal deteriorate less when the competing speech signal is in an incomprehensible language than when it is in their native language. Conversely, temporal processing in the midbrain is affected by different backgrounds only during rapid changes in speech, and only in younger listeners. Additionally, we found that cognitive decline is associated with an increase in cortical envelope tracking, suggesting an age-related overuse (or inefficient use) of cognitive resources that may explain older adults' difficulty in processing speech targets while trying to ignore interfering noise.
Children who hear large amounts of diverse speech learn language more quickly than children who do not. However, high correlations between the amount and the diversity of the input in speech samples make it difficult to isolate the influence of each. We overcame this problem by controlling the input to a computational model so that amount of exposure to linguistic input (quantity) and the quality of that input (lexical diversity) were independently manipulated. Sublexical, lexical, and multi-word knowledge were charted across development (Study 1), showing that while input quantity may be important early in learning, lexical diversity is ultimately more crucial, a prediction confirmed against children’s data (Study 2). The model trained on a lexically diverse input also performed better on nonword repetition and sentence recall tests (Study 3) and was quicker to learn new words over time (Study 4). Language input that is rich in lexical diversity outperforms an equivalent richness in quantity for learned sublexical and lexical knowledge, for performance on well-established language tests, and for acquiring words that have never been encountered before.
Investigating the evolution of human speech is difficult and controversial because human speech surpasses nonhuman primate vocal communication in scope and flexibility [1-3]. For over 50 years, monkey vocalizations were assumed to be largely innate, highly affective, and stereotyped [4, 5]. Recently, this perception has dramatically changed. Current studies have revealed distinct learning mechanisms during vocal development [6-8] and vocal flexibility, allowing monkeys to cognitively control when [9, 10], where, and what to vocalize [10, 12, 13]. However, specific call features (e.g., duration, frequency) remain surprisingly robust and stable in adult monkeys, resulting in rather stereotyped and discrete call patterns. Additionally, monkeys seem to be unable to modulate their acoustic call structure under reinforced conditions beyond natural constraints [15, 16]. Behavioral experiments have shown that monkeys can stop sequences of calls immediately after acoustic perturbation but cannot interrupt ongoing vocalizations, suggesting that calls consist of single impartible pulses [17, 18]. Using acoustic perturbation triggered by the vocal behavior itself and quantitative measures of the resulting vocal adjustments, we show that marmoset monkeys are capable of producing calls with durations beyond the natural boundaries of their repertoire by interrupting ongoing vocalizations rapidly after perturbation onset. Our results indicate that marmosets are capable of interrupting vocalizations only at periodic time points throughout calls, a finding further supported by the occurrence of periodically segmented phees. These findings overturn decades-old concepts of primate vocal pattern generation, indicating that vocalizations do not consist of one discrete call pattern but are built of many sequentially uttered units, like human speech.
Infants differ substantially in their rates of language growth, and slow growth predicts later academic difficulties. In this study, we explored how the amount of speech directed to infants in Spanish-speaking families low in socioeconomic status influenced the development of children’s skill in real-time language processing and vocabulary learning. All-day recordings of parent-infant interactions at home revealed striking variability among families in how much speech caregivers addressed to their child. Infants who experienced more child-directed speech became more efficient in processing familiar words in real time and had larger expressive vocabularies by the age of 24 months, whereas speech simply overheard by the child was unrelated to vocabulary outcomes. Mediation analyses showed that the effect of child-directed speech on expressive vocabulary was explained by infants' language-processing efficiency, which suggests that richer language experience strengthens processing skills that facilitate language growth.
We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database, featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech; the database is made publicly available for research purposes. We begin by demonstrating that, for automatic speech recognition (ASR), it pays off to know whether speakers are eating. We then propose automatic classification based both on brute-forced low-level acoustic features and on higher-level features related to intelligibility, obtained from an automatic speech recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i.e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, reaching up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food as well as not eating. The early fusion of intelligibility-related features with the brute-forced acoustic feature set improves performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with a coefficient of determination of up to 56.2%.
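The evaluation protocol described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a toy nearest-centroid classifier stands in for the SVM, the data and function names are hypothetical, and only the leave-one-speaker-out splitting and the unweighted-average-recall metric follow the abstract.

```python
from collections import defaultdict

def nearest_centroid_fit(X, y):
    # Compute one centroid (mean feature vector) per class.
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(centroids, x):
    # Assign x to the class whose centroid is closest (squared Euclidean).
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

def loso_uar(data):
    """Leave-one-speaker-out evaluation returning unweighted average recall.

    data: list of (speaker_id, feature_vector, label) tuples.
    Each fold holds out all utterances of one speaker, so no speaker
    appears in both the training and the test partition.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    speakers = {s for s, _, _ in data}
    for held_out in speakers:
        train = [(x, y) for s, x, y in data if s != held_out]
        test = [(x, y) for s, x, y in data if s == held_out]
        centroids = nearest_centroid_fit([x for x, _ in train],
                                         [y for _, y in train])
        for x, y in test:
            totals[y] += 1
            hits[y] += nearest_centroid_predict(centroids, x) == y
    # Unweighted average recall: mean of per-class recalls,
    # so rare classes count as much as frequent ones.
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

Speaker-independent splitting matters here because an utterance-level split would let the classifier exploit speaker identity rather than the eating condition itself.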
Experiments involving verbal self-monitoring show that memory for spoken words varies with the type of sensory feedback: memory is better when words are spoken aloud than when they are lip-synched or covertly produced. Such effects can be explained by the Central Monitoring Theory (CMT) via a process that matches a forward model reflecting the expected sensory effects of practiced forms against actual sensory information during speech. But CMT overlooks factors of shared attention, as achieved by speaker-listener gaze, and implies that sensory feedback may not affect the learning of unpracticed forms (non-words). These aspects of CMT were examined in two self-monitoring experiments focusing on oro-sensory feedback. In Experiment 1 we show that varying feedback creates differential effects on memory for spoken words and that speaker-listener gaze alters these effects. Using non-words, Experiment 2 shows the absence of differential feedback effects. The results confirm CMT but suggest the need to refine the theory in terms of processes that mediate attention.
The innovation of iconic gestures is essential to establishing the vocabularies of signed languages, but might iconicity also play a role in the origin of spoken words? Can people create novel vocalizations that are comprehensible to naïve listeners without prior convention? We launched a contest in which participants submitted non-linguistic vocalizations for 30 meanings spanning actions, humans, animals, inanimate objects, properties, quantifiers and demonstratives. The winner was determined by the ability of naïve listeners to infer the meanings of the vocalizations. We report a series of experiments and analyses that evaluated the vocalizations for: (1) comprehensibility to naïve listeners; (2) the degree to which they were iconic; (3) agreement between producers and listeners in iconicity; and (4) whether iconicity helps listeners learn the vocalizations as category labels. The results show that contestants were able to create successful iconic vocalizations for most of the meanings, which were largely comprehensible to naïve listeners and easier to learn as category labels. These findings demonstrate how iconic vocalizations can enable interlocutors to establish understanding in the absence of conventions. They suggest that, prior to the advent of full-blown spoken languages, people could have used iconic vocalizations to ground a spoken vocabulary with considerable semantic breadth.
The receptive language measure information-carrying word (ICW) level is used extensively by speech and language therapists in the UK and Ireland. Despite this, it has never been validated via its relationship to any other relevant measures. This study aims to validate the ICW measure by investigating the relationship between the receptive ICW score of children with specific language impairment (SLI) and their performance on standardized memory and language assessments. Twenty-seven children with SLI, aged between 5;07 and 8;11, completed a sentence comprehension task in which the instructions gradually increased in number of ICWs. The children also completed subtests from the Working Memory Test Battery for Children and the Clinical Evaluation of Language Fundamentals-4. Results showed a significant positive relationship between both language and memory measures and children’s ICW score. While both receptive and expressive language contributed significantly to children’s ICW score, the contribution of memory was solely determined by children’s working memory ability. ICW score is thus a valid measure of the language ability of children with SLI. However, therapists should also be cognisant of its strong association with working memory when using this construct in assessment or intervention methods.
In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different from those during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features, similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech but, rather, represents auditory vocal information.