How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex.
Recent neuroscience research suggests that tinnitus may reflect synaptic loss in the cochlea that does not express in the audiogram but leads to neural changes in auditory pathways that reduce sound level tolerance (SLT). Adolescents (N = 170) completed a questionnaire addressing their prior experience with tinnitus, potentially risky listening habits, and sensitivity to ordinary sounds, followed by psychoacoustic measurements in a sound booth. Among all adolescents 54.7% reported by questionnaire that they had previously experienced tinnitus, while 28.8% heard tinnitus in the booth. Psychoacoustic properties of tinnitus measured in the sound booth corresponded with those of chronic adult tinnitus sufferers. Neither hearing thresholds (≤15 dB HL to 16 kHz) nor otoacoustic emissions discriminated between adolescents reporting or not reporting tinnitus in the sound booth, but loudness discomfort levels (a psychoacoustic measure of SLT) did so, averaging 11.3 dB lower in adolescents experiencing tinnitus in the acoustic chamber. Although risky listening habits were near universal, the teenagers experiencing tinnitus and reduced SLT tended to be more protective of their hearing. Tinnitus and reduced SLT could be early indications of a vulnerability to hidden synaptic injury that is prevalent among adolescents and expressed following exposure to high level environmental sounds.
- Proceedings of the National Academy of Sciences of the United States of America
- Published about 2 years ago
The perception of the pitch of harmonic complex sounds is a crucial function of human audition, especially in music and speech processing. Whether the underlying mechanisms of pitch perception are unique to humans, however, is unknown. Based on estimates of frequency resolution at the level of the auditory periphery, psychoacoustic studies in humans have revealed several primary features of central pitch mechanisms. It has been shown that (i) pitch strength of a harmonic tone is dominated by resolved harmonics; (ii) pitch of resolved harmonics is sensitive to the quality of spectral harmonicity; and (iii) pitch of unresolved harmonics is sensitive to the salience of temporal envelope cues. Here we show, for a standard musical tuning fundamental frequency of 440 Hz, that the common marmoset (Callithrix jacchus), a New World monkey with a hearing range similar to that of humans, exhibits all of the primary features of central pitch mechanisms demonstrated in humans. Thus, marmosets and humans may share similar pitch perception mechanisms, suggesting that these mechanisms may have emerged early in primate evolution.
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 2 years ago
The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced “impairment” in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.
Bats are among the most gregarious and vocal mammals, with some species demonstrating a diverse repertoire of syllables under a variety of behavioral contexts. Despite extensive characterization of big brown bat (Eptesicus fuscus) biosonar signals, there have been no detailed studies of adult social vocalizations. We recorded and analyzed social vocalizations and associated behaviors of captive big brown bats under four behavioral contexts: low aggression, medium aggression, high aggression, and appeasement. Even limited to these contexts, big brown bats possess a rich repertoire of social vocalizations, with 18 distinct syllable types automatically classified using a spectrogram cross-correlation procedure. For each behavioral context, we describe vocalizations in terms of syllable acoustics, temporal emission patterns, and typical syllable sequences. Emotion-related acoustic cues are evident within the call structure by context-specific syllable types or variations in the temporal emission pattern. We designed a paradigm that could evoke aggressive vocalizations while monitoring heart rate as an objective measure of internal physiological state. Changes in the magnitude and duration of elevated heart rate scaled to the level of evoked aggression, confirming the behavioral state classifications assessed by vocalizations and behavioral displays. These results reveal a complex acoustic communication system among big brown bats in which acoustic cues and call structure signal the emotional state of a caller.
Older adults frequently complain that while they can hear a person talking, they cannot understand what is being said; this difficulty is exacerbated by background noise. Peripheral hearing loss cannot fully account for this age-related decline in speech-in-noise ability, as declines in central processing also contribute to this problem. Given that musicians have enhanced speech-in-noise perception, we aimed to define the effects of musical experience on subcortical responses to speech and speech-in-noise perception in middle-aged adults. Results reveal that musicians have enhanced neural encoding of speech in quiet and noisy settings. Enhancements include faster neural response timing, higher neural response consistency, more robust encoding of speech harmonics, and greater neural precision. Taken together, we suggest that musical experience provides perceptual benefits in an aging population by strengthening the underlying neural pathways necessary for the accurate representation of important temporal and spectral features of sound.
Echolocating bats use the time elapsed from biosonar pulse emission to the arrival of echo (defined as echo-delay) to assess target-distance. Target-distance is represented in the brain by delay-tuned neurons that are classified as either “heteroharmonic” or “homoharmormic.” Heteroharmonic neurons respond more strongly to pulse-echo pairs in which the timing of the pulse is given by the fundamental biosonar harmonic while the timing of echoes is provided by one (or several) of the higher order harmonics. On the other hand, homoharmonic neurons are tuned to the echo delay between similar harmonics in the emitted pulse and echo. It is generally accepted that heteroharmonic computations are advantageous over homoharmonic computations; i.e., heteroharmonic neurons receive information from call and echo in different frequency-bands which helps to avoid jamming between pulse and echo signals. Heteroharmonic neurons have been found in two species of the family Mormoopidae (Pteronotus parnellii and Pteronotus quadridens) and in Rhinolophus rouxi. Recently, it was proposed that heteroharmonic target-range computations are a primitive feature of the genus Pteronotus that was preserved in the evolution of the genus. Here, we review recent findings on the evolution of echolocation in Mormoopidae, and try to link those findings to the evolution of the heteroharmonic computation strategy (HtHCS). We stress the hypothesis that the ability to perform heteroharmonic computations evolved separately from the ability of using long constant-frequency echolocation calls, high duty cycle echolocation, and Doppler Shift Compensation. Also, we present the idea that heteroharmonic computations might have been of advantage for categorizing prey size, hunting eared insects, and living in large conspecific colonies. We make five testable predictions that might help future investigations to clarify the evolution of the heteroharmonic echolocation in Mormoopidae and other families.
A microfluidic device was developed to separate heterogeneous particle or cell mixtures in a continuous flow using acoustophoresis. In this device, two identical surface acoustic waves (SAWs) generated by interdigital transducers (IDTs) propagated toward a microchannel, which accordingly built up a standing surface acoustic wave (SSAW) field across the channel. A numerical model, coupling a piezoelectric effect in the solid substrate and acoustic pressure in the fluid, was developed to provide a better understanding of SSAW-based particle manipulation. It was found that the pressure nodes across the channel were individual planes perpendicular to the solid substrate. In the separation experiments, two side sheath flows hydrodynamically focused the injected particle or cell mixtures into a very narrow stream along the centerline. Particles flowing through the SSAW field experienced an acoustic radiation force that highly depends on the particle properties. As a result, dissimilar particles or cells were laterally attracted toward the pressure nodes at different magnitudes, and were eventually switched to different outlets. Two types of fluorescent microspheres with different sizes were successfully separated using the developed device. In addition, E. coli bacteria pre-mixed in peripheral blood mononuclear cells (PBMCs) were also efficiently isolated using the SSAW-base separation technique. Flow cytometric analysis on the collected samples found that the purity of separated E. coli bacteria was 95.65%.
Primate loud calls have the potential to encode information about the identity, arousal, age, or physical condition of the caller, even at long distances. In this study, we conducted an analysis of the acoustic features of the loud calls produced by a species of Asian colobine monkey (simakobu, Simias concolor). Adult male simakobu produce loud calls spontaneously and in response to loud sounds and other loud calls, which are audible more than 500 m. Individual differences in calling rates and durations exist, but it is unknown what these differences signal and which other acoustic features vary among individuals. We aimed to describe the structure and usage of calls and to examine acoustic features that vary within and among individuals. We determined the context of 318 loud calls and analyzed 170 loud calls recorded from 10 adult males at an undisturbed site, Pungut, Siberut Island, Indonesia. Most calls (53%) followed the loud call of another male, 31% were spontaneous, and the remaining 16% followed a loud environmental disturbance. The fundamental frequency (F0) decreased while inter-unit intervals (IUI) increased over the course of loud call bouts, possibly indicating caller fatigue. Discriminant function analysis indicated that calls were not well discriminated by context, but spontaneous calls had higher peak frequencies, suggesting a higher level of arousal. Individual calls were distinct and individuals were mainly discriminated by IUI, call duration, and F0. Loud calls of older males had shorter IUI and lower F0, while middle-aged males had the highest peak frequencies. Overall, we found that calls were individually distinct and may provide information about the age, stamina, and arousal of the calling male, and could thus be a way for males and females to assess competitors and mates from long distances.
Language is a distinguishing characteristic of our species, and the course of its evolution is one of the hardest problems in science. It has long been generally considered that human speech requires a low larynx, and that the high larynx of nonhuman primates should preclude their producing the vowel systems universally found in human language. Examining the vocalizations through acoustic analyses, tongue anatomy, and modeling of acoustic potential, we found that baboons (Papio papio) produce sounds sharing the F1/F2 formant structure of the human [ɨ æ ɑ ɔ u] vowels, and that similarly with humans those vocalic qualities are organized as a system on two acoustic-anatomic axes. This confirms that hominoids can produce contrasting vowel qualities despite a high larynx. It suggests that spoken languages evolved from ancient articulatory skills already present in our last common ancestor with Cercopithecoidea, about 25 MYA.