Browsing by Author "Katz, William F."
Now showing 1 - 6 of 6
Item: Effects of Visual Feedback on the Production and Perception of Second Language Speech Sounds: A Comparison of Articulatory and Auditory Instruction (2020-09-10). Mehta, Sonya; Katz, William F.
A fundamental issue in speech science concerns the extent to which speech sounds are mentally represented by articulatory-motor and/or auditory-acoustic features. This dissertation aims to expand upon the current literature by investigating changes in production and perception following visual feedback training with either articulatory or acoustic speech targets. Eleven second language (L2) learners of English participated in a single session of pronunciation practice in which they produced either an English vowel (/æ/-/ɛ/) or consonant (/s/-/ʃ/) contrast while their speech movements and acoustics were recorded using an electromagnetic articulography system. Participants were randomly assigned to one of two training conditions (visual feedback or control) and one of two feedback conditions (articulatory or acoustic). Articulatory-based visual feedback was provided by a talker-controlled tongue avatar, while acoustic-based visual feedback was provided by a real-time sound spectrograph. Changes in production for vowel contrasts were acoustically analyzed by measuring the Euclidean distance between the two vowels. A subset of vowel and consonant tokens was additionally judged by native English listeners in a forced-choice perceptual discrimination task. In general, the results showed that talkers who received visual feedback training moderately improved their production accuracy compared to those exposed to the control condition, although this result did not reach significance. For vowel contrasts, acoustic and perceptual data demonstrated that articulatory-based visual feedback led to a similar magnitude of improvement as acoustic-based visual feedback.
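The vowel-contrast measure described above can be sketched as a Euclidean distance in F1/F2 formant space. A minimal illustration follows; the formant values are invented for the example and are not data from the study.

```python
import math

def vowel_contrast(f1_a, f2_a, f1_b, f2_b):
    """Euclidean distance between two vowels in F1/F2 formant space (Hz)."""
    return math.hypot(f1_a - f1_b, f2_a - f2_b)

# Hypothetical formant values for an /ae/-/eh/ pair (Hz)
pre = vowel_contrast(660, 1720, 550, 1770)   # before training
post = vowel_contrast(710, 1780, 530, 1840)  # after training
print(post > pre)  # a larger distance indicates a more distinct vowel contrast
```

A larger pre-to-post distance is the operational sign of an improved contrast under this measure.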
Of the two talkers who trained on consonant contrasts, the talker who practiced with articulatory-based visual feedback showed a greater increase in production accuracy than the talker who practiced with acoustic-based visual feedback. An analysis of the relationship between the changes in talkers’ L2 production and perception following training revealed a significant positive correlation between production of the trained sound contrast and its perceptual discrimination. Overall, these findings do not support the hypothesis that a single session of visual feedback modifies talkers’ internal representation of L2 speech sounds. These data are limited by the small number of participants and may reflect learning constraints imposed by a single training session. In addition, planned analyses of the kinematic data may yet reveal covert contrasts between the speech sounds that are not evident in the acoustic and perceptual data. Future work, including additional analyses of individual subject data, is needed to fully understand the mechanisms underlying visual feedback instruction.

Item: Evaluation of Consonant Error Patterns in Speech Perception and Production of Pediatric Cochlear Implant Users (2020-12-01). Peskova, Olga; Assmann, Peter F.; Abdi, Herve; Geers, Ann E.; Goffman, Lisa A.; Warner-Czyz, Andrea D.; Katz, William F.
This dissertation uses the framework of the speech chain to examine the associations between perception and production, and the associated error patterns, in children using cochlear implants (CIs), children with normal hearing (NH), and children with NH listening to vocoder simulations (NHV). Chapter 1 introduces background information on the populations represented by the three groups. This chapter focuses on typical perception and production development and introduces the challenges in evaluating perception and production patterns in children with hearing loss (HL).
Chapter 2 examines the association between consonant perception, production error patterns, and speech intelligibility in two groups of children using CIs: one group implanted at early ages using newer CI technologies, and the other group implanted at later ages using older CI technologies. Data from Chapter 2 help establish the methodological basis for constructing a new database described in Chapter 3. Chapter 3 examines perception and production error patterns in children using CIs who were implanted after 2010 using newer CI technologies, in comparison to NH and NHV control groups. Chapter 4 provides a general discussion that relates the findings of these three studies. Results from Chapter 2 indicate lower speech perception scores, with a higher number and greater variability of errors, in the CI group implanted with older technologies compared to the CI group implanted with newer technologies. Methodological limitations in the data presented in Chapter 2 did not permit a direct comparison of errors in consonant production and perception. The testing procedures developed in Chapter 3, which allowed for such comparisons, showed that production and perception error patterns generally did not mirror one another. Although no overall differences in mean error rate were observed between the CI and NH groups, the error patterns for individual consonants in these groups differed. Perception performance of children in the NHV group was worse than in the CI group, suggesting caution is needed when interpreting CI vocoder simulation studies in children.
The results provide important clinical information suggesting that intervention for children using CIs needs to consider how speech perception confusions interact with production errors in order to develop more effective techniques for addressing them in clinical protocols.

Item: Gestural Coordination of Modern Greek Consonant Clusters: Kinematic Analysis and Use of Supervised Learning Methods (2021-07-01). Doli, Evdoxia; Katz, William F.
Consonant clusters are groups of consonants not interrupted by a vowel and are produced by a set of articulatory gestures coordinated in time. Little is known about which factors affect the temporal overlap of these gestures. Articulatory Phonology (AP), an influential speech production theory that seeks to explain syllable organization patterns across languages, provides a useful framework for identifying articulatory landmarks and determining the temporal overlap of consonant clusters. According to AP, when consonants are added to the onset of a syllable, forming a consonant cluster (CCV), the timing of articulatory gestures readjusts so that the midpoint of these gestures remains the same as the midpoint of a syllable formed by a single consonant in its onset (CV). This phenomenon of temporal readjustment is known as the “c-center effect”. Previous literature examining onsets in different languages has shown that the “c-center effect” is not universal: onsets in some languages demonstrate this effect (i.e., complex onsets), while onsets in other languages do not (i.e., simple onsets).
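In AP terms, the c-center is the mean of the temporal midpoints of the onset consonant gestures; the "c-center effect" holds when adding a consonant shifts the individual gestures but leaves this mean stable. A minimal sketch of the computation, with invented timestamps (in ms) purely for illustration:

```python
def c_center(gestures):
    """Mean of the temporal midpoints of onset consonant gestures.

    gestures: list of (onset_ms, offset_ms) tuples, one per consonant.
    """
    midpoints = [(start + end) / 2 for start, end in gestures]
    return sum(midpoints) / len(midpoints)

# Hypothetical CV vs CCV onsets (ms)
cv = c_center([(100, 180)])              # single consonant: midpoint 140.0
ccv = c_center([(60, 140), (140, 220)])  # cluster: mean of 100.0 and 180.0
print(cv == ccv)  # True: the c-center is preserved despite the added consonant
```

A "simple" (non-c-centered) organization would instead keep, for example, the rightmost consonant's timing fixed, shifting the c-center leftward as consonants are added.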
Studies of clusters in different languages have suggested that three motoric factors can affect the temporal organization of an onset: (1) the place-order effect (POE), a motoric factor with perceptual implications; (2) the number of articulators (NART) forming a syllable (dependent and independent articulators); and (3) the degree of articulatory constraint (DAC), a principle that classifies speech sounds according to how much they resist coarticulation. This dissertation has three goals: first, to determine the temporal organization profile of consonant clusters in Standard Modern Greek, a language rich in clusters produced at different places of articulation (i.e., bilabial, labiodental, interdental, alveolar, velar) but with little prior kinematic description; second, to document the extent to which the three motoric factors (POE, NART, and DAC), considered individually or in combination, explain these temporal patterns; and third, to validate the AP approach of determining complex/simple temporal patterns from displacement data by performing classification with machine learning algorithms on subsets of the Modern Greek data. Seven native speakers of Modern Greek were recruited and instructed to produce words starting with consonant clusters and singletons. Their productions were recorded with a 3D electromagnetic articulograph. The results suggested that 8 of the 11 clusters examined were produced with a simple temporal organization (i.e., not “c-centered”), and all speakers were able to produce both complex and simple clusters. Next, we analyzed which factor (i.e., POE, DAC, NART) or factor combination yielded the highest accuracy in predicting the temporal organization of these eleven clusters. The results revealed that the place-order effect, the only motoric factor with perceptual implications, had the highest accuracy (54%). This finding suggests that perceptual constraints can affect gestural overlap in speech production.
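The classification component (the third goal above) can be sketched with scikit-learn's random forest. The arrays, labels, and parameters below are hypothetical stand-ins for the study's displacement data and pipeline, not its actual implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-in for displacement trajectories: each row is a
# fixed-length, time-normalized displacement signal for one token;
# labels mark simple (0) vs complex (1) temporal organization.
X = rng.normal(size=(200, 50))
y = rng.integers(0, 2, size=200)

# Hold out tokens to test generalization to unseen items
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # near chance here, since the labels are random
```

Generalization to new speakers, as tested in the study, would instead hold out all tokens from a given speaker rather than a random subset of tokens.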
No factor combination predicted the temporal profile of clusters with high accuracy; thus, it appears that the factors should instead be considered independently when predicting the temporal profile of consonant clusters. Machine learning algorithms (a random forest and an artificial neural network) were used to determine the simple/complex profile of Greek clusters through the classification of raw kinematic (displacement) data. Two different approaches were applied: the first examined whether the algorithms could predict the temporal profile of newly introduced clusters, and the second tested the models’ ability to generalize to different speakers. The results showed that these algorithms were able to predict cluster temporal organization with 66% overall accuracy for the artificial neural network and 70% accuracy for the random forest, supporting the methodological approach of AP for determining temporal organization from kinematic data. In addition, the models generalized better to new speakers than to new clusters, mainly because inter- and intra-talker variability in the data set was relatively small. Future work, including the analysis of a larger data set, is needed to better assess which factors and machine learning algorithms can predict clusters’ temporal organization most accurately. Overall, this dissertation provides new information for the field of speech production regarding motoric factors that affect temporal overlap, and contributes to the elaboration of the theoretical framework of Articulatory Phonology through the use of machine learning algorithms.

Item: Prelinguistic/Emerging Linguistic Gesture Use in Young Autistic Children (May 2023). De Froy, Adrienne Marie; Rollins, Pamela R.; Warner-Czyz, Andrea; Goffman, Lisa; Katz, William F.; Rojas, Raul
Autism is a complex neurodevelopmental disorder characterized by disruptions in early social communication skills.
Early gesture plays a key role in prelinguistic/emerging linguistic communication and may provide insight into a child’s social communication skills before they acquire speech. As such, gesture is of particular interest to autism researchers. However, little is known about the gesture production of young autistic children from diverse racial/ethnic backgrounds in the early stages of gesture development. In this dissertation, we explored the gesture production of young autistic children in the prelinguistic/emerging linguistic stage of development within dyadic interactions across three studies. Data were gathered from culturally and socioeconomically diverse autistic children (ages 18-59 months) and a parent who participated in one of two larger randomized controlled trials of an early autism intervention. Study 1 explored the associations among race/ethnicity, parent gesture rate, and child gesture rate. Results indicated that parents, like parents of non-autistic children, exhibited cross-racial/ethnic differences in gesture rate. However, child gesture rates were not related to the gesture rates of their parents. Consequently, children did not exhibit the same cross-racial/ethnic pattern of differences in gesture as their parents. Study 2 explored the relationship between elicitation task and child gesture. The rate and type of child gestures were compared across a naturalistic interaction with a parent and a standardized assessment of child communication administered by a research clinician. Results indicated that (1) the most gestures and (2) the most developmentally advanced gestures were produced within structured interactions with clinicians that tempted child communication, while the fewest gestures were produced within play. Further, developmental differences emerged, such that greater differences between tasks were exhibited by children with receptive language ages of 9 months or greater than by children below this developmental level.
Study 3 analyzed the relationships among motor skill, social skill, and gesture in two ways: (1) the roles of gross motor, fine motor, and social skills (as measured by standardized assessments) in the rate of child gesture, and (2) the relationship between the social sophistication of communication (level of communicative intention and coordination of communicative behaviors) and the motor complexity of points. We found that (1) standardized measures of social and gross motor (but not fine motor) skills explained unique variance in child gesture rate; (2a) coordination of communicative behaviors was positively related to the motor complexity of gesture, with a small effect size; and (2b) motor complexity was related to the intentionality of communication (i.e., the degree to which the point had to first get or direct the adult’s attention) rather than to the social sophistication of the intention. Taken together, the work presented here underscores the importance of using diverse samples in autism research and of interpreting results through a lens of cultural awareness. Our findings suggest an important role of motor skill in gesture production for young autistic children, which has implications for future early intervention research.

Item: Using Electromagnetic Articulography with a Tongue Lateral Sensor to Discriminate Manner of Articulation (Acoustical Society of America). Katz, William F.; Mehta, Sonya; Wood, Matthew; Wang, Jun
This study examined the contributions of the tongue tip (TT), tongue body (TB), and tongue lateral (TL) sensors in the electromagnetic articulography (EMA) measurement of American English alveolar consonants. Thirteen adults produced /ɹ/, /l/, /z/, and /d/ in /ɑCɑ/ syllables while being recorded with an EMA system. According to statistical analysis of sensor movement and the results of a machine classification experiment, the TT sensor contributed most to consonant differences, followed by TB.
The TL sensor played a complementary role, particularly for distinguishing /z/. © 2017 Acoustical Society of America.

Item: Visual Feedback of Tongue Movement for Novel Speech Sound Learning (Frontiers Media S.A.). Katz, William F.; Mehta, S.
Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one’s own speech articulation during speech training. The current study investigated whether real-time visual feedback for tongue movement can improve a speaker’s learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ/, a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers’ productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing.