Gestural Coordination of Modern Greek Consonant Clusters : Kinematic Analysis and Use of Supervised Learning Methods







Consonant clusters are groups of consonants not interrupted by a vowel and are produced by a set of articulatory gestures that coordinate in time. Little is known about which factors affect the temporal overlap of these gestures. Articulatory Phonology (AP), an influential speech production theory that seeks to explain syllable organization patterns across languages, provides a useful framework for identifying articulatory landmarks and determining the temporal overlap of consonant clusters. According to AP, when consonants are added to the onset of a syllable, thus forming a consonant cluster (CCV), the timing of the articulatory gestures readjusts so that the midpoint of the cluster's gestures keeps the same timing as the midpoint of the single onset consonant in a CV syllable. This temporal readjustment of gestures is known as the “c-center effect”. Previous literature examining onsets in different languages has shown that the “c-center effect” is not universal: onsets in some languages demonstrate this effect (i.e., complex onsets), while onsets in other languages do not (i.e., simple onsets). Studies of clusters in different languages have suggested that three motoric factors can affect the temporal organization of an onset: (1) the place-order effect (POE), a motoric factor with perceptual implications, (2) the number of articulators (NART) forming a syllable (dependent and independent articulators), and (3) the degree of articulatory constraint (DAC), a principle that classifies speech sounds according to how much they resist coarticulation. This dissertation has three goals: First, to determine the temporal organization profile of consonant clusters in Standard Modern Greek, a language that is rich in clusters produced at different places of articulation (i.e., bilabial, labiodental, interdental, alveolar, velar) but has received little kinematic description.
Second, to document the extent to which the three motoric factors (POE, NART, and DAC), considered individually or in combination, explain these temporal patterns. Third, to validate the AP claim that complex/simple temporal patterns can be determined from displacement data, by applying machine learning classification to subsets of the Modern Greek data. Seven native speakers of Modern Greek were recruited and instructed to produce words starting with consonant clusters and singletons. Their productions were recorded using a 3D electromagnetic articulograph. The results suggested that 8 of the 11 clusters examined were produced with a simple temporal organization (i.e., not “c-centered”), and all speakers were able to produce both complex and simple clusters. Next, the analysis examined which factor (i.e., POE, DAC, NART) or factor combination yielded the highest accuracy in predicting the temporal organization of these eleven clusters. The results revealed that the place-order effect, the only motoric factor with perceptual implications, had the highest accuracy (54%). This finding suggests that perceptual constraints can affect gestural overlap in speech production. No factor combination predicted the temporal profile of clusters with high accuracy; thus, it appears that the factors should be considered independently when predicting the temporal profile of consonant clusters. Machine learning algorithms (a random forest and an artificial neural network) were used to determine the simple/complex profile of Greek clusters through the classification of raw kinematic (displacement) data. Two different approaches were applied: the first examined whether the algorithms could predict the temporal profile of newly introduced clusters, and the second tested the models' ability to generalize to different speakers.
The results showed that these algorithms were able to predict cluster temporal organization with 66% overall accuracy for the artificial neural network and 70% accuracy for the random forest, supporting the methodological approach of AP for determining temporal organization from kinematic data. In addition, the models were able to generalize better to new speakers, mainly because inter- and intra-talker variability in the data set was relatively small. Future work, including the analysis of a larger data set, is needed to better assess which factors and machine learning algorithms can predict clusters’ temporal organization most accurately. Overall, this dissertation provides new information for the field of speech production regarding motor factors that affect temporal overlap, and it contributes to the elaboration of the theoretical framework of Articulatory Phonology through the use of machine learning algorithms.
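The c-center comparison described in the abstract can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the procedure used in the dissertation: each consonant gesture is reduced to a single midpoint timestamp in milliseconds, lags are measured to a hypothetical anchor landmark later in the word, the "right edge" is approximated by the midpoint of the last onset consonant, and the function names (`c_center`, `onset_lags`, `classify_onset`) and the decision rule are invented for this example.

```python
from statistics import mean


def c_center(midpoints):
    """Mean of the temporal midpoints (ms) of the onset consonant gestures."""
    return mean(midpoints)


def onset_lags(onset_midpoints, anchor_time):
    """Return (c-center-to-anchor lag, right-edge-to-anchor lag) in ms.

    Assumption: the right edge is approximated by the midpoint of the
    last consonant in the onset, and anchor_time is a landmark that
    follows the onset (hypothetical choice for this sketch).
    """
    cc_lag = anchor_time - c_center(onset_midpoints)
    right_edge_lag = anchor_time - max(onset_midpoints)
    return cc_lag, right_edge_lag


def classify_onset(cv_midpoints, ccv_midpoints, cv_anchor, ccv_anchor):
    """Label a cluster 'complex' or 'simple' with a toy decision rule.

    The lag that changes least between the CV and CCV productions is
    treated as the stable one: a stable c-center lag suggests a complex
    (c-centered) onset, a stable right-edge lag suggests a simple onset.
    """
    cv_cc, cv_re = onset_lags(cv_midpoints, cv_anchor)
    ccv_cc, ccv_re = onset_lags(ccv_midpoints, ccv_anchor)
    cc_shift = abs(ccv_cc - cv_cc)
    re_shift = abs(ccv_re - cv_re)
    return "complex" if cc_shift < re_shift else "simple"


# Made-up timestamps: the cluster's gestures straddle the singleton's midpoint.
print(classify_onset([100], [60, 140], 300, 300))  # complex

# Made-up timestamps: the added consonant leaves the last consonant in place.
print(classify_onset([100], [20, 100], 300, 300))  # simple
```

Real analyses typically compare the variability of these intervals across many repetitions and speakers rather than a single pair of tokens, so the rule above only conveys the logic of the comparison.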



Machine learning, Phonetics, Motor ability, Speech, Speech acts (Linguistics)