Department of Linguistics Faculty and Researcher Data and Papers
Permanent URI for this collection: https://digital.lib.washington.edu/handle/1773/25267
Recent Submissions
Acoustic absement in detail: Quantifying acoustic differences across time-series representations of speech data (2023)
Kelley, Matthew C
The speech signal is a consummate example of time-series data. The acoustics of the signal change over time, sometimes dramatically. Yet, the most common type of comparison we perform in phonetics is between instantaneous acoustic measurements, such as formant values. In the present paper, I discuss the concept of absement as a quantification of differences between two time-series. I then provide an experimental example of absement applied to phonetic analysis for human and/or computer speech recognition. The experiment is a template-based speech recognition task, using dynamic time warping to compare the acoustics between recordings of isolated words. A recognition accuracy of 57.9% was achieved. The results of the experiment are discussed in terms of using absement as a tool, as well as the implications of using acoustics-only models of spoken word recognition with the word as the smallest discrete linguistic unit.

Production and perception of novel consonant clusters (2023-05-08)
Kelley, Matthew C.; Oganyan, Marina; Wright, Richard A.
The present study details a visual world paradigm eye tracking experiment on perception and production of novel consonant clusters. The clusters varied in difficulty based on sonority or perceptual salience scales. During the familiarization phase, listeners heard and watched a story on-screen and produced the names of novel creatures and objects. Each story focused on one cluster. Four creatures/objects were introduced corresponding to the (1) target cluster (e.g., [dlɑpɹ]), (2) epenthesized cluster competitor (e.g., [dəlɑkn]), (3) first cluster consonant (e.g., donkey), and (4) second cluster consonant (e.g., lollipop). During each trial, participants verbally identified one of the four creatures/objects, and their gaze was tracked.
We analyzed the productions and gazes to investigate the relative difficulty of producing and perceiving the consonant clusters. We predicted participants would (1) correctly produce/perceive a cluster, (2) epenthesize a vowel/perceive vowel epenthesis, or (3) perceive/produce only one of the cluster sounds. We statistically analyzed the participants’ propensity toward each behavior in both production and perception. As part of the analysis, we examined whether participant behavior was better predicted by analyzing the clusters with the sonority sequencing principle or in terms of acoustic salience and cue recoverability.

The recognition of spoken pseudowords (2022)
Kelley, Matthew C; Tucker, Benjamin V
Pseudowords are used as stimuli in many psycholinguistic experiments, yet they remain largely under-researched. To better understand the cognitive processing of pseudowords, we analysed the pseudoword responses in the Massive Auditory Lexical Decision megastudy data set. Linguistic characteristics that influence the processing of real English words – namely, phonotactic probability, phonological neighbourhood density, uniqueness point, and morphological complexity – were also found to influence the processing time of spoken pseudowords. Subsequently, we analysed how the linguistic characteristics of non-unique portions of pseudowords influenced processing time. We again found that the named linguistic characteristics affected processing time, highlighting the dynamicity of activation and competition. We argue these findings also speak to learning new words and spoken word recognition generally. We then discuss what aspects of pseudoword recognition a full model of spoken word recognition must account for. We finish with a re-description of the auditory lexical decision task in light of our results.
This is an Accepted Manuscript of an article published by Taylor & Francis in Language, Cognition, and Neuroscience in 2022, available at: https://www.tandfonline.com/10.1080/23273798.2022.2053729.

Perception and timing of acoustic distance (2022-05-26)
Kelley, Matthew C.; Tucker, Benjamin V.
The notion of acoustic distance figures into many aspects of phonetics, including phonological neighborhoods. A measurement of word-level acoustic distance useful for cognitive modeling must account for two listener characteristics: sensitivity to acoustic differences and sensitivity to duration discrepancies between words. The present work used dynamic time warping to measure how acoustic distance accumulates between words over time. The results of a distance rating task with synthesized vowels were used as a basis for selecting a mathematical function that best matched listener sensitivities. Additionally, the results of a reminder task with synthesized vowels were used to determine a just noticeable difference threshold for vowel duration. The results suggested that a distance function based on the 4.5-norm using a 30 ms radius for dynamic time warping best matched human behavior. A third analysis used these dynamic time warping configurations to model reaction times in an auditory lexical decision task and found that Euclidean distance and no temporal constraints on dynamic time warping best matched human behavior.
These results are discussed in relation to spoken word recognition models, including how to assess the acoustic match between the speech signal and a word in the lexicon.

AGGREGATION (2020)
Bender, Emily M.; Howell, Kristen; Xia, Fei; Zamaraeva, Olga; Goodman, Michael Wayne; Crowgey, Joshua; Packard, Woodley; Lockwood, Michael Wayne; Lepp, Haley; Ramaswamy, Swetha; Bateman, Emma; Heath, Jeff; Inman, David; Burrel, Alex; Zhang, Claude; Flickinger, Dan; Oepen, Stephan; Drellishak, Scott; Poulson, Laurie; O'Hara, Kelly; Fokkens, Antske; Hou, Joshua; Mills, Daniel P.; Song, Sanghoun; Halgrim, Scott; Wax, David; Gracheva, Varya; Trimble, TJ; Curtis, Chris; Dermer, Laurie; Haeger, Mike; Nielsen, Elizabeth; Nordlinger, Rachel
The AGGREGATION Project aims to bring the benefits of grammar engineering to language documentation without requiring field linguists to become grammar engineers. We achieve this by automatically creating precision grammars on the basis of analyses and annotations already produced by field linguists together with a typologically-grounded cross-linguistic grammar resource (the LinGO Grammar Matrix) and natural language processing techniques developed for high-resource languages. Precision grammars are machine-readable encodings of mutually-consistent linguistic hypotheses, in our case, concerning morphotactics, morphosyntax and the syntax-semantics interface. They can be used to automatically process text, assigning structures to input strings and strings to input semantic representations. Text processed in this way can then be searched for sentences or word forms with structures of interest or items that are not covered by the grammar (i.e. fall outside current hypotheses).

Separating segmental and prosodic contributions to intelligibility (2013-09)
McCloy, Daniel Robert
It is well known that the intelligibility of speech can vary both across individuals within styles or tasks, and within individuals across styles or tasks. Various properties of the speech signal have been shown to correlate with such differences in intelligibility, including speech rate [5,7,8], segmental reduction or deletion [1], vowel space size [1,2,4,6], pitch range [2], and pitch accent deletion [3]. However, these dimensions are rarely (if ever) manipulated independently in natural speech. This poses a challenge to understanding the sources of individual differences in intelligibility (both across individuals and across styles), and makes it difficult to know whether any particular dimension measured causes speech to be more or less intelligible, or merely indexes some other aspect of speech that is responsible for intelligibility differences. As an alternative to measuring fine-grained dimensions of the speech signal, this research makes a broad distinction between prosodic dimensions (pitch, intensity, and duration) on one hand, and segmental content on the other. Through careful resynthesis, a corpus of parallel sentences is created that effectively holds constant either prosody or segmental content across resynthesized “talkers”. High-quality stimuli are achieved by hand-correction of glottal pulse epochs and semi-automated hand segmentation of syllable durations, followed by automated dynamic time warping of durations and swapping of pitch and intensity contours. Results from a speech-in-noise task with both unmodified and resynthesized stimuli show that talkers with low intrinsic intelligibility may have relatively “good” prosody, evidenced by improvements in intelligibility when their prosody is mapped onto other talkers’ waveforms.
In contrast, talkers with high intrinsic intelligibility may have relatively “bad” prosody, evidenced by lower intelligibility caused by mapping their prosody onto other talkers. A linear mixed-effects regression model (controlling for signal processing distortion and variation in sentence difficulty) supports this view: patterns of coefficients for “prosodic donor” and “segmental donor” show different rankings than the overall intelligibility scores for unmodified talkers. Comparison between these patterns and post-hoc acoustic analyses of the stimuli allows classification of acoustic predictors based on how well they correlate with “prosodic donor” or “segmental donor” coefficient patterns.
References
[1] Bond, Z. S., & Moore, T. J. (1994). A note on the acoustic-phonetic characteristics of inadvertently clear speech. Speech Communication, 14(4), 325–337. doi: 10.1016/0167-6393(94)90026-4.
[2] Bradlow, A. R., Torretta, G. M., & Pisoni, D. B. (1996). Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Communication, 20(3-4), 255–272. doi: 10.1016/S0167-6393(96)00063-5.
[3] Clopper, C. G., & Smiljanić, R. (2011). Effects of gender and regional dialect on prosodic patterns in American English. Journal of Phonetics, 39(2), 237–245. doi: 10.1016/j.wocn.2011.02.006.
[4] Hazan, V., & Markham, D. (2004). Acoustic-phonetic correlates of talker intelligibility for adults and children. The Journal of the Acoustical Society of America, 116(5), 3108–3118. doi: 10.1121/1.1806826.
[5] Mayo, C., Aubanel, V., & Cooke, M. (2012). Effect of prosodic changes on speech intelligibility. Paper presented at the 13th Annual Conference of the International Speech Communication Association. In INTERSPEECH-2012. url: http://interspeech2012.org/accepted-abstract.html?id=661
[6] Neel, A. T. (2008). Vowel space characteristics and vowel identification accuracy. Journal of Speech, Language, and Hearing Research, 51(3), 574–585. doi: 10.1044/1092-4388(2008/041).
[7] Sommers, M. S., Nygaard, L. C., & Pisoni, D. B. (1994). Stimulus variability and spoken word recognition I: Effects of variability in speaking rate and overall amplitude. The Journal of the Acoustical Society of America, 96(3), 1314–1324. doi: 10.1121/1.411453.
[8] Tolhurst, G. C. (1957). Effects of duration and articulation changes on intelligibility, word reception and listener preference. Journal of Speech and Hearing Disorders, 22(3), 328–334.

Corpus-based productivity measures of English -er agentives and instrumentals (University of Washington, 2013)
McCloy, Daniel Robert
This paper investigates the claim that agentive and instrumental forms of the English “-er” morpheme show differing productivity (a claim due to Derwing 1976). An attempt is made to replicate Derwing’s findings using modern corpus methods. Novel annotations for animacy and agentivity/instrumentality were created on the Brown corpus (Kučera and Francis, 1967). Findings show agentive “-er” is much more frequent than instrumental “-er” (>5× token frequency, >3× type frequency). Exponential modeling suggests the productivity of instrumental “-er” is not less than agentive “-er”, and perhaps slightly greater (contra Derwing). Agentive/instrumental annotations also reveal many difficult-to-classify cases. However, productivity values based on agentivity/instrumentality were mirrored by those based on animate/inanimate distinctions. This parity arises from high correlation between the agentive and animate categories, and suggests that future studies with larger corpora could safely rely on animacy as a proxy for agentivity.

Modelling talker intelligibility variation in a dialect-controlled corpus (2012)
McCloy, Daniel Robert; Wright, Richard A.; McGrath, August T. D.
In a newly created corpus of 3600 read sentences (20 talkers × 180 sentences), considerable variability in talker intelligibility has been found.
This variability occurs despite rigorous attempts to ensure uniformity, including strict dialectal criteria in subject selection, speech style guidance with feedback during recording, and head-mounted microphones to ensure consistent signal-to-noise ratio. Nonetheless, we observe dramatic differences in talker intelligibility when the sentences are presented to dialect-matched listeners in noise. We fit a series of linear mixed-effects models using several acoustic characteristics as fixed-effect predictors, with random effects terms controlling for both talker and listener variability. Results indicate that between-talker variability is captured by speech rate, vowel space expansion, and phonemic crowding. These three dimensions account for virtually all of the talker-related variance, obviating the need for a random effect for talker in the model. Vowel space expansion is found to be best captured by polygonal area (contra Bradlow et al. 1996), and phonemic overlap is best captured by repulsive force (cf. Liljencrants & Lindblom 1972, Wright 2004). Results are discussed in relation to prior studies of intelligibility.

Vowel laxing in Indonesian as a test case for interaction of morphological and syllabic structure (2011-05)
McCloy, Daniel Robert
Lax (also known as centralized) vowel allophones are attested in Indonesian for non‐low vowels in closed syllables [e.g., Sneddon (1996)]. In consonant‐final stems with vowel‐initial suffixes (ke+apik+an), phonological theory (the maximal onset principle) predicts the stem‐final consonant to syllabify with the suffix (ke.a.pi.kan) and the preceding vowel to manifest as unreduced. This study compares speaker‐normalized formant values for such vowels against formant values for vowels in stem‐final open syllables with obstruent‐initial affixes (men+jadi+kan) and vowels in monomorphemic contexts (tikam). Word‐final open and closed syllables (jadi, cerdik) are included as reference points in the vowel space.
Male and female L1 speakers of standard Indonesian are recorded reading three randomizations of the word list. Data collection is ongoing, but preliminary results suggest that for front (unround) vowels in stem‐final closed syllables with vowel‐initial affixes (ke+apik+an), the formant values fall between the values for prototypical lax and non‐lax Indonesian vowels; no clear pattern has yet emerged for back (round) vowels. Findings suggest that morphological structure (i.e., whether a segment “belongs” to stem or affix) may constrain syllabification or cause deviations from preferred syllable structures.

The semantics of implicitly relational predicates (Simon Fraser University, 2010)
McCloy, Daniel Robert

Revisiting population size vs. phoneme inventory size (Linguistic Society of America, 2012-12)
Moran, Steven; McCloy, Daniel Robert; Wright, Richard A.
In this paper we argue against the findings presented in Hay & Bauer 2007, which show a positive correlation between population size and phoneme inventory size. We argue that the positive correlation is an artifact of the authors’ statistical technique and biased data set. Using a hierarchical mixed model to account for genealogical relatedness of languages, and a much larger and more diverse sample of the world’s languages, we find little support for population size as an explanatory predictor of phoneme inventory size once the genealogical relatedness of languages is accounted for.
