RMIT University
Computational speech acquisition for articulatory synthesis

thesis
posted on 2024-11-24, 07:24 authored by Denis SHITOV
Articulatory synthesis is one approach to modelling human speech production. This thesis proposes a model-based algorithm for learning a policy that controls the vocal tract of an articulatory synthesizer in a vowel-to-vowel imitation task. The proposed method requires no external training data, because the policy is learned through interactions with the vocal tract model. To improve the sample efficiency of learning, a model of speech production dynamics was trained simultaneously with the policy. The policy was trained in a supervised manner on the predictions of this dynamics model. To stabilize training, early stopping was incorporated into the algorithm. Additionally, acoustic features were extracted using an acoustic word embedding (AWE) model. This model was trained to discriminate between different words and to encode acoustics compactly while preserving the contextual information of the input. Experiments showed that introducing the AWE model was crucial for guiding the policy towards a near-optimal solution. The acoustic embeddings obtained with the proposed approach proved useful as inputs to both the policy and the model of speech production dynamics.
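The training loop described in the abstract can be illustrated with a deliberately tiny sketch. This is not the thesis implementation: the one-dimensional "vocal tract", the linear dynamics model, the proportional policy, and every name and hyperparameter below (`vocal_tract_step`, `fit_dynamics`, `train_policy`, `TRUE_GAIN`, the learning rate, the patience) are assumptions chosen only to show the structure, namely interacting with the synthesizer, refitting a dynamics model, and training the policy on the model's predictions with early stopping. The acoustic word embedding stage is omitted entirely.

```python
import random

TRUE_GAIN = 0.5  # hypothetical synthesizer parameter, hidden from the learner


def vocal_tract_step(state, action):
    """Toy 1-D stand-in for the synthesizer: the action shifts the state."""
    return state + TRUE_GAIN * action


def fit_dynamics(transitions):
    """Least-squares estimate of the gain k in the model s' = s + k * a."""
    num = sum(a * (s_next - s) for s, a, s_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den if den > 0 else 0.0


def policy_loss(w, k_hat, pairs):
    """Mean squared error between the model-predicted next state and the target."""
    total = 0.0
    for s, target in pairs:
        predicted = s + k_hat * w * (target - s)
        total += (predicted - target) ** 2
    return total / len(pairs)


def train_policy(w, k_hat, train_pairs, val_pairs,
                 lr=0.5, max_steps=200, patience=5):
    """Gradient descent on the model-predicted loss, with early stopping."""
    best_w, best_val, bad_steps = w, policy_loss(w, k_hat, val_pairs), 0
    for _ in range(max_steps):
        grad = 0.0
        for s, target in train_pairs:
            err = target - s
            predicted = s + k_hat * w * err
            grad += 2.0 * (predicted - target) * k_hat * err
        w -= lr * grad / len(train_pairs)
        val = policy_loss(w, k_hat, val_pairs)
        if val < best_val - 1e-12:
            best_w, best_val, bad_steps = w, val, 0
        else:
            bad_steps += 1
            if bad_steps >= patience:
                break  # validation loss stopped improving
    return best_w


def run(iterations=10, seed=0):
    rng = random.Random(seed)
    transitions, w, k_hat = [], 0.0, 0.0
    for _ in range(iterations):
        # 1. Interact with the synthesizer using the current policy plus noise.
        for _ in range(20):
            s, target = rng.uniform(-1, 1), rng.uniform(-1, 1)
            a = w * (target - s) + rng.gauss(0, 0.1)
            transitions.append((s, a, vocal_tract_step(s, a)))
        # 2. Refit the dynamics model on all transitions collected so far.
        k_hat = fit_dynamics(transitions)
        # 3. Train the policy against the model's predictions, not the synthesizer.
        train_pairs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(64)]
        val_pairs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(16)]
        w = train_policy(w, k_hat, train_pairs, val_pairs)
    return w, k_hat


if __name__ == "__main__":
    print(run())
```

In this toy setting the policy gain `w` should approach `1 / TRUE_GAIN`, since that choice moves the state onto the target in one step. The point of the sketch is only that the policy update never queries the synthesizer directly: it is fitted to the learned model's predictions, which is what makes the scheme sample-efficient in the sense the abstract describes.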

History

Degree Type

Doctorate by Research

Imprint Date

2020-01-01

School name

School of Engineering, RMIT University

Former Identifier

9921956711201341

Open access

  • Yes
