RMIT University

Visual speech recognition and utterance segmentation based on mouth movement

conference contribution
posted on 2024-11-23, 03:35 authored by Wai Yau, H Weghorn, Dinesh Kumar
This paper presents a vision-based approach to recognizing speech without evaluating the acoustic signal. The proposed technique combines motion features with support vector machines (SVMs) to classify utterances. Because segmentation of utterances is important in a visual speech recognition system, this research also proposes a video segmentation method that detects the start and end frames of isolated utterances in an image sequence. Frames that correspond to the 'speaking' and 'silence' phases are identified from mouth movement information. The experimental results demonstrate that the proposed visual speech recognition technique yields high accuracy in a phoneme classification task. Potential applications of such a system include human-computer interfaces (HCI) for mobility-impaired users, lip-reading mobile phones, in-vehicle systems, and improved speech-based computer control in noisy environments.
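The idea of separating 'speaking' frames from 'silence' frames using mouth movement can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not specify. It assumes that motion is measured as the mean absolute difference between consecutive grayscale mouth-region frames and that a fixed threshold marks a frame as 'speaking'. The names motion_energy and segment_utterances and the parameters threshold and min_frames are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: each frame is a grayscale mouth-region image given as rows of pixel values.
Frame = Sequence[Sequence[float]]


def motion_energy(frames: Sequence[Frame]) -> List[float]:
    """Mean absolute pixel difference between each frame and its predecessor.

    The first frame has no predecessor, so its energy is defined as 0.0.
    """
    energies = [0.0] if frames else []
    for prev, curr in zip(frames, frames[1:]):
        total = 0.0
        count = 0
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += abs(b - a)
                count += 1
        energies.append(total / count if count else 0.0)
    return energies


def segment_utterances(
    energies: Sequence[float], threshold: float, min_frames: int = 2
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame indices of 'speaking' runs.

    A frame counts as 'speaking' when its energy exceeds the threshold.
    Runs shorter than min_frames are discarded as noise.
    """
    segments = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i  # first frame of a new 'speaking' run
        elif start is not None:
            if i - start >= min_frames:
                segments.append((start, i - 1))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(energies) - start >= min_frames:
        segments.append((start, len(energies) - 1))
    return segments


if __name__ == "__main__":
    closed = [[0, 0], [0, 0]]  # toy 2x2 'mouth closed' frame
    opened = [[0, 0], [9, 9]]  # toy 2x2 'mouth open' frame
    frames = [closed, closed, opened, closed, opened, closed, closed]
    print(segment_utterances(motion_energy(frames), threshold=1.0))  # prints [(2, 5)]
```

In this toy sequence the mouth region changes between frames 2 and 5, so those frames form a single utterance and the remaining frames are treated as silence. A practical system would likely smooth the energy signal and choose the threshold from data rather than fixing it by hand.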

History

Start page

7

End page

14

Total pages

8

Outlet

Digital Image Computing : Techniques and Applications (DICTA 2007)

Editors

M. Bottema, A. Maeder, N. Redding and A. van den Hengel

Name of conference

Digital Image Computing : Techniques and Applications (DICTA 2007)

Publisher

IEEE

Place published

Piscataway, USA

Start date

2007-12-03

End date

2007-12-05

Language

English

Copyright

© 2007 IEEE

Former Identifier

2006007584

Esploro creation date

2020-06-22

Fedora creation date

2009-04-08

Open access

  • Yes
