This paper presents a vision-based approach to recognizing speech without evaluating the acoustic signal. The proposed technique combines motion features and support vector machines (SVMs) to classify utterances. Because segmentation of utterances is important in a visual speech recognition system, this research also proposes a video segmentation method that detects the start and end frames of isolated utterances in an image sequence. Frames that correspond to the 'speaking' and 'silence' phases are identified from mouth movement information. The experimental results demonstrate that the proposed visual speech recognition technique yields high accuracy in a phoneme classification task. Potential applications of such a system include human-computer interfaces (HCI) for mobility-impaired users, lip-reading mobile phones, in-vehicle systems, and improved speech-based computer control in noisy environments.
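The segmentation idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how mouth movement is measured, so this example swaps in a simple stand-in: the mean absolute intensity difference between consecutive frames, compared against a fixed threshold. The function names `motion_energy` and `segment_utterance`, the `threshold` parameter, and the assumption that the frames are already cropped to the mouth region are all hypothetical and not taken from the paper.

```python
import numpy as np


def motion_energy(frames):
    """Mean absolute intensity change between consecutive frames.

    `frames` is an array of shape (num_frames, height, width), assumed to be
    grayscale crops of the mouth region (an assumption of this sketch).
    Returns an array of length num_frames - 1.
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))
    return diffs.reshape(diffs.shape[0], -1).mean(axis=1)


def segment_utterance(frames, threshold):
    """Return (start, end) frame indices of the 'speaking' phase, or None.

    A frame is treated as 'speaking' when it differs from the previous frame
    by more than `threshold` on average; everything else is 'silence'.
    """
    energy = motion_energy(frames)
    moving = np.flatnonzero(energy > threshold)
    if moving.size == 0:
        return None  # no mouth movement detected in the sequence
    # energy[i] compares frame i with frame i + 1, so shift indices by one.
    start = int(moving[0]) + 1
    end = int(moving[-1]) + 1
    return start, end
```

A practical system would likely smooth the motion signal and tolerate short pauses inside an utterance; this sketch only shows the basic thresholding step on a single isolated utterance.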
Start page
7
End page
14
Total pages
8
Outlet
Digital Image Computing : Techniques and Applications (DICTA 2007)
Editors
M. Bottema, A. Maeder, N. Redding and A. van den Hengel
Name of conference
Digital Image Computing : Techniques and Applications (DICTA 2007)