RMIT University

Time-frequency feature extraction from spectrograms and wavelet packets with application to automatic stress and emotion classification in speech

conference contribution
posted on 2024-10-31, 09:46 authored by Ling He, Margaret Lech, Namunu Maddage, Nicholas Allen
Three new methods of feature extraction based on time-frequency analysis of speech are presented and compared. In the first approach, speech spectrograms were passed through a bank of 12 log-Gabor filters and the outputs were averaged. In the second approach, the spectrograms were subdivided into ERB (equivalent rectangular bandwidth) frequency bands and the average energy in each band was calculated. In the third approach, wavelet packet arrays were calculated, passed through a bank of 12 log-Gabor filters, and averaged. The feature extraction methods were tested on automatic stress and emotion classification. The feature distributions were modeled and classified using a Gaussian mixture model. The test samples included single vowels, words and sentences from the SUSAS database with 3 classes of stress, and spontaneous speech recordings with 5 emotional classes from the ORI database. Correct classification rates ranged from 64.70% to 84.85% across the different SUSAS data sets and from 39.6% to 53.4% for the ORI database.
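As an illustration of the second approach, the following is a minimal sketch of averaging spectrogram energy within ERB-spaced frequency bands. It is not the authors' implementation: the abstract does not state the ERB formula, the number of bands, or the frequency range, so the Glasberg-Moore ERB-rate scale, the default of 8 bands, the 50 Hz lower edge, and all function names here are assumptions chosen for the example.

```python
import numpy as np


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg-Moore form, assumed here)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) / 0.00437


def erb_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 1 band edges in Hz, equally spaced on the ERB-rate scale."""
    erb_edges = np.linspace(hz_to_erb_rate(f_min), hz_to_erb_rate(f_max), n_bands + 1)
    edges = erb_rate_to_hz(erb_edges)
    # Pin the end points so round-trip rounding cannot exclude the outermost bins.
    edges[0], edges[-1] = f_min, f_max
    return edges


def erb_band_energies(spectrogram, freqs, n_bands=8, f_min=50.0, f_max=None):
    """Average the spectrogram energy inside each ERB band.

    spectrogram: 2-D array, shape (n_freq_bins, n_frames), non-negative energies.
    freqs: 1-D array of bin centre frequencies in Hz, ascending.
    Returns a feature vector of length n_bands (hypothetical layout).
    """
    spectrogram = np.asarray(spectrogram, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    if f_max is None:
        f_max = freqs[-1]
    edges = erb_band_edges(f_min, f_max, n_bands)

    features = np.zeros(n_bands)
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        if b == n_bands - 1:
            mask = (freqs >= lo) & (freqs <= hi)  # last band includes its upper edge
        else:
            mask = (freqs >= lo) & (freqs < hi)
        if mask.any():
            # Mean over all bins and frames that fall in this band.
            features[b] = spectrogram[mask, :].mean()
    return features
```

Given a power spectrogram and its bin frequencies (for example from a short-time Fourier transform), `erb_band_energies(spec, freqs)` returns one averaged energy value per band. Because the bands are equally spaced on the ERB-rate scale, they are narrow at low frequencies and widen toward high frequencies. A vector like this could then be passed to a classifier such as a Gaussian mixture model.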

History

Start page: 1
End page: 5
Total pages: 5
Outlet: Conference Proceedings of the Seventh International Conference on Information, Communications & Signal Processing (ICICS 2009)
Name of conference: 7th International Conference on Information, Communications and Signal Processing (ICICS 2009)
Publisher: IEEE
Place published: Singapore
Start date: 2009-12-08
End date: 2009-12-10
Language: English
Copyright: ©2009 IEEE.
Former Identifier: 2006018604
Esploro creation date: 2020-06-22
Fedora creation date: 2011-06-10
