RMIT University

Effects of band reduction and coding on speech emotion recognition

conference contribution
posted on 2024-10-31, 20:37, authored by Abas Albahri, Margaret Lech
The majority of Speech Emotion Recognition (SER) results refer to full-band, uncompressed speech signals. Potential applications of SER on various types of speech platforms raise important questions about how the bandwidth limitations and compression techniques used by speech communication systems affect the accuracy of SER. The current study addresses these questions through SER experiments with band-limited speech as well as compressed speech. The compression techniques included AMR, AMR-WB, AMR-WB+ and mp3. The modelling and classification of speech emotions were achieved using a benchmark approach based on the GMM classifier and speech features including MFCCs, TEO, and glottal time- and frequency-domain parameters. The tests used the Berlin Emotional Speech database with speech signals sampled at 16 kHz. The results indicated that the low-frequency components (0-1 kHz) of speech, as well as the high-frequency components (above 4 kHz), play an important role in SER. The mp3 compression worked better with the MFCC features than with the TEO and glottal parameters. AMR-WB and AMR-WB+ outperformed AMR.
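
As a rough illustration of the GMM-based classification step mentioned in the abstract, the sketch below scores a sequence of feature frames against one Gaussian mixture model per emotion and picks the emotion with the highest average log-likelihood. This is a minimal sketch under stated assumptions, not the authors' implementation: feature extraction (MFCC, TEO, glottal parameters) and model training are not shown, diagonal covariances are assumed, and the names `log_gauss_diag`, `gmm_log_likelihood`, `classify` and the values in `TOY_MODELS` are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of the frames under one GMM.

    gmm is a list of (weight, mean, var) mixture components.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over the mixture components for numerical stability.
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


def classify(frames, models):
    """Return the label whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))


# Hypothetical two-dimensional models; real models would be trained on
# features extracted from labelled emotional speech.
TOY_MODELS = {
    "anger": [(0.5, [2.0, 2.0], [1.0, 1.0]), (0.5, [3.0, 1.0], [1.0, 1.0])],
    "sadness": [(1.0, [-2.0, -1.0], [1.0, 1.0])],
}
```

Under these assumptions, calling `classify(frames, TOY_MODELS)` with the frames of one utterance returns the best-matching emotion label. The same scoring could be repeated on features extracted from band-limited or codec-processed versions of the speech to compare accuracy across conditions.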

History

Start page

1

End page

7

Total pages

7

Outlet

Proceedings of the 10th International Conference on Signal Processing and Communication Systems (ICSPCS 2016)

Editors

Tadeusz A Wysocki and Beata J Wysocki

Name of conference

ICSPCS 2016

Publisher

IEEE

Place published

United States

Start date

2016-12-19

End date

2016-12-21

Language

English

Copyright

© 2016 IEEE

Former Identifier

2006071045

Esploro creation date

2020-06-22

Fedora creation date

2017-03-06
