Most Speech Emotion Recognition (SER) results
refer to full-band, uncompressed speech signals.
Potential applications of SER on various types of speech
platforms raise important questions about how the
bandwidth limitations and compression techniques used
by speech communication systems affect the accuracy of SER.
The current study addresses these questions
through SER experiments with band-limited speech as
well as compressed speech. The compression techniques
included AMR, AMR-WB, AMR-WB+ and mp3.
The modelling and classification of speech emotions were
achieved using a benchmark approach based on the GMM
classifier and speech features including MFCCs, TEO and
glottal time- and frequency-domain parameters. The tests
used the Berlin Emotional Speech database with speech
signals sampled at 16 kHz. The results indicated that both
the low-frequency components (0-1 kHz) and the
high-frequency components (above 4 kHz) of speech play an
important role in SER. The mp3 compression worked
better with the MFCC features than with the TEO and
glottal parameters. AMR-WB and AMR-WB+
outperformed AMR.
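
The abstract does not say how the band-limited test signals were produced. As a rough illustration only, the sketch below removes a chosen frequency band from a 16 kHz signal by zeroing FFT bins with NumPy. The function name remove_band, the FFT-masking approach and the three-tone test signal are all assumptions made for this example, not the authors' method or data.

```python
import numpy as np

def remove_band(signal, fs, f_lo, f_hi):
    """Zero every spectral component with f_lo <= f < f_hi (in Hz).

    Hypothetical helper for illustration: a brick-wall FFT mask,
    not necessarily the filtering used in the study.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs >= f_lo) & (freqs < f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

fs = 16000                      # sampling rate stated in the abstract
t = np.arange(fs) / fs          # one second of samples

# Synthetic stand-in for speech: one tone in each band of interest.
tone_low = np.sin(2 * np.pi * 500 * t)    # inside 0-1 kHz
tone_mid = np.sin(2 * np.pi * 2000 * t)   # inside 1-4 kHz
tone_high = np.sin(2 * np.pi * 6000 * t)  # above 4 kHz
speech_like = tone_low + tone_mid + tone_high

# Band-limited variants: drop the low band, or drop everything above 4 kHz.
no_low = remove_band(speech_like, fs, 0, 1000)
no_high = remove_band(speech_like, fs, 4000, fs / 2 + 1)
```

Comparing classifier accuracy on signals such as no_low and no_high against accuracy on the full-band signal is one way to probe how much each band contributes to SER.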
Start page: 1
End page: 7
Total pages: 7
Outlet: Proceedings of the 10th International Conference on Signal Processing and Communication Systems (ICSPCS 2016)