RMIT University

Optimized multi-channel deep neural network with 2D graphical representation of acoustic speech features for emotion recognition

conference contribution
posted on 2024-11-03, 12:21, authored by Melissa Stolar, Margaret Lech, Ian Burnett
This study investigates the effectiveness of speech emotion recognition using a new approach called the Optimized Multi-Channel Deep Neural Network (OMC-DNN). The proposed method was tested with input features given as simple 2D black-and-white images representing graphs of the MFCC coefficients or the TEO parameters, calculated either from speech (MFCC-S, TEO-S) or from glottal waveforms (MFCC-G, TEO-G). A comparison with 6 different single-channel benchmark classifiers showed that the OMC-DNN provided the best performance in both pair-wise (emotion vs. neutral) and simultaneous multiclass recognition of 7 emotions (anger, boredom, disgust, happiness, fear, sadness and neutral). In the pair-wise case, the OMC-DNN outperformed the single-channel DNN by 5%-10% depending on the feature set. In the multiclass case, the OMC-DNN outperformed or matched the single-channel equivalents for all features. The speech spectrum and the glottal energy characteristics were identified as two important factors in discriminating between different categorical emotions in speech.
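The abstract describes a pipeline in which acoustic descriptors (MFCC or TEO) are extracted from speech or glottal waveforms, rendered as simple black-and-white graph images, and fed in parallel channels to a deep network. The sketch below is a minimal illustration of that general idea, not the authors' implementation: the library choices (librosa, matplotlib, Keras), the 64x64 image size, and the layer configuration are all assumptions made for the example.

```python
# Illustrative sketch only: render MFCC trajectories as a monochrome graph image
# and feed parallel feature images to a small multi-channel network.
# Library calls and all sizes are assumptions, not the paper's configuration.
import numpy as np
import librosa
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from tensorflow.keras import layers, models, Input

def feature_image(signal, sr, px=64):
    """Plot MFCC trajectories as a black-and-white graph and return it as an array."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=12)
    fig, ax = plt.subplots(figsize=(1, 1), dpi=px)
    ax.plot(mfcc.T, color="black", linewidth=0.5)  # one line per coefficient
    ax.axis("off")
    fig.canvas.draw()
    img = np.asarray(fig.canvas.buffer_rgba())[..., :3].mean(axis=-1) / 255.0
    plt.close(fig)
    return img[..., np.newaxis]  # shape (px, px, 1), grayscale

def build_multichannel_net(n_channels=2, img_shape=(64, 64, 1), n_classes=7):
    """One small CNN branch per feature image, merged before the softmax output."""
    inputs, branches = [], []
    for _ in range(n_channels):
        inp = Input(shape=img_shape)
        x = layers.Conv2D(16, 3, activation="relu")(inp)
        x = layers.MaxPooling2D()(x)
        x = layers.Flatten()(x)
        inputs.append(inp)
        branches.append(x)
    merged = layers.concatenate(branches)
    out = layers.Dense(n_classes, activation="softmax")(merged)  # 7 emotion classes
    return models.Model(inputs, out)

# Toy usage with a synthetic signal standing in for a speech or glottal waveform.
sr = 16000
sig = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)
img = feature_image(sig, sr)
model = build_multichannel_net(n_channels=2, img_shape=img.shape)
model.summary()
```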

History

Start page

55

End page

60

Total pages

6

Outlet

Proceedings of the 8th International Conference on Signal Processing and Communication Systems (ICSPCS 2014)

Name of conference

ICSPCS 2014

Publisher

IEEE

Place published

United States

Start date

2014-12-15

End date

2014-12-17

Language

English

Copyright

© 2014 IEEE

Former Identifier

2006089231

Esploro creation date

2020-06-22

Fedora creation date

2019-01-31
