This study investigates the effectiveness of speech emotion recognition using a new approach called the Optimized Multi-Channel Deep Neural Network (OMC-DNN). The proposed method was tested with input features given as simple 2D black-and-white images representing graphs of the MFCC coefficients or the TEO parameters, calculated either from speech (MFCC-S, TEO-S) or from glottal waveforms (MFCC-G, TEO-G). A comparison with six different single-channel benchmark classifiers showed that the OMC-DNN provided the best performance in both pair-wise (emotion vs. neutral) and simultaneous multiclass recognition of seven emotions (anger, boredom, disgust, happiness, fear, sadness, and neutral). In the pair-wise case, the OMC-DNN outperformed the single-channel DNN by 5%-10%, depending on the feature set. In the multiclass case, the OMC-DNN outperformed or matched the single-channel equivalents for all features. The speech spectrum and the glottal energy characteristics were identified as two important factors in discriminating between categorical emotions in speech.
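The abstract does not specify how the coefficient graphs are rasterized; the paper's own pipeline is not reproduced here. As a minimal sketch of the general idea, assuming the MFCC or TEO coefficients are already available as a 2D matrix (coefficients x frames), one plausible way to obtain a 2D grayscale image suitable as network input is a min-max normalization to 8-bit intensities. The function name `coeffs_to_image` is a hypothetical helper, not from the paper:

```python
import numpy as np

def coeffs_to_image(coeffs: np.ndarray) -> np.ndarray:
    """Map a (n_coeffs, n_frames) feature matrix to an 8-bit grayscale
    image via min-max scaling. Illustrative only; the paper's actual
    image-generation procedure may differ."""
    lo, hi = coeffs.min(), coeffs.max()
    span = hi - lo
    if span == 0:
        # Constant input: return an all-zero (black) image.
        return np.zeros_like(coeffs, dtype=np.uint8)
    scaled = (coeffs - lo) / span          # map values to [0, 1]
    return np.rint(scaled * 255).astype(np.uint8)  # 8-bit grayscale

# Example: a synthetic 13-coefficient x 100-frame matrix standing in
# for real MFCC features (the dimensions here are arbitrary).
rng = np.random.default_rng(0)
img = coeffs_to_image(rng.normal(size=(13, 100)))
```

Each feature stream (MFCC-S, MFCC-G, TEO-S, TEO-G) would yield its own image, which can then be fed to a separate channel of a multi-channel network.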
Start page: 55
End page: 60
Total pages: 6
Outlet: Proceedings of the 8th International Conference on Signal Processing and Communication Systems (ICSPCS 2014)