Amplitude-Frequency Analysis of Emotional Speech Using Transfer Learning and Classification of Spectrogram Images
journal contribution
posted on 2024-11-02, 08:08, authored by Margaret Lech, Melissa Stolar, Robert Bolia, Michael Skinner
Automatic speech emotion recognition (SER) techniques based on acoustic analysis show high confusion between certain emotional categories. This study used an indirect approach to provide insights into the amplitude-frequency characteristics of different emotions, in order to support the development of future SER methods that differentiate emotions more efficiently. The analysis was carried out by transforming short, 1-second blocks of speech into RGB or grey-scale spectrogram images. The images were then used to fine-tune a pre-trained image classification network to recognize emotions. Representing the spectrograms on four different frequency scales (linear, melodic, equivalent rectangular bandwidth (ERB), and logarithmic) allowed observation of the effects of high, mid-high, mid-low and low frequency characteristics of speech, respectively. In addition, using only the red (R), green (G) or blue (B) component of the RGB images showed the importance of speech components with high, mid and low amplitude levels, respectively. Experiments conducted on the Berlin emotional speech (EMO-DB) data revealed the relative positions of seven emotional categories (anger, boredom, disgust, fear, joy, neutral and sadness) on the amplitude-frequency plane.
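The link between frequency scale and emphasized frequency band can be illustrated with a minimal sketch. It is not the authors' code. It assumes commonly used textbook warping formulas for the mel and ERB-rate scales, a hypothetical analysis range of 50 Hz to 8000 Hz, and a hypothetical 1 kHz split point, none of which are stated in the abstract. For each scale it computes the share of the spectrogram's vertical axis that is occupied by frequencies below the split point.

```python
import math

# Frequency warping functions. These are common textbook variants and
# are an assumption; the abstract does not state which formulas were used.
def hz_to_linear(f_hz):
    return f_hz

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def hz_to_erb_rate(f_hz):
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def hz_to_log(f_hz):
    return math.log10(f_hz)

def low_band_share(warp, f_split_hz, f_min_hz, f_max_hz):
    """Fraction of the warped axis lying between f_min_hz and f_split_hz."""
    lo, mid, hi = warp(f_min_hz), warp(f_split_hz), warp(f_max_hz)
    return (mid - lo) / (hi - lo)

# Hypothetical analysis range and split point (assumptions for illustration).
F_MIN_HZ = 50.0
F_MAX_HZ = 8000.0
F_SPLIT_HZ = 1000.0

SCALES = {
    "linear": hz_to_linear,
    "mel": hz_to_mel,
    "erb": hz_to_erb_rate,
    "log": hz_to_log,
}

shares = {
    name: low_band_share(warp, F_SPLIT_HZ, F_MIN_HZ, F_MAX_HZ)
    for name, warp in SCALES.items()
}

for name, share in shares.items():
    print(f"{name:>6}: {share:.3f} of the axis lies below {F_SPLIT_HZ:.0f} Hz")
```

Under these assumptions the share of the image given to the low band grows from the linear scale through the mel and ERB scales to the logarithmic scale, which matches the ordering from high to low frequency emphasis described in the abstract.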
History
Journal
Advances in Science, Technology and Engineering Systems Journal
Volume
3
Issue
4
Start page
363
End page
371
Total pages
9
Publisher
Advances in Science, Technology and Engineering Systems