posted on 2024-11-02, 14:15 authored by Zafi Syed, Sajjad Memon, Muhammad Shehram Shah Syed, Abbas Syed
Speech emotion recognition is one of the most active areas of research in affective computing and social signal processing. However, most research is directed towards a select group of languages such as English, German, and French, mainly because publicly available datasets are scarce for other languages. Such languages are called low-resource languages. In the recent past, there has been a concerted effort within the research community to create and introduce emotion recognition datasets for low-resource languages. To this end, we introduce in this paper the Urdu-Sindhi Speech Emotion Corpus, a novel dataset consisting of 1,435 speech recordings for two widely spoken languages of South Asia, Urdu and Sindhi. Furthermore, we trained machine learning models to establish a baseline for classification performance, measured in terms of unweighted average recall (UAR). The best performing model for Urdu achieves a UAR of 65.00% on the validation partition and 56.96% on the test partition. The model for Sindhi achieves UARs of 66.50% and 55.29% on the validation and test partitions, respectively. This classification performance is considerably better than the chance-level UAR of 16.67%. The dataset can be accessed via https://zenodo.org/record/3685274
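For readers unfamiliar with the metric, unweighted average recall (UAR) is the mean of the per-class recalls, so every emotion class counts equally regardless of how many recordings it has. The following is a minimal Python sketch of that definition, not the authors' evaluation code. The function name and the toy labels are hypothetical, and the six-class setup in the second example is an assumption inferred from the reported chance level (16.67% is 1/6).

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes in y_true."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not taken from the corpus).
y_true = ["angry", "angry", "angry", "angry", "happy", "sad"]
y_pred = ["angry", "angry", "angry", "happy", "happy", "angry"]
uar = unweighted_average_recall(y_true, y_pred)

# Assumed six-class setup: a classifier that always predicts one class
# gets recall 1 on that class and 0 on the rest, i.e. a UAR of 1/6.
six_classes = ["c1", "c2", "c3", "c4", "c5", "c6"]  # placeholder names
chance_uar = unweighted_average_recall(six_classes, ["c1"] * 6)

print(f"toy UAR = {uar:.4f}, constant-predictor UAR = {chance_uar:.4f}")
```

In the toy example, plain accuracy is 4/6 while the UAR is lower, because the completely missed minority class pulls the average down. That sensitivity to small classes is the usual reason for preferring UAR on imbalanced emotion datasets.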