RMIT University

Introducing the Urdu-Sindhi Speech Emotion Corpus: A novel dataset of speech recordings for emotion recognition for two low-resource languages

journal contribution
posted on 2024-11-02, 14:15 authored by Zafi Syed, Sajjad Memon, Muhammad Shehram Shah Syed, Abbas Syed
Speech emotion recognition is one of the most active areas of research in affective computing and social signal processing. However, most research is directed towards a select group of languages such as English, German, and French, mainly because few datasets are available in other languages. Languages for which publicly available datasets are scarce are called low-resource languages. In the recent past, there has been a concerted effort within the research community to create and introduce emotion recognition datasets for low-resource languages. To this end, we introduce in this paper the Urdu-Sindhi Speech Emotion Corpus, a novel dataset consisting of 1,435 speech recordings for two widely spoken languages of South Asia, Urdu and Sindhi. Furthermore, we trained machine learning models to establish a baseline for classification performance, measured in terms of unweighted average recall (UAR). We report that the best performing model for the Urdu language achieved a UAR of 65.00% on the validation partition and a UAR of 56.96% on the test partition. Meanwhile, the model for the Sindhi language achieved UARs of 66.50% and 55.29% on the validation and test partitions, respectively. This classification performance is considerably better than the chance-level UAR of 16.67%. The dataset can be accessed via https://zenodo.org/record/3685274
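
As an illustration of the metric named in the abstract, the following is a minimal sketch of how UAR can be computed: recall is calculated separately for each class and then averaged, so every class counts equally regardless of how many recordings it has. The function names and toy labels are hypothetical and are not taken from the paper. The chance-level figure of 16.67% equals 1/6, which is what a six-class task would give; the class count is inferred here from that figure and is not stated in the abstract.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


def chance_level_uar(num_classes):
    """UAR expected from uniform random guessing over num_classes classes."""
    return 1.0 / num_classes


# Toy labels for illustration only (hypothetical, not from the corpus).
y_true = ["angry", "angry", "angry", "angry", "happy", "sad"]
y_pred = ["angry", "angry", "angry", "happy", "happy", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
print(f"UAR: {uar:.4f}")
print(f"Chance level for 6 classes: {chance_level_uar(6) * 100:.2f}%")
```

In this toy example plain accuracy would be 4/6, while UAR averages the per-class recalls 0.75, 1.0, and 0.0, so the missed minority class pulls the score down. That sensitivity to small classes is the usual reason for reporting UAR on imbalanced emotion datasets.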

Related Materials

  1. DOI - Is published in 10.14569/IJACSA.2020.01104104
  2. ISSN - Is published in 2158107X

Journal

International Journal of Advanced Computer Science and Applications

Volume

11

Number

4104

Issue

4

Start page

1

End page

6

Total pages

6

Publisher

Science and Information Organization

Place published

United Kingdom

Language

English

Copyright

© 2020 Science and Information Organization.

Former Identifier

2006102594

Esploro creation date

2020-11-19
