RMIT University

Beat space segmentation and octave scale cepstral feature for sung language recognition in pop music

journal contribution
posted on 2024-11-01, 11:41 authored by Namunu Maddage, Haizhou Li
Sung language recognition relies on both effective feature extraction and acoustic modeling. In this paper, we study rhythm-based music segmentation, in which the frame size equals the duration of the smallest note in the music, as opposed to the fixed-length segmentation used in spoken language recognition. We find that acoustic features extracted with the rhythm-based segmentation scheme outperform those extracted with fixed-length segmentation. We also study the effectiveness of a musically motivated acoustic feature, octave scale cepstral coefficients (OSCCs), by comparing it with other acoustic features: log frequency cepstral coefficients, linear prediction coefficients (LPCs), and LPC-derived cepstral coefficients. Finally, we examine the modeling capabilities of Gaussian mixture models (GMMs) and support vector machines (SVMs) in sung language recognition experiments. Experiments conducted on a corpus of 400 popular songs sung in English, Chinese, German, and Indonesian show that the OSCC feature outperforms the other features. A sung language recognition accuracy of 64.9% was achieved when GMMs were trained on shifted-delta OSCC acoustic features extracted via rhythm-based music segmentation.
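Two of the signal-processing ideas in the abstract, a frame size set by the smallest note and shifted-delta features stacked on top of per-frame cepstra, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The tempo (in BPM) and the smallest note value are assumed to be known in advance, 4/4 time is assumed, and frames are taken as non-overlapping. The shifted-delta parameters d, P and k default to the 1-3-7 configuration common in spoken language recognition, and clamping at the sequence edges is an illustrative choice. All function names are hypothetical, and the OSCC computation itself is not shown.

```python
from typing import List, Sequence


def beat_space_frame_length(sample_rate: int, tempo_bpm: float, smallest_note: int = 16) -> int:
    """Frame length in samples equal to the duration of the smallest note.

    Assumption: 4/4 time, one beat = one quarter note, so a 1/smallest_note
    note lasts (60 / tempo_bpm) * 4 / smallest_note seconds.
    """
    quarter_note_sec = 60.0 / tempo_bpm
    note_sec = quarter_note_sec * 4.0 / smallest_note
    return int(round(note_sec * sample_rate))


def segment(signal: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split a signal into non-overlapping frames of frame_len samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    n_frames = len(signal) // frame_len
    return [list(signal[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)]


def shifted_delta(cepstra: Sequence[Sequence[float]], d: int = 1, P: int = 3, k: int = 7) -> List[List[float]]:
    """Shifted-delta coefficients for every frame t.

    For block i in 0..k-1 the delta is c[t + i*P + d] - c[t + i*P - d];
    the k blocks are concatenated, giving k * N values per frame for
    N-dimensional cepstra. Out-of-range frame indices are clamped to the
    first or last frame (an assumption made for this sketch).
    """
    T = len(cepstra)
    N = len(cepstra[0])

    def frame(idx: int) -> Sequence[float]:
        # Clamp the index so edge frames reuse the nearest valid frame.
        return cepstra[min(max(idx, 0), T - 1)]

    out: List[List[float]] = []
    for t in range(T):
        feat: List[float] = []
        for i in range(k):
            plus = frame(t + i * P + d)
            minus = frame(t + i * P - d)
            feat.extend(plus[n] - minus[n] for n in range(N))
        out.append(feat)
    return out
```

For example, at 120 BPM and a 16 kHz sampling rate, a sixteenth note lasts 0.125 s, so `beat_space_frame_length(16000, 120, 16)` gives frames of 2000 samples. A fixed-length scheme, by contrast, would keep the same frame size whatever the tempo.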

Journal

ACM Transactions on Multimedia Computing, Communications, and Applications

Volume

7

Number

37

Issue

4

Start page

1

End page

19

Total pages

19

Publisher

Association for Computing Machinery, Inc.

Place published

United States

Language

English

Copyright

© 2011 ACM.

Former Identifier

2006032257

Esploro creation date

2020-06-22

Fedora creation date

2015-01-16