
Using spatial audio cues from speech excitation for meeting speech segmentation

conference contribution
posted on 2024-10-31, 09:12 authored by Eva Cheng, Ian Burnett
Multiparty meetings generally involve stationary participants. Participant location information can thus be used to segment the recorded meeting speech into each speaker's 'turn' for meeting 'browsing'. To represent speaker location information from speech, previous research showed that the most reliable time delay estimates are extracted from the Hilbert envelope of the linear prediction residual signal. The authors' past work has proposed the use of spatial audio cues to represent speaker location information. This paper proposes extracting spatial audio cues from the Hilbert envelope of the speech residual to indicate changing speaker location for meeting speech segmentation. Experiments conducted on recordings of a real acoustic environment show that spatial cues from the Hilbert envelope are more consistent across frequency subbands and can clearly distinguish between spatially distributed speakers, compared to spatial cues estimated from the recorded speech or residual signal.
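
The following is a minimal sketch, not the authors' implementation, of the processing chain the abstract describes: compute the linear prediction (LP) residual of each microphone channel, take its Hilbert envelope, and estimate the inter-channel time delay from the envelopes. The LP order, frame handling, sample rate, and the use of GCC-PHAT for the delay estimate are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import hilbert


def lp_residual(x, order=12):
    """LP residual via the autocorrelation method (order is an assumed value)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])      # predictor coefficients
    pred = np.convolve(x, np.concatenate(([0.0], a)), mode="full")[:len(x)]
    return x - pred                                   # residual = signal - prediction


def hilbert_envelope(x):
    """Magnitude of the analytic signal."""
    return np.abs(hilbert(x))


def gcc_phat_delay(x, y, fs):
    """Relative time delay (seconds) between x and y using GCC-PHAT."""
    n = len(x) + len(y)
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                            # PHAT weighting
    cc = np.fft.irfft(R, n)
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))
    return (np.argmax(np.abs(cc)) - n // 2) / fs


# Hypothetical usage on one analysis frame of a two-microphone recording
# (16 kHz sample rate assumed); changes in the estimated delay across frames
# indicate a change of active speaker position, i.e. a turn boundary.
# env_l = hilbert_envelope(lp_residual(frame_left))
# env_r = hilbert_envelope(lp_residual(frame_right))
# tau = gcc_phat_delay(env_l, env_r, fs=16000)
```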

History

Start page

3067

End page

3070

Total pages

4

Outlet

8th International Conference on Signal Processing (ICSP '06)

Name of conference

ICSP 2006

Publisher

IEEE

Place published

USA

Start date

2006-11-16

End date

2006-11-20

Language

English

Copyright

© IEEE 2006

Former Identifier

2006014300

Esploro creation date

2020-06-22

Fedora creation date

2013-03-18
