RMIT University

Computational inference of trustworthiness in social figures through analysis of speech acoustic, textual, and visual signals

thesis
posted on 2024-11-24, 04:13 authored by Muhammad Shehram Shah Syed

Trustworthiness is an important social attribute that governs how humans interact and conduct themselves in society. Trust has been a popular topic of research in the areas of psychology and social sciences. This PhD study approaches the notion of trust from an engineering perspective. It puts forward a computational framework for classifying politicians into three categories of perceived trustworthiness (high, mid, and low). This research evaluates the effectiveness of traditional and state-of-the-art feature computation and representation techniques for three modalities (speech, text, and images) for trust prediction. It also explores approaches for combining these individual modalities to achieve a multimodal prediction of trust.

As part of this study, a new labelled dataset comprising audio, text, and image modalities was created using data from publicly available sources on the internet. Acoustic parameters were computed from YouTube videos of subjects delivering speeches to a crowd, textual data were obtained from Twitter posts, and visual data were curated from photographs. Features for these modalities were computed using traditional as well as state-of-the-art approaches. Machine learning (ML) classifiers were then trained to identify subjects with low, mid, and high levels of perceived trustworthiness. For the speech acoustic modality, in addition to trust prediction experiments, statistical tests were performed to identify the acoustic features that influence trust prediction. For the text modality, trust prediction experiments were performed using models that represent textual features with both traditional techniques and those derived from deep learning. For the image modality, experiments were performed using pre-trained deep-learning-based computer vision models. When predicting trust from each modality individually, the highest accuracy achieved was 92.81% for audio, 72.26% for text, and 77.96% for images. Finally, combining these modalities using the proposed multimodal solution achieved an accuracy of up to 95.33%.

The methods used for predicting trustworthiness were also tested on datasets provided as part of ‘Interspeech’ challenges, where they achieved top-level results. These experiments demonstrated the effectiveness of the proposed methodology and its suitability for other applications similar to trustworthiness prediction that use social signal processing techniques and deep-learning algorithms.

History

Degree Type

Doctorate by Research

Imprint Date

2022-01-01

School name

School of Engineering, RMIT University

Former Identifier

9922107057301341

Open access

  • Yes
