Computational inference of trustworthiness in social figures through analysis of speech acoustic, textual, and visual signals
Trustworthiness is an important social attribute that governs how humans interact and conduct themselves in society. Trust has been a popular topic of research in psychology and the social sciences. This PhD study approaches the notion of trust from an engineering perspective. It puts forward a computational framework for classifying politicians into three categories of perceived trustworthiness (high, mid, and low). This research evaluates the effectiveness of traditional and state-of-the-art feature computation and representation techniques across three modalities (speech, text, and images) for trust prediction. It also explores approaches for combining these individual modalities to achieve a multimodal prediction of trust.
As part of this study, a new labelled dataset comprising audio, text, and image modalities was created using data from publicly available sources on the internet. Acoustic parameters were computed by processing YouTube videos of subjects delivering speeches to a crowd, textual data were obtained from Twitter posts, and visual data were curated from photographs. Features for these modalities were computed using traditional as well as state-of-the-art approaches. Machine learning (ML) classifiers were then trained to identify subjects with low, mid, and high levels of perceived trustworthiness. For the speech acoustic modality, in addition to trust prediction experiments, statistical tests were performed to identify the acoustic features that influence trust prediction. For the text modality, trust prediction experiments were performed using models that represent textual features with both traditional techniques and those derived from deep learning. For the image modality, experiments were performed using pre-trained deep-learning-based computer vision models. When predicting trust using each modality individually, the highest accuracy achieved was 92.81% for audio, 72.26% for text, and 77.96% for images. Finally, by combining these modalities using the proposed multimodal solution, an accuracy of up to 95.33% was achieved.
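The abstract does not specify how the modality-level predictions were combined. A common approach to this kind of multimodal combination is weighted late fusion, averaging each modality's class-probability vector before taking the most likely class. The sketch below is purely illustrative: the function, weights, and probability values are assumptions, not the method or numbers reported in the thesis.

```python
# Hypothetical late-fusion sketch for the three perceived-trustworthiness
# classes (low, mid, high). All probabilities and weights are made up
# for illustration; they are not taken from the thesis.

CLASSES = ["low", "mid", "high"]

def fuse_probabilities(modality_probs, weights=None):
    """Weighted average of per-modality class-probability vectors.

    modality_probs: dict mapping modality name -> [p_low, p_mid, p_high]
    weights: optional dict of per-modality weights (default: equal)
    Returns the fused predicted class label.
    """
    if weights is None:
        weights = {m: 1.0 for m in modality_probs}
    total = sum(weights[m] for m in modality_probs)
    fused = [0.0, 0.0, 0.0]
    for modality, probs in modality_probs.items():
        w = weights[modality] / total
        for i, p in enumerate(probs):
            fused[i] += w * p
    # Pick the class with the highest fused probability.
    return CLASSES[max(range(len(CLASSES)), key=lambda i: fused[i])]

# Illustrative per-modality outputs (invented for this sketch):
preds = {
    "audio": [0.05, 0.15, 0.80],
    "text":  [0.20, 0.50, 0.30],
    "image": [0.10, 0.30, 0.60],
}
print(fuse_probabilities(preds))  # → "high"
```

Weighting could, for example, favour the audio modality, since it was the strongest individual predictor in this study; the weights themselves would typically be tuned on held-out data.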
Additionally, the efficacy of the methods used for trustworthiness prediction was tested on datasets provided as part of the Interspeech challenges, where the proposed methods achieved top-ranking results. These experiments demonstrated the effectiveness of the proposed methodology and its suitability for other applications, similar to trustworthiness prediction, that use social signal processing techniques and deep-learning algorithms.
Degree Type
- Doctorate by Research
Imprint Date
- 2022-01-01
School name
- School of Engineering, RMIT University
Former Identifier
- 9922107057301341
Open access
- Yes