Prediction of Inter-Personal Trust and Team Familiarity From Speech: A Double Transfer Learning Approach
journal contribution
posted on 2024-11-02, 16:59 authored by Catherine Sandoval Rodriguez, April Panganiban, Melissa Stolar, Robert Bolia, Margaret Lech
Speech classification is one of the most convenient objective measures of the internal state exhibited during a problem-solving task that requires verbal communication. This study investigates the hypothesis that acoustic characteristics of speech are indicative of trust between team members and of team members' familiarity with each other. Speech recordings from 27 dyadic teams (26 males and 28 females) were made during a distributed threat perception task in which teams determined safe points along a route through a town to be visited by a VIP. Before the threat detection mission, 26 team members knew each other, and the remaining 28 had no prior knowledge of their partners. Two levels (Low Trust and High Trust) of two trust constructs, TTP (Trust, Trustworthiness, Propensity to trust) and RIS (Reliance Intentions Scale), were estimated from numerical responses to pre- and post-mission surveys. Speech recordings of individual speakers were divided into 1-second intervals and converted into RGB images of amplitude spectrograms. The images were classified using a pre-trained convolutional neural network, ResNet-18, fine-tuned to recognize either the trust level or familiarity. In the baseline classification scenario, single transfer learning was used to classify the speech into Low/High-trust categories, separately for the RIS and TTP constructs before and after the mission, yielding an average classification accuracy of 82%-86%. Single transfer learning classification into Known/Unknown-partner categories led to 85% accuracy. Applying double transfer learning, i.e., first tuning the ResNet-18 on Known/Unknown labels and then on Low/High-trust labels, increased the trust classification accuracy to up to 89%. When the ResNet-18 was tuned first on Low/High-trust labels and then on Known/Unknown labels, the accuracy of partner familiarity recognition also increased to up to 89%.
These results support the hypothesis that speech acoustics are indicative of trust and familiarity between team members, and show that adding prior related knowledge to the model can make learning more efficient without increasing the training data size.