Posted on 2024-11-24, 08:11. Authored by Yassien SHAALAN
With online reviews numbering in the millions, reading them and extracting insights is an arduous task for humans. Meanwhile, opinion spam is spreading widely, and detecting spam reviews is critically important for ensuring the integrity of the online review ecosystem. In this thesis, we propose novel machine learning approaches to detect spam and aggregate opinions from reviews, with a focus on minimising costly human annotation. <br><br>
Deep learning has made revolutionary advances in many applications by effectively learning representations that capture hidden patterns in complex data. Despite the development of general language models for text representation, the representation of opinionated text has not been extensively studied. We propose a novel unsupervised deep aspect-level sentiment model that employs deep Boltzmann machines (DBMs) to learn fine-grained opinion representations from review texts. DBMs are known for their power in discovering deep latent relationships in data, and we exploit their generative capacity to uncover the complex generative process of reviews. We find that joining two DBMs, one for aspect modelling and one for sentiment modelling, so that they collaboratively exploit the interrelations between aspects and sentiments, can effectively represent reviews and capture opinion variations. <br><br>
Singleton spam reviews (one-time reviews) have spread widely of late, as spammers can create multiple accounts to purposefully cheat the system. Most available techniques fail to detect this cunning form of malicious review, mainly because singleton spammers leave few behavioural trails behind. We present an unsupervised singleton spam review detection model. We find that tracking the evolution of opinion through the fluctuation of sentiment over time is highly effective in identifying opinion spam. This is achieved by training a long short-term memory (LSTM) network on our learned opinion representation to discover latent reviewing trends and identify normal reviewing patterns. A robust variational autoencoder (RVAE) is then applied to the learned temporal opinion correlations to identify spam instances as anomalies along the temporal dimension. <br><br>
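The idea of treating spam as an anomaly along the temporal dimension can be illustrated with a much simpler stand-in than the LSTM and RVAE pipeline described above. The sketch below is not the thesis model: it assumes reviews have already been reduced to one mean sentiment score per time window (a hypothetical input) and flags windows with a robust z-score based on the median and the median absolute deviation. The function name, the threshold of 3.5 and the sample data are illustrative assumptions.

```python
from statistics import median


def flag_anomalous_windows(window_sentiment, threshold=3.5):
    """Return indices of time windows whose sentiment is a robust outlier.

    Illustrative stand-in only: a median/MAD z-score replaces the
    LSTM + RVAE pipeline, and `window_sentiment` is a hypothetical list
    of mean sentiment scores, one per time window.
    """
    med = median(window_sentiment)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(s - med) for s in window_sentiment)
    if mad == 0:
        # No measurable spread, so nothing can be called anomalous.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i
        for i, s in enumerate(window_sentiment)
        if 0.6745 * abs(s - med) / mad > threshold
    ]


# Hypothetical weekly sentiment for one product; week 4 shows a sudden burst.
weekly = [0.10, 0.12, 0.08, 0.11, 0.95, 0.09, 0.10]
suspicious_weeks = flag_anomalous_windows(weekly)
```

A learned model would replace the fixed statistic with reconstruction error over learned temporal patterns, but the decision rule has the same shape: score each window against what is normal for the series and flag the windows that deviate most.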
Opinion aggregation is a task that supports entity ranking in e-commerce applications. A current limitation of aggregation methods is that they treat all reviews equally, which overlooks variations in quality, freshness and usefulness. Ranking entities by aggregate scores also encounters many problems, including very small numbers of reviews for some entities and the long-tail distribution of reviews. To aggregate opinions, we propose an efficient temporal-rating aggregation model based on heuristics such as the usefulness of opinions and the credibility and experience of reviewers. To rank entities effectively, we cast the problem as a learning-to-rank (L2R) problem and devise a reliable rank-oriented loss function that directly optimises the ranking of entities based on reviews. Additionally, to overcome the lack of annotated ranking labels, we propose automatically generating weak supervision ranking labels for low-cost practical training. <br><br>
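To make the two ideas in this paragraph concrete, here is a minimal sketch under stated assumptions. It is not the aggregation model or the loss function proposed in the thesis. The first function weights each rating by an exponential recency decay and by helpful votes, which is one simple way to stop treating all reviews equally. The second is a standard pairwise logistic ranking loss, swapped in as a generic example of a rank-oriented objective. The half-life, the vote weighting and all names are hypothetical.

```python
import math


def aggregate_rating(reviews, now, half_life_days=180.0):
    """Weighted mean rating that favours fresh, useful reviews.

    `reviews` is an iterable of (rating, day_posted, helpful_votes) tuples.
    The weighting scheme is an assumption made for illustration.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rating, day_posted, helpful_votes in reviews:
        age_days = max(now - day_posted, 0)
        recency = 0.5 ** (age_days / half_life_days)  # halves every half-life
        weight = recency * (1.0 + helpful_votes)       # usefulness boost
        weighted_sum += weight * rating
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


def pairwise_logistic_loss(scores, labels):
    """Mean logistic loss over all pairs where labels[i] > labels[j].

    The loss is small when higher-labelled entities receive higher scores.
    `labels` could be weak ranking labels rather than human annotations.
    """
    losses = [
        math.log1p(math.exp(-(scores[i] - scores[j])))
        for i in range(len(scores))
        for j in range(len(scores))
        if labels[i] > labels[j]
    ]
    return sum(losses) / len(losses) if losses else 0.0
```

With a 180-day half-life, a 5-star review posted 180 days ago counts half as much as a 1-star review posted today, so `aggregate_rating([(5, 0, 0), (1, 180, 0)], now=180)` is pulled below the plain average of 3. The pairwise loss only looks at relative order, which is why it pairs naturally with weak labels that are more reliable as orderings than as absolute scores.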
In a nutshell, our research shows that applying machine learning to the enormous body of available review data, with minimal human annotation, provides effective solutions for many real-life opinion-mining tasks.