RMIT University

Topic-oriented sentiment analysis on blogs and microblogs

thesis
posted on 2024-11-23, 22:20 authored by Zhixin Zhou
In this thesis, we study topic-oriented sentiment analysis on blogs and microblogs. Specifically, we address the challenges posed by topic drift and by informal terms and expressions.

Topic drift is a common phenomenon in the blogosphere. We propose to combat it by reducing the noise presented to sentiment classifiers. In the context of sentiment classification, noise refers to sentences that are off-topic or that do not express opinions. Given a user-specified topic, we use information retrieval techniques to generate noise-free snippets and classify these snippets instead of the full documents. We contrast the performance of supervised and lexicon-based classifiers on a wide range of topics and show that classification on very short snippets (5% of the sentences) can be as accurate as classification on the full text. With lexicon-based classifiers, performance can be significantly better on snippets.

Apart from sentiment classification, another task in sentiment analysis is sentiment retrieval, which aims to retrieve subjective text. On social media, users often subscribe to data feeds to get the latest updates; in the blogosphere, they can subscribe to blog feeds. We propose a re-ranking approach to retrieving subjective blogs that match users' interests, expressed in the form of queries. We first retrieve the query-relevant blogs and then re-rank the most relevant blogs by subjectivity. Our analysis shows that the performance of this approach is highly sensitive to the performance of the topic-based retrieval step. We therefore explore different methods of aggregating post-level topical relevance to improve retrieval performance. Our experiments reveal that ranking blogs by the sum of the topical relevance of the most relevant posts in each blog best matches human users' judgements.

On the microblogging service Twitter, the 140-character length limit leads to a surge of user-created emoticons, misspellings, and informal abbreviations, which cause word mismatch for lexicon-based classifiers and hurt their performance. To address this problem, we propose a lexicon-expansion technique that uses point-wise mutual information between emoticons and terms to automatically assign sentiment polarity scores to terms. With the expanded lexicon for each topic, better classification performance can be achieved.
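
As an illustration of the lexicon-expansion idea, the following is a minimal sketch of one common way to derive polarity scores from point-wise mutual information (PMI) with emoticons: each term is scored as PMI(term, positive) minus PMI(term, negative). It is not taken from the thesis. The emoticon seed sets, the function name `pmi_polarity`, the add-one smoothing, and the toy tweets are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical emoticon seed sets; the thesis does not list the ones it used.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}


def pmi_polarity(tweets, smoothing=1.0):
    """Score each term as PMI(term, positive) - PMI(term, negative).

    `tweets` is an iterable of token lists. A tweet is used only if it
    contains emoticons from exactly one of the two seed sets.
    """
    class_counts = Counter()
    term_counts = {"pos": Counter(), "neg": Counter()}

    for tokens in tweets:
        token_set = set(tokens)
        has_pos = bool(token_set & POSITIVE_EMOTICONS)
        has_neg = bool(token_set & NEGATIVE_EMOTICONS)
        if has_pos == has_neg:
            continue  # no emoticon, or mixed signals: skip the tweet
        label = "pos" if has_pos else "neg"
        class_counts[label] += 1
        for term in token_set - POSITIVE_EMOTICONS - NEGATIVE_EMOTICONS:
            term_counts[label][term] += 1

    scores = {}
    for term in set(term_counts["pos"]) | set(term_counts["neg"]):
        # PMI(t, pos) - PMI(t, neg) simplifies to log(P(t|pos) / P(t|neg)),
        # because the P(t) terms cancel. Smoothing avoids division by zero.
        p_given_pos = (term_counts["pos"][term] + smoothing) / (
            class_counts["pos"] + 2 * smoothing
        )
        p_given_neg = (term_counts["neg"][term] + smoothing) / (
            class_counts["neg"] + 2 * smoothing
        )
        scores[term] = math.log2(p_given_pos / p_given_neg)
    return scores


# Toy, made-up tweets (already tokenized) for illustration only.
tweets = [
    ["luv", "this", "phone", ":)"],
    ["gr8", "phone", ":D"],
    ["battery", "sux", ":("],
    ["phone", "sux", ":-("],
    ["no", "emoticon", "here"],
]
scores = pmi_polarity(tweets)
```

In this sketch, informal terms such as "luv" receive a positive score and "sux" a negative one, so they could be added to a sentiment lexicon that would otherwise miss them. Running the same procedure on tweets retrieved for a single topic would give a topic-specific expansion.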

History

Degree Type

Doctorate by Research

Imprint Date

2016-01-01

School name

School of Science, RMIT University

Former Identifier

9921864027101341

Open access

  • Yes
