
Measuring Similarity of Opinion-bearing Sentences

conference contribution
posted on 2024-11-03, 14:46 authored by Wenyi Tay, Xiuzhen Zhang, Stephen Wan, Sarvnaz Karimi
For many NLP applications involving online reviews, comparing two opinion-bearing sentences is key. We argue that, while general-purpose text similarity metrics have been applied for this purpose, their applicability to opinion texts has seen limited exploration. We address this gap in the literature, studying: (1) how humans judge the similarity of pairs of opinion-bearing sentences; and (2) the degree to which existing text similarity metrics, particularly embedding-based ones, correspond to human judgments. We crowdsourced annotations for opinion sentence pairs, and our main findings are: (1) annotators tend to agree on whether pairs of opinion sentences are similar or different; and (2) embedding-based metrics capture human judgments of “opinion similarity” but not of “opinion difference”. Based on our analysis, we identify areas where current metrics should be improved. We further propose to learn an opinion-similarity metric by fine-tuning the Sentence-BERT sentence-embedding network on review text, with weak supervision from review ratings. Experiments show that our learned metric outperforms existing text similarity metrics and, in particular, correlates significantly better with human annotations for differing opinions.
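The approach the abstract describes, fine-tuning a Sentence-BERT model with weak similarity labels derived from review ratings and then scoring sentence pairs by embedding cosine similarity, can be sketched as below. This is a minimal illustration using the sentence-transformers library; the base checkpoint, example pairs, and rating-derived labels are assumptions for demonstration, not the authors' exact setup or data.

# Minimal sketch: fine-tune a Sentence-BERT model for opinion similarity
# with weak supervision from review ratings, then score a sentence pair
# by embedding cosine similarity. The checkpoint, pairs, and labels are
# illustrative assumptions, not the paper's exact configuration.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical base checkpoint

# Weak labels: sentence pairs drawn from reviews with matching ratings
# get a high similarity label; pairs from opposing ratings get a low one.
train_examples = [
    InputExample(texts=["Battery lasts all day.", "Great battery life."], label=1.0),
    InputExample(texts=["Battery lasts all day.", "Battery dies within an hour."], label=0.0),
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)

# CosineSimilarityLoss pushes the cosine of the two sentence embeddings
# toward the weak label.
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=0)

# Score a new opinion pair with the fine-tuned embeddings.
e1, e2 = model.encode(["The screen is bright.", "The display is too dim."],
                      convert_to_tensor=True)
print(f"opinion similarity: {util.cos_sim(e1, e2).item():.3f}")

Under this weak-supervision scheme, pairs expressing differing opinions are explicitly pushed apart during training, which is the behaviour the paper finds missing from off-the-shelf embedding metrics.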

History

Start page

74

End page

84

Total pages

11

Outlet

Proceedings of the 3rd Workshop on New Frontiers in Summarization (EMNLP 2021)

Name of conference

EMNLP 2021

Publisher

Association for Computational Linguistics

Place published

United States

Start date

2021-11-10

End date

2021-11-10

Language

English

Copyright

© 2021 The Association for Computational Linguistics

Former Identifier

2006111613

Esploro creation date

2021-12-13
