Convolutional neural networks (CNNs) have been widely used for text classification, and both word-based and character-based CNNs have shown good performance on Twitter sentiment classification. However, most CNN research targets English Twitter sentiment analysis, and language-independent sentiment classification remains challenging because of the lack of non-English resources. Several recent works have used character-based CNNs to tackle this language-independence challenge. Motivated by the intuition that word-level and character-level deep features contain complementary information, we propose a hybrid word-character CNN for language-agnostic Twitter sentiment classification. The word-character CNN comprises two convolutional channels, one for word-level convolution and one for character-level convolution, plus a merge layer that combines the features from the two channels. Moreover, our model does not require language identification and does not use unsupervised embeddings or other external resources. Our proposed model outperforms both word-based and character-based CNNs on language-agnostic Twitter sentiment classification.
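The abstract describes the two-channel architecture only at a high level, so the following is a minimal, untrained forward-pass sketch in plain Python, not the authors' implementation. Everything concrete in it is an assumption for illustration: the hypothetical names `WordCharCNN`, `ConvChannel` and `bucket`, hashing tokens into fixed buckets with `zlib.crc32` (so no language-specific vocabulary is needed), the embedding size, filter widths and filter counts, max-over-time pooling, concatenation as the merge operation, and a three-class output. Embeddings are randomly initialised rather than pre-trained, mirroring the claim that no unsupervised embeddings are used.

```python
import math
import random
import zlib

# Hypothetical hyperparameters (assumptions; the abstract gives no sizes).
WORD_VOCAB = 1000   # hashed word buckets
CHAR_VOCAB = 256    # hashed character buckets
EMB_DIM = 8
NUM_FILTERS = 4
WORD_WIDTH = 2      # word-level filter width
CHAR_WIDTH = 3      # character-level filter width
NUM_CLASSES = 3     # e.g. negative / neutral / positive


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def bucket(token, size):
    # Stable hash of the UTF-8 bytes, so any script maps to an id without language ID.
    return zlib.crc32(token.encode("utf-8")) % size


class ConvChannel:
    """One channel: embedding -> 1-D convolution -> ReLU -> max-over-time pooling."""

    def __init__(self, vocab, width, rng):
        self.vocab = vocab
        self.width = width
        self.emb = rand_matrix(vocab, EMB_DIM, rng)  # trained from scratch, no pre-trained vectors
        self.filters = rand_matrix(NUM_FILTERS, width * EMB_DIM, rng)
        self.bias = [0.0] * NUM_FILTERS

    def __call__(self, tokens):
        seq = [self.emb[bucket(t, self.vocab)] for t in tokens]
        while len(seq) < self.width:  # zero-pad so at least one window fits
            seq.append([0.0] * EMB_DIM)
        features = []
        for w, b in zip(self.filters, self.bias):
            best = 0.0  # ReLU outputs are >= 0, so 0 is a safe initial maximum
            for i in range(len(seq) - self.width + 1):
                window = [v for vec in seq[i:i + self.width] for v in vec]
                act = max(0.0, sum(wi * xi for wi, xi in zip(w, window)) + b)
                best = max(best, act)
            features.append(best)
        return features


class WordCharCNN:
    """Word channel + character channel, merged by concatenation, then a softmax layer."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.word_channel = ConvChannel(WORD_VOCAB, WORD_WIDTH, rng)
        self.char_channel = ConvChannel(CHAR_VOCAB, CHAR_WIDTH, rng)
        self.out_w = rand_matrix(NUM_CLASSES, 2 * NUM_FILTERS, rng)
        self.out_b = [0.0] * NUM_CLASSES

    def predict_proba(self, tweet):
        text = tweet.lower()
        words = text.split()  # whitespace tokens; unsegmented scripts still reach the char channel
        chars = list(text)
        merged = self.word_channel(words) + self.char_channel(chars)  # merge layer (concatenation)
        logits = [sum(w * x for w, x in zip(row, merged)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    model = WordCharCNN(seed=42)
    print(model.predict_proba("Loving the new update! #happy"))
```

Because the word channel relies on whitespace, a tweet in a script written without spaces collapses to a single word token, and the character channel carries most of the signal. That asymmetry is one way to read the abstract's motivation that the two feature levels are complementary. Training (backpropagation, dropout, and so on) is deliberately left out of this sketch.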
Start page: 12
End page: 18
Total pages: 7
Outlet: Proceedings of the 22nd Australasian Document Computing Symposium (ADCS 2017)