2017 Volume 12 Pages 80-102
Recognizing sarcasm in microblogs is important for a range of NLP applications, such as opinion mining. However, the task is challenging because the intended meaning of a sarcastic sentence is the opposite of its literal meaning. Furthermore, microblog messages are short and usually written in a free style that may include misspellings, grammatical errors, and complex sentence structures. This paper proposes a novel method for identifying sarcasm in tweets. It combines two supervised classifiers: a Support Vector Machine (SVM) using N-gram features and an SVM using our proposed features. These features, derived by sentiment analysis, represent the intensity of sentiment in a tweet and the contradictions within it. The sentiment contradiction feature also takes into account coherence among multiple sentences in a tweet, which our proposed method identifies automatically using unsupervised clustering and an adaptive genetic algorithm. In addition, a method for identifying the concepts of unknown sentiment words compensates for gaps in the sentiment lexicon. Our method also considers punctuation and the special symbols frequently used in Twitter messages. Experiments on two datasets showed that our proposed system outperformed baseline systems on one dataset and produced comparable results on the other. Accuracies of 82% and 76% were achieved in sarcasm identification on the two datasets.
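To make the idea of sentiment intensity and contradiction features more concrete, here is a minimal sketch. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) and simplified feature definitions: summed positive and negative polarity as intensity, a flag for a contradiction when both polarities occur in one tweet, and counts of `!` and `?` as punctuation features. The sketch does not reproduce the paper's actual features, lexicon, sentence-coherence step, or unknown-word handling.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon (word -> polarity in [-1, 1]); NOT the paper's lexicon.
TOY_LEXICON: Dict[str, float] = {
    "love": 0.9, "great": 0.8, "fun": 0.6,
    "hate": -0.9, "waiting": -0.3, "stuck": -0.6, "traffic": -0.4,
}


def sentiment_features(tweet: str) -> Dict[str, float]:
    """Illustrative intensity/contradiction/punctuation features (assumed definitions)."""
    # Lowercase word tokens; punctuation is counted separately below.
    tokens: List[str] = re.findall(r"[a-z']+", tweet.lower())
    # Polarity scores of tokens found in the toy lexicon; unknown words are skipped.
    scores = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    pos = sum(s for s in scores if s > 0)   # positive intensity
    neg = sum(-s for s in scores if s < 0)  # negative intensity, as a positive number
    return {
        "pos_intensity": pos,
        "neg_intensity": neg,
        # Assumed contradiction flag: both polarities appear in the same tweet.
        "contradiction": 1.0 if pos > 0 and neg > 0 else 0.0,
        "exclamations": float(tweet.count("!")),
        "question_marks": float(tweet.count("?")),
    }


if __name__ == "__main__":
    print(sentiment_features("I love being stuck in traffic!!"))
```

A feature vector of this kind could then be passed to one SVM, while a second SVM uses N-gram features, in the spirit of the two-classifier combination described above. How the two classifiers' outputs are combined is not specified here and is left open.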