TY - GEN
T1 - Deep semantic frame-based deceptive opinion spam analysis
AU - Kim, Seongsoon
AU - Chang, Hyeokyoon
AU - Lee, Seongwoon
AU - Yu, Minhwan
AU - Kang, Jaewoo
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/10/17
Y1 - 2015/10/17
N2 - User-generated content is becoming increasingly valuable to both individuals and businesses due to its usefulness and influence in e-commerce markets. As consumers rely more on such information, posting deceptive opinions, which can be deliberately used for potential profit, is becoming more of an issue. Existing work on opinion spam detection focuses mainly on linguistic features such as n-grams, syntactic patterns, or LIWC. However, deep semantic analysis remains largely unstudied. In this paper, we propose a frame-based deep semantic analysis method for understanding rich characteristics of deceptive and truthful opinions written by various types of individuals including crowdsourcing workers, employees who have expert-level domain knowledge about local businesses, and online users who post on Yelp and Tri-pAdvisor. Using our proposed semantic frame feature, we developed a classification model that outperforms the baseline model and achieves an accuracy of nearly 91%. Also, we performed qualitative analysis of deceptive and truthful review datasets and considered their semantic differences. Finally, we successfully found some interesting features that existing methods were unable to identify.
AB - User-generated content is becoming increasingly valuable to both individuals and businesses due to its usefulness and influence in e-commerce markets. As consumers rely more on such information, posting deceptive opinions, which can be deliberately used for potential profit, is becoming more of an issue. Existing work on opinion spam detection focuses mainly on linguistic features such as n-grams, syntactic patterns, or LIWC. However, deep semantic analysis remains largely unstudied. In this paper, we propose a frame-based deep semantic analysis method for understanding rich characteristics of deceptive and truthful opinions written by various types of individuals including crowdsourcing workers, employees who have expert-level domain knowledge about local businesses, and online users who post on Yelp and Tri-pAdvisor. Using our proposed semantic frame feature, we developed a classification model that outperforms the baseline model and achieves an accuracy of nearly 91%. Also, we performed qualitative analysis of deceptive and truthful review datasets and considered their semantic differences. Finally, we successfully found some interesting features that existing methods were unable to identify.
KW - Deceptive opinion spam
KW - FrameNet
KW - Semantic analysis
UR - http://www.scopus.com/inward/record.url?scp=84958246343&partnerID=8YFLogxK
U2 - 10.1145/2806416.2806551
DO - 10.1145/2806416.2806551
M3 - Conference contribution
AN - SCOPUS:84958246343
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1131
EP - 1140
BT - CIKM 2015 - Proceedings of the 24th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 24th ACM International Conference on Information and Knowledge Management, CIKM 2015
Y2 - 19 October 2015 through 23 October 2015
ER -