Profane or Not: Improving Korean Profane Detection using Deep Learning

Jiyoung Woo, Sung Hee Park, Huy Kang Kim

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Abusive behavior has become a common issue on many online social media platforms, and profanity is a common form of such abuse. Social media platforms operate filtering systems based on popular profanity word lists, but this method has two drawbacks: it can be bypassed with altered forms of words, and it can flag normal sentences as profane. The problem is especially acute in Korean. Each syllable is composed of graphemes and each word of multiple syllables, so text can be decomposed into graphemes without impairing the transmission of meaning, and a string that looks like a profane word can carry a different meaning within a sentence. This work focuses on the problem of filtering systems misclassifying normal phrases as profane. To address it, we propose a deep learning-based framework that combines word embeddings based on grapheme and syllable separation with an appropriate CNN structure. The proposed model was evaluated on chat messages from a well-known online game in South Korea and achieved 90.4% accuracy.
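The abstract's point about Korean graphemes can be made concrete with the Unicode arithmetic for precomposed Hangul syllables (U+AC00 to U+D7A3). The sketch below is an illustration under stated assumptions, not the authors' code: the blocklist is a hypothetical one-word list, and the helper names `to_graphemes`, `naive_filter`, and `grapheme_filter` are invented for this example. It shows how a word typed as separate graphemes slips past an exact word-list match, and how comparing both sides at the grapheme level recovers the match. It does not reproduce the paper's embedding or CNN.

```python
# Hangul syllable block constants from the Unicode Standard (Section 3.12).
SBASE = 0xAC00            # first precomposed syllable, '가'
SCOUNT = 11172            # 19 * 21 * 28 precomposed syllables
VCOUNT, TCOUNT = 21, 28   # medial vowels, final consonants (incl. "none")
NCOUNT = VCOUNT * TCOUNT  # 588 syllables per initial consonant

# Compatibility jamo (what users type on a keyboard), in Unicode index order.
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def to_graphemes(text: str) -> str:
    """Split each precomposed Hangul syllable into its initial, medial and
    optional final jamo; leave every other character unchanged."""
    out = []
    for ch in text:
        s = ord(ch) - SBASE
        if 0 <= s < SCOUNT:
            out.append(CHO[s // NCOUNT])
            out.append(JUNG[(s % NCOUNT) // TCOUNT])
            out.append(JONG[s % TCOUNT])  # "" when there is no final consonant
        else:
            out.append(ch)
    return "".join(out)


def naive_filter(text: str, blocklist: list[str]) -> bool:
    """Hypothetical word-list filter: flag text if a listed word appears verbatim."""
    return any(word in text for word in blocklist)


def grapheme_filter(text: str, blocklist: list[str]) -> bool:
    """Same word list, but text and listed words are compared as graphemes."""
    g = to_graphemes(text)
    return any(to_graphemes(word) in g for word in blocklist)


if __name__ == "__main__":
    blocklist = ["바보"]          # hypothetical list; "바보" means "fool"
    altered = "너 ㅂㅏㅂㅗ야"      # the same word typed as separate graphemes
    print(to_graphemes("한국"))                 # ㅎㅏㄴㄱㅜㄱ
    print(naive_filter(altered, blocklist))     # False: the bypass works
    print(grapheme_filter(altered, blocklist))  # True
```

Grapheme normalization only widens what a string match can see; it still cannot use sentence context, so it does nothing about normal phrases being flagged as profane. That second drawback is what motivates a learned classifier such as the CNN described in the abstract.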

Original language: English
Pages (from-to): 305-318
Number of pages: 14
Journal: KSII Transactions on Internet and Information Systems
Volume: 16
Issue number: 1
Publication status: Published - 2022 Jan 31

Keywords

  • Convolutional neural network
  • Deep learning
  • Natural language processing
  • Profanity
  • Text mining

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
