A novel density-based clustering method using word embedding features for dialogue intention recognition

Jungsun Jang, Yeonsoo Lee, Seolhwa Lee, Dongwon Shin, Dongjun Kim, Hae-Chang Rim

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

In dialogue systems, understanding user utterances is crucial for providing appropriate responses. Various classification models have been proposed to deal with natural language understanding tasks related to user intention analysis, such as dialogue acts or emotion recognition. However, models that use original lexical features without any modifications encounter the problem of data sparseness, and constructing sufficient training data to overcome this problem is labor-intensive, time-consuming, and expensive. To address this issue, word embedding models that can learn lexical synonyms using vast raw corpora have recently been proposed. However, the analysis of embedding features is not yet sufficient to validate the efficiency of such models. Specifically, using the cosine similarity score as a feature in the embedding space neglects the skewed nature of the word frequency distribution, which can affect the improvement of model performance. This paper describes a novel density-based clustering method that efficiently integrates word embedding vectors into dialogue intention recognition. Experimental results show that our proposed model helps overcome the data sparseness problem seen in previous classification models and can assist in improving the classification performance.
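The abstract does not spell out the clustering procedure. Purely to illustrate the general idea of turning embedding neighborhoods into shared features, here is a minimal sketch that swaps in plain DBSCAN, a standard density-based clustering algorithm that is not necessarily the method proposed in the paper. It clusters word vectors under cosine distance and replaces every clustered word with a shared cluster feature, so near-synonyms map to the same feature. The toy 2-d vectors, the `eps` and `min_pts` values, and the helper names and feature strings (`cluster_features`, `C0`, `W=...`) are all hypothetical. The sketch does not model the word-frequency skew that the abstract highlights.

```python
import math
from typing import Dict, List, Sequence

UNVISITED = -2  # internal marker: point not examined yet
NOISE = -1      # point lies in no dense region


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN under cosine distance; returns a cluster id or NOISE per point."""
    labels = [UNVISITED] * len(points)

    def region(i: int) -> List[int]:
        # Indices of all points within eps of point i (i itself included).
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: open a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


def cluster_features(tokens: List[str], word_cluster: Dict[str, int]) -> List[str]:
    """Hypothetical featurizer: clustered words share a feature, others stay lexical."""
    feats = []
    for tok in tokens:
        cid = word_cluster.get(tok, NOISE)  # out-of-vocabulary words count as noise
        feats.append(f"C{cid}" if cid != NOISE else f"W={tok}")
    return feats


# Hypothetical 2-d "embeddings"; real word vectors have hundreds of dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "happy": [1.0, 0.1], "glad": [0.95, 0.15], "joyful": [0.9, 0.2],
    "sad": [0.1, 1.0], "unhappy": [0.15, 0.95], "upset": [0.2, 0.9],
    "table": [-1.0, 0.05],
}

if __name__ == "__main__":
    words = list(TOY_EMBEDDINGS)
    labels = dbscan([TOY_EMBEDDINGS[w] for w in words], eps=0.05, min_pts=3)
    word_cluster = dict(zip(words, labels))
    print(word_cluster)
    print(cluster_features(["i", "am", "glad"], word_cluster))
```

With these toy vectors, the three positive words form cluster 0, the three negative words form cluster 1, and `table` is left as noise, so `cluster_features(["i", "am", "glad"], word_cluster)` returns `['W=i', 'W=am', 'C0']`. Here `glad` shares its feature with `happy` and `joyful`, which is the kind of sparseness reduction the abstract refers to.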

Original language: English
Pages (from-to): 2315-2326
Number of pages: 12
Journal: Cluster Computing
Volume: 19
Issue number: 4
DOI: 10.1007/s10586-016-0649-7
Publication status: Published - 2016 Dec 1

Keywords

  • Density-based clustering
  • Dialogue Act
  • Emotion recognition
  • Word embedding

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

Cite this

A novel density-based clustering method using word embedding features for dialogue intention recognition. / Jang, Jungsun; Lee, Yeonsoo; Lee, Seolhwa; Shin, Dongwon; Kim, Dongjun; Rim, Hae-Chang.

In: Cluster Computing, Vol. 19, No. 4, 01.12.2016, p. 2315-2326.

Jang, Jungsun; Lee, Yeonsoo; Lee, Seolhwa; Shin, Dongwon; Kim, Dongjun; Rim, Hae-Chang. / A novel density-based clustering method using word embedding features for dialogue intention recognition. In: Cluster Computing. 2016; Vol. 19, No. 4. pp. 2315-2326.

@article{0bbfb93fc1de40d3a9ab0de8afe0de0e,
title = "A novel density-based clustering method using word embedding features for dialogue intention recognition",
abstract = "In dialogue systems, understanding user utterances is crucial for providing appropriate responses. Various classification models have been proposed to deal with natural language understanding tasks related to user intention analysis, such as dialogue acts or emotion recognition. However, models that use original lexical features without any modifications encounter the problem of data sparseness, and constructing sufficient training data to overcome this problem is labor-intensive, time-consuming, and expensive. To address this issue, word embedding models that can learn lexical synonyms using vast raw corpora have recently been proposed. However, the analysis of embedding features is not yet sufficient to validate the efficiency of such models. Specifically, using the cosine similarity score as a feature in the embedding space neglects the skewed nature of the word frequency distribution, which can affect the improvement of model performance. This paper describes a novel density-based clustering method that efficiently integrates word embedding vectors into dialogue intention recognition. Experimental results show that our proposed model helps overcome the data sparseness problem seen in previous classification models and can assist in improving the classification performance.",
keywords = "Density-based clustering, Dialogue Act, Emotion recognition, Word embedding",
author = "Jungsun Jang and Yeonsoo Lee and Seolhwa Lee and Dongwon Shin and Dongjun Kim and Hae-Chang Rim",
year = "2016",
month = "12",
day = "1",
doi = "10.1007/s10586-016-0649-7",
language = "English",
volume = "19",
pages = "2315--2326",
journal = "Cluster Computing",
issn = "1386-7857",
publisher = "Kluwer Academic Publishers",
number = "4",

}

TY  - JOUR
T1  - A novel density-based clustering method using word embedding features for dialogue intention recognition
AU  - Jang, Jungsun
AU  - Lee, Yeonsoo
AU  - Lee, Seolhwa
AU  - Shin, Dongwon
AU  - Kim, Dongjun
AU  - Rim, Hae-Chang
PY  - 2016/12/1
Y1  - 2016/12/1
AB  - In dialogue systems, understanding user utterances is crucial for providing appropriate responses. Various classification models have been proposed to deal with natural language understanding tasks related to user intention analysis, such as dialogue acts or emotion recognition. However, models that use original lexical features without any modifications encounter the problem of data sparseness, and constructing sufficient training data to overcome this problem is labor-intensive, time-consuming, and expensive. To address this issue, word embedding models that can learn lexical synonyms using vast raw corpora have recently been proposed. However, the analysis of embedding features is not yet sufficient to validate the efficiency of such models. Specifically, using the cosine similarity score as a feature in the embedding space neglects the skewed nature of the word frequency distribution, which can affect the improvement of model performance. This paper describes a novel density-based clustering method that efficiently integrates word embedding vectors into dialogue intention recognition. Experimental results show that our proposed model helps overcome the data sparseness problem seen in previous classification models and can assist in improving the classification performance.
KW  - Density-based clustering
KW  - Dialogue Act
KW  - Emotion recognition
KW  - Word embedding
UR  - http://www.scopus.com/inward/record.url?scp=84988731991&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84988731991&partnerID=8YFLogxK
U2  - 10.1007/s10586-016-0649-7
DO  - 10.1007/s10586-016-0649-7
M3  - Article
AN  - SCOPUS:84988731991
VL  - 19
SP  - 2315
EP  - 2326
JO  - Cluster Computing
JF  - Cluster Computing
SN  - 1386-7857
IS  - 4
ER  - 