ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding

Deokgun Park, Seungyeon Kim, Jurim Lee, Jaegul Choo, Nicholas Diakopoulos, Niklas Elmqvist

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

Central to many text analysis methods is the notion of a concept: a set of semantically related keywords characterizing a specific object, phenomenon, or theme. Advances in word embedding allow building a concept from a small set of seed terms. However, naive application of such techniques may result in false positive errors because of the polysemy of natural language. To mitigate this problem, we present a visual analytics system called ConceptVector that guides a user in building such concepts and then using them to analyze documents. Document-analysis case studies with real-world datasets demonstrate the fine-grained analysis provided by ConceptVector. To support the elaborate modeling of concepts, we introduce a bipolar concept model and support for specifying irrelevant words. We validate the interactive lexicon building interface by a user study and expert reviews. Quantitative evaluation shows that the bipolar lexicon generated with our methods is comparable to human-generated ones.
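The core idea in the abstract, growing a concept from a few seed terms while guarding against false positives caused by polysemy, can be illustrated with a minimal sketch. This is not ConceptVector's implementation. The toy 3-dimensional vectors, the names `cosine`, `score`, and `expand`, the thresholds, and the scoring rule (mean cosine similarity to the positive seeds minus mean similarity to the negative seeds, skipping any candidate that lies close to a user-marked irrelevant word) are all assumptions made for illustration.

```python
import math

# Toy vectors on hypothetical axes (fruit, tech, sentiment).
# The values are invented for illustration, not taken from a trained model.
EMB = {
    "apple":  (0.7, 0.7, 0.0),   # polysemous: fruit and tech company
    "orange": (1.0, 0.0, 0.0),
    "banana": (0.9, 0.1, 0.0),
    "grape":  (0.95, 0.0, 0.05),
    "iphone": (0.1, 1.0, 0.0),
    "laptop": (0.0, 1.0, 0.0),
    "happy":  (0.0, 0.0, 1.0),
    "joyful": (0.1, 0.0, 0.9),
    "sad":    (0.0, 0.0, -1.0),
    "gloomy": (0.1, 0.0, -0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word, seeds, emb):
    """Average cosine similarity of `word` to a list of seed words."""
    if not seeds:
        return 0.0
    return sum(cosine(emb[word], emb[s]) for s in seeds) / len(seeds)


def score(word, positive, negative, emb=EMB):
    """Bipolar score: > 0 leans to the positive pole, < 0 to the negative."""
    return mean_similarity(word, positive, emb) - mean_similarity(word, negative, emb)


def expand(positive, negative=(), irrelevant=(), emb=EMB,
           k=3, min_score=0.3, irrelevant_threshold=0.8):
    """Suggest up to k new words for the positive pole of a concept."""
    excluded = set(positive) | set(negative) | set(irrelevant)
    ranked = []
    for word in emb:
        if word in excluded:
            continue
        # Skip candidates that sit close to any word marked irrelevant.
        if any(cosine(emb[word], emb[x]) >= irrelevant_threshold for x in irrelevant):
            continue
        s = score(word, positive, negative, emb)
        if s >= min_score:
            ranked.append((s, word))
    ranked.sort(reverse=True)
    return [word for _, word in ranked[:k]]


if __name__ == "__main__":
    # Naive expansion: "iphone" slips in through the tech sense of "apple".
    print(expand(["apple", "orange"]))
    # Marking "laptop" as irrelevant filters out the tech sense.
    print(expand(["apple", "orange"], irrelevant=["laptop"]))
    # Bipolar scoring for a happy/sad concept.
    print(score("joyful", ["happy"], ["sad"]), score("gloomy", ["happy"], ["sad"]))
```

In this toy setting, the seed pair "apple" and "orange" pulls in "iphone" because "apple" sits between the fruit and tech axes; marking a single tech word as irrelevant removes it without the user having to enumerate every unwanted term.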

Original language: English
Article number: 8023823
Pages (from-to): 361-370
Number of pages: 10
Journal: IEEE Transactions on Visualization and Computer Graphics
Volume: 24
Issue number: 1
DOI: 10.1109/TVCG.2017.2744478
Publication status: Published - 2018 Jan 1

Keywords

  • concepts
  • Text analytics
  • text classification
  • text summarization
  • visual analytics
  • word embedding

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design

Cite this

ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding. / Park, Deokgun; Kim, Seungyeon; Lee, Jurim; Choo, Jaegul; Diakopoulos, Nicholas; Elmqvist, Niklas.

In: IEEE Transactions on Visualization and Computer Graphics, Vol. 24, No. 1, 8023823, 01.01.2018, p. 361-370.

@article{31cd759e04dc4de280526ef1e6d600b7,
title = "ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding",
keywords = "concepts, Text analytics, text classification, text summarization, visual analytics, word embedding",
author = "Deokgun Park and Seungyeon Kim and Jurim Lee and Jaegul Choo and Nicholas Diakopoulos and Niklas Elmqvist",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/TVCG.2017.2744478",
language = "English",
volume = "24",
pages = "361--370",
journal = "IEEE Transactions on Visualization and Computer Graphics",
issn = "1077-2626",
publisher = "IEEE Computer Society",
number = "1",

}

TY - JOUR

T1 - ConceptVector

T2 - Text Visual Analytics via Interactive Lexicon Building Using Word Embedding

AU - Park, Deokgun

AU - Kim, Seungyeon

AU - Lee, Jurim

AU - Choo, Jaegul

AU - Diakopoulos, Nicholas

AU - Elmqvist, Niklas

PY - 2018/1/1

Y1 - 2018/1/1

AB - Central to many text analysis methods is the notion of a concept: a set of semantically related keywords characterizing a specific object, phenomenon, or theme. Advances in word embedding allow building a concept from a small set of seed terms. However, naive application of such techniques may result in false positive errors because of the polysemy of natural language. To mitigate this problem, we present a visual analytics system called ConceptVector that guides a user in building such concepts and then using them to analyze documents. Document-analysis case studies with real-world datasets demonstrate the fine-grained analysis provided by ConceptVector. To support the elaborate modeling of concepts, we introduce a bipolar concept model and support for specifying irrelevant words. We validate the interactive lexicon building interface by a user study and expert reviews. Quantitative evaluation shows that the bipolar lexicon generated with our methods is comparable to human-generated ones.

KW - concepts

KW - Text analytics

KW - text classification

KW - text summarization

KW - visual analytics

KW - word embedding

UR - http://www.scopus.com/inward/record.url?scp=85029172691&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029172691&partnerID=8YFLogxK

U2 - 10.1109/TVCG.2017.2744478

DO - 10.1109/TVCG.2017.2744478

M3 - Article

C2 - 28880180

AN - SCOPUS:85029172691

VL - 24

SP - 361

EP - 370

JO - IEEE Transactions on Visualization and Computer Graphics

JF - IEEE Transactions on Visualization and Computer Graphics

SN - 1077-2626

IS - 1

M1 - 8023823

ER -