DemographicVis: Analyzing demographic information based on user generated content

Wenwen Dou, Isaac Cho, Omar ElTayeby, Jaegul Choo, Xiaoyu Wang, William Ribarsky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

The widespread use of social media provides unprecedented sources of written language that can be used to model and infer online demographics. In this paper, we introduce a novel visual text analytics system, DemographicVis, to aid interactive analysis of such demographic information based on user-generated content. Our approach connects categorical data (demographic information) with textual data, allowing users to understand the characteristics of different demographic groups in a transparent and exploratory manner. The modeling and visualization are based on ground-truth demographic information collected via a survey conducted on Reddit.com. Detailed user information is fed into our modeling process, which connects the demographic groups with the features that best describe the distinguishing characteristics of each group. Topical and linguistic features are generated from the user-generated content. These features are then analyzed and ranked based on their ability to predict the users' demographic information. To enable interactive demographic analysis, we introduce a web-based visual interface that presents the relationships among the demographic groups, their topic interests, and the predictive power of various features. We present multiple case studies to showcase the utility of our visual analytics approach in exploring and understanding the interests of different demographic groups. We also report results from a comparative evaluation, showing that DemographicVis is quantitatively superior or competitive and subjectively preferred when compared to a commercial text analysis tool.
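As a rough illustration of what ranking features by their ability to predict demographic information could look like, the sketch below scores each feature by its mutual information with a demographic label. The abstract does not say which ranking measure DemographicVis uses, so mutual information is an assumption here, and the feature names, age groups, and toy data are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Mutual information (in bits) between a discrete feature and a discrete label."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, l), count in joint.items():
        p_joint = count / n
        # p(f, l) / (p(f) * p(l)) rewritten in terms of raw counts
        ratio = p_joint * n * n / (feature_counts[f] * label_counts[l])
        mi += p_joint * math.log2(ratio)
    return mi


def rank_features(rows, labels):
    """Return (feature name, score) pairs sorted by decreasing mutual information."""
    names = rows[0].keys()
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one row of binary features per user (assumption, not from the paper).
rows = [
    {"mentions_gaming": 1, "uses_emoticons": 1},
    {"mentions_gaming": 1, "uses_emoticons": 0},
    {"mentions_gaming": 0, "uses_emoticons": 1},
    {"mentions_gaming": 0, "uses_emoticons": 0},
]
# Hypothetical survey-reported age groups for the same four users.
labels = ["18-24", "18-24", "25-34", "25-34"]

ranking = rank_features(rows, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy data the first feature separates the two groups perfectly while the second carries no information about them, so the first is ranked higher. A real pipeline would derive such features from topic models and linguistic analysis of the posts rather than from hand-written flags.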

Original language: English
Title of host publication: 2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings
Editors: Min Chen, Gennady Andrienko
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 57-64
Number of pages: 8
ISBN (Electronic): 9781467397834
DOI: https://doi.org/10.1109/VAST.2015.7347631
Publication status: Published - 2015 Dec 4
Event: 10th IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Chicago, United States
Duration: 2015 Oct 25 to 2015 Oct 30

Publication series

Name: 2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings

Conference

Conference: 10th IEEE Conference on Visual Analytics Science and Technology, VAST 2015
Country: United States
City: Chicago
Period: 15/10/25 to 15/10/30

Fingerprint

Linguistics
Visualization

Keywords

  • Demographic Analysis
  • Social Media
  • User Interface
  • Visual Text Analysis

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Dou, W., Cho, I., ElTayeby, O., Choo, J., Wang, X., & Ribarsky, W. (2015). DemographicVis: Analyzing demographic information based on user generated content. In M. Chen, & G. Andrienko (Eds.), 2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings (pp. 57-64). [7347631] (2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/VAST.2015.7347631

@inproceedings{19dcebbf6dbd41a98461565829ecd4c9,
title = "DemographicVis: Analyzing demographic information based on user generated content",
keywords = "Demographic Analysis, Social Media, User Interface, Visual Text Analysis",
author = "Wenwen Dou and Isaac Cho and Omar ElTayeby and Jaegul Choo and Xiaoyu Wang and William Ribarsky",
year = "2015",
month = "12",
day = "4",
doi = "10.1109/VAST.2015.7347631",
language = "English",
series = "2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "57--64",
editor = "Min Chen and Gennady Andrienko",
booktitle = "2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings",

}

TY - GEN

T1 - DemographicVis

T2 - Analyzing demographic information based on user generated content

AU - Dou, Wenwen

AU - Cho, Isaac

AU - ElTayeby, Omar

AU - Choo, Jaegul

AU - Wang, Xiaoyu

AU - Ribarsky, William

PY - 2015/12/4

Y1 - 2015/12/4

KW - Demographic Analysis

KW - Social Media

KW - User Interface

KW - Visual Text Analysis

UR - http://www.scopus.com/inward/record.url?scp=84962853190&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962853190&partnerID=8YFLogxK

U2 - 10.1109/VAST.2015.7347631

DO - 10.1109/VAST.2015.7347631

M3 - Conference contribution

AN - SCOPUS:84962853190

T3 - 2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings

SP - 57

EP - 64

BT - 2015 IEEE Conference on Visual Analytics Science and Technology, VAST 2015 - Proceedings

A2 - Chen, Min

A2 - Andrienko, Gennady

PB - Institute of Electrical and Electronics Engineers Inc.

ER -