Technology analysis from patent data using latent Dirichlet allocation

Gabjo Kim, Sangsung Park, Dong Sik Jang

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

This paper discusses how to apply latent Dirichlet allocation, a topic model, in a trend analysis methodology that exploits patent information. To accomplish this, text mining is used to convert unstructured patent documents into structured data. Next, the term frequency-inverse document frequency (tf-idf) value is used in the feature selection process. After the text preprocessing, the number of topics is decided using the perplexity value. In this study, we employed U.S. patent data on technology that reduces greenhouse gases. We extracted words from 50 relevant topics and showed that these topics are highly meaningful in explaining trends per period.
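The tf-idf feature-selection step can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the patent texts have already been tokenized, uses raw term counts with idf = log(N / df), and ranks each term by its highest tf-idf value in any document before keeping the top `top_k` terms. The scoring rule, the cutoff, and the names `tfidf_scores`, `select_features`, and `filter_docs` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Map each term to its highest tf-idf value over all documents.

    docs: list of token lists, one per (preprocessed) patent document.
    tf is the raw count of a term in one document; idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count once per doc
    best = {}
    for tokens in docs:
        for term, tf in Counter(tokens).items():
            score = tf * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return best


def select_features(docs, top_k):
    """Keep the top_k terms by tf-idf score (ties broken alphabetically)."""
    scores = tfidf_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def filter_docs(docs, vocab):
    """Drop every token that is not in the selected vocabulary."""
    keep = set(vocab)
    return [[t for t in tokens if t in keep] for tokens in docs]


if __name__ == "__main__":
    # Toy tokenized "patents"; not data from the paper.
    docs = [
        ["carbon", "capture", "carbon", "storage"],
        ["carbon", "solar", "cell"],
        ["solar", "cell", "panel", "carbon"],
    ]
    vocab = select_features(docs, 5)
    print(vocab)  # ['capture', 'panel', 'storage', 'cell', 'solar']
    print(filter_docs(docs, vocab))
```

In the toy example, `carbon` appears in every document. Its idf is therefore zero, and it falls to the bottom of the ranking. Corpus-wide terms like this are what tf-idf filtering is meant to remove before a topic model is fitted.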

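The abstract uses perplexity to choose the number of topics. Perplexity is commonly defined as exp(-(sum of log p(w) over all tokens) / (number of tokens)), where, for a token w in document d, p(w) = sum over topics k of theta[d][k] * phi[k][w].

The sketch below rests on several assumptions:

- An LDA model has already been fitted elsewhere and supplies the document-topic proportions `theta` and the smoothed topic-word distributions `phi`.
- The selection rule is "pick the K with the lowest perplexity." The abstract does not spell out its exact rule; held-out documents and elbow-style criteria are also common.
- The function names are hypothetical.

```python
import math


def perplexity(docs, theta, phi):
    """Per-token perplexity of tokenized docs under a fitted topic model.

    theta[d][k]: proportion of topic k in document d (rows sum to 1).
    phi[k]: dict mapping term -> P(term | topic k); assumed smoothed so that
    every token in docs has nonzero probability under the mixture
    (otherwise math.log raises ValueError).
    """
    log_lik = 0.0
    n_tokens = 0
    for d, tokens in enumerate(docs):
        for w in tokens:
            # Mixture probability of word w in document d.
            p = sum(theta[d][k] * phi[k].get(w, 0.0) for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def choose_num_topics(perplexity_by_k):
    """Return the candidate K whose model has the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


if __name__ == "__main__":
    # Toy two-topic model; values are illustrative only.
    docs = [["a", "b"], ["a", "a"]]
    theta = [[0.5, 0.5], [1.0, 0.0]]
    phi = [{"a": 1.0}, {"b": 1.0}]
    print(perplexity(docs, theta, phi))
    # Hypothetical perplexities for three candidate topic counts.
    print(choose_num_topics({2: 3.1, 3: 2.7, 4: 2.9}))
```

Lower perplexity means the model assigns higher average probability to the observed tokens. In practice it is usually evaluated on documents held out from training, so that larger K is not rewarded simply for fitting the training set more closely.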
Original language: English
Pages (from-to): 71-80
Number of pages: 10
Journal: Advances in Intelligent Systems and Computing
Volume: 271
DOI: 10.1007/978-3-319-05527-5_8
Publication status: Published (2014)

Fingerprint

Greenhouse gases
Feature extraction

Keywords

  • Latent Dirichlet allocation
  • Text mining
  • Tf-idf
  • Topic model

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science (all)

Cite this

Technology analysis from patent data using latent Dirichlet allocation. / Kim, Gabjo; Park, Sangsung; Jang, Dong Sik.

In: Advances in Intelligent Systems and Computing, Vol. 271, 2014, pp. 71-80.

@article{2e22382922b34b5cb09b1e0ad2c8278e,
title = "Technology analysis from patent data using latent {D}irichlet allocation",
abstract = "This paper discusses how to apply latent Dirichlet allocation, a topic model, in a trend analysis methodology that exploits patent information. To accomplish this, text mining is used to convert unstructured patent documents into structured data. Next, the term frequency-inverse document frequency (tf-idf) value is used in the feature selection process. After the text preprocessing, the number of topics is decided using the perplexity value. In this study, we employed U.S. patent data on technology that reduces greenhouse gases. We extracted words from 50 relevant topics and showed that these topics are highly meaningful in explaining trends per period.",
keywords = "Latent Dirichlet allocation, Text mining, Tf-idf, Topic model",
author = "Gabjo Kim and Sangsung Park and Jang, {Dong Sik}",
year = "2014",
doi = "10.1007/978-3-319-05527-5_8",
language = "English",
volume = "271",
pages = "71--80",
journal = "Advances in Intelligent Systems and Computing",
issn = "2194-5357",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - Technology analysis from patent data using latent Dirichlet allocation

AU - Kim, Gabjo

AU - Park, Sangsung

AU - Jang, Dong Sik

PY - 2014

Y1 - 2014

AB - This paper discusses how to apply latent Dirichlet allocation, a topic model, in a trend analysis methodology that exploits patent information. To accomplish this, text mining is used to convert unstructured patent documents into structured data. Next, the term frequency-inverse document frequency (tf-idf) value is used in the feature selection process. After the text preprocessing, the number of topics is decided using the perplexity value. In this study, we employed U.S. patent data on technology that reduces greenhouse gases. We extracted words from 50 relevant topics and showed that these topics are highly meaningful in explaining trends per period.

KW - Latent Dirichlet allocation

KW - Text mining

KW - Tf-idf

KW - Topic model

UR - http://www.scopus.com/inward/record.url?scp=84927613576&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84927613576&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-05527-5_8

DO - 10.1007/978-3-319-05527-5_8

M3 - Article

VL - 271

SP - 71

EP - 80

JO - Advances in Intelligent Systems and Computing

JF - Advances in Intelligent Systems and Computing

SN - 2194-5357

ER -