Business environmental analysis for textual data using data mining and sentence-level classification

Yoon Sung Kim, Hae-Chang Rim, Do Gil Lee

Research output: Contribution to journal, Article

1 Citation (Scopus)

Abstract

Purpose: The purpose of this paper is to propose a methodology for classifying large amounts of unstructured textual data into the categories of business environmental analysis frameworks.

Design/methodology/approach: This paper uses machine learning to classify a vast amount of unstructured textual data by category of business environmental analysis framework. Producing a large volume of high-quality training data for a machine-learning-based system is generally costly, so semi-supervised learning techniques are used to improve classification performance. Additionally, the lack-of-features problem that traditional classification systems have suffered from is addressed by adding semantic features derived from word embedding, a recent technique in text mining.

Findings: The proposed methodology can be used for various business environmental analyses, and the system is fully automated in both the training and classification phases. Semi-supervised learning can address the problem of insufficient training data. The proposed semantic features can help improve traditional classification systems.

Research limitations/implications: This paper focuses on classifying sentences that contain business environmental analysis information in a large number of documents. However, the proposed methodology is limited with respect to advanced analyses that could directly help managers establish strategies, since it does not summarize the environmental variables implied in the classified sentences. Advanced summarization and recommendation techniques could extract the environmental variables from the sentences and thereby assist managers in establishing effective strategies.

Originality/value: The feature selection technique developed in this paper has not been used in traditional systems for business and industry, and it allows the whole process to be fully automated. The paper also demonstrates practicality, in that the methodology can be applied to various business environmental analysis frameworks. In addition, the system is more economical than traditional systems because of semi-supervised learning, and it can resolve the lack-of-features problem that traditional systems suffer from. This work is valuable for analyzing environmental factors and establishing strategies for companies.
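
As a rough illustration of the ideas named in the abstract (sentence-level classification, semantic features from word embeddings, and semi-supervised learning), the following Python sketch may help. It is not the authors' system: the abstract does not specify the classifier, the embedding model, or the semi-supervised algorithm. The sketch assumes generic self-training with a nearest-centroid classifier, and the three-dimensional embedding values, the PEST-style labels, the `threshold` parameter, and all function names are hypothetical.

```python
from math import sqrt

# Toy 3-dimensional "word embeddings" (hypothetical values for illustration;
# a real system would load vectors trained on a large corpus).
EMBEDDINGS = {
    "regulation": (0.9, 0.1, 0.0), "tax": (0.8, 0.2, 0.1),
    "law": (0.9, 0.0, 0.1), "policy": (0.8, 0.1, 0.1),
    "inflation": (0.1, 0.9, 0.0), "market": (0.2, 0.8, 0.1),
    "price": (0.1, 0.8, 0.2), "growth": (0.2, 0.9, 0.0),
    "software": (0.0, 0.1, 0.9), "automation": (0.1, 0.1, 0.9),
    "patent": (0.1, 0.2, 0.8),
}


def sentence_vector(sentence):
    """Average the embeddings of known words; return None if no word is known."""
    vecs = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return None
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(3))


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(labeled):
    """Compute one mean vector per label from (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(x / counts[lab] for x in acc) for lab, acc in sums.items()}


def predict(vec, cents):
    """Return (label, similarity) of the centroid most similar to vec."""
    return max(((lab, cosine(vec, c)) for lab, c in cents.items()),
               key=lambda pair: pair[1])


def self_train(labeled_sents, unlabeled_sents, threshold=0.95, max_rounds=5):
    """Self-training: repeatedly add confidently predicted unlabeled
    sentences to the training set, then recompute the centroids."""
    labeled = [(sentence_vector(s), y) for s, y in labeled_sents]
    labeled = [(v, y) for v, y in labeled if v is not None]
    pool = [v for v in map(sentence_vector, unlabeled_sents) if v is not None]
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for vec in pool:
            label, sim = predict(vec, cents)
            (confident if sim >= threshold else rest).append((vec, label))
        if not confident:
            break  # nothing new was learned in this round
        labeled.extend(confident)
        pool = [vec for vec, _ in rest]
    return centroids(labeled)


def classify(sentence, cents):
    """Label a sentence, or return None if it contains no known word."""
    vec = sentence_vector(sentence)
    return predict(vec, cents)[0] if vec is not None else None


# A small seed set of labeled sentences plus unlabeled sentences.
seed = [("new tax regulation", "political"),
        ("inflation hits market", "economic"),
        ("automation software", "technological")]
unlabeled = ["law and policy", "price growth", "patent software"]
model = self_train(seed, unlabeled)
print(classify("the law on tax", model))
```

Averaging word vectors lets a sentence match a category even when it shares no exact words with the seed sentences, which is one way to read the "lack of feature" problem the abstract mentions. The confidence threshold controls how aggressively unlabeled sentences are absorbed into the training set.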

Original language: English
Journal: Industrial Management and Data Systems
DOI: 10.1108/IMDS-07-2017-0317
Publication status: Accepted/In press - 2018 Jan 1

Fingerprint

Data mining
Industry
Supervised learning
Learning systems
Managers
Semantics
Environmental analysis
Feature extraction
Methodology
Semi-supervised learning
Costs
Machine learning
Classification system

Keywords

  • Machine learning
  • PEST analysis
  • SWOT analysis
  • Text categorization
  • Text mining
  • Word embedding

ASJC Scopus subject areas

  • Management Information Systems
  • Industrial relations
  • Computer Science Applications
  • Strategy and Management
  • Industrial and Manufacturing Engineering

Cite this

@article{fb9691f8e5a544cd91515c8e61a4ceba,
title = "Business environmental analysis for textual data using data mining and sentence-level classification",
keywords = "Machine learning, PEST analysis, SWOT analysis, Text categorization, Text mining, Word embedding",
author = "Kim, {Yoon Sung} and Rim, Hae-Chang and Lee, {Do Gil}",
year = "2018",
month = "1",
day = "1",
doi = "10.1108/IMDS-07-2017-0317",
language = "English",
journal = "Industrial Management and Data Systems",
issn = "0263-5577",
publisher = "Emerald Group Publishing Ltd.",

}

TY - JOUR

T1 - Business environmental analysis for textual data using data mining and sentence-level classification

AU - Kim, Yoon Sung

AU - Rim, Hae-Chang

AU - Lee, Do Gil

PY - 2018/1/1

Y1 - 2018/1/1

KW - Machine learning

KW - PEST analysis

KW - SWOT analysis

KW - Text categorization

KW - Text mining

KW - Word embedding

UR - http://www.scopus.com/inward/record.url?scp=85055699379&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055699379&partnerID=8YFLogxK

U2 - 10.1108/IMDS-07-2017-0317

DO - 10.1108/IMDS-07-2017-0317

M3 - Article

AN - SCOPUS:85055699379

JO - Industrial Management and Data Systems

JF - Industrial Management and Data Systems

SN - 0263-5577

ER -