Automated classification of industry and occupation codes using document classification method

Research output: Contribution to journal (Article)

Abstract

This paper describes the development of an automated industry and occupation coding system for Korean Census records. The system converts natural language responses on survey questionnaires into the corresponding numeric codes defined in the Census Bureau's standard code book. We employ a kNN (k-nearest neighbors) document classification method, together with information retrieval techniques for indexing and for weighting index terms. To cope with inconsistencies in how respondents describe the same industry or occupation, we use nouns and phrases acquired from past census data. From these data, we estimate which nouns or phrases are frequently used to describe a given code. Experimental results show that the past census data play an important role in increasing code classification accuracy.
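
The general idea of kNN classification over weighted index terms can be illustrated with a minimal Python sketch. This is not the authors' system: the TF-IDF weighting, cosine similarity, similarity-weighted voting, whitespace tokenizer, function names, sample descriptions, and the codes "T01" and "D02" are all assumptions chosen for illustration, and the abstract does not specify which weighting or voting scheme the paper uses.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: plain whitespace tokenization; a real system for Korean
    # would need morphological analysis to extract nouns and phrases.
    return text.lower().split()

def weight(tf, idf):
    # TF-IDF weights, normalized to unit length so a dot product is a cosine.
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def build_index(records):
    # records: list of (description, code) pairs, e.g. from past coded data.
    docs = [(Counter(tokenize(text)), code) for text, code in records]
    n = len(docs)
    df = Counter()
    for tf, _ in docs:
        df.update(tf.keys())
    # Smoothed inverse document frequency.
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}
    vectors = [(weight(tf, idf), code) for tf, code in docs]
    return idf, vectors

def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def classify(text, idf, vectors, k=3):
    # Rank stored descriptions by similarity, keep the k nearest with a
    # non-zero score, and let them vote for a code weighted by similarity.
    query = weight(Counter(tokenize(text)), idf)
    scored = sorted(((cosine(query, v), code) for v, code in vectors),
                    reverse=True)[:k]
    votes = defaultdict(float)
    for sim, code in scored:
        if sim > 0:
            votes[code] += sim
    return max(votes, key=votes.get) if votes else None

# Hypothetical, hand-made records standing in for past census responses.
PAST_RECORDS = [
    ("elementary school teacher", "T01"),
    ("teaches math at high school", "T01"),
    ("school teacher", "T01"),
    ("drives a city bus", "D02"),
    ("bus driver", "D02"),
    ("taxi driver", "D02"),
]

IDF, VECTORS = build_index(PAST_RECORDS)
print(classify("high school teacher", IDF, VECTORS))
print(classify("drives taxi", IDF, VECTORS))
```

In this sketch the stored, already-coded descriptions play the role that past census data play in the abstract: the more varied the wordings recorded for a code, the more likely a new response is to share terms with one of them.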

Original language: English
Pages (from-to): 827-833
Number of pages: 7
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3316
Publication status: Published - 2004

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
