This paper describes the development of an automated industry and occupation coding system for Korean Census records. The purpose of the system is to convert natural-language responses on survey questionnaires into the corresponding numeric codes defined in the standard code book from the Census Bureau. We employ a kNN (k-nearest-neighbors) document classification method, together with information retrieval techniques for indexing and for weighting index terms. To address the inconsistency in how different respondents describe the same industry or occupation, we use nouns and phrases acquired from past census data. From this data, we can estimate which nouns or phrases are frequently used to describe a given code. The experimental results show that the past census data play an important role in increasing code classification accuracy.
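As a rough illustration of the approach summarized above, the following minimal sketch classifies a free-text response by similarity-weighted voting among its k nearest previously coded responses. It rests on several assumptions that the abstract does not confirm: TF-IDF with cosine similarity stands in for the unspecified term-weighting scheme, whitespace tokenization stands in for the extraction of nouns and phrases, and the codes `A01`, `B02`, `C03` and the sample responses are hypothetical placeholders rather than real census data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for noun/phrase extraction.
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency over the coded training responses.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    # Terms unseen in the training data are ignored.
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, training, k=3):
    # training: list of (response text, code) pairs, e.g. past coded records.
    docs = [tokenize(text) for text, _ in training]
    idf = build_idf(docs)
    q = tfidf(tokenize(query), idf)
    scored = [(cosine(q, tfidf(d, idf)), code)
              for d, (_, code) in zip(docs, training)]
    neighbors = sorted(scored, key=lambda sc: sc[0], reverse=True)[:k]
    # Each neighbor votes for its code, weighted by its similarity.
    votes = Counter()
    for sim, code in neighbors:
        votes[code] += sim
    return votes.most_common(1)[0][0]


# Hypothetical coded responses; not real census data or real codes.
TRAINING = [
    ("grows rice on family farm", "A01"),
    ("rice farming and vegetable growing", "A01"),
    ("teaches mathematics at high school", "B02"),
    ("elementary school teacher", "B02"),
    ("drives city bus", "C03"),
]

print(knn_classify("rice farm worker", TRAINING))
print(knn_classify("school teacher", TRAINING))
```

In this sketch, enlarging `TRAINING` with more previously coded responses plays the role that past census data plays in the described system: it broadens the vocabulary associated with each code, so differently worded descriptions of the same job are more likely to find a close neighbor.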
|Number of pages||7|
|Journal||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Publication status||Published - 2004|
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science(all)