An application of information retrieval technique to automated code classification

Heui Seok Lim, Seong Hoon Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes an application of information retrieval techniques to automated industry and occupation code classification for Korean Census records. The purpose of the proposed system is to convert natural language responses on survey questionnaires into the corresponding numeric codes according to the standard code book from the Census Bureau. The system was evaluated on 46,762 industry records and 36,286 occupation records using 10-fold cross-validation. In the experiments, the system achieved production rates of 87.08% and 66.08% when classifying industry records into level 2 and level 5 codes, respectively. In semi-automated mode, it achieved production rates of 99.10% and 92.88% for level 2 and level 5 codes, respectively.
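
The abstract does not say which retrieval model the system uses, so the following is only a minimal sketch of the general idea: treat each already-coded response as a document, treat a new response as a query, and assign the code of the most similar document. TF-IDF weighting with cosine similarity is an assumption here, not the authors' stated method. The training records, the codes, the function names, and the reading of "level 2" and "level 5" as the first 2 or 5 digits of a code are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical coded responses: (free-text answer, 5-digit code).
# Texts and codes are invented for illustration only.
TRAIN = [
    ("rice farming", "01110"),
    ("vegetable farming", "01121"),
    ("car parts manufacturing", "30320"),
    ("car engine manufacturing", "30310"),
    ("elementary school teaching", "85120"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real Korean analysis)."""
    return text.lower().split()


def build_index(train):
    """Return (idf, vectors): smoothed IDF weights and one TF-IDF vector per record."""
    docs = [Counter(tokenize(text)) for text, _ in train]
    doc_freq = Counter(term for doc in docs for term in doc)
    n_docs = len(docs)
    idf = {t: math.log((n_docs + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_codes(query, train, idf, vectors, level=5, top_k=3):
    """Rank code prefixes of length `level` by their best-matching record.

    Returns up to `top_k` (prefix, score) pairs, best first. Prefixes whose
    records share no term with the query are left out.
    """
    counts = Counter(tokenize(query))
    # Terms unseen in training carry no weight in this sketch.
    q_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    best = {}
    for (_, code), vec in zip(train, vectors):
        prefix = code[:level]
        score = cosine(q_vec, vec)
        if score > best.get(prefix, 0.0):
            best[prefix] = score
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    idf, vectors = build_index(TRAIN)
    print(rank_codes("car parts factory", TRAIN, idf, vectors, level=5))
    print(rank_codes("car parts factory", TRAIN, idf, vectors, level=2))
```

Taking only the first entry of the ranked list corresponds to fully automatic coding. Returning several candidates for a human coder to choose from is one plausible reading of the semi-automated mode, which the abstract does not define.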

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 90-96
Number of pages: 7
Volume: 3681 LNAI
Publication status: Published - 2005 Dec 1
Externally published: Yes
Event: 9th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES 2005 - Melbourne, Australia
Duration: 2005 Sep 14 to 2005 Sep 16

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3681 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Fingerprint

Information Storage and Retrieval
Information retrieval
Industry
Censuses
Occupations
Census
Language
Evaluation Method
Numerics
Cross-validation
Questionnaire
Natural Language
Convert
Fold
Experimental Results

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Lim, H. S., & Lee, S. H. (2005). An application of information retrieval technique to automated code classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3681 LNAI, pp. 90-96).

@inproceedings{ded9affb49b44e6f8a6d0e303a71457d,
title = "An application of information retrieval technique to automated code classification",
abstract = "This paper describes an application of information retrieval techniques to automated industry and occupation code classification for Korean Census records. The purpose of the proposed system is to convert natural language responses on survey questionnaires into the corresponding numeric codes according to the standard code book from the Census Bureau. The system was evaluated on 46,762 industry records and 36,286 occupation records using 10-fold cross-validation. In the experiments, the system achieved production rates of 87.08{\%} and 66.08{\%} when classifying industry records into level 2 and level 5 codes, respectively. In semi-automated mode, it achieved production rates of 99.10{\%} and 92.88{\%} for level 2 and level 5 codes, respectively.",
author = "Lim, {Heui Seok} and Lee, {Seong Hoon}",
year = "2005",
month = "12",
day = "1",
language = "English",
isbn = "3540288945",
volume = "3681 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "90--96",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY  - GEN
T1  - An application of information retrieval technique to automated code classification
AU  - Lim, Heui Seok
AU  - Lee, Seong Hoon
PY  - 2005/12/1
Y1  - 2005/12/1
AB  - This paper describes an application of information retrieval techniques to automated industry and occupation code classification for Korean Census records. The purpose of the proposed system is to convert natural language responses on survey questionnaires into the corresponding numeric codes according to the standard code book from the Census Bureau. The system was evaluated on 46,762 industry records and 36,286 occupation records using 10-fold cross-validation. In the experiments, the system achieved production rates of 87.08% and 66.08% when classifying industry records into level 2 and level 5 codes, respectively. In semi-automated mode, it achieved production rates of 99.10% and 92.88% for level 2 and level 5 codes, respectively.
UR  - http://www.scopus.com/inward/record.url?scp=33745326584&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33745326584&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 3540288945
SN  - 9783540288947
VL  - 3681 LNAI
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 90
EP  - 96
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER  -