An efficient method for document image geometric layout analysis

Suyoung Chi, Yunkoo Chung, DaeGeun Jang, Weongeun Oh, Jaeyeon Lee, Chang-Hun Kim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Document image analysis is necessary for optical character recognition (OCR) and is also useful for many other document image manipulations. In this paper, we propose a document image geometric layout analysis system that produces fewer region segmentation and classification errors than commercial software and previous work. The proposed method uses a fast connected-component generation method to segment the document image into small regions about the size of a character, which prevents connected components of different types from being merged. We also propose a new criterion for clustering the connected components, along with new techniques for handling noise and reducing computation time. Experiments show that the classification error rate for text and picture regions is reduced.
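The abstract does not spell out the fast connected-component generation method, so the following is only a minimal sketch of the general idea it builds on: labeling 8-connected foreground pixels in a binarized page and recording one bounding box per component. The function name `connected_components`, the 0/1 list-of-lists image representation, and the breadth-first flood fill are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes of 8-connected foreground (value 1) regions.

    Each box is a (top, left, bottom, right) tuple with inclusive indices,
    listed in the order components are first met in a row-major scan.
    Hypothetical helper for illustration; not the method from the paper.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Tiny binarized "page": two separate blobs of foreground pixels.
page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
boxes = connected_components(page)
print(boxes)
```

In a layout analysis pipeline, boxes like these would then be grouped and classified as text or picture regions; the clustering criterion the paper proposes is not given in the abstract, so it is not sketched here.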

Original language: English
Title of host publication: IASTED International Conference on Computer Graphics and Imaging
Editors: M.H. Hamza, M.H. Hamza
Pages: 238-243
Number of pages: 6
Publication status: Published - 2003 Dec 1
Event: Sixth IASTED International Conference on Computer Graphics and Imaging - Honolulu, HI, United States
Duration: 2003 Aug 13 to 2003 Aug 15

Other

Other: Sixth IASTED International Conference on Computer Graphics and Imaging
Country: United States
City: Honolulu, HI
Period: 03/8/13 to 03/8/15

Keywords

  • Connected Component Analysis
  • Document Image Analysis
  • Optical Character Recognition (OCR)

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design

Cite this

Chi, S., Chung, Y., Jang, D., Oh, W., Lee, J., & Kim, C-H. (2003). An efficient method for document image geometric layout analysis. In M. H. Hamza, & M. H. Hamza (Eds.), IASTED International Conference on Computer Graphics and Imaging (pp. 238-243)

@inproceedings{3a8ea2d03bf64038877240281302d232,
title = "An efficient method for document image geometric layout analysis",
abstract = "Document image analysis is necessary for optical character recognition (OCR) and is also useful for many other document image manipulations. In this paper, we propose a document image geometric layout analysis system that produces fewer region segmentation and classification errors than commercial software and previous work. The proposed method uses a fast connected-component generation method to segment the document image into small regions about the size of a character, which prevents connected components of different types from being merged. We also propose a new criterion for clustering the connected components, along with new techniques for handling noise and reducing computation time. Experiments show that the classification error rate for text and picture regions is reduced.",
keywords = "Connected Component Analysis, Document Image Analysis, Optical Character Recognition (OCR)",
author = "Suyoung Chi and Yunkoo Chung and DaeGeun Jang and Weongeun Oh and Jaeyeon Lee and Chang-Hun Kim",
year = "2003",
month = "12",
day = "1",
language = "English",
isbn = "0889863768",
pages = "238--243",
editor = "M.H. Hamza and M.H. Hamza",
booktitle = "IASTED International Conference on Computer Graphics and Imaging",

}

TY - GEN

T1 - An efficient method for document image geometric layout analysis

AU - Chi, Suyoung

AU - Chung, Yunkoo

AU - Jang, DaeGeun

AU - Oh, Weongeun

AU - Lee, Jaeyeon

AU - Kim, Chang-Hun

PY - 2003/12/1

Y1 - 2003/12/1

N2 - Document image analysis is necessary for optical character recognition (OCR) and is also useful for many other document image manipulations. In this paper, we propose a document image geometric layout analysis system that produces fewer region segmentation and classification errors than commercial software and previous work. The proposed method uses a fast connected-component generation method to segment the document image into small regions about the size of a character, which prevents connected components of different types from being merged. We also propose a new criterion for clustering the connected components, along with new techniques for handling noise and reducing computation time. Experiments show that the classification error rate for text and picture regions is reduced.

KW - Connected Component Analysis

KW - Document Image Analysis

KW - Optical Character Recognition (OCR)

UR - http://www.scopus.com/inward/record.url?scp=1542359455&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1542359455&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:1542359455

SN - 0889863768

SP - 238

EP - 243

BT - IASTED International Conference on Computer Graphics and Imaging

A2 - Hamza, M.H.

A2 - Hamza, M.H.

ER -