Contextual postprocessing of a Korean OCR system by linguistic constraints

Hyuk Chul Kwon, Ho Jeong Hwang, Min Jung Kim, Seong Whan Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper presents a contextual postprocessing approach that selects the most feasible word from the multiple output strings of an OCR system. Correction is applied only when selection fails. The selected word is confirmed by its collocation with the adjacent words. The system applies five functions: (1) selecting a word from candidate words, (2) correcting candidate words using a confusion matrix of syllables, (3) combining two substrings into a word that spans two lines, (4) guessing unknown nouns, and (5) confirming a selected word using the contextual information of adjacent words. To improve speed, we use syllable di-grams and viable-prefixes of Korean words. The experimental results show that these two heuristics speed up the system by more than 1,000 times in the worst case. Our system improves the word recognition rate of the OCR system from 90.50% to 94.72%.
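The abstract names the two speed-up heuristics and the confusion-matrix correction but does not give their details, so the following is only a minimal sketch of how such pruning might work, not the authors' implementation. The lexicon, the confusion table, and the function names `select_words` and `correct_and_select` are all hypothetical, chosen for illustration.

```python
# Toy lexicon (hypothetical); a real system would use a full Korean dictionary.
LEXICON = {"학교", "학생", "한국", "한국어"}

# Viable prefixes: every leading syllable sequence of some lexicon word.
VIABLE_PREFIXES = {w[:i] for w in LEXICON for i in range(1, len(w) + 1)}

# Syllable di-grams: adjacent syllable pairs that occur in the lexicon.
DIGRAMS = {w[i:i + 2] for w in LEXICON for i in range(len(w) - 1)}

# Hypothetical confusion matrix: recognized syllable -> likely intended syllables.
CONFUSION = {"항": ["학", "한"], "굳": ["국"]}


def select_words(candidates_per_position):
    """Return lexicon words that can be built from per-position syllable candidates.

    Partial strings are dropped as soon as they contain an unseen di-gram
    or stop being a viable prefix, so most combinations are never expanded.
    """
    partial = [""]
    for options in candidates_per_position:
        extended = []
        for prefix in partial:
            for syllable in options:
                # Cheap di-gram check first: skip unseen syllable pairs.
                if prefix and prefix[-1] + syllable not in DIGRAMS:
                    continue
                # Keep only strings that can still grow into a lexicon word.
                if prefix + syllable in VIABLE_PREFIXES:
                    extended.append(prefix + syllable)
        partial = extended
    return [word for word in partial if word in LEXICON]


def correct_and_select(candidates_per_position):
    """Try plain selection first; widen candidates via CONFUSION only on failure."""
    found = select_words(candidates_per_position)
    if found:
        return found
    widened = []
    for options in candidates_per_position:
        alternatives = [alt for s in options for alt in CONFUSION.get(s, [])]
        # dict.fromkeys removes duplicates while preserving order.
        widened.append(list(dict.fromkeys(options + alternatives)))
    return select_words(widened)


print(select_words([["학", "한"], ["교", "국"]]))
print(correct_and_select([["항"], ["굳"]]))
```

In this toy setting the di-gram test is implied by the viable-prefix test; it is kept as a separate, cheaper check only to show where each heuristic would sit. Ranking the surviving words and confirming them against adjacent words (functions 4 and 5 in the abstract) are not covered here.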

Original language: English
Title of host publication: Proceedings of the 3rd International Conference on Document Analysis and Recognition, ICDAR 1995
Publisher: IEEE Computer Society
Pages: 557-562
Number of pages: 6
ISBN (Electronic): 0818671289
DOIs: https://doi.org/10.1109/ICDAR.1995.601958
Publication status: Published - 1995 Jan 1
Event: 3rd International Conference on Document Analysis and Recognition, ICDAR 1995 - Montreal, Canada
Duration: 1995 Aug 14 to 1995 Aug 16

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2
ISSN (Print): 1520-5363

Conference

Conference: 3rd International Conference on Document Analysis and Recognition, ICDAR 1995
Country: Canada
City: Montreal
Period: 95/8/14 to 95/8/16

Fingerprint

Optical character recognition
Linguistics

Keywords

  • confusion matrix
  • distance evaluation function
  • heuristics
  • postprocessing
  • syllable di-grams
  • viable-prefixes

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Kwon, H. C., Hwang, H. J., Kim, M. J., & Lee, S. W. (1995). Contextual postprocessing of a Korean OCR system by linguistic constraints. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, ICDAR 1995 (pp. 557-562). [601958] (Proceedings of the International Conference on Document Analysis and Recognition, ICDAR; Vol. 2). IEEE Computer Society. https://doi.org/10.1109/ICDAR.1995.601958

@inproceedings{d378a67074524a06a4a2688ae4789c78,
title = "Contextual postprocessing of a Korean OCR system by linguistic constraints",
abstract = "This paper presents a contextual postprocessing approach that selects the most feasible word from the multiple output strings of an OCR system. Correction is applied only when selection fails. The selected word is confirmed by its collocation with the adjacent words. The system applies five functions: (1) selecting a word from candidate words, (2) correcting candidate words using a confusion matrix of syllables, (3) combining two substrings into a word that spans two lines, (4) guessing unknown nouns, and (5) confirming a selected word using the contextual information of adjacent words. To improve speed, we use syllable di-grams and viable-prefixes of Korean words. The experimental results show that these two heuristics speed up the system by more than 1,000 times in the worst case. Our system improves the word recognition rate of the OCR system from 90.50{\%} to 94.72{\%}.",
keywords = "confusion matrix, distance evaluation function, heuristics, postprocessing, syllable di-grams, viable-prefixes",
author = "Kwon, {Hyuk Chul} and Hwang, {Ho Jeong} and Kim, {Min Jung} and Lee, {Seong Whan}",
year = "1995",
month = "1",
day = "1",
doi = "10.1109/ICDAR.1995.601958",
language = "English",
series = "Proceedings of the International Conference on Document Analysis and Recognition, ICDAR",
publisher = "IEEE Computer Society",
pages = "557--562",
booktitle = "Proceedings of the 3rd International Conference on Document Analysis and Recognition, ICDAR 1995",

}

TY - GEN

T1 - Contextual postprocessing of a Korean OCR system by linguistic constraints

AU - Kwon, Hyuk Chul

AU - Hwang, Ho Jeong

AU - Kim, Min Jung

AU - Lee, Seong Whan

PY - 1995/1/1

Y1 - 1995/1/1

N2 - This paper presents a contextual postprocessing approach that selects the most feasible word from the multiple output strings of an OCR system. Correction is applied only when selection fails. The selected word is confirmed by its collocation with the adjacent words. The system applies five functions: (1) selecting a word from candidate words, (2) correcting candidate words using a confusion matrix of syllables, (3) combining two substrings into a word that spans two lines, (4) guessing unknown nouns, and (5) confirming a selected word using the contextual information of adjacent words. To improve speed, we use syllable di-grams and viable-prefixes of Korean words. The experimental results show that these two heuristics speed up the system by more than 1,000 times in the worst case. Our system improves the word recognition rate of the OCR system from 90.50% to 94.72%.

KW - confusion matrix

KW - distance evaluation function

KW - heuristics

KW - postprocessing

KW - syllable di-grams

KW - viable-prefixes

UR - http://www.scopus.com/inward/record.url?scp=77955007505&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955007505&partnerID=8YFLogxK

U2 - 10.1109/ICDAR.1995.601958

DO - 10.1109/ICDAR.1995.601958

M3 - Conference contribution

AN - SCOPUS:77955007505

T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR

SP - 557

EP - 562

BT - Proceedings of the 3rd International Conference on Document Analysis and Recognition, ICDAR 1995

PB - IEEE Computer Society

ER -