A robust proposal generation method for text lines in natural scene images

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Motivated by the success of object proposal generation methods for object detection, we propose a novel method for generating text line proposals from natural scene images. Our strategy is to detect text regions which we define as part of text lines containing a whole character or transitions between two adjacent characters. We observe that, if we scale text regions to a small and fixed size, their image gradients exhibit certain patterns irrespective of text shapes and language types. Based on this observation, we propose simple features which consist of means and standard deviations of image gradients to train a Random Forest so as to detect text regions over multiple image scales and color channels. Text regions are then merged into text line candidates which are ranked based on the Random Forest responses combined with the shapes of the candidates, e.g., horizontally elongated candidates are given higher scores, because they are more likely to contain texts. Even though our method is trained on English, our experiments demonstrate that it achieves high recall with a few thousand good quality proposals on four standard benchmarks, including multi-language datasets. Following the One-to-One and Many-to-One detection criteria, our method achieves 91.6%, 87.4%, 92.1% and 97.9% recall on the ICDAR 2013 Robust Reading Dataset, Street View Text Dataset, Pan's multilingual Dataset and Sampled KAIST Scene Text Dataset respectively, with an average of less than 1250 proposals.
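The abstract's core feature idea — rescale a candidate region to a small fixed size, then summarize its image gradients with per-cell means and standard deviations before classifying with a Random Forest — can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the 24×24 patch size, the 4×4 cell grid, and the use of scikit-learn's `RandomForestClassifier` on synthetic patches are all assumptions for demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def gradient_features(patch, size=24, grid=4):
    """Per-cell means and standard deviations of gradient magnitude.

    Assumes `patch` has already been rescaled to `size` x `size`
    (the paper rescales text regions to a small fixed size).
    """
    gy, gx = np.gradient(patch.astype(float))   # image gradients
    mag = np.hypot(gx, gy)                      # gradient magnitude
    step = size // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = mag[i * step:(i + 1) * step, j * step:(j + 1) * step]
            feats.extend([cell.mean(), cell.std()])
    return np.array(feats)                      # grid*grid*2 features

# Toy training data: random patches standing in for text / non-text
# regions (synthetic labels, for shape and API illustration only).
rng = np.random.default_rng(0)
X = np.stack([gradient_features(rng.random((24, 24))) for _ in range(40)])
y = np.array([0, 1] * 20)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict(X[:2]).shape)  # (2,)
```

In the paper's pipeline these features would be computed over multiple image scales and color channels, with the forest's responses later reused to score merged text line candidates.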

Original language: English
Pages (from-to): 47-63
Number of pages: 17
Journal: Neurocomputing
Volume: 304
DOIs: 10.1016/j.neucom.2018.03.041
Publication status: Published - 2018 Aug 23

Keywords

  • Feature extraction
  • Random Forest
  • Scene text detection
  • Text line proposals

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

Cite this

A robust proposal generation method for text lines in natural scene images. / Fan, Kun; Baek, Seung Jun.

In: Neurocomputing, Vol. 304, 23.08.2018, p. 47-63.

Research output: Contribution to journal › Article

@article{6ce66b896aab4f24aabbcffe66b313e3,
title = "A robust proposal generation method for text lines in natural scene images",
abstract = "Motivated by the success of object proposal generation methods for object detection, we propose a novel method for generating text line proposals from natural scene images. Our strategy is to detect text regions which we define as part of text lines containing a whole character or transitions between two adjacent characters. We observe that, if we scale text regions to a small and fixed size, their image gradients exhibit certain patterns irrespective of text shapes and language types. Based on this observation, we propose simple features which consist of means and standard deviations of image gradients to train a Random Forest so as to detect text regions over multiple image scales and color channels. Text regions are then merged into text line candidates which are ranked based on the Random Forest responses combined with the shapes of the candidates, e.g., horizontally elongated candidates are given higher scores, because they are more likely to contain texts. Even though our method is trained on English, our experiments demonstrate that it achieves high recall with a few thousand good quality proposals on four standard benchmarks, including multi-language datasets. Following the One-to-One and Many-to-One detection criteria, our method achieves 91.6{\%}, 87.4{\%}, 92.1{\%} and 97.9{\%} recall on the ICDAR 2013 Robust Reading Dataset, Street View Text Dataset, Pan's multilingual Dataset and Sampled KAIST Scene Text Dataset respectively, with an average of less than 1250 proposals.",
keywords = "Feature extraction, Random Forest, Scene text detection, Text line proposals",
author = "Fan, Kun and Baek, {Seung Jun}",
year = "2018",
month = "8",
day = "23",
doi = "10.1016/j.neucom.2018.03.041",
language = "English",
volume = "304",
pages = "47--63",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",

}

TY - JOUR

T1 - A robust proposal generation method for text lines in natural scene images

AU - Fan, Kun

AU - Baek, Seung Jun

PY - 2018/8/23

Y1 - 2018/8/23

N2 - Motivated by the success of object proposal generation methods for object detection, we propose a novel method for generating text line proposals from natural scene images. Our strategy is to detect text regions which we define as part of text lines containing a whole character or transitions between two adjacent characters. We observe that, if we scale text regions to a small and fixed size, their image gradients exhibit certain patterns irrespective of text shapes and language types. Based on this observation, we propose simple features which consist of means and standard deviations of image gradients to train a Random Forest so as to detect text regions over multiple image scales and color channels. Text regions are then merged into text line candidates which are ranked based on the Random Forest responses combined with the shapes of the candidates, e.g., horizontally elongated candidates are given higher scores, because they are more likely to contain texts. Even though our method is trained on English, our experiments demonstrate that it achieves high recall with a few thousand good quality proposals on four standard benchmarks, including multi-language datasets. Following the One-to-One and Many-to-One detection criteria, our method achieves 91.6%, 87.4%, 92.1% and 97.9% recall on the ICDAR 2013 Robust Reading Dataset, Street View Text Dataset, Pan's multilingual Dataset and Sampled KAIST Scene Text Dataset respectively, with an average of less than 1250 proposals.

AB - Motivated by the success of object proposal generation methods for object detection, we propose a novel method for generating text line proposals from natural scene images. Our strategy is to detect text regions which we define as part of text lines containing a whole character or transitions between two adjacent characters. We observe that, if we scale text regions to a small and fixed size, their image gradients exhibit certain patterns irrespective of text shapes and language types. Based on this observation, we propose simple features which consist of means and standard deviations of image gradients to train a Random Forest so as to detect text regions over multiple image scales and color channels. Text regions are then merged into text line candidates which are ranked based on the Random Forest responses combined with the shapes of the candidates, e.g., horizontally elongated candidates are given higher scores, because they are more likely to contain texts. Even though our method is trained on English, our experiments demonstrate that it achieves high recall with a few thousand good quality proposals on four standard benchmarks, including multi-language datasets. Following the One-to-One and Many-to-One detection criteria, our method achieves 91.6%, 87.4%, 92.1% and 97.9% recall on the ICDAR 2013 Robust Reading Dataset, Street View Text Dataset, Pan's multilingual Dataset and Sampled KAIST Scene Text Dataset respectively, with an average of less than 1250 proposals.

KW - Feature extraction

KW - Random Forest

KW - Scene text detection

KW - Text line proposals

UR - http://www.scopus.com/inward/record.url?scp=85046823519&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046823519&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2018.03.041

DO - 10.1016/j.neucom.2018.03.041

M3 - Article

AN - SCOPUS:85046823519

VL - 304

SP - 47

EP - 63

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

ER -