A robust proposal generation method for text lines in natural scene images

Kun Fan, Seung Jun Baek

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)


Motivated by the success of object proposal generation methods in object detection, we propose a novel method for generating text line proposals from natural scene images. Our strategy is to detect text regions, which we define as parts of text lines containing a whole character or the transition between two adjacent characters. We observe that, when text regions are scaled to a small, fixed size, their image gradients exhibit certain patterns irrespective of text shape and language. Based on this observation, we propose simple features, consisting of the means and standard deviations of image gradients, to train a Random Forest that detects text regions over multiple image scales and color channels. Text regions are then merged into text line candidates, which are ranked by the Random Forest responses combined with the shapes of the candidates; for example, horizontally elongated candidates are given higher scores because they are more likely to contain text. Even though our method is trained on English, our experiments demonstrate that it achieves high recall with a few thousand good-quality proposals on four standard benchmarks, including multi-language datasets. Following the One-to-One and Many-to-One detection criteria, our method achieves 91.6%, 87.4%, 92.1% and 97.9% recall on the ICDAR 2013 Robust Reading Dataset, the Street View Text Dataset, Pan's multilingual Dataset and the Sampled KAIST Scene Text Dataset, respectively, with fewer than 1250 proposals on average.
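As a rough illustration of the feature idea in the abstract, the sketch below rescales a grayscale patch to a small fixed size, computes horizontal and vertical gradients, and collects the mean and standard deviation of each gradient map over a grid of cells. It is not the authors' implementation: the 16x16 patch size, the 4x4 cell grid, the clamped-border differences, and the `candidate_score` blending rule with its `shape_weight` parameter are all assumptions made for this example, and the Random Forest training step is omitted.

```python
from statistics import fmean, pstdev

SIZE = 16   # assumed side length of the rescaled region (not from the paper)
CELLS = 4   # assumed number of cells per side (not from the paper)

def resize_nearest(patch, size=SIZE):
    """Nearest-neighbour rescale of a 2-D grayscale patch (list of rows)."""
    h, w = len(patch), len(patch[0])
    return [[float(patch[r * h // size][c * w // size]) for c in range(size)]
            for r in range(size)]

def gradients(img):
    """Horizontal and vertical differences of a square image, clamped at borders."""
    n = len(img)
    gx = [[img[r][min(c + 1, n - 1)] - img[r][max(c - 1, 0)] for c in range(n)]
          for r in range(n)]
    gy = [[img[min(r + 1, n - 1)][c] - img[max(r - 1, 0)][c] for c in range(n)]
          for r in range(n)]
    return gx, gy

def gradient_features(patch, size=SIZE, cells=CELLS):
    """Mean and standard deviation of each gradient map over a grid of cells."""
    gx, gy = gradients(resize_nearest(patch, size))
    step = size // cells
    feats = []
    for g in (gx, gy):
        for r0 in range(0, size, step):
            for c0 in range(0, size, step):
                cell = [g[r][c] for r in range(r0, r0 + step)
                        for c in range(c0, c0 + step)]
                feats += [fmean(cell), pstdev(cell)]
    return feats

def candidate_score(rf_responses, width, height, shape_weight=0.5):
    """Hypothetical ranking rule: blend the mean classifier response
    with a bonus for horizontally elongated candidates."""
    aspect = width / height
    shape_term = aspect / (1.0 + aspect)  # in (0, 1), larger for wider boxes
    return (1.0 - shape_weight) * fmean(rf_responses) + shape_weight * shape_term
```

With these assumed settings each patch yields 64 values (2 gradient maps, 16 cells, 2 statistics per cell), which could be fed to any off-the-shelf Random Forest classifier. Because the patch is always rescaled to the same size, the feature length does not depend on the size of the original region.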

Original language: English
Pages (from-to): 47-63
Number of pages: 17
Publication status: Published - 2018 Aug 23


Keywords

  • Feature extraction
  • Random Forest
  • Scene text detection
  • Text line proposals

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

