TY - JOUR
T1 - Automatic generation of structured hyperdocuments from document images
AU - Lee, Ji Yeon
AU - Park, Jeong Seon
AU - Byun, Hyeran
AU - Moon, Jongsub
AU - Lee, Seong Whan
N1 - Funding Information:
This research was supported by Creative Research Initiatives of the Korean Ministry of Science and Technology. A preliminary version of this paper was presented at the 15th International Conference on Pattern Recognition, Barcelona, September 2000.
Funding Information:
From February 1989 to February 1995, he was an Assistant Professor in the Department of Computer Science at Chungbuk National University, Cheongju, Korea. In March 1995, he joined the faculty of the Department of Computer Science and Engineering at Korea University, Seoul, Korea, as an Associate Professor, and now he is a Full Professor. Currently, Dr. Lee is the director of National Creative Research Initiative Center for Artificial Vision Research (CAVR) supported by the Korean Ministry of Science and Technology.
Copyright:
Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2002/2
Y1 - 2002/2
N2 - As sharing documents through the World Wide Web has been recently and constantly increasing, the need for creating hyperdocuments to make them accessible and retrievable via the internet, in formats such as HTML and SGML/XML, has also been rapidly rising. Nevertheless, only a few works have been done on the conversion of paper documents into hyperdocuments. Moreover, most of these studies have concentrated on the direct conversion of single-column document images that include only text and image objects. In this paper, we propose two methods for converting complex multi-column document images into HTML documents, and a method for generating a structured table of contents page based on the logical structure analysis of the document image. Experiments with various kinds of multi-column document images show that, by using the proposed methods, their corresponding HTML documents can be generated in the same visual layout as that of the document images, and their structured table of contents page can be also produced with the hierarchically ordered section titles hyperlinked to the contents.
KW - Document conversion
KW - Document image understanding
KW - Logical structure analysis
KW - Multi-column document
KW - Structured hyperdocument
UR - http://www.scopus.com/inward/record.url?scp=0036467005&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036467005&partnerID=8YFLogxK
U2 - 10.1016/S0031-3203(01)00026-7
DO - 10.1016/S0031-3203(01)00026-7
M3 - Article
AN - SCOPUS:0036467005
VL - 35
SP - 485
EP - 503
JO - Pattern Recognition
JF - Pattern Recognition
SN - 0031-3203
IS - 2
ER -