Utilizing the Web for automatic word spacing

Gumwon Hong, Jeong Hoon Lee, Young In Song, Do Gil Lee, Hae-Chang Rim

Research output: Contribution to journal › Article

Abstract

This paper presents a new approach to the word spacing problem that mines reliable words from the Web and uses them as additional resources. Conventional approaches to automatic word spacing train the parameters of word spacing models on noise-free data. However, the insufficiency and irrelevancy of training examples are always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, this paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, together with a model that uses these words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that it achieves better performance than conventional word spacing approaches.
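The abstract does not specify how "reliable" words are selected or how the spacing model consumes them, so the following is only a minimal Python sketch of the general idea (vocabulary expansion to ease data sparseness), not the authors' method. It assumes, hypothetically, that a word counts as reliable when web writers spaced it off as a standalone token at least `min_count` times, and it swaps in a plain unigram dynamic-programming segmenter as the spacing model. The names `mine_reliable_words` and `insert_spaces`, the unknown-character penalty, and the toy data are all illustrative; Latin letters stand in for Hangul for readability.

```python
import math
from collections import Counter


def mine_reliable_words(web_texts, min_count=2):
    """Hypothetical reliability filter (not the paper's criterion): keep
    tokens that web authors wrote as standalone, space-delimited words
    at least `min_count` times."""
    counts = Counter(tok for text in web_texts for tok in text.split())
    return Counter({w: c for w, c in counts.items() if c >= min_count})


def insert_spaces(text, counts, max_len=10, unk_penalty=-20.0):
    """Insert spaces into unspaced `text` by dynamic programming over a
    unigram model built from `counts`. Chunks missing from the vocabulary
    are only allowed as single unknown characters with a fixed penalty
    (an assumption of this sketch)."""
    total = sum(counts.values())
    # best[i] = (best score of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in counts:
                score = math.log(counts[word] / total)
            elif end - start == 1:
                score = unk_penalty
            else:
                continue
            cand = best[start][0] + score
            if cand > best[end][0]:
                best[end] = (cand, start)
    # Walk back through the stored split points to recover the words.
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return " ".join(reversed(words))


if __name__ == "__main__":
    # Toy "training" vocabulary that lacks the domain word "web".
    base = Counter({"the": 50, "model": 10, "spacing": 5, "word": 5})
    # Toy, already-spaced web text from which extra words are mined.
    web_texts = ["the web is big", "search the web", "web pages"]
    mined = mine_reliable_words(web_texts, min_count=2)
    print(insert_spaces("thewebmodel", base))          # the w e b model
    print(insert_spaces("thewebmodel", base + mined))  # the web model
```

Without the mined entry, "web" falls apart into unknown single characters; once it is merged into the vocabulary, the segmenter recovers it. This only mirrors the abstract's argument about data sparseness and says nothing about the paper's actual model or reported gains.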

Original language: English
Pages (from-to): 2553-2556
Number of pages: 4
Journal: IEICE Transactions on Information and Systems
Volume: E92-D
Issue number: 12
DOIs: 10.1587/transinf.E92.D.2553
Publication status: Published - 2009 Dec 1

Keywords

  • Word segmentation
  • Word spacing

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Software
  • Artificial Intelligence
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition

Cite this

Utilizing the Web for automatic word spacing. / Hong, Gumwon; Lee, Jeong Hoon; Song, Young In; Lee, Do Gil; Rim, Hae-Chang.

In: IEICE Transactions on Information and Systems, Vol. E92-D, No. 12, 01.12.2009, p. 2553-2556.

Hong, Gumwon ; Lee, Jeong Hoon ; Song, Young In ; Lee, Do Gil ; Rim, Hae-Chang. / Utilizing the Web for automatic word spacing. In: IEICE Transactions on Information and Systems. 2009 ; Vol. E92-D, No. 12. pp. 2553-2556.
@article{72c2b625d04b4c53af9caae8110a7589,
title = "Utilizing the {Web} for automatic word spacing",
abstract = "This paper presents a new approach to the word spacing problem that mines reliable words from the Web and uses them as additional resources. Conventional approaches to automatic word spacing train the parameters of word spacing models on noise-free data. However, the insufficiency and irrelevancy of training examples are always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, this paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, together with a model that uses these words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that it achieves better performance than conventional word spacing approaches.",
keywords = "Word segmentation, Word spacing",
author = "Gumwon Hong and Lee, {Jeong Hoon} and Song, {Young In} and Lee, {Do Gil} and Hae-Chang Rim",
year = "2009",
month = "12",
day = "1",
doi = "10.1587/transinf.E92.D.2553",
language = "English",
volume = "E92-D",
pages = "2553--2556",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "12",

}

TY - JOUR

T1 - Utilizing the Web for automatic word spacing

AU - Hong, Gumwon

AU - Lee, Jeong Hoon

AU - Song, Young In

AU - Lee, Do Gil

AU - Rim, Hae-Chang

PY - 2009/12/1

Y1 - 2009/12/1

N2 - This paper presents a new approach to the word spacing problem that mines reliable words from the Web and uses them as additional resources. Conventional approaches to automatic word spacing train the parameters of word spacing models on noise-free data. However, the insufficiency and irrelevancy of training examples are always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, this paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, together with a model that uses these words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that it achieves better performance than conventional word spacing approaches.

AB - This paper presents a new approach to the word spacing problem that mines reliable words from the Web and uses them as additional resources. Conventional approaches to automatic word spacing train the parameters of word spacing models on noise-free data. However, the insufficiency and irrelevancy of training examples are always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, this paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, together with a model that uses these words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that it achieves better performance than conventional word spacing approaches.

KW - Word segmentation

KW - Word spacing

UR - http://www.scopus.com/inward/record.url?scp=77950243287&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77950243287&partnerID=8YFLogxK

U2 - 10.1587/transinf.E92.D.2553

DO - 10.1587/transinf.E92.D.2553

M3 - Article

AN - SCOPUS:77950243287

VL - E92-D

SP - 2553

EP - 2556

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 12

ER -