Automatic word spacing using probabilistic models based on character n-grams

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

Probabilistic models based on hidden Markov models (HMMs) for automatic word spacing are discussed. The models use character n-grams, where a character n-gram is a subsequence of n characters in a given character sequence. Automatic word spacing is a preprocessing technique used for correcting boundaries between words in a sentence containing spacing errors. These models can be effectively applied to a natural language with a small character set, such as English, using character n-grams that are larger than trigrams. These models, which are language independent and can be effectively used for languages having word spacing, can also be used for word segmentation in languages without explicit word spacing. By generalizing HMMs, these models can consider a broad context and estimate accurate probabilities.
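As an illustration of the two notions the abstract relies on, the following minimal Python sketch extracts character n-grams and recasts word spacing as a per-character labeling problem. The function names (`char_ngrams`, `spacing_tags`, `apply_tags`) and the 0/1 tag encoding are assumptions made for this example; they are not taken from the article, and the sketch does not implement the article's probabilistic models.

```python
from typing import List, Tuple


def char_ngrams(text: str, n: int) -> List[str]:
    """Return every contiguous subsequence of n characters in text."""
    if n < 1:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def spacing_tags(sentence: str) -> List[Tuple[str, int]]:
    """Pair each non-space character with 1 if a space follows it, else 0.

    Assumed encoding for illustration: a spacing model would predict
    these tags for an unspaced or wrongly spaced character sequence.
    """
    pairs = []
    for i, ch in enumerate(sentence):
        if ch == " ":
            continue
        followed_by_space = i + 1 < len(sentence) and sentence[i + 1] == " "
        pairs.append((ch, 1 if followed_by_space else 0))
    return pairs


def apply_tags(pairs: List[Tuple[str, int]]) -> str:
    """Rebuild a spaced sentence from (character, tag) pairs."""
    out = []
    for ch, tag in pairs:
        out.append(ch)
        if tag:
            out.append(" ")
    return "".join(out).rstrip()


if __name__ == "__main__":
    print(char_ngrams("spacing", 3))
    print(spacing_tags("a bc"))
    print(apply_tags(spacing_tags("word spacing")))
```

Under this encoding, a tagger that scores each 0/1 decision from the surrounding character n-grams would correct spacing errors by predicting the tags and then calling `apply_tags`.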

Original language: English
Pages (from-to): 28-35
Number of pages: 8
Journal: IEEE Intelligent Systems
Volume: 22
Issue number: 1
DOI: 10.1109/MIS.2007.4
Publication status: Published - 2007 Jan 1

Fingerprint

Character sets
Hidden Markov models
Statistical Models

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

Automatic word spacing using probabilistic models based on character n-grams. / Lee, Do Gil; Rim, Hae-Chang; Yook, Dongsuk.

In: IEEE Intelligent Systems, Vol. 22, No. 1, 01.01.2007, p. 28-35.

@article{4fad91b649cf4b1392a12d2820c7be81,
title = "Automatic word spacing using probabilistic models based on character n-grams",
abstract = "Probabilistic models based on hidden Markov models (HMMs) for automatic word spacing are discussed. The models use character n-grams, where a character n-gram is a subsequence of n characters in a given character sequence. Automatic word spacing is a preprocessing technique used for correcting boundaries between words in a sentence containing spacing errors. These models can be effectively applied to a natural language with a small character set, such as English, using character n-grams that are larger than trigrams. These models, which are language independent and can be effectively used for languages having word spacing, can also be used for word segmentation in languages without explicit word spacing. By generalizing HMMs, these models can consider a broad context and estimate accurate probabilities.",
author = "Lee, {Do Gil} and Hae-Chang Rim and Dongsuk Yook",
year = "2007",
month = "1",
day = "1",
doi = "10.1109/MIS.2007.4",
language = "English",
volume = "22",
pages = "28--35",
journal = "IEEE Intelligent Systems",
issn = "1541-1672",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

TY - JOUR

T1 - Automatic word spacing using probabilistic models based on character n-grams

AU - Lee, Do Gil

AU - Rim, Hae-Chang

AU - Yook, Dongsuk

PY - 2007/1/1

Y1 - 2007/1/1

N2 - Probabilistic models based on hidden Markov models (HMMs) for automatic word spacing are discussed. The models use character n-grams, where a character n-gram is a subsequence of n characters in a given character sequence. Automatic word spacing is a preprocessing technique used for correcting boundaries between words in a sentence containing spacing errors. These models can be effectively applied to a natural language with a small character set, such as English, using character n-grams that are larger than trigrams. These models, which are language independent and can be effectively used for languages having word spacing, can also be used for word segmentation in languages without explicit word spacing. By generalizing HMMs, these models can consider a broad context and estimate accurate probabilities.

AB - Probabilistic models based on hidden Markov models (HMMs) for automatic word spacing are discussed. The models use character n-grams, where a character n-gram is a subsequence of n characters in a given character sequence. Automatic word spacing is a preprocessing technique used for correcting boundaries between words in a sentence containing spacing errors. These models can be effectively applied to a natural language with a small character set, such as English, using character n-grams that are larger than trigrams. These models, which are language independent and can be effectively used for languages having word spacing, can also be used for word segmentation in languages without explicit word spacing. By generalizing HMMs, these models can consider a broad context and estimate accurate probabilities.

UR - http://www.scopus.com/inward/record.url?scp=33847611998&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847611998&partnerID=8YFLogxK

U2 - 10.1109/MIS.2007.4

DO - 10.1109/MIS.2007.4

M3 - Article

AN - SCOPUS:33847611998

VL - 22

SP - 28

EP - 35

JO - IEEE Intelligent Systems

JF - IEEE Intelligent Systems

SN - 1541-1672

IS - 1

ER -