We propose a machine learning approach to language-independent sentence boundary detection. The method requires neither heuristic rules nor language-specific features such as part-of-speech (POS) information, lists of abbreviations, or lists of proper names. Using only language-independent features, we run experiments on both an inflectional language and an agglutinative language with fairly different characteristics (English and Korean, respectively, in this paper). We obtain good performance in both languages.
|Number of pages|4|
|Journal|Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Publication status|Published - 2004 Dec 1|
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science(all)