Vowel-Oriented String Search Algorithm with Vowel-Oriented Binary Tree

Kwang Sik Chung, Heonchang Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As cloud-stored text documents grow in size, the time and cost of string search and keyword search increase. However, when searching for words or sentences in documents, most string search algorithms do not take into account the lexical structure used in the real world or the compositional characteristics of characters. In particular, previous string search algorithms have not considered that well-formatted official documents (articles, news, novels, academic papers, patents, etc.) use a limited set of characters and character compositions. In this paper, we propose a vowel-oriented binary tree that considers the probability of occurrence of each character in real-world documents and its compositional characteristics in well-formatted documents and well-formatted words. Based on this tree, we propose a vowel-centered string search algorithm that searches for a specific word in a document. The frequency and pattern of occurrence of vowels and consonants were analyzed using several dictionaries (Free Dictionary Project Dictionary, Scrabble Helper). We propose a strategy and an algorithm for constructing a vowel-oriented binary tree that can express the frequency and probability patterns of vowel occurrence. The tree is reconstructed according to the characteristics of vowel occurrence, and the consonants that lie between vowels are distinguished and represented. In addition, based on the vowel-oriented binary tree, we propose an enhanced vowel-oriented string search algorithm that quickly searches for words that can occur in real-world documents.
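The abstract does not give the tree construction itself, so the following is only a minimal sketch of the two underlying ideas it mentions: measuring how often each vowel occurs in a word list, and using a word's vowel sequence as a cheap key to compare before the full string. The names `vowel_frequencies`, `vowel_pattern`, and `find_word` are hypothetical, the vowel set is assumed to be a, e, i, o, u, and none of this is the authors' vowel-oriented binary tree.

```python
from collections import Counter

# Assumption: only a, e, i, o, u are treated as vowels ('y' is not).
VOWELS = set("aeiou")


def vowel_frequencies(words):
    """Return the relative frequency of each vowel across a list of words."""
    counts = Counter(ch for w in words for ch in w.lower() if ch in VOWELS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {v: counts[v] / total for v in sorted(counts)}


def vowel_pattern(word):
    """Return the vowels of a word in order, with consonants dropped."""
    return "".join(ch for ch in word.lower() if ch in VOWELS)


def find_word(text, target):
    """Return indices of whitespace-separated tokens equal to target.

    The vowel pattern is compared first; the full comparison runs only
    when the patterns match. This is illustrative, not an optimization
    claim about the paper's algorithm.
    """
    target = target.lower()
    pattern = vowel_pattern(target)
    hits = []
    for i, token in enumerate(text.lower().split()):
        if vowel_pattern(token) == pattern and token == target:
            hits.append(i)
    return hits
```

For example, `find_word("the vowel tree is a binary tree", "tree")` returns the token positions of both occurrences of "tree", while tokens such as "binary" are rejected on their vowel pattern alone.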

Original language: English
Title of host publication: Advances in Information and Communication - Proceedings of the 2022 Future of Information and Communication Conference, FICC
Editors: Kohei Arai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 1-11
Number of pages: 11
ISBN (Print): 9783030980115
Publication status: Published - 2022
Event: Future of Information and Communication Conference, FICC 2022 - Virtual, Online
Duration: 2022 Mar 3 → 2022 Mar 4

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 438 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: Future of Information and Communication Conference, FICC 2022
City: Virtual, Online
Period: 22/3/3 → 22/3/4

Keywords

  • Occurrence frequency of vowels
  • Repetition pattern of vowels
  • String search
  • Vowel-based string search
  • Vowel-oriented binary tree
  • Vowel-oriented string search algorithm

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications
