TY - GEN
T1 - Vowel-Oriented String Search Algorithm with Vowel-Oriented Binary Tree
AU - Chung, Kwang Sik
AU - Yu, Heonchang
N1 - Funding Information: This work was supported by the 2021 Korea National Open University Research Fund.
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - As text documents held in cloud storage grow in size, the time and cost of string search and keyword search increase. However, when searching for words or sentences in documents, most string search algorithms take into account neither the lexical structure of real-world text nor the compositional characteristics of its characters. In particular, previous string search algorithms have not considered that well-formatted official documents (articles, news, novels, academic papers, patents, etc.) use a limited number of characters and compositions. In this paper, we propose a vowel-oriented binary tree that considers the probability of occurrence of a character in real-world documents and its compositional characteristics in well-formatted documents and words. Based on the vowel-oriented binary tree, we propose a vowel-centered string search algorithm that searches for a specific word in a document. The frequency and pattern of occurrence of vowels and consonants were analyzed using several dictionaries (Free Dictionary Project Dictionary, Scrabble Helper). We propose a strategy and an algorithm for constructing a vowel-oriented binary tree that can express the frequency and probability patterns of vowel occurrence. The tree is reconstructed according to the characteristics of vowel occurrence, and the consonants between vowels are distinguished and represented. In addition, based on the vowel-oriented binary tree, we propose an enhanced vowel-oriented string search algorithm that quickly searches for words that can occur in real-world documents.
KW - Occurrence frequency of vowels
KW - Repetition pattern of vowels
KW - String search
KW - Vowel-based string search
KW - Vowel-oriented binary tree
KW - Vowel-oriented string search algorithm
UR - http://www.scopus.com/inward/record.url?scp=85127095887&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-98012-2_1
DO - 10.1007/978-3-030-98012-2_1
M3 - Conference contribution
AN - SCOPUS:85127095887
SN - 9783030980115
T3 - Lecture Notes in Networks and Systems
SP - 1
EP - 11
BT - Advances in Information and Communication - Proceedings of the 2022 Future of Information and Communication Conference, FICC
A2 - Arai, Kohei
PB - Springer Science and Business Media Deutschland GmbH
T2 - Future of Information and Communication Conference, FICC 2022
Y2 - 3 March 2022 through 4 March 2022
ER -