TY - JOUR
T1 - Network-based document clustering using external ranking loss for network embedding
AU - Yoon, Yeo Chan
AU - Gee, Hyung Kuen
AU - Lim, Heuiseok
N1 - Funding Information:
This work was supported by the Institute for Information and Communications Technology Promotion grant funded by the Korean Government (Digital Content In-House Research and Development) under Grant 2016-0-00010-003.
PY - 2019
Y1 - 2019
AB - Network-based document clustering forms clusters of documents based on their significance and the strength of their relationships. This approach can use various types of metadata that express the significance of the documents and the relationships among them. In this study, we defined a probabilistic network graph for fine-grained document clustering and developed a probabilistic generative model and a calculation method. Furthermore, we devised a novel neural-network-based network embedding method that accounts for the significance of each document, as determined by its ranking under external measures such as the download counts of relevant files, and reflects the relationship strength between documents. By accounting for document significance, reputable documents can be positioned at the center of each cluster and presented as representative documents for tasks such as data analysis and data representation. In evaluation tests, the proposed ranking-based network embedding method performed significantly better than existing network embedding approaches across various clustering algorithms, such as k-means and common word/phrase-based clustering methods.
KW - Clustering algorithms
KW - artificial neural networks
UR - http://www.scopus.com/inward/record.url?scp=85077812031&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077812031&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2019.2948662
DO - 10.1109/ACCESS.2019.2948662
M3 - Article
AN - SCOPUS:85077812031
VL - 7
SP - 155412
EP - 155423
JO - IEEE Access
JF - IEEE Access
SN - 2169-3536
M1 - 8878093
ER -