TY - JOUR
T1 - Doubly supervised embedding based on class labels and intrinsic clusters for high-dimensional data visualization
AU - Kim, Hannah
AU - Choo, Jaegul
AU - Reddy, Chandan K.
AU - Park, Haesun
N1 - Funding Information:
Chandan K. Reddy is an Associate Professor in the Department of Computer Science at Wayne State University. He received his Ph.D. from Cornell University and M.S. from Michigan State University. His primary research interests are data mining and machine learning with applications to healthcare, bioinformatics and social networks. His research is funded by NSF, NIH, DOT, and the Susan Komen for the Cure Foundation. He received the Best Application Paper Award at the SIGKDD conference in 2010, and was a finalist of the INFORMS Franz Edelman Award Competition in 2011. He is a senior member of IEEE and a member of ACM.
Funding Information:
This work was supported in part by DARPA XDATA grant FA8750-12-2-0309 and NSF grants CCF-0808863, IIS-1242304, and IIS-1231742. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies. We also thank anonymous reviewers for their insightful comments and suggestions.
Publisher Copyright:
© 2014 Elsevier B.V.
PY - 2015
Y1 - 2015
AB - Visualization of data can assist decision-making processes by presenting the underlying information in a perceptible manner. Many dimension reduction techniques have been proposed to generate faithful visualization snapshots given high-dimensional data. When class labels associated with the data are already provided, supervised dimension reduction methods, which utilize such pre-given label information as well as the data, have been effective in revealing the overall structure of data with respect to their pre-given class labels. However, the main principle of most of these supervised methods has been to enhance class separability, which generally leads to significant distortion of original relationships. To compensate for such distortion, we propose a novel doubly supervised dimension reduction approach that highlights both natural groupings conforming to original relationships and classes determined by pre-given labels. Our method imposes minimal supervision on the pre-given class information depending on their original distributions while imposing additional supervision on natural groupings to better preserve them in reduced feature space. Specifically, we apply the notion of doubly supervised dimension reduction to a state-of-the-art method called t-distributed stochastic neighbor embedding and present a new formulation and an algorithm. By performing both quantitative and qualitative analyses, we demonstrate the effectiveness of our method using various visualization examples on real-world data. Our results show that, compared to other existing methods, the proposed method better preserves the original high-dimensional relationships while simultaneously maintaining class separability and preserving cluster structures. In addition, due to the characteristics of preserving natural groupings, the visualization results generated by our method reveal interesting sub-groups that cohesively preserve the original relationships in the data.
KW - Clustering
KW - Multidimensional projection
KW - Scatter plot
KW - Supervised dimension reduction
KW - t-distributed stochastic neighbor embedding
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=84922598421&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2014.09.064
DO - 10.1016/j.neucom.2014.09.064
M3 - Article
AN - SCOPUS:84922598421
SN - 0925-2312
VL - 150
SP - 570
EP - 582
JO - Neurocomputing
JF - Neurocomputing
IS - PB
ER -