As the information society has developed, accumulating and exploiting diverse data has become increasingly important. In particular, there is a growing need to analyze high-dimensional data that humans cannot easily interpret. Such data is difficult to analyze directly, whereas extracting and visualizing the important information contained in its many dimensions helps reveal overall trends in the data and the relationships between data points. Against this background, methods have been studied that map high-dimensional data into a low-dimensional space and extract important information in a form that humans can interpret. Among these, stochastic neighbor embedding methods, which account for nonlinear relationships, are widely used. Examples include t-SNE, which produces visualizations based on the local similarity between data points, and vMF-SNE, which expresses global data similarity through angles in the low-dimensional space. Incorporating the similarity of labels between data points into these dimensionality reduction methods makes it possible to perform classification in the reduced space and to examine the characteristics of neighboring data. Supervised t-SNE has been proposed as such an extension of t-SNE; however, because its embedding relies on local information, taking label information into account tends to gather only data with the same label as neighbors. In this study, we propose an embedding method that extends vMF-SNE to the case where category labels are given. By embedding based on angles that reflect both global data similarity and label information, the method makes it possible to evaluate the similarity between data with different labels. Finally, we visualize high-dimensional data and evaluate the model by comparing its visualization and classification results with those of conventional methods.