Host: The Japanese Society for Artificial Intelligence
Name: The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 39
Location: [in Japanese]
Date: May 27, 2025 - May 30, 2025
Purchase history data collected through consumer input is a valuable source for analyzing purchasing behavior across retail stores. However, because customers enter the data themselves, text fields such as product names often contain notational variants, for example abbreviations and inconsistent use of long vowel marks, which introduce noise into the analysis. Existing approaches to name matching rely on edit distance or on embeddings. Conventional edit distance, however, does not account for characteristics of the Japanese language, and traditional embeddings are difficult to apply to short brand names. Large language models are another option, but confidentiality and cost concerns can make them impractical. We propose char2vec, which learns character-level embeddings, and define a new edit distance that uses these embeddings, enabling name matching even for short text. We demonstrate the effectiveness of the method on real data and show that the matched data broadens the range of possible analyses.
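The abstract does not give the definition of the proposed distance, so the following is only a minimal sketch of one way an edit distance could use character-level embeddings: a Levenshtein-style dynamic program in which the substitution cost is the cosine distance between the two characters' vectors. The function names, the fixed insertion and deletion cost of 1, and the toy vectors in `TOY_EMBEDDINGS` are all assumptions made for illustration; they are not the authors' char2vec model or their distance.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def substitution_cost(a, b, embeddings):
    """Cost of replacing character a with b, clipped to [0, 1].

    Assumption: characters without an embedding fall back to the
    classic cost of 1, so the result matches plain Levenshtein there.
    """
    if a == b:
        return 0.0
    if a not in embeddings or b not in embeddings:
        return 1.0
    return min(1.0, max(0.0, cosine_distance(embeddings[a], embeddings[b])))


def embedding_edit_distance(s, t, embeddings):
    """Levenshtein-style distance with embedding-based substitution cost."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [float(i)]
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1.0,       # delete a from s
                curr[j - 1] + 1.0,   # insert b into s
                prev[j - 1] + substitution_cost(a, b, embeddings),
            ))
        prev = curr
    return prev[-1]


# Hypothetical 2-dimensional vectors, hand-written for this example only.
# A trained character embedding model would supply these in practice.
TOY_EMBEDDINGS = {
    "ア": [1.0, 0.0],
    "ァ": [0.9, 0.1],
    "フ": [0.0, 1.0],
}

if __name__ == "__main__":
    # Small-kana variant of the same name: cheap substitution.
    print(embedding_edit_distance("ファ", "フア", TOY_EMBEDDINGS))
    # No embeddings available: behaves like plain Levenshtein.
    print(embedding_edit_distance("kitten", "sitting", {}))
```

Under these assumptions, two spellings that differ only in characters with similar embeddings end up much closer than under plain edit distance, which is the property that would let short brand names be matched. Insertions and deletions, such as a dropped long vowel mark, are still charged the full cost here; weighting them would be a separate design choice.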