Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
39th (2025)
Session ID : 4Q1-GS-10-02

A Method for Brand Name Variant Normalization by Integrating Character-level Embeddings and Edit Distance
*Masataka SUZUKI, Yuto OKUDA, Ayako YAMAGIWA, Masayuki GOTO
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Purchase history data entered by consumers is a valuable source for analyzing purchasing behavior across retail stores. However, because customers enter the data themselves, text fields such as product names often contain notational variants, such as abbreviations and long vowel marks, which introduce noise into the analysis. Existing approaches to name matching rely on edit distance or on embeddings. However, conventional edit distance does not account for characteristics of the Japanese language, and conventional embeddings are difficult to apply to short brand names. Large language models are another option, but they may be impractical because of confidentiality and cost concerns. This research proposes char2vec, a method for obtaining character-level embeddings, and defines a new edit distance that uses these embeddings, enabling name matching even for short text data. We demonstrate the effectiveness of the method by applying it to real data and show that the matched data broadens the range of possible analyses.
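
The abstract does not give the exact definition of the proposed distance, so the following is only a minimal sketch of one plausible reading: a Levenshtein-style edit distance in which the substitution cost between two characters is the cosine distance between their character embeddings. The function names, the `indel_cost` parameter, and the fallback cost of 1.0 for characters without an embedding are all assumptions made for illustration; they are not taken from the paper, and the embeddings themselves (which the paper obtains with char2vec) are simply passed in as a dictionary.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def substitution_cost(a, b, emb):
    """Hypothetical substitution cost in [0, 1] based on character embeddings."""
    if a == b:
        return 0.0
    if a not in emb or b not in emb:
        # Assumption: fall back to the plain Levenshtein cost for unknown characters.
        return 1.0
    return min(1.0, max(0.0, cosine_distance(emb[a], emb[b])))


def embedding_edit_distance(s, t, emb, indel_cost=1.0):
    """Edit distance where substituting similar characters is cheaper.

    `emb` maps a character to its embedding vector (a list of floats).
    With an empty `emb` this reduces to the ordinary Levenshtein distance.
    """
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [i * indel_cost]
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + indel_cost,                      # delete a
                curr[j - 1] + indel_cost,                  # insert b
                prev[j - 1] + substitution_cost(a, b, emb),  # substitute a -> b
            ))
        prev = curr
    return prev[-1]
```

Under this sketch, two brand-name strings that differ only in characters with similar embeddings receive a small distance, while strings that differ in unrelated characters are penalized as in the ordinary edit distance. How the paper handles insertions and deletions (for example, of long vowel marks) is not stated in the abstract; the uniform `indel_cost` here is a placeholder.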

© 2025 The Japanese Society for Artificial Intelligence