Journal of Japan Industrial Management Association
Online ISSN : 2187-9079
Print ISSN : 1342-2618
ISSN-L : 1342-2618
Original Paper (Theory and Methodology)
Topic Model for Automatically Structuring Product Hierarchies and Its Application to Retail Sales Data Analysis
Toko FUJITA, Tatsuya ISHII, Tokimasa ISOMURA, Akiko YONEDA, Ryotaro SHIMIZU, Masayuki GOTO
JOURNAL FREE ACCESS

2025 Volume 76 Issue 2 Pages 37-53

Abstract

Analyzing consumer trends with product-level sales data is valuable not only for retailers but also for manufacturers and data collection agencies. A key component of such analysis is product category information. Manufacturers usually maintain their own product classifications, and retailers maintain product master data for the products they handle. However, long-term sales volume data collected across various retailers, such as convenience stores, supermarkets, and drug stores, typically records individual product IDs and sales volumes but classifies products only into broad categories. In addition, when data is collected over long periods, frequent product launches, discontinuations, and relaunches cause many similar products to be recorded under different product IDs. As a result, analyzing sales trends and demand at the level of individual product IDs often fails to yield meaningful insights. To address this issue, this study aims to enable analysis at an appropriate level of granularity by automatically assigning hierarchical categories based on product names, descriptions, and other textual information. One promising approach for estimating such hierarchical categories (or topics) is hierarchical Latent Dirichlet Allocation (hLDA). However, hLDA is optimized for longer texts, which makes it difficult to infer topics accurately from the short texts that are the focus of this study. In response, this study proposes the hierarchical Biterm Topic Model (hBTM), which enables robust estimation of hierarchical topic structures from short text data. The proposed method automatically generates new hierarchical categories for each product, allowing flexible analysis tailored to various objectives.
Finally, the study demonstrates the effectiveness and utility of the proposed approach through evaluation experiments on real-world data and presents several examples of analyses that leverage the generated categories.
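
As a rough illustration of why a biterm-based model suits short texts such as product names, the sketch below extracts biterms (unordered pairs of words that co-occur in the same short text) and counts them across a small corpus. It shows only the general biterm idea from the standard Biterm Topic Model, not the hBTM procedure of this paper, whose details are not given in the abstract. The function name `extract_biterms` and the tokenized product names are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    # Sort each pair so that ("tea", "green") and ("green", "tea")
    # are treated as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical tokenized product names (assumed, not taken from the paper's data).
products = [
    ["green", "tea", "500ml"],
    ["green", "tea", "2l"],
    ["black", "tea", "500ml"],
]

# Pool biterms over the whole corpus: co-occurrence statistics are
# aggregated across all texts instead of per document, which is what
# lets a biterm model cope with very short individual texts.
biterm_counts = Counter()
for tokens in products:
    biterm_counts.update(extract_biterms(tokens))

for biterm, count in biterm_counts.most_common(3):
    print(biterm, count)
```

A topic model would then be fitted to the pooled biterms rather than to per-document word counts; that inference step, and any hierarchical structure over topics, is deliberately left out of this sketch.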

© 2025 Japan Industrial Management Association