A Deep Learning Method for Tabular Data with Differing Column Structures, Based on Fused Features of Column Names and Values

山本 晋太郎; 安藤 純平; 渡邉 航; 小野 利幸

doi:10.11517/pjsai.JSAI2024.0_4Xin233

Abstract

Tabular data analysis is a crucial technique in various fields, including manufacturing and social infrastructure. In real-world scenarios, the columns of tabular data may differ between samples because of factors such as different data collection sources or the addition of new data items. Most methods for tabular data analysis assume that all samples share identical columns. Consequently, a data analyst must choose between keeping only the columns that are available in all samples and keeping only the samples that contain the same columns. To handle tabular data with differing columns, a method called TransTab has been proposed. However, TransTab overlooks the relationship between column names and categorical values, which makes it difficult to handle samples that have the same categorical values under different column names. To mitigate this issue, we propose a novel approach that fuses features from column names and values. Our method improves AUROC by at least 16.1 points over TransTab.
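
The trade-off described in the abstract, between keeping only shared columns and keeping only samples with matching columns, can be illustrated with a minimal Python sketch. The sample records, column names, and helper functions below are hypothetical and chosen only for illustration; they are not taken from the paper and do not represent the proposed method or TransTab.

```python
# Hypothetical samples gathered from two sources with different columns.
samples = [
    {"age": 34, "job": "engineer", "income": 52000},
    {"age": 41, "job": "teacher", "income": 48000},
    {"age": 29, "occupation": "engineer"},
]


def shared_columns(rows):
    """Option 1: keep only the columns present in every sample."""
    common = set(rows[0])
    for row in rows[1:]:
        common &= set(row)
    return [{k: row[k] for k in sorted(common)} for row in rows]


def largest_same_schema_group(rows):
    """Option 2: keep only the largest group of samples sharing one column set."""
    groups = {}
    for row in rows:
        groups.setdefault(frozenset(row), []).append(row)
    return max(groups.values(), key=len)


by_columns = shared_columns(samples)             # every sample, but only "age"
by_samples = largest_same_schema_group(samples)  # all columns, but only two samples
```

In this toy data, the first option discards the `job`, `occupation`, and `income` columns, while the second discards the third sample entirely. The third sample also carries the same categorical value (`"engineer"`) under a different column name, which is the situation the abstract identifies as difficult.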
