Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
 
Active Synthetic Data Generation with Joint Consideration of Differential Privacy and Labeling Efficiency
Osamu Saisho, Takayuki Miura, Kazuki Iwahana, Masanobu Kii, Rina Okada
JOURNAL FREE ACCESS

2025 Volume 33 Pages 1172-1180

Abstract

The principles of Privacy by Design have become increasingly critical in artificial intelligence (AI) systems that handle sensitive data, owing to growing legal and social restrictions. To construct novel AI applications, human-data interaction is essential, with human annotation typically serving as the first step in the AI lifecycle. However, privacy-preserving data annotation remains a challenging task. Privacy-preserving synthetic data is a promising approach, but because of differential privacy (DP) noise, it often faces a trade-off between strict privacy guarantees and high AI performance. To address these challenges, this paper proposes and demonstrates an active privacy-preserving synthetic data generation framework that integrates active learning into the data synthesis process during the annotation phase. Unlike conventional approaches, our method mitigates the negative effects of DP noise on classification boundaries by generating synthetic data solely from explanatory variables while obtaining target labels through human annotation. Specifically, our method iteratively generates privacy-preserving synthetic data by using an active learning acquisition function. Furthermore, our framework includes an interaction mechanism that enables annotators to flag out-of-distribution samples, thus preventing erroneous synthetic data from contaminating the training dataset. Experimental results show that our method enhances annotation efficiency and overcomes the trade-off between privacy preservation and AI performance.
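
The annotation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the acquisition function, so predictive entropy is assumed here as one common choice, and the DP synthesizer itself is out of scope (the candidates are assumed to be feature-only samples already produced by some privacy-preserving generator). The names `select_for_annotation`, `annotate`, `toy_predict_proba`, and `toy_annotator` are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(
    candidates: Sequence[float],
    predict_proba: Callable[[float], Sequence[float]],
    budget: int,
) -> List[float]:
    """Rank synthetic candidates by predictive entropy and keep the top `budget`.

    Assumption: entropy stands in for the unspecified acquisition function.
    """
    ranked = sorted(candidates, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return ranked[:budget]


def annotate(
    samples: Sequence[float],
    annotator: Callable[[float], Optional[int]],
) -> List[Tuple[float, int]]:
    """Collect human labels; a return value of None means the annotator
    flagged the sample as out-of-distribution, so it is dropped."""
    labeled = []
    for x in samples:
        y = annotator(x)
        if y is not None:
            labeled.append((x, y))
    return labeled


def toy_predict_proba(x: float) -> List[float]:
    """Hypothetical current classifier: a logistic model on a 1-D feature."""
    s = 1.0 / (1.0 + math.exp(-x))
    return [1.0 - s, s]


def toy_annotator(x: float) -> Optional[int]:
    """Hypothetical annotator: flags implausible samples, otherwise labels by sign."""
    if abs(x) > 1.5:
        return None  # flagged as out-of-distribution
    return int(x > 0)


if __name__ == "__main__":
    synthetic = [-6.0, -2.0, -0.1, 0.2, 3.0]  # stand-in for DP synthetic features
    picked = select_for_annotation(synthetic, toy_predict_proba, budget=3)
    print(annotate(picked, toy_annotator))
```

In an iterative setting, the labeled pairs would be used to retrain the classifier before the next round of candidates is scored; that loop is omitted here because the abstract gives no details of it.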

© 2025 by the Information Processing Society of Japan