2024, Vol. 32, pp. 757-766
Synthetic data generation techniques are promising for anonymizing high-dimensional tabular datasets, and their privacy protection can be evaluated by membership inference attacks. However, the existing evaluation framework has two limitations: (1) it cannot evaluate the worst case because the target sample is chosen randomly; and (2) the decision criterion of the adversary's inference is a black box, since the adversary conducts membership inference using machine learning models. In this paper, we propose a framework that overcomes both limitations in a simple and transparent fashion. To address limitation (1), we introduce a statistical distance for choosing a vulnerable target sample. To address limitation (2), we propose two interpretable inference methods: one based on typical statistics scores, and the other based on the Euclidean distance from the target sample. We conduct extensive experiments on two datasets and five synthesis algorithms to confirm the effectiveness of our framework. The experiments show that our framework enables a tighter evaluation of the privacy of synthetic data generation techniques.
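As a minimal illustration of the distance-based ideas sketched in the abstract, the following hypothetical Python example selects a vulnerable target record as the one farthest, on average, from the rest of the original data, and then infers membership by thresholding the minimum Euclidean distance between the target and the synthetic records. This is not the authors' implementation: the use of mean pairwise distance as the "statistical distance", the nearest-neighbor decision rule, and the threshold value are all assumptions made for illustration.

```python
# Hypothetical sketch of distance-based target selection and membership inference.
# Assumptions (not from the paper): the "statistical distance" is approximated by a
# record's mean Euclidean distance to all other records, and the adversary's decision
# rule is a simple threshold on the minimum distance to the synthetic data.
import numpy as np

def choose_vulnerable_target(real_data: np.ndarray) -> int:
    """Pick the record that is farthest, on average, from the rest of the dataset."""
    dists = np.linalg.norm(real_data[:, None, :] - real_data[None, :, :], axis=-1)
    mean_dist = dists.sum(axis=1) / (len(real_data) - 1)  # exclude the zero self-distance
    return int(np.argmax(mean_dist))

def infer_membership(target: np.ndarray, synthetic_data: np.ndarray,
                     threshold: float) -> bool:
    """Claim 'member' if some synthetic record lies within `threshold` of the target."""
    min_dist = np.min(np.linalg.norm(synthetic_data - target, axis=1))
    return min_dist <= threshold

# Toy usage: 100 real records with 5 features, a synthetic release of 200 records.
rng = np.random.default_rng(0)
real = rng.normal(size=(100, 5))
synthetic = rng.normal(size=(200, 5))
target_idx = choose_vulnerable_target(real)
print(infer_membership(real[target_idx], synthetic, threshold=1.0))
```

Because both steps reduce to explicit distance computations, the adversary's decision criterion remains fully interpretable, in contrast to inference based on trained machine learning models.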