Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
De-identification for Transaction Data Secure against Re-identification Risk Based on Payment Records
Satoshi Ito, Reo Harada, Hiroaki Kikuchi

2020 Volume 28 Pages 511-519

Abstract

De-identification is the process of preventing a person's identity from being revealed through personal data about that individual, including personally identifiable information. In conventional de-identification studies, re-identification is the process of identifying individuals from static data, in which each individual is represented by exactly one record. In contrast, this paper considers dynamic data, such as trajectory data and online payment records. In particular, we use the open competition data from the 2016 Privacy Workshop Cup (PWS Cup 2016), held in Japan, which consist of purchase history data. Through our analysis, we find that attackers can re-identify individuals with high accuracy from de-identified purchase history data by using features of the set of purchased goods. To address this re-identification risk, we propose a new method that de-identifies history data by adding dummy records under certain restrictions. Our method uses the Jaccard coefficient and TF-IDF to form user clusters. We evaluate the proposed method experimentally and compare its performance with that of the PWS Cup 2016 participants. Even in the best de-identified data in PWS Cup 2016, 22.25% of customers were re-identified by our re-identification algorithm based on the Jaccard coefficient. In contrast, in the data de-identified by our method, only about 12% of customers are re-identified by a random re-identification method, and about 17% by the re-identification method based on the Jaccard coefficient.
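The Jaccard-coefficient attack mentioned above can be illustrated with a minimal Python sketch. It assumes that both the original and the de-identified data are lists of hypothetical (customer_id, item) pairs, and that the attacker guesses, for each pseudonym, the original customer whose item set has the highest Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B|. The function names, data layout, and tie-breaking rule are illustrative assumptions rather than the authors' implementation, and the sketch does not cover the TF-IDF clustering or the dummy-record insertion of the proposed defense.

```python
from collections import defaultdict


def item_sets(records):
    """Group (customer_id, item) pairs into one set of items per customer."""
    sets = defaultdict(set)
    for customer_id, item in records:
        sets[customer_id].add(item)
    return dict(sets)


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reidentify(original, deidentified):
    """Map each pseudonym to the original customer with the most similar item set.

    Ties go to the first customer encountered in `original`
    (an illustrative choice, not taken from the paper).
    """
    orig_sets = item_sets(original)
    anon_sets = item_sets(deidentified)
    return {
        pseudonym: max(orig_sets, key=lambda cid: jaccard(orig_sets[cid], items))
        for pseudonym, items in anon_sets.items()
    }


def reid_rate(guesses, truth):
    """Fraction of pseudonyms whose guessed customer matches the true customer."""
    correct = sum(guesses.get(p) == c for p, c in truth.items())
    return correct / len(truth)


if __name__ == "__main__":
    # Hypothetical toy data: original purchases, and a pseudonymized copy
    # in which one dummy item ("dummy") has been added for p1.
    original = [("c1", "milk"), ("c1", "bread"),
                ("c2", "beer"), ("c2", "chips"),
                ("c3", "milk"), ("c3", "eggs")]
    deidentified = [("p1", "beer"), ("p1", "chips"), ("p1", "dummy"),
                    ("p2", "milk"), ("p2", "bread"),
                    ("p3", "eggs"), ("p3", "milk")]
    truth = {"p1": "c2", "p2": "c1", "p3": "c3"}  # hypothetical ground truth
    guesses = reidentify(original, deidentified)
    print(guesses)
    print(reid_rate(guesses, truth))
```

On this toy data all three pseudonyms are re-identified (rate 1.0): a single dummy item does not change which original customer is nearest to p1.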

© 2020 by the Information Processing Society of Japan