2020 Volume 28 Pages 511-519
De-identification is a process that prevents a person's identity from being revealed from personal data about that individual, including personal identification information. Conventional de-identification studies treat re-identification as the process of identifying individuals from static data in which there is one record for each individual. In contrast, in this paper we consider dynamic data, such as trajectory data and online payment records. In particular, we use the open competition data from the 2016 Privacy Workshop Cup (PWS Cup 2016) held in Japan, which consists of purchasing history data. Through our analysis, we find that attackers can re-identify individuals with a high degree of accuracy from their de-identified purchase history data based on a feature of the set of goods. To address this re-identification risk, we propose a new method that de-identifies history data by adding dummy records under certain restrictions. Our method uses the Jaccard coefficient and TF-IDF to form user clusters. As an experiment in data privacy, we evaluate the performance of the proposed method and compare it with that of the PWS Cup 2016 participants. Even in the best de-identified data in PWS Cup 2016, 22.25% of customers were re-identified by our re-identification algorithm based on the Jaccard coefficient. In contrast, in the data de-identified by our method, only about 12% of customers are re-identified by a random re-identification method and about 17% by the re-identification method based on the Jaccard coefficient.
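To make the idea of Jaccard-based re-identification concrete, the following is a minimal Python sketch. It is not the authors' algorithm: the function names `jaccard` and `reidentify`, the dictionary layout, and the toy purchase sets are all assumptions made for illustration. It assumes the attacker knows each customer's original set of goods and matches every pseudonym to the customer whose set is most similar.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections of goods."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def reidentify(original, anonymized):
    """Guess, for each pseudonym, the original customer with the most similar goods set.

    Both arguments are hypothetical dicts mapping an ID to a set of goods.
    Ties are broken by the sorted order of the original customer IDs.
    """
    guesses = {}
    for pseudonym, goods in anonymized.items():
        guesses[pseudonym] = max(
            sorted(original),
            key=lambda cid: jaccard(original[cid], goods),
        )
    return guesses


# Toy data (invented for illustration, not taken from PWS Cup 2016).
original = {
    "alice": {"tea", "mug", "kettle"},
    "bob": {"pen", "ink", "paper"},
}
anonymized = {
    "u1": {"pen", "paper", "stapler"},
    "u2": {"tea", "kettle", "spoon"},
}
guesses = reidentify(original, anonymized)
print(guesses)
```

In this toy example each pseudonymized set still shares two of its three goods with its true owner, so the match succeeds even though one item in each set was changed. Adding dummy records is one way to reduce that overlap.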