2015 Volume E98.D Issue 1 Pages 78-88
While online social networking is a popular way for people to share information, it carries the risk of unintentionally disclosing personal information. One way to reduce this risk is to anonymize personal information in messages before they are posted. Furthermore, if personal information is somehow disclosed, the person who disclosed it should be identifiable. Several methods developed for anonymizing personal information in natural language text simply remove sensitive phrases, making the anonymized text message unnatural. Other methods change the message by using synonymization or structural alteration to create fingerprints for detecting disclosure, but they do not support the creation of a sufficient number of fingerprints for friends of an online social network user. We have developed a system for anonymizing personal information in text messages that generalizes sensitive phrases. It also creates a sufficient number of fingerprints of a message by using synonyms so that, if personal information is revealed online, the person who revealed it can be identified. A distribution metric is used to ensure that the degree of anonymization is appropriate for each group of friends. A threshold is used to improve the naturalness of the fingerprinted messages so that they do not catch the attention of attackers. Evaluation using about 55,000 personal tweets in English demonstrated that our system creates sufficiently natural fingerprinted messages for friends and groups of friends. The practicality of the system was demonstrated by creating a web application for controlling messages posted on Facebook.