2018 年 E101.D 巻 11 号 p. 2584-2599
Permission warnings and privacy policy enforcement are widely used to inform mobile app users of privacy threats. These mechanisms disclose information about use of privacy-sensitive resources such as user location or contact list. However, it has been reported that very few users pay attention to these mechanisms during installation. Instead, a user may focus on a more user-friendly source of information: text description, which is written by a developer who has an incentive to attract user attention. When a user searches for an app in a marketplace, his/her query keywords are generally searched on text descriptions of mobile apps. Then, users review the search results, often by reading the text descriptions; i.e., text descriptions are associated with user expectation. Given these observations, this paper aims to address the following research question: What are the primary reasons that text descriptions of mobile apps fail to refer to the use of privacy-sensitive resources? To answer the research question, we performed empirical large-scale study using a huge volume of apps with our ACODE (Analyzing COde and DEscription) framework, which combines static code analysis and text analysis. We developed light-weight techniques so that we can handle hundred of thousands of distinct text descriptions. We note that our text analysis technique does not require manually labeled descriptions; hence, it enables us to conduct a large-scale measurement study without requiring expensive labeling tasks. Our analysis of 210,000 apps, including free and paid, and multilingual text descriptions collected from official and third-party Android marketplaces revealed four primary factors that are associated with the inconsistencies between text descriptions and the use of privacy-sensitive resources: (1) existence of app building services/frameworks that tend to add API permissions/code unnecessarily, (2) existence of prolific developers who publish many applications that unnecessarily install permissions and code, (3) existence of secondary functions that tend to be unmentioned, and (4) existence of third-party libraries that access to the privacy-sensitive resources. We believe that these findings will be useful for improving users' awareness of privacy on mobile software distribution platforms.