Flexible Imputation Method for Sensor Data based on Programming by Example: APREP-S

Hiroko Nagashima; Yuka Kato

doi:10.2197/ipsjjip.29.157

Abstract

The quantity of data available for analysis, including data collected by sensors and wearable devices, has been increasing rapidly. However, to obtain accurate analysis results, data pre-processing, such as outlier detection, handling of missing data, and reconciling data recorded by different measuring instruments in different units, is essential. Considering that pre-processing consumes 80% of analysts' resources, we previously proposed a method to address this problem. That method integrates machine learning based on Bayesian inference with human knowledge by using a programming-by-example approach. However, when the model is generated and updated at different sites, the previous method has two problems: 1) all sites must use the same features that were defined when the model was generated, and 2) there is no process for generating new training data from features, without using inference data, when the model is updated. We therefore propose APREP-S, which offers flexible feature processing and a process for updating the model using a clustering method. We evaluate imputation accuracy and trend similarity by comparing the output of APREP-S with the original data and with other existing methods. The results show that APREP-S returns the most suitable imputation methods with respect to both accuracy and similarity.
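
The evaluation idea mentioned in the abstract (scoring imputed values against the original data on both accuracy and trend similarity) can be illustrated with a small sketch. This is not APREP-S and not the authors' evaluation code. It assumes two simple stand-in imputation methods (mean fill and linear interpolation), RMSE as the accuracy measure, and Pearson correlation as the similarity measure. All function names and the toy temperature series are hypothetical.

```python
import math

def mean_fill(series):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

def linear_fill(series):
    """Replace each None by linear interpolation between the nearest
    observed neighbours; gaps at the edges take the nearest observed value."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled

def rmse(truth, estimate, idx):
    """Accuracy: root-mean-square error over the imputed positions only."""
    return math.sqrt(sum((truth[i] - estimate[i]) ** 2 for i in idx) / len(idx))

def pearson(a, b):
    """Trend similarity: Pearson correlation over the whole series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical sensor readings; positions 2 and 4 are masked as "missing".
truth = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0]
missing_idx = [2, 4]
observed = [None if i in missing_idx else v for i, v in enumerate(truth)]

# Score every candidate method on (accuracy, similarity).
candidates = {"mean": mean_fill, "linear": linear_fill}
scores = {}
for name, method in candidates.items():
    estimate = method(observed)
    scores[name] = (rmse(truth, estimate, missing_idx), pearson(truth, estimate))

# Pick the candidate with the lowest error (an assumed selection rule).
best = min(scores, key=lambda name: scores[name][0])
```

On this toy series the readings rise linearly, so linear interpolation recovers the masked values and is selected. Real sensor data would likely call for more candidate methods and for a selection rule that weighs both measures.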
