A Reward Optimization Model for Decision-making under Budget Constraint

Chen Zhao; Bin Yang; Yu Hirate

doi:10.2197/ipsjjip.27.190

Abstract

This paper designs a novel predictive model that learns stochastic functions from a limited set of data samples. Interpolation algorithms are common in supervised learning, where they approximate a function by constructing models that generalize to unseen data. However, parametric models such as regression and linear SVMs are limited to functions expressed as predefined algebraic forms and are thus unsuitable for arbitrary functions that cannot be described by a finite number of parameters. While properly trained neural networks can approximate arbitrary functions, the amount of training data they require can be prohibitively large in practical scenarios such as online recommendation. The proposed model addresses both problems with a semi-parametric graphical model that approximates function outputs from limited data samples through Bayesian optimization. An online algorithm is also presented to show how model inference is used to locate the global optima of an unknown function, which is the primary objective when making optimal decisions. Comparative experiments are conducted across a set of sampling policies to demonstrate how click-through rates can be improved by a recommendation strategy optimized with the proposed model. Empirical evaluation suggests that an adapted version of Thompson sampling is the most suitable policy for the proposed algorithm.
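As an illustration of the kind of sampling policy compared in the experiments, the following is a minimal sketch of standard Beta-Bernoulli Thompson sampling for click-through rates under a fixed budget of impressions. It is not the adapted version proposed in the paper, and it does not use the semi-parametric graphical model. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the click-through rates in the usage lines are all assumptions made for this example.

```python
import random


def thompson_sampling(true_ctrs, budget, seed=0):
    """Standard Beta-Bernoulli Thompson sampling (illustrative sketch only).

    true_ctrs: hypothetical click-through rates, used only to simulate clicks.
    budget:    total number of impressions that may be spent.
    Returns (pulls, clicks): per-item impression counts and click counts.
    """
    rng = random.Random(seed)
    n_items = len(true_ctrs)
    pulls = [0] * n_items
    clicks = [0] * n_items

    for _ in range(budget):
        # Draw one plausible CTR per item from its Beta posterior,
        # starting from an assumed uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(1 + clicks[i], 1 + pulls[i] - clicks[i])
            for i in range(n_items)
        ]
        # Show the item whose sampled CTR is highest.
        item = max(range(n_items), key=samples.__getitem__)

        # Simulate the user's response as a Bernoulli trial.
        clicked = 1 if rng.random() < true_ctrs[item] else 0
        pulls[item] += 1
        clicks[item] += clicked

    return pulls, clicks


# Hypothetical usage: three items, 3000 impressions in total.
pulls, clicks = thompson_sampling([0.05, 0.10, 0.30], budget=3000)
print("impressions per item:", pulls)
print("clicks per item:", clicks)
```

Because each item is chosen with the probability that its sampled rate is the largest, impressions tend to concentrate on the items that appear best while uncertain items are still explored occasionally.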
