Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
A Reward Optimization Model for Decision-making under Budget Constraint
Chen Zhao, Bin Yang, Yu Hirate

2019 Volume 27 Pages 190-200


This paper designs a novel predictive model that learns stochastic functions from a limited set of data samples. Interpolation algorithms are common in supervised learning, where they approximate functions by constructing models that generalize to unseen data. However, parametric models such as regression and linear SVMs are limited to functions expressed as predefined algebraic forms and are thus unsuitable for arbitrary functions that cannot be described by a finite number of parameters. While properly trained neural networks are capable of approximating arbitrary functions, the amount of training data they require can be prohibitively large in some practical scenarios such as online recommendation. The proposed model addresses both problems with a semi-parametric graphical model that approximates function outputs from limited data samples through Bayesian optimization. An online algorithm is also presented to show how model inference is used to locate the global optima of an unknown function, which is the primary objective when making optimal decisions. Comparative experiments are conducted across a set of sampling policies to demonstrate how click-through rates can be improved by an optimized recommendation strategy based on the proposed model. Empirical evaluation suggests that an adapted version of Thompson sampling is the most suitable policy for the proposed algorithm.
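
To make the role of a sampling policy concrete, the following is a minimal sketch of textbook Beta-Bernoulli Thompson sampling for choosing among a few recommendation options with unknown click-through rates. It is not the adapted version evaluated in the paper, and it does not use the paper's semi-parametric graphical model. The function name `thompson_sampling`, the simulated click-through rates, and the uniform Beta(1, 1) prior are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_ctrs, rounds, seed=0):
    """Simulate standard Beta-Bernoulli Thompson sampling.

    true_ctrs: hypothetical click-through rates, hidden from the policy
               and used only to simulate user clicks.
    Returns (shows, clicks): per-option display and click counts.
    """
    rng = random.Random(seed)
    n = len(true_ctrs)
    shows = [0] * n   # how many times each option was displayed
    clicks = [0] * n  # how many clicks each option received

    for _ in range(rounds):
        # Draw one plausible CTR per option from its Beta posterior,
        # starting from an assumed uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(1 + clicks[i], 1 + shows[i] - clicks[i])
            for i in range(n)
        ]
        # Display the option whose sampled CTR is highest.
        arm = max(range(n), key=samples.__getitem__)
        shows[arm] += 1
        # Simulate the user's response with the hidden true CTR.
        if rng.random() < true_ctrs[arm]:
            clicks[arm] += 1

    return shows, clicks


if __name__ == "__main__":
    shows, clicks = thompson_sampling([0.02, 0.05, 0.20], rounds=5000, seed=1)
    print("displays per option:", shows)
    print("clicks per option:  ", clicks)
```

Because each option is chosen with the probability that it is currently believed to be the best, displays concentrate on the highest-rate option as evidence accumulates, while weaker options are still tried occasionally.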

© 2019 by the Information Processing Society of Japan