Abstract
A whaler has a given number of harpoons available for catching whales and a finite number of periods remaining. The whales are of two kinds, type 1 and type 2: a type 1 whale must be hit by a given number of harpoons to be caught, whereas a type 2 whale needs to be hit by a single harpoon. In each period, each type appears with a known probability and yields its own reward. On meeting a whale, the whaler can expend some of his harpoons simultaneously to obtain the reward. The probability that a harpoon hits is known to the whaler. The objective is to find the sequence of numbers of harpoons to expend that maximizes the total expected reward. We investigate the structure of the optimal policy.
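
The problem described above can be read as a finite-horizon dynamic program over the pair (harpoons left, periods left). The following Python sketch is one possible formalization, not the formulation used in the paper. It assumes that harpoons hit independently with the same probability, that expended harpoons are lost whether or not they hit, that at most one whale appears per period, and that the whale needing several hits is caught when at least the required number of harpoons fired in one volley hit. All names (`solve`, `p1`, `p2`, `r1`, `r2`, `q`, `k`) are hypothetical.

```python
from functools import lru_cache
from math import comb


def solve(p1, p2, r1, r2, q, k):
    """Build value and decision functions for an assumed whaler model.

    p1, p2: per-period appearance probabilities of the two whale types
    r1, r2: rewards for catching each type
    q:      probability that a single harpoon hits (assumed independent)
    k:      number of hits needed for the first type (assumed "at least k")
    """

    def catch_prob(x, need):
        # Probability of at least `need` hits among x harpoons fired together.
        return sum(comb(x, j) * q**j * (1 - q) ** (x - j)
                   for j in range(need, x + 1))

    @lru_cache(maxsize=None)
    def value(n, t):
        # Maximal expected reward with n harpoons and t periods remaining.
        if n == 0 or t == 0:
            return 0.0
        best1 = max(r1 * catch_prob(x, k) + value(n - x, t - 1)
                    for x in range(n + 1))
        best2 = max(r2 * catch_prob(x, 1) + value(n - x, t - 1)
                    for x in range(n + 1))
        # With probability 1 - p1 - p2 no whale appears in this period.
        return p1 * best1 + p2 * best2 + (1 - p1 - p2) * value(n, t - 1)

    def best_shot(n, t, need, reward):
        # Smallest number of harpoons attaining the maximum in state (n, t).
        return max(range(n + 1),
                   key=lambda x: reward * catch_prob(x, need)
                   + value(n - x, t - 1))

    return value, best_shot


value, best_shot = solve(p1=0.5, p2=0.5, r1=10.0, r2=1.0, q=0.5, k=2)
print(value(2, 1), best_shot(2, 1, need=2, reward=10.0))
```

Tabulating `best_shot` over a grid of states is one way to explore numerically how the number of harpoons to expend varies with the remaining harpoons and periods under these assumptions.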