The H.264/AVC video coding standard switches between motion-compensated prediction and intra-frame prediction on a block-by-block basis to achieve high coding performance. However, it does not allow spatial and temporal prediction to be used jointly within the same block. This paper describes a block-adaptive spatio-temporal prediction method that exploits spatial and temporal correlations of video signals at the same time. In this method, the predicted value at each pel is generated by a linear 3D predictor that uses a causal neighborhood in both the current frame and the motion-compensated previous frame. When part of the causal neighborhood lies within the block being predicted, previously predicted values are used recursively in place of the reconstructed ones. To minimize the sum of squared prediction errors, a set of 3D predictors is iteratively optimized using the quasi-Newton method. Simulation results indicate that, within the framework of the proposed method, joint spatio-temporal prediction attains a higher SNR than exclusive use of either spatial or temporal prediction.
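As an illustration of the recursive linear 3D prediction described above, the following is a minimal Python sketch, not the authors' implementation. The neighborhood shapes (`SPATIAL_OFFSETS`, `TEMPORAL_OFFSETS`), the function name `predict_block`, and its parameters are assumptions made for this example; the abstract does not specify them. The quasi-Newton optimization of the coefficients is not shown, so the weights are simply passed in.

```python
# Hypothetical causal supports, given as (dy, dx) offsets. These shapes are
# assumptions for illustration; the abstract does not specify them.
SPATIAL_OFFSETS = [(0, -1), (-1, 0), (-1, -1)]  # left, above, above-left (current frame)
TEMPORAL_OFFSETS = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]  # motion-compensated frame


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def predict_block(recon, mc_prev, top, left, size, w_spatial, w_temporal):
    """Predict a size x size block with top-left pel (top, left).

    recon     : current frame as a list of rows; pels above and to the left
                of the block are assumed to be already reconstructed
    mc_prev   : motion-compensated previous frame, aligned with recon
    w_spatial : one weight per entry of SPATIAL_OFFSETS
    w_temporal: one weight per entry of TEMPORAL_OFFSETS
    """
    if top < 1 or left < 1:
        raise ValueError("this sketch assumes the block does not touch the top or left frame border")
    height, width = len(mc_prev), len(mc_prev[0])
    work = [row[:] for row in recon]  # working copy; block pels are overwritten below

    # Raster scan: every spatial neighbor inside the block has already been
    # replaced by its predicted value by the time it is read.
    for y in range(top, top + size):
        for x in range(left, left + size):
            acc = 0.0
            for (dy, dx), w in zip(SPATIAL_OFFSETS, w_spatial):
                acc += w * work[y + dy][x + dx]
            for (dy, dx), w in zip(TEMPORAL_OFFSETS, w_temporal):
                yy = clamp(y + dy, 0, height - 1)  # replicate frame edges
                xx = clamp(x + dx, 0, width - 1)
                acc += w * mc_prev[yy][xx]
            work[y][x] = acc  # predicted value, reused recursively

    return [row[left:left + size] for row in work[top:top + size]]
```

Setting all spatial weights to zero reduces this sketch to purely temporal prediction, and setting all temporal weights to zero reduces it to purely spatial prediction, which mirrors the comparison reported in the simulation results.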