Since the 1960s, aperture patterns have been studied extensively, and a variety of coded apertures have been proposed for applications including extended depth of field, defocus deblurring, depth from defocus, and light field acquisition. Research has shown that the optimal aperture pattern can differ considerably depending on the application, imaging conditions, and scene content. In addition, many coded aperture techniques require the aperture pattern to change over time during capture. As a result, it is often necessary to have a programmable aperture camera whose aperture pattern can be dynamically changed as needed in order to capture more useful information. In this paper, we propose a programmable aperture camera using a Liquid Crystal on Silicon (LCoS) device. This design affords a high-contrast, high-resolution aperture with relatively low light loss, and enables the pattern to be changed at a reasonably high frame rate. We build a prototype camera and comprehensively evaluate its strengths and drawbacks through experiments. We also demonstrate three coded aperture applications: defocus deblurring, depth from defocus, and light field acquisition.
The objective of scalable video coding (SVC) is to enable the generation of a single bitstream that can adapt to various bit-rates, transmission channels, and display capabilities. Scalability is categorised as temporal, spatial, and quality scalability. To improve coding efficiency, the SVC scheme incorporates inter-layer prediction mechanisms to complement the highly refined Motion Estimation (ME) and mode decision processes of H.264/AVC. However, this further increases the overall encoding complexity of the scalable coding standard. In this paper, several conditional probabilities are established relating motion estimation characteristics to the mode distribution at different layers of H.264/SVC. An evaluation of these probabilities is used to construct a low-complexity prediction algorithm for a Group of Pictures (GOP) in H.264/SVC, reducing computational complexity whilst maintaining similar rate-distortion (RD) performance. Compared to the JSVM reference software, the proposed algorithm achieves a significant reduction in encoding time, with negligible average PSNR loss and bit-rate increase for temporal, spatial, and SNR scalability. Experiments comparing our method with recently developed fast mode selection algorithms demonstrate that the proposed method achieves appreciable time savings for scalable spatial and scalable quality video coding, while maintaining similar PSNR and bit rate.
This work proposes a novel approach for the detection of free-form shapes in 3D space. The proposed method matches 3D features through their descriptors to attain correspondences, then accumulates evidence of the presence of the sought object(s) by verifying the consensus of correspondences within a 3D Hough space. Our approach is capable of recognizing 3D shapes under significant degrees of occlusion and clutter and can deal with multiple instances of the shape to be recognized. We validate our proposal through a quantitative experimental comparison to the state of the art over two datasets acquired with different sensors (a laser scanner and a stereo camera) and characterized by high degrees of clutter and occlusion. In addition, we propose an extension of the approach to RGB-D (i.e., color and depth) data, together with results on 3D object recognition from RGB-D data acquired by a Microsoft Kinect sensor.
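The consensus step above can be sketched as a simple voting scheme: each feature correspondence casts a vote for the object centroid in a quantized 3D Hough space, and cells collecting enough consistent votes indicate a detected instance. The sketch below is a minimal illustration, not the paper's implementation; the cell size, vote threshold, and precomputed centroid offsets are assumptions.

```python
import numpy as np

def hough_vote_3d(corrs, cell=0.05, min_votes=3):
    """Accumulate correspondence evidence in a quantized 3D Hough space.

    corrs: list of (scene_point, offset) pairs, where `offset` is the
    model centroid expressed relative to the matched feature, already
    rotated into the scene frame (a simplification of the local
    reference frames a real system would use).
    Returns the cell indices whose vote count reaches `min_votes`.
    """
    votes = {}
    for scene_pt, offset in corrs:
        centroid = np.asarray(scene_pt) + np.asarray(offset)
        key = tuple(np.floor(centroid / cell).astype(int))
        votes[key] = votes.get(key, 0) + 1
    return [k for k, v in votes.items() if v >= min_votes]

# Toy example: four correspondences agree on a centroid near (1, 1, 1);
# one outlier votes elsewhere and is rejected by the threshold.
corrs = [((1.0, 1.0, 1.0), (0.01, 0.0, 0.0)),
         ((0.9, 1.0, 1.0), (0.11, 0.0, 0.0)),
         ((1.0, 0.9, 1.0), (0.0, 0.11, 0.0)),
         ((1.0, 1.0, 0.9), (0.0, 0.0, 0.11)),
         ((5.0, 5.0, 5.0), (0.0, 0.0, 0.0))]
peaks = hough_vote_3d(corrs, cell=0.1, min_votes=3)
```

Because votes are accumulated per cell rather than per correspondence pair, the scheme tolerates outlier matches naturally, which is what gives the approach its robustness to clutter.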
We propose a cascade of two complementary features to detect pedestrians in static images quickly and accurately. Co-occurrence Histograms of Oriented Gradients (CoHOG) descriptors have strong classification capability but are extremely high dimensional. On the other hand, Haar-like features are computationally efficient but not highly discriminative for the widely varying texture and shape of pedestrians with different clothing and stances. The combination of both features therefore enables fast and accurate pedestrian detection. Our framework comprises a cascade of Haar-based weak classifiers followed by a CoHOG-SVM classifier. Experimental results on the DaimlerChrysler and INRIA benchmark datasets show that we reach accuracy very close to that of the most accurate CoHOG-only classifier at less than 1/200 of its computational cost. Additionally, we show that by integrating two of our proposed cascades, a full-body detector and an upper-body detector, we reach higher accuracy than the standalone full-body CoHOG-only classifier at about 1/100 of its computational cost.
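The cascade idea, running the cheap Haar-like stage on every candidate window and the expensive CoHOG-SVM stage only on survivors, can be sketched as follows. The scoring functions and thresholds here are placeholders, not the trained classifiers from the paper.

```python
def cascade_detect(windows, cheap_score, expensive_score,
                   t_cheap=0.3, t_expensive=0.5):
    """Two-stage cascade: a fast, weak stage prunes most candidate
    windows; a slow, strong stage makes the final decision on the rest.
    The overall cost is dominated by the cheap stage whenever most
    windows are background."""
    survivors = [w for w in windows if cheap_score(w) >= t_cheap]
    return [w for w in survivors if expensive_score(w) >= t_expensive]

# Toy usage: the scores are stand-ins for Haar and CoHOG-SVM responses.
windows = [0.1, 0.4, 0.9]
detections = cascade_detect(windows, lambda w: w, lambda w: w)
```

The speedup claim in the abstract follows directly from this structure: the expensive descriptor is only ever computed on the small fraction of windows that pass the cheap stage.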
In the present paper, we propose a one-shot scanning system consisting of multiple projectors and cameras for dense entire-shape acquisition of a moving object. One potential application of the proposed system is capturing a moving object at a high frame rate. Since the patterns used for one-shot scanning are usually complicated and interfere with each other when projected onto the same object, it is difficult to use multiple sets of patterns for entire-shape acquisition. In addition, the overlapping regions between views contain gaps and accumulate errors, so merged shapes are usually noisy and inconsistent. To address this problem, we propose a one-shot shape reconstruction method in which each projector projects a static pattern of parallel lines in one or two colors. Since each projector projects only parallel lines in a small number of colors, these patterns are easily decomposed and detected even when several are projected onto the same object. We also propose a multi-view reconstruction algorithm for the projector-camera system. In our experiments, we built a system consisting of six projectors and six cameras, and dense entire shapes of objects were successfully reconstructed.
This paper describes a large-scale gait database comprising the Treadmill Dataset. The dataset focuses on variations in walking conditions and includes 200 subjects captured from 25 views, 34 subjects with 9 speed variations from 2 km/h to 10 km/h at 1 km/h intervals, and 68 subjects with up to 32 clothing variations. The range of variation in these three factors is significantly larger than in previous gait databases; therefore, the Treadmill Dataset can be used in research on invariant gait recognition. Moreover, the dataset covers a wider range of genders and ages than existing databases, which enables gait-based gender and age-group classification to be evaluated in a more statistically reliable way.
We present a method for the recognition of structured images and demonstrate it on the detection of windows in facade images. Given the ability to obtain local low-level evidence on the primitive elements of a structure (such as windows in a facade image), we determine their most probable number, attribute values (location, size), and neighborhood relations. The embedded structure is weakly modeled by pairwise attribute constraints, which allow the structure and attributes to mutually support each other. We use the very general framework of reversible jump MCMC, which allows simple implementation of a specific structure model and the plug-in of almost arbitrary element classifiers. We chose the domain of window recognition in facade images to demonstrate that the result is an efficient algorithm that matches the performance of other, more strongly informed methods for regular structures.
In this paper, we propose a new approach for recognizing group events and detecting abnormality in a crowded scene. A manifold learning algorithm with temporal constraints is proposed to embed a video of a crowded scene in a low-dimensional space. This low-dimensional representation preserves the spatio-temporal properties of the video as well as its characteristic content. Recognizing video events and detecting abnormality in a crowded scene is achieved by studying the video's trajectory in the manifold space. We evaluate the proposed method on state-of-the-art public datasets containing different crowd events. Qualitative and quantitative results show the promising performance of the proposed method.
In this paper, we present new efficient solutions to the absolute pose problems, from four 2D-to-3D point correspondences, for cameras with unknown focal length and for cameras with unknown focal length and radial distortion. We propose to solve these problems separately for non-planar and planar scenes. By decomposing the problems into these two cases, we obtain simpler and more efficient solvers than the previously known general solvers. We demonstrate a significant speedup of our solvers in synthetic and real experiments. In particular, our new solvers for the absolute pose problem with unknown focal length and radial distortion are about 40× (non-planar) and 160× (planar) faster than the general solver. Moreover, we show that our specific solvers can be combined into new general solvers, either by running the planar or non-planar solver according to the scene structure, or by running both solvers and selecting the better result. Such combined solvers give comparable or even better results than the existing general solvers for both planar and non-planar scenes.
The importance of person identification techniques is increasing for visual surveillance applications. In everyday social settings, people often act in groups composed of friends, family, and co-workers, and this is a useful cue for person identification. This paper describes a method for person identification in video sequences based on this group cue. In the proposed approach, the relationships between the people in an input sequence are modeled using a graphical model. The identity of each person is then propagated to their neighbors in the graph by message passing via belief propagation, based on each person's group affiliation and characteristics such as spatial distance and velocity-vector difference, so that members of the same group with similar characteristics reinforce each other's identities as group members. The proposed method is evaluated through gait-based person identification experiments using both simulated and real input sequences. Experimental results show that identification performance improves considerably compared with a straightforward method based on the gait feature alone.
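The message-passing idea can be sketched with a minimal sum-product belief propagation loop: an ambiguous person's identity distribution is sharpened by a confident group member through a pairwise compatibility term. This is a generic BP sketch under assumed unary and compatibility tables, not the paper's model, which additionally conditions the compatibilities on spatial distance and velocity differences.

```python
import numpy as np

def bp_group_identify(unary, edges, compat, iters=5):
    """Sum-product belief propagation over a graph of people.
    unary[i] -- identity distribution of person i from gait alone.
    edges    -- (i, j) pairs of people judged to be in the same group.
    compat   -- K x K matrix; compat[a, b] is high when identities a
                and b are likely to co-occur in one group (an assumed
                stand-in for the paper's affiliation terms)."""
    n, K = unary.shape
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = np.ones(K) / K
        msgs[(j, i)] = np.ones(K) / K
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            prod = unary[i].copy()
            for (a, b) in msgs:          # incoming messages, except j's
                if b == i and a != j:
                    prod = prod * msgs[(a, b)]
            m = compat.T @ prod          # marginalize sender's identity
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = unary.copy()
    for (i, j) in msgs:
        beliefs[j] = beliefs[j] * msgs[(i, j)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Toy example: person 0 is fairly confident from gait; person 1 is
# ambiguous but shares a group with person 0.
unary = np.array([[0.6, 0.4],
                  [0.5, 0.5]])
compat = np.array([[0.9, 0.1],
                   [0.1, 0.9]])
beliefs = bp_group_identify(unary, [(0, 1)], compat)
```

After propagation, the ambiguous person's belief is pulled toward the identity consistent with the confident neighbor, which is the enhancement effect the abstract describes.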
In this paper, an adaptive framework for image contrast enhancement based on histogram separation and mapping is presented. In this framework, the histogram is separated using a binary tree structure with the proposed adaptive histogram separation strategy. Histogram equalization (HE) is generally an effective technique for contrast enhancement; however, conventional HE often produces images with an unnatural look and artifacts due to excessive enhancement. To overcome this shortcoming, the adaptive histogram separation unit (AHSU) is proposed to convert the global enhancement problem into a local one. Exact histogram separation is then discussed for mapping the histogram partitions into more suitable ranges. Finally, an adaptive histogram separation and mapping framework (AHSMF) for contrast enhancement is presented, and experimental results show that it is more effective than other histogram-based methods.
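The binary-tree separation idea can be illustrated with a minimal sketch: recursively split the gray-level range at the median, then equalize each leaf partition only within its own range, so no partition can spill its pixels across the whole dynamic range. The median split rule and fixed tree depth are assumptions for illustration; the paper's AHSU chooses separation points adaptively.

```python
import numpy as np

def equalize_range(img, out, lo, hi):
    """Histogram-equalize the pixels whose values lie in [lo, hi],
    mapping them back onto the same range (rank-based CDF mapping)."""
    mask = (img >= lo) & (img <= hi)
    vals = np.sort(img[mask])
    if vals.size == 0 or lo >= hi:
        return
    ranks = np.searchsorted(vals, img[mask], side='right')
    out[mask] = lo + (hi - lo) * ranks / vals.size

def tree_equalize(img, lo=None, hi=None, depth=2, out=None):
    """Binary-tree histogram separation (median split) followed by
    per-partition equalization -- a simplified sketch of the
    AHSU/AHSMF idea, not the paper's algorithm."""
    if out is None:
        out = img.astype(np.float64).copy()
        lo, hi = float(img.min()), float(img.max())
    m = float(np.median(img[(img >= lo) & (img <= hi)])) if lo < hi else lo
    if depth == 0 or not lo < m < hi:
        equalize_range(img, out, lo, hi)
        return out
    tree_equalize(img, lo, m, depth - 1, out)                    # left leaf
    tree_equalize(img, np.nextafter(m, hi), hi, depth - 1, out)  # right leaf
    return out

# Toy 1-D "image" with two clusters of gray levels.
img = np.array([0., 1., 2., 3., 10., 11., 12., 13.])
out = tree_equalize(img, depth=1)
```

Because each partition is mapped into its own sub-range, the global brightness ordering is preserved and the over-enhancement typical of plain HE is suppressed.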
Nearest neighbor search (NNS) among large-scale and high-dimensional vectors has played an important role in recent large-scale multimedia search applications. This paper proposes an optimized codebook construction algorithm for approximate NNS based on product quantization. The proposed algorithm iteratively optimizes both codebooks for product quantization and an assignment table that indicates the optimal codebook in product quantization. In experiments, the proposed method is shown to achieve better accuracy in approximate NNS than the conventional method with the same memory requirement and the same computational cost. Furthermore, use of a larger number of codebooks increases the accuracy of approximate NNS at the expense of a slight increase in the memory requirement.
We analyzed a light-field super-resolution problem in which a 3-D scene is reconstructed at a higher resolution through super-resolution (SR) reconstruction from a given set of low-resolution multi-view images. The arrangement of the multi-view cameras is important because it determines the quality of the reconstruction. To simplify the analysis, we considered a situation in which a plane is located at a certain depth and a texture on that plane is super-resolved. We formulated the SR reconstruction process in the frequency domain, where the camera arrangement can be expressed independently as a matrix in the image formation model. We then evaluated the condition number of this matrix to quantify the quality of the SR reconstruction. We clarified that when the cameras are arranged in a regular grid, there exist singular depths at which the SR reconstruction becomes ill-posed. We also determined that this singularity can be avoided if the arrangement is randomly perturbed.
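The conditioning argument can be illustrated with a toy 1-D model: in the frequency domain, each camera's row of the image-formation matrix mixes aliased frequency components with phases set by that camera's depth-dependent subpixel shift. When a regular grid hits a depth where all shifts are whole pixels, the rows become identical and the matrix degenerates; perturbing the arrangement restores distinct phases. This model and the specific shift values are my illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def sr_matrix(shifts, n_alias=3):
    """Toy frequency-domain image-formation matrix for 1-D SR: row k
    mixes n_alias aliased frequency components with phases determined
    by camera k's subpixel shift (an illustrative stand-in for the
    matrix analyzed in the paper)."""
    s = np.asarray(shifts, dtype=float)
    return np.exp(2j * np.pi * np.outer(s, np.arange(n_alias)))

def sigma_min(A):
    """Smallest singular value: near zero means the SR problem is
    ill-posed (equivalently, the condition number blows up)."""
    return np.linalg.svd(A, compute_uv=False)[-1]

# Regular grid at a "singular depth": every shift is a whole pixel,
# so all cameras sample the same phases and the matrix is rank one.
regular = [0.0, 1.0, 2.0]
# A perturbed arrangement yields well-spread fractional shifts.
perturbed = [0.0, 1.31, 2.64]
```

Comparing `sigma_min(sr_matrix(regular))` with `sigma_min(sr_matrix(perturbed))` reproduces the qualitative conclusion: the regular grid is singular at this depth while the perturbed arrangement remains well-conditioned.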
We present a method for synthesizing high-quality free-viewpoint images from a set of multi-view images. First, an accurate depth map is estimated from a given target viewpoint using modified semi-global stereo matching. Then, a high-resolution image from that viewpoint is obtained through super-resolution (SR) reconstruction. The depth estimation results from the first step are used for the second step in two ways. First, the depth values are used to associate pixels between the input images and the latent high-resolution image. Second, the pixel-wise reliabilities of the depth information are used for regularization to adaptively control the strength of the SR reconstruction. Extensive experimental results using real images show the effectiveness of our method.
In this paper, we propose a novel method that performs 3D face reconstruction and unconstrained, non-contact gaze estimation, from multi-view video, for a moving person whose head pose can change freely. The main idea is to first reconstruct the 3D face with high accuracy using a symmetry prior. We then generate a super-resolved virtual frontal face video from the estimated 3D face geometry and the original multi-view video. Finally, a 3D eyeball model is introduced to estimate the three-dimensional gaze direction from the virtual frontal face video. Experiments with real data illustrate the effectiveness of our method.
One promising approach to reconstructing a 3D shape is a projector-camera system that projects a structured light pattern. One problem with this approach is the difficulty of obtaining texture simultaneously, because the texture is corrupted by the projector's illumination. The system proposed in this paper overcomes this issue by separating the light wavelengths used for texture and shape: the pattern is projected in infrared light, while the texture is captured in visible light. If the cameras for infrared and visible light are placed at different positions, misalignment arises between texture and shape, which degrades the quality of the textured 3D model. Therefore, we developed a multi-band camera that acquires both visible and infrared light from a single viewpoint. Moreover, to reconstruct a 3D shape using multiple wavelengths of light, i.e., multiple colors, we developed an infrared pattern projector that generates a multi-band grid pattern. Additionally, we propose a simple method to calibrate the system using the fixed grid pattern. Finally, we show textured 3D shapes captured by the experimental system.
This paper presents a real-time incremental mosaicing method that generates a large seamless 2D image by stitching video key-frames as soon as they are detected. There are four main contributions: (1) we propose a “fast” key-frame selection procedure based solely on the distribution of distances between matched feature descriptors; this procedure automatically selects the key-frames used to expand the mosaic while achieving real-time performance; (2) we register key-frame images using a non-rigid deformation model based on a triangular mesh, in order to “smoothly” stitch images when scene transformations cannot be expressed by a homography; (3) we add a new constraint to the non-rigid deformation model that penalizes over-deformation, in order to create mosaics with a natural appearance; (4) we propose a fast image stitching algorithm for real-time mosaic rendering, modeled as an instance of the minimum graph cut problem applied to mesh triangles instead of image pixels. The performance of the proposed method is validated by experiments in uncontrolled conditions and by comparison with a state-of-the-art method.
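Contribution (1), selecting key-frames from nothing but the distribution of matched-descriptor distances, can be sketched as follows. The nearest-neighbour matching and the two thresholds are assumptions standing in for the paper's actual criterion; the point is only that the decision uses the distance distribution and no geometric model.

```python
import numpy as np

def is_new_keyframe(desc_key, desc_cur, dist_thresh=0.5, poor_frac=0.6):
    """Key-frame test based solely on the distribution of matched
    descriptor distances: when too many of the current frame's features
    match the last key-frame poorly, the frame is promoted to a new
    key-frame. desc_key/desc_cur are (n_features, dim) arrays."""
    d = np.linalg.norm(desc_cur[:, None, :] - desc_key[None, :, :], axis=2)
    nn = d.min(axis=1)                       # best-match distance per feature
    return float(np.mean(nn > dist_thresh)) > poor_frac

# Toy usage: descriptors near the key-frame's do not trigger a new
# key-frame; clearly different descriptors do.
key = np.zeros((10, 4))
near = np.full((10, 4), 0.01)
far = np.full((10, 4), 1.0)
```

Since the test needs only descriptor distances that the tracker computes anyway, it adds essentially no overhead, which is what makes the selection compatible with real-time operation.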