We present a semi-supervised object extraction technique for natural image matting. First, we present a novel unsupervised graph-spectral algorithm for extracting homogeneous regions in an image. We then derive a semi-supervised scheme from this unsupervised algorithm. In our method, it is sufficient for users to draw strokes in only one of the object and background regions. The semi-supervised optimization problem is solved with an iterative method in which memberships are propagated from strokes to their surroundings. We suggest a guideline for stroke placement by exploiting the same iterative solution process in the unsupervised algorithm. We project the color vectors with linear discriminant analysis to improve color discriminability and speed up the convergence of the iterative method. The performance of the proposed method is examined on several images, and the results are compared with other methods and ground-truth mattes.
This article investigates the impact of camera warm-up on the image acquisition process and therefore on the accuracy of segmented image features. Based on an experimental study, we show that the camera image shifts by a few tenths of a pixel after camera start-up. The drift correlates with the temperature of the sensor board and stops when the camera reaches thermal equilibrium. A further study of the observed image flow shows that it originates from a slight displacement of the image sensor due to thermal expansion of the mechanical components of the camera. This sensor displacement can be modeled using standard methods of projective geometry combined with bi-exponential decay terms that model the temporal dependency. The parameters of the proposed model can be calibrated and then used to compensate for warm-up effects. Further experimental studies show that our method is applicable to different types of cameras and that the warm-up behaviour is characteristic of a specific camera.
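As a rough illustration of the temporal part of such a model, the sketch below evaluates a bi-exponential drift curve and subtracts it from a measured coordinate. The functional form, the names `warmup_drift` and `compensate`, and all parameter values are assumptions made for illustration; the abstract does not state the authors' actual parameterization, and the projective-geometry part of the model is not shown.

```python
import math


def warmup_drift(t, a1, tau1, a2, tau2):
    """Hypothetical image drift (in pixels) t seconds after start-up.

    Assumed form: two exponential decay terms that vanish as the camera
    approaches thermal equilibrium, so the drift saturates at a1 + a2.
    """
    return a1 * (1.0 - math.exp(-t / tau1)) + a2 * (1.0 - math.exp(-t / tau2))


def compensate(x_measured, t, params):
    """Subtract the modeled drift from a measured image coordinate."""
    return x_measured - warmup_drift(t, *params)


if __name__ == "__main__":
    # Illustrative (not calibrated) parameters: amplitudes in pixels,
    # time constants in seconds.
    params = (0.2, 300.0, 0.1, 3000.0)
    for t in (0.0, 300.0, 3000.0, 30000.0):
        print(t, round(warmup_drift(t, *params), 4))
```

In practice the four parameters would be fitted per camera from a calibration recording, which matches the abstract's remark that warm-up behaviour is characteristic of a specific camera.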
Measuring a bidirectional reflectance distribution function (BRDF) takes a long time because the target object must be illuminated from all incident angles and the reflected light must be measured at all reflection angles. In this paper, we introduce a rapid BRDF measuring system using an ellipsoidal mirror and a projector. Since the system changes incident angles without a mechanical drive, a dense BRDF can be measured rapidly. Moreover, we show that the S/N ratio of the measured BRDF can be significantly increased by multiplexed illumination based on the Hadamard matrix.
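The idea behind Hadamard-based multiplexed illumination can be sketched as follows. This is a generic illustration, not the authors' implementation: it assumes the common variant in which a 0/1 "S-matrix" derived from a Sylvester Hadamard matrix selects which light sources are on in each shot, and the function names are hypothetical.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def s_matrix(n):
    """0/1 multiplexing matrix of order n - 1 derived from hadamard(n).

    The first row and column are dropped, then -1 maps to 1 (light on)
    and +1 maps to 0 (light off).
    """
    h = hadamard(n)
    return [[1 if v == -1 else 0 for v in row[1:]] for row in h[1:]]


def multiplex(s, x):
    """Simulate measurements: each shot sums the responses of the lit sources."""
    return [sum(sij * xj for sij, xj in zip(row, x)) for row in s]


def demultiplex(s, y):
    """Recover single-source responses using S^-1 = 2/(m+1) * (2 S^T - J)."""
    m = len(s)
    scale = 2.0 / (m + 1)
    return [
        scale * sum((2 * s[i][j] - 1) * y[i] for i in range(m))
        for j in range(m)
    ]


if __name__ == "__main__":
    s = s_matrix(8)                      # 7 light sources, 7 shots
    x = [0.5, 1.0, 0.2, 0.0, 3.0, 1.5, 0.7]
    y = multiplex(s, x)
    print([round(v, 6) for v in demultiplex(s, y)])
```

Because about half of the sources are lit in every shot, each measurement collects more light than a single-source shot, which is where the S/N gain comes from when sensor noise is signal-independent.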
We present a simple and practical approach for segmenting un-occluded items in a scene by actively casting shadows. By ‘items’, we refer to objects (or parts of objects) enclosed by depth edges. Our approach utilizes the fact that under varying illumination, un-occluded items will cast shadows on occluded items or the background, but will not be shadowed themselves. We employ an active illumination approach by taking multiple images under different illumination directions, with the illumination source close to the camera. Our approach ignores the texture edges in the scene and uses only the shadow and silhouette information to determine the occlusions. We show that such a segmentation does not require the estimation of a depth map or 3D information, which can be cumbersome and expensive and often fails due to the lack of texture and the presence of specular objects in the scene. Our approach can handle complex scenes with self-shadows and specularities. In addition, we show how to identify regions belonging to occluded objects and segment the scene into multiple layers. Our approach is able to recover the shape of an occluded object if none of its depth edges is occluded. Results on several real scenes are presented along with an analysis of failure cases.
We propose a new method for content-based image retrieval which exploits the similarity measure and indexing structure of totally randomized tree ensembles induced from a set of subwindows randomly extracted from a sample of images. We also present the possibility of updating the model as new images come in, and the capability of comparing new images using a model previously constructed from a different set of images. The approach is quantitatively evaluated on various types of images and achieves high recognition rates despite its conceptual simplicity and computational efficiency.
This paper presents a novel approach to tracking ground targets in multiple cameras. A target is tracked not only in each camera but also in the ground plane by individual particle filters. These particle filters collaborate in two different ways. First, the particle filters in each camera pass messages to those in the ground plane, where the multi-camera information is integrated by intersecting the targets' principal axes. This largely relaxes the dependence on precise foot positions when mapping targets from images to the ground plane using homographies. Second, the fusion results in the ground plane are then incorporated by each camera as boosted proposal functions. A mixture proposal function is composed for each tracker in a camera by combining an independent transition kernel and the boosted proposal function. The general framework of our approach allows us to track individual targets distributively and independently, which is potentially useful when we are interested only in the trajectories of a few key targets and cannot track all the targets in the scene simultaneously.
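The image-to-ground mapping mentioned above relies on a planar homography. The sketch below shows only that basic step, applying a 3x3 homography to an image point; the function name and the example matrix are hypothetical, and the principal-axis intersection and particle filtering described in the abstract are not reproduced here.

```python
def apply_homography(h, point):
    """Map an image point (x, y) to the ground plane with a 3x3 homography.

    `h` is a row-major 3x3 matrix given as nested lists. The result is the
    dehomogenized ground-plane point (u / w, v / w).
    """
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


if __name__ == "__main__":
    # Illustrative homography (scale plus translation), not a calibrated one.
    h_example = [[2.0, 0.0, 1.0],
                 [0.0, 2.0, -1.0],
                 [0.0, 0.0, 1.0]]
    print(apply_homography(h_example, (3.0, 4.0)))
```

A small error in the detected foot position can translate into a large ground-plane error for distant targets, which is the sensitivity the principal-axis intersection is meant to reduce.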
In this paper we propose a method to construct a virtual sequence for a camera moving through a static environment, given an input sequence from a different camera trajectory. Existing image-based rendering techniques can generate photorealistic images given a set of input views, though the output images almost unavoidably contain small regions where the colour has been incorrectly chosen. In a single image these artifacts are often hard to spot, but become more obvious when viewing a real image with its virtual stereo pair, and even more so when a sequence of novel views is generated, since the artifacts are rarely temporally consistent. To address this problem of consistency, we propose a new spatio-temporal approach to novel video synthesis. Our method exploits epipolar geometry to impose constraints on temporal coherence of the rendered views. The pixels in the output video sequence are modelled as nodes of a 3-D graph. We define an MRF on the graph which encodes photoconsistency of pixels as well as texture priors in both space and time. Unlike methods based on scene geometry, which yield highly connected graphs, our approach results in a graph whose degree is independent of scene structure. The MRF energy is therefore tractable and we solve it for the whole sequence using a state-of-the-art message passing optimisation algorithm. We demonstrate the effectiveness of our approach in reducing temporal artifacts.
Many natural image sets are samples of a low-dimensional manifold in the space of all possible images. Understanding this manifold is a key first step in understanding many sets of images, and manifold learning approaches have recently been used within many application domains, including face recognition, medical image segmentation, gait recognition and hand-written character recognition. This paper attempts to characterize the special features of manifold learning on image data sets, and to highlight the value and limitations of these approaches.
We presented an invited talk at the MIRU-IUW workshop on correcting photometric distortions in photographs. In this paper, we describe our work on addressing one form of this distortion, namely defocus blur. Defocus blur can lead to the loss of fine-scale scene detail, and we address the problem of recovering it. Our approach targets a single-image solution that capitalizes on redundant scene information by restoring image patches that have greater defocus blur using similar, more focused patches as exemplars. The major challenge in this approach is to produce a spatially coherent and natural result given the rather limited exemplar data present in a single image. To address this problem, we introduce a novel correction algorithm that maximizes the use of available image information and employs additional prior constraints. Unique to our approach is an exemplar-based deblurring strategy that simultaneously considers candidate patches from sharper image regions and deconvolved patches from blurred regions. This not only allows more of the image to contribute to the recovery process but also inherently combines synthesis and deconvolution into a single procedure. In addition, we use a top-down strategy where the pool of in-focus exemplars is progressively expanded as increasing levels of defocus are corrected. After detail recovery, regularization based on sparsity and contour continuity constraints is applied to produce a more plausible and natural result. Our method compares favorably to related techniques such as defocus inpainting and deconvolution with constraints from natural image statistics alone.
Partial matching of geometric structures is important in computer vision, pattern recognition and shape analysis applications. The problem consists of matching similar parts of shapes that may be dissimilar as a whole. Recently, it was proposed to consider partial similarity as a multi-criterion optimization problem that tries to simultaneously maximize the similarity and the significance of the matching parts. A major challenge in that framework is providing a quantitative measure of the significance of a part of an object. Here, we define the significance of a part of a shape by its discriminative power with respect to a given shape database, that is, the uniqueness of the part. We define a point-wise significance density using a statistical weighting approach similar to the term frequency-inverse document frequency (tf-idf) weighting employed in search engines. The significance measure of a given part is obtained by integrating over this density. Numerical experiments show that the proposed approach produces intuitive significant parts and demonstrate an improvement in the performance of partial matching between shapes.
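To make the tf-idf analogy concrete, the sketch below computes tf-idf weights for a shape represented as a bag of quantized local descriptors ("words"). This is the generic text-retrieval weighting, not the authors' significance density: the bag-of-words shape representation, the smoothed idf variant, and the name `tfidf_weights` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(query_words, database):
    """Return a tf-idf weight for each distinct word of a query shape.

    query_words: list of descriptor labels observed on the query shape.
    database:    list of shapes, each given as a list of descriptor labels.
    Uses the smoothed idf log((1 + N) / (1 + df)), so a word that occurs in
    every database shape gets weight 0 (it does not discriminate).
    """
    n_shapes = len(database)
    counts = Counter(query_words)
    total = len(query_words)
    weights = {}
    for word, count in counts.items():
        df = sum(1 for shape in database if word in shape)
        idf = math.log((1 + n_shapes) / (1 + df))
        weights[word] = (count / total) * idf
    return weights


if __name__ == "__main__":
    # Toy database: "leg" appears on every shape, "horn" on only one.
    db = [["leg", "tail"], ["leg", "horn"], ["leg", "wing"]]
    print(tfidf_weights(["leg", "horn"], db))
```

Words that occur on almost every shape contribute little, while rare words dominate; summing such weights over the points of a part plays the role that integrating the significance density plays in the abstract.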
Recognizing people in images is one of the foremost challenges in computer vision. It is important to remember that consumer photography has a highly social aspect. The photographer captures images not in a random fashion, but rather to remember or document meaningful events in her life. Understanding images of people necessitates that the context of each person in an image is considered. Context includes information related to the image of the scene surrounding the person, camera context such as location and image capture time, and the social context that describes the interactions between people. The goal of this paper is to provide the computer with the same intuition that humans would use for analyzing images of people. Fortunately, rather than relying on a lifetime of experience, context can often be modeled with large amounts of publicly available data. Probabilistic graph models and machine learning are used to model the relationship between people and context in a principled manner.
The scattering effect of incident light, called subsurface scattering, occurs under the surface of translucent objects. In this paper, we present a method for analyzing the subsurface scattering from a single image taken in a known arbitrary illumination environment. In our method, diffuse subsurface reflectance in the subsurface scattering model can be linearly solved by quantizing the distances between each pair of surface points. Then, the dipole approximation is fit to the diffuse subsurface reflectance. By applying our method to real images of translucent objects, we confirm that the parameters of subsurface scattering can be computed for different materials.
Shape acquisition of moving deformable objects with little texture is important for applications such as motion capture of human facial expression. Several techniques using structured light have been proposed. These techniques can be largely categorized into two main types. The first type temporally encodes positional information of a projector's pixels using multiple projected patterns, and the second spatially encodes positional information into areas or color spaces. Although the former technique allows dense reconstruction with a sufficient number of patterns, it has difficulty in scanning objects in rapid motion. The latter technique uses only a single pattern, so it is more suitable for capturing dynamic scenes; however, it often uses complex patterns with various colors, which are susceptible to noise, pattern discontinuity caused by edges, or textures. Thus, achieving dense and stable 3D acquisition for fast-moving and deformable objects remains an open problem. We propose a technique to achieve dense shape reconstruction that requires only a single-frame image of a grid pattern based on coplanarity constraints. With our technique, positional information is not encoded in local regions of a projected pattern, but is distributed over the entire grid pattern, which results in robust image processing and 3D reconstruction. The technique also has the advantage of low computational cost due to its efficient formulation.
A significant issue in designing catadioptric imaging systems is what shape the component mirrors should take. In this paper, we propose a new algorithm for designing a catadioptric imaging system that satisfies the desired projection using a free-form mirror. A free-form mirror expressed as an assembly of gradients is a flexible surface representation that can form various shapes, including non-smooth surfaces. We adapt the shape reconstruction framework of the photometric stereo scheme to design free-form mirrors. An optimal mirror shape is formed to produce the desired projection under the integrability condition, which requires it to be a consistent surface. We assume various catadioptric configurations, for which actual free-form mirrors are designed. The design experiments confirm that the resulting free-form mirrors can approximate the desired projections, including non-smooth ones.
We present a new method for detecting incorrect feature point tracking. We impose the constraint that, under the affine camera model, feature trajectories should lie in an affine space in the parameter space. Introducing a statistical model of image noise, we test whether detected partial trajectories are sufficiently reliable and then detect the incorrect ones. Using real video images, we demonstrate that our proposed method can detect incorrect feature point tracking fairly well.
In statistical pattern recognition, it is important to avoid density estimation, since density estimation is often more difficult than pattern recognition itself. Following this idea, known as Vapnik's principle, a statistical data processing framework that employs the ratio of two probability density functions has been developed recently and is attracting considerable attention in the machine learning and data mining communities. The purpose of this paper is to introduce to the computer vision community recent advances in density ratio estimation methods and their usage in various statistical data processing tasks such as non-stationarity adaptation, outlier detection, feature selection, and independent component analysis.
This paper presents a fast registration method based on solving an energy minimization problem derived from implicit polynomials (IPs). Once a target object is encoded by an IP, it is driven quickly towards a corresponding source object along the IP's gradient flow without using point-wise correspondences. This registration process is accelerated by a new IP transformation method. Instead of applying a time-consuming transformation to a large discrete data set, the new method transforms the polynomial coefficients directly to achieve the same Euclidean transformation. Its computational efficiency enables a new application: real-time ultrasound (US) pose estimation. The reported experimental results demonstrate the capabilities of our method in overcoming the limitations of noisy, unconstrained, and freehand US images, resulting in fast and robust registration.
This paper presents a new image segmentation method for the recognition of texture-based objects in a road environment scene. Using the proposed method, we can classify texture-based objects three dimensionally using the SfM (Structure from Motion) module and the HLAC (Higher-order Local Autocorrelation) features. By estimating the vehicle's ego-motion, the SfM module can reconstruct the three dimensional structure of the road scene. Texture features of input images are extracted from HLAC functions according to their depth, as obtained using the SfM module. The proposed method can effectively recognize texture-based objects of a road scene by considering their three-dimensional structure in a perspective 2D image. Experimental results show that the proposed method can not only effectively classify the texture patterns of structures in a 2D road scene, but also represent classified texture patterns as three-dimensional structures. The proposed system can revolutionize a three-dimensional scene understanding system for vehicle environment perception.
This paper introduces a framework called generalized N-dimensional principal component analysis (GND-PCA) for statistical appearance modeling of facial images with multiple modes, including different people, different viewpoints and different illumination. Facial images with multiple modes can be considered as high-dimensional data, and GND-PCA can represent such high-order data more efficiently. We conduct extensive experiments on the MaVIC Database (KAO-Ritsumeikan Multi-angle View, Illumination and Cosmetic Facial Database) to evaluate the effectiveness of the proposed algorithm and compare it with the conventional ND-PCA in terms of reconstruction error. The results indicate that the extraction of data features is computationally more efficient using GND-PCA than using PCA and ND-PCA.
This paper presents a novel approach for simultaneous silhouette extraction from multi-viewpoint images. The main contribution of this paper is a new algorithm for 1) 3D context-aware error detection and correction of 2D multi-viewpoint silhouette extraction and 2) 3D context-aware classification of cast shadow regions. Our method takes both monocular image segmentation and background subtraction of each viewpoint as its inputs, but does not assume they are correct. Inaccurate segmentation and background subtraction are corrected through our iterative method based on inter-viewpoint checking. Experiments quantitatively demonstrate its advantages over previous approaches.
The latest robust estimators usually take advantage of density estimation, such as kernel density estimation, to improve the robustness of inlier detection. However, the challenging problem for these systems is choosing the suitable smoothing parameter, which can result in the population of inliers being over- or under-estimated, and this, in turn, reduces the robustness of the estimation. To solve this problem, we propose a robust estimator that estimates an accurate inlier scale. The proposed method first carries out an analysis to figure out the residual distribution model using the obvious case-dependent constraint, the residual function. Then the proposed inlier scale estimator performs a global search for the scale producing the residual distribution that best fits the residual distribution model. Knowledge about the residual distribution model provides a major advantage that allows us to estimate the inlier scale correctly, thereby improving the estimation robustness. Experiments with various simulations and real data are carried out to validate our algorithm, which shows certain benefits compared with several of the latest robust estimators.
There are two major problems with learning-based super-resolution algorithms. One is that they require a large amount of memory to store examples; the other is the high computational cost of finding the nearest neighbors in the database. To alleviate these problems, it is helpful to reduce the dimensionality of examples and to store only a small number of examples that contribute to the synthesis of a high-quality video. Based on these ideas, we have developed an efficient algorithm for learning-based video super-resolution. We introduce several strategies to construct an efficient database. Through evaluation experiments, we show the efficiency of our approach in improving super-resolution algorithms.
In outdoor scenes, polarization of the sky provides a significant clue to understanding the environment. The polarized state of light conveys the information for obtaining the orientation of the sun. Robot navigation, sensor planning, and many other application areas benefit from using this navigation mechanism. Unlike previous investigations, we analyze sky polarization patterns when the fish-eye lens is not vertical, since a camera in a general position is effective in analyzing outdoor measurements. We have tilted the measurement system based on a fish-eye lens, a CCD camera, and a linear polarizer, in order to analyze transition of the 180-degree sky polarization patterns while tilting. We also compared our results measured under overcast skies with the corresponding celestial polarization patterns calculated using the single-scattering Rayleigh model.
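For reference, the single-scattering Rayleigh model predicts the degree of polarization of skylight from the angle between the viewing direction and the sun. The sketch below evaluates the standard textbook expression; the function names and the `dop_max` parameter (the maximum degree of polarization, which is below 1 in real skies) are assumptions made for illustration and are not taken from the paper.

```python
import math


def scattering_angle(sun_zenith, sun_azimuth, view_zenith, view_azimuth):
    """Angle (radians) between the sun direction and the viewing direction."""
    cos_gamma = (
        math.sin(sun_zenith) * math.sin(view_zenith)
        * math.cos(sun_azimuth - view_azimuth)
        + math.cos(sun_zenith) * math.cos(view_zenith)
    )
    # Clamp against rounding error before taking the arccosine.
    return math.acos(max(-1.0, min(1.0, cos_gamma)))


def rayleigh_dop(gamma, dop_max=1.0):
    """Degree of polarization under the single-scattering Rayleigh model.

    DoP = dop_max * sin^2(gamma) / (1 + cos^2(gamma)), which is zero towards
    the sun and maximal at 90 degrees from it.
    """
    s2 = math.sin(gamma) ** 2
    c2 = math.cos(gamma) ** 2
    return dop_max * s2 / (1.0 + c2)


if __name__ == "__main__":
    # Sun at 60 degrees from the zenith, due south; look straight up.
    gamma = scattering_angle(math.radians(60), math.radians(180), 0.0, 0.0)
    print(round(math.degrees(gamma), 3), round(rayleigh_dop(gamma), 4))
```

For a tilted camera, the viewing zenith and azimuth of each pixel would first have to be rotated from the camera frame into the sky frame; that step depends on the fish-eye calibration and is omitted here.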