The cornea of the human eye acts as a mirror that reflects light from a person's environment. These corneal reflections can be extracted from an image of the eye by modeling the eye-camera geometry as a catadioptric imaging system. As a result, one obtains the visual information of the environment and its relation to the observer (view, gaze), which enables applications in a number of fields. The recovered illumination map can be further applied to various computational tasks. This paper provides a comprehensive introduction to corneal imaging, and aims to show the potential of the topic and encourage advancement. It makes a number of contributions, including (1) a combined view of previously unrelated fields, (2) an overview of recent developments, (3) a detailed explanation of anatomic structures, geometric eye and corneal reflection modeling, including multiple eye images, (4) a summary of our work and contributions to the field, and (5) a discussion of implications and promising future directions. The idea behind this paper is a geometric framework to solve persisting technical problems and enable non-intrusive interfaces and smart sensors for traditional, ubiquitous and ambient environments.
The best known method for optimally computing parameters from noisy data based on geometric constraints is maximum likelihood (ML). This paper reinvestigates “hyperaccurate correction” for further improving the accuracy of ML. In the past, only the case of a single scalar constraint was studied. In this paper, we extend it to multiple constraints given in the form of vector equations. By detailed error analysis, we reveal the existence of a term that has been ignored in the past. Through simulation experiments on ellipse fitting, fundamental matrix computation, and homography computation, we show that the new term has no practical effect on the final solution. We also show that our hyperaccurate correction is even superior to hyper-renormalization, the latest method regarded as the best fitting method, but that the iterations of ML computation do not necessarily converge in the presence of large noise.
In this paper, we propose a method of scale-invariant edge detection that represents edge images as polynomials in a scale parameter using spectral decomposition (generalized PCA), in order to obtain an optimal local scale. As this proposed method is successfully able to estimate the local scale of each pixel, accurate scale-invariant edge amplitudes and directions can be obtained. Our experimental results show that the proposed method detects rough edge contours in indistinct parts and detailed contours in the clarified parts of test images.
We focus on gait recognition for criminal investigation. In criminal investigation, person authentication is performed by comparing target data at the crime scene with multiple gait data whose views differ slightly from that of the target data. For this task, we propose fusion of direct cross-view matching. Cross-view matching generally produces worse results than same-view matching when view-variant features are used. However, the correlation between cross-view matching results with different view pairs is low, so fusing them improves accuracy. Experimental results on a large-scale dataset, under settings resembling actual criminal investigation cases, show that the proposed approach works well.
This paper addresses the problem of binary coding of real vectors for efficient similarity computations. It has been argued that an orthogonal transformation of center-subtracted vectors followed by the sign function produces binary codes that preserve similarities in the original space well, especially when the orthogonally transformed vectors have a covariance matrix with equal diagonal elements. We propose a simple hashing algorithm that can orthogonally transform an arbitrary covariance matrix into one with equal diagonal elements. We further extend this method to make the projection matrix sparse, which yields faster coding. We demonstrate that the proposed methods achieve a level of similarity preservation comparable to that of existing methods.
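The coding pipeline described above (center, orthogonally transform, take signs) can be illustrated with a minimal two-dimensional sketch. It is not the proposed hashing algorithm: it assumes 2-D data, where a single rotation angle is enough to equalize the two diagonal elements of the covariance matrix, and all function names are hypothetical.

```python
import math

def center(points):
    """Subtract the mean from every 2-D point."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(p[0] - mx, p[1] - my) for p in points]

def covariance(centered):
    """Return (var_x, var_y, cov_xy) of zero-mean 2-D points."""
    n = len(centered)
    a = sum(x * x for x, _ in centered) / n
    b = sum(y * y for _, y in centered) / n
    c = sum(x * y for x, y in centered) / n
    return a, b, c

def balancing_angle(a, b, c):
    """Rotation angle that makes both rotated variances equal.

    The variance difference after rotating by theta is
    cos(2*theta)*(a - b) - 2*sin(2*theta)*c, which vanishes for this angle.
    """
    return 0.5 * math.atan2(a - b, 2.0 * c)

def rotate(centered, theta):
    """Apply the orthogonal (rotation) transform to every point."""
    cs, sn = math.cos(theta), math.sin(theta)
    return [(cs * x - sn * y, sn * x + cs * y) for x, y in centered]

def binary_codes(points):
    """Center, rotate to equal variances, then binarize by sign."""
    centered = center(points)
    a, b, c = covariance(centered)
    rotated = rotate(centered, balancing_angle(a, b, c))
    codes = [(int(u >= 0), int(v >= 0)) for u, v in rotated]
    return rotated, codes

# Illustrative data with very unequal variances along the two axes.
points = [(4.0, 0.2), (-3.5, -0.1), (2.0, 0.4), (-2.5, -0.5), (0.0, 0.0)]
rotated, codes = binary_codes(points)
var_u, var_v, _ = covariance(rotated)
```

After the rotation, `var_u` and `var_v` agree up to rounding, so each bit of the code carries a comparable share of the data variance.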
We present a method for enhancing the color recognition ability of dichromats. Whereas trichromats (people with typical color vision) recognize all colors in a 3-D color space, dichromats only recognize colors on a degenerate 2-D subspace of it. Our method compensates for the information lost along the degenerate direction in the color space by varying the amount of noise in the image. Dichromats perceive the lost color information as noisy textures, while the original color information for trichromats is preserved. Our method is applicable not only to artificial figures such as graphs but also to natural photographs. We show the effectiveness of our method by experiments.
In image retrieval applications, the Fisher vector of the Gaussian mixture model (GMM) with a diagonal-covariance structure is known as a powerful tool to describe an image by aggregating local descriptors extracted from the image. In this paper, we propose the Fisher vector of the GMM with a full-covariance structure. The closed-form approximation of the GMM with a full-covariance structure is derived. Our observation is that the Fisher vector of a higher dimensional GMM yields higher image retrieval performance. The Fisher vector for the GMM with a block-diagonal-covariance structure is also introduced to provide moderate dimensionality for the GMM. Experimental comparisons performed using two major datasets demonstrate that the proposed Fisher vector outperforms state-of-the-art algorithms.
Motion estimation and segmentation pose challenges in dynamic scenarios where multiple motions are mixed and interdependent. However, existing approaches based on the 2D motion field usually require the mixed motions to be independent. Algorithms incorporating 3D information have proven to be superior to purely 2D approaches in many studies. Inspired by this idea, we propose a new algorithm for evolving 3D potential surfaces using Helmholtz decomposition to represent the 2D motion field. In addition, a surface segmentation scheme is introduced to put different motions onto different layers, so that interdependent motions can be separated and recovered efficiently. Unlike other approaches, our method does not require prior knowledge of the motion model. The performance is demonstrated using real data under various complex scenarios.
We propose a method for directly estimating a square grid ground surface from stereo images. We estimate the heights of all vertices in a square mesh, in which each square is divided into two triangular patches, drawn on a level plane of the ground, from a pair of images captured by nearly front-looking stereo cameras. We formulate a data term, representing the sum of the squared differences of photometrically transformed pixel values in homography-related projective triangular patches between the two stereo images, by the inverse compositional trick for both surface and photometric parameters for realizing an efficient estimation algorithm. The main difficulty of this problem formulation lies in the estimation instability for the heights of the distant vertices from the cameras, since the image projections of the distant triangular patches are crushed in the images. We effectively improve the stability by the combinational use of an additional smoothness term, update constraint term, and a hierarchical meshing approach. We demonstrate the validity of the proposed method through experiments using real images, and the usability for mobile robots by showing traversable area detection results on the ground surfaces estimated by the proposed method.
This paper is aimed at presenting a new algorithm for full 3D shape reconstruction and online free-viewpoint rendering of objects in water. The key contributions are (1) a new calibration model for the refractive projection, and (2) a new 3D shape reconstruction algorithm based on shape-from-silhouette (SfS) concept. We also propose an online free-viewpoint rendering system as a practical application.
We propose a new method that efficiently and accurately estimates the parameters of the Gaussian function that describes the given local image profiles. The Gaussian function is non-linear with respect to the parameters to be estimated, and this non-linearity makes their efficient and accurate estimation difficult. In our proposed method, the weighted integral method is introduced to linearize the parameter estimation problem: A system of differential equations is firstly derived that is satisfied by the Gaussian function and that is linear with respect to the parameters. The system is then converted to that of integral equations. Given a local sub-window of the image, one can obtain the system of integral equations and estimate the parameters of the Gaussian that describe the appearance in the sub-window by solving the linear system of the parameters. Experimental results showed that our proposed method estimates the parameters more efficiently and accurately than existing state-of-the-art methods.
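The linearization idea can be sketched in one dimension. A Gaussian f(x) = A exp(-(x - mu)^2 / (2 sigma^2)) satisfies sigma^2 f'(x) + x f(x) - mu f(x) = 0, which is linear in (sigma^2, mu). Multiplying by a weight w(x) that vanishes at the window ends and integrating by parts removes the derivative of f, giving sigma^2 ∫w' f + mu ∫w f = ∫w x f. The sketch below is an illustration under stated assumptions, not the paper's implementation: it is 1-D rather than a 2-D image window, the two polynomial weights are an arbitrary choice, and the function names are hypothetical. The amplitude A cancels out of the equation and is not estimated here.

```python
import math

def trapezoid(values, h):
    """Composite trapezoidal rule on equally spaced samples."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

def fit_gaussian_1d(xs, fs):
    """Estimate (mu, sigma) from equally spaced samples of a Gaussian profile."""
    a, b = xs[0], xs[-1]
    h = (b - a) / (len(xs) - 1)
    # Two weights that vanish at both window ends, paired with their derivatives.
    weights = [
        (lambda x: (x - a) * (b - x), lambda x: a + b - 2 * x),
        (lambda x: (x - a) ** 2 * (b - x), lambda x: (x - a) * (2 * b + a - 3 * x)),
    ]
    rows = []
    for w, dw in weights:
        i1 = trapezoid([dw(x) * f for x, f in zip(xs, fs)], h)    # integral of w' f
        i0 = trapezoid([w(x) * f for x, f in zip(xs, fs)], h)     # integral of w f
        ix = trapezoid([w(x) * x * f for x, f in zip(xs, fs)], h) # integral of w x f
        rows.append((i1, i0, ix))
    # Solve the 2x2 linear system  var * i1 + mu * i0 = ix  by Cramer's rule.
    (p1, q1, r1), (p2, q2, r2) = rows
    det = p1 * q2 - q1 * p2
    var = (r1 * q2 - q1 * r2) / det
    mu = (p1 * r2 - r1 * p2) / det
    return mu, math.sqrt(var)

# Synthetic noise-free profile on the window [-2, 2] (illustrative values).
xs = [-2.0 + 4.0 * i / 2000 for i in range(2001)]
fs = [2.0 * math.exp(-(x - 0.3) ** 2 / (2 * 0.8 ** 2)) for x in xs]
mu_hat, sigma_hat = fit_gaussian_1d(xs, fs)
```

No iterative nonlinear optimization is involved: the parameters come from a single linear solve, and the relation holds even though the window truncates the Gaussian tails.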
When we are watching videos, there are spatiotemporal gaps between where we look (points of gaze) and what we focus on (points of attentional focus), which result from temporally delayed responses or anticipation in eye movements. We focus on the underlying structure of those gaps and propose a novel learning-based model to predict where humans look in videos. The proposed model selects a relevant point of focus in the spatiotemporal neighborhood around a point of gaze, and jointly learns its salience and spatiotemporal gap with the point of gaze. It tells us “this point is likely to be looked at because there is a point of focus around the point with a reasonable spatiotemporal gap.” Experimental results with a public dataset demonstrate the effectiveness of the model to predict the points of gaze by learning a particular structure of gaps with respect to the types of eye movements and those of salient motions in videos.
In this paper, we propose a novel local image descriptor, DoP, defined as the difference of images represented by polynomials of different degrees. Once an interest point/region is extracted by a common detector such as the Harris corner detector, our DoP descriptor is able to characterize the interest point/region with high distinctiveness, compactness, and robustness to viewpoint change, image blur, and illumination variation. To build the DoP descriptor efficiently, we reduce the computational cost by skipping repeated calculation of the polynomial representation. Our experimental results demonstrate better performance than several state-of-the-art methods.
Face recognition is a multi-class classification problem that has long attracted many researchers in the community of image analysis. We consider using the Mahalanobis distance for the task. Classically, the inverse of a covariance matrix has been chosen as the Mahalanobis matrix, a parameter of the Mahalanobis distance. Modern studies often employ machine learning algorithms called metric learning to determine the Mahalanobis matrix so that the distance is more discriminative, although they resort to eigen-decomposition requiring heavy computation. This paper presents a new metric learning algorithm that finds discriminative Mahalanobis matrices efficiently without eigen-decomposition, and shows promising experimental results on real-world face-image datasets.
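For reference, the classical choice mentioned above, in which the Mahalanobis matrix is the inverse of a covariance matrix, can be sketched as follows. This shows only the distance itself, d^2(x, y) = (x - y)^T M (x - y), not the proposed metric learning algorithm; the 2-D setting, the example covariance, and the function names are assumptions for illustration.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis_sq(x, y, mahalanobis_matrix):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * mahalanobis_matrix[i][j] * diff[j]
               for i in range(n) for j in range(n))

# Hypothetical covariance: variance 4 along the first axis, 1 along the second.
covariance = [[4.0, 0.0], [0.0, 1.0]]
M = inverse_2x2(covariance)

d_along_wide_axis = mahalanobis_sq([2.0, 0.0], [0.0, 0.0], M)
d_along_narrow_axis = mahalanobis_sq([0.0, 2.0], [0.0, 0.0], M)
```

The same Euclidean offset counts for less along the high-variance axis than along the low-variance one. Metric learning replaces `M` with a matrix chosen to make the distance more discriminative.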
In this paper, we propose a novel method for comparing the shapes of similar objects. From the viewpoint of linear algebra, we turn this identifiable region detection problem into a search for low-rank submatrices, and solve it with biclustering. Compared with traditional cluster analysis, our method looks for structural information in both the object index and local shape dimensions, which leads to more detailed local comparison results. The proposed method is evaluated on real-world data with satisfactory results, which verifies its effectiveness.
This paper proposes a novel gait feature representation that describes the characteristics of a walking person well from the perspective of a range sensor. Most existing methods for gait feature extraction use a sequence of silhouettes as their input, so they inevitably suffer from the difficulty of silhouette extraction in real scenes and from changes of view direction, which prevent them from being applied in practice. The proposed method, on the other hand, does not require such accurate segmentation, and is not affected by view change since captured range data contain three-dimensional information. In addition, our method can explicitly separate dynamic features from static ones, e.g., body shape, which has not been realized before. Experimental results of gait authentication show its effectiveness.
MR image fusion is desired in various image-guided breast surgeries. However, it often suffers from the difficulty of dealing with large deformation of the breast. This paper presents a novel method for efficiently modeling and inferring the physical parameters, including gravity, Young's modulus, and Poisson's ratio, which are important elements for handling the biomechanical deformations of the breast with a finite element model. Our method consists of two major steps: 1) deformation modeling and 2) non-rigid registration. The former builds a deformable implicit polynomial (DIP) model to encode the physical parameters according to deformation. The latter quickly registers the prior DIP to the online breast image so that image fusion can be achieved. Experimental results demonstrate the good performance of our method.
We propose an outdoor photometric stereo method, which considers environmental lighting for improving the performance of surface normal estimation. In the previous methods, the sky illumination effect has been either assumed to be constant throughout the scene, or to be removed by pre-processing. This paper exploits a sky model which can derive the entire sky luminance from a sky zenith image; then, sky illumination effect on a surface can be correctly calculated by iteratively refining its normal direction. This paper also extends a two-source photometric stereo method by introducing RANSAC, so that the input images of this method can be taken in a day. Experimental results with real outdoor objects show the effectiveness of the method.
This paper presents a method for improving the accuracy of template-based planar tracking. It has been shown that when the ROI of the input image has a lower resolution than the template, tracking accuracy deteriorates, and that this can be remedied by blurring the template in response to the motion of the plane. In this study, we show that, conversely, when the template has a lower resolution than the input image, tracking accuracy deteriorates in a different manner. We then present a method that can simultaneously deal with both cases and thus achieves higher tracking accuracy.
In this work, we propose a novel method for modeling and synthesizing object appearance based on planned sampling. The proposed method can efficiently model the BRDF of an object with uniform and isotropic reflectance using a small number of light source directions. This is achieved by combining knowledge of the object's shape with the statistics of various BRDFs. The method considers the shape of the object, a compact basis representing variations in a reflectance dataset, a fixed view direction, and all possible light source directions around the object. Then, using an iterative optimization process that simulates the contribution of each light source to modeling the object appearance, our method identifies the most suitable set of light source directions for efficiently modeling the BRDF of the object's material. The selected light sources are then used to acquire actual images of the object for recovering its reflectance properties. Experiments conducted using several objects with varying shapes and a small number of light sources optimally selected by the method validate the effectiveness of the proposed approach in modeling object appearance.
The full-dimensional (8-D) BSSRDF completely expresses the various light interactions on an object's surface, such as reflection and subsurface scattering. However, it is difficult to sample the full-dimensional BSSRDF because doing so requires many illuminations and observations from every direction. Many studies have approximated the BSSRDF as a low-dimensional function by treating the medium as homogeneous or by assuming isotropic scattering. In contrast, in this paper, we present a novel sampling and analysis method for the full-dimensional BSSRDF in real scenes. We sample the full-dimensional BSSRDF using a polyhedral mirror system that places many virtual cameras and projectors. In addition, we propose a method for decomposing the BSSRDF into isotropic and anisotropic components for scattering analysis. We show the empirical characteristics of subsurface scattering inside a real medium by analyzing the sampled full-dimensional BSSRDF.
Image recognition in a client-server system faces the problem of data traffic. However, reducing data traffic degrades recognition performance. Therefore, we represent high-dimensional local features as binary codes on the client side and as real vectors on the server side. This suppresses the degradation in performance, but it introduces two problems: a difference in norm scale between the two kinds of feature vectors, and an increase in the computational cost of the distance computation. To solve the first problem, we optimize a scale factor so as to absorb the scale difference in the Euclidean norm. For the second problem, we compute the Euclidean distance efficiently by decomposing the real vector into weight factors and binary basis vectors. As a result, the proposed method achieves high-speed, high-precision keypoint matching even when the data traffic is reduced.
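The distance computation with binary basis vectors can be sketched as follows. If a real vector is approximated as x ≈ Σ w_i b_i with b_i in {-1, +1}^d, then for a binary query q in {-1, +1}^d and scale factor s, ||s q - x||^2 = s^2 d - 2 s Σ w_i (q · b_i) + ||x||^2, and each q · b_i equals d minus twice a Hamming distance. The greedy decomposition, the fixed scale value, and all names below are assumptions for illustration and are not taken from the paper.

```python
def decompose(x, num_bases):
    """Greedily approximate x by sum_i weights[i] * bases[i], bases in {-1,+1}^d."""
    residual = list(x)
    weights, bases = [], []
    for _ in range(num_bases):
        basis = [1 if r >= 0 else -1 for r in residual]
        weight = sum(abs(r) for r in residual) / len(residual)
        weights.append(weight)
        bases.append(basis)
        residual = [r - weight * b for r, b in zip(residual, basis)]
    return weights, bases

def pack(signs):
    """Pack a {-1,+1} vector into an integer bit mask (+1 -> bit set)."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits

def reconstruct(weights, bases):
    """Real vector represented by the decomposition."""
    dim = len(bases[0])
    return [sum(w * b[k] for w, b in zip(weights, bases)) for k in range(dim)]

def fast_squared_distance(query_bits, scale, weights, packed_bases, norm_sq, dim):
    """||scale*q - x_hat||^2 using XOR and popcount for every dot product."""
    dot_sum = 0.0
    for w, pb in zip(weights, packed_bases):
        hamming = bin(query_bits ^ pb).count("1")
        dot_sum += w * (dim - 2 * hamming)  # q . b_i = d - 2 * Hamming(q, b_i)
    return scale * scale * dim - 2.0 * scale * dot_sum + norm_sq

# Server side: a real feature vector, decomposed once and stored (illustrative values).
x = [0.9, -0.4, 0.1, -1.2, 0.6, 0.3, -0.7, 0.2]
weights, bases = decompose(x, num_bases=3)
x_hat = reconstruct(weights, bases)
packed_bases = [pack(b) for b in bases]
norm_sq = sum(v * v for v in x_hat)

# Client side: a binary code; the scale factor here is a hypothetical fixed value.
q = [1, -1, 1, -1, 1, 1, -1, -1]
scale = 0.5
fast = fast_squared_distance(pack(q), scale, weights, packed_bases, norm_sq, len(x))
direct = sum((scale * qi - xi) ** 2 for qi, xi in zip(q, x_hat))
```

Here `fast` matches `direct` up to rounding, while the per-query work reduces to a few XOR and popcount operations per basis vector. Using more basis vectors brings `x_hat` closer to `x` at the cost of more popcounts.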
This paper discusses object detection based on spatio-temporal light field sensing. Our proposed method generates an arbitrary in-focus plane in the surveillance scene, and the background region can be filtered out by defocusing. A new feature representation, called Local Ray Pattern (LRP), is introduced to evaluate the spatial consistency of light rays. The combination of LRP and GMM-based background modeling realizes object detection on the in-focus plane. Experimental results demonstrate the effectiveness and applicability of the method for video surveillance.
A new method for estimating a six-degrees-of-freedom camera pose for a ground-view image using reference points on an aerial image is presented. Unlike typical PnP problems, altitude information is not available for the reference points in our case. The camera pose is estimated by minimizing a cost function defined as the sum of squared distances between observed 2D positions of reference points on a ground-view image and corresponding lines that are projections of 3D vertical lines passing through 2D reference points on an aerial image. The accuracy of the proposed method is evaluated quantitatively in both simulation and real environments. The applicability of the proposed method is demonstrated by generating AR images from aerial and ground-view images downloaded from Google Maps and Flickr.
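The cost function described above can be sketched for a fixed set of projected lines. In the actual problem each line depends on the camera pose being optimized. Here the lines are given directly in the form a x + b y + c = 0 as hypothetical inputs, and the function names are illustrative only.

```python
import math

def point_line_distance(point, line):
    """Distance from a 2-D point (x, y) to the line a*x + b*y + c = 0."""
    a, b, c = line
    x, y = point
    return abs(a * x + b * y + c) / math.hypot(a, b)

def point_to_line_cost(points, lines):
    """Sum of squared distances between each observed point and its paired line."""
    return sum(point_line_distance(p, l) ** 2 for p, l in zip(points, lines))

# Hypothetical observations on the ground-view image and their projected lines.
observed_points = [(3.0, 4.0), (0.0, 0.0)]
projected_lines = [(1.0, 0.0, -1.0),   # the line x = 1
                   (0.0, 1.0, -2.0)]   # the line y = 2
cost = point_to_line_cost(observed_points, projected_lines)
```

Both points lie at distance 2 from their lines, so the cost is 8. A pose estimator would recompute the lines for each candidate pose and minimize this quantity with a nonlinear least-squares solver.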
Sensing the 3D shape of a dynamic scene is not a trivial problem, but it is useful for various applications. Recently, sensing systems have been improved and are now capable of high sampling rates. However, particularly for dynamic scenes, there is a limit to improving the resolution at high sampling rates. In this paper, we present a method for improving the resolution of a 3D shape reconstructed from multiple range images acquired from a moving target. In our approach, the alignment and surface estimation problems are solved in a simultaneous estimation framework. Together with the use of an adaptive multi-level implicit surface for shape representation, this allows us to calculate the alignment by using shape features and surface estimation according to the amount of movement of the point clouds for each range image. By doing so, this approach realized simultaneous estimation more precisely than a scheme involving mere alternating estimation of shape and alignment. We present results of experiments for evaluating the reconstruction accuracy with different point cloud densities and noise levels.
In this paper, we propose a new, effective, and unified scoring method for local feature-based image retrieval. The proposed scoring method is derived by solving the large-scale image retrieval problem as a classification problem with a large number of classes. The resulting proposed score is based on the ratio of the probability density function of an object model to that of a background model, which is efficiently calculated via nearest neighbor density estimation. The proposed method has the following desirable properties: (1) has a sound theoretical basis, (2) is more effective than inverse document frequency-based scoring, (3) is applicable not only to quantized descriptors but also to raw descriptors, and (4) is easy and efficient in terms of calculation and updating. We show the effectiveness of the proposed method empirically by applying it to a standard and improved bag-of-visual words-based framework and a k-nearest neighbor voting framework.
This paper describes the first gait verification system for criminal investigation using footage from surveillance cameras. The system is designed so that criminal investigators, who are non-specialists in computer vision-based gait verification, can independently use it to verify unknown perpetrators as suspects or ex-convicts in criminal investigations. Each step of the gait verification process is carried out by interactive operation on a graphical user interface. Finally, for each pair of compared subjects selected by a user, the system outputs a posterior probability that the compared subjects are the same person, taking into account various circumstances of the subjects such as the size, frame rate, observation views, and clothing. One gait specialist and ten non-specialists participated in operation tests of the system using five different datasets with various types of scenes, each of which contained two or three verification sets. It was shown that all the non-specialists, as well as the gait specialist, could obtain reasonable verification results for almost all of the verification sets.
This paper presents a novel method for detecting 3D road boundaries, such as walls, guardrails, and curbs, using on-board stereo cameras. The proposed method uses conformal geometric algebra, which can describe different shapes in a common representation. 3D road boundaries on straight and curved roads are seamlessly detected by use of this representation, and this framework is also applied to curb detection by a subtle modification. Experimental results show that despite its algorithmic simplicity, the proposed method exhibited competitive detection performance compared with conventional model fitting and curb detection methods.