Probabilistic classification and multi-task learning are two important branches of machine learning research. Probabilistic classification is useful when the ‘confidence’ of a decision is needed, while multi-task learning is beneficial when multiple related learning tasks exist. So far, kernelized logistic regression has been a vital probabilistic classifier for use in multi-task learning scenarios. However, its training tends to be computationally expensive, which has prevented its use in large-scale problems. To overcome this limitation, we propose to employ a recently proposed probabilistic classifier, the least-squares probabilistic classifier, in multi-task learning scenarios. Through image-classification experiments, we show that our method achieves classification performance comparable to the existing method with much less training time.
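The least-squares probabilistic classifier fits a kernel model of the class-posterior probability by regularized least squares, so training reduces to solving one linear system per class rather than running iterative optimization as in kernelized logistic regression. A minimal single-task sketch in NumPy (the kernel width, regularization constant, and function names are our own illustration, not the paper's code):

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise Gaussian kernel matrix between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lspc_fit(X, y, n_classes, sigma=1.0, lam=0.1):
    # One regularized least-squares problem per class:
    #   alpha_c = (K^T K + lam I)^{-1} K^T pi_c,  pi_c = one-hot labels.
    K = gaussian_kernel(X, X, sigma)
    A = K.T @ K + lam * np.eye(len(X))
    Pi = np.eye(n_classes)[y]              # (n, n_classes) one-hot matrix
    Alpha = np.linalg.solve(A, K.T @ Pi)   # solve for all classes at once
    return Alpha

def lspc_predict_proba(X_train, Alpha, X_test, sigma=1.0):
    # Evaluate the kernel model, clip negatives, normalize to posteriors.
    K = gaussian_kernel(X_test, X_train, sigma)
    Q = np.maximum(K @ Alpha, 0)
    return Q / Q.sum(axis=1, keepdims=True)
```

Because the solution is a single linear solve, training cost is essentially that of one matrix factorization, which is the source of the speedup over kernelized logistic regression.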
A method for detecting moving objects using a Markov random field (MRF) model, based on background subtraction, is proposed. We aim to overcome two major drawbacks of existing methods: dynamic background changes such as swaying trees and camera shake tend to yield false positives, and similar colors in objects and their backgrounds tend to yield false negatives. One characteristic of our method is background subtraction using the nearest-neighbor method with multiple background images, to cope with dynamic backgrounds. Another is the estimation of object movement, which provides robustness against similar colors in objects and background regions. From the MRF viewpoint, we define an energy function that incorporates these characteristics and optimize it by graph cut. In most of our experiments, the proposed method runs in (near) real time, and the results show favorable detection performance even in difficult cases where previous methods have failed.
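The nearest-neighbor background-subtraction term can be sketched as follows: a pixel is labeled foreground only if it is far from all of the stored background samples, which is what makes multiple samples of a swaying tree or a shaking camera count as background. (Array shapes and the threshold below are illustrative assumptions; the paper combines this term with motion estimation in an MRF energy minimized by graph cut.)

```python
import numpy as np

def nn_foreground_mask(frame, backgrounds, thresh=30.0):
    """Per-pixel nearest-neighbor background subtraction.

    frame: (H, W, 3) float image; backgrounds: (K, H, W, 3) stack of
    background samples.  A pixel is foreground only if its color is
    far from *all* K background samples at that position.
    """
    # Euclidean color distance from the pixel to each background sample.
    d = np.linalg.norm(backgrounds - frame[None], axis=-1)   # (K, H, W)
    return d.min(axis=0) > thresh                            # nearest neighbor
```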
We propose a fast face-detection method, called cascade step search (CSS), that exploits the characteristics of a cascade classifier. The proposed method has two features. The first is a gradual classification that uses only the first few layers of the cascade classifier to estimate the face-likelihood distribution. The second is an efficient search guided by that distribution: the search proceeds at intervals that are optimally adapted according to the likelihood, which reduces the number of sub-windows that must be processed. The face likelihood at an image window also indicates the likelihood of nearby windows and of windows at the next scale. These features reduce the classification cost of the face-detection process. Our face-detection experiments show that the proposed method is about five times faster than traditional search while maintaining a high detection rate.
When a camera moves during its exposure time, the captured image is degraded by the motion. Despite several decades of research, image deconvolution to restore a blurred image remains an open issue, particularly in blind deconvolution, in which the actual shape of the blur is unknown. Cepstral approaches have been used to estimate linear motion blur. In this paper, we propose a method for estimating the Point Spread Function (PSF) from a single blurred image. We extend the classical cepstral approaches, which have been used for Uniform Linear Motion PSF estimation. Focusing on Uniform Non-Linear Motion (UNLM), i.e., motion that proceeds in one direction and may curve slightly, we recast the PSF estimation problem as a camera-path estimation problem. To resolve this ill-posed problem, we derive a constraint on the behavior of the cepstra of UNLM PSFs. First, we estimate several PSF candidates from the cepstrum of the blurred image. Then, we select the best candidate by evaluating the ringing artifacts in the images restored with each candidate. The performance of the proposed method is verified using both synthetic and real images.
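The cepstral analysis underlying this family of methods can be sketched as follows: the real cepstrum of an image is the inverse Fourier transform of its log-magnitude spectrum, and for uniform linear motion blur it exhibits a pronounced negative peak at a distance equal to the blur length. This is a generic sketch of the classical cepstral approach, not the paper's UNLM extension:

```python
import numpy as np

def image_cepstrum(img, eps=1e-8):
    """Real cepstrum of an image: C = IFFT(log |FFT(img)|).

    For a uniform linear motion blur of length L, the cepstrum of the
    blurred image shows a strong negative peak at distance L from the
    origin, revealing the blur length and direction.
    """
    spectrum = np.fft.fft2(img)
    log_mag = np.log(np.abs(spectrum) + eps)  # eps guards against log(0)
    return np.real(np.fft.ifft2(log_mag))
```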
The Internet has become an unprecedented source of visual information about our world, with millions of people uploading photos and videos to media-sharing sites at staggering rates. Virtually all of the world's famous landmarks and cities (and many not-so-famous ones) have been photographed hundreds of thousands or millions of times, and billions of these photos can be found on photo-sharing websites. This richness and variety make such Internet photo collections extremely attractive as a source of data for applications ranging from mapping and visualization to social science. However, a prerequisite to many such applications is recovering structure — often in the form of 3D geometry — from these massive, unorganized collections of imagery. This ever-growing collection of visual data opens up fundamental new questions in computer vision and computer graphics, where traditional techniques designed for small, controlled sets of images cannot be readily applied. This article surveys recent work on applying geometric computer vision to large, unstructured photo collections, as well as applications enabled by these new techniques in scene visualization, location recognition, image editing, and other areas of computer vision and graphics.
We present a new algorithm for optimally computing, from point correspondences over two images, the 3-D positions of points constrained to lie on a planar surface. We consider two cases: one in which the plane and camera parameters are known, and one in which they are not. In the former, we show how observed point correspondences are optimally corrected so that they are compatible with the homography between the two images. In the latter, we show how the homography is optimally estimated by iteratively applying the triangulation procedure. Although the accuracy improvement over existing methods is very small, our algorithm has the theoretical merit of computing an exact maximum likelihood solution.
We present a general framework for a special type of least squares (LS) estimator, which we call “HyperLS,” for parameter estimation problems that frequently arise in computer vision applications. It minimizes the algebraic distance under a special scale normalization, derived by a detailed error analysis so that statistical bias is removed up to second-order noise terms. We discuss in detail the many theoretical issues involved in its derivation. Through numerical experiments, we show that HyperLS is far superior to standard LS and comparable in accuracy to maximum likelihood (ML), which is known to produce highly accurate results but may fail to converge if poorly initialized. We conclude that HyperLS is a perfect candidate for ML initialization.
Low-level vision encompasses a wide variety of problems and solutions. Solutions to low-level problems can be broadly grouped according to how they propagate local information to global representations. Understanding these categorizations is useful because they offer guidance on how tools such as machine learning can be incorporated into these systems.
Object recognition can be performed on local or global features. While local features are more robust to occlusions, global features are more powerful for distinguishing among many objects. In this paper, we propose a novel approach to constructing a shape model from local features, aiming to achieve the high discriminative power of global features while keeping the robustness of local features. We utilize a common reference point that expresses the relative positions of local features, as in a star-graph representation. This model is computed dynamically during recognition, which makes it flexible. With our approach, we achieve recognition performance improved by 2% compared with other shape models, and even by 6% compared with approaches that do not utilize shape information.
We propose a new method to analyze scattered light transport in homogeneous translucent media. The incident light undergoes multiple bounces in the medium and produces a complex light field. Our method analyzes the light transport in two steps. First, single and multiple scattering are separated by projecting high-frequency stripe patterns. Then, the light field for each scattering bounce is recursively estimated based on a forward rendering process. Experimental results show that scattered light fields can be analyzed and visualized for each bounce.
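The stripe-pattern separation step resembles the well-known fast direct/global separation: under shifted high-frequency patterns in which roughly half the pixels are lit, the per-pixel maximum and minimum over the patterns yield the direct and global components, which here correspond to single and multiple scattering. A sketch under that assumption (our reading of the first step, not the paper's exact formulation):

```python
import numpy as np

def separate_direct_global(images):
    """Fast direct/global separation under shifted high-frequency
    stripe illumination (a simplified sketch).

    images: (N, H, W) captures under N shifted stripe patterns, each
    lighting about half the pixels.  Per pixel:
        max over patterns = direct + global / 2
        min over patterns = global / 2
    """
    i_max = images.max(axis=0)
    i_min = images.min(axis=0)
    direct = i_max - i_min        # single-scattering component
    global_ = 2.0 * i_min         # multiple-scattering component
    return direct, global_
```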
This paper describes a method for periodic temporal super-resolution, namely, reconstructing a one-period image sequence with a high frame rate from a single quasi-periodic image sequence with a low frame rate. First, the periodic image sequence to be reconstructed is expressed as a manifold in the parametric eigenspace of the phase, i.e., period-normalized time. Given an input image sequence, sub-frame phase registration among its multiple periods is estimated. The phase registration and manifold reconstruction are executed alternately within an energy-minimization framework that considers data fitness and the smoothness of both the manifold and the phase evolution. The energy-minimization problem is solved through a three-step coarse-to-fine procedure to avoid local minima. The proposed method is evaluated through experiments using both simulated and real data, in terms of phase noise, the number of input frames, frame rate, spatial registration noise, and image noise.
An adaptive background model plays an important role in object detection in scenes with illumination changes, and updating the background model improves robustness against such changes. However, the update sometimes causes false negatives when a moving object stops in the observed scene: a paused object is gradually trained into the background because the observed pixel values are used directly for the model update. In addition, the original background hidden by the paused object cannot be updated, so if the illumination changes behind it, false positives occur when the object starts moving again. In this paper, we propose 1) a method to inhibit background training, to avoid the false negatives, and 2) a method to update the original background region occluded by a paused object, to avoid the false positives. We use a probabilistic approach and a predictive approach to the background model to solve these problems. The main contribution of this paper is that paused objects are kept from being trained into the background by modeling the original background hidden behind them; our approach can also adapt to various illumination changes. Our experimental results show that the proposed method detects stopped objects robustly, is robust to illumination changes, and is as efficient as the state-of-the-art method.
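The core inhibition idea, not blending detected-foreground pixels into the background model, can be sketched with a simple running-average model. This shows only the masking mechanism; the paper's actual model is probabilistic and predictive:

```python
import numpy as np

def update_background(bg, frame, fg_mask, alpha=0.05):
    """Running-average background update that inhibits training on
    detected foreground, so a paused object is not gradually absorbed
    into the background (a minimal sketch of the inhibition idea).

    bg, frame: (H, W) float arrays; fg_mask: (H, W) boolean mask.
    """
    bg = bg.astype(float)
    blended = (1 - alpha) * bg + alpha * frame
    # Update only where no foreground was detected; freeze elsewhere.
    return np.where(fg_mask, bg, blended)
```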
Human detection and action recognition form the basis for understanding human behavior. Human detection locates the positions of humans, and action recognition identifies the action of a specific human. However, most approaches handle action recognition and human detection separately. Three main issues arise when independent methods for human detection and action recognition are combined: 1) intrinsic errors in object detection degrade the performance of action recognition, 2) features common to action recognition and object detection are missed, and 3) the combination slows processing. We propose a single framework for human detection and action recognition that solves these issues. It is based on a hierarchical structure called Boosted Randomized Trees, whose nodes are trained such that the upper nodes separate humans from the background while the lower nodes recognize actions. With the proposed method, we improved both human detection and action recognition rates over earlier hierarchical-structure approaches.
This paper describes a method to determine the direction of a light source and the distribution of diffuse reflectance from two images taken under different lighting conditions. While most inverse-rendering methods require three or more images, we investigate the use of only two. Using the relationships between albedo and light direction at six or more points, we first show that both can be estimated simultaneously if the shape of the target object is given. We then extend the method to handle specular objects and shadow effects by applying a robust estimation method. Thorough experimentation shows that our method is feasible and stable not only for well-controlled indoor scenes but also for an outdoor environment illuminated by sunlight.
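Under the Lambertian model this line of work builds on, the intensity at a point with surface normal n_i and albedo ρ_i under a directional light l is I_i = ρ_i (n_i · l), so with known shape the light direction follows from a linear least-squares fit over several lit points. A simplified one-image sketch with known albedo (the paper estimates albedo and light direction jointly from two images, with robust handling of specularities and shadows):

```python
import numpy as np

def estimate_light_direction(intensities, normals, albedo):
    """Least-squares light-direction estimate from the Lambertian model
    I_i = albedo_i * (n_i . l), assuming known normals and albedo and
    that all points are lit (n_i . l > 0).
    """
    A = albedo[:, None] * normals          # rows: albedo_i * n_i
    l, *_ = np.linalg.lstsq(A, intensities, rcond=None)
    return l / np.linalg.norm(l)           # return a unit direction
```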
We propose a novel framework called StochasticSIFT for detecting interest points (IPs) in video sequences. The proposed framework incorporates a stochastic model considering the temporal dynamics of videos into the SIFT detector to improve robustness against fluctuations inherent to video signals. Instead of detecting IPs and then removing unstable or inconsistent IP candidates, we introduce IP stability derived from a stochastic model of inherent fluctuations to detect more stable IPs. The experimental results show that the proposed IP detector outperforms the SIFT detector in terms of repeatability and matching rates.
Object detection is an important task in computer vision applications, and many methods have been proposed to detect objects through background modeling. To adapt to illumination changes in the background, local-feature-based background models have been proposed under the assumption that local features are not affected by background changes. However, motion changes, such as the movement of trees, significantly affect local features in the background, so it is difficult for local-feature-based models to handle them. To solve this problem, we propose a new background model that applies a statistical framework to a local-feature-based approach, combining the two concepts in a single framework. Specifically, we use illumination-invariant local features and describe their distribution with Gaussian mixture models (GMMs). The local features tolerate the effects of illumination changes, while the GMMs learn the variety of motion changes; as a result, the method handles both types of background change. Experimental results show that the proposed method detects foreground objects robustly against both illumination changes and motion changes in the background.
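The statistical half of such a framework can be sketched as a per-pixel online Gaussian mixture over a scalar local feature, in the style of Stauffer and Grimson. The initial means and variances, learning rate, and matching threshold below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

class PixelGMM:
    """Per-pixel online Gaussian mixture over a scalar feature value
    (a simplified Stauffer-Grimson-style sketch)."""

    def __init__(self, n_modes=3, lr=0.05, match_sigma=2.5):
        self.w = np.full(n_modes, 1.0 / n_modes)    # mode weights
        self.mu = np.linspace(0.0, 255.0, n_modes)  # mode means
        self.var = np.full(n_modes, 400.0)          # mode variances
        self.lr, self.match_sigma = lr, match_sigma

    def update(self, x):
        """Update the mixture with feature x.  Returns True if x matched
        an existing mode (background variation already learned), False
        if it is a foreground candidate."""
        d = np.abs(x - self.mu) / np.sqrt(self.var)
        k = int(np.argmin(d))
        matched = bool(d[k] < self.match_sigma)
        self.w *= (1 - self.lr)
        if matched:
            self.w[k] += self.lr
            self.mu[k] += self.lr * (x - self.mu[k])
            self.var[k] += self.lr * ((x - self.mu[k]) ** 2 - self.var[k])
        else:
            j = int(np.argmin(self.w))  # replace the weakest mode with x
            self.mu[j], self.var[j], self.w[j] = x, 400.0, self.lr
        self.w /= self.w.sum()
        return matched
```

Repeatedly observing a new feature value (e.g., a swaying branch alternating with sky) installs a mode for it, which is how the mixture learns recurring motion changes as background.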
In histopathological diagnosis, a clinical pathologist discriminates between normal and cancerous tissue. However, the recent shortage of clinical pathologists has made it increasingly difficult to meet the demand for such diagnoses, which is becoming a serious social problem, and new medical technologies are needed to reduce pathologists' burdens. As a diagnostic support technology, this paper therefore describes an extended HLAC feature-extraction method for classifying histopathological images as normal or anomalous. The proposed method automatically classifies cancerous images as anomalous using extended geometric-invariant HLAC features with rotation- and reflection-invariant properties, computed from three-level histopathological images segmented into nucleus, cytoplasm, and background. In experiments, we demonstrate a reduction in the rate not only of false-negative errors but also of false-positive errors, in which a normal image is falsely classified as containing an anomaly suspected of being cancerous.
We propose a new imaging method called hemispherical confocal imaging to clearly visualize a particular depth in a 3-D scene. The key optical component is a turtleback reflector, a specially designed polyhedral mirror. To synthesize a hemispherical aperture, we combine the turtleback reflector with a coaxial camera and projector, creating many virtual cameras and projectors with uniform density on a hemisphere. In this optical device, high-frequency illumination can be focused at a particular depth in the scene so that, by descattering, only that depth is visualized. The observed views are then factorized into masking, attenuation, reflected-light, illuminance, and texture terms to enhance the visualization when obstacles are present. Experiments with a prototype system show that only the target depth is effectively illuminated, and that haze caused by scattering and attenuation can be removed even when obstacles are present.
Many deblurring techniques have been proposed to restore images blurred by camera motion. A major problem in the restoration process is that the deblurred images often include wave-like artifacts called ringing. In this paper, we propose a ringing detector that distinguishes ringing artifacts from natural textures in images. In designing the detector, we exploit the fact that ringing artifacts are caused by the null frequencies of the point-spread function: ringing is detected by evaluating whether the deblurred image contains sine waves at those null frequencies with uniform phase across the entire image. By combining the ringing detector with a deblurring process, we can reduce ringing artifacts in the restored images. We demonstrate the effectiveness of the proposed detector in experiments with synthetic and real images.
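The notion of null frequencies can be made concrete: they are the frequencies at which the blur's transfer function vanishes, so deconvolution cannot recover the signal there, and the residual sine waves at those frequencies appear as ringing. Locating the nulls of a 1-D PSF can be sketched as follows (function name and tolerance are our own; for a box blur of length L, the nulls fall at multiples of 1/L cycles per sample):

```python
import numpy as np

def null_frequencies_1d(psf, n=256, tol=1e-3):
    """Frequencies (cycles/sample) where the PSF's transfer function
    is (near) zero.  Ringing in a deblurred image consists of sine
    waves at exactly these frequencies.
    """
    otf = np.fft.rfft(psf, n)      # zero-padded transfer function
    freqs = np.fft.rfftfreq(n)
    return freqs[np.abs(otf) < tol]
```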