-
Jerry Jun Yokono, Tomaso Poggio
2009 Volume 129 Issue 5 Pages
806-811
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
Object recognition systems based on local descriptors are increasingly used because of their perceived robustness with respect to occlusions and to global geometrical deformations. Such a descriptor, based on a set of oriented Gaussian derivative filters, is used in our recognition system. In this paper, we explore multiview 3D object recognition and multiview face identification. The basic idea is to find discriminant features that describe an object across different views. A boosting framework is used to select features from a huge feature pool created by collecting the local features from the positive training examples. We conduct experiments on 3D objects and face images and obtain excellent recognition rates. A comparison to SVM is also noted in the paper.
-
Masayuki Yokoyama, Tomaso Poggio
2009 Volume 129 Issue 5 Pages
812-817
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
We propose a fast and robust approach to the detection of moving objects. Our method is based on lines computed by a gradient-based optical flow and an edge detector. While it is known among researchers that gradient-based optical flow and edges are well matched for accurate computation of velocity, not much attention has been paid to creating systems for detecting objects using this feature. In our method, edges extracted using optical flow and the edge detector are restored as lines, and background lines of the previous frame are subtracted. Contours of objects are obtained by applying snakes to the clustered lines. The experimental results on outdoor scenes show the fast and robust performance of our method. The computation time of our method is 0.089 s/frame on a 900 MHz processor.
-
Yoshitaka Moriguchi, Kazuhiro Hotta, Haruhisa Takahashi
2009 Volume 129 Issue 5 Pages
818-823
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
In this paper, a method for detecting asbestos in microscope images is proposed. Asbestos particles show different colors at two specific angles of the polarizing plate, so human examiners use the color information to detect asbestos. To detect asbestos by computer, we develop a detector based on a Support Vector Machine (SVM) applied to local color features. However, when it is applied to each pixel independently, there are many false positives and negatives because it does not use the relation with neighboring pixels. To take the relation with neighboring pixels into account, a Conditional Random Field (CRF) with SVM outputs is used. We confirm that the accuracy of asbestos detection is improved by using the relation with neighboring pixels.
-
Masanari Takaki, Hironobu Fujiyoshi
2009 Volume 129 Issue 5 Pages
824-831
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
The paper describes an algorithm for traffic sign recognition using SIFT features. The algorithm works in four steps. First, feature points are extracted from an input image. Secondly, the feature values of the feature points are computed by the SIFT descriptor. Thirdly, the final recognition result is determined by matching and voting of feature points. The experimental results indicate that 88.7% of the traffic signs are correctly identified under various environmental conditions.
-
Yuki Suzuyama, Kazuhiro Hotta, Haruhisa Takahashi
2009 Volume 129 Issue 5 Pages
832-837
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
This paper presents a method to estimate the prior probability of object appearance and position from context information alone. The context is extracted from a whole image by Gabor filters. The conventional method represented the context by a mixture of Gaussian distributions, and the prior probabilities of object appearance and position were estimated by a generative model. In contrast, we define the probability estimation of object appearance as a binary classification problem: whether or not an input image contains the specific object. A Support Vector Machine is used for this classification, and the distance from the hyperplane is transformed to a probability using a sigmoid function. We also define the estimation of object position in an image from the context alone as a regression problem. The position of the object in an image is estimated by Support Vector Regression. Experimental results show that the proposed method outperforms the conventional method.
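The step of turning a hyperplane distance into a probability can be illustrated with a minimal sketch. The function name and the slope and offset parameters `a` and `b` are assumptions made for illustration; the abstract does not state how the sigmoid is parameterized or fitted.

```python
import math


def svm_distance_to_probability(distance, a=1.0, b=0.0):
    """Map a signed distance from the SVM hyperplane to a value in (0, 1).

    a (slope) and b (offset) are hypothetical parameters; in practice they
    would be fitted on held-out data rather than fixed by hand.
    """
    return 1.0 / (1.0 + math.exp(-(a * distance + b)))


# Distances far on the positive side approach 1, far on the negative side 0.
for d in (-2.0, 0.0, 2.0):
    print(d, round(svm_distance_to_probability(d), 3))
```

With the default parameters, a sample lying exactly on the hyperplane maps to 0.5, and the mapping is monotonic in the distance.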
-
Masayuki Obata, Takeshi Nishida, Hidekazu Miyagawa, Fujio Ohkawa
2009 Volume 129 Issue 5 Pages
838-845
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
In this paper, an appearance-based image processing method is proposed for target detection, posture estimation, and tracking of 3D objects by a particle filter with an embedded Parametric Eigenspace Method. In this method, the computational cost of the Parametric Eigenspace Method and the tracking error can be greatly reduced by using the outputs of the previous time step and prior knowledge of the dynamics of target movement for the particle filter. Namely, since the particles of the particle filter can be generated in the direction in which the object will move by using the posture estimated by the Parametric Eigenspace Method in the prediction model, the accuracy of the state estimation of the object can be improved without increasing the number of particles. The proposed method therefore uses the outputs of the particle filter and the Parametric Eigenspace Method recursively and mutually for fast simultaneous execution of detection, posture estimation, and tracking of the targets. Furthermore, we demonstrate the validity of our approach through several experiments.
-
Atsushi Shimada, Rin-ichiro Taniguchi
2009 Volume 129 Issue 5 Pages
846-852
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
We propose a new method to create adaptive background models. Traditionally, each pixel has an adaptive background model which consists of Gaussian mixtures. Each model can approximate small changes and periodic changes of pixel values, which helps to detect moving objects. However, it cannot adapt to some illumination changes, such as gradually or precipitously varying illumination. Other models, such as those using texture or prediction of pixel values, are effective in handling these changes. Therefore, a hybrid background model that combines two or more kinds of models is desirable. In our approach, we use two different types of background model. One is the stochastic background model. The other is the predictive background model based on exponential smoothing.
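A predictive background model based on exponential smoothing can be sketched for a single pixel as follows. This is an illustration under stated assumptions, not the authors' formulation: it uses simple (first-order) exponential smoothing, and the smoothing factor `alpha` and the foreground `threshold` are hypothetical values.

```python
def update_prediction(prev_prediction, observed, alpha=0.5):
    """Simple exponential smoothing: blend the new observation into the prediction."""
    return alpha * observed + (1.0 - alpha) * prev_prediction


def is_foreground(prediction, observed, threshold=30.0):
    """Flag a pixel whose observed value is far from the predicted background value."""
    return abs(observed - prediction) > threshold


# Intensity of one pixel over time: a slow illumination drift, then a sudden jump.
pixel_values = [100, 102, 104, 106, 108, 200]

prediction = float(pixel_values[0])
flags = []
for value in pixel_values[1:]:
    flags.append(is_foreground(prediction, value))
    prediction = update_prediction(prediction, value)

print(flags)
```

In this toy sequence the gradual drift is absorbed by the prediction, while the final jump is flagged as foreground.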
-
Toshifumi Honda, Kenji Obara, Minoru Harada, Mitsuji Ikeda
2009 Volume 129 Issue 5 Pages
853-860
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
A method is investigated for high-sensitivity detection of slight deformations, such as defects on semiconductor wafers, using multiple scanning electron microscope (SEM) images. The relationship between the signal-to-noise ratio (S/N) for shape deformation and the solid angle of electron detection is investigated using an SEM simulator to overcome the problem of low S/N in SEM images of slight shape deformations obtained using conventional SEMs. Based on the investigation, we proposed a new defect detection algorithm. Three SEM images are simultaneously acquired from a die; from these, two images are synthesized—one for enhancing deformation and one for enhancing material contrast—by a linear combination. Three more images are then acquired from the neighboring die and two images are synthesized in the same way. The images from the two dies are compared in a two-dimensional vector space spanned by the intensities of the two synthesized images to discriminate defects. Experimental results obtained using an actual SEM demonstrate that the proposed method is effective in detecting slight deformations.
-
Fujio Tsutsumi, Yutaka Tateda
2009 Volume 129 Issue 5 Pages
861-869
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
This paper proposes a new method for counting floating leaves on the surface of a river, to be used in ecosystem monitoring. Since conventional counting methods (such as the litter trap method) require considerable manual labor for precise monitoring of material flow in ecosystems, an efficient counting method was needed. Our method automatically counts the number of floating leaves in recorded video images. Floating leaves are detected using color and motion features. The color feature is represented by three-dimensional histograms of the RGB color space. For the motion feature, the speed and acceleration of the targets are used. The counting method proposed in this paper has been applied to a five-hour video in which 20,000 leaves were recorded, and high recall and precision rates of 96% and 94%, respectively, have been achieved. This paper also proposes a user interface for training the counting mechanism based on the Interactive Machine Learning model. With this interface, the user can easily produce a huge number of training samples for the detection mechanism in the same way as coloring a picture.
-
Hiroshi Murata, Chikahito Nakajima, Hiroaki Watanabe
2009 Volume 129 Issue 5 Pages
870-875
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
In coal gasification plants, coal ash is discharged as molten slag. If the molten slag solidifies and is not discharged, an operator who monitors the plant throughout the day ignites a melting burner and removes the solidified slag. However, the use of this burner decreases the efficiency of the gasifier. Therefore, objective and appropriate automatic decision of ignition timing for the slag melting burner is necessary for economical operation, thereby reducing the operator's work load. In this report, we propose a decision index for ignition timing of the melting burner using monitoring videos. We have evaluated our method using 54 actual monitoring videos, and we have shown the applicability of automatically deciding the ignition timing.
-
Kenji Terada, Toru Fukuhara
2009 Volume 129 Issue 5 Pages
876-884
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
“AWA ODORI” is one of the traditional dance festivals of Tokushima Prefecture in Japan. The basic movements of the “AWA ODORI” dance require harmony of hands and legs and a duple-time rhythm, but beginners cannot perform these actions completely. We have already developed a method for quantitatively evaluating skill in the “AWA ODORI” dance. However, three-dimensional movements cannot be evaluated from image sequences. In this paper, we propose a method for quantitatively evaluating skill in the “AWA ODORI” dance using three-dimensional data. In this method, the image sequence of dance scenes is obtained by a stereo camera with a simple structure. The dynamics of motion are detected from three-dimensional data calculated by the stereo camera. The input data are compared with model data of excellent dancers. We describe the algorithm for quantitative skill evaluation using the stereo view and show some experimental results obtained with a simple experimental system to verify the effectiveness of the proposed method.
-
Chikahito Nakajima, Yasushi Shinohara, Takefumi Setta
2009 Volume 129 Issue 5 Pages
885-892
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
This paper describes a method of counting pedestrians with a surveillance camera at an entrance gate. The surveillance camera is set up at the side of the gate, and side views of pedestrians are captured to count them. The proposed method uses only pedestrians' motion to determine their walking directions, and it uses the time required to pass through the gate to count pedestrians. To evaluate the method, we prepared thirty-hour videos of the Open-Laboratory in October 2005 and October 2007. More than ten thousand pedestrians were captured in the videos. The experimental results show that the counting error ratios are less than five percent for the videos. The method has a low computational cost and runs at 67 fps on a notebook PC.
-
Arihito Ihara, Hironobu Fujiyoshi, Masanari Takaki, Hiroaki Kumon, Yuk ...
2009 Volume 129 Issue 5 Pages
893-900
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
A technique for recognizing traffic signs from an image taken with an in-vehicle camera has already been proposed for driver assistance. SIFT features are used for traffic sign recognition because they are robust to changes in the scale and rotation of the traffic sign. However, real-time processing is difficult because SIFT feature extraction and matching are computationally expensive. This paper presents a method of traffic sign recognition based on a keypoint classifier trained by AdaBoost using PCA-SIFT features in different feature subspaces. The subspaces are constructed from gradients of traffic sign images and of general images, respectively. A detected keypoint is projected onto both subspaces, and AdaBoost is then employed to classify whether or not the keypoint is on a traffic sign. Experimental results show that the computation cost for keypoint matching can be reduced to about 1/2 compared with the conventional method.
-
Yusuke Fujita, Yoshihiko Hamamoto
2009 Volume 129 Issue 5 Pages
901-908
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
In this paper, we propose a system for automatic reading of an analogue meter from a digital image; the system is flexible and can be easily installed for various existing analogue meters. In the automatic setup phase, the system segments the meter area from an image, corrects image distortion, and recognizes the scale on the meter. In the monitoring phase, it reads the meter. In our system, a planar projective transformation is automatically applied to a distorted image using a rectangle and a circle to correct geometric distortion. In the automatic setup phase, the graduation marks of the scale are detected and located in an acquired image. The meter reading can thereby be flexibly adapted to different meter scales through this easy process. In the monitoring phase, the needle is detected and located in a newly acquired image, and the meter reading is derived from the position of the needle relative to the detected graduation marks. The experimental results show that the proposed system reads various analogue meters automatically as accurately as human observers under controlled illumination.
-
Yuusuke Mochiduki, Kimiya Aoki, Hiroyasu Koshimizu
2009 Volume 129 Issue 5 Pages
909-915
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
Measurement and reconstruction of 3-dimensional scenes by a visual system is a very important subject in the field of Computer Vision. In this article, we propose a method that estimates 3-dimensional structures using only one input image obtained by a monocular camera. Blurring of an image by defocus is a clue for estimating a relative range image, so we use preprocessing in the frequency domain to extract and analyze the high-frequency component of a one-dimensional waveform. Furthermore, in order to speed up the process and improve accuracy, we propose a method to set the analysis regions based on edge directions. A basic experiment confirmed that the amount of blur is correlated with range data. We then performed experiments to verify the effectiveness of our method. Every experiment ran in approximately 2.5 s, and all the estimated range images were reasonably good.
-
Sanae Shimizu, Hidekazu Hirayu, Hirotsugu Asai
2009 Volume 129 Issue 5 Pages
916-922
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
Human error prevention is one of the most important factors in improving product quality and productivity. In order to prevent human errors, it is necessary to detect them. In this paper, we propose a novel method based on image information for detecting human errors. In the proposed method, we first segment a video sequence of a work operation into basic motion elements by using changes in the motion speed and the motion direction distribution extracted from the video sequence. Specifically, we extract the motion scale and the motion direction histogram as the motion features of a frame. We then detect human errors by evaluating the number of operations and the order of operations from the sequence of basic motion elements. In experiments, we applied the proposed method to bolt-fastening work in a real car assembly process and verified its effectiveness.
-
Munetoshi Numada, Hiroyasu Koshimizu
2009 Volume 129 Issue 5 Pages
923-931
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
The Hough transform (HT) is one of the important methods for detecting or recognizing lines from the edge points in an image. To make the HT faster, it is promising to decrease the computation cost of both the trigonometric functions that form the basic computational unit and the multiplications. From this viewpoint, Koshimizu et al. proposed the “Fast Incremental Hough Transform 2 (FIHT2)” method, which generates one point on the curve with a single multiplication. However, it suffers from a significant approximation error. To address this shortcoming, this paper proposes a new method, “FIHT3”, that generates an exact Hough curve using an STTR (Sine-Three-Term-Recurrence). As with FIHT2, FIHT3 generates one point of the Hough curve with a single multiplication. Furthermore, experiments showed that FIHT3 provides complete accuracy and the fastest computation among the methods that employ the conventional simultaneous recurrence formula. Moreover, a high-speed algorithm that uses shift operations instead of multiplication is also presented in this paper.
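The idea of generating a Hough curve with one multiplication per point can be sketched using the standard three-term recurrence for sinusoids. The sketch assumes the usual line parameterization rho(theta) = x cos(theta) + y sin(theta) and uniform angle steps; the function names are hypothetical, and the code is not claimed to reproduce FIHT3 as published. Because sin and cos both satisfy s(n+1) = 2 cos(delta) s(n) - s(n-1), so does any linear combination of them, including rho.

```python
import math


def hough_curve_recurrence(x, y, num_angles):
    """Generate rho(n * delta) for n = 0..num_angles-1 with one multiply per point.

    Uses rho[n+1] = 2*cos(delta) * rho[n] - rho[n-1], where delta = pi / num_angles.
    """
    delta = math.pi / num_angles
    k = 2.0 * math.cos(delta)  # constant factor, computed once
    rho_prev = float(x)  # rho at theta = 0
    rho_curr = x * math.cos(delta) + y * math.sin(delta)  # rho at theta = delta
    curve = [rho_prev, rho_curr]
    for _ in range(2, num_angles):
        rho_prev, rho_curr = rho_curr, k * rho_curr - rho_prev
        curve.append(rho_curr)
    return curve[:num_angles]


def hough_curve_direct(x, y, num_angles):
    """Reference version that evaluates the trigonometric functions at every angle."""
    delta = math.pi / num_angles
    return [x * math.cos(n * delta) + y * math.sin(n * delta)
            for n in range(num_angles)]


curve_rec = hough_curve_recurrence(30.0, 40.0, 180)
curve_dir = hough_curve_direct(30.0, 40.0, 180)
max_error = max(abs(a - b) for a, b in zip(curve_rec, curve_dir))
print(max_error)
```

The recurrence is exact in real arithmetic; in floating point the two curves agree only up to accumulated rounding error, which is what `max_error` measures.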
-
Shoichi Shimizu, Hironobu Fujiyoshi, Hiroshi Sakai, Takeo Kanade, Yuji ...
2009 Volume 129 Issue 5 Pages
932-939
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
Drivers focus their gaze in the direction of movement and guess an optimum route using the white road lines and the delineators. However, because the range that can be clearly seen in the headlights is limited, it is difficult to guess the optimum route. This paper proposes a method that estimates the road contour using delineators. The road contours are estimated from the 3D positions of delineators located on the sides of roads, which are extracted using a circle detection filter. A clothoid curve is then fitted to the delineators to obtain its parameters. The parameters are classified into four kinds of curves using a support vector machine. In a simulation experiment using a virtual road, the classification rate was 86.9%. Our method was able to classify the road contour with high accuracy.
-
Tatsuya Nakanishi, Kenji Terabayashi, Kazunori Umeda
2009 Volume 129 Issue 5 Pages
940-946
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
In recent years, intelligent rooms that recognize gestures and support people have come to be expected in various situations. This paper proposes a method to recognize mouth motion, i.e., a method of lip reading, to indicate an object such as a home appliance in an intelligent room. The method first detects the operator's face. The mouth region is then extracted from the face region using the fact that the inside of the mouth is dark. Dynamic Programming (DP) matching is applied to a sequence of low-resolution images of the mouth region, and the mouth motion of speaking a word is recognized. The proposed method overcomes a disadvantage of image-based methods, namely that they are not robust to changes in the distance between an operator and a camera. Additionally, the proposed method can cope with small displacements of the mouth position while speaking by considering one-pixel offsets for low-resolution images and using nine shifted images to obtain the smallest distance. The effectiveness of the proposed method is evaluated by experiments on recognizing four words that are typical names of home appliances.
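DP matching over image sequences with a shift-tolerant frame distance can be sketched as below. Several details are assumptions, since the abstract does not specify them: frames are small 2D lists of intensities, the frame distance is a mean absolute difference over the overlapping area, and the DP uses the basic symmetric step pattern. All function names are hypothetical.

```python
def frame_distance(a, b, dx=0, dy=0):
    """Mean absolute difference between frame a and frame b shifted by (dx, dy)."""
    h, w = len(a), len(a[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:  # compare only the overlapping area
                total += abs(a[y][x] - b[yy][xx])
                count += 1
    return total / count if count else float("inf")


def shift_tolerant_distance(a, b):
    """Smallest distance over the nine one-pixel offsets (including no shift)."""
    return min(frame_distance(a, b, dx, dy)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1))


def dp_matching(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment of two frame sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = shift_tolerant_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy 3x3 "mouth" frames: closed, open, and open but displaced by one pixel.
closed = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
opened = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
opened_shifted = [[0, 0, 0], [0, 0, 9], [0, 0, 0]]

word = [closed, opened, closed]
print(dp_matching(word, [closed, opened, opened, closed]))  # slower utterance
print(dp_matching(word, [closed, opened_shifted, closed]))  # displaced mouth
print(dp_matching(word, [opened, closed, opened]))          # different motion
```

In this toy data the time-stretched and the displaced sequences both align to the reference at zero cost, while the different motion does not.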
-
Takuya Minagawa, Hideo Saito
2009 Volume 129 Issue 5 Pages
947-955
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
In this paper, we present an object category recognition method for an information search system that is queried from mobile phone cameras or from servers of Internet services. In such a system, processing speed is an important requirement. To improve processing speed, the hierarchical object category recognition technique proposed by Serre (23) is modified using Haar-like features, vector quantization of feature models, and reduction of the processing area. In addition, by retaining the position information of each feature, the method compensates for the slight loss of accuracy incurred in exchange for processing speed. We implemented this method on a web server and showed that the system can work within a practical processing time. Through experiments on the Caltech-101 image database and natural scene category images, we also confirm the accuracy of our approach.
-
Go Migiyama, Atsuhiko Sugimura, Atsushi Osa, Hidetoshi Miike
2009 Volume 129 Issue 5 Pages
956-962
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
Digital cameras have been advancing rapidly in recent years. However, a shot image differs from the sight image perceived when the same scenery is seen with the naked eye. An image of scenery with a wide dynamic range contains blown-out highlights and crushed blacks. These problems hardly occur in the sight image, and they are a contributing cause of the difference between the shot image and the sight image. Blown-out highlights and crushed blacks are caused by the difference in dynamic range between the image sensor installed in a digital camera, such as a CCD or CMOS sensor, and the human visual system. The dynamic range of the shot image is narrower than that of the sight image. To solve this problem, we propose an automatic method to decide an effective exposure range from the superposition of edges. We integrate multi-step exposure images using the method. In addition, we try to erase pseudo-edges by blending exposure values. As a result, we obtain a pseudo wide dynamic range image automatically.
-
Hirotaka Ohta, Kazuhiko Yamamoto
2009 Volume 129 Issue 5 Pages
963-969
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
When characters are recognized from low-resolution characters in a motion image, the general approach is first to reconstruct a high-resolution image from a sequence of the low-resolution images and then to extract features from the reconstructed high-resolution image. In this paper, we propose a new method in which features are first extracted directly from the low-resolution images, and a high-accuracy feature is then reconstructed from the sequence of features. We show the advantage of our proposed method over the ordinary method both theoretically and in recognition experiments.
-
Koh Wataoka, Shunichiro Oe
2009 Volume 129 Issue 5 Pages
970-976
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
The stereo method is widely used to measure the distance from a camera to an object. When the surface of an object has no texture, as with a white wall, it is difficult to find corresponding points between the left and right images. In this research, we propose a distance measurement method that uses an RGB pattern to solve this missing corresponding-points problem. An RGB pattern with certain characteristics is projected onto the objects by a projector. In this method, the RGB pattern captured by a CCD camera is converted to a binary RGB pattern, and the corresponding-points search is performed on this binary RGB pattern. As the matching index for the corresponding-points search, we use an index based on the XOR of the binary RGB patterns. Since this method does not use normalized cross correlation, it has the merit that high-speed processing is possible. In this paper, we verify the validity of the proposed method through experiments.
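A matching index based on XOR can be sketched as a bit-mismatch count between binarized RGB codes. The encoding of each pixel as a 3-bit integer (one bit per channel), the window-based search along one image row, and the function names are all assumptions for illustration; the abstract does not describe the actual index in detail.

```python
def xor_mismatch(codes_a, codes_b):
    """Count differing bits between two sequences of 3-bit binary RGB codes."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes_a, codes_b))


def best_match(left_window, right_row):
    """Slide the left-image window along a right-image row and return
    (offset, mismatch) for the position with the fewest differing bits."""
    width = len(left_window)
    scores = [xor_mismatch(left_window, right_row[i:i + width])
              for i in range(len(right_row) - width + 1)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Each pixel is a 3-bit code: bit 2 = R, bit 1 = G, bit 0 = B after binarization.
right_row = [0b100, 0b010, 0b001, 0b110, 0b011, 0b101, 0b111, 0b000]
left_window = [0b110, 0b011, 0b101]

print(best_match(left_window, right_row))
```

Only integer XOR and bit counting are involved, which is consistent with the abstract's point that avoiding normalized cross correlation allows fast processing.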
-
Atsutoshi Shimeno, Seiichi Uchida, Ryo Kurazume, Rin-ichiro Taniguchi, ...
2009 Volume 129 Issue 5 Pages
977-984
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
Tracking a moving robot in surveillance video is an important task for the coexistence of human beings and robots. An essential technology for managing an environment in which human beings and moving robots coexist is the separation and tracking of moving robots. For this task, the moving robot should be separated from other moving objects, i.e., human beings. We assume that the robot provides additional information about its motion to the surveillance system to ease the task. The robot can then be distinguished from the other objects as a moving region that is consistent with the additional motion information. For this purpose, we modify a tracking algorithm based on a particle filter to incorporate the additional motion information. The results of an experiment on real surveillance video sequences indicate that the proposed framework can separate and track a moving robot in the presence of several walking persons.
-
Norisuke Takao, Zhuo Liu, Shigeo Wada
2009 Volume 129 Issue 5 Pages
985-992
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
In this paper, we propose a robust feature extraction method for texture image retrieval and verify its effectiveness. In this method, the original image is transformed to a log-polar autocorrelation image. The log-polar image is decomposed by the discrete wavelet transform to obtain approximation images. We extract the edge component from the approximation image by Sobel filtering and use higher-order local autocorrelation as the feature vector. Applying this method to texture images, we obtain features that are robust to geometrical (shift, rotation, scale) and illumination distortions.
To show the effectiveness of this method, we estimated the similarity between images using the feature vector. Computer simulation results confirmed a higher retrieval rate for geometrically and illumination-distorted images than with conventional approaches.
-
Kasemsuk Sepsirisuk, Kazuhiko Hamamoto, Kiyoaki Atsuta, Shozo Kondo
2009 Volume 129 Issue 5 Pages
993-1001
Published: May 01, 2009
Released on J-STAGE: May 01, 2009
JOURNAL
FREE ACCESS
This paper proposes a new correlation-based watermarking method using the wavelet tree and mathematical morphology. In this method, a watermark is a two-dimensional pseudorandom array of {-1, 1} with the same size as the host image to be watermarked. The watermark is embedded into a Resilient Tree Structure (RTS), which is created by applying a morphological operation, dilation, to the wavelet tree. The dilation operation improves the reliability of the proposed method by increasing the number of coefficients involved in watermarking. Furthermore, an improved perceptual weighting function of the Human Visual System is used to preserve the image quality. In the watermark detection process, the linear correlation between the watermark and the coefficients of the RTS of a tested image is computed to judge the presence of the watermark. The experimental results show that the proposed method outperforms current correlation-based watermarking methods.
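Detection by linear correlation can be illustrated with a minimal sketch on a flat list of coefficients. It deliberately omits the wavelet tree, the RTS, and the perceptual weighting described in the abstract; the additive embedding rule, the embedding `strength`, the Gaussian stand-in for host coefficients, and the threshold choice are all assumptions made for illustration.

```python
import random


def linear_correlation(watermark, coefficients):
    """Average of the element-wise product of the watermark and the coefficients."""
    return sum(w * c for w, c in zip(watermark, coefficients)) / len(watermark)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 4096

# Pseudorandom watermark of {-1, 1} and stand-in host coefficients.
watermark = [rng.choice((-1, 1)) for _ in range(n)]
host = [rng.gauss(0.0, 1.0) for _ in range(n)]

strength = 0.5  # hypothetical embedding strength
marked = [c + strength * w for c, w in zip(host, watermark)]

threshold = strength / 2  # hypothetical decision threshold
marked_detected = linear_correlation(watermark, marked) > threshold
host_detected = linear_correlation(watermark, host) > threshold

print(marked_detected, host_detected)
```

For the marked coefficients the correlation is close to the embedding strength, whereas for unmarked coefficients it is close to zero, so a threshold between the two separates the cases.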