IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
Volume 133, Issue 1
Showing 1-32 of 32 articles from the selected issue
Special Issue: 2012 Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV2012)
Preface
Special Issue Papers
<Systems, Instrumentation, Control>
  • Kensuke Tobitani, Kunihito Kato, Kazuhiko Yamamoto
    2013, Volume 133, Issue 1, pp. 2-7
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    In daily life, we look at a specific object mainly with central vision, and peripheral vision plays an auxiliary role. However, peripheral vision has some outstanding features, such as sensitivity to light-dark changes and to moving targets. In recent years, these features have attracted attention in fields such as sports vision and visual inspection, and many applications in these fields have been studied. In this study, we focused on information presentation systems, which are designed on the assumption that they will be viewed with central vision. Considering the characteristics of peripheral vision, in an emergency that requires quick thinking it is more effective to present attention-calling information in peripheral vision than in central vision. Hence, we investigated the appropriate visual angle for presenting visual signals such as attention-calling information, through an experiment that measured reaction times to blinking LEDs placed around a test subject as simple visual stimuli. The experiment showed that the reaction time to visual stimuli in peripheral vision was shorter than the reaction time to visual stimuli in central vision.
  • Hiroyuki Ukida, Masafumi Miwa
    2013, Volume 133, Issue 1, pp. 8-17
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    This study proposes an information transmission device consisting of an LED panel and a video camera. The LED panel displays various patterns of AR markers, QR codes, and micro QR codes. From images taken by the video camera, the transmitted information and the 3D position and pose of the camera relative to the LED panel are then extracted. We plan to apply this system to communication between moving objects. In this paper, we propose a method to distinguish AR markers, QR codes, and micro QR codes, and report the discrimination rates and the altitude estimation accuracy obtained in an experiment.
<Intelligence, Robotics>
  • Hisato Fukuda, Satoshi Mori, Katsutoshi Sakata, Yoshinori Kobayashi, Y ...
    2013, Volume 133, Issue 1, pp. 18-27
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    In order to be effective, it is essential for service robots to be able to recognize objects in complex environments. However, it is difficult for them to recognize objects autonomously without any mistakes in a real-world environment. Thus, in response to this challenge we conceived of an object recognition system that would utilize information about target objects acquired from the user through simple interaction. In this paper, we propose an interactive object recognition system using multiple attribute information (color, shape, and material), and introduce a robot using this system. Experimental results confirmed that the robot could indeed recognize objects by utilizing multiple attribute information obtained through interaction with the user.
  • Mohammad Abu Yousuf, Yoshinori Kobayashi, Yoshinori Kuno, Keiichi Yama ...
    2013, Volume 133, Issue 1, pp. 28-39
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    This paper presents a model for a mobile museum guide robot that can dynamically establish an appropriate spatial relationship with visitors during explanation of an exhibit. We began by observing and videotaping scenes of actual museum galleries where human guides explained exhibits to visitors. Based on the analysis of the video, we developed a mobile robot system able to guide multiple visitors inside the gallery from one exhibit to another. The robot has the capability to establish a type of spatial formation known as the “F-formation” at the beginning of its explanation after arriving near any exhibit, a feature aided by its ability to employ the “pause and restart” strategy at certain moments in its talk to draw the visitors' attention towards itself. The robot is also able to identify and invite any bystanders around itself into its ongoing explanation, thereby reconfiguring the F-formation. The system uses spatial information from a laser range sensor and the heads of visitors are tracked using three USB cameras. A particle filter framework is employed to track the visitors' positions and body orientation, and the orientations of their heads, based on position data and panorama images captured by the laser range sensor and the USB cameras, respectively. The effectiveness of our method was confirmed through experiments.
  • Andrey Vavilin, Kang-Hyun Jo
    2013, Volume 133, Issue 1, pp. 40-46
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    This paper describes an approach for detecting and tracking multiple moving objects in a cluttered environment by tracking local image regions. The method is designed to work with image sequences taken from moving cameras. The first image of the sequence is used to compose a triangular grid whose vertices serve as feature points. The grid is optimized to increase the number of elements in regions with a higher level of detail. This grid is then used as the initial grid for the next frame in order to track local features. The neighborhood of each vertex is used to generate a color distribution model, which serves as the feature vector for tracking. In the second part of the proposed algorithm, the grids of two consecutive frames are used to estimate the motion of corresponding vertices and form a motion field. This field is used to find dominant motions and to make assumptions about background and foreground motion. The background motion is then used to estimate the camera motion parameters. To improve the robustness of the algorithm, context analysis is used. Input images are analyzed to exclude regions that are poorly suited to tracking (such as trees, road, sky, or clouds) from further processing, and to verify whether moving parts of the scene are more likely to belong to the background or to one of the object classes (vehicles, humans, etc.).
<Media Information, User Interface>
  • Hiroyoshi Tsuru, Itaru Kitahara, Yuichi Ohta
    2013, Volume 133, Issue 1, pp. 47-53
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    Pose estimation (calibration) of a mobile camera is one of the most important research issues in realizing geometrical consistency between the real and the virtual world in mixed reality. This paper proposes a method to estimate the pose of a mobile camera in a dynamic scene by using an environmental stereo camera. Sequential 3D maps of the captured environment, which include both static objects and dynamic objects such as people, are generated in real time from the stereo images. By using the 3D points of dynamic objects as landmarks for camera calibration, it is possible to realize a robust pose estimation method in a dynamic environment. Experimental evaluations were conducted using both simulated CG images and captured images of a real scene to demonstrate the effectiveness of the proposed method.
  • Takenori Hara, Chiho Toyono, Tomoya Tachikawa, Goro Motai, Keisuke Shu ...
    2013, Volume 133, Issue 1, pp. 54-60
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    An Augmented Reality (AR) guide system that does not spoil the space design is important for a museum. In this paper, we propose an AR guide system that uses a Light Emitting Diode (LED) as a marker. An LED marker does not spoil the space design and yields high recognition accuracy even from a long distance. For a user, an LED marker is intuitive and can provide the content in a highly enjoyable manner. We also discuss the fusion of the low-cost LED marker recognition method with an accelerometer-gyroscope sensor. Our system can easily be used with unmodified, commercially available mobile devices.
  • Masato Konishi, Yasuhiro Azuma, Noriko Nagata, Young-suk Shin
    2013, Volume 133, Issue 1, pp. 61-66
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    We designed a method for estimating the subjective age of a person. Using this method, one evaluates one's own age by estimating whether a person shown in a facial image looks older or younger than oneself. Thus far, experiments have shown that Japanese and Americans tend to underestimate their subjective ages. In this study, we conducted estimation experiments involving subjects who were racially Japanese—some of whom were from Japan and others who were raised in the Korean culture—and investigated the differences between the two groups' results. Experiments were performed in which Korean participants viewed Korean and Japanese facial images, and the Japanese participants also viewed Korean facial images. Through these experiments, it was confirmed that the bias values of the subjective ages were negative, indicating that a younger self-identity occurs despite differences in Japanese and Korean societies and cultures.
<Speech and Image Processing, Recognition>
  • Stephen Karungaru, Kenji Terada, Minoru Fukumi
    2013, Volume 133, Issue 1, pp. 67-73
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    Abandoned objects in sensitive, congested public areas such as airports or train stations pose a major security threat. To address this problem, we propose a novel method for detecting abandoned luggage, tracking its owner, and extracting any other necessary information using a multi-threshold, pixel-based dynamic background model and Earth Mover's Distance (EMD) signature matching. The public area selected for the experiments is a train station. The background model is created by learning the color variance at every pixel, allowing multiple thresholds per pixel. After learning, pruning unnecessary thresholds improves the foreground extraction speed. Blob noise and outliers, including shadows, are removed by a binarization method based on the Discriminant Analysis (DA) method. A color signature created in the HSV color space, which is fast to process, is matched using the EMD metric to track blobs. Once an abandoned object candidate is found, the slower but more accurate SURF algorithm is used to extract feature points for further tracking. Objects that remain stationary after this phase are considered to be abandoned luggage. To demonstrate the effectiveness of the proposed method, experiments were conducted using the i-Lids dataset (2007 IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS 2007)), achieving a frame-based average accuracy of about 93%.
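As a rough illustration of the signature matching step, the sketch below computes the Earth Mover's Distance between two one-dimensional color histograms. It assumes normalized histograms over equally spaced bins with unit ground distance, for which the EMD reduces to the sum of absolute cumulative differences. The abstract does not give the layout of the HSV signature or the ground distance, so this is not the authors' implementation, and the names `emd_1d` and `best_match` are hypothetical.

```python
import numpy as np


def emd_1d(hist_a, hist_b):
    """EMD between two 1D histograms with unit spacing between bins.

    Assumption: bins are ordered on a line (hue wrap-around is ignored).
    """
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("histograms must have the same number of bins")
    # Normalize so both histograms carry the same total mass.
    a = a / a.sum()
    b = b / b.sum()
    # The running surplus at each bin is the mass that must cross to the next bin.
    return float(np.abs(np.cumsum(a - b)).sum())


def best_match(query_hist, candidate_hists):
    """Return the index of the candidate closest to the query, plus all distances."""
    distances = [emd_1d(query_hist, h) for h in candidate_hists]
    return int(np.argmin(distances)), distances
```

In a tracker, `best_match` would be called with the signature of a blob from the previous frame and the signatures of the blobs in the current frame.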
  • I Gede Pasek Suta Wijaya, Keiichi Uchimura, Gou Koutaki
    2013, Volume 133, Issue 1, pp. 74-83
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    This paper presents an alternative approach to PDLDA for incremental data belonging to both old (known) and new classes, called incremental PDLDA (IPDLDA). The IPDLDA not only overcomes the main problem of conventional LDA, namely the large computational cost of retraining, but also provides almost the same optimum projection matrix (W) as the original LDA for each increment of data. The proposed method is realized by redefining the formulation for updating the between-class scatter (Sb) using a constant global mean assignment and by simplifying the equation for updating the within-class scatter (Sw). These new updating algorithms allow the IPDLDA to retrain on incremental data with much lower time complexity. In addition, they give the IPDLDA almost the same properties as the original method in terms of discriminant power and scatter matrices. To assess the ability of the IPDLDA in feature clustering, we apply it to face recognition with DCT-based holistic features as a dimensionality reduction of the raw face image. The experimental results show that the proposed method provides a robust recognition rate and requires less processing time than GSVD-ILDA and SP-ILDA on several challenging databases when the system is retrained under two scenarios: incremental data belonging to new classes and to old classes.
  • Baowei Lin, Toru Tamaki, Marcos Slomp, Bisser Raytchev, Kazufumi Kaned ...
    2013, Volume 133, Issue 1, pp. 84-90
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    In this paper we propose a method for detecting 3D keypoints in a 3D point cloud for robust real-time camera tracking. Assuming that there are a number of images corresponding to the 3D point cloud, we define a 3D keypoint as a point that has corresponding 2D keypoints in many images. These 3D keypoints are expected to appear with high probability as 2D keypoints in newly taken query images. For 3D-2D matching, we embed 2D feature descriptors into the 3D keypoints. Experimental results with 3D point clouds of indoor and outdoor scenes show that the extracted 3D keypoints can be used for matching with 2D keypoints in query images.
  • Ryota Mukai, Tomoyuki Araki, Toshio Asano
    2013, Volume 133, Issue 1, pp. 91-96
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    In this paper, tennis games are evaluated quantitatively, and skills of players are analyzed by computer vision technology. Two cameras are used to detect the three-dimensional positions of balls and players. Subtractions of images are used to extract ball images after eliminating ball shadows. The prediction of ball position by using velocity vectors of previous images enables accurate and high-speed ball detection. Examples of game plays are shown quantitatively, and the plays are discussed in detail. Sequential images of 73 return strokes and 15 services by five players are analyzed quantitatively. Six parameters that represent tennis play skill are introduced for the quantitative evaluation. These skill factors are evaluated using the multiple regression method. Among these, it is concluded that ball scattering and ball shot speed are strongly related to skill scores, and the skill factors differ between players.
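The idea of predicting the ball position from the velocity vectors of previous images can be sketched as follows. This is a minimal constant-velocity extrapolation under our own assumptions, not the authors' detector: the function names, the pixel coordinates, and the `search_radius` parameter are all hypothetical.

```python
import numpy as np


def predict_next_position(track):
    """Extrapolate the next position from the last two tracked positions."""
    if len(track) < 2:
        raise ValueError("need at least two previous positions")
    last = np.asarray(track[-1], dtype=float)
    velocity = last - np.asarray(track[-2], dtype=float)  # displacement per frame
    return last + velocity


def nearest_candidate(predicted, candidates, search_radius):
    """Pick the detected candidate closest to the prediction, or None if all are too far."""
    if len(candidates) == 0:
        return None
    cands = np.asarray(candidates, dtype=float)
    dists = np.linalg.norm(cands - predicted, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] <= search_radius else None
```

Restricting the search to a small radius around the prediction is what makes this kind of detection both faster and less prone to picking up spurious blobs.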
  • Kazuki Ueno, Jun'ichi Yamaguchi
    2013, Volume 133, Issue 1, pp. 97-102
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    The authors describe a method for object identification using Hough data verification. To cope with image deformation caused by changes in camera shooting conditions, this study focuses on the distribution pattern of the Hough data. The positional relationship between the peaks in the Hough data tends to be preserved when the camera shooting conditions change. The paper describes the relationship between the Hough transform and changes in shooting conditions, and explains an object identification method that distinguishes peak distribution patterns using a matched filter. Results of an experiment using car images are also shown. A high identification rate was obtained under viewpoint changes, confirming that the method is useful for object identification from various viewpoints.
  • Baowei Lin, Yuji Ueno, Kouhei Sakai, Toru Tamaki, Bisser Raytchev, Kaz ...
    2013, Volume 133, Issue 1, pp. 103-110
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    In this paper we propose a method for 3D scene change detection. We particularly target the case in which a 3D scene has been changed by disasters or accidents. Assuming that a 3D point cloud of the scene has been reconstructed before the change from a set of training images, the camera positions of newly taken query images with small changes can be estimated. To find a region of change in a query image, the training image nearest to the query image is first selected. Then, the matching between the query image and the nearest image is computed to find a set of non-matching points as a region of change. These change regions are visualized by projecting 3D points back only onto those regions. Experimental results show that our method can detect change areas correctly.
  • Yohei Minekawa, Kenji Nakahira, Ryo Nakagaki, Yuji Takagi
    2013, Volume 133, Issue 1, pp. 111-116
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    We describe an efficient method for optimizing image processing parameters that applies estimation error reduction to design of experiments (DOE). The traditional DOE optimization method estimates the evaluation scores of all parameter sets from a small number of actual scores and ranks them. Because the search for the optimal parameter set proceeds in the order of the estimated scores, the ranking accuracy, which strongly depends on the estimation error, is important. We introduce a function for reducing the estimation errors of the higher-ranked parameter sets. The proposed parameter optimization method was evaluated by applying it to parameter optimization for defect area extraction in industrial images. Evaluation using three datasets showed that the parameter sets selected by the proposed method had close to the highest actual score and that the number of image processing runs was 1/57 of that required by a full search procedure.
  • Dao Huu Hung, Hideo Saito
    2013, Volume 133, Issue 1, pp. 117-127
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    In this paper, we present a video-based method for detecting fall incidents among elderly people living alone. We propose using measures of a person's height and occupied area to distinguish three typical states: standing, sitting, and lying. Two roughly orthogonal views are used, which in turn simplifies the estimation of the occupied area as the product of the widths of the same person observed in the two cameras. However, features estimated from silhouette sizes vary across the viewing window because of camera perspective. To deal with this, we suggest using Local Empirical Templates (LET), which are defined as the sizes of standing people in local image patches. LET have two important characteristics: (1) LET in unknown scenes can easily be extracted automatically, and (2) by their nature, LET hold perspective information that can be used for feature normalization. The normalization process not only cancels the perspective effect but also takes the features of standing people as baselines. We observe that the heights of standing people are greater than those of sitting and lying people. Standing people also occupy smaller areas than people who are sitting or lying. Thus, the three human states fall into three separable regions of the proposed feature space, which is composed of normalized heights and normalized occupied areas. Fall incidents can be inferred from a time-series analysis of human state transitions. We tested the performance of our method on 24 video samples in the Multi-view Fall Dataset(1), obtaining high detection rates and low false alarm rates, which outperform the state-of-the-art methods(2)(3) tested on the same benchmark dataset.
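The feature construction described in this abstract can be sketched as below: the occupied area is taken as the product of the widths seen by the two cameras, and both features are normalized by the local template of a standing person. The classification rule and every threshold value are invented for illustration only, since the abstract does not state how the three regions are separated; all function and parameter names are hypothetical.

```python
def normalized_features(height, width_cam1, width_cam2,
                        let_height, let_width_cam1, let_width_cam2):
    """Normalize height and occupied area by the local standing-person template."""
    norm_height = height / let_height
    # Occupied area approximated as the product of the widths in the two views.
    area = width_cam1 * width_cam2
    let_area = let_width_cam1 * let_width_cam2
    return norm_height, area / let_area


def classify_state(norm_height, norm_area,
                   sit_height=0.8, lie_height=0.5, lie_area=2.0):
    """Toy decision rule; the thresholds are assumptions, not values from the paper."""
    if norm_height < lie_height and norm_area > lie_area:
        return "lying"
    if norm_height < sit_height:
        return "sitting"
    return "standing"
```

A fall could then be flagged when the per-frame state sequence moves from "standing" to "lying" within a short time window, which is one possible reading of the time-series analysis mentioned above.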
  • Kenji Iwata, Yutaka Satoh, Ryushi Ozaki, Katsuhiko Sakaue
    2013, Volume 133, Issue 1, pp. 128-133
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    Extracting a robust feature set in a given image sequence is an important fundamental technique that influences the performance of various computer vision systems. A statistic reach feature (SRF) is a stable feature for robust background subtraction. The SRF is defined as two arbitrary points that maintain the sign of the increase and decrease of the brightness in the image sequence. This paper describes the process of accelerating the construction of a background model using some point pairs chosen at random.
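The definition above, a pair of points whose brightness ordering keeps the same sign throughout the sequence, can be sketched with randomly chosen point pairs as follows. This is only an illustration of the definition under our own assumptions: the `margin` parameter, the pair count, and the function name `select_stable_pairs` are hypothetical, and the paper's acceleration technique is not reproduced.

```python
import numpy as np


def select_stable_pairs(frames, n_pairs=200, margin=10, seed=0):
    """Keep random point pairs whose brightness difference never changes sign.

    frames: array of shape (T, H, W) with grayscale intensities.
    Returns flat pixel indices p, q and the constant sign (+1 or -1) of I[p] - I[q].
    """
    rng = np.random.default_rng(seed)
    frames = np.asarray(frames, dtype=np.int64)
    t, h, w = frames.shape
    p = rng.integers(0, h * w, size=n_pairs)
    q = rng.integers(0, h * w, size=n_pairs)
    flat = frames.reshape(t, -1)
    diff = flat[:, p] - flat[:, q]  # shape (T, n_pairs)
    # A pair is stable if the difference stays beyond the margin on one side in every frame.
    always_pos = np.all(diff > margin, axis=0)
    always_neg = np.all(diff < -margin, axis=0)
    keep = always_pos | always_neg
    signs = np.where(always_pos[keep], 1, -1)
    return p[keep], q[keep], signs
```

For background subtraction, a new frame would be compared against the stored signs: pairs whose sign flips suggest that one of the two points is covered by a foreground object.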
  • Gou Koutaki, Keiichi Uchimura
    2013, Volume 133, Issue 1, pp. 134-141
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    We propose a new template matching method that accurately detects the XY displacement and the rotation angle of a chipped target object against a complex background. The proposed method is based on the eigen template method, which compresses many rotated templates using eigendecomposition. The eigen template method can detect the target object with the same accuracy as Normalized Cross Correlation (NCC) while requiring less computation time. However, it is not robust to occluded (chipped) targets or complex backgrounds. In this paper, we improve the eigen template method by using a shape-based image similarity. Simulation experiments show that the proposed method can detect the target object correctly in complex images.
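For reference, the NCC baseline mentioned in this abstract can be sketched as a brute-force search over translations. The sketch uses the zero-mean variant of NCC and does not handle rotation; it is neither the eigen template method nor the proposed shape-based similarity, and the function names are hypothetical.

```python
import numpy as np


def ncc(patch, template):
    """Zero-mean normalized cross correlation between two equally sized arrays."""
    a = patch.astype(float) - patch.mean()
    b = template.astype(float) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A flat patch or template has no contrast, so the score is defined as 0 here.
    return 0.0 if denom == 0 else float((a * b).sum() / denom)


def match_template(image, template):
    """Slide the template over the image and return (best_score, (row, col))."""
    ih, iw = image.shape
    th, tw = template.shape
    best_score, best_pos = -2.0, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            score = ncc(image[y:y + th, x:x + tw], template)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_score, best_pos
```

Handling rotation with this baseline would mean repeating the search for every rotated copy of the template, which is the cost that compressing the rotated templates is meant to reduce.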
  • —Towards Clickable Real World—
    Atsushi Shimada, Vincent Charvillat, Hajime Nagahara, Rin-ichiro Tanig ...
    2013, Volume 133, Issue 1, pp. 142-149
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    Clickable Real World is a new framework for intuitive information search with a mobile terminal. To achieve this goal, we tackle two challenging tasks. The first is landmark detection in an observed scene. Our approach detects a landmark based on an image prior. The prior is not given manually; instead, it is generated automatically from training samples collected from a photo sharing website. The second task is image annotation assisted by geolocation. We use the location of the user of the mobile terminal and the geolocations where the training sample images were taken. Two probabilistic models are generated to perform image annotation. One is image-based labeling, which utilizes the co-occurrence between image features and label features. The other is label-based localization, which uses the consensus among photographers about the labels given around a geolocation. We combine the two probabilistic approaches to improve the accuracy of image annotation. We demonstrate this approach on 87 scenes around the world.
  • Fitri Utaminingrum, Keiichi Uchimura, Gou Koutaki
    2013, Volume 133, Issue 1, pp. 150-158
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    This research aimed to reduce noise in images corrupted by Gaussian noise, which can change the image pixel data as a whole. We introduced a Hybrid Filter based on fuzzy methods to reduce Gaussian noise. The hybrid filter combines the Fuzzy Aliasing Filter (FAF) and the Mean Impulse Fuzzy (MIF) filter. MIF processes the degree of a linear membership function on the degraded image using a 3×3 window. In addition, the highest and lowest values in the 3×3 window are replaced with the average of the window elements, computed without including the highest and lowest elements. The MIF method was more suitable when the noise variance exceeded 20%; conversely, FAF was used when the noise variance was below 20%. The degree of the membership function in FAF was obtained from a Gaussian membership function. The FAF method adopted the Aliasing Filter technique and a linear approach that used the mean value of the regional block. The quality of the Hybrid Filter was compared with that of the Weighting Mean Filter, Adaptive Wiener Filter, Optimum Aliasing Filter, and Optimum Weighting Gaussian Aliasing Filter. In this comparison, our method was the most effective at reducing Gaussian noise.
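The window operation described for MIF, replacing the highest and lowest values in a 3×3 window with the mean of the remaining elements, can be sketched as follows. This covers only that single step under our reading of the abstract; the fuzzy membership processing is omitted, and the function name `replace_extremes` is hypothetical.

```python
import numpy as np


def replace_extremes(window):
    """Replace the max and min of a window with the mean of the other elements."""
    w = np.asarray(window, dtype=float)
    flat = w.flatten()  # flatten() always returns a copy, so the input is untouched
    i_max = int(flat.argmax())
    i_min = int(flat.argmin())
    if i_max == i_min:
        return flat.reshape(w.shape)  # constant window: nothing to replace
    mask = np.ones(flat.size, dtype=bool)
    mask[[i_max, i_min]] = False
    trimmed_mean = flat[mask].mean()  # mean of the 7 remaining elements for a 3x3 window
    flat[i_max] = trimmed_mean
    flat[i_min] = trimmed_mean
    return flat.reshape(w.shape)
```

When several elements tie for the maximum or minimum, only the first occurrence of each is replaced in this sketch.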
  • Darlis Herumurti, Keiichi Uchimura, Gou Koutaki, Takumi Uemura
    2013, Volume 133, Issue 1, pp. 159-168
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    In this paper, we introduce another approach to road extraction from Digital Surface Model (DSM) data. DSM data is based on the elevation of the surface, and the benefit of using it is that it avoids problems caused by the shadows of buildings, trees, and so on. For road extraction, we use a fundamental technique based on segmentation. The first approach employs the Adaptive Resonance Theory (ART) model, using the Fuzzy ART and Symmetric Fuzzy ART (S Fuzzy ART) methods, which perform unsupervised learning on analog patterns. However, this method must be followed by a labeling process to separate regions that belong to the same cluster but lie in different areas, and it therefore requires a relatively long processing time. The second approach to segmentation uses the region growing method based on a similarity criterion. A threshold must be provided to measure the homogeneity of a region with its neighbors, but determining a threshold is not easy. In this paper, we propose a Mixed ART that combines the Fuzzy ART and S Fuzzy ART methods. Furthermore, we combine the Mixed ART method with the Region Growing method to improve performance. The combined method uses Region Growing for the segmentation process and the resonance approach for the homogeneity measurement. An advantage of the Region Growing method is that the seed point can be controlled to achieve satisfactory performance in extracting the road. The experimental results show that the proposed method runs up to four times faster without sacrificing quality.
Papers
<Biomedical Engineering, Welfare Engineering>
<Systems, Instrumentation, Control>
<Speech and Image Processing, Recognition>
  • 藤井 淳広, 中本 昌由, 棟安 実治, 大野 修一
    2013, Volume 133, Issue 1, pp. 185-192
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    In this paper, we consider a data-embedding method in the discrete wavelet transform (DWT) domain of digital images. A bit-pattern embedding method that uses the human visual system (HVS) and correlation-based detection was proposed by Kino and Wada on the basis of a watermarking scheme; we refer to it as the conventional method. In the conventional method, the capacity for embedded data depends on the ‘potential’ correlation between the DWT coefficients of the host image and the watermark signal. To increase this capacity, we propose a new embedding and detection algorithm that is not affected by the ‘potential’ correlation. Data embedding and detection in the proposed method are performed by considering the absolute value of the correlation between the watermarked DWT coefficients and the watermark signal. In simulations, we show that the proposed method has a larger embedding capacity than the conventional method at the same PSNR value (image quality).
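Since the comparison above is made at equal PSNR, a short sketch of the standard PSNR computation may help. It assumes 8-bit images with a peak value of 255; the function name `psnr` and its `peak` parameter are our own choices, not taken from the paper.

```python
import numpy as np


def psnr(original, modified, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(modified, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

A higher PSNR means the watermarked image is closer to the host image, so holding PSNR fixed makes the capacity comparison fair with respect to image quality.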
  • 山本 琢麿, 服部 公央亮, 田口 亮, 保黒 政大, 梅崎 太造
    2013, Volume 133, Issue 1, pp. 193-199
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    The face is one of the most important factors in communicating with people, and many face recognition systems have therefore been developed. These systems use images captured by a camera to recognize a person. However, facial images vary because of external factors such as body position, ambient lighting, and facial expression. These variations reduce recognition accuracy, so face recognition systems need to overcome this problem. Three-dimensional face data can solve the problem, but three-dimensional measuring systems are expensive. We propose a method that estimates three-dimensional face data from a two-dimensional image captured by a camera. The method uses an artificial neural network that has learned the relationship between two-dimensional facial images and three-dimensional facial data. The training data for the artificial neural network are measured with a CCD camera and a laser range finder. The experimental results show that the proposed method is effective.
  • Senaka Amarakeerthi, Chamin Morikawa, Tin Lay Nwe, Liyanage C. De Silv ...
    2013, Volume 133, Issue 1, pp. 200-210
    Published: 2013/01/01
    Released: 2013/01/01
    Journal, Free access
    Since the earliest studies of human behavior, emotions have attracted attention of researchers in many disciplines, including psychology, neuroscience, and lately computer science. Speech is considered a salient conveyor of emotional cues, and can be used as an important source for emotional studies. Speech is modulated for different emotions by varying frequency- and energy-related acoustic parameters such as pitch, energy, and formants. In this paper, we explore analyzing inter- and intra-subband energy variations to differentiate six emotions. The emotions considered are anger, disgust, fear, happiness, neutral, and sadness. In this research, Two-Layered Cascaded Subband Cepstral Coefficients (TLCS-CC) analysis was introduced to study energy variations within low and high arousal emotions as a novel approach for emotion classification. The new approach was compared with Mel frequency cepstral coefficients (MFCC) and log frequency power coefficients (LFPC). Experiments were conducted on the Berlin Emotional Data Corpus (BECD). With energy-related features, we could achieve average accuracy of 73.9% and 80.1% for speaker-independent and -dependent emotion classification respectively.
<Soft Computing, Learning>
 
Society Articles
 