IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E101.D , Issue 5
Special Section on Machine Vision and its Applications
  • Norimichi UKITA
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1221
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS
  • Hiroshi FUKUI, Takayoshi YAMASHITA, Yuji YAMAUCHI, Hironobu FUJIYOSHI, ...
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1222-1231
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Pedestrian attribute information is important for an advanced driver assistance system (ADAS). Pedestrian attributes such as body pose, face orientation and open umbrella indicate the intended action or state of the pedestrian. Generally, this information is recognized using independent classifiers for each task. Performing all of these separate tasks is too time-consuming at the testing stage, and the processing time increases with the number of tasks. To address this problem, multi-task learning or heterogeneous learning is performed to train a single classifier to perform multiple tasks. In particular, heterogeneous learning is able to simultaneously train a classifier to perform regression and recognition tasks, which reduces both training and testing time. However, heterogeneous learning tends to result in a lower accuracy rate for classes with few training samples. In this paper, we propose a method to improve the performance of heterogeneous learning for such classes. We introduce a rarity rate based on the importance and class probability of each task. The appropriate rarity rate is assigned to each training sample. The samples in a mini-batch for training a deep convolutional neural network are then augmented according to this rarity rate to focus on the classes with few samples. Our heterogeneous learning approach with the rarity rate performs pedestrian attribute recognition better, especially for classes with few training samples.

  • Atsushi KAWASAKI, Kosuke HARA, Hideo SAITO
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1232-1242
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    We propose a method of line-based Simultaneous Localization and Mapping (SLAM) using non-overlapping multiple cameras for vehicles running in an urban environment. It uses corresponding line segments between images taken in different frames and by different cameras. The contribution is a novel line segment matching algorithm that uses warping based on urban structures. This idea significantly improves the accuracy of line segment matching when viewing directions are very different, so that a number of correspondences between front-view and rear-view cameras can be found and the accuracy of SLAM can be improved. Additionally, to enhance the accuracy of SLAM, we apply a geometrical constraint of urban areas to the initial estimation of the 3D mapping of line segments and to optimization by bundle adjustment. We can further improve the accuracy of SLAM by combining points and lines. The position error is stable within 1.5 m for the entire image dataset evaluated in this paper. The estimation accuracy of our method is as high as that of the ground truth captured by RTK-GPS. Our high-accuracy SLAM algorithm can be applied to generating a road map represented by line segments. According to an evaluation of our generated map, a true positive rate exceeding 70% is achieved around the vehicle.

  • Naoki HOSOYA, Atsushi MIYAMOTO, Junichiro NAGANUMA
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1243-1250
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Nuclear power plants require in-vessel inspections for soundness checks and preventive maintenance. One inspection procedure is visual testing (VT), which is based on video images from an underwater camera in a nuclear reactor. However, a lot of noise is superimposed on VT images due to radiation exposure. We propose a technique for improving the quality of those images by image processing that reduces radiation noise and enhances signals. Real-time video processing was achieved by applying the proposed technique with a parallel processing unit. Improving the clarity of VT images will reduce the burden on inspectors.

  • Ziwei DENG, Yilin HOU, Xina CHENG, Takeshi IKENAGA
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1251-1259
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    3D ball tracking is of great significance in ping-pong game analysis and can be used in applications such as TV content and tactic analysis, some of which require real-time implementation. This paper proposes a CPU-GPU platform based Particle Filter for multi-view ball tracking, including 4 proposals. The multi-peak estimation and the ball-like observation model are proposed in the algorithm design. The multi-peak estimation aims at obtaining a precise ball position in case the particles' likelihood distribution has multiple peaks under complex circumstances. The ball-like observation model, with 4 different likelihood evaluations, uses the ball's unique features to evaluate the particle's similarity with the target. In the GPU implementation, the double-queue structure and the vectorized data combination are proposed. The double-queue structure aims at achieving task parallelism between some data-independent tasks. The vectorized data combination reduces the time cost of memory access by combining 3 different image data into 1 vector. Experiments are based on ping-pong videos recorded in an official match by 4 cameras located in the 4 corners of the court. The tracking success rate reaches 99.59% on CPU. With the GPU acceleration, the time consumption is 8.8 ms/frame, which is a speedup by a factor of 98 compared with the CPU version.

  • Tingting HU, Takeshi IKENAGA
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1260-1269
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    High frame rate and ultra-low delay matching systems play an increasingly important role in human-machine interactive applications, which call for a higher frame rate and lower delay for a better experience. The large amount of data to process and the complex computation in a local feature based matching system make it difficult to achieve high processing speed and ultra-low delay matching with limited resources. Aiming at a matching system with a processing speed of more than 1000 fps and a delay of less than 1 ms/frame, this paper puts forward a local binary feature based matching system on a field-programmable gate array (FPGA). Pixel selection based 4-1-4 parallel matching and intensity directed symmetry are proposed for the implementation of this system. To design a basic framework with high processing speed and ultra-low delay using limited resources, pixel selection based 4-1-4 parallel matching is proposed, which makes it possible to achieve four-thread processing with only one-thread resource consumption. Assuming that the orientation of the keypoint will bisect the patch best and will point to the region with high intensity, intensity directed symmetry is proposed to calculate the keypoint orientation in a hardware-friendly way, which is an important part of a rotation-robust matching system. Software experiment results show that the proposed keypoint orientation calculation method achieves almost the same performance as the state-of-the-art intensity centroid orientation calculation method in a matching system. Hardware experiment results show that the designed image processing core can process VGA (640×480) videos at a processing speed of 1306 fps and with a delay of 0.8083 ms/frame.
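
    For readers unfamiliar with the intensity centroid baseline mentioned in this abstract, the following is a minimal sketch of that general technique, not of the proposed intensity directed symmetry. It takes the orientation as the angle from the patch center to the intensity-weighted centroid. The function name, the use of a full square patch (rather than a circular one) and the list-of-rows image format are illustrative assumptions.

```python
import math

def intensity_centroid_orientation(patch):
    """Return the orientation (radians) of a grayscale patch.

    patch is a list of rows of pixel intensities (an assumed format).
    The angle points from the patch center toward the intensity centroid.
    """
    height = len(patch)
    width = len(patch[0])
    center_y = (height - 1) / 2.0
    center_x = (width - 1) / 2.0
    m10 = 0.0  # first-order moment along x, relative to the center
    m01 = 0.0  # first-order moment along y, relative to the center
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            m10 += (x - center_x) * value
            m01 += (y - center_y) * value
    return math.atan2(m01, m10)
```

    A patch that is brighter on its right side yields an angle near 0, and one that is brighter at the bottom yields an angle near π/2 (image rows grow downward in this sketch).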

  • Xianxu HOU, Jiasong ZHU, Ke SUN, Linlin SHEN, Guoping QIU
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1270-1277
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Motivated by the observation that certain convolutional channels of a Convolutional Neural Network (CNN) exhibit object-specific responses, we seek to discover and exploit the convolutional channels of a CNN in which neurons are activated by the presence of specific objects in the input image. We have developed a method for explicitly fine-tuning a pre-trained CNN to induce an object-specific channel (OSC) and systematically identifying it for human faces. In this paper, we introduce a multi-scale approach to constructing robust face heatmaps based on OSC features for rapidly filtering out non-face regions, thus significantly improving search efficiency for face detection. We show that multi-scale OSC can be used to develop simple and compact face detectors in unconstrained settings with state-of-the-art performance.

  • Taishi OGAWA, Atsushi NAKAZAWA, Toyoaki NISHIDA
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1278-1287
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    We present a human point of gaze estimation system using corneal surface reflection and omnidirectional images taken by spherical panorama cameras, which have become popular in recent years. Our system makes it possible to find where a user is looking in a 360° surrounding scene image from an eye image alone, and thus does not need the gaze mapping from partial scene images to a whole scene image that is necessary in conventional eye gaze tracking systems. We first generate multiple perspective scene images from an omnidirectional (equirectangular) image and perform registration between the corneal reflection and perspective images using a corneal reflection-scene image registration technique. We then compute the point of gaze using a corneal imaging technique leveraged by a 3D eye model, and project the point to an omnidirectional image. The 3D eye pose is estimated using a particle-filter-based tracking algorithm. In experiments, we evaluated the accuracy of the 3D eye pose estimation, the robustness of registration and the accuracy of PoG estimation using two indoor and five outdoor scenes, and found that the gaze mapping error was 5.546 [deg] on average.
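
    The projection of a point onto an omnidirectional (equirectangular) image can be illustrated with the standard longitude/latitude mapping. The sketch below is generic geometry, not the authors' implementation. The axis convention (x right, y up, z forward), the function name and the image size in the usage note are assumptions.

```python
import math

def direction_to_equirect(x, y, z, width, height):
    """Map a 3D viewing direction to equirectangular pixel coordinates.

    Assumed convention: x points right, y up, z forward. The forward
    direction lands at the image center, and the top row is straight up.
    """
    norm = math.sqrt(x * x + y * y + z * z)
    longitude = math.atan2(x, z)    # in (-pi, pi]
    latitude = math.asin(y / norm)  # in [-pi/2, pi/2]
    u = (longitude / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - latitude / math.pi) * height
    return u, v
```

    With a hypothetical 2000×1000 panorama, the forward direction (0, 0, 1) maps to the center (1000, 500), and a direction pointing right maps to three quarters of the image width.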

  • Ichraf LAHOULI, Robby HAELTERMAN, Joris DEGROOTE, Michal SHIMONI, Geer ...
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1288-1295
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Video surveillance from airborne platforms can suffer from many sources of blur, like vibration, low-end optics, uneven lighting conditions, etc. Many different algorithms have been developed in the past that aim to recover the deblurred image but often incur substantial CPU-time, which is not always available on-board. This paper shows how a “strap-on” quasi-Newton method can accelerate the convergence of existing iterative methods with little extra overhead while keeping the performance of the original algorithm, thus paving the way for (near) real-time applications using on-board processing.

  • Masahiro YAMAGUCHI, Trong Phuc TRUONG, Shohei MORI, Vincent NOZICK, Hi ...
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1296-1307
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    In this paper, we propose a method to generate a three-dimensional (3D) thermal map and RGB + thermal (RGB-T) images of a scene from thermal-infrared and RGB images. The scene images are acquired by moving both an RGB camera and a thermal-infrared camera mounted on a stereo rig. Before capturing the scene with those cameras, we estimate their respective intrinsic parameters and their relative pose. Then, we reconstruct the 3D structures of the scene by applying Direct Sparse Odometry (DSO) to the RGB images. In order to superimpose thermal information onto each point generated from DSO, we propose a method for estimating the scale of the point cloud corresponding to the extrinsic parameters between both cameras by matching depth images recovered from the RGB camera and the thermal-infrared camera based on mutual information. We also generate RGB-T images using the 3D structure of the scene and Delaunay triangulation. We do not rely on depth cameras and, therefore, our technique is not limited to scenes within the measurement range of depth cameras. To demonstrate this technique, we generate 3D thermal maps and RGB-T images for both indoor and outdoor scenes.

  • Yoshikatsu NAKAJIMA, Hideo SAITO
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1308-1316
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    We propose a novel object recognition system that is able to (i) work in real-time while reconstructing segmented 3D maps and simultaneously recognizing objects in a scene, (ii) manage various kinds of objects, including those with smooth surfaces and those with a large number of categories, utilizing a CNN for feature extraction, and (iii) maintain high accuracy no matter how the camera moves by distributing the viewpoints for each object uniformly and aggregating recognition results from each distributed viewpoint with equal weight. Through experiments, the advantages of our system with respect to current state-of-the-art object recognition approaches are demonstrated on the UW RGB-D Dataset and Scenes and on our own scenes prepared to verify the effectiveness of the Viewpoint-Class-based approach.

  • Gibran BENITEZ-GARCIA, Tomoaki NAKAMURA, Masahide KANEKO
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1317-1324
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    An increasing number of psychological studies have demonstrated that the six basic expressions of emotions are not culturally universal. However, automatic facial expression recognition (FER) systems disregard these findings and assume that facial expressions are universally expressed and recognized across different cultures. Therefore, this paper presents an analysis of Western-Caucasian and East-Asian facial expressions of emotions based on visual representations and cross-cultural FER. The visual analysis builds on the Eigenfaces method, and the cross-cultural FER combines appearance and geometric features by extracting Local Fourier Coefficients (LFC) and Facial Fourier Descriptors (FFD) respectively. Furthermore, two possible solutions for FER under multicultural environments are proposed. These are based on early race detection, and on independent models for culture-specific facial expressions found by the analysis evaluation. HSV color quantization combined with LFC and FFD composes the feature extraction for race detection, whereas culture-independent models of anger, disgust and fear are analyzed for the second solution. All tests were performed using Support Vector Machines (SVM) for classification and evaluated using five standard databases. Experimental results show that both solutions improve on the accuracy of FER systems under multicultural environments. However, the approach which individually considers the culture-specific facial expressions achieved the highest recognition rate.

  • Kazunori AOKI, Wataru OHYAMA, Tetsushi WAKABAYASHI
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1325-1332
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    A logo is a symbolic presentation that is designed not only to identify a product manufacturer but also to attract the attention of shoppers. Shoe logos are a challenging subject for automatic extraction and recognition using image analysis techniques because they have characteristics that distinguish them from those of other products; that is, there is much within-class variation in the appearance of shoe logos. In this paper, we propose an automatic extraction and recognition method for shoe logos with a wide variety of appearance using a limited number of training samples. The proposed method employs maximally stable extremal regions for the initial region extraction, an iterative algorithm for region grouping, and gradient features and a support vector machine for logo recognition. The results of performance evaluation experiments using a logo dataset that consists of a wide variety of appearances show that the proposed method achieves promising performance for both logo extraction and recognition.

  • Takumi EGE, Keiji YANAI
    Type: PAPER
    Subject area: Machine Vision and its Applications
    2018 Volume E101.D Issue 5 Pages 1333-1341
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Recently, mobile applications for recording everyday meals have drawn much attention for dietary self-management. However, most of these applications return food calorie values simply associated with the estimated food categories, or require users to indicate the rough amount of food manually. In fact, estimating food calories from a food photo with practical accuracy has not been achieved, and it remains an unsolved problem. In this paper, we propose estimating food calories from a food photo by simultaneous learning of food calories, categories, ingredients and cooking directions using deep learning. Since there generally exists a strong correlation between food calories and food categories, ingredients and cooking directions, we expect that training them simultaneously boosts performance compared to independent single-task training. To this end, we use a multi-task CNN. In addition, we construct two datasets: a dataset of calorie-annotated recipes collected from Japanese recipe sites on the Web and a dataset collected from an American recipe site. In the experiments, we trained both multi-task and single-task CNNs and compared them. As a result, the multi-task CNN achieved better performance than the single-task CNNs on both food category estimation and food calorie estimation. Compared to the result of the single-task CNN, introducing the multi-task CNN improved the correlation coefficient by 0.039 for the Japanese recipe dataset and by 0.090 for the American recipe dataset. In addition, we showed that the proposed multi-task CNN based method outperformed previously proposed search-based methods.

Regular Section
  • Hainan ZHANG, Yanjing SUN, Song LI, Wenjuan SHI, Chenglong FENG
    Type: PAPER
    Subject area: Fundamentals of Information Systems
    2018 Volume E101.D Issue 5 Pages 1342-1349
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Correlation filter-based trackers with an appearance model established by a single feature have poor robustness to challenging video environments, which include factors such as occlusion, fast motion and out-of-view. In this paper, a long-term tracking algorithm based on multi-feature adaptive fusion for video targets is presented. We design a robust appearance model by fusing powerful features including histogram of gradient, local binary pattern and color-naming at the response map level to overcome the interference in the video. In addition, a random fern classifier is trained as a re-detector to detect the target when tracking failure occurs, so that long-term tracking is implemented. We evaluate our algorithm on large-scale benchmark datasets, and the results show that the proposed algorithm has more accurate and more robust performance in complex video environments.

  • Yuma SAKAKIBARA, Shin MORISHIMA, Kohei NAKAMURA, Hiroki MATSUTANI
    Type: PAPER
    Subject area: Computer System
    2018 Volume E101.D Issue 5 Pages 1350-1360
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Engineers and researchers have recently paid attention to Blockchain. Blockchain is a fault-tolerant distributed ledger without administrators. Blockchain is originally derived from cryptocurrency, but it can be applied to other industries. Transferring a digital asset is called a transaction. Blockchain holds all transactions, so the total amount of Blockchain data will increase as time proceeds. On the other hand, the number of Internet of Things (IoT) products has been increasing. It is difficult for IoT products to hold all Blockchain data because of their limited storage capacity. Therefore, they access Blockchain data via servers that hold it. However, if a lot of IoT products access the Blockchain network via servers, server overloads will occur. Thus, it is useful to reduce workloads and improve throughput. In this paper, we propose a caching technique using a Field Programmable Gate Array (FPGA)-based Network Interface Card (NIC) which possesses four 10 Gigabit Ethernet (10GbE) interfaces. The proposed system can reduce server overloads, because the FPGA NIC responds to requests from IoT products instead of the server on a cache hit. We implemented the proposed hardware cache to achieve high throughput on a NetFPGA-10G board. As an evaluation, we counted the number of requests that the server or the FPGA NIC processed. As a result, the throughput improved by 1.97 times on average when hitting the cache.

  • Li QUAN, Zhi-liang WANG, Xin LIU
    Type: PAPER
    Subject area: Data Engineering, Web Information Systems
    2018 Volume E101.D Issue 5 Pages 1361-1369
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Reinforcement learning has been used for adaptive service composition. However, traditional algorithms are not suitable for large-scale service composition. Based on the Q-learning algorithm, a multi-task oriented algorithm named multi-Q learning is proposed to realize a subtask-assistance strategy for large-scale and adaptive service composition. Unlike previous studies that focus on one task, we take the relationship between multiple service composition tasks into account. We decompose a complex service composition task into multiple subtasks according to graph theory. Different tasks with the same subtasks can assist each other to improve their learning speed. The results of experiments show that our algorithm learns noticeably faster than the traditional Q-learning algorithm. Compared with multi-agent Q-learning, our algorithm also converges faster. Moreover, for all involved service composition tasks that share subtasks with each other, our algorithm can improve their speed of learning the optimal policy simultaneously in real-time.
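
    As background for the Q-learning baseline named in this abstract, here is a minimal sketch of the standard tabular Q-learning update. It is not the multi-Q learning algorithm itself, whose subtask-assistance details are not given here. The function name, the dictionary-based table and the default learning rate and discount factor are illustrative assumptions.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new value.

    q maps (state, action) pairs to estimated returns; missing entries
    are treated as 0.0. alpha is the learning rate, gamma the discount.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old_value = q.get((state, action), 0.0)
    new_value = old_value + alpha * (reward + gamma * best_next - old_value)
    q[(state, action)] = new_value
    return new_value
```

    Starting from an empty table, a reward of 1.0 moves the entry for the visited state-action pair from 0.0 to 0.1 with these defaults. Later updates also fold in the discounted best value of the next state.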

  • Sho MIZUNO, Mitsuhiro HATADA, Tatsuya MORI, Shigeki GOTO
    Type: PAPER
    Subject area: Information Network
    2018 Volume E101.D Issue 5 Pages 1370-1379
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Damage caused by malware has become a serious problem. The recent rise in the spread of evasive malware has made it difficult to detect at the pre-infection stage. Malware detection at the post-infection stage is a promising approach that fills this gap. Given this background, this work aims to identify likely malware-infected devices from the measurement of Internet traffic. The advantage of the traffic-measurement-based approach is that it enables us to monitor a large number of endhosts. If we find that an endhost is a source of malicious traffic, the endhost is likely a malware-infected device. Since the majority of malware today makes use of the web as a means to communicate with the C&C servers that reside on the external network, we leverage information recorded in the HTTP headers to discriminate between malicious and benign traffic. To make our approach scalable and robust, we develop an automatic template generation scheme that drastically reduces the amount of information to be kept while achieving high classification accuracy; since it does not make use of any domain knowledge, the approach should be robust against changes in malware. We apply several classifiers, which include machine learning algorithms, to the extracted templates and classify traffic into two categories: malicious and benign. Our extensive experiments demonstrate that our approach discriminates between malicious and benign traffic with up to 97.1% precision while maintaining the false positive rate below 1.0%.

  • Qian LI, Xiaojuan LI, Bin WU, Yunpeng XIAO
    Type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2018 Volume E101.D Issue 5 Pages 1380-1392
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    In social networks, predicting user behavior under social hotspots can aid in understanding the development trend of a topic. In this paper, we propose a retweeting prediction method for social hotspots based on tensor decomposition, using user information, relationship and behavioral data. The method can be used to predict the behavior of users and analyze the evolution of topics. Firstly, we propose a tensor-based mechanism for mining user interaction, and then we use the tensor to solve the problem of inaccuracy that arises when calculating interaction intensity for sparse user interaction data. At the same time, we can analyze the influence of the following relationship on the interaction between users based on characteristics of the tensor in data space conversion and projection. Secondly, a time decay function is introduced for the tensor to further quantify the evolution of user behavior in current social hotspots. That function can be fit to the behavior of a user dynamically, and can also solve the problem of interaction between users decaying with time. Finally, we invoke time slices and discretization of the topic life cycle and construct a user retweeting prediction model based on logistic regression. In this way, we can both explore the temporal characteristics of user behavior in social hotspots and solve the problem of uneven interaction behavior between users. Experiments show that the proposed method can effectively improve the accuracy of user behavior prediction and aid in understanding the development trend of a topic.

  • Ping ZENG, Qingping TAN, Xiankai MENG, Haoyu ZHANG, Jianjun XU
    Type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2018 Volume E101.D Issue 5 Pages 1393-1400
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Determining the validity of knowledge triples and filling in the missing entities or relationships in the knowledge graph are crucial tasks for large-scale knowledge graph completion. So far, the main solutions use machine learning methods to learn low-dimensional distributed representations of entities and relationships to complete the knowledge graph. Among them, translation models obtain excellent performance. However, existing translation models do not adequately consider the indirect relationships among entities, which affects the precision of the representation. Based on the long short-term memory neural network and existing translation models, we propose a multiple-module hybrid neural network model called TransP. By modeling the entity paths and their relationship paths, TransP can effectively mine the indirect relationships among the entities, and thus improve the quality of knowledge graph completion. Experimental results show that TransP outperforms state-of-the-art models in the entity prediction task, and achieves performance comparable to previous models in the relationship prediction task.
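
    The idea behind translation models can be sketched with the TransE-style score, in which a triple (head, relation, tail) is plausible when head + relation lies close to tail in the embedding space. This is an assumed, well-known representative of the model family, not the TransP model described above. The function name and the plain-list vectors are illustrative.

```python
import math

def translation_score(head, relation, tail):
    """Return the Euclidean distance between head + relation and tail.

    Lower scores mean the triple fits the translation assumption better.
    All three arguments are equal-length lists of floats.
    """
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))
```

    For example, with head [1, 0] and relation [0, 1], the tail [1, 1] scores 0 (a perfect fit) while the tail [4, 5] scores 5.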

  • Takahiro TANAKA, Kazuhiro FUJIKAKE, Takashi YONEKAWA, Misako YAMAGISHI ...
    Type: PAPER
    Subject area: Human-computer Interaction
    2018 Volume E101.D Issue 5 Pages 1401-1409
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    In recent years, the number of traffic accidents caused by elderly drivers has increased in Japan. However, cars are an important mode of transportation for the elderly. Therefore, to ensure safe driving, a system that can assist elderly drivers is required. We propose a driver-agent system that provides support to elderly drivers during and after driving and encourages them to improve their driving. This paper describes the prototype system and the analysis conducted of the teaching records of a human instructor, the impression caused by the instructions on a subject during driving, and subjective evaluation of the driver-agent system.

  • Ruisheng RAN, Bin FANG, Xuegang WU
    Type: PAPER
    Subject area: Pattern Recognition
    2018 Volume E101.D Issue 5 Pages 1410-1420
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Neighborhood preserving embedding (NPE) is a widely used manifold dimensionality reduction technique. However, NPE has two problems. One is that it suffers from the small-sample-size (SSS) problem. The other is that the performance of NPE is highly sensitive to the neighborhood size k. To overcome these two problems, an exponential neighborhood preserving embedding (ENPE) is proposed in this paper. The main idea of ENPE is to introduce the matrix exponential into NPE, so that the SSS problem is avoided and low sensitivity to the neighborhood size k is obtained. The experiments are conducted on the ORL, Georgia Tech and AR face databases. The results show that ENPE performs better than other unsupervised methods, such as PCA, LPP, ELPP and NPE. In addition, ENPE is much less sensitive to the neighborhood parameter k than the unsupervised manifold learning methods LPP, ELPP and NPE.
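
    One general property that helps explain why a matrix exponential sidesteps singular matrices is that exp(A) is always invertible, since det(exp(A)) = exp(trace(A)) > 0. The sketch below checks this numerically for a singular 2×2 matrix using a truncated Taylor series. It illustrates the property only and is not the ENPE algorithm. The helper names and the number of series terms are assumptions.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=30):
    """Approximate exp(a) by summing the first `terms` Taylor terms."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term a^k / k!, starts at I
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]
```

    For the singular matrix [[1, 1], [1, 1]] (determinant 0, trace 2), the determinant of its exponential equals e², so the exponentiated matrix can be inverted even though the original cannot.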

  • Chen QU, Duyan BI
    Type: PAPER
    Subject area: Image Processing and Video Processing
    2018 Volume E101.D Issue 5 Pages 1421-1429
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Focusing on the defects of well-known defogging algorithms for foggy images based on the atmospheric scattering model, we find that it is necessary to obtain an accurate transmission map that reflects the real depths at both large depth and close range. It is hard to achieve this with just one prior because of the differences between the large-depth and close-range regions in foggy images. Hence, we propose a novel prior, called the saturation prior, that simplifies the solution of the transmission map through a transferring coefficient. Then, under the Random Walk model, we constrain the transferring coefficient with the color attenuation prior, which can obtain a good transmission map in large-depth regions. More importantly, we design a regularization weight to balance the influences of the saturation prior and the color attenuation prior on the transferring coefficient. Experimental results demonstrate that the proposed defogging method outperforms the state-of-the-art image defogging methods based on a single prior in terms of detail restoration and color preservation.
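
    The atmospheric scattering model referred to in this abstract is commonly written as I = J·t + A·(1 − t), where I is the observed intensity, J the scene radiance, t the transmission and A the atmospheric light. The sketch below inverts it for a single pixel value once t and A are known. How t is estimated (the paper's priors) is not shown. The function names and the lower bound on t, a common safeguard against division by very small values, are assumptions.

```python
def apply_fog(radiance, airlight, transmission):
    """Forward model: I = J * t + A * (1 - t)."""
    return radiance * transmission + airlight * (1.0 - transmission)

def recover_radiance(observed, airlight, transmission, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A."""
    t = max(transmission, t_min)
    return (observed - airlight) / t + airlight
```

    Applying `apply_fog` to a radiance of 0.2 with airlight 0.9 and transmission 0.5 gives an observed value of 0.55, and `recover_radiance` maps that value back to 0.2.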

    Download PDF (2225K)
  • Soh YOSHIDA, Takahiro OGAWA, Miki HASEYAMA, Mitsuji MUNEYASU
    Type: PAPER
    Subject area: Image Processing and Video Processing
    2018 Volume E101.D Issue 5 Pages 1430-1440
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Video reranking is an effective way of improving the retrieval performance of text-based video search engines. This paper proposes a graph-based Web video search reranking method with local and global consistency analysis. Generally, the graph-based reranking approach constructs a graph whose nodes and edges respectively correspond to videos and their pairwise similarities. Many reranking methods are built on a scheme that regularizes the smoothness of pairwise relevance scores between adjacent nodes with regard to a user's query. However, since the overall consistency is measured by aggregating only the local consistency over each pair, errors in score estimation increase when noisy samples are included among the neighbors of query-relevant videos. To deal with the noisy samples, the proposed method leverages the global consistency of the graph structure, unlike the conventional methods. Specifically, in order to detect this consistency, the proposed method introduces a spectral clustering algorithm that can detect, on the graph, groups of videos with strong semantic correlation. Furthermore, a new regularization term, which smooths ranking scores within the same group, is introduced to the reranking framework. Since the score regularization is performed from both local and global aspects simultaneously, accurate score estimation becomes feasible. Experimental results obtained by applying the proposed method to a real-world video collection show its effectiveness.
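
    The idea of combining local (pairwise) and global (group-wise) score smoothing can be illustrated with a small sketch. This is not the authors' formulation: the update rule, the function `rerank`, and the parameters `alpha` and `beta` are assumptions made for illustration, and the group labels are supplied by hand rather than obtained by spectral clustering.

```python
def rerank(initial, weights, groups, alpha=0.5, beta=0.5, iterations=50):
    """Iteratively smooth relevance scores on a similarity graph.

    initial: text-based relevance score per video
    weights: symmetric similarity matrix (zero diagonal)
    groups:  group label per video (assumed given; hypothetical stand-in
             for the output of a clustering step)
    alpha:   strength of local smoothing over graph neighbors
    beta:    strength of global smoothing within each group
    """
    n = len(initial)
    scores = list(initial)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            degree = sum(weights[i])
            if degree > 0:
                # Similarity-weighted average of neighbor scores
                local = sum(weights[i][j] * scores[j] for j in range(n)) / degree
            else:
                local = scores[i]
            members = [j for j in range(n) if groups[j] == groups[i]]
            group_mean = sum(scores[j] for j in members) / len(members)
            updated.append(
                (initial[i] + alpha * local + beta * group_mean)
                / (1 + alpha + beta)
            )
        scores = updated
    return scores

# Video 2 belongs to the relevant group 0 but is linked to irrelevant video 3.
initial = [0.9, 0.8, 0.2, 0.1]
weights = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
groups = [0, 0, 0, 1]

local_only = rerank(initial, weights, groups, beta=0.0)
with_groups = rerank(initial, weights, groups, beta=0.5)
```

    In this toy graph, video 2 ends up with a higher score when the group term is active than with local smoothing alone, because its group pulls it away from its noisy neighbor.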

    Download PDF (1436K)
  • Chunyan HOU, Chen CHEN, Jinsong WANG
    Type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2018 Volume E101.D Issue 5 Pages 1441-1444
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    In the era of e-commerce, purchase behavior prediction is one of the most important issues in promoting both online companies' sales and consumers' experience. Previous studies usually use feature engineering and ensemble machine learning algorithms for the prediction. The performance depends heavily on the designed features and the scalability of the algorithms, because large-scale data and many categorical features lead to huge numbers of samples and high-dimensional features. In this study, we explore an alternative that uses tree-based Feature Transformation (FT) and simple machine learning algorithms (e.g. Logistic Regression). Random Forest (RF) and Gradient Boosting decision tree (GB) are used for FT. Then, a simple algorithm, rather than an ensemble algorithm, is used to predict purchase behavior based on the transformed features. Tree-based FT regards the leaves of trees as transformed features, and can learn high-order interactions among the original features. Compared with RF, if GB is used for FT, simple algorithms are enough to achieve better performance.
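
    The notion of treating tree leaves as transformed features can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two hand-written trees stand in for trees that RF or GB would learn from data, and the feature names (`price`, `past_visits`), thresholds and helper functions are hypothetical.

```python
def leaf_index(tree, sample):
    """Walk a tree and return the index of the leaf the sample lands in."""
    node = tree
    while "leaf" not in node:
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf"]

def count_leaves(tree):
    """Count the leaves of a tree (leaves are assumed numbered 0..n-1)."""
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

def transform(trees, sample):
    """One-hot encode the leaf reached in each tree and concatenate."""
    features = []
    for tree in trees:
        one_hot = [0] * count_leaves(tree)
        one_hot[leaf_index(tree, sample)] = 1
        features.extend(one_hot)
    return features

# Hypothetical trees over two original features: [price, past_visits]
TREES = [
    {"feature": 0, "threshold": 50.0,
     "left": {"leaf": 0},
     "right": {"feature": 1, "threshold": 3,
               "left": {"leaf": 1},
               "right": {"leaf": 2}}},
    {"feature": 1, "threshold": 1,
     "left": {"leaf": 0},
     "right": {"leaf": 1}},
]

encoded = transform(TREES, [80.0, 5])
```

    Each position in `encoded` corresponds to one root-to-leaf path, that is, a conjunction of threshold tests on the original features. This is why a linear model such as Logistic Regression trained on these binary features can capture feature interactions.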

    Download PDF (169K)
  • Motoko TACHIBANA, Kohei YAMAMOTO, Kurato MAENO
    Type: LETTER
    Subject area: Pattern Recognition
    2018 Volume E101.D Issue 5 Pages 1445-1448
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    Radar is expected to provide environmentally robust measurements in advanced driver-assistance systems. In this paper, we propose a novel radar signal segmentation method that uses a complex-valued fully convolutional network (CvFCN) comprising complex-valued layers, real-valued layers, and a bidirectional conversion layer between them. We also propose an efficient automatic annotation system for dataset generation. We apply the CvFCN to two-dimensional (2D) complex-valued radar signal maps (r-maps) that comprise angle and distance axes. An r-map is a 2D complex-valued matrix that is generated from raw radar signals by 2D Fourier transformation. We annotate the r-maps automatically using LiDAR measurements. In our experiment, we semantically segment r-map signals into pedestrian and background regions, achieving an accuracy of 99.7% for the background and 96.2% for pedestrians.
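
    To make concrete how a 2D Fourier transformation turns raw samples into a complex-valued matrix, here is a minimal sketch using a naive discrete Fourier transform. The layout of the `raw` array and the synthetic tone in it are assumptions; the abstract does not describe the actual radar signal format or any windowing or calibration steps.

```python
import cmath

def dft_1d(values):
    """Naive discrete Fourier transform of a 1D sequence."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dft_2d(matrix):
    """2D DFT: transform every row, then every column of the result."""
    rows = [dft_1d(row) for row in matrix]
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the output has the same shape as the input
    return [list(row) for row in zip(*cols)]

# Hypothetical raw signal: 2 x 4 samples of one complex tone along the second axis
raw = [
    [cmath.exp(2j * cmath.pi * c / 4) for c in range(4)]
    for _ in range(2)
]

r_map = dft_2d(raw)
magnitudes = [[abs(value) for value in row] for row in r_map]
```

    The result stays complex-valued, so both magnitude and phase remain available to a network with complex-valued layers. In practice a fast Fourier transform would replace the naive sums.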

    Download PDF (510K)
  • Yinan LIU, Qingbo WU, Liangzhi TANG, Linfeng XU
    Type: LETTER
    Subject area: Pattern Recognition
    2018 Volume E101.D Issue 5 Pages 1449-1452
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    In this paper, we propose a novel self-supervised method for learning video representations that is capable of anticipating the video category from only a short clip. The key idea is that we employ a Siamese convolutional network to model the self-supervised feature learning as two different image matching problems. By using frame encoding, the proposed video representation can be extracted from different temporal scales. We refine the training process via a motion-based temporal segmentation strategy. The learned video representations can be applied not only to action anticipation but also to action recognition. We verify the effectiveness of the proposed approach on both action anticipation and action recognition using two datasets, namely UCF101 and HMDB51. The experiments show that we can achieve results comparable with those of state-of-the-art self-supervised learning methods on both tasks.

    Download PDF (880K)
  • Zhong ZHANG, Hong WANG, Shuang LIU, Tariq S. DURRANI
    Type: LETTER
    Subject area: Image Recognition, Computer Vision
    2018 Volume E101.D Issue 5 Pages 1453-1456
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    A rich and robust representation for scene characters plays a significant role in automatically understanding the text in images. In this letter, we focus on the issue of feature representation, and propose a novel encoding method named bilateral convolutional activations encoded with Fisher vectors (BCA-FV) for scene character recognition. Concretely, we first extract convolutional activation descriptors from convolutional maps and then build a bilateral convolutional activation map (BCAM) to capture the relationship between the convolutional activation response and the spatial structure information. Finally, in order to obtain the global feature representation, the BCAM is injected into FV to encode convolutional activation descriptors. Hence, the BCA-FV can effectively integrate the prominent features and spatial structure information for character representation. We verify our method on two widely used databases (ICDAR2003 and Chars74K), and the experimental results demonstrate that our method achieves better results than the state-of-the-art methods. In addition, we further validate the proposed BCA-FV on the “Pan+ChiPhoto” database for Chinese scene character recognition, and the experimental results show the good generalization ability of the proposed BCA-FV.

    Download PDF (221K)
  • Yuki IMAEDA, Takatsugu HIRAYAMA, Yasutomo KAWANISHI, Daisuke DEGUCHI, ...
    Type: LETTER
    Subject area: Image Recognition, Computer Vision
    2018 Volume E101.D Issue 5 Pages 1457-1461
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS

    We propose a method for estimating pedestrian detectability that considers the driver's visual adaptation to drastic illumination changes, which has not been studied in previous work. We assume that a driver's visual characteristics change in proportion to the time elapsed after an illumination change. As a solution, we construct multiple estimators corresponding to different elapsed periods, and estimate the detectability by switching between them according to the elapsed period. To evaluate the proposed method, we construct an experimental setup that presents a participant with illumination changes, and conduct a preliminary simulated experiment to measure and estimate the pedestrian detectability according to the elapsed period. The results show that the proposed method can estimate the detectability accurately after a drastic illumination change.

    Download PDF (1012K)
  • Han-sung SON, JungHyun HAN
    Type: LETTER
    Subject area: Computer Graphics
    2018 Volume E101.D Issue 5 Pages 1462-1465
    Published: May 01, 2018
    Released: May 01, 2018
    JOURNALS FREE ACCESS
    Supplementary material

    This paper proposes to pre-compute approximate normal distribution functions and store them in textures so that real-time applications can process complex specular surfaces simply by sampling the textures. The proposed method is compatible with GPU pipeline-based algorithms, and rendering is completed in real time. The experimental results show that the features of complex specular surfaces, such as the glinty appearance of leather and metallic flakes, are successfully reproduced.

    Download PDF (1992K)