IEICE Transactions on Information and Systems

Special Section on Machine Vision and its Applications

FOREWORD

Norimichi UKITA

Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1221
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVF0001

JOURNAL FREE ACCESS

Training of CNN with Heterogeneous Learning for Multiple Pedestrian Attributes Recognition Using Rarity Rate

Hiroshi FUKUI, Takayoshi YAMASHITA, Yuji YAMAUCHI, Hironobu FUJIYOSHI, ...

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1222-1231
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0001

JOURNAL FREE ACCESS

Recognizing pedestrian attributes is an important function for an advanced driver assistance system (ADAS). Pedestrian attributes such as body pose, face orientation and open umbrella indicate the intended action or state of the pedestrian. Generally, this information is recognized using independent classifiers for each task. Performing all of these separate tasks is too time-consuming at the testing stage. In addition, the processing time increases as the number of tasks increases. To address this problem, multi-task learning or heterogeneous learning is performed to train a single classifier to perform multiple tasks. In particular, heterogeneous learning is able to simultaneously train a classifier to perform regression and recognition tasks, which reduces both training and testing time. However, heterogeneous learning tends to result in a lower accuracy rate for classes with few training samples. In this paper, we propose a method to improve the performance of heterogeneous learning for such classes. We introduce a rarity rate based on the importance and class probability of each task. The appropriate rarity rate is assigned to each training sample. The samples in a mini-batch for training a deep convolutional neural network are then augmented according to this rarity rate to focus on the classes with few samples. Our heterogeneous learning approach with the rarity rate performs pedestrian attribute recognition better, especially for classes represented by few training samples.

Line-Based SLAM Using Non-Overlapping Cameras in an Urban Environment

Atsushi KAWASAKI, Kosuke HARA, Hideo SAITO

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1232-1242
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0006

JOURNAL FREE ACCESS

We propose a method of line-based Simultaneous Localization and Mapping (SLAM) using non-overlapping multiple cameras for vehicles running in an urban environment. It uses corresponding line segments between images taken in different frames and by different cameras. The contribution is a novel line segment matching algorithm that uses warping based on urban structures. This idea significantly improves the accuracy of line segment matching when viewing directions are very different, so that a number of correspondences between front-view and rear-view cameras can be found and the accuracy of SLAM can be improved. Additionally, to enhance the accuracy of SLAM, we apply a geometrical constraint of urban areas to the initial estimation of the 3D mapping of line segments and to optimization by bundle adjustment. We can further improve the accuracy of SLAM by combining points and lines. The position error is stable within 1.5 m for the entire image dataset evaluated in this paper. The estimation accuracy of our method is as high as that of ground truth captured by RTK-GPS. Our high accuracy SLAM algorithm can be applied to generating a road map represented by line segments. According to an evaluation of our generated map, a true positive rate exceeding 70% is achieved around the vehicle.

Real-Time Color Image Improvement System for Visual Testing of Nuclear Reactors

Naoki HOSOYA, Atsushi MIYAMOTO, Junichiro NAGANUMA

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1243-1250
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0007

JOURNAL FREE ACCESS

Nuclear power plants require in-vessel inspections for soundness checks and preventive maintenance. One inspection procedure is visual testing (VT), which is based on video images from an underwater camera in a nuclear reactor. However, a lot of noise is superimposed on VT images due to radiation exposure. We propose a technique for improving the quality of those images by image processing that reduces radiation noise and enhances signals. Real-time video processing was achieved by applying the proposed technique with a parallel processing unit. Improving the clarity of VT images will reduce the burden on inspectors.

Multi-Peak Estimation for Real-Time 3D Ping-Pong Ball Tracking with Double-Queue Based GPU Acceleration

Ziwei DENG, Yilin HOU, Xina CHENG, Takeshi IKENAGA

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1251-1259
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0010

JOURNAL FREE ACCESS

3D ball tracking is of great significance in ping-pong game analysis, which can be applied to areas such as TV content and tactical analysis, some of which require real-time implementation. This paper proposes a CPU-GPU platform based Particle Filter for multi-view ball tracking that includes 4 proposals. The multi-peak estimation and the ball-like observation model are proposed in the algorithm design. The multi-peak estimation aims at obtaining a precise ball position in case the particles' likelihood distribution has multiple peaks under complex circumstances. The ball-like observation model, with 4 different likelihood evaluations, utilizes the ball's unique features to evaluate the particle's similarity with the target. In the GPU implementation, the double-queue structure and the vectorized data combination are proposed. The double-queue structure aims at achieving task parallelism between some data-independent tasks. The vectorized data combination reduces the time cost of memory access by combining 3 different image data into 1 vector. Experiments are based on ping-pong videos recorded in an official match by 4 cameras located in the 4 corners of the court. The tracking success rate reaches 99.59% on CPU. With the GPU acceleration, the time consumption is 8.8 ms/frame, a speed-up by a factor of 98 compared with the CPU version.

Pixel Selection and Intensity Directed Symmetry for High Frame Rate and Ultra-Low Delay Matching System

Tingting HU, Takeshi IKENAGA

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1260-1269
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0012

JOURNAL FREE ACCESS

High frame rate and ultra-low delay matching systems play an increasingly important role in human-machine interactive applications, which call for a higher frame rate and lower delay for a better experience. The large amount of data to process and the complex computation in a local feature based matching system make it difficult to achieve high processing speed and ultra-low delay matching with limited resources. Aiming at a matching system with a processing speed of more than 1000 fps and a delay of less than 1 ms/frame, this paper puts forward a local binary feature based matching system implemented on a field-programmable gate array (FPGA). Pixel selection based 4-1-4 parallel matching and intensity directed symmetry are proposed for the implementation of this system. To design a basic framework with high processing speed and ultra-low delay using limited resources, pixel selection based 4-1-4 parallel matching is proposed, which makes it possible to achieve four-thread processing with only one-thread resource consumption. Assuming that the orientation of the keypoint will bisect the patch best and will point to the region with high intensity, intensity directed symmetry is proposed to calculate the keypoint orientation in a hardware friendly way, which is an important part of a rotation-robust matching system. Software experiment results show that the proposed keypoint orientation calculation method achieves almost the same performance as the state-of-the-art intensity centroid orientation calculation method in a matching system. Hardware experiment results show that the designed image processing core can process VGA (640×480) videos at a processing speed of 1306 fps and with a delay of 0.8083 ms/frame.
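
For orientation, the intensity centroid method that the abstract uses as its point of comparison can be sketched as follows. This is a minimal software illustration of that baseline only, not the proposed intensity directed symmetry; the function name, the square (rather than circular) patch, and the example patch are assumptions made for illustration.

```python
import math

def intensity_centroid_orientation(patch):
    """Return the keypoint orientation in radians for a square grayscale patch."""
    size = len(patch)
    center = (size - 1) / 2.0
    m10 = 0.0  # first-order intensity moment along x
    m01 = 0.0  # first-order intensity moment along y
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            m10 += (x - center) * value
            m01 += (y - center) * value
    # The orientation points from the patch center toward the intensity centroid.
    return math.atan2(m01, m10)

# Hypothetical 3x3 patch whose right-hand column is brightest.
patch = [
    [0, 0, 9],
    [0, 0, 9],
    [0, 0, 9],
]
angle = intensity_centroid_orientation(patch)
```

Here the bright column lies along the positive x axis, so the computed orientation points in that direction.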

Object Specific Deep Feature for Face Detection

Xianxu HOU, Jiasong ZHU, Ke SUN, Linlin SHEN, Guoping QIU

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1270-1277
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0014

JOURNAL FREE ACCESS

Motivated by the observation that certain convolutional channels of a Convolutional Neural Network (CNN) exhibit object specific responses, we seek to discover and exploit the convolutional channels of a CNN in which neurons are activated by the presence of specific objects in the input image. A method for explicitly fine-tuning a pre-trained CNN to induce an object specific channel (OSC) and systematically identifying it for human faces has been developed. In this paper, we introduce a multi-scale approach to constructing robust face heatmaps based on OSC features for rapidly filtering out non-face regions, thus significantly improving search efficiency for face detection. We show that multi-scale OSC can be used to develop simple and compact face detectors in unconstrained settings with state-of-the-art performance.

Point of Gaze Estimation Using Corneal Surface Reflection and Omnidirectional Camera Image

Taishi OGAWA, Atsushi NAKAZAWA, Toyoaki NISHIDA

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1278-1287
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0020

JOURNAL FREE ACCESS

We present a human point of gaze estimation system using corneal surface reflection and omnidirectional images taken by spherical panorama cameras, which have become popular in recent years. Our system can find where a user is looking in a 360° surrounding scene image from an eye image alone, and thus does not need the gaze mapping from partial scene images to a whole scene image that is necessary in conventional eye gaze tracking systems. We first generate multiple perspective scene images from an omnidirectional (equirectangular) image and perform registration between the corneal reflection and perspective images using a corneal reflection-scene image registration technique. We then compute the point of gaze using a corneal imaging technique leveraged by a 3D eye model, and project the point to an omnidirectional image. The 3D eye pose is estimated using a particle-filter-based tracking algorithm. In experiments, we evaluated the accuracy of the 3D eye pose estimation, the robustness of registration and the accuracy of PoG estimation using two indoor and five outdoor scenes, and found that the gaze mapping error was 5.546 [deg] on average.

Accelerating Existing Non-Blind Image Deblurring Techniques through a Strap-On Limited-Memory Switched Broyden Method

Ichraf LAHOULI, Robby HAELTERMAN, Joris DEGROOTE, Michal SHIMONI, Geer ...

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1288-1295
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0022

JOURNAL FREE ACCESS

Video surveillance from airborne platforms can suffer from many sources of blur, like vibration, low-end optics, uneven lighting conditions, etc. Many different algorithms have been developed in the past that aim to recover the deblurred image but often incur substantial CPU-time, which is not always available on-board. This paper shows how a “strap-on” quasi-Newton method can accelerate the convergence of existing iterative methods with little extra overhead while keeping the performance of the original algorithm, thus paving the way for (near) real-time applications using on-board processing.

Superimposing Thermal-Infrared Data on 3D Structure Reconstructed by RGB Visual Odometry

Masahiro YAMAGUCHI, Trong Phuc TRUONG, Shohei MORI, Vincent NOZICK, Hi ...

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1296-1307
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0023

JOURNAL FREE ACCESS

In this paper, we propose a method to generate a three-dimensional (3D) thermal map and RGB + thermal (RGB-T) images of a scene from thermal-infrared and RGB images. The scene images are acquired by moving both an RGB camera and a thermal-infrared camera mounted on a stereo rig. Before capturing the scene with those cameras, we estimate their respective intrinsic parameters and their relative pose. Then, we reconstruct the 3D structure of the scene by applying Direct Sparse Odometry (DSO) to the RGB images. In order to superimpose thermal information onto each point generated from DSO, we propose a method for estimating the scale of the point cloud corresponding to the extrinsic parameters between both cameras by matching depth images recovered from the RGB camera and the thermal-infrared camera based on mutual information. We also generate RGB-T images using the 3D structure of the scene and Delaunay triangulation. We do not rely on depth cameras and, therefore, our technique is not limited to scenes within the measurement range of the depth cameras. To demonstrate this technique, we generate 3D thermal maps and RGB-T images for both indoor and outdoor scenes.

Simultaneous Object Segmentation and Recognition by Merging CNN Outputs from Uniformly Distributed Multiple Viewpoints

Yoshikatsu NAKAJIMA, Hideo SAITO

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1308-1316
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0024

JOURNAL FREE ACCESS

We propose a novel object recognition system that is able to (i) work in real-time while reconstructing segmented 3D maps and simultaneously recognize objects in a scene, (ii) manage various kinds of objects, including those with smooth surfaces and those with a large number of categories, utilizing a CNN for feature extraction, and (iii) maintain high accuracy no matter how the camera moves by distributing the viewpoints for each object uniformly and aggregating recognition results from each distributed viewpoint with equal weight. Through experiments, the advantages of our system with respect to current state-of-the-art object recognition approaches are demonstrated on the UW RGB-D Dataset and Scenes and on our own scenes prepared to verify the effectiveness of the Viewpoint-Class-based approach.

Multicultural Facial Expression Recognition Based on Differences of Western-Caucasian and East-Asian Facial Expressions of Emotions

Gibran BENITEZ-GARCIA, Tomoaki NAKAMURA, Masahide KANEKO

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1317-1324
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0025

JOURNAL FREE ACCESS

An increasing number of psychological studies have demonstrated that the six basic expressions of emotions are not culturally universal. However, automatic facial expression recognition (FER) systems disregard these findings and assume that facial expressions are universally expressed and recognized across different cultures. Therefore, this paper presents an analysis of Western-Caucasian and East-Asian facial expressions of emotions based on visual representations and cross-cultural FER. The visual analysis builds on the Eigenfaces method, and the cross-cultural FER combines appearance and geometric features by extracting Local Fourier Coefficients (LFC) and Facial Fourier Descriptors (FFD) respectively. Furthermore, two possible solutions for FER under multicultural environments are proposed. These are based on an early race detection, and independent models for culture-specific facial expressions found by the analysis evaluation. HSV color quantization combined with LFC and FFD compose the feature extraction for race detection, whereas culture-independent models of anger, disgust and fear are analyzed for the second solution. All tests were performed using Support Vector Machines (SVM) for classification and evaluated using five standard databases. Experimental results show that both solutions improve the accuracy of FER systems under multicultural environments. However, the approach which individually considers the culture-specific facial expressions achieved the highest recognition rate.

Extraction and Recognition of Shoe Logos with a Wide Variety of Appearance Using Two-Stage Classifiers

Kazunori AOKI, Wataru OHYAMA, Tetsushi WAKABAYASHI

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1325-1332
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0026

JOURNAL FREE ACCESS

A logo is a symbolic presentation that is designed not only to identify a product manufacturer but also to attract the attention of shoppers. Shoe logos are a challenging subject for automatic extraction and recognition using image analysis techniques because they have characteristics that distinguish them from those of other products; that is, there is much within-class variation in the appearance of shoe logos. In this paper, we propose an automatic extraction and recognition method for shoe logos with a wide variety of appearance using a limited number of training samples. The proposed method employs maximally stable extremal regions for the initial region extraction, an iterative algorithm for region grouping, and gradient features and a support vector machine for logo recognition. The results of performance evaluation experiments using a logo dataset that consists of a wide variety of appearances show that the proposed method achieves promising performance for both logo extraction and recognition.

Image-Based Food Calorie Estimation Using Recipe Information

Takumi EGE, Keiji YANAI

Article type: PAPER
Subject area: Machine Vision and its Applications
2018 Volume E101.D Issue 5 Pages 1333-1341
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017MVP0027

JOURNAL FREE ACCESS

Recently, mobile applications for recording everyday meals have drawn much attention for dietary self-management. However, most of these applications return food calorie values simply associated with the estimated food categories, or require users to indicate the rough amount of food manually. In fact, estimating food calories from a food photo with practical accuracy has not been achieved and remains an unsolved problem. In this paper, we propose estimating food calories from a food photo by simultaneous learning of food calories, categories, ingredients and cooking directions using deep learning. Since there is in general a strong correlation between food calories and food categories, ingredients and cooking directions, we expect that training them simultaneously boosts performance compared with independent single-task training. To this end, we use a multi-task CNN. In addition, in this research, we construct two datasets: a dataset of calorie-annotated recipes collected from Japanese recipe sites on the Web and a dataset collected from an American recipe site. In the experiments, we trained both multi-task and single-task CNNs, and compared them. As a result, the multi-task CNN achieved better performance on both food category estimation and food calorie estimation than the single-task CNNs. For the Japanese recipe dataset, introducing a multi-task CNN improved the correlation coefficient by 0.039, while for the American recipe dataset it rose by 0.090 compared with the result of the single-task CNN. In addition, we showed that the proposed multi-task CNN based method outperformed previously proposed search-based methods.

Regular Section

Long-Term Tracking Based on Multi-Feature Adaptive Fusion for Video Target

Hainan ZHANG, Yanjing SUN, Song LI, Wenjuan SHI, Chenglong FENG

Article type: PAPER
Subject area: Fundamentals of Information Systems
2018 Volume E101.D Issue 5 Pages 1342-1349
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7245

JOURNAL FREE ACCESS

Correlation filter-based trackers whose appearance model is built from a single feature have poor robustness in challenging video environments, which include factors such as occlusion, fast motion and out-of-view. In this paper, a long-term tracking algorithm based on multi-feature adaptive fusion for video targets is presented. We design a robust appearance model by fusing powerful features, including histogram of gradient, local binary pattern and color-naming, at the response map level to overcome the interference in the video. In addition, a random fern classifier is trained as a re-detector to detect the target when tracking failure occurs, so that long-term tracking is achieved. We evaluate our algorithm on large-scale benchmark datasets, and the results show that the proposed algorithm has more accurate and more robust performance in complex video environments.

A Hardware-Based Caching System on FPGA NIC for Blockchain

Yuma SAKAKIBARA, Shin MORISHIMA, Kohei NAKAMURA, Hiroki MATSUTANI

Article type: PAPER
Subject area: Computer System
2018 Volume E101.D Issue 5 Pages 1350-1360
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7290

JOURNAL FREE ACCESS

Engineers and researchers have recently paid attention to Blockchain. Blockchain is a fault-tolerant distributed ledger without administrators. Blockchain originally derives from cryptocurrency, but it can be applied to other industries. A transfer of digital assets is called a transaction. Blockchain holds all transactions, so the total amount of Blockchain data increases over time. Meanwhile, the number of Internet of Things (IoT) products has been increasing. It is difficult for IoT products to hold all Blockchain data because of their limited storage capacity. Therefore, they access Blockchain data via servers that hold it. However, if a lot of IoT products access the Blockchain network via servers, server overloads will occur. Thus, it is useful to reduce workloads and improve throughput. In this paper, we propose a caching technique using a Field Programmable Gate Array (FPGA)-based Network Interface Card (NIC) which possesses four 10 Gigabit Ethernet (10GbE) interfaces. The proposed system can reduce server overloads, because the FPGA NIC responds to requests from IoT products instead of the server on a cache hit. We implemented the proposed hardware cache on a NetFPGA-10G board to achieve high throughput. As an evaluation, we counted the number of requests that the server or the FPGA NIC processed. As a result, the throughput improved by 1.97 times on average when hitting the cache.
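
The request path described in this abstract (answer from the cache on a hit, fall back to the server on a miss) can be modeled in software. The sketch below is only an illustration of that control flow: the class name `FrontCache`, the least-recently-used eviction policy, the capacity, and the stand-in `server` function are all assumptions, not details of the paper's hardware design.

```python
from collections import OrderedDict

class FrontCache:
    """Software model of a cache sitting in front of a backing server."""

    def __init__(self, fetch_from_server, capacity=2):
        self.fetch_from_server = fetch_from_server  # called only on a miss
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Hit: answer directly and mark the entry as recently used.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        # Miss: the server handles the request, then the reply is cached.
        self.misses += 1
        value = self.fetch_from_server(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return value

# Hypothetical backing store standing in for the server.
server_calls = []

def server(key):
    server_calls.append(key)
    return "block-" + key

cache = FrontCache(server)
replies = [cache.get(k) for k in ["a", "b", "a", "c", "b"]]
```

Counting `cache.hits` against `len(server_calls)` mirrors the evaluation idea of counting which component processed each request.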

A Real-Time Subtask-Assistance Strategy for Adaptive Services Composition

Li QUAN, Zhi-liang WANG, Xin LIU

Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2018 Volume E101.D Issue 5 Pages 1361-1369
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7131

JOURNAL FREE ACCESS

Reinforcement learning has been applied to adaptive service composition. However, traditional algorithms are not suitable for large-scale service composition. Based on the Q-learning algorithm, a multi-task oriented algorithm named multi-Q learning is proposed to realize a subtask-assistance strategy for large-scale and adaptive service composition. Unlike previous studies that focus on one task, we take the relationship between multiple service composition tasks into account. We decompose a complex service composition task into multiple subtasks according to graph theory. Different tasks with the same subtasks can assist each other to improve their learning speed. The results of experiments show that our algorithm learns noticeably faster than the traditional Q-learning algorithm. Compared with multi-agent Q-learning, our algorithm also converges faster. Moreover, for all involved service composition tasks that share subtasks with each other, our algorithm can improve their speed of learning the optimal policy simultaneously in real-time.
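
For readers unfamiliar with the baseline, the standard tabular Q-learning update that the abstract builds on can be sketched as follows. This shows only that baseline update, not the multi-Q learning algorithm; the state and action names and the values of the learning rate `alpha` and discount factor `gamma` are illustrative assumptions.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Best estimated value reachable from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the current estimate toward the bootstrapped target.
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical two-service example: all names are illustrative.
actions = ["service_a", "service_b"]
q_table = defaultdict(float)
first = q_update(q_table, "s0", "service_a", reward=1.0, next_state="s1", actions=actions)
```

Each call moves the stored estimate a fraction `alpha` of the way toward the observed reward plus the discounted best next-state value.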

Detecting Malware-Infected Devices Using the HTTP Header Patterns

Sho MIZUNO, Mitsuhiro HATADA, Tatsuya MORI, Shigeki GOTO

Article type: PAPER
Subject area: Information Network
2018 Volume E101.D Issue 5 Pages 1370-1379
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7294

JOURNAL FREE ACCESS

Damage caused by malware has become a serious problem. The recent rise in the spread of evasive malware has made it difficult to detect malware before infection. Malware detection after infection is a promising approach that fills this gap. Given this background, this work aims to identify likely malware-infected devices from measurements of Internet traffic. The advantage of the traffic-measurement-based approach is that it enables us to monitor a large number of endhosts. If we find that an endhost is a source of malicious traffic, the endhost is likely a malware-infected device. Since the majority of malware today makes use of the web as a means to communicate with the C&C servers that reside on the external network, we leverage information recorded in the HTTP headers to discriminate between malicious and benign traffic. To make our approach scalable and robust, we develop an automatic template generation scheme that drastically reduces the amount of information to be kept while achieving high classification accuracy; since it does not make use of any domain knowledge, the approach should be robust against changes in malware. We apply several classifiers, which include machine learning algorithms, to the extracted templates and classify traffic into two categories: malicious and benign. Our extensive experiments demonstrate that our approach discriminates between malicious and benign traffic with up to 97.1% precision while maintaining the false positive rate below 1.0%.

Retweeting Prediction Based on Social Hotspots and Dynamic Tensor Decomposition

Qian LI, Xiaojuan LI, Bin WU, Yunpeng XIAO

Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2018 Volume E101.D Issue 5 Pages 1380-1392
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7364

JOURNAL FREE ACCESS

In social networks, predicting user behavior under social hotspots can aid in understanding the development trend of a topic. In this paper, we propose a retweeting prediction method for social hotspots based on tensor decomposition, using user information, relationship and behavioral data. The method can be used to predict the behavior of users and analyze the evolution of topics. Firstly, we propose a tensor-based mechanism for mining user interaction, and then we propose that the tensor be used to solve the problem of inaccuracy that arises when calculating interaction intensity for sparse user interaction data. At the same time, we can analyze the influence of the following relationship on the interaction between users based on characteristics of the tensor in data space conversion and projection. Secondly, a time decay function is introduced for the tensor to further quantify the evolution of user behavior in current social hotspots. That function can be fit to the behavior of a user dynamically, and can also solve the problem of interaction between users with time decay. Finally, we invoke time slices and discretization of the topic life cycle and construct a user retweeting prediction model based on logistic regression. In this way, we can both explore the temporal characteristics of user behavior in social hotspots and solve the problem of uneven interaction behavior between users. Experiments show that the proposed method can improve the accuracy of user behavior prediction effectively and aid in understanding the development trend of a topic.

Modeling Complex Relationship Paths for Knowledge Graph Completion

Ping ZENG, Qingping TAN, Xiankai MENG, Haoyu ZHANG, Jianjun XU

Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2018 Volume E101.D Issue 5 Pages 1393-1400
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7398

JOURNAL FREE ACCESS

Determining the validity of knowledge triples and filling in the missing entities or relationships in the knowledge graph are crucial tasks for large-scale knowledge graph completion. So far, the main solutions use machine learning methods to learn low-dimensional distributed representations of entities and relationships to complete the knowledge graph. Among them, translation models obtain excellent performance. However, the proposed translation models do not adequately consider the indirect relationships among entities, which affects the precision of the representation. Based on the long short-term memory neural network and existing translation models, we propose a multiple-module hybrid neural network model called TransP. By modeling the entity paths and their relationship paths, TransP can effectively mine the indirect relationships among the entities, and thus improve the quality of knowledge graph completion tasks. Experimental results show that TransP outperforms state-of-the-art models in the entity prediction task, and achieves performance comparable to previous models in the relationship prediction task.
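
Translation models score a triple (head, relation, tail) by how closely head + relation lands on tail in the embedding space. The sketch below illustrates that scoring idea in the style of TransE, a well-known translation model used here as an assumed example; it is not TransP, and the 2-D embeddings and entity names are made up for illustration.

```python
import math

def translation_score(head, relation, tail):
    """Return the L2 distance ||head + relation - tail||; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Made-up 2-D embeddings for illustration only.
tokyo = [1.0, 0.0]
capital_of = [0.0, 1.0]
japan = [1.0, 1.0]
france = [4.0, 5.0]

good = translation_score(tokyo, capital_of, japan)
bad = translation_score(tokyo, capital_of, france)
```

A triple whose tail matches the translated head gets a distance near zero, while a mismatched tail gets a larger distance.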

Study on Driver Agent Based on Analysis of Driving Instruction Data — Driver Agent for Encouraging Safe Driving Behavior (1) —

Takahiro TANAKA, Kazuhiro FUJIKAKE, Takashi YONEKAWA, Misako YAMAGISHI ...

Article type: PAPER
Subject area: Human-computer Interaction
2018 Volume E101.D Issue 5 Pages 1401-1409
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7203

JOURNAL FREE ACCESS

In recent years, the number of traffic accidents caused by elderly drivers has increased in Japan. However, cars are an important mode of transportation for the elderly. Therefore, to ensure safe driving, a system that can assist elderly drivers is required. We propose a driver-agent system that provides support to elderly drivers during and after driving and encourages them to improve their driving. This paper describes the prototype system, an analysis of the teaching records of a human instructor, the impression that the instructions made on a subject during driving, and a subjective evaluation of the driver-agent system.

Exponential Neighborhood Preserving Embedding for Face Recognition

Ruisheng RAN, Bin FANG, Xuegang WU

Article type: PAPER
Subject area: Pattern Recognition
2018 Volume E101.D Issue 5 Pages 1410-1420
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7259

JOURNAL FREE ACCESS

Neighborhood preserving embedding (NPE) is a widely used manifold-based dimensionality reduction technique. However, NPE faces two problems. One is that it suffers from the small-sample-size (SSS) problem. The other is that the performance of NPE is highly sensitive to the neighborhood size k. To overcome these two problems, an exponential neighborhood preserving embedding (ENPE) is proposed in this paper. The main idea of ENPE is to introduce the matrix exponential into NPE, so that the SSS problem is avoided and low sensitivity to the neighborhood size k is obtained. Experiments are conducted on the ORL, Georgia Tech and AR face databases. The results show that ENPE outperforms other unsupervised methods, such as PCA, LPP, ELPP and NPE. In addition, ENPE is much less sensitive to the neighborhood parameter k than the unsupervised manifold learning methods LPP, ELPP and NPE.
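
One property that plausibly underlies the claim about the SSS problem is that the matrix exponential of any square matrix is invertible, since det(exp(M)) = exp(trace(M)) > 0, even when M itself is singular. The sketch below checks this numerically on a small singular matrix using a truncated Taylor series. It illustrates only that property and is not the ENPE algorithm; the helper names are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=30):
    """Truncated Taylor series exp(M) = I + M + M^2/2! + ..."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds M^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

singular = [[1.0, 1.0], [1.0, 1.0]]    # rank 1, so its determinant is zero
det_original = det2(singular)
det_exponential = det2(mat_exp(singular))
expected = math.exp(2.0)               # exp(trace) for this matrix
```

Here the original matrix cannot be inverted, while its exponential has determinant e^2, so a formulation built on the exponential does not run into a singular matrix.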

Novel Defogging Algorithm Based on the Joint Use of Saturation and Color Attenuation Prior

Chen QU, Duyan BI

Article type: PAPER
Subject area: Image Processing and Video Processing
2018 Volume E101.D Issue 5 Pages 1421-1429
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7260

JOURNAL FREE ACCESS

Examining the shortcomings of well-known defogging algorithms for foggy images based on the atmospheric scattering model, we find that it is necessary to obtain an accurate transmission map that reflects the real depth in both large-depth and close-range regions. This is hard to achieve with just one prior because of the differences between large-depth and close-range regions in foggy images. Hence, we propose a novel prior, called the saturation prior, that simplifies the solution of the transmission map by means of a transferring coefficient. Then, under the Random Walk model, we constrain the transferring coefficient with the color attenuation prior, which can obtain a good transmission map in large-depth regions. More importantly, we design a regularization weight to balance the influences of the saturation prior and the color attenuation prior on the transferring coefficient. Experimental results demonstrate that the proposed defogging method outperforms state-of-the-art image defogging methods based on a single prior in terms of detail restoration and color preservation.
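
For readers unfamiliar with the scattering model mentioned here, it is commonly written per pixel as I = J·t + A·(1 − t), where I is the observed intensity, J the fog-free scene radiance, t the transmission and A the atmospheric light. The sketch below shows why an accurate transmission map matters: once t and A are known, J follows by inverting the model. The lower bound `t_min` is a common safeguard assumed for this sketch, and the sketch does not cover how the paper estimates t.

```python
def foggy_pixel(scene, transmission, airlight):
    """Forward model: observed intensity of one pixel under fog."""
    return scene * transmission + airlight * (1.0 - transmission)

def recover_pixel(observed, transmission, airlight, t_min=0.1):
    """Invert the model for one pixel.

    t_min keeps the division stable where the transmission is near zero;
    its value is an assumption of this sketch.
    """
    t = max(transmission, t_min)
    return (observed - airlight) / t + airlight

observed = foggy_pixel(scene=0.2, transmission=0.5, airlight=0.9)
restored = recover_pixel(observed, transmission=0.5, airlight=0.9)
```

With the true transmission the original value is recovered, whereas an over- or underestimated t leaves residual fog or over-darkens the pixel.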

Graph-Based Video Search Reranking with Local and Global Consistency Analysis

Soh YOSHIDA, Takahiro OGAWA, Miki HASEYAMA, Mitsuji MUNEYASU

Article type: PAPER
Subject area: Image Processing and Video Processing
2018 Volume E101.D Issue 5 Pages 1430-1440
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDP7277

JOURNAL FREE ACCESS

Video reranking is an effective way to improve the retrieval performance of text-based video search engines. This paper proposes a graph-based Web video search reranking method with local and global consistency analysis. Generally, the graph-based reranking approach constructs a graph whose nodes and edges respectively correspond to videos and their pairwise similarities. Many reranking methods are built on a scheme that regularizes the smoothness of pairwise relevance scores between adjacent nodes with regard to a user's query. However, since the overall consistency is measured by aggregating only the local consistency over each pair, errors in score estimation increase when noisy samples are included among the neighbors of query-relevant videos. To deal with the noisy samples, the proposed method leverages the global consistency of the graph structure, unlike conventional methods. Specifically, in order to detect this consistency, the proposed method introduces a spectral clustering algorithm that can detect groups of videos with strong semantic correlation on the graph. Furthermore, a new regularization term, which smooths ranking scores within the same group, is introduced to the reranking framework. Since the score regularization is performed from both local and global aspects simultaneously, accurate score estimation becomes feasible. Experimental results obtained by applying the proposed method to a real-world video collection show its effectiveness.
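
The conventional local scheme described in this abstract can be sketched as minimizing a fitting term plus a pairwise smoothness term, here solved by simple fixed-point iteration. This covers only the local consistency baseline; the group-level regularization term and spectral clustering proposed in the paper are not included. The objective form, the parameter `mu` and the toy graph are assumptions of this sketch.

```python
def smooth_scores(initial, weights, mu=1.0, iterations=100):
    """Rerank by minimizing
        mu * sum_i (f_i - y_i)^2 + 0.5 * sum_{i,j} w_ij * (f_i - f_j)^2
    with Jacobi-style updates. `initial` holds the text-based scores y and
    `weights` is a symmetric similarity matrix with a zero diagonal.
    """
    n = len(initial)
    scores = list(initial)
    for _ in range(iterations):
        scores = [
            (mu * initial[i] + sum(weights[i][j] * scores[j] for j in range(n) if j != i))
            / (mu + sum(weights[i][j] for j in range(n) if j != i))
            for i in range(n)
        ]
    return scores

# Toy graph: video 1 had a low initial score but is similar to videos 0 and 2.
initial = [1.0, 0.0, 1.0]
weights = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
reranked = smooth_scores(initial, weights)
```

The middle video is pulled up toward its neighbors, which also shows the weakness noted in the abstract: a noisy neighbor would pull scores in the wrong direction just as easily.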

Tree-Based Feature Transformation for Purchase Behavior Prediction

Chunyan HOU, Chen CHEN, Jinsong WANG

Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2018 Volume E101.D Issue 5 Pages 1441-1444
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDL8210

JOURNAL FREE ACCESS

In the era of e-commerce, purchase behavior prediction is one of the most important issues for promoting both online companies' sales and consumers' experience. Previous studies usually use feature engineering and ensemble machine learning algorithms for the prediction. The performance depends heavily on the designed features and the scalability of the algorithms, because large-scale data and many categorical features lead to a huge number of samples and high-dimensional features. In this study, we explore an alternative that uses tree-based Feature Transformation (FT) and simple machine learning algorithms (e.g., Logistic Regression). Random Forest (RF) and Gradient Boosting decision tree (GB) are used for FT. Then, a simple algorithm, rather than an ensemble algorithm, is used to predict purchase behavior based on the transformed features. Tree-based FT regards the leaves of trees as transformed features and can learn high-order interactions among the original features. Compared with RF, if GB is used for FT, simple algorithms are enough to achieve better performance.
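
The statement that FT "regards the leaves of trees as transformed features" can be illustrated as follows: each sample is routed through every tree, and the leaf it reaches is one-hot encoded. The two hand-written trees below stand in for trees that would normally be learned by RF or GB; their structure, thresholds and the dictionary layout are hypothetical.

```python
def leaf_index(tree, sample):
    """Walk a tree of nested dicts until a leaf ({'leaf': id}) is reached."""
    node = tree
    while "leaf" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

def transform(trees, leaf_counts, sample):
    """Concatenate one one-hot vector per tree, marking the leaf reached."""
    encoded = []
    for tree, count in zip(trees, leaf_counts):
        one_hot = [0] * count
        one_hot[leaf_index(tree, sample)] = 1
        encoded.extend(one_hot)
    return encoded

# Hand-written stand-ins for learned trees (illustrative only).
tree_a = {"feature": 0, "threshold": 0.5, "left": {"leaf": 0}, "right": {"leaf": 1}}
tree_b = {
    "feature": 1, "threshold": 10.0,
    "left": {"leaf": 0},
    "right": {"feature": 0, "threshold": 0.8, "left": {"leaf": 1}, "right": {"leaf": 2}},
}

features = transform([tree_a, tree_b], [2, 3], [0.9, 12.0])
```

Because each leaf corresponds to a conjunction of threshold tests along its path, a linear model trained on these binary features can capture interactions among the original features.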

Complex-Valued Fully Convolutional Networks for MIMO Radar Signal Segmentation

Motoko TACHIBANA, Kohei YAMAMOTO, Kurato MAENO

Article type: LETTER
Subject area: Pattern Recognition
2018 Volume E101.D Issue 5 Pages 1445-1448
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDL8214

JOURNAL FREE ACCESS

Radar is expected to provide environmentally robust measurements in advanced driver-assistance systems. In this paper, we propose a novel radar signal segmentation method using a complex-valued fully convolutional network (CvFCN) that comprises complex-valued layers, real-valued layers, and a bidirectional conversion layer between them. We also propose an efficient automatic annotation system for dataset generation. We apply the CvFCN to two-dimensional (2D) complex-valued radar signal maps (r-maps) that comprise angle and distance axes. An r-map is a 2D complex-valued matrix that is generated from raw radar signals by a 2D Fourier transformation. We annotate the r-maps automatically using LiDAR measurements. In our experiment, we semantically segment r-map signals into pedestrian and background regions, achieving an accuracy of 99.7% for the background and 96.2% for pedestrians.

Self-Supervised Learning of Video Representation for Anticipating Actions in Early Stage

Yinan LIU, Qingbo WU, Liangzhi TANG, Linfeng XU

Article type: LETTER
Subject area: Pattern Recognition
2018 Volume E101.D Issue 5 Pages 1449-1452
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2018EDL8013

JOURNAL FREE ACCESS

In this paper, we propose a novel self-supervised method for learning a video representation that is capable of anticipating the video category from only a short clip. The key idea is that we employ a Siamese convolutional network to model the self-supervised feature learning as two different image matching problems. By using frame encoding, the proposed video representation can be extracted at different temporal scales. We refine the training process via a motion-based temporal segmentation strategy. The learned video representations can be applied not only to action anticipation but also to action recognition. We verify the effectiveness of the proposed approach on both action anticipation and action recognition using two datasets, namely UCF101 and HMDB51. The experiments show that we achieve results comparable to state-of-the-art self-supervised learning methods on both tasks.

Bilateral Convolutional Activations Encoded with Fisher Vectors for Scene Character Recognition

Zhong ZHANG, Hong WANG, Shuang LIU, Tariq S. DURRANI

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2018 Volume E101.D Issue 5 Pages 1453-1456
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDL8238

JOURNAL FREE ACCESS

A rich and robust representation for scene characters plays a significant role in automatically understanding the text in images. In this letter, we focus on the issue of feature representation, and propose a novel encoding method named bilateral convolutional activations encoded with Fisher vectors (BCA-FV) for scene character recognition. Concretely, we first extract convolutional activation descriptors from convolutional maps and then build a bilateral convolutional activation map (BCAM) to capture the relationship between the convolutional activation response and the spatial structure information. Finally, in order to obtain the global feature representation, the BCAM is injected into FV to encode convolutional activation descriptors. Hence, the BCA-FV can effectively integrate the prominent features and spatial structure information for character representation. We verify our method on two widely used databases (ICDAR2003 and Chars74K), and the experimental results demonstrate that our method achieves better results than the state-of-the-art methods. In addition, we further validate the proposed BCA-FV on the “Pan+ChiPhoto” database for Chinese scene character recognition, and the experimental results show the good generalization ability of the proposed BCA-FV.

Pedestrian Detectability Estimation Considering Visual Adaptation to Drastic Illumination Change

Yuki IMAEDA, Takatsugu HIRAYAMA, Yasutomo KAWANISHI, Daisuke DEGUCHI, ...

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2018 Volume E101.D Issue 5 Pages 1457-1461
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDL8215

JOURNAL FREE ACCESS

We propose a method for estimating pedestrian detectability that considers the driver's visual adaptation to drastic illumination change, which has not been studied in previous works. We assume that the driver's visual characteristics change in proportion to the elapsed time after an illumination change. As a solution, we construct multiple estimators corresponding to different elapsed periods and estimate the detectability by switching among them according to the elapsed period. To evaluate the proposed method, we construct an experimental setup that presents a participant with illumination changes, and we conduct a preliminary simulated experiment to measure and estimate the pedestrian detectability according to the elapsed period. The results show that the proposed method can estimate the detectability accurately after a drastic illumination change.
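
The switching step described here can be pictured as a lookup from elapsed time to one of several period-specific estimators. The sketch below shows only that dispatch logic; the period boundaries and the constant placeholder estimators are invented for illustration and say nothing about the estimators actually trained in the paper.

```python
import bisect

def select_estimator(elapsed, boundaries, estimators):
    """Pick the estimator whose elapsed-time period contains `elapsed`.

    boundaries: sorted period boundaries, for example in seconds.
    estimators: len(boundaries) + 1 callables, one per period.
    """
    return estimators[bisect.bisect_right(boundaries, elapsed)]

# Placeholder estimators returning fixed detectability values (illustrative).
def just_after_change(features):
    return 0.2

def adapting(features):
    return 0.5

def adapted(features):
    return 0.8

boundaries = [1.0, 3.0]  # hypothetical period boundaries
estimators = [just_after_change, adapting, adapted]

estimate = select_estimator(2.0, boundaries, estimators)(features=None)
```

Keeping one estimator per period means each can be fitted only to data collected at a similar stage of visual adaptation.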

Real-Time Approximation of a Normal Distribution Function for Normal-Mapped Surfaces

Han-sung SON, JungHyun HAN

Article type: LETTER
Subject area: Computer Graphics
2018 Volume E101.D Issue 5 Pages 1462-1465
Published: May 01, 2018
Released on J-STAGE: May 01, 2018

DOI: https://doi.org/10.1587/transinf.2017EDL8212

JOURNAL FREE ACCESS
Supplementary material

This paper proposes to pre-compute approximate normal distribution functions and store them in textures so that real-time applications can process complex specular surfaces simply by sampling the textures. The proposed method is compatible with GPU pipeline-based algorithms, and rendering is completed in real time. The experimental results show that the features of complex specular surfaces, such as the glinty appearance of leather and metallic flakes, are successfully reproduced.
