IEICE Transactions on Information and Systems

Special Section on Deep Learning Technologies: Architecture, Optimization, Techniques, and Applications

FOREWORD

Chi-Hua CHEN

2023, Volume E106.D, Issue 5, pp. 579-580
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLF0001

Journal, free access

A Visual Question Answering Network Merging High- and Low-Level Semantic Information

Huimin LI, Dezhi HAN, Chongqing CHEN, Chin-Chen CHANG, Kuan-Ching LI, ...

Article type: PAPER
Subject area: Core Methods
2023, Volume E106.D, Issue 5, pp. 581-589
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0002

Journal, free access

Visual Question Answering (VQA) usually uses deep attention mechanisms to learn the fine-grained visual content of images and the textual content of questions. However, deep attention mechanisms learn only high-level semantic information and ignore the impact of low-level semantic information on answer prediction. To address this, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to fuse high-level and low-level semantic information. The first strategy, adaptive weight learning, allows different levels of semantic information to learn weights separately. The second, the gate-sum mechanism, suppresses invalid information at the various levels and fuses the valid information. On the benchmark VQA-v2 dataset, we quantitatively and qualitatively evaluate HLSIN and conduct extensive ablation studies to explore the reasons behind HLSIN's effectiveness. Experimental results demonstrate that HLSIN significantly outperforms the previous state-of-the-art, with an overall accuracy of 70.93% on test-dev.
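
The abstract above names a gate-sum mechanism but does not give its formula. As a rough illustration only, the sketch below shows a generic element-wise gated sum of a high-level and a low-level feature vector. The function name `gate_sum`, the per-element weights, and the sigmoid gate are all assumptions made for this example, not the HLSIN definition.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_sum(high, low, w_high, w_low, bias=0.0):
    """Fuse two equally long feature vectors with an element-wise gate.

    Hypothetical formulation (not taken from the paper):
        g = sigmoid(w_high * h + w_low * l + bias)
        fused = g * h + (1 - g) * l
    """
    fused = []
    for h, l, wh, wl in zip(high, low, w_high, w_low):
        g = sigmoid(wh * h + wl * l + bias)  # gate value in (0, 1)
        fused.append(g * h + (1.0 - g) * l)  # convex mix of the two levels
    return fused


# With all-zero gate weights the gate is 0.5, so the result is the plain average.
print(gate_sum([2.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

Because the gate lies strictly between 0 and 1, each fused value stays between the corresponding high-level and low-level values; a learned gate would push it toward whichever level is more informative.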

The Comparison of Attention Mechanisms with Different Embedding Modes for Performance Improvement of Fine-Grained Classification

Wujian YE, Run TAN, Yijun LIU, Chin-Chen CHANG

Article type: PAPER
Subject area: Core Methods
2023, Volume E106.D, Issue 5, pp. 590-600
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0006

Journal, free access

Fine-grained image classification is one of the key basic tasks of computer vision. Traditional deep convolutional neural networks (DCNNs) combined with attention mechanisms can focus on partial and local features of fine-grained images, but the embedding mode of the attention modules within the network has received little consideration, which leads to unsatisfactory classification results. To solve this problem, three attention mechanisms (the SE, CBAM and ECA modules) are introduced into DCNNs such as ResNet and VGGNet so that the DCNN can better focus on the key local features of salient regions in the image. At the same time, we adopt three embedding modes for the attention modules (serial, residual and parallel) to further improve the performance of the classification model. The experimental results show that the three attention modules combined with the three embedding modes can effectively improve DCNN performance. Moreover, compared with SE and ECA, CBAM has stronger feature extraction capability. In particular, the parallel-embedded CBAM makes the local information attended to by the DCNN richer and more accurate and brings the best results, which are 1.98% and 1.57% higher than those of the original VGG16 and ResNet34 on the CUB-200-2011 dataset, respectively. The visualization analysis also indicates that the attention modules can be easily embedded into DCNNs, especially in the parallel mode, with stronger generality and universality.

A Novel Differential Evolution Algorithm Based on Local Fitness Landscape Information for Optimization Problems

Jing LIANG, Ke LI, Kunjie YU, Caitong YUE, Yaxin LI, Hui SONG

Article type: PAPER
Subject area: Core Methods
2023, Volume E106.D, Issue 5, pp. 601-616
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0010

Journal, free access

The selection of the mutation strategy greatly affects the performance of the differential evolution (DE) algorithm. Different types of optimization problems call for different mutation strategies, and choosing a suitable strategy for each problem is a challenging task. To deal with this challenge, this paper proposes a novel DE algorithm based on the local fitness landscape, called FLIDE. In the proposed method, fitness landscape information is obtained to guide the selection of mutation operators. In this way, different problems can be solved with proper evolutionary mechanisms. Moreover, a population adjustment method is used to balance the search ability and population diversity. On the one hand, the diversity of the population in the early stage is enhanced with a relatively large population. On the other hand, the computational cost is reduced in the later stage with a relatively small population. The evolutionary information is utilized as much as possible to guide the search direction. The proposed method is compared with five popular algorithms on 30 test functions with different characteristics. Experimental results show that the proposed FLIDE is more effective on problems with high dimensions.
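
The abstract above does not say which mutation operators FLIDE chooses among or how exactly the population is adjusted. For orientation, the sketch below shows the classic DE/rand/1 mutation and a linear population-size reduction as stand-ins. The schedule and the names `de_rand_1` and `linear_population_size` are assumptions for this example, not the paper's method.

```python
import random


def de_rand_1(population, i, f=0.5, rng=random):
    """Classic DE/rand/1 mutation: v = x_r1 + F * (x_r2 - x_r3).

    r1, r2, r3 are distinct indices, all different from the target index i.
    Requires a population of at least four individuals.
    """
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + f * (b - c) for a, b, c in zip(x1, x2, x3)]


def linear_population_size(gen, max_gen, n_init, n_min):
    """Assumed schedule: shrink the population linearly from n_init to n_min."""
    frac = gen / max_gen
    return round(n_init - (n_init - n_min) * frac)


rng = random.Random(0)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
mutant = de_rand_1(pop, i=0, f=0.5, rng=rng)
print(len(mutant), linear_population_size(50, 100, 100, 20))
```

A large early population favours diversity and a small late population saves evaluations, which is the trade-off the abstract describes.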

Effectively Utilizing the Category Labels for Image Captioning

Junlong FENG, Jianping ZHAO

Article type: PAPER
Subject area: Core Methods
2023, Volume E106.D, Issue 5, pp. 617-624
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0013

Journal, free access

As a further investigation of the image captioning task, some works have extended vision-text datasets for specific subtasks, such as stylized caption generation. The corpus in such datasets is usually composed of obviously sentiment-bearing words. In some special cases, however, the captions are classified according to image category. This results in a latent problem: the generated sentences are close in semantic meaning but belong to different or even opposite categories. It is worthwhile to explore an effective way to use the image category label to boost the caption difference. Therefore, we propose an image captioning network with a label control mechanism (LCNET) in this paper. First, to further improve the caption difference, LCNET employs a semantic enhancement module to provide the decoder with global semantic vectors. Then, through the proposed label control LSTM, LCNET can dynamically modulate the caption generation depending on the image category labels. Finally, the decoder integrates the spatial image features with global semantic vectors to output the caption. Evaluation with all the standard metrics shows that our model outperforms the compared models. Caption analysis demonstrates that our approach can improve the performance of semantic representation. Compared with other label control mechanisms, our model is capable of boosting the caption difference according to the labels while keeping better consistency with the image content.

A Novel SSD-Based Detection Algorithm Suitable for Small Object

Xi ZHANG, Yanan ZHANG, Tao GAO, Yong FANG, Ting CHEN

Article type: PAPER
Subject area: Core Methods
2023, Volume E106.D, Issue 5, pp. 625-634
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0037

Journal, free access

The original single-shot multibox detector (SSD) algorithm has good detection accuracy and speed for regular object recognition. However, the SSD is not suitable for detecting small objects for two reasons: 1) the relationships among different feature layers with various scales are not considered, and 2) the predicted results are solely determined by several independent feature layers. To enhance its detection capability for small objects, this study proposes an improved SSD-based algorithm called proportional channels' fusion SSD (PCF-SSD). Three enhancements are provided by this novel PCF-SSD algorithm. First, a fusion feature pyramid model is proposed by concatenating channels of certain key feature layers in a given proportion for object detection. Second, the default box sizes are adjusted properly for small object detection. Third, an improved loss function is suggested to train the above-proposed fusion model, which can further improve object detection performance. A series of experiments are conducted on the public database Pascal VOC to validate the PCF-SSD. Compared with the original SSD algorithm, our algorithm improves the mean average precision and detection accuracy for small objects by 3.3% and 3.9%, respectively, with a detection speed of 40 FPS. Furthermore, the proposed PCF-SSD can achieve a better balance of detection accuracy and efficiency than the original SSD algorithm, as demonstrated by a series of experimental results.
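
The abstract above mentions adjusted default box sizes but gives no values. For context, the sketch below computes the default box scales of the original SSD, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1), and shows how lowering `s_min` shifts the smallest boxes. The value 0.1 is a hypothetical choice for illustration, not the PCF-SSD setting.

```python
def ssd_scales(m, s_min=0.2, s_max=0.9):
    """Default box scales of the original SSD for m feature maps (m >= 2).

    Scales are fractions of the input image size, spaced evenly from
    s_min (shallowest feature map) to s_max (deepest feature map).
    """
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


original = ssd_scales(6)             # SSD defaults: 0.2 up to 0.9
smaller = ssd_scales(6, s_min=0.1)   # hypothetical adjustment toward small objects

for a, b in zip(original, smaller):
    print(f"{a:.2f} -> {b:.2f}")
```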

Deep Reinforcement Learning Based Ontology Meta-Matching Technique

Xingsi XUE, Yirui HUANG, Zeqing ZHANG

Article type: PAPER
Subject area: Core Methods
2023, Volume E106.D, Issue 5, pp. 635-643
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0050

Journal, free access

Ontologies are regarded as the solution to data heterogeneity on the Semantic Web (SW), but they also suffer from the heterogeneity problem, which leads to ambiguity in data information. The Ontology Meta-Matching (OMM) technique is able to solve the ontology heterogeneity problem by aggregating various similarity measures to find the heterogeneous entities. Inspired by the success of Reinforcement Learning (RL) in solving complex optimization problems, this work proposes an RL-based OMM technique to address the ontology heterogeneity problem. First, we propose a novel RL-based OMM framework. Then, a neural network called the evaluated network is proposed to replace the Q table when choosing the agent's next action, which reduces memory consumption and computing time. After that, to better guide the training of the neural network and improve the accuracy of the RL agent, we establish a memory bank to mine depth information during the evaluated network's training procedure, and we use another neural network, called the target network, to save the historical parameters. The experiment uses a well-known benchmark in the ontology matching domain to test our approach's performance, and the comparisons among Deep Reinforcement Learning (DRL), RL and state-of-the-art ontology matching systems show that our approach is able to effectively determine high-quality alignments.

Intelligent Tool Condition Monitoring Based on Multi-Scale Convolutional Recurrent Neural Network

Xincheng CAO, Bin YAO, Binqiang CHEN, Wangpeng HE, Suqin GUO, Kun CHEN

Article type: PAPER
Subject area: Smart Industry
2023, Volume E106.D, Issue 5, pp. 644-652
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0043

Journal, free access

Tool condition monitoring is one of the core tasks of intelligent manufacturing in digital workshops. This paper presents an intelligent tool condition recognition method based on deep learning. First, an industrial microphone is used to collect the acoustic signal during machining; then, a central fractal decomposition algorithm is proposed to extract sensitive information; finally, a multi-scale convolutional recurrent neural network is used for deep feature extraction and pattern recognition. Multi-process milling experiments show that the proposed method is superior to the existing methods, with a recognition accuracy of 88%.

Computer Vision-Based Tracking of Workers in Construction Sites Based on MDNet

Wen LIU, Yixiao SHAO, Shihong ZHAI, Zhao YANG, Peishuai CHEN

Article type: PAPER
Subject area: Smart Industry
2023, Volume E106.D, Issue 5, pp. 653-661
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0045

Journal, free access

Automatic continuous tracking of objects involved in a construction project is required for such tasks as productivity assessment, unsafe behavior recognition, and progress monitoring. Many computer-vision-based tracking approaches have been investigated and successfully tested on construction sites; however, their practical applications are hindered by tracking accuracy that is limited by the dynamic, complex nature of construction sites (i.e., background clutter, occlusion, and varying scale and pose). To achieve better tracking performance, a novel deep-learning-based tracking approach called the Multi-Domain Convolutional Neural Networks (MD-CNN) is proposed and investigated. The proposed approach consists of two key stages: 1) multi-domain representation of learning; and 2) online visual tracking. To evaluate the effectiveness and feasibility of this approach, it is applied to a metro project in Wuhan, China, and the results demonstrate good tracking performance in construction scenarios with complex backgrounds. The average distance error and F-measure for the MDNet are 7.64 pixels and 67, respectively. The results demonstrate that the proposed approach can be used by site managers to monitor and track workers for hazard prevention on construction sites.

An Improved Insulator and Spacer Detection Algorithm Based on Dual Network and SSD

Yong LI, Shidi WEI, Xuan LIU, Yinzheng LUO, Yafeng LI, Feng SHUANG

Article type: PAPER
Subject area: Smart Industry
2023, Volume E106.D, Issue 5, pp. 662-672
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0062

Journal, free access

Traditional manual inspection is gradually being replaced by automatic inspection with unmanned aerial vehicles (UAVs). However, existing deep learning-based algorithms need a large amount of computational resources, while the resources carried by a UAV are limited, which makes online detection impossible. Moreover, there is no effective online detection system at present. To realize high-precision online detection of electrical equipment, this paper proposes an SSD (Single Shot Multibox Detector) detection algorithm based on an improved Dual network for images of insulators and spacers taken by UAVs. The proposed algorithm uses MnasNet and MobileNetv3 to form the Dual network to extract multi-level features, which overcomes the shortcomings of a backbone based on a single convolutional network for feature extraction. The features extracted from the two networks are then fused to obtain features with high-level semantic information. Finally, the proposed algorithm is tested on the public dataset of insulators and spacers. The experimental results show that the proposed algorithm can detect insulators and spacers efficiently. Compared with other methods, the proposed algorithm has the advantages of smaller model size and higher accuracy. The object detection accuracy of the proposed method is up to 95.1%.

Image-to-Image Translation for Data Augmentation on Multimodal Medical Images

Yue PENG, Zuqiang MENG, Lina YANG

Article type: PAPER
Subject area: Smart Healthcare
2023, Volume E106.D, Issue 5, pp. 686-696
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0008

Journal, free access

Medical images play an important role in medical diagnosis. However, acquiring a large number of annotated datasets is still a difficult task in the medical field. For this reason, research in the field of image-to-image translation is combined with computer-aided diagnosis, and data augmentation methods based on generative adversarial networks are applied to medical images. In this paper, we try to perform data augmentation on unimodal data. The designed StarGAN V2 based network performs well in augmenting the dataset from a small number of original images. The augmented data is expanded from unimodal data to multimodal medical images, and this multimodal medical image data can be applied to the segmentation task with some improvement in the segmentation results. Our experiments demonstrate that the generated multimodal medical image data can improve the performance of glioma segmentation.

MolHF: Molecular Heterogeneous Attributes Fusion for Drug-Target Affinity Prediction on Heterogeneity

Runze WANG, Zehua ZHANG, Yueqin ZHANG, Zhongyuan JIANG, Shilin SUN, Gu ...

Article type: PAPER
Subject area: Smart Healthcare
2023, Volume E106.D, Issue 5, pp. 697-706
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0023

Journal, free access

Recent studies in protein structure prediction such as AlphaFold have drawn great attention to deep learning for the Drug-Target Affinity (DTA) task. Most works are dedicated to embedding a single molecular property and homogeneous information, ignoring the diverse heterogeneous information gains that are contained in the molecules and interactions. Motivated by this, we propose an end-to-end deep learning framework to perform Molecular Heterogeneous features Fusion (MolHF) for DTA prediction on heterogeneity. To address the challenge that biochemical attributes are located in different heterogeneous spaces, we design a Molecular Heterogeneous Information Learning module with multi-strategy learning. In particular, a Molecular Heterogeneous Attention Fusion module is presented to obtain the gains of molecular heterogeneous features. With these, the diversity of molecular structure information for drugs can be extracted. Extensive experiments on two benchmark datasets show that our method outperforms the baselines in all four metrics. Ablation studies validate the effect of attentive fusion and of multiple groups of drug heterogeneous features. Visual presentations demonstrate the impact of the protein embedding level and the model's ability to fit data. In summary, the diverse gains brought by heterogeneous information contribute to drug-target affinity prediction.

The Effectiveness of Data Augmentation for Mature White Blood Cell Image Classification in Deep Learning — Selection of an Optimal Technique for Hematological Morphology Recognition —

Hiroyuki NOZAKA, Kosuke KAMATA, Kazufumi YAMAGATA

Article type: PAPER
Subject area: Smart Healthcare
2023, Volume E106.D, Issue 5, pp. 707-714
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0066

Journal, free access

The data augmentation method is known as a helpful technique to generate a dataset with a large number of images from one with a small number of images for supervised training in deep learning. However, a low validity augmentation method for image recognition was reported in a recent study on artificial intelligence (AI). This study aimed to clarify the optimal data augmentation method in deep learning model generation for the recognition of white blood cells (WBCs). Study Design: We conducted three different data augmentation methods (rotation, scaling, and distortion) on original WBC images, with each AI model for WBC recognition generated by supervised training. The subjects of the clinical assessment were 51 healthy persons. Thin-layer blood smears were prepared from peripheral blood and subjected to May-Grünwald-Giemsa staining. Results: The only significantly effective technique among the AI models for WBC recognition was data augmentation with rotation. By contrast, the effectiveness of both image distortion and image scaling was poor, and improved accuracy was limited to a specific WBC subcategory. Conclusion: Although data augmentation methods are often used for achieving high accuracy in AI generation with supervised training, we consider that it is necessary to select the optimal data augmentation method for medical AI generation based on the characteristics of medical images.
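
The abstract above reports rotation as the only significantly effective augmentation but does not state which angles were used. As a minimal sketch, the code below expands one image (a 2-D grid of pixel values) into four copies rotated in 90-degree steps. The step size and the function names are assumptions for this example, not the study's protocol.

```python
def rotate90(image):
    """Rotate a 2-D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_augment(image):
    """Return the image at 0, 90, 180 and 270 degrees (assumed angle set)."""
    variants = [[list(row) for row in image]]  # copy of the original
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


cell = [[1, 2],
        [3, 4]]
for variant in rotation_augment(cell):
    print(variant)
```

Rotation leaves the shape and relative size of a cell unchanged, whereas scaling and distortion alter them.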

Fish Detecting Using YOLOv4 and CVAE in Aquaculture Ponds with a Non-Uniform Strong Reflection Background

Meng ZHAO, Junfeng WU, Hong YU, Haiqing LI, Jingwen XU, Siqi CHENG, Li ...

Article type: PAPER
Subject area: Smart Agriculture
2023, Volume E106.D, Issue 5, pp. 715-725
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLK0001

Journal, free access

Accurate fish detection is of great significance in aquaculture. However, the non-uniform strong reflection in aquaculture ponds affects the precision of fish detection. This paper combines YOLOv4 and CVAE to accurately detect fish in images with non-uniform strong reflection: the reflection in the image is removed first, and the reflection-removed image is then used for fish detection. First, an improved YOLOv4 is applied to detect and mask the strongly reflective region, locating and labeling it for the subsequent reflection removal. Then, CVAE is combined with the improved YOLOv4 to infer the prior distribution of the reflection region and to restore the region from that distribution so that the reflection can be removed. To further improve the quality of the reflection-removed images, adversarial learning is added to the CVAE. Finally, YOLOv4 is used to detect fish in the high-quality image. In addition, a new image dataset of pond-cultured Takifugu rubripes is constructed, which includes 1000 images with manually annotated fish. A synthetic dataset of 2000 images with strong reflection is also created and merged with the generated dataset for training and for verifying the robustness of the proposed method. Comprehensive experiments are performed to compare the proposed method with state-of-the-art fish detection methods without reflection removal on the generated dataset. The results show that the fish detection precision and recall of the proposed method are improved by 2.7% and 2.4%, respectively.

Detection Method of Fat Content in Pig B-Ultrasound Based on Deep Learning

Wenxin DONG, Jianxun ZHANG, Shuqiu TAN, Xinyue ZHANG

Article type: PAPER
Subject area: Smart Agriculture
2023, Volume E106.D, Issue 5, pp. 726-734
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0022

Journal, free access

In the pork fat content detection task, traditional physical or chemical methods are strongly destructive, have substantial technical requirements and cannot achieve nondestructive detection without slaughtering. To solve these problems, we propose a novel, convenient and economical method for detecting fat content from pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network is constructed to learn the salient features of fat images through a hybrid attention mechanism. Then, the information describing pork fat is extracted at multiple scales, and the detailed information expressed in the shallow layers is fused with the semantic information expressed in the deep layers. Finally, a deep convolutional network predicts the fat content, which is compared with the real label. The experimental results show that the determination coefficient is greater than 0.95 on a dataset of 130 groups of pork B-ultrasound images, which is 2.90, 6.10 and 5.13 percentage points higher than that of VGGNet, ResNet and DenseNet, respectively. This indicates that the model can effectively identify B-ultrasound images of pigs and predict the fat content with high accuracy.

Compression of Vehicle and Pedestrian Detection Network Based on YOLOv3 Model

Lie GUO, Yibing ZHAO, Jiandong GAO

Article type: PAPER
Subject area: Intelligent Transportation Systems
2023, Volume E106.D, Issue 5, pp. 735-745
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0021

Journal, free access

Commonly used object detection algorithms based on convolutional neural networks have difficulty meeting real-time requirements on embedded platforms because of their large model size, large amount of calculation, and long inference time. Model compression is therefore necessary to reduce the amount of network calculation and increase the speed of network inference. This paper compresses a vehicle and pedestrian detection network by pruning and removing redundant parameters. The vehicle and pedestrian detection network is trained based on the YOLOv3 model, using K-means++ to cluster the anchor boxes. Because the number of targets in the dataset is unbalanced, the detection accuracy is improved by changing the proportion of categorical losses and regression losses for each category in the loss function. A layer and channel pruning algorithm is proposed by combining global channel pruning thresholds and the L1 norm, which can reduce the time cost of the network layer transfer process and the amount of computation. Network layer fusion based on TensorRT is performed, and inference uses half-precision floating-point to improve speed. Results show that the compressed vehicle and pedestrian detection network, with 84% of channels and 15 shortcut modules pruned, can reduce the size by 32% and the amount of calculation by 17%. The network inference time can be decreased to 21 ms, which is 1.48 times faster than the network with 84% of channels pruned.
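
The abstract above combines a global channel pruning threshold with the L1 norm but does not spell out the procedure. The sketch below illustrates the general idea under stated assumptions: each channel is scored by the L1 norm of its weights, one threshold is taken over all layers, and every layer keeps at least one channel. The data layout and the function names are hypothetical.

```python
def l1_norms(layers):
    """Score every channel by the L1 norm (sum of absolute weights).

    layers maps a layer name to a list of channels, each a flat list of weights.
    """
    return {name: [sum(abs(w) for w in ch) for ch in chans]
            for name, chans in layers.items()}


def global_prune_mask(layers, prune_ratio):
    """Return {layer: [keep?]} using one threshold shared by all layers."""
    norms = l1_norms(layers)
    ranked = sorted(n for scores in norms.values() for n in scores)
    k = int(len(ranked) * prune_ratio)           # number of channels to drop
    threshold = ranked[k - 1] if k > 0 else float("-inf")
    mask = {}
    for name, scores in norms.items():
        keep = [s > threshold for s in scores]   # ties at the threshold are pruned
        if not any(keep):                        # never empty a layer completely
            keep[scores.index(max(scores))] = True
        mask[name] = keep
    return mask


toy = {"conv1": [[1.0, -1.0], [0.1, 0.1]],
       "conv2": [[3.0, 3.0], [0.2, 0.0]]}
print(global_prune_mask(toy, prune_ratio=0.5))
```

A shared threshold lets heavily redundant layers lose more channels than others, which is why the per-layer safeguard is needed.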

Dynamic Evolution Simulation of Bus Bunching Affected by Traffic Operation State

Shaorong HU, Yuqi ZHANG, Yuefei JIN, Ziqi DOU

Article type: PAPER
Subject area: Intelligent Transportation Systems
2023, Volume E106.D, Issue 5, pp. 746-755
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0047

Journal, free access

Bus bunching often occurs in public transit systems, resulting in a series of problems such as poor punctuality, long waiting times and low service quality. In this paper, we explore the influence of the discrete distribution of traffic operation states on the dynamic evolution of bus bunching. First, we use a self-organizing map (SOM) to find the threshold of bus bunching and analyze the factors that affect bus bunching based on GPS data from the No. 600 bus line in Xi'an. Then, taking the bus headway as the research index, we construct a bus bunching mechanism model. Finally, a simulation platform is built in MATLAB to examine the trend of the headway when various influencing factors show different distribution states along the bus line. In terms of influencing factors, inter-vehicle speed, queuing time at intersections and loading time at stations are shown to have a significant impact on the headway between buses. In terms of the impact of the distribution of crowded road sections on headway, long and concentrated crowded road sections lead to large intervals or bus bunching. When the traffic states along the bus line are randomly distributed among crowded, normal and free, the headway may fluctuate in a large range, which may result in bus bunching, or fluctuate in a small range and remain relatively stable. The headway change curve is determined by the distribution length of each traffic state along the bus line. The research results can help to formulate improvement measures according to the traffic operation state in order to equalize bus headways and alleviate bus bunching.

Semantic Path Planning for Indoor Navigation Tasks Using Multi-View Context and Prior Knowledge

Jianbing WU, Weibo HUANG, Guoliang HUA, Wanruo ZHANG, Risheng KANG, Ho ...

Article type: PAPER
Subject area: Positioning and Navigation
2023, Volume E106.D, Issue 5, pp. 756-764
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0033

Journal, free access

Recently, deep reinforcement learning (DRL) methods have significantly improved the performance of target-driven indoor navigation tasks. However, the rich semantic information of environments is still not fully exploited in previous approaches. In addition, existing methods usually tend to overfit on training scenes or objects in target-driven navigation tasks, making it hard to generalize to unseen environments. Human beings can easily adapt to new scenes as they can recognize the objects they see and reason the possible locations of target objects using their experience. Inspired by this, we propose a DRL-based target-driven navigation model, termed MVC-PK, using Multi-View Context information and Prior semantic Knowledge. It relies only on the semantic label of target objects and allows the robot to find the target without using any geometry map. To perceive the semantic contextual information in the environment, object detectors are leveraged to detect the objects present in the multi-view observations. To enable the semantic reasoning ability of indoor mobile robots, a Graph Convolutional Network is also employed to incorporate prior knowledge. The proposed MVC-PK model is evaluated in the AI2-THOR simulation environment. The results show that MVC-PK (1) significantly improves the cross-scene and cross-target generalization ability, and (2) achieves state-of-the-art performance with 15.2% and 11.0% increase in Success Rate (SR) and Success weighted by Path Length (SPL), respectively.
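
The two metrics reported in the abstract above can be computed as follows. The sketch uses the commonly cited definition of Success weighted by Path Length, SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is the success flag, l_i the shortest path length and p_i the path length actually taken. The episode tuple layout is an assumption for this example, and shortest path lengths are assumed to be positive.

```python
def success_rate(episodes):
    """Fraction of episodes in which the target was reached."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length over (success, shortest, taken) tuples."""
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # A path can never beat the shortest one, so the ratio is at most 1.
            total += shortest / max(taken, shortest)
    return total / len(episodes)


runs = [
    (True, 5.0, 5.0),    # optimal path
    (True, 5.0, 10.0),   # reached the target with a detour
    (False, 5.0, 3.0),   # gave up early
]
print(success_rate(runs), spl(runs))
```

SPL is never larger than SR, since each successful episode contributes at most 1 and failed episodes contribute 0.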

SPSD: Semantics and Deep Reinforcement Learning Based Motion Planning for Supermarket Robot

Jialun CAI, Weibo HUANG, Yingxuan YOU, Zhan CHEN, Bin REN, Hong LIU

Article type: PAPER
Subject area: Positioning and Navigation
2023, Volume E106.D, Issue 5, pp. 765-772
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0057

Journal, free access

Robot motion planning is an important part of the unmanned supermarket. The challenges of motion planning in supermarkets lie in the diversity of the supermarket environment, the complexity of obstacle movement, and the vastness of the search space. This paper proposes an adaptive Search and Path planning method based on Semantic information and Deep reinforcement learning (SPSD), which effectively improves the autonomous decision-making ability of supermarket robots. First, based on a deep reinforcement learning (DRL) backbone, supermarket robots process real-time information from multi-modality sensors to realize high-speed and collision-free motion planning. Meanwhile, to solve the problem caused by the uncertainty of the reward in deep reinforcement learning, common spatial semantic relationships between landmarks and target objects are exploited to define the reward function. Finally, dynamics randomization is introduced in training to improve the generalization performance of the algorithm. The experimental results show that the SPSD algorithm is excellent in three indicators: generalization performance, training time and path planning length. Compared with other methods, the training time of SPSD is reduced by up to 27.42%, the path planning length is reduced by up to 21.08%, and the trained SPSD network can be applied to unfamiliar scenes safely and efficiently. The results are motivating enough to consider the application of the proposed method in practical scenes. We have uploaded the video of the results of the experiment to https://www.youtube.com/watch?v=h1wLpm42NZk.

An Improved BPNN Method Based on Probability Density for Indoor Location

Rong FEI, Yufan GUO, Junhuai LI, Bo HU, Lu YANG

Article type: PAPER
Subject area: Positioning and Navigation
2023, Volume E106.D, Issue 5, pp. 773-785
Published: 2023/05/01
Released: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0073

Journal, free access

With the widespread use of indoor positioning technology, the need for high-precision positioning services is rising; nevertheless, there are several challenges, such as the difficulty of simulating the distribution of indoor location data and the large inaccuracy of probability computation. This paper therefore compares three neural network models for indoor location based on WiFi fingerprints, with the aim of predicting indoor location coordinates more accurately: an indoor location algorithm based on an improved back-propagation neural network model, an RSSI indoor location algorithm based on neural network angle change, and an RSSI indoor location algorithm based on deep neural network angle change. Changing the action range of the activation function in the standard back-propagation neural network model achieves the goal of accurately predicting location coordinates. Experimental comparisons of loss rate (loss), accuracy rate (acc), and cumulative distribution function (CDF) show that the revised back-propagation neural network model has strong stability and enhances indoor positioning accuracy.
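
The abstract above uses the cumulative distribution function (CDF) of the positioning error as one evaluation measure. A minimal sketch of that evaluation is given below, assuming 2-D coordinates and Euclidean error. The sample points and function names are made up for illustration and are not data from the paper.

```python
import math


def positioning_errors(predicted, truth):
    """Euclidean distance between each predicted and true coordinate."""
    return [math.dist(p, t) for p, t in zip(predicted, truth)]


def empirical_cdf(errors, threshold):
    """Fraction of test points whose error is at most the threshold."""
    return sum(1 for e in errors if e <= threshold) / len(errors)


# Hypothetical predictions and ground truth, in metres.
pred = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (2.0, 0.0)]
true = [(0.0, 0.0), (0.0, 0.0), (1.0, 2.0), (2.0, 0.5)]

errors = positioning_errors(pred, true)
for limit in (0.5, 1.0, 5.0):
    print(limit, empirical_cdf(errors, limit))
```

A curve that rises faster, with more points under a small error limit, indicates a more accurate positioning model.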

An Improved Real-Time Object Tracking Algorithm Based on Deep Learning Features

Xianyu WANG, Cong LI, Heyi LI, Rui ZHANG, Zhifeng LIANG, Hai WANG

Article type: PAPER
Subject area: Object Recognition and Tracking
2023 Volume E106.D Issue 5 Pages 786-793
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0039

Free access

Visual object tracking is always a challenging task in computer vision. During the tracking, the shape and appearance of the target may change greatly, and because of the lack of sufficient training samples, most of the online learning tracking algorithms will have performance bottlenecks. In this paper, an improved real-time algorithm based on deep learning features is proposed, which combines multi-feature fusion, multi-scale estimation, adaptive updating of target model and re-detection after target loss. The effectiveness and advantages of the proposed algorithm are proved by a large number of comparative experiments with other excellent algorithms on large benchmark datasets.

Learning Pixel Perception for Identity and Illumination Consistency Face Frontalization in the Wild

Yongtang BAO, Pengfei ZHOU, Yue QI, Zhihui WANG, Qing FAN

Article type: PAPER
Subject area: Person Image Generation
2023 Volume E106.D Issue 5 Pages 794-803
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0055

Free access

Face frontalization synthesizes a frontal and realistic face image from a single profile face image, and it has a wide range of applications in face recognition. Although frontalization methods based on deep learning have made substantial progress in recent years, there is still no guarantee that the generated face preserves identity consistency and illumination consistency at large poses. This paper proposes a novel pixel-based feature regression generative adversarial network (PFR-GAN), which can learn to recover local high-frequency details and preserve the identity and illumination of frontal face images in an uncontrolled environment. We first propose a Reslu block to obtain richer feature representation and improve the convergence speed of training. We then introduce a feature conversion module to reduce the artifacts caused by face rotation discrepancy, enhance image generation quality, and preserve more high-frequency details of the profile image. We also construct a 30,000 face pose dataset to learn about various uncontrolled field environments. Our dataset covers different ages, races, and in-the-wild backgrounds, allowing us to handle other datasets and obtain better results. Finally, we introduce a discriminator used for recovering the facial structure of the frontal face images. Quantitative and qualitative experimental results show that our PFR-GAN can generate high-quality and high-fidelity frontal face images, and that our results are better than the state-of-the-art results.

Multi-Scale Correspondence Learning for Person Image Generation

Shi-Long SHEN, Ai-Guo WU, Yong XU

Article type: PAPER
Subject area: Person Image Generation
2023 Volume E106.D Issue 5 Pages 804-812
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLP0058

Free access

A generative model is presented for two types of person image generation in this paper. First, this model is applied to pose-guided person image generation, i.e., converting the pose of a source person image to the target pose while preserving the texture of that source person image. Second, this model is also used for clothing-guided person image generation, i.e., changing the clothing texture of a source person image to the desired clothing texture. The core idea of the proposed model is to establish the multi-scale correspondence, which can effectively address the misalignment introduced by transferring pose, thereby preserving richer information on appearance. Specifically, the proposed model consists of two stages: 1) It first generates the target semantic map imposed on the target pose to provide more accurate guidance during the generation process. 2) After obtaining the multi-scale feature map by the encoder, the multi-scale correspondence is established, which is useful for a fine-grained generation. Experimental results show the proposed method is superior to state-of-the-art methods in pose-guided person image generation and show its effectiveness in clothing-guided person image generation.

Enhanced Full Attention Generative Adversarial Networks

KaiXu CHEN, Satoshi YAMANE

Article type: LETTER
Subject area: Core Methods
2023 Volume E106.D Issue 5 Pages 813-817
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLL0007

Free access

In this paper, we propose improved Generative Adversarial Networks with an attention module in the Generator, which can enhance the effectiveness of the Generator. Furthermore, recent work has shown that Generator conditioning affects GAN performance. Leveraging this insight, we explore the effect of different normalization methods (spectral normalization, instance normalization) on the Generator and Discriminator. Moreover, an enhanced loss function, the Wasserstein divergence distance, can alleviate the difficulty of training in practice.

Bearing Remaining Useful Life Prediction Using 2D Attention Residual Network

Wenrong XIAO, Yong CHEN, Suqin GUO, Kun CHEN

Article type: LETTER
Subject area: Smart Industry
2023 Volume E106.D Issue 5 Pages 818-820
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLL0006

Free access

An attention residual network with a triple feature as input is proposed to predict the remaining useful life (RUL) of bearings. First, channel attention and spatial attention are connected in series into the residual connection of the residual neural network to obtain a new attention residual module, so that the newly constructed deep learning network can better attend to weak changes in the bearing state. Second, the “triple feature” is used as the input of the attention residual network, so that the deep learning network can better grasp the change trend of the bearing running state and better predict the RUL of the bearing. Finally, the method is verified on a set of experimental data. The results show that the method is simple and effective, has high prediction accuracy, and reduces manual intervention in RUL prediction.

Epileptic Seizure Prediction Using Convolutional Neural Networks and Fusion Features on Scalp EEG Signals

Qixin LAN, Bin YAO, Tao QING

Article type: LETTER
Subject area: Smart Healthcare
2023 Volume E106.D Issue 5 Pages 821-823
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLL0002

Free access

Epileptic seizure prediction is an important research topic in clinical epilepsy treatment, as it can give epilepsy patients and medical staff the opportunity to take precautionary measures. EEG, which records the electrical discharge of the brain, is a commonly used tool for studying brain activity. Many studies based on machine learning algorithms have been proposed to solve the task using EEG signals. In this study, we propose novel seizure prediction models based on convolutional neural networks and scalp EEG for binary classification between preictal and interictal states. The short-time Fourier transform (STFT) is used to translate raw EEG signals into STFT spectra, which are applied as the input of the models. The fusion features are obtained through side-output constructions and used to train and test our models. The test results show that our models achieve comparable results in both sensitivity and FPR with fusion features. The proposed patient-specific model can be used in a seizure prediction system for EEG classification.

OPENnet: Object Position Embedding Network for Locating Anti-Bird Thorn of High-Speed Railway

Zhuo WANG, Junbo LIU, Fan WANG, Jun WU

Article type: LETTER
Subject area: Intelligent Transportation Systems
2023 Volume E106.D Issue 5 Pages 824-828
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLL0011

Free access

Machine vision-based automatic inspection of anti-bird thorn failures, as a replacement for manual identification, remains a great challenge. In this paper, we propose a novel Object Position Embedding Network (OPENnet), which can improve the precision of anti-bird thorn localization. OPENnet simultaneously predicts the location boxes of the support device and the anti-bird thorn by using the proposed double-head network. OPENnet is then optimized using the proposed symbiotic loss function (SymLoss), which embeds the object position into the network. Comprehensive experiments are conducted on a real railway video dataset. OPENnet yields competitive performance on anti-bird thorn localization. Specifically, the localization performance gains +3.65 AP, +2.10 AP50, and +1.22 AP75.

Clustering-Based Neural Network for Carbon Dioxide Estimation

Conghui LI, Quanlin ZHONG, Baoyin LI

Article type: LETTER
Subject area: Intelligent Transportation Systems
2023 Volume E106.D Issue 5 Pages 829-832
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLL0012

Free access

In recent years, the applications of deep learning have facilitated the development of green intelligent transportation systems (ITS), and carbon dioxide estimation has been one of the important issues in green ITS. Furthermore, carbon dioxide estimation can be modelled as fuel consumption estimation. Therefore, a clustering-based neural network is proposed to analyze clusters in accordance with fuel consumption behaviors and to obtain the estimated fuel consumption and the estimated carbon dioxide. In experiments, the mean absolute percentage error (MAPE) of the proposed method is only 5.61%, and the proposed method outperforms other methods.
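
As an illustration of the metric reported above, MAPE is conventionally the mean of |actual - predicted| / |actual|, expressed as a percentage. The sketch below is not taken from the paper; the function name `mape` and the sample readings are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same non-zero length and that no
    actual value is zero (the metric is undefined in that case).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical fuel-consumption readings (units are arbitrary here).
measured = [100.0, 200.0]
estimated = [110.0, 190.0]
print(mape(measured, estimated))  # per-sample errors are 10% and 5%
```

Because each error is divided by the actual value, MAPE is scale-free, which makes it convenient for comparing estimators across vehicles with different consumption levels.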

Effectiveness of Feature Extraction System for Multimodal Sensor Information Based on VRAE and Its Application to Object Recognition

Kazuki HAYASHI, Daisuke TANAKA

Article type: LETTER
Subject area: Object Recognition and Tracking
2023 Volume E106.D Issue 5 Pages 833-835
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DLL0008

Free access

To achieve object recognition, it is necessary to find the unique features of the objects to be recognized. Prior research suggests that methods using information from multiple modalities are effective for finding such features. This paper gives an overview of a system that extracts the features of the objects to be recognized by integrating visual, tactile, and auditory information as multimodal sensor information with VRAE. Furthermore, the effect of changing the combination of modalities is discussed.


Special Section on Data Engineering and Information Management

FOREWORD

Akiyoshi MATONO

2023 Volume E106.D Issue 5 Pages 836-837
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DAF0001

Free access

Effective Language Representations for Danmaku Comment Classification in Nicovideo

Hiroyoshi NAGAO, Koshiro TAMURA, Marie KATSURAI

Article type: PAPER
2023 Volume E106.D Issue 5 Pages 838-846
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DAP0010

Free access

Danmaku commenting has become popular for co-viewing on video-sharing platforms, such as Nicovideo. However, many irrelevant comments usually contaminate the quality of the information provided by videos. Such an information pollution problem can be solved by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve the performance of this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that possibly appear in Nicovideo contents, to pre-train a bidirectional encoder representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned so that it can determine whether a given comment falls into any of the predefined categories. The experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained using Wikipedia or tweets. We also evaluated the performance of each model in an additional sentiment classification task, and the obtained results suggested the applicability of Nicopedia BERT as a feature extractor for other social media text.

Maximizing External Action with Information Provision Over Multiple Rounds in Online Social Networks

Masaaki MIYASHITA, Norihiko SHINOMIYA, Daisuke KASAMATSU, Genya ISHIGA ...

Article type: PAPER
2023 Volume E106.D Issue 5 Pages 847-855
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DAP0007

Free access

Online social networks (OSNs) have increased their impact on the real world, which motivates information senders to control the propagation process of information to promote particular actions of online users. However, the existing works on information provisioning seem to oversimplify the users' decision-making process, which involves information reception, internal actions of social networks, and external actions of social networks. In particular, characterizing the best practices of information provisioning that promote the users' external actions is a complex task due to the complexity of the propagation process in OSNs, even when the variation of information is limited. Therefore, we propose a new information diffusion model that distinguishes user behaviors inside and outside of OSNs, and formulate an optimization problem to maximize the number of users who take the external actions by providing information over multiple rounds. Also, we define a robust provisioning policy for the problem, which selects a message sequence to maximize the expected number of desired users under the probabilistic uncertainty of OSN settings. Our experimental results suggest that there could exist an information provisioning policy that achieves nearly optimal solutions in different types of OSNs. Furthermore, we empirically demonstrate that the proposed robust policy can be such a universally optimal solution.

Construction of a Support Tool for Japanese User Reading of Privacy Policies and Assessment of its User Impact

Sachiko KANAMORI, Hirotsune SATO, Naoya TABATA, Ryo NOJIMA

Article type: PAPER
2023 Volume E106.D Issue 5 Pages 856-867
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DAP0002

Free access

To protect user privacy and establish self-information control rights, service providers must notify users of their privacy policies and obtain their consent in advance. The frameworks that impose these requirements are mandatory. Although originally designed to protect user privacy, obtaining user consent in advance has become a mere formality. These problems are induced by the gap between service providers' privacy policies, which prioritize the observance of laws and guidelines, and user expectations, which are to easily understand how their data will be handled. To reduce this gap, we construct a tool supporting users in reading privacy policies in Japanese. We designed the tool to present users with separate unique expressions containing relevant information to improve the display format of the privacy policy and render it more comprehensible for Japanese users. To accurately extract the unique expressions from privacy policies, we created training data for machine learning for the constructed tool. The constructed tool provides a summary of privacy policies for users to help them understand the policies of interest. Subsequently, we assess the effectiveness of the constructed tool in experiments and follow-up questionnaires. Our findings reveal that the constructed tool enhances the users' subjective understanding of the services they read about and their awareness of the related risks. We expect that the developed tool will help users better understand the privacy policy content and make educated decisions based on their understanding of how service providers intend to use their personal data.

Privacy-Preserving Correlation Coefficient

Tomoaki MIMOTO, Hiroyuki YOKOYAMA, Toru NAKAMURA, Takamasa ISOHARA, Ma ...

Article type: PAPER
2023 Volume E106.D Issue 5 Pages 868-876
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DAP0014

Free access

Differential privacy is a confidentiality metric and quantitatively guarantees the confidentiality of individuals. A noise criterion, called sensitivity, must be calculated when constructing a probabilistic disturbance mechanism that satisfies differential privacy. Depending on the statistical process, the sensitivity may be very large or even impossible to compute. As a result, the usefulness of the constructed mechanism may be significantly low; it might even be impossible to directly construct it. In this paper, we first discuss situations in which sensitivity is difficult to calculate, and then propose a differential privacy with additional dummy data as a countermeasure. When the sensitivity in the conventional differential privacy is calculable, a mechanism that satisfies the proposed metric satisfies the conventional differential privacy at the same time, and it is possible to evaluate the relationship between the respective privacy parameters. Next, we derive sensitivity by focusing on correlation coefficients as a case study of a statistical process for which sensitivity is difficult to calculate, and propose a probabilistic disturbing mechanism that satisfies the proposed metric. Finally, we experimentally evaluate the effect of noise on the sensitivity of the proposed and direct methods. Experiments show that privacy-preserving correlation coefficients can be derived with less noise compared to using direct methods.
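
To illustrate why a large sensitivity reduces usefulness, the sketch below uses the textbook Laplace mechanism, which adds noise with scale sensitivity/epsilon to a query result. This is not necessarily the disturbance mechanism proposed in the paper; the function names and parameter values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value perturbed with noise of scale sensitivity/epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical query result 0.8, released under two assumed sensitivities.
rng = random.Random(0)
print(laplace_mechanism(0.8, sensitivity=0.1, epsilon=1.0, rng=rng))
print(laplace_mechanism(0.8, sensitivity=2.0, epsilon=1.0, rng=rng))
```

For a fixed epsilon, the expected absolute noise equals sensitivity/epsilon, so a statistic whose sensitivity is comparable to its own range is released with noise that can swamp the true value.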

Geo-Graph-Indistinguishability: Location Privacy on Road Networks with Differential Privacy

Shun TAKAGI, Yang CAO, Yasuhito ASANO, Masatoshi YOSHIKAWA

Article type: PAPER
2023 Volume E106.D Issue 5 Pages 877-894
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DAP0011

Free access

In recent years, concerns about location privacy have been increasing with the spread of location-based services (LBSs). Many methods to protect location privacy have been proposed in the past decades. In particular, perturbation methods based on Geo-Indistinguishability (GeoI), which randomly perturb a true location to a pseudolocation, are getting attention due to their strong privacy guarantee inherited from differential privacy. However, GeoI is based on the Euclidean plane even though many LBSs are based on road networks (e.g., ride-sharing services). This causes unnecessary noise and thus an insufficient tradeoff between utility and privacy for LBSs on road networks. To address this issue, we propose a new privacy notion, Geo-Graph-Indistinguishability (GeoGI), for locations on a road network to achieve a better tradeoff. We propose the Graph-Exponential Mechanism (GEM), which satisfies GeoGI. Moreover, we formalize the optimization problem to find the optimal GEM in terms of the tradeoff. However, the computational complexity of a naive method to find the optimal solution is prohibitive, so we propose a greedy algorithm to find an approximate solution in an acceptable amount of time. Finally, our experiments show that our proposed mechanism outperforms GeoI mechanisms, including the optimal GeoI mechanism, with respect to the tradeoff.

Prioritization of Lane-Specific Traffic Jam Detection for Automotive Navigation Framework Utilizing Suddenness Index and Automatic Threshold Determination

Aki HAYASHI, Yuki YOKOHATA, Takahiro HATA, Kouhei MORI, Masato KAMIYA

Article type: PAPER
2023 Volume E106.D Issue 5 Pages 895-903
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DAP0005

Free access

Car navigation systems provide traffic jam information. In this study, we attempt to provide more detailed traffic jam information that considers the lane in which a traffic jam occurs. This makes it possible for users to avoid long waits in queued traffic going toward an unintended destination. Lane-specific traffic jam detection utilizes image processing, which incurs long processing time and high cost. To reduce these, we propose a “suddenness index (SI)” to categorize candidate areas as sudden or periodic. Sudden traffic jams are prioritized as they may lead to accidents. This technology aggregates the number of connected cars for each mesh on a map and quantifies the degree of deviation from the ordinary state. In this paper, we evaluate the proposed method using actual global positioning system (GPS) data and find that the proposed index can cover 100% of sudden lane-specific traffic jams while excluding 82.2% of traffic jam candidates. We also demonstrate the time savings achieved by integrating the proposed method into a demonstration framework. In addition, we improve the proposed method so that it automatically determines the SI threshold for selecting the appropriate traffic jam candidates, which avoids manual parameter settings.

MicroState: An Anomaly Localization Method in Heterogeneous Microservice Systems

Jingjing YANG, Yuchun GUO, Yishuai CHEN

Article type: PAPER
2023 Volume E106.D Issue 5 Pages 904-912
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022DAP0003

Free access

Microservice architecture has been widely adopted for large-scale applications because of its benefits of scalability, flexibility, and reliability. However, microservice architecture also poses new challenges in diagnosing root causes of performance degradation. Existing methods rely on labeled data and suffer from a high computational burden. This paper proposes MicroState, an unsupervised and lightweight method to pinpoint the root cause with detailed descriptions. We decompose root cause diagnosis into element location and detailed reason identification. To mitigate the impact of element heterogeneity and dynamic invocations, MicroState generates elements' invoked states, quantifies elements' abnormality by warping-based state comparison, and infers the anomalous group. MicroState locates the root cause element by considering anomaly frequency and persistency. To locate the anomalous metric among diverse metrics, MicroState extracts metrics' trend features and evaluates metrics' abnormality based on their trend feature variation, which reduces the reliance on anomaly detectors. Our experimental evaluation based on public data of the Artificial intelligence for IT Operations Challenge (AIOps Challenge 2020) shows that MicroState locates root cause elements with 87% precision and diagnoses anomaly reasons accurately.


Special Section on the Architectures, Protocols, and Applications for the Future Internet

FOREWORD

ISMAIL ARAI

2023 Volume E106.D Issue 5 Pages 913
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022NTF0001

Free access

Wide-Area and Long-Term Agricultural Sensing System Utilizing UAV and Wireless Technologies

Hiroshi YAMAMOTO, Shota NISHIURA, Yoshihiro HIGASHIURA

Article type: INVITED PAPER
2023 Volume E106.D Issue 5 Pages 914-926
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022NTI0001

Free access

In order to improve crop production and efficiency of farming operations, an IoT (Internet of Things) system for remote monitoring has been attracting a lot of attention. The existing studies have proposed agricultural sensing systems such that environmental information is collected from many sensor nodes installed in farmland through wireless communications (e.g., Wi-Fi, ZigBee). Especially, Low-Power Wide-Area (LPWA) is a focus as a candidate for wireless communication that enables the support of vast farmland for a long time. However, it is difficult to achieve long distance communication even when using the LPWA because a clear line of sight is difficult to keep due to many obstacles such as crops and agricultural machinery in the farmland. In addition, a sensor node cannot run permanently on batteries because the battery capacity is not infinite. On the other hand, an Unmanned Aerial Vehicle (UAV) that can move freely and stably in the sky has been leveraged for agricultural sensor network systems. By utilizing a UAV as the gateway of the sensor network, the gateway can move to the appropriate location to ensure a clear line of sight from the sensor nodes. In addition, the coverage area of the sensor network can be expanded as the UAV travels over a wide area even when short-range and ultra-low-power wireless communication (e.g., Bluetooth Low Energy (BLE)) is adopted. Furthermore, various wireless technologies (e.g., wireless power transfer, wireless positioning) that have the possibility to improve the coverage area and the lifetime of the sensor network have become available. Therefore, in this study, we propose and develop two kinds of new agricultural sensing systems utilizing a UAV and various wireless technologies. The objective of the proposed system is to provide the solution for achieving the wide-area and long-term sensing for the vast farmland. Depending on which problem is in a priority, the proposed system chooses one of two designs. 
The first design of the system attempts to achieve the wide-area sensing, and so it is based on the LPWA for wireless communication. In the system, to efficiently collect the environmental information, the UAV autonomously travels to search for the locations to maintain the good communication properties of the LPWA to the sensor nodes dispersed over a wide area of farmland. In addition, the second design attempts to achieve the long-term sensing, so it is based on BLE, a typical short-range and ultra-low-power wireless communication technology. In this design, the UAV autonomously flies to the location of sensor nodes and supplies power to them using a wireless power transfer technology for achieving a battery-less sensor node. Through experimental evaluations using a prototype system, it is confirmed that the combination of the UAV and various wireless technologies has the possibility to achieve a wide-area and long-term sensing system for monitoring vast farmland.

Performance Aware Egress Path Discovery for Content Provider with SRv6 Egress Peer Engineering

Yasunobu TOYOTA, Wataru MISHIMA, Koichiro KANAYA, Osamu NAKAMURA

Article type: PAPER
2023 Volume E106.D Issue 5 Pages 927-939
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022NTP0003

Free access

QoS of applications is essential for content providers, and it is required to improve the end-to-end communication quality from a content provider to users. Generally, a content provider's data center network is connected to multiple ASes and has multiple egress paths to reach the content user's network. However, on the Internet, the communication quality of network paths outside of the provider's administrative domain is a black box, so multiple egress paths cannot be quantitatively compared. In addition, it is impossible to determine a unique egress path within a network domain because the parameters that affect the QoS of the content are different for each network. We propose a “Performance Aware Egress Path Discovery” method to improve QoS for content providers. The proposed method uses two techniques: Egress Peer Engineering with Segment Routing over IPv6 and Passive End-to-End Measurement. The method is superior in that it allows various metrics depending on the type of content and can be used for measurements without affecting existing systems. To evaluate our method, we deployed the Performance Aware Egress Path Discovery System in an existing content provider network and conducted experiments to provide production services. Our findings from the experiment show that, in this network, 15.9% of users can expect a 30 Mbps throughput improvement, and 13.7% of users can expect a 10 ms RTT improvement.

A Fast Handover Mechanism for Ground-to-Train Free-Space Optical Communication using Station ID Recognition by Dual-Port Camera

Kosuke MORI, Fumio TERAOKA, Shinichiro HARUYAMA

Article type: PAPER
2023 Volume E106.D Issue 5 Pages 940-951
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022NTP0005

Free access

There are demands for high-speed and stable ground-to-train optical communication as a network environment for trains. The existing ground-to-train optical communication system developed by the authors uses a camera and a QPD (Quadrant photo diode) to capture beacon light. The problem with the existing system is that it is impossible to identify the ground station. In the system proposed in this paper, a beacon light modulated with the ID of the ground station is transmitted, and the ground station is identified by demodulating the image from the dual-port camera on the opposite side. In this paper, we developed an actual system and conducted experiments using a car on the road. The results showed that only one packet was lost with the ping command every 1 ms near handover. Although the communication device itself has a bandwidth of 100 Mbps, the throughput before and after the handover was about 94 Mbps, and only dropped to about 89.4 Mbps during the handover.


Regular Section

Parallelization on a Minimal Substring Search Algorithm for Regular Expressions

Yosuke OBE, Hiroaki YAMAMOTO, Hiroshi FUJIWARA

Article type: PAPER
Subject area: Fundamentals of Information Systems
2023 Volume E106.D Issue 5 Pages 952-958
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022EDP7105

Free access

Let us consider a regular expression r of length m and a text string T of length n over an alphabet Σ. Then, the RE minimal substring search problem is to find all minimal substrings of T matching r. Yamamoto proposed an O(mn)-time and O(m)-space algorithm using a Thompson automaton. In this paper, we improve Yamamoto's algorithm by introducing parallelism. The proposed algorithm runs in O(mn) time in the worst case and in O(mn/p) time in the best case, where p denotes the number of processors. In addition, we present a parameter related to the parallel running time of the proposed algorithm. We evaluate the algorithm experimentally.
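
To make the problem statement concrete, the sketch below enumerates minimal matching substrings by brute force with Python's `re` module. It assumes that "minimal" means a matching substring that contains no other matching substring, and it considers only non-empty substrings; both are assumptions, since the abstract does not spell out the definition. It is neither Yamamoto's Thompson-automaton algorithm nor the parallel algorithm of the paper, and the function name `minimal_matches` is hypothetical.

```python
import re


def minimal_matches(pattern, text):
    """Return (start, end) spans of minimal non-empty matches of pattern.

    A span is kept only if text[start:end] matches the whole pattern and
    no other matching span lies inside it.
    """
    rx = re.compile(pattern)
    n = len(text)
    # Every non-empty substring text[i:j] that matches the pattern entirely.
    spans = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
        if rx.fullmatch(text, i, j)
    ]
    # Discard spans that contain a different matching span.
    return [
        (i, j)
        for (i, j) in spans
        if not any((a, b) != (i, j) and i <= a and b <= j for (a, b) in spans)
    ]


# "aab", "aabab" and "abab" also match, but each contains a shorter "ab".
print(minimal_matches(r"a[ab]*b", "aabab"))
```

This baseline tests every substring separately, so it is only suitable for checking small cases against a faster implementation.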

On Lookaheads in Regular Expressions with Backreferences

Nariyoshi CHIDA, Tachio TERAUCHI

Article type: PAPER
Subject area: Fundamentals of Information Systems
2023 Volume E106.D Issue 5 Pages 959-975
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022EDP7098

Free access

Many modern regular expression engines employ various extensions to give more expressive support for real-world usage. Among the major extensions employed by many of the modern regular expression engines are backreferences and lookaheads. A question of interest about these extended regular expressions is their expressive power. Previous works have shown that (i) the extension by lookaheads does not enhance the expressive power, i.e., the expressive power of regular expressions with lookaheads is still regular, and that (ii) the extension by backreferences enhances the expressive power, i.e., the expressive power of regular expressions with backreferences (abbreviated as rewb) is no longer regular. This raises the following natural question: Does the extension of regular expressions with backreferences by lookaheads enhance the expressive power of regular expressions with backreferences? This paper answers the question positively by proving that adding either positive lookaheads or negative lookaheads increases the expressive power of rewb (the former abbreviated as rewbl_p and the latter as rewbl_n). A consequence of our result is that neither the class of finite state automata nor that of memory automata (MFA) of Schmid [2] (which corresponds to regular expressions with backreferences but without lookaheads) corresponds to rewbl_p or rewbl_n. To fill the void, as a first step toward building such automata, we propose a new class of automata called memory automata with positive lookaheads (PLMFA) that corresponds to rewbl_p. The key idea of PLMFA is to extend MFA with a new kind of memory, called positive-lookahead memory, that is used to simulate the backtracking behavior of positive lookaheads. Interestingly, our positive-lookahead memories are almost perfectly symmetric to the capturing-group memories of MFA. Therefore, our PLMFA can be seen as a natural extension of MFA that can be obtained independently of its original intended purpose of simulating rewbl_p.

Download PDF (724K)
Time Series Forecasting Based on Convolution Transformer

Na WANG, Xianglian ZHAO

Article type: PAPER
Subject area: Fundamentals of Information Systems
2023 Volume E106.D Issue 5 Pages 976-985
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022EDP7136

Free access

Time series forecasting is essential in many real-world fields. Recent studies have shown that Transformer has certain advantages for such problems, especially those with long input sequences and long forecasting horizons. To improve the efficiency and local stability of Transformer, these studies combine Transformer and CNN in different structures. However, previous Transformer-based time series forecasting models do not make full use of CNN, and the two have not been combined effectively. In response to this problem, we propose a time series forecasting algorithm based on a convolution Transformer. (1) ES attention mechanism: External attention is combined with the traditional self-attention mechanism through a two-branch network, which reduces the computational cost of self-attention and yields higher forecasting accuracy. (2) Frequency enhanced block: A Frequency Enhanced Block is added in front of the ESAttention module to capture important structures in the time series through frequency domain mapping. (3) Causal dilated convolution: The self-attention modules are connected by a causal dilated convolution layer in place of the traditional standard convolution layer, so that the receptive field grows exponentially without increasing the computational cost. (4) Multi-layer feature fusion: The outputs of different self-attention modules are extracted, and convolutional layers adjust the size of the feature maps for fusion, so that more fine-grained feature information is obtained at negligible computational cost.
Experiments on real-world datasets show that the proposed forecasting model greatly improves on the real-time forecasting performance of the current state-of-the-art Transformer model, with significantly lower computation and memory costs. Compared with previous algorithms, the proposed algorithm achieves a greater improvement in both effectiveness and forecasting accuracy.
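
Item (3) of this abstract relies on causal dilated convolution. The following is a minimal pure-Python sketch of that general technique, not the authors' layer: the function names, the kernel values, and the doubling dilation schedule are assumptions made for illustration.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Only present and past samples are used (causal); positions before
    the start of the sequence are treated as zero (left padding).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input steps that one output of the stacked layers depends on."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0]
    print(causal_dilated_conv1d(series, [1.0, 1.0], dilation=2))
    # Doubling the dilation per layer (an assumed schedule) makes the
    # receptive field grow exponentially with depth.
    print(receptive_field(2, [1, 2, 4, 8]))
```

Each layer performs the same number of multiply-adds per output regardless of its dilation, which is why the receptive field can grow without extra per-layer computation.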

Download PDF (1328K)
A Practical Model Driven Approach for Designing Security Aware RESTful Web APIs Using SOFL

Busalire Onesmus EMEKA, Soichiro HIDAKA, Shaoying LIU

Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2023 Volume E106.D Issue 5 Pages 986-1000
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022EDP7194

Free access

RESTful web APIs have become ubiquitous, with most modern web applications embracing the micro-service architecture. A RESTful API provides data over the network using HTTP, possibly interacting with databases and other services, and must preserve its security properties. However, REST is not a protocol but rather a set of guidelines on how to design resources accessed over HTTP endpoints. These guidelines cover how related resources should be structured with hierarchical URIs and how the different HTTP verbs should be used to represent well-defined actions on those resources. Although security has always been critical in the design of RESTful APIs, there are few or no clear model-driven engineering techniques that take a secure-by-design approach interweaving both the functional and security requirements. We therefore propose an approach to specifying an API's functional and security requirements with the practical Structured-Object-oriented Formal Language (SOFL). Our approach provides a generic methodology for designing security-aware APIs by utilizing the concepts of domain models, domain primitives, the Ecore metamodel, and SOFL. We also describe a case study to evaluate the effectiveness of our approach and discuss important issues related to the practical applicability of our method.
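
The idea of a domain primitive mentioned in this abstract can be illustrated in ordinary code. The sketch below is written in Python rather than SOFL and is not the authors' specification: the `Username` type, its validation rule, and the `/users/...` path are all hypothetical. It shows a value type that cannot be constructed in an invalid state, so a resource handler that accepts it never sees unvalidated input.

```python
import re
from dataclasses import dataclass

# Assumed rule for illustration: lowercase letter first, then 2 to 19
# lowercase letters, digits, or underscores.
_USERNAME_RE = re.compile(r"[a-z][a-z0-9_]{2,19}")


@dataclass(frozen=True)
class Username:
    """Hypothetical domain primitive: validated once, immutable afterwards."""
    value: str

    def __post_init__(self):
        if not isinstance(self.value, str) or _USERNAME_RE.fullmatch(self.value) is None:
            raise ValueError("invalid username")


def user_resource_path(username: Username) -> str:
    """Build a hierarchical resource URI from an already-validated value."""
    return f"/users/{username.value}"


if __name__ == "__main__":
    print(user_resource_path(Username("alice_01")))
    try:
        Username("../etc/passwd")
    except ValueError as exc:
        print("rejected:", exc)
```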

Download PDF (961K)
High-Precision Mobile Robot Localization Using the Integration of RAR and AKF

Chen WANG, Hong TAN

Article type: PAPER
Subject area: Information Network
2023 Volume E106.D Issue 5 Pages 1001-1009
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022EDP7156

Free access

High-precision indoor positioning has gradually become one of the research hotspots for indoor mobile robots. Relax and Recover (RAR) is an indoor positioning algorithm that uses distance observations. The algorithm restores the robot's trajectory through curve fitting and does not require time synchronization of observations, and positioning can succeed with few observations. However, the algorithm has poor resistance to gross errors and cannot be used for real-time positioning. In this paper, while retaining the advantages of the original algorithm, we improve RAR with an adaptive Kalman filter (AKF) based on the innovation sequence to strengthen its resistance to gross errors. The improved algorithm can be used for real-time navigation and positioning. Experimental validation shows that the improved algorithm is significantly more accurate than the original RAR. Compared with the extended Kalman filter (EKF), accuracy also increases by 12.5%, so the algorithm can be used for high-precision positioning of indoor mobile robots.
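
To make the notion of innovation-based adaptation concrete, the following is a minimal scalar sketch. It is not the RAR and AKF integration described in the paper: the random-walk state model, the noise values `q` and `r`, the gate threshold, and the rule for inflating the measurement noise are all assumptions chosen to show one simple way a filter can down-weight gross errors.

```python
def adaptive_kf_step(x, p, z, q=0.01, r=0.25, gate=9.0):
    """One predict/update step of a scalar Kalman filter with innovation gating.

    x, p : previous state estimate and its variance
    z    : new measurement
    q, r : assumed process and measurement noise variances
    gate : assumed threshold on the normalized squared innovation
    """
    # Predict with a random-walk model: state unchanged, variance grows.
    x_pred = x
    p_pred = p + q

    # Innovation (measurement residual) and its predicted variance.
    innov = z - x_pred
    s = p_pred + r

    # If the innovation is implausibly large, inflate the measurement
    # noise so that innov^2 / s equals the gate value.
    if innov * innov > gate * s:
        s = innov * innov / gate

    gain = p_pred / s
    x_new = x_pred + gain * innov
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    print(adaptive_kf_step(0.0, 1.0, 0.5))    # ordinary measurement
    print(adaptive_kf_step(0.0, 1.0, 100.0))  # gross error is largely ignored
```

With these assumed values, an ordinary measurement is handled exactly as in a standard Kalman update, while a gross error receives a much smaller gain.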

Download PDF (1436K)
Chinese Named Entity Recognition Method Based on Dictionary Semantic Knowledge Enhancement

Tianbin WANG, Ruiyang HUANG, Nan HU, Huansha WANG, Guanghan CHU

Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2023 Volume E106.D Issue 5 Pages 1010-1017
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022EDP7168

Free access

Chinese Named Entity Recognition (NER) is a fundamental technology in Chinese natural language processing. It is widely used in information extraction, intelligent question answering, and knowledge graphs. Nevertheless, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture character-granularity semantics, which limits their performance. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. We integrate character-granularity semantic information into the vector space of characters and obtain vector representations containing semantic information through the attention mechanism. In addition, we verify the appropriate number of semantic layers through a comparative experiment. Experiments on public Chinese datasets such as Weibo, Resume and MSRA show that the model outperforms character-based LSTM baselines.

Download PDF (2963K)
Prediction of Driver's Visual Attention in Critical Moment Using Optical Flow

Rebeka SULTANA, Gosuke OHASHI

Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2023 Volume E106.D Issue 5 Pages 1018-1026
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022EDP7146

Free access

In recent years, driver's visual attention has been actively studied for driving automation technology. However, few models provide insight into the driver's attention at various moments. All attention models process multi-level image representations with a two-stream or multi-stream network, which increases the computational cost because of the additional model parameters. Nevertheless, multi-level image representations such as optical flow play a vital role in tasks involving videos. Therefore, to reduce the computational cost of a two-stream network while still using multi-level image representations, this work proposes a single-stream driver's visual attention model for critical situations. The experiment was conducted using a publicly available critical driving dataset named BDD-A. Qualitative results confirm the effectiveness of the proposed model. Moreover, quantitative results show that the proposed model outperforms state-of-the-art visual attention models according to CC and SIM. Extensive ablation studies examine the presence of optical flow in the model, the position of optical flow in the spatial network, the convolution layers used to process optical flow, and the computational cost compared to a two-stream model.
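
The abstract reports results under CC and SIM without expanding the names. Assuming they denote the correlation coefficient and similarity (histogram intersection) metrics commonly used to compare saliency maps, they can be sketched as follows. The function names and the flat-list representation of an attention map are choices made for this illustration, and both inputs are assumed to be non-constant, non-negative, and of equal length.

```python
import math


def cc(pred, truth):
    """Pearson correlation coefficient between two flattened attention maps."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in truth))
    return cov / (std_p * std_t)


def sim(pred, truth):
    """Histogram intersection after normalizing each map to sum to 1."""
    sum_p = sum(pred)
    sum_t = sum(truth)
    return sum(min(p / sum_p, t / sum_t) for p, t in zip(pred, truth))


if __name__ == "__main__":
    predicted = [0.1, 0.6, 0.2, 0.1]
    observed = [0.0, 0.7, 0.3, 0.0]
    print(cc(predicted, observed), sim(predicted, observed))
```

Under these definitions, CC ranges from -1 to 1 and SIM from 0 to 1, with higher values indicating closer agreement between predicted and observed attention.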

Download PDF (8644K)
3D Multiple-Contextual ROI-Attention Network for Efficient and Accurate Volumetric Medical Image Segmentation

He LI, Yutaro IWAMOTO, Xianhua HAN, Lanfen LIN, Akira FURUKAWA, Shuzo ...

Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2023 Volume E106.D Issue 5 Pages 1027-1037
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022EDP7193

Free access

Convolutional neural networks (CNNs) have become popular in medical image segmentation. The widely used deep CNNs are customized to extract multiple representative features from two-dimensional (2D) data and are generally called 2D networks. However, 2D networks are inefficient at extracting three-dimensional (3D) spatial features from volumetric images. Although most 2D segmentation networks can be extended to 3D networks, the naively extended 3D methods are resource-intensive. In this paper, we propose an efficient and accurate network for fully automatic 3D segmentation. Specifically, we designed a 3D multiple-contextual extractor to capture rich global contextual dependencies from different feature levels. Then we leveraged an ROI-estimation strategy to crop the ROI bounding box. Meanwhile, we used a 3D ROI-attention module to improve the accuracy of in-region segmentation in the decoder path. Moreover, we used a hybrid Dice loss function to address the issues of class imbalance and blurry contours in medical images. By incorporating the above strategies, we realized a practical end-to-end 3D medical image segmentation method with high efficiency and accuracy. To validate the 3D segmentation performance of our proposed method, we conducted extensive experiments on two datasets and demonstrated favorable results over the state-of-the-art methods.
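
The abstract does not specify how the hybrid Dice loss is composed, so the sketch below shows only the plain soft Dice loss that such losses build on. The function name, the smoothing constant `eps`, and the flat-list representation of a volume are assumptions made for illustration.

```python
def soft_dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for one foreground class.

    probs  : predicted foreground probabilities, one per voxel
    target : ground-truth labels (0 or 1), one per voxel
    eps    : assumed smoothing term that avoids division by zero
    """
    intersection = sum(p * g for p, g in zip(probs, target))
    total = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


if __name__ == "__main__":
    # 100 voxels with a single foreground voxel: predicting "all background"
    # is 99% accurate voxel-wise but yields a Dice loss close to 1.
    target = [1.0] + [0.0] * 99
    all_background = [0.0] * 100
    print(soft_dice_loss(all_background, target))
    print(soft_dice_loss(target, target))
```

Because the loss is normalized by the foreground sizes rather than the total voxel count, it remains sensitive to small structures, which is the usual motivation for using Dice-based losses under class imbalance.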

Download PDF (6949K)
Subjective Difficulty Estimation of Educational Comics Using Gaze Features

Kenya SAKAMOTO, Shizuka SHIRAI, Noriko TAKEMURA, Jason ORLOSKY, Hiroyu ...

Article type: PAPER
Subject area: Educational Technology
2023 Volume E106.D Issue 5 Pages 1038-1048
Published: 2023/05/01
Released on J-STAGE: 2023/05/01

DOI: https://doi.org/10.1587/transinf.2022EDP7100

Free access

This study explores significant eye-gaze features that can be used to estimate subjective difficulty while reading educational comics. Educational comics have grown rapidly as a promising way to teach difficult topics using illustrations and text. However, comics present a variety of information on one page, so automatically detecting learners' states such as subjective difficulty is hard with approaches such as system log-based detection, which is common in the Learning Analytics field. To solve this problem, this study focused on 28 eye-gaze features, including three newly proposed features called “Variance in Gaze Convergence,” “Movement between Panels,” and “Movement between Tiles,” to estimate two degrees of subjective difficulty. We then ran an experiment in a simulated environment using Virtual Reality (VR) to accurately collect gaze information. We extracted features at two unit levels, page and panel, and evaluated the accuracy of each pattern in user-dependent and user-independent settings. With a Support Vector Machine (SVM), our proposed features achieved average F1 classification scores of 0.721 and 0.742 at the panel unit level in the user-dependent and user-independent models, respectively.
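
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a binary F1 score is computed from predicted and true labels. The function name and the label encoding (1 for "difficult", 0 otherwise) are assumptions, and the abstract does not state how its average F1 was aggregated, so this shows the per-class score only.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    true_labels = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(f1_score(true_labels, predicted))
```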

Download PDF (3159K)
