Special Section on Deep Learning Technologies: Architecture, Optimization, Techniques, and Applications
-
Chi-Hua CHEN
2023 Volume E106.D Issue 5 Pages
579-580
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
-
Huimin LI, Dezhi HAN, Chongqing CHEN, Chin-Chen CHANG, Kuan-Ching LI, ...
Article type: PAPER
Subject area: Core Methods
2023 Volume E106.D Issue 5 Pages
581-589
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Visual Question Answering (VQA) usually uses deep attention mechanisms to learn the fine-grained visual content of images and the textual content of questions. However, a deep attention mechanism learns only high-level semantic information and ignores the impact of low-level semantic information on answer prediction. To address this, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to fuse high-level and low-level semantic information. The first strategy, adaptive weight learning, allows different levels of semantic information to learn their weights separately. The second, the gate-sum mechanism, suppresses invalid information at the various levels and fuses the valid information. On the benchmark VQA-v2 dataset, we quantitatively and qualitatively evaluate HLSIN and conduct extensive ablation studies to explore the reasons behind HLSIN's effectiveness. Experimental results demonstrate that HLSIN significantly outperforms the previous state-of-the-art, with an overall accuracy of 70.93% on test-dev.
-
Wujian YE, Run TAN, Yijun LIU, Chin-Chen CHANG
Article type: PAPER
Subject area: Core Methods
2023 Volume E106.D Issue 5 Pages
590-600
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Fine-grained image classification is one of the key basic tasks of computer vision. A traditional deep convolutional neural network (DCNN) combined with an attention mechanism can focus on partial and local features of fine-grained images, but existing work gives little consideration to how different attention modules are embedded in the network, which leads to unsatisfactory classification results. To solve this problem, three attention mechanisms, the SE, CBAM and ECA modules, are introduced into DCNNs (such as ResNet and VGGNet) so that the DCNN can better focus on the key local features of salient regions in the image. At the same time, we adopt three embedding modes for the attention modules, namely serial, residual and parallel, to further improve the performance of the classification model. The experimental results show that the three attention modules combined with the three embedding modes effectively improve the performance of the DCNN. Moreover, compared with SE and ECA, CBAM has stronger feature extraction capability. In particular, the parallel-embedded CBAM makes the local information attended to by the DCNN richer and more accurate and gives the best result, which is 1.98% and 1.57% higher than that of the original VGG16 and ResNet34 on the CUB-200-2011 dataset, respectively. The visualization analysis also indicates that the attention modules can be easily embedded into DCNNs, especially in the parallel mode, with stronger generality and universality.
-
Jing LIANG, Ke LI, Kunjie YU, Caitong YUE, Yaxin LI, Hui SONG
Article type: PAPER
Subject area: Core Methods
2023 Volume E106.D Issue 5 Pages
601-616
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
The selection of a mutation strategy greatly affects the performance of the differential evolution (DE) algorithm. For different types of optimization problems, different mutation strategies should be selected, and choosing a suitable mutation strategy for each problem is a challenging task. To deal with this challenge, this paper proposes a novel DE algorithm based on the local fitness landscape, called FLIDE. In the proposed method, fitness landscape information is obtained to guide the selection of mutation operators. In this way, different problems can be solved with proper evolutionary mechanisms. Moreover, a population adjustment method is used to balance the search ability and population diversity. On the one hand, the diversity of the population in the early stage is enhanced with a relatively large population. On the other hand, the computational cost is reduced in the later stage with a relatively small population. The evolutionary information is utilized as much as possible to guide the search direction. The proposed method is compared with five popular algorithms on 30 test functions with different characteristics. Experimental results show that the proposed FLIDE is more effective on problems with high dimensions.
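For readers unfamiliar with DE mutation strategies, the following is a minimal sketch of two classic operators, DE/rand/1 and DE/best/1. It is not the FLIDE method: the abstract does not state which operators FLIDE chooses between or how the landscape information drives the choice, so the function names, the scale factor `f` and the minimisation convention are assumptions made for illustration only.

```python
import random


def mutate_rand_1(pop, i, f=0.5, rng=random):
    """DE/rand/1: v = x_r1 + f * (x_r2 - x_r3), with r1, r2, r3 distinct and != i."""
    candidates = [k for k in range(len(pop)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    return [a + f * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]


def mutate_best_1(pop, fitness, i, f=0.5, rng=random):
    """DE/best/1: v = x_best + f * (x_r1 - x_r2); lower fitness is better (assumed)."""
    best = min(range(len(pop)), key=lambda k: fitness[k])
    candidates = [k for k in range(len(pop)) if k != i]
    r1, r2 = rng.sample(candidates, 2)
    return [a + f * (b - c) for a, b, c in zip(pop[best], pop[r1], pop[r2])]


# Toy population of 2-D vectors scored by the sphere function (hypothetical data).
population = [[0.0, 0.0], [1.0, 2.0], [3.0, -1.0], [2.0, 2.0], [-1.0, 0.5]]
scores = [sum(x * x for x in p) for p in population]
donor_explore = mutate_rand_1(population, i=0)
donor_exploit = mutate_best_1(population, scores, i=1)
```

DE/rand/1 favours exploration because its base vector is random, while DE/best/1 pulls the search towards the current best individual; an adaptive scheme would switch between such operators depending on the problem.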
-
Junlong FENG, Jianping ZHAO
Article type: PAPER
Subject area: Core Methods
2023 Volume E106.D Issue 5 Pages
617-624
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
As a further investigation of the image captioning task, some works have extended vision-text datasets for specific subtasks, such as stylized caption generation. The corpus in such a dataset is usually composed of obviously sentiment-bearing words. In some special cases, however, the captions are classified according to image category. This results in a latent problem: the generated sentences are close in semantic meaning but belong to different or even opposite categories. It is therefore worth exploring an effective way to use the image category label to boost the caption difference. To this end, we propose an image captioning network with a label control mechanism (LCNET) in this paper. First, to further improve the caption difference, LCNET employs a semantic enhancement module to provide the decoder with global semantic vectors. Then, through the proposed label control LSTM, LCNET can dynamically modulate the caption generation depending on the image category labels. Finally, the decoder integrates the spatial image features with global semantic vectors to output the caption. Evaluation with all the standard metrics shows that our model outperforms the compared models. Caption analysis demonstrates that our approach can improve the performance of semantic representation. Compared with other label control mechanisms, our model is capable of boosting the caption difference according to the labels while keeping better consistency with the image content.
-
Xi ZHANG, Yanan ZHANG, Tao GAO, Yong FANG, Ting CHEN
Article type: PAPER
Subject area: Core Methods
2023 Volume E106.D Issue 5 Pages
625-634
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
The original single-shot multibox detector (SSD) algorithm has good detection accuracy and speed for regular object recognition. However, the SSD is not suitable for detecting small objects for two reasons: 1) the relationships among different feature layers with various scales are not considered, and 2) the predicted results are solely determined by several independent feature layers. To enhance its detection capability for small objects, this study proposes an improved SSD-based algorithm called proportional channels' fusion SSD (PCF-SSD). Three enhancements are provided by this novel PCF-SSD algorithm. First, a fusion feature pyramid model is proposed by concatenating channels of certain key feature layers in a given proportion for object detection. Second, the default box sizes are adjusted properly for small object detection. Third, an improved loss function is suggested to train the above-proposed fusion model, which can further improve object detection performance. A series of experiments are conducted on the public database Pascal VOC to validate the PCF-SSD. Compared with the original SSD algorithm, our algorithm improves the mean average precision and detection accuracy for small objects by 3.3% and 3.9%, respectively, with a detection speed of 40 FPS. Furthermore, the proposed PCF-SSD can achieve a better balance of detection accuracy and efficiency than the original SSD algorithm, as demonstrated by a series of experimental results.
-
Xingsi XUE, Yirui HUANG, Zeqing ZHANG
Article type: PAPER
Subject area: Core Methods
2023 Volume E106.D Issue 5 Pages
635-643
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Ontologies are regarded as the solution to data heterogeneity on the Semantic Web (SW), but they also suffer from the heterogeneity problem, which leads to the ambiguity of data information. The Ontology Meta-Matching (OMM) technique is able to solve the ontology heterogeneity problem by aggregating various similarity measures to find the heterogeneous entities. Inspired by the success of Reinforcement Learning (RL) in solving complex optimization problems, this work proposes an RL-based OMM technique to address the ontology heterogeneity problem. First, we propose a novel RL-based OMM framework. Then, a neural network called the evaluated network is proposed to replace the Q table when choosing the next action of the agent, which reduces memory consumption and computing time. After that, to better guide the training of the neural network and improve the accuracy of the RL agent, we establish a memory bank to mine depth information during the evaluated network's training procedure, and we use another neural network, called the target network, to save the historical parameters. The experiment uses the famous benchmark in the ontology matching domain to test our approach's performance, and the comparisons among Deep Reinforcement Learning (DRL), RL and state-of-the-art ontology matching systems show that our approach is able to effectively determine high-quality alignments.
-
Xincheng CAO, Bin YAO, Binqiang CHEN, Wangpeng HE, Suqin GUO, Kun CHEN
Article type: PAPER
Subject area: Smart Industry
2023 Volume E106.D Issue 5 Pages
644-652
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Tool condition monitoring is one of the core tasks of intelligent manufacturing in the digital workshop. This paper presents an intelligent recognition method for tool condition based on deep learning. First, an industrial microphone is used to collect the acoustic signal during machining; then, a central fractal decomposition algorithm is proposed to extract sensitive information; finally, a multi-scale convolutional recurrent neural network is used for deep feature extraction and pattern recognition. The multi-process milling experiments show that the proposed method is superior to the existing methods, with a recognition accuracy of 88%.
-
Wen LIU, Yixiao SHAO, Shihong ZHAI, Zhao YANG, Peishuai CHEN
Article type: PAPER
Subject area: Smart Industry
2023 Volume E106.D Issue 5 Pages
653-661
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Automatic continuous tracking of objects involved in a construction project is required for such tasks as productivity assessment, unsafe behavior recognition, and progress monitoring. Many computer-vision-based tracking approaches have been investigated and successfully tested on construction sites; however, their practical applications are hindered by the tracking accuracy limited by the dynamic, complex nature of construction sites (i.e., background clutter, occlusion, and varying scale and pose). To achieve better tracking performance, a novel deep-learning-based tracking approach called the Multi-Domain Convolutional Neural Networks (MD-CNN) is proposed and investigated. The proposed approach consists of two key stages: 1) multi-domain representation of learning; and 2) online visual tracking. To evaluate the effectiveness and feasibility of this approach, it is applied to a metro project in Wuhan, China, and the results demonstrate good tracking performance in construction scenarios with complex backgrounds. The average distance error and F-measure for the MDNet are 7.64 pixels and 67, respectively. The results demonstrate that the proposed approach can be used by site managers to monitor and track workers for hazard prevention on construction sites.
-
Yong LI, Shidi WEI, Xuan LIU, Yinzheng LUO, Yafeng LI, Feng SHUANG
Article type: PAPER
Subject area: Smart Industry
2023 Volume E106.D Issue 5 Pages
662-672
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Traditional manual inspection is gradually being replaced by automatic inspection with unmanned aerial vehicles (UAVs). However, existing deep learning-based algorithms need a large amount of computational resources, whereas the resources carried by a UAV are limited, which makes online detection impossible. Moreover, there is no effective online detection system at present. To realize high-precision online detection of electrical equipment, this paper proposes an SSD (Single Shot Multibox Detector) detection algorithm based on an improved Dual network for images of insulators and spacers taken by UAVs. The proposed algorithm uses MnasNet and MobileNetv3 to form the Dual network to extract multi-level features, which overcomes the shortcoming of a backbone based on a single convolutional network for feature extraction. Then the features extracted from the two networks are fused together to obtain features with high-level semantic information. Finally, the proposed algorithm is tested on the public dataset of insulators and spacers. The experimental results show that the proposed algorithm can detect insulators and spacers efficiently. Compared with other methods, the proposed algorithm has the advantages of smaller model size and higher accuracy. The object detection accuracy of the proposed method is up to 95.1%.
-
Yue PENG, Zuqiang MENG, Lina YANG
Article type: PAPER
Subject area: Smart Healthcare
2023 Volume E106.D Issue 5 Pages
686-696
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Medical images play an important role in medical diagnosis. However, acquiring large annotated datasets is still a difficult task in the medical field. For this reason, research in the field of image-to-image translation is combined with computer-aided diagnosis, and data augmentation methods based on generative adversarial networks are applied to medical images. In this paper, we try to perform data augmentation on unimodal data. The designed StarGAN V2-based network performs well in augmenting the dataset from a small number of original images, and the augmented data is expanded from unimodal data to multimodal medical images. This multimodal medical image data can be applied to the segmentation task with some improvement in the segmentation results. Our experiments demonstrate that the generated multimodal medical image data can improve the performance of glioma segmentation.
-
Runze WANG, Zehua ZHANG, Yueqin ZHANG, Zhongyuan JIANG, Shilin SUN, Gu ...
Article type: PAPER
Subject area: Smart Healthcare
2023 Volume E106.D Issue 5 Pages
697-706
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Recent studies in protein structure prediction such as AlphaFold have drawn great attention to deep learning for the Drug-Target Affinity (DTA) task. Most works are dedicated to embedding a single molecular property and homogeneous information, ignoring the diverse heterogeneous information gains that are contained in the molecules and interactions. Motivated by this, we propose an end-to-end deep learning framework to perform Molecular Heterogeneous features Fusion (MolHF) for DTA prediction on heterogeneity. To address the challenge that biochemical attributes are located in different heterogeneous spaces, we design a Molecular Heterogeneous Information Learning module with multi-strategy learning. In particular, a Molecular Heterogeneous Attention Fusion module is presented to obtain the gains of molecular heterogeneous features. With these, the diversity of molecular structure information for drugs can be extracted. Extensive experiments on two benchmark datasets show that our method outperforms the baselines in all four metrics. Ablation studies validate the effect of attentive fusion and of multiple groups of drug heterogeneous features. Visual presentations demonstrate the impact of the protein embedding level and the model's ability to fit data. In summary, the diverse gains brought by heterogeneous information contribute to drug-target affinity prediction.
-
Hiroyuki NOZAKA, Kosuke KAMATA, Kazufumi YAMAGATA
Article type: PAPER
Subject area: Smart Healthcare
2023 Volume E106.D Issue 5 Pages
707-714
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
The data augmentation method is known as a helpful technique to generate a dataset with a large number of images from one with a small number of images for supervised training in deep learning. However, a low validity augmentation method for image recognition was reported in a recent study on artificial intelligence (AI). This study aimed to clarify the optimal data augmentation method in deep learning model generation for the recognition of white blood cells (WBCs). Study Design: We conducted three different data augmentation methods (rotation, scaling, and distortion) on original WBC images, with each AI model for WBC recognition generated by supervised training. The subjects of the clinical assessment were 51 healthy persons. Thin-layer blood smears were prepared from peripheral blood and subjected to May-Grünwald-Giemsa staining. Results: The only significantly effective technique among the AI models for WBC recognition was data augmentation with rotation. By contrast, the effectiveness of both image distortion and image scaling was poor, and improved accuracy was limited to a specific WBC subcategory. Conclusion: Although data augmentation methods are often used for achieving high accuracy in AI generation with supervised training, we consider that it is necessary to select the optimal data augmentation method for medical AI generation based on the characteristics of medical images.
-
Meng ZHAO, Junfeng WU, Hong YU, Haiqing LI, Jingwen XU, Siqi CHENG, Li ...
Article type: PAPER
Subject area: Smart Agriculture
2023 Volume E106.D Issue 5 Pages
715-725
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Accurate fish detection is of great significance in aquaculture. However, the non-uniform strong reflection in aquaculture ponds affects the precision of fish detection. This paper combines YOLOv4 and CVAE to accurately detect fish in images with non-uniform strong reflection: the reflection in the image is removed first, and the reflection-removed image is then provided for fish detection. Firstly, the improved YOLOv4 is applied to detect and mask the strong reflective region, to locate and label the reflective region for the subsequent reflection removal. Then, CVAE is combined with the improved YOLOv4 to infer the prior distribution of the reflection region and restore the reflection region from that distribution so that the reflection can be removed. To further improve the quality of the reflection-removed images, adversarial learning is appended to CVAE. Finally, YOLOv4 is used to detect fish in the high-quality image. In addition, a new image dataset of pond-cultured Takifugu rubripes is constructed, which includes 1000 images with fish annotated manually. A synthetic dataset including 2000 images with strong reflection is also created and merged with the generated dataset for training and verifying the robustness of the proposed method. Comprehensive experiments are performed to compare the proposed method with the state-of-the-art fish detection methods without reflection removal on the generated dataset. The results show that the fish detection precision and recall of the proposed method are improved by 2.7% and 2.4%, respectively.
-
Wenxin DONG, Jianxun ZHANG, Shuqiu TAN, Xinyue ZHANG
Article type: PAPER
Subject area: Smart Agriculture
2023 Volume E106.D Issue 5 Pages
726-734
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
In the pork fat content detection task, traditional physical or chemical methods are strongly destructive, have substantial technical requirements and cannot achieve nondestructive detection without slaughtering. To solve these problems, we propose a novel, convenient and economical method for detecting the fat content of pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network is constructed to learn the salient features of fat images through a hybrid attention mechanism. Then, the information describing pork fat is extracted at multiple scales, and the detailed information expressed in the shallow layer and the semantic information expressed in the deep layer are fused. Finally, a deep convolution network is used to predict the fat content, which is compared with the real label. The experimental results show that the determination coefficient is greater than 0.95 on the 130 groups of pork B-ultrasound image data sets, which is 2.90, 6.10 and 5.13 percentage points higher than that of VGGNet, ResNet and DenseNet, respectively. This indicates that the model can effectively identify the B-ultrasound image of pigs and predict the fat content with high accuracy.
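The determination coefficient reported above is the standard R² statistic. As a reminder of how it is computed, here is a minimal sketch; the sample values are made up for illustration and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical fat-content labels and predictions (percent).
labels = [12.0, 15.5, 18.0, 21.5, 25.0]
predictions = [12.4, 15.0, 18.3, 21.0, 25.6]
score = r_squared(labels, predictions)
```

A value of 1 means the predictions match the labels exactly, and a value of 0 means they do no better than always predicting the mean label.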
-
Lie GUO, Yibing ZHAO, Jiandong GAO
Article type: PAPER
Subject area: Intelligent Transportation Systems
2023 Volume E106.D Issue 5 Pages
735-745
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Commonly used object detection algorithms based on convolutional neural networks struggle to meet real-time requirements on embedded platforms because of their large model size, heavy computation, and long inference time. It is necessary to use model compression to reduce the amount of network calculation and increase the speed of network inference. This paper compresses a vehicle and pedestrian detection network by pruning and removing redundant parameters. The vehicle and pedestrian detection network is trained based on the YOLOv3 model, using K-means++ to cluster the anchor boxes. Because the number of targets in the dataset is unbalanced, the detection accuracy is improved by changing the proportion of categorical losses and regression losses for each category in the loss function. A layer and channel pruning algorithm is proposed by combining global channel pruning thresholds and the L1 norm, which reduces the time cost of the network layer transfer process and the amount of computation. Network layer fusion based on TensorRT is performed, and inference uses half-precision floating-point to improve speed. Results show that the compressed vehicle and pedestrian detection network, with 84% of channels and 15 Shortcut modules pruned, reduces the model size by 32% and the amount of calculation by 17%. The network inference time decreases to 21 ms, which is 1.48 times faster than the network with 84% of channels pruned.
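The idea of pruning channels against a single global threshold on their L1 norms can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not say which tensor the L1 norm is taken over, so the sketch scores each channel by the L1 norm of its own weights, and the helper names and the rule of always keeping one channel per layer are hypothetical.

```python
def l1_norms(layer):
    """layer: list of channels, each given as a flat list of weights."""
    return [sum(abs(w) for w in channel) for channel in layer]


def global_threshold(layers, prune_ratio):
    """Score below which channels are pruned, taken over ALL layers at once."""
    scores = sorted(s for layer in layers for s in l1_norms(layer))
    k = int(len(scores) * prune_ratio)
    return scores[k] if k < len(scores) else float("inf")


def channels_to_keep(layers, prune_ratio):
    """Return, per layer, the indices of channels that survive pruning."""
    threshold = global_threshold(layers, prune_ratio)
    kept = []
    for layer in layers:
        norms = l1_norms(layer)
        keep = [i for i, s in enumerate(norms) if s >= threshold]
        if not keep:
            # Assumption: never empty a layer; retain its strongest channel.
            keep = [max(range(len(norms)), key=norms.__getitem__)]
        kept.append(keep)
    return kept


# Two toy layers with two channels each (hypothetical weights).
toy_layers = [
    [[0.1, -0.1], [1.0, 1.0]],
    [[0.05, 0.0], [2.0, -2.0]],
]
surviving = channels_to_keep(toy_layers, prune_ratio=0.5)
```

Because the threshold is global, layers with many weak channels lose more of them than layers whose channels all carry large weights.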
-
Shaorong HU, Yuqi ZHANG, Yuefei JIN, Ziqi DOU
Article type: PAPER
Subject area: Intelligent Transportation Systems
2023 Volume E106.D Issue 5 Pages
746-755
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Bus bunching often occurs in public transit systems, resulting in a series of problems such as poor punctuality, long waiting times and low service quality. In this paper, we explore the influence of the discrete distribution of traffic operation states on the dynamic evolution of bus bunching. Firstly, we use a self-organizing map (SOM) to find the threshold of bus bunching and analyze the factors that affect bus bunching based on GPS data of the No. 600 bus line in Xi'an. Then, taking the bus headway as the research index, we construct the bus bunching mechanism model. Finally, a simulation platform is built in MATLAB to examine the trend of headway when various influencing factors show different distribution states along the bus line. In terms of influencing factors, inter-vehicle speed, queuing time at intersections and loading time at stations are shown to have a significant impact on headway between buses. In terms of the impact of the distribution of crowded road sections on headway, long and concentrated crowded road sections lead to large intervals or bus bunching. When the traffic states along the bus line are randomly distributed among crowded, normal and free, the headway may fluctuate in a large range, which may result in bus bunching, or fluctuate in a small range and remain relatively stable. The headway change curve is determined by the distribution length of each traffic state along the bus line. The research results can help to formulate improvement measures according to the traffic operation state for equalizing bus headway and alleviating bus bunching.
-
Jianbing WU, Weibo HUANG, Guoliang HUA, Wanruo ZHANG, Risheng KANG, Ho ...
Article type: PAPER
Subject area: Positioning and Navigation
2023 Volume E106.D Issue 5 Pages
756-764
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Recently, deep reinforcement learning (DRL) methods have significantly improved the performance of target-driven indoor navigation tasks. However, the rich semantic information of environments is still not fully exploited in previous approaches. In addition, existing methods usually tend to overfit to training scenes or objects in target-driven navigation tasks, making it hard to generalize to unseen environments. Human beings can easily adapt to new scenes because they can recognize the objects they see and reason about the possible locations of target objects using their experience. Inspired by this, we propose a DRL-based target-driven navigation model, termed MVC-PK, using Multi-View Context information and Prior semantic Knowledge. It relies only on the semantic label of target objects and allows the robot to find the target without using any geometry map. To perceive the semantic contextual information in the environment, object detectors are leveraged to detect the objects present in the multi-view observations. To enable the semantic reasoning ability of indoor mobile robots, a Graph Convolutional Network is also employed to incorporate prior knowledge. The proposed MVC-PK model is evaluated in the AI2-THOR simulation environment. The results show that MVC-PK (1) significantly improves the cross-scene and cross-target generalization ability, and (2) achieves state-of-the-art performance with 15.2% and 11.0% increases in Success Rate (SR) and Success weighted by Path Length (SPL), respectively.
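The two navigation metrics named above have commonly used definitions: SR is the fraction of successful episodes, and SPL averages, over all episodes, the success indicator multiplied by the ratio of the shortest path length to the larger of the shortest and the actually travelled path length. A minimal sketch follows; the dictionary keys and episode data are hypothetical and not from the paper.

```python
def success_rate(episodes):
    """Fraction of episodes in which the agent reached the target."""
    return sum(1 for e in episodes if e["success"]) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length, averaged over all episodes."""
    total = 0.0
    for e in episodes:
        if e["success"]:
            # Efficiency is 1.0 for an optimal path and shrinks as the path gets longer.
            total += e["shortest"] / max(e["taken"], e["shortest"])
    return total / len(episodes)


# Hypothetical evaluation episodes (path lengths in metres).
runs = [
    {"success": True, "shortest": 5.0, "taken": 10.0},
    {"success": True, "shortest": 4.0, "taken": 4.0},
    {"success": False, "shortest": 6.0, "taken": 20.0},
]
sr_value = success_rate(runs)
spl_value = spl(runs)
```

SPL can never exceed SR, since each successful episode contributes at most 1 and each failure contributes 0.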
-
Jialun CAI, Weibo HUANG, Yingxuan YOU, Zhan CHEN, Bin REN, Hong LIU
Article type: PAPER
Subject area: Positioning and Navigation
2023 Volume E106.D Issue 5 Pages
765-772
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Robot motion planning is an important part of the unmanned supermarket. The challenges of motion planning in supermarkets lie in the diversity of the supermarket environment, the complexity of obstacle movement, and the vastness of the search space. This paper proposes an adaptive Search and Path planning method based on Semantic information and Deep reinforcement learning (SPSD), which effectively improves the autonomous decision-making ability of supermarket robots. Firstly, based on the backbone of deep reinforcement learning (DRL), supermarket robots process real-time information from multi-modality sensors to realize high-speed and collision-free motion planning. Meanwhile, to solve the problem caused by the uncertainty of the reward in deep reinforcement learning, common spatial semantic relationships between landmarks and target objects are exploited to define the reward function. Finally, dynamics randomization is introduced during training to improve the generalization performance of the algorithm. The experimental results show that the SPSD algorithm performs well on three indicators: generalization performance, training time and path planning length. Compared with other methods, the training time of SPSD is reduced by up to 27.42%, the path planning length is reduced by up to 21.08%, and the trained network of SPSD can be applied to unfamiliar scenes safely and efficiently. The results are motivating enough to consider the application of the proposed method in practical scenes. We have uploaded the video of the results of the experiment to https://www.youtube.com/watch?v=h1wLpm42NZk.
-
Rong FEI, Yufan GUO, Junhuai LI, Bo HU, Lu YANG
Article type: PAPER
Subject area: Positioning and Navigation
2023 Volume E106.D Issue 5 Pages
773-785
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
With the widespread use of indoor positioning technology, the need for high-precision positioning services is rising; nevertheless, there are several challenges, such as the difficulty of simulating the distribution of indoor location data and the large inaccuracy of probability computation. Therefore, this paper compares three neural network models for indoor location based on WiFi fingerprints: an indoor location algorithm based on an improved back-propagation neural network model, an RSSI indoor location algorithm based on neural network angle change, and an RSSI indoor location algorithm based on deep neural network angle change, in order to predict indoor location coordinates more accurately. Changing the action range of the activation function in the standard back-propagation neural network model achieves the goal of accurately predicting location coordinates. Experimental comparisons of loss rate (loss), accuracy rate (acc), and cumulative distribution function (CDF) show that the revised back-propagation neural network model has strong stability and enhances indoor positioning accuracy.
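The CDF used for comparison above is typically the empirical distribution of the positioning error, that is, the fraction of test points located within a given distance of their true position. A minimal sketch under that assumption is shown below; the coordinates are invented for illustration and the function names are hypothetical.

```python
import math


def positioning_errors(predicted, actual):
    """Euclidean distance between each predicted and true coordinate pair."""
    return [math.dist(p, a) for p, a in zip(predicted, actual)]


def empirical_cdf(errors, threshold):
    """Fraction of samples whose error is at most `threshold`."""
    return sum(1 for e in errors if e <= threshold) / len(errors)


# Hypothetical predicted and ground-truth coordinates (metres).
estimated = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
truth = [(0.0, 0.0), (0.0, 0.0), (1.0, 2.0)]
errs = positioning_errors(estimated, truth)
within_one_metre = empirical_cdf(errs, 1.0)
```

Plotting `empirical_cdf` over a range of thresholds gives the familiar error CDF curve, where a curve that rises faster indicates a more accurate model.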
-
Xianyu WANG, Cong LI, Heyi LI, Rui ZHANG, Zhifeng LIANG, Hai WANG
Article type: PAPER
Subject area: Object Recognition and Tracking
2023 Volume E106.D Issue 5 Pages
786-793
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Visual object tracking is always a challenging task in computer vision. During the tracking, the shape and appearance of the target may change greatly, and because of the lack of sufficient training samples, most of the online learning tracking algorithms will have performance bottlenecks. In this paper, an improved real-time algorithm based on deep learning features is proposed, which combines multi-feature fusion, multi-scale estimation, adaptive updating of target model and re-detection after target loss. The effectiveness and advantages of the proposed algorithm are proved by a large number of comparative experiments with other excellent algorithms on large benchmark datasets.
-
Yongtang BAO, Pengfei ZHOU, Yue QI, Zhihui WANG, Qing FAN
Article type: PAPER
Subject area: Person Image Generation
2023 Volume E106.D Issue 5 Pages
794-803
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
A frontal and realistic face image was synthesized from a single profile face image. It has a wide range of applications in face recognition. Although the frontal face method based on deep learning has made substantial progress in recent years, there is still no guarantee that the generated face has identity consistency and illumination consistency in a significant posture. This paper proposes a novel pixel-based feature regression generative adversarial network (PFR-GAN), which can learn to recover local high-frequency details and preserve identity and illumination frontal face images in an uncontrolled environment. We first propose a Reslu block to obtain richer feature representation and improve the convergence speed of training. We then introduce a feature conversion module to reduce the artifacts caused by face rotation discrepancy, enhance image generation quality, and preserve more high-frequency details of the profile image. We also construct a 30,000 face pose dataset to learn about various uncontrolled field environments. Our dataset includes ages of different races and wild backgrounds, allowing us to handle other datasets and obtain better results. Finally, we introduce a discriminator used for recovering the facial structure of the frontal face images. Quantitative and qualitative experimental results show our PFR-GAN can generate high-quality and high-fidelity frontal face images, and our results are better than the state-of-art results.
-
Shi-Long SHEN, Ai-Guo WU, Yong XU
Article type: PAPER
Subject area: Person Image Generation
2023 Volume E106.D Issue 5 Pages
804-812
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
A generative model is presented for two types of person image generation in this paper. First, this model is applied to pose-guided person image generation, i.e., converting the pose of a source person image to the target pose while preserving the texture of that source person image. Second, this model is also used for clothing-guided person image generation, i.e., changing the clothing texture of a source person image to the desired clothing texture. The core idea of the proposed model is to establish the multi-scale correspondence, which can effectively address the misalignment introduced by transferring pose, thereby preserving richer information on appearance. Specifically, the proposed model consists of two stages: 1) It first generates the target semantic map imposed on the target pose to provide more accurate guidance during the generation process. 2) After obtaining the multi-scale feature map by the encoder, the multi-scale correspondence is established, which is useful for a fine-grained generation. Experimental results show the proposed method is superior to state-of-the-art methods in pose-guided person image generation and show its effectiveness in clothing-guided person image generation.
-
KaiXu CHEN, Satoshi YAMANE
Article type: LETTER
Subject area: Core Methods
2023 Volume E106.D Issue 5 Pages
813-817
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
In this paper, we propose improved Generative Adversarial Networks with an attention module in the Generator, which can enhance the effectiveness of the Generator. Furthermore, recent work has shown that Generator conditioning affects GAN performance. Leveraging this insight, we explore the effect of different normalization methods (spectral normalization, instance normalization) on the Generator and Discriminator. Moreover, an enhanced loss function, called the Wasserstein Divergence distance, can alleviate the problem that the module is difficult to train in practice.
-
Wenrong XIAO, Yong CHEN, Suqin GUO, Kun CHEN
Article type: LETTER
Subject area: Smart Industry
2023 Volume E106.D Issue 5 Pages
818-820
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
An attention residual network with a triple feature as input is proposed to predict the remaining useful life (RUL) of bearings. First, channel attention and spatial attention are connected in series into the residual connection of the residual neural network to obtain a new attention residual module, so that the newly constructed deep learning network can better pay attention to weak changes in the bearing state. Second, the “triple feature” is used as the input of the attention residual network, so that the deep learning network can better grasp the trend of the bearing running state and better predict the RUL of the bearing. Finally, the method is verified on a set of experimental data. The results show that the method is simple and effective, has high prediction accuracy, and reduces manual intervention in RUL prediction.
-
Qixin LAN, Bin YAO, Tao QING
Article type: LETTER
Subject area: Smart Healthcare
2023 Volume E106.D Issue 5 Pages
821-823
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Epileptic seizure prediction is an important research topic in clinical epilepsy treatment, which can provide opportunities to take precautionary measures for epilepsy patients and medical staff. EEG is a commonly used tool for studying brain activity, which records the electrical discharge of the brain. Many studies based on machine learning algorithms have been proposed to solve the task using EEG signals. In this study, we propose novel seizure prediction models based on convolutional neural networks and scalp EEG for binary classification between preictal and interictal states. The short-time Fourier transform (STFT) is used to convert raw EEG signals into STFT spectra, which are used as input to the models. The fusion features are obtained through side-output constructions and used to train and test our models. The test results show that our models can achieve comparable results in both sensitivity and FPR with fusion features. The proposed patient-specific model can be used in a seizure prediction system for EEG classification.
-
Zhuo WANG, Junbo LIU, Fan WANG, Jun WU
Article type: LETTER
Subject area: Intelligent Transportation Systems
2023 Volume E106.D Issue 5 Pages
824-828
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Machine vision-based automatic anti-bird thorn failure inspection, instead of manual identification, remains a great challenge. In this paper, we propose a novel Object Position Embedding Network (OPENnet), which can improve the precision of anti-bird thorn localization. OPENnet can simultaneously predict the location boxes of the support device and the anti-bird thorn by using the proposed double-head network. OPENnet is then optimized using the proposed symbiotic loss function (SymLoss), which embeds the object position into the network. Comprehensive experiments are conducted on a real railway video dataset. OPENnet yields competitive performance on anti-bird thorn localization. Specifically, the localization performance gains +3.65 AP, +2.10 AP50, and +1.22 AP75.
-
Conghui LI, Quanlin ZHONG, Baoyin LI
Article type: LETTER
Subject area: Intelligent Transportation Systems
2023 Volume E106.D Issue 5 Pages
829-832
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
In recent years, applications of deep learning have facilitated the development of green intelligent transportation systems (ITS), and carbon dioxide estimation has been one of the important issues in green ITS. Furthermore, carbon dioxide estimation can be modelled as fuel consumption estimation. Therefore, a clustering-based neural network is proposed to analyze clusters in accordance with fuel consumption behaviors and to obtain the estimated fuel consumption and the estimated carbon dioxide. In experiments, the mean absolute percentage error (MAPE) of the proposed method is only 5.61%, and the performance of the proposed method is higher than that of other methods.
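For reference, the MAPE figure quoted above follows the conventional definition of mean absolute percentage error. The sketch below shows that standard formula only; the function name and the sample readings are illustrative assumptions, not data or code from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero, since the ratio is
    undefined otherwise.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical fuel-consumption readings and model estimates.
print(mape([10.0, 20.0, 40.0], [9.0, 22.0, 40.0]))
```

Because each error is divided by the actual value, MAPE is scale-free, which makes it convenient for comparing estimators across vehicles with very different consumption levels.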
-
Kazuki HAYASHI, Daisuke TANAKA
Article type: LETTER
Subject area: Object Recognition and Tracking
2023 Volume E106.D Issue 5 Pages
833-835
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
To achieve object recognition, it is necessary to find the unique features of the objects to be recognized. Results in prior research suggest that methods that use information from multiple modalities are effective for finding the unique features. This paper gives an overview of a system that can extract the features of the objects to be recognized by integrating visual, tactile, and auditory information as multimodal sensor information with VRAE. Furthermore, the effect of changing the combination of modalities is discussed.
-
Akiyoshi MATONO
2023 Volume E106.D Issue 5 Pages
836-837
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
-
Hiroyoshi NAGAO, Koshiro TAMURA, Marie KATSURAI
Article type: PAPER
2023 Volume E106.D Issue 5 Pages
838-846
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Danmaku commenting has become popular for co-viewing on video-sharing platforms, such as Nicovideo. However, many irrelevant comments usually contaminate the quality of the information provided by videos. Such an information pollutant problem can be solved by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve the performance of this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that possibly appear in Nicovideo contents, to pre-train a bidirectional encoder representations from Transformers (BERT) model. The resulting model named Nicopedia BERT is then fine-tuned such that it could determine whether a given comment falls into any of predefined categories. The experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained using Wikipedia or tweets. We also evaluated the performance of each model in an additional sentiment classification task, and the obtained results implied the applicability of Nicopedia BERT as a feature extractor of other social media text.
-
Masaaki MIYASHITA, Norihiko SHINOMIYA, Daisuke KASAMATSU, Genya ISHIGA ...
Article type: PAPER
2023 Volume E106.D Issue 5 Pages
847-855
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Online social networks (OSNs) have increased their impact on the real world, which motivates information senders to control the propagation process of information to promote particular actions of online users. However, the existing works on information provisioning seem to oversimplify the users' decision-making process that involves information reception, internal actions of social networks, and external actions of social networks. In particular, characterizing the best practices of information provisioning that promotes the users' external actions is a complex task due to the complexity of the propagation process in OSNs, even when the variation of information is limited. Therefore, we propose a new information diffusion model that distinguishes user behaviors inside and outside of OSNs, and formulate an optimization problem to maximize the number of users who take the external actions by providing information over multiple rounds. Also, we define a robust provisioning policy for the problem, which selects a message sequence to maximize the expected number of desired users under the probabilistic uncertainty of OSN settings. Our experimental results suggest that there could exist an information provisioning policy that achieves nearly optimal solutions in different types of OSNs. Furthermore, we empirically demonstrate that the proposed robust policy can be such a universally optimal solution.
-
Sachiko KANAMORI, Hirotsune SATO, Naoya TABATA, Ryo NOJIMA
Article type: PAPER
2023 Volume E106.D Issue 5 Pages
856-867
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
To protect user privacy and establish self-information control rights, service providers must notify users of their privacy policies and obtain their consent in advance. The frameworks that impose these requirements are mandatory. Although originally designed to protect user privacy, obtaining user consent in advance has become a mere formality. These problems are induced by the gap between service providers' privacy policies, which prioritize the observance of laws and guidelines, and user expectations, which are to easily understand how their data will be handled. To reduce this gap, we construct a tool supporting users in reading privacy policies in Japanese. We designed the tool to present users with separate unique expressions containing relevant information to improve the display format of the privacy policy and render it more comprehensive for Japanese users. To accurately extract the unique expressions from privacy policies, we created training data for machine learning for the constructed tool. The constructed tool provides a summary of privacy policies for users to help them understand the policies of interest. Subsequently, we assess the effectiveness of the constructed tool in experiments and follow-up questionnaires. Our findings reveal that the constructed tool enhances the users' subjective understanding of the services they read about and their awareness of the related risks. We expect that the developed tool will help users better understand the privacy policy content and make educated decisions based on their understanding of how service providers intend to use their personal data.
-
Tomoaki MIMOTO, Hiroyuki YOKOYAMA, Toru NAKAMURA, Takamasa ISOHARA, Ma ...
Article type: PAPER
2023 Volume E106.D Issue 5 Pages
868-876
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Differential privacy is a confidentiality metric and quantitatively guarantees the confidentiality of individuals. A noise criterion, called sensitivity, must be calculated when constructing a probabilistic disturbance mechanism that satisfies differential privacy. Depending on the statistical process, the sensitivity may be very large or even impossible to compute. As a result, the usefulness of the constructed mechanism may be significantly low; it might even be impossible to directly construct it. In this paper, we first discuss situations in which sensitivity is difficult to calculate, and then propose a differential privacy with additional dummy data as a countermeasure. When the sensitivity in the conventional differential privacy is calculable, a mechanism that satisfies the proposed metric satisfies the conventional differential privacy at the same time, and it is possible to evaluate the relationship between the respective privacy parameters. Next, we derive sensitivity by focusing on correlation coefficients as a case study of a statistical process for which sensitivity is difficult to calculate, and propose a probabilistic disturbing mechanism that satisfies the proposed metric. Finally, we experimentally evaluate the effect of noise on the sensitivity of the proposed and direct methods. Experiments show that privacy-preserving correlation coefficients can be derived with less noise compared to using direct methods.
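To make the role of sensitivity concrete, the following is a minimal sketch of the textbook Laplace mechanism, in which the noise scale is sensitivity divided by the privacy parameter epsilon. It is not the dummy-data mechanism proposed in the paper; the function names and the example query are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with epsilon-differential privacy.

    sensitivity is the largest change in the statistic caused by adding
    or removing one individual's record; it must be finite and known.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical counting query: sensitivity 1, epsilon 0.5.
print(laplace_mechanism(120, sensitivity=1.0, epsilon=0.5))
```

The sketch shows why a large or incomputable sensitivity is a problem: the noise scale grows linearly with it, so the released value quickly becomes useless.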
-
Shun TAKAGI, Yang CAO, Yasuhito ASANO, Masatoshi YOSHIKAWA
Article type: PAPER
2023 Volume E106.D Issue 5 Pages
877-894
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
In recent years, concerns about location privacy have been increasing with the spread of location-based services (LBSs). Many methods to protect location privacy have been proposed in the past decades. In particular, perturbation methods based on Geo-Indistinguishability (GeoI), which randomly perturb a true location to a pseudolocation, are getting attention due to their strong privacy guarantee inherited from differential privacy. However, GeoI is based on the Euclidean plane even though many LBSs are based on road networks (e.g., ride-sharing services). This causes unnecessary noise and thus an insufficient tradeoff between utility and privacy for LBSs on road networks. To address this issue, we propose a new privacy notion, Geo-Graph-Indistinguishability (GeoGI), for locations on a road network to achieve a better tradeoff. We propose the Graph-Exponential Mechanism (GEM), which satisfies GeoGI. Moreover, we formalize the optimization problem to find the optimal GEM in terms of the tradeoff. However, the computational complexity of a naive method to find the optimal solution is prohibitive, so we propose a greedy algorithm to find an approximate solution in an acceptable amount of time. Finally, our experiments show that our proposed mechanism outperforms GeoI mechanisms, including the optimal GeoI mechanism, with respect to the tradeoff.
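The idea of perturbing a location on a road network rather than on the Euclidean plane can be sketched with a generic exponential mechanism whose score is the shortest-path distance. The weighting exp(-epsilon * d / 2) is the usual exponential-mechanism form and is an assumption here, not necessarily the exact GEM definition in the paper; the toy graph and all function names are hypothetical.

```python
import heapq
import math
import random


def shortest_path_lengths(graph, source):
    """Dijkstra over an adjacency dict {node: {neighbour: edge_length}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def graph_exponential_probs(graph, true_node, epsilon):
    """Output distribution: weight decays with road distance from true_node."""
    dist = shortest_path_lengths(graph, true_node)
    weights = {v: math.exp(-epsilon * d / 2.0) for v, d in dist.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def perturb(graph, true_node, epsilon, rng=None):
    """Draw one pseudolocation (a node reachable from true_node)."""
    rng = rng or random.Random()
    probs = graph_exponential_probs(graph, true_node, epsilon)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[v] for v in nodes], k=1)[0]


# Hypothetical three-intersection road: A - B - C, unit-length segments.
road = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0}, "C": {"B": 1.0}}
print(graph_exponential_probs(road, "A", epsilon=1.0))
```

Since outputs are restricted to nodes of the road network, no probability mass is spent on off-road points, which is the intuition behind the better utility-privacy tradeoff claimed above.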
-
Aki HAYASHI, Yuki YOKOHATA, Takahiro HATA, Kouhei MORI, Masato KAMIYA
Article type: PAPER
2023 Volume E106.D Issue 5 Pages
895-903
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Car navigation systems provide traffic jam information. In this study, we attempt to provide more detailed traffic jam information that considers the lane in which a traffic jam occurs. This makes it possible for users to avoid long waits in queued traffic going toward an unintended destination. Lane-specific traffic jam detection utilizes image processing, which incurs long processing time and high cost. To reduce these, we propose a “suddenness index (SI)” to categorize candidate areas as sudden or periodic. Sudden traffic jams are prioritized as they may lead to accidents. This technology aggregates the number of connected cars for each mesh on a map and quantifies the degree of deviation from the ordinary state. In this paper, we evaluate the proposed method using actual global positioning system (GPS) data and find that the proposed index can cover 100% of sudden lane-specific traffic jams while excluding 82.2% of traffic jam candidates. We also demonstrate the effectiveness of time savings by integrating the proposed method into a demonstration framework. In addition, we improved the proposed method's ability to automatically determine the SI threshold to select the appropriate traffic jam candidates to avoid manual parameter settings.
-
Jingjing YANG, Yuchun GUO, Yishuai CHEN
Article type: PAPER
2023 Volume E106.D Issue 5 Pages
904-912
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Microservice architecture has been widely adopted for large-scale applications because of its benefits of scalability, flexibility, and reliability. However, microservice architecture also poses new challenges in diagnosing root causes of performance degradation. Existing methods rely on labeled data and suffer a high computation burden. This paper proposes MicroState, an unsupervised and lightweight method to pinpoint the root cause with detailed descriptions. We decompose root cause diagnosis into element location and detailed reason identification. To mitigate the impact of element heterogeneity and dynamic invocations, MicroState generates elements' invoked states, quantifies elements' abnormality by warping-based state comparison, and infers the anomalous group. MicroState locates the root cause element with the consideration of anomaly frequency and persistency. To locate the anomalous metric from diverse metrics, MicroState extracts metrics' trend features and evaluates metrics' abnormality based on their trend feature variation, which reduces the reliance on anomaly detectors. Our experimental evaluation based on public data of the Artificial intelligence for IT Operations Challenge (AIOps Challenge 2020) shows that MicroState locates root cause elements with 87% precision and diagnoses anomaly reasons accurately.
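One common way to compare two state sequences that may be stretched or shifted in time is dynamic time warping (DTW). The sketch below assumes that "warping-based state comparison" is in that spirit; the abstract does not confirm the exact measure MicroState uses, and the function name and sample series are hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical latency states: same shape, one sample repeated.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

A large distance between an element's current state sequence and its historical one would then be treated as a sign of abnormality.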
Special Section on the Architectures, Protocols, and Applications for the Future Internet
-
Yosuke OBE, Hiroaki YAMAMOTO, Hiroshi FUJIWARA
Article type: PAPER
Subject area: Fundamentals of Information Systems
2023 Volume E106.D Issue 5 Pages
952-958
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Let us consider a regular expression r of length m and a text string T of length n over an alphabet Σ. Then, the RE minimal substring search problem is to find all minimal substrings of T matching r. Yamamoto proposed an O(mn)-time and O(m)-space algorithm using a Thompson automaton. In this paper, we improve Yamamoto's algorithm by introducing parallelism. The proposed algorithm runs in O(mn) time in the worst case and in O(mn/p) time in the best case, where p denotes the number of processors. In addition, we present a parameter related to the parallel running time of the proposed algorithm. We evaluate the algorithm experimentally.
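The problem statement can be illustrated with a brute-force reference. This sketch is neither Yamamoto's automaton-based algorithm nor the parallel one proposed in the paper, and it is far slower than O(mn). It uses Python's `re` syntax rather than the paper's formalism and assumes r does not match the empty string. A match is reported as minimal when no proper substring of it also matches r; the function name is hypothetical.

```python
import re


def minimal_matches(pattern, text):
    """Return (start, end) spans of all minimal substrings matching pattern."""
    rx = re.compile(pattern)
    n = len(text)
    # For each start, the shortest non-empty end such that text[start:end] matches.
    shortest = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            if rx.fullmatch(text, i, j):
                shortest[i] = j
                break
    # Scan right to left: a span is minimal only if it ends strictly before
    # every shortest match that starts to its right.
    result = []
    best_end = n + 1
    for i in sorted(shortest, reverse=True):
        if shortest[i] < best_end:
            result.append((i, shortest[i]))
        best_end = min(best_end, shortest[i])
    result.reverse()
    return result


print(minimal_matches("a.*b", "aabab"))  # [(1, 3), (3, 5)]
```

In the example, "aab" at span (0, 3) matches but is not minimal because it contains the match "ab" at span (1, 3).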
-
Nariyoshi CHIDA, Tachio TERAUCHI
Article type: PAPER
Subject area: Fundamentals of Information Systems
2023 Volume E106.D Issue 5 Pages
959-975
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Many modern regular expression engines employ various extensions to give more expressive support for real-world usages. Among the major extensions employed by many of the modern regular expression engines are backreferences and lookaheads. A question of interest about these extended regular expressions is their expressive power. Previous works have shown that (i) the extension by lookaheads does not enhance the expressive power, i.e., the expressive power of regular expressions with lookaheads is still regular, and that (ii) the extension by backreferences enhances the expressive power, i.e., the expressive power of regular expressions with backreferences (abbreviated as rewb) is no longer regular. This raises the following natural question: Does the extension of regular expressions with backreferences by lookaheads enhance the expressive power of regular expressions with backreferences? This paper answers the question positively by proving that adding either positive lookaheads or negative lookaheads increases the expressive power of rewb (the former abbreviated as rewblp and the latter as rewbln). A consequence of our result is that neither the class of finite state automata nor that of memory automata (MFA) of Schmid[2] (which corresponds to regular expressions with backreferences but without lookaheads) corresponds to rewblp or rewbln. To fill the void, as a first step toward building such automata, we propose a new class of automata called memory automata with positive lookaheads (PLMFA) that corresponds to rewblp. The key idea of PLMFA is to extend MFA with a new kind of memory, called positive-lookahead memory, that is used to simulate the backtracking behavior of positive lookaheads. Interestingly, our positive-lookahead memories are almost perfectly symmetric to the capturing-group memories of MFA. Therefore, our PLMFA can be seen as a natural extension of MFA that can be obtained independently of its original intended purpose of simulating rewblp.
-
Na WANG, Xianglian ZHAO
Article type: PAPER
Subject area: Fundamentals of Information Systems
2023 Volume E106.D Issue 5 Pages
976-985
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
For many fields in real life, time series forecasting is essential. Recent studies have shown that Transformer has certain advantages when dealing with such problems, especially with long-sequence time input and long-sequence time forecasting problems. In order to improve the efficiency and local stability of Transformer, these studies combine Transformer and CNN with different structures. However, previous Transformer-based time series forecasting network models cannot make full use of CNN, and the two have not been combined effectively. In response to this problem in time series forecasting, we propose a time series forecasting algorithm based on a convolution Transformer. (1) ES attention mechanism: External attention is combined with the traditional self-attention mechanism through a two-branch network, which reduces the computational cost of the self-attention mechanism and yields higher forecasting accuracy. (2) Frequency enhanced block: A Frequency Enhanced Block is added in front of the ESAttention module, which can capture important structures in time series through frequency domain mapping. (3) Causal dilated convolution: The self-attention mechanism modules are connected by a causal dilated convolution layer in place of the traditional standard convolution layer, so that the model obtains an exponentially growing receptive field without increasing the computational cost. (4) Multi-layer feature fusion: The outputs of different self-attention mechanism modules are extracted, and convolutional layers are used to adjust the size of the feature maps for fusion. More fine-grained feature information is obtained at negligible computational cost.
Experiments on real world datasets show that the time series network forecasting model structure proposed in this paper can greatly improve the real-time forecasting performance of the current state-of-the-art Transformer model, and the calculation and memory costs are significantly lower. Compared with previous algorithms, the proposed algorithm has achieved a greater performance improvement in both effectiveness and forecasting accuracy.
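The causal dilated convolution named in item (3) can be illustrated in a few lines. This is a generic, framework-free sketch of the operation and of how the receptive field grows when dilations double per layer; it is not the paper's network, and the function names, kernel, and dilation schedule are assumptions.

```python
def causal_dilated_conv1d(x, w, dilation):
    """y[t] = sum_i w[i] * x[t - i * dilation], with zeros before t = 0.

    Only current and past samples are used, so no future value leaks
    into the output (the "causal" property).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, wi in enumerate(w):
            idx = t - i * dilation
            if idx >= 0:
                acc += wi * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input steps seen by one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


print(causal_dilated_conv1d([1, 2, 3, 4], [1, 1], dilation=2))
print(receptive_field(2, [1, 2, 4, 8]))
```

With kernel size 2 and dilations 1, 2, 4, 8, four layers already cover 16 time steps, whereas four undilated layers of the same kernel size would cover only 5, at the same per-layer cost.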
-
Busalire Onesmus EMEKA, Soichiro HIDAKA, Shaoying LIU
Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2023 Volume E106.D Issue 5 Pages
986-1000
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
RESTful web APIs have become ubiquitous with most modern web applications embracing the micro-service architecture. A RESTful API provides data over the network using HTTP probably interacting with databases and other services and must preserve its security properties. However, REST is not a protocol but rather a set of guidelines on how to design resources accessed over HTTP endpoints. There are guidelines on how related resources should be structured with hierarchical URIs as well as how the different HTTP verbs should be used to represent well-defined actions on those resources. Whereas security has always been critical in the design of RESTful APIs, there are few or no clear model driven engineering techniques utilizing a secure-by-design approach that interweaves both the functional and security requirements. We therefore propose an approach to specifying APIs functional and security requirements with the practical Structured-Object-oriented Formal Language (SOFL). Our proposed approach provides a generic methodology for designing security aware APIs by utilizing concepts of domain models, domain primitives, Ecore metamodel and SOFL. We also describe a case study to evaluate the effectiveness of our approach and discuss important issues in relation to the practical applicability of our method.
-
Chen WANG, Hong TAN
Article type: PAPER
Subject area: Information Network
2023 Volume E106.D Issue 5 Pages
1001-1009
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
High-precision indoor positioning technology has gradually become one of the research hotspots in indoor mobile robots. Relax and Recover (RAR) is an indoor positioning algorithm using distance observations. The algorithm restores the robot's trajectory through curve fitting and does not require time synchronization of observations. The positioning can be successful with few observations. However, the algorithm has poor resistance to gross errors and cannot be used for real-time positioning. In this paper, while retaining the advantages of the original algorithm, the RAR algorithm is improved with the adaptive Kalman filter (AKF) based on the innovation sequence to improve the anti-gross error performance of the original algorithm. The improved algorithm can be used for real-time navigation and positioning. The experimental validation found that the improved algorithm has a significant improvement in accuracy when compared to the original RAR. Compared with the extended Kalman filter (EKF), the accuracy is also increased by 12.5%, so the improved algorithm can be used for high-precision positioning of indoor mobile robots.
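How an innovation-based adaptive filter resists gross errors can be sketched with a scalar Kalman filter. This is a generic illustration under assumed noise parameters, not the paper's RAR/AKF formulation: when the normalized squared innovation exceeds a gate, the innovation variance is inflated, which shrinks the gain for that measurement. All names and values are hypothetical.

```python
def kalman_1d(measurements, q=1e-3, r=0.1, x0=0.0, p0=1.0, gate=None):
    """Scalar constant-state Kalman filter.

    q, r: process and measurement noise variances (assumed values).
    gate: if set, a measurement whose innovation^2 / S exceeds it is
          down-weighted by inflating S to innovation^2 / gate.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                      # predict
        innovation = z - x
        s = p + r                      # innovation variance
        if gate is not None and innovation * innovation / s > gate:
            s = innovation * innovation / gate
        k = p / s                      # Kalman gain
        x = x + k * innovation         # update
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical range readings with one gross error at index 10.
zs = [1.0] * 10 + [50.0] + [1.0] * 5
print(kalman_1d(zs)[10], kalman_1d(zs, gate=9.0)[10])
```

The plain filter is pulled several units toward the outlier, while the gated variant stays close to the true value of 1.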
-
Tianbin WANG, Ruiyang HUANG, Nan HU, Huansha WANG, Guanghan CHU
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2023 Volume E106.D Issue 5 Pages
1010-1017
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Chinese Named Entity Recognition (NER) is a fundamental technology in the field of Chinese Natural Language Processing. It is extensively applied to information extraction, intelligent question answering, and knowledge graphs. Nevertheless, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture character-granularity semantics, which affects the performance of Chinese NER. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. In a novel approach, we integrate the semantic information of character granularity into the vector space of characters and acquire the vector representation containing semantic information by the attention mechanism. In addition, we verify the appropriate number of semantic layers through a comparative experiment. Experiments on public Chinese datasets such as Weibo, Resume and MSRA show that the model outperforms character-based LSTM baselines.
-
Rebeka SULTANA, Gosuke OHASHI
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2023 Volume E106.D Issue 5 Pages
1018-1026
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
In recent years, driver's visual attention has been actively studied for driving automation technology. However, few models offer an insightful understanding of the driver's attention at various moments. All attention models process multi-level image representations by a two-stream/multi-stream network, increasing the computational cost due to an increase in model parameters. However, multi-level image representation such as optical flow plays a vital role in tasks involving videos. Therefore, to reduce the computational cost of a two-stream network and use multi-level image representation, this work proposes a single-stream driver's visual attention model for critical situations. The experiment was conducted using a publicly available critical driving dataset named BDD-A. Qualitative results confirm the effectiveness of the proposed model. Moreover, quantitative results highlight that the proposed model outperforms state-of-the-art visual attention models according to CC and SIM. Extensive ablation studies verify the presence of optical flow in the model, the position of optical flow in the spatial network, the convolution layers to process optical flow, and the computational cost compared to a two-stream model.
-
He LI, Yutaro IWAMOTO, Xianhua HAN, Lanfen LIN, Akira FURUKAWA, Shuzo ...
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2023 Volume E106.D Issue 5 Pages
1027-1037
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
Convolutional neural networks (CNNs) have become popular in medical image segmentation. The widely used deep CNNs are customized to extract multiple representative features for two-dimensional (2D) data, generally called 2D networks. However, 2D networks are inefficient in extracting three-dimensional (3D) spatial features from volumetric images. Although most 2D segmentation networks can be extended to 3D networks, the naively extended 3D methods are resource-intensive. In this paper, we propose an efficient and accurate network for fully automatic 3D segmentation. Specifically, we designed a 3D multiple-contextual extractor to capture rich global contextual dependencies from different feature levels. Then we leveraged an ROI-estimation strategy to crop the ROI bounding box. Meanwhile, we used a 3D ROI-attention module to improve the accuracy of in-region segmentation in the decoder path. Moreover, we used a hybrid Dice loss function to address the issues of class imbalance and blurry contour in medical images. By incorporating the above strategies, we realized a practical end-to-end 3D medical image segmentation with high efficiency and accuracy. To validate the 3D segmentation performance of our proposed method, we conducted extensive experiments on two datasets and demonstrated favorable results over the state-of-the-art methods.
-
Kenya SAKAMOTO, Shizuka SHIRAI, Noriko TAKEMURA, Jason ORLOSKY, Hiroyu ...
Article type: PAPER
Subject area: Educational Technology
2023 Volume E106.D Issue 5 Pages
1038-1048
Published: May 01, 2023
Released on J-STAGE: May 01, 2023
JOURNAL
FREE ACCESS
This study explores significant eye-gaze features that can be used to estimate subjective difficulty while reading educational comics. Educational comics have grown rapidly as a promising way to teach difficult topics using illustrations and texts. However, comics include a variety of information on one page, so automatically detecting learners' states such as subjective difficulty is difficult with approaches such as system log-based detection, which is common in the Learning Analytics field. In order to solve this problem, this study focused on 28 eye-gaze features, including the proposal of three new features called “Variance in Gaze Convergence,” “Movement between Panels,” and “Movement between Tiles” to estimate two degrees of subjective difficulty. We then ran an experiment in a simulated environment using Virtual Reality (VR) to accurately collect gaze information. We extracted features in two unit levels, page- and panel-units, and evaluated the accuracy with each pattern in user-dependent and user-independent settings, respectively. Our proposed features achieved an average F1 classification-score of 0.721 and 0.742 in user-dependent and user-independent models at panel unit levels, respectively, trained by a Support Vector Machine (SVM).