-
Hongzhi XU, Binlian ZHANG
Article type: PAPER
Subject area: Fundamentals of Information Systems
2024, Volume E107.D, Issue 10, Pages 1285-1296
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
Reliability is an important figure of merit of a system, and reliability requirements must be satisfied in safety-critical applications. This paper considers parallel applications on heterogeneous embedded systems and proposes a two-phase algorithm framework that minimizes energy consumption while satisfying an application's reliability requirement. The first phase produces an initial assignment; the second phase either satisfies the reliability requirement or improves energy efficiency. Specifically, when the initial assignment cannot meet the application's reliability requirement, an algorithm that enhances task reliability is applied to satisfy it. Conversely, when the reliability of the initial assignment exceeds the requirement, an algorithm that reduces task execution frequencies is applied to improve energy efficiency. The proposed algorithms are compared with existing algorithms on real parallel applications. Experimental results demonstrate that the proposed algorithms consume less energy while satisfying the applications' reliability requirements.
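A minimal sketch of the two-phase idea described in this abstract, using a standard DVFS transient-fault model. The model parameters, the initial-assignment rule, and the greedy adjustment heuristics below are illustrative assumptions, not the authors' algorithm.

```python
# Sketch only: a two-phase frequency assignment under an assumed DVFS
# reliability/energy model; not the paper's actual algorithm.
import math
from dataclasses import dataclass

@dataclass
class Task:
    cycles: float   # worst-case execution cycles
    freq: float     # normalized frequency in [F_MIN, 1.0]

LAMBDA0, D, F_MIN, STEP = 1e-6, 3.0, 0.3, 0.05   # assumed model parameters

def task_reliability(t):
    # Transient-fault rate grows exponentially as frequency is scaled down.
    rate = LAMBDA0 * 10 ** (D * (1.0 - t.freq) / (1.0 - F_MIN))
    return math.exp(-rate * t.cycles / t.freq)

def task_energy(t):
    # Dynamic energy ~ f^2 * (cycles / f) = f * cycles (capacitance dropped).
    return t.freq * t.cycles

def app_reliability(tasks):
    return math.prod(task_reliability(t) for t in tasks)

def two_phase(tasks, r_req):
    # Phase 1: initial assignment -- here, a mid-range frequency for every task.
    for t in tasks:
        t.freq = 0.6
    if app_reliability(tasks) < r_req:
        # Phase 2a: raise frequencies until the requirement holds, always
        # picking the task with the best reliability gain per unit of energy.
        while app_reliability(tasks) < r_req:
            best, best_gain = None, -1.0
            for t in tasks:
                if t.freq >= 1.0:
                    continue
                old_f, old_r, old_e = t.freq, task_reliability(t), task_energy(t)
                t.freq = min(1.0, old_f + STEP)
                gain = (task_reliability(t) - old_r) / (task_energy(t) - old_e)
                t.freq = old_f                      # undo the trial change
                if gain > best_gain:
                    best, best_gain = t, gain
            if best is None:
                return False                        # requirement unreachable
            best.freq = min(1.0, best.freq + STEP)
    else:
        # Phase 2b: reliability already exceeds the requirement, so lower
        # frequencies (saving energy) while the requirement still holds.
        changed = True
        while changed:
            changed = False
            for t in tasks:
                if t.freq - STEP < F_MIN:
                    continue
                t.freq -= STEP
                if app_reliability(tasks) < r_req:
                    t.freq += STEP                  # revert the violating step
                else:
                    changed = True
    return True
```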
-
Haruhiko KAIYA, Shinpei OGATA, Shinpei HAYASHI
Article type: PAPER
Subject area: Software Engineering
2024, Volume E107.D, Issue 10, Pages 1297-1311
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
Before introducing a system into a business or daily-life activity, analysts should carefully examine the effects of that system. Methods for examining such effects are therefore required at the early stage of requirements analysis. In this study, we propose and evaluate an analysis method with a modeling notation for this purpose, called goal dependency modeling and analysis (GDMA). In an activity, an actor, such as a person or a system, expects a goal to be achieved, and that actor or another actor achieves the goal. We focus on such goals and on the two different roles the actors play. GDMA mainly represents the dependencies between the two actor roles with respect to each goal. GDMA enables analysts to observe changes in actors, their expectations, and their abilities by using metrics, each of which is defined on the basis of the GDMA meta-model. GDMA thus enables analysts to decide, both quantitatively and qualitatively, whether a change is good or bad for the people involved. We evaluate GDMA by modeling actual system introductions reported in the literature and explaining the effects caused by these introductions. In addition, because CASE tools are crucial for performing GDMA efficiently and accurately, we develop such tools by extending an existing UML modeling tool.
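An illustrative sketch of how goal dependencies and a simple metric over them might be represented. The data structure, the counting metric, and the scheduling example below are assumptions for illustration, not the GDMA meta-model or its metrics.

```python
# Sketch only: goal dependencies as (depender, dependee, goal) triples and a
# per-actor metric comparing a model before and after a system is introduced.
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class GoalDependency:
    depender: str   # actor that expects the goal to be achieved
    dependee: str   # actor that achieves the goal
    goal: str

def dependency_counts(model):
    """Per-actor counts of expected goals (as depender) and abilities (as dependee)."""
    expects, achieves = Counter(), Counter()
    for d in model:
        expects[d.depender] += 1
        achieves[d.dependee] += 1
    return expects, achieves

def compare(before, after):
    """How each actor's expectations and abilities change after the introduction."""
    eb, ab = dependency_counts(before)
    ea, aa = dependency_counts(after)
    actors = set(eb) | set(ab) | set(ea) | set(aa)
    return {a: {"expects": ea[a] - eb[a], "achieves": aa[a] - ab[a]} for a in actors}

# Hypothetical example: a scheduler system takes over meeting arrangement.
before = [GoalDependency("Manager", "Secretary", "Meeting arranged")]
after = [GoalDependency("Manager", "Scheduler", "Meeting arranged"),
         GoalDependency("Secretary", "Scheduler", "Calendar kept up to date")]
print(compare(before, after))
```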
-
Rina TAGAMI, Hiroki KOBAYASHI, Shuichi AKIZUKI, Manabu HASHIMOTO
Article type: PAPER
Subject area: Pattern Recognition
2024, Volume E107.D, Issue 10, Pages 1312-1321
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
With the revitalization of the semiconductor industry and the drive toward labor reduction and unmanned operation in the retail and food manufacturing industries, the objects to be recognized at production sites are increasingly diverse in color and design. Depending on the target object, processing only color information may be more reliable, intensity information alone may be better, or a combination of the two may work best. However, few conventional methods optimize which color and intensity information to use, and deep learning is too costly for production sites. In this paper, we optimize the combination of color and intensity information of a small number of pixels used for matching, within the framework of template matching, on the basis of the mutual relationship between the target object and surrounding objects, and we propose a fast and reliable matching method using these few pixels. Pixels with a low pixel-pattern frequency are selected from color and grayscale images of the target object, and from these, pixels that are highly discriminative against surrounding objects are carefully selected. The use of both color and intensity information makes the method highly versatile with respect to object design. The use of a small number of pixels not shared by the target and surrounding objects provides high robustness against surrounding objects and enables fast matching. Experiments using real images confirmed that when 14 pixels are used for matching, the processing time is 6.3 ms and the recognition success rate is 99.7%. The proposed method also showed better positional accuracy than the comparison method, and the optimized pixels achieved a higher recognition success rate than non-optimized pixels.
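A rough sketch of the pixel-selection idea: keep template pixels whose values are rare within the template (distinctive) and absent from images of surrounding objects (discriminative), then match only those few pixels. The quantization, thresholds, and scoring rule are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch only: rare-pixel selection and sparse template matching with NumPy.
import numpy as np

def rare_pixel_mask(img, bins=16, keep_ratio=0.05):
    """Mark the pixels whose quantized color pattern is least frequent in img (H, W, C)."""
    q = (img // (256 // bins)).reshape(-1, img.shape[-1])
    codes = np.ravel_multi_index(q.T, (bins,) * img.shape[-1])
    counts = np.bincount(codes, minlength=bins ** img.shape[-1])
    freq = counts[codes].reshape(img.shape[:2])
    return freq <= np.quantile(freq, keep_ratio)

def select_pixels(template, surroundings, n_pixels=14):
    """Keep rare template pixels whose values do not also occur in surrounding objects."""
    ys, xs = np.nonzero(rare_pixel_mask(template))
    chosen = []
    for y, x in zip(ys, xs):
        v = template[y, x].astype(int)
        if all(np.abs(s.reshape(-1, s.shape[-1]) - v).sum(1).min() > 30 for s in surroundings):
            chosen.append((y, x))
        if len(chosen) == n_pixels:
            break
    return chosen

def match(scene, template, pixels, tol=30):
    """Slide only the selected pixels over the scene and count agreements."""
    H, W = scene.shape[:2]
    h, w = template.shape[:2]
    best, best_score = None, -1
    for oy in range(H - h + 1):
        for ox in range(W - w + 1):
            score = sum(np.abs(scene[oy + y, ox + x].astype(int)
                               - template[y, x].astype(int)).sum() < tol
                        for y, x in pixels)
            if score > best_score:
                best, best_score = (oy, ox), score
    return best, best_score
```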
-
Yuka KO, Katsuhito SUDOH, Sakriani SAKTI, Satoshi NAKAMURA
Article type: PAPER
Subject area: Speech and Hearing
2024, Volume E107.D, Issue 10, Pages 1322-1331
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
End-to-end speech translation (ST) directly renders source-language speech into the target language without the intermediate automatic speech recognition (ASR) output used in a cascade approach, thereby avoiding error propagation from intermediate ASR results. Although recent work has applied multi-task learning with an auxiliary ASR task to improve ST performance, it uses cross-entropy loss against one-hot references in the ASR task, so the trained ST models do not consider possible ASR confusion. In this study, we propose a novel multi-task learning framework for end-to-end ST that leverages an ASR-based loss against posterior distributions obtained from a pre-trained ASR model, called ASR posterior-based loss (ASR-PBL). The ASR-PBL method, which enables an ST model to reflect possible ASR confusion among competing hypotheses with similar pronunciations, can be applied to a strong multi-task ST baseline with a Hybrid CTC/Attention ASR task loss. In experiments on the Fisher Spanish-to-English corpus, the proposed method achieved better BLEU results than the baseline trained with standard CE loss.
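A hedged sketch of the posterior-based idea: instead of one-hot cross-entropy for the auxiliary ASR task, the ASR head is trained against the token posteriors of a frozen pre-trained ASR model (soft targets), combined with the usual translation loss. The KL formulation, tensor shapes, and the weighting are illustrative assumptions, not the paper's exact loss.

```python
# Sketch only: multi-task ST loss with an assumed KL-style ASR posterior term.
import torch
import torch.nn.functional as F

def asr_posterior_loss(asr_logits, teacher_logits, pad_mask):
    """KL(teacher posterior || student ASR prediction), averaged over non-pad tokens.

    asr_logits, teacher_logits: (batch, time, vocab); pad_mask: (batch, time) bool.
    """
    log_p_student = F.log_softmax(asr_logits, dim=-1)
    p_teacher = F.softmax(teacher_logits, dim=-1).detach()   # teacher stays frozen
    kl = (p_teacher * (p_teacher.clamp_min(1e-8).log() - log_p_student)).sum(-1)
    return (kl * pad_mask).sum() / pad_mask.sum()

def multitask_loss(st_logits, st_targets, asr_logits, teacher_logits,
                   asr_pad_mask, st_pad_id=0, lam=0.3):
    """Translation cross-entropy plus a lambda-weighted ASR posterior-based term."""
    st_ce = F.cross_entropy(st_logits.transpose(1, 2), st_targets,
                            ignore_index=st_pad_id)
    return st_ce + lam * asr_posterior_loss(asr_logits, teacher_logits, asr_pad_mask)
```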
-
Wenxia BAO, An LIN, Hua HUANG, Xianjun YANG, Hemu CHEN
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2024, Volume E107.D, Issue 10, Pages 1332-1341
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
Recent years have seen remarkable progress in human pose estimation. However, manual annotation of keypoints remains tedious and imprecise. To alleviate this problem, this paper proposes a novel method called Multi-Scale Contrastive Learning (MSCL). The method uses a siamese network structure whose upper and lower branches process different views of the same image. Each branch uses a backbone network to extract image representations, employing multi-scale feature vectors to capture information. These feature vectors are passed through an enhanced feature pyramid for fusion, producing more robust feature representations, and are then encoded by mapping and prediction heads to predict the feature vector of the other view. Using the negative cosine similarity between vectors as the loss function, the backbone network is pre-trained on a large-scale unlabeled dataset, enhancing its capacity to extract visual representations. Finally, transfer learning is performed on a small amount of labeled data for the pose estimation task. Experiments on COCO show improvements in Average Precision (AP) of 1.8%, 0.9%, and 1.2% with 1%, 5%, and 10% labeled data, respectively. In addition, the Percentage of Correct Keypoints (PCK) improves by 0.5% on MPII&AIC, outperforming mainstream contrastive learning methods.
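A minimal sketch of the siamese pre-training objective described above, in the SimSiam style of negative cosine similarity with stop-gradient. The backbone choice, head sizes, and the single-scale simplification are illustrative assumptions; the paper additionally fuses multi-scale features through an enhanced feature pyramid.

```python
# Sketch only: a siamese branch with projection/prediction heads and a
# symmetrized negative-cosine-similarity loss (SimSiam-style).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class SiameseBranch(nn.Module):
    def __init__(self, dim=2048, proj_dim=256):
        super().__init__()
        backbone = resnet50()
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop fc layer
        self.projector = nn.Sequential(nn.Linear(dim, proj_dim),
                                       nn.BatchNorm1d(proj_dim), nn.ReLU(),
                                       nn.Linear(proj_dim, proj_dim))
        self.predictor = nn.Sequential(nn.Linear(proj_dim, proj_dim // 2), nn.ReLU(),
                                       nn.Linear(proj_dim // 2, proj_dim))

    def forward(self, x):
        z = self.projector(self.encoder(x).flatten(1))
        p = self.predictor(z)
        return z, p

def neg_cosine(p, z):
    # Negative cosine similarity; z is detached (stop-gradient).
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def mscl_loss(model, view1, view2):
    z1, p1 = model(view1)
    z2, p2 = model(view2)
    # Symmetrized: each branch predicts the other view's projection.
    return 0.5 * (neg_cosine(p1, z2) + neg_cosine(p2, z1))
```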
-
Jiakai LI, Jianyong DUAN, Hao WANG, Li HE, Qing ZHANG
Article type: PAPER
Subject area: Natural Language Processing
2024, Volume E107.D, Issue 10, Pages 1342-1352
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
Chinese spelling correction is a foundational task in natural language processing that aims to detect and correct spelling errors in text. Most Chinese spelling correction methods use multimodal information to model the relationship between incorrect and correct characters. However, because the features come from different sources, mismatches occur during fusion: the importance relationships between modalities are ignored, which restricts the model from learning efficiently. To this end, this paper proposes a multimodal language-model-based Chinese spelling corrector named MISpeller. Built on ChineseBERT as the base model, the method comprehensively captures and fuses character semantic, phonetic, and graphic information in a single model without constructing additional neural networks, and realizes unequal, importance-aware fusion of the multi-feature information. In addition, to address overcorrection, a replication mechanism is introduced, with the replication factor serving as a dynamic weight to fuse the multimodal information efficiently. The model can control the proportion of original and predicted characters according to the input text and learn more specifically where errors occur. Experiments on the SIGHAN benchmark show that the proposed model improves the correction-level F1 score by an average of 4.36%, achieving state-of-the-art performance and validating the effectiveness of the model.
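A hedged sketch of the replication (copy) mechanism described above: a gate computed from the fused hidden state mixes the distribution over predicted characters with a distribution that simply copies the input character, so confident positions keep the original character and likely error positions get corrected. The layer sizes and gating form are illustrative assumptions, not MISpeller's exact architecture.

```python
# Sketch only: a copy gate whose output acts as a dynamic weight between
# copying the input character and emitting the model's prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CopyGateCorrector(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.generator = nn.Linear(hidden_size, vocab_size)  # correction head
        self.copy_gate = nn.Linear(hidden_size, 1)           # replication factor

    def forward(self, hidden, input_ids):
        """hidden: (B, T, H) fused multimodal states; input_ids: (B, T)."""
        gen_dist = F.softmax(self.generator(hidden), dim=-1)          # (B, T, V)
        copy_dist = F.one_hot(input_ids, gen_dist.size(-1)).float()   # (B, T, V)
        w = torch.sigmoid(self.copy_gate(hidden))                     # (B, T, 1)
        # w controls how much of the original character is kept per position.
        return w * copy_dist + (1.0 - w) * gen_dist
```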
-
Yuxin HUANG, Yuanlin YANG, Enchang ZHU, Yin LIANG, Yantuan XIAN
Article type: PAPER
Subject area: Natural Language Processing
2024, Volume E107.D, Issue 10, Pages 1353-1361
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
Chinese-Vietnamese cross-lingual event retrieval aims to retrieve, from a set of Vietnamese sentences, the Vietnamese sentence describing the same event as a given Chinese query sentence. Existing mainstream cross-lingual event retrieval methods extract a textual representation of the query and calculate its similarity with the textual representations of candidates in the other language. However, these methods ignore differences in event elements that arise during Chinese-Vietnamese cross-lingual retrieval; consequently, sentences with similar meanings but different event elements may be incorrectly judged to describe the same event. To address this problem, we propose a cross-lingual retrieval method that integrates event elements. We introduce event elements as an additional supervisory signal: an attention mechanism computes the semantic similarity of event elements in the two sentences to determine their attention scores, allowing us to establish a one-to-one correspondence between event elements in the texts. Additionally, we leverage a multilingual pre-trained language model fine-tuned with contrastive learning to obtain cross-lingual sentence representations for computing the semantic similarity of the sentence texts. Combining these two signals yields the final text similarity score. Experimental results demonstrate that the proposed method achieves higher retrieval accuracy than the baseline model.
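An illustrative sketch of combining the two signals described above: a sentence-level similarity from a multilingual encoder and an element-level similarity from attention-style soft alignment of event-element embeddings. The encoders are assumed to be given, and the combination weight and alignment scoring are assumptions for illustration.

```python
# Sketch only: combining sentence similarity with an attention-based
# event-element alignment score.
import torch
import torch.nn.functional as F

def element_alignment_score(query_elems, cand_elems):
    """Soft alignment score between two sets of event-element embeddings.

    query_elems: (m, d), cand_elems: (n, d) -- e.g., trigger and argument embeddings.
    """
    sim = F.cosine_similarity(query_elems.unsqueeze(1), cand_elems.unsqueeze(0), dim=-1)
    attn = F.softmax(sim, dim=-1)                 # attention score per query element
    aligned = attn @ cand_elems                   # (m, d) attended candidate elements
    return F.cosine_similarity(query_elems, aligned, dim=-1).mean()

def retrieval_score(query_sent_emb, cand_sent_emb, query_elems, cand_elems, alpha=0.7):
    """Final score: weighted sum of sentence similarity and event-element similarity."""
    sent_sim = F.cosine_similarity(query_sent_emb, cand_sent_emb, dim=-1)
    elem_sim = element_alignment_score(query_elems, cand_elems)
    return alpha * sent_sim + (1.0 - alpha) * elem_sim
```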
-
Weizhi WANG, Lei XIA, Zhuo ZHANG, Xiankai MENG
Article type: LETTER
Subject area: Software Engineering
2024, Volume E107.D, Issue 10, Pages 1362-1366
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
Smart contracts, a form of digital protocol, are computer programs designed for the automatic execution, control, and recording of contractual terms, permitting transactions to be conducted without an intermediary. However, the economic value handled by smart contracts makes their vulnerabilities attractive targets for attacks, leading to significant losses. In this paper, we introduce HomoDec, a smart contract timestamp-vulnerability detection technique based on code homogeneity. Its core idea is to compare the homogeneity between the code of the smart contract under test and existing smart contract vulnerability code in a database to determine whether the tested code contains a timestamp vulnerability. Specifically, HomoDec first explores how to vectorize smart contracts reasonably and efficiently, representing smart contract code as a high-dimensional vector that captures features of code vulnerabilities. It then investigates how to determine the homogeneity between the code under test and the code in the vulnerability base, enabling the detection of potential timestamp vulnerabilities in smart contract code.
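A hedged sketch of the homogeneity check described above: each contract is vectorized (here, simply by token n-gram frequencies) and compared with a database of known timestamp-vulnerable contracts using cosine similarity. The tokenization, n-gram features, and threshold are illustrative assumptions, not HomoDec's actual vectorization.

```python
# Sketch only: bag-of-n-gram vectorization and cosine-similarity matching
# against known vulnerable contract code.
import re
import math
from collections import Counter

def vectorize(source_code, n=3):
    """Represent a contract as a sparse bag of token n-grams."""
    tokens = re.findall(r"[A-Za-z_]\w+|\S", source_code)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def has_timestamp_vulnerability(code, vulnerable_db, threshold=0.8):
    """Flag the contract if it is sufficiently homogeneous with any known vulnerable code."""
    vec = vectorize(code)
    return any(cosine(vec, vectorize(v)) >= threshold for v in vulnerable_db)

# Hypothetical snippet: block.timestamp used to gate a payout.
suspicious = "if (block.timestamp % 15 == 0) { winner.transfer(address(this).balance); }"
print(has_timestamp_vulnerability(suspicious, [suspicious]))  # True: identical code
```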
-
Na XING, Lu LI, Ye ZHANG, Shiyi YANG
Article type: LETTER
Subject area: Information Network
2024, Volume E107.D, Issue 10, Pages 1367-1371
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
Unmanned aerial vehicle (UAV)-assisted systems have attracted considerable attention due to their high probability of line-of-sight (LoS) connections and flexible deployment. In this paper, we aim to minimize the upload time required for a UAV to collect information from sensor nodes in a disaster scenario by optimizing the deployment position of the UAV. To obtain the deployment solution quickly, we propose a data-driven approach in which an optimization strategy acts as the expert. Because images capture spatial configurations well, we use a convolutional neural network (CNN) to learn how to place the UAV. Simulation results demonstrate the effectiveness and generalization of the proposed method: after training, our CNN can generate UAV configurations faster than a general optimization-based algorithm.
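A rough sketch of the data-driven idea above: a small CNN maps an image-like map of sensor-node positions to a 2-D UAV deployment position and is trained to imitate labels produced by an optimization-based expert. The architecture and the training step are illustrative assumptions.

```python
# Sketch only: imitation learning of UAV placement from node-occupancy images.
import torch
import torch.nn as nn

class DeploymentCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(64, 2)   # normalized (x, y) position of the UAV

    def forward(self, node_map):       # node_map: (B, 1, H, W) occupancy image
        return torch.sigmoid(self.head(self.features(node_map).flatten(1)))

def train_step(model, optimizer, node_map, expert_xy):
    """One imitation-learning step against positions from the expert optimizer."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(node_map), expert_xy)
    loss.backward()
    optimizer.step()
    return loss.item()
```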
-
Liu ZHANG, Zilong WANG, Jinyu LU
Article type: LETTER
Subject area: Information Network
2024, Volume E107.D, Issue 10, Pages 1372-1375
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
Within the framework of a multi-stage key recovery attack on a large block cipher, 2- and 3-round differential-neural distinguishers are trained for AES using partial ciphertext bits. The study presents the differential characteristics employed for the 2-round ciphertext pairs and explores why the 2-round differential-neural distinguisher reaches nearly 100% accuracy. Using the trained 2-round distinguisher, the 3-round subkey of AES is successfully recovered through multi-stage key guessing. A complexity analysis of the attack is also provided, validating the effectiveness of the proposed method.
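A schematic sketch of how a differential-neural distinguisher is trained: label-1 samples are ciphertext pairs from plaintext pairs with a fixed input difference, label-0 samples come from random plaintext pairs, and a small network learns to tell them apart from (partial) ciphertext bits. The toy 16-bit round function below is only a stand-in for the reduced-round AES attacked in the paper, and all hyperparameters are illustrative assumptions.

```python
# Sketch only: training a binary classifier to distinguish fixed-difference
# ciphertext pairs from random pairs (a differential-neural distinguisher).
import os
import torch
import torch.nn as nn

def toy_round_function(block: int, key: int, rounds: int = 2) -> int:
    """Placeholder 16-bit keyed permutation standing in for reduced-round AES."""
    x = block
    for _ in range(rounds):
        x = ((x ^ key) * 0x9E37 + 0x79B9) & 0xFFFF
        x = ((x << 5) | (x >> 11)) & 0xFFFF
    return x

def make_samples(n, delta=0x0040):
    xs, ys = [], []
    for _ in range(n):
        key = int.from_bytes(os.urandom(2), "big")
        p0 = int.from_bytes(os.urandom(2), "big")
        if torch.rand(1).item() < 0.5:                # real pair: fixed plaintext difference
            c0, c1 = toy_round_function(p0, key), toy_round_function(p0 ^ delta, key)
            label = 1.0
        else:                                         # random pair
            p1 = int.from_bytes(os.urandom(2), "big")
            c0, c1 = toy_round_function(p0, key), toy_round_function(p1, key)
            label = 0.0
        bits = [(c0 >> i) & 1 for i in range(16)] + [(c1 >> i) & 1 for i in range(16)]
        xs.append(bits)
        ys.append(label)
    return torch.tensor(xs, dtype=torch.float32), torch.tensor(ys)

def train_distinguisher(epochs=5):
    net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    x, y = make_samples(20000)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(net(x).squeeze(1), y)
        loss.backward()
        opt.step()
    accuracy = ((net(x).squeeze(1) > 0) == (y > 0.5)).float().mean().item()
    return net, accuracy
```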
-
Zhe WANG, Zhe-Ming LU, Hao LUO, Yang-Ming ZHENG
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2024, Volume E107.D, Issue 10, Pages 1376-1379
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
To accurately extract tabular data, we propose a novel cell-based tabular data extraction model (TDEM). The key idea of TDEM is to use the grayscale projection of row separation lines, together with the table masks and column masks generated by a VGG-19 neural network, to segment each individual cell from the input table image. The text content of the table is then extracted from each single cell, which greatly improves the accuracy of table recognition.
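An illustrative sketch of the grayscale-projection step described above: row separator lines appear as image rows whose mean intensity differs sharply from the background, and the detected row boundaries are combined with column boundaries to crop individual cells. The thresholding rule is an assumption; in TDEM the table and column boundaries come from VGG-19 masks rather than from projection alone.

```python
# Sketch only: detecting row separators by grayscale projection and cropping cells.
import numpy as np

def row_separators(gray, dark_thresh=0.5):
    """Row indices whose projection indicates a horizontal separator line.

    gray: (H, W) grayscale table image in [0, 255], dark lines on a light background.
    """
    projection = gray.mean(axis=1)                    # one value per image row
    rows = np.nonzero(projection < dark_thresh * projection.max())[0]
    # Collapse consecutive dark rows into a single separator position.
    groups = np.split(rows, np.where(np.diff(rows) > 1)[0] + 1)
    return [int(g.mean()) for g in groups if g.size]

def crop_cells(gray, col_bounds):
    """Cut the image into cells using detected row separators and given column bounds."""
    rows = [0] + row_separators(gray) + [gray.shape[0]]
    cells = []
    for r0, r1 in zip(rows[:-1], rows[1:]):
        for c0, c1 in zip(col_bounds[:-1], col_bounds[1:]):
            cells.append(gray[r0:r1, c0:c1])
    return cells
```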
-
Zheqing ZHANG, Hao ZHOU, Chuan LI, Weiwei JIANG
Article type: LETTER
Subject area: Image Processing and Video Processing
2024, Volume E107.D, Issue 10, Pages 1380-1384
Published: October 01, 2024
Released on J-STAGE: October 01, 2024
Single-image dehazing is a challenging task in computer vision research. To address the limited representation capability of traditional convolutional neural networks and the high computational overhead of the self-attention mechanism, we propose image attention and design a single-image dehazing network based on it: IAD-Net. The proposed image attention is a plug-and-play module with global modeling ability. IAD-Net has a parallel network structure that combines the global modeling ability of image attention with the local modeling ability of convolution, so that the network learns both global and local features. The proposed model has strong feature learning and feature expression ability, incurs low computational overhead, and better recovers the detail of hazy images. Experiments verify the effectiveness of the image attention module and the competitiveness of IAD-Net against state-of-the-art methods.
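A hedged sketch of the parallel local/global structure described above: a convolutional branch models local features while a lightweight branch built on global pooling provides inexpensive global context, and the two outputs are fused with a residual connection. The global branch here is only a stand-in for the paper's image attention module, whose exact design is not given in the abstract.

```python
# Sketch only: a parallel block pairing local convolution with a cheap
# global-context branch as a stand-in for image attention.
import torch
import torch.nn as nn

class ParallelDehazeBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.global_branch = nn.Sequential(            # global context via pooling
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.ReLU(),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        local = self.local_branch(x)
        global_weight = self.global_branch(x)          # (B, C, 1, 1) channel weights
        # Fuse globally re-weighted local features, plus a residual connection.
        return x + self.fuse(local * global_weight)
```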