IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E105.D, Issue 4
Displaying 1-11 of 11 articles from this issue
Regular Section
  • Jing ZHU, Song HUANG, Yaqing SHI, Kaishun WU, Yanqiu WANG
    Article type: PAPER
    Subject area: Software Engineering
    2022 Volume E105.D Issue 4 Pages 736-754
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    Nowadays there is no way to automatically obtain function points when using the function point analysis (FPA) method, especially for requirement documents written in Chinese. Given the characteristics of Chinese grammar, words must be segmented accurately so that the subsequent entity recognition and disambiguation can be carried out within a smaller scope, which lays a solid foundation for the efficient automatic extraction of function points. Therefore, this paper proposes a K-Means clustering method based on TF-IDF and conducts experiments with 24 software requirement documents written in Chinese. The results show that the best clustering effect is achieved when 55% to 75% of the extracted information is retained and the number of clusters is set to the middle value of the total number of clusters. Beyond Chinese, the method and conclusions of this paper provide an important reference for the automatic extraction of function points from software requirement documents written in other Oriental languages, and also fill a gap in data preprocessing for the early stage of automatic function point counting.
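    The TF-IDF-plus-K-Means pipeline the abstract describes can be sketched in pure Python as follows; the toy "requirement documents", vocabulary, and cluster count are invented for illustration and are not from the paper:

```python
import math
from collections import Counter

# Toy tokenized "requirement documents" (invented for illustration).
docs = [
    ["user", "login", "password"],
    ["user", "logout", "password"],
    ["report", "export", "table"],
    ["report", "print", "table"],
]

vocab = sorted({t for d in docs for t in d})
df = Counter(t for d in docs for t in set(d))  # document frequency per term

def tfidf(doc):
    """Term frequency weighted by inverse document frequency."""
    tf = Counter(doc)
    return [tf[t] / len(doc) * math.log(len(docs) / df[t]) for t in vocab]

vectors = [tfidf(d) for d in docs]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A few Lloyd-style K-Means iterations with k=2, seeded on two documents.
centers = [vectors[0], vectors[2]]
for _ in range(10):
    labels = [min(range(2), key=lambda k: dist(v, centers[k])) for v in vectors]
    for k in range(2):
        members = [v for v, l in zip(vectors, labels) if l == k]
        centers[k] = [sum(c) / len(members) for c in zip(*members)]
```

    On this toy input the two "login" documents and the two "report" documents fall into separate clusters; the paper's contribution is tuning how much TF-IDF-extracted information to retain and how many clusters to use.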

    Download PDF (13516K)
  • Yuma MASUBUCHI, Masaki HASHIMOTO, Akira OTSUKA
    Article type: PAPER
    Subject area: Dependable Computing
    2022 Volume E105.D Issue 4 Pages 755-765
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    Binary code similarity comparison methods are mainly used to find bugs in software, to detect software plagiarism, and to reduce the workload during malware analysis. In this paper, we propose a method that compares the binary code similarity of each function by combining the Control Flow Graph (CFG) and the disassembled instruction sequence contained in each function, and detects functions with high similarity to a specified function. One of the challenges in similarity comparison is that different compile-time optimizations and different target architectures produce different binary code. The main units for comparing code are instructions, basic blocks, and functions. The difficulty with functions is that they have a graph structure in which basic blocks are combined, which makes it relatively hard to derive similarity. However, analysis tools such as IDA display the disassembled instruction sequence in function units, so detecting similarity on a function basis has the advantage of being easier for analysts to interpret. To address these challenges, we apply machine learning methods from the field of natural language processing. In this field, the Transformer model, introduced in 2017, set new records on a variety of language processing tasks, and as of 2021 it forms the basis of BERT, which in turn holds records on many such tasks. There is also a method called node2vec, which uses machine learning techniques to capture the features of each node in a graph structure. In this paper, we propose SIBYL, a combination of Transformer and node2vec. During training, SIBYL uses a triplet loss so that similar items are pulled closer together and dissimilar items are pushed apart. To evaluate SIBYL, we created a new dataset from open-source software widely used in the real world, and conducted training and evaluation experiments on this dataset. In the evaluation experiments, we measured the similarity of binary code across different architectures using metrics such as Rank1 and MRR. The experimental results showed that SIBYL outperforms existing research; we believe this is because machine learning captures the features of the graph structure and the order of instructions on a function-by-function basis. These experiments are presented in detail, followed by a discussion and conclusion.
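    The triplet loss used during SIBYL's training can be illustrated with a minimal sketch; the embeddings and margin below are invented toy values, not SIBYL's actual function embeddings:

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a,p) - d(a,n) + margin) with squared Euclidean distance:
    pulls the positive toward the anchor, pushes the negative away."""
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_ap - d_an + margin)

# A well-separated triplet gives zero loss; swapping the similar and
# dissimilar embeddings makes the loss positive.
low = triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0])
high = triplet_loss([0.0, 0.0], [3.0, 0.0], [0.1, 0.0])
```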

    Download PDF (682K)
  • Jing WANG, Yiyu LUO, Weiming YI, Xiang XIE
    Article type: PAPER
    Subject area: Speech and Hearing
    2022 Volume E105.D Issue 4 Pages 766-777
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    Speech separation is the task of extracting target speech while suppressing background interference components. In applications like video telephony, visual information about the target speaker is available and can be leveraged for multi-speaker speech separation. Most previous multi-speaker separation methods are based on convolutional or recurrent neural networks. Recently, Transformer-based Seq2Seq models have achieved state-of-the-art performance in various tasks, such as neural machine translation (NMT) and automatic speech recognition (ASR). The Transformer has shown an advantage in modeling audio-visual temporal context through multi-head attention blocks that explicitly assign attention weights. Besides, the Transformer has no recurrent sub-networks, so sequence computation can be parallelized. In this paper, we propose a novel speaker-independent audio-visual speech separation method based on the Transformer, which can be flexibly applied to an unknown number and identity of speakers. The model receives both audio-visual streams, including the noisy spectrogram and speaker lip embeddings, and predicts a complex time-frequency mask for the corresponding target speaker. The model is made up of three main components: an audio encoder, a visual encoder, and a Transformer-based mask generator. Two different encoder structures, ResNet-based and Transformer-based, are investigated and compared. The performance of the proposed method is evaluated in terms of source separation and speech quality metrics. Experimental results on the benchmark GRID dataset show the effectiveness of the method on the speaker-independent separation task in multi-talker environments. The model generalizes well to unseen speaker identities and noise types. Though only trained on 2-speaker mixtures, the model achieves reasonable performance when tested on 2-speaker and 3-speaker mixtures. Besides, the model still shows an advantage over previous audio-visual speech separation works.
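    Applying a complex time-frequency mask, as the abstract describes, amounts to an element-wise complex multiplication over STFT bins; the bin values and mask below are invented toy numbers:

```python
# Two STFT bins of a noisy mixture and a predicted complex mask (invented).
mixture = [complex(1.0, 0.5), complex(0.2, -0.4)]
mask    = [complex(0.8, 0.1), complex(0.1, 0.0)]

# Element-wise complex multiplication estimates the target speaker's bins,
# adjusting both magnitude and phase of each bin.
estimate = [m * s for m, s in zip(mask, mixture)]
```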

    Download PDF (3445K)
  • Kazuhiko MURASAKI, Shingo ANDO, Jun SHIMAMURA
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2022 Volume E105.D Issue 4 Pages 778-784
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    In this paper, we propose a semi-supervised triplet loss function that realizes semi-supervised representation learning in a novel manner. We extend conventional triplet loss, which uses labeled data to achieve representation learning, so that it can deal with unlabeled data. We estimate, in advance, the degree to which each label applies to each unlabeled data point, and optimize the loss function with unlabeled features according to the resulting ratios. Since the proposed loss function has the effect of adjusting the distribution of all unlabeled data, it complements methods based on consistency regularization, which has been extensively studied in recent years. Combined with a consistency regularization-based method, our method achieves more accurate semi-supervised learning. Experiments show that the proposed loss function achieves a higher accuracy than the conventional fine-tuning method.

    Download PDF (478K)
  • Xiang SHEN, Dezhi HAN, Chin-Chen CHANG, Liang ZONG
    Article type: PAPER
    Subject area: Natural Language Processing
    2022 Volume E105.D Issue 4 Pages 785-796
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    Visual Question Answering (VQA) is a multi-modal task that requires the simultaneous processing of vision and text. Recent VQA models employ a co-attention mechanism to build a model between the context and the image. However, the question features and the modeling of image regions force irrelevant information to be calculated into the model, which hurts performance. This paper proposes a novel dual self-guided attention with sparse question networks (DSSQN) to address this issue. The aim is to avoid having irrelevant information calculated into the model when modeling the internal dependencies of both the question and the image, while also overcoming the coarse interaction between sparse question features and image features. First, the sparse question self-attention (SQSA) unit in the encoder calculates the features with the highest weights. From the self-attention learning of question words, the question features with larger weights are retained. Secondly, the sparse question features are used to guide attention over the image features, obtaining fine-grained image features and preventing irrelevant information from being calculated into the model. A dual self-guided attention (DSGA) unit is designed to improve the modal interaction between questions and images. Third, the parameter δ of the sparse question self-attention is optimized to select the question-related object regions. Our experiments on the VQA 2.0 benchmark dataset demonstrate that DSSQN outperforms state-of-the-art methods. For example, the accuracy of our proposed model on test-dev and test-std is 71.03% and 71.37%, respectively. In addition, visualization results show that our model pays more attention to important features than other advanced models. We also hope that this work can promote the development of VQA in the field of artificial intelligence (AI).
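    The "sparse" self-attention idea above can be sketched as keeping only the highest-weight entries of an attention distribution; the logits and the threshold value below are invented for illustration (the paper learns its δ):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of attention logits."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

scores = [2.0, 0.1, -1.0, 1.5]   # attention logits for one question word (toy)
weights = softmax(scores)
delta = 0.1                      # sparsity threshold (assumed value)

# Zero out low-weight entries so only question-relevant positions remain.
sparse = [w if w >= delta else 0.0 for w in weights]
```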

    Download PDF (1497K)
  • Guoyi MIAO, Yufeng CHEN, Mingtong LIU, Jinan XU, Yujie ZHANG, Wenhe FE ...
    Article type: PAPER
    Subject area: Natural Language Processing
    2022 Volume E105.D Issue 4 Pages 797-806
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    Translation of long and complex sentences has always been a challenge for machine translation. In recent years, neural machine translation (NMT) has achieved substantial progress in modeling the semantic connections between words in a sentence, but it is still insufficient at capturing discourse structure information between the clauses of complex sentences, which often leads to poor discourse coherence when translating long and complex sentences. On the other hand, the hypotactic structure, a main component of discourse structure, plays an important role in the coherence of discourse translation, but it has not been specifically studied. To tackle this problem, we propose a novel Chinese-English NMT approach that incorporates the hypotactic structure knowledge of complex sentences. Specifically, we first annotate and build a hypotactic-structure-aligned parallel corpus to provide explicit hypotactic structure knowledge of complex sentences for NMT. Then we propose three hypotactic structure-aware NMT models with three different fusion strategies, namely source-side fusion, target-side fusion, and both-side fusion, to integrate the annotated structure knowledge into NMT. Experimental results on the WMT17, WMT18 and WMT19 Chinese-English translation tasks demonstrate that the proposed method significantly improves translation performance and enhances the discourse coherence of machine translation.

    Download PDF (2722K)
  • Ying ZHANG, Fandong MENG, Jinchao ZHANG, Yufeng CHEN, Jinan XU, Jie ZH ...
    Article type: PAPER
    Subject area: Natural Language Processing
    2022 Volume E105.D Issue 4 Pages 807-819
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    Machine reading comprehension with multi-hop reasoning often suffers from broken reasoning paths due to a lack of world knowledge, which results in wrong answers. In this paper, we analyze what knowledge previous work lacks, e.g., dependency relations and commonsense. Based on our analysis, we propose a Multi-dimensional Knowledge enhanced Graph Network, named MKGN, which exploits specific knowledge to repair the knowledge gap in the reasoning process. Specifically, our approach incorporates not only entities and dependency relations through various graph neural networks, but also commonsense knowledge via a bidirectional attention mechanism, which aims to enhance the representations of both the question and the contexts. Besides, to make the most of multi-dimensional knowledge, we investigate two kinds of fusion architectures, i.e., sequential and parallel. Experimental results on the HotpotQA dataset demonstrate the effectiveness of our approach and verify that using multi-dimensional knowledge, especially dependency relations and commonsense, can indeed improve the reasoning process and contribute to correct answer detection.

    Download PDF (1638K)
  • Saifeng HOU, Yuxiang HU, Le TIAN, Zhiguang DANG
    Article type: LETTER
    Subject area: Information Network
    2022 Volume E105.D Issue 4 Pages 820-823
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    This work proposes NFD.P4, a cache implementation scheme for Named Data Networking (NDN), to solve the problem of insufficient cache space on programmable switches and to make NDN practical to deploy. We transplant the cache function of NDN.P4 to an NDN Forwarding Daemon (NFD) cache server, which replaces the memory space of the programmable switch.

    Download PDF (2091K)
  • Jinho CHOI, Taehwa LEE, Kwanwoo KIM, Minjae SEO, Jian CUI, Seungwon SH ...
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2022 Volume E105.D Issue 4 Pages 824-827
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    Bitcoin is currently a hot issue worldwide and, starting with El Salvador, is expected to become a new legal tender alongside existing currencies. Due to the nature of cryptocurrency, however, the difficulty of tracking transactions has led to misuse and abuse, and the harm suffered by innocent victims of Bitcoin abuse is also increasing. We propose a way to detect new signatures by applying two-fold NLP-based clustering techniques to the text of Bitcoin abuse reports received from actual victims. By clustering the report texts, we were able to group message templates belonging to the same campaigns. This new approach, which uses the message template representing each cluster as a signature for identifying abusers, is highly effective.

    Download PDF (1905K)
  • Yuzhuo LIU, Hangting CHEN, Qingwei ZHAO, Pengyuan ZHANG
    Article type: LETTER
    Subject area: Speech and Hearing
    2022 Volume E105.D Issue 4 Pages 828-831
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    Weakly labelled semi-supervised audio tagging (AT) and sound event detection (SED) have become significant in real-world applications. A popular method is teacher-student learning, making student models learn from pseudo-labels generated by teacher models from unlabelled data. To generate high-quality pseudo-labels, we propose a master-teacher-student framework trained with a dual-lead policy. Our experiments illustrate that our model outperforms the state-of-the-art model on both tasks.
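    The teacher-student pseudo-labelling scheme this letter builds on can be sketched in a few lines; the per-tag teacher scores and the decision threshold below are invented for illustration:

```python
def pseudo_labels(teacher_scores, threshold=0.5):
    """Binarize a teacher model's per-tag posteriors into training
    targets for the student (threshold is an assumed toy value)."""
    return [1 if s >= threshold else 0 for s in teacher_scores]

# Toy teacher posteriors for three audio tags on one unlabelled clip.
targets = pseudo_labels([0.9, 0.2, 0.6])
```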

    Download PDF (1096K)
  • Yaying SHEN, Qun LI, Ding XU, Ziyi ZHANG, Rui YANG
    Article type: LETTER
    Subject area: Image Recognition, Computer Vision
    2022 Volume E105.D Issue 4 Pages 832-835
    Published: April 01, 2022
    Released on J-STAGE: April 01, 2022
    JOURNAL FREE ACCESS

    A triplet-loss-based framework for generalized zero-shot learning is presented in this letter. The approach learns a shared latent space for image features and attributes by using aligned variational autoencoders and variants of the triplet loss, and then trains a classifier in that latent space. The experimental results demonstrate that the proposed framework achieves a clear improvement.

    Download PDF (773K)