IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E105.D, Issue 12
Showing 1-18 of 18 articles from the selected issue
Special Section on Forefront Computing
  • Toshihiro YAMAUCHI
    2022, Volume E105.D, Issue 12, p. 1998
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access
  • Takuto KANAMORI, Takashi ODAN, Kazuki HIROHATA, Kenji KISE
    Article type: PAPER
    2022, Volume E105.D, Issue 12, p. 1999-2007
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    Deep Neural Networks (DNNs) are widely used for computer vision tasks, such as image classification, object detection, and segmentation. DNN accelerators on FPGAs, especially for Convolutional Neural Networks (CNNs), are a hot topic. More research and education should be conducted to boost this field, and a starting point is required to make it easy for new entrants to join. We believe that FPGA-based Autonomous Driving (AD) motor cars are suitable for this because DNN accelerators can be used for image processing with low latency. In this paper, we propose a simple, open-source, FPGA-based mini motor car system named RVCar with a RISC-V soft processor and a CNN accelerator. RVCar is suitable for new entrants who want to learn the implementation of a CNN accelerator and the surrounding system. The motor car consists of a Xilinx Nexys A7 board and simple parts. All modules except the CNN accelerator are implemented in Verilog HDL and SystemVerilog. The CNN accelerator is converted from a PyTorch model by our tool. The accelerator is written in C++, is synthesizable by Vitis HLS, and serves as an easy-to-customize baseline for new entrants. FreeRTOS, executed on the RISC-V soft processor, is used to implement AD algorithms and helps users develop them efficiently. We conduct a case study of a simple AD task that we define. Although the task is simple, it is difficult to achieve without image recognition. We confirm that RVCar can recognize objects and make correct decisions based on the results.

  • Tomoki SHIMIZU, Kohei ITO, Kensuke IIZUKA, Kazuei HIRONAKA, Hideharu A ...
    Article type: PAPER
    2022, Volume E105.D, Issue 12, p. 2008-2018
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    The multi-FPGA system known as the Flow-in-Cloud (FiC) system is composed of mid-range FPGAs that are directly interconnected by high-speed serial links. FiC is currently being developed as a server for multi-access edge computing (MEC), which is one of the core technologies of 5G. Because the applications of MEC are sometimes timing-critical, a static time division multiplexing (STDM) network has been used on FiC. However, the STDM network has the disadvantage of decreasing link utilization, especially under light traffic. To solve this problem, we propose a hybrid router that combines packet switching for low-priority communication and STDM for high-priority communication. In our hybrid network, the packet switching uses slots that are unused by the STDM; therefore, best-effort communication by packet switching and QoS-guaranteed communication by the STDM can be used simultaneously. Furthermore, to improve the utilization of each link under a low network traffic load, we propose a dynamic communication switching algorithm. In our algorithm, each router monitors the network load metrics, and timing-critical tasks select the STDM according to these metrics only when congestion occurs. This can achieve both a QoS guarantee and efficient utilization of each link with a small resource overhead. In our evaluation on a real multi-FPGA system with 24 boards, the dynamic algorithm was up to 24.6% faster in execution time under a high network load than packet switching.
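
The slot-sharing idea in this abstract can be illustrated with a toy model. The following is a minimal sketch, not the FiC router itself: the function `run_link`, the slot-table layout, and the rule that an idle reserved slot is also lent to packet switching are all assumptions made for illustration.

```python
from collections import deque


def run_link(stdm_table, stdm_queues, packet_queue, cycles):
    """Simulate one output link for `cycles` time slots (toy model).

    stdm_table   -- list of flow ids (or None) naming the owner of each slot
    stdm_queues  -- dict: flow id -> deque of high-priority flits
    packet_queue -- deque of best-effort (packet-switched) flits
    Returns a list of (cycle, kind, flit) for every flit actually sent.
    """
    sent = []
    for cycle in range(cycles):
        owner = stdm_table[cycle % len(stdm_table)]
        queue = stdm_queues.get(owner)
        if owner is not None and queue:
            # The slot is reserved and its owner has data: STDM traffic wins.
            sent.append((cycle, "stdm", queue.popleft()))
        elif packet_queue:
            # Assumption: an unreserved or idle slot is lent to packet switching.
            sent.append((cycle, "packet", packet_queue.popleft()))
    return sent


# Hypothetical 4-slot table: flows A and B own slots 0 and 2.
table = ["A", None, "B", None]
stdm = {"A": deque(["a0", "a1"]), "B": deque(["b0"])}
best_effort = deque(["p0", "p1", "p2", "p3"])
log = run_link(table, stdm, best_effort, cycles=8)
```

In this run the best-effort queue drains through the slots that STDM leaves unused, while flows A and B keep their reserved slots whenever they have data to send.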

  • Satoru JIMBO, Daiki OKONOGI, Kota ANDO, Thiem Van CHU, Jaehoon YU, Mas ...
    Article type: PAPER
    2022, Volume E105.D, Issue 12, p. 2019-2031
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    For formulating Quadratic Knapsack Problems (QKPs) into the form of Quadratic Unconstrained Binary Optimization (QUBO), it is necessary to introduce an integer variable, which converts and incorporates the knapsack capacity constraint into the overall energy function. In QUBO, this integer variable is encoded with auxiliary binary variables, and the encoding method used for it affects the behavior of Simulated Annealing (SA) significantly. For improving the efficiency of SA for QKP instances, this paper first visualized and analyzed their annealing processes encoded by conventional binary and unary encoding methods. Based on this analysis, we proposed a novel hybrid encoding (HE), getting the best of both worlds. The proposed HE obtained feasible solutions in the evaluation, outperforming the others in small- and medium-scale models.
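
The two conventional encodings mentioned in this abstract can be made concrete with a small example. The sketch below assumes the common formulation in which the integer variable ranges over 0..capacity and the constraint is enforced by a squared penalty; the helper names are hypothetical, and the paper's hybrid encoding is not reproduced here.

```python
from itertools import product


def binary_coeffs(capacity):
    """Coefficients c_k such that sum(c_k * y_k), y_k in {0, 1}, covers exactly 0..capacity."""
    coeffs, power, remaining = [], 1, capacity
    while remaining >= power:
        coeffs.append(power)
        remaining -= power
        power *= 2
    if remaining > 0:
        coeffs.append(remaining)  # last coefficient trimmed so the maximum equals capacity
    return coeffs


def unary_coeffs(capacity):
    """Unary encoding: one auxiliary bit per unit of capacity."""
    return [1] * capacity


def decode(coeffs, bits):
    """Integer value represented by the auxiliary bits."""
    return sum(c * b for c, b in zip(coeffs, bits))


def penalty(weights, x, coeffs, bits):
    """Squared penalty (sum_i w_i x_i - Y)^2 that is zero when the load equals the integer Y."""
    load = sum(w * xi for w, xi in zip(weights, x))
    return (load - decode(coeffs, bits)) ** 2


coeffs = binary_coeffs(10)  # 4 auxiliary bits instead of 10 for unary
values = {decode(coeffs, bits) for bits in product((0, 1), repeat=len(coeffs))}
```

Binary encoding needs far fewer auxiliary bits, while unary encoding changes the integer by one unit per bit flip; this difference in bit-flip behavior is the kind of property that can affect annealing.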

  • Kazuhito MATSUDA, Kouji KURIHARA, Kentaro KAWAKAMI, Masafumi YAMAZAKI, ...
    Article type: PAPER
    2022, Volume E105.D, Issue 12, p. 2032-2039
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    Statistical causal discovery is an approach to inferring the causal relationships between observed variables whose causalities are not known. LiNGAM (Linear Non-Gaussian Acyclic Model), an algorithm for causal discovery, can calculate the causal relationship uniquely if the independent components of variables are assumed to be non-Gaussian. However, use cases of LiNGAM are limited because of its O(d_x^3) computational complexity, where d_x is the number of variables. This paper shows two approaches to accelerate LiNGAM causal discovery: SIMD utilization for LiNGAM's matrix operations and MPI parallelization. We evaluate the implementation on the supercomputer Fugaku. Using 96 nodes of Fugaku, our improved version is 17,531 times faster than the original OSS implementation (completed in 17.7 hours).

  • Yoshiharu YAMAGISHI, Tatsuya KANEKO, Megumi AKAI-KASAYA, Tetsuya ASAI
    Article type: PAPER
    2022, Volume E105.D, Issue 12, p. 2040-2047
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    Edge computing, which has been gaining attention in recent years, has many advantages, such as reducing the load on the cloud, not being affected by the communication environment, and providing excellent security. Therefore, many researchers have attempted to implement neural networks, a representative machine learning technique, in edge computing. Neural networks can be divided into inference and learning parts; however, in contrast to the inference part, there has been little research on implementing the learning part in edge computing. This is because learning requires more memory and computation than inference, easily exceeding the resource limits of edge computing. To overcome this problem, this research focuses on the optimizer, which is the heart of learning. In this paper, we introduce our new optimizer, hardware-oriented logarithmic momentum estimation (Holmes), which incorporates new perspectives not found in existing optimizers in terms of the characteristics and strengths of hardware. The performance of Holmes was evaluated by comparing it with other optimizers with respect to learning progress and convergence speed. Important aspects of hardware implementation, such as memory and operation requirements, are also discussed. The results show that Holmes is a good match for edge computing, with relatively low resource requirements and fast learning convergence. Holmes will help create an era in which advanced machine learning can be realized on edge computing.

  • Takeshi SENOO, Akira JINGUJI, Ryosuke KURAMOCHI, Hiroki NAKAHARA
    Article type: PAPER
    2022, Volume E105.D, Issue 12, p. 2048-2056
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.

  • Naoya NIWA, Hideharu AMANO, Michihiro KOIBUCHI
    Article type: PAPER
    2022, Volume E105.D, Issue 12, p. 2057-2065
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    This study presents a selective data-compression technique that boosts the performance of an interconnection network. Data compression virtually increases the effective network bandwidth. One drawback of data compression is the long latency of the (de-)compression operation at a compute node. In terms of communication latency, we explore the trade-off between the compression latency overhead and the injection latency saved by shortening packets through compression. As a result, we propose selectively applying a compression technique to packets: compression is applied to long packets, and also when network congestion is detected at a source compute node. Through a cycle-accurate network simulation, the selective compression method improves the network throughput by up to 39% with a moderate increase in the communication latency of short packets.
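
The selection rule and the latency trade-off described in this abstract can be sketched as follows. This is an illustrative model only: the threshold, the compression ratio, the codec latency, and both function names are assumed values chosen for the example, not figures from the paper.

```python
import math


def should_compress(length_flits, congested, long_threshold=8):
    """Compress long packets, and any packet when the source detects congestion."""
    return length_flits >= long_threshold or congested


def injection_cycles(length_flits, compress, ratio=0.5, codec_cycles=4):
    """Cycles to inject a packet under a simple one-flit-per-cycle model.

    ratio        -- assumed compressed size / original size
    codec_cycles -- assumed fixed latency of the compression step
    """
    if not compress:
        return length_flits
    return codec_cycles + math.ceil(length_flits * ratio)


long_packet = injection_cycles(16, should_compress(16, congested=False))
short_packet = injection_cycles(4, should_compress(4, congested=False))
short_forced = injection_cycles(4, compress=True)
```

With these assumed numbers, a 16-flit packet is injected in 12 cycles when compressed, whereas compressing a 4-flit packet would raise its injection time from 4 to 6 cycles, which is why short packets are left uncompressed unless the network is congested.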

Regular Section
  • Yaoyu ZHANG, Jiarui ZHANG, Han ZHANG
    Article type: PAPER
    Subject area: Software System
    2022, Volume E105.D, Issue 12, p. 2066-2074
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    With the development of blockchain technology, the automatic generation of smart contracts has become a hot research topic. Existing automatic generation techniques still leave room for improvement: the process is complex, third-party specialized tools are required, and the compatibility between the code and the running environment is limited. In this paper, we propose an automatic smart contract generation method that is domain-oriented and configuration-based. It is designed and implemented for the application scenarios of government services. The method comprises configuration, public state database definition, code generation, and formal verification. The applicability of the generated smart contract code is verified in the Hyperledger Fabric environment. Furthermore, its quality and security are formally verified with the help of third-party testing tools. The experimental results show that the quality and security of the generated smart contract code meet the expected standards. Automatic smart contract generation can be applied “elegantly” to anti-disclosure, privacy protection, and prophecy processing in government services, effectively enabling the development of “programmable government”.

  • Naoki AOYAMA, Hiroshi YAMADA
    Article type: PAPER
    Subject area: Software System
    2022, Volume E105.D, Issue 12, p. 2075-2084
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    The issue of copying values or references has historically been studied for managing memory objects, especially in distributed systems. In this paper, we explore a new topic on copying values vs. references for memory page compaction on virtualized systems. Memory page compaction moves target physical pages to a contiguous memory region at the operating system kernel level to create huge pages. Memory virtualization provides an opportunity to perform memory page compaction by copying the references of the physical pages. That is, instead of copying pages' values, we can move guest-physical pages by changing the mappings of guest-physical to machine-physical pages. The goal of this paper is a quantitative comparison between value- and reference-based memory page compaction. To do so, we developed a software mechanism that achieves memory page compaction by appropriately updating the references of guest-physical pages. We prototyped the mechanism on Linux 4.19.29, and the experimental results show that the prototype's page compaction is up to 78% faster and achieves up to 17% higher performance on memory-intensive real-world applications compared to the default value-copy compaction scheme.
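
The difference between value-based and reference-based page moves can be shown with a toy model. The sketch below is an assumption-laden illustration, not the paper's mechanism: the dictionary `gpa_to_mfn` stands in for the guest-physical to machine-physical mapping, and real systems would also need steps such as TLB invalidation that are not modeled here.

```python
def move_by_value(gpa_to_mfn, machine_mem, src, dst):
    """Value-based move: copy the page contents into the frame backing `dst`."""
    machine_mem[gpa_to_mfn[dst]] = machine_mem[gpa_to_mfn[src]]  # page-sized copy
    machine_mem[gpa_to_mfn[src]] = b""  # the source page becomes free


def move_by_reference(gpa_to_mfn, src, dst):
    """Reference-based move: swap the machine frames behind the two guest pages."""
    gpa_to_mfn[src], gpa_to_mfn[dst] = gpa_to_mfn[dst], gpa_to_mfn[src]


def guest_read(gpa_to_mfn, machine_mem, gpa):
    """What the guest sees when it reads guest-physical page `gpa`."""
    return machine_mem[gpa_to_mfn[gpa]]


# Hypothetical state: guest page 0 holds data, guest page 1 is free.
mapping = {0: 10, 1: 11}
memory = {10: b"data", 11: b""}
move_by_reference(mapping, src=0, dst=1)
after_ref = guest_read(mapping, memory, 1)

mapping2 = {0: 10, 1: 11}
memory2 = {10: b"data", 11: b""}
move_by_value(mapping2, memory2, src=0, dst=1)
after_val = guest_read(mapping2, memory2, 1)
```

Both moves leave the guest seeing the same data at the destination page, but the reference-based move touches only two mapping entries and no page contents.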

  • Katsutoshi HIRAYAMA, Tenda OKIMOTO
    Article type: PAPER
    Subject area: Information Network
    2022, Volume E105.D, Issue 12, p. 2085-2091
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    To the best of our knowledge, there has been very little work on computational algorithms for the core or its variants in MC-nets games. One exception is the work by [Hirayama et al., 2014], in which a constraint generation algorithm was proposed to compute a payoff vector belonging to the least core. In this paper, we generalize this algorithm into one for finding a payoff vector belonging to the ϵ-core with a pre-specified bound guarantee. The underlying idea of this algorithm is basically the same as that of the previous one, but one key contribution is a clearer view of the pricing problem, which leads to the development of our new general algorithm. We show that this new algorithm is correct and is never trapped in an infinite loop. Furthermore, we empirically demonstrate that this algorithm offers a trade-off between solution quality and computational cost on some benchmark instances.

  • Iuon-Chang LIN, Chin-Chen CHANG, Hsiao-Chi CHIANG
    Article type: PAPER
    Subject area: Information Network
    2022, Volume E105.D, Issue 12, p. 2092-2103
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    The flourishing of Internet communication technologies has led to e-commerce in mobile computing and made the Web of Things popular. Electronic payment is the most important part of e-commerce, so many electronic payment schemes have been proposed. However, most of the proposed schemes cannot give change. To solve this problem, this paper proposes an e-cash payment system based on proxy blind signatures. The system not only provides change divisibility through the Web of Things, but also provides anonymity, verifiability, unforgeability, and tracing of double-spending owners.

  • Yuli ZHA, Pengshuai CUI, Yuxiang HU, Julong LAN, Yu WANG
    Article type: PAPER
    Subject area: Information Network
    2022, Volume E105.D, Issue 12, p. 2104-2111
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    Named Data Networking (NDN) uses names to identify and divide content, and uses content names for routing and addressing. However, traditional network devices that support the TCP/IP protocol stack and location-centric communication mechanisms cannot effectively support NDN functions such as in-network storage and multicast distribution. The performance of NDN routers designed for specific functional platforms is limited, and they are difficult to deploy on a large scale, so NDN networks can only be implemented in software. With the development of data plane languages such as Programming Protocol-independent Packet Processors (P4), the practical deployment of NDN becomes achievable. To ensure efficient data distribution in the network, this paper proposes a protocol-independent bitwise multicast method. The P4 language is used to define a bit vector in the intrinsic metadata field of the data packet, which is used to mark the requesting ports. When the requested content is returned, the routing node checks which ports have requested the content according to the bit vector recorded in the register and multicasts the Data packet. The experimental results show that the bitwise multicast technology can eliminate the distribution of flow tables required by the dynamic multicast group technology and reduce the content response delay by 57% compared to unicast transmission.
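
The bit-vector bookkeeping described in this abstract can be sketched in a few lines. The example below is plain Python rather than P4, and the class `BitwiseMulticast`, its method names, and the content name are hypothetical; it only illustrates marking requesting ports as bits and expanding the bits back into output ports.

```python
class BitwiseMulticast:
    """Toy routing node that remembers requesting ports as one bit per port."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.pending = {}  # content name -> bit vector of requesting ports

    def on_interest(self, name, in_port):
        """Mark `in_port` as having requested `name` (idempotent)."""
        self.pending[name] = self.pending.get(name, 0) | (1 << in_port)

    def on_data(self, name):
        """Return the ports to multicast the Data packet to, then clear the entry."""
        bitvec = self.pending.pop(name, 0)
        return [port for port in range(self.num_ports) if (bitvec >> port) & 1]


node = BitwiseMulticast(num_ports=8)
node.on_interest("/video/seg1", 1)
node.on_interest("/video/seg1", 4)
node.on_interest("/video/seg1", 1)  # a repeated request sets the same bit again
ports = node.on_data("/video/seg1")
leftover = node.on_data("/video/seg1")
```

Because each port is a single bit, duplicate requests collapse naturally and the set of output ports is recovered by scanning the vector, without maintaining a separate multicast group per content name.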

  • Han MA, Qiaoling ZHANG, Roubing TANG, Lu ZHANG, Yubo JIA
    Article type: PAPER
    Subject area: Speech and Hearing
    2022, Volume E105.D, Issue 12, p. 2112-2118
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    Recently, robust speech recognition for real-world applications has attracted much attention. This paper proposes a robust speech recognition method based on the teacher-student learning framework for domain adaptation. In particular, the student network will be trained based on a novel optimization criterion defined by the encoder outputs of both teacher and student networks rather than the final output posterior probabilities, which aims to make the noisy audio map to the same embedding space as clean audio, so that the student network is adaptive in the noise domain. Comparative experiments demonstrate that the proposed method obtained good robustness against noise.

  • Kazuki OMI, Jun KIMATA, Toru TAMAKI
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2022, Volume E105.D, Issue 12, p. 2119-2126
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    In this paper, we propose a multi-domain learning model for action recognition. The proposed method inserts domain-specific adapters between the domain-independent layers of a backbone network. Unlike a multi-head network that switches only the classification heads, our model switches not only the heads but also the adapters, which facilitates learning feature representations universal to multiple domains. Unlike prior works, the proposed method is model-agnostic and does not assume particular model structures. Experimental results on three popular action recognition datasets (HMDB51, UCF101, and Kinetics-400) demonstrate that the proposed method is more effective than a multi-head architecture and more efficient than separately training models for each domain.

  • Teng LIANG, Ao ZHAN, Chengyu WU, Zhengqiang WANG
    Article type: LETTER
    Subject area: Fundamentals of Information Systems
    2022, Volume E105.D, Issue 12, p. 2127-2130
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    In this letter, a path dynamics assessment asynchronous advantage actor-critic scheduling algorithm (PDAA3C) is proposed to solve the MPTCP scheduling problem using the deep reinforcement learning Actor-Critic framework. The algorithm picks out the optimal transmission path faster through multi-core asynchronous updating and also guarantees network fairness. Compared with existing algorithms, the proposed algorithm achieves an 8.6% throughput gain over the RLDS algorithm and approaches the theoretical upper bound in the NS3 simulation.

  • Haney KANG, Seungwon SHIN
    Article type: LETTER
    Subject area: Information Network
    2022, Volume E105.D, Issue 12, p. 2131-2134
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    Recently, Linux containers have become the de facto standard for cloud systems, enabling cloud providers to create virtual environments in a much more scalable manner. However, container network configuration remains immature and requires automatic verification for efficient cloud management. We propose Verikube, which utilizes a novel graph structure representing policies to reduce memory consumption and accelerate verification. Moreover, unlike existing works, Verikube is compatible with the complex semantics of Cilium Policy, which clouds adopt for its performance advantage. Our evaluation results show that Verikube is at least seven times better in memory efficiency, at least 1.5 times faster in data structure management, and 20K times better in verification.

  • Kai YAN, Tiejun ZHAO, Muyun YANG
    Article type: LETTER
    Subject area: Computer Graphics
    2022, Volume E105.D, Issue 12, p. 2135-2138
    Published: 2022/12/01
    Released: 2022/12/01
    Journal, free access

    Graph layout is a critical component in graph visualization. This paper proposes GRAPHULY, a graph u-nets-based neural network, for end-to-end graph layout generation. GRAPHULY learns the multi-level graph layout process and can generate graph layouts without iterative calculation. We also propose to use Laplacian positional encoding and a multi-level loss fusion strategy to improve the layout learning. We evaluate the model with a random dataset and a graph drawing dataset and showcase the effectiveness and efficiency of GRAPHULY in graph visualization.
