International Journal of Networking and Computing
Online ISSN : 2185-2847
Print ISSN : 2185-2839
ISSN-L : 2185-2839
Volume 11, Issue 2
Showing 1-23 of 23 articles in the selected issue
Special Issue on the Eighth International Symposium on Computing and Networking
  • Koji Nakano
    2021, Volume 11, Issue 2, p. 120-
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    The Eighth International Symposium on Computing and Networking (CANDAR 2020) was held virtually from November 25th to 27th, 2020. The organizers of CANDAR 2020 invited authors to submit extended versions of the presented papers. As a result, 38 articles were submitted to this special issue. This issue includes the extended versions of the 22 papers that were accepted. This issue owes a great deal to a number of people who devoted their time and expertise to handling the submitted papers. In particular, I would like to thank the guest editors for the excellent review process: Professor Hideharu Amano, Professor Satoshi Fujita, Professor Akihiro Fujiwara, Professor Ikki Fujiwara, Professor Shinji Inoue, Professor Yasuaki Ito, Professor Eitaro Kohno, Professor Michihiro Koibuchi, Professor Susumu Matsumae, Professor Toru Nakanishi, Professor Yasuyuki Nogami, Professor Kouzou Ohara, Professor Shinya Takamaeda-Yamazaki, and Professor Takashi Yokota. Words of gratitude are also due to the anonymous reviewers who carefully read the papers and provided detailed comments and suggestions to improve the quality of the submitted papers. This special issue would not have been possible without their efforts.
  • Takeru Terai, Masami Yoshida, Alberto Gallegos Ramonet, Taku Noguchi
    2021, Volume 11, Issue 2, p. 121-139
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Blackhole (BH) attacks are among the most significant threats in mobile ad-hoc networks. A BH is a security attack in which a malicious node absorbs data packets and sends fake routing information to neighboring nodes. BH attacks have been widely studied. However, existing defense methods wrongfully assume that BH attacks cannot overcome the most common defense approaches. A new wave of BH attacks is known as smart BH attacks. In this study, we used a highly aggressive type of BH attack that can predict sequence numbers to overcome traditional detection methods that set a threshold to sequence numbers. To protect the network from this type of BH attack, we propose a collaborative defense method that uses local information collected from neighboring nodes. We evaluated the performance of our defense method against a smart BH attack and a collaborative attack that uses the collaboration of another malicious node. Our results show that the proposed method successfully detects and contains these threats to some degree. Consequently, the smart BH attack success rate decreases.
  • Yaodong Wang, Yamin Li
    2021, Volume 11, Issue 2, p. 140-153
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    This paper investigates the fault tolerance of Mirrored k-Ary n-Tree (MiKANT) networks with faulty links. The MiKANT network is a variant of the traditional k-ary n-tree (Fat-tree) and Clos networks. It doubles the number of compute nodes of the fat-tree by adding a few switches and links, and it has a shorter average distance, which reduces packet latency. As the scale of MiKANT becomes larger, the probability of link faults becomes higher. To improve the successful routing ratio of MiKANT, we give four link-fault-tolerant routing algorithms for MiKANT and evaluate their performance through simulations. In addition, the performance of the combined algorithms is evaluated.
  • Naoki Fujieda, Sogo Takashima
    2021, Volume 11, Issue 2, p. 154-171
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    For a true random number generator (TRNG) on an FPGA, the use of a pair of clocking elements has the advantage of minimal usage of logic elements. This paper presents a novel high-speed TRNG for recent Xilinx FPGAs using their clocking elements, called mixed-mode clock managers (MMCMs). By following the proposed parameter selection methods, both better randomness and higher throughput of the generated bitstrings can be achieved. According to our evaluation on an Artix-7 FPGA with the most promising sets of parameters, 38.2% (42 out of 110) of the sets passed AIS-31 Procedure B, which means that an appropriate parameter set can be found within ten or fewer trials with more than 99% probability. Their average throughput was 2.44 Mbit/s, which is comparable to recent FPGA-based TRNGs. An initial prototype of dynamic reconfiguration of the parameters is also presented in this paper.
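The "ten or fewer trials" figure in the abstract above can be reproduced with a short calculation. This is a minimal sketch that assumes each candidate parameter set passes independently with probability 42/110; the independence assumption and the function names are illustrative and not taken from the paper.

```python
import math


def prob_at_least_one_pass(pass_rate: float, trials: int) -> float:
    """Probability that at least one of `trials` independent parameter sets passes."""
    return 1.0 - (1.0 - pass_rate) ** trials


def trials_needed(pass_rate: float, target: float) -> int:
    """Smallest number of independent trials that reaches at least `target` probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - pass_rate))


# 42 of the 110 evaluated parameter sets passed AIS-31 Procedure B (from the abstract).
pass_rate = 42 / 110

print(f"pass rate: {pass_rate:.3f}")
print(f"P(success within 10 trials): {prob_at_least_one_pass(pass_rate, 10):.4f}")
print(f"trials needed for 99%: {trials_needed(pass_rate, 0.99)}")
```

Under this assumption, nine trials fall just short of 99% and ten trials exceed it, which is consistent with the abstract's statement.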
  • Kaijie Wei, Koki Honda, Hideharu Amano
    2021, Volume 11, Issue 2, p. 172-197
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Artificial Intelligence (AI) has achieved unprecedented success in various fields, including image, speech, and even video recognition. Most systems are implemented on power-hungry devices such as CPUs, GPUs, or even TPUs because of the models' high computation and storage complexity. CPU platforms are weak in computation capacity, while the energy budgets and expense of GPUs and TPUs are often not affordable for edge computing in industry. Recently, FPGA-based Neural Network (NN) accelerators have been a trendy research topic. They are regarded as a promising way to surpass GPUs in both speed and energy efficiency thanks to their specifically designed architecture. Our work targets a low-end FPGA board, a more desirable platform for meeting the energy-efficiency and computational-resource restrictions of an autonomous driving car. In this paper, we propose a methodology that integrates an NN model into the board using an HLS description. The whole design consists of algorithm-level downscaling and hardware optimization. The former downscales the model through pruning and binarization, balancing model size and accuracy. The latter applies various HLS design techniques to each NN component, such as loop unrolling and inter-/intra-level pipelining, to speed up the application running on the target board. In the case study of tiny YOLO (You Only Look Once) v3, the model running on PYNQ-Z1 achieves up to 22x acceleration compared with the PYNQ's ARM CPU. Its energy efficiency is also 3x better than that of the Xeon E5-2667. To verify the flexibility of our methodology, we extend our work to BinaryConnect and DoReFaNet. It is worth mentioning that BinaryConnect achieves around 100x acceleration compared with running purely on the PYNQ-Z1 ARM core.
  • Ben Wang, Guan-Shen Fang, Sayaka Kamei
    2021, Volume 11, Issue 2, p. 198-214
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    In today's online services, users are often encouraged to provide feedback on each item, such as numerical ratings, textual reviews, and time of purchase. Managers of online services use this feedback to improve the quality of their services or the user experience. For example, many recommender systems use users' historical ratings to predict the items that users may like and purchase in the future. With the increase of user data in these systems, more detailed and interpretable information about item features and user sentiments can be extracted from the textual reviews that are related to ratings. In this paper, we propose a novel topic and sentiment matrix factorization model, which leverages both topic and sentiment drawn from the reviews simultaneously. First, we conduct topic analysis and sentiment analysis of reviews using Latent Dirichlet Allocation (LDA) and a lexicon construction technique, respectively. Second, we combine user consistency, calculated from each user's reviews and ratings, with the helpful votes that reviews receive from other users to obtain a reliability measure for weighting the ratings. Third, we integrate these three parts into the matrix factorization framework for rating prediction. Our experimental comparison using Amazon datasets indicates that the proposed method significantly improves performance, by up to 14.12%, compared with traditional matrix factorization.
  • Shungo Kumazawa, Kazushi Kawamura, Thiem Van Chu, Masato Motomura, Jae ...
    2021, Volume 11, Issue 2, p. 215-230
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Training machine learning models on edge devices is always in conflict with power consumption and computing cost. This paper introduces a hardware-oriented training method called ExtraFerns for a unique subset of decision tree ensembles, which significantly decreases memory access and optimizes each tree in parallel. ExtraFerns benefits from the advantages of both extraTrees and randomFerns. As extraTrees does, it generates nodes by randomly selecting attributes and generating thresholds. Then, as randomFerns does, it builds ferns, which are decision trees that share identical nodes at each depth. In contrast to other ensemble methods that use greedy optimization, ExtraFerns attempts global optimization of each fern. Experimental results show that ExtraFerns requires only 4.3% and 4.1% of the memory access for training models, with 3.0% and 1.2% accuracy drops, compared with randomForest and extraTrees, respectively. This paper also proposes applying lightweight random projection to ExtraFerns as a preprocessing step, which achieved a further accuracy improvement of up to 2.0% for image datasets.
  • Hiromasa Miura, Syota Kanzawa, Rikuya Matsumura, Yuta Kodera, Takuya K ...
    2021, Volume 11, Issue 2, p. 231-250
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    In this paper, the authors propose an approach to attacking a kind of pairing-friendly curve, the Barreto-Naehrig (BN) curve, in order to accelerate the evaluation of its security level with respect to the elliptic curve discrete logarithm problem (ECDLP). More precisely, Pollard's rho method based on a random walk is adopted to attack the curve. Although Pollard's rho method with the skew Frobenius map is known to solve the ECDLP efficiently, this approach sometimes induces an unsolvable cycle, called a fruitless cycle, and such trials must be restarted from a different starting point. However, no effective method for eliminating such fruitless cycles has been proposed. Therefore, the authors analyze these cycles and propose an effective approach to eliminating them, further optimizing Pollard's rho method. In addition, we confirm the effectiveness of the method by applying it to BN curves with 12-, 17-, and 33-bit parameters.
  • Hendy Briantoro, Nobuo Funabiki, Md. Mahbubur Rahman, Kwenga Ismael Mu ...
    2021, Volume 11, Issue 2, p. 251-266
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Currently, the IEEE 802.11n wireless local-area network (WLAN) is popular for Internet access because of its mobility, flexibility, and scalability. Multiple access points (APs) are often deployed in a WLAN to cover a wide area, which may cause interference and reduce performance. Previously, we studied a transmission power optimization method for two concurrently communicating APs to reduce interference. It selects either the maximum or the minimum power for each AP such that the signal-to-noise ratio (SNR) is highest. However, we found that channel assignment is also important when multiple APs are closely located in a dense WLAN. In this paper, we propose a joint optimization method of channel assignment and transmission power for multiple concurrently communicating APs in a WLAN. First, the same channel is assigned to nearby APs, for which the CSMA/CA protocol works well, and the most distant channels are assigned to the other APs. Second, the transmission power is optimized by selecting the combination with the highest measured SNR. To reduce the SNR measurement load, 1) the maximum power is assigned to every AP, 2) the initial RSS from the associated host is measured, 3) the minimum power is assigned to one AP at a time in descending order of the initial RSS, and the SNR is measured, and 4) the power combination with the highest SNR is selected. For evaluation, we conduct extensive experiments under various network topologies using up to four Raspberry Pi APs. The results show that the proposal always selects the channel and transmission power for each AP that offer the highest throughput.
  • Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi
    2021, Volume 11, Issue 2, p. 267-282
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    General matrix-matrix multiplication (GEMM) is a commonly used BLAS level-3 routine in big data analysis and scientific computations. To further enhance the capability for GEMM computation on GPUs, manufacturers have introduced dedicated hardware for tensor and matrix operations into modern GPU architectures, called the Tensor Core unit. Mixed-precision GEMM based on Tensor Core units has been introduced into many BLAS libraries and deep learning frameworks. However, these implementations are usually designed for large square matrices and tend to perform poorly for irregularly shaped matrices, especially tall-and-skinny matrices. This paper discusses optimizing GEMM for tall-and-skinny matrices on GPUs with three optimization methods: task mapping, memory access, and efficient use of Tensor Core units by filling multiple fragments. First, the task mapping pattern of GEMM is optimized so that the implementation avoids launching too many thread blocks even when the input matrices are large. Second, the memory access pattern is optimized for half-precision tall-and-skinny matrices stored in the row-major layout. Third, Tensor Core units are used effectively even for extremely skinny matrices by filling multiple fragments into a Tensor Core operation. To examine the effectiveness of the proposed optimization methods, experiments are conducted on two cases of GEMM that take tall-and-skinny matrices as input. The evaluation results show that the optimized GEMM algorithms achieve 1.07x to 3.19x and 1.04x to 3.70x speedups over the latest cuBLAS library on NVIDIA V100 and NVIDIA A100, respectively. By reducing the use of Tensor Core operations and utilizing the optimized memory access pattern, the optimized GEMM algorithms reduce the energy consumption of V100 and A100 by 34% to 74% and 62% to 82%, respectively.
  • Jinshan Luo, Atsushi Ito
    2021, Volume 11, Issue 2, p. 283-298
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Recently, ad hoc networks have been widely used because of the progress of the Internet of Things (IoT). A long-range wide-area network (LoRaWAN) is one of a number of low-cost wide-area networking technologies and has been drawing attention because of its outstanding performance in long-range low-power communication. LoRa is an implementation of LoRaWAN that is now used in many applications, including grazing management for livestock monitoring. LoRa uses carrier-sense multiple access with collision avoidance (CSMA/CA) to improve resilience against interference. However, in our personal experience implementing livestock-monitoring networks using LoRa, we have encountered performance degradation issues due to collisions among channels. In this paper, we first explain the problems that we encountered. Then, we explain the experiments undertaken herein to investigate these problems. Finally, we propose a solution and evaluate its effectiveness based on a simulation, which simulated real-world conditions. In the preliminary experiment, we used two transmitters to measure interference at different channel distances, bandwidths (BWs), and spreading factors (SFs) and found that a closer channel, smaller BW, and/or larger SF led to a higher carrier sense rate and greater interference distance. Thus, we proposed a cellular communication network for channel allocation to reduce adjacent-channel interference and a duration division mode to solve the insufficient channel issue. The calculations demonstrated that the proposed solution can monitor approximately 3,000 cows in a pasture with an area of 4x4 km^2.
  • Shingo Hasegawa, Masashi Hisai, Hiroki Shizuya
    2021, Volume 11, Issue 2, p. 299-318
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Ananth and Sahai proposed the projective arithmetic functional encryption (PAFE) and showed that PAFE derives a single-key selective secure functional encryption with the help of the randomizing polynomials scheme (RP), namely PAFE with RP achieves the indistinguishability obfuscation (iO). Their PAFE considers a secret-key type functional encryption only and a public-key counterpart is not known. We propose the public-key version: pkPAFE, and show that pkPAFE with RP derives a public-key functional encryption which is single-key selective secure. This means that our pkPAFE achieves iO as well as the original PAFE by Ananth and Sahai.
  • Masayuki Fukumitsu, Shingo Hasegawa
    2021, Volume 11, Issue 2, p. 319-337
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Since the birth of blockchain technology, multisignatures have attracted much attention as a tool for handling blockchain transactions. For blockchain applications, multisignatures with public-key aggregation, which can compress the public keys of the signers into a single public key, are preferable to standard multisignatures because the public keys and the signature used in a transaction are stored to verify the transaction later. Several multisignature schemes with public-key aggregation have been proposed; however, no known scheme has a tight security reduction. We propose the first multisignature with public-key aggregation whose security is proven with a tight reduction under the DDH assumption in the random oracle model. Our multisignature is based on the DDH-based multisignature by Le, Yang, and Ghorbani; however, our security proof differs from theirs. The idea of our security proof originates from another DDH-based multisignature by Le, Bonnecaze, and Gabillon, whose security proof is tight. By tailoring their security proof to a setting that admits public-key aggregation, we can prove the tight security of our multisignature.
  • Junnosuke Suzuki, Tomohiro Kaneko, Kota Ando, Kazutoshi Hirose, Kazush ...
    2021, Volume 11, Issue 2, p. 338-353
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Computational scalability allows neural networks on embedded systems to provide desirable inference performance while satisfying severe power consumption and computational resource constraints. This paper presents a simple yet scalable inference method called ProgressiveNN, consisting of bitwise binary (BWB) quantization, accumulative bit-serial (ABS) inference, and batch normalization (BN) retraining. ProgressiveNN does not require any network structure modification and obtains the network parameters from a single training. BWB quantization decomposes and transforms each parameter into a bitwise format for ABS inference, which then utilizes the parameters in the most-significant-bit-first order, enabling progressive inference. The evaluation result shows that the proposed method provides computational scalability from 12.5% to 100% for ResNet18 on CIFAR-10/100 with a single set of network parameters. It also shows that BN retraining suppresses accuracy degradation of training performed with low computational cost and restores inference accuracy to 65% at 1-bit width inference. This paper also presents a method to dynamically adjust the bit-precision of the ProgressiveNN to achieve a better trade-off between computational resource use and accuracy for practical applications using sequential data with proximity resemblance. The evaluation result indicates that the accuracy increases by 1.3% with an average bit-length of 2 compared with only the 2-bit BWB network.
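The idea of consuming parameters in most-significant-bit-first order can be illustrated with a small dot product. This is a minimal sketch that assumes plain unsigned 8-bit integer weights; the abstract above does not specify the exact BWB format, so the encoding, the bit width, and the function names are hypothetical.

```python
from typing import List

BITS = 8  # assumed bit width; using 1 of 8 bit planes is 12.5% of the full computation


def bit_planes(weights: List[int], bits: int = BITS) -> List[List[int]]:
    """Split unsigned integer weights into bit planes, most significant bit first."""
    return [[(w >> k) & 1 for w in weights] for k in range(bits - 1, -1, -1)]


def progressive_dot(weights: List[int], inputs: List[int], bits: int = BITS) -> List[int]:
    """Return the running dot product after each additional bit plane is consumed."""
    acc = 0
    partial = []
    for i, plane in enumerate(bit_planes(weights, bits)):
        scale = 1 << (bits - 1 - i)  # place value of this bit plane
        acc += scale * sum(b * x for b, x in zip(plane, inputs))
        partial.append(acc)
    return partial


weights = [200, 17, 96, 255]
inputs = [1, 2, 3, 4]
steps = progressive_dot(weights, inputs)
exact = sum(w * x for w, x in zip(weights, inputs))
print(steps, exact)
```

Each entry of `steps` is a coarser or finer approximation of the same dot product, and the last entry equals the exact result, so the computation can be stopped early when less precision is acceptable.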
  • Taichi Aoki, Atsuhiro Goto
    2021, Volume 11, Issue 2, p. 354-382
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Content regarding various illegal activities, such as weapon and drug trafficking, is shared on the dark web. Most of the illegal content is distributed on anonymous networks that cannot be directly accessed from the World Wide Web. A number of studies have been conducted on the network structure of the World Wide Web since its advent. Similar to the World Wide Web, the dark web is connected by hypertext transfer protocol (http); this makes it possible to use the methods developed for the web in the dark web. Many studies have investigated the dark web and its network structure. However, few studies have focused on the visualization of the dark web network structure, and there have been no studies investigating the temporal changes in the network structure. In this study, to understand the hypertext markup language (html) network structure of the dark web, we created and visualized a graph of the html hyperlink relations of the Tor network, which is popular on the dark web. We then compared the insights gained from graph centrality metrics with those gained from visualizations. The analyzed dataset comprised 25,270,157 pages of html text files crawled from the Tor network by breadth-first search from June 1, 2018, to January 30, 2021. Subsequently, we acquired half-yearly snapshots from the collected data and investigated the change in the dark web network over time using a time-series graph. Then, we derived the centrality metrics from the created graph data and confirmed the differences between the centrality metrics and visualizations. The results obtained in this study provided new insights into the dark web. First, we found that the dark web fluctuated significantly; the structure of the dark web network was more strongly interconnected. Second, most of the nodes that had increased in the past two years may have disappeared rapidly after May 2020. 
    Third, analysis of each snapshot revealed that the proportion of highly volatile domains increased from 40% to 75% during the observation period. Fourth, after calculating the network centrality metrics from each snapshot and comparing the transition of hub nodes in chronological order, we observed that the importance of link-collection sites as the main information retrieval method used in the dark web decreased. Finally, we estimated the size of the dark web based on our observed dark web measurements using the mark-recapture method. To the best of our knowledge, this is the first study to use the mark-recapture method to estimate the size of the dark web network.
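For readers unfamiliar with mark-recapture estimation, the classic Lincoln-Petersen estimator and its Chapman correction are sketched here. The abstract does not state which estimator the authors used, so this is an assumption, and the sample counts are hypothetical rather than taken from the paper.

```python
def lincoln_petersen(marked: int, caught: int, recaptured: int) -> float:
    """Estimate population size from two samples.

    marked:     items seen (and "marked") in the first sample
    caught:     items seen in the second sample
    recaptured: items seen in both samples
    """
    if recaptured == 0:
        raise ValueError("at least one recapture is required")
    return marked * caught / recaptured


def chapman(marked: int, caught: int, recaptured: int) -> float:
    """Chapman's bias-corrected variant, which is defined even with zero recaptures."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1


# Hypothetical counts: 1000 domains in crawl A, 800 in crawl B, 200 in both.
print(lincoln_petersen(1000, 800, 200))
print(round(chapman(1000, 800, 200), 2))
```

The estimate rests on the usual assumptions of a closed population and equal capture probability, which a volatile network may violate.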
  • Yuki Nanjo, Masaaki Shirase, Takuya Kusaka, Yasuyuki Nogami
    2021, Volume 11, Issue 2, p. 383-411
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Pairings are widely used for innovative protocols such as ID-based encryption and group signature authentication. According to recent works, the Barreto-Lynn-Scott (BLS) family of pairing-friendly elliptic curves is suggested for pairings at various security levels. One important fact is that the BLS family has fixed polynomial parameters for the field characteristic and group order in terms of an integer x_0. For practical pairing-based protocols, we have to carefully find an x_0 that leads to efficient pairings; however, this search is typically complicated. Thus, convenient ways of finding an x_0 that is advantageous for pairings are desired. For this reason, Costello et al. proposed simple restrictions for finding x_0 that generate specific BLS subfamilies of curves with embedding degree k = 24 having one of the best field and curve constructions for pairings. Since such restrictions are also in demand for other embedding degrees, the authors extend their work and provide restrictions for the cases of k = 2^m · 3 and k = 3^n with arbitrary integers m, n > 0 in this paper. The results will help to find new parameters that lead to some of the best-performing pairings on the BLS family of curves with various k. The results also allow us to respond flexibly to changes in the security levels of pairings as security analyses progress in the future.
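To make "fixed polynomial parameters in terms of an integer x_0" concrete, the sketch below evaluates the standard BLS12 polynomials, i.e. the k = 12 = 2^2 · 3 case, which are p(x) = (x - 1)^2 (x^4 - x^2 + 1) / 3 + x and r(x) = x^4 - x^2 + 1. The seed is that of the widely deployed BLS12-381 curve, which the abstract does not mention. The code does not implement the restrictions proposed in the paper and does not check primality.

```python
from typing import Tuple


def bls12_parameters(x0: int) -> Tuple[int, int]:
    """Evaluate the BLS12 field characteristic p and group order r at the seed x0."""
    r = x0**4 - x0**2 + 1
    numerator = (x0 - 1) ** 2 * r
    if numerator % 3 != 0:
        # r(x0) is never divisible by 3, so this requires x0 = 1 (mod 3).
        raise ValueError("x0 must be congruent to 1 modulo 3")
    p = numerator // 3 + x0
    return p, r


# Seed of BLS12-381, used here only as a well-known example value.
X0_BLS12_381 = -0xD201000000010000

p, r = bls12_parameters(X0_BLS12_381)
print(p.bit_length(), r.bit_length())
```

A practical parameter search would additionally require p and r to be prime and would prefer seeds of low Hamming weight, which is the kind of condition the restrictions in the abstract are meant to simplify.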
  • Misato Ogawa, Shigeaki Tanimoto, Takashi Hatashima, Atsushi Kanai
    2021, Volume 11, Issue 2, p. 412-425
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Information security measures have become increasingly important not only for companies but also for individuals in recent years. For this reason, many information security measures have been taken. However, if information security is overly complicated, ICT users may get overwhelmed. Although research on information security fatigue is being conducted, currently it is not enough. In particular, research from a psychological point of view is lacking. Examples of psychological measures include cognitive strategies that are classified as behavioral models. There are various cognitive strategies, but research into particular countermeasures against information security fatigue has not been sufficiently explored. In this work, we propose new measures based on psychological viewpoints. Specifically, these are information security fatigue countermeasures that introduce a cognitive strategy. We conducted a questionnaire survey on cognitive strategies and information security fatigue, analyzed the results, and classified them into 24 levels, consisting of six levels of the information security fatigue scale multiplied by four levels of cognitive strategies. Then, for each of these 24 levels, a new information security fatigue measure was proposed and evaluated on the basis of the free responses of the questionnaire survey. The results indicated that appropriate information security fatigue countermeasures according to human behavior patterns based on cognitive strategies and the information security fatigue scale are possible.
  • Yuta Suzuki, Toshiki Hatano, Toi Tsuneda, Daiki Kuyoshi, Satoshi Yaman ...
    2021, Volume 11, Issue 2, p. 426-437
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    In recent years, deep neural network technology has been developing rapidly, especially in the field of image recognition. However, since deep neural networks learn images based on pixel values, they can only learn the features of the image and not the meta-information that the image has. In this paper, we focused on the differences between image features and meta-information. For example, 0 and 9 are relatively similar in terms of image characteristics, but there is significant difference in terms of the numbers they actually mean. In contrast, 3 and 4 are relatively dissimilar in terms of image features, but the difference is small in terms of the values they actually mean. In order to solve problems like this example, this paper proposes a method for learning based not only on the features of the image, but also on the numerical information that the image has. Experiments were conducted on the MNIST and Kannada-MNIST datasets using three different models: DNN, CNN, and RNN. As a result, the numerical error is smaller in the proposed model than in the baseline.
  • Manu Manuel, Arne Kreddig, Simon Conrady, Nguyen Anh Vu Doan, Walter S ...
    2021, Volume 11, Issue 2, p. 438-462
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    In recent years, technological advancements in computer hardware systems have been lagging behind the demand for increased computational power, especially in application domains such as signal and image processing. Approximate computing is a design paradigm for efficient system design to overcome this bottleneck by exploiting the resilience of such applications to inaccuracy in their computations and trading off quality for hardware resource savings. Over the years, many approximation techniques have been proposed on various abstraction layers and demonstrated their effectiveness in different applications. Combining multiple methods in a larger system can further increase the resulting benefits. However, this often leads to a non-trivial optimization task of finding the best parameterization across all employed methods. The interaction and influence of error propagation between individual components demand a global optimization of parameters that simultaneously considers all the parameters for each of the approximation techniques used. In this work, we propose a methodology for exploring such highly complex design spaces using a multi-objective genetic algorithm in an FPGA-based system. Simple models are used for the estimation of resource demands in terms of power together with the anticipated quality degradation. The optimization is carried out to determine the trade-off between these objectives. We demonstrate the effectiveness of our approach on a typical color processing pipeline by tailoring the encoding and genetic operations to the needs of this application. To focus the optimization into a relevant region of interest, we propose ROI-NSGA, a novel variant of nondominated solution selection, and compare its optimization efficiency with the traditional NSGA-II approach for the examined case study. 
    Our results show that the models are able to guide the optimization and that the genetic operations and selections are capable of finding Pareto-optimal solutions, among which the desired quality-resource trade-off can be chosen. In addition, the ROI-NSGA-based optimization outperforms the results obtained for the case study using the NSGA-II approach within the region of interest.
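The notion of Pareto-optimal solutions in the abstract above can be illustrated with a basic nondominated filter, the building block underlying NSGA-II style selection. This is a minimal sketch, not the ROI-NSGA variant proposed in the paper, and the (power, quality loss) values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (power, quality loss) pairs for candidate configurations.
candidates = [(1.0, 9.0), (2.0, 5.0), (3.0, 6.0), (4.0, 1.0), (5.0, 1.5)]
front = pareto_front(candidates)
print(front)
```

Every point on the resulting front represents a different quality-resource trade-off, and a designer picks one according to the application's tolerance for quality degradation.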
  • Masahito Kumagai, Kazuhiko Komatsu, Fumiyo Takano, Takuya Araki, Masay ...
    2021, Volume 11, Issue 2, p. 463-491
    Published: 2021
    Released: 2021/07/08
    Journal Open Access
    Recently, a clustering method using a combinatorial optimization problem, called combinatorial clustering, has been drawing attention due to the rapid spreads of quantum annealing and simulated annealing. Combinatorial clustering is performed by minimizing an objective function under a condition to satisfy a one-hot constraint. The objective function and the constraint function are generally formulated to a unified objective function of a QUBO (Quadratic Unconstrained Binary Optimization) problem using the method of the Lagrange multiplier. The coefficients of the QUBO function can be represented by a square matrix, which is called the QUBO matrix. Although the Lagrange multiplier needs to be large enough to avoid violating the constraint, it is usually hard to be set appropriately due to the limitation of the bit precision. For example, the latest quantum annealer can handle values represented by only six or fewer bits. Even conventional computing systems cannot control the larger value of the Lagrange multiplier as the number of data points increases. Besides, the execution time for combinatorial clustering increases exponentially as the problem size increases. This is because the time for the QUBO matrix generation is long and a dominant factor of the total execution time when the problem size is large. To solve these problems, this paper proposes combinatorial clustering that overcomes the limitation of the method of the Lagrange multiplier. The proposed method uses a QUBO solver that can externally define the one-hot constraint independent from the objective function, and the externally-defined constraint is satisfied by the operations with multiple bit-flips. As the QUBO function contains only the objective function, the method of the Lagrange multiplier is not necessary. The proposed method can optimize the objective function sufficiently even when the problem size is large. 
    Since the constraint function is not included in the QUBO function, the proposed method also reduces the time for the QUBO matrix generation. The experimental results obtained using artificial and real data show that the proposed method can improve the quality of annealing-based clustering results on all data sets. The experimental results also show that the quality of the proposed method is almost equal to or better than that of quasi-optimal clustering methods such as K-means. The evaluation using the multiple traveling salesman problem shows that the proposed method can obtain shorter tour lengths than the conventional annealing-based clustering in all 13 cases and than K-means++ in 6 out of 12 cases with a significant difference. Furthermore, the proposed method can shorten the execution time for combinatorial clustering because there is no need to calculate the coefficients of the constraint function.
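The conventional formulation that this abstract contrasts itself with, an objective plus a one-hot penalty weighted by a Lagrange multiplier, can be sketched as follows. The objective used here (sum of pairwise distances within each cluster), the variable layout, and all names are assumptions for illustration; the sketch shows the baseline approach, not the proposed method.

```python
from itertools import combinations
from typing import List


def build_qubo(dist: List[List[float]], n_clusters: int, lam: float) -> List[List[float]]:
    """Upper-triangular QUBO matrix for clustering with a one-hot penalty.

    Variable x[i * n_clusters + k] is 1 when point i belongs to cluster k.
    The penalty lam * (sum_k x_ik - 1)^2 is expanded with x^2 = x, and its
    constant term (lam per point) is dropped.
    """
    n = len(dist)
    size = n * n_clusters
    q = [[0.0] * size for _ in range(size)]

    def idx(i: int, k: int) -> int:
        return i * n_clusters + k

    # Objective: pay d_ij whenever points i and j share a cluster.
    for i, j in combinations(range(n), 2):
        for k in range(n_clusters):
            q[idx(i, k)][idx(j, k)] += dist[i][j]
    # One-hot penalty: -lam on the diagonal, +2*lam between clusters of one point.
    for i in range(n):
        for k in range(n_clusters):
            q[idx(i, k)][idx(i, k)] -= lam
        for k, l in combinations(range(n_clusters), 2):
            q[idx(i, k)][idx(i, l)] += 2.0 * lam
    return q


def energy(q: List[List[float]], x: List[int]) -> float:
    """Evaluate x^T Q x over the upper triangle."""
    return sum(q[a][b] * x[a] * x[b] for a in range(len(x)) for b in range(a, len(x)))


positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in positions] for a in positions]
feasible = [1, 0, 1, 0, 0, 1, 0, 1]   # points 0,1 in cluster 0; points 2,3 in cluster 1
violating = [0, 0, 1, 0, 0, 1, 0, 1]  # point 0 is assigned to no cluster

for lam in (20.0, 0.1):
    q = build_qubo(dist, n_clusters=2, lam=lam)
    print(lam, energy(q, feasible), energy(q, violating))
```

With the larger multiplier the feasible assignment has the lower energy, while with the smaller one the constraint-violating assignment wins, which illustrates why the multiplier must be large enough and why limited bit precision makes that difficult.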
  • Haruki ISHIZAKI, Ryohei SAKA, Eitaro KOHNO, Yoshiaki KAKUDA
    2021, Volume 11, Issue 2, pp. 492-515
    Published: 2021
    Released online: 2021/07/08
    Journal, Open Access
    Research on Bluetooth-based mobile ad hoc networks (hereinafter referred to as Bluetooth MANETs) is currently being conducted. As an application of Bluetooth MANETs, a grass-roots disaster information propagation system has been studied for distributing information following disasters. Bluetooth standards include Classic Bluetooth (hereinafter referred to as Classic), which can transfer relatively large data packets, and Bluetooth Low Energy (hereinafter referred to as BLE), which works with relatively low power consumption. So far, a rapid connection establishment method that combines Classic and BLE in a complementary manner (hereinafter referred to as the existing method) and a flooding-based data packet transfer method with delay and disruption tolerance have been proposed. In the existing method, a Bluetooth MANET is constructed by connecting multiple piconets, each of which consists of one master and multiple slaves. In an environment with high terminal density, loops are easily formed in such a Bluetooth MANET because connections are established between slave terminals within a piconet. Therefore, when the flooding-based data packet transfer method is used with the existing method, the number of data packets increases and so does the processing load on the terminals. In this paper, we propose a method that reduces the number of loops by controlling connection establishment between slave terminals in a piconet for Bluetooth MANETs (hereinafter referred to as our proposed method). In addition, we evaluate the effect of our proposed method on data packet transfer through simulation experiments. As a result, we confirmed that our proposed method can reduce the number of loops and the number of data packet transmissions while keeping the time to complete data packet dissemination at the same level as the existing method.
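To make the loop problem in the abstract above concrete, the sketch below counts independent loops in a small topology, treating terminals as graph nodes and connections as edges. The abstract does not say how loops are counted, so the cycle-counting definition, the function name, and the example topology are assumptions for illustration only.

```python
def count_independent_loops(num_nodes, edges):
    """Count edges that close a cycle, using union-find.

    Assumes each connection appears once in `edges`; the result equals
    E - V + (number of connected components).
    """
    parent = list(range(num_nodes))

    def find(a):
        # Follow parent links to the root, halving the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    loops = 0
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            loops += 1  # both ends already connected: this edge adds a loop
        else:
            parent[root_u] = root_v
    return loops

# Hypothetical piconet: node 0 is the master, nodes 1 to 4 are slaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
slave_links = [(1, 2), (3, 4)]

print(count_independent_loops(5, star))
print(count_independent_loops(5, star + slave_links))
```

In this toy case the master-slave star has no loops, and each added slave-to-slave connection contributes one. Under flooding, each such loop causes redundant packet transmissions, which is the load that the abstract's method aims to reduce.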
  • Masaki Furukawa, Tomoya Itsubo, Hiroki Matsutani
    2021, Volume 11, Issue 2, pp. 516-532
    Published: 2021
    Released online: 2021/07/08
    Journal, Open Access
    In distributed deep neural network training using remote GPU nodes, communication occurs iteratively between remote nodes for gradient aggregation. This communication latency limits the benefit of distributed training with faster GPUs. In this paper, we therefore propose to offload the gradient aggregation to a DPDK (Data Plane Development Kit) based network switch between a host machine and remote GPUs. In this approach, the aggregation process is completed in the network using extra computation resources in the network switch and is efficiently overlapped with communication without increasing the workload on remote nodes. The proposed DPDK-based switch supports reliable communication protocols for exchanging gradient data and can handle a part of MPI over TCP-based communication. We evaluate the proposed switch when GPUs and the host communicate using standard IP communication over 40Gbit Ethernet (40GbE), a PCI Express (PCIe) over 40GbE product, and MPI communication over 10GbE, respectively. The evaluation results using standard IP communication show that the aggregation is accelerated by 2.2-2.5x compared to aggregation executed by a host machine. The results using the PCIe over 40GbE product show that the proposed switch outperforms aggregation done by the host machine by 1.16x. The evaluations using MPI communication on a Jetson Xavier cluster show that the proposed switch provides up to 5.5-5.8x faster reduction operations than the conventional method.
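The idea of aggregating gradients inside the network, as described in the abstract above, can be sketched as a small accumulator that sums gradient chunks as they arrive and releases a chunk once every worker has contributed. This is a plain Python illustration, not DPDK code and not the authors' implementation. The class name, the chunking scheme, the duplicate handling, and the use of a sum rather than an average are assumptions.

```python
class ChunkAggregator:
    """Sum gradient chunks from several workers, one chunk id at a time."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        # chunk_id -> (running element-wise sum, ids of workers seen so far)
        self.partial = {}

    def receive(self, worker_id, chunk_id, values):
        """Add one worker's chunk; return the full sum when it is complete."""
        acc, seen = self.partial.setdefault(
            chunk_id, ([0.0] * len(values), set())
        )
        if worker_id in seen:
            return None  # duplicate (e.g. a retransmission): ignore it
        seen.add(worker_id)
        for i, v in enumerate(values):
            acc[i] += v
        if len(seen) == self.num_workers:
            del self.partial[chunk_id]  # free the slot for later iterations
            return acc
        return None

aggregator = ChunkAggregator(num_workers=2)
print(aggregator.receive(0, 0, [1.0, 2.0]))
print(aggregator.receive(1, 0, [3.0, 4.0]))
```

Because the switch forwards only the completed sum, the host receives one aggregated chunk instead of one chunk per worker, which is where the reduction in host-side work would come from in a design of this kind.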
  • Akifumi NOMASAKI, Eitaro KOHNO, Reo MORISHIGE, Yoshiaki KAKUDA
    2021, Volume 11, Issue 2, pp. 533-555
    Published: 2021
    Released online: 2021/07/08
    Journal, Open Access
    Bluetooth MANETs are Bluetooth-based mobile ad hoc networks (MANETs) that consist of mobile terminals such as Bluetooth-enabled smartphones. For Bluetooth MANETs, a method has been proposed that quickly establishes connections over Classic Bluetooth (Classic), which has a wide communication bandwidth, by using Bluetooth Low Energy (BLE) for terminal discovery (hereinafter referred to as the existing method). Bluetooth MANETs are effective as a temporary means of communication in times of disaster because terminals can communicate without communication infrastructure such as base stations. The existing method speeds up the establishment and re-establishment of Classic connections between terminals, which are expected to occur more frequently than with Wi-Fi and other technologies within the communication range of Bluetooth. On the other hand, connections must be established with a large number of terminals in high-density areas, which is likely to happen as the network size increases. However, the existing method does not consider the problem of establishing connections with a large number of terminals in an area with high terminal density. In this paper, we make the following contributions. (1) By conducting experiments on real terminals, we observed the problems that occur when Bluetooth MANETs are applied to a large network, i.e., when the network size of the existing method is increased, and discussed the causes. In addition, we proposed a method to mitigate the problem and verified its effect using actual terminals. We first prepared 50 Android terminals to check the problems that occur when the network size increases, and carefully measured the behavior and problems of the existing method. 
As a result, we found that in an environment with many terminals running the existing method, while a terminal is executing the process of establishing a Classic connection, requests from other terminals to establish a Classic connection (hereinafter referred to as “control packets”) become concentrated on it. Terminals must then execute connection establishment processes with multiple terminals simultaneously (hereinafter referred to as “concentration of control packets”). Due to the specification of the profile used in Bluetooth MANETs, the success rate of connection establishment drops rapidly when multiple connection establishment processes are performed simultaneously. Consequently, it is difficult to establish connections in an environment with many surrounding terminals, which makes it difficult to disseminate data packets. (2) To solve this problem, we propose a method that establishes connections more reliably by controlling the transmission of control packets (hereinafter referred to as “our proposed method”). To evaluate our proposed method, we first investigated the smallest number of terminals that can reproduce the situation in which connection establishment becomes difficult. As a result, we confirmed that when the existing method is run with 6 terminals, the same problem occurs as when 50 terminals are used. Therefore, in this paper, we first implement our proposed method on 6 Android terminals and evaluate its performance. (3) To evaluate the applicability of our proposed method in an environment with many terminals, we increased the number of terminals from 6 and conducted experiments on actual terminals. As a result, we confirmed that our proposed method can be applied even when there are up to 14 terminals in the communication range.