The Ninth International Symposium on Networking and Computing (CANDAR 2021) was held virtually from November 24th to 26th, 2021. The organizers of CANDAR 2021 invited authors to submit extended versions of their presented papers. As a result, 18 articles were submitted to this special issue, which includes the extended versions of the 14 papers that were accepted.
This issue owes a great deal to the many people who devoted their time and expertise to handling the submitted papers. In particular, I would like to thank the guest editors for the excellent review process: Professor Akihiro Fujiwara, Professor Ikki Fujiwara, Professor Shuichi Ichikawa, Professor Yasuaki Ito, Professor Noriaki Kamiyama, Professor Eitaro Kohno, Professor Michihiro Koibuchi, Professor Susumu Matsumae, Professor Toru Nakanishi, Professor Ryo Nakamura, Professor Nobuhiko Nakano, Professor Yasuyuki Nogami, Professor Hiroyuki Sato, and Professor Shinya Takamaeda-Yamazaki.
Words of gratitude are also due to the anonymous reviewers, who carefully read the papers and provided detailed comments and suggestions that improved the quality of the submitted papers. This special issue would not have been possible without their efforts.
Membrane computing, also known as the P system, is a computational model inspired by the activity of living cells. Several efficient P systems that work in a polynomial number of steps have been proposed for solving computationally hard problems. However, most of the proposed algorithms use an exponential number of membranes, and the number of membranes must be reduced to make the P system a more realistic model.
In the present paper, we propose an asynchronous P system with a Davis-Putnam-Logemann-Loveland (DPLL) algorithm, that is, a set of rules for efficiently solving the satisfiability problem (SAT), in an attempt to reduce the number of membranes. The proposed P system solves SAT with n variables and m clauses in O(mn²) parallel steps or O(mn²2ⁿ) sequential steps.
We evaluate the number of membranes used in the proposed P system by comparing it with the numbers of membranes used in known P systems. The experimental results demonstrate the validity and efficiency of the proposed P system.
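For readers unfamiliar with DPLL, a minimal sequential sketch (plain Python for illustration only; the paper's contribution lies in expressing such steps as asynchronous P-system rules, which this does not attempt to reproduce) interleaves unit propagation with branching:

```python
def assign(clauses, lit):
    """Simplify the formula under literal lit: drop satisfied clauses,
    remove the opposite literal from the remaining ones."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue                       # clause satisfied, drop it
        out.append([l for l in clause if l != -lit])
    return out

def dpll(clauses, assignment=None):
    """Return a satisfying assignment {var: bool} or None if unsatisfiable.
    Clauses are lists of nonzero ints; a negative int is a negated variable."""
    assignment = dict(assignment or {})
    # Unit propagation: a one-literal clause forces that literal's value.
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    while unit is not None:
        assignment[abs(unit)] = unit > 0
        clauses = assign(clauses, unit)
        unit = next((c[0] for c in clauses if len(c) == 1), None)
    if [] in clauses:
        return None                        # an empty clause means a conflict
    if not clauses:
        return assignment                  # all clauses satisfied
    # Branch on the first literal of the first remaining clause.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        branch = dict(assignment)
        branch[abs(choice)] = choice > 0
        result = dpll(assign(clauses, choice), branch)
        if result is not None:
            return result
    return None
```

The exponential cost lives in the branching step; a parallel P system explores both branches in separate membranes, which is exactly why membrane count becomes the resource to economize.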
Currently, software implementation is the mainstream approach to anti-malware measures. However, software-based anti-malware measures are difficult to implement on Internet of Things devices with limited hardware resources. To solve this problem, a malware detection mechanism that can be realized entirely in hardware has been proposed. The mechanism consists of three elements: an access-hit counter, dividers, and a classifier. The classifier is generated by a random forest and uses processor information as feature values. To reduce the hardware scale, a Hit Rate Table (HRTable) is introduced in place of the dividers. We propose methods for reducing the scale of the hardware resources and for synchronizing the CPU with the malware detection mechanism. This paper implements the proposed mechanism in hardware, simulates it while accounting for the delay caused by input/output to the HRTable, and evaluates the hardware scale of the proposed mechanism combined with RISC-V on a field-programmable gate array (FPGA).
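The idea of replacing dividers with a lookup table can be sketched as follows (an illustrative software model only; the counter width, scaling, and table organization here are assumptions, not the paper's actual design):

```python
# Hardware dividers are expensive; precomputing every possible hit-rate value
# into a ROM turns the runtime division hits/accesses into a table lookup.

BITS = 4                      # assumed counter width: values 0..15
SCALE = 100                   # hit rate stored as an integer percentage

# Precompute rate[hits][accesses] once; hardware would store this as a ROM.
HR_TABLE = [[(h * SCALE) // a if a else 0 for a in range(1 << BITS)]
            for h in range(1 << BITS)]

def hit_rate(hits, accesses):
    """Divider-free hit-rate lookup, as a ROM access would be in hardware."""
    return HR_TABLE[hits][accesses]
```

The trade-off is ROM area versus divider logic; with narrow counters the table stays small, which is consistent with the paper's goal of shrinking the hardware scale.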
The convolutional neural network (CNN) is a state-of-the-art technique in machine learning and has achieved high accuracy in many computer vision applications. The number of parameters in CNN models is growing rapidly in pursuit of accuracy, which demands more computation time and memory for both training and inference. As a result, reducing model size and improving inference speed have become critical issues for CNNs. This paper focuses on filter pruning and on optimization specific to the NVIDIA sparse tensor core. Filter pruning is a model compression technique that evaluates the importance of the filters in a CNN model and removes the less critical ones. The NVIDIA sparse tensor core is special hardware for CNN computation introduced with the NVIDIA Ampere GPU architecture; it can speed up a matrix multiplication if the matrix follows a 2:4 sparsity pattern (two nonzero values in every group of four consecutive elements).
This paper proposes hybrid pruning, which combines filter pruning and 2:4 pruning, to compress CNN models. We first apply filter pruning to remove redundant filters and reduce the model size. Next, we use 2:4 pruning to prune the model according to the 2:4 pattern so that the sparse tensor core hardware can be exploited for speedup. For this hybrid pruning scenario, we also propose two hybrid metrics for deciding a filter's importance during filter pruning. The hybrid ranking metrics preserve the filters essential to both pruning steps and thereby achieve higher accuracy than traditional filter pruning. We test our hybrid pruning algorithm on the MNIST, SVHN, and CIFAR-10 datasets using AlexNet. Our experiments show that our hybrid metrics achieve better accuracy than the classic L1-norm metric and the output L1-norm metric. When we prune away 40 percent of the filters in the model, our methods achieve 2.8% to 3.3%, 2.9% to 3.5%, and 2.5% to 2.7% higher accuracy than those two metrics on the three datasets, respectively. We also evaluate the inference speed of the model produced by hybrid pruning against models produced by filter pruning or 2:4 pruning alone, and find that a hybrid pruning model runs up to 1.3x faster than the traditional filter pruning model with similar accuracy.
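The 2:4 pruning step can be illustrated with a short NumPy sketch (an illustrative reimplementation, not the paper's code): in every group of four consecutive weights, the two smallest-magnitude values are zeroed, yielding exactly the structure the sparse tensor core accelerates.

```python
import numpy as np

def prune_2_4(weights):
    """Apply a 2:4 sparsity pattern: in each group of four consecutive
    weights, keep the two largest-magnitude values and zero the other two.
    The flattened length must be a multiple of 4."""
    w = np.array(weights, dtype=float)     # copy so the input is untouched
    groups = w.reshape(-1, 4)
    # Indices of the two smallest-magnitude entries in each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(w.shape)
```

Filter pruning removes whole output channels first; a 2:4 pass like this is then applied to the surviving weights, which is why a ranking metric that anticipates both steps helps preserve accuracy.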
Recently, the demand for computation-intensive applications such as multimedia and AI has increased. Data-parallel execution units are typically used for the calculations in these applications to increase computational throughput. However, the data required for a computation may need to be accessed at discontinuous memory addresses, which can reduce computation efficiency.
Generally, normal memory access instructions access blocks of contiguously allocated memory addresses containing both data that is valid for the computation and data that is not. These access patterns result in low computation density in the data-parallel execution units, wasting computation resources. Simply increasing the number of data-parallel execution units therefore increases the wasted computational resources, which becomes a significant issue in embedded systems with multiple resource limitations. Improving computational efficiency is essential for performing practical computation in such systems.
This paper introduces a Data Rearrange Unit (DRU), which gathers and rearranges valid computation data between main memory and the execution units. The DRU improves the performance of multimedia and AI applications by significantly reducing the access rate to and from main memory and increasing computation efficiency. It is applicable to most hardware architectures, and its effectiveness can be further enhanced by an execution unit interface that connects the DRU directly to the execution unit. We demonstrate the effectiveness of our DRU by implementing it on the RMTP SoC, improving convolution throughput on a data-parallel execution unit by a maximum of 94 times while increasing the total cell area by only about 12.7%.
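The gather-and-rearrange idea can be shown conceptually (sizes and strides below are illustrative assumptions, not the DRU's actual parameters): valid values sit at discontinuous addresses, so a plain block load fills most lanes with data that is invalid for the computation, while gathering the valid values into a contiguous buffer lets the execution unit run at full density.

```python
import numpy as np

memory = np.arange(32)            # stand-in for main memory contents
valid_idx = np.arange(0, 32, 4)   # assume one valid word every 4 addresses

block_load = memory[:32]          # 32 words fetched, only 8 valid: 25% density
gathered = memory[valid_idx]      # 8 contiguous valid words: 100% density
```

In hardware the gather happens on the path between memory and the execution unit, so the data-parallel lanes never see the invalid words at all.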
Pairings on elliptic curves, consisting of the Miller loop and the final exponentiation, are used for innovative protocols such as ID-based encryption and group signature authentication. Given the recent progress of attacks on the discrete logarithm problem in the finite fields in which pairings are defined, the importance of using curves with prime embedding degree k has increased. In this paper, the authors provide formulas for constructing algorithms that compute the final exponentiation for cyclotomic families of curves with any prime k. Since the formulas give rise to one of the same exponents produced by a lattice-based method for small values of k, the proposed algorithms are expected to be efficient enough for any prime k. At least for the curves with k = 13 and 19 for pairings at the 128-bit security level, the proposed algorithms achieve state-of-the-art computations.
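As background on the final exponentiation itself (the standard textbook decomposition, not the paper's specific formulas): for a pairing value $f \in \mathbb{F}_{q^k}$ with prime embedding degree $k$, the exponent factors through the $k$-th cyclotomic polynomial,

```latex
\frac{q^k - 1}{r} \;=\; (q - 1)\cdot\frac{\Phi_k(q)}{r},
\qquad
\Phi_k(q) = \frac{q^k - 1}{q - 1} \ \text{ for prime } k,
```

so the computation splits into an easy part $f \mapsto f^{\,q-1}$ (one Frobenius map and one inversion) and a hard part with exponent $\Phi_k(q)/r$; finding efficient addition chains for that hard part is where constructions like the lattice-based method and the formulas above apply.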
In recent years, machine learning technology has been developing rapidly. In particular, neural network technology has had a great impact on fields such as image recognition and natural language processing, and convolutional neural networks (CNNs) have been effective in image recognition. However, most of these CNNs learn only the features of an image and do not learn the image's meta-information. In this study, we propose a CNN, and an ensemble method based on it, that can learn meta-information by using out-category penalties. Experiments were conducted on the CIFAR-10 and CIFAR-100 datasets, and the results show that the proposed method achieves high accuracy and a small out-category error in both single and ensemble models.
The demand for inspections has increased with the aging of concrete structures and tiled wall surfaces. The hammering test is a simple inspection method, but the inspector needs considerable experience to distinguish the hammering sounds. Therefore, we developed a device that automatically classifies hammering sounds using deep learning. However, the GPU-equipped hardware used for deep learning was a single-board computer, which has no display, battery, or input device, so the inspector could not change settings or check the status during the hammering test. We therefore used a smartphone instead of the single-board computer, together with a cloud GPU, since a smartphone does not have a GPU suitable for deep learning. The results showed that communication time was the bottleneck, so we considered using a 5G network and compared the classification time, training time, and battery life of the smartphone. As a result, although the training time remained the same, the classification speed was 1.46 times faster than with the conventional method, and the smartphone's battery life was sufficient.
A true random number generator (TRNG) is suitable for generating secure keys and nonces. TRNGs implemented in IoT devices must be small in scale, have low power consumption, and be feasible to implement. The random number sequence generated by a TRNG needs to have high entropy both immediately after startup and in the stable state. This paper implements three types of ring-oscillator-based TRNGs, TERO-based, COSO-based, and STR-based, on a Zynq-7010. When these TRNGs are implemented with a single entropy source, implementation is challenging because the layout and wiring must be evaluated for each FPGA. This paper evaluates a TRNG configuration that exclusive-ORs (XORs) the outputs of multiple entropy sources. We show that this configuration can reduce the implementation difficulty and realize high entropy. For the evaluation of the random number sequences, we use the statistical tests of NIST SP 800-90B, NIST SP 800-22, and BSI AIS 20/31. In addition, the random number sequences immediately after startup are also statistically evaluated. As a result, our evaluated TRNGs generate high-entropy random numbers and are easy to implement on an FPGA when built from eight single entropy sources for the TERO-based TRNG, 48 for the COSO-based TRNG, and two for the STR-based TRNG, respectively.
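The XOR-combining construction can be sketched in a few lines (an illustrative software model; real TRNG outputs come from hardware oscillators, not Python lists). Its appeal is that XOR cannot make the combined stream more biased: if any one source is unbiased and independent of the others, the XOR of all of them is unbiased.

```python
from functools import reduce
from operator import xor

def combine(bit_streams):
    """XOR corresponding bits from several entropy-source streams into one.
    bit_streams is a list of equal-length lists of 0/1 values."""
    return [reduce(xor, bits) for bits in zip(*bit_streams)]
```

For example, combining two streams `[1, 0, 1]` and `[1, 1, 0]` yields `[0, 1, 1]`; the same function handles eight, 48, or two sources unchanged.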
Field-programmable gate arrays (FPGAs) are now used in a wide range of application fields, including aerospace, medical, and industrial infrastructure systems, where not only soft errors but also common cause faults must be treated in system design. Although heterogeneous redundant design is preferable in such fields, it tends to place a large burden on system designers. Even with high-level synthesis (HLS) technologies, which have enabled productive design processes without register transfer level (RTL) descriptions, an efficient approach to redundant design is not always clear. In this paper, we present and evaluate two heterogeneous redundant circuit design approaches for FPGAs: a resource-level approach and a strategy-level approach. The resource-level approach focuses on diversity in FPGA technology mapping. For example, two different implementations of a multiplier, one with look-up tables (LUTs) and the other with digital signal processing (DSP) blocks, can form heterogeneous redundancy. The strategy-level approach makes use of the optimization options offered by an FPGA design tool, called strategies, as a source of diversity. For example, two different implementations can be derived from the same hardware description with area-oriented optimization and performance-oriented optimization. With both approaches, heterogeneous implementation variants can be generated from the same hardware description code. For evaluation, we implemented homogeneous and heterogeneous redundant designs for proportional-integral-derivative (PID) control with these approaches and evaluated their error detection capability and reliability with overclocking simulation. The resource-level approach yielded heterogeneous redundant designs with a high error detection rate in both RTL and HLS implementations of an application-level circuit.
Although the detection rate of the strategy-level approach was not as high as that of the resource-level one, it was shown to have a certain diversification effect.
Mobile ad-hoc networks (MANETs) have limited network resources such as node battery capacity and wireless frequency bands. Therefore, to ensure efficient broadcast communication in such networks, it is critical to reduce unnecessary packet transfers and make the most of network resources and power. Network coding (NC) is a technique for reducing the number of packets transmitted over a network and maximizing the use of devices and network resources. The packet-reduction benefit obtained with NC depends considerably on the network topology. In this study, we propose an NC method designed specifically for MANETs, in which nodes switch between uncoded transmission (simple broadcast) and coded transmission (NC) based on the estimated proximity of neighboring nodes. The effectiveness of the proposed method was evaluated using computer simulations, which demonstrated that it can reduce the number of packets in the network without decreasing the packet delivery rate.
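The packet-saving mechanism at the heart of NC can be illustrated with the textbook two-neighbor relay example (a generic sketch, not the paper's protocol): a relay that has received packets A and B broadcasts A XOR B once instead of forwarding A and B separately, and each neighbor recovers its missing packet by XORing with the one it already holds.

```python
def xor_packets(a, b):
    """Bitwise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

pkt_a, pkt_b = b"hello", b"world"
coded = xor_packets(pkt_a, pkt_b)        # one coded broadcast, not two sends

recovered_b = xor_packets(coded, pkt_a)  # neighbor holding A recovers B
recovered_a = xor_packets(coded, pkt_b)  # neighbor holding B recovers A
```

The saving only materializes when neighbors actually hold complementary packets, which is why the benefit depends on topology and why proximity-based switching between coded and uncoded transmission makes sense.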
This paper proposes a post-filtering system on FPGA for improving the quality of depth maps for 3D projection. We implement the Weighted Least Squares (WLS) filter, which can predict disparities that cannot be measured directly by using the values of neighboring pixels, on a field-programmable gate array (FPGA). In our design, we optimized the architecture of the WLS filter at the algorithm level and used a high-level synthesis (HLS) description for hardware acceleration. To break through the bottleneck caused by the limited memory resources on the FPGA, we used the on-board UltraScale Architecture Memory Resources (URAM) and reduced the consumption of Block RAMs (BRAM) from 140% to 80%. With this approach, we improve the quality of the depth map on the FPGA board named M-KUBOS. The WLS filter smooths the depth map at a 130 MHz operating frequency on M-KUBOS at 10 frames per second (FPS), a 76.43% performance acceleration over software execution on an ARM Cortex-A9 at a 1.2 GHz clock.
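As background on the filter itself, a one-dimensional WLS smoothing step can be sketched as follows (a dense, illustrative solver; the paper's FPGA design targets 2D depth maps with edge-aware weights derived from a guide image, and would not use a dense solve):

```python
import numpy as np

def wls_1d(d, lam=1.0, w=None):
    """Minimize sum_i (u_i - d_i)^2 + lam * sum_i w_i (u_i - u_{i+1})^2
    by solving the linear system (I + lam * L) u = d, where L is a
    weighted graph Laplacian over neighboring samples."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    w = np.ones(n - 1) if w is None else np.asarray(w, dtype=float)
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += w[i]; L[i + 1, i + 1] += w[i]
        L[i, i + 1] -= w[i]; L[i + 1, i] -= w[i]
    return np.linalg.solve(np.eye(n) + lam * L, d)
```

Setting `lam=0` returns the input unchanged, while larger `lam` spreads isolated disparity values into their neighborhood, which is how unmeasured disparities get filled from neighboring pixels.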
We have developed an ultra-fast sequence similarity search tool named PZLAST on the MIMD processor PEZY-SC2. In this paper, we show the merit of MIMD features in reducing the load imbalance among threads; they are also effective for implementing the alignment of long sequences. Additionally, we point out two problems related to implementation on an ultra-parallel computation accelerator: (1) deciding the optimal amount of input prior to the run is extremely difficult and usually even impossible, and (2) maintaining parallelism efficiently throughout the whole computation is not always possible. A feedback strategy and an accumulation strategy are proposed to overcome these problems, and their results are shown to be valuable in reducing accidental memory overflow at runtime and in speeding up the processing time.
A network intrusion detection system (NIDS) is the most widely used tool for detecting malicious network activities. With the adoption of deep learning, NIDSs have achieved promising results in recent years for detecting both known and novel attacks. However, these NIDSs still have shortcomings. Most of the datasets used for NIDS are highly imbalanced: the number of samples belonging to normal traffic is much larger than that of attack traffic. This class imbalance skews the results and limits a deep learning classifier's performance on minority classes by biasing the classifier in favor of the majority class. To improve the detection rate for minority classes while ensuring efficiency, this study proposes a hybrid approach to handling the imbalance problem: a combination of oversampling with the Synthetic Minority Over-sampling Technique (SMOTE) and Tomek links, an under-sampling method that reduces noise. Additionally, this study uses two deep learning models, the Long Short-Term Memory network (LSTM) and the Convolutional Neural Network (CNN), to provide a better intrusion detection system. The advantage of our proposed model is tested on the NSL-KDD and CICIDS2017 datasets, and we additionally evaluate the method on the most recent intrusion detection dataset, CICIDS2018. We use 10-fold cross-validation to train the learning models and an independent test set for evaluation. The experimental results show that in multi-class classification on the NSL-KDD dataset, the proposed model reached an overall accuracy and F-score of 99% and 99.02%, respectively, with LSTM, and an overall accuracy and F-score of 99.70% and 99.27%, respectively, with CNN. On CICIDS2017, it achieved an overall accuracy and F-score of 99.65% and 98%, respectively, with LSTM, and an overall accuracy and F-score of 99.85% and 98.98%, respectively, with CNN. On CICIDS2018, the proposed method achieved an overall detection rate and F-score of 95% and 94%, respectively.
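The oversampling half of the hybrid approach can be sketched as follows (a simplified SMOTE-style interpolation in NumPy; the actual study pairs SMOTE with Tomek-link under-sampling, which is omitted here): synthetic minority samples are placed on the line segment between a minority point and one of its minority-class nearest neighbors.

```python
import numpy as np

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples by interpolating each chosen
    minority point toward one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)   # distances within the class
        nn = np.argsort(d)[1:k + 1]            # skip the point itself
        j = rng.choice(nn)
        gap = rng.random()                     # interpolation factor in [0, 1)
        out.append(X[i] + gap * (X[j] - X[i]))
    return np.array(out)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the minority class region rather than being naive duplicates, which is what helps the classifier stop ignoring the minority classes.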
We propose a group signature scheme with a function of designated traceability: each opener has attributes, and the signer of a group signature can be traced only by openers whose attributes satisfy the Boolean formula designated by the signer. We describe the syntax and security definitions of the scheme, and then give a generic construction of the scheme employing a ciphertext-policy attribute-based encryption scheme.