The Fifth International Symposium on Computing and Networking (CANDAR 2017) was held in Aomori, Japan, from November 19th to 22nd, 2017. The organizers of CANDAR 2017 invited authors to submit extended versions of their presented papers. As a result, 27 articles were submitted to this special issue, which includes the extended versions of the 16 papers that were accepted.
This issue owes a great deal to a number of people who devoted their time and expertise to handling the submitted papers. In particular, I would like to thank the guest editors for the excellent review process: Professor Shuichi Ichikawa, Professor Katsunobu Imai, Professor Yasuaki Ito, Professor Yoshiaki Kakuda, Professor Eitaro Kohno, Professor Susumu Matsumae, Professor Toru Nakanishi, Professor Yasuyuki Nogami, Professor Fukuhito Ooshita, Professor Hiroyuki Sato, and Professor Shinya Takamaeda-Yamazaki.
Words of gratitude are also due to the anonymous reviewers who carefully read the papers and provided detailed comments and suggestions to improve the quality of the submitted papers. This special issue would not have been possible without their efforts.
Membrane computing, a computational model based on the activity of living cells, has attracted considerable attention as a new paradigm of computation. In general membrane computing, computationally hard problems have been solved in a polynomial number of steps by using an exponential number of membranes. However, the number of membranes must be reduced to make the P system a more realistic model.
In this paper, we propose an asynchronous P system with branch and bound, a well-known optimization technique, to reduce the number of membranes. The proposed P system solves the satisfiability problem (SAT) with n variables and m clauses, and works in O(m2^n) sequential steps or O(mn) parallel steps.
In addition, the number of membranes used in the proposed P system is evaluated through computational simulation. The experimental results show the validity and efficiency of the proposed P system.
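The branch-and-bound idea underlying this work can be illustrated with a minimal sketch, independent of the membrane-computing formulation defined in the paper itself. The clause encoding and function names below are our own illustration, not the authors' P system: variables are assigned one by one, and any branch in which some clause is already falsified is pruned.

```python
# Illustrative branch-and-bound style SAT search (not the authors' P system).
# Clauses use DIMACS-style literals: 3 means x3, -3 means NOT x3.

def solve(clauses, n, assignment=None):
    assignment = assignment or {}
    # Bound step: prune this branch if some clause has all literals falsified.
    for clause in clauses:
        if all(abs(l) in assignment and assignment[abs(l)] != (l > 0)
               for l in clause):
            return None
    if len(assignment) == n:        # all variables assigned, nothing falsified
        return assignment
    var = len(assignment) + 1       # branch on the next unassigned variable
    for value in (True, False):
        result = solve(clauses, n, {**assignment, var: value})
        if result is not None:
            return result
    return None

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
model = solve([[1, 2], [-1, 3], [-2, -3]], n=3)
```

In the asynchronous P system, the analogous effect is achieved by discarding membranes whose partial assignments can no longer satisfy the formula, which is what keeps the membrane count below the exponential worst case.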
Methods for improving learning accuracy by utilizing multiple data sets with different reliabilities have been studied extensively. Unreliable data sets often include data with incorrect labels, and the accuracy of learning from such data sets suffers as a result. Here, we focus on a learning problem, Taming, which deals with two kinds of data sets with different reliabilities. We propose a label estimation method for use in data sets that include data with incorrect labels. The proposed method is an extension of BaggTaming, which has been proposed as a solution to Taming. We conducted experiments to verify the effectiveness of the proposed method by using a benchmark data set in which the labels were intentionally changed to make them incorrect. We confirmed that learning accuracy could be improved by using the proposed method together with data sets with modified labels.
This paper introduces an optimal representation for right-to-left parallel elliptic curve scalar point multiplication. The right-to-left approach is easier to parallelize than the conventional left-to-right approach. However, unlike for the left-to-right approach, there is still no work considering number representations for the right-to-left parallel calculation. By simplifying the implementation by Robert, we devise a mathematical model to capture the computation time of the calculation. Then, for any arbitrary doubling time and addition time, we propose algorithms to generate representations that minimize the time in that model. As a result, we show the negative result that a conventional representation like NAF is almost optimal: the parallel computation time obtained from any representation cannot be more than 1% better than that obtained from NAF. In addition, we devise a time model and propose an algorithm to generate optimal representations for multi-scalar point multiplication (under a condition). Similar to the result for scalar point multiplication, NAF is almost optimal for multi-scalar point multiplication as well, as the difference in parallel computation time between the optimal representation and NAF is less than 1% in all experimental settings.
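As background for readers unfamiliar with the non-adjacent form (NAF) mentioned above, the following short sketch computes the standard NAF of a positive integer. The function name and least-significant-first digit order are our own choices:

```python
def naf(k):
    """Non-adjacent form of k: digits in {-1, 0, 1}, least significant first,
    with no two adjacent non-zero digits."""
    digits = []
    while k > 0:
        if k % 2 == 1:
            d = 2 - (k % 4)      # d is 1 if k = 1 (mod 4), else -1
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits

# 7 = 8 - 1, so naf(7) is [-1, 0, 0, 1]: the weight drops from three
# non-zero digits (binary 111) to two.
```

NAF minimizes the number of non-zero digits among signed binary representations, and each non-zero digit costs one point addition in scalar multiplication, which is why it serves as the baseline the paper compares against.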
In this paper, we propose a system for modeling fire dynamics with emphasis on realistic behavior as well as an extensive behavioral control system. We propose a wide range of parametric and procedural controls. Flame spread and motion are governed by differential equations that take into account wind forces, buoyancy forces, diffusion, and the velocity of the burning surface. We enhance the realism of the behavior by using stochastic simulation of flickering and buoyant diffusion. The proposed wind fields serve as an additional means of fire motion control for the user. Flame behavior covers moving sources, flickering, fire separation, and fire merging.
In photo-realistic image synthesis, the incoming light from the environment is particularly important. In this work we focus on capturing the incoming light in the form of a radiance map using a common mobile device. This involves reconstructing the spherical panoramic sky-dome in high dynamic range (HDR) quality and saving it in a usable data format. Our setup requires a common smartphone with an additional fish-eye lens attached to the camera. In this paper we discuss the calibration of our setup and the implementation of a selected tone mapping operator (TMO) allowing satisfactory display of HDR images on the mobile device screen. Built-in cameras on mobile devices do not generally capture HDR images. In this work, we describe an algorithm for capturing an HDR image on the Android platform. Using an optimization method, we acquire the camera response curve needed to reconstruct the HDR image from multiple snapshots. The resulting HDR panoramas represent radiance sphere-maps of the incoming light from an environment. These radiance sphere-maps are useful in realistic image synthesis for illuminating objects in 3D scenes.
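The HDR merging step can be illustrated with a simplified sketch. Unlike the paper, which recovers the camera response curve by optimization, this toy assumes an already-linearized response and merges multiple exposures of a single pixel using a hat weighting that down-weights near-dark and near-saturated samples:

```python
def merge_hdr(exposures):
    """Estimate the radiance of one pixel from (value, exposure_time) pairs,
    with pixel values normalized to [0, 1] and a linear camera response
    assumed. Weighted average of value/time with a hat (triangle) weight."""
    num = den = 0.0
    for z, t in exposures:
        w = min(z, 1.0 - z)      # hat weight: trust mid-range samples most
        num += w * (z / t)       # per-sample radiance estimate
        den += w
    return num / den if den else 0.0
```

For example, a pixel reading 0.2 at exposure 0.1 s and 0.5 at 0.25 s yields a radiance estimate of 2.0 from both samples. The real pipeline applies the recovered (generally non-linear) response curve before this averaging step.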
Recently, in large-scale Internet of Things (IoT) systems based on cloud computing, problems such as increased network load, response delays, and invasion of privacy have become a concern. To solve these problems, edge computing has been introduced into IoT systems. However, if cloud functions are excessively migrated to the edge, the collected data cannot be shared between IoT systems, reducing the system's usefulness. In this paper, we propose a multi-agent-based flexible IoT edge computing architecture that balances global optimization by the cloud with local optimization by the edges, and dynamically optimizes the roles of the cloud server and the edge servers. Further, as an application example, we introduce an energy management system based on the proposed edge computing architecture to demonstrate the effectiveness of our proposal.
A thread synchronization mechanism dubbed Spatial Locks for parallel geometric algorithms is presented. We show that Spatial Locks ensure thread synchronization in geometric algorithms that perform concurrent operations on geometric surfaces in two- or three-dimensional space. The proposed technique respects the fact that these operations follow a certain order of processing, i.e., priorities. Parallelizing such geometric algorithms using Spatial Locks requires only simple parameter initialization, rather than modifying the algorithms themselves or their internal data structures. A parallel mesh simplification algorithm is chosen to demonstrate the usefulness of Spatial Locks when parallelizing geometric algorithms with ease on multi-core machines. Experimental results illustrate the advantage of this synchronization mechanism, with significant computational improvement achievable compared to alternative approaches.
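The idea behind spatial locking can be sketched as follows. This is a simplified illustration, not the paper's implementation: space is partitioned into a uniform grid, an operation locks every cell its bounding box overlaps, and cells are always acquired in a fixed order to avoid deadlock.

```python
import threading

class SpatialLocks:
    """Sketch of spatial locking: a uniform grid of cells, each guarded by a
    lock. An operation acquires the locks of every cell its axis-aligned
    bounding box overlaps, in a fixed global order to prevent deadlock."""

    def __init__(self, cell_size, grid_dim):
        self.cell = cell_size
        self.locks = {(i, j): threading.Lock()
                      for i in range(grid_dim) for j in range(grid_dim)}

    def cells_for(self, xmin, ymin, xmax, ymax):
        # All grid cells overlapped by the bounding box, in sorted order.
        return [(i, j)
                for i in range(int(xmin // self.cell), int(xmax // self.cell) + 1)
                for j in range(int(ymin // self.cell), int(ymax // self.cell) + 1)]

    def acquire(self, bbox):
        cells = self.cells_for(*bbox)
        for c in cells:              # fixed order across all threads
            self.locks[c].acquire()
        return cells

    def release(self, cells):
        for c in reversed(cells):
            self.locks[c].release()
```

Two operations whose bounding boxes touch disjoint cell sets then proceed fully in parallel, while overlapping operations serialize on the shared cells only.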
Recently, Wireless Local Area Networks (WLANs) have gained popularity around the world, as users can easily access the Internet through associations with access points (APs) using mobile devices such as smartphones, tablets, and laptops. In a WLAN, the number of users commonly changes over time, and users are not evenly distributed in the field. To optimize the number of active APs and the host associations in the network depending on traffic demands from users, we previously proposed an AP configuration algorithm for the elastic WLAN system. Unfortunately, this algorithm finds a solution only for a fixed user state, although users often join and leave the network repeatedly. In this paper, we propose an extension of the AP configuration algorithm to deal with this dynamic nature. To reflect practical situations, this extension assumes that at most one host may join or leave at a time, that no communicating AP can be suspended, and that no communicating host can change its associated AP. Through numerical experiments using the WIMNET simulator on two network instances, the effectiveness of the proposal is demonstrated. Furthermore, it is implemented in the elastic WLAN system testbed using Raspberry Pi devices as APs. The performance of this implementation is verified through experiments in four scenarios.
The ever-growing demand for compute resources has reached a wide range of application domains, and with that has created a larger audience for compute-intensive tasks. In this paper, we present the CloudCL framework, which empowers users to run compute-intensive tasks without having to face the total cost of ownership of operating an extensive high-performance compute infrastructure. CloudCL enables developers to tap the ubiquitous availability of cloud-based heterogeneous resources using a single-paradigm compute framework, without having to consider dynamic resource management and inter-node communication. In an extensive performance evaluation, we demonstrate the feasibility of the framework, yielding close-to-linear scale-out capabilities for certain workloads.
Performance optimization in the petascale era and beyond into the exascale era has required, and will continue to require, modifications of legacy codes to take advantage of new architectures with large core counts and SIMD units. The Numerical Weather Prediction (NWP) physics codes considered here are optimized using thread-local structures of arrays (SOA). High-level and low-level optimization strategies are applied to the WRF Single-Moment 6-Class Microphysics Scheme (WSM6) and Global Forecast System (GFS) physics codes used in the NEPTUNE forecast code. Building on previous work optimizing WSM6 on the Intel Knights Landing (KNL), it is shown how to further optimize WSM6, GFS physics, and GFS radiation on Intel KNL, Haswell, and potentially on future micro-architectures with many cores and SIMD vector units. The optimization techniques used herein employ thread-local structures of arrays (SOA), an OpenMP directive (OMP SIMD), and minor code transformations to enable better utilization of SIMD units, increase parallelism, improve locality, and reduce memory traffic. The optimized versions of WSM6, GFS physics, and GFS radiation run 70, 27, and 23 times faster (respectively) on KNL and 26, 18, and 30 times faster (respectively) on Haswell than their respective original serial versions. Although this work targets WRF physics schemes, the findings are transferable to other performance optimization contexts and provide insight into the optimization of codes with complex physical models for present and near-future architectures with many cores and vector units.
In this paper, we propose a new method to estimate the traveling direction of a pedestrian carrying a smartphone by using the Inertial Measurement Unit (IMU) and principal component analysis (PCA). Our estimator does not restrict how the smartphone is held. The attitude of the smartphone can be estimated with an IMU filter that makes use of values from the sensors embedded in the device. We estimate the traveling direction of the pedestrian by applying PCA to the output of this filter. However, this method inherently suffers from an indeterminacy problem that allows a direction reversal, because a principal component defines an axis without an orientation. In this paper, we investigate a modified traveling direction estimation method that suppresses the performance degradation caused by this problem. Experimental results show that the proposed estimator outperforms the conventional one.
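The direction-reversal ambiguity can be seen in a small sketch: the principal axis of a 2-D point cloud is defined only up to sign, so some extra cue is needed to orient it. The disambiguation heuristic below (using the net displacement between the first and last samples) is our own illustration, not the authors' method:

```python
import math

def principal_axis(points):
    """Principal axis (unit vector) of 2-D points via the covariance matrix.
    The axis is only defined up to sign: v and -v are equally valid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)   # closed form for 2x2 case
    return (math.cos(theta), math.sin(theta))

def heading(points):
    """Resolve the sign ambiguity by requiring the axis to agree with the
    net displacement from the first to the last sample (illustrative only)."""
    vx, vy = principal_axis(points)
    dx = points[-1][0] - points[0][0]
    dy = points[-1][1] - points[0][1]
    if vx * dx + vy * dy < 0:        # axis points against the walk: flip it
        vx, vy = -vx, -vy
    return (vx, vy)
```

A pedestrian walking in the negative-x direction produces points whose principal axis is (1, 0) up to sign; the displacement check flips it to point the correct way.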
The security of Internet of Things (IoT) devices is one of the most important problems to be addressed by cryptographers and security engineers. Since the processing ability of IoT devices is limited, lightweight yet secure cryptographic tools are necessary to protect them. This paper presents an implementation of 256-bit Elliptic Curve Cryptography (ECC) on an 8-bit microcontroller. The proposed implementation applies a towering technique to an extension field of degree 32 with a certain 8-bit prime characteristic, instead of a 256-bit prime characteristic. This makes it possible to execute 256-bit ECC operations without complicated multiple-precision arithmetic on small computers such as 8-bit microcontrollers, and efficiently realizes the scalability of the ECC encryption strength. In addition, the authors use a twisted Montgomery curve with the Montgomery ladder technique, which enables fast calculations without inversions, referring to Curve25519. The implementation is considered resistant to side-channel attacks (SCA) since it applies the Montgomery ladder technique for scalar multiplication (SCM). This ECC implementation on the Arduino UNO, an 8-bit microcontroller board, can be utilized for a key agreement protocol among IoT devices.
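The Montgomery ladder mentioned above can be sketched generically. In this illustration the group operations are passed in as parameters, and a toy integer group stands in for curve arithmetic; a real Curve25519-style implementation would instead use x-only differential additions and doublings:

```python
def montgomery_ladder(k, point, add, double, identity):
    """Generic Montgomery ladder: computes k * point using exactly one add
    and one double per key bit regardless of the bit's value. This uniform
    operation pattern is what helps resist simple side-channel attacks."""
    r0, r1 = identity, point
    for bit in bin(k)[2:]:           # scan key bits from most significant
        if bit == '0':
            r1 = add(r0, r1)
            r0 = double(r0)
        else:
            r0 = add(r0, r1)
            r1 = double(r1)
    return r0

# Toy stand-in group: integers under addition, so "k * P" is literally k*P.
result = montgomery_ladder(13, 5, add=lambda a, b: a + b,
                           double=lambda a: 2 * a, identity=0)
```

Because both branches perform the same pair of operations, an attacker observing timing or power traces cannot distinguish zero bits from one bits by operation count alone.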
A total loss of thrust poses a major hazard to passengers and aircraft. In such situations, the pilot is forced to perform an emergency landing, making fast and intuitive decisions. During the manoeuvre, the potential energy of the aircraft's altitude is converted into the kinetic energy needed to travel a certain distance over ground. This may enable the aircraft to reach a suitable landing field at a proper altitude. In this paper, we introduce an emergency landing assistant that calculates the flight path from a start to a target position with uniform computational complexity. The main objective is to support the pilot and accelerate the decision-making process. Our path computations take a constant wind into account by moving the target runway contrary to the wind direction. The results show that even with the restriction to co-rotating circles, a high percentage of valid flight paths can be found. Furthermore, for most of the considered start configurations, more than one feasible flight path can be discovered.
Because of the widespread adoption of mobile devices, many applications now provide support for wireless LAN (WLAN). This makes it an important issue to provide good WLAN quality of service (QoS). For this purpose, Dhurandher et al. improved the distributed coordination function (DCF). Although the highest-priority throughput increased with this method, throughput for the other priorities decreased significantly. The authors previously proposed a minimum contention window control method based on the collision history for two (high and low) priorities to improve Dhurandher's method. That method keeps the same average throughput for high-priority traffic as Dhurandher's method yields, but decreases the average total throughput of not only the low-priority but also the high-priority traffic when congestion occurs. To overcome this problem, the method introduced in this paper improves the contention window control by using a priority control based on the number of freezes, defined as the number of times a mobile node has its backoff timer frozen by other nodes. The number of freezes is introduced as a more suitable indicator of congestion and is used to control the contention window for the low priority. Finally, network simulations on two scenarios, using continuous flows or on-off flows for the low-priority traffic, demonstrate that the proposed method retains the average total throughput of both priority frames while reducing the number of freezes of both priorities as well as the packet delay and jitter for high-priority frames, compared with the controls in Dhurandher's method and the previous method. However, the fairness index for low-priority flows shows slight unfairness with the proposed method.
This paper presents a practical implementation of a 2D flood simulation model using hybrid distributed-parallel technologies, including MPI, OpenMP, and CUDA, and evaluates its performance under various configurations that utilize these technologies. The main objective of this research was to improve the computational performance of the flood simulation on a hybrid architecture. Modern desktops and small cluster systems owned by domain researchers are able to perform these simulations efficiently thanks to multicore and GPU computing devices, but their owners often lack the expertise needed to fully utilize software libraries designed to take advantage of the latest hardware. By leveraging knowledge of our experimentation environment, we were able to incorporate MPI and multiple GPU devices to improve performance over a single-process OpenMP version by up to 18x, depending on the size of the input data. We discuss observations that have significant effects on overall performance, including process-to-device mapping, communication strategies, and data partitioning, and present experimental results. The limitations of this work are discussed, and we propose ideas to relieve or overcome them in future work.
Recent state-of-the-art deep reinforcement learning algorithms, such as A3C and UNREAL, are designed to train on a single device with only CPUs. Using GPU acceleration for these algorithms results in low GPU utilization, meaning the full performance of the GPU is not reached. Motivated by the architectural changes made by the GA3C algorithm, which gave A3C better GPU acceleration, together with the high learning efficiency of the UNREAL algorithm, this paper extends GA3C with the auxiliary tasks from UNREAL to create a deep reinforcement learning algorithm, GUNREAL, with higher learning efficiency that also benefits from GPU acceleration. We show that our GUNREAL system achieves 3.8 to 9.5 times faster training than UNREAL and 73% higher training efficiency than GA3C.