The 23rd Workshop on Advances in Parallel and Distributed Computational Models (APDCM), held in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS) on May 17-21, 2021, aims to provide a timely forum for the exchange and dissemination of new ideas, techniques, and research in the field of parallel and distributed computational models.
The APDCM workshop has a history of attracting participation from renowned researchers worldwide. The program committee encouraged the authors of accepted papers to submit full versions of their manuscripts to the International Journal of Networking and Computing (IJNC) after the workshop. After a thorough reviewing process with extensive discussions, seven articles on various topics have been selected for publication in the IJNC special issue on APDCM.
On behalf of the APDCM workshop, we would like to express our appreciation for the considerable efforts of the reviewers who reviewed the papers submitted to the special issue. Likewise, we thank all the authors for submitting their excellent manuscripts to this special issue. We also express our sincere thanks to the editorial board of the International Journal of Networking and Computing, in particular to the Editor-in-Chief, Professor Koji Nakano. This special issue would not have been possible without his support.
In this paper, we consider the gathering problem of seven autonomous mobile robots on triangular grids. The gathering problem requires that, starting from any connected initial configuration, i.e., one in which the subgraph induced by all robot nodes (nodes where a robot exists) is connected, the robots reach a configuration in which the maximum distance between two robots is minimized. For seven robots, gathering is achieved when one robot has six adjacent robot nodes, so that the robots form a hexagon. We aim to clarify the relationship between the capability of robots and the solvability of the gathering problem on a triangular grid. In particular, we focus on the visibility range of robots. To discuss the solvability of the problem in terms of the visibility range, we make strong assumptions about every capability other than the visibility range. Concretely, we assume that robots are fully synchronous and that they agree on the direction and orientation of the $x$-axis and on chirality on the triangular grid. In this setting, we first consider the weakest assumption about visibility range, i.e., robots with visibility range 1. In this case, we show that there exists no collision-free algorithm to solve the gathering problem. Next, we extend the visibility range to 2. In this case, we show that our algorithm can solve the problem from any connected initial configuration. Thus, the proposed algorithm is optimal in terms of visibility range.
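The connectivity and gathering conditions stated in this abstract can be checked mechanically. The following is a minimal sketch, assuming axial coordinates (q, r) for the nodes of the triangular grid; the coordinate system and all function names are illustrative choices, not taken from the paper, and the sketch does not implement the gathering algorithm itself.

```python
from collections import deque

# Six neighbor offsets of a node on a triangular grid, written in
# axial coordinates (an assumption of this sketch).
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def is_connected(robots):
    """Return True if the subgraph induced by the robot nodes is connected."""
    robots = set(robots)
    if not robots:
        return True
    start = next(iter(robots))
    seen = {start}
    queue = deque([start])
    while queue:
        q, r = queue.popleft()
        for dq, dr in NEIGHBORS:
            nxt = (q + dq, r + dr)
            if nxt in robots and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == robots


def is_gathered(robots):
    """Return True if seven robots form a hexagon:
    one robot has robots on all six adjacent nodes."""
    robots = set(robots)
    if len(robots) != 7:
        return False
    return any(
        all((q + dq, r + dr) in robots for dq, dr in NEIGHBORS)
        for q, r in robots
    )


# Hypothetical configurations used only for illustration.
hexagon = [(0, 0)] + NEIGHBORS           # gathered configuration
line = [(i, 0) for i in range(7)]        # connected but not gathered
```

Here `line` is a valid initial configuration (connected) that is not yet gathered, while `hexagon` is the target shape.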
This paper revisits distributed termination detection algorithms in the context of High-Performance Computing (HPC) applications. We introduce an efficient variant of the Credit Distribution Algorithm (CDA) and compare it to the original algorithm (HCDA) as well as to its two primary competitors: the Four Counters algorithm (4C) and the Efficient Delay-Optimal Distributed algorithm (EDOD). We analyze the behavior of each algorithm for some simplified task-based kernels and show the superiority of CDA in terms of the number of control messages. We then compare implementations of these algorithms on top of a task-based runtime system, PaRSEC, and show the advantages and limitations of each approach in a real implementation.
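As background for credit-based termination detection, the sketch below simulates the classical credit-recovery idea: the computation starts with a total credit of 1, each task splits its credit among the tasks it spawns, and termination is detected once the full credit has been returned. This is a simplified, sequential illustration under assumed parameters (`max_depth`, `fanout`); it is not the CDA variant introduced in the paper, nor HCDA, 4C, or EDOD.

```python
from collections import deque
from fractions import Fraction


def simulate_credit_recovery(max_depth=3, fanout=2):
    """Simulate a task tree and return (recovered_credit, control_messages).

    Each task at depth < max_depth spawns `fanout` children and shares its
    credit equally with them; every finished task returns the credit it
    still holds to a (hypothetical) detector in one control message.
    """
    recovered = Fraction(0)
    control_messages = 0
    pending = deque([(0, Fraction(1))])  # (depth, credit held by the task)
    while pending:
        depth, credit = pending.popleft()
        if depth < max_depth:
            share = credit / (fanout + 1)
            for _ in range(fanout):
                pending.append((depth + 1, share))
            credit = share  # the parent keeps one share for itself
        recovered += credit
        control_messages += 1
        if pending:
            # Outstanding tasks still hold credit, so the detector
            # cannot yet observe the full credit of 1.
            assert recovered < 1
    return recovered, control_messages


recovered, messages = simulate_credit_recovery()
terminated = (recovered == 1)
```

Exact fractions are used so that the termination test `recovered == 1` is not affected by floating-point rounding.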
With the advent of exascale computing, issues such as application irregularity and permanent hardware failure are growing in importance. Irregularity is often addressed by task-based parallel programming implemented with work stealing. At the task level, resilience can be provided by two principal approaches, namely checkpointing and supervision. For both, particular algorithms have been worked out recently. They perform local recovery and continue the program execution on a reduced set of resources. The checkpointing algorithms regularly save task descriptors explicitly, while the supervision algorithms exploit their natural duplication during work stealing and may be coupled with steal tracking to minimize the number of task re-executions. Thus far, the two groups of algorithms have been targeted at different task models: checkpointing algorithms at dynamic independent tasks, and supervision algorithms at nested fork-join programs. This paper transfers the most advanced supervision algorithm to the dynamic independent tasks model, thus enabling a comparison between checkpointing and supervision. Our comparison includes experiments, running time predictions, and simulations of job set executions. Results consistently show typical resilience overheads below 1% for both approaches. The overheads are lower for supervision in practically relevant cases, but checkpointing takes the lead when the number of processes is on the order of millions.
We investigate terminating grid exploration for autonomous myopic luminous robots. Myopic robots can observe nodes only within a certain fixed distance, and luminous robots have light devices that can emit colors. First, we prove that, in the semi-synchronous and asynchronous models, three myopic robots are necessary to achieve terminating grid exploration if the visible distance is one. Next, we give fourteen algorithms for terminating grid exploration under various assumptions on synchrony (fully-synchronous, semi-synchronous, and asynchronous models), visible distance, the number of colors, and chirality. Six of them are optimal in terms of the number of robots.
In self-organizing distributed systems in which there is no centralized controller, cooperation of processes and fault tolerance are crucial. The former can be formalized by process synchronization, which is one of the fundamental problems in concurrent, parallel, and distributed computing. The latter can be formalized by self-stabilization. Self-stabilizing distributed algorithms are a class of fault-tolerant distributed algorithms that tolerate a finite number of transient faults of any kind. Such an algorithm can be considered a self-organizing system because it needs neither a globally synchronized initialization nor a reset, and the system automatically converges to some legitimate configuration.
In this paper, we propose a self-stabilizing distributed algorithm for a token ring with graceful handover on bidirectional ring networks with the message-passing communication model. The motivation of this work is to design, by a formal approach, a protocol that is useful for a self-organizing multi-node security camera system that guarantees continuous observation. More specifically, the system consists of several nodes, each of which is equipped with a video camera; some of the nodes are active in monitoring, and the others are inactive to save energy. The problem is to design an algorithm with graceful handover of active nodes. That is, at least one node is active at any time; in other words, there is no time instant at which no node is active. This problem is formalized as the mutual inclusion problem, which is a process synchronization problem requiring that at least one process is in the critical section. To this end, we propose an algorithm for circulating two tokens on bidirectional ring networks under the state-reading model by extending Dijkstra’s self-stabilizing token ring. We also propose the concept of the model gap tolerance property for graceful handover. The proposed algorithm is self-stabilizing, and it guarantees graceful handover in message-passing distributed systems.
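For readers unfamiliar with the starting point of this work, the following minimal sketch simulates the classical Dijkstra K-state self-stabilizing token ring that the abstract refers to. It assumes a unidirectional ring, the state-reading model, and a central scheduler that activates one privileged process per step; it does not implement the two-token, bidirectional, message-passing algorithm proposed in the paper, and the function names are illustrative.

```python
import random


def privileged(x):
    """Return the indices of processes that currently hold a privilege (token)."""
    n = len(x)
    tokens = [0] if x[0] == x[n - 1] else []
    tokens += [i for i in range(1, n) if x[i] != x[i - 1]]
    return tokens


def step(x, i, k):
    """Execute the move of privileged process i (states are in 0..k-1)."""
    if i == 0:
        x[0] = (x[0] + 1) % k   # the distinguished process increments
    else:
        x[i] = x[i - 1]         # every other process copies its predecessor


def stabilize(x, k, steps, seed=0):
    """Run `steps` moves from an arbitrary state, choosing one privileged
    process per step (central scheduler, an assumption of this sketch)."""
    rng = random.Random(seed)
    for _ in range(steps):
        tokens = privileged(x)
        # At least one process is always privileged in this ring.
        assert tokens
        step(x, rng.choice(tokens), k)
    return x


# Arbitrary (possibly corrupted) initial state with n = 5 processes, K = 6.
state = stabilize([3, 1, 4, 1, 2], k=6, steps=500)
```

After convergence exactly one process is privileged, and the privilege circulates around the ring. The property that some process is privileged at every step is the one that relates to mutual inclusion.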
This article describes techniques to model the performance of heterogeneous iterative programs, which can execute on multiple device types (CPUs and GPUs). We classify iterative programs into two categories, static and dynamic, based on their workload distributions. Methods are described to model their performance on multi-device machines using linear regression. Experiments were designed to capture the behavioral response of different types of iterative programs to variations in regression variables. Experiments were divided into two sets: training and validation. Training sets were used to train and compare performance models, and validation sets were used to determine the accuracy of predictions. Performance models were developed for the execution time and the energy consumption of programs.
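The training/validation workflow described in this abstract can be illustrated with a one-variable least-squares fit. This is a minimal sketch: the regression variable (iteration count), the synthetic timings, and the error metric are all assumptions made for illustration and are not the models or data of the article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_pct_error(model, xs, ys):
    """Mean absolute percentage error of the model on (xs, ys)."""
    slope, intercept = model
    errors = [abs((slope * x + intercept - y) / y) for x, y in zip(xs, ys)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical, noise-free measurements: time = 2.0 + 0.5 * iterations.
train_iters = [100, 200, 400, 800]
train_time = [2.0 + 0.5 * n for n in train_iters]
valid_iters = [300, 600]
valid_time = [2.0 + 0.5 * n for n in valid_iters]

model = fit_line(train_iters, train_time)                     # training set
error = mean_abs_pct_error(model, valid_iters, valid_time)    # validation set
```

With real measurements the validation error would be nonzero, and the same procedure could be repeated with energy consumption as the response variable.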
Interest in Deep Neural Networks has increased dramatically, especially for Computer Vision and Natural Language Processing tasks. Because the architecture of a neural network heavily influences its predictive accuracy, Neural Architecture Search has gained much attention in recent years. Neural Architecture Search typically comes with a high computational demand and thus requires scalability as well as high availability to ensure no data loss or waste of computational power. Hence, we developed a scalable and highly available multi-objective Neural Architecture Search and adapted it to the modern approach to application development by subdividing an existing monolithic approach, based on a Genetic Algorithm, into microservices. Moreover, we adjusted the initial population creation by mutating each individual 1,000 times, extended the approach with inception layers, implemented it as an island model, and achieved 99.75%, 94.35%, and 89.90% test accuracy on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, respectively. Furthermore, we analyzed nine different configurations of the Genetic Algorithm, with only one subpopulation, to identify well-performing settings. In addition, our model is strongly focused on high availability, enabled by deployment in our bare-metal Kubernetes cluster. Our results show that the introduced Neural Architecture Search can handle and recover from the exceptional loss of Kubernetes pods within seconds, without human interaction and with no loss of results or of the algorithm's state.
The local function of a cellular automaton with binary states can be expressed by a formula in propositional logic. If a local function is that of a reversible cellular automaton, its inverse function can also be expressed as a propositional logic formula, and this formula can in turn be used as a local function to define a cellular automaton. The multiplication of these formulae in propositional logic yields the local function of the composition of two cellular automata.
In this paper, we consider logical formulae on a commutative monoid as the local functions of n-dimensional cellular automata. We discuss the commutativity of the multiplication of formulae and show some conditions for formulae to satisfy the commutativity of the composition of n-dimensional cellular automata.
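The composition of local functions discussed in this abstract can be made concrete for one-dimensional binary cellular automata. The sketch below is an illustration only: the three example rules, the radius-1 neighborhoods, and the helper names are assumptions of this sketch, and it works with ordinary Boolean operations rather than with formulae on a general commutative monoid.

```python
from itertools import product


# Example local functions on a neighborhood (left, center, right).
def rule90(a, b, c):
    return a ^ c          # XOR of the two neighbors


def shift_left(a, b, c):
    return c              # each cell copies its right neighbor


def shift_right(a, b, c):
    return a              # inverse of shift_left


def and_left(a, b, c):
    return a & b          # AND of left neighbor and center


def compose(g, f):
    """Local function (radius 2) of the automaton 'apply f, then g'."""
    def h(a, b, c, d, e):
        return g(f(a, b, c), f(b, c, d), f(c, d, e))
    return h


def identity(a, b, c, d, e):
    return c


def same_function(h1, h2):
    """Compare two radius-2 local functions on all 32 binary inputs."""
    return all(h1(*bits) == h2(*bits) for bits in product((0, 1), repeat=5))


inverse_ok = same_function(compose(shift_right, shift_left), identity)
commute = same_function(compose(rule90, shift_left), compose(shift_left, rule90))
no_commute = not same_function(compose(rule90, and_left), compose(and_left, rule90))
```

In this sketch, composing a reversible rule with its inverse yields the identity, the shift commutes with rule 90, and rule 90 does not commute with the AND-based rule.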
This paper implements an execution unit that generates grid square codes from latitude and longitude on a RISC-V processor and evaluates its performance. In recent years, statistical analysis using grid square codes has attracted attention. Although grid square codes are obtained from latitude and longitude by means of several equations, this calculation takes a long time because it requires many floating-point instructions.
In this paper, an execution unit which generates grid square codes from latitude and longitude is designed, and the instruction which generates grid square codes by using the unit is implemented on a RISC-V processor. The proposed execution unit can generate the corresponding grid square code from latitude and longitude in one cycle.
As a benchmark, we use a program that counts the number of times randomly generated latitude and longitude pairs match a specified grid square code. Experimental results show that the in-order RISC-V processor with the proposed unit, implemented on an FPGA, achieves a 36.6% reduction in execution time compared to the original processor. In addition, a performance evaluation of the out-of-order RISC-V processor with the proposed unit using the gem5 simulator shows a 23.9% reduction in execution cycles compared to the original processor.
This paper also designs an execution unit that converts grid square codes to latitude and longitude. We call this conversion "Grid-to-Degree conversion". Experimental results show that the in-order RISC-V processor with this unit, implemented on an FPGA, achieves a 3.65% reduction in execution time compared to the original processor.
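To illustrate why the latitude/longitude-to-code computation involves a chain of floating-point operations, the sketch below computes an 8-digit grid square code in software and counts matches as in the benchmark described above. It assumes the Japanese standard grid square scheme (JIS X 0410, third-level cells of 30 arc-seconds by 45 arc-seconds); the abstract does not state which scheme the paper uses, so the equations, function names, and sample coordinates here are assumptions, and the sketch is valid only for coordinates within the area covered by that scheme.

```python
import math


def grid_square_code(lat, lon):
    """Return the 8-digit third-level grid square code (assumed JIS X 0410)."""
    lat_min = lat * 60.0              # latitude in arc-minutes
    p = int(lat_min // 40)            # first level: 40' of latitude
    a = lat_min % 40
    q = int(a // 5)                   # second level: 5' of latitude
    b = a % 5
    r = int(b * 60 // 30)             # third level: 30" of latitude

    u = int(math.floor(lon - 100))    # first level: 1 degree of longitude
    f_min = (lon - 100 - u) * 60.0    # remainder in arc-minutes
    v = int(f_min // 7.5)             # second level: 7.5' of longitude
    g = f_min % 7.5
    w = int(g * 60 // 45)             # third level: 45" of longitude

    return f"{p:02d}{u:02d}{q}{v}{r}{w}"


def count_matches(points, target_code):
    """Count the (lat, lon) pairs that fall into the given grid square."""
    return sum(1 for lat, lon in points if grid_square_code(lat, lon) == target_code)


# A point near Tokyo Station (illustrative input).
code = grid_square_code(35.681236, 139.767125)
hits = count_matches([(35.681236, 139.767125), (34.0, 135.0)], code)
```

Each call performs several multiplications, divisions, and remainder operations, which is the work that a dedicated execution unit would replace.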
Unmanned aerial vehicles (UAVs) have demonstrated substantial abilities at disaster sites, as they can be moved easily and are not affected by terrain conditions. The purpose of this study is to search for survivors using a UAV-based localization system. To this end, we explore a localization method using the Received Signal Strength Indicator (RSSI) for cases in which aerial photography cannot be used to find victims buried in rubble. We propose a method for estimating the position of disaster survivors using RSSI in an environment surrounded by obstacles. We conducted an experiment in an urban area to evaluate the proposed algorithm. The proposed algorithm improves the accuracy of position estimation in crowded, obstacle-ridden disaster areas.