International Journal of Networking and Computing

Special Issue on Workshop on Advances in Parallel and Distributed Computational Models 2014

Akihiro Fujiwara, Susumu Matsumae

2015 Volume 5 Issue 1 Pages 1
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_1

JOURNAL FREE ACCESS

The 16th Workshop on Advances in Parallel and Distributed Computational Models (APDCM), held in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS) on May 19-23, 2014, in Phoenix, USA, aims to provide a timely forum for the exchange and dissemination of new ideas, techniques and research in the field of parallel and distributed computational models. The APDCM workshop has a history of attracting participation from reputed researchers worldwide. The program committee has encouraged the authors of accepted papers to submit full versions of their manuscripts to the International Journal of Networking and Computing (IJNC) after the workshop. After a thorough reviewing process, with extensive discussions, eight articles on various topics have been selected for publication in the IJNC special issue on APDCM. On behalf of the APDCM workshop, we would like to express our appreciation for the considerable efforts of the reviewers who reviewed papers submitted to the special issue. Likewise, we thank all the authors for submitting their excellent manuscripts to this special issue. We also express our sincere thanks to the editorial board of the International Journal of Networking and Computing, in particular to the Editor-in-Chief, Professor Koji Nakano. This special issue would not have been possible without his support.

Composing resilience techniques: ABFT, periodic and incremental checkpointing

George Bosilca, Aurelien Bouteiller, Thomas Herault, Yves Robert, Jack ...

2015 Volume 5 Issue 1 Pages 2-25
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_2

JOURNAL FREE ACCESS

Algorithm-Based Fault Tolerant (ABFT) approaches promise unparalleled scalability and performance in failure-prone environments. Thanks to recent advances in the understanding of the involved mechanisms, a growing number of important algorithms (including all widely used factorizations) have been proven ABFT-capable. In the context of larger applications, these algorithms provide a temporal section of the execution where the data is protected by its own intrinsic properties and can therefore be algorithmically recomputed without the need for checkpoints. However, while typical scientific applications spend a significant fraction of their execution time in library calls that can be ABFT-protected, they interleave sections that are difficult or even impossible to protect with ABFT. As a consequence, the only practical fault-tolerance approach for these applications is checkpoint/restart. In this paper we propose a model to investigate the efficiency of a composite protocol that alternates between ABFT and checkpoint/restart for the effective protection of an iterative application composed of ABFT-aware and ABFT-unaware sections. We also consider an incremental checkpointing composite approach in which the algorithmic knowledge is leveraged by a novel optimal dynamic programming algorithm to compute checkpoint dates. We validate these models using a simulator. The model and simulator show that the composite approach drastically increases the performance delivered by an execution platform, especially at scale, by providing the means to increase the interval between checkpoints while simultaneously decreasing the volume of each checkpoint.

A Novel Computational Model for GPUs with Applications to Efficient Algorithms

Atsushi Koike, Kunihiko Sadakane

2015 Volume 5 Issue 1 Pages 26-60
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_26

JOURNAL FREE ACCESS

We propose a novel computational model for GPUs. Known parallel computational models such as the PRAM model are not appropriate for evaluating GPU-based algorithms. Our model, called AGPU, abstracts the essence of current GPU architectures such as global and shared memory, memory coalescing and bank conflicts. Using our model, we can evaluate the asymptotic behavior of GPU algorithms more efficiently than with the known models, and we can develop algorithms that run fast on real GPU devices. As a showcase, we analyze the asymptotic behavior of basic existing algorithms including reduction, prefix scan, and comparison sorting. We further develop new algorithms by detecting and resolving performance bottlenecks of the existing algorithms. Our reduction algorithm has the optimal time and I/O complexities and works with non-commutative operators. Our comparison sorting algorithm has the optimal I/O complexity. Additionally, we show that our algorithms run faster than the existing algorithms not only in theory but also in practice.
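
To illustrate what it means for a reduction to work with non-commutative operators, the following is a minimal sequential sketch in Python, not the AGPU algorithm from the paper. It assumes only that the operator is associative. The name `tree_reduce` is hypothetical. Each level combines adjacent pairs, always keeping the left operand before the right one, which is the ordering constraint a parallel reduction would also have to respect.

```python
def tree_reduce(values, op):
    """Reduce `values` with an associative `op`, level by level.

    Adjacent pairs are combined as op(left, right), so the original
    order of the operands is preserved and `op` need not be commutative.
    """
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    level = list(values)
    while len(level) > 1:
        # Combine elements (0,1), (2,3), ... of the current level.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            # An unpaired last element moves to the next level unchanged.
            nxt.append(level[-1])
        level = nxt
    return level[0]


# String concatenation is associative but not commutative.
joined = tree_reduce(list("abcde"), lambda x, y: x + y)
```

On a GPU, the pairs within one level could be combined in parallel; this sketch only shows the combination order.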

Handling Non-determinism with Description Logics using a Fork/Join Approach

Jocelyne Faddoul, Wendy MacCaull

2015 Volume 5 Issue 1 Pages 61-85
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_61

JOURNAL FREE ACCESS

The increasing use of ontologies, formulated using expressive Description Logics, for time-sensitive applications necessitates the development of fast (near real-time) reasoning tools. Multicore processors are nowadays widespread across desktop, laptop, server, and even smartphone and tablet devices. The rise of such powerful execution environments calls for new parallel and distributed Description Logic (DL) reasoning algorithms. Many sophisticated optimizations have been explored and have considerably enhanced DL reasoning with light ontologies. Non-determinism remains a main source of complexity for implemented systems handling ontologies that rely on more expressive logics. In this work, we explore handling non-determinism with DL languages enabling qualified cardinality restrictions. We implement a fork/join parallel framework in our tableau-based algebraic reasoner, which handles qualified cardinality restrictions and nominals using in-equation solving. The preliminary results are encouraging and show that using a parallel framework with algebraic reasoning is worth investigating and more promising than parallelizing standard tableau-based reasoning.

Linear Performance-Breakdown Model: A Framework for GPU kernel programs performance analysis

Chapa Martell Mario Alberto, Hiroyuki Sato

2015 Volume 5 Issue 1 Pages 86-104
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_86

JOURNAL FREE ACCESS

In this paper we describe our performance-breakdown model for GPU programs. GPUs are a popular choice as accelerator hardware due to their high performance, high availability and relatively low price. However, writing highly efficient programs is a difficult and time-consuming task for programmers because of the complexities of GPU architecture and the inherent difficulty of parallel programming. For this reason, we propose the Linear Performance-Breakdown Model Framework as a tool to assist in the optimization process. We show that the model closely matches the behavior of the GPU by comparing the execution time obtained from experiments on two different types of GPU: an Accelerated Processing Unit (APU) and a GTX660, a discrete board. We also show performance-breakdown results obtained from applying the modeling strategy and how they indicate the time spent during the computation in each of the three Major Performance Factors, which we define as processing time, global memory transfer time and shared memory transfer time.

Identification and Elimination of Platform-Specific Code Smells in High Performance Computing Applications

Chunyan Wang, Shoichi Hirasawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2015 Volume 5 Issue 1 Pages 180-199
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_180

JOURNAL FREE ACCESS

A code smell is a code pattern that might indicate a code or design problem, which makes the application code hard to evolve and maintain. Automatic detection of code smells has been studied to help users find which parts of their application codes should be refactored. However, code smells have not been defined in a formal manner. Moreover, existing detection tools are designed mainly for object-oriented applications and are rarely provided for high performance computing (HPC) applications. HPC applications are usually optimized for a particular platform to achieve high performance, and hence have special code smells called platform-specific code smells (PSCSs). The purpose of this work is to develop a code smell alert system that helps users find PSCSs in HPC applications in order to improve performance portability across different platforms. This paper presents a PSCS alert system that is based on an abstract syntax tree (AST) and XML. Code patterns of PSCSs are defined in a formal way using the AST information represented in XML. XML Path Language (XPath) is used to describe those patterns. A database is built to store the transformation recipes, written as XSLT files, for eliminating detected PSCSs. The recall and precision evaluation results obtained by using real applications show that the proposed system can detect potential PSCSs accurately. The evaluation of performance portability on real applications demonstrates that eliminating PSCSs leads to significant performance changes, and therefore the code portions with detected PSCSs have to be refactored to improve performance portability across multiple platforms.
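
The idea of describing a code pattern as an XPath expression over an XML-encoded AST can be sketched as follows. This is an illustrative assumption, not the paper's system: the XML schema (`function`, `for`, `pragma`, `call` elements and the `kind` attribute), the sample AST and the function name `find_vendor_pragmas` are all hypothetical. The sketch uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath, and it does not cover the XSLT transformation step.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a small AST; the schema is invented
# purely for illustration.
AST_XML = """
<function name="kernel">
  <pragma kind="vendor" text="unroll(4)"/>
  <for var="i">
    <call name="compute"/>
  </for>
  <for var="j">
    <pragma kind="omp" text="simd"/>
    <call name="store"/>
  </for>
</function>
"""


def find_vendor_pragmas(xml_text):
    """Return the text of every pragma node marked as vendor-specific."""
    root = ET.fromstring(xml_text)
    # XPath pattern: any descendant <pragma> whose kind attribute is "vendor".
    matches = root.findall(".//pragma[@kind='vendor']")
    return [node.get("text") for node in matches]


vendor_pragmas = find_vendor_pragmas(AST_XML)
```

Here the pattern `.//pragma[@kind='vendor']` plays the role of a formal smell definition: changing the pattern changes what is reported without touching the traversal code.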

Self-Stabilizing Algorithms for Maximal 2-packing and General k-packing (k ≥ 2) with Safe Convergence in an Arbitrary Graph

Yihua Ding, James Wang, Pradip K Srimani

2015 Volume 5 Issue 1 Pages 105-121
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_105

JOURNAL FREE ACCESS

In a graph or a network G=(V,E), a set S⊆V is a 2-packing if ∀i∈V : |N[i]∩S|≤1, where N[i] denotes the closed neighborhood of node i. A 2-packing is maximal if no proper superset of S is a 2-packing. This paper presents a safely converging self-stabilizing algorithm for the maximal 2-packing problem. Under a synchronous daemon, it quickly converges to a 2-packing (a safe state, not necessarily the legitimate state) in three synchronous steps, and then terminates in a maximal one (the legitimate state) in O(n) steps without breaking safety during the convergence interval, where n is the number of nodes. The space requirement at each node is O(log n) bits. This is a significant improvement over the most recent self-stabilizing algorithm for maximal 2-packing, which uses O(n²) synchronous steps with the same space complexity and does not have the safe convergence property. We then generalize the technique to design a self-stabilizing algorithm for maximal k-packing, k ≥ 2, with safe convergence that stabilizes in O(kn²) steps under a synchronous daemon; the algorithm has a space complexity of O(kn log n) bits at each node. Existing algorithms for k-packing stabilize in exponential time under a central daemon with O(log n) space complexity.
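
The definitions of a 2-packing and of maximality can be checked directly on a small graph. The following is a minimal centralized sketch in Python, not the self-stabilizing algorithm from the paper. The function names and the example path graph are assumptions chosen for illustration. Because every subset of a 2-packing is itself a 2-packing, it is enough to test single-node extensions when checking maximality.

```python
def closed_neighborhood(adj, i):
    """N[i]: node i together with its neighbors."""
    return {i} | set(adj[i])


def is_2_packing(adj, s):
    """True if every closed neighborhood holds at most one member of s."""
    return all(len(closed_neighborhood(adj, i) & s) <= 1 for i in adj)


def is_maximal_2_packing(adj, s):
    """True if s is a 2-packing and adding any other node breaks the property."""
    if not is_2_packing(adj, s):
        return False
    return all(not is_2_packing(adj, s | {v}) for v in adj if v not in s)


# Hypothetical example: the path graph 0 - 1 - 2 - 3 - 4 as adjacency lists.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

On this path, {0, 3} is a maximal 2-packing, {0} is a 2-packing that is not maximal (node 3 can still be added), and {0, 2} is not a 2-packing because N[1] contains both nodes.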

Near-Optimal Location Tracking Using Sensor Networks

Gokarna Sharma, Hari Krishnan, Costas Busch, Steven R. Brandt

2015 Volume 5 Issue 1 Pages 122-158
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_122

JOURNAL FREE ACCESS

We consider the problem of tracking mobile objects using a sensor network. We present a distributed tracking algorithm, called Mobile Object Tracking using Sensors (MOT), that scales well with the number of sensors and also with the number of mobile objects. MOT maintains a hierarchical structure of detection lists of objects that can efficiently track mobile objects and resolve object queries at any time. MOT guarantees that the cost to update (or maintain) its data structures will be at most O(min{log n, log D}) times the optimal update cost, and that the query cost will be within O(1) of the optimal query cost in the constant-doubling graph model, where n and D are the number of nodes and the diameter of the network, respectively. MOT achieves polylogarithmic approximations for both costs in the general graph model. MOT balances the object and bookkeeping information load at each node at the expense of only an O(log n) increase in the update and query costs. The experimental evaluation in both one-by-one and concurrent execution situations shows that MOT performs well in practical scenarios. To the best of our knowledge, MOT is the first algorithm for this problem in a distributed setting that is traffic-oblivious, i.e., agnostic to a priori knowledge of objects' movement patterns, mobility, query rate, etc., and is load balanced. All previous solutions for this problem assumed traffic-consciousness in constructing the tracking data structure.

Development of an Algorithm for Extracting Parallelism and Pipeline Structure from Stream-based Processing flow with Spanning Tree

Shinichi Yamagiwa, Guyue Wang, Koichi Wada

2015 Volume 5 Issue 1 Pages 159-179
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_159

JOURNAL FREE ACCESS

It has become common to use manycore accelerators to increase the computing power of a computing platform. GPUs in particular are a mainstay of high performance computing and are employed by top supercomputers in the world. Programming methods for such accelerators include the development of control programs that schedule the invocation of the accelerator's kernel program. The kernel program needs to be written based on the stream computing paradigm. By connecting the I/Os of the kernel programs, we can develop a large application. When we consider the processing flow as a directed graph, we can implement a GUI-based programming tool for the accelerators that visualizes a pipeline-based processing flow. However, it is very hard to find a starting point of a complex processing flow. Moreover, although the processing pipeline includes potential parallelism, it is hard for the programmer to exploit it intuitively. This paper proposes an algorithm applying the spanning tree that mechanically exploits the parallelism and determines an execution order. To verify the algorithm, this paper performs an evaluation with realistic applications. The algorithm effectively exploits the parallelism and constructs the optimal pipeline processing flow.

Regular Papers

Extensions of Access-Point Aggregation Algorithm for Large-scale Wireless Local Area Networks

Md. Ezharul Islam, Nobuo Funabiki, Toru Nakanishi

2015 Volume 5 Issue 1 Pages 200-222
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_200

JOURNAL FREE ACCESS

Recently, many organizations such as universities and companies have deployed wireless local area networks (WLANs) to cover the whole site for ubiquitous network services. In these WLANs, wireless access-points (APs) are often managed independently by different groups such as laboratories or departments. As a result, a host may detect signals from multiple APs, which can degrade the communication performance due to radio interference among them and increase operational costs. Previously, we proposed the AP aggregation algorithm to solve this problem by minimizing the number of active APs through aggregating them using the virtual AP technology. However, our extensive simulations in various instances found that 1) the minimization of active APs sometimes excessively degrades the network performance, and 2) the sequential optimization of host associations does not always reach an optimal solution, leaving slow links in use. In this paper, we propose two extensions of the AP aggregation algorithm to solve these problems by 1) ensuring the minimum average throughput for any host by adding active APs and 2) further optimizing host associations by changing multiple hosts simultaneously in the host association finalization phase. We verify the effectiveness through simulations in four network instances using the WIMNET simulator.

An Adaptive Routing Algorithm of 2-D Torus Network Based on Turn Model: The Communication Performance

Yasuyuki Miura, Kentaro Shimozono, Naohisa Fukase, Shigeyoshi Watanabe ...

2015 Volume 5 Issue 1 Pages 223-238
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_223

JOURNAL FREE ACCESS

A 2-D torus network is one of the most popular networks for parallel processing. Many algorithms have been proposed based on the turn model, but most of them cannot be applied to a torus network without modification. In this paper, we propose North-South First (NSF) routing that is applicable to a 2-D torus and combines the north-first method (NF) and the south-first method (SF). NF and SF are algorithms yielded by the turn model. A software simulation comparing NSF routing with other forms of deterministic and adaptive routing showed that NSF routing improves throughput in three types of communication patterns, but yields no improvement for one other communication pattern.

A Co-Processor Design for an Energy Efficient Reconfigurable Accelerator CMA

Mai Izawa, Nobuaki Ozaki, Yusuke Koizumi, Rie Uno, Hideharu Amano

2015 Volume 5 Issue 1 Pages 239-251
Published: January 10, 2015
Released on J-STAGE: January 22, 2015

DOI: https://doi.org/10.15803/ijnc.5.1_239

JOURNAL FREE ACCESS

Cool Mega Array (CMA) is an energy-efficient reconfigurable accelerator consisting of a large PE array with combinatorial circuits and a small microcontroller. In order to enhance the energy efficiency of the total system, a co-processor design of CMA called CMA-Geyser is proposed. By partly replacing the programmable microcontroller by the host processor Geyser with a dedicated hardware controller, the setup of the CMA and the data transfer can be done efficiently. The design, using a 65 nm CMOS process, is compared with an off-loading style multicore system, Cube-1. By eliminating the data memory required in Cube-1, CMA-Geyser reduces the semiconductor area by 21.3%. It also achieves about 2.7 times the performance of Cube-1 through efficient data communication between the host and the accelerator.
