International Journal of Networking and Computing
Online ISSN : 2185-2847
Print ISSN : 2185-2839
ISSN-L : 2185-2839
Volume 2, Issue 2
Special Issue on Invited Talks of the Second International Conference in Networking and Computing
  • Koji Nakano
    2012 Volume 2 Issue 2 Pages 146
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    The Second International Conference on Networking and Computing (ICNC’12) was held in Osaka, Japan, from November 30 to December 2. It is my pleasure to publish this special issue on the invited talks of ICNC’12.

    The ICNC’12 organizing committee asked the keynote and tutorial speakers of ICNC’12 to submit papers based on their talks. The submitted papers were reviewed by experts in the same field, and the authors revised their papers according to the reviewers’ comments. Finally, we accepted the following four papers:

    ・ Probabilistic Self-Stabilization and Biased Random Walks on Dynamic Graphs by Masafumi Yamashita

    ・ Visualized Development and Testing for Embedded Cluster Computing by Ian Vince McLoughlin, Timo Rolf Bretschneider, and Chen Zheming

    ・ Invitation to a Standard Programming Interface for Massively Parallel Computing Environment: OpenCL by Shinichi Yamagiwa

    ・ All about RICC: RIKEN Integrated Cluster of Clusters by Maho Nakata

    On behalf of the ICNC’12 organizing committee, I would like to thank the authors for giving excellent talks at the ICNC’12 conference and for submitting their outstanding manuscripts to this special issue. Likewise, I would like to express our appreciation for the great efforts of the reviewers who reviewed the papers submitted to this special issue. This special issue would not have been possible without their support.

    Download PDF (18K)
  • Probabilistic Self-Stabilization and Biased Random Walks on Dynamic Graphs
    Masafumi Yamashita
    2012 Volume 2 Issue 2 Pages 147-159
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    A distributed system is said to be probabilistic self-stabilizing if, starting from any global configuration, it eventually converges to a legitimate computation with probability 1. Like a self-stabilizing system, a probabilistic self-stabilizing system tolerates any number of transient failures and recovers a legitimate computation, but it does so only probabilistically, unlike a self-stabilizing system, which recovers deterministically even in the worst case. However, a self-stabilizing algorithm is in general difficult to design, and for some problems impossible. A probabilistic self-stabilizing algorithm, on the other hand, is easier to design. To see this, we discuss how a probabilistic self-stabilizing system can be constructed from a given weak stabilizing system, which can recover a legitimate computation only in the best case.

    An execution of a probabilistic self-stabilizing system can be modeled by a random walk on a graph, and its performance can be evaluated in terms of quantities such as the hitting and cover times of the corresponding random walk. The hitting and cover times of random walks have been studied extensively, but most studies consider standard (i.e., unbiased) random walks on static graphs. We discuss how to design biased random walks whose hitting and cover times are smaller than those of standard random walks, in order to improve the performance of probabilistic self-stabilizing systems. We also discuss random walks on dynamic graphs, in order to analyze probabilistic self-stabilizing systems whose communication network topology changes frequently.
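    The hitting and cover times discussed here are also easy to explore empirically. The following small C program is a toy illustration of ours, not one of the constructions in the paper: it estimates the average cover time of a standard (unbiased) random walk against a walk biased toward low-degree neighbors on a small lollipop-shaped graph. Both the graph and the particular bias are arbitrary choices, and whether a given bias helps depends on the graph.

    /* Toy experiment: average cover time of an unbiased random walk versus a
     * degree-biased (Metropolis-style) walk on a small lollipop-shaped graph:
     * a clique on nodes 0..3 with a path 3-4-5-6-7 attached. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 8
    #define TRIALS 10000

    static const int edges[][2] = { {0,1},{0,2},{0,3},{1,2},{1,3},{2,3},
                                    {3,4},{4,5},{5,6},{6,7} };
    static int adj[N][N], deg[N];

    /* Unbiased step: move to a neighbor chosen uniformly at random. */
    static int pick_unbiased(int u) {
        int r = rand() % deg[u];
        for (int i = 0; i < N; i++)
            if (adj[u][i] && r-- == 0) return i;
        return u; /* not reached */
    }

    /* Biased step: weight each neighbor v proportionally to 1/deg(v),
     * favoring low-degree nodes; one illustrative bias among many. */
    static int pick_biased(int u) {
        double total = 0.0;
        int last = -1;
        for (int i = 0; i < N; i++)
            if (adj[u][i]) { total += 1.0 / deg[i]; last = i; }
        double x = total * (rand() / ((double)RAND_MAX + 1.0));
        for (int i = 0; i < N; i++) {
            if (!adj[u][i]) continue;
            x -= 1.0 / deg[i];
            if (x < 0.0) return i;
        }
        return last; /* guard against floating-point rounding */
    }

    /* Walk from node 0 until every node has been visited; average the
     * number of steps taken (the cover time) over many trials. */
    static double avg_cover_time(int (*pick)(int)) {
        long long total = 0;
        for (int t = 0; t < TRIALS; t++) {
            int visited[N] = {0}, u = 0, count = 1;
            visited[0] = 1;
            while (count < N) {
                u = pick(u);
                total++;
                if (!visited[u]) { visited[u] = 1; count++; }
            }
        }
        return (double)total / TRIALS;
    }

    int main(void) {
        srand((unsigned)time(NULL));
        for (size_t i = 0; i < sizeof edges / sizeof edges[0]; i++)
            adj[edges[i][0]][edges[i][1]] = adj[edges[i][1]][edges[i][0]] = 1;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) deg[i] += adj[i][j];
        printf("average cover time, unbiased: %.1f steps\n", avg_cover_time(pick_unbiased));
        printf("average cover time, biased  : %.1f steps\n", avg_cover_time(pick_biased));
        return 0;
    }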

    Download PDF (131K)
  • Visualized Development and Testing for Embedded Cluster Computing
    Ian Vince McLoughlin, Timo Rolf Bretschneider, Chen Zheming
    2012 Volume 2 Issue 2 Pages 160-187
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    Embedded cluster processing solutions can be difficult to design if the quantity and type of processors, the interconnect technology, and the code partitioning are not specified in advance, and yet it may not be possible to determine these without qualitative and quantitative testing, which cannot be performed until hardware is available. Similar issues are faced in conventional development projects for generic embedded systems or for generic distributed systems, but they are exacerbated when both attributes are combined. This paper considers issues relating to custom embedded cluster processing system development in general, before focusing on a specific example of a space-borne cluster computer using ARM processors running embedded Linux. This early example, which is shown performing distributed synthetic aperture radar image processing, is representative of a growing number of systems in the pervasive and ambient computing fields, and it illustrates many of the difficulties in co-developing hardware and software for specific tasks. We present one potential solution: the use of distributed virtualization to simulate the final design. An ARM cluster simulator, constructed using QEMU and virtual networking, is presented and shown in use for the development and validation of distributed embedded processing tasks on a cluster-type architecture. As clusters of embedded processors, pervasive embedded networks, and the “embedded cloud” become more popular in the near future, such virtualized systems can prove useful for development and testing.
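    To make the simulator idea concrete, here is a minimal sketch of our own, not the framework presented in the paper: it starts several qemu-system-arm instances and joins them into a single virtual LAN using QEMU's multicast socket networking. The machine model, kernel image path, and network addresses are placeholder assumptions.

    /* Sketch (not the paper's framework): launch a small virtual ARM "cluster"
     * by starting several QEMU instances on one shared multicast virtual LAN.
     * Machine model, kernel image, and addresses are placeholder assumptions. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NODES 4

    int main(void) {
        char cmd[512];
        for (int i = 0; i < NODES; i++) {
            /* Each node gets a unique MAC; all join the same multicast "wire". */
            snprintf(cmd, sizeof cmd,
                     "qemu-system-arm -M versatilepb -nographic "
                     "-kernel zImage -append \"console=ttyAMA0\" "
                     "-net nic,macaddr=52:54:00:12:34:%02x "
                     "-net socket,mcast=230.0.0.1:1234 &", i);
            printf("starting node %d: %s\n", i, cmd);
            if (system(cmd) != 0)
                fprintf(stderr, "failed to start node %d\n", i);
        }
        return 0;
    }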

    Download PDF (2550K)
  • Invitation to a Standard Programming Interface for Massively Parallel Computing Environment: OpenCL
    Shinichi Yamagiwa
    2012 Volume 2 Issue 2 Pages 188-205
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    Multicore/manycore architectures accelerate the demand for a new programming environment that exploits the massive number of processors integrated in an LSI. The GPU (Graphics Processing Unit) is one typical hardware environment. Programming environments for GPUs have traditionally been vendor- and hardware-specific, which complicates the maintenance of uniform programs that access the computing resources of a massively parallel platform. The recently released OpenCL is expected to become a standard that provides a uniform programming environment for heterogeneous processors from different vendors. This tutorial paper gives an overview of OpenCL, aimed at programmers who are about to program massively parallel hardware or who are migrating from another vendor-specific programming interface to OpenCL. The paper explains the characteristics of the OpenCL interface, describing in detail the basic structures used in a program. Moreover, it discusses performance aspects, evaluating advanced programming techniques that improve the performance of OpenCL applications.
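    To give a flavor of the interface the tutorial covers, the following is a minimal host program in C for a vector-addition kernel. It is a sketch of our own, not code from the paper: it simply takes the first available platform and device, and most error checking is omitted for brevity.

    /* Minimal OpenCL vector addition: a sketch of the host-side workflow
     * (platform -> device -> context -> queue -> program -> kernel -> buffers
     * -> enqueue -> read back). Error checks are mostly omitted for brevity. */
    #include <stdio.h>
    #include <CL/cl.h>

    static const char *src =
        "__kernel void vadd(__global const float *a,\n"
        "                   __global const float *b,\n"
        "                   __global float *c) {\n"
        "    size_t i = get_global_id(0);\n"
        "    c[i] = a[i] + b[i];\n"
        "}\n";

    int main(void) {
        enum { N = 1024 };
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        cl_platform_id plat;  clGetPlatformIDs(1, &plat, NULL);
        cl_device_id dev;     clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "vadd", NULL);

        cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
        cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
        cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

        clSetKernelArg(k, 0, sizeof da, &da);
        clSetKernelArg(k, 1, sizeof db, &db);
        clSetKernelArg(k, 2, sizeof dc, &dc);

        size_t global = N;                       /* one work-item per element */
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

        printf("c[10] = %g (expected 30)\n", c[10]);  /* 10 + 20 */

        clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
        clReleaseKernel(k); clReleaseProgram(prog);
        clReleaseCommandQueue(q); clReleaseContext(ctx);
        return 0;
    }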

    Download PDF (285K)
  • All about RICC: RIKEN Integrated Cluster of Clusters
    Maho Nakata
    2012 Volume 2 Issue 2 Pages 206-215
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    This is an introduction to RIKEN’s supercomputer, the RIKEN Integrated Cluster of Clusters (RICC), which has been in operation since August 2009. The basic concept of the RICC is to “provide an environment with high-power computational resources to facilitate research and development for RIKEN’s researchers”. Based on this concept, we have been operating the RICC system as (i) a data analysis environment for experimental researchers, (ii) a development environment targeting the next-generation supercomputer, i.e., the “K” computer, and (iii) GPU (graphics processing unit) computers for exploring challenges in developing a future computing environment. The total performance of the RICC is 97.94 TFlops, ranking it 125th on the Top500 list of November 2011. We prepared four job-class accounts, granted based on researchers’ proposals evaluated by our Review Committee. We also provided support services to RIKEN’s researchers, such as RICC training classes, software installation services, and speed-up and visualization support. To encourage active participation and proactive initiative, all services were free of charge; however, access to the RICC was limited to researchers and collaborators of RIKEN. As a result, the RICC has maintained a high activity ratio (> 90%) since the beginning of its operation.

    Download PDF (1884K)
Special Issue on Selected Papers from the Second International Conference on Networking and Computing
  • Yasuaki Ito, Sayaka Kamei
    2012 Volume 2 Issue 2 Pages 216
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    The Second International Conference on Networking and Computing (ICNC), held on November 30-December 2, 2011, in Osaka, Japan, aimed to provide a timely forum for the exchange and discussion of the latest research findings in all aspects of networking and computing, including parallel and distributed systems, architectures, and applications.

    In addition, five workshops were held in conjunction with ICNC: the 3rd Workshop on Ultra Performance and Dependable Acceleration Systems (UPDAS), the 3rd International Workshop on Parallel and Distributed Algorithms and Applications (PDAA), the 2nd International Workshop on Advances in Networking and Computing (WANC), the International Workshop on Challenges on Massively Parallel Processors (CMPP), and the International Workshop on Networking, Computing, Systems, and Software (NCSS).

    The program committee encouraged the authors of selected papers, including those from the workshops, to submit full versions of their manuscripts to the International Journal of Networking and Computing (IJNC) after the conference. After a thorough reviewing process with extensive discussions, four articles on various topics were selected for publication in the IJNC special issue on ICNC.

    On behalf of the ICNC, we would like to express our appreciation for the great efforts of the reviewers who reviewed the papers submitted to this special issue. Likewise, we thank all the authors for submitting their excellent manuscripts. We also express our sincere thanks to the editorial board of the International Journal of Networking and Computing, in particular to the Editor-in-Chief, Professor Koji Nakano. This special issue would not have been possible without his support.

    Download PDF (17K)
  • Takayuki Murakawa, Akihiro Fujiwara
    2012 Volume 2 Issue 2 Pages 217-233
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    In the present paper, we consider asynchronous parallelism in membrane computing and propose asynchronous P systems that perform two basic arithmetic operations and factorization. Since no restrictive assumption is made on the application of rules, both sequential and maximally parallel executions are allowed in an asynchronous P system.

    We first propose a P system that computes the addition of two binary numbers of m bits. The P system works in O(m) sequential and parallel steps using O(m) types of objects. We next propose a P system for multiplication of two binary numbers of m bits, and show that the P system works in O(m log m) parallel steps or O(m^3) sequential steps using O(m^2) types of objects. Finally, we propose a P system for factorization of a positive integer of m bits using the above P system as a sub-system. The P system computes the factorization in O(m log m) parallel steps or O(4^m × m^2 log m) sequential steps using O(m^2) types of objects.
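    As a rough intuition for the O(m) bound on addition (this is our own toy analogy, not the authors' P system), the computation resembles ripple-carry addition: each bit position can be resolved by one rule application that consumes the two input-bit objects and an incoming carry object, so the carry propagates through at most m positions.

    /* Toy illustration (not the authors' P system): ripple-carry addition of
     * two m-bit binary numbers. Each loop iteration corresponds roughly to
     * one rule application consuming bit objects a_i, b_i and a carry object,
     * which is why O(m) steps suffice for m-bit addition. */
    #include <stdio.h>

    #define M 8  /* number of bits */

    static void add_bits(const int a[M], const int b[M], int sum[M + 1]) {
        int carry = 0;
        for (int i = 0; i < M; i++) {          /* one "step" per bit position */
            int s = a[i] + b[i] + carry;
            sum[i] = s & 1;                    /* emit sum-bit object s_i       */
            carry = s >> 1;                    /* pass carry object to step i+1 */
        }
        sum[M] = carry;
    }

    int main(void) {
        /* Least-significant bit first: a = 90, b = 53. */
        int a[M] = {0,1,0,1,1,0,1,0};
        int b[M] = {1,0,1,0,1,1,0,0};
        int sum[M + 1];
        add_bits(a, b, sum);
        printf("sum bits (LSB first): ");
        for (int i = 0; i <= M; i++) printf("%d", sum[i]);
        printf("\n");  /* 90 + 53 = 143, printed LSB first: 111100010 */
        return 0;
    }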

    Download PDF (218K)
  • Yamin Li, Shietung Peng, Wanming Chu
    2012 Volume 2 Issue 2 Pages 234-250
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    In this paper, we propose a flexible interconnection network, called the hierarchical dual-net (HDN), with low node degree and short diameter for constructing large-scale supercomputers. The HDN is constructed based on a symmetric product graph (the base network). A k-level hierarchical dual-net, HDN(B, k, S), contains (2N_0)^(2^k) / (2 × ∏_{i=1}^{k} s_i) nodes, where S = {G'_1, G'_2, ..., G'_k}, G'_i is a super-node, s_i = |G'_i| is the number of nodes in a super-node at level i for 1 ≤ i ≤ k, and N_0 is the number of nodes in the base network B. The node degree of HDN(B, k, S) is d_0 + k, where d_0 is the node degree of the base network. The HDN structure is better than existing networks such as the hypercube and the 2D/3D torus with respect to degree and diameter. Another benefit of the HDN is that we can select suitable super-nodes to control the rate at which the number of nodes grows, so as to construct a supercomputer of the desired scale. We investigate the topological properties of the HDN, compare them to those of other networks, and give efficient routing and broadcasting algorithms for the hierarchical dual-net.
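    The node-count formula above is easy to evaluate numerically. The following short C function is our own illustration; the base-network size and super-node sizes are made-up example values. It computes the number of nodes of HDN(B, k, S) directly from the formula quoted in the abstract; note that k = 0 reduces to the base network itself, since (2N_0)^1 / 2 = N_0.

    /* Node count of HDN(B, k, S) per the formula quoted in the abstract:
     *   nodes = (2*N0)^(2^k) / (2 * s_1 * s_2 * ... * s_k).
     * The base network and super-node sizes below are arbitrary examples. */
    #include <stdio.h>
    #include <math.h>

    static double hdn_nodes(double n0, int k, const double s[]) {
        double denom = 2.0;
        for (int i = 0; i < k; i++) denom *= s[i];
        return pow(2.0 * n0, pow(2.0, k)) / denom;
    }

    int main(void) {
        double s[] = {4.0, 16.0};  /* |G'_1| = 4, |G'_2| = 16 (made up)   */
        double n0 = 16.0;          /* e.g., a 16-node base network        */
        for (int k = 0; k <= 2; k++)  /* k = 0 gives back N_0 = 16 nodes  */
            printf("k = %d: about %.4g nodes\n", k, hdn_nodes(n0, k, s));
        return 0;
    }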

    Download PDF (245K)
  • Naoki Nishikawa, Keisuke Iwai, Takakazu Kurokawa
    2012 Volume 2 Issue 2 Pages 251-268
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    As data protection with encryption becomes more important day by day, encryption processing using general-purpose computation on graphics processing units (GPGPU) has attracted attention as one method of realizing high-speed data protection. GPUs have evolved in recent years into powerful parallel computing devices with a high cost-performance ratio. However, many factors affect GPU performance. In earlier work on obtaining higher AES performance using GPGPU in various ways, we arrived at the following two technical findings: (1) a granularity of 16 bytes/thread is best; (2) the best memory allocation style stores the extended key and substitution table in shared memory and the plaintext in registers.

    However, AES is not the only cipher algorithm widely used in the real world. For this reason, this study tests the hypothesis that these two findings are applicable to implementations of other symmetric block ciphers on two generations of GPUs. We targeted five 128-bit symmetric block ciphers, AES, Camellia, CIPHERUNICORN-A, Hierocrypt-3, and SC2000, from the e-government recommended ciphers list of the CRYPTography Research and Evaluation Committees (CRYPTREC) in Japan. We evaluated the performance of these five symmetric block ciphers on a machine containing a 4-core CPU and each GPU, using three methods: (A) throughput without data transfer, (B) throughput with data transfer and encryption processing on the GPU overlapped, and (C) throughput with data transfer and encryption processing on the GPU not overlapped. The results demonstrate that the SC2000 implementation under method (A) on a Tesla C2050 achieved an extremely high throughput of 73.4 Gbps, while the throughputs obtained using methods (B) and (C) fell to 33.4 Gbps and 18.3 Gbps, respectively. Method (B) achieved an effective throughput approximately 4.7 times higher than that obtained using 8 threads on the 4-core CPU.
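    A back-of-the-envelope model helps to see why method (B) outperforms method (C). The following C snippet is our own illustration, not the paper's measurement methodology: it treats the pipeline as two stages, computation and PCIe transfer, where a serial pipeline (method C) pays the sum of both per-bit costs while a perfectly overlapped one (method B) is limited only by the slower stage. The copy rate used below is a made-up value; real measurements fall between such bounds, e.g., because host-to-device and device-to-host copies may also overlap each other.

    /* Two-stage pipeline model (ours, not from the paper) of overlapped vs.
     * serial GPU encryption. Rc = compute rate, Rt = copy rate, in Gbps. */
    #include <stdio.h>

    int main(void) {
        double rc = 73.4;  /* compute-only throughput, Gbps (method A, SC2000) */
        double rt = 40.0;  /* assumed PCIe copy throughput, Gbps (made up)     */
        double serial     = 1.0 / (1.0 / rc + 1.0 / rt);  /* method C model */
        double overlapped = rc < rt ? rc : rt;            /* method B model */
        printf("serial     (C-like): %.1f Gbps\n", serial);
        printf("overlapped (B-like): %.1f Gbps\n", overlapped);
        return 0;
    }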

    Download PDF (615K)
  • Md. Nazrul Islam Mondal, Koji Nakano, Yasuaki Ito
    2012 Volume 2 Issue 2 Pages 269-290
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    Field Programmable Gate Arrays (FPGAs) are a dominant implementation medium for digital circuits, allowing users to embed a circuit of their own design instantly, and they can be used to implement parallel and hardware algorithms. Circuit design that minimizes the number of clock cycles is easy if we can use asynchronous read operations. However, most FPGAs support synchronous read operations but not asynchronous ones. The main contribution of this paper is to provide a potent approach to resolving this problem. We assume that a circuit using asynchronous ROMs is given. In our previous work, we presented a circuit rewriting algorithm that converts a circuit with asynchronous ROMs into an equivalent circuit with synchronous ones; the resulting circuit with synchronous ROMs can be embedded into FPGAs. However, that algorithm can handle only circuits represented by a directed acyclic graph and does not work for those with cycles. In this paper, we succeed in relaxing the cycle-free condition. More specifically, we present an algorithm that automatically converts a circuit with cycles using asynchronous ROMs into an equivalent circuit using synchronous ROMs.
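    The gap that the rewriting algorithm bridges can be seen in a small cycle-accurate model. The following C sketch is our own conceptual illustration, not the authors' algorithm: an asynchronous ROM returns data in the same cycle in which the address is presented, whereas a synchronous ROM latches the address and delivers the data one clock cycle later, so a circuit built around same-cycle reads must be retimed when its ROMs become synchronous.

    /* Conceptual model (ours, not the paper's algorithm) of the one-cycle
     * latency gap between asynchronous and synchronous ROM reads. */
    #include <stdio.h>

    #define ROM_SIZE 4
    static const int rom[ROM_SIZE] = {10, 20, 30, 40};

    /* Asynchronous read: combinational, data valid in the same cycle. */
    static int rom_read_async(int addr) {
        return rom[addr];
    }

    /* Synchronous read: the address is registered; data appears next cycle. */
    static int rom_read_sync(int addr, int *addr_reg) {
        int data = rom[*addr_reg];  /* data for the address latched last cycle */
        *addr_reg = addr;           /* latch the new address on the clock edge */
        return data;
    }

    int main(void) {
        int addr_reg = 3;  /* arbitrary power-on value of the address register */
        for (int cycle = 0; cycle < ROM_SIZE; cycle++) {
            printf("cycle %d: async=%d sync=%d\n",
                   cycle, rom_read_async(cycle),
                   rom_read_sync(cycle, &addr_reg));
        }
        /* The sync column is the async column delayed by one cycle,
           illustrating the latency the circuit rewriting must absorb. */
        return 0;
    }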

    Download PDF (165K)