International Journal of Networking and Computing
Online ISSN : 2185-2847
Print ISSN : 2185-2839
ISSN-L : 2185-2839
Volume 2, Issue 2
Special Issue on Invited Talks of the Second International Conference in Networking and Computing
  • Koji Nakano
    2012 Volume 2 Issue 2 Pages 146
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    The Second International Conference on Networking and Computing (ICNC’12) was held in Osaka, Japan, from November 30 to December 2. It is my pleasure to publish this special issue on the invited talks of ICNC’12.

    The ICNC’12 organizing committee asked the keynote and tutorial speakers of ICNC’12 to submit papers based on their talks. The submitted papers were reviewed by experts in the same field, and the authors revised their papers according to the reviewers’ comments. Finally, we accepted the following four papers:

    ・ Probabilistic Self-Stabilization and Biased Random Walks on Dynamic Graphs by Masafumi Yamashita

    ・ Visualized Development and Testing for Embedded Cluster Computing by Ian Vince McLoughlin, Timo Rolf Bretschneider, and Chen Zheming

    ・ Invitation to a Standard Programming Interface for Massively Parallel Computing Environment: OpenCL by Shinichi Yamagiwa

    ・ All about RICC: RIKEN Integrated Cluster of Clusters by Maho Nakata

    On behalf of the ICNC’12 organizing committee, I would like to thank the authors for giving excellent talks at the ICNC’12 conference and for submitting their outstanding manuscripts to this special issue. Likewise, I would like to express our appreciation for the great efforts of the reviewers who reviewed the papers submitted to this special issue. This special issue would not have been possible without their support.

    Download PDF (18K)
  • Probabilistic Self-Stabilization and Biased Random Walks on Dynamic Graphs
    Masafumi Yamashita
    2012 Volume 2 Issue 2 Pages 147-159
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    A distributed system is said to be probabilistic self-stabilizing if, starting from any global configuration, it eventually converges to a legitimate computation with probability 1. Like a self-stabilizing system, a probabilistic self-stabilizing system tolerates any number of transient failures and recovers a legitimate computation, but it does so only probabilistically, unlike a self-stabilizing system, which recovers deterministically even in the worst case. However, a self-stabilizing algorithm is in general difficult to design, and for some problems impossible. A probabilistic self-stabilizing algorithm, on the other hand, is easier to design. To see this, we discuss how a probabilistic self-stabilizing system can be constructed from a given weak stabilizing system, which can recover a legitimate computation only in the best case.

    An execution of a probabilistic self-stabilizing system can be modeled by a random walk on a graph, and its performance can be evaluated in terms of quantities such as the hitting and cover times of the corresponding random walk. The hitting and cover times of random walks have been studied extensively, but most studies consider standard (i.e., unbiased) random walks on static graphs. We discuss how to design biased random walks whose hitting and cover times are smaller than those of standard random walks, in order to improve the performance of probabilistic self-stabilizing systems. We also discuss random walks on dynamic graphs, in order to analyze probabilistic self-stabilizing systems whose communication network topology changes frequently.
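    The hitting and cover times discussed here are also easy to explore empirically. The following small C program is a toy illustration of ours, not one of the constructions in the paper: it estimates the average cover time of a standard (unbiased) random walk against a walk biased toward low-degree neighbors on a small lollipop-shaped graph. Both the graph and the particular bias are arbitrary choices, and whether a given bias helps depends on the graph.

    /* Toy experiment: average cover time of an unbiased random walk versus a
     * degree-biased (Metropolis-style) walk on a small lollipop-shaped graph:
     * a clique on nodes 0..3 with a path 3-4-5-6-7 attached. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 8
    #define TRIALS 10000

    static const int edges[][2] = { {0,1},{0,2},{0,3},{1,2},{1,3},{2,3},
                                    {3,4},{4,5},{5,6},{6,7} };
    static int adj[N][N], deg[N];

    /* Unbiased step: move to a neighbor chosen uniformly at random. */
    static int pick_unbiased(int u) {
        int r = rand() % deg[u];
        for (int i = 0; i < N; i++)
            if (adj[u][i] && r-- == 0) return i;
        return u; /* not reached */
    }

    /* Biased step: weight each neighbor v proportionally to 1/deg(v),
     * favoring low-degree nodes; one illustrative bias among many. */
    static int pick_biased(int u) {
        double total = 0.0;
        int last = -1;
        for (int i = 0; i < N; i++)
            if (adj[u][i]) { total += 1.0 / deg[i]; last = i; }
        double x = total * (rand() / ((double)RAND_MAX + 1.0));
        for (int i = 0; i < N; i++) {
            if (!adj[u][i]) continue;
            x -= 1.0 / deg[i];
            if (x < 0.0) return i;
        }
        return last; /* guard against floating-point rounding */
    }

    /* Walk from node 0 until every node has been visited; average the
     * number of steps taken (the cover time) over many trials. */
    static double avg_cover_time(int (*pick)(int)) {
        long long total = 0;
        for (int t = 0; t < TRIALS; t++) {
            int visited[N] = {0}, u = 0, count = 1;
            visited[0] = 1;
            while (count < N) {
                u = pick(u);
                total++;
                if (!visited[u]) { visited[u] = 1; count++; }
            }
        }
        return (double)total / TRIALS;
    }

    int main(void) {
        srand((unsigned)time(NULL));
        for (size_t i = 0; i < sizeof edges / sizeof edges[0]; i++)
            adj[edges[i][0]][edges[i][1]] = adj[edges[i][1]][edges[i][0]] = 1;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) deg[i] += adj[i][j];
        printf("average cover time, unbiased: %.1f steps\n", avg_cover_time(pick_unbiased));
        printf("average cover time, biased  : %.1f steps\n", avg_cover_time(pick_biased));
        return 0;
    }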

    Download PDF (131K)
  • Visualized Development and Testing for Embedded Cluster Computing
    Ian Vince McLoughlin, Timo Rolf Bretschneider, Chen Zheming
    2012 Volume 2 Issue 2 Pages 160-187
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    Embedded cluster processing solutions can be difficult to design if the quantity and type of processors, the interconnect technology, and the code partitioning are not specified in advance, and yet it may not be possible to determine these without qualitative and quantitative testing, which cannot be performed until hardware is available. Similar issues are faced in conventional development projects for generic embedded systems or for generic distributed systems, but they are exacerbated when both attributes are combined. This paper considers issues relating to custom embedded cluster processing system development in general, before focusing on a specific example of a space-borne cluster computer using ARM processors running embedded Linux. This early example, which is shown performing distributed synthetic aperture radar image processing, is representative of a growing number of systems in the pervasive and ambient computing fields, and it illustrates many of the difficulties in co-developing hardware and software for specific tasks. We present one potential solution: the use of distributed virtualization to simulate the final design. An ARM cluster simulator, constructed using QEMU and virtual networking, is presented and shown in use for the development and validation of distributed embedded processing tasks on a cluster-type architecture. As clusters of embedded processors, pervasive embedded networks, and the “embedded cloud” become more popular in the near future, such virtualized systems can prove useful for development and testing.
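    To make the simulator idea concrete, here is a minimal sketch of our own, not the framework presented in the paper: it starts several qemu-system-arm instances and joins them into a single virtual LAN using QEMU's multicast socket networking. The machine model, kernel image path, and network addresses are placeholder assumptions.

    /* Sketch (not the paper's framework): launch a small virtual ARM "cluster"
     * by starting several QEMU instances on one shared multicast virtual LAN.
     * Machine model, kernel image, and addresses are placeholder assumptions. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NODES 4

    int main(void) {
        char cmd[512];
        for (int i = 0; i < NODES; i++) {
            /* Each node gets a unique MAC; all join the same multicast "wire". */
            snprintf(cmd, sizeof cmd,
                     "qemu-system-arm -M versatilepb -nographic "
                     "-kernel zImage -append \"console=ttyAMA0\" "
                     "-net nic,macaddr=52:54:00:12:34:%02x "
                     "-net socket,mcast=230.0.0.1:1234 &", i);
            printf("starting node %d: %s\n", i, cmd);
            if (system(cmd) != 0)
                fprintf(stderr, "failed to start node %d\n", i);
        }
        return 0;
    }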

    Download PDF (2550K)
  • Invitation to a Standard Programming Interface for Massively Parallel Computing Environment: OpenCL
    Shinichi Yamagiwa
    2012 Volume 2 Issue 2 Pages 188-205
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    Multicore/manycore architectures accelerate the demand for a new programming environment that exploits the massive number of processors integrated in an LSI. The GPU (Graphics Processing Unit) is one typical hardware environment. Programming environments for GPUs have traditionally been vendor- and hardware-specific, which complicates the maintenance of uniform programs that access the computing resources of a massively parallel platform. The recently released OpenCL is expected to become a standard that provides a uniform programming environment for heterogeneous processors from different vendors. This tutorial paper gives an overview of OpenCL, aimed at programmers who are about to program massively parallel hardware or who are migrating from another vendor-specific programming interface to OpenCL. The paper explains the characteristics of the OpenCL interface, describing in detail the basic structures used in a program. Moreover, it discusses performance aspects, evaluating advanced programming techniques that improve the performance of OpenCL applications.
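    To give a flavor of the interface the tutorial covers, the following is a minimal host program in C for a vector-addition kernel. It is a sketch of our own, not code from the paper: it simply takes the first available platform and device, and most error checking is omitted for brevity.

    /* Minimal OpenCL vector addition: a sketch of the host-side workflow
     * (platform -> device -> context -> queue -> program -> kernel -> buffers
     * -> enqueue -> read back). Error checks are mostly omitted for brevity. */
    #include <stdio.h>
    #include <CL/cl.h>

    static const char *src =
        "__kernel void vadd(__global const float *a,\n"
        "                   __global const float *b,\n"
        "                   __global float *c) {\n"
        "    size_t i = get_global_id(0);\n"
        "    c[i] = a[i] + b[i];\n"
        "}\n";

    int main(void) {
        enum { N = 1024 };
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        cl_platform_id plat;  clGetPlatformIDs(1, &plat, NULL);
        cl_device_id dev;     clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "vadd", NULL);

        cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
        cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
        cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

        clSetKernelArg(k, 0, sizeof da, &da);
        clSetKernelArg(k, 1, sizeof db, &db);
        clSetKernelArg(k, 2, sizeof dc, &dc);

        size_t global = N;                       /* one work-item per element */
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

        printf("c[10] = %g (expected 30)\n", c[10]);  /* 10 + 20 */

        clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
        clReleaseKernel(k); clReleaseProgram(prog);
        clReleaseCommandQueue(q); clReleaseContext(ctx);
        return 0;
    }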

    Download PDF (285K)
  • All about RICC: RIKEN Integrated Cluster of Clusters
    Maho Nakata
    2012 Volume 2 Issue 2 Pages 206-215
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    This is an introduction to RIKEN’s supercomputer, the RIKEN Integrated Cluster of Clusters (RICC), which has been in operation since August 2009. The basic concept of the RICC is to “provide an environment with high-power computational resources to facilitate research and development for RIKEN’s researchers”. Based on this concept, we have been operating the RICC system as (i) a data analysis environment for experimental researchers, (ii) a development environment targeting the next-generation supercomputer, i.e., the “K” computer, and (iii) GPU (graphics processing unit) computers for exploring challenges in developing a future computing environment. The total performance of the RICC is 97.94 TFlops, ranking it 125th on the Top500 list of November 2011. We prepared four job-class accounts, granted based on researchers’ proposals evaluated by our Review Committee. We also provided support services to RIKEN’s researchers, such as RICC training classes, software installation services, and speed-up and visualization support. To encourage active participation and proactive initiative, all services were free of charge; however, access to the RICC was limited to researchers and collaborators of RIKEN. As a result, the RICC has maintained a high activity ratio (> 90%) since the beginning of its operation.

    Download PDF (1884K)
Special Issue on Selected Papers from the Second International Conference on Networking and Computing
  • Yasuaki Ito, Sayaka Kamei
    2012 Volume 2 Issue 2 Pages 216
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    The Second International Conference on Networking and Computing (ICNC), held on November 30-December 2, 2011, in Osaka, Japan, aimed to provide a timely forum for the exchange and discussion of the latest research findings in all aspects of networking and computing, including parallel and distributed systems, architectures, and applications.

    In addition, five workshops were held in conjunction with ICNC: the 3rd Workshop on Ultra Performance and Dependable Acceleration Systems (UPDAS), the 3rd International Workshop on Parallel and Distributed Algorithms and Applications (PDAA), the 2nd International Workshop on Advances in Networking and Computing (WANC), the International Workshop on Challenges on Massively Parallel Processors (CMPP), and the International Workshop on Networking, Computing, Systems, and Software (NCSS).

    The program committee encouraged the authors of selected papers, including those from the workshops, to submit full versions of their manuscripts to the International Journal of Networking and Computing (IJNC) after the conference. After a thorough reviewing process with extensive discussions, four articles on various topics were selected for publication in the IJNC special issue on ICNC.

    On behalf of the ICNC, we would like to express our appreciation for the great efforts of the reviewers who reviewed the papers submitted to this special issue. Likewise, we thank all the authors for submitting their excellent manuscripts. We also express our sincere thanks to the editorial board of the International Journal of Networking and Computing, in particular to the Editor-in-Chief, Professor Koji Nakano. This special issue would not have been possible without his support.

    Download PDF (17K)
  • Takayuki Murakawa, Akihiro Fujiwara
    2012 Volume 2 Issue 2 Pages 217-233
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    In the present paper, we consider asynchronous parallelism in membrane computing and propose asynchronous P systems that perform two basic arithmetic operations and factorization. Since no restrictive assumption is made on the application of rules, both sequential and maximally parallel executions are allowed in an asynchronous P system.

    We first propose a P system that computes the addition of two binary numbers of m bits. The P system works in O(m) sequential and parallel steps using O(m) types of objects. We next propose a P system for multiplication of two binary numbers of m bits, and show that the P system works in O(m log m) parallel steps or O(m^3) sequential steps using O(m^2) types of objects. Finally, we propose a P system for factorization of a positive integer of m bits using the above P system as a sub-system. The P system computes the factorization in O(m log m) parallel steps or O(4^m × m^2 log m) sequential steps using O(m^2) types of objects.
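    As a rough intuition for the O(m) bound on addition (this is our own toy analogy, not the authors' P system), the computation resembles ripple-carry addition: each bit position can be resolved by one rule application that consumes the two input-bit objects and an incoming carry object, so the carry propagates through at most m positions.

    /* Toy illustration (not the authors' P system): ripple-carry addition of
     * two m-bit binary numbers. Each loop iteration corresponds roughly to
     * one rule application consuming bit objects a_i, b_i and a carry object,
     * which is why O(m) steps suffice for m-bit addition. */
    #include <stdio.h>

    #define M 8  /* number of bits */

    static void add_bits(const int a[M], const int b[M], int sum[M + 1]) {
        int carry = 0;
        for (int i = 0; i < M; i++) {          /* one "step" per bit position */
            int s = a[i] + b[i] + carry;
            sum[i] = s & 1;                    /* emit sum-bit object s_i       */
            carry = s >> 1;                    /* pass carry object to step i+1 */
        }
        sum[M] = carry;
    }

    int main(void) {
        /* Least-significant bit first: a = 90, b = 53. */
        int a[M] = {0,1,0,1,1,0,1,0};
        int b[M] = {1,0,1,0,1,1,0,0};
        int sum[M + 1];
        add_bits(a, b, sum);
        printf("sum bits (LSB first): ");
        for (int i = 0; i <= M; i++) printf("%d", sum[i]);
        printf("\n");  /* 90 + 53 = 143, printed LSB first: 111100010 */
        return 0;
    }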

    Download PDF (218K)
  • Yamin Li, Shietung Peng, Wanming Chu
    2012 Volume 2 Issue 2 Pages 234-250
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    In this paper, we propose a flexible interconnection network, called the hierarchical dual-net (HDN), with low node degree and short diameter for constructing large-scale supercomputers. The HDN is constructed based on a symmetric product graph (the base network). A k-level hierarchical dual-net, HDN(B, k, S), contains (2N_0)^(2^k) / (2 × ∏_{i=1}^{k} s_i) nodes, where S = {G'_1, G'_2, ..., G'_k}, G'_i is a super-node, s_i = |G'_i| is the number of nodes in a super-node at level i for 1 ≤ i ≤ k, and N_0 is the number of nodes in the base network B. The node degree of HDN(B, k, S) is d_0 + k, where d_0 is the node degree of the base network. The HDN structure is better than existing networks such as the hypercube and the 2D/3D torus with respect to degree and diameter. Another benefit of the HDN is that we can select suitable super-nodes to control the rate at which the number of nodes grows, so as to construct a supercomputer of the desired scale. We investigate the topological properties of the HDN, compare them to those of other networks, and give efficient routing and broadcasting algorithms for the hierarchical dual-net.
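    The node-count formula above is easy to evaluate numerically. The following short C function is our own illustration; the base-network size and super-node sizes are made-up example values. It computes the number of nodes of HDN(B, k, S) directly from the formula quoted in the abstract; note that k = 0 reduces to the base network itself, since (2N_0)^1 / 2 = N_0.

    /* Node count of HDN(B, k, S) per the formula quoted in the abstract:
     *   nodes = (2*N0)^(2^k) / (2 * s_1 * s_2 * ... * s_k).
     * The base network and super-node sizes below are arbitrary examples. */
    #include <stdio.h>
    #include <math.h>

    static double hdn_nodes(double n0, int k, const double s[]) {
        double denom = 2.0;
        for (int i = 0; i < k; i++) denom *= s[i];
        return pow(2.0 * n0, pow(2.0, k)) / denom;
    }

    int main(void) {
        double s[] = {4.0, 16.0};  /* |G'_1| = 4, |G'_2| = 16 (made up)   */
        double n0 = 16.0;          /* e.g., a 16-node base network        */
        for (int k = 0; k <= 2; k++)  /* k = 0 gives back N_0 = 16 nodes  */
            printf("k = %d: about %.4g nodes\n", k, hdn_nodes(n0, k, s));
        return 0;
    }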

    Download PDF (245K)
  • Naoki Nishikawa, Keisuke Iwai, Takakazu Kurokawa
    2012 Volume 2 Issue 2 Pages 251-268
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    As data protection with encryption becomes more important day by day, encryption processing using general-purpose computation on graphics processing units (GPGPU) has attracted attention as one method of realizing high-speed data protection. GPUs have evolved in recent years into powerful parallel computing devices with a high cost-performance ratio. However, many factors affect GPU performance. In earlier work on obtaining higher AES performance using GPGPU in various ways, we arrived at the following two technical findings: (1) a granularity of 16 bytes/thread is best; (2) the best memory allocation style stores the extended key and substitution table in shared memory and the plaintext in registers.

    However, AES is not the only cipher algorithm widely used in the real world. For this reason, this study tests the hypothesis that these two findings are applicable to implementations of other symmetric block ciphers on two generations of GPUs. We targeted five 128-bit symmetric block ciphers, AES, Camellia, CIPHERUNICORN-A, Hierocrypt-3, and SC2000, from the e-government recommended ciphers list of the CRYPTography Research and Evaluation Committees (CRYPTREC) in Japan. We evaluated the performance of these five symmetric block ciphers on a machine containing a 4-core CPU and each GPU, using three methods: (A) throughput without data transfer, (B) throughput with data transfer and encryption processing on the GPU overlapped, and (C) throughput with data transfer and encryption processing on the GPU not overlapped. The results demonstrate that the SC2000 implementation under method (A) on a Tesla C2050 achieved an extremely high throughput of 73.4 Gbps, while the throughputs obtained using methods (B) and (C) fell to 33.4 Gbps and 18.3 Gbps, respectively. Method (B) achieved an effective throughput approximately 4.7 times higher than that obtained using 8 threads on the 4-core CPU.
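    A back-of-the-envelope model helps to see why method (B) outperforms method (C). The following C snippet is our own illustration, not the paper's measurement methodology: it treats the pipeline as two stages, computation and PCIe transfer, where a serial pipeline (method C) pays the sum of both per-bit costs while a perfectly overlapped one (method B) is limited only by the slower stage. The copy rate used below is a made-up value; real measurements fall between such bounds, e.g., because host-to-device and device-to-host copies may also overlap each other.

    /* Two-stage pipeline model (ours, not from the paper) of overlapped vs.
     * serial GPU encryption. Rc = compute rate, Rt = copy rate, in Gbps. */
    #include <stdio.h>

    int main(void) {
        double rc = 73.4;  /* compute-only throughput, Gbps (method A, SC2000) */
        double rt = 40.0;  /* assumed PCIe copy throughput, Gbps (made up)     */
        double serial     = 1.0 / (1.0 / rc + 1.0 / rt);  /* method C model */
        double overlapped = rc < rt ? rc : rt;            /* method B model */
        printf("serial     (C-like): %.1f Gbps\n", serial);
        printf("overlapped (B-like): %.1f Gbps\n", overlapped);
        return 0;
    }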

    Download PDF (615K)
  • Md. Nazrul Islam Mondal, Koji Nakano, Yasuaki Ito
    2012 Volume 2 Issue 2 Pages 269-290
    Published: 2012
    Released on J-STAGE: March 23, 2017
    JOURNAL FREE ACCESS

    Field Programmable Gate Arrays (FPGAs) are a dominant implementation medium for digital circuits, allowing users to embed a circuit of their own design instantly, and they can be used to implement parallel and hardware algorithms. Circuit design that minimizes the number of clock cycles is easy if we can use asynchronous read operations. However, most FPGAs support synchronous read operations but not asynchronous ones. The main contribution of this paper is to provide a potent approach to resolving this problem. We assume that a circuit using asynchronous ROMs is given. In our previous work, we presented a circuit rewriting algorithm that converts a circuit with asynchronous ROMs into an equivalent circuit with synchronous ones; the resulting circuit with synchronous ROMs can be embedded into FPGAs. However, that algorithm can handle only circuits represented by a directed acyclic graph and does not work for those with cycles. In this paper, we succeed in relaxing the cycle-free condition. More specifically, we present an algorithm that automatically converts a circuit with cycles using asynchronous ROMs into an equivalent circuit using synchronous ROMs.
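    The gap that the rewriting algorithm bridges can be seen in a small cycle-accurate model. The following C sketch is our own conceptual illustration, not the authors' algorithm: an asynchronous ROM returns data in the same cycle in which the address is presented, whereas a synchronous ROM latches the address and delivers the data one clock cycle later, so a circuit built around same-cycle reads must be retimed when its ROMs become synchronous.

    /* Conceptual model (ours, not the paper's algorithm) of the one-cycle
     * latency gap between asynchronous and synchronous ROM reads. */
    #include <stdio.h>

    #define ROM_SIZE 4
    static const int rom[ROM_SIZE] = {10, 20, 30, 40};

    /* Asynchronous read: combinational, data valid in the same cycle. */
    static int rom_read_async(int addr) {
        return rom[addr];
    }

    /* Synchronous read: the address is registered; data appears next cycle. */
    static int rom_read_sync(int addr, int *addr_reg) {
        int data = rom[*addr_reg];  /* data for the address latched last cycle */
        *addr_reg = addr;           /* latch the new address on the clock edge */
        return data;
    }

    int main(void) {
        int addr_reg = 3;  /* arbitrary power-on value of the address register */
        for (int cycle = 0; cycle < ROM_SIZE; cycle++) {
            printf("cycle %d: async=%d sync=%d\n",
                   cycle, rom_read_async(cycle),
                   rom_read_sync(cycle, &addr_reg));
        }
        /* The sync column is the async column delayed by one cycle,
           illustrating the latency the circuit rewriting must absorb. */
        return 0;
    }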

    Download PDF (165K)