The Third International Conference on Networking and Computing (ICNC), held on December 5-7, 2012, in Okinawa, Japan, aimed to provide a timely forum for the exchange and discussion of the latest research findings in all aspects of networking and computing, including parallel and distributed systems, architectures, and applications.
In addition, four workshops were held in conjunction with ICNC: the 4th International Workshop on Parallel and Distributed Algorithms and Applications (PDAA), the 3rd International Workshop on Advances in Networking and Computing (WANC), the 2nd International Workshop on Challenges on Massively Parallel Processors (CMPP), and the 2nd International Workshop on Networking, Computing, Systems, and Software (NCSS).
The program committee encouraged the authors of selected papers, including those from the workshops, to submit full versions of their manuscripts to the International Journal of Networking and Computing (IJNC) after the conference. After a thorough review process with extensive discussions, six articles on various topics were selected for publication in this IJNC special issue on ICNC.
On behalf of ICNC, we would like to express our appreciation for the great efforts of the reviewers who reviewed the papers submitted to this special issue. Likewise, we thank all the authors for submitting their excellent manuscripts. We also express our sincere thanks to the editorial board of the International Journal of Networking and Computing, in particular to the Editor-in-Chief, Professor Koji Nakano. This special issue would not have been possible without his support.
Consider the problem of routing from a single source node to multiple target nodes along disjoint paths, with the additional condition that these paths be shortest. This problem is harder than standard one-to-many routing in that such paths do not always exist. Various necessary and sufficient conditions have been found that determine when such paths exist for some interconnection networks, and when these conditions hold, the problem of finding such paths can be reduced to that of finding a disjoint ordering of sets. Beyond its application to finding disjoint shortest paths in interconnection networks, the problem of finding a disjoint ordering of sets is an interesting combinatorial problem in its own right. We study the problem of finding a disjoint ordering of sets A1, A2, ..., Am, where Ai ⊆ A = {a1, a2, ..., an} and m ≤ n. We present an O(n³) algorithm for doing so, under certain conditions, improving on the previously known O(n⁴) algorithm and, consequently, on the corresponding one-to-many routing algorithms for finding disjoint shortest paths.
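The reduction above can be made concrete with a small brute-force sketch. The precise definition of a disjoint ordering is the one in the paper; here we assume the hypercube-routing interpretation, in which the prefix sets of an ordering correspond to the intermediate nodes of a path, so orderings are disjoint when no two of them share a prefix set. All function names and the exponential search are illustrative only; the paper's algorithm achieves O(n³) under its stated conditions.

```python
from itertools import permutations, product

def prefix_sets(ordering):
    """All prefixes of an ordering, viewed as sets (one per length >= 1)."""
    return {frozenset(ordering[:i]) for i in range(1, len(ordering) + 1)}

def is_disjoint_ordering(orderings):
    """Assumed definition: no two orderings may share a prefix set.
    In the hypercube view, a shared prefix set is a shared path node."""
    seen = set()
    for ordering in orderings:
        for p in prefix_sets(ordering):
            if p in seen:
                return False
            seen.add(p)
    return True

def find_disjoint_ordering(sets):
    """Exponential brute force over all orderings of each set;
    illustrative only, not the paper's polynomial algorithm."""
    for combo in product(*(permutations(s) for s in sets)):
        if is_disjoint_ordering(combo):
            return [list(o) for o in combo]
    return None
```

For example, `find_disjoint_ordering([{1, 2}, {2, 3}])` succeeds (the orderings [1, 2] and [2, 3] have prefix sets {1}, {1, 2} and {2}, {2, 3}), while `find_disjoint_ordering([{1}, {1}])` returns `None`, since both orderings would end at the same set {1}.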
Increased energy consumption in processors, caused by performance enhancement, has recently become a critical problem. Many current processors employ dynamic voltage and frequency scaling (DVFS), which dynamically lowers the supply voltage and clock frequency to reduce energy consumption. However, it is difficult to deliver fine-grain energy optimization using DVFS: since the voltage regulator takes a long time to scale the voltage and charging/discharging a power line incurs a large energy overhead, the useful interval of DVFS is limited to coarse grain. To optimize energy consumption at a fine-grain interval, we have proposed a variable stages pipeline (VSP) processor. VSP reduces energy consumption by dynamically varying the pipeline depth to a depth suited to the behavior of the running program. VSP can obtain finer-grained energy reduction than DVFS because pipeline scaling requires only a small overhead. In this paper, we fabricate a VSP processor chip in 180 nm technology and evaluate its energy consumption. We show that the fabricated VSP chip dynamically varies the pipeline depth while a program is running and reduces energy consumption at shorter intervals than DVFS. We also analyze how to optimize energy consumption according to system demand. Our analysis shows that VSP can adjust the energy consumption in the same manner for diverse program phases.
We propose a constant-time algorithm for approximating the weight of the maximum weight branching in the general graph model. A directed graph is called a branching if it is acyclic and each vertex has at most one incoming edge. An edge-weighted digraph G of average degree d, whose weights are real values in [0, 1], is given via oracle access, and we are allowed to ask for the degree and incoming edges of any vertex through the oracle. Then, with high probability, our algorithm estimates the weight of the maximum weight branching in G with an absolute error of at most εn with query complexity O(d/ε³), where n is the number of vertices. We also show a lower bound of Ω(d/ε²). Additionally, our algorithm can be modified to run with query complexity O(1/ε⁴) for unweighted digraphs, i.e., it runs in time independent of the input size even for dense digraphs with d = Ω(n). In contrast, we show that Ω(n) queries are required to approximate the weight of the minimum (or maximum) spanning arborescence of a weighted digraph.
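As a sketch of the object being approximated (not of the sublinear-time algorithm itself), the following Python checks the branching property exactly as defined above and computes the exact maximum weight branching by exhaustive search; all names and the exponential baseline are our illustrative choices.

```python
from itertools import combinations

def is_branching(n, edges):
    """A branching is acyclic and every vertex has at most one incoming
    edge. edges is a list of (u, v, w) triples for directed edges u -> v."""
    indeg = [0] * n
    out = [[] for _ in range(n)]
    for u, v, _ in edges:
        indeg[v] += 1
        out[u].append(v)
    if any(d > 1 for d in indeg):
        return False
    # Kahn's algorithm: the graph is acyclic iff every vertex is processed.
    deg, stack, seen = indeg[:], [v for v in range(n) if indeg[v] == 0], 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in out[u]:
            deg[v] -= 1
            if deg[v] == 0:
                stack.append(v)
    return seen == n

def max_weight_branching(n, edges):
    """Exact but exponential baseline; the paper instead *estimates* this
    weight to within an absolute error of εn using few oracle queries."""
    best = 0.0
    for r in range(len(edges) + 1):
        for sub in combinations(edges, r):
            if is_branching(n, list(sub)):
                best = max(best, sum(w for _, _, w in sub))
    return best
```

On the 3-cycle with edges (0, 1, 0.5), (1, 2, 0.4), (2, 0, 0.3), any two edges form a branching but all three close a cycle, so the maximum weight branching has weight 0.9.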
Systems-on-Chip (SoC) architectures have been shifting from single-core to multi-core solutions, and they are at present evolving towards many-core ones. Network-on-Chip (NoC) is considered a promising interconnection scheme for many-core SoCs, since it offers better scalability than traditional bus-based interconnection. In this work, we have developed a fast simulator of NoC architectures using QEMU and SystemC. QEMU is an open-source CPU emulator widely used in many simulation platforms, such as the Android Emulator. In the proposed simulator, each CPU core is emulated by a QEMU instance, and the network part, including the NoC routers, is modeled in SystemC. The SystemC simulator and the QEMU instances are connected by TCP sockets on a host computer. Our simulator is fast because the QEMU instances run in parallel on a multi-core host computer, or even on multiple host computers. It is also highly retargetable, because QEMU provides a variety of CPU models and we use QEMU as is. In our experiments, the simulator successfully simulates a 108-core NoC in a practical time. We have also confirmed the scalability and retargetability of our NoC simulator.
With the rapid growth of GPS-enabled mobile devices, location-based online social network services have become very popular and allow their users to share life experiences together with location information. In this paper, we consider a method for recommending places to a user based on the spatial databases of location-based online social network services. We use a user-based collaborative filtering method to produce a set of recommended places. In the proposed method, we calculate the similarity of users’ check-in activities based not only on their positions but also on their semantics, such as “shopping”, “eating”, “drinking”, and so forth. We empirically evaluated our method on a real database and found that it outperforms naive collaborative filtering based on singular value decomposition in recommendation accuracy.
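The general shape of such a user-based scheme can be sketched as follows. The category-count profiles, the cosine similarity measure, and all identifiers are our illustrative choices, not necessarily the exact formulation in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(target, profiles, visits, top_k=2):
    """User-based collaborative filtering: score each candidate place by
    the similarity-weighted check-ins of other users, excluding places
    the target user has already visited."""
    scores = {}
    for user, prof in profiles.items():
        if user == target:
            continue
        sim = cosine(profiles[target], prof)
        for place in visits[user]:
            if place not in visits[target]:
                scores[place] = scores.get(place, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For instance, a user whose check-in semantics closely match the target's (say, mostly “shopping” and “eating”) contributes with weight close to 1, so the places that user visited rise to the top of the target's recommendation list.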
The existing k-dominant skyline solutions are restricted to centralized query processors, limiting scalability and imposing a single point of failure. To overcome these problems, in this paper we propose computation and maintenance algorithms for spatial k-dominant skyline query processing in a large-scale distributed environment, where the underlying dataset is partitioned across geographically distant computing cores (personal computers) connected to a coordinator (server). Our techniques preserve the spatial k-dominant computation object itself in serialized form; this preservation is done on the client's core after a computational job completes successfully. When maintenance is required, the preserved data object is retrieved and reused for computation. This procedure eliminates the need for intermediate re-sending and re-computation of the k-dominant skyline during maintenance. Thus, we quantify the gain of transferring data consecutively to different cores so as to maximize the overall gain of the query while balancing the load across the cores fairly. An extensive performance study shows that the proposed algorithms are efficient and robust to different data distributions.
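For concreteness, the k-dominant skyline itself can be computed naively as below. We assume the standard definition (smaller values are better): a point p k-dominates q when p is no worse than q in at least k dimensions and strictly better in at least one of those; the distributed partitioning and maintenance machinery of the paper is not reflected in this centralized sketch.

```python
def k_dominates(p, q, k):
    """p k-dominates q (minimisation convention): p is no worse than q in
    at least k dimensions and strictly better in at least one of them."""
    le = sum(1 for a, b in zip(p, q) if a <= b)
    lt = sum(1 for a, b in zip(p, q) if a < b)
    # Any strict dimension is also a <= dimension, so when le >= k we can
    # always pick k no-worse dimensions that include a strict one.
    return le >= k and lt >= 1

def k_dominant_skyline(points, k):
    """All points not k-dominated by any other point (quadratic scan)."""
    return [p for p in points
            if not any(k_dominates(q, p, k) for q in points if q != p)]
```

With k equal to the full dimensionality this reduces to the ordinary skyline; smaller k prunes more aggressively, since a point needs to withstand domination on every k-subset of dimensions.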
The availability and utility of large numbers of Graphics Processing Units (GPUs) have enabled parallel computations using extensive multi-threading. Sequential access to global memory and contention at the size-limited shared memory have been the main impediments to fully exploiting potential performance in architectures with a massive number of GPUs. After an extensive study of data structures and a complexity analysis of various data access methodologies, we propose novel memory storage and retrieval techniques that enable parallel graph computations to overcome the above issues. More specifically, given a graph G = (V, E) and an integer k ≤ |V|, we provide both storage techniques and algorithms to count the number of: (a) connected subgraphs of size k; (b) k-cliques; and (c) k-independent sets, all of which can be exponential in number. Our storage techniques are based on creating a breadth-first search (BFS) tree and storing it, along with the non-tree edges, in a novel way. Our experiments solve the above problems using both naïve and advanced data structures on the CPU and the GPU. Speedup is achieved on the GPU, even with a brute-force approach, compared to the CPU implementations. By exploiting the properties of the BFS tree, the performance gain on the GPU increases further, ultimately outperforming the CPU by a factor of at least 5 for graphs that fit entirely in shared memory and by a factor of 10 for larger graphs stored in global memory. The counting problems mentioned above have many uses, including the analysis of social networks.
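The three counting problems share a straightforward CPU baseline that enumerates all C(|V|, k) vertex subsets; the paper's GPU implementations and BFS-tree storage accelerate exactly this kind of computation. The sketch below is our illustrative baseline, not the paper's data structure.

```python
from itertools import combinations

def count_patterns(vertices, edges, k):
    """Brute-force baseline: count k-cliques, k-independent sets and
    connected induced k-subgraphs by enumerating every k-vertex subset."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cliques = indep = connected = 0
    for sub in combinations(vertices, k):
        s = set(sub)
        pairs = list(combinations(sub, 2))
        if all(v in adj[u] for u, v in pairs):
            cliques += 1
        if all(v not in adj[u] for u, v in pairs):
            indep += 1
        # Connectivity of the induced subgraph via DFS from one vertex.
        stack, seen = [sub[0]], {sub[0]}
        while stack:
            u = stack.pop()
            for w in (adj[u] & s) - seen:
                seen.add(w)
                stack.append(w)
        if seen == s:
            connected += 1
    return cliques, indep, connected
```

On the 4-cycle, for example, every 3-vertex subset induces a connected path but no triangle and no independent set, so the counts for k = 3 are (0, 0, 4).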
In the aerospace industry, computational fluid dynamics (CFD) is used as a common design tool. Fast Aerodynamics Routines (FaSTAR) is one of the most recent CFD software packages, offering users various solvers and automatic generation of grid data. The problem with FaSTAR is that it is hard to execute on parallel machines because of its irregular and unpredictable data structures. Exploiting the advantages of reconfigurable hardware to make up for the inadequacies of existing high-performance computers has gradually become a trend. However, a single FPGA is not enough for the FaSTAR package because the whole module is very large. Instead of using a large number of chips, the partial reconfiguration capability available in recent FPGAs is explored for this application. The advection term computation module in FaSTAR is chosen as the target subroutine. We propose a reconfigurable flux calculation scheme that uses partial reconfiguration to save hardware resources and fit the design into a single FPGA. We developed a flux computation module, and three flux calculation schemes are implemented as reconfigurable modules. This implementation saves up to 42% of hardware resources and improves the configuration speed by a factor of 6.28. Performance evaluation also shows a 2.65-fold acceleration compared to an Intel Core 2 Duo at 2.4 GHz.