-
Lei Ma, Cyrille Artho, Hiroyuki Sato
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 1-13
Published: 2014
Released on J-STAGE: January 08, 2014
Despite the importance of distributed applications today, their verification and analysis remain challenging: they involve large combinatorial state spaces, interactive network communication between peers, and concurrency. Although dynamic analysis tools exist for analyzing the runtime behavior of a single-process application, they provide no means to analyze a distributed application as a whole, where multiple processes run simultaneously. Centralization is a general solution that transforms a multi-process application into a single-process one that can be analyzed directly by existing tools. In this paper, we improve the accuracy of centralization and extend it into a general framework for analyzing distributed applications with multiple versions. We first formalize the version conflict problem and present a simple solution, and further propose an optimized solution for resolving class version conflicts during centralization. Our techniques enable sharing common code whenever possible while keeping the version space of each component application separate. We also improve and discuss centralization issues such as startup semantics and static field transformation. We implement our centralization tool and apply it to several network benchmarks. Experiments in which existing tools are run on the centralized application demonstrate the usefulness of our automatic centralization tool, showing that centralization enables these tools to analyze distributed applications with multiple versions.
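The transformation at the heart of centralization can be sketched in a few lines: the entry points of the component processes are launched as threads of one process, so that a single-process analysis tool observes the whole application at once. The sketch below is a minimal Python analogy with made-up server_main/client_main entry points; the paper's tool performs the corresponding transformation on Java bytecode and additionally resolves class version conflicts.

```python
import threading

# Hypothetical entry points of two component applications that would
# normally run as separate processes.
results = {}

def server_main():
    results["server"] = "listening"

def client_main():
    results["client"] = "connected"

# Centralization: run both former processes as threads of one process,
# so a single-process dynamic analysis tool can observe them together.
threads = [threading.Thread(target=server_main),
           threading.Thread(target=client_main)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```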
-
Kazuyuki Hara, Kentaro Katahira
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 14-19
Published: 2014
Released on J-STAGE: January 08, 2014
In on-line gradient descent learning, the locality of the derivative term of the output function can cause slow convergence. Improving the derivative term, for example by using the natural gradient, has been proposed for speeding up convergence. Besides such sophisticated methods, we propose an algorithm that replaces the derivative term with a constant, and show that this greatly increases the convergence speed when the learning step size is less than 2.7, which is near the optimal learning step size. The proposed algorithm is inspired by linear perceptron learning and avoids the locality of the derivative term. We derive closed deterministic differential equations using a statistical-mechanics method and show the validity of the theoretical results by comparing them with computer simulations. In real problems, the optimal learning step size is not given in advance, so the learning step size must be kept small; the proposed method is useful in this case.
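The proposed update rule is simple to state: in the on-line delta rule, the derivative factor g'(u) of the output function is replaced by a constant, as in linear perceptron learning. The toy teacher-student sketch below (dimension, step size, and initialization are made up) illustrates the rule; the paper's analysis is carried out analytically in the thermodynamic limit.

```python
import math
import random

random.seed(0)
N = 20
eta = 0.5      # learning step size, below the ~2.7 threshold discussed above

def g(u):
    # sigmoidal output function (error-function nonlinearity)
    return math.erf(u / math.sqrt(2))

B = [1.0 / math.sqrt(N)] * N                  # teacher weights, |B| = 1
J = [random.gauss(0, 0.1) for _ in range(N)]  # student weights

def overlap(J):
    return sum(b * j for b, j in zip(B, J))

R_init = overlap(J)
for _ in range(3000):
    x = [random.gauss(0, 1) for _ in range(N)]
    err = g(sum(b * v for b, v in zip(B, x))) - g(sum(j * v for j, v in zip(J, x)))
    # the derivative term g'(J.x) is replaced by the constant 1
    J = [j + (eta / N) * err * v for j, v in zip(J, x)]
R_final = overlap(J)
```

With the constant in place of the derivative, the error signal never vanishes in the flat regions of g, which is exactly the locality problem the paper avoids.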
-
Satoru Tokuda, Kenji Nagata, Masato Okada
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 20-26
Published: 2014
Released on J-STAGE: January 08, 2014
The radial basis function (RBF) network is a regression model that uses a sum of radial basis functions such as Gaussian functions. It has recently been widely applied to spectral deconvolution, such as X-ray photoelectron spectroscopy data analysis, which enables us to estimate the electronic state of matter from spectral peak positions. For models with a hierarchy, such as the RBF network, Bayesian learning provides better generalization performance than maximum likelihood estimation. In Bayesian learning, the learning coefficient is known as the coefficient of the leading terms in the asymptotic expansions of the generalization error and stochastic complexity. However, this coefficient has not been clarified for most models. We propose here a novel method for calculating the learning coefficient by using the exchange Monte Carlo method. In addition, we calculate the learning coefficient for RBF networks and verify the efficiency of the proposed method by comparing theoretical and experimental values.
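For reference, the regression model itself is easy to write down: an RBF network is a sum of Gaussian basis functions, each of which models one spectral peak in the deconvolution setting. A minimal sketch with made-up peak parameters:

```python
import math

def rbf(x, params, s=1.0):
    # y(x) = sum_k a_k * exp(-(x - mu_k)^2 / (2 s^2))
    return sum(a * math.exp(-(x - mu) ** 2 / (2 * s ** 2)) for a, mu in params)

peaks = [(2.0, -1.0), (1.0, 3.0)]   # hypothetical (amplitude, position) pairs
y = rbf(-1.0, peaks)                # dominated by the peak centered at -1.0
```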
-
Ryota Miyata, Toru Aonishi, Jun Tsuzurugi, Koji Kurata
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 27-32
Published: 2014
Released on J-STAGE: January 08, 2014
Many associative memory models with synaptic decay, such as the forgetting model and the zero-order decay model, have been proposed and studied. Previous studies showed the relation between the storage capacity C and the synaptic decay coefficient α in each synaptic decay model. However, with a few exceptions, they did not compare the retrieval performance of networks with different synaptic decay models. We formulate an associative memory model with β-th-order synaptic decay as an extension of the zero-order decay model. The parameter β denotes the synaptic decay order, i.e., the degree of the synaptic decay term, which enables us to compare the retrieval performance between different synaptic decay models. Using numerical simulations, we investigate the relation between the synaptic decay coefficient α and the storage capacity C of the network by varying the synaptic decay order β. The results show that the properties of the synaptic decay model are constant for a large decay order β. Moreover, we search for the minimum β that avoids overloading and the optimal β that maximizes retrieval performance. The minimum integer value of β that avoids overloading is -1. The optimal integer value of β that maximizes retrieval performance is 1, i.e., the degree of the forgetting model, and the suboptimal integer β is 0, i.e., that of the zero-order synaptic decay model.
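The forgetting model (the β = 1 case singled out above) can be sketched directly: before each Hebbian increment, every synapse decays multiplicatively, so old patterns fade while recent ones stay retrievable and overloading is avoided. A small Python sketch with made-up network size and decay coefficient:

```python
import random

random.seed(1)
N = 100        # neurons (hypothetical size)
alpha = 0.05   # synaptic decay coefficient (hypothetical value)
J = [[0.0] * N for _ in range(N)]

def store(xi):
    # Forgetting model: synapses decay multiplicatively, then receive the
    # usual Hebbian increment for the new pattern.
    for i in range(N):
        for j in range(N):
            if i != j:
                J[i][j] = (1.0 - alpha) * J[i][j] + xi[i] * xi[j] / N

def recall_overlap(xi):
    # one synchronous update starting from the stored pattern itself
    s = [1 if sum(J[i][j] * xi[j] for j in range(N)) >= 0 else -1
         for i in range(N)]
    return sum(si * pi for si, pi in zip(s, xi)) / N

patterns = [[random.choice([-1, 1]) for _ in range(N)] for _ in range(30)]
for p in patterns:
    store(p)

recent = recall_overlap(patterns[-1])   # stored last: still retrievable
old = recall_overlap(patterns[0])       # stored first: has decayed
```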
-
Yu Liu, Kento Emoto, Kiminori Matsuzaki, Zhenjiang Hu
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 33-42
Published: 2014
Released on J-STAGE: January 23, 2014
The MapReduce programming model attracts a lot of enthusiasm in both industry and academia, largely because it simplifies the implementation of many data-parallel applications. Despite the simplicity of the programming model, many applications are hard to implement with MapReduce because of their inherent computational dependencies. In this paper we propose a new approach that uses the programming pattern accumulate over MapReduce to handle a large class of problems that cannot simply be divided into independent sub-computations. Using this accumulate pattern, many problems with computational dependencies can be easily expressed, and the resulting programs are transformed into MapReduce programs executed on large clusters. Users without much knowledge of MapReduce can write programs in a sequential manner and still obtain efficient and scalable MapReduce programs. We describe the programming interface of our accumulate framework and explain how a user-specified accumulate computation is transformed into an efficient MapReduce program. Our experiments and evaluations illustrate the usefulness and efficiency of the framework.
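The shape of an accumulative computation can be sketched as follows: each element is processed with an accumulation parameter threaded from left to right, so the result for one element depends on everything before it. The API and example below are a made-up illustration; the paper's framework turns such a specification into an equivalent MapReduce program.

```python
# A sequential "accumulate" specification with a left-to-right accumulator.
def accumulate(xs, acc0, step):
    acc, out = acc0, []
    for x in xs:
        y, acc = step(x, acc)
        if y is not None:
            out.append(y)
    return out

def keep_if_record(x, running_max):
    # keep x only if it exceeds everything seen so far: a computation that
    # cannot be split naively into independent sub-computations
    return (x if x > running_max else None), max(running_max, x)

records = accumulate([3, 1, 4, 1, 5, 9, 2, 6], float("-inf"), keep_if_record)
```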
-
Masaki Kasuya, Kenji Kono
Article type: Regular Papers
Subject area: Security
2014 Volume 7 Pages 43-51
Published: 2014
Released on J-STAGE: March 28, 2014
Fake antivirus (AV) software, a kind of malware, pretends to be a legitimate AV product and frightens computer users by showing fake security alerts, as if their computers were infected with malware. In addition, fake AV urges users to purchase a "commercial" version of the fake AV. In this paper, we search for an indicator that captures behavioral differences between legitimate AV and fake AV. The key insight behind our approach is that legitimate AV behaves differently in clean and infected environments, whereas fake AV behaves similarly in both environments, because it does not analyze malware in the infected environment. We investigated three potential indicators (file access pattern, CPU usage, and memory usage) and found that memory usage is an effective indicator for distinguishing legitimate AV from fake AV. In an experiment, this indicator identified all fake AV samples (39 out of 39) as fake and all legitimate AV products (8 out of 8) as legitimate. It is impractical for fake AV to evade this indicator, because doing so would require it to detect malware infections, just as legitimate AV does.
-
Shimpei Yotsukura, Toshiaki Omori, Kenji Nagata, Masato Okada
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 52-58
Published: 2014
Released on J-STAGE: March 31, 2014
The spike-triggered average (STA) and the phase response curve characterize the response properties of single neurons. A recent theoretical study proposed a method for estimating the phase response curve by means of linear regression with Fourier basis functions. In this study, we propose a method for estimating the STA by means of sparse linear regression with Fourier and polynomial basis functions. In the proposed method, we use sparse estimation with L1 regularization to extract the substantial basis functions for the STA. Using simulated data, we show that the proposed method estimates the STA more accurately than the simple trial average used in conventional methods.
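The estimation scheme can be sketched with a small Fourier-plus-polynomial dictionary and an iterative soft-thresholding (ISTA) solver for the L1-regularized regression. The data, basis choice, and hyperparameters below are made up for illustration; the paper applies the idea to spike-triggered data.

```python
import math

def basis(t):
    # polynomial part [1, t] plus Fourier part [sin t, cos t]
    return [1.0, t, math.sin(t), math.cos(t)]

ts = [i * 0.1 for i in range(63)]            # t in [0, 2*pi)
X = [basis(t) for t in ts]
y = [2.0 * math.sin(t) for t in ts]          # true signal uses one basis function

def soft(v, thr):
    return math.copysign(max(abs(v) - thr, 0.0), v)

w = [0.0] * 4
lam, step = 0.1, 0.01
for _ in range(5000):
    grad = [0.0] * 4
    for xi, yi in zip(X, y):
        err = sum(wk * xk for wk, xk in zip(w, xi)) - yi
        for k in range(4):
            grad[k] += 2.0 * err * xi[k] / len(X)
    # ISTA: gradient step followed by soft thresholding (the L1 prox),
    # which drives the coefficients of unused basis functions to zero
    w = [soft(wk - step * gk, step * lam) for wk, gk in zip(w, grad)]
```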
-
Yutaka Matsuno
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 59-68
Published: 2014
Released on J-STAGE: June 12, 2014
An assurance case is a documented body of evidence that provides a valid and convincing argument that a system is adequately dependable in a given application and environment. Assurance cases are widely required by regulation for safety-critical systems in the EU. Several graphical notations for assurance cases exist; GSN (Goal Structuring Notation) and CAE (Claim, Argument, Evidence) are two such notations. However, these notations have not been defined formally. This paper presents a formal definition of GSN and its pattern extensions. We take the framework of functional programming languages as the basis of our study. The implementation has been done on an Eclipse-based GSN editor. We report case studies on previous work about GSN and show the applicability of the design and implementation. This is a step toward developing an assurance case language.
-
Kosetsu Ikeda, Nobutaka Suzuki
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 69-81
Published: 2014
Released on J-STAGE: July 02, 2014
Suppose that we have a DTD and XML documents valid against the DTD, and consider writing an XPath query over the documents. Unfortunately, a user often does not understand the entire structure of the documents exactly, especially when the documents are very large and/or complex, or when the DTD has been updated without the user noticing. In such cases, the user tends to write an invalid XPath query, and it is difficult to correct the query by hand due to the user's lack of exact knowledge about the entire structure of the documents. In this paper, we propose an algorithm that finds, for an XPath query q, a DTD D, and a positive integer K, the top-K XPath queries syntactically closest to q among the XPath queries conforming to D, so that the user can select an appropriate query among the K queries. We also present some experimental studies.
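The ranking step can be sketched as follows, approximating syntactic closeness with plain Levenshtein distance over the query strings. The candidate list is made up; the paper's algorithm instead enumerates queries conforming to the DTD and measures closeness on the query structure.

```python
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[-1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def top_k(query, candidates, k):
    # rank candidate queries by syntactic closeness to the invalid query
    return sorted(candidates, key=lambda c: levenshtein(query, c))[:k]

q = "/paper/autor"    # invalid query: typo in an element name
valid = ["/paper/author", "/paper/title", "/paper/author/name"]
best = top_k(q, valid, 2)
```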
-
Satoshi Sugiyama, Yasuhiko Minamide
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 82-92
Published: 2014
Released on J-STAGE: July 16, 2014
Most implementations of regular expression matching in programming languages are based on backtracking. With this implementation strategy, matching may not run in linear time with respect to the length of the input; in the worst case, it may take exponential time. In this paper, we propose a method for checking whether or not regular expression matching runs in linear time. We construct a top-down tree transducer with regular lookahead that translates the input string into a tree corresponding to the execution steps of backtracking-based matching. The regular expression matching then runs in linear time if the tree transducer is of linear size increase. To check this property of the tree transducer, we apply a result of Engelfriet and Maneth. We implemented the method in OCaml and conducted experiments checking the time linearity of regular expressions appearing in several popular PHP programs. Our implementation showed that 47 of 393 regular expressions were not linear.
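The non-linear behavior being detected is easy to reproduce: an ambiguous pattern forces a backtracking matcher to explore exponentially many paths on a failing input, while an unambiguous one fails after a linear scan. A small demonstration using Python's re module (itself a backtracking matcher; the pattern is a standard illustrative example, not one from the PHP programs studied):

```python
import re
import time

# (a|a)* is ambiguous: on a failing match, every 'a' can be consumed by
# either branch, giving ~2^n backtracking paths. a* fails in linear time.
text = "a" * 20                     # no trailing 'b': both patterns fail

t0 = time.perf_counter()
assert re.fullmatch(r"a*b", text) is None
linear = time.perf_counter() - t0

t0 = time.perf_counter()
assert re.fullmatch(r"(a|a)*b", text) is None   # ~2^20 backtracking paths
exponential = time.perf_counter() - t0
```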
-
Shingo Okuno, Tasuku Hiraishi, Hiroshi Nakashima, Masahiro Yasugi, Jun ...
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 93-110
Published: 2014
Released on J-STAGE: July 16, 2014
This paper proposes a parallel algorithm that extracts, from a given graph in which each vertex is associated with an itemset, all connected subgraphs sharing a common itemset whose size is not less than a given threshold. We also propose implementations of this algorithm in the task-parallel language Tascell. This kind of graph mining can be applied to the analysis of social or biological networks. We previously proposed an efficient sequential search algorithm called COPINE for this problem. COPINE reduces the search space of a dynamically growing tree structure by pruning branches corresponding to the following subgraphs: those already visited, those having itemsets smaller than the threshold, and those having already-visited supergraphs with identical itemsets. For the third kind of pruning, we use a table associating already-visited subgraphs with their itemsets. To avoid excessive pruning in a parallel search, where a unique set of subtrees (tasks) is assigned to each worker, a certain restriction must be placed on a worker when it refers to a table entry registered by another worker. We designed a parallel algorithm as an extension of COPINE by introducing this restriction. An implementation problem is how workers can efficiently share table entries so that a worker can safely use as many entries registered by other workers as possible. We implemented two sharing methods: (1) a victim worker makes a copy of its own table and passes it to a thief worker when the victim spawns a task by dividing its own task and assigns it to the thief, and (2) a single table controlled by locks is shared among workers. We evaluated these implementations using a real protein network. The single-table implementation achieved a speedup of approximately a factor of four with 16 workers.
-
Takayuki Kawamura, Kiminori Matsuzaki
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 111-121
Published: 2014
Released on J-STAGE: July 16, 2014
Tree data such as XML trees have recently been getting larger and larger. Parallel and distributed processing is a promising way of dealing with big data, but the first step is to divide the data. Since computation over trees often requires relationships between parents and children and/or among siblings, we should pay attention to such relationships. The "m-bridge" technique divides trees, and m-bridges can easily be computed for trees of any shape. However, division with the m-bridge technique is sometimes unsatisfactory for shallow XML trees. In this study, we propose a tree division method for XML trees in which we apply the m-bridge technique to a one-to-one corresponding binary tree. We implement the tree division algorithm using the Simple API for XML (SAX) parser. An important feature of our algorithm is that we transform and divide XML trees in the order in which the SAX parser reads them. We carried out experiments and discuss the properties of the proposed tree division algorithm. In addition, we discuss how the divided trees can be used, with query examples.
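A one-to-one corresponding binary tree can be obtained with the classic left-child right-sibling encoding: the left child of a binary node is the first child of the original node, and the right child is its next sibling. The sketch below (made-up tree, plain tuples) shows this standard encoding; the paper constructs its binary tree incrementally while the SAX parser streams the document, which the sketch does not attempt.

```python
def forest_to_bin(forest):
    # Left-child right-sibling encoding: each node becomes
    # (label, first-child subtree, next-sibling subtree).
    if not forest:
        return None
    (label, children), *rest = forest
    return (label, forest_to_bin(children), forest_to_bin(rest))

tree = ("doc", [("a", []), ("b", [("c", [])]), ("d", [])])
bt = forest_to_bin([tree])
```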
-
Takashi Nakada, Kazuya Okamoto, Toshiya Komoda, Shinobu Miwa, Yohei Sa ...
Article type: Regular Papers
Subject area: Embedded System
2014 Volume 7 Pages 122-131
Published: 2014
Released on J-STAGE: August 21, 2014
Shifting to multi-core designs is a pervasive trend for overcoming the power wall and a necessary move for embedded systems in our rapidly evolving information society. Meanwhile, increasing battery life and reducing maintenance costs for such embedded systems is critical. Therefore, a wide variety of power reduction techniques have been proposed and realized, including clock gating, DVFS, and power gating. Task scheduling is key to maximizing the effectiveness of these techniques, but for multi-core systems it is very complicated due to the huge exploration space, which is a major obstacle to further power reduction. To cope with this, we propose a design method for embedded systems that minimizes their energy consumption under performance constraints. The method is based on clarifying the properties of the above low-power techniques and their interactions. In more detail, we first establish energy models for these low-power techniques and our target systems. We then search for the best configuration by constructing an optimization problem, especially for applications whose deadline is longer than the execution interval. Finally, we propose an approximate solution using dynamic programming with lower computational complexity and compare it to a brute-force explicit solution. Our evaluations confirm that the proposed method finds a configuration that reduces total energy consumption by 32% compared to a manually optimized configuration that utilizes only one core.
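The dynamic-programming idea can be sketched as a small knapsack-style recurrence: pick one frequency setting per task so as to minimize energy subject to the deadline. All task and energy numbers below are made up; the paper's formulation additionally models clock gating, power gating, multi-core scheduling, and their interactions.

```python
# dp[t] = minimum energy to finish the tasks processed so far
# within time budget t.
def min_energy(tasks, deadline):
    INF = float("inf")
    dp = [0.0] * (deadline + 1)
    for options in tasks:               # one frequency setting per task
        new = [INF] * (deadline + 1)
        for t in range(deadline + 1):
            for time, energy in options:
                if t >= time and dp[t - time] + energy < new[t]:
                    new[t] = dp[t - time] + energy
        dp = new
    return dp[deadline]

tasks = [
    [(4, 8), (2, 20)],   # (execution time, energy) per frequency setting
    [(6, 6), (3, 15)],
]
best = min_energy(tasks, 8)   # slow setting for task 1, fast for task 2
```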
-
Hua Vy Le Thanh, Xin Li
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 132-138
Published: 2014
Released on J-STAGE: September 01, 2014
Pushdown systems (PDSs) are well understood as abstract models of recursive sequential programs, and weighted pushdown systems (WPDSs) are a general framework for solving certain meet-over-all-paths problems in program analysis. Conditional WPDSs (CWPDSs) further extend WPDSs to enhance their expressiveness: each transition is guarded by a regular language over the stack that specifies the conditions under which a transition rule can be applied. CWPDSs and their instances have wide applications in the analysis of object-oriented programs, access rights analysis, etc. Model checking CWPDSs was shown to be reducible to model checking WPDSs, and an offline algorithm was given that translates CWPDSs to WPDSs by synchronizing the underlying PDS with the finite state automata accepting the regular conditions. The translation, however, can cause an exponential blow-up of the system. This paper presents an on-the-fly model checking algorithm for CWPDSs that synchronizes the computing machineries on demand while computing post-images of regular configurations. We developed an on-the-fly model checker for CWPDSs and applied it to models generated from the reachability analysis of the HTML5 parser specification. Our preliminary experiments show that the on-the-fly algorithm drastically outperforms the offline algorithm in both practical space and time efficiency.
-
Katsuya Kawanami, Noriyuki Fujimoto
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 139-147
Published: 2014
Released on J-STAGE: November 28, 2014
The longest common subsequence (LCS) of two given strings has various applications, such as the comparison of deoxyribonucleic acid (DNA). In this paper, we propose a graphics processing unit (GPU) algorithm that accelerates Hirschberg's LCS algorithm improved with Crochemore et al.'s bit-parallel algorithm. Crochemore et al.'s algorithm includes bitwise logical operators, which can be computed easily in parallel because they exhibit bitwise parallelism. However, it also includes an operator with less parallelism: an arithmetic sum. In this paper, we focus on how to implement these operators efficiently in parallel and show the following experimental results. First, the proposed GPU algorithm, on a 2.67GHz Intel Core i7 920 CPU with a GeForce GTX 580 GPU, performs up to 12.81 times faster than the bit-parallel CPU algorithm on a single core of a 2.67GHz Intel Xeon X5550 CPU. Second, the proposed GPU algorithm executes up to 4.56 times faster than the bit-parallel CPU algorithm on four cores of a 2.67GHz Intel Xeon X5550 CPU. Furthermore, on a GeForce 8800 GTX, the proposed algorithm performs 10.9 to 18.1 times faster than Kloetzli et al.'s existing GPU algorithm on the same GPU.
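The flavor of the bit-parallel computation can be sketched in a few lines: one column of the LCS dynamic-programming table is packed into a machine word and updated with bitwise operations plus a single arithmetic sum, whose carry propagation is exactly the low-parallelism operator discussed above. The sketch below follows the well-known bit-parallel LCS-length recurrence in the Crochemore et al. style (our own simplified rendering, with Python's unbounded integers standing in for fixed-width words):

```python
def lcs_length(a, b):
    m = len(a)
    mask = (1 << m) - 1
    pm = {}
    for i, c in enumerate(a):           # match masks, one bit per position of a
        pm[c] = pm.get(c, 0) | (1 << i)
    v = mask                            # all ones: empty LCS so far
    for c in b:
        u = v & pm.get(c, 0)
        # one arithmetic sum among bitwise operations: the carry in v + u
        # propagates DP information along the column
        v = ((v + u) | (v - u)) & mask
    return bin(v ^ mask).count("1")     # zero bits of v count matched cells
```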
-
Akimasa Morihata, Kiminori Matsuzaki
Article type: Regular Papers
Subject area: Regular Paper
2014 Volume 7 Pages 148-156
Published: 2014
Released on J-STAGE: December 10, 2014
Parallel tree contraction is a well-established method of parallel tree processing. There are efficient and useful algorithms for binary trees, including the Shunt contraction algorithm and one based on the m-bridge decomposition method. However, few practical tree contraction algorithms exist for trees of unbounded degree. The standard approach is "binarization," namely translating the input tree into a full binary tree beforehand. To avoid the overhead introduced by binarization, we previously proposed the Rake-Shunt contraction algorithm (ICCS 2011), a generalization of the Shunt contraction algorithm to trees of unbounded degree. This paper further extends that result. The major contribution is to show that the Rake-Shunt contraction algorithm uses fewer types of primitive contraction operations if we assume the input tree has been binarized. This observation clarifies the connection between the Rake-Shunt contraction algorithm and those based on binarization. In particular, it enables us to translate a parallel program developed for the Rake-Shunt contraction algorithm into one based on the m-bridge decomposition method. Thus, we can choose whether to use binarization according to the situation.
-
Kenichi Kourai, Hisato Utsunomiya
Article type: Regular Papers
Subject area: Virtualization
2014 Volume 7 Pages 157-167
Published: 2014
Released on J-STAGE: December 18, 2014
Since Infrastructure-as-a-Service (IaaS) clouds contain many vulnerable virtual machines (VMs), intrusion detection systems (IDSes) should be run for all the VMs. IDS offloading is promising for this purpose because it allows IaaS providers to run IDSes outside the VMs without any cooperation from the users. However, offloaded IDSes cannot continue to monitor their target VM when the VM is migrated to another host. In this paper, we propose VMCoupler, which enables co-migration of offloaded IDSes and their target VM. Our approach runs offloaded IDSes in a special VM called a guard VM, which can monitor the internals of a target VM using VM introspection. VMCoupler can migrate a guard VM together with its target VM and restore the state of VM introspection at the destination. The migration processes of the two VMs are synchronized so that the target VM never runs unmonitored. We have confirmed that the overhead of monitoring and co-migration is small.
-
Akimasa Yoshida, Yuki Ochi, Nagatsugu Yamanouchi
Article type: Regular Papers
Subject area: Compiler
2014 Volume 7 Pages 168-178
Published: 2014
Released on J-STAGE: December 18, 2014
Multicore processors are widely used in various types of computers. To achieve high performance on such multicore systems, it is necessary to extract coarse-grain task parallelism from a target program in addition to loop parallelism. For the development of parallel programs, Java or a Java-extension language has recently become an attractive choice, thanks to its performance improvements as well as its platform independence. This paper therefore proposes a parallel Java code generation scheme that realizes coarse-grain task parallel processing with layer-unified execution control, in which coarse-grain tasks of all layers are collectively managed through a dynamic scheduler. In addition, we have developed a prototype parallelizing compiler for Java programs with directives. In performance evaluations, the compiler-generated parallel Java code attained high performance: speed-ups of 7.82 times for the Jacobi program, 7.38 times for the Turb3d program, 6.54 times for the Crypt program, and 6.15 times for the MolDyn program on eight cores of a Xeon E5-2660.