With the increasing miniaturization and performance of current and future LSIs, demand for portable devices has grown considerably. At the same time, problems such as limited battery runtime and device overheating have emerged. In addition, as LSI process technology shrinks, the ratio of interconnection delay to gate delay continues to increase. High-level synthesis that estimates interconnection delays and reduces energy consumption is therefore essential. In this paper, we propose a high-level synthesis algorithm based on HDR architectures (huddle-based distributed-register architectures) that utilizes multi-stage clock gating. By increasing the number of clock-gating stages in each huddle, we increase the number of control steps at which clock gating can be applied to registers, and we can determine the clock-gating configuration that optimizes energy consumption. Experimental results demonstrate that our proposed algorithm reduces energy consumption by up to 27.7% compared with conventional algorithms.
The longest common subsequence (LCS) of two given strings has various applications, such as the comparison of deoxyribonucleic acid (DNA). In this paper, we propose a graphics processing unit (GPU) algorithm that accelerates Hirschberg's LCS algorithm improved with Crochemore et al.'s bit-parallel algorithm. Crochemore et al.'s algorithm includes bitwise logical operators, which can be computed easily in parallel because they have bitwise parallelism. However, it also includes an operator with less parallelism, namely an arithmetic sum. In this paper, we focus on how to implement these operators efficiently in parallel and experimentally show the following results. First, the proposed GPU algorithm on a 2.67 GHz Intel Core i7 920 CPU with a GeForce GTX 580 GPU runs up to 12.81 times faster than the bit-parallel CPU algorithm on a single core of a 2.67 GHz Intel Xeon X5550 CPU. It also runs up to 4.56 times faster than the bit-parallel CPU algorithm on four cores of a 2.67 GHz Intel Xeon X5550 CPU. Furthermore, the proposed algorithm on a GeForce 8800 GTX performs 10.9 to 18.1 times faster than Kloetzli et al.'s existing GPU algorithm on the same GPU.
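The word-level recurrence at the heart of this approach can be sketched as follows (a minimal sequential Python illustration of the Crochemore et al.-style bit-parallel LCS-length computation, not the authors' GPU code; note that the arithmetic sum propagates a carry across bits, which is exactly the operator with less parallelism mentioned above):

```python
def bit_parallel_lcs(a, b):
    """LCS length of a and b: one word update per character of b
    instead of a full dynamic-programming row."""
    m = len(a)
    mask = (1 << m) - 1
    # match[c]: bit i is set iff a[i] == c
    match = {}
    for i, c in enumerate(a):
        match[c] = match.get(c, 0) | (1 << i)
    v = mask  # all ones: no matched positions yet
    for c in b:
        mc = match.get(c, 0)
        # bitwise AND/OR/NOT are embarrassingly parallel;
        # the + carries ripple and limit parallelism
        v = ((v + (v & mc)) | (v & ~mc)) & mask
    # each zero bit in v corresponds to one unit of LCS length
    return m - bin(v).count("1")
```

Python's arbitrary-precision integers stand in for the multi-word bit vectors a real implementation would use for long strings.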
Parallel tree contraction is a well-established method of parallel tree processing. There are efficient and useful algorithms for binary trees, including the Shunt contraction algorithm and one based on the m-bridge decomposition method. However, for trees of unbounded degree, there are few practical tree contraction algorithms. The standard approach is “binarization,” namely translating the input tree into a full binary tree beforehand. To avoid the overhead introduced by binarization, we previously proposed the Rake-Shunt contraction algorithm (ICCS 2011), which generalizes the Shunt contraction algorithm to trees of unbounded degree. This paper further extends that result. The major contribution is to show that, if we assume the input tree has been binarized, the Rake-Shunt contraction algorithm becomes a tree contraction algorithm that uses fewer types of primitive contraction operations. This observation clarifies the connection between the Rake-Shunt contraction algorithm and those based on binarization. In particular, it enables us to translate a parallel program developed on the basis of the Rake-Shunt contraction algorithm into one based on the m-bridge decomposition method. Thus, we can choose whether to use binarization according to the situation.
Since Infrastructure-as-a-Service (IaaS) clouds contain many vulnerable virtual machines (VMs), intrusion detection systems (IDSes) should be run for all the VMs. IDS offloading is promising for this purpose because it allows IaaS providers to run IDSes outside the VMs without any cooperation from the users. However, offloaded IDSes cannot continue to monitor their target VM when that VM is migrated to another host. In this paper, we propose VMCoupler, which enables co-migration of offloaded IDSes and their target VM. Our approach runs offloaded IDSes in a special VM called a guard VM, which can monitor the internals of a target VM using VM introspection. VMCoupler can migrate a guard VM together with its target VM and restore the state of VM introspection at the destination. The migration processes of the two VMs are synchronized so that the target VM never runs unmonitored. We have confirmed that the overhead of monitoring and co-migration is small.
Multicore processors are widely used in various types of computers. To achieve high performance on such multicore systems, it is necessary to extract coarse-grain task parallelism from a target program in addition to loop parallelism. For developing parallel programs, Java and Java-extension languages have recently become attractive choices, thanks to their improved performance as well as their platform independence. This paper therefore proposes a parallel Java code generation scheme that realizes coarse-grain task parallel processing with layer-unified execution control. In this parallel processing, coarse-grain tasks of all layers are collectively managed by a dynamic scheduler. In addition, we have developed a prototype parallelizing compiler for Java programs with directives. In performance evaluations, the compiler-generated parallel Java code was confirmed to attain high performance: we obtained speed-ups of 7.82 times for the Jacobi program, 7.38 times for Turb3d, 6.54 times for Crypt, and 6.15 times for MolDyn on eight cores of a Xeon E5-2660.
This paper focuses on initializing 3-D reconstruction from scratch without any prior scene information. Traditionally, this has been done from two-view matching, which is prone to the degeneracy known as “imaginary focal lengths.” We overcome this difficulty by using three images, but we do not require three-view matching; all we need is three fundamental matrices computed separately from pair-wise image matching. We exploit the redundancy of the three fundamental matrices to optimize the camera parameters and the 3-D structure. The main theme of this paper is an analytical procedure for computing the positions, orientations, and internal parameters of the three cameras from the three fundamental matrices. The emphasis is on resolving the ambiguity of the solution resulting from the sign indeterminacy of the fundamental matrices. Numerical simulations show that imaginary focal lengths are less likely with our three-view method, resulting in higher accuracy than the conventional two-view method. We also test the degeneracy tolerance of our method using endoscopic intestinal tract images, for which the camera configuration is almost always nearly degenerate. We demonstrate that our method yields more detailed intestine structures than two-view reconstruction and observe how our three-view reconstruction is refined by bundle adjustment. Our method is expected to broaden the medical applications of endoscopic images.
This paper describes part of an ongoing comprehensive research project aimed at generating MathML from images of mathematical expressions extracted from scanned PDF documents. A MathML representation of a scanned PDF document reduces the document's storage size and encodes the mathematical notation and meaning, making the document suitable for vocalization and accessible through assistive technologies. To achieve an accurate layout analysis of a scanned PDF document, all textual and non-textual components must be recognized, identified, and tagged. These components may be text, mathematical expressions, or graphics in the form of images, figures, tables, and diagrams. Mathematical expressions are among the most significant components of scanned scientific and engineering PDF documents and need to be machine readable for use with assistive technologies. This research is a work in progress and includes several modules: detecting and extracting mathematical expressions, recursive primitive component extraction, non-alphanumeric symbol recognition, structural semantic analysis, and merging primitive components to generate the MathML of the scanned PDF document. An optional module converts the MathML to audio using a text-to-speech (TTS) engine to make the document accessible to vision-impaired users.
The technique of “renormalization” for geometric estimation attracted much attention when it appeared in the early 1990s because it achieved higher accuracy than any other method known at the time. The key fact is that it directly specifies the equations to solve, rather than minimizing some cost function. This paper expounds this “non-minimization approach” in detail and exploits the principle to modify renormalization so that it outperforms standard reprojection-error minimization. Through a precise error analysis in the most general situation, we derive a formula that maximizes the accuracy of the solution; we call the resulting method hyper-renormalization. Applying it to ellipse fitting, fundamental matrix computation, and homography computation, we confirm its accuracy and efficiency for sufficiently small noise. Our emphasis is on the general principle rather than on individual methods for particular problems.
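To make the “non-minimization” idea concrete (a hedged sketch in our own notation, not a formula taken from the paper): instead of minimizing a cost function, renormalization-type methods directly specify the equation to solve as a generalized eigenvalue problem,

\[
M\boldsymbol{\theta} = \lambda N\boldsymbol{\theta},
\]

where \(\boldsymbol{\theta}\) is the parameter vector to estimate (e.g., the coefficients of the fitted ellipse), \(M\) is a weighted moment matrix computed from the data, and \(N\) is a normalization matrix. The freedom in choosing \(N\) is the point of the approach: roughly speaking, hyper-renormalization chooses \(N\) so that the leading-order bias of the solution is cancelled, which is why its accuracy can exceed that of reprojection-error minimization for small noise.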
This paper proposes a novel noise-aware character alignment method for automatically extracting transliteration fragments from phrase pairs extracted from parallel corpora. The proposed method extends a many-to-many Bayesian character alignment method by distinguishing transliteration (signal) parts from non-transliteration (noise) parts. The model can be trained efficiently by a state-based blocked Gibbs sampling algorithm with signal and noise states. The proposed method bootstraps statistical machine transliteration by using the extracted transliteration fragments to train transliteration models. In experiments on Japanese-English patent data, the proposed method extracted transliteration fragments with much less noise than an IBM-model-based baseline and achieved better transliteration performance than sample-wise extraction in transliteration bootstrapping.
Many knowledge acquisition tasks depend heavily on fundamental analysis technologies such as part-of-speech (POS) tagging and parsing. Dependency parsing, in particular, has been widely employed for acquiring knowledge related to predicate-argument structures. For such tasks, dependency parsing performance can determine the quality of the acquired knowledge, regardless of the target language. Therefore, reducing dependency parsing errors and selecting high-quality dependencies is of primary importance. In this study, we present a language-independent approach for automatically selecting high-quality dependencies from automatic parses. By considering several aspects that affect the accuracy of dependency parsing, we created a set of features for the supervised classification of reliable dependencies. Experimental results on seven languages show that our approach can effectively select high-quality dependencies from dependency parses.
In this paper, we propose a two-step algorithm for community detection in scale-free networks. One of the main characteristics of scale-free networks is that the node-degree distribution follows a power law. However, during our experiments, we encountered another sub-type of scale-free networks, which we call “mixed scale-free networks”: some communities have hub nodes and node degrees that follow a power-law distribution, while other communities have no hub nodes and node degrees that follow a normal distribution. On mixed scale-free networks, methods that are not specifically designed for scale-free networks have difficulties because of the scale-free properties, while scale-free-based methods have difficulties because the node degrees in some communities follow a normal distribution. In this research, we propose a community detection algorithm that can work on networks containing both types of communities at the same time. Our method handles this case correctly because it applies both scale-free and non-scale-free approaches iteratively. To evaluate our method, we use normalized mutual information (NMI) to measure our results on both synthetic and real-world datasets, comparing against both scale-free and non-scale-free community detection methods. The results show that our method outperforms the baseline methods on mixed scale-free networks and scale-free networks, while performing equally well on networks with a normal degree distribution.
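For reference, the NMI measure used in the evaluation above can be computed from two community labelings as follows (a self-contained sketch of the standard definition, not the authors' evaluation code):

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same
    nodes: NMI = 2*I(A;B) / (H(A)+H(B)); 1.0 means identical partitions."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))  # joint community counts
    # mutual information between the two partitions
    mi = sum((nij / n) * log(n * nij / (ca[a] * cb[b]))
             for (a, b), nij in cab.items())
    def entropy(counts):
        return -sum((m / n) * log(m / n) for m in counts.values())
    ha, hb = entropy(ca), entropy(cb)
    if ha + hb == 0:  # both partitions are trivial (one community each)
        return 1.0
    return 2 * mi / (ha + hb)
```

Note that NMI is invariant to relabeling the communities, which is why it suits comparing detected communities against ground truth.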
MOOCs are a crucial platform for improving education; students can obtain a wide variety of educational presentation content through the Web. Recently, Prezi introduced a zoomable canvas as an alternative to traditional presentations, allowing users to zoom in and out of the presentation media. Teachers then attempt to provide presentations in a nonlinear fashion to enhance user interaction; however, creating nonlinear presentations is time-consuming and poses design challenges. Therefore, we have developed a novel support system for grasping overviews of presentation slides. It generates a meaningfully structured presentation, called an iPoster, that enables users to automatically navigate through slide-based educational content. The system places elements such as the text and graphics of presentation slides in a structural layout by semantically analyzing the slide structure. The structural layout reveals the hierarchy of elements based on the topic structure by moving from an overview to the details using automatic transitions such as zooms and pans. In this way, the iPoster supports students in interactively browsing online presentation slides to grasp an overview, substantially helping them navigate the slides effectively for their learning purposes. In this paper, we describe our interactive poster (iPoster) generation method and evaluate its effectiveness.
Barcode-reading mobile applications that identify products from pictures acquired by mobile devices are widely used by customers all over the world to perform online price comparisons or to access reviews written by other customers. Most of the currently available 1D barcode reading applications focus on effectively decoding barcodes and treat the underlying detection task as a side problem to be solved using general-purpose object detection methods. However, the majority of mobile devices do not meet the minimum working requirements of those complex general-purpose object detection algorithms, and most of the efficient, specifically designed 1D barcode detection algorithms require user interaction to work properly. In this work, we present a novel method for 1D barcode detection in camera-captured images, based on a supervised machine learning algorithm that identifies the characteristic visual patterns of 1D barcodes' parallel bars in the two-dimensional Hough transform space of the processed images. The proposed method is angle invariant, requires no user interaction, and can be executed effectively on a mobile device; it achieves excellent results on two standard 1D barcode datasets: the WWU Muenster Barcode Database and the ArTe-Lab 1D Medium Barcode Dataset. Moreover, we show that it is possible to enhance the performance of a state-of-the-art 1D barcode reading library by coupling it with our detection method.
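The intuition of looking for bar patterns in Hough space can be illustrated as follows (a deliberately simplified sketch, not the authors' detector: it builds a line-Hough accumulator from a binary edge map and reports the angle whose votes are most concentrated, which for a barcode's parallel bars is the bar orientation; the function names are ours):

```python
import numpy as np

def hough_accumulator(edges, n_theta=180):
    """Vote in (theta, rho) space: each edge pixel votes for every line
    x*cos(theta) + y*sin(theta) = rho passing through it."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((n_theta, 2 * diag), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for t, theta in enumerate(thetas):
        rhos = (xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        np.add.at(acc, (t, rhos), 1)  # unbuffered accumulation
    return acc, thetas

def dominant_angle(edges):
    """Parallel bars pile their votes into a few rho bins of one theta
    column, so the most 'peaked' column gives the bar orientation."""
    acc, thetas = hough_accumulator(edges)
    col_strength = (acc.astype(np.float64) ** 2).sum(axis=1)
    return np.rad2deg(thetas[np.argmax(col_strength)])
```

The real method additionally classifies these Hough-space patterns with a supervised model; the sketch only shows why parallel bars are so distinctive in that space.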
The inference of genetic networks is the problem of obtaining mathematical models that can explain observed time series of gene expression levels. A number of models have been proposed to describe genetic networks, and the S-system model is one of the most studied among them. Due to its advantageous features, numerous inference algorithms based on the S-system model have been proposed. However, the S-system model has more parameters than the other well-studied models. Therefore, when trying to infer S-system models of genetic networks, we need to provide a larger amount of gene expression data to the inference method. In order to reduce the amount of gene expression data required for inferring genetic networks, this study simplifies the S-system model by fixing some of its parameters to 0. We call this simplified model a reduced S-system model. We then propose a new inference method that estimates the parameters of the reduced S-system model by minimizing two-dimensional functions. Finally, we verify the effectiveness of the proposed method through numerical experiments on artificial and actual genetic network inference problems.
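For concreteness, the standard S-system dynamics, dX_i/dt = α_i ∏_j X_j^{g_ij} − β_i ∏_j X_j^{h_ij}, can be simulated as follows (an illustrative Euler-integration sketch in our own code, not the authors' method; a reduced S-system in the sense above simply fixes some entries of g and h to 0):

```python
import numpy as np

def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of the S-system ODE:
    dX_i/dt = alpha_i * prod_j X_j**g[i,j] - beta_i * prod_j X_j**h[i,j]."""
    prod_g = np.prod(x ** g, axis=1)  # production term power-laws
    prod_h = np.prod(x ** h, axis=1)  # degradation term power-laws
    return alpha * prod_g - beta * prod_h

def simulate(x0, alpha, beta, g, h, dt=0.01, steps=1000):
    """Forward-Euler time series of expression levels from state x0."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * s_system_rhs(x, alpha, beta, g, h)
        traj.append(x.copy())
    return np.array(traj)
```

With n genes the full model has 2n + 2n² parameters (α, β, g, h), which is why fixing entries of g and h to 0 directly reduces the data needed for inference.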
Sequencing the whole genome of various species has many applications, not only in understanding biological systems but also in medicine, pharmacy, and agriculture. In recent years, the emergence of high-throughput next-generation sequencing technologies has dramatically reduced the time and cost of whole genome sequencing. These new technologies provide ultrahigh throughput at a lower per-unit data cost. However, the data are generated from very short fragments of DNA, so it is very important to develop algorithms for merging these fragments. Merging fragments without using a reference dataset is called de novo assembly, and many algorithms for it have been proposed in recent years. Velvet and SOAPdenovo2 are well-known assemblers with good performance in terms of memory and time consumption. However, their memory consumption increases dramatically as the input fragments grow larger, so an alternative algorithm with low memory usage is needed. In this paper, we propose a de novo assembly algorithm with lower memory usage. In experiments using the E. coli K-12 strain MG1655 and human chromosome 14, the memory consumption of our proposed algorithm was less than that of other popular assemblers.
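The fragment-merging idea behind de novo assembly can be illustrated with a toy de Bruijn graph, the data structure underlying assemblers such as Velvet (a deliberately minimal sketch assuming error-free, non-repetitive reads; real assemblers handle sequencing errors, repeats, and memory compaction, which is exactly where the memory costs discussed above arise):

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer in a
    read adds an edge from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

def assemble_simple_path(graph):
    """Walk the single non-branching path from the unique source node,
    appending one base per edge (sufficient only for this toy setting)."""
    indeg = defaultdict(int)
    for node, succs in graph.items():
        for s in succs:
            indeg[s] += 1
    node = next(n for n in graph if indeg[n] == 0)  # source (k-1)-mer
    contig = node
    while graph.get(node):
        node = graph[node][0]
        contig += node[-1]
    return contig
```

In this error-free setting the overlapping reads collapse onto one path through the graph, reconstructing the original sequence.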