Although ray tracing is the best approach to high-quality image synthesis, generating images takes a long time because of the huge amount of computation involved. In particular, ray-primitive intersection tests still dominate the execution time of ray tracing, and faster ray-primitive intersection algorithms are needed to interactively generate higher-quality images with more advanced effects. This paper presents a new fast algorithm for the intersection tests that makes good use of ray and object coherence in ray tracing. The proposed algorithm exploits the fact that the rays in a bundle share the same origin and are highly coherent. By using precomputation and early termination to reduce redundant calculations in the innermost intersection tests for the bundles, the proposed algorithm accelerates the intersection tests. Experimental results show that the proposed algorithm performs intersection tests 1.43 times faster than Möller's algorithm by exploiting the features of the bundles of rays.
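To illustrate how a shared ray origin allows precomputation, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the Möller-Trumbore formulation of the ray-triangle test and rewrites its scalar triple products so that every term depending only on the origin and the triangle is computed once per bundle. The function names `precompute` and `intersect` are hypothetical.

```python
# Sketch only: Moeller-Trumbore ray-triangle test rearranged for a bundle
# of rays that share one origin. Names are illustrative assumptions.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def precompute(origin, v0, v1, v2):
    """Per-triangle terms that are identical for every ray in the bundle."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    t = sub(origin, v0)
    q = cross(t, e1)
    # det = d . (e2 x e1), u*det = d . (e2 x t), v*det = d . q, t*det = e2 . q
    return cross(e2, e1), cross(e2, t), q, dot(e2, q)

def intersect(pre, d, eps=1e-9):
    """Return the hit distance along direction d, or None on a miss."""
    a, b, q, t_num = pre
    det = dot(d, a)
    if abs(det) < eps:          # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    u = dot(d, b) * inv
    if u < 0.0 or u > 1.0:      # early termination on the first coordinate
        return None
    v = dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:  # early termination on the second coordinate
        return None
    t = t_num * inv
    return t if t > eps else None

pre = precompute((0, 0, 0), (-1, -1, 5), (1, -1, 5), (0, 1, 5))
hit = intersect(pre, (0, 0, 1))   # hits the triangle at distance 5.0
miss = intersect(pre, (1, 0, 1))  # None: fails the first barycentric check
```

In this arrangement each ray costs at most three dot products, one reciprocal and a few comparisons, since the cross products have been moved into the per-triangle step.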
Integrating new resource management policies into operating systems (OSes) is an ongoing process. Although innovative policies continue to be proposed, it is unrealistic to deploy a new one widely because modifying an existing operating system to integrate a new policy is difficult, costly, and often impractical. To address this problem, we explore the possibility of using virtual machine technology to incorporate a new policy into an existing OS without making any changes to it. This paper describes FoxyTechnique, which virtualizes physical devices differently from real ones and tricks a guest OS into producing behavior similar to a desired policy. FoxyTechnique has three advantages. First, it allows us to implement a new policy without making any changes to OS kernels. Second, Foxy-based policies are expected to be portable across different operating systems because they are isolated from guest OSes by stable virtual hardware interfaces. Finally, Foxy-based policies sometimes outperform guest OS policies because they can measure performance indicators more accurately than guest OSes. To demonstrate the usefulness of FoxyTechnique, we conducted two case studies, FoxyVegas and FoxyIdle, on the Xen virtual machine monitor. FoxyVegas and FoxyIdle tricked the original Linux and successfully mimicked TCP Vegas and Idletime scheduling, respectively.
Many service providers distribute various kinds of content over the Internet. They often use replica servers to provide a stable service. To position them appropriately, service providers must predict the demand for their services and provide computing capacity sufficient to meet it. Unfortunately, predicting demand is difficult because the demand for a service usually fluctuates. Our research group is developing ExaPeer, an infrastructure that apportions computing capacity to services running on hundreds or thousands of trusted machines all over the Internet. In this paper, we describe ExaPeer's approach to dynamically selecting candidate spots for replica servers. This approach, called EPSS (ExaPeer Server Selection), detects fluctuations in the demand and dynamically selects candidate spots on which replica servers should be placed to best meet the demand. Even if the demand for a service increases dramatically within a short period, EPSS quickly responds to the situation and rapidly selects many candidates for replica servers. To deal with short-term fluctuations, EPSS is designed to be lightweight; each machine determines whether to be a candidate spot independently of the others. Experimental results demonstrate that the candidate spots selected by EPSS work better than those selected by heuristics even if the scale of the demand changes rapidly. For 90% of all client accesses, the round-trip times (RTTs) with EPSS were 23% less than those with randomly selected machines, even when the randomly selected machines were 6.7 times as numerous as those selected by EPSS.
This paper refines the alternate stacking technique used in Greibach-Friedman's proof of the language inclusion problem L(A) ⊆ L(B), where A is a pushdown automaton (PDA) and B is a superdeterministic pushdown automaton (SPDA). In particular, we propose a product construction of a simulating PDA M, whereas the one given by the original proof encoded everything as a stack symbol. This construction avoids the need for the “liveness” condition in the alternate stacking technique, and the correctness proof becomes simpler.
We propose a novel extension of Brzozowski's derivative of regular expressions, called the product derivative. It takes two regular expressions, and the meaning of its result is stated as follows: a product derivative of R with respect to S is a regular expression obtained by the consumption of all the sequences in the first regular expression by the second, i.e., the result is the product of Brzozowski's derivatives of R with respect to sequences in S. We develop an algorithm for our derivatives in a coinductive manner. The termination of the algorithm is shown using the proof method of Brandt and Henglein. We expect the derivative proposed in this talk to be an important step toward the development of derivatives of a regular hedge expression that take a regular hedge expression as input, which is useful for transforming XML documents with function terms typed with regular hedge expressions. This situation occurs, for example, in the transformation of XML documents with links to other Web services, such as Active XML documents.
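For readers unfamiliar with the starting point, the following is a minimal sketch of the classical Brzozowski derivative with respect to a single symbol, which the product derivative extends. It does not implement the product derivative itself. The class and function names are illustrative assumptions, and no simplification of the resulting expressions is attempted.

```python
# Sketch only: classical Brzozowski derivative, not the product derivative.
from dataclasses import dataclass

class Re:
    pass

@dataclass(frozen=True)
class Empty(Re):      # matches no string
    pass

@dataclass(frozen=True)
class Eps(Re):        # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr(Re):        # matches one given symbol
    c: str

@dataclass(frozen=True)
class Alt(Re):        # union
    l: Re
    r: Re

@dataclass(frozen=True)
class Cat(Re):        # concatenation
    l: Re
    r: Re

@dataclass(frozen=True)
class Star(Re):       # Kleene star
    r: Re

def nullable(r):
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.l) or nullable(r.r)
    if isinstance(r, Cat):
        return nullable(r.l) and nullable(r.r)
    return False      # Empty and Chr

def deriv(r, a):
    """Expression matching every w such that r matches a followed by w."""
    if isinstance(r, Chr):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, Cat):
        head = Cat(deriv(r.l, a), r.r)
        return Alt(head, deriv(r.r, a)) if nullable(r.l) else head
    if isinstance(r, Star):
        return Cat(deriv(r.r, a), r)
    return Empty()    # Empty and Eps

def matches(r, s):
    """Match s by taking one derivative per symbol, then testing nullability."""
    for a in s:
        r = deriv(r, a)
    return nullable(r)

ab_star = Cat(Chr("a"), Star(Chr("b")))   # the expression a b*
```

With this definition, `matches(ab_star, "abbb")` is true and `matches(ab_star, "ba")` is false. The product derivative described above replaces the single symbol `a` with a whole regular expression S.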
This paper presents a robust pixel-wise object tracking algorithm based on the K-means clustering algorithm. To achieve robust object tracking under complex conditions (such as wired objects and cluttered backgrounds), a new reliability-based K-means clustering algorithm is applied to remove noise background pixels (pixels that are similar to neither the target samples nor the background samples) from the target object. According to the triangular relationship among an unknown pixel and its two nearest cluster centers (target and background), a normal pixel (a target or background one) is assigned a high reliability value and correctly classified, while noise pixels are given a low reliability value and ignored. A radial sampling method is also introduced to improve both the processing speed and the robustness of the algorithm. Using the proposed algorithm, we have built a real video-rate object tracking system. Extensive experiments confirm the effectiveness and advantages of this reliability-based K-means tracking algorithm.
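The abstract does not give the reliability formula, so the following is only a sketch of one way the triangle formed by a pixel and its two nearest centers could yield a reliability value. As an assumption, it uses the ratio of the distance between the two centers to the sum of the pixel's distances to them, which is at most 1 by the triangle inequality and small when the pixel is far from both centers. The function name `classify`, the threshold `min_reliability`, and the formula itself are hypothetical and are not taken from the paper.

```python
# Sketch only: an assumed reliability measure, not the paper's definition.
import math

def classify(pixel, target_center, background_center, min_reliability=0.5):
    """Label a feature vector as 'target', 'background' or 'noise'."""
    d_t = math.dist(pixel, target_center)
    d_b = math.dist(pixel, background_center)
    d_tb = math.dist(target_center, background_center)
    total = d_t + d_b
    # Triangle inequality: d_tb <= d_t + d_b, so the ratio lies in [0, 1].
    reliability = d_tb / total if total > 0 else 0.0
    if reliability < min_reliability:
        return "noise", reliability       # far from both centers: ignored
    label = "target" if d_t < d_b else "background"
    return label, reliability

near_target = classify((1.0, 0.0), (0.0, 0.0), (10.0, 0.0))
outlier = classify((5.0, 50.0), (0.0, 0.0), (10.0, 0.0))
```

Here `near_target` is labeled as target with reliability 1.0, while `outlier` lies far from both centers and is labeled as noise.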
Recent progress in robotics has great potential, and we are considering developing a dancing humanoid robot for entertainment. In this paper, we propose three fundamental methods for a dancing robot, aimed at a sound feedback system in which a robot listens to music and automatically synthesizes dance motion based on the musical features. The first study analyzes the relationship between motion and musical rhythm and extracts important features for imitating human dance motion. The second study models the modification of upper body motion based on the speed of the played music so that a humanoid robot can dance to a faster musical speed. The third study automatically synthesizes a dance performance that reflects both the rhythm and the mood of the input music.
This paper proposes an instruction pre-execution scheme for a high-performance processor that reduces load latency and enables early scheduling of loads. Our scheme exploits the difference between the amount of instruction-level parallelism available with an unlimited number of physical registers and that available with an actual number of physical registers. We introduce a two-step physical register deallocation scheme, which deallocates physical registers at the renaming stage as a first step and thereby eliminates pipeline stalls caused by a shortage of physical registers. Instructions wait in the instruction window for the final deallocation as a second step. While they wait, the scheme allows instructions to be pre-executed, which enables prefetching of load data and early calculation of effective memory addresses. Our evaluation results show that our scheme improves performance significantly, achieving a 1.26 times speedup over a processor without a prefetcher. If combined with a stride prefetcher, it achieves a 1.18 times speedup over a processor with a stride prefetcher.
Recently, the quantification of agricultural and water management practices from remote sensing (RS) imagery has been adopted to help policy makers and farm and water managers make better operational decisions. SWAP-GA combines the SWAP (Soil Water Atmosphere and Plant) crop model with an RS data assimilation technique that is optimized by a Genetic Algorithm (GA). However, running the SWAP-GA model on a single PC requires a massive amount of processing time. Distributed or parallel computing is therefore a compelling solution. At present, multi-cluster Grids have emerged as the most popular type of distributed computing system. However, the performance of different parallelization methodologies on the Grid has not been discussed thoroughly. This paper presents the implementation of the SWAP-GA application and discusses its impact on the performance of parallelization methods.