In deep submicron (DSM) VLSI technologies, it is becoming increasingly difficult for a copper-based electrical interconnect fabric to satisfy the multiple design requirements of delay, power, bandwidth, and delay uncertainty. This is because electrical interconnects are becoming increasingly susceptible to parasitic resistance and capacitance with shrinking process technology and rising clock frequencies, which poses serious challenges for interconnect delay, power dissipation, and reliability. On-chip communication architectures such as buses and networks-on-chip (NoC), which enable inter-component communication in multi-processor systems-on-chip (MPSoC) designs, rely on these electrical interconnects at the physical level and are consequently faced with the entire gamut of challenges and drawbacks that plague copper-based electrical interconnects. To overcome these limitations, several research efforts have begun looking at novel interconnect alternatives, such as on-chip optical interconnects, wireless interconnects, and carbon nanotube-based interconnects. This paper presents an overview and the current state of research for these three promising interconnect technologies. We also discuss the challenges that remain to be resolved for each of these technologies before they can be adopted as replacements for copper-based electrical interconnects.
This paper proposes an efficient method to analyze the worst-case interruption delay (WCID) of a workload running on modern microprocessors using a cycle-accurate simulator (CAS). Our method is highly accurate because it simulates all possible cases, inserting an interruption just before the retirement of every instruction executed in a workload. It is also reasonably efficient because it takes O(N log N) time for a workload with N executed instructions, instead of the O(N^2) of a straightforward iterative simulation of interrupted executions. The key idea behind this efficiency is that a pair of executions with different interruption points has durations in which the two behave identically, so the simulation of one of them may be omitted for those durations. We implemented this method by modifying the SimpleScalar tool set and showed that it finds the WCID of workloads with five million executed instructions in reasonable time, less than 30 minutes, where the straightforward method would take 200-300 days. Furthermore, our CAS-based analyzer can be extended with a post-process to calculate the WCID for F multiple interrupts with O(FN√N log N) time complexity.
Real-time embedded systems generally run periodic interrupt services, such as periodic clock tick interrupts, even while the system is in the idle state. To minimize the power consumption of the idle state, power management should therefore consider the effect of these periodic interrupt services. In this paper, we deal with this issue in two different cases. In the case where the periodic interrupt cannot be disabled, we formulate the power consumption of the idle state and propose static and dynamic approaches for the optimal frequency selection to save idle power. On the other hand, in the case where the periodic interrupt can be disabled, we propose a configurable clock tick that disables the interrupt service until the next task is released, so that the processor can stay in the low-power mode for a longer time. The proposed approaches are implemented in a real-time OS, and their efficiency has been validated by theoretical calculations and actual measurements on an embedded processor.
This paper presents a study of GALS (Globally-Asynchronous Locally-Synchronous) multi-core processor design with asynchronous interconnects. While GALS is expected to reduce power dissipation, it has not yet become mainstream in LSI design, since mature design tools for asynchronous circuit design have been lacking. For GALS design, we constructed a design flow based on general synchronous design tools by specifying design constraints and configurations. Applying the design flow to an experimental GALS multi-core processor design, including an asynchronous interconnect based on the QDI (Quasi Delay Insensitive) model, we successfully obtained a netlist and layout, and verified that the flow works correctly by netlist simulation with delay information back-annotated from the layout. Experimental results show the area, power, and throughput of the asynchronous interconnect, indicating the impact of introducing a GALS architecture instead of a globally synchronous design.
Since many embedded applications involve intensive mathematical operations, floating-point arithmetic units (FPUs) are of paramount importance in embedded systems. However, previous implementations of FPUs either require much manual work or support only special functions (e.g., reciprocal, square root, logarithm). In this paper, we present an automatic method to synthesize general FPUs by aligned partition. Based on the novel partition algorithm and the concept of grouping floating-point numbers into zones, our method supports general functions over wide, irreducible domains. The synthesized FPUs achieve smaller area, higher frequency, and greater accuracy. Experimental results show that our method achieves 1) an indexer that is on average 90% smaller and 50% faster than that of the conventional automatic method; and 2) on the hyperbolic functions, an error rate 20,000 times smaller than that of the conventional manual design, while using only 50% of its LUTs and flip-flops.
In this paper, we propose a high-level synthesis method targeting distributed/shared-register architectures. Our method iterates over (1) scheduling/FU binding, (2) register allocation, (3) register binding, and (4) module placement. By feeding floorplan information back from (4) to (1), our method obtains a distributed/shared-register architecture in which scheduling/binding and floorplanning are simultaneously optimized. Experimental results show that the area is decreased by 13.2% while keeping the performance of the circuit equal to that obtained using distributed-register architectures.
This paper presents a stuck-at fault test data compression method using scan flip-flops with delay fault testability, namely the Chiba scan flip-flops. The feature of the proposed method is its two-stage test data compression. First, test data is compressed by exploiting the structure of the Chiba scan flip-flops (the first-stage compression). Second, the compressed test data is further compressed by conventional test data compression exploiting X bits (the second-stage compression). Evaluation shows that when Huffman coding is used in the second-stage compression, the volume of test data held in the ATE is reduced by up to 35.8%, and by 25.7% on average, compared with the test data compressed by the conventional method alone. The area overhead of the proposed method exceeds that of the conventional method by 9.5 percentage points.
As a method to evaluate the delay test quality of test patterns, the SDQM (Statistical Delay Quality Model) has been proposed for transition faults. To derive better test quality with the SDQM, two things are important: for each transition fault, (1) to accurately determine the length of the longest sensitizable path along which the fault is activated and propagated, and (2) to generate a test pattern that detects the fault through paths that are as long as possible. In this paper, we propose a method to calculate the length of the longest potentially sensitizable path for the detection of a transition fault. In addition, we develop a procedure to extract path information that helps high-quality transition-fault ATPG. Experimental results show that the proposed method improves the SDQL (Statistical Delay Quality Level) not only through accurate calculation of the longest sensitizable paths but also through the detection of faults along longer paths.
In this paper, we propose an approximation method for the statistical MAX operation such that it results in a normal distribution suitable for worst-case delay analysis. The key operations in SSTA are the SUM and MAX of distributions. In general, delay variation is modeled as a normal distribution. The SUM of two normal distributions is again a normal distribution; the MAX, however, is not. Approximation by a normal distribution is therefore commonly used. We also explain that applying the proposed MAX operation at each gate contributes to accurate estimation in the worst-case delay analysis of the whole circuit. Experimental results show that the proposed method yields a good normal approximation of the MAX of normal distributions, both with and without correlation, and that this approximation improves the accuracy of the worst-case delay analysis. In a circuit example, the errors in the worst-case delay computed by the previous method are about 20%, while those computed by the proposed method are under 5%.
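The conventional normal approximation of MAX that such work builds on is moment matching (Clark's method): compute the exact first two moments of max(X1, X2) and fit a normal to them. A minimal sketch of that baseline, for illustration only (the paper's proposed operation refines this approach):

```python
import math

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def max_moment_match(mu1, s1, mu2, s2, rho=0.0):
    """Approximate max(X1, X2) of two (possibly correlated) normal
    delays by a normal with the same first two moments (Clark, 1961).
    Returns (mean, stddev) of the fitted normal."""
    a = math.sqrt(s1 * s1 + s2 * s2 - 2 * rho * s1 * s2)
    alpha = (mu1 - mu2) / a
    # Exact first moment of max(X1, X2)
    m1 = mu1 * norm_cdf(alpha) + mu2 * norm_cdf(-alpha) + a * norm_pdf(alpha)
    # Exact second moment of max(X1, X2)
    m2 = ((mu1 * mu1 + s1 * s1) * norm_cdf(alpha)
          + (mu2 * mu2 + s2 * s2) * norm_cdf(-alpha)
          + (mu1 + mu2) * a * norm_pdf(alpha))
    return m1, math.sqrt(m2 - m1 * m1)

# MAX of two independent N(10, 1) gate delays
mu, sigma = max_moment_match(10.0, 1.0, 10.0, 1.0)
```

For two independent N(10, 1) inputs this gives a mean of about 10.564 and a standard deviation of about 0.826, matching the known moments of the max of two i.i.d. normals.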
This paper presents a software/hardware covalidation environment for embedded systems. Our covalidation environment consists of a simulation model of an RTOS that fully supports the services of ITRON, multiple hardware simulators, an FPGA, and a covalidation backplane. All of the simulators are executed concurrently, communicating with one another. The RTOS model can be executed natively on the host computer, so the software can be simulated much faster than on an instruction set simulator. The FPGA can execute the hardware much faster than HDL simulators. With the RTOS model and the FPGA, both application software and hardware can be validated in a short time. In the experiment, using our covalidation environment, we perform covalidation of an MPEG4 decoder system and show the effectiveness of the environment.
This paper refines the alternate stacking technique used in Greibach and Friedman's proof of the language inclusion problem L(A) ⊆ L(B), where A is a pushdown automaton (PDA) and B is a superdeterministic pushdown automaton (SPDA). In particular, we propose a product construction of a simulating PDA M, whereas the original proof encoded everything as a stack symbol. This construction avoids the need for the “liveness” condition in the alternate stacking technique, and the correctness proof becomes simpler.
This paper proposes an instruction pre-execution scheme for a high-performance processor that reduces load latency through the early scheduling of loads. Our scheme exploits the difference between the amount of instruction-level parallelism available with an unlimited number of physical registers and that available with the actual number of physical registers. We introduce a two-step physical register deallocation scheme, which deallocates physical registers at the renaming stage as a first step, eliminating pipeline stalls caused by a shortage of physical registers. Instructions wait for the final deallocation, the second step, in the instruction window. While they wait, the scheme allows pre-execution of instructions, which enables prefetching of load data and early calculation of memory effective addresses. Our evaluation results show that our scheme improves performance significantly, achieving a 1.26 times speedup over a processor without a prefetcher. Combined with a stride prefetcher, it achieves a 1.18 times speedup over a processor with a stride prefetcher.
Group signature schemes with membership revocation have been intensively researched. In this paper, we propose revocable group signature schemes with lower computational costs for signing/verification than an existing scheme, without requiring updates of the signers' secret keys. The key idea is the use of a small prime number embedded in the signer's secret key. By using simple integer relations between the secret prime and the public product of the primes of valid or invalid members, revocation is efficiently ensured. To show their practicality, we implemented the schemes and measured the signing/verification times on a common PC (Core 2 Duo, 2.13 GHz). The times are all less than 0.5 seconds even for relatively large groups (10,000 members), and thus our schemes are sufficiently practical.
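The core idea of prime-based revocation can be shown with a toy sketch: each member's secret key embeds a distinct prime, the manager publishes the product of the primes of currently valid members, and validity reduces to a divisibility test. The member names and prime values below are hypothetical and far smaller than in practice; in the actual schemes the relation is proved in zero knowledge inside the signature rather than checked directly.

```python
# Toy member registry: each member's secret key embeds a distinct prime
# (hypothetical values; real schemes use cryptographically sized primes).
member_primes = {"alice": 101, "bob": 103, "carol": 107}

def public_product(valid_members):
    """Public value maintained by the group manager: the product of
    the primes of all currently valid (non-revoked) members."""
    prod = 1
    for m in valid_members:
        prod *= member_primes[m]
    return prod

def is_valid(secret_prime, product):
    """A member is valid iff their secret prime divides the public product."""
    return product % secret_prime == 0

P = public_product(["alice", "carol"])  # "bob" has been revoked
```

Revoking a member is then just republishing the product without that member's prime; no other member's secret key needs to change, which mirrors the "no key update" property the abstract claims.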
In 2006, Bleichenbacher presented a new forgery attack against the signature scheme RSASSA-PKCS1-v1_5. The attack allows an adversary to forge a signature on almost arbitrary messages if the implementation is not proper. Since the published example was limited to the case where the public exponent is 3 and the bit-length of the public composite is 3,072, the extent of the potential threat was not known. This paper analyzes Bleichenbacher's forgery attack and shows the applicable composite sizes for given exponents. Moreover, we extend Bleichenbacher's attack and show that when a 1,024-bit composite and the public exponent 3 are used, the extended attack succeeds in forging a signature with probability 2^-16.6.
Many group key management schemes that reduce the total communication cost and/or the computational cost imposed on client devices have been proposed. However, optimization of the key-management structure has not been studied. This paper proposes ways to optimize the key-management structure in a hybrid group key management scheme. The proposed method is able to minimize both the total communication cost and the computational cost imposed on client devices. First, we propose a probabilistic client join/leave model in order to evaluate the communication and computational costs of group key management schemes. This model idealizes general client behavior and accounts for peaks in the joining/leaving frequency. Thus, we can analyze not only the average-case scenario but also the worst-case scenario using this model. Then, we formalize the total communication cost and the computational cost imposed on client devices in group key management schemes under the model. We present both an average-case analysis and a worst-case analysis. Finally, we show the parameters that minimize the total communication cost and the computational cost imposed on clients under the model. Our results should be useful in designing secure group communication systems for large and dynamic groups.
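To see why the key-management structure is worth optimizing, consider an LKH-style key tree of degree d over n clients: one leave event forces roughly d keys to be redistributed at each of the log_d(n) levels on the leaving client's path, so the per-leave communication cost depends on the chosen degree. The simplified cost model below is ours, for illustration, and not the paper's formalization:

```python
def tree_depth(n, d):
    """Depth of a complete degree-d key tree holding n leaves."""
    depth, capacity = 0, 1
    while capacity < n:
        capacity *= d
        depth += 1
    return depth

def rekey_messages(n, d):
    """Approximate server messages for one leave event: d new keys
    are distributed at each level on the leaving client's path."""
    return d * tree_depth(n, d)

def best_degree(n, degrees=range(2, 17)):
    """Degree minimizing the per-leave rekey cost under this model."""
    return min(degrees, key=lambda d: rekey_messages(n, d))
```

Structure optimization in the hybrid scheme generalizes this kind of trade-off, balancing the server's total communication against the per-client computation.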
Elliptic curve cryptosystems can be constructed over a smaller definition field than ElGamal or RSA cryptosystems, which is why they have attracted attention. This paper explores an efficient fixed-point scalar multiplication algorithm for the case of a definition field Fp (p > 3) and adjusts the coordinates used in the algorithm proposed by Lim and Lee. Our adjusted algorithm can give better performance than the previous algorithm for some sizes of the pre-computed tables.
We developed haptic-enabled widgets for immersive environments: the HapWi concept. It enables task-mode switching and intuitive interaction with a virtual environment. In this paper, we present a comparative study of our interaction method and a keyboard-based interaction method on locomotion tasks. With our virtual widgets, the user can perform in an immersive environment as well as with a keyboard (in the same immersive environment), even for a task that was designed specifically for keyboard users. A study of haptic feedback demonstrates the user's need to understand and stabilize their actions in the virtual world and emphasizes the need to adapt haptic rendering to individual users.
This paper presents a robust pixel-wise object tracking algorithm based on K-means clustering. In order to achieve robust object tracking under complex conditions (such as wiry objects and cluttered backgrounds), a new reliability-based K-means clustering algorithm is applied to remove noise pixels (those similar neither to the target nor to the background samples) from the target object. According to the triangular relationship among an unknown pixel and its two nearest cluster centers (target and background), normal pixels (target or background) are assigned high reliability values and correctly classified, while noise pixels are given low reliability values and ignored. A radial sampling method is also introduced to improve both the processing speed and the robustness of the algorithm. Based on the proposed algorithm, we have built an actual video-rate object tracking system. Extensive experiments confirm the effectiveness and advantages of this reliability-based K-means tracking algorithm.
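The flavor of reliability-based classification can be sketched as follows. Here a pixel's feature vector is compared against its nearest target and background cluster centers, and reliability is taken as the relative margin between the two distances; the paper's actual measure uses a triangular relationship among the three points, so this margin-based form and its threshold are illustrative simplifications:

```python
import math

def classify_pixel(p, target_center, bg_center, rel_thresh=0.3):
    """Classify a pixel feature vector by its nearest cluster center.
    Reliability is high when one center is clearly closer, and low
    when the pixel is nearly equidistant from both; low-reliability
    pixels are treated as noise and ignored."""
    dt = math.dist(p, target_center)
    db = math.dist(p, bg_center)
    # Relative margin between the two distances (illustrative measure)
    reliability = abs(dt - db) / (dt + db + 1e-9)
    if reliability < rel_thresh:
        return "noise", reliability
    return ("target" if dt < db else "background"), reliability
```

A pixel close to the target center is confidently labeled "target", while one sitting midway between the centers is rejected as noise instead of being forced into either cluster.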
Recent progress in robotics holds great potential, and we are working toward a dancing humanoid robot for entertainment. In this paper, we present three fundamental methods for a dancing robot, aimed at a sound feedback system in which a robot listens to music and automatically synthesizes dance motion based on the musical features. The first study analyzes the relationship between motion and musical rhythm and extracts features important for imitating human dance motion. The second study models the modification of upper-body motion based on the speed of the played music, so that a humanoid robot can dance to faster music. The third study automatically synthesizes a dance performance that reflects both the rhythm and the mood of the input music.
Problem solving is understood as a process through which the problem state is transformed from the initial state to the goal state by applying adequate operators. Within this framework, knowledge and strategies are given as operators for the search. One of the main interests of researchers in the domain of problem solving is to explain problem-solving behavior based on the knowledge and strategies that the problem solver has. We call the interplay between problem solvers' knowledge/strategies and their behavior the causal relation between mental operations and behavior. It is crucially important, we believe, for novice learners in this domain to understand this causal relation. Based on this insight, we have constructed a learning system in which learners can control the mental operations of a computational agent that solves a task, such as its knowledge, heuristics, and cognitive capacity, and can observe its behavior. We also introduced this system in a university class, and discuss the findings made by the participants.
We introduce software that recognizes, extracts, and displays expressions concerning atomic and molecular data from academic papers in electronic form. This software includes a toolbar application that can be installed in Internet Explorer (IE). The toolbar can be used by scientific readers and researchers to highlight, color-code, and collect important expressions more easily. Those expressions include atomic and molecular symbols (e.g., Xe+ and H2O) and electron configurations (e.g., 4d95s25p) from the atomic and molecular data of a large number of academic papers. We confirmed by experiments that the software could find important expressions with high precision (0.8-1.0). This software is also useful for compiling databases of atomic and molecular data, which are important for plasma simulations, because the simulations critically depend on atomic and molecular data, including energy levels and collisional and radiative rate coefficients.
Distributional similarity is a widely adopted concept for computing the lexical semantic relatedness of words. While the calculation is based on the distributional hypothesis and utilizes contextual clues of words, little attention has been paid to what kind of contextual information is effective for the purpose. As one way to extend contextual information, we focus on the use of indirect dependency, where two or more words are related via several contiguous dependency relations. We have investigated the effect of indirect dependency using an automatic synonym acquisition task, and shown that performance can be improved by using indirect dependency in addition to normal direct dependency. We have also verified its effectiveness under various experimental settings, including weight functions, similarity measures, and context representations, and shown that context representations which incorporate richer syntactic information are more effective.
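The idea of mixing direct and indirect dependency contexts can be sketched with a toy cosine-similarity computation over co-occurrence counts. The words, context-type labels, and counts below are invented for illustration; a chained label such as "obj_of:like>subj:he" stands for an indirect (two-step) dependency context, alongside direct contexts like "obj_of:drink":

```python
import math
from collections import Counter

# Toy co-occurrence counts: word -> Counter of context types.
cooc = {
    "beer": Counter({"obj_of:drink": 8, "mod:cold": 3,
                     "obj_of:like>subj:he": 2}),   # indirect context
    "wine": Counter({"obj_of:drink": 6, "mod:red": 4,
                     "obj_of:like>subj:he": 1}),
    "car":  Counter({"obj_of:drive": 7, "mod:fast": 5}),
}

def cosine(u, v):
    """Cosine similarity of two sparse context-count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

sim_bw = cosine(cooc["beer"], cooc["wine"])  # near-synonyms: high
sim_bc = cosine(cooc["beer"], cooc["car"])   # unrelated: zero overlap
```

Adding indirect contexts enlarges the shared feature space of related words ("beer"/"wine" here share both a direct and an indirect context type), which is the mechanism behind the reported improvement.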
This paper discusses a segmentation approach for Cyrillic Mongolian text for machine translation. Using this method, the processing of one-to-one word permutation between the variations of Mongolian and other languages, especially Altaic-family languages like Japanese, becomes easier. Furthermore, it can be used for two-way conversion between texts of Mongolian used in different regions and countries, such as Mongolia and China. Our system has been implemented based on DP (dynamic programming) matching supported by knowledge-based sequence matching, referring to a multilingual dictionary and linguistic rule bank (LRB), and a data-driven approach over the target language corpus (TLC). For convenience, NM (New Mongolian) is treated as the source language, and TM (Traditional Mongolian) and Todo as the target languages in this test. Our application was tested using manually transcribed parallel texts of 5,000 sentences from NM to TM and Todo. We found that our method could achieve a transformation accuracy of 91.9% for NM to TM and 94.3% for NM to Todo.
Distributional similarity is a widely adopted concept for capturing the semantic relatedness of words based on their contexts in various NLP tasks. While accurate similarity calculation requires a huge number of context types and co-occurrences, the contribution to the similarity calculation depends on the individual context types, and some of them even act as noise. To select well-performing contexts and alleviate the high computational cost, we propose and investigate the effectiveness of three context selection schemes: category-based, type-based, and co-occurrence-based selection. Category-based selection is a conventional, simple selection method which limits the context types based on the syntactic category. Finer-grained, type-based selection assigns importance scores to each context type, which we make possible by proposing a novel formalization of distributional similarity as a classification problem and applying feature selection techniques. The finest-grained, co-occurrence-based selection assigns importance scores to each co-occurrence of a word and a context type. We evaluate the effectiveness and the trade-off between co-occurrence data size and synonym acquisition performance. Our experiments show that, on the whole, the finest-grained, co-occurrence-based selection achieves better performance, although some of the simple category-based selections show a comparable performance/cost trade-off.
Given multiple independent access logs, we develop a mathematical model to identify the number of malicious hosts in the current Internet. In our model, the number of malicious hosts is formalized as a function taking two inputs, namely the duration of observation and the number of sensors. Assuming that malicious hosts with statically assigned global addresses perform random port scans against independent sensors uniformly distributed over the address space, our model gives the asymptotic number of malicious source addresses in two ways. First, it gives the cumulative number of unique source addresses in terms of the duration of observation. Second, it estimates the cumulative number of unique source addresses in terms of the number of sensors. To evaluate the proposed method, we apply the mathematical model to actual packets observed by ISDAS distributed sensors over a one-year period starting in September 2004, and check the accuracy with which the number of malicious hosts is identified.
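One simple way such an asymptotic model can be inverted, shown purely for intuition and not as the paper's actual formulation, is to assume each malicious host hits the sensors as an independent Poisson process. Then the unique-source count after k periods of length t is u_k = M(1 - q^k) with q = exp(-r*t), and two observations suffice to solve for the total population M:

```python
def estimate_hosts(u1, u2):
    """Estimate the total number of malicious hosts M from the unique
    source counts u1 (after time t) and u2 (after time 2t), assuming
    each host hits the sensors as an independent Poisson process so
    that u_k = M * (1 - q**k) with q = exp(-r*t)."""
    q = u2 / u1 - 1        # from u2/u1 = (1 - q**2)/(1 - q) = 1 + q
    return u1 / (1 - q)    # from u1 = M * (1 - q)
```

For example, if 500 unique sources are seen in the first period and 750 cumulatively after the second, this toy model infers a population of 1,000 hosts, of which half were observed in each period's fresh draw.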
Packet filtering in firewalls is a useful technique for network security. It examines network packets and determines whether to accept or deny them based on an ordered set of filters. If conflicts exist among the filters of a firewall (for example, one filter is never executed because a preceding filter prevents it from matching), the behavior of the firewall might differ from the administrator's intention. For this reason, it is necessary to detect conflicts in a set of filters. Previous research on detecting conflicts in filters paid considerable attention to conflicts caused by one filter affecting another, but did not consider conflicts caused by a combination of multiple filters. We developed a method of detecting conflicts caused by a combination of filters affecting another individual filter, based on their spatial relationships. We also developed two methods, based on top-down and bottom-up algorithms, for finding all requisite filter combinations in a given combination of filters that intrinsically cause errors in another filter. We implemented prototype systems to determine how effective the developed methods are. The experimental results revealed that the conflict detection method and the bottom-up method for finding all requisite filter combinations can be applied to practical firewall policies.
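The kind of combination conflict targeted here can be illustrated in one dimension: a filter may be shadowed not by any single predecessor but by the union of several. A minimal sketch over integer port ranges (this interval representation is an illustrative simplification of real multi-field filters):

```python
def merge(intervals):
    """Merge overlapping or adjacent [lo, hi] integer port intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged

def shadowed(filters):
    """Indices of filters whose match space (a port range here) is
    fully covered by the union of earlier filters, so they can never
    fire regardless of their action."""
    result = []
    for i, (lo, hi) in enumerate(filters):
        cover = merge(filters[:i])
        if any(mlo <= lo and hi <= mhi for mlo, mhi in cover):
            result.append(i)
    return result

# Filter 2 is shadowed only by the COMBINATION of filters 0 and 1:
# neither (0,100) nor (101,200) alone covers (50,150).
rules = [(0, 100), (101, 200), (50, 150)]
```

Pairwise comparison, as in earlier work, would miss this conflict, since no single predecessor covers the last filter; only the merged union of the first two does.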
To secure a network, ideally all software on its computers should be kept up to date. However, especially in a server farm, we have to cope with unresolved vulnerabilities due to software dependencies. It is therefore necessary to understand the vulnerabilities inside the network. Existing methods require IP reachability and dedicated software installed on the managed computers. In addition, existing approaches cannot detect vulnerabilities in underlying libraries, and can only control communication between computers uniformly, based on the vulnerability score alone. We propose a lightweight vulnerability management system (LWVMS) based on a self-enumeration approach. LWVMS allows administrators to configure their own network security policy flexibly. It complies with existing standards, such as IEEE 802.1X and EAP-TLS, and can operate in existing corporate networks. Since LWVMS does not require IP reachability between the managed servers and the management servers, it can reduce the risk of invasion and infection during the quarantine phase. In addition, LWVMS can control connectivity based on both the vulnerabilities of the respective components and the network security policy. Since the system can be implemented by a slight modification of open-source software, developers can easily adapt it to fit their networks.
Computer worms randomly perform port scans to find vulnerable hosts to intrude on over the Internet. Malicious software varies in its port-scan strategy; e.g., some hosts intensively scan a particular target, while others scan uniformly over IP address blocks. In this paper, we propose a new automated worm classification scheme based on distributed observations. Our proposed scheme detects statistical features of behavior with a simple decision tree whose nodes classify source addresses using optimal threshold values. The choice of thresholds is automated so as to minimize the entropy of the resulting classification. Once a tree has been constructed, the classification can be done very quickly and accurately. In this paper, we analyze a set of source addresses observed by the 30 distributed sensors in ISDAS over a year in order to clarify the primary statistics of worms. Based on these statistical characteristics, we present the proposed classification and show the performance of the proposed scheme.
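The automated threshold choice at each tree node can be sketched as a standard minimum-weighted-entropy split over a scalar per-source statistic. The statistic and labels below are invented for illustration; the paper's scheme applies this idea to the behavioral statistics extracted from the sensor data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Choose the split threshold on a scalar statistic (e.g. the
    number of distinct ports scanned per source) that minimizes the
    size-weighted entropy of the two child nodes."""
    n = len(labels)
    best_score, best_t = float("inf"), None
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if score < best_score:
            best_score, best_t = score, t
    return best_t

# Sources scanning few ports labeled "uniform", many ports "targeted".
vals = [1, 1, 2, 9, 10, 11]
labs = ["uniform"] * 3 + ["targeted"] * 3
t = best_threshold(vals, labs)
```

Here the chosen threshold cleanly separates the two behavioral classes, giving zero entropy in both children; with real data the split is merely the best available, and the tree recurses on the impure children.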
This paper proposes an Orthogonal Frequency Division Multiple Access (OFDMA) transmission scheme with an adaptive slot and bit allocation method for broadband wireless LAN (WLAN) systems. Based on the fact that the radio links between different users and a base station experience independent multi-path fading, this paper proposes a priority-based adaptive slot and bit allocation method in conjunction with an allocation adjustment method. The proposed method achieves more efficient usage of the frequency resource than the conventional fixed slot assignment with adaptive bit allocation and the conventional fixed bit assignment with adaptive slot allocation. This paper presents various computer simulation results to evaluate the proposed method and demonstrates that it achieves higher frequency resource utilization with less computational complexity in severely frequency-selective multi-path fading channels.