IPSJ Transactions on System and LSI Design Methodology
Online ISSN : 1882-6687
ISSN-L : 1882-6687
Volume 1
  • Hidetoshi Onodera
    Article type: Editorial
    Subject area: Editorial
    2008 Volume 1 Pages 1
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
  • Sudeep Pasricha, Nikil Dutt
    Article type: Invited Papers
    Subject area: Invited Paper
    2008 Volume 1 Pages 2-17
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    In deep submicron (DSM) VLSI technologies, it is becoming increasingly hard for a copper-based electrical interconnect fabric to satisfy the multiple design requirements of delay, power, bandwidth, and delay uncertainty. This is because electrical interconnects are becoming increasingly susceptible to parasitic resistance and capacitance with shrinking process technology and rising clock frequencies, which poses serious challenges for interconnect delay, power dissipation, and reliability. On-chip communication architectures such as buses and networks-on-chip (NoC) that are used to enable inter-component communication in multi-processor systems-on-chip (MPSoC) designs rely on these electrical interconnects at the physical level, and are consequently faced with the entire gamut of challenges and drawbacks that plague copper-based electrical interconnects. To overcome the limitations of traditional copper-based electrical interconnects, several research efforts have begun looking at novel interconnect alternatives, such as on-chip optical interconnects, wireless interconnects, and carbon nanotube-based interconnects. This paper presents an overview and the current state of research for these three promising interconnect technologies. We also discuss the existing challenges for each of these technologies that remain to be resolved before they can be adopted as replacements for copper-based electrical interconnects in the future.
  • Sachin S. Sapatnekar
    Article type: Invited Papers
    Subject area: Invited Paper
    2008 Volume 1 Pages 18-32
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    With each technology generation, the effects of on-chip variations are seen to more profoundly affect digital circuit behavior. These variations may arise from fluctuations attributed to the manufacturing process (e.g., drifts in channel length, oxide thickness, threshold voltage, or doping concentration), which affect the circuit yield, as well as variations in the environmental operating conditions (e.g., supply voltage or temperature) after the circuit is manufactured, which impact the performance of the design. These effects can cause unacceptable alterations in circuit performance parameters such as timing and power, and variation-tolerant design is imperative for next-generation designs. This paper overviews research in this area, describing methods for the analysis and optimization of statistical effects.
  • Hiroshi Nakashima, Masahiro Konishi, Takashi Nakada
    Article type: System-Level Performance Analysis
    Subject area: Regular Paper
    2008 Volume 1 Pages 33-47
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    This paper proposes an efficient method to analyze the worst case interruption delay (WCID) of a workload running on modern microprocessors using a cycle accurate simulator (CAS). Our method is highly accurate because it simulates all possible cases, inserting an interruption just before the retirement of every instruction executed in a workload. It is also (reasonably) efficient because it takes O(N log N) time for a workload with N executed instructions, instead of the O(N^2) of a straightforward iterative simulation of interrupted executions. The key idea behind the efficiency is that a pair of executions with different interruption points has a set of durations in which they behave identically, and thus one of the simulations for those durations may be omitted. We implemented this method by modifying the SimpleScalar tool set and show that it finds the WCID of workloads with five million executed instructions in reasonable time, less than 30 minutes, which would take 200-300 days with the straightforward method. Furthermore, our CAS-based analyzer may have a post process to calculate the WCID for multiple F interrupts with O(FNN log N) time complexity.
  • Gang Zeng, Hiroyuki Tomiyama, Hiroaki Takada
    Article type: System-Level Low-Power Design
    Subject area: Regular Paper
    2008 Volume 1 Pages 48-57
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    Generally, real-time embedded systems have periodic interrupt services, such as periodic clock tick interrupts, even when the system is in the idle state. To minimize the power consumption of the idle state, power management should therefore consider the effect of periodic interrupt services. In this paper, we deal with this issue in two different cases. In case the periodic interrupt cannot be disabled, we formulate the power consumption of the idle state, and propose static and dynamic approaches for the optimal frequency selection to save idle power. On the other hand, in case the periodic interrupt can be disabled, we propose the configurable clock tick to disable the interrupt service until the next task is released so that the processor can stay in the low power mode for a longer time. The proposed approaches are implemented in a real-time OS, and their efficiency has been validated by theoretical calculations and actual measurements on an embedded processor.
  • Katsunori Tanaka, Yuichi Nakamura, Atsushi Atarashi
    Article type: System-Level Asynchronous Design
    Subject area: Regular Paper
    2008 Volume 1 Pages 58-66
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    This paper presents a study of GALS (Globally-Asynchronous Locally-Synchronous) architecture multi-core processor design with asynchronous interconnects. While GALS is expected to reduce power dissipation, it has not yet become the mainstream of LSI design, since there have been no mature design tools for asynchronous circuit design. For GALS design, we constructed a design flow based on general synchronous design tools, by specification of design constraints and configurations. Applying the design flow to an experimental multi-core processor GALS design including an asynchronous interconnect based on the QDI (Quasi Delay Insensitive) model, we successfully obtained a netlist and layout, and proved that the flow works correctly by netlist simulation with delay information back-annotated from the layout. Experimental results report the area, power, and throughput of the asynchronous interconnect, indicating the impact of introducing a GALS architecture in place of a globally synchronous design.
  • Liangwei Ge, Song Chen, Yuichi Nakamura, Takeshi Yoshimura
    Article type: Arithmetic Synthesis
    Subject area: Regular Paper
    2008 Volume 1 Pages 67-77
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    Since many embedded applications involve intensive mathematical operations, floating-point arithmetic units (FPUs) have paramount importance in embedded systems. However, previous implementations of FPUs either require much manual work or only support special functions (e.g., reciprocal, square root, logarithm, etc.). In this paper, we present an automatic method to synthesize general FPUs by aligned partition. Based on the novel partition algorithm and the concept of grouping floating-point numbers into zones, our method supports general functions of wide, irreducible domain. The synthesized FPU achieves smaller area, higher frequency, and greater accuracy. Experimental results show that our method achieves 1) an indexer that is on average 90% smaller and 50% faster than that of the conventional automatic method; 2) on the hyperbolic functions, a 20k times smaller error rate and 50% of the LUT and flip-flop usage of the conventional manual design.
  • Akira Ohchi, Shunitsu Kohara, Nozomu Togawa, Masao Yanagisawa, Tatsuo ...
    Article type: Behavioral Synthesis
    Subject area: Regular Paper
    2008 Volume 1 Pages 78-90
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    In this paper, we propose a high-level synthesis method targeting distributed/shared-register architectures. Our method repeats (1) scheduling/FU binding, (2) register allocation, (3) register binding, and (4) module placement. By feeding back floorplan information from (4) to (1), our method obtains a distributed/shared-register architecture whose scheduling/binding and floorplanning are optimized simultaneously. Experimental results show that the area is decreased by 13.2% while maintaining circuit performance equal to that obtained with distributed-register architectures.
  • Kentaroh Katoh, Kazuteru Namba, Hideo Ito
    Article type: Delay Testing
    Subject area: Regular Paper
    2008 Volume 1 Pages 91-103
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    This paper presents a stuck-at fault test data compression method using scan flip-flops with delay fault testability, namely the Chiba scan flip-flops. The feature of the proposed method is two-stage test data compression. First, test data is compressed utilizing the structure of the Chiba scan flip-flops (the first stage compression). Second, the compressed test data is further compressed by conventional test data compression utilizing X bits (the second stage compression). Evaluation shows that when Huffman test data compression is used in the second stage compression, the volume of test data in ATE for the proposed test data compression is reduced 35.8% at maximum and 25.7% on average relative to the test data compressed by the conventional method. The area overhead of the proposed method differs from that of the conventional method by 9.5 percentage points.
  • Seiji Kajihara, Shohei Morishima, Masahiro Yamamoto, Xiaoqing Wen, Mas ...
    Article type: Delay Testing
    Subject area: Regular Paper
    2008 Volume 1 Pages 104-115
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    As a method to evaluate the delay test quality of test patterns, SDQM (Statistical Delay Quality Model) has been proposed for transition faults. In order to derive better test quality by SDQM, the following two things are important for each transition fault: (1) to find the accurate length of the longest sensitizable paths along which the fault is activated and propagated, and (2) to generate a test pattern that detects the fault through paths that are as long as possible. In this paper, we propose a method to calculate the length of the potentially sensitizable longest path for detection of a transition fault. In addition, we develop a procedure to extract path information that helps high-quality transition ATPG. Experimental results show that the proposed method improves SDQL (Statistical Delay Quality Level) not only by accurate calculation of the longest sensitizable paths but also by detection of faults through longer paths.
  • Haruhiko Terada, Takayuki Fukuoka, Akira Tsuchiya, Hidetoshi Onodera
    Article type: Statistical Static Timing Analysis
    Subject area: Regular Paper
    2008 Volume 1 Pages 116-125
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    In this paper, we propose an approximation method for the statistical MAX operation such that it results in a normal distribution suitable for worst-case delay analysis. The important operations in SSTA are SUM and MAX of distributions. In general, the delay variation is modeled as a normal distribution. The result of the SUM operation on two normal distributions is also a normal distribution. On the other hand, the result of the MAX operation is not a normal distribution. Thus, approximation to a normal distribution is commonly used. We also explain that the proposed MAX operation at each gate contributes to accurate estimation in the worst-case delay analysis of the whole circuit. Experimental results show that the proposed method leads to a good approximation of the distribution resulting from the MAX operation on normal distributions with and without correlation, and the approximation improves the accuracy of the worst-case delay analysis. In a circuit example, the errors of the worst-case delay computed by the previous method are about 20%, and the errors computed by the proposed method are under 5%.
  • Seiya Shibata, Shinya Honda, Yuko Hara, Hiroyuki Tomiyama, Hiroaki Tak ...
    Article type: System-Level Verification
    Subject area: Short Paper
    2008 Volume 1 Pages 126-130
    Published: 2008
    Released on J-STAGE: August 27, 2008
    JOURNAL FREE ACCESS
    This paper presents a software/hardware covalidation environment for embedded systems. Our covalidation environment consists of a simulation model of an RTOS which fully supports the services of ITRON, multiple hardware simulators, an FPGA, and a covalidation backplane. All of the simulators are executed concurrently and communicate with one another. The RTOS model can be executed natively on the host computer; therefore, the software can be simulated much faster than on an instruction set simulator. The FPGA can execute the hardware much faster than HDL simulators. With the RTOS model and the FPGA, both application software and hardware can be validated in a short time. In the experiment, using our covalidation environment, we perform covalidation of an MPEG4 decoder system and show the effectiveness of the covalidation environment.
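The SUM and MAX operations mentioned in the Statistical Static Timing Analysis abstract (Terada et al.) can be illustrated with a minimal sketch. It assumes the widely used moment-matching (Clark) approximation for MAX, which is a conventional technique and not the method proposed in that paper. The names `Normal`, `stat_sum`, and `stat_max`, and the correlation parameter `rho`, are hypothetical and chosen only for this illustration.

```python
import math
from typing import NamedTuple


class Normal(NamedTuple):
    """A delay modeled as a normal distribution (hypothetical helper type)."""
    mean: float
    std: float


def _pdf(x: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stat_sum(a: Normal, b: Normal, rho: float = 0.0) -> Normal:
    """SUM of two jointly normal delays with correlation rho: exactly normal."""
    var = a.std ** 2 + b.std ** 2 + 2.0 * rho * a.std * b.std
    return Normal(a.mean + b.mean, math.sqrt(max(var, 0.0)))


def stat_max(a: Normal, b: Normal, rho: float = 0.0) -> Normal:
    """MAX of two jointly normal delays, approximated by a normal
    distribution whose mean and variance match those of the true MAX
    (Clark's moment matching; an assumption of this sketch)."""
    theta_sq = a.std ** 2 + b.std ** 2 - 2.0 * rho * a.std * b.std
    if theta_sq <= 1e-24:
        # a - b is (numerically) deterministic, so the MAX is simply
        # whichever input has the larger mean.
        return a if a.mean >= b.mean else b
    theta = math.sqrt(theta_sq)
    alpha = (a.mean - b.mean) / theta
    # First and second moments of max(a, b).
    m1 = a.mean * _cdf(alpha) + b.mean * _cdf(-alpha) + theta * _pdf(alpha)
    m2 = ((a.mean ** 2 + a.std ** 2) * _cdf(alpha)
          + (b.mean ** 2 + b.std ** 2) * _cdf(-alpha)
          + (a.mean + b.mean) * theta * _pdf(alpha))
    return Normal(m1, math.sqrt(max(m2 - m1 * m1, 0.0)))
```

For two independent standard normal delays, `stat_max` returns a mean of 1/sqrt(pi) (about 0.564) and a variance of 1 - 1/pi. The true MAX distribution is skewed, so a normal fit of this kind can misjudge the upper tail that matters in worst-case analysis.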