IPSJ Transactions on System and LSI Design Methodology
Online ISSN : 1882-6687
ISSN-L : 1882-6687
Volume 2
  • Hidetoshi Onodera
    Article type: Editorial
    Subject area: Editorial
    2009 Volume 2 Pages 1
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    Download PDF (27K)
  • Masahiro Fujita
    Article type: Invited Paper
    Subject area: System-Level Formal Verification
    2009 Volume 2 Pages 2-17
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    Three formal verification approaches targeting C-language-based hardware designs, which are the central verification technologies for C-based hardware design flows, are presented. The first approach statically analyzes C design descriptions for inconsistent or inadequate usage, such as out-of-bounds array accesses, uses of variables before initialization, and deadlocks. Because it is based on local analysis of the descriptions, it is applicable to large design descriptions; the key issue is how to reason about the various dependencies among statements as precisely and as quickly as possible. The second approach is to model check C design descriptions. Since plain model checking does not scale to large descriptions, automatic abstraction or reduction of the descriptions and their refinements are integrated with the model checking methods so that reasonably large designs can be processed. By concentrating on particular types of properties, large reductions of design size are possible, and as a result real-life designs can be model checked. The last approach checks the equivalence between two C design descriptions. It is based on symbolic simulation of the descriptions; since large descriptions can contain a huge number of execution paths, various techniques to reduce the number of paths to be examined are incorporated. All of the presented methods use dependence analysis on data, control, and other relations as their basic analysis technique. System dependence graphs for programming languages are extended to deal with C-based hardware designs that also have structural hierarchy. With these techniques, reasonably large design descriptions can be checked.
    Download PDF (1101K)
  • Makoto Takamiya, Takayasu Sakurai
    Article type: Invited Paper
    Subject area: Low-Power Circuit Design
    2009 Volume 2 Pages 18-29
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    In order to cope with the increasing leakage power and the increasing device variability in VLSIs, the required granularity of control in both the space domain and the time domain is becoming finer. This paper surveys several recent fine-grain voltage engineering techniques for low-power VLSI circuit design. The space-domain techniques include fine-grain power supply control with 3D-structured on-chip buck converters achieving a maximum power efficiency of 71.3% in 0.35-µm CMOS, and fine-grain body bias control to reduce the power supply voltage in 90-nm CMOS. The time-domain techniques include accelerators for power supply voltage hopping with a 5-ns transition time in 0.18-µm CMOS, a power supply noise canceller with a 32% power supply noise reduction in 90-nm CMOS, and backgate bias accelerators for fast wake-up with a 1.5-V change of backgate voltage in 35 ns in 90-nm CMOS.
    Download PDF (2479K)
  • Liangwei Ge, Song Chen, Takeshi Yoshimura
    Article type: Regular Paper
    Subject area: Behavioral Synthesis
    2009 Volume 2 Pages 30-42
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    Scheduling, an important step in high-level synthesis, is essentially a search process in the solution space. Due to the vastness of the solution space and the complexity of the imposed constraints, it is usually difficult to explore the solution space efficiently. In this paper, we present a random-walk-based perturbation method to explore the schedule space. The method works by limiting the search within a specifically defined sub-solution space (SSS), in which schedules can be found in polynomial time. The SSS is then repeatedly perturbed by an N-dimensional random walk so that better schedules can be found in the new SSS. To improve the search efficiency, a guided perturbation strategy is presented that leads the random walk toward promising directions (an illustrative sketch of this search loop follows this entry). Experiments on well-known benchmarks show that by controlling the number of perturbations, our method conveniently trades off schedule quality against runtime. Within reasonable runtime, the proposed method finds schedules of better quality than existing methods.
    Download PDF (2742K)
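    The entry above describes restricting scheduling to a sub-solution space (SSS) and perturbing that space with a guided N-dimensional random walk. The following Python sketch illustrates only the general shape of such a loop; the toy data-flow graph, the window-based SSS, the single-adder/single-multiplier resource limits, and the "keep the SSS if it did not get worse" guidance rule are all assumptions for illustration, not the paper's formulation.

```python
import random

# Toy data-flow graph (hypothetical): op -> list of predecessor ops.
DFG = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["c"], "f": ["d", "e"]}
KIND = {"a": "mul", "b": "mul", "c": "add", "d": "mul", "e": "add", "f": "add"}
LIMIT = {"add": 1, "mul": 1}        # one adder and one multiplier (assumed)

def schedule_in_sss(window):
    """Greedy list scheduling restricted to a sub-solution space (SSS):
    operation op may only start inside window[op] = (earliest, latest)."""
    start, step = {}, 0
    while len(start) < len(DFG):
        busy = {"add": 0, "mul": 0}
        for op, preds in DFG.items():
            if op in start:
                continue
            ready = all(p in start and start[p] < step for p in preds)
            lo, hi = window[op]
            if ready and lo <= step <= hi and busy[KIND[op]] < LIMIT[KIND[op]]:
                start[op] = step
                busy[KIND[op]] += 1
        step += 1
        if step > 50:                # the current SSS contains no feasible schedule
            return None
    return start

def perturb(window):
    """One N-dimensional random-walk step: shift each op's window by -1/0/+1."""
    new = {}
    for op, (lo, hi) in window.items():
        d = random.choice((-1, 0, 1))
        new[op] = (max(0, lo + d), max(0, hi + d))
    return new

def random_walk_search(iters=200, width=4):
    window = {op: (0, width) for op in DFG}      # initial SSS
    best = schedule_in_sss(window)
    best_len = max(best.values()) + 1
    for _ in range(iters):
        cand = perturb(window)
        sched = schedule_in_sss(cand)
        if sched is None:
            continue                             # keep the old SSS
        length = max(sched.values()) + 1
        if length <= best_len:                   # "guided": move toward promising SSSs
            window, best, best_len = cand, sched, length
    return best, best_len

print(random_walk_search())
```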
  • Sho Kodama, Yusuke Matsunaga
    Article type: Regular Paper
    Subject area: Behavioral Synthesis
    2009 Volume 2 Pages 43-52
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    In behavioral synthesis for resource-shared architectures, multiplexers are inserted between registers and functional units as a result of binding when necessary. Multiplexer optimization during binding is important for the performance, area, and power of a synthesized circuit. In this paper, we propose a binding algorithm that reduces the total number of multiplexer ports. Unlike most previous work, in which binding is performed by a constructive algorithm, our approach is based on iterative improvement. Starting from an initial functional unit binding and an initial register binding, both bindings are modified iteratively by local improvements based on taboo search. The binding in each iteration is feasible, so the actual total number of multiplexer ports can be optimized (a small taboo-search sketch follows this entry). A smart neighborhood that considers the effect of connection sharing is used for effective reduction of multiplexer ports. Additionally, a massive modification of the binding is performed at regular intervals to achieve further reduction of multiplexer ports and further robustness against the initial binding. Experimental results show that our approach reduces the total number of multiplexer ports by 30% on average compared to a traditional binding algorithm, with computation times of several seconds to a few minutes. Results of a robustness evaluation also show that our approach barely depends on the initial binding.
    Download PDF (465K)
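    The entry above is built on iterative improvement of a feasible binding with taboo search. The sketch below shows a bare-bones version of that idea for functional-unit binding only: the scheduled operations, the two identical functional units, the mux-port cost model, the move/swap neighborhood, and the tabu tenure are all hypothetical; the paper's smart neighborhood, register binding, and periodic massive modification are not modeled.

```python
import itertools

# Hypothetical scheduled data-flow: op -> (control step, (left source, right source)).
OPS = {
    "o1": (0, ("r1", "r2")), "o2": (0, ("r3", "r4")),
    "o3": (1, ("r3", "r4")), "o4": (1, ("r1", "r2")),
    "o5": (2, ("r1", "r2")), "o6": (2, ("r3", "r4")),
}
FUS = ["fu0", "fu1"]                      # two identical functional units (assumed)

def mux_ports(binding):
    """Total multiplexer ports: for each FU operand input, count distinct sources."""
    total = 0
    for fu in FUS:
        ops = [o for o, f in binding.items() if f == fu]
        for pos in (0, 1):                # left / right operand input
            total += len({OPS[o][1][pos] for o in ops})
    return total

def feasible(binding):
    """At most one operation per functional unit per control step."""
    used = set()
    for o, f in binding.items():
        if (f, OPS[o][0]) in used:
            return False
        used.add((f, OPS[o][0]))
    return True

def neighbors(binding):
    """Neighborhood: move one op to another FU (if feasible), or swap two ops
    of the same control step bound to different FUs (always feasible)."""
    out = []
    for o, f in itertools.product(OPS, FUS):
        if binding[o] != f:
            cand = dict(binding)
            cand[o] = f
            if feasible(cand):
                out.append((o, cand))
    for a, b in itertools.combinations(OPS, 2):
        if OPS[a][0] == OPS[b][0] and binding[a] != binding[b]:
            cand = dict(binding)
            cand[a], cand[b] = binding[b], binding[a]
            out.append((a, cand))         # tag the move by one of the swapped ops
    return out

def tabu_search(iters=50, tenure=3):
    binding = {o: FUS[i % len(FUS)] for i, o in enumerate(OPS)}   # initial binding
    best, best_cost = dict(binding), mux_ports(binding)
    tabu = {}                             # op -> iteration until which it is tabu
    for it in range(iters):
        scored = sorted(((mux_ports(c), o, c) for o, c in neighbors(binding)),
                        key=lambda t: t[0])
        for cost, o, cand in scored:
            if tabu.get(o, -1) < it or cost < best_cost:   # non-tabu or aspiration
                binding, tabu[o] = cand, it + tenure
                if cost < best_cost:
                    best, best_cost = dict(cand), cost
                break
    return best, best_cost

print(tabu_search())
```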
  • Kazuhito Ito, Hidekazu Seto
    Article type: Regular Paper
    Subject area: Low-Power Behavioral Synthesis
    2009 Volume 2 Pages 53-63
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    Power dissipated by data communications on an LSI depends not only on the binding and floorplan of functional units and registers but also on how the data communications are executed. Data communications depend on the binding, and the binding depends on the schedule of operations. It is therefore important to obtain the schedule that leads to the binding and floorplan minimizing the power dissipated by data communication. In this paper, a schedule exploration method is presented that searches for the schedule achieving the minimum energy dissipation of data communications.
    Download PDF (431K)
  • Naohiro Hamada, Yuki Shiga, Takao Konishi, Hiroshi Saito, Tomohiro Yon ...
    Article type: Regular Paper
    Subject area: Asynchronous Behavioral Synthesis
    2009 Volume 2 Pages 64-79
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    This paper proposes a behavioral synthesis system for asynchronous circuits with bundled-data implementation. The proposed system is based on a behavioral synthesis method for synchronous circuits, extended with operation scheduling and control synthesis for bundled-data implementation. The system synthesizes an RTL model and a simulation model from a behavioral description specified in a restricted C language, a resource library, and a set of design constraints. This paper shows the effectiveness of the proposed system in terms of area and latency through comparisons among bundled-data implementations synthesized by the proposed system, their synchronous counterparts, and bundled-data implementations synthesized directly with a behavioral synthesis method for synchronous circuits.
    Download PDF (378K)
  • Sho Takeuchi, Kiyoharu Hamaguchi, Toshinobu Kashiwabara
    Article type: Regular Paper
    Subject area: Formal Logic Verification
    2009 Volume 2 Pages 80-92
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    Model checking of assertions has attracted attention as a means of functional formal verification. In SystemVerilog, assertions may include “local variables”, which store and refer to data values locally within an assertion. For model checking, a finite automaton called a “checker” is generated. In the previous approach to checker generation by Long and Seawright, the checker introduces new state variables for each local variable, and the number of introduced state variables per local variable is linear in the size of the given assertion. In this paper, we show a checker-generation algorithm that reduces the number of introduced state variables; in particular, our algorithm requires only one such variable for each local variable. We also present experimental results on bounded model checking comparing our algorithm with the previous work by Long and Seawright.
    Download PDF (479K)
  • Ryosuke Inagaki, Norio Sadachika, Dondee Navarro, Mitiko Miura-Mattaus ...
    Article type: Regular Paper
    Subject area: Device Modeling
    2009 Volume 2 Pages 93-102
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    A GIDL (Gate-Induced Drain Leakage) current model for advanced MOSFETs is proposed and implemented into HiSIM2, a complete surface-potential-based MOSFET model. The model considers two tunneling mechanisms: band-to-band tunneling and trap-assisted tunneling. In total, seven model parameters are introduced. Simulation results for NFETs and PFETs reproduce measurements for any device size without binning of model parameters. The influence of the GIDL current is investigated with circuits that are sensitive to changes in stored charge due to the GIDL current.
    Download PDF (996K)
  • Masayuki Hiromoto, Hiroyuki Ochi, Yukihiro Nakamura
    Article type: Regular Paper
    Subject area: Asynchronous Arithmetic Design
    2009 Volume 2 Pages 103-113
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    Synchronous design methodology is widely used for today's digital circuits. However, it is difficult to reuse a synchronous module highly optimized for a specific clock frequency in other systems with different global clocks, because the logic depth between FFs must be tailored to the clock frequency. In this paper, we focus on asynchronous design, in which each module works at its best performance, and apply it to an IEEE-754-standard single-precision floating-point divider. Our divider can be built into a system with an arbitrary clock frequency while achieving its peak performance and area and power efficiency. This paper also reports an implementation and performance evaluation of the proposed divider on a Xilinx Virtex-4 FPGA. The evaluation results show that our divider achieves smaller area and lower power consumption than synchronous dividers of comparable throughput.
    Download PDF (652K)
  • Xianghui Wei, Takeshi Ikenaga, Satoshi Goto
    Article type: Regular Paper
    Subject area: Architectural Design
    2009 Volume 2 Pages 114-121
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    A low-bandwidth Integer Motion Estimation (IME) module is proposed for MPEG-2 to H.264 transcoding. Based on the bandwidth reduction method proposed in Ref. 1), a ping-pong memory control scheme combined with a partial Sum of Absolute Differences (SAD) Variable Block Size Motion Estimation (VBSME) architecture is realized. Experimental results show that the bandwidth of the proposed architecture is 70.6% of that of regular H.264 IME (Level C+ scheme, 2 macroblocks (MBs) stitched vertically), while the on-chip memory size is 11.7% of it.
    Download PDF (834K)
  • Wen Ji, Xing Li, Takeshi Ikenaga, Satoshi Goto
    Article type: Regular Paper
    Subject area: Architectural Design
    2009 Volume 2 Pages 122-130
    Published: 2009
    Released on J-STAGE: February 17, 2009
    JOURNAL FREE ACCESS
    In this paper, we propose a partially parallel irregular LDPC decoder for the IEEE 802.11n standard targeting high-throughput applications. The proposed decoder has several merits: (i) it is designed around a novel delta-value-based message-passing algorithm, which improves decoding throughput by removing redundant computation; (ii) techniques such as binary sorting, parallel column operation, and high-performance pipelining are used to further speed up the message-passing procedure. Synthesis results in TSMC 0.18-µm CMOS technology demonstrate that, for the (648, 324) irregular LDPC code, our decoder achieves an 8-fold increase in throughput, reaching 418 Mbps at a frequency of 200 MHz.
    Download PDF (729K)
  • Sudipta Kundu, Sorin Lerner, Rajesh Gupta
    Article type: Invited Paper
    Subject area: Behavioral Formal Verification
    2009 Volume 2 Pages 131-144
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    The growth in size and heterogeneity of System-on-Chip (SOC) designs makes their design process, from initial specification to IC implementation, complex. System-level design methods seek to combat this complexity by shifting an increasing share of the design burden to high-level languages such as SystemC and SystemVerilog. Such languages not only make a design easier to describe using high-level abstractions but also provide a path to systematic implementation through refinement and elaboration of such descriptions. In principle, this enables greater exploration of design alternatives and thus better design optimization than is possible with lower-level design methods. To achieve these goals, however, verification capabilities that validate designs at higher levels, as well as their equivalence with lower-level implementations, are crucially needed. To the extent possible given the large space of design alternatives, such validation must be formal, to ensure that the design and its important properties are provably correct against various implementation choices. In this paper, we present a survey of high-level verification techniques used for both verification and validation of high-level designs, that is, designs modeled using high-level programming languages. These techniques include those based on model checking, theorem proving, and approaches that integrate a combination of these methods. The high-level verification approaches address verification of properties as well as equivalence checking against refined implementations. We also focus on techniques that use information from the synthesis process for improved validation. Finally, we conclude with a discussion of future research directions in this area.
    Download PDF (383K)
  • Yao-Wen Chang, Zhe-Wei Jiang, Tung-Chieh Chen
    Article type: Invited Paper
    Subject area: Placement
    2009 Volume 2 Pages 145-166
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    The placement problem is to place objects into a fixed die such that no objects overlap with each other and some cost metric (e.g., wirelength) is optimized. Placement is a major step in physical design that has been studied for several decades. Although it is a classical problem, many modern design challenges have reshaped this problem. As a result, the placement problem has attracted much attention recently, and many new algorithms have been developed to handle the emerging design challenges. Modern placement algorithms can be classified into three major categories: simulated annealing, min-cut, and analytical algorithms. According to the recent literature, analytical algorithms typically achieve the best placement quality for large-scale circuit designs. In this paper, therefore, we shall give a systematic and comprehensive survey on the essential issues in analytical placement. This survey starts by dissecting the basic structure of analytical placement. Then, various techniques applied as components of popular analytical placers are studied, and two leading placers are exemplified to show the composition of these techniques into a complete placer. Finally, we point out some research directions for future analytical placement.
    Download PDF (734K)
  • Gang Zeng, Hiroyuki Tomiyama, Hiroaki Takada
    Article type: Regular Paper
    Subject area: System-Level Low-Power Design
    2009 Volume 2 Pages 167-179
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    A dynamic energy performance scaling (DEPS) framework is proposed for energy savings in hard real-time embedded systems. In this generalized framework, two existing technologies, dynamic hardware resource configuration (DHRC) and dynamic voltage and frequency scaling (DVFS), are combined for energy-performance tradeoff. The problem of selecting the optimal hardware configuration and voltage/frequency parameters is formulated so as to achieve maximal energy savings while meeting the deadline constraint (a minimal selection sketch follows this entry). Through case studies, the effectiveness of DEPS is validated.
    Download PDF (597K)
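    To make the DEPS-style selection problem concrete, here is a minimal sketch of picking the hardware configuration and frequency point with minimum energy that still meets a hard deadline. The design points, deadline, and energy numbers are invented for illustration; the paper's actual formulation and case studies are not reproduced.

```python
# Hypothetical design points: (hardware configuration, frequency in MHz,
# cycles for the task at that configuration, power in mW at that V/f point).
DESIGN_POINTS = [
    ("full-cache", 400, 4.0e6, 120.0),
    ("full-cache", 200, 4.0e6,  45.0),
    ("half-cache", 400, 5.5e6, 100.0),
    ("half-cache", 200, 5.5e6,  38.0),
    ("no-cache",   400, 9.0e6,  80.0),
]
DEADLINE_MS = 20.0   # hard real-time constraint (assumed)

def select_deps_point(points, deadline_ms):
    """Pick the (configuration, frequency) pair with minimum energy whose
    execution time meets the deadline; an exhaustive scan is enough here
    because the candidate set is small."""
    best = None
    for cfg, f_mhz, cycles, power_mw in points:
        exec_ms = cycles / (f_mhz * 1e3)          # cycles / (cycles per ms)
        if exec_ms > deadline_ms:
            continue                              # misses the deadline
        energy_uj = power_mw * exec_ms            # mW * ms = microjoules
        if best is None or energy_uj < best[0]:
            best = (energy_uj, cfg, f_mhz, exec_ms)
    return best

print(select_deps_point(DESIGN_POINTS, DEADLINE_MS))
```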
  • Hideki Takase, Hiroyuki Tomiyama, Hiroaki Takada
    Article type: Regular Paper
    Subject area: System-Level Low-Power Design
    2009 Volume 2 Pages 180-188
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    Energy consumption has become one of the major concerns in modern embedded systems, and memory subsystems now consume a large fraction of the total energy of embedded processors. This paper proposes partitioning and allocation approaches for scratch-pad memory in non-preemptive fixed-priority multi-task systems. We propose three approaches (spatial, temporal, and hybrid) that enable energy-efficient use of the scratch-pad region and reduce the energy consumption of the instruction memory. Each approach is formulated as an integer programming problem that simultaneously determines (1) the partitioning of the scratch-pad memory space among the tasks and (2) the allocation of functions to the scratch-pad memory space of each task (a simplified sketch of the spatial variant follows this entry). Our formulations take the task periods into account for the purpose of energy minimization. The experimental results show that up to 47% energy reduction in the instruction memory subsystem can be achieved by the proposed approaches.
    Download PDF (681K)
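    As a rough illustration of the spatial approach, the sketch below splits a scratch-pad among tasks by brute force and solves a 0/1 knapsack per task to decide which functions to place on-chip. The sizes, fetch counts, and per-access energies are assumptions, and this is a simplification of the paper's integer-programming formulation; the temporal and hybrid variants are not modeled.

```python
import itertools

E_MAIN, E_SPM = 5.0, 1.0       # energy per instruction fetch (nJ, assumed)
SPM_SIZE = 4                   # scratch-pad capacity in KB (assumed)

# task -> list of (function size in KB, fetch count over the hyperperiod)
TASKS = {
    "t1": [(1, 8.0e5), (2, 3.0e5), (1, 1.0e5)],
    "t2": [(2, 6.0e5), (1, 2.0e5)],
}

def best_saving(funcs, budget):
    """Best energy saving for one task given its SPM budget (0/1 knapsack)."""
    best = [0.0] * (budget + 1)
    for size, fetches in funcs:
        gain = fetches * (E_MAIN - E_SPM)        # saving if this function is on-chip
        for cap in range(budget, size - 1, -1):
            best[cap] = max(best[cap], best[cap - size] + gain)
    return best[budget]

def spatial_partition(tasks, spm_size):
    """Spatial approach: enumerate every split of the SPM among the tasks and
    solve a knapsack per task; return the split with the largest total saving."""
    names = list(tasks)
    top_saving, top_split = 0.0, None
    for split in itertools.product(range(spm_size + 1), repeat=len(names)):
        if sum(split) > spm_size:
            continue
        saving = sum(best_saving(tasks[n], b) for n, b in zip(names, split))
        if saving > top_saving:
            top_saving, top_split = saving, dict(zip(names, split))
    return top_split, top_saving

print(spatial_partition(TASKS, SPM_SIZE))
```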
  • Seiichiro Yamaguchi, Yuriko Ishitobi, Tohru Ishihara, Hiroto Yasuura
    Article type: Regular Paper
    Subject area: Architectural Low-Power Design
    2009 Volume 2 Pages 189-199
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    A small L0-cache located between an MPU core and an L1-cache is widely used in embedded processors to reduce the energy consumption of memory subsystems. Since the L0-cache is small, a hit reduces energy consumption; a miss, however, requires at least one extra cycle to access the L1-cache, which degrades processor performance. The Single-cycle-accessible Two-level Cache (STC) architecture proposed in this paper resolves this problem of the conventional L0-cache-based approach: both the small L0 cache and the large L1 cache in the STC architecture can be accessed from the MPU core within a single cycle. A compilation technique for effectively utilizing the STC architecture is also presented (a first-order energy/cycle model follows this entry). Experiments using several benchmark programs demonstrate that our approach reduces the energy consumption of memory subsystems by 64% in the best case and 45% on average, without any performance degradation, compared to the conventional L0-cache-based approach.
    Download PDF (1027K)
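    A crude first-order model of why such an organization helps is sketched below: under a conventional L0, every miss costs an extra cycle and an extra L1 access, whereas the single-cycle two-level organization (with compiler-directed placement) is assumed here to serve every fetch in one cycle from whichever level holds it. The energies, hit rate, and access assumptions are illustrative values, not measurements from the paper.

```python
# First-order, illustrative model only: per-access energies (pJ), hit rate,
# and the penalty assumptions below are not taken from the paper.
E_L0, E_L1 = 5.0, 30.0
ACCESSES, L0_HIT_RATE = 1_000_000, 0.80

def conventional_l0(accesses, hit_rate):
    """Conventional L0: probe L0 first; a miss pays an L1 access plus one
    extra cycle of latency."""
    hits = accesses * hit_rate
    misses = accesses - hits
    energy_pj = hits * E_L0 + misses * (E_L0 + E_L1)
    cycles = hits + 2 * misses
    return energy_pj, cycles

def stc(accesses, hit_rate):
    """STC-style access: assume the compiler directs each fetch to the level
    that holds it, and both levels respond in a single cycle."""
    hits = accesses * hit_rate            # fetches served by the small L0
    misses = accesses - hits              # fetches served directly by L1
    energy_pj = hits * E_L0 + misses * E_L1
    cycles = accesses
    return energy_pj, cycles

print("conventional L0:", conventional_l0(ACCESSES, L0_HIT_RATE))
print("STC            :", stc(ACCESSES, L0_HIT_RATE))
```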
  • Taiga Takata, Yusuke Matsunaga
    Article type: Regular Paper
    Subject area: Logic Synthesis
    2009 Volume 2 Pages 200-211
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    This paper presents Cut Resubstitution, a heuristic post-processing algorithm for technology mapping onto LUT-based FPGAs that minimizes area under a depth constraint. The idea of Cut Resubstitution is to iterate local transformations of an LUT network while considering actual area reduction, without using Boolean matching. Cut Resubstitution iterates the following process: first, it substitutes several LUTs in the current network in such a way that another LUT becomes redundant; then it eliminates the redundant LUT from the network (a much-simplified structural sketch of this substitute-then-eliminate loop follows this entry). Experimental results show that a simple depth-minimum mapper followed by Cut Resubstitution generates networks whose area is, on average, 7%, 7%, and 10% smaller than those generated by DAOmap for LUTs with a maximum of 4, 5, and 6 inputs, respectively. Our method runs in a similar time to, or slightly faster than, DAOmap.
    Download PDF (321K)
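    The sketch below is a much simplified, purely structural stand-in for the substitute-then-eliminate loop described above: it repeatedly rewires a LUT onto a fanin's own fanins when the merged support still fits within K inputs, which makes that fanin redundant and removable. A real tool (and the paper's Cut Resubstitution) would also recompute LUT truth tables and respect the depth constraint; here only the network structure is tracked, on a made-up example.

```python
K = 4   # LUT input limit (assumed)

# Toy LUT network: node -> list of fanins; names starting with "x" are primary
# inputs, and "out" is assumed to be the only primary output.
NETWORK = {
    "n1": ["x1", "x2"],
    "n2": ["x3", "x4"],
    "n3": ["n1", "x5"],
    "out": ["n3", "n2"],
}

def fanout_count(net, node):
    return sum(node in fanins for fanins in net.values())

def collapse_pass(net):
    """Repeatedly rewire a LUT onto a fanin's own fanins when the merged
    support still fits in K inputs, then delete the fanin once it has no
    remaining fanouts (i.e., it became redundant)."""
    changed = True
    while changed:
        changed = False
        for node, fanins in list(net.items()):
            for fi in fanins:
                if fi.startswith("x") or fi not in net:
                    continue            # skip primary inputs
                merged = [s for s in fanins if s != fi] + \
                         [s for s in net[fi] if s not in fanins]
                if len(merged) <= K and fanout_count(net, fi) == 1:
                    net[node] = merged  # substitution step (truth table not tracked)
                    del net[fi]         # the fanin LUT is now redundant
                    changed = True
                    break
            if changed:
                break
    return net

print(collapse_pass(dict(NETWORK)))
```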
  • Taeko Matsunaga, Shinji Kimura, Yusuke Matsunaga
    Article type: Regular Paper
    Subject area: Low-Power Arithmetic Synthesis
    2009 Volume 2 Pages 212-221
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    This paper addresses parallel prefix adder synthesis aimed at minimizing the total switching activity under bitwise timing constraints. The problem is treated as synthesis of prefix graphs, which represent the global structures of parallel prefix adders at a technology-independent level. An approach for timing-driven area minimization of prefix graphs has already been proposed: it first finds the exact minimum solution on a specific subset of prefix graphs by dynamic programming and then restructures the result for further reduction by removing the restrictions of the subset. In this paper, a switching cost is defined for each node of a prefix graph, and an approach to minimize the total switching cost is presented in which our area minimization algorithm is extended to calculate the switching cost using Ordered Binary Decision Diagrams (OBDDs) (a small prefix-graph example follows this entry). Furthermore, a heuristic that estimates the effect of the restructuring phase during the dynamic programming phase is integrated to improve the robustness of the algorithm under severe timing constraints. A series of experiments shows that the proposed approach is effective, especially when the timing constraints are not tight and/or there is a comparatively large number of nodes with very low switching costs.
    Download PDF (481K)
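    For readers unfamiliar with prefix graphs, the sketch below builds one specific prefix structure (Kogge-Stone) to perform an addition and simply counts its prefix nodes. The paper's method explores many prefix-graph structures and evaluates a per-node switching cost with OBDDs; neither is reproduced here, and the node count is only a stand-in for a cost.

```python
def prefix_add(a, b, width=8):
    """Add two `width`-bit numbers with a Kogge-Stone prefix graph and count
    the prefix nodes used (a crude stand-in for an area/switching cost)."""
    ai = [(a >> i) & 1 for i in range(width)]
    bi = [(b >> i) & 1 for i in range(width)]
    g = [x & y for x, y in zip(ai, bi)]          # generate
    p = [x ^ y for x, y in zip(ai, bi)]          # propagate
    G, P = g[:], p[:]
    nodes, dist = 0, 1
    while dist < width:
        newG, newP = G[:], P[:]
        for i in range(dist, width):
            # prefix operator: (G, P)[i] o (G, P)[i - dist]
            newG[i] = G[i] | (P[i] & G[i - dist])
            newP[i] = P[i] & P[i - dist]
            nodes += 1                           # one prefix node
        G, P = newG, newP
        dist *= 2
    carry_in = [0] + G[:-1]                      # carry into each bit position
    s = sum((p[i] ^ carry_in[i]) << i for i in range(width))
    return s + (G[-1] << width), nodes           # include the carry-out

# sanity check against Python's own addition (illustration only)
for a, b in [(23, 45), (200, 77), (255, 255)]:
    assert prefix_add(a, b)[0] == a + b
print(prefix_add(200, 77))
```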
  • Qing Dong, Shigetoshi Nakatake
    Article type: Regular Paper
    Subject area: Placement
    2009 Volume 2 Pages 222-238
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    This paper introduces a new concept of regularity-oriented floorplanning and block placement, called structured placement, which takes regularity as a placement criterion so as to improve performance. We provide methods to extract regular structures from a placement representation in linear time and evaluate these structures by quantifying regularity as an objective function. We also construct a simulated annealing framework that optimizes placement topology and physical dimensions separately and alternately, attaining a solution that balances the trade-off between regularity and area efficiency. Furthermore, we introduce symmetry-oriented structured placement to produce symmetrical placements. Experiments show that the resultant placements achieve regularity without increasing chip area or wire length compared to those of existing methods.
    Download PDF (868K)
  • Tadayuki Matsumura, Tohru Ishihara, Hiroto Yasuura
    Article type: Regular Paper
    Subject area: Circuit-Level Low-Power Design
    2009 Volume 2 Pages 239-249
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    On-chip memories generally use higher supply (VDD) and threshold (Vth) voltages than the logic parts to improve the static noise margin and suppress static energy consumption; the higher VDD, however, increases dynamic energy consumption. This paper proposes a hybrid memory architecture consisting of two regions: (1) a dynamic-energy-conscious region using low VDD and Vth and (2) a static-energy-conscious region using high VDD and Vth. The proposed architecture is applied to a scratchpad memory. This paper also formulates an optimization problem for finding the optimal code allocation and memory configuration simultaneously, minimizing the total energy consumption of the memory under constraints on static noise margin (SNM), write margin (WM), and memory access delay (a simplified allocation sketch follows this entry). The memory configuration is defined by a memory division ratio, a β ratio, and a VDD. Experimental results demonstrate that the total energy consumption of our original 90-nm SRAM can be reduced by 62.9% in the best case with a 4.56% area overhead, without degradation of SNM, WM, or access delay.
    Download PDF (1068K)
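    A heavily simplified sketch of the allocation idea follows: given a profile of 1-KB code blocks, it tries every division of the memory between the low-VDD (dynamic-energy-conscious) and high-VDD (static-energy-conscious) regions and places the hottest blocks in the cheaper-per-access region. The energy numbers are invented, and the paper's additional knobs (β ratio, VDD choice) and the SNM/WM/delay constraints are not modeled.

```python
# Illustrative assumptions only: per-access dynamic energy (nJ) and total
# leakage energy per KB over the program run (nJ) for the two regions.
REGIONS = {
    "dyn-conscious":  {"e_access": 0.5, "e_leak_per_kb": 200_000.0},  # low VDD/Vth
    "stat-conscious": {"e_access": 1.2, "e_leak_per_kb": 40_000.0},   # high VDD/Vth
}

# 1-KB code blocks and their fetch counts (hypothetical profile).
BLOCKS = [900_000, 400_000, 150_000, 60_000, 10_000, 5_000]

def best_division(blocks, regions):
    """Try every division of the memory between the two regions; for a given
    division, the most frequently fetched blocks go to the region with the
    cheaper per-access energy."""
    blocks = sorted(blocks, reverse=True)
    lo, hi = regions["dyn-conscious"], regions["stat-conscious"]
    best = None
    for split in range(len(blocks) + 1):         # first `split` KB in low-VDD region
        dynamic = sum(blocks[:split]) * lo["e_access"] \
                + sum(blocks[split:]) * hi["e_access"]
        static = split * lo["e_leak_per_kb"] \
               + (len(blocks) - split) * hi["e_leak_per_kb"]
        if best is None or dynamic + static < best[0]:
            best = (dynamic + static, split)
    return best                                  # (total energy in nJ, division point)

print(best_division(BLOCKS, REGIONS))
```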
  • Yoshinobu Higami, Kewal K. Saluja, Hiroshi Takahashi, Sin-ya Kobayashi ...
    Article type: Regular Paper
    Subject area: Testing
    2009 Volume 2 Pages 250-262
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    The conventional stuck-at fault model is no longer sufficient to deal with the problems of nanometer geometries in modern Large Scale Integrated circuits (LSIs), and test and diagnosis for transistor defects are required. In this paper we propose a fault diagnosis method for transistor shorts in combinational and full-scan circuits described at the gate level. Since it is difficult to describe the precise behavior of faulty transistors, we define two types of transistor short models by focusing on the output values of the corresponding faulty gate. Salient features of the proposed diagnosis method are that 1) it uses only gate-level simulation and does not use transistor-level simulation such as SPICE, 2) it uses a conventional stuck-at fault simulator yet is able to handle transistor shorts, making it suitable for large circuits, and 3) it is efficient and accurate (a generic simulate-and-match diagnosis sketch follows this entry). We apply our method to ISCAS benchmark circuits to demonstrate its effectiveness.
    Download PDF (305K)
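    The sketch below shows the generic gate-level "simulate candidate faults and match the observed responses" flow that such a diagnosis method builds on, using plain stuck-at faults on a made-up three-gate netlist. The paper's transistor-short models, which constrain the faulty gate's output values, are not reproduced; this is only the surrounding simulation-and-ranking skeleton.

```python
from itertools import product

# Tiny gate-level netlist (made up): net -> (gate, fanin nets); a, b, c are inputs.
NETLIST = {
    "n1": ("AND",  ("a", "b")),
    "n2": ("OR",   ("b", "c")),
    "z":  ("NAND", ("n1", "n2")),
}
OUTPUTS = ["z"]
GATES = {"AND": lambda v: all(v), "OR": lambda v: any(v), "NAND": lambda v: not all(v)}

def simulate(inputs, fault=None):
    """Evaluate the netlist; `fault` = (net, stuck_value) forces one net."""
    def val(net):
        if fault is not None and net == fault[0]:
            return fault[1]
        if net in inputs:
            return inputs[net]
        gate, fanins = NETLIST[net]
        return GATES[gate]([val(f) for f in fanins])
    return {o: int(val(o)) for o in OUTPUTS}

def diagnose(observed):
    """Rank candidate stuck-at faults by how many observed responses they explain."""
    candidates = [(net, v) for net in list(NETLIST) + ["a", "b", "c"] for v in (0, 1)]
    scores = []
    for fault in candidates:
        match = sum(simulate(dict(zip("abc", pat)), fault) == resp
                    for pat, resp in observed.items())
        scores.append((match, fault))
    return sorted(scores, key=lambda s: s[0], reverse=True)[:3]

# Observed responses for every input pattern, produced here by injecting a known
# fault (n1 stuck-at-0) just to exercise the flow end to end.
observed = {pat: simulate(dict(zip("abc", pat)), ("n1", 0))
            for pat in product((0, 1), repeat=3)}
print(diagnose(observed))
```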
  • Yiqing Huang, Qin Liu, Takeshi Ikenaga
    Article type: Regular Paper
    Subject area: Architectural Design
    2009 Volume 2 Pages 263-273
    Published: 2009
    Released on J-STAGE: August 14, 2009
    JOURNAL FREE ACCESS
    A macroblock (MB) feature based adaptive propagate partial SAD architecture is proposed in this paper. First, using an edge detection operator, homogeneous MBs are detected before motion estimation, and three hardware-friendly subsampling patterns are adaptively selected for MBs of different homogeneity; the proposed architecture uses four different processing elements to realize this adaptive subsampling scheme (a small software sketch of subsampled SAD follows this entry). Second, to achieve data reuse and power reduction in the memory part, the reference pixels in the search window are reorganized into two memory groups, which output pixel data interactively for adaptive subsampling. Moreover, a compressor-tree-based circuit-level optimization is included in our design to reduce hardware cost. Synthesized with TSMC 0.18-µm technology, about 10k gates on average can be saved for the whole IME engine by our optimization. With 481k gates at 110.5 MHz, a 720p, 30-fps HDTV integer motion estimation engine is designed. Compared with previous work, our design achieves a 39.8% reduction in power consumption with only a 3.44% increase in hardware.
    Download PDF (944K)
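    To illustrate the subsampled-SAD idea in software, the sketch below computes a macroblock SAD on one of three subsampling masks chosen by a simple gradient-based homogeneity test. The masks, the gradient operator, and the threshold are stand-ins for the paper's hardware-friendly patterns and edge-detection operator, not the actual design.

```python
import numpy as np

def sad(cur, ref, mask=None):
    """Sum of absolute differences between a current macroblock and a
    reference candidate, optionally restricted to a subsampling mask."""
    diff = np.abs(cur.astype(np.int32) - ref.astype(np.int32))
    return int(diff[mask].sum()) if mask is not None else int(diff.sum())

def subsample_mask(shape, pattern):
    """Illustrative subsampling patterns: all pixels, every other row, or a
    4:1 grid of pixels."""
    m = np.zeros(shape, dtype=bool)
    if pattern == "full":
        m[:] = True
    elif pattern == "2:1":
        m[::2, :] = True
    elif pattern == "4:1":
        m[::2, ::2] = True
    return m

def choose_pattern(cur, threshold=8.0):
    """Homogeneity test: a flat (homogeneous) macroblock can use a coarser
    subsampling pattern with little quality loss."""
    gy, gx = np.gradient(cur.astype(np.float64))
    activity = np.mean(np.hypot(gx, gy))
    if activity < threshold / 2:
        return "4:1"
    if activity < threshold:
        return "2:1"
    return "full"

rng = np.random.default_rng(0)
cur = rng.integers(0, 256, (16, 16), dtype=np.uint8)
ref = rng.integers(0, 256, (16, 16), dtype=np.uint8)
pattern = choose_pattern(cur)
print(pattern, sad(cur, ref, subsample_mask(cur.shape, pattern)))
```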