-
Hidetoshi Onodera
Article type: Editorial
Subject area: Editorial
2009 Volume 2 Pages 1
Published: 2009
Released on J-STAGE: February 17, 2009
JOURNAL
FREE ACCESS
-
Masahiro Fujita
Article type: System-Level Formal Verification
Subject area: Invited Paper
2009 Volume 2 Pages 2-17
Published: 2009
Released on J-STAGE: February 17, 2009
Three formal verification approaches targeting C-language-based hardware designs, which are the central verification technologies for C-based hardware design flows, are presented. The first approach statically analyzes C design descriptions for inconsistencies and inadequate usages, such as out-of-bounds array accesses, uses of variables before initialization, and deadlocks. It is based on local analysis of the descriptions and hence is applicable to large design descriptions. The key issue for this approach is how to reason about the various dependencies among statements as precisely and as quickly as possible. The second approach is to model check C design descriptions. Since straightforward model checking does not work well for large descriptions, automatic abstractions or reductions of descriptions and their refinements are integrated with the model checking methods so that reasonably large designs can be processed. By concentrating on particular types of properties, design sizes can be reduced substantially, and as a result, real-life designs can be model checked. The last approach is to check the equivalence of two C design descriptions. It is based on symbolic simulation of the design descriptions. Since large design descriptions can contain huge numbers of execution paths, various techniques are incorporated to reduce the number of execution paths that must be examined. All of the presented methods use dependence analysis on data, control, and other relations as their basic analysis technique. System dependence graphs for programming languages are extended to deal with C-based hardware designs that also have structural hierarchy. With these techniques, reasonably large design descriptions can be checked.
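The use-before-initialization check mentioned in this abstract can be pictured as a simple forward pass over straight-line code; the statement encoding and variable names below are invented for illustration and are not the paper's representation.

```python
# Toy static check: flag variables read before any assignment.
# Each statement is (defined_var_or_None, list_of_used_vars).

def check_use_before_init(stmts):
    """Return (line_number, variable) pairs for use-before-init violations."""
    initialized = set()
    errors = []
    for lineno, (defined, used) in enumerate(stmts, start=1):
        for v in used:
            if v not in initialized:
                errors.append((lineno, v))
        if defined is not None:
            initialized.add(defined)
    return errors

# 'b' is read on statement 2 before any assignment to it.
program = [("a", []), ("c", ["a", "b"]), ("b", ["c"])]
print(check_use_before_init(program))  # [(2, 'b')]
```

Real checkers of this kind, as the abstract notes, additionally reason about control and data dependencies so that the analysis stays precise across branches and procedure boundaries.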
-
Makoto Takamiya, Takayasu Sakurai
Article type: Low-Power Circuit Design
Subject area: Invited Paper
2009 Volume 2 Pages 18-29
Published: 2009
Released on J-STAGE: February 17, 2009
In order to cope with increasing leakage power and increasing device variability in VLSIs, the required control granularity in both the space domain and the time domain is shrinking. This paper presents several recent fine-grain voltage engineering techniques for low-power VLSI circuit design. The space-domain techniques include fine-grain power supply control with 3D-structured on-chip buck converters achieving a power efficiency of up to 71.3% in 0.35-µm CMOS, and fine-grain body bias control to reduce the power supply voltage in 90-nm CMOS. The time-domain techniques include accelerators for power supply voltage hopping with a 5-ns transition time in 0.18-µm CMOS, a power supply noise canceller with 32% power supply noise reduction in 90-nm CMOS, and backgate bias accelerators for fast wake-up, achieving a 1.5-V change of backgate voltage in 35 ns in 90-nm CMOS.
-
Liangwei Ge, Song Chen, Takeshi Yoshimura
Article type: Behavioral Synthesis
Subject area: Regular Paper
2009 Volume 2 Pages 30-42
Published: 2009
Released on J-STAGE: February 17, 2009
Scheduling, an important step in high-level synthesis, is essentially a search process in the solution space. Due to the vastness of the solution space and the complexity of the imposed constraints, it is usually difficult to explore the solution space efficiently. In this paper, we present a random-walk-based perturbation method to explore the schedule space. The method limits the search to a specifically defined sub-solution space (SSS), within which schedules can be found in polynomial time. The SSS is then repeatedly perturbed by an N-dimension random walk so that better schedules can be sought in each new SSS. To improve search efficiency, a guided perturbation strategy is presented that steers the random walk toward promising directions. Experiments on well-known benchmarks show that by controlling the number of perturbations, our method conveniently trades off schedule quality against runtime. Within reasonable runtime, the proposed method finds schedules of better quality than existing methods.
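The perturb-and-search loop described in this abstract can be sketched as follows, with greedy list scheduling standing in for the polynomial-time search inside an SSS and a ±1 step on a priority vector standing in for the N-dimension random walk. All encodings and parameters are illustrative, not the paper's.

```python
import random

def list_schedule(prios, deps, n_units):
    """Greedy list scheduling: ready ops (deps finished), highest priority first."""
    done, time, finish = set(), 0, {}
    while len(done) < len(prios):
        ready = [o for o in range(len(prios))
                 if o not in done and all(finish.get(d, 10**9) <= time for d in deps[o])]
        for o in sorted(ready, key=lambda o: -prios[o])[:n_units]:
            finish[o] = time + 1          # unit-latency operations
            done.add(o)
        time += 1
    return max(finish.values())           # schedule latency (makespan)

def random_walk_search(deps, n_units, n_ops, steps=200, seed=0):
    rng = random.Random(seed)
    prios = [0.0] * n_ops
    best = list_schedule(prios, deps, n_units)
    for _ in range(steps):
        cand = [p + rng.choice((-1, 1)) for p in prios]   # one random-walk step
        lat = list_schedule(cand, deps, n_units)
        if lat <= best:                   # keep walks that do not hurt quality
            prios, best = cand, lat
    return best

# Ops 0,1 independent; chain 2 -> 3 -> 4; two units. Default priorities give
# latency 4; the walk discovers priorities that start the chain early (3).
print(random_walk_search([[], [], [], [2], [3]], n_units=2, n_ops=5))
```

The acceptance of equal-cost moves lets the walk drift across plateaus, which is what makes the perturbation exploratory rather than purely greedy.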
-
Sho Kodama, Yusuke Matsunaga
Article type: Behavioral Synthesis
Subject area: Regular Paper
2009 Volume 2 Pages 43-52
Published: 2009
Released on J-STAGE: February 17, 2009
In behavioral synthesis for resource-shared architectures, multiplexers are inserted between registers and functional units as a result of binding when necessary. Multiplexer optimization during binding is important for the performance, area, and power of a synthesized circuit. In this paper, we propose a binding algorithm that reduces the total number of multiplexer ports. Unlike most previous works, in which binding is performed by a constructive algorithm, our approach is based on iterative improvement. It starts from an initial functional unit binding and an initial register binding, and then iteratively modifies both by local improvements based on tabu search. The binding in each iteration is feasible, so the actual total number of multiplexer ports can be optimized directly. A smart neighborhood that considers the effect of sharing connections is used for effective reduction of multiplexer ports. Additionally, a massive modification of the binding is performed at regular intervals to achieve a further reduction of multiplexer ports and greater robustness against the initial binding. Experimental results show that our approach reduces the total number of multiplexer ports by 30% on average compared to a traditional binding algorithm, with computation times of several seconds to a few minutes. Robustness evaluations also show that our approach barely depends on the initial binding.
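A minimal flavor of the tabu-search idea in this abstract: operations are bound to functional units, each unit needs one multiplexer port per distinct source register feeding it, and the search moves one operation at a time while keeping recently moved operations tabu. The cost model and data below are invented for the sketch.

```python
from collections import deque

def mux_ports(binding, sources, n_units):
    """Total ports: one per distinct source register feeding each unit."""
    return sum(len({sources[op] for op, u in enumerate(binding) if u == unit})
               for unit in range(n_units))

def tabu_search(sources, n_units, iters=50, tabu_len=3):
    binding = [op % n_units for op in range(len(sources))]   # initial binding
    best_cost = mux_ports(binding, sources, n_units)
    tabu = deque(maxlen=tabu_len)                            # recently moved ops
    for _ in range(iters):
        moves = [(mux_ports(binding[:op] + [u] + binding[op + 1:], sources, n_units), op, u)
                 for op in range(len(sources)) if op not in tabu
                 for u in range(n_units) if u != binding[op]]
        cost, op, u = min(moves)          # best non-tabu neighbor, even if worse
        binding[op] = u
        tabu.append(op)
        best_cost = min(best_cost, cost)
    return best_cost

# Ops 0..3 read registers r0,r0,r1,r1; binding like-source ops together
# needs only 2 ports in total instead of the initial 4.
print(tabu_search(["r0", "r0", "r1", "r1"], n_units=2))  # 2
```

Accepting the best neighbor even when it is worse, while forbidding recent moves, is what lets tabu search escape the local optima a pure hill-climber would stop at.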
-
Kazuhito Ito, Hidekazu Seto
Article type: Low-Power Behavioral Synthesis
Subject area: Regular Paper
2009 Volume 2 Pages 53-63
Published: 2009
Released on J-STAGE: February 17, 2009
Power dissipated by data communications on an LSI depends not only on the binding and floorplan of functional units and registers but also on how the data communications are executed. Data communications depend on the binding, and the binding depends on the schedule of operations. Therefore, it is important to obtain the schedule that leads to the binding and floorplan minimizing the power dissipated by data communication. In this paper, a schedule exploration method is presented to search for the schedule that minimizes the energy dissipated by data communications.
-
Naohiro Hamada, Yuki Shiga, Takao Konishi, Hiroshi Saito, Tomohiro Yon ...
Article type: Asynchronous Behavioral Synthesis
Subject area: Regular Paper
2009 Volume 2 Pages 64-79
Published: 2009
Released on J-STAGE: February 17, 2009
This paper proposes a behavioral synthesis system for asynchronous circuits with bundled-data implementation. The proposed system is based on a behavioral synthesis method for synchronous circuits, extended in its operation scheduling and control synthesis for bundled-data implementation. The system synthesizes an RTL model and a simulation model from a behavioral description specified in a restricted C language, a resource library, and a set of design constraints. This paper shows the effectiveness of the proposed system in terms of area and latency through comparisons among bundled-data implementations synthesized by the proposed system, their synchronous counterparts, and bundled-data implementations synthesized by directly using a behavioral synthesis method for synchronous circuits.
-
Sho Takeuchi, Kiyoharu Hamaguchi, Toshinobu Kashiwabara
Article type: Formal Logic Verification
Subject area: Regular Paper
2009 Volume 2 Pages 80-92
Published: 2009
Released on J-STAGE: February 17, 2009
For functional formal verification, model checking of assertions has attracted attention. In SystemVerilog, assertions may include “local variables,” which are used to store and refer to data values locally within assertions. For model checking, a finite automaton called a “checker” is generated. In the previous checker-generation approach by Long and Seawright, the checker introduces new state variables corresponding to each local variable, and the number of introduced state variables per local variable is linear in the size of the given assertion. In this paper, we show a checker-generation algorithm that reduces the number of introduced state variables; in particular, our algorithm requires only one such variable for each local variable. We also show experimental results on bounded model checking comparing our algorithm with the previous work of Long and Seawright.
-
Ryosuke Inagaki, Norio Sadachika, Dondee Navarro, Mitiko Miura-Mattaus ...
Article type: Device Modeling
Subject area: Regular Paper
2009 Volume 2 Pages 93-102
Published: 2009
Released on J-STAGE: February 17, 2009
A GIDL (Gate-Induced Drain Leakage) current model for advanced MOSFETs is proposed and implemented in HiSIM2, a complete surface-potential-based MOSFET model. The model considers two tunneling mechanisms: band-to-band tunneling and trap-assisted tunneling. In total, seven model parameters are introduced. Simulation results for NFETs and PFETs reproduce measurements for any device size without binning of model parameters. The influence of the GIDL current is investigated with circuits that are sensitive to changes in the stored charge caused by the GIDL current.
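HiSIM2's actual GIDL equations are not reproduced here; the sketch below uses only the classic band-to-band tunneling form, I = A·Es·exp(−B/Es) with surface field Es, which is the textbook shape such models build on. The parameters a and b are arbitrary illustrative values, not fitted model parameters.

```python
import math

def gidl_current(e_surface, a=1e-12, b=6e9):
    """Band-to-band tunneling form of a GIDL current [A] vs. surface field [V/m].
    a and b are illustrative fitting parameters, not HiSIM2 values."""
    return a * e_surface * math.exp(-b / e_surface)

# The exponential term makes the current extremely sensitive to the field:
# doubling the field here raises the current by several orders of magnitude.
low, high = gidl_current(5e8), gidl_current(1e9)
print(high / low > 100)  # True
```

This steep field dependence is why GIDL matters mainly at the gate-drain overlap under large gate-to-drain bias, and why circuits holding charge on floating nodes are the sensitive ones to check.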
-
Masayuki Hiromoto, Hiroyuki Ochi, Yukihiro Nakamura
Article type: Asynchronous Arithmetic Design
Subject area: Regular Paper
2009 Volume 2 Pages 103-113
Published: 2009
Released on J-STAGE: February 17, 2009
Synchronous design methodology is widely used for today's digital circuits. However, it is difficult to reuse a synchronous module highly optimized for a specific clock frequency in other systems with different global clocks, because the logic depth between FFs must be tailored to the clock frequency. In this paper, we focus on asynchronous design, in which each module works at its best performance, and apply it to an IEEE-754-standard single-precision floating-point divider. Our divider is ready to be built into a system with an arbitrary clock frequency while achieving its peak performance and area and power efficiency. This paper also reports an implementation and performance evaluation of the proposed divider on a Xilinx Virtex-4 FPGA. The evaluation results show that our divider achieves smaller area and lower power consumption than synchronous dividers with comparable throughput.
-
Xianghui Wei, Takeshi Ikenaga, Satoshi Goto
Article type: Architectural Design
Subject area: Regular Paper
2009 Volume 2 Pages 114-121
Published: 2009
Released on J-STAGE: February 17, 2009
A low-bandwidth Integer Motion Estimation (IME) module is proposed for MPEG-2 to H.264 transcoding. Based on the bandwidth reduction method proposed in Ref. 1), a ping-pong memory control scheme combined with a partial Sum of Absolute Differences (SAD) Variable Block Size Motion Estimation (VBSME) architecture is realized. Experimental results show that the bandwidth of the proposed architecture is 70.6% of that of regular H.264 IME (Level C+ scheme, two macroblocks (MBs) stitched vertically), while the on-chip memory size is 11.7% of that.
-
Wen Ji, Xing Li, Takeshi Ikenaga, Satoshi Goto
Article type: Architectural Design
Subject area: Regular Paper
2009 Volume 2 Pages 122-130
Published: 2009
Released on J-STAGE: February 17, 2009
In this paper, we propose a partially-parallel irregular LDPC decoder for the IEEE 802.11n standard targeting high-throughput applications. The proposed decoder has several merits:
(i) The decoder is designed around a novel delta-value-based message-passing algorithm which improves decoding throughput by removing redundant computation.
(ii) Techniques such as binary sorting, parallel column operation, and high-performance pipelining are used to further speed up the message-passing procedure. Synthesis results in TSMC 0.18-µm CMOS technology demonstrate that for the (648, 324) irregular LDPC code, our decoder achieves an 8-fold increase in throughput, reaching 418 Mbps at a frequency of 200 MHz.
-
Sudipta Kundu, Sorin Lerner, Rajesh Gupta
Article type: Behavioral Formal Verification
Subject area: Invited Paper
2009 Volume 2 Pages 131-144
Published: 2009
Released on J-STAGE: August 14, 2009
The growth in size and heterogeneity of System-on-Chip (SOC) designs makes their design process, from initial specification to IC implementation, complex. System-level design methods seek to combat this complexity by shifting an increasing share of the design burden to high-level languages such as SystemC and SystemVerilog. Such languages not only make a design easier to describe using high-level abstractions, but also provide a path to systematic implementation through refinement and elaboration of such descriptions. In principle, this enables greater exploration of design alternatives, and thus better design optimization, than is possible with lower-level design methods. To achieve these goals, however, verification capabilities are crucially needed that validate designs at higher levels as well as their equivalence with lower-level implementations. To the extent possible given the large space of design alternatives, such validation must be formal, to ensure the design and its important properties are provably correct across implementation choices. In this paper, we present a survey of high-level verification techniques that are used for both verification and validation of high-level designs, that is, designs modeled in high-level programming languages. These techniques include those based on model checking and theorem proving, as well as approaches that integrate a combination of these methods. The high-level verification approaches address verification of properties as well as equivalence checking against refined implementations. We also focus on techniques that use information from the synthesis process for improved validation. Finally, we conclude with a discussion of future research directions in this area.
-
Yao-Wen Chang, Zhe-Wei Jiang, Tung-Chieh Chen
Article type: Placement
Subject area: Invited Paper
2009 Volume 2 Pages 145-166
Published: 2009
Released on J-STAGE: August 14, 2009
The placement problem is to place objects into a fixed die such that no objects overlap with each other and some cost metric (e.g., wirelength) is optimized. Placement is a major step in physical design that has been studied for several decades. Although it is a classical problem, many modern design challenges have reshaped this problem. As a result, the placement problem has attracted much attention recently, and many new algorithms have been developed to handle the emerging design challenges. Modern placement algorithms can be classified into three major categories: simulated annealing, min-cut, and analytical algorithms. According to the recent literature, analytical algorithms typically achieve the best placement quality for large-scale circuit designs. In this paper, therefore, we shall give a systematic and comprehensive survey on the essential issues in analytical placement. This survey starts by dissecting the basic structure of analytical placement. Then, various techniques applied as components of popular analytical placers are studied, and two leading placers are exemplified to show the composition of these techniques into a complete placer. Finally, we point out some research directions for future analytical placement.
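One core component of the analytical placement this survey dissects is quadratic wirelength minimization, whose optimum is found by solving a linear system. A one-dimensional toy instance, invented for illustration: two movable cells a and b, fixed pads at x = 0 and x = 1, and nets pad0–a, a–b, b–pad1.

```python
def solve2(a11, a12, b1, a21, a22, b2):
    """Solve the 2x2 linear system [a11 a12; a21 a22] x = [b1; b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det

# Squared wirelength f(a, b) = (a - 0)^2 + (b - a)^2 + (1 - b)^2.
# Setting the gradient to zero gives the linear system
#   2a - b = 0   and   -a + 2b = 1.
a, b = solve2(2, -1, 0, -1, 2, 1)
print(a, b)  # 1/3 and 2/3: the cells spread evenly between the fixed pads
```

Real analytical placers solve the same kind of system with millions of variables, then interleave it with the spreading and legalization techniques the survey covers.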
-
Gang Zeng, Hiroyuki Tomiyama, Hiroaki Takada
Article type: System-Level Low-Power Design
Subject area: Regular Paper
2009 Volume 2 Pages 167-179
Published: 2009
Released on J-STAGE: August 14, 2009
A dynamic energy performance scaling (DEPS) framework is proposed for energy savings in hard real-time embedded systems. In this generalized framework, two existing technologies, i.e., dynamic hardware resource configuration (DHRC) and dynamic voltage frequency scaling (DVFS) are combined for energy performance tradeoff. The problem of selecting the optimal hardware configuration and voltage/frequency parameters is formulated to achieve maximal energy savings and meet the deadline constraint simultaneously. Through case studies, the effectiveness of DEPS has been validated.
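The selection step of such a framework can be pictured as choosing, among candidate operating points combining a hardware configuration with a voltage/frequency setting, the lowest-energy one that still meets the deadline. The candidate list and numbers below are invented for the sketch.

```python
def pick_operating_point(points, deadline):
    """points: list of (name, exec_time, energy). Return the name of the
    lowest-energy point whose execution time meets the deadline."""
    feasible = [(energy, time, name) for name, time, energy in points
                if time <= deadline]
    if not feasible:
        raise ValueError("no configuration meets the deadline")
    return min(feasible)[2]

points = [
    ("full-cache@1.2V", 4.0, 10.0),
    ("full-cache@1.0V", 6.0, 7.0),
    ("half-cache@1.0V", 9.0, 5.0),   # cheapest in energy, but too slow here
]
print(pick_operating_point(points, deadline=8.0))  # full-cache@1.0V
```

The paper's formulation optimizes over the cross product of DHRC configurations and DVFS settings; this sketch shows only why the two dimensions must be selected jointly rather than independently.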
-
Hideki Takase, Hiroyuki Tomiyama, Hiroaki Takada
Article type: System-Level Low-Power Design
Subject area: Regular Paper
2009 Volume 2 Pages 180-188
Published: 2009
Released on J-STAGE: August 14, 2009
Energy consumption has become one of the major concerns in modern embedded systems, and memory subsystems now consume a large fraction of the total energy in embedded processors. This paper proposes partitioning and allocation approaches for scratch-pad memory in non-preemptive fixed-priority multi-task systems. We propose three approaches (spatial, temporal, and hybrid) which enable energy-efficient usage of the scratch-pad region and reduce the energy consumption of instruction memory. Each approach is formulated as an integer programming problem that simultaneously determines (1) the partitioning of the scratch-pad memory space among the tasks, and (2) the allocation of functions to the scratch-pad memory space for each task. Our formulations take the task periods into account for the purpose of energy minimization. The experimental results show that up to 47% energy reduction in the instruction memory subsystem can be achieved by the proposed approaches.
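The allocation subproblem has the shape of a 0/1 knapsack: choose which functions to place in a scratch-pad of limited size to maximize the energy saved. The sketch below solves only this simplified core; the paper's integer programs additionally handle partitioning across tasks and task periods, and the sizes and savings here are invented.

```python
def allocate_spm(funcs, capacity):
    """funcs: list of (size, energy_saving). Return the maximum total saving
    achievable within the scratch-pad capacity (0/1 knapsack DP)."""
    best = [0] * (capacity + 1)            # best[c] = max saving using c bytes
    for size, saving in funcs:
        for c in range(capacity, size - 1, -1):   # descending: each func used once
            best[c] = max(best[c], best[c - size] + saving)
    return best[capacity]

funcs = [(4, 10), (3, 7), (2, 4)]          # (bytes, energy saving) per function
print(allocate_spm(funcs, capacity=5))     # take (3,7) + (2,4) -> 11
```

The DP runs in O(capacity × functions) time, which is why a per-task allocation like this stays tractable even inside a larger integer-programming formulation.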
-
Seiichiro Yamaguchi, Yuriko Ishitobi, Tohru Ishihara, Hiroto Yasuura
Article type: Architectural Low-Power Design
Subject area: Regular Paper
2009 Volume 2 Pages 189-199
Published: 2009
Released on J-STAGE: August 14, 2009
A small L0-cache located between an MPU core and an L1-cache is widely used in embedded processors to reduce the energy consumption of memory subsystems. Since the L0-cache is small, a hit reduces energy consumption; a miss, however, costs at least one extra cycle to access the L1-cache, which degrades processor performance. The Single-cycle-accessible Two-level Cache (STC) architecture proposed in this paper resolves this problem of the conventional L0-cache-based approach: both the small L0 cache and the large L1 cache in our STC architecture can be accessed from the MPU core within a single cycle. A compilation technique for effectively utilizing the STC architecture is also presented. Experiments using several benchmark programs demonstrate that our approach reduces the energy consumption of memory subsystems by 64% in the best case and by 45% on average, without any performance degradation, compared to the conventional L0-cache-based approach.
-
Taiga Takata, Yusuke Matsunaga
Article type: Logic Synthesis
Subject area: Regular Paper
2009 Volume 2 Pages 200-211
Published: 2009
Released on J-STAGE: August 14, 2009
This paper presents Cut Resubstitution, a heuristic post-processing algorithm for technology mapping of LUT-based FPGAs that minimizes area under a depth constraint. The idea behind Cut Resubstitution is to iterate local transformations of an LUT network that consider the actual area reduction, without using Boolean matching. Cut Resubstitution repeats the following process: first, it substitutes several LUTs in the current network in such a way that another LUT becomes redundant; then it eliminates the redundant LUT from the network. Experimental results show that a simple depth-minimum mapper followed by Cut Resubstitution generates networks whose area is, on average, 7%, 7%, and 10% smaller than those generated by DAOmap for LUTs with a maximum of 4, 5, and 6 inputs, respectively. Our method runs about as fast as, or slightly faster than, DAOmap.
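Cut enumeration is the standard machinery underneath LUT mapping passes like this one: a k-feasible cut of a node is a set of at most k nodes that covers all paths from the inputs, and each cut corresponds to one way of implementing the node in a k-input LUT. A minimal sketch on a two-input DAG follows; this is background only, not the Cut Resubstitution algorithm itself.

```python
from itertools import product

def enumerate_cuts(fanins, k):
    """fanins: node -> (a, b) for gates, or () for primary inputs,
    given in topological key order. Return node -> set of k-feasible cuts."""
    cuts = {}
    for node in fanins:
        if not fanins[node]:
            cuts[node] = {frozenset([node])}   # a primary input cuts only itself
            continue
        a, b = fanins[node]
        merged = {ca | cb for ca, cb in product(cuts[a], cuts[b])
                  if len(ca | cb) <= k}        # merge fanin cuts, keep k-feasible
        merged.add(frozenset([node]))          # the trivial cut
        cuts[node] = merged
    return cuts

# g = x op y, z = g op w: the cuts of z are {z}, {g, w}, and {x, y, w}.
fanins = {"x": (), "y": (), "w": (), "g": ("x", "y"), "z": ("g", "w")}
cuts = enumerate_cuts(fanins, k=3)
print(sorted(len(c) for c in cuts["z"]))  # [1, 2, 3]
```

A mapper then selects one cut per node to cover the network; post-processing such as Cut Resubstitution reworks that selection to squeeze out LUTs the cover no longer needs.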
-
Taeko Matsunaga, Shinji Kimura, Yusuke Matsunaga
Article type: Low-Power Arithmetic Synthesis
Subject area: Regular Paper
2009 Volume 2 Pages 212-221
Published: 2009
Released on J-STAGE: August 14, 2009
This paper addresses parallel prefix adder synthesis aimed at minimizing the total switching activity under bitwise timing constraints. The problem is treated as the synthesis of prefix graphs, which represent the global structures of parallel prefix adders at a technology-independent level. An approach for timing-driven area minimization of prefix graphs has already been proposed that first finds the exact minimum solution over a specific subset of prefix graphs by dynamic programming, and then restructures the result for further reduction by removing the restrictions defining the subset. In this paper, a switching cost is defined for each node of a prefix graph, and an approach to minimize the total switching cost is presented in which our area minimization algorithm is extended to calculate the switching cost using Ordered Binary Decision Diagrams (OBDDs). Furthermore, a heuristic is integrated that estimates, during the dynamic programming phase, the effect of the restructuring phase, improving the robustness of our algorithm under severe timing constraints. Through a series of experiments, the proposed approach is shown to be effective, especially when the timing constraints are not tight and/or there are comparatively many nodes with very low switching costs.
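The prefix graphs in question compose per-bit (generate, propagate) pairs with the associative prefix operator (g, p) ∘ (g′, p′) = (g ∨ p·g′, p·p′); each graph node is one application of this operator, and different graph shapes trade depth, area, and switching. A minimal sketch using a serial (ripple) prefix structure — the textbook view, not the paper's synthesis algorithm:

```python
def prefix_op(left, right):
    """(g, p) o (g', p') = (g or p & g', p & p'); 'left' is the higher bits."""
    g1, p1 = left
    g2, p2 = right
    return (g1 or (p1 and g2), p1 and p2)

def add(a, b, n):
    """n-bit addition via generate/propagate and a serial prefix structure."""
    p = [((a >> i) & 1) != ((b >> i) & 1) for i in range(n)]        # propagate
    g = [bool(((a >> i) & 1) & ((b >> i) & 1)) for i in range(n)]   # generate
    acc, carry = None, [False]                                      # no carry-in
    for i in range(n):      # parallel adders evaluate these prefixes in a tree
        acc = (g[i], p[i]) if acc is None else prefix_op((g[i], p[i]), acc)
        carry.append(bool(acc[0]))
    return sum((p[i] != carry[i]) << i for i in range(n))           # sum bits

print(add(13, 7, 5))  # 20
```

Because prefix_op is associative, the same carries can be computed by any prefix graph (ripple, Sklansky, Kogge-Stone, ...); the paper's contribution is choosing the graph that minimizes switching cost under the bitwise timing constraints.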
-
Qing Dong, Shigetoshi Nakatake
Article type: Placement
Subject area: Regular Paper
2009 Volume 2 Pages 222-238
Published: 2009
Released on J-STAGE: August 14, 2009
This paper introduces a new concept of regularity-oriented floorplanning and block placement, called structured placement, which takes regularity as a placement criterion so as to improve performance. We provide methods to extract regular structures from a placement representation in linear time, and evaluate these structures by quantifying their regularity as an objective function. We also construct a particular simulated annealing framework that optimizes placement topology and physical dimensions separately and alternately, attaining a solution that balances the trade-off between regularity and area efficiency. Furthermore, we introduce symmetry-oriented structured placement to produce symmetrical placements. Experiments show that the resultant placements achieve regularity without increasing chip area or wire length compared to those produced by existing methods.
-
Tadayuki Matsumura, Tohru Ishihara, Hiroto Yasuura
Article type: Circuit-Level Low-Power Design
Subject area: Regular Paper
2009 Volume 2 Pages 239-249
Published: 2009
Released on J-STAGE: August 14, 2009
On-chip memories generally use a higher supply voltage (VDD) and a higher threshold voltage (Vth) than the logic parts to improve the static noise margin and to suppress static energy consumption. However, the higher VDD increases the dynamic energy consumption. This paper proposes a hybrid memory architecture consisting of two regions: (1) a dynamic-energy-conscious region using low VDD and Vth, and (2) a static-energy-conscious region using high VDD and Vth. The proposed architecture is applied to a scratchpad memory. This paper also formulates an optimization problem for simultaneously finding the optimal code allocation and memory configuration that minimize the total energy consumption of the memory under constraints on the static noise margin (SNM), the write margin (WM), and the memory access delay. The memory configuration is defined by a memory division ratio, a β ratio, and a VDD. Experimental results demonstrate that the total energy consumption of our original 90-nm SRAM can be reduced by 62.9% in the best case, with a 4.56% area overhead and no degradation of SNM, WM, or access delay.
-
Yoshinobu Higami, Kewal K. Saluja, Hiroshi Takahashi, Sin-ya Kobayashi ...
Article type: Testing
Subject area: Regular Paper
2009 Volume 2 Pages 250-262
Published: 2009
Released on J-STAGE: August 14, 2009
The conventional stuck-at fault model is no longer sufficient to deal with the problems of nanometer geometries in modern Large Scale Integrated circuits (LSIs); test and diagnosis for transistor defects are required. In this paper we propose a fault diagnosis method for transistor shorts in combinational and full-scan circuits described at the gate level. Since it is difficult to describe the precise behavior of faulty transistors, we define two types of transistor short models by focusing on the output values of the corresponding faulty gate. Salient features of the proposed diagnosis method are: (1) it uses only gate-level simulation and does not require transistor-level simulation such as SPICE; (2) it uses a conventional stuck-at fault simulator yet is able to handle transistor shorts, making it suitable for large circuits; and (3) it is efficient and accurate. We apply our method to ISCAS benchmark circuits to demonstrate its effectiveness.
-
Yiqing Huang, Qin Liu, Takeshi Ikenaga
Article type: Architectural Design
Subject area: Regular Paper
2009 Volume 2 Pages 263-273
Published: 2009
Released on J-STAGE: August 14, 2009
A macroblock (MB) feature based adaptive propagate partial SAD architecture is proposed in this paper. First, using an edge detection operator, homogeneous MBs are detected before motion estimation, and three hardware-friendly subsampling patterns are adaptively selected for MBs of different homogeneity. The proposed architecture uses four different processing elements to realize the adaptive subsampling scheme. Second, to achieve data reuse and power reduction in the memory part, the reference pixels in the search window are reorganized into two memory groups, which output pixel data interactively for adaptive subsampling. Moreover, a compressor-tree-based circuit-level optimization is included in our design to reduce hardware cost. Synthesized with TSMC 0.18-µm technology, on average 10k gates can be saved for the whole IME engine by our optimization. With 481k gates at 110.5 MHz, a 720p, 30-fps HDTV integer motion estimation engine is designed. Compared with previous work, our design achieves a 39.8% reduction in power consumption with only a 3.44% increase in hardware.
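The subsampling idea behind this architecture: for a homogeneous macroblock, computing the SAD over a subsampled pixel pattern approximates the full SAD at a fraction of the cost. The 4x4 blocks and 2:1 checkerboard pattern below are illustrative, not the paper's exact patterns.

```python
def sad(cur, ref, subsample=False):
    """Sum of absolute differences over a square block; optionally use a
    2:1 checkerboard subsampling pattern (illustrative, not the paper's)."""
    n = len(cur)
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(n) for x in range(n)
               if not subsample or (x + y) % 2 == 0)

# On a uniform (homogeneous) block the subsampled SAD is exactly half the
# full SAD, so halving the scale recovers the same motion-cost ranking.
cur = [[10, 10, 10, 10] for _ in range(4)]
ref = [[12, 12, 12, 12] for _ in range(4)]
print(sad(cur, ref), sad(cur, ref, subsample=True))  # 32 16
```

This is why the architecture gates subsampling on an edge-detection-based homogeneity test: on textured blocks the subsampled cost diverges from the full SAD, while on homogeneous blocks the cheaper pattern is safe.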