IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508
Volume E98.A, Issue 7
Showing 1-31 of 31 articles in the selected issue
Special Section on Design Methodologies for System on a Chip
  • Akihisa YAMADA
    2015, Volume E98.A, Issue 7, p. 1355
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
  • Shihao WANG, Dajiang ZHOU, Jianbin ZHOU, Takeshi YOSHIMURA, Satoshi GO ...
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1356-1365
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    This paper presents a VLSI architecture for a unified motion vector (MV) and boundary strength (BS) parameter decoder (PDec) for an 8K UHDTV HEVC decoder. New coding tools in PDec, such as Advanced Motion Vector Prediction (AMVP), increase the VLSI hardware overhead and the memory bandwidth requirement, especially for 8K UHDTV applications. We propose four techniques to address these challenges. First, this work unifies the MV and BS parameter decoders so that they share line buffer memory. Second, to support high throughput, we propose a top-level CU-adaptive pipeline scheme that trades off implementation complexity against performance. Third, a PDec process engine with optimizations is adopted for 43.2k area reduction. Finally, a PU-based coding scheme is proposed for a 30% DRAM bandwidth reduction. In a 90nm process, our design costs 93.3k logic gates with a 23.0kB line buffer. The proposed architecture can support real-time decoding for 7680x4320@60fps applications at 249MHz in the worst case.
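The throughput implied by the figures in this abstract can be checked with a few lines of arithmetic. The sketch below only uses the numbers quoted above (7680x4320, 60 fps, 249MHz); the function names are illustrative and are not taken from the paper.

```python
# Figures quoted in the abstract: 7680x4320 at 60 fps, 249 MHz worst-case clock.
WIDTH, HEIGHT, FPS = 7680, 4320, 60
CLOCK_HZ = 249_000_000


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw pixel rate the decoder has to sustain for real-time playback."""
    return width * height * fps


def pixels_per_cycle(width: int, height: int, fps: int, clock_hz: int) -> float:
    """Average number of pixels that must be covered in each clock cycle."""
    return pixels_per_second(width, height, fps) / clock_hz


rate = pixels_per_second(WIDTH, HEIGHT, FPS)
print(f"{rate:,} pixels/s")
print(f"{pixels_per_cycle(WIDTH, HEIGHT, FPS, CLOCK_HZ):.2f} pixels/cycle")
```

Under these figures the parameter decoder has to keep pace with roughly eight pixels per clock cycle on average, which gives a sense of why the abstract emphasizes pipelining and bandwidth reduction.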
  • Kotaro TERADA, Masao YANAGISAWA, Nozomu TOGAWA
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1366-1375
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    In the deep-submicron era, interconnection delays are not negligible even in high-level synthesis, and regular-distributed-register architectures (RDR architectures) have been proposed to cope with this problem. In this paper, we propose a high-level synthesis algorithm targeting RDR architectures that uses operation chaining to reduce the overall latency. Our algorithm consists of three steps. The first step enumerates candidate operations for chaining. The second step introduces the maximal chaining distance (MCD), which gives the maximal allowable inter-island distance on an RDR architecture between chaining candidate operations. The last step performs list scheduling and binding simultaneously based on the results of the two preceding steps. Our algorithm enumerates feasible chaining candidates and selects the best ones for the RDR architecture. Experimental results show that our proposed algorithm reduces the latency by up to 40.0% compared to the original approach, and by up to 25.0% compared to a conventional approach. In some cases, our algorithm also reduces the number of registers and the number of multiplexers compared to the conventional approaches.
  • Shin-ya ABE, Youhua SHI, Kimiyoshi USAMI, Masao YANAGISAWA, Nozomu TOG ...
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1376-1391
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    In this paper, we first propose an HDR-mcd architecture, which integrates periodically all-in-phase based multiple clock domains and multi-cycle interconnect communication into high-level synthesis. In HDR-mcd, an entire chip is divided into several huddles. Huddles can realize synchronization between different clock domains, in which interconnection delay should be considered during high-level synthesis. Next, we propose a high-level synthesis algorithm for HDR-mcd, which can reduce energy consumption by optimizing the configuration and placement of huddles. Experimental results show that the proposed method achieves a 32.5% energy saving compared with the existing single-clock-domain-based methods.
  • Koichi FUJIWARA, Kazushi KAWAMURA, Shin-ya ABE, Masao YANAGISAWA, Nozo ...
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1392-1405
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    Recently, high-level synthesis (HLS) techniques for FPGA designs have been required in various applications such as computerized stock trading and reconfigurable network processing. In HLS for FPGA designs, we need to consider the module floorplan and reduce the multiplexer cost concurrently. In this paper, we propose a floorplan-driven HLS algorithm for multiplexer reduction targeting FPGA designs. By utilizing distributed-register architectures called HDR, we can easily consider the module floorplan in HLS. In order to reduce the multiplexer cost, we propose two novel binding methods called datapath-oriented scheduling/FU binding and datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the number of slices by up to 47% and latency by up to 22% compared with conventional approaches, while the number of required control steps is almost the same.
  • Shinnosuke YOSHIDA, Youhua SHI, Masao YANAGISAWA, Nozomu TOGAWA
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1406-1418
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    As process technologies advance, timing-error correction techniques have become important as well. A suspicious timing-error prediction (STEP) technique has been proposed recently, which predicts timing errors by monitoring the middle points, or check points, of several speed-paths in a circuit. However, if we insert STEP circuits (STEPCs) at the middle points of all the paths from primary inputs to primary outputs, we need many STEPCs and thus incur too much area overhead. Determining these check points is therefore very important. In this paper, we propose an effective STEPC insertion algorithm that minimizes area overhead. Our proposed algorithm moves the STEPC insertion positions to minimize the number of inserted STEPCs. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs and reduce the required number of STEPCs to 1/10-1/80 and their area to 1/5-1/8 compared with a naive algorithm. Furthermore, our algorithm realizes 1.12X-1.5X overclocking compared with just inserting STEPCs into several speed-paths.
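To make the max-flow and min-cut idea in this abstract concrete, the sketch below finds a smallest set of nodes that intercepts every path from primary inputs to primary outputs in a toy netlist. It is a generic minimum vertex cut (node splitting plus Edmonds-Karp), not the paper's algorithm: the abstract does not give the actual formulation, and the unit cost per node, the netlist, and all names here are assumptions made for illustration.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_vertex_cut(edges, sources, sinks):
    """Smallest set of nodes that every source-to-sink path passes through.

    Assumption: each node costs 1 to monitor. Each node n is split into
    (n, "in") -> (n, "out") with capacity 1; wires get infinite capacity.
    """
    cap = defaultdict(lambda: defaultdict(int))
    nodes = {n for edge in edges for n in edge} | set(sources) | set(sinks)
    for n in nodes:
        cap[(n, "in")][(n, "out")] = 1
    for u, v in edges:
        cap[(u, "out")][(v, "in")] = INF
    S, T = "S", "T"  # super source and super sink
    for s in sources:
        cap[S][(s, "in")] = INF
    for t in sinks:
        cap[(t, "out")][T] = INF

    def search(stop_at_sink):
        """BFS over residual edges; returns the parent map of visited nodes."""
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    if stop_at_sink and v == T:
                        return parent
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while T in (parent := search(stop_at_sink=True)):
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= bottleneck
            cap[w][u] += bottleneck

    # A node is in the cut if its "in" half is reachable but its "out" half is not.
    reachable = search(stop_at_sink=False)
    return sorted(n for n in nodes
                  if (n, "in") in reachable and (n, "out") not in reachable)


# Hypothetical netlist: three inputs, three gates, two outputs.
edges = [("a", "g1"), ("b", "g1"), ("b", "g2"), ("c", "g2"),
         ("g1", "g3"), ("g2", "g3"), ("g3", "o1"), ("g3", "o2")]
print(min_vertex_cut(edges, sources=["a", "b", "c"], sinks=["o1", "o2"]))
```

In this toy netlist every path funnels through one gate, so a single monitored node suffices. A real insertion algorithm would also have to respect the timing position of each check point along its path, which this sketch ignores.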
  • Dajiang LIU, Shouyi YIN, Leibo LIU, Shaojun WEI
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1419-1430
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    The coarse-grained reconfigurable architecture (CGRA) is a promising computing platform that provides both high performance and high power efficiency. The computation-intensive portions of an application (e.g. loop nests) are often mapped onto a CGRA for acceleration. However, mapping loop nests onto a CGRA efficiently is quite a challenge due to the special characteristics of CGRAs. To optimize the mapping of loop nests onto a CGRA, this paper makes three contributions: i) establishing a precise performance model of mapping loop nests onto a CGRA, ii) formulating loop nest mapping as a nonlinear optimization problem based on the polyhedral model, iii) extracting an efficient heuristic algorithm and building a complete flow for mapping loop nests onto a CGRA (PolyMAP). Experimental results on most kernels of PolyBench and on real-life applications show that our proposed approach can improve the performance of the kernels by 27% on average compared to the state-of-the-art methods. The runtime complexity of our approach is also acceptable.
  • Shuping ZHANG, Jinjia ZHOU, Dajiang ZHOU, Shinji KIMURA, Satoshi GOTO
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1431-1441
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    Motion estimation (ME) is a key encoding component of almost all modern video coding standards. ME contributes significantly to video coding efficiency, but it also consumes the most power of any component in a video encoder. In this paper, an ME processor with a 3D stacked memory architecture is proposed to reduce memory and core power consumption. First, a memory die is designed and stacked with the ME die. By adding face-to-face (F2F) pads and through-silicon-via (TSV) definitions, 2D electronic design automation (EDA) tools can be extended to support the proposed 3D stacking architecture. Moreover, a special memory controller is applied to control data transmission and timing between the memory die and the ME processor die. Finally, a 3D physical design is completed for the entire system. This design includes TSV/F2F placement, floor plan optimization, and power network generation. Compared to 2D technology, the number of input/output (IO) pins is reduced by 77%. After optimizing the floor plans of the processor die and the memory die, the routing wire lengths are reduced by 13.4% and 50%, respectively. The stacked static random access memory contributes the most to the power reduction in this work. The simulation results show that the design can support real-time 720p @ 60fps encoding at 8MHz using less than 65mW of power, which is much better than the state-of-the-art ME processor.
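The operating point quoted in this abstract translates into a per-block cycle and energy budget. The sketch below works that out from the stated figures (720p, 60 fps, 8MHz, under 65mW). The 1280x720 frame size is the usual meaning of 720p; the 16x16 block size is an assumption, since the abstract does not name the coding standard or block size, and the function names are illustrative.

```python
# Figures quoted in the abstract: 720p at 60 fps, 8 MHz clock, under 65 mW.
FRAME_WIDTH, FRAME_HEIGHT, FPS = 1280, 720, 60
CLOCK_HZ = 8_000_000
POWER_W = 0.065
BLOCK = 16  # assumption: 16x16 blocks, not stated in the abstract


def blocks_per_second(width: int, height: int, fps: int, block: int = BLOCK) -> int:
    """Number of blocks to process per second (frame edges rounded up)."""
    cols = -(-width // block)   # ceiling division
    rows = -(-height // block)
    return cols * rows * fps


def cycles_per_block(clock_hz: int, blocks: int) -> float:
    """Average clock cycles available for each block."""
    return clock_hz / blocks


def energy_per_block_uj(power_w: float, blocks: int) -> float:
    """Upper bound on energy per block in microjoules, given a power ceiling."""
    return power_w / blocks * 1e6


blocks = blocks_per_second(FRAME_WIDTH, FRAME_HEIGHT, FPS)
print(f"{blocks:,} blocks/s")
print(f"{cycles_per_block(CLOCK_HZ, blocks):.1f} cycles/block")
print(f"< {energy_per_block_uj(POWER_W, blocks):.3f} uJ/block")
```

Under these assumptions the processor has only a few dozen cycles per block, which suggests a highly parallel datapath fed by wide memory access.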
  • Gong CHEN, Yu ZHANG, Qing DONG, Ming-Yu LI, Shigetoshi NAKATAKE
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1442-1454
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    As semiconductor manufacturing processes scale down, the leakage current of CMOS circuits is becoming a dominant contributor to power dissipation. This paper provides an efficient leakage current reduction (LCR) technique for low-power and low-frequency circuit designs in terms of design rules and layout parameters related to layout-dependent effects. We address the LCR technique for both analog and digital circuits, and present a design case applying the LCR technique to a successive-approximation-register (SAR) analog-to-digital converter (ADC), which typically employs analog and digital transistors. In post-layout simulation results obtained with HSPICE, an SAR-ADC with the LCR technique achieves a total power consumption of 38.6 nW. Compared with the design without the LCR technique, we attain about a 30% total energy reduction.
  • Jun SHIOMI, Tohru ISHIHARA, Hidetoshi ONODERA
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1455-1466
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    Near-threshold computing has emerged as one of the most promising solutions for enabling highly energy-efficient and high-performance computation in microprocessors. This paper proposes architecture-level statistical static timing analysis (SSTA) models for near-threshold voltage computing, where the path delay distribution is approximated as a lognormal distribution. First, we prove several important theorems that help in considering architectural design strategies for high-performance and energy-efficient near-threshold computing. After that, we present numerical experiments with Monte Carlo simulations using a commercial 28nm process technology model and demonstrate that the properties presented in the theorems hold for practical near-threshold logic circuits.
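A small Monte Carlo experiment shows what a lognormal path-delay model implies at the architecture level. The sketch below draws independent lognormal path delays and takes the slowest path as the cycle-limiting delay. The parameters (`mu`, `sigma`, path counts), the independence between paths, and the function names are all assumptions for illustration; they are not the paper's models or its 28nm data.

```python
import random
import statistics


def critical_delay_samples(n_paths, mu, sigma, trials, seed=0):
    """Monte Carlo samples of the slowest of n_paths lognormal path delays.

    Assumption: path delays are independent and identically distributed.
    """
    rng = random.Random(seed)
    return [
        max(rng.lognormvariate(mu, sigma) for _ in range(n_paths))
        for _ in range(trials)
    ]


def quantile(samples, q):
    """Empirical quantile by sorting (no interpolation)."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


# Hypothetical parameters: median path delay exp(0) = 1 (normalized), sigma = 0.3.
for n_paths in (1, 10, 100):
    samples = critical_delay_samples(n_paths, mu=0.0, sigma=0.3, trials=2000)
    print(n_paths,
          round(statistics.median(samples), 3),
          round(quantile(samples, 0.99), 3))
```

Because the lognormal distribution has a heavy right tail, the worst-case delay tends to grow as more near-critical paths are placed in parallel, which is the kind of architecture-level trade-off such models are meant to expose.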
  • Daisuke FUKUDA, Kenichi WATANABE, Yuji KANAZAWA, Masanori HASHIMOTO
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1467-1474
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    As VLSI manufacturing process technology continues to shrink, it becomes a challenging problem to generate layout patterns that satisfy performance and manufacturability requirements. Wire width variation is one of the main issues, with a large impact on chip performance and yield loss. In particular, the etching process is the last process and the one that most influences wire width variation, and hence models for predicting etching-induced variation have been proposed. However, they do not consider the effect of global layout variation. This work proposes a prediction model of etching-induced wire width variation which takes into account global layout pattern variation. We also present a wire width adjustment method that modifies the etching process on the fly according to the critical dimension loss estimated by the proposed prediction model and the wire space measured just before the etching process. Experimental results show that the proposed model achieved good prediction performance, and that the potential reduction of the gap between the target wire width and the actual wire width due to the proposed on-the-fly etching process modification was 68.9% on average.
  • Keisuke OKUNO, Toshihiro KONISHI, Shintaro IZUMI, Masahiko YOSHIMOTO, ...
    Article type: PAPER
    2015, Volume E98.A, Issue 7, p. 1475-1481
    Published: 2015/07/01
    Released: 2015/07/01
    Journal (restricted access)
    We present a low-jitter design for a 10-bit second-order frequency shift oscillator time-to-digital converter (FSOTDC). We analyze the relation between performance and FSOTDC parameters and provide insight to support the design of the FSOTDC. Results show that oscillator jitter limits the FSOTDC resolution, particularly during the first stage. To estimate and design an FSOTDC, the frequency shift oscillator requires an inverter of a certain size. In a standard 65-nm CMOS process, an SNDR of 64dB is achievable at an input signal frequency of 10kHz and a sampling clock of 2MHz. Measurements of the test chip match the analyses.
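The 64dB SNDR quoted in this abstract can be related to the stated 10-bit resolution through the standard effective-number-of-bits formula, ENOB = (SNDR - 1.76) / 6.02, which assumes a full-scale sinusoidal input. The sketch below applies that textbook relation; it is not a calculation taken from the paper, and the function names are illustrative.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a full-scale sine input (textbook formula)."""
    return (sndr_db - 1.76) / 6.02


def ideal_sndr_db(bits: int) -> float:
    """SNDR of an ideal quantizer with the given resolution."""
    return 6.02 * bits + 1.76


print(f"ENOB at 64 dB SNDR: {enob_from_sndr(64.0):.2f} bits")
print(f"Ideal SNDR for 10 bits: {ideal_sndr_db(10):.2f} dB")
```

By this relation, 64dB corresponds to slightly more than 10 effective bits, consistent with the 10-bit design target named in the abstract.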
Regular Section