Advanced search results
Showing results for the following conditions:
Query search: "Design Compiler"
Showing results 1-20 of 360
  • Shiquan Fan, Ke Wang, Li Geng
    IEICE Electronics Express
    2010, Vol. 7, No. 14, pp. 1091-1097
    Published: 2010
    Released: 2010/07/25
    Journal, free access
    There exists a trade-off among resolution, area and power losses in controllers of switching DC-DC converters. In this letter, a mixed-signal Boost converter topology is presented to lower the resolution requirements of the ADC and DPWM. In addition, by using time-multiplexing technology, a novel multi-phase clock DPWM is proposed. Design Compiler synthesis results indicate that, compared with a normal 1-phase clock DPWM, the chip area and power consumption of the proposed 4-phase clock DPWM are reduced by 47.0% and 54.4%, respectively. The new DPWM is realized using an FPGA and applied in a prototype Boost converter. Experimental results verify the functionality of the optimized DPWM.
  • Won-young CHUNG, Ha-young JEONG, Won Woo RO, Yong-surk LEE
    IEICE Transactions on Information and Systems
    2011, Vol. E94.D, No. 7, pp. 1497-1501
    Published: 2011/07/01
    Released: 2011/07/01
    Journal, free access
    In this paper, we propose a novel low-cost Message Passing Interface (MPI) unit between processor nodes, which supports message passing in multiprocessor systems using a distributed memory architecture. Our MPI unit operates in the standard mode, using the buffered mode for small data transactions and the synchronous mode for large data transactions. This increases performance by reducing the control message transmission time for small amounts of data. We verified the performance with a simulator designed based on SystemC. Additionally, we designed the MPI unit using Verilog HDL and synthesized it with Synopsys Design Compiler. The proposed standard mode MPI unit shows high performance even though the MPI unit occupies less than 1% of the whole chip. Thus, with respect to low-cost design and scalability, this MPI hardware unit is useful for increasing the overall performance of an embedded Multiprocessor System on a Chip (MPSoC).
  • Jae Wan Park, Jin Won Choi
    Journal of Asian Architecture and Building Engineering
    2003, Vol. 2, No. 2, pp. b103-b109
    Published: 2003
    Released: 2005/02/21
    Journal, free access
    This paper describes the development of a design performance evaluation system that can frequently evaluate the physical and spatial requirements of apartment unit floor plans within the design process in real time. The evaluation system, which we call “Vitruvius Studio,” is composed of several modules such as a front-end component-based CAD engine, a knowledge base, and a set of design agents. The notion of the design compiler is quite similar to a compiler for computer programming such as a C compiler. While a computer programmer compiles program code to check for compile errors during the implementation of a software system, an architectural designer can 'compile' his/her intermediate design product to evaluate design errors during the design process. The compilation can be done immediately at any level or any time during the design process in real time. We expect that this compiling process will dramatically increase design feedback, and thus result in a better design product. Further research issues identified at the end of the research include increasing the modeling capability, extending to multi-story building representation, developing various design agents, exploring better ways to request and manage design knowledge, and supporting design collaboration.
  • Matsuyama Kazunori, Amagasaki Motoki, Yamaguchi Ryoichi, Iida Masahiro, Sueyoshi Toshinori
    電気関係学会九州支部連合大会講演論文集
    2007, Vol. 2007, 11-2P-03
    Published: 2007
    Released: 2009/02/10
    Conference proceedings, free access
  • Pao-Lung CHEN
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    2010, Vol. E93.A, No. 12, pp. 2616-2620
    Published: 2010/12/01
    Released: 2010/12/01
    Journal, restricted access
    The multistage noise-shaping (MASH) delta-sigma modulator (DSM) is the key element in a fractional-N frequency synthesizer. A hardware simplification method with subtraction inversion is proposed for the delta-path design in a MASH delta-sigma modulator. The subtraction inversion method simplifies the adder-subtractor unit in the delta path by inverting the subtraction signal. It achieves a lower hardware cost than conventional approaches. As a result, the hardware organization is regular and easy to expand into higher-order MASH DSM designs. Analytical details of the implementation and the hardware cost function for an N-th order configuration are presented. Finally, simulations with a hardware description language as well as synthesis data verify the proposed design method.
  • Daisuke Suzuki, Takahiro Oka, Akira Tamakoshi, Yasuhiro Takako, Takahiro Hanyu
    Nonlinear Theory and Its Applications, IEICE
    2021, Vol. 12, No. 4, pp. 695-710
    Published: 2021
    Released: 2021/10/01
    Journal, free access

    Convolutional neural network (CNN) accelerators, particularly binarized CNN (BCNN) accelerators, have proven to be effective for several artificial-intelligence-oriented applications; however, their energy efficiency should be further improved for edge applications. In this paper, a design framework for an energy-efficient BCNN accelerator based on nonvolatile logic is presented. Designing BCNN accelerators using nonvolatile logic allows the accelerators to exhibit a massively parallel and ultra-low standby power capability. Thus, a new design can be realized for accelerators that is different from that of conventional accelerators based solely on CMOS. Considering this, we discuss concrete design considerations for nonvolatile BCNN accelerators. In fact, a systematic design flow for the nonvolatile BCNN is established by combining Vivado HLS and standard electronic design automation tools. As a typical design example, a BCNN accelerator for inference on the 32 × 32 pixel MNIST dataset is designed using a 65-nm CMOS technology. According to the logic-synthesis result, the proposed BCNN accelerator is estimated to consume 94.2% less power than a conventional BCNN accelerator when the frame rate is 30 frames per second.

  • Ming-Chih CHEN, Shen-Fu HSIAO
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    2009, Vol. E92.A, No. 12, pp. 3221-3228
    Published: 2009/12/01
    Released: 2009/12/01
    Journal, restricted access
    In this paper, we propose an area-efficient design of Advanced Encryption Standard (AES) processor by applying a new common-expression-elimination (CSE) method to the sub-functions of various transformations required in AES. The proposed method reduces the area cost of realizing the sub-functions by extracting the common factors in the bit-level XOR/AND-based sum-of-product expressions of these sub-functions using a new CSE algorithm. Cell-based implementation results show that the AES processor with our proposed CSE method has significant area improvement compared with previous designs.
  • Ryohei HORI, Tatsuya KITAMORI, Taisuke UEOKA, Masaya YOSHIKAWA, Takeshi FUJINO
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    2012, Vol. E95.A, No. 9, pp. 1518-1528
    Published: 2012/09/01
    Released: 2012/09/01
    Journal, restricted access
    Various kinds of structured ASICs have been proposed that can customize logic functions using a few photomasks, which decreases the initial cost, especially that of expensive photomasks. In the past, we developed a via programmable structured ASIC, “VPEX2” (Via Programmable logic device using EXclusive-or array), that is capable of changing logic on 2 via layers (the 1st and 3rd via). The logic element (LE) of VPEX2 is composed of an EXOR gate and 2 NOT gates. However, the “VPEX2” architecture has two important penalties: an area penalty of 5-6 times that of the ASIC, and wiring congestion caused by detouring wires to avoid I/O terminals. In this paper, we propose a new architecture, “VPEX3”, in order to achieve a practical structure. In VPEX3, we applied three techniques to decrease the area penalty and raise wiring efficiency: (1) the LE area is reduced by approximately 60% by omitting 1 NOT gate from each LE and reducing the gate width, (2) the number of configurable logic functions on a single LE is increased from 13 to 22 by introducing the “flexible AOI gate technique”, and (3) I/O terminals are made flexible by introducing the 2nd via as a programmable layer. Furthermore, a delay model for via programmable wiring is necessary in order to evaluate the via programmable wiring architecture against a standard cell ASIC. We extracted wiring delay characteristics from a ring oscillator test circuit using both normal wiring and via-programmable wiring. These three new techniques and the via programmable wiring delay model revealed that the area-delay product of “VPEX3” is as small as twice that of the ASIC. Chip-cost estimation among FPGA, “VPEX2”, “VPEX3” and ASIC revealed that “VPEX3” is the most cost-effective architecture for systems-on-chips (SoCs) whose production volume is from one thousand to several tens of thousands of units.
  • Rongshan Wei, Xingang Zhang
    IEICE Electronics Express
    2017, Vol. 14, No. 21, 20170976
    Published: 2017
    Released: 2017/11/10
    [Advance publication] Released: 2017/10/24
    Journal, free access

    In this paper, we present a new data structure element for constructing a Huffman tree, and a new algorithm is developed to improve the efficiency of Huffman coding by constructing the Huffman tree synchronously with the generation of codewords. The improved algorithm is adopted in a VLSI architecture for a Huffman encoder. The VLSI implementation is realized using the Verilog hardware description language and simulated by Modelsim. The proposed scheme achieves rapid coding speed with a gate count of 9.962 K using SMIC 0.18 micron standard library cells.

  • Min Choi, Seungryoul Maeng
    IEICE Electronics Express
    2008, Vol. 5, No. 22, pp. 927-931
    Published: 2008
    Released: 2008/11/25
    Journal, free access
    Although today's branch predictors show high accuracy, the branch misprediction penalty is getting larger due to aggressive speculation and deeper pipelining. In order to reduce the miss penalty, we propose a fast and low-cost branch recovery scheme using incremental register renaming (IRR) and a bit-vector based rename map table (BVMT). The IRR forces the destination register numbers of the instruction stream to appear in non-decreasing order. With this incremental property of the IRR, the BVMT recovery scheme completely eliminates the roll-back overhead on branch misprediction. Thus, the instruction fetcher does not stop, and it fetches instructions from the correct path immediately after the misprediction is detected. The goal of our scheme is to prevent a processor from flushing the pipeline, even under branch misprediction. Consequently, the BVMT instantly reconstructs the map table for any mispredicted branch, and it outperforms the conventional approach by an average of 10.93%.
  • Koki IGAWA, Masao YANAGISAWA, Nozomu TOGAWA
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    2016, Vol. E99.A, No. 7, pp. 1278-1293
    Published: 2016/07/01
    Released: 2016/07/01
    Journal, restricted access
    In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a multi-scenario high-level synthesis algorithm for variation-tolerant floorplan-driven design targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. At that time, we can explicitly take into account interconnection delays by using distributed-register architectures. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.
  • Yoshiaki Sasaki, Seiya Muramatsu, Kohei Nishida, Megumi Akai-Kasaya, Tetsuya Asai
    Nonlinear Theory and Its Applications, IEICE
    2022, Vol. 13, No. 2, pp. 324-329
    Published: 2022
    Released: 2022/04/01
    Journal, free access

    Stochastic Computing (SC)[2] is a probability-based computing method that performs various operations with a small number of logic gates (i.e., at low power) at the expense of accuracy. Using SC for edge artificial intelligence (AI) integrated circuits can help circumvent the power and area limitations inherent in edge AI.

    In this study, a three-layered Neural Network (NN) is presented with an online learning function that introduces pseudo-activation, pseudo-subtraction, and imperfect addition into the SC framework. This method may expand the options for edge AI integrated circuits using SC.

  • *Meng Hui, 趙 謙, 吉田 隆一
    電気関係学会九州支部連合大会講演論文集
    2021, Vol. 2021, 08-2P-12
    Published: 2021/09/17
    Released: 2022/04/27
    Conference proceedings, free access

    To connect functional chips of a chiplet system with high-performance and high-flexibility, we propose an area-efficient reconfigurable switch whose internal connection topology can be dynamically changed for different applications. In this paper, we implement the essential modules of a switch design, the Arbiter and the Distributor, as hard function blocks for the novel switch-specific reconfigurable architecture. The evaluation results show that the area and delay efficiency can be significantly improved by using the proposed hard function blocks rather than implementing them as soft logics using lookup tables.

  • Jonghee HWANG, Yongwoo CHOI, Yoonsik CHOE
    IEICE Transactions on Electronics
    2011, Vol. E94.C, No. 5, pp. 896-904
    Published: 2011/05/01
    Released: 2011/05/01
    Journal, restricted access
    Motion blur in TFT-LCDs is caused by the sample-and-hold characteristic, the slow response time of the liquid crystal, and the inconsistency between object tracking by the human eye and the actual object location. In order to solve this problem, a high frame rate driving method based on motion estimation and motion compensation has been applied to LCD products. However, as the required processing time of motion estimation increases in LCD TV and monitor systems, real-time video image processing becomes more difficult. Frame interpolation with a large macro block (MB) size is limited in its ability to detect small objects. This paper therefore proposes an efficient motion estimator architecture that uses seven kinds of macro blocks to enhance the accuracy of motion estimation and combines parallel processing with pre-computation technology and hardware optimization for high-speed processing. Also, for increased efficiency in the hardware architecture, we employed an I2C (Inter Integrated Circuit) communication unit to control the key parameters easily from a personal computer. Simulation results show that the critical path of the motion estimator is reduced by about 27.47% compared to the conventional structure. As a result, the proposed motion estimator will be applicable to high-speed frame interpolation of variable video.
  • Jaeyong CHUNG, Woochul KANG
    IEICE Transactions on Electronics
    2017, Vol. E100.C, No. 11, pp. 1073-1076
    Published: 2017/11/01
    Released: 2017/11/01
    Journal, restricted access

    The massive amount of computation involved in real-time evaluation of deep neural networks poses a serious challenge in battery-powered systems, and neuromorphic systems specialized in neural networks have been developed. This paper first shows that the fraction of neurons active at any one time dwindles toward the output layer in recent large-scale deep convolutional neural networks. Spike-based, asynchronous neuromorphic systems take advantage of this sparse activation and reduce dynamic power consumption, while synchronous systems may waste much dynamic power even with sparse activation because of their clocks. We thus propose a clock gating-based dynamic power reduction method that exploits the sparse activation for synchronous neuromorphic systems. We apply the proposed method to a building block of a recently proposed synchronous neuromorphic computing system and demonstrate up to 79% dynamic power saving at a negligible overhead.

  • Naoya OKADA, Yuichi NAKAMURA, Shinji KIMURA
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    2013, Vol. E96.A, No. 6, pp. 1264-1272
    Published: 2013/06/01
    Released: 2013/06/01
    Journal, restricted access
    Nonvolatile flip-flops enable leakage power reduction in logic circuits and quick return from standby mode. However, they have limited write endurance, and their power consumption for writing is larger than that of a conventional D flip-flop (DFF). For this reason, it is important to reduce the number of write operations. The write operations can be reduced by stopping the clock signal to synchronous flip-flops because write operations are executed only when the clock is applied to the flip-flops. In such clock gating, a method using the Exclusive OR (XOR) of the current value and the new value as the control signal is well known. The XOR based method is effective, but there are several cases where the write operations can be reduced even if the current value and the new value are different. The paper proposes a method to detect such unnecessary write operations based on state transition analysis, and proposes a write control method to reduce the power consumption of nonvolatile flip-flops. In the method, redundant bits are detected to reduce the number of write operations. If the next state and the outputs do not depend on some current bit, the bit is redundant and does not need to be written. The method is based on Binary Decision Diagram (BDD) calculation. We construct write control circuits to stop the clock signal by converting BDDs representing a set of states where write operations are unnecessary. The proposed method can be combined with the XOR based method to reduce the total number of write operations. We apply the combined method to some benchmark circuits and estimate the power consumption with Synopsys NanoSim. On average, power consumption is reduced by 15.0% compared with the XOR based method alone.
  • Yizhong LIU, Tian SONG, Takashi SHIMAMOTO
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    2010, Vol. E93.A, No. 9, pp. 1594-1604
    Published: 2010/09/01
    Released: 2010/09/01
    Journal, restricted access
    In this paper, we propose a high-throughput binary arithmetic coding architecture for CABAC (Context Adaptive Binary Arithmetic Coding), which is one of the entropy coding tools used in the H.264/AVC main and high profiles. The full CABAC encoding functions, including binarization, context model selection, arithmetic encoding and bits generation, are implemented in this proposal. The binarization and context model selection are implemented in a proposed binarizer, in which a FIFO is used to pack the binarization results and output 4 bins in one clock. The arithmetic encoding and bits generation are implemented in a four-stage pipeline with an encoding ability of 4 bins/clock. In order to improve the processing speed, the context variable accesses and updates for 4 bins are parallelized and the pipeline path is balanced. Also, because of the outstanding bits issue, a bits packing and generation strategy for parallel processing of 4 bins is proposed. After being implemented in Verilog-HDL and synthesized with Synopsys Design Compiler using 90nm libraries, this proposal can work at a clock frequency of 250MHz and takes up about 58K standard cells, 3.2Kbits of register files and 27.6Kbits of ROM. A throughput of 1000M bins per second can be achieved in this proposal for HDTV applications.
  • Botao Zhang, Hengzhu Liu, Xianqiang Yang
    IEICE Electronics Express
    2011, Vol. 8, No. 13, pp. 1001-1007
    Published: 2011
    Released: 2011/07/10
    Journal, free access
    This paper proposes an area-efficient pipeline-balancing Reed-Solomon decoder for 10Gbps satellite communication. The proposed RS (244,212) decoder is based on the TD-iBM Key Equation Solver architecture and Fixed-Factor Syndrome Computation & Chien Search. The decoder is implemented and verified in an FPGA, and can work at 178MHz in Virtex2P. Thus an 8-channel FPGA implementation can be used for 10Gbps satellite communication systems. Additionally, the decoder is also synthesized in Chartered 90nm CMOS technology and compared with previous decoders. The results show the decoder is more area-efficient than previous decoders. Meanwhile, by using this CMOS technology, the decoder can be clocked at about 1350MHz, so a single-channel ASIC implementation can meet the requirement of 10Gbps satellite communication.
  • Hiromine Yoshihara, Masao Yanagisawa, Nozomu Togawa
    IPSJ Transactions on System and LSI Design Methodology
    2012, Vol. 5, pp. 96-105
    Published: 2012
    Released: 2012/08/06
    Journal, free access
    In recent years, it is quite necessary to convert conventional low-resolution images to high-resolution ones at low cost. Super-resolution is a technique to remove the noise of observed images and restore its high frequencies. We focus on reconstruction-based super-resolution. Reconstruction requires large computation cost since it requires many images. In this paper, we propose a fast weighted adder for reconstruction-based super-resolution. From the viewpoint of reducing partial products, we propose two approaches to speed up a weighted adder. First, we use selector logics to halve its partial products. Second, we propose a weights-range limit method utilizing negative term. By applying our proposed approaches to a weighted adder, we can reduce carry propagations and our weighted adder can be designed by a fast circuit as compared to conventional ones. Experimental evaluations demonstrate that our weighted adder reduces its delay time by a maximum of 25.29% and its area to a maximum of 1/3, compared to conventional implementations.
  • *井上 晶仁, 田島 加織, Yang Tongxin, 請園 智玲, 佐藤 寿倫
    電気関係学会九州支部連合大会講演論文集
    2017, Vol. 2017, 11-1P-04
    Published: 2017/09/19
    Released: 2019/06/29
    Conference proceedings, free access

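    In image filter computation, a small error in the data representing a single pixel is hard for the human eye to discern. Therefore, by tolerating circuits that output similar rather than exact values, the number of arithmetic units can be reduced at hardware design time, lowering implementation area and power consumption. The kernel of the Gaussian filter targeted in this paper is weighted by a Gaussian function. By changing the factor values of this Gaussian kernel to approximate values that can be expressed by constant shifts, the number of adders in the hardware implementation can be reduced. At the same time, by adjusting the sum of the approximated factor values to a number that can be expressed by a constant shift, the division circuit can be eliminated.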
