IEICE Transactions on Electronics

Special Section on Advanced Technologies in Digital LSIs and Memories

FOREWORD

Masao NAKAYA

2008 Volume E91.C Issue 4 Pages 399
Published: April 01, 2008
Released on J-STAGE: July 01, 2018

DOI: https://doi.org/10.1587/transele.E91.C.399

JOURNAL RESTRICTED ACCESS

A Low-Power Instruction Issue Queue for Microprocessors

Shingo WATANABE, Akihiro CHIYONOBU, Toshinori SATO

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 400-409
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.400

JOURNAL RESTRICTED ACCESS

The instruction issue queue is a key component that extracts instruction-level parallelism (ILP) in modern out-of-order microprocessors. To exploit ILP and improve processor performance, the instruction queue size should be increased. However, increasing the size is difficult because the instruction queue is implemented as a content addressable memory (CAM), whose power consumption and delay are very large. This paper introduces a low-power, scalable instruction queue that replaces the CAM with a RAM. In this queue, instructions are explicitly woken up. Evaluation results show that the proposed instruction queue decreases processor performance by only 1.9% on average. Furthermore, the total energy consumption is reduced by 54% on average.

Reliable Cache Architectures and Task Scheduling for Multiprocessor Systems

Makoto SUGIHARA, Tohru ISHIHARA, Kazuaki MURAKAMI

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 410-417
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.410

JOURNAL RESTRICTED ACCESS

This paper proposes a task scheduling approach for reliable cache architectures (RCAs) of multiprocessor systems. The RCAs dynamically switch their operation modes for reducing the usage of vulnerable SRAMs under real-time constraints. A mixed integer programming model has been built for minimizing vulnerability under real-time constraints. Experimental results have shown that our task scheduling approach achieved 47.7-99.9% less vulnerability than a conventional one.

Temperature-Aware Configurable Cache to Reduce Energy in Embedded Systems

Hamid NOORI, Maziar GOUDARZI, Koji INOUE, Kazuaki MURAKAMI

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 418-431
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.418

JOURNAL RESTRICTED ACCESS

Energy consumption is a major concern in embedded computing systems. Several studies have shown that cache memories account for 40% or more of the total energy consumed in these systems. Active power used to be the primary contributor to the total power dissipation of CMOS designs, but with technology scaling, the share of leakage in the total power consumption of digital systems continues to grow. Moreover, temperature is another factor that exponentially increases the leakage current. In this paper, we show the effect of temperature on the optimal (minimum-energy-consuming) cache configuration for low-energy embedded systems. Our results show that for a given application and technology, the optimal cache size moves toward smaller caches at higher temperatures because of the larger leakage. Consequently, a Temperature-Aware Configurable Cache (TACC) is an effective way to save energy in finer technologies when the embedded system is used at different temperatures. Our results show that using a TACC, up to 61% energy can be saved for the instruction cache and 77% for the data cache, compared to a configurable cache that has been configured for only the corner-case temperature (100°C). Furthermore, the TACC also enhances the performance by up to 28% for the instruction cache and up to 17% for the data cache.

Power-Aware Compiler Controllable Chip Multiprocessor

Hiroaki SHIKANO, Jun SHIRAKO, Yasutaka WADA, Keiji KIMURA, Hironori KA ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 432-439
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.432

JOURNAL RESTRICTED ACCESS

A power-aware compiler controllable chip multiprocessor (CMP) is presented and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change clock frequency and power supply voltage to functional units including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in 82.6% power reduction in real-time execution mode with a deadline constraint on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in 53.9% power reduction at 21.1-fold speed-up in performance against its sequential execution in the fastest execution mode.

Reconfigurable Variable Block Size Motion Estimation Architecture for Search Range Reduction Algorithm

Yibo FAN, Takeshi IKENAGA, Satoshi GOTO

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 440-448
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.440

JOURNAL RESTRICTED ACCESS

Variable Block Size Motion Estimation (VBSME) accounts for a large amount of computation in video coding. Search range reduction algorithms are widely used to reduce the computational cost of motion estimation, but current VBSME designs are not suitable for them. This paper proposes a reconfigurable VBSME design that can be used efficiently with a search range reduction algorithm. In the proposed design, n×m reference MBs form an MB array that can be processed in parallel, and n and m can be configured according to the new search range shape calculated by the algorithm. In this way, the parallelism of the proposed design is very flexible and can be adapted to any search range shape. The hardware resources are also fully used while performing VBSME. There are two primary reconfigurable modules in this design: the PEGA (PE Group Array) and the SAD comparator. Implementation results using the TSMC 0.18μm standard cell library show that a design with 16 PEGs (PE Groups) has a hardware cost of about 179K gates and a clock frequency of 167MHz.

A 41mW VGA@30fps Quadtree Video Encoder for Video Surveillance Systems

Qin LIU, Seiichiro HIRATSUKA, Kazunori SHIMIZU, Shinsuke USHIKI, Satos ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 449-456
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.449

JOURNAL RESTRICTED ACCESS

Video surveillance systems have a huge market, as indicated by the number of installed cameras, particularly for low-power systems. In this paper, we propose a low-power quadtree video encoder for video surveillance systems. It features a low-complexity motion estimation algorithm, an application-specific ME-MC processor, a dedicated quadtree encoder engine and a processor control-based clock-gating technique. A chip capable of encoding 30fps VGA (640×480) at 80MHz is fabricated using 0.18μm CMOS technology. A total of 153K gates with 558kbits SRAM have been integrated into a 5.0mm×3.5mm die. The power consumption is 40.87mW at 80MHz for VGA at 30fps and 1.97mW at 3.3MHz for QCIF at 15fps.

A VGA 30-fps Realtime Optical-Flow Processor Core for Moving Picture Recognition

Yuichiro MURACHI, Yuki FUKUYAMA, Ryo YAMAMOTO, Junichi MIYAKOSHI, Hiro ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 457-464
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.457

JOURNAL RESTRICTED ACCESS

This paper describes an optical-flow processor core for real-time video recognition. The processor is based on the Pyramidal Lucas and Kanade (PLK) algorithm. It features a smaller chip area, higher pixel rate, and higher accuracy than conventional optical-flow processors. Introducing search range limitation and the Kalman filter into the original PLK algorithm improves the optical-flow accuracy and reduces the processor hardware cost. Furthermore, window interleaving and window overlap methods reduce the necessary clock frequency of the processor by 70%, allowing low-power operation. We first verified the PLK algorithm and architecture with a prototype FPGA implementation. Then, we designed a VLSI processor that can handle a VGA 30-fps image sequence at a clock frequency of 332MHz. The core size and power consumption are estimated at 3.50×3.00mm² and 600mW, respectively, in a 90-nm process technology.

A Sub 100mW H.264 MP@L4.1 Integer-Pel Motion Estimation Processor Core for MBAFF Encoding with Reconfigurable Ring-Connected Systolic Array and Segmentation-Free, Rectangle-Access Search-Window Buffer

Yuichiro MURACHI, Junichi MIYAKOSHI, Masaki HAMAMOTO, Takahiro IINUMA, ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 465-478
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.465

JOURNAL RESTRICTED ACCESS

We describe a sub 100-mW H.264 MP@L4.1 integer-pel motion estimation processor core for low-power video encoders. It supports macroblock adaptive frame field (MBAFF) encoding and bidirectional prediction for a resolution of 1920×1080 pixels at 30fps. The proposed processor features a novel hierarchical algorithm, a reconfigurable ring-connected systolic array architecture and a segmentation-free, rectangle-access search window buffer. The hierarchical algorithm consists of a fine search and a coarse search. A complementary recursive cross search is newly introduced in the coarse search. The fine search is adaptively carried out, based on an image analysis result obtained by the coarse search. The proposed systolic array architecture minimizes the amount of transferred data and lowers the computation cycles for the coarse and fine searches. In addition, we propose a novel search window buffer SRAM that has instantaneous accessibility to a rectangular area at an arbitrary location. The processor core has been designed with a 90nm CMOS design rule. The core size is 2.5×2.5mm². One core supports a one-reference-frame search and dissipates 48mW at 1V. A two-core configuration consumes 96mW for a two-reference-frame search.

Design of a Trinocular-Stereo-Vision VLSI Processor Based on Optimal Scheduling

Masanori HARIYAMA, Naoto YOKOYAMA, Michitaka KAMEYAMA

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 479-486
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.479

JOURNAL RESTRICTED ACCESS

This paper presents a processor architecture for high-speed and reliable trinocular stereo matching based on adaptive window-size control of SAD (Sum of Absolute Differences) computation. To reduce the computational complexity, SADs are computed using images divided into non-overlapping regions, and the matching result is iteratively refined by reducing the window size. A window-parallel-and-pixel-parallel architecture is also proposed to fully exploit the potential parallelism of the algorithm. The architecture also reduces the complexity of the interconnection network between memory and functional units based on the regularity of reference pixels. The stereo matching processor is designed in a 0.18μm CMOS technology. The processing time is 83.2μs@100MHz. By using optimal scheduling, the increases in area and processing time are only 5% and 3%, respectively, compared to binocular stereo vision, although the amount of computation is doubled.
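
As a rough illustration of the SAD (Sum of Absolute Differences) measure named in this abstract, the following Python sketch computes the SAD between two equally sized pixel windows and picks the disparity with the smallest SAD along a single scanline. The function names, the 1-D scanline simplification, and the toy pixel values are assumptions made for illustration only; the sketch does not reproduce the paper's adaptive window-size control or its trinocular architecture.

```python
def sad(window_a, window_b):
    """Sum of absolute differences between two equal-length pixel windows."""
    if len(window_a) != len(window_b):
        raise ValueError("windows must have the same length")
    return sum(abs(a - b) for a, b in zip(window_a, window_b))


def best_disparity(left_row, right_row, x, window, max_disparity):
    """Return (disparity, cost) minimising SAD for the window starting at x.

    Hypothetical helper: compares left_row[x:x+window] with
    right_row[x-d:x-d+window] for d = 0..max_disparity.
    """
    reference = left_row[x:x + window]
    best_d, best_cost = 0, None
    for d in range(max_disparity + 1):
        start = x - d
        if start < 0:
            break  # candidate window would fall outside the row
        cost = sad(reference, right_row[start:start + window])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


# Toy scanlines: the bright feature is shifted by two pixels between views.
left_row = [10, 10, 10, 50, 80, 50, 10, 10]
right_row = [10, 50, 80, 50, 10, 10, 10, 10]
disparity, cost = best_disparity(left_row, right_row, x=3, window=3, max_disparity=3)
```

In this toy case the minimum-SAD candidate is the one whose window exactly overlaps the shifted feature.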

Automatic Synthesis of Cost Effective FFT/IFFT Cores for VLSI OFDM Systems

Nicola E. L'INSALATA, Sergio SAPONARA, Luca FANUCCI, Pierangelo TERREN ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 487-496
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.487

JOURNAL RESTRICTED ACCESS

This work presents an FFT/IFFT core compiler particularly suited for the VLSI implementation of OFDM communication systems. The tool employs an architecture template based on the pipelined cascade principle. The generated cores support run-time programmable length and transform type selection, enabling seamless integration into multiple-mode and multiple-standard terminals. A distinctive feature of the tool is its accuracy-driven configuration engine, which automatically profiles the internal arithmetic and generates a core with minimum operand bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementation results of the generated macrocells are presented for two deep sub-micron standard-cell libraries (65 and 90nm) and commercially available FPGA devices. When compared with other tools for automatic FFT core generation, the proposed environment produces macrocells with lower circuit complexity, expressed as gate count and RAM/ROM bits, while keeping the same system-level performance in terms of throughput, transform size and numerical accuracy.

A Reconfigurable Functional Unit with Conditional Execution for Multi-Exit Custom Instructions

Hamid NOORI, Farhad MEHDIPOUR, Koji INOUE, Kazuaki MURAKAMI

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 497-508
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.497

JOURNAL RESTRICTED ACCESS

Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of these custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market requirements and significant non-recurring engineering and design costs remain issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a multi-cycle matrix of functional units with the capability of conditional execution. A quantitative approach is used to propose an efficient architecture for the RFU and fix its constraints. To generate more effective custom instructions, they are extended over basic blocks, and hence multi-exit custom instructions are proposed. Conditional execution has been added to the RFU to support the multi-exit feature of custom instructions. Experimental results show that multi-exit custom instructions enhance the performance by an average of 67% compared to custom instructions limited to one basic block. A maximum speedup of 4.7 and an average speedup of 1.85, compared to a general embedded processor, were achieved on the MiBench benchmark suite.

Regular Fabric of Via Programmable Logic Device Using EXclusive-or Array (VPEX) for EB Direct Writing

Akihiro NAKAMURA, Masahide KAWARASAKI, Kouta ISHIBASHI, Masaya YOSHIKA ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 509-516
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.509

JOURNAL RESTRICTED ACCESS

The photo-mask cost of standard-cell-based ASICs has increased so prohibitively that low-volume production LSIs are difficult to fabricate because of the high non-recurring engineering (NRE) cost, including mask cost. Recently, user-programmable devices such as FPGAs have started to be used for low-volume consumer products. However, FPGAs cannot replace ASICs for general purposes because of their lower speed performance and higher power consumption. In this paper, we propose a user-programmable architecture called VPEX (Via Programmable logic device using EXclusive-or array), in which the hardware logic can be programmed by changing layout patterns on 2 via layers. The logic element (LE) of VPEX consists of complex-gate-type EXclusive OR (EXOR) and Inverter (NOT) gates. A single LE can output 12 logic functions, which include NOT, Buffer (BUF), all 2-input logic functions, 3-input AOI21 and inverted-output multiplexer (MUXI), by changing the via-1 layout pattern. Furthermore, the via-1 layout is optimized for high-throughput EB direct writing, so mask-less programming will be realized in VPEX. We compared the area, speed, and power consumption of VPEX with those of standard-cell-based ASICs and FPGAs. As a result, the speed performance of VPEX was much better than that of FPGAs and about 1.3-1.6 times worse than that of standard cells. We believe that the combination of the VPEX architecture and EB direct writing is the best solution for low-volume production LSIs.

Multi-Context FPGA Using Fine-Grained Interconnection Blocks and Its CAD Environment

Hasitha Muthumala WAIDYASOORIYA, Weisheng CHONG, Masanori HARIYAMA, Mi ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 517-525
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.517

JOURNAL RESTRICTED ACCESS

Dynamically-programmable gate arrays (DPGAs) promise lower-cost implementations than conventional field-programmable gate arrays (FPGAs) since they efficiently reuse limited hardware resources in time. One of the typical DPGA architectures is a multi-context FPGA (MC-FPGA), which requires multiple memory bits per configuration bit to realize fast context switching. However, these additional memory bits cause significant overhead in area and power consumption. This paper presents a novel switch element architecture that reduces the required capacity of the configuration memory. Our main idea is to exploit redundancy between different contexts by using a fine-grained switch element. The proposed MC-FPGA is designed in a 0.18μm CMOS technology. Its maximum clock frequency and context switching frequency are measured to be 310MHz and 272MHz, respectively. Moreover, a novel CAD process that exploits the redundancy in configuration data is proposed to support the MC-FPGA architecture.

A Design of Constant-Charge-Injection Programming Scheme for AG-AND Flash Memories Using Array-Level Analytical Model

Shinya KAJIYAMA, Ken'ichiro SONODA, Kazuo OTSUGA, Hideaki KURATA, Kiyo ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 526-533
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.526

JOURNAL RESTRICTED ACCESS

A design methodology for optimizing constant-charge-injection programming (CCIP) for assist-gate (AG)-AND flash memories is proposed. Transient circuit simulations using an array-level model, which includes a lucky electron model (LEM) current source describing hot-electron physics, enable a concept design over the whole memory string in advance of wafer manufacturing. The dynamic programming behaviors of various CCIP sequences, obtained by circuit simulations using the model, are verified against the measurement results of a 90-nm AG-AND flash memory. We confirmed that the simulation results agree sufficiently with the measurements, in that the simulations give the optimum AG bias voltage within an error of approximately 0.2V. We then applied the model to a conceptual design and obtained the optimum bit line capacitance value and CCIP sequence, which are the most important issues in high-throughput programming for an AG-AND array.

FinFET-Based Flex-Vth SRAM Design for Drastic Standby-Leakage-Current Reduction

Shin-ichi O'UCHI, Meishoku MASAHARA, Kazuhiko ENDO, Yongxun LIU, Takas ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 534-542
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.534

JOURNAL RESTRICTED ACCESS

Aiming at drastically reducing standby leakage current, an SRAM using Four-Terminal- (4T-) FinFETs, named Flex-Vth SRAM, with a dynamic row-by-row threshold voltage control (RRTC) was developed. The Flex-Vth SRAM realizes an extremely low standby-leakage current thanks to the flexible threshold-voltage (Vth) controllability of the 4T-FinFETs, while its access speed and static noise margin (SNM) are maintained. A TCAD-based Monte Carlo simulation indicates that even when the process-induced random variation in the device performance is taken into account, the Flex-Vth SRAM reduces the leakage current to 1/100 of that of a standard SRAM in a 256×256 array, where 20-nm-gate-length technologies with the same on-current are assumed.

A 10T Non-precharge Two-Port SRAM Reducing Readout Power for Video Processing

Hiroki NOGUCHI, Yusuke IGUCHI, Hidehiro FUJIWARA, Shunsuke OKUMURA, Ya ...

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 543-552
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.543

JOURNAL RESTRICTED ACCESS

We propose a low-power non-precharge-type two-port SRAM for video processing that exploits statistical similarity in images. To minimize the charge/discharge power on a read bitline, the proposed memory cell (MC) has ten transistors (10T), comprising the conventional 6T MC, a readout inverter and a transmission gate for a read port. In addition, to incorporate three wordlines, we propose a shared wordline structure, with which the vertical cell size of the 10T MC is fitted to the same size as the conventional 8T MC. Since the readout inverter fully charges/discharges a read bitline, there is no precharge circuit on the read bitline. Thus, power is not consumed by precharging, but is consumed only when a readout datum changes. This feature is suitable for video processing since image data have spatial correlation and similar data are read out in consecutive cycles. In addition to the power reduction, the prechargeless structure shortens the cycle time by 38% compared with the conventional SRAM, because it does not require a precharge period. This, in turn, allows the proposed SRAM to operate at a lower voltage, which achieves further power reduction. Compared to the conventional 8T SRAM, the proposed SRAM reduces the charge/discharge probability on the bitlines to 19% (an 81% saving). From the measurement results, we confirmed that the proposed 64-kb video memory in a 90-nm process achieves an 85% power saving on the read bitline when used as an H.264 reconstructed image memory. The area overhead is 14.4%.
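
The statement that a non-precharge read bitline consumes power only when the readout datum changes can be illustrated with a small counting model. The Python sketch below is a toy under stated assumptions, not the paper's measurement method: the function names are hypothetical, the bitline without precharge is assumed to simply follow the read data, and a precharged single-ended bitline is assumed to be discharged (and later recharged) whenever a 0 is read.

```python
def toggles_without_precharge(bits, initial=0):
    """Count bitline transitions when the line simply follows the read data."""
    count, previous = 0, initial
    for bit in bits:
        if bit != previous:
            count += 1
        previous = bit
    return count


def discharges_with_precharge(bits, discharging_value=0):
    """Count cycles in which a precharged bitline is discharged (assumed model)."""
    return sum(1 for bit in bits if bit == discharging_value)


# Toy stream with long runs of equal values, loosely mimicking
# spatially correlated image data read out in consecutive cycles.
stream = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
no_precharge = toggles_without_precharge(stream)
with_precharge = discharges_with_precharge(stream)
```

Under these assumptions, streams with long runs of identical bits produce far fewer bitline events without precharging, which is the qualitative effect the abstract describes.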

Clock Driver Design for Low-Power High-Speed 90-nm CMOS Register Array

Tadayoshi ENOMOTO, Suguru NAGAYAMA, Hiroaki SHIKANO, Yousuke HAGIWARA

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 553-561
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.553

JOURNAL RESTRICTED ACCESS

The delay time (t_dT), power dissipation (P_T) and circuit volume of a CMOS register array were minimized. Seven test circuits, each of which had a register array and a single clock tree that generated a pair of complementary clock pulses, and a conventional register were fabricated using 90-nm CMOS technology. The register array was constructed with M delay flip-flops (FFs) and the clock tree, which consisted of 2 driver stages. Each driver stage had m inverters, each of which drove M/m FFs, where M was fixed at 40 and m varied from 1 to 40. The minimum values of t_dT and P_T were 0.25ns and 17.88μW, respectively, and both were obtained when m was 10. These values were 71.4% and 70.4%, respectively, of t_dT and P_T for the conventional register, for which m is 40. The number of inverters in the clock tree when m was 10 was 21, which was only 25.9% of that for the conventional register. The measured results agreed well with SPICE-simulated results. Furthermore, for values of M from 20 to 320, both the minimum t_dT and the minimum P_T were obtained when m was approximately 1.5 times the square root of M.
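
The inverter counts and the rule of thumb quoted in this abstract can be checked with a few lines of arithmetic. The Python sketch below assumes, as one reading consistent with the reported figures, that the clock tree contains one root inverter plus m inverters in each of the 2 driver stages; the abstract does not state this breakdown, and the function names are hypothetical.

```python
import math


def clock_tree_inverters(m, stages=2):
    """Inverter count under the assumed structure: one root inverter plus m per stage."""
    return stages * m + 1


def rule_of_thumb_m(M):
    """Approximate optimum m quoted in the abstract: about 1.5 * sqrt(M)."""
    return 1.5 * math.sqrt(M)


proposed = clock_tree_inverters(10)       # m = 10
conventional = clock_tree_inverters(40)   # m = 40 (conventional register)
ratio_percent = 100.0 * proposed / conventional
estimate_for_40 = rule_of_thumb_m(40)     # M = 40 flip-flops
```

With this assumed structure, the count for m = 10 and its ratio to the m = 40 case match the 21 inverters and 25.9% given in the abstract, and the rule of thumb for M = 40 lands close to the reported optimum of m = 10.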

Statistical Corner Conditions of Interconnect Delay (Corner LPE Specifications)

Kenta YAMADA, Noriaki ODA

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 562-570
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.562

JOURNAL RESTRICTED ACCESS

Timing closure in LSI design is becoming more and more difficult. However, the conventional interconnect RC extraction method carries excessive margins caused by its corner condition settings. In this paper, statistical corner conditions that use the independence of variations between process parameters and between interconnect layers are proposed and examined using measurement data. With the proposed method, the fast-to-slow guardband decreases by half on average compared to the conventional method. The proposed method is ready for implementation in LPE tools.

Redundant Vias Insertion for Performance Enhancement in 3D ICs

Xu ZHANG, Xiaohong JIANG, Susumu HORIGUCHI

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 571-580
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.571

JOURNAL RESTRICTED ACCESS

Three-dimensional (3D) integrated circuits (ICs) have the potential to significantly enhance VLSI chip performance, functionality and device packing density. Interconnect delay and signal integrity issues are critical in chip design. In this paper, we extend the idea of redundant via insertion in conventional 2D ICs and propose an approach for via insertion/placement in 3D ICs that minimizes the propagation delay of interconnects while taking signal integrity into consideration. Simulation results based on a 65nm CMOS technology demonstrate that our approach can, in general, result in a 9% improvement in average delay and a 26% decrease in reflection coefficient. It is also shown that the proposed approach can be more effective for interconnect delay improvement when it is integrated with buffer insertion in 3D ICs.

Power-Aware Asynchronous Peer-to-Peer Duplex Communication System Based on Multiple-Valued One-Phase Signaling

Kazuyasu MIZUSAWA, Naoya ONIZAWA, Takahiro HANYU

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 581-588
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.581

JOURNAL RESTRICTED ACCESS

This paper presents the design of an asynchronous peer-to-peer half-duplex/full-duplex-selectable data-transfer system for on-chip interconnection. The data-transfer method between channels is based on a 1-phase signaling scheme realized by using multiple-valued current-mode (MVCM) circuits and encoding, which achieves high-speed communication. The data-transmission mode is made selectable by adding a mode-detection circuit that observes the data-transmission modes: full-duplex, half-duplex and standby. In particular, since the current sources are completely cut off during the standby mode, the power dissipation can be greatly reduced. Moreover, both half-duplex and full-duplex communication can be realized by sharing a common circuit, except for a signal-level conversion circuit. The proposed interface is implemented using 0.18-μm CMOS, and its performance improvement is discussed in comparison with that of other ordinary asynchronous methods.

Highly Reliable Multiple-Valued Current-Mode Comparator Based on Active-Load Dual-Rail Operation

Masatomo MIURA, Takahiro HANYU

Article type: PAPER
2008 Volume E91.C Issue 4 Pages 589-594
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.589

JOURNAL RESTRICTED ACCESS

In this paper, a multiple-valued current-mode (MVCM) circuit based on active-load dual-rail differential logic is proposed for a high-performance arithmetic VLSI system with crosstalk-noise immunity. The use of dual-rail complementary differential-pair circuits (DPCs), whose outputs are summed by wiring, makes it possible to reduce the common-mode noise and yet enhance the switching speed. By using diode-connected cross-coupled PMOS active loads, the rapid switching transition in the DPC is relaxed appropriately, which can also eliminate spiked input noise. It is demonstrated that the noise reduction ratio and the switching delay of the proposed MVCM circuit in a 90nm CMOS technology are superior to those of the corresponding ordinary implementation.

Regular Section

Co-modeling, Experimental Verification, and Analysis of Chip-Package Hierarchical Power Distribution Network

Hyunjeong PARK, Hyungsoo KIM, Jun So PAK, Changwook YOON, Kyoungchoul ...

Article type: PAPER
Subject area: Electromagnetic Theory
2008 Volume E91.C Issue 4 Pages 595-606
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.595

JOURNAL RESTRICTED ACCESS

In this paper, we present and verify a new chip-package co-modeling and simulation approach for the design of a low-noise chip-package hierarchical power distribution network (PDN). It is based on hierarchical modeling that combines distributed circuit models of both the chip-level PDN and the package-level PDN. In particular, it includes all on- and off-chip parasitic circuit elements in the hierarchical PDN, with special consideration of on-chip decoupling capacitor design and placement inside the chip. The proposed hierarchical PDN model was successfully validated by good correlation with, and subsequent analysis of, a series of Z11 and Z21 PDN impedance measurements over a frequency range from 1MHz to 3GHz. Using the proposed model, we can analyze and estimate the performance of the chip-package hierarchical PDN and predict the effect of high-frequency electromagnetic interactions between the chip-level PDN and the package-level PDN. Furthermore, we can precisely anticipate PDN resonance frequencies, noise generation sources, and noise propagation paths through the multiple levels of the hierarchical PDN.

TM Plane Wave Reflection and Transmission from a One-Dimensional Random Slab

Yasuhiko TAMURA

Article type: PAPER
Subject area: Electromagnetic Theory
2008 Volume E91.C Issue 4 Pages 607-614
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.607

JOURNAL RESTRICTED ACCESS

This paper deals with TM plane wave reflection and transmission from a one-dimensional random slab with stratified fluctuation by means of the stochastic functional approach. Following a previously reported method [IEICE Trans. Electron. E88-C, 4, pp. 713-720, 2005], an explicit form of the random wavefield is obtained in terms of a Wiener-Hermite expansion with approximate expansion coefficients (Wiener kernels) under small fluctuation. The optical theorem and the coherent reflection coefficient are illustrated in figures for several physical parameters. It is found that the optical theorem evaluated with the Wiener kernels of the first two or three orders holds with good accuracy, and that a shift of Brewster's angle appears in the coherent reflection.

Design Method for a Low-Profile Dual-Shaped Reflector Antenna with an Elliptical Aperture by the Suppression of Undesired Scattering

Yoshio INASAWA, Shinji KURODA, Kenji KUSAKABE, Izuru NAITO, Yoshihiko ...

Article type: PAPER
Subject area: Electromagnetic Theory
2008 Volume E91.C Issue 4 Pages 615-624
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.615

JOURNAL RESTRICTED ACCESS

A design method is proposed for a low-profile dual-shaped reflector antenna for mobile satellite communications. The antenna is required to be low-profile because of mounting restrictions. However, reducing its height generally degrades antenna performance. First, an initial low-profile reflector antenna with an elliptical aperture is designed by using Geometrical Optics (GO) shaping. Then a Physical Optics (PO) shaping technique is applied to optimize the gain and sidelobes, including mitigation of undesired scattering. The developed design method provides a highly accurate design procedure for electrically small reflector antennas. Fabrication and measurement of a prototype antenna support the theory.

Planar T-Shaped Monopole Antenna for WLAN/WiMAX Applications

Jhin-Fang HUANG, Shih-Huang WU

Article type: PAPER
Subject area: Electromagnetic Theory
2008 Volume E91.C Issue 4 Pages 625-630
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.625

JOURNAL RESTRICTED ACCESS

A multiband T-shaped monopole antenna for WLAN/WiMAX applications is presented. The T-shaped monopole comprises two horizontal arms of different lengths, which generate two separate resonant modes for the 2.5/5.5GHz WLAN/WiMAX bands, and a shortened parasitic element, which generates a middle resonant mode for the 3.5GHz WiMAX band, for seamless wireless network access applications. The proposed antenna has been successfully simulated and implemented. The simulation and measurement results show good agreement. For the lower band from 2.3 to 2.7GHz, the gain varies in the range of 2.5-3.3dB, while the radiation efficiency is from 72% to 85% over the band. For the middle band from 3.3 to 3.7GHz, the gain varies from 1.5 to 2.0dB, and the radiation efficiency is from 62% to 70%. For the upper band from 5.2 to 5.8GHz, the antenna gain varies from 5.4 to 5.9dB, and the radiation efficiency is from 63% to 66%.

Characterization of Two-Stage Composite Right- and Left-Handed Transmission Lines

Shun NAKAGAWA, Koichi NARAHARA

Article type: PAPER
Subject area: Electromagnetic Theory
2008 Volume E91.C Issue 4 Pages 631-637
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.631

JOURNAL RESTRICTED ACCESS

The characteristics of two-stage composite right- and left-handed (CRLH) transmission lines are discussed. The dispersion relationship of both balanced and unbalanced two-stage CRLH lines is described, together with numerical calculations that demonstrate their potential.

Computer Simulation about Temperature Distribution of an EM-Wave Absorber Using a Coupled Analysis Method

Shinya WATANABE, Akitoshi TANIGUCHI, Kota SAITO, Osamu HASHIMOTO, Tosh ...

Article type: PAPER
Subject area: Microwaves, Millimeter-Waves
2008 Volume E91.C Issue 4 Pages 638-646
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.638

JOURNAL RESTRICTED ACCESS

Utilization of electromagnetic absorbers under high power is increasing. The absorbers are used in anechoic chambers for performance estimation of high power radars. Under such conditions, the absorption characteristics of the absorbers are expected to vary because of heat generation or temperature change. In this paper, the temperature distribution of a λ/4 type EM-wave absorber under high power injection is first examined using the coupled method. The coupled method can calculate the electromagnetic field and all of the heat transmissions (heat transport, heat transfer and heat radiation). Next, a power injection experiment is carried out using the absorber and high power instruments to obtain the temperature distribution experimentally. Finally, the calculated and measured temperature distributions of the absorber are compared and discussed.

Concise Modeling of Transistor Variations in an LSI Chip and Its Application to SRAM Cell Sensitivity Analysis

Masakazu AOKI, Shin-ichi OHKAWA, Hiroo MASUDA

Article type: PAPER
Subject area: Semiconductor Materials and Devices
2008 Volume E91.C Issue 4 Pages 647-654
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.647

JOURNAL RESTRICTED ACCESS

Random variations in I_d-V_g characteristics of MOS transistors in an LSI chip are shown to be concisely characterized by using only 3 transistor parameters (V_th, β₀, υ_SAT) in the MOS level 3 SPICE model. Statistical analyses of the transistor parameters show that not only the threshold voltage variation, ΔV_th, but also the current factor variation, Δβ₀, independently induces I_d-variation, and that Δβ₀ is negatively correlated with the saturation velocity variation, Δυ_SAT. Using these results, we have proposed a simple method that effectively takes the correlation between parameters into consideration when creating statistical model parameters for designing a circuit. Furthermore, we have proposed a sensitivity analysis methodology for estimating the process window of SRAM cell operation taking transistor variability into account. By applying the concise statistical model parameters to the sensitivity analysis, we are able to obtain valid process windows without the large volume of data-processing and long turnaround time associated with the Monte Carlo simulation. The process window was limited not only by ΔV_th, but also by Δβ₀, which enhanced the failure region in the process window by 20%.

A PVT Tolerant STM-16 Clock-and-Data Recovery LSI Using an On-Chip Loop-Gain Variation Compensation Architecture in 0.20-μm CMOS/SOI

Yusuke OHTOMO, Hiroshi KOIZUMI, Kazuyoshi NISHIMURA, Masafumi NOGAWA

Article type: PAPER
Subject area: Integrated Electronics
2008 Volume E91.C Issue 4 Pages 655-661
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.655

JOURNAL RESTRICTED ACCESS

This paper proposes an on-chip loop gain variation compensation architecture for a clock and data recovery (CDR) LSI. The CDR LSI using the proposed architecture can meet the jitter specifications recommended in ITU-T G.958 under wide variation of temperature and supply voltage. The relation between the jitter specifications and the loop gain is derived theoretically. Gain-variation characteristics of component circuits are studied by circuit simulation. The proposed architecture uses voltage controllers to reduce the gain variation of the LC voltage controlled oscillator (LC-VCO) circuit and charge-pump circuit. The voltage controllers are designed to have a first-order positive temperature coefficient, which is found by an analysis of the gain variation characteristics. An STM-16 CDR with the proposed architecture is implemented in 0.20-μm fully depleted CMOS/SOI. The CDR shows a wide capture range of ±140MHz and meets both the jitter transfer and the jitter tolerance specifications in the ambient temperature range from -40 to 85°C and with the supply voltage variation of ±6%.

An Ultra-Low-Voltage Ultra-Low-Power Weak Inversion Composite MOS Transistor: Concept and Applications

Luis H. C. FERREIRA, Tales C. PIMENTA, Robson L. MORENO

Article type: LETTER
Subject area: Electronic Circuits
2008 Volume E91.C Issue 4 Pages 662-665
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.662

JOURNAL RESTRICTED ACCESS

This work presents an ultra-low-voltage ultra-low-power weak inversion composite MOS transistor. The steady state power consumption and the linear signal swing of the composite transistor are comparable to those of a single transistor, while the composite transistor presents very high output impedance. This work also presents two interesting applications for the composite transistor: a 1:1 current mirror and an extremely low power temperature sensor, a thermistor. Both implementations are verified in a standard 0.35-μm TSMC CMOS process. The current mirror presents high output impedance, comparable to the cascode configuration, which is highly desirable to improve the gain and PSRR of amplifier circuits, and the mirroring relation in current mirrors.

High-Input and Low-Output Impedance Voltage-Mode Universal DDCC and FDCCII Filter

Hua-Pin CHEN, Wan-Shing YANG

Article type: LETTER
Subject area: Electronic Circuits
2008 Volume E91.C Issue 4 Pages 666-669
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.666

JOURNAL RESTRICTED ACCESS

Despite the extensive literature on current conveyor-based universal (namely, low-pass, band-pass, high-pass, notch, and all-pass) biquads with three inputs and one output, no filter circuits have been reported to date which simultaneously achieve the following seven important features: (i) employment of only two current conveyors, (ii) employment of only grounded capacitors, (iii) employment of only grounded resistors, (iv) high-input and low-output impedance, (v) no need to employ inverting type input signals, (vi) no need to impose component choice conditions to realize specific filtering functions, and (vii) low active and passive sensitivity performances. This letter describes a new voltage-mode biquad circuit that satisfies all the above features simultaneously, and without trade-offs.

A Low-Cost BIST Based on Histogram Testing for Analog to Digital Converters

Kicheol KIM, Youbean KIM, Incheol KIM, Hyeonuk SON, Sungho KANG

Article type: LETTER
Subject area: Semiconductor Materials and Devices
2008 Volume E91.C Issue 4 Pages 670-672
Published: April 01, 2008
Released on J-STAGE: March 01, 2010

DOI: https://doi.org/10.1093/ietele/e91-c.4.670

JOURNAL RESTRICTED ACCESS

In this letter, a histogram-based BIST (Built-In Self-Test) approach is proposed for deriving the main characteristic parameters of an ADC (Analog to Digital Converter), such as offset, gain and non-linearities. The BIST uses a ramp signal as an input signal and two counters as a response analyzer to calculate the derived static parameters. Experimental results show that the proposed method reduces the hardware overhead and testing time while detecting any static faults in an ADC.
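
As a rough illustration of the histogram principle the abstract refers to, the following Python sketch estimates differential and integral non-linearity (DNL, INL) from the output codes captured during a linear ramp. It is the generic software form of the ramp histogram test, not the authors' two-counter BIST hardware; the function names, the 6-bit resolution, and the ideal quantizer used as a device under test are all assumptions made for this example.

```python
import numpy as np

def ramp_histogram_test(codes, n_bits):
    """Estimate DNL and INL (in LSB) from codes captured during a linear ramp."""
    n_codes = 2 ** n_bits
    hist = np.bincount(codes, minlength=n_codes).astype(float)
    # The end codes absorb ramp overdrive, so only interior codes are used.
    inner = hist[1:-1]
    ideal = inner.mean()        # expected hits per code for a uniform ramp
    dnl = inner / ideal - 1.0   # code-width error, in LSB
    inl = np.cumsum(dnl)        # accumulated error, in LSB
    return dnl, inl

def ideal_adc(v, n_bits, v_ref=1.0):
    """Hypothetical ideal quantizer, used only to exercise the test."""
    codes = np.floor(v / v_ref * 2 ** n_bits).astype(int)
    return np.clip(codes, 0, 2 ** n_bits - 1)

# Ramp that slightly overdrives both ends of the input range.
ramp = np.linspace(-0.05, 1.05, 200_000)
codes = ideal_adc(ramp, n_bits=6)
dnl, inl = ramp_histogram_test(codes, n_bits=6)

# Emulate a static fault: code 10 never appears (its hits land in code 11).
faulty = codes.copy()
faulty[faulty == 10] = 11
dnl_faulty, inl_faulty = ramp_histogram_test(faulty, n_bits=6)
```

For the ideal quantizer, both `dnl` and `inl` stay close to zero, while the emulated missing code shows up as a DNL of -1 LSB at code 10 and roughly +1 LSB at code 11.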
