IEICE Transactions on Electronics

Special Section on Circuits and Design Techniques for Advanced Large Scale Integration

FOREWORD

Kunio UCHIYAMA

2011 Volume E94.C Issue 4 Pages 385
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.385

JOURNAL RESTRICTED ACCESS

Prospective Silicon Applications and Technologies in 2025

Koji KAI, Minoru FUJISHIMA

Article type: INVITED PAPER
2011 Volume E94.C Issue 4 Pages 386-393
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.386

JOURNAL RESTRICTED ACCESS

Today, practical semiconductor products are an integral part of our lives and the infrastructure of society, and this trend will continue in the future. New areas of application will expand into medical, environmental, and agriculture (food)-related fields in addition to the conventional information and communication technology (ICT)-related field. Low-cost semiconductor devices with advanced functions have thus far been realized by miniaturization. However, we are now approaching the physical limit of miniaturization, and also, the investment required for new semiconductor manufacturing facilities has become huge. Under such circumstances, we propose an approach based on semiconductor devices called microcube chips and ideas of semiconductor development, i.e., agile integration and “inch-fab.” Our approach is expected to contribute to expanding the range of companies that can fabricate semiconductor devices to include small-size companies, exploring new applications of semiconductor devices, and providing a wide variety of semiconductor devices at a low cost from the semiconductor industry.

Low Power Platform for Embedded Processor LSIs

Toru SHIMIZU, Kazutami ARIMOTO, Osamu NISHII, Sugako OTANI, Hiroyuki K ...

Article type: INVITED PAPER
2011 Volume E94.C Issue 4 Pages 394-400
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.394

JOURNAL RESTRICTED ACCESS

Various low-power technologies have been developed and applied to LSIs at the device and circuit design levels. Many more CPU cores and function IPs are now integrated on a single-chip LSI. Therefore, not only device- and circuit-level low-power technologies but also software power control technologies are becoming more important for reducing the active power of application systems. This paper gives an overview of low-power technologies and defines a power management platform as a combination of hardware functions and a software programming interface. It also discusses the importance of the power management platform and the direction of its development.

Multiple Region-of-Interest Based H.264 Encoder with a Detection Architecture in Macroblock Level Pipelining

Tianruo ZHANG, Chen LIU, Minghui WANG, Satoshi GOTO

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 401-410
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.401

JOURNAL RESTRICTED ACCESS

This paper proposes a region-of-interest (ROI) based H.264 encoder and the VLSI architecture of the ROI detection algorithm. In an ROI-based video coding system, the pre-processing unit that detects ROIs should introduce only low computational complexity overhead because of the low power requirement. The macroblocks (MBs) in ROIs are detected sequentially in the same order as H.264 encoding to satisfy the MB-level pipelining of the ROI detector and the H.264 encoder. ROI detection is performed in a novel estimation-and-verification process with an ROI contour template. The proposed architecture can be configured to detect either a single ROI or multiple ROIs in each frame, and the throughput of the single detection mode is 5.5 times that of the multiple detection mode. 98.01% and 97.89% of MBs in ROIs can be detected in the single and multiple detection modes, respectively. The hardware cost of the proposed architecture is only 4.68k gates. The detection speed is 753fps for CIF format video at an operating frequency of 200MHz in multiple detection mode, with a power consumption of 0.47mW. Compared with previous fast ROI detection algorithms for video coding applications, the proposed architecture obtains more accurate and smaller ROIs. Therefore, more efficient ROI-based optimization of computational complexity and compression efficiency can be implemented in the H.264 encoder.

Optimized 2-D SAD Tree Architecture of Integer Motion Estimation for H.264/AVC

Yibo FAN, Xiaoyang ZENG, Satoshi GOTO

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 411-418
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.411

JOURNAL RESTRICTED ACCESS

Integer Motion Estimation (IME) is computationally expensive in an H.264/AVC video encoder. The 2-D SAD tree IME architecture provides very high performance for the encoder and has been used in many video codec designs. This paper proposes an optimized hardware design of the 2-D SAD tree IME. Firstly, a new hardware architecture is proposed to reduce on-chip memory size. Secondly, a new search pattern is proposed to fully use memory bandwidth and reduce external memory access. Thirdly, the data-path is redesigned, and the performance is greatly improved. For comparison with other IME designs, an IME design supporting D1 size at 30fps with a search range of [±32, ±32] is implemented. The hardware cost of this design is 118 KGates and 8Kb SRAM, and the maximum clock frequency is 200MHz. Compared to the original 2-D SAD tree IME, our design saves 87.5% of on-chip memory and achieves 3 times the performance of the original. Our design provides a new way to design a low-cost, high-performance IME for H.264/AVC encoders.
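
For readers unfamiliar with the cost being optimized, the sum of absolute differences (SAD) that an integer motion search minimizes can be sketched in a few lines of Python. This is a generic exhaustive search over a small window, not the 2-D SAD tree hardware described in the abstract; the function names `sad` and `full_search` and the toy 8 × 8 frame are assumptions made purely for illustration.

```python
def sad(cur, ref, rx, ry):
    """Sum of absolute differences between block `cur` and the
    same-sized window of `ref` whose top-left corner is (rx, ry)."""
    return sum(
        abs(cur[y][x] - ref[ry + y][rx + x])
        for y in range(len(cur))
        for x in range(len(cur[0]))
    )


def full_search(cur, ref, cx, cy, search_range):
    """Exhaustive integer-pel search around (cx, cy).

    Returns (best_sad, (dx, dy)); candidates that fall outside the
    reference frame are skipped. Hypothetical helper, not from the paper.
    """
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or ry + h > len(ref) or rx + w > len(ref[0]):
                continue
            cost = sad(cur, ref, rx, ry)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Toy reference frame with distinct pixel values, and a 4x4 block cut
# from column 3, row 2 of that frame.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [row[3:7] for row in ref[2:6]]

best_cost, best_mv = full_search(cur, ref, cx=2, cy=2, search_range=2)
print(best_cost, best_mv)
```

Starting the search one pixel to the left of the true position, the search should recover a motion vector of (1, 0) with zero SAD. A hardware design such as the one in the abstract evaluates many of these absolute differences in parallel, which is where memory bandwidth and on-chip buffering become the limiting factors.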

A 530Mpixels/s Intra Prediction Architecture for Ultra High Definition H.264/AVC Encoder

Gang HE, Dajiang ZHOU, Jinjia ZHOU, Tianruo ZHANG, Satoshi GOTO

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 419-427
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.419

JOURNAL RESTRICTED ACCESS

Intra coding in H.264/AVC significantly enhances video compression efficiency. However, due to the high data dependency of intra prediction in H.264, both pipelining and parallel processing techniques are of limited applicability. Moreover, it is difficult to achieve high hardware utilization and throughput because of the long block/MB-level reconstruction loops. This paper proposes a high-performance intra prediction architecture that can support H.264/AVC high profile. The proposed MB/block co-reordering can avoid data dependency and improve pipeline utilization. Therefore, the timing constraint of real-time 4096 × 2160 encoding can be met with negligible quality loss. A 16 × 16 prediction engine and an 8 × 8 prediction engine work in parallel for prediction and coefficient generation. A reordering interlaced reconstruction is also designed for the fully pipelined architecture. It takes only 160 cycles to process one macroblock (MB). Hardware utilization of the prediction and reconstruction modules is almost 100%. Furthermore, a PE-reusable 8 × 8 intra predictor and a hybrid SAD & SATD mode decision are proposed to save hardware cost. The design is implemented in 90nm CMOS technology with 113.2k gates and can encode 4096 × 2160 video sequences at 60fps with an operating frequency of 332MHz.
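
The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes 16 × 16-pixel macroblocks (the standard H.264 macroblock size) and treats 160 cycles per macroblock as the sustained rate; the constant and variable names are illustrative and do not come from the paper.

```python
# Figures taken from the abstract.
WIDTH, HEIGHT, FPS = 4096, 2160, 60
CYCLES_PER_MB = 160

# Assumption: 16x16-pixel macroblocks, as in standard H.264.
MB_SIZE = 16

# Pixel throughput needed for real-time encoding.
pixels_per_second = WIDTH * HEIGHT * FPS

# Macroblocks per frame, then the clock rate needed to process them all
# at 160 cycles each.
mbs_per_frame = (WIDTH // MB_SIZE) * (HEIGHT // MB_SIZE)
required_hz = mbs_per_frame * FPS * CYCLES_PER_MB

print(f"{pixels_per_second / 1e6:.1f} Mpixels/s")
print(f"{required_hz / 1e6:.1f} MHz")
```

Under these assumptions the script prints about 530.8 Mpixels/s and 331.8 MHz, consistent with the 530Mpixels/s in the title and the 332MHz operating frequency quoted above.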

Highly Parallel and Fully Reused H.264/AVC High Profile Intra Predictor Generation Engine for Super Hi-Vision 4k×4k@60fps

Yiqing HUANG, Xiaocong JIN, Jin ZHOU, Jia SU, Takeshi IKENAGA

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 428-438
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.428

JOURNAL RESTRICTED ACCESS

A high profile intra predictor generation engine is proposed in this paper. Firstly, hardware-level algorithm optimization for intra 8 × 8 (I8MB) mode is introduced. The original candidate pixels for generating prediction samples of I8MB are replaced with boundary pixels of intra 4 × 4 (I4MB) blocks. Based on this adoption, full data reuse between predictors of I4MB and filtered samples of I8MB can be achieved with almost no quality loss. Secondly, a lossless two-4 × 4-block based parallel predictor generation flow is proposed. The original predictor generation flow is optimized from 16 stages to 10 stages for I4MB and intra 16 × 16 (I16MB), which saves 37.5% of processing cycles. For I8MB, a similar methodology with a different processing order of 4 × 4 scaled blocks is introduced. Thirdly, fully utilized hardwired engines for I4MB, I16MB and I8MB are proposed. Except for the DC (direct current) and plane modes, full data reuse among all intra modes of high profile can be achieved. Fourthly, for DC mode, a combined predictor generation process is introduced, and predictor generation of I16MB's DC mode is merged into the process of I4MB's DC mode. Moreover, by configuring the proposed hardwired engines, predictor generation of I16MB's plane mode and chrominance plane mode can be accomplished with only 50% of the cycles of the original design. In total, compared with the original full-mode design and the latest dynamic mode reused design, the proposed predictor generation engine achieves 89.5% and 73.2% savings in processing cycles, respectively. Synthesized with TSMC 0.18µm technology under worst-case operating conditions (1.62V, 125°C), at 380MHz and with 37.2k gates, the proposed design can handle real-time high profile intra predictor generation for Super Hi-Vision 4k × 4k@60fps. The maximum operating frequency of our design under worst-case conditions is 468MHz.

Cache Based Motion Compensation Architecture for Quad-HD H.264/AVC Video Decoder

Jinjia ZHOU, Dajiang ZHOU, Gang HE, Satoshi GOTO

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 439-447
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.439

JOURNAL RESTRICTED ACCESS

In this paper, we present a cache based motion compensation (MC) architecture for a Quad-HD H.264/AVC video decoder. With the significantly increased throughput requirement, VLSI design for MC is greatly challenged by the huge area cost and power consumption. Moreover, the long memory system latency leads to a performance drop in the MC pipeline. To solve these problems, three optimization schemes are proposed in this work. Firstly, a high-performance interpolator based on Horizontal-Vertical Expansion and Luma-Chroma Parallelism (HVE-LCP) is proposed to efficiently increase the processing throughput to more than 4 times that of previous designs. Secondly, an efficient cache memory organization scheme (4Sx4) is adopted to improve the on-chip memory utilization, which contributes to a memory area saving of 25% and a memory power saving of 39∼49%. Finally, by employing a Split Task Queue (STQ) architecture, the cache system is capable of tolerating much longer latency of the memory system. Consequently, the cache idle time is reduced by 90%, which contributes to reducing the overall processing time by 24∼40%. When implemented in an SMIC 90nm process, this design requires 108.8k logic gates and 3.1kB of on-chip memory. The proposed MC architecture can support real-time processing of 3840x2160@60fps at less than 166MHz.

A Low-Power Real-Time SIFT Descriptor Generation Engine for Full-HDTV Video Recognition

Kosuke MIZUNO, Hiroki NOGUCHI, Guangji HE, Yosuke TERACHI, Tetsuya KAM ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 448-457
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.448

JOURNAL RESTRICTED ACCESS

This paper describes a SIFT (Scale Invariant Feature Transform) descriptor generation engine which features a VLSI-oriented SIFT algorithm, a three-stage pipelined architecture, and novel systolic array architectures for Gaussian filtering and key-point extraction. The ROI-based scheme has been employed for the VLSI-oriented algorithm. The novel systolic array architecture drastically reduces the number of operation cycles and memory accesses. The cycle count of the Gaussian filtering module is reduced by 82% compared with the SIMD architecture. The numbers of memory accesses of the Gaussian filtering module and the key-point extraction module are reduced by 99.8% and 66%, respectively, compared with the results obtained assuming the SIMD architecture. The proposed schemes provide processing capability for HDTV resolution video (1920 × 1080 pixels) at 30 frames per second (fps). The test chip has been fabricated in 65nm CMOS technology and occupies 4.2 × 4.2mm², containing 1.1M gates and 1.38Mbit of on-chip memory. The measured data demonstrate 38.2mW power consumption at 78MHz and 1.2V.

VLSI Architecture of GMM Processing and Viterbi Decoder for 60,000-Word Real-Time Continuous Speech Recognition

Hiroki NOGUCHI, Kazuo MIURA, Tsuyoshi FUJINAGA, Takanobu SUGAHARA, Hir ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 458-467
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.458

JOURNAL RESTRICTED ACCESS

We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel Viterbi architecture, and pipeline operation between Viterbi transition and GMM processing. Results show that our architecture achieves 88.24% required frequency reduction (66.74MHz) and 84.04% memory bandwidth reduction (549.91MB/s) for real-time 60-k word continuous speech recognition.

A Novel Cache Replacement Policy via Dynamic Adaptive Insertion and Re-Reference Prediction

Xi ZHANG, Chongmin LI, Zhenyu LIU, Haixia WANG, Dongsheng WANG, Takesh ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 468-476
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.468

JOURNAL RESTRICTED ACCESS

Previous research shows that the LRU replacement policy is not efficient when applications exhibit a distant re-reference interval. Recently, the RRIP policy was proposed to improve performance for such workloads. However, the lack of access recency information in RRIP prevents the replacement policy from making accurate predictions. To enhance the robustness of RRIP for recency-friendly workloads, we propose a Dynamic Adaptive Insertion and Re-reference Prediction (DAI-RRP) policy which evicts data based on both the re-reference prediction value and the access recency information. DAI-RRP adaptively adjusts the insertion position and prediction value for different access patterns, which makes the policy robust across different workloads and different phases. Simulation results show that DAI-RRP outperforms LRU and RRIP. For a single-core processor with a 1MB 16-way set last-level cache (LLC), DAI-RRP reduces CPI over LRU and Dynamic RRIP (DRRIP) by an average of 8.1% and 2.7%, respectively. Evaluations on a quad-core CMP with a 4MB shared LLC show that DAI-RRP outperforms LRU and DRRIP on the weighted speedup metric by an average of 8.1% and 15.7%, respectively. Furthermore, compared to LRU, DAI-RRP consumes similar hardware for a 16-way cache, or even less hardware for a high-associativity cache. In summary, the proposed policy is practical and can be easily integrated into existing hardware approximations of LRU.

A Dynamic Continuous Signature Monitoring Technique for Reliable Microprocessors

Makoto SUGIHARA

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 477-486
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.477

JOURNAL RESTRICTED ACCESS

Reliability issues such as soft errors and NBTI (negative bias temperature instability) have become a matter of concern as integrated circuits continue to shrink. It is becoming increasingly important to take reliability requirements into account even for consumer products. This paper presents a dynamic continuous signature monitoring (DCSM) technique for highly reliable computer systems. The DCSM technique dynamically generates reference signatures as well as runtime ones while executing a program. Unlike conventional static continuous signature monitoring techniques, the DCSM technique stores the generated signatures in a signature table, which is a small storage circuit in a microprocessor, and thereby saves the program or data memory space that would otherwise store the signatures. Our experiments showed that our DCSM technique protected 1.4-100.0% of executed instructions depending on the size of signature tables.

All-Digital On-Chip Monitor for PMOS and NMOS Process Variability Utilizing Buffer Ring with Pulse Counter

Tetsuya IIZUKA, Jaehyun JEONG, Toru NAKURA, Makoto IKEDA, Kunihiro ASA ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 487-494
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.487

JOURNAL RESTRICTED ACCESS

This paper proposes an all-digital process variability monitor which utilizes a simple buffer ring with a pulse counter. The proposed circuit monitors the process variability according to the count of a single pulse which propagates on the buffer ring and the fixed logic level after the pulse vanishes. The proposed circuit has been fabricated in a 65nm CMOS process, and the measurement results demonstrate that the PMOS and NMOS variabilities can be monitored independently using the proposed monitoring circuit. The proposed monitoring technique is suitable not only for on-chip process variability monitoring but also for in-field monitoring of aging effects such as negative/positive bias temperature instability (NBTI/PBTI).

A Continuous-Time Waveform Monitoring Technique for On-Chip Power Noise Measurements in VLSI Circuits

Yoji BANDO, Satoshi TAKAYA, Toru OHKAWA, Toshiharu TAKARAMOTO, Toshio ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 495-503
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.495

JOURNAL RESTRICTED ACCESS

A continuous-time waveform monitoring technique for quality on-chip power noise measurements features matched probing performance among a variety of voltage domains of interest in a VLSI circuit, covering digital Vdd, analog Vdd, as well as at Vss, and multiple probing capability at various locations on power planes. A calibration flow eliminates the offset as well as gain errors among probing channels. The consistency of waveforms acquired by the proposed continuous-time monitoring and sampled-time precise digitization techniques is ensured. A 90-nm CMOS on-chip monitor prototype demonstrates dynamic power supply noise measurements with ±200mV at 2.5V, 1.0V, and 0.0V, respectively, with less than 4mV deviation among 240 probing channels.

Stochastic Non-homogeneous Arnoldi Method for Analysis of On-Chip Power Grid Networks under Process Variations

Zhihua GUI, Fan YANG, Xuan ZENG

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 504-510
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.504

JOURNAL RESTRICTED ACCESS

In this paper, a Stochastic Non-Homogeneous ARnoldi (SNHAR) method is proposed for the analysis of the on-chip power grid networks in the presence of process variations. In SNHAR method, the polynomial chaos based stochastic method is employed to handle the variations of power grids. Different from the existing StoEKS method which uses extended Krylov Subspace (EKS) method to compute the coefficients of the polynomial chaos, a computation-efficient and numerically stable Non-Homogeneous ARnoldi (NHAR) method is employed in SNHAR method to compute the coefficients of the polynomial chaos. Compared with EKS method, NHAR method has superior numerical stability and can achieve remarkably higher accuracy with even lower computational cost. As a result, SNHAR can capture the stochastic characteristics of the on-chip power grid networks with higher accuracy, but even lower computational cost than StoEKS.

On-Chip Resonant Supply Noise Canceller Utilizing Parasitic Capacitance of Sleep Blocks for Power Mode Switch

Jinmyoung KIM, Toru NAKURA, Hidehiro TAKATA, Koichiro ISHIBASHI, Makot ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 511-519
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.511

JOURNAL RESTRICTED ACCESS

This paper presents an on-chip resonant supply noise canceller utilizing the parasitic capacitance of sleep blocks. The test chip was fabricated in a 0.18µm CMOS process, and measurement results show 43.3% and 12.5% supply noise reduction for abrupt supply voltage switching and abrupt wake-up of a sleep block, respectively. The proposed method requires 1.5% area overhead for four 100k-gate blocks, which is 7.1X more efficient in noise reduction than the conventional decap for the same power supply noise, while achieving a 47% improvement in settling time. These results make fast power mode switching possible for dynamic voltage scaling and power gating.

Short Term Cell-Flipping Technique for Mitigating SNM Degradation Due to NBTI

Yuji KUNITAKE, Toshinori SATO, Hiroto YASUURA

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 520-529
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.520

JOURNAL RESTRICTED ACCESS

Negative Bias Temperature Instability (NBTI) is one of the major reliability problems in advanced technologies. NBTI causes a threshold voltage shift in a PMOS transistor. When the PMOS transistor is negatively biased, the threshold voltage shifts negatively. On the other hand, the threshold voltage recovers if the PMOS transistor is positively biased. In an SRAM cell, NBTI degrades the threshold voltage of the load PMOS transistors. The degradation has an impact on Static Noise Margin (SNM), which is a measure of the read stability of a 6-T SRAM cell. In this paper, we discuss the relationship between NBTI degradation in an SRAM cell and the dynamic stress and recovery condition. There are two important characteristics. One is the stress probability, which is defined as the rate at which the PMOS transistor is negatively biased. The other is the stress and recovery cycle, which is defined as the switching interval of an SRAM value. In our observations, in order to mitigate the NBTI degradation, the stress probability should be small and the stress and recovery cycle should be shorter than 10msec. Based on the observations, we propose a novel cell-flipping technique, which makes the stress probability close to 50%. In addition, we show results of case studies that apply the cell-flipping technique to register files and cache memories.

A Large “Read” and “Write” Margins, Low Leakage Power, Six-Transistor 90-nm CMOS SRAM

Tadayoshi ENOMOTO, Nobuaki KOBAYASHI

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 530-538
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.530

JOURNAL RESTRICTED ACCESS

We developed and applied a new circuit, called the “Self-controllable Voltage Level (SVL)” circuit, to achieve expanded “read” and “write” margins and low leakage power in a 90-nm, 2-kbit, six-transistor CMOS SRAM. At a threshold voltage fluctuation of 6σ, the minimum supply voltage of the newly developed (dvlp.) SRAM for “write” operation was significantly reduced to 0.11V, less than half that of an equivalent conventional (conv.) SRAM. The standby leakage power of the dvlp. SRAM was only 1.17µW, which is 4.64% of that of the conv. SRAM at a supply voltage of 1.0V. Moreover, the maximum operating clock frequency of the dvlp. SRAM was 138MHz, which is 15% higher than that (120MHz) of the conv. SRAM at V_MM of 0.4V. The area overhead was 0.81% of the conv. SRAM area.

Improvement of Read Disturb, Program Disturb and Data Retention by Memory Cell V_TH Optimization of Ferroelectric (Fe)-NAND Flash Memories for Highly Reliable and Low Power Enterprise Solid-State Drives (SSDs)

Teruyoshi HATANAKA, Mitsue TAKAHASHI, Shigeki SAKAI, Ken TAKEUCHI

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 539-547
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.539

JOURNAL RESTRICTED ACCESS

This paper presents an improvement of the memory cell reliability by the memory cell V_TH optimization of the ferroelectric (Fe)-NAND flash memory. The effects of the memory cell V_TH on the reliability of the Fe-NAND flash memory are experimentally analyzed for the first time. The reliability is evaluated by the measured V_TH shift due to the read disturb, program disturb and data retention. Three types of Fe-NAND flash memory cells, a positive, zero and negative V_TH memory cell, are defined on the basis of the memory cell V_TH. The middle of V_TH of programmed and erased states is 1V, 0V and -0.3V in a positive, zero and negative V_TH memory cell, respectively. The V_TH shifts of the positive, zero and negative V_TH memory cells show similar characteristics in the program/erase and the V_PASS and V_PGM disturbs because the external electric field is so high that the internal depolarization field does not affect the V_TH shift. On the other hand, in the data retention, the V_TH shifts of the three types of V_TH memory cells show different characteristics. The reliability of the Fe-NAND flash memory is best optimized in the zero V_TH memory cell. In the proposed zero V_TH Fe-NAND flash memory cell scheme, the measured V_TH shift due to the read disturb, program disturb and data retention decreases by 32%, 24% and 10%, respectively, compared with the conventional positive V_TH Fe-NAND flash memory cell scheme. In contrast, in the negative V_TH memory cell, the V_TH shift during the data retention is 0.49V, which is unacceptably large because of the depolarization field. The conventional positive V_TH memory cell suffers from severe read and program disturbs. The measured results are drastically different from those of the conventional floating-gate NAND flash memory cell, where the negative V_TH memory cell is most suitable in terms of reliability.

A Genuine Power-Gatable Reconfigurable Logic Chip with FeRAM Cells

Masahiro IIDA, Masahiro KOGA, Kazuki INOUE, Motoki AMAGASAKI, Yoshinob ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 548-556
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.548

JOURNAL RESTRICTED ACCESS

An advantage of an RLD (reconfigurable logic device) such as an FPGA (field programmable gate array) is that it can be customized after being manufactured. Due to aggressive technology scaling, device density is increasing, and power consumption has accordingly become a serious problem. In SoCs for embedded systems, power gating is one of the major power reduction techniques. However, it is difficult to adopt in SRAM-based RLDs because of the high overhead and the volatility of SRAM. In this paper, we describe a TEG (test element group) chip of a reconfigurable logic device based on FeRAM (ferroelectric random access memory) technology. FeRAM gives reconfigurable logic devices the advantage of being genuinely power-gatable. The chip employs an island-style routing architecture and uses a variable grain logic cell as a logic block. An NV-FF (non-volatile flip-flop), which contains FeRAM, an FF, and power-gating control circuits, is used as both the configuration memories and the FFs in a logic block. The NV-FF can transfer data between the FeRAM and the FF automatically when the power source is turned off/on. Thus, chip-level power gating is possible. The hibernate/restore time is less than 1ms. The chip has 18 × 18 logic blocks and an area of 54.76mm².

A Single-Chip RF Tuner/OFDM Demodulator for Mobile Digital TV Application

Yoshimitsu TAKAMATSU, Ryuichi FUJIMOTO, Tsuyoshi SEKINE, Takaya YASUDA ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 557-566
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.557

JOURNAL RESTRICTED ACCESS

This paper presents a single-chip RF tuner/OFDM demodulator for a mobile digital TV application called “1-segment broadcasting.” To achieve required performances for the single-chip receiver, a tunable technique for a low-noise amplifier (LNA) and spurious suppression techniques are proposed in this paper. Firstly, to receive all channels from 470MHz to 770MHz and to relax distortion characteristics of following circuit blocks such as an RF variable-gain amplifier and a mixer, a tunable technique for the LNA is proposed. Then, to improve the sensitivity, spurious signal suppression techniques are also proposed. The single-chip receiver using the proposed techniques is fabricated in 90nm CMOS technology and total die size is 3.26mm × 3.26mm. Using the tunable LNA and suppressing undesired spurious signals, the sensitivities of less than -98.6dBm are achieved for all the channels.

A 500 Mb/s Differential Input Non-coherent BPSK Receiver for UWB-IR Communication

Mohiuddin HAFIZ, Nobuo SASAKI, Takamaro KIKKAWA

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 567-574
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.567

JOURNAL RESTRICTED ACCESS

A differential input non-coherent BPSK receiver for the UWB-IR communication, based on threshold detection, has been presented in this paper. The chip can recover BPSK modulated Gaussian monocycle pulses (GMP), along with its first derivative, at a data rate of 500Mb/s. No clock reception is required, as the receiver recovers data based on the relative phase of the two simultaneously received inputs. While retrieving the data, it consumes a power of 63mW from a supply voltage of 1.8V. A shunt-peaked narrow band amplifier, matched to the input antenna, is used to amplify the received GMP. Wireless data have been successfully recovered using a pair of horn antennas at a distance of 6cm. The chip, developed in a 180nm CMOS technology, occupies a die area of 3.4mm². The receiver is suitable for the non-coherent (self-synchronized) UWB-IR communication.

A 66-dBc Fundamental Suppression Frequency Doubler IC for UWB Sensor Applications

Jiangtao SUN, Qing LIU, Yong-Ju SUH, Takayuki SHIBATA, Toshihiko YOSHI ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 575-581
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.575

JOURNAL RESTRICTED ACCESS

A balanced push-push frequency doubler has been demonstrated in 0.25-µm SOI (Silicon on Insulator) SiGe BiCMOS technology operating from 22GHz to 29GHz with high fundamental frequency suppression and high conversion gain. A series LC resonator circuit is connected in parallel with the differential outputs of the doubler core circuit. The LC resonator is effective to improve the fundamental frequency suppression. In addition, the LC resonator works as a matching circuit between the output of the doubler core and the input of the output buffer amplifier, which increases the conversion gain of the whole circuit. A measured fundamental frequency suppression of greater than 46dBc is achieved at an input power of -10dBm in the output frequency band of 22-29GHz. Moreover, maximum fundamental frequency suppression of 66dBc is achieved at an input frequency of 13GHz and an input power of -10dBm. The frequency doubler works at a supply voltage of 3.3V.
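
As a rough illustration of why a balanced push-push topology suppresses the fundamental, the sketch below sums the outputs of two identical, idealized nonlinear devices driven in anti-phase. The polynomial device model, its coefficients, and the helper names are assumptions made for illustration and are not taken from the paper; the LC resonator and the output buffer described in the abstract are not modeled.

```python
import cmath
import math

def tone_amplitude(samples, cycles):
    """Amplitude of the component that completes `cycles` periods over the record (single-bin DFT)."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * cycles * k / n) for k, s in enumerate(samples))
    return 2 * abs(acc) / n

def device(v, a1=1.0, a2=0.5):
    # Hypothetical memoryless device: a linear term plus a second-order term.
    return a1 * v + a2 * v * v

N = 256
# One full period of the fundamental drive signal.
drive = [0.1 * math.cos(2 * math.pi * k / N) for k in range(N)]

single = [device(v) for v in drive]                 # one device alone
push_push = [device(v) + device(-v) for v in drive] # two devices in anti-phase, outputs summed

fund_single = tone_amplitude(single, 1)    # fundamental from a single device
fund_pp = tone_amplitude(push_push, 1)     # fundamental after summing (cancels in this ideal model)
second_pp = tone_amplitude(push_push, 2)   # second harmonic after summing (adds)
```

In this ideal model the odd-order terms cancel exactly and only the even-order terms remain. A real circuit has device mismatch, which is one reason a measured suppression figure such as the quoted 46 to 66dBc is finite.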

An Injection-Controlled 10-Gb/s Burst-Mode CDR Circuit for a 1G/10G PON System

Hiroaki KATSURAI, Hideki KAMITSUNA, Hiroshi KOIZUMI, Jun TERADA, Yusuk ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 582-588
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.582

JOURNAL RESTRICTED ACCESS

As a future passive optical network (PON) system, the 10 Gigabit Ethernet PON (10G-EPON) has been standardized in IEEE 802.3av. As conventional Gigabit Ethernet PON (GE-PON) systems have already been widely deployed, 1G/10G co-existence technologies are strongly required for the next system. A gated voltage-controlled-oscillator (G-VCO)-based 10-Gb/s burst-mode clock and data recovery (CDR) circuit is presented for a 1G/10G co-existence PON system. It employs two new circuits to improve jitter transfer and provide tolerance to 1G/10G operation. An injection-controlled jitter-reduction circuit reduces output-clock jitter by 7dB from 200-MHz input data jitter while keeping a short lock time of 20ns. A frequency-variation compensation circuit reduces frequency mismatch among the three VCOs on the chip and offers large tolerance to consecutive identical digits (CIDs). With the compensation, the proposed CDR circuit can employ multiple VCOs, which provide tolerance to the 1G/10G co-existence situation. It achieves error-free (bit-error rate <10^-12) operation for 10G bursts following bursts of other rates, including 1G bursts. It also tolerates a 256-bit sequence without a transition in the data, which is more than enough for the 65-bit CIDs in the 64B/66B code of 10 Gigabit Ethernet.

Device Modeling Techniques for High-Frequency Circuits Design Using Bond-Based Design at over 100 GHz

Ryuichi FUJIMOTO, Kyoya TAKANO, Mizuki MOTOYOSHI, Uroschanit YODPRASIT ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 589-597
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.589

JOURNAL RESTRICTED ACCESS

Device modeling techniques for high-frequency circuits operating at over 100GHz are presented. We have proposed the bond-based design as an accurate high-frequency circuit design method. Because layout parasitic extractions (LPE) are not required in the bond-based design, it can be applied to high-frequency circuit design at over 100GHz. However, customized device models are indispensable for the bond-based design. In this paper, device modeling techniques for high-frequency circuit design using the bond-based design are proposed. Customized device models for MOSFETs, transmission lines and pads are introduced. By using the customized device models, the difference between the simulated and measured gains of an amplifier is reduced to less than 0.6dB at 120GHz.

0.18-V Input Charge Pump with Forward Body Bias to Startup Boost Converter for Energy Harvesting Applications

Po-Hung CHEN, Koichi ISHIDA, Xin ZHANG, Yasuyuki OKUMA, Yoshikatsu RYU ...

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 598-604
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.598

JOURNAL RESTRICTED ACCESS

In this paper, a 0.18-V input three-stage charge pump circuit applying forward body bias is proposed for energy harvesting applications. In the developed charge pump, all the MOSFETs are forward body biased by using the inter-stage/output voltages. By applying the proposed charge pump as the startup circuit in the boost converter, the kick-up input voltage of the boost converter is reduced to 0.18V. To verify the circuit characteristics, the conventional zero body bias charge pump and the proposed forward body bias charge pump were fabricated in a 65nm CMOS process. The measured output current of the proposed charge pump under a 0.18-V input voltage is increased by 170% compared with the conventional one at an output voltage of 0.5V. In addition, the boost converter successfully boosts the 0.18-V input to an output higher than 0.65V.

An Energy Efficiency 4-bit Multiplier with Two-Phase Non-overlap Clock Driven Charge Recovery Logic

Yimeng ZHANG, Leona OKAMURA, Tsutomu YOSHIHARA

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 605-612
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.605

JOURNAL RESTRICTED ACCESS

A novel charge-recovery logic structure called Pulse Boost Logic (PBL) is proposed in this paper. PBL is a high-speed, low-energy-dissipation charge-recovery logic with a dual-rail evaluation tree structure. It is driven by a 2-phase non-overlap clock and requires no DC power supply. PBL belongs to the boost logic family, which includes boost logic, enhanced boost logic and subthreshold boost logic. In this paper, PBL is compared with other charge-recovery logic technologies. To demonstrate the performance of the PBL structure, a 4-bit pipeline multiplier is designed and fabricated in a 0.18µm CMOS process technology. The simulation results indicate that the 4-bit multiplier can work at a frequency of 1.8GHz, while the test chip was measured at an operating frequency of 161MHz, where the power dissipation is 772µW.

Dicode Partial Response Signaling over Inductively-Coupled Channel

Koichi YAMAGUCHI, Masayuki MIZUNO

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 613-618
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.613

JOURNAL RESTRICTED ACCESS

A dicode partial response signaling system over an inductively-coupled channel has been developed to achieve data rates higher than the self-resonant frequencies of the inductors. The developed system operates at five times higher data rates than conventional systems with the same inductor. Current-mode equalization in the transmitter, designed in 90-nm CMOS, successfully reshapes waveforms to obtain dicode signals at the receiver. For 5-Gb/s signaling through coupled inductors with a 120-µm diameter and a 120-µm distance, a 20-mV eye opening was observed. The power consumption of the transmitter was 58mW at 5-Gb/s operation.
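
For readers unfamiliar with the term, the sketch below shows the textbook dicode (1 − D) partial response at the symbol level: each output is the difference between the current and the previous bit, giving three levels, and the bits are recovered by accumulating the received symbols. This is only the generic definition. The function names and the initial state are illustrative assumptions, and nothing here models the transmitter equalizer or the inductive channel described in the abstract.

```python
def dicode_encode(bits, initial=0):
    """Textbook dicode response: y[n] = x[n] - x[n-1], with levels in {-1, 0, +1}."""
    prev = initial
    symbols = []
    for b in bits:
        symbols.append(b - prev)
        prev = b
    return symbols

def dicode_decode(symbols, initial=0):
    """Recover the bits by accumulating the symbols: x[n] = x[n-1] + y[n]."""
    prev = initial
    bits = []
    for s in symbols:
        prev = prev + s
        bits.append(prev)
    return bits

bits = [1, 1, 0, 1, 0, 0, 1]
symbols = dicode_encode(bits)       # [1, 0, -1, 1, -1, 0, 1]
recovered = dicode_decode(symbols)  # same as bits
```

A nonzero symbol appears only where the data changes, so runs of identical bits produce zeros.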

A Duobinary Signaling for Asymmetric Multi-Chip Communication

Koichi YAMAGUCHI, Masayuki MIZUNO

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 619-626
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.619

JOURNAL RESTRICTED ACCESS

Duobinary signaling has been introduced into asymmetric multi-chip communications such as DRAM or display interfaces, which allows a controlled amount of ISI to reduce signaling bandwidth by 2/3. A x2 oversampled equalization has been developed to realize Duobinary signaling. Symbol-rate clock recovery from the Duobinary signal has been developed to reduce power consumption for receivers. A Duobinary transmitter test chip was fabricated in a 90-nm CMOS process. A 3.5dB increase in eye height and a 1.5 times increase in eye width were observed.
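
The "controlled amount of ISI" can be pictured with the textbook duobinary (1 + D) response. The sketch below uses the standard precoded form, in which each transmitted level is the sum of the current and previous precoded bits and the receiver decodes each level on its own. It is a generic symbol-level illustration with hypothetical function names, and it is not the equalization or clock-recovery scheme of the paper.

```python
def duobinary_encode(bits):
    """Precoded duobinary: p[n] = x[n] XOR p[n-1]; level[n] = p[n] + p[n-1], in {0, 1, 2}."""
    p_prev = 0  # assumed initial precoder state
    levels = []
    for b in bits:
        p = b ^ p_prev
        levels.append(p + p_prev)
        p_prev = p
    return levels

def duobinary_decode(levels):
    """With precoding, each bit is the received level modulo 2 (no memory needed)."""
    return [lv % 2 for lv in levels]

bits = [1, 1, 0, 1, 0, 0, 1]
levels = duobinary_encode(bits)       # [1, 1, 0, 1, 2, 2, 1]
recovered = duobinary_decode(levels)  # same as bits
```

The precoder is what keeps a single decision error from propagating, since the decoder never looks at earlier symbols.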

A 0.18-µm CMOS X-Band Shock Wave Generator with an On-Chip Dipole Antenna and a Digitally Programmable Delay Circuit for Pulse Beam-Formability

Nguyen Ngoc MAI KHANH, Masahiro SASAKI, Kunihiro ASADA

Article type: PAPER
2011 Volume E94.C Issue 4 Pages 627-634
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.627

JOURNAL RESTRICTED ACCESS

In this paper, we present a 0.18-µm CMOS fully integrated X-band shock wave generator (SWG) with an on-chip dipole antenna and a digitally programmable delay circuit (DPDC) for pulse beam-formability in short-range and hand-held microwave active imaging applications. This chip includes an SWG, a 5-bit DPDC and an on-chip wide-band meandering dipole antenna. By using an integrated transformer, the output pulse of the SWG is sent to the on-chip meandering dipole antenna. The SWG operates based on damping conditions to produce a 0.4-V peak-to-peak (p-p) pulse amplitude at the antenna input terminals in HSPICE simulation. The DPDC is designed to adjust the delays of shock-wave outputs for the purpose of steering beams in antenna array systems. The wide-band dipole antenna element, designed in a meandering shape, is located in the top metal of a 5-metal-layer 0.18-µm CMOS chip. In simulation with Momentum of ADS 2009, the minimum value of the antenna's return loss, S11, and the antenna's bandwidth (BW) are -19.37dB and 25.3GHz, respectively. The measured return loss of a stand-alone integrated meandering dipole is from -26dB to -10dB over a frequency range of 7.5-12GHz. In measurements of the SWG with the integrated antenna, by using a 20-dB standard gain horn antenna placed at a 38-mm distance from the chip's surface, a 1.1-mVp-p shock wave with a 9-11-GHz frequency response is received. A measured 3-ps pulse delay resolution is also obtained. These results show that our proposed circuit is suitable for a fully integrated pulse beam-forming system.

A 500 MS/s 600 µW 300 µm² Single-Stage Gain-Improved and Kickback Noise Rejected Comparator in 0.35 µm 3.3 v CMOS Process

Sarang KAZEMINIA, Morteza MOUSAZADEH, Kayrollah HADIDI, Abdollah KHOEI

Article type: BRIEF PAPER
2011 Volume E94.C Issue 4 Pages 635-640
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.635

JOURNAL RESTRICTED ACCESS

This paper presents a high-speed single-stage latched comparator which is scheduled in time for both amplification and latch operations. Its small active area and simple switching strategy, together with its low power consumption at high comparison rates, qualify the proposed comparator to be repeatedly employed in high-speed flash A/D converters. A strategy of kickback noise elimination besides gain enhancement is also introduced. A low-power holding read-out circuit is presented. Post-layout simulation results confirm a 500MS/s comparison rate with 5mV resolution for a 1.6V peak-to-peak input signal range and 600µW power consumption from a 3.3V power supply, using the TSMC model of 0.35µm CMOS technology. The total active area of the proposed comparator and read-out circuit is about 300µm².

Regular Section

Broadening Adjustable Range on Post-Fabrication Resonance Wavelength Trimming of Long-Period Fiber Gratings and the Mechanisms of Resonance Wavelength Shifts

Fatemeh ABRISHAMIAN, Katsumi MORISHITA

Article type: PAPER
Subject area: Optoelectronics
2011 Volume E94.C Issue 4 Pages 641-647
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.641

JOURNAL RESTRICTED ACCESS

The adjustable range on post-fabrication resonance wavelength trimming of long-period fiber gratings was broadened toward the blue side, and the mechanisms of the resonance wavelength shifts caused by heating were investigated. It can be concluded that the glass structure relaxes more slowly than the residual stress with decreasing heating temperature and the blue shift caused by the residual stress relaxation appears more strongly at the early stage of heating. The blue shift of 41nm was obtained by heating a long-period grating at 600°C for 3500 minutes. The changes of the index difference inducing the wavelength shifts of -41nm and 35nm were estimated at about -1.2 × 10^-4 and +1.0 × 10^-4 by numerical analysis, respectively.

A 7-GHz, Low-Power, Low Phase-Noise Differential Current-Reused VCO Utilizing a Trifilar-Transformer-Feedback Technique

Yan-Ru TSENG, Tzuen-Hsi HUANG, Shang-Hsun WU

Article type: PAPER
Subject area: Microwaves, Millimeter-Waves
2011 Volume E94.C Issue 4 Pages 648-653
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.648

JOURNAL RESTRICTED ACCESS

This paper presents a 7GHz differential current-reused voltage-controlled oscillator (CR-VCO) with low power consumption and low phase noise using 0.18-µm CMOS technology. The output power of this CR-VCO is enhanced by utilizing a trifilar-transformer-feedback technique. The lower phase noise is achieved by the more symmetric voltage swings resulting from the improved balance of switching current. At a 1.5-V DC supply voltage, the power dissipation is only 3.4mW. The total tuning range is 1.4GHz (17.9%) as the tuning voltage ranges from 0V to 1.8V. The optimum phase noise is around -117.3dBc/Hz at a frequency offset of 1MHz from the center frequency of 7.07GHz. The corresponding output power is around -6.8dBm. For the proposed CR-VCO, the calculated figures-of-merit, FOM and FOM_T, are -188.9 and -193.9dBc/Hz, respectively.
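
The abstract does not state how FOM and FOM_T are defined. As a sanity check, the sketch below assumes the commonly used oscillator figures of merit, FOM = L(Δf) − 20·log10(f0/Δf) + 10·log10(P_DC/1mW) and FOM_T = FOM − 20·log10(TR/10%), and plugs in the values quoted above. The definitions and function names are assumptions, not taken from the paper.

```python
import math

def vco_fom(phase_noise_dbc_hz, f0_hz, offset_hz, p_dc_mw):
    """Assumed definition: FOM = L(df) - 20*log10(f0/df) + 10*log10(P_DC / 1 mW)."""
    return phase_noise_dbc_hz - 20 * math.log10(f0_hz / offset_hz) + 10 * math.log10(p_dc_mw)

def vco_fom_t(fom, tuning_range_percent):
    """Assumed definition: FOM_T = FOM - 20*log10(tuning range / 10 %)."""
    return fom - 20 * math.log10(tuning_range_percent / 10)

# Values quoted in the abstract: -117.3 dBc/Hz at 1 MHz offset from 7.07 GHz,
# 3.4 mW power dissipation, 17.9 % tuning range.
fom = vco_fom(-117.3, 7.07e9, 1e6, 3.4)
fom_t = vco_fom_t(fom, 17.9)
```

Under these assumed definitions the results agree with the quoted -188.9 and -193.9dBc/Hz to within about 0.2dB, which suggests the authors used the same or very similar formulas.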

Cascaded Time Difference Amplifier with Differential Logic Delay Cell

Shingo MANDAI, Toru NAKURA, Tetsuya IIZUKA, Makoto IKEDA, Kunihiro ASA ...

Article type: PAPER
Subject area: Electronic Circuits
2011 Volume E94.C Issue 4 Pages 654-662
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.654

JOURNAL RESTRICTED ACCESS

We introduce a 16x cascaded time difference amplifier (TDA) using a differential logic delay cell with 0.18µm CMOS process. By employing the differential logic delay cell in the delay chain instead of the CMOS logic delay cell, less than 8% TD gain offset with ±150ps input range is achieved. The input referred standard deviation of the output time difference error is 2.7ps and the input referred is improved by 17% compared with that of the previous TDA using the CMOS logic delay cell.

A 45-nm 37.3 GOPS/W Heterogeneous Multi-Core SOC with 16/32 Bit Instruction-Set General-Purpose Core

Osamu NISHII, Yoichi YUYAMA, Masayuki ITO, Yoshikazu KIYOSHIGE, Yusuke ...

Article type: PAPER
Subject area: Integrated Electronics
2011 Volume E94.C Issue 4 Pages 663-669
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.663

JOURNAL RESTRICTED ACCESS

We built a 12.4mm × 12.4mm, 45-nm CMOS chip that integrates eight 648-MHz general purpose cores, two matrix processor (MX-2) cores, four flexible engine (FE) cores and media IP (VPU5) to establish a heterogeneous multi-core chip architecture. The general purpose core had its IPC (instructions per cycle) performance enhanced by adding 32-bit instructions to the existing 16-bit fixed-length instruction set and executing up to two 32-bit instructions per cycle. Considering the next five to seven years of embedded LSIs and the increasing number of access masters within an LSI, we predict that the memory usage of a single core will not exceed the 32-bit physical address space (i.e. 4GB), but chip-total memory usage will exceed 4GB. Based on this prediction, the physical address was expanded from 32 bits to 40 bits. The fabricated chip was tested, and parallel operation of eight general purpose cores, four FE cores and eight data transfer units (DTU) was obtained on AAC (Advanced Audio Coding) encode processing.
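
The address-space arithmetic behind the 32-bit to 40-bit expansion is short enough to spell out. The sketch below assumes byte addressing and binary units (1GB taken as 2^30 bytes, matching the "4GB" figure in the abstract); the function name is illustrative.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a physical address of the given width."""
    return 2 ** address_bits

GIB = 2 ** 30  # bytes per binary gigabyte

space_32_gib = addressable_bytes(32) // GIB  # 4
space_40_gib = addressable_bytes(40) // GIB  # 1024, i.e. 1 TiB
growth = addressable_bytes(40) // addressable_bytes(32)  # 256
```

Eight extra address bits therefore enlarge the physical space by a factor of 256.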

A Resistor-Compensation Technique for CMOS Bandgap and Current Reference with Simplified Start-Up Circuit

Guo-Ming SUNG, Ying-Tsu LAI, Chien-Lin LU

Article type: BRIEF PAPER
Subject area: Electronic Circuits
2011 Volume E94.C Issue 4 Pages 670-673
Published: April 01, 2011
Released on J-STAGE: April 01, 2011

DOI: https://doi.org/10.1587/transele.E94.C.670

JOURNAL RESTRICTED ACCESS

This paper presents a resistor-compensation technique for a CMOS bandgap and current reference, which utilizes various high positive temperature coefficient (TC) resistors, a two-stage operational transconductance amplifier (OTA) and a simplified start-up circuit in a 0.35-µm CMOS process. In the proposed bandgap and current reference, numerous compensated resistors, which have a high positive TC, are added to the parasitic n-p-n and p-n-p bipolar junction transistor devices to generate a temperature-independent voltage reference and current reference. The measurements verify a current reference of 735.6nA, a voltage reference of 888.1mV, and a power consumption of 91.28µW at a supply voltage of 3.3V. The voltage TC is 49ppm/°C in the temperature range from 0°C to 100°C and 12.8ppm/°C from 30°C to 100°C. The current TC is 119.2ppm/°C at temperatures of 0°C to 100°C. Measurement results also demonstrate a stable voltage reference at high temperature (> 30°C), and a constant current reference at low temperature (< 70°C).
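
To give a feel for what the quoted voltage TC figures mean in millivolts, the sketch below assumes the common box-method definition, TC = (Vmax − Vmin) / (Vnom · ΔT) · 10^6 ppm/°C, with the 888.1mV reference taken as Vnom. Both the definition and the choice of Vnom are assumptions, since the abstract does not say how the TC was computed.

```python
def tc_ppm_per_c(v_max, v_min, v_nom, t_span_c):
    """Assumed box-method TC in ppm/degC: total spread over nominal value and temperature span."""
    return (v_max - v_min) / (v_nom * t_span_c) * 1e6

def implied_spread(tc_ppm, v_nom, t_span_c):
    """Inverse of the above: total spread (Vmax - Vmin) implied by a given TC."""
    return tc_ppm * 1e-6 * v_nom * t_span_c

V_NOM = 0.8881  # volts, the reference voltage quoted in the abstract

spread_full = implied_spread(49.0, V_NOM, 100.0)  # 0 to 100 degC, about 4.35 mV
spread_hot = implied_spread(12.8, V_NOM, 70.0)    # 30 to 100 degC, about 0.80 mV

# Round trip: the implied spread reproduces the quoted TC.
tc_check = tc_ppm_per_c(V_NOM + spread_full, V_NOM, V_NOM, 100.0)
```

Under these assumptions most of the variation occurs below 30°C, which is consistent with the abstract's remark that the voltage reference is stable at high temperature.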
