Special Section on Circuits and Design Techniques for Advanced Large Scale Integration
-
Kunio UCHIYAMA
2011 Volume E94.C Issue 4 Pages
385
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
-
Koji KAI, Minoru FUJISHIMA
Article type: INVITED PAPER
2011 Volume E94.C Issue 4 Pages
386-393
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
Today, practical semiconductor products are an integral part of our lives and the infrastructure of society, and this trend will continue in the future. New areas of application will expand into medical, environmental, and agriculture (food)-related fields in addition to the conventional information and communication technology (ICT)-related field. Low-cost semiconductor devices with advanced functions have thus far been realized by miniaturization. However, we are now approaching the physical limit of miniaturization, and also, the investment required for new semiconductor manufacturing facilities has become huge. Under such circumstances, we propose an approach based on semiconductor devices called microcube chips and ideas of semiconductor development, i.e., agile integration and “inch-fab.” Our approach is expected to contribute to expanding the range of companies that can fabricate semiconductor devices to include small-size companies, exploring new applications of semiconductor devices, and providing a wide variety of semiconductor devices at a low cost from the semiconductor industry.
-
Toru SHIMIZU, Kazutami ARIMOTO, Osamu NISHII, Sugako OTANI, Hiroyuki K ...
Article type: INVITED PAPER
2011 Volume E94.C Issue 4 Pages
394-400
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
Various low-power technologies have been developed and applied to LSIs from the standpoint of device and circuit design. Today, many more CPU cores and function IPs are integrated on a single-chip LSI. Therefore, not only device and circuit low-power technologies but also software power control technologies are becoming more important for reducing the active power of application systems. This paper overviews the low-power technologies and defines a power management platform as a combination of hardware functions and a software programming interface. It also discusses the importance of the power management platform and the direction of its development.
-
Tianruo ZHANG, Chen LIU, Minghui WANG, Satoshi GOTO
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
401-410
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
This paper proposes a region-of-interest (ROI) based H.264 encoder and the VLSI architecture of the ROI detection algorithm. In an ROI-based video coding system, the pre-processing unit that detects ROIs should introduce only a low computational complexity overhead because of the low power requirement. The macroblocks (MBs) in ROIs are detected sequentially in the same order as H.264 encoding to satisfy the MB-level pipelining of the ROI detector and the H.264 encoder. ROI detection is performed in a novel estimation-and-verification process with an ROI contour template. The proposed architecture can be configured to detect either a single ROI or multiple ROIs in each frame, and the throughput of the single detection mode is 5.5 times that of the multiple detection mode. In the single and multiple detection modes, 98.01% and 97.89% of the MBs in ROIs can be detected, respectively. The hardware cost of the proposed architecture is only 4.68k gates. The detection speed is 753fps for CIF format video at an operating frequency of 200MHz in multiple detection mode, with a power consumption of 0.47mW. Compared with previous fast ROI detection algorithms for video coding applications, the proposed architecture obtains more accurate and smaller ROIs. Therefore, more efficient ROI-based optimization of computational complexity and compression efficiency can be implemented in an H.264 encoder.
-
Yibo FAN, Xiaoyang ZENG, Satoshi GOTO
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
411-418
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
Integer Motion Estimation (IME) accounts for much of the computation in an H.264/AVC video encoder. The 2-D SAD tree IME architecture provides very high performance for the encoder and has been used in many video codec designs. This paper proposes an optimized hardware design of the 2-D SAD tree IME. Firstly, a new hardware architecture is proposed to reduce on-chip memory size. Secondly, a new search pattern is proposed to fully use memory bandwidth and reduce external memory access. Thirdly, the data-path is redesigned, and the performance is greatly improved. For comparison with other IME designs, an IME design supporting D1 size at 30fps with a search range of [±32, ±32] is implemented. The hardware cost of this design is 118 KGates and 8Kb of SRAM, and the maximum clock frequency is 200MHz. Compared to the original 2-D SAD tree IME, our design saves 87.5% of on-chip memory and achieves three times the performance of the original. Our design provides a new way to design a low-cost, high-performance IME for H.264/AVC encoders.
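As background for the abstract above: integer motion estimation scores each candidate displacement by the sum of absolute differences (SAD) between the current block and a displaced block of the reference frame. The following is a minimal software sketch of exhaustive SAD matching, not the 2-D SAD tree hardware or the search pattern the paper proposes. The function names, the frame layout (lists of rows) and the bounds handling are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Return (best_sad, (dx, dy)) over all in-bounds integer displacements
    within +/- search_range. Hypothetical helper, not the paper's design."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

A hardware SAD tree evaluates the inner two loops of `sad` in parallel with an adder tree; the sketch only shows what is being computed.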
-
Gang HE, Dajiang ZHOU, Jinjia ZHOU, Tianruo ZHANG, Satoshi GOTO
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
419-427
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
Intra coding in H.264/AVC significantly enhances video compression efficiency. However, due to the high data dependency of intra prediction in H.264, the applicability of both pipelining and parallel processing techniques is limited. Moreover, it is difficult to achieve high hardware utilization and throughput because of the long block/MB-level reconstruction loops. This paper proposes a high-performance intra prediction architecture that supports the H.264/AVC high profile. The proposed MB/block co-reordering avoids data dependency and improves pipeline utilization. Therefore, the timing constraint of real-time 4096 × 2160 encoding can be met with negligible quality loss. A 16 × 16 prediction engine and an 8 × 8 prediction engine work in parallel for prediction and coefficient generation. A reordering interlaced reconstruction is also designed for the fully pipelined architecture. It takes only 160 cycles to process one macroblock (MB). Hardware utilization of the prediction and reconstruction modules is almost 100%. Furthermore, a PE-reusable 8 × 8 intra predictor and a hybrid SAD & SATD mode decision are proposed to save hardware cost. The design is implemented in 90nm CMOS technology with 113.2k gates and can encode 4096 × 2160 video sequences at 60fps with an operating frequency of 332MHz.
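The figures quoted in the abstract above (160 cycles per MB, 4096 × 2160 at 60fps, 332MHz) can be cross-checked with a short calculation. This is a back-of-the-envelope sketch, assuming 16 × 16 macroblocks processed back to back with no additional per-frame overhead; the function name is hypothetical.

```python
def required_clock_hz(width, height, fps, cycles_per_mb, mb_size=16):
    """Clock frequency needed to process every macroblock of every frame,
    assuming macroblocks are processed back to back (no stall cycles)."""
    mbs_per_row = -(-width // mb_size)    # ceiling division
    mbs_per_col = -(-height // mb_size)
    mbs_per_frame = mbs_per_row * mbs_per_col
    return mbs_per_frame * fps * cycles_per_mb


freq = required_clock_hz(4096, 2160, 60, 160)
print(f"{freq / 1e6:.3f} MHz")  # prints "331.776 MHz"
```

Under these assumptions the frame holds 256 × 135 = 34,560 macroblocks, and the required clock of about 331.8MHz is consistent with the 332MHz operating frequency stated in the abstract.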
-
Yiqing HUANG, Xiaocong JIN, Jin ZHOU, Jia SU, Takeshi IKENAGA
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
428-438
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
A high profile intra predictor generation engine is proposed in this paper. Firstly, a hardware-level algorithm optimization for intra 8 × 8 (I8MB) mode is introduced. The original candidate pixels for generating prediction samples of I8MB are replaced with boundary pixels of intra 4 × 4 (I4MB) blocks. With this substitution, full data reuse between predictors of I4MB and filtered samples of I8MB can be achieved with almost no quality loss. Secondly, a lossless parallel predictor generation flow based on two 4 × 4 blocks is proposed. The original predictor generation flow is optimized from 16 stages to 10 stages for I4MB and Intra 16 × 16 (I16MB), which saves 37.5% of processing cycles. For I8MB, a similar methodology with a different processing order of 4 × 4 scaled blocks is introduced. Thirdly, fully utilized hardwired engines for I4MB, I16MB and I8MB are proposed. Except for the DC (direct current) and plane modes, full data reuse among all intra modes of the high profile can be achieved. Fourthly, for DC mode, a combined predictor generation process is introduced, and predictor generation of I16MB's DC mode is merged into the process of I4MB's DC mode. Moreover, by configuring the proposed hardwired engines, predictor generation of I16MB's plane mode and chrominance plane mode can be accomplished with only 50% of the cycles of the original design. In total, compared with the original full-mode design and the latest dynamic mode reused design, the proposed predictor generation engine achieves 89.5% and 73.2% savings in processing cycles, respectively. Synthesized in TSMC 0.18µm technology under worst-case operating conditions (1.62V, 125°C), with 380MHz and 37.2k gates, the proposed design can handle real-time high profile intra predictor generation of Super Hi-Vision 4k × 4k@60fps. The maximum operating frequency of our design under worst-case conditions is 468MHz.
-
Jinjia ZHOU, Dajiang ZHOU, Gang HE, Satoshi GOTO
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
439-447
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
In this paper, we present a cache-based motion compensation (MC) architecture for a Quad-HD H.264/AVC video decoder. With the significantly increased throughput requirement, VLSI design for MC is greatly challenged by the huge area cost and power consumption. Moreover, the long memory system latency leads to a performance drop in the MC pipeline. To solve these problems, three optimization schemes are proposed in this work. Firstly, a high-performance interpolator based on Horizontal-Vertical Expansion and Luma-Chroma Parallelism (HVE-LCP) is proposed to efficiently increase the processing throughput to more than 4 times that of previous designs. Secondly, an efficient cache memory organization scheme (4Sx4) is adopted to improve the on-chip memory utilization, which contributes to a memory area saving of 25% and a memory power saving of 39∼49%. Finally, by employing a Split Task Queue (STQ) architecture, the cache system is capable of tolerating much longer memory system latency. Consequently, the cache idle time is reduced by 90%, which contributes to reducing the overall processing time by 24∼40%. When implemented in an SMIC 90nm process, this design costs a logic gate count of 108.8k and on-chip memory of 3.1kB. The proposed MC architecture can support real-time processing of 3840x2160@60fps at less than 166MHz.
-
Kosuke MIZUNO, Hiroki NOGUCHI, Guangji HE, Yosuke TERACHI, Tetsuya KAM ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
448-457
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
This paper describes a SIFT (Scale Invariant Feature Transform) descriptor generation engine which features a VLSI-oriented SIFT algorithm, a three-stage pipelined architecture and novel systolic array architectures for Gaussian filtering and key-point extraction. An ROI-based scheme is employed for the VLSI-oriented algorithm. The novel systolic array architecture drastically reduces the number of operation cycles and memory accesses. The cycle count of the Gaussian filtering module is reduced by 82% compared with the SIMD architecture. The numbers of memory accesses of the Gaussian filtering module and the key-point extraction module are reduced by 99.8% and 66%, respectively, compared with the results obtained assuming the SIMD architecture. The proposed schemes provide processing capability for HDTV resolution video (1920 × 1080 pixels) at 30 frames per second (fps). The test chip has been fabricated in 65nm CMOS technology and occupies 4.2 × 4.2mm², containing 1.1M gates and 1.38Mbit of on-chip memory. The measured data demonstrate 38.2mW power consumption at 78MHz and 1.2V.
-
Hiroki NOGUCHI, Kazuo MIURA, Tsuyoshi FUJINAGA, Takanobu SUGAHARA, Hir ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
458-467
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel Viterbi architecture, and pipeline operation between Viterbi transition and GMM processing. Results show that our architecture achieves 88.24% required frequency reduction (66.74MHz) and 84.04% memory bandwidth reduction (549.91MB/s) for real-time 60-k word continuous speech recognition.
-
Xi ZHANG, Chongmin LI, Zhenyu LIU, Haixia WANG, Dongsheng WANG, Takesh ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
468-476
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
Previous research shows that the LRU replacement policy is not efficient when applications exhibit a distant re-reference interval. The RRIP policy was recently proposed to improve performance for such workloads. However, the lack of access recency information in RRIP prevents the replacement policy from making accurate predictions. To enhance the robustness of RRIP for recency-friendly workloads, we propose a Dynamic Adaptive Insertion and Re-reference Prediction (DAI-RRP) policy which evicts data based on both the re-reference prediction value and the access recency information. DAI-RRP adaptively adjusts the insertion position and prediction value for different access patterns, which makes the policy robust across different workloads and different phases. Simulation results show that DAI-RRP outperforms LRU and RRIP. For a single-core processor with a 1MB 16-way set last-level cache (LLC), DAI-RRP reduces CPI over LRU and Dynamic RRIP by an average of 8.1% and 2.7% respectively. Evaluations on a quad-core CMP with a 4MB shared LLC show that DAI-RRP outperforms LRU and Dynamic RRIP (DRRIP) on the weighted speedup metric by an average of 8.1% and 15.7% respectively. Furthermore, compared to LRU, DAI-RRP requires similar hardware for a 16-way cache, or even less hardware for a high-associativity cache. In summary, the proposed policy is practical and can be easily integrated into existing hardware approximations of LRU.
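For readers unfamiliar with the baseline that the abstract above builds on, the sketch below models one cache set under static RRIP with hit-priority promotion, as that policy is commonly described: each block carries a small re-reference prediction value (RRPV), hits reset it to 0, new blocks are inserted one step below the maximum, and the victim is a block at the maximum. This is not the DAI-RRP policy proposed in the paper, whose insertion and recency logic the abstract does not specify; the class and method names are illustrative assumptions.

```python
class SRRIPSet:
    """One cache set managed by static RRIP with hit-priority promotion.
    Illustrative baseline only; not the paper's DAI-RRP policy."""

    def __init__(self, ways, rrpv_bits=2):
        self.max_rrpv = (1 << rrpv_bits) - 1       # "distant" re-reference
        self.tags = [None] * ways
        self.rrpv = [self.max_rrpv] * ways

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then filled)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0    # predict near-immediate reuse
            return True
        # Miss: age every block until at least one reaches the distant value.
        while self.max_rrpv not in self.rrpv:
            self.rrpv = [v + 1 for v in self.rrpv]
        victim = self.rrpv.index(self.max_rrpv)
        self.tags[victim] = tag
        self.rrpv[victim] = self.max_rrpv - 1      # insert with a "long" interval
        return False
```

In a 4-way set, accessing A, B, A, B and then scanning five new blocks C to G leaves A and B resident, whereas LRU would have evicted both. That scan resistance is the property RRIP targets; the abstract's point is that the missing recency information can hurt recency-friendly workloads.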
-
Makoto SUGIHARA
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
477-486
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
Reliability issues such as soft errors and NBTI (negative bias temperature instability) have become a matter of concern as integrated circuits continue to shrink. It is becoming increasingly important to take reliability requirements into account even for consumer products. This paper presents a dynamic continuous signature monitoring (DCSM) technique for highly reliable computer systems. The DCSM technique dynamically generates reference signatures as well as runtime ones while executing a program. Unlike conventional static continuous signature monitoring techniques, the DCSM technique stores the generated signatures in a signature table, which is a small storage circuit in a microprocessor, and thereby saves the program or data memory space that would otherwise store the signatures. Our experiments showed that our DCSM technique protected 1.4-100.0% of executed instructions depending on the size of signature tables.
-
Tetsuya IIZUKA, Jaehyun JEONG, Toru NAKURA, Makoto IKEDA, Kunihiro ASA ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
487-494
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
This paper proposes an all-digital process variability monitor which utilizes a simple buffer ring with a pulse counter. The proposed circuit monitors the process variability according to the count of a single pulse which propagates on the buffer ring and the fixed logic level after the pulse vanishes. The proposed circuit has been fabricated in a 65nm CMOS process, and the measurement results demonstrate that the PMOS and NMOS variabilities can be monitored independently using the proposed monitoring circuit. The proposed monitoring technique is suitable not only for on-chip process variability monitoring but also for in-field monitoring of aging effects such as negative/positive bias temperature instability (NBTI/PBTI).
-
Yoji BANDO, Satoshi TAKAYA, Toru OHKAWA, Toshiharu TAKARAMOTO, Toshio ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
495-503
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
A continuous-time waveform monitoring technique for quality on-chip power noise measurements features matched probing performance among a variety of voltage domains of interest in a VLSI circuit, covering digital Vdd, analog Vdd, as well as at Vss, and multiple probing capability at various locations on power planes. A calibration flow eliminates the offset as well as gain errors among probing channels. The consistency of waveforms acquired by the proposed continuous-time monitoring and sampled-time precise digitization techniques is ensured. A 90-nm CMOS on-chip monitor prototype demonstrates dynamic power supply noise measurements with ±200mV at 2.5V, 1.0V, and 0.0V, respectively, with less than 4mV deviation among 240 probing channels.
-
Zhihua GUI, Fan YANG, Xuan ZENG
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
504-510
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
In this paper, a Stochastic Non-Homogeneous ARnoldi (SNHAR) method is proposed for the analysis of the on-chip power grid networks in the presence of process variations. In SNHAR method, the polynomial chaos based stochastic method is employed to handle the variations of power grids. Different from the existing StoEKS method which uses extended Krylov Subspace (EKS) method to compute the coefficients of the polynomial chaos, a computation-efficient and numerically stable Non-Homogeneous ARnoldi (NHAR) method is employed in SNHAR method to compute the coefficients of the polynomial chaos. Compared with EKS method, NHAR method has superior numerical stability and can achieve remarkably higher accuracy with even lower computational cost. As a result, SNHAR can capture the stochastic characteristics of the on-chip power grid networks with higher accuracy, but even lower computational cost than StoEKS.
-
Jinmyoung KIM, Toru NAKURA, Hidehiro TAKATA, Koichiro ISHIBASHI, Makot ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
511-519
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
This paper presents an on-chip resonant supply noise canceller utilizing the parasitic capacitance of sleep blocks. The test chip was fabricated in a 0.18µm CMOS process, and measurement results show 43.3% and 12.5% supply noise reduction on abrupt supply voltage switching and on the abrupt wake-up of a sleep block, respectively. The proposed method requires 1.5% area overhead for four 100k-gate blocks, which makes it 7.1 times more efficient in noise reduction than a conventional decap for the same power supply noise, while achieving a 47% improvement in settling time. These results make fast switching of power modes possible for dynamic voltage scaling and power gating.
-
Yuji KUNITAKE, Toshinori SATO, Hiroto YASUURA
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
520-529
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
Negative Bias Temperature Instability (NBTI) is one of the major reliability problems in advanced technologies. NBTI causes a threshold voltage shift in a PMOS transistor. When the PMOS transistor is negatively biased, the threshold voltage shifts negatively. On the other hand, the threshold voltage recovers if the PMOS transistor is positively biased. In an SRAM cell, NBTI degrades the threshold voltage of the load PMOS transistors. The degradation has an impact on the Static Noise Margin (SNM), which is a measure of the read stability of a 6-T SRAM cell. In this paper, we discuss the relationship between NBTI degradation in an SRAM cell and the dynamic stress and recovery condition. There are two important characteristics. One is the stress probability, which is defined as the rate at which the PMOS transistor is negatively biased. The other is the stress and recovery cycle, which is defined as the switching interval of an SRAM value. In our observations, in order to mitigate the NBTI degradation, the stress probability should be small and the stress and recovery cycle should be shorter than 10msec. Based on the observations, we propose a novel cell-flipping technique, which makes the stress probability close to 50%. In addition, we show the results of case studies that apply the cell-flipping technique to register files and cache memories.
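To illustrate why flipping the stored polarity drives the stress probability toward 50%, here is a toy model, not the paper's technique or schedule. It assumes that one load PMOS of the cell is under negative bias whenever the cell physically holds 1 (the other load PMOS is stressed whenever it holds 0), and it reads "stress probability" as the fraction of time steps spent under stress. The function name and the fixed `flip_period` schedule are hypothetical.

```python
def stress_probability(bits, flip_period=None):
    """Fraction of time steps during which the load PMOS that is stressed
    when the cell physically holds 1 is under negative bias.

    bits:        logical value held by the cell at each time step (0 or 1)
    flip_period: if given, the stored polarity is inverted during every
                 second block of `flip_period` steps (assumed schedule)
    """
    stressed = 0
    for t, logical in enumerate(bits):
        inverted = flip_period is not None and (t // flip_period) % 2 == 1
        physical = logical ^ 1 if inverted else logical
        stressed += physical
    return stressed / len(bits)


always_zero = [0] * 1000
print(stress_probability(always_zero))                  # prints 0.0
print(stress_probability(always_zero, flip_period=10))  # prints 0.5
```

Without flipping, a cell that always holds 0 leaves this transistor unstressed but keeps its complement stressed 100% of the time; with periodic flipping both transistors see about 50%. The model ignores recovery dynamics and the 10msec cycle constraint mentioned in the abstract.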
-
Tadayoshi ENOMOTO, Nobuaki KOBAYASHI
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
530-538
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
We developed and applied a new circuit, called the “Self-controllable Voltage Level (SVL)” circuit, to achieve expanded “read” and “write” margins and low leakage power in a 90-nm, 2-kbit, six-transistor CMOS SRAM. At the threshold voltage fluctuation of 6σ, the minimum supply voltage of the newly developed (dvlp.) SRAM for “write” operation was significantly reduced to 0.11V, less than half that of an equivalent conventional (conv.) SRAM. The standby leakage power of the dvlp. SRAM was only 1.17µW, which is 4.64% of that of the conv. SRAM at a supply voltage of 1.0V. Moreover, the maximum operating clock frequency of the dvlp. SRAM was 138MHz, which is 15% higher than that (120MHz) of the conv. SRAM at VMM of 0.4V. The area overhead was 0.81% of that of the conv. SRAM.
-
Teruyoshi HATANAKA, Mitsue TAKAHASHI, Shigeki SAKAI, Ken TAKEUCHI
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
539-547
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
This paper presents an improvement of the memory cell reliability by the memory cell VTH optimization of the ferroelectric (Fe)-NAND flash memory. The effects of the memory cell VTH on the reliability of the Fe-NAND flash memory are experimentally analyzed for the first time. The reliability is evaluated by the measured VTH shift due to the read disturb, program disturb and data retention. Three types of Fe-NAND flash memory cells, a positive, zero and negative VTH memory cell, are defined on the basis of the memory cell VTH. The middle of VTH of programmed and erased states is 1V, 0V and -0.3V in a positive, zero and negative VTH memory cell, respectively. The VTH shifts of the positive, zero and negative VTH memory cells show similar characteristics in the program/erase and the VPASS and VPGM disturbs because the external electric field is so high that the internal depolarization field does not affect the VTH shift. On the other hand, in the data retention, the VTH shifts of the three types of VTH memory cells show different characteristics. The reliability of the Fe-NAND flash memory is best optimized in the zero VTH memory cell. In the proposed zero VTH Fe-NAND flash memory cell scheme, the measured VTH shift due to the read disturb, program disturb and data retention decreases by 32%, 24% and 10%, respectively, compared with the conventional positive VTH Fe-NAND flash memory cell scheme. In contrast, in the negative VTH memory cell, the VTH shift during the data retention is 0.49V, which is unacceptably large because of the depolarization field. The conventional positive VTH memory cell suffers from severe read and program disturbs. The measured results are drastically different from those of the conventional floating-gate NAND flash memory cell, where the negative VTH memory cell is most suitable in terms of reliability.
-
Masahiro IIDA, Masahiro KOGA, Kazuki INOUE, Motoki AMAGASAKI, Yoshinob ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
548-556
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
An advantage of an RLD (reconfigurable logic device) such as an FPGA (field programmable gate array) is that it can be customized after being manufactured. Due to aggressive technology scaling, device density is increasing, and power consumption has accordingly become a serious problem. In SoCs for embedded systems, power gating is one of the major power reduction techniques. However, it is difficult to adopt in SRAM-based RLDs because of the high overhead and the volatility of SRAM. In this paper, we describe a TEG (test element group) chip of reconfigurable logic based on FeRAM (ferroelectric random access memory) technology. FeRAM gives reconfigurable logic devices the advantage of genuine power gating. The chip employs an island-style routing architecture and uses a variable grain logic cell as a logic block. An NV-FF (non-volatile flip-flop), which contains FeRAM, an FF, and power-gating control circuits, is used as both configuration memory and FFs in a logic block. The NV-FF can transfer data between the FeRAM and the FF automatically when the power source is turned off/on. Thus chip-level power gating is possible. The hibernate/restore time is less than 1ms. The chip has 18 × 18 logic blocks and an area of 54.76mm².
-
Yoshimitsu TAKAMATSU, Ryuichi FUJIMOTO, Tsuyoshi SEKINE, Takaya YASUDA ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
557-566
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
This paper presents a single-chip RF tuner/OFDM demodulator for a mobile digital TV application called “1-segment broadcasting.” To achieve required performances for the single-chip receiver, a tunable technique for a low-noise amplifier (LNA) and spurious suppression techniques are proposed in this paper. Firstly, to receive all channels from 470MHz to 770MHz and to relax distortion characteristics of following circuit blocks such as an RF variable-gain amplifier and a mixer, a tunable technique for the LNA is proposed. Then, to improve the sensitivity, spurious signal suppression techniques are also proposed. The single-chip receiver using the proposed techniques is fabricated in 90nm CMOS technology and total die size is 3.26mm × 3.26mm. Using the tunable LNA and suppressing undesired spurious signals, the sensitivities of less than -98.6dBm are achieved for all the channels.
-
Mohiuddin HAFIZ, Nobuo SASAKI, Takamaro KIKKAWA
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
567-574
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
A differential input non-coherent BPSK receiver for the UWB-IR communication, based on threshold detection, has been presented in this paper. The chip can recover BPSK modulated Gaussian monocycle pulses (GMP), along with its first derivative, at a data rate of 500Mb/s. No clock reception is required, as the receiver recovers data based on the relative phase of the two simultaneously received inputs. While retrieving the data, it consumes a power of 63mW from a supply voltage of 1.8V. A shunt-peaked narrow band amplifier, matched to the input antenna, is used to amplify the received GMP. Wireless data have been successfully recovered using a pair of horn antennas at a distance of 6cm. The chip, developed in a 180nm CMOS technology, occupies a die area of 3.4mm². The receiver is suitable for the non-coherent (self-synchronized) UWB-IR communication.
-
Jiangtao SUN, Qing LIU, Yong-Ju SUH, Takayuki SHIBATA, Toshihiko YOSHI ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
575-581
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
A balanced push-push frequency doubler has been demonstrated in 0.25-µm SOI (Silicon on Insulator) SiGe BiCMOS technology operating from 22GHz to 29GHz with high fundamental frequency suppression and high conversion gain. A series LC resonator circuit is connected in parallel with the differential outputs of the doubler core circuit. The LC resonator is effective to improve the fundamental frequency suppression. In addition, the LC resonator works as a matching circuit between the output of the doubler core and the input of the output buffer amplifier, which increases the conversion gain of the whole circuit. A measured fundamental frequency suppression of greater than 46dBc is achieved at an input power of -10dBm in the output frequency band of 22-29GHz. Moreover, maximum fundamental frequency suppression of 66dBc is achieved at an input frequency of 13GHz and an input power of -10dBm. The frequency doubler works at a supply voltage of 3.3V.
-
Hiroaki KATSURAI, Hideki KAMITSUNA, Hiroshi KOIZUMI, Jun TERADA, Yusuk ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
582-588
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
As a future passive optical network (PON) system, the 10 Gigabit Ethernet PON (10G-EPON) has been standardized in IEEE 802.3av. As conventional Gigabit Ethernet PON (GE-PON) systems have already been widely deployed, 1G/10G co-existence technologies are strongly required for the next system. A gated voltage-controlled-oscillator (G-VCO)-based 10-Gb/s burst-mode clock and data recovery (CDR) circuit is presented for a 1G/10G co-existence PON system. It employs two new circuits to improve jitter transfer and provide tolerance to 1G/10G operation. An injection-controlled jitter-reduction circuit reduces output-clock jitter by 7dB from 200-MHz input data jitter while keeping a short lock time of 20ns. A frequency-variation compensation circuit reduces frequency mismatch among the three VCOs on the chip and offers large tolerance to consecutive identical digits. With the compensation, the proposed CDR circuit can employ multi VCOs, which provide tolerance to the 1G/10G co-existence situation. It achieves error-free (bit-error rate <10⁻¹²) operation for 10-G bursts following bursts of other rates, obviously including 1G bursts. It also provides tolerance to a 256-bit sequence without a transition in the data, which is more than enough tolerance for 65-bit CIDs in the 64B/66B code of 10 Gigabit Ethernet.
-
Ryuichi FUJIMOTO, Kyoya TAKANO, Mizuki MOTOYOSHI, Uroschanit YODPRASIT ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
589-597
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
Device modeling techniques for high-frequency circuits operating at over 100GHz are presented. We have proposed the bond-based design as an accurate high-frequency circuit design method. Because layout parasitic extraction (LPE) is not required in the bond-based design, it can be applied to high-frequency circuit design at over 100GHz. However, customized device models are indispensable for the bond-based design. In this paper, device modeling techniques for high-frequency circuit design using the bond-based design are proposed. Customized device models for MOSFETs, transmission lines and pads are introduced. By using the customized device models, the difference between the simulated and measured gains of an amplifier is reduced to less than 0.6dB at 120GHz.
-
Po-Hung CHEN, Koichi ISHIDA, Xin ZHANG, Yasuyuki OKUMA, Yoshikatsu RYU ...
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
598-604
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
In this paper, a 0.18-V input three-stage charge pump circuit applying forward body bias is proposed for energy harvesting applications. In the developed charge pump, all the MOSFETs are forward body biased by using the inter-stage/output voltages. By applying the proposed charge pump as the startup circuit in a boost converter, the kick-up input voltage of the boost converter is reduced to 0.18V. To verify the circuit characteristics, the conventional zero body bias charge pump and the proposed forward body bias charge pump were fabricated in a 65nm CMOS process. The measured output current of the proposed charge pump under a 0.18-V input voltage is increased by 170% compared with the conventional one at an output voltage of 0.5V. In addition, the boost converter successfully boosts the 0.18-V input to an output higher than 0.65V.
-
Yimeng ZHANG, Leona OKAMURA, Tsutomu YOSHIHARA
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
605-612
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
A novel charge-recovery logic structure called Pulse Boost Logic (PBL) is proposed in this paper. PBL is a high-speed, low-energy-dissipation charge-recovery logic with a dual-rail evaluation tree structure. It is driven by a two-phase non-overlapping clock and requires no DC power supply. PBL belongs to the boost logic family, which includes boost logic, enhanced boost logic, and subthreshold boost logic. In this paper, PBL is compared with other charge-recovery logic technologies. To demonstrate the performance of the PBL structure, a 4-bit pipeline multiplier is designed and fabricated in a 0.18µm CMOS process technology. The simulation results indicate that the 4-bit multiplier can work at a frequency of 1.8GHz, while the test chip was measured at an operating frequency of 161MHz, at which the power dissipation is 772µW.
-
Koichi YAMAGUCHI, Masayuki MIZUNO
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
613-618
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
A dicode partial-response signaling system over an inductively coupled channel has been developed to achieve data rates higher than the self-resonant frequencies of the inductors. The developed system operates at data rates five times higher than those of conventional systems with the same inductor. Current-mode equalization in the transmitter, designed in 90-nm CMOS, successfully reshapes waveforms to obtain dicode signals at the receiver. For 5-Gb/s signaling through coupled inductors with a 120-µm diameter at a 120-µm distance, a 20-mV eye opening was observed. The power consumption of the transmitter was 58mW at 5-Gb/s operation.
-
Koichi YAMAGUCHI, Masayuki MIZUNO
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
619-626
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
Duobinary signaling, which allows a controlled amount of intersymbol interference (ISI) to reduce signaling bandwidth by 2/3, has been introduced into asymmetric multi-chip communications such as DRAM or display interfaces. A x2 oversampled equalization has been developed to realize Duobinary signaling. Symbol-rate clock recovery from the Duobinary signal has been developed to reduce power consumption for receivers. A Duobinary transmitter test chip was fabricated in a 90-nm CMOS process. A 3.5dB increase in eye height and a 1.5 times increase in eye width were observed.
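As a rough illustration of the "controlled amount of ISI" mentioned in this abstract, the following minimal Python sketch shows textbook duobinary (1 + D) coding with a differential precoder. It is not taken from the paper: the function names, the initial precoder state, and the use of precoding at all are assumptions chosen for illustration, and the sketch ignores the oversampled equalization and clock recovery the authors describe.

```python
def duobinary_encode(bits):
    """Textbook duobinary encoder (illustrative, not the paper's circuit).

    Precode with p[k] = b[k] XOR p[k-1], map 0/1 to -1/+1, then add each
    symbol to the previous one (the 1 + D response, i.e. controlled ISI).
    """
    prev_bit = 0        # assumed initial precoder state
    prev_symbol = -1    # symbol corresponding to that initial state
    levels = []
    for b in bits:
        prev_bit ^= b                      # differential precoding
        symbol = 2 * prev_bit - 1          # 0 -> -1, 1 -> +1
        levels.append(symbol + prev_symbol)  # three levels: -2, 0, +2
        prev_symbol = symbol
    return levels


def duobinary_decode(levels):
    """With precoding, a zero level decodes to 1 and a +/-2 level to 0."""
    return [1 if level == 0 else 0 for level in levels]


data = [1, 0, 1, 1, 0, 0, 1]
tx_levels = duobinary_encode(data)
rx_bits = duobinary_decode(tx_levels)
```

Because of the precoder, each received level can be decoded on its own, without knowledge of the previous symbol, which is the usual reason for pairing precoding with a partial-response code.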
-
Nguyen Ngoc MAI KHANH, Masahiro SASAKI, Kunihiro ASADA
Article type: PAPER
2011 Volume E94.C Issue 4 Pages
627-634
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
In this paper, we present a 0.18-µm CMOS fully integrated X-band shock wave generator (SWG) with an on-chip dipole antenna and a digitally programmable delay circuit (DPDC) for pulse beam-formability in short-range and hand-held microwave active imaging applications. This chip includes an SWG, a 5-bit DPDC, and an on-chip wide-band meandering dipole antenna. By using an integrated transformer, the output pulse of the SWG is sent to the on-chip meandering dipole antenna. The SWG operates based on damping conditions to produce a 0.4-V peak-to-peak (p-p) pulse amplitude at the antenna input terminals in HSPICE simulation. The DPDC is designed to adjust the delays of shock-wave outputs for the purpose of steering beams in antenna array systems. The wide-band dipole antenna element, designed in a meandering shape, is located in the top metal of a 5-metal-layer 0.18-µm CMOS chip. In simulation with Momentum in ADS 2009, the minimum value of the antenna's return loss, S11, and the antenna's bandwidth (BW) are -19.37dB and 25.3GHz, respectively. The measured return loss of a stand-alone integrated meandering dipole ranges from -26dB to -10dB over the 7.5-12GHz frequency range. In measurements of the SWG with the integrated antenna, using a 20-dB standard gain horn antenna placed at a 38-mm distance from the chip's surface, a 1.1-mVp-p shock wave with a 9-11-GHz frequency response is received. A measured 3-ps pulse delay resolution is also obtained. These results show that our proposed circuit is suitable for fully integrated pulse beam-forming systems.
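To give a feel for the numbers quoted in this abstract, the short Python sketch below converts a 5-bit delay code into a delay and an equivalent phase shift. It assumes, purely for illustration, that the DPDC code maps linearly to delay in steps equal to the reported 3-ps resolution; the abstract does not state the mapping, and the function names and the 10-GHz evaluation frequency (the middle of the reported 9-11-GHz response) are likewise hypothetical.

```python
def dpdc_delay_ps(code, step_ps=3.0, bits=5):
    """Delay for a given code, ASSUMING a linear code-to-delay mapping."""
    if not 0 <= code < 2 ** bits:
        raise ValueError("code out of range for the given bit width")
    return code * step_ps


def phase_shift_deg(delay_ps, freq_ghz):
    """Phase shift in degrees that a time delay represents at one frequency."""
    cycles = delay_ps * 1e-3 * freq_ghz   # ps * GHz = 1e-3 cycles
    return 360.0 * cycles


max_delay_ps = dpdc_delay_ps(31)                 # largest 5-bit code
step_phase_deg = phase_shift_deg(3.0, 10.0)      # one 3-ps step at 10 GHz
```

Under these assumptions the full code range spans 93 ps, slightly less than one 100-ps period at 10 GHz, and a single step corresponds to roughly 11 degrees of phase.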
-
Sarang KAZEMINIA, Morteza MOUSAZADEH, Kayrollah HADIDI, Abdollah KHOEI
Article type: BRIEF PAPER
2011 Volume E94.C Issue 4 Pages
635-640
Published: April 01, 2011
Released on J-STAGE: April 01, 2011
JOURNAL
RESTRICTED ACCESS
This paper presents a high-speed single-stage latched comparator that is scheduled in time for both amplification and latch operations. A small active area and a simple switching strategy, together with desirable power consumption at high comparison rates, qualify the proposed comparator to be repeatedly employed in high-speed flash A/D converters. A strategy for kickback noise elimination together with gain enhancement is also introduced. A low-power holding read-out circuit is presented. Post-layout simulation results confirm a 500MS/s comparison rate with 5mV resolution for a 1.6V peak-to-peak input signal range and 600µW power consumption from a 3.3V power supply, using the TSMC model of 0.35µm CMOS technology. The total active area of the proposed comparator and read-out circuit is about 300µm².