Special Section on Solid-State Circuit Design - Architecture, Circuit, Device and Design Methodology
-
Masahiko YOSHIMOTO
2012 Volume E95.C Issue 4 Pages
413
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
-
Kiyoshi TAKEUCHI
Article type: INVITED PAPER
2012 Volume E95.C Issue 4 Pages
414-420
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
As MOS transistors are scaled down, the impact of randomly placed discrete charge (impurity atoms, traps and surface states) on device characteristics rapidly increases. Significant variability caused by random dopant fluctuation (RDF) is a direct result of this, which urges the adoption of new device architectures (ultra-thin body SOI FETs and FinFETs) which do not use impurity for body doping. Variability caused by traps and surface states, such as random telegraph noise (RTN), though less significant than RDF today, will soon be a major problem. The increased complexity of such residual-charge-induced variability due to non-Gaussian and time-dependent behavior will necessitate new approaches for variation-aware design.
-
Shiro DOSHO
Article type: INVITED PAPER
2012 Volume E95.C Issue 4 Pages
421-431
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Along with the miniaturization of CMOS-LSIs, control methods for LSIs have been extensively developed. The predominant method is to digitize observed values as early as possible and to use digital control. Thus, many types of analog-to-digital converters (ADCs) have been developed, such as temperature, time, delay, and frequency converters. ADCs are the easiest circuits into which digital correction methods can be introduced because their outputs are digital. Various calibration methods have been developed, which have markedly improved the figures of merit by relaxing the margins required for device variations. These calibration and correction methods not only overcome a circuit's weak points but also give us the chance to develop entirely new circuit topologies and systems. In this paper, several digital calibration and correction methods for major analog-to-digital converters are described, such as pipelined ADCs, delta-sigma ADCs, and successive approximation ADCs.
-
Koyo NITTA, Hiroe IWASAKI, Takayuki ONISHI, Takashi SANO, Atsushi SAGA ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
432-440
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
An H.264/AVC encoder LSI (named “SARA”) that supports the High422 profile, as well as the 422 profile of MPEG-2, has been developed for HDTV broadcasting infrastructures. It contains three motion estimation and compensation (ME/MC) engines with wide search ranges of -217.75 to +199.75 horizontally and -109.75 to +145.75 vertically, which can utilize almost all H.264/AVC ME/MC coding tools, such as multiple reference frames, variable block size, quarter-pel prediction, macroblock adaptive field/frame prediction (MBAFF), spatial/temporal direct mode, and weighted prediction. Our evaluations show that it can encode fast-moving scenes with 1.2dB to 1.7dB higher quality than the JM. It was successfully fabricated in a 90-nm technology and integrates 140 million transistors.
-
Weiwei SHEN, Yibo FAN, Xiaoyang ZENG
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
441-446
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
In this paper, a high-throughput deblocking filter is presented for the H.264/AVC standard, catering to video applications with 4K × 2K (4096 × 2304) ultra-high-definition resolution. In order to strengthen the parallelism without simply increasing the area, we propose a luma-chroma parallel method. Meanwhile, this work reduces the number of processing cycles, the amount of external memory traffic and the working frequency by using triple four-stage pipeline filters and a luma-chroma interlaced sequence. Furthermore, it eliminates most unnecessary off-chip memory bandwidth with a highly reusable memory scheme, and adopts a “slide window” buffer scheme. As a result, our design can support 4K × 2K at 30fps applications at a working frequency of only 70.8MHz.
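As a rough check of the throughput figures quoted above, the average cycle budget per macroblock can be worked out from the resolution, frame rate and clock frequency. The sketch below assumes 16 × 16 macroblocks (the H.264/AVC macroblock size) and a single filter clock domain; the function name is illustrative and is not taken from the paper.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock at a given frame rate."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    mbs_per_second = mbs_per_frame * fps
    return clock_hz / mbs_per_second


# Figures quoted in the abstract: 4096 x 2304 at 30 fps, 70.8 MHz clock.
budget = cycles_per_macroblock(4096, 2304, 30, 70.8e6)
print(f"{budget:.1f} cycles per macroblock")
```

Under these assumptions the budget comes out at roughly 64 cycles per macroblock, which gives a feel for how much parallelism the filter needs.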
-
Yibo FAN, Jialiang LIU, Dexue ZHANG, Xiaoyang ZENG, Xinhua CHEN
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
447-455
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Fidelity Range Extension (FRExt) (i.e. High Profile) was added to the H.264/AVC recommendation in the second version. One of the features included in FRExt is the Adaptive Block-size Transform (ABT). In order to conform to the FRExt, a Fractional Motion Estimation (FME) architecture is proposed to support the 8×8/4×4 adaptive Hadamard Transform (8×8/4×4 AHT). The 8×8/4×4 AHT circuit contributes to higher throughput and encoding performance. In order to increase the utilization of SATD (Sum of Absolute Transformed Difference) Generator (SG) in unit time, the proposed architecture employs two 8-pel interpolators (IP) to time-share one SG. These two IPs can work in turn to provide the available data continuously to the SG, which increases the data throughput and significantly reduces the cycles that are needed to process one Macroblock. Furthermore, this architecture also exploits the linear feature of Hadamard Transform to generate the quarter-pel SATD. This method could help to shorten the long datapath in the second-step of two-iteration FME algorithm. Finally, experimental results show that this architecture could be used in the applications requiring different performances by adjusting the supported modes and operation frequency. It can support the real-time encoding of the seven-mode 4K×2K@24fps or six-mode 4K×2K@30fps video sequences.
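The linearity that the abstract refers to can be illustrated with a small, unnormalized 4×4 SATD in plain Python. This is a sketch of the general property only (the transform of a sum equals the sum of the transforms); how the proposed architecture actually derives the quarter-pel SATD is not described here, and the helper names are hypothetical.

```python
# 4x4 Hadamard matrix (symmetric, entries +1/-1).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def hadamard4x4(block):
    """2-D Hadamard transform H * block * H (H4 is its own transpose)."""
    return matmul4(matmul4(H4, block), H4)


def satd4x4(residual):
    """Unnormalized sum of absolute transformed differences."""
    return sum(abs(v) for row in hadamard4x4(residual) for v in row)


def add4(a, b):
    """Element-wise sum of two 4x4 blocks."""
    return [[a[i][j] + b[i][j] for j in range(4)] for i in range(4)]


# Two hypothetical residual blocks.
res_a = [[1, 2, 3, 4], [0, -1, 2, 1], [3, 3, 0, -2], [1, 0, 0, 5]]
res_b = [[2, 0, -1, 1], [4, 1, 1, 0], [-3, 2, 2, 2], [0, 1, -1, 1]]

# Linearity: transforming the sum equals summing the transforms.
lhs = hadamard4x4(add4(res_a, res_b))
rhs = add4(hadamard4x4(res_a), hadamard4x4(res_b))
print(lhs == rhs)
```

Note that the absolute-value sum itself is not linear; only the transform stage is, and any rounding in sub-pel interpolation would make the relationship approximate.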
-
Kazuhiro NAKAMURA, Ryo SHIMAZAKI, Masatoshi YAMAMOTO, Kazuyoshi TAKAGI ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
456-467
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
This paper presents a memory-efficient VLSI architecture for output probability computations (OPCs) of continuous hidden Markov models (HMMs) and likelihood score computations (LSCs). These computations are the most time consuming part of HMM-based isolated word recognition systems. We demonstrate multiple fast store-based block parallel processing (MultipleFastStoreBPP) for OPCs and LSCs and present a VLSI architecture that supports it. Compared with conventional fast store-based block parallel processing (FastStoreBPP) and stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and less processing time. The processing elements (PEs) used in the FastStoreBPP and StreamBPP architectures are identical to those used in the MultipleFastStoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows that the proposed architecture is an improvement over the others, through efficient use of PEs and registers for storing input feature vectors.
-
Mitsuru SHIOZAKI, Kota FURUHASHI, Takahiko MURAYAMA, Akitaka FUKUSHIMA ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
468-477
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Silicon Physical Unclonable Functions (PUFs) have been proposed to exploit inherent characteristics caused by process variations, such as transistor size and threshold voltage, and to realize inexpensive and tamper-resistant device functions such as IC identification, authentication and key generation. We have focused on the arbiter-PUF, which utilizes the relative delay-time difference between equivalent paths. The conventional arbiter-PUF suffers from low uniqueness caused by non-uniformity in response generation. To enhance the uniqueness, a novel arbiter-based PUF utilizing the Response Generation according to the Delay Time Measurement (RG-DTM) scheme has been proposed. In the conventional arbiter-PUF, the response 0 or 1 is assigned according to a single threshold of the relative delay-time difference. In contrast, in the RG-DTM PUF the response 0 or 1 is assigned according to multiple thresholds of the relative delay-time difference. The conventional and RG-DTM PUFs were designed and fabricated with 0.18µm CMOS technology. The Hamming distances (HDs) between different chips, which indicate the uniqueness, were calculated from 256-bit responses to identical challenges on each chip. The ideal distribution of HDs, which indicates high uniqueness, is achieved in the RG-DTM PUF using 16 thresholds of relative delay-time differences. The generative stability, which is the fluctuation of responses in the same environment, and the environmental stability, which is the change of responses in different environments, were also evaluated. There is a trade-off between high uniqueness and high stability; however, the experimental data show that the RG-DTM PUF has a far smaller false matching probability in identification than the conventional PUF.
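The uniqueness metric mentioned above, the Hamming distance between responses of different chips to the same challenges, can be sketched as follows. Responses are modelled as Python integers used as bit vectors; the toy 4-bit values and function names are illustrative assumptions, not data from the paper. For unbiased, independent 256-bit responses the ideal mean distance would be 128 bits.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two responses differ."""
    return bin(a ^ b).count("1")


def inter_chip_distances(responses):
    """Hamming distance for every pair of chips (same challenges assumed)."""
    return [hamming_distance(x, y) for x, y in combinations(responses, 2)]


# Toy 4-bit responses from three hypothetical chips.
toy_responses = [0b1010, 0b0110, 0b1111]
print(inter_chip_distances(toy_responses))
```

The same functions work unchanged for 256-bit responses, since Python integers have arbitrary precision.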
-
Changsheng ZHOU, Yuebin HUANG, Shuangqu HUANG, Yun CHEN, Xiaoyang ZENG
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
478-486
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Based on the Turbo-Decoding Message-Passing (TDMP) and Normalized Min-Sum (NMS) algorithms, an area-efficient LDPC decoder that supports both structured and unstructured LDPC codes is proposed in this paper. We introduce a solution to the memory access conflict problem caused by the TDMP algorithm. We also arrange the main timing schedule carefully to handle the operations of our solution while avoiding much additional hardware consumption. To reduce the memory bits needed, the extrinsic message storing strategy is also optimized. In addition, the extrinsic message recovery and the accumulation operation are merged. To verify our architecture, an LDPC decoder that supports both the China Multimedia Mobile Broadcasting (CMMB) and Digital Terrestrial/Television Multimedia Broadcasting (DTMB) standards is developed using the SMIC 0.13µm standard CMOS process. The core area is 4.75mm² and the maximum operating clock frequency is 200MHz. The estimated power consumption is 48.4mW at 25MHz for CMMB and 130.9mW at 50MHz for DTMB with 5 iterations and 1.2V supply.
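For readers unfamiliar with the Normalized Min-Sum algorithm named above, a minimal sketch of its check-node update is given below. It is the textbook form of the update, not the decoder's hardware datapath; the normalization factor of 0.75 is an assumed, commonly used value and is not taken from the paper.

```python
def nms_check_node(messages, alpha=0.75):
    """Normalized min-sum check-node update.

    For each edge, the outgoing magnitude is alpha times the minimum
    magnitude of the *other* incoming messages, and the outgoing sign is
    the product of the other messages' signs. Needs at least two inputs.
    """
    out = []
    for i in range(len(messages)):
        others = messages[:i] + messages[i + 1:]
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * alpha * min(abs(m) for m in others))
    return out


# Four hypothetical incoming log-likelihood ratios.
print(nms_check_node([2.0, -4.0, 1.0, -3.0]))
```

A hardware implementation would typically track only the two smallest magnitudes and the overall sign product rather than recomputing per edge as done here for clarity.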
-
Hirofumi IWATO, Keishi SAKANUSHI, Yoshinori TAKEUCHI, Masaharu IMAI
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
487-494
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
To measure the detrusor pressure for diagnosing lower urinary tract symptoms, we designed a small-area and low-power System on a Chip (SoC). The SoC should be small and low power because it is encapsulated in tiny air-tight capsules which are simultaneously inserted in the urinary bladder and rectum for several days. Since the SoC is also required to be programmable, we designed an Application Specific Instruction set Processor (ASIP) for pressure measurement and wireless communication, and implemented almost all required functions on the ASIP. The SoC was fabricated using a 0.18µm CMOS mixed-signal process and the chip size is 2.5×2.5mm². Evaluation results show that the power consumption of the SoC is 93.5µW, and that it can operate the capsule for seven days with a tiny battery.
-
Shouyi YIN, Yang HU, Zhen ZHANG, Leibo LIU, Shaojun WEI
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
495-505
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Hybrid wired/wireless on-chip network is a promising communication architecture for multi-/many-core SoC. For application-specific SoC design, it is important to design a dedicated on-chip network architecture according to the application-specific nature. In this paper, we propose a heuristic wireless link allocation algorithm for creating hybrid on-chip network architecture. The algorithm can eliminate the performance bottleneck by replacing multi-hop wired paths by high-bandwidth single-hop long-range wireless links. The simulation results show that the hybrid on-chip network designed by our algorithm improves the performance in terms of both communication delay and energy consumption significantly.
-
Naohiro HAMADA, Hiroshi SAITO
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
506-515
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
In this paper, we propose a synthesis method for asynchronous circuits with bundled-data implementation. The proposed method iteratively applies behavioral synthesis and floorplanning to obtain a near-optimum circuit in terms of latency under given design constraints. To improve latency, behavioral synthesis and floorplanning are carried out so that the delay of the control circuit is minimized and the addition of delay elements to satisfy timing constraints is minimized. We evaluate the effectiveness of the proposed method in terms of latency, area, and the number of timing violations while synthesizing several benchmarks. Experimental results show that the proposed method synthesizes faster circuits than those synthesized without the proposed method. The proposed method is also effective in reducing the number of timing violations.
-
Jung-Lin YANG, Shin-Nung LU, Pei-Hsuan YU
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
516-522
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Developing a rapid prototyping environment utilizing hardware description languages (HDLs) and conventional FPGAs can help overcome the difficulties caused by the complexity of asynchronous digital systems and recent advances in VLSI technology. We propose a design flow and an FPGA template for implementing generalized C-element (gC) style asynchronous controllers. Utilizing conventional FPGA synthesis tools, self-timed bundled-data function modules can be realized with some effort on timing validation. The proposed design flow with an FPGA-based realization approach is a very effective design methodology for rapid prototyping and functionality validation. This work could be useful for early-stage performance estimation, power reduction exploration, circuit design training, and many other applications involving asynchronous circuits. In this paper, the proposed FPGA-based asynchronous circuit design flow, a hands-on design tutorial, a generalized C-element template, and a list of synthesized benchmark circuits are documented and discussed in detail.
-
Yohei NAKATA, Hiroshi KAWAGUCHI, Masahiko YOSHIMOTO
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
523-533
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
As process technology is scaled down, a typical system on a chip (SoC) becomes denser. In scaled process technology, process variation becomes greater and increasingly affects the SoC circuits. Moreover, the process variation strongly affects network-on-chips (NoCs) that have a synchronous network across the chip. Therefore, its network frequency is degraded. We propose a process-variation-adaptive NoC with a variation-adaptive variable-cycle router (VAVCR). The proposed VAVCR can configure its cycle latency adaptively on a processor core basis, corresponding to the process variation. It can increase the network frequency, which is limited by the process variation in a conventional router. Furthermore, we propose a variable-cycle pipeline adaptive routing (VCPAR) method with VAVCR; the proposed VCPAR can reduce packet latency and has tolerance to network congestion. The total execution time reduction of the proposed VAVCR with VCPAR is 15.7%, on average, for five task graphs.
-
Wei ZHONG, Takeshi YOSHIMURA, Bei YU, Song CHEN, Sheqin DONG, Satoshi ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
534-545
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Network-on-Chips (NoCs) have been proposed as a solution for addressing the global communication challenges in System-on-Chip (SoC) architectures that are implemented in nanoscale technologies. For the use of NoCs to be feasible in today's industrial designs, a custom-tailored, power-efficient NoC topology that satisfies the application characteristics is required. In this work, we present a design methodology that automates the synthesis of such application-specific NoC topologies. We present a method which integrates partitioning into the floorplanning phase to explore optimal clustering of cores during floorplanning with minimized link and switch power consumption. Based on the size of applications, we also present an Integer Linear Programming method and a heuristic method to place switches and network interfaces on the floorplan. Then, a power and timing aware path allocation algorithm is carried out to determine the connectivity across different switches. We perform experiments on several SoC benchmarks and present a comparison with the latest work. For small applications, the NoC topologies synthesized by our method show large improvements in power consumption (27.54%), hop-count (4%) and running time (66%) on average. For large applications, the synthesized topologies show large improvements in power consumption (31.77%), hop-count (29%) and running time (94.18%) on average.
-
Benjamin DEVLIN, Makoto IKEDA, Kunihiro ASADA
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
546-554
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
A 65nm self synchronous field programmable gate array (SSFPGA) which uses autonomous gate-level power gating with minimal control circuitry overhead for energy minimum operation is presented. The use of self synchronous signaling allows the FPGA to operate at voltages down to 370mV without any parameter tuning. We show both 2.6x total energy reduction and 6.4x performance improvement at the same time for energy minimum operation compared to the non-power gated SSFPGA, and compared to the latest research 1.8x improvement in power-delay product (PDP) and 2x performance improvement. When compared to a synchronous FPGA in a similar process we are able to show up to 84.6x PDP improvement. We also show energy minimum operation for maximum throughput on the power gated SSFPGA is achieved at 0.6V, 27fJ/operation at 264MHz.
-
Akira KOTABE, Kiyoo ITOH, Riichiro TAKEMURA
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
555-563
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
It is shown that it is feasible to apply 0.5-V 6-T SRAM cells in a 25-nm high-speed 1-Gb e-SRAM. In particular, for coping with rapidly reduced voltage margin as VDD is reduced, a boosted word-voltage scheme is first proposed. Second, Vt variations are reduced with repair techniques and nanoscale FD-MOSFETs to further widen the voltage margin. Third, a worst case design is developed, for the first time, to evaluate the cell. This design features a dynamic margin analysis and takes subthreshold current, temperature, and Vt variations and their combination in the cell into account. Fourth, the proposed scheme is evaluated by applying the worst-case design and a 25-nm planar FD-SOI MOSFET. It is consequently found that the scheme provides a wide margin and high speed even at 0.5V. A 0.5-V high-speed 25-nm 1-Gb SRAM is thus feasible. Finally, to further improve the scheme, it is shown that it is necessary to use FinFETs and suppress and compensate process, voltage, and temperature variations in a chip and wafer.
-
Kousuke MIYAJI, Kentaro HONDA, Shuhei TANAKAMARU, Shinji MIYANO, Ken T ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
564-571
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Three types of electron injection scheme are proposed and analyzed comprehensively for 65nm technology node 6T- and 8T-SRAM cells to find the optimum injection scheme and cell architecture: a both-side injection scheme, and self-repair one-side injection schemes of Type A (a single injection) and Type B (two injections). It is found that the read speed degrades by as much as 6.3 times in the 6T-SRAM with the locally injected electrons. However, the read speed of the 8T-SRAM cell does not degrade because the read port is separated from the write pass gate transistors. Furthermore, the self-repair one-side injection scheme is most suitable for resolving the conflict between the half-select disturb and write characteristics. The worst cell characteristics of the Type A and Type B self-repair one-side injection schemes were found to be the same. In the self-repair one-side injection 8T-SRAM, the disturb margin increases by 141% without write margin or read speed degradation. The proposed schemes have no process or area penalty compared with the standard CMOS process.
-
Shusuke YOSHIMOTO, Masaharu TERADA, Shunsuke OKUMURA, Toshikazu SUZUKI ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
572-578
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
This paper presents a novel disturb mitigation scheme which achieves low-energy operation for a deep sub-micron 8T SRAM macro. The classic write-back scheme with a dedicated read port overcame both half-select and read-disturb problems. Moreover, it improved the yield, particularly in the low-voltage range. The conventional scheme, however, consumed more power because of charging and discharging all write bitlines in a sub-block. Our proposed scheme reduces the power overhead of the write-back scheme using a floating write bitline technique and a low-swing bitline driver (LSBD). The floating bitline and the LSBD respectively consist of a precharge-less CMOS equalizer (transmission gate) and an nMOS write-back driver. The voltage on the floating write bitline is at an intermediate voltage between the ground and the supply voltage before a write cycle. The write target cells are written by normal CMOS drivers, whereas the write bitlines in half-selected columns are driven by the LSBDs in the write cycle, which suppresses the write bitline voltage to VDD - Vtn and therefore saves the active power in the half-selected columns (where Vtn is a threshold voltage of an nMOS). In addition, the proposed scheme reduces a leakage current from the write bitline because of the floating write bitline. The active leakage is reduced by 33% at the FF corner, 125°C. The active energy in the write operation is reduced by 37% at the FF corner. In other process corners, more writing power reduction can be expected because it depends on the Vtn in the LSBD. We fabricated a 512-Kb 8T SRAM test chip that operates at a single 0.5-V supply voltage. The test chip with the proposed scheme respectively achieves 1.52-µW/MHz writing energy and 72.8-µW leakage power, which are 59.4% and 26.0% better than those of the conventional write-back scheme. The total energy is 12.9 µW/MHz (12.9 pJ/access) at a supply voltage of 0.5V and operating frequency of 6.25MHz in a 50%-read/50%-write operation.
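The equivalence between µW/MHz and pJ/access quoted above follows from the units, and the corresponding average power at the stated frequency can be checked in a couple of lines. The sketch assumes one access per clock cycle; the function name is illustrative only.

```python
def average_power_uw(energy_pj_per_access, freq_mhz):
    """Average power in microwatts, assuming one access per clock cycle.

    1 pJ x 1 MHz = 1e-12 J x 1e6 1/s = 1e-6 W = 1 uW, so uW/MHz and
    pJ/access are numerically the same quantity.
    """
    return energy_pj_per_access * freq_mhz


# Figures quoted in the abstract: 12.9 pJ/access at 6.25 MHz.
total_uw = average_power_uw(12.9, 6.25)
print(f"{total_uw:.1f} uW")
```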
-
Shunsuke OKUMURA, Hidehiro FUJIWARA, Kosuke YAMAGUCHI, Shusuke YOSHIMO ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
579-585
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
We propose a novel substrate-bias control scheme for an FD-SOI SRAM that suppresses inter-die variability. The proposed circuits detect inter-die threshold-voltage variation automatically, and then supply the substrate bias so as to maximize the read/write margins of memory cells. We confirmed that a 486-kb 6T SRAM operates at 0.42V, in which an FS corner can be compensated by as much as 0.14V or more.
-
Takuya SAWADA, Taku TOSHIKAWA, Kumpei YOSHIKAWA, Hidehiro TAKATA, Koji ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
586-593
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
The susceptibility of a static random access memory (SRAM) core against static and dynamic variation of power supply voltage is evaluated, by using on-chip diagnosis structures of memory built-in self testing (MBIST) and on-chip voltage waveform monitoring (OCM). The SRAM core of interest in this paper is a synthesizable version applicable to general systems-on-a-chip (SoC) design, and fabricated in a 90nm CMOS technology. RF power injection to power supply networks is quantified by OCM. The number of resultant erroneous bits as well as their distribution in the cell array is given by MBIST. The frequency-dependent sensitivity reflects the highly capacitive nature of densely integrated SRAM cells.
-
Akira KOTABE, Riichiro TAKEMURA, Yoshimitsu YANAGAWA, Tomonori SEKIGUC ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
594-599
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
A small-sized leakage-controlled gated sense amplifier (SA) and relevant circuits are proposed for 0.5-V multi-gigabit DRAM arrays. The proposed SA consists of a high-VT PMOS amplifier and a low-VT NMOS amplifier which is composed of high-VT NMOSs and a low-VT cross-coupled NMOS, and achieves 46% area reduction compared to a conventional SA with a low-VT CMOS preamplifier. Separation of the proposed SA and a data-line pair achieves a sensing time of 6ns and a writing time of 0.6ns. Momentarily overdriving the PMOS amplifier achieves a restoring time of 13ns. The gate level control of the high-VT NMOSs and the gate level compensation circuit for PVT variations reduce the leakage current of the proposed SA to 2% of that without the control, and its effectiveness was confirmed using a 50-nm test chip.
-
Satoru AKIYAMA, Riichiro TAKEMURA, Tomonori SEKIGUCHI, Akira KOTABE, K ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
600-608
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
A gated sense amplifier (GSA) consisting of a low-Vt gated preamplifier (LGA) and a high-Vt sense amplifier (SA) is proposed. The gating scheme of the LGA enables quick amplification of an initial cell signal voltage (vS0) because of its low Vt and prevents the cell signal from degrading due to interference noise between data lines. As for a conventional sense amplifier (CSA), this new type of noise causes sensing error, and the noise-generation mechanism was clarified for the first time by analysis of vS0. The high-Vt SA holds the amplified signal and keeps subthreshold current low. Moreover, the gating scheme of the low-Vt MOSFETs in the LGA drives the I/O line quickly. The GSA thus simultaneously achieves fast sensing, low-leakage data holding, and fast I/O driving, even for sub-1-V mid-point sensing. The GSA is promising for future sub-1-V gigabit dynamic random-access memory (DRAM) because of reduced variations in the threshold voltage of MOSFETs; thus, the offset voltage of the LGA is reduced. The effectiveness of the GSA was verified with a 70-nm 512-Mbit DRAM chip. It demonstrated row access time (tRCD) of 16.4ns and read access (tAA) of 14.3ns at array voltage of 0.9V.
-
Kousuke MIYAJI, Ryoji YAJIMA, Teruyoshi HATANAKA, Mitsue TAKAHASHI, Sh ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
609-616
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
An initialize and weak-program erasing scheme is proposed to achieve a high-performance and high-reliability Ferroelectric (Fe-) NAND flash solid-state drive (SSD). Bit-by-bit erase VTH control is achieved by the proposed erasing scheme, and history effects in Fe-NAND are also suppressed. History effects change the future erase VTH shift characteristics according to the past program voltage. The proposed erasing scheme decreases VTH shift variation due to history effects from ±40% to ±2%, and the erase VTH distribution width is reduced from over 0.4V to 0.045V. As a result, the read and VPASS disturbance decrease by 42% and 37%, respectively. The proposed erasing scheme is immune to VTH variations and voltage stress. The proposed erasing scheme also suppresses the power and bandwidth degradation of SSD.
-
Hyoungjun NA, Tetsuo ENDOH
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
617-626
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
In this paper, a theoretical analysis of current-controlled (CC-) MOS current mode logic (MCML) is reported. Furthermore, the circuit performance of the CC-MCML with the auto-detection of threshold voltage (Vth) fluctuation is evaluated. The proposed CC-MCML with the auto-detection of Vth fluctuation automatically suppresses the degradation of circuit performance induced by the Vth fluctuations of the transistors by detecting these fluctuations. When a Vth fluctuation of ±0.1V occurs on the circuit, the cutoff frequency of the circuit is increased from 0Hz to 3.5GHz by using the proposed CC-MCML with the auto-detection of Vth fluctuation.
-
Tetsuya IIZUKA, Kunihiro ASADA
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
627-634
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
This paper proposes an all-digital process variability monitor based on a shared structure of a buffer ring and a ring oscillator. The proposed circuit monitors the PMOS and NMOS process variabilities independently according to the count number of a single pulse which propagates on the ring during the buffer ring mode, and the oscillation period during the ring oscillator mode. Using this shared-ring structure, we reduce the occupied area by about 40% compared with the conventional circuit, without loss of process variability monitoring properties. The proposed shared-ring circuit has been fabricated in a 65nm CMOS process, and the measurement results with two different wafer lots show the feasibility of the proposed process variability monitoring scheme.
-
Hiroki YABE, Makoto IKEDA
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
635-642
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
We present a 3-D range map acquisition system using a gray-encoded time-multiplexing structured pattern. In this method, the only information needed to reconstruct a 3-D range map is whether the pixel is bright or not for each of the exposed structured patterns. A dedicated image sensor to capture the pattern consists of a pixel-parallel 1-bit A/D converter, an in-pixel pattern address memory and a column-parallel digital pattern address readout circuit. This in-pixel memory and digital bit-parallel pattern address readout eliminate unnecessary readout of pattern data to enhance 3-D acquisition speed. We fabricated the image sensor in 0.18µm CMOS and demonstrated 3-D range map acquisition at up to 122 range maps per second for 7 patterns, with an average error of 3.2mm under the condition of 10% pattern recognition error.
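To illustrate how a sequence of bright/dark decisions becomes a pattern address, the sketch below decodes a per-pixel bit sequence under the assumption of a standard reflected binary Gray code with the most significant bit projected first. The sensor's actual encoding and bit order are not given in the abstract, so both are assumptions, as are the function names. With 7 patterns this yields 2^7 = 128 distinguishable addresses.

```python
def gray_to_binary(gray: int) -> int:
    """Convert a reflected-binary Gray code value to plain binary."""
    binary = gray
    shift = gray >> 1
    while shift:
        binary ^= shift
        shift >>= 1
    return binary


def decode_pixel(bits):
    """Decode one pixel's bright(1)/dark(0) sequence, MSB first."""
    gray = 0
    for b in bits:
        gray = (gray << 1) | b
    return gray_to_binary(gray)


# A pixel that was bright only in the first two of seven patterns.
print(decode_pixel([1, 1, 0, 0, 0, 0, 0]))
```

Gray coding is attractive here because adjacent addresses differ in exactly one pattern, so a single misread bit near a stripe boundary shifts the address by only one position.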
-
Jinmyoung KIM, Toru NAKURA, Hidehiro TAKATA, Koichiro ISHIBASHI, Makot ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
643-650
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Switched parasitic capacitors of sleep blocks with a tri-mode power gating structure are implemented to reduce on-chip resonant supply noise in a 1.2V, 65nm standard CMOS process. The tri-mode power gating structure makes it possible to store charge in the parasitic capacitance of the power-gated blocks. The proposed method achieves 53.1% and 57.9% noise reduction for wake-up noise and 130MHz periodic supply noise, respectively. It also realizes noise cancelling without a discharging time before the parasitic capacitors of the sleep blocks are used, and shows an 8.4x boost of the effective capacitance value with 2.1% chip area overhead. The proposed method can save chip area while reducing resonant supply noise more effectively.
-
Kazuo ONO, Yoshimitsu YANAGAWA, Akira KOTABE, Riichiro TAKEMURA, Tatsu ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
651-660
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
A charge-integration read scheme has been developed for a solid-nanopore DNA-sequencer that determines a genome by direct and electrical measurements of transverse tunneling current in single-stranded DNA. The magnitude of the current was simulated with a first-principles molecular dynamics method. It was found that the magnitude is as small as the sub-picoampere range, and that the signals from the four bases exhibit wide distributions that overlap between bases. The distribution is believed to originate from translational and rotational motion of DNA in a nanopore with a frequency of over 10⁵Hz. A sequencing scheme is presented to distinguish the distributed signals. The scheme makes widely distributed signals converge through time integration, by accumulating charge on the capacitance of a nanopore device and read circuits. We estimated that an integration time of 1.4ms is sufficient to obtain a signal difference of over 10mV for distinguishing between each DNA base. Moreover, the time is shortened if paired bases, such as A-T and C-G in double-stranded DNA, can be measured simultaneously with two nanopores. Circuit simulations, which included the capacitance of a nanopore calculated with a device simulator, successfully distinguished between DNA bases in less than 2.0ms. The speed is roughly six orders of magnitude faster than that of a conventional DNA sequencer. It is possible to determine the human genome in one day if 100 nanopores are operated in parallel.
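The one-day claim can be checked with a rough calculation. The sketch below uses the 2.0ms per-base read time and 100 parallel nanopores from the abstract; the genome size of about 3 billion bases is an assumption added here, and overheads such as DNA translocation or re-reads are ignored.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bases_per_day(read_time_s, nanopores):
    """Bases read in one day if every nanopore reads one base per read_time_s."""
    return nanopores * SECONDS_PER_DAY / read_time_s


def hours_to_sequence(genome_bases, read_time_s, nanopores):
    """Hours needed to read genome_bases once, split evenly across nanopores."""
    return genome_bases * read_time_s / nanopores / 3600.0


READ_TIME_S = 2.0e-3          # upper bound per base, from the abstract
NANOPORES = 100               # parallel nanopores, from the abstract
HUMAN_GENOME_BASES = 3.0e9    # assumption: roughly 3 billion bases

throughput = bases_per_day(READ_TIME_S, NANOPORES)
hours = hours_to_sequence(HUMAN_GENOME_BASES, READ_TIME_S, NANOPORES)

if __name__ == "__main__":
    print(f"bases per day: {throughput:.3g}")
    print(f"hours for one pass: {hours:.1f}")
```

Under these assumptions a single pass takes well under 24 hours, which is consistent with the abstract's estimate.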
-
Tetsuya IIZUKA, Satoshi MIURA, Ryota YAMAMOTO, Yutaka CHIBA, Shunichi ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
661-667
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
This paper proposes a sub-ps resolution TDC utilizing a differential pulse-shrinking buffer ring. This scheme uses two differentially operated pulse-shrinking inverters, and the TDC resolution is finely controlled by the transistor size ratio between them. The proposed TDC realizes 9bit, 580fs resolution in a 0.18µm CMOS technology with 0.04mm² area, and achieves DNL and INL of +0.8/-0.8LSB and +4.3/-4.0LSB, respectively, without linearity calibration. The power dissipation at 1.5MS/s ranges from 10.8 to 12.6mW depending on the input time intervals.
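To put the reported figures in more familiar units, the following sketch converts them into a full-scale time range, a worst-case INL in picoseconds, and an energy per conversion. It assumes the full-scale range equals 2^N LSBs and that all power is attributed to conversions; the helper names are hypothetical.

```python
def tdc_full_scale_ps(bits, lsb_fs):
    """Full-scale input range in ps, assuming 2**bits codes of lsb_fs each."""
    return (2 ** bits) * lsb_fs / 1000.0


def lsb_to_ps(lsb_count, lsb_fs):
    """Convert an error expressed in LSBs to picoseconds."""
    return lsb_count * lsb_fs / 1000.0


def energy_per_conversion_nj(power_mw, rate_msps):
    """mW divided by MS/s gives nJ per conversion."""
    return power_mw / rate_msps


full_scale_ps = tdc_full_scale_ps(bits=9, lsb_fs=580)
worst_inl_ps = lsb_to_ps(4.3, lsb_fs=580)
energy_min_nj = energy_per_conversion_nj(10.8, 1.5)
energy_max_nj = energy_per_conversion_nj(12.6, 1.5)

if __name__ == "__main__":
    print(f"full scale: {full_scale_ps:.2f} ps")
    print(f"worst-case INL: {worst_inl_ps:.2f} ps")
    print(f"energy per conversion: {energy_min_nj:.1f} to {energy_max_nj:.1f} nJ")
```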
-
Andrzej RADECKI, Hayun CHUNG, Yoichi YOSHIDA, Noriyuki MIURA, Tsunaaki ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
668-676
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Wafer-level testing is a well-established solution for detecting manufacturing errors and removing non-functional devices early in the fabrication process. Recently this technique has been facing a number of challenges resulting from the increased complexity of devices under test, the larger number and higher density of pads or bumps, the application of mechanically fragile materials such as low-k dielectrics, and ever-developing packaging technologies. Most of these difficulties originate from the use of mechanical probes, as they limit testing speed, impose performance limitations and add reliability issues. Earlier work focused on relaxing these constraints by removing mechanical probes for data transmission and DC signal measurement and replacing them with non-contact interfaces. In this paper we extend this concept by adding the capability of transferring power wirelessly, enabling non-contact wafer-level testing. In addition to further improvements in performance and reliability, this solution enables new testing scenarios such as probing wafers from their backside. The proposed system achieves 6W/25mm² power transfer density over a distance of up to 0.32mm, making it suitable for non-contact wafer-level testing of medium-performance CMOS integrated circuits.
-
Toru SAI, Yasuhiro SUGIMOTO
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
677-685
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
By using a quadratic compensation slope, a CMOS current-mode buck DC-DC converter with constant frequency characteristics over wide input and output voltage ranges has been developed. The use of a quadratic slope instead of a conventional linear slope makes both the damping factor in the transfer function and the frequency bandwidth of the current feedback loop independent of the converter's output voltage settings. When the coefficient of the quadratic slope is chosen to be dependent on the input voltage settings, the damping factor in the transfer function and the frequency bandwidth of the current feedback loop both become independent of the input voltage settings. Thus, both the input and output voltage dependences in the current feedback loop are eliminated, the frequency characteristics become constant, and the frequency bandwidth is maximized. To verify the effectiveness of a quadratic compensation slope with a coefficient that is dependent on the input voltage in a buck DC-DC converter, we fabricated a test chip using a 0.18µm high-voltage CMOS process. The evaluation results show that the frequency characteristics of both the total feedback loop and the current feedback loop are constant even when the input and output voltages are changed from 2.5V to 7V and from 0.5V to 5.6V, respectively, using a 3MHz clock.
-
Shin-ichi O'UCHI, Kazuhiko ENDO, Takashi MATSUKAWA, Yongxun LIU, Tadas ...
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
686-695
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
This paper demonstrates a FinFET operational amplifier (opamp) that is suitable for integration with digital circuits in a scaled low-standby-power (LSTP) technology and operates at extremely low voltage. The opamp consists of an adaptive threshold-voltage (Vt) differential pair and a low-voltage source follower using independent-double-gate- (IDG-) FinFETs. These two components enable the opamp to extend the common-mode voltage range (CMR) below the nominal Vt even if the supply voltage is less than 1.0V. The opamp was implemented in our FinFET technology co-integrating common-DG- (CDG-) and IDG-FinFETs. A DC gain of more than 40dB and a 1-MHz gain-bandwidth product in the 500-mV-wide input CMR at a supply voltage of 0.7V were estimated with SPICE simulation. The fabricated chip successfully demonstrated 0.7-V operation with a 480-mV-wide CMR, even though the nominal Vt was 400mV.
-
Bo LIU, Bo YANG, Shigetoshi NAKATAKE
Article type: PAPER
2012 Volume E95.C Issue 4 Pages
696-705
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
Current sources are essential components of analog circuit designs, and their mismatch significantly degrades circuit performance. Unlike conventional modeling, this paper addresses a mismatch model of CMOS current sources that focuses on the layout- and λ-dependency of the process variation, where λ is the output conductance parameter. To clarify which variation parameters influence the mismatch, we implemented a test chip in a 90nm process technology, from which we can collect characteristics variation data for MOSFETs of various layouts. The test chip also includes D/A converters to check the differential non-linearity (DNL) caused by the mismatch of the current sources when they operate as a DAC. By identifying the variation and the circuit-level errors in the measured DNLs, we show that our model accounts for the current variation more accurately than the conventional mismatch model.
-
Amir FATHI, Sarkis AZIZIAN, Khayrollah HADIDI, Abdollah KHOEI
Article type: BRIEF PAPER
2012 Volume E95.C Issue 4 Pages
706-709
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
This paper presents the design of a novel high-speed Booth encoder-decoder in a 0.35µm CMOS technology. By focusing on a transistor-level implementation of the new architecture and employing a newly designed truth table, the gate-level delay of the whole system is reduced to one logic gate plus one transistor delay, which is the main advantage of the proposed circuit. Simulation results indicate the high-speed performance and low power dissipation of the implemented architecture, which make this work suitable for extensive use in high-speed arithmetic blocks.
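For readers unfamiliar with what a Booth encoder computes, the sketch below shows textbook radix-4 (modified) Booth recoding as a behavioral model. It is not the authors' new truth table or circuit, which the abstract does not specify; the function names and the choice of radix 4 are assumptions made for illustration.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier of even width `bits` into
    radix-4 Booth digits in {-2, -1, 0, 1, 2}, least significant first."""
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    word = multiplier & ((1 << bits) - 1)   # two's-complement bit pattern
    digits = []
    previous = 0                            # implicit bit to the right of bit 0
    for position in range(0, bits, 2):
        low = (word >> position) & 1
        high = (word >> (position + 1)) & 1
        digits.append(-2 * high + low + previous)
        previous = high
    return digits


def booth_multiply(multiplicand, multiplier, bits):
    """Sum the partial products selected by the Booth digits (the decoder's job)."""
    return sum(digit * multiplicand * 4 ** index
               for index, digit in enumerate(booth_radix4_digits(multiplier, bits)))


if __name__ == "__main__":
    print(booth_radix4_digits(-3, 8))
    print(booth_multiply(7, -3, 8))
```

Radix-4 recoding halves the number of partial products, which is why the encoder and decoder sit on the critical path of many multipliers.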
-
Amir FATHI, Sarkis AZIZIAN, Khayrollah HADIDI, Abdollah KHOEI
Article type: BRIEF PAPER
2012 Volume E95.C Issue 4 Pages
710-712
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
A novel high-speed 4-2 compressor using static and pass-transistor logic has been designed in a 0.35µm CMOS technology. To reduce gate-level delay and increase speed, some changes are made to the truth table of the conventional 4-2 compressor, which led to simplification of the logic functions for all parameters. Therefore, power dissipation is decreased. In addition, because the paths from all inputs to the outputs are similar, the delays are the same, so there is no need for extra buffers in low-latency paths to equalize the delays.
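As background, a conventional 4-2 compressor can be modeled behaviorally as two cascaded full adders. The sketch below shows that conventional structure only, not the modified truth table proposed in the paper; the signal names are assumptions.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder returning (sum, carry)."""
    total = a + b + c
    return total & 1, total >> 1


def compressor_4_2(x1, x2, x3, x4, cin):
    """Conventional 4-2 compressor built from two full adders.
    Returns (sum, carry, cout); cout does not depend on cin."""
    intermediate, cout = full_adder(x1, x2, x3)
    sum_bit, carry = full_adder(intermediate, x4, cin)
    return sum_bit, carry, cout


def check_all_inputs():
    """Verify x1+x2+x3+x4+cin == sum + 2*(carry + cout) for all 32 cases."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        sum_bit, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
        if x1 + x2 + x3 + x4 + cin != sum_bit + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print(check_all_inputs())
```

In this cascaded form the input-to-output paths have unequal depths, which is the kind of imbalance the abstract says the new design avoids.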
-
Shoichi OSHIMA, Mamoru UGAJIN, Mitsuru HARADA
Article type: BRIEF PAPER
2012 Volume E95.C Issue 4 Pages
713-716
Published: April 01, 2012
Released on J-STAGE: April 01, 2012
JOURNAL
RESTRICTED ACCESS
A new low-power feedback structure for a power amplifier (PA) reduces signal distortion while keeping the power efficiency of the PA high. The feedback structure injects the envelope of the third-order harmonics into the input signal. By adopting this method in a class-A amplifier, we obtain over 10% higher efficiency while maintaining the same adjacent channel power ratio (ACPR). The power consumption of the additional circuit is 200µW.