IEICE Transactions on Electronics

Special Section on Low-Power and High-Speed Chips

FOREWORD

Fumio ARAKAWA, Makoto IKEDA

2022 Volume E105.C Issue 6 Pages 207-208
Published: June 01, 2022
Released on J-STAGE: June 01, 2022

DOI: https://doi.org/10.1587/transele.2021LHF0001

JOURNAL FREE ACCESS

In Search of the Performance- and Energy-Efficient CNN Accelerators

Stanislav SEDUKHIN, Yoichi TOMIOKA, Kohei YAMAMOTO

Article type: PAPER
2022 Volume E105.C Issue 6 Pages 209-221
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: December 03, 2021

DOI: https://doi.org/10.1587/transele.2021LHP0003

JOURNAL FREE ACCESS

In this paper, starting from the algorithm, a performance- and energy-efficient 3D structure or shape of the Tensor Processing Engine (TPE) for CNN acceleration is systematically searched and evaluated. An optimal accelerator's shape maximizes the number of concurrent MAC operations per clock cycle while minimizing the number of redundant operations. The proposed 3D vector-parallel TPE architecture with an optimal shape can be very efficiently used for considerable CNN acceleration. Due to the implemented support for inter-block image data independence, it is possible to use multiple such TPEs for additional CNN acceleration. Moreover, it is shown that the proposed TPE can also be uniformly used for acceleration of different CNN models such as VGG, ResNet, YOLO, and SSD. We also demonstrate that our theoretical efficiency analysis matches the result of a real implementation for an SSD model to which a state-of-the-art channel pruning technique is applied.

A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU

Kentaro KAWAKAMI, Kouji KURIHARA, Masafumi YAMAZAKI, Takumi HONDA, Nao ...

Article type: PAPER
2022 Volume E105.C Issue 6 Pages 222-231
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: December 03, 2021

DOI: https://doi.org/10.1587/transele.2021LHP0001

JOURNAL FREE ACCESS

To accelerate deep learning (DL) processes on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture. The A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically creates the execution code for the computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, the Just-In-Time (JIT) assembler for x86_64 architecture. To port oneDNN to A64FX, it must be rewritten into Armv8-A instructions using Xbyak_aarch64, the JIT assembler for the Armv8-A architecture. This is challenging because the number of steps to be rewritten exceeds several tens of thousands of lines. This study presents the Xbyak_translator_aarch64. Xbyak_translator_aarch64 is a binary translator that at runtime converts dynamically produced executable codes for the x86_64 architecture into executable codes for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code for porting oneDNN to A64FX and allows us to port oneDNN to A64FX quickly.

A Metadata Prefetching Mechanism for Hybrid Memory Architectures

Shunsuke TSUKADA, Hikaru TAKAYASHIKI, Masayuki SATO, Kazuhiko KOMATSU, ...

Article type: PAPER
2022 Volume E105.C Issue 6 Pages 232-243
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: December 03, 2021

DOI: https://doi.org/10.1587/transele.2021LHP0004

JOURNAL FREE ACCESS

A hybrid memory architecture (HMA) that consists of several distinct memory devices is expected to achieve a good balance between high performance and large capacity. Unlike conventional memory architectures, the HMA needs metadata for data management since the data are migrated between the memory devices during the execution of an application. The memory controller caches the metadata to avoid accessing the memory devices for metadata references. However, as the amount of metadata increases in proportion to the size of the HMA, the memory controller needs to handle a large amount of metadata. As a result, the memory controller cannot cache all the metadata, and the number of metadata references to the memory devices increases. This increases the access latency to reach the target data and degrades performance. To solve this problem, this paper proposes a metadata prefetching mechanism for HMAs. The proposed mechanism loads the metadata needed in the near future by prefetching. Moreover, to increase the effect of the metadata prefetching, the proposed mechanism predicts the metadata used in the near future based on an address difference, that is, the difference between two consecutive access addresses. The evaluation results show that the proposed metadata prefetching mechanism can improve the instructions per cycle by up to 44% and by 9% on average.
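
The address-difference idea in this abstract can be illustrated with a minimal sketch. It is not the authors' mechanism: the class name, the 4096-byte metadata block size, and the unbounded cache are all assumptions made for illustration. On each access, the difference from the previous block is added to the current block to guess which metadata entry will be needed next, and that entry is loaded ahead of time.

```python
class DeltaMetadataPrefetcher:
    """Toy model of address-difference (delta) metadata prefetching.

    Hypothetical illustration only: block_size and the unbounded
    cache are assumptions, not parameters from the paper.
    """

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.prev_block = None   # block index of the previous access
        self.cached = set()      # block indices whose metadata is cached

    def access(self, address):
        """Record an access; return True if its metadata was already cached."""
        block = address // self.block_size
        hit = block in self.cached
        self.cached.add(block)   # a demand miss fetches the metadata
        if self.prev_block is not None:
            delta = block - self.prev_block
            if delta != 0:
                # Predict that the next access repeats the same stride.
                self.cached.add(block + delta)
        self.prev_block = block
        return hit


prefetcher = DeltaMetadataPrefetcher()
hits = [prefetcher.access(a) for a in (0, 4096, 8192, 12288)]
print(hits)  # the first two accesses miss, the strided ones then hit
```

A real memory controller would bound the cache and evict entries; the sketch only shows how a repeated stride turns later metadata misses into hits.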


Special Section on Progress & Trend of Superconductor-based Computers

FOREWORD

Satoshi KOHJIRO

2022 Volume E105.C Issue 6 Pages 244
Published: June 01, 2022
Released on J-STAGE: June 01, 2022

DOI: https://doi.org/10.1587/transele.2021SEF0001

JOURNAL FREE ACCESS

32-Bit ALU with Clockless Gates for RSFQ Bit-Parallel Processor

Takahiro KAWAGUCHI, Naofumi TAKAGI

Article type: INVITED PAPER
2022 Volume E105.C Issue 6 Pages 245-250
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: December 03, 2021

DOI: https://doi.org/10.1587/transele.2021SEP0005

JOURNAL FREE ACCESS

A 32-bit arithmetic logic unit (ALU) is designed for a rapid single flux quantum (RSFQ) bit-parallel processor. In the ALU, clocked gates are partially replaced by clockless gates. This reduces the number of D flip-flops (DFFs) required for path balancing. The number of clocked gates, including DFFs, is reduced by approximately 40%, and the size of the clock distribution network is reduced. The number of pipeline stages becomes modest. The layout design of the ALU and simulation results show the effectiveness of using clockless gates in wide datapath circuits.

Adiabatic Quantum-Flux-Parametron: A Tutorial Review

Naoki TAKEUCHI, Taiki YAMAE, Christopher L. AYALA, Hideo SUZUKI, Nobuy ...

Article type: INVITED PAPER
2022 Volume E105.C Issue 6 Pages 251-263
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: January 19, 2022

DOI: https://doi.org/10.1587/transele.2021SEP0003

JOURNAL FREE ACCESS

The adiabatic quantum-flux-parametron (AQFP) is an energy-efficient superconductor logic element based on the quantum flux parametron. AQFP circuits can operate with energy dissipation near the thermodynamic and quantum limits by maximizing the energy efficiency of adiabatic switching. We have established the design methodology for AQFP logic and developed various energy-efficient systems using AQFP logic, such as a low-power microprocessor, reversible computer, single-photon image sensor, and stochastic electronics. We have thus demonstrated the feasibility of the wide application of AQFP logic in future information and communications technology. In this paper, we present a tutorial review on AQFP logic to provide insights into AQFP circuit technology as an introduction to this research field. We describe the historical background, operating principle, design methodology, and recent progress of AQFP logic.

A High-Speed Interface Based on a Josephson Latching Driver for Adiabatic Quantum-Flux-Parametron Logic

Fumihiro CHINA, Naoki TAKEUCHI, Hideo SUZUKI, Yuki YAMANASHI, Hirotaka ...

Article type: PAPER
2022 Volume E105.C Issue 6 Pages 264-269
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: December 03, 2021

DOI: https://doi.org/10.1587/transele.2021SEP0002

JOURNAL RESTRICTED ACCESS

The adiabatic quantum flux parametron (AQFP) is an energy-efficient, high-speed superconducting logic device. To observe the tiny output currents from the AQFP in experiments, high-speed voltage drivers are indispensable. In the present study, we develop a compact voltage driver for AQFP logic based on a Josephson latching driver (JLD), which has been used as a high-speed driver for rapid single-flux-quantum (RSFQ) logic. In the JLD-based voltage driver, the signal currents of AQFP gates are converted into gap-voltage-level signals via an AQFP/RSFQ interface and a four-junction logic gate. Furthermore, this voltage driver includes only 15 Josephson junctions, far fewer than the 60 junctions of the previously designed driver based on dc superconducting quantum interference devices. In measurements, we successfully operate the JLD-based voltage driver up to 4 GHz. We also evaluate the bit error rate (BER) of the driver and find that the BER is 7.92×10^-10 and 2.67×10^-3 at 1 GHz and 4 GHz, respectively.

A 16-Bit Parallel Prefix Carry Look-Ahead Kogge-Stone Adder Implemented in Adiabatic Quantum-Flux-Parametron Logic

Tomoyuki TANAKA, Christopher L. AYALA, Nobuyuki YOSHIKAWA

Article type: PAPER
2022 Volume E105.C Issue 6 Pages 270-276
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: January 19, 2022

DOI: https://doi.org/10.1587/transele.2021SEP0001

JOURNAL RESTRICTED ACCESS

Extremely energy-efficient logic devices are required for future low-power high-performance computing systems. Superconductor electronic technology has a number of energy-efficient logic families. Among them is the adiabatic quantum-flux-parametron (AQFP) logic family, which adiabatically switches the quantum-flux-parametron (QFP) circuit when it is excited by an AC power-clock. When compared to state-of-the-art CMOS technology, AQFP logic circuits have the advantage of relatively fast clock rates (5 GHz to 10 GHz) and a reduction in energy of 5 to 6 orders of magnitude before cooling overhead. We have been developing extremely energy-efficient computing processor components using the AQFP. The adder is the most basic computational unit and is important in the development of a processor. In this work, we designed and measured a 16-bit parallel prefix carry look-ahead Kogge-Stone adder (KSA). We fabricated the circuit using the AIST 10 kA/cm² High-speed STandard Process (HSTP). Due to a malfunction in the measurement system, we were not able to confirm the complete operation of the circuit at the low frequency of 100 kHz in liquid He, but we confirmed that the outputs that we did observe are correct for two types of tests: (1) critical tests and (2) 110 random input tests in total. The operation margin of the circuit is wide, and we did not observe any calculation errors during measurement.
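
For readers unfamiliar with the parallel-prefix structure named in this abstract, the following is a minimal software sketch of Kogge-Stone carry computation. It models only the logic of the adder, not the AQFP circuit; the function name and the absence of a carry-in input are assumptions made for illustration.

```python
def kogge_stone_add(a, b, width=16):
    """Add two unsigned integers with Kogge-Stone parallel-prefix carries.

    Returns (sum modulo 2**width, carry_out). Each integer is used as a
    bit vector, so one bitwise operation models one row of prefix cells.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    generate = a & b        # bit i creates a carry on its own
    propagate = a ^ b       # bit i passes an incoming carry along
    half_sum = propagate    # per-bit sum before carries are applied

    shift = 1
    while shift < width:    # log2(width) prefix stages: 1, 2, 4, 8
        # Combine each group with the group `shift` positions below it.
        generate = (generate | (propagate & (generate << shift))) & mask
        propagate = (propagate & (propagate << shift)) & mask
        shift <<= 1

    # After the last stage, generate[i] is the carry out of bit i.
    carries_in = (generate << 1) & mask
    total = half_sum ^ carries_in
    carry_out = (generate >> (width - 1)) & 1
    return total, carry_out


print(kogge_stone_add(0xFFFF, 0x0001))  # (0, 1)
```

For a 16-bit operand the loop runs four stages, which is the logarithmic carry depth that motivates the Kogge-Stone structure.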

Adiabatic Quantum-Flux-Parametron with Delay-Line Clocking Using Square Excitation Currents

Taiki YAMAE, Naoki TAKEUCHI, Nobuyuki YOSHIKAWA

Article type: PAPER
2022 Volume E105.C Issue 6 Pages 277-282
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: January 19, 2022

DOI: https://doi.org/10.1587/transele.2021SEP0004

JOURNAL RESTRICTED ACCESS

The adiabatic quantum-flux-parametron (AQFP) is an energy-efficient superconductor logic device. In a previous study, we proposed a low-latency clocking scheme called delay-line clocking, and several low-latency AQFP logic gates have been demonstrated. In delay-line clocking, the latency between adjacent excitation phases is determined by the propagation delay of excitation currents, and thus the rising time of excitation currents should be sufficiently small; otherwise, an AQFP gate can switch before the previous gate is fully excited. This means that delay-line clocking needs high clock frequencies, because typical excitation currents are sinusoidal and the rising time depends on the frequency. However, AQFP circuits need to be tested in a wide frequency range experimentally. Hence, in the present study, we investigate AQFP circuits adopting delay-line clocking with square excitation currents to apply delay-line clocking in a low frequency range. Square excitation currents have shorter rising time than sinusoidal excitation currents and thus enable low frequency operation. We demonstrate an AQFP buffer chain with delay-line clocking using square excitation currents, in which the latency is approximately 20 ps per gate, and confirm that the operating margin for the buffer chain is kept sufficiently wide at clock frequencies below 1 GHz, whereas in the sinusoidal case the operating margin shrinks below 500 MHz. These results indicate that AQFP circuits adopting delay-line clocking can operate in a low frequency range by using square excitation currents.

Development of Quantum Annealer Using Josephson Parametric Oscillators

Tomohiro YAMAJI, Masayuki SHIRANE, Tsuyoshi YAMAMOTO

Article type: INVITED PAPER
2022 Volume E105.C Issue 6 Pages 283-289
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: December 03, 2021

DOI: https://doi.org/10.1587/transele.2021SEP0006

JOURNAL FREE ACCESS

A Josephson parametric oscillator (JPO) is an interesting system from the viewpoint of quantum optics because it has two stable self-oscillating states and can deterministically generate quantum cat states. A theoretical proposal has been made to operate a network of multiple JPOs as a quantum annealer, which can adiabatically solve combinatorial optimization problems at high speed. Proof-of-concept experiments have been actively conducted for application to quantum computations. This article provides a review of the mechanism of JPOs and their application as a quantum annealer.

Toward Realization of Scalable Packaging and Wiring for Large-Scale Superconducting Quantum Computers

Shuhei TAMATE, Yutaka TABUCHI, Yasunobu NAKAMURA

Article type: INVITED PAPER
2022 Volume E105.C Issue 6 Pages 290-295
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: December 03, 2021

DOI: https://doi.org/10.1587/transele.2021SEP0007

JOURNAL FREE ACCESS

In this paper, we review the basic components of superconducting quantum computers. We mainly focus on the packaging and wiring technologies required to realize large-scale superconducting quantum computers.

Evaluation of a True Random Number Generator Utilizing Timing Jitters in RSFQ Logic Circuits

Kenta SATO, Naonori SEGA, Yuta SOMEI, Hiroshi SHIMADA, Takeshi ONOMI, ...

Article type: BRIEF PAPER
2022 Volume E105.C Issue 6 Pages 296-299
Published: June 01, 2022
Released on J-STAGE: June 01, 2022
Advance online publication: January 19, 2022

DOI: https://doi.org/10.1587/transele.2021SES0001

JOURNAL FREE ACCESS

We experimentally evaluated random number sequences generated by a superconducting hardware random number generator composed of a Josephson-junction oscillator, a rapid-single-flux-quantum (RSFQ) toggle flip-flop (TFF), and an RSFQ AND gate. Test circuits were fabricated using a 10 kA/cm² Nb/AlO_x/Nb integration process. Measurements were conducted in a liquid helium bath. The random numbers were generated at a trigger frequency of 500 kHz with the Josephson junction oscillating at 29 GHz. Twenty-six random number sequences of 20 kb length were evaluated for bias voltages between 2.0 and 2.7 mV. The NIST FIPS PUB 140-2 tests were used for the evaluation. Pass rates of 100% were confirmed at bias voltages of 2.5 and 2.6 mV. We found that the Monobit test limited the pass rates. As numerical simulations suggested, a detailed evaluation of the probability of obtaining “1” demonstrated a monotonic dependence on the bias voltage.
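
The Monobit test mentioned in this abstract can be sketched as follows. This is not the authors' evaluation code: the function name is hypothetical, and the acceptance bounds (more than 9,725 and fewer than 10,275 ones in a 20,000-bit sample) are assumed from the statistical tests described in FIPS 140-2, which match the 20 kb sequence length quoted above.

```python
def monobit_test(bits):
    """Return True if a 20,000-bit sample passes the monobit test.

    Assumed bounds (FIPS 140-2 statistical tests): the number of
    ones X must satisfy 9725 < X < 10275.
    """
    bits = list(bits)
    if len(bits) != 20000:
        raise ValueError("the monobit test is defined for 20,000 bits")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be 0 or 1")
    ones = sum(bits)
    return 9725 < ones < 10275


balanced = [0, 1] * 10000            # exactly 10,000 ones
biased = [1] * 10300 + [0] * 9700    # too many ones
print(monobit_test(balanced), monobit_test(biased))  # True False
```

Because the test only counts ones, it fails as soon as the probability of obtaining "1" drifts far enough from 0.5, which is consistent with the abstract's observation that this test limits the pass rate as the bias voltage changes.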
