IPSJ Transactions on System and LSI Design Methodology
Online ISSN : 1882-6687
ISSN-L : 1882-6687
Current issue
Displaying 1-9 of 9 articles from this issue
 
  • Tohru Ishihara
    Article type: Editorial
    Subject area: Editorial
    2024 Volume 17 Pages 1
    Published: 2024
    Released on J-STAGE: February 28, 2024
    JOURNAL FREE ACCESS
  • Tadahiro Kuroda
    Article type: Invited Paper
    2024 Volume 17 Pages 2-6
    Published: 2024
    Released on J-STAGE: February 28, 2024
    JOURNAL FREE ACCESS

    The continuous growth of the semiconductor industry, driven by the use of AI to fuse physical and virtual space, requires drastic improvements in IC power efficiency, memory capacity, and memory bandwidth. This paper describes two solutions for slashing IC power: 3D integration and specialized chips. In addition, it proposes a novel sliced-bread memory stacking scheme that enables more than a 10-fold increase in the number of memory chips per stack, and hence in capacity and memory bandwidth. Furthermore, it elaborates on an agile development platform that enables designing chips like writing software and prototyping chips in days. The platform's ease of use and its 10-fold reduction in development time and cost are expected to democratize access to specialized chips, accelerating innovation by increasing the number of developers. It will also accelerate society's transition to the digital age. Finally, the author discusses the need for the global IC industry to move away from its reliance on competition toward co-existence and co-evolution to sustain its growth.

  • Ryotaro Ohara, Atsushi Fukunaga, Masakazu Taichi, Masaya Kabuto, Riku ...
    Article type: Regular Paper
    Subject area: System LSI Design Methodology
    2024 Volume 17 Pages 7-15
    Published: 2024
    Released on J-STAGE: February 28, 2024
    JOURNAL FREE ACCESS

    We investigated the performance improvement of a deep-learning inference processor when its cache memory is changed from SRAM to spin-orbit-torque magnetoresistive random-access memory (SOT-MRAM). SOT-MRAM doubles the capacity in the same area compared to SRAM, and is therefore expected to reduce main-memory traffic, and hence energy, without changing the chip area. As a case study, we simulated how much performance could be improved by replacing SRAM with MRAM in a deep-learning processor. The NVIDIA Deep Learning Accelerator (NVDLA) was used as the base processor, and SegNet and U-Net were used as the target networks for a segmentation task with an image size of 512 × 1024 pixels. We evaluated the NVDLA with a 512-KB buffer and cache sizes of 1, 2, 4, and 8 MB as its on-chip memory, replacing both memories with MRAM implementations. When both the buffer and the cache were replaced with SOT-MRAM, energy consumption and execution time were reduced by 18.6% and 17.9%, respectively, and the performance per unit area improved by more than 36.4%. In contrast, replacing SRAM with spin-transfer-torque MRAM is unsuitable for inference devices, because its slow write operation significantly worsens latency.

  • Shota Nakabeppu, Nobuyuki Yamasaki
    Article type: Regular Paper
    Subject area: Architecture Design Methodology
    2024 Volume 17 Pages 16-35
    Published: 2024
    Released on J-STAGE: February 28, 2024
    JOURNAL FREE ACCESS

    A magnetic tunnel junction (MTJ) based non-volatile flip-flop (NVFF) is attractive for non-volatile power gating, which reduces power consumption, and for non-volatile checkpointing, which improves fault tolerance. An MTJ-based NVFF performs a store operation, which writes the slave-latch value to the MTJs (non-volatile devices), and a restore operation, which writes the MTJs' value back to the slave latch. A store operation, however, is stochastic: its success rate depends on its duration, the NVFF's characteristics, voltage, and temperature. The success rate varies statically, because each NVFF has different characteristics due to process variation in actual chips, and dynamically, because voltage and temperature change with the operating environment. Our goal is to reduce the energy consumption of checkpoint creation while ensuring its success. We propose a learning-based hardware scheme that dynamically finds the appropriate store-operation parameters. The scheme consists of a machine-learning unit and an exploration unit: the machine-learning unit learns and predicts the store operation's success rate from its duration, voltage, and temperature, and the exploration unit searches the trained model for the appropriate parameters. Our evaluation shows that the proposed scheme achieves this goal.
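    The interplay the abstract describes, a model that predicts store success from duration, voltage, and temperature, plus an explorer that picks the cheapest duration meeting a success target, can be illustrated with a small software sketch. The logistic model and all coefficients below are hypothetical stand-ins, not the paper's hardware scheme.

```python
import math

# Hypothetical logistic model of MTJ store-success probability.
# Coefficients are illustrative, not taken from the paper.
def store_success_prob(duration_ns, vdd, temp_c):
    # Longer pulses, higher voltage, and higher temperature all make
    # switching more likely in this toy model.
    z = 1.8 * duration_ns + 6.0 * (vdd - 1.0) + 0.02 * (temp_c - 25.0) - 4.0
    return 1.0 / (1.0 + math.exp(-z))

def min_duration(vdd, temp_c, target=0.9999, step_ns=0.1, max_ns=50.0):
    """Explore pulse durations and return the shortest one whose predicted
    success rate meets the target (store energy grows with duration)."""
    d = step_ns
    while d <= max_ns:
        if store_success_prob(d, vdd, temp_c) >= target:
            return d
        d += step_ns
    return max_ns

# At nominal conditions a moderate pulse suffices; at low voltage and low
# temperature the explorer chooses a longer (more energy-hungry) pulse.
print(min_duration(1.0, 25.0))
print(min_duration(0.9, -20.0))
```

    The point of the two-unit split mirrors the abstract: prediction (the model) is separated from search (the explorer), so the same trained model can be re-queried whenever voltage or temperature drifts.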

  • Takehiro Kitamura, Takashi Hisakado, Osami Wada, Mahfuzul Islam
    Article type: Regular Paper
    Subject area: Low Power Design Methodology
    2024 Volume 17 Pages 36-43
    Published: 2024
    Released on J-STAGE: February 28, 2024
    JOURNAL FREE ACCESS

    Statistical element selection has been proposed to solve the offset-voltage variation problem in flash ADCs, and a calibration method based on order statistics has been proposed that performs this selection without measuring offset voltages. This paper presents a design methodology for a flash ADC with such calibration using multiple comparator groups. We validate the proposal with measurements of test chips fabricated in a commercial 65-nm general-purpose process. The measurements confirm that rank-based comparator selection achieves a reference-free ADC. Compared to the baseline ADC, which uses only one group of comparators, the ADC with three groups significantly improves linearity and input range at the same power consumption. Because no reference voltages or DACs are required, the proposed design should help realize lower-power ADCs in advanced process nodes.
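    The rank-based idea can be sketched numerically: comparator offsets themselves serve as decision thresholds, and with a larger pool one keeps only the comparators at evenly spaced ranks. This is a toy Monte Carlo illustration under an assumed Gaussian offset model, not the paper's circuit or calibration procedure.

```python
import random

random.seed(1)

def trip_order(offsets):
    # Rank comparators by the order they trip on a slow input ramp; this is
    # observable on-chip without measuring any offset voltage directly.
    return sorted(offsets)

def select_by_rank(pool, levels):
    # Keep `levels` comparators at evenly spaced ranks of the sorted pool,
    # so the retained trip points approximate evenly spaced order statistics.
    ranked = trip_order(pool)
    n = len(ranked)
    return [ranked[round((i + 1) * n / (levels + 1)) - 1] for i in range(levels)]

def quantize(v, thresholds):
    # Thermometer-code output of a flash ADC: count of comparators tripped.
    return sum(v > t for t in thresholds)

levels = 7      # a 3-bit flash ADC needs 2^3 - 1 decision thresholds
sigma = 30e-3   # assumed comparator offset std-dev in volts (illustrative)

# Baseline: exactly `levels` comparators, so all of them must be used.
baseline = trip_order([random.gauss(0.0, sigma) for _ in range(levels)])
# Three groups: a 3x pool lets rank-based selection pick its thresholds.
three_groups = select_by_rank([random.gauss(0.0, sigma) for _ in range(3 * levels)], levels)

print("baseline input range (V):", baseline[-1] - baseline[0])
print("3-group input range (V):", three_groups[-1] - three_groups[0])
```

    With more groups, the selected order statistics cover the offset distribution more evenly, which is the mechanism behind the improved linearity and input range the abstract reports.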

  • Yuncheng Zhang, Kenichi Okada
    Article type: Invited Paper
    2024 Volume 17 Pages 44-54
    Published: 2024
    Released on J-STAGE: June 19, 2024
    JOURNAL FREE ACCESS

    Phase-locked loops (PLLs) are crucial building blocks in almost every electronic device. With the continued scaling of CMOS processes, conventional custom-designed analog PLLs suffer from large area, degraded performance, and long design time. This paper introduces fully synthesizable PLLs built purely from digital standard cells. Such PLLs are compact and can be designed with commercial digital synthesis tools, so their design time is much shorter than that of conventional analog PLLs, especially when moving to a new technology. An injection-locked architecture is adopted to improve performance. The paper describes a synthesis method for automatically generating the PLLs and introduces the design of their building blocks. Finally, design examples of injection-locked PLLs are presented, and their performance is compared with that of custom-designed PLLs.
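    The all-digital loop that makes such PLLs synthesizable can be sketched behaviorally: a bang-bang phase detector, a proportional-integral digital loop filter, and a DCO, every block expressible in standard cells. The model below is a generic ADPLL sketch with illustrative gains, not the injection-locked architecture or any circuit from the paper.

```python
# Behavioral model of an all-digital PLL: bang-bang phase detector,
# proportional-integral loop filter, and a DCO modeled as a tunable
# frequency source. All parameters are illustrative.
def simulate_adpll(f_ref=100e6, n_div=24, kp=4e7, ki=2e6, f_free=2.0e9, steps=2000):
    phase_ref = 0.0    # reference phase, in reference cycles
    phase_div = 0.0    # divided-DCO phase, in reference cycles
    integ = 0.0        # integral path of the digital loop filter
    f_dco = f_free
    for _ in range(steps):
        phase_ref += 1.0
        phase_div += f_dco / (f_ref * n_div)
        # Bang-bang PD: only the sign of the phase error is observed.
        err = 1.0 if phase_div < phase_ref else -1.0
        integ += ki * err                       # integral path
        f_dco = f_free + integ + kp * err       # proportional path
    return f_dco

f_locked = simulate_adpll()
print("final DCO frequency (GHz):", f_locked / 1e9)  # target is 24 x 100 MHz = 2.4 GHz
```

    Because every operation here is a comparison, add, or multiply by a constant, the whole loop maps onto standard cells, which is what lets commercial synthesis tools place and route it automatically.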

  • Hansen Wang, Dongju Li, Tsuyoshi Isshiki
    Article type: Regular Paper
    Subject area: Hardware/Software Co-Design
    2024 Volume 17 Pages 55-66
    Published: 2024
    Released on J-STAGE: June 19, 2024
    JOURNAL FREE ACCESS

    Deep neural networks (DNNs) are used across diverse domains, including speech recognition, face detection, and image classification. The conventional approach implements DNNs on graphics processing units (GPUs), which prioritizes speed at the expense of efficiency. To reduce power consumption and improve efficiency, we advocate application-specific hardware computing. This paper introduces a run-time reconfigurable DNN accelerator SoC (DNN-AS) architecture integrated into an instruction-extended RISC-V platform. The application-specific extension instruction set is tailored to accelerate frequent DNN operations. To simplify the circuit structure, we devised an 8-bit dynamic fixed-point (DFP) scheme within the DNN-AS, and we compare its accuracy against a PyTorch float implementation. The results show that the DNN-AS exhibits minimal accuracy loss, with Top-1 accuracy deviations of at most 0.53%, 0.31%, and 0.68% for ResNet-34, ResNet-50, and ResNet-101, respectively. Finally, comparing the simulated results with other platforms shows that our design improves throughput per joule (GOP/J) by 8.4x to 1897x over field-programmable gate arrays (FPGAs) and a GPU.
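    The 8-bit dynamic fixed-point idea can be illustrated with a minimal sketch, assuming the common shared-exponent formulation of DFP (the paper's exact scheme may differ): every value in a tensor is stored as a signed 8-bit integer, and the whole tensor shares one exponent chosen from its largest magnitude.

```python
import math

def quantize_dfp(values, word_bits=8):
    # Dynamic fixed-point: all values in a tensor share one exponent,
    # chosen so the largest magnitude fits the signed integer range.
    qmax = 2 ** (word_bits - 1) - 1              # 127 for 8 bits
    max_mag = max(abs(v) for v in values)
    if max_mag == 0.0:
        return [0] * len(values), 0
    exp = math.ceil(math.log2(max_mag / qmax))   # may be negative for small tensors
    scale = 2.0 ** exp
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, exp

def dequantize(q, exp):
    return [x * 2.0 ** exp for x in q]

weights = [0.62, -0.31, 0.05, -0.008, 0.27]
q, exp = quantize_dfp(weights)
approx = dequantize(q, exp)
print("shared exponent:", exp)
print("max abs error:", max(abs(w - a) for w, a in zip(weights, approx)))
```

    Because the exponent is per tensor rather than global, each layer's weights and activations adapt their range independently, which is how DFP keeps the Top-1 loss small at 8 bits.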

  • Kazuya Taniguchi, Satoshi Tayu, Atsushi Takahashi, Mathieu Molongo, Ma ...
    Article type: Regular Paper
    Subject area: Behavioral/Logic/Layout Synthesis and Verification
    2024 Volume 17 Pages 67-76
    Published: 2024
    Released on J-STAGE: June 19, 2024
    JOURNAL FREE ACCESS

    Design automation that realizes analog integrated circuits meeting performance specifications in a small area is desired. To reduce layout area, "Bottleneck Channel Routing," in which two wires share a routing track in the bottleneck region, has been proposed. We define a two-layer routing problem consisting of the bottleneck channel and the adjacent regions where the HV rule does not apply. The proposed algorithm uses a U-shaped routing model and generates a two-layer routing that minimizes the number of intersections while each net's wire includes at most one via. The obtained routing contains no conflicts whenever the algorithm outputs a feasible solution.

  • Kensuke Iizuka, Kohei Ito, Ryota Yasudo, Hideharu Amano
    Article type: Regular Paper
    Subject area: System LSI Design Methodology
    2024 Volume 17 Pages 77-86
    Published: 2024
    Released on J-STAGE: June 19, 2024
    JOURNAL FREE ACCESS

    Expectations for multi-access edge computing (MEC) have been rising in recent years, and FPGA clusters have attracted attention as a power-efficient computing platform for MEC servers. These clusters improve performance by directly connecting multiple FPGAs to utilize more computing resources. However, our previous power analysis of FPGA clusters showed that the inter-FPGA interconnect accounts for most of the power consumption of the entire system. In this study, we propose a framework that uses an optimization-based mapping algorithm to automatically map distributed-processing applications without sacrificing communication performance, and that disables links unnecessary for application execution, thereby automatically reducing power consumption. Running applications on multiple boards with this framework reduces the power consumption of the entire system by up to 52%.
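    The map-then-disable idea can be sketched on a toy instance: choose a task-to-board assignment that keeps communicating tasks on directly connected boards while activating as few links as possible, then power down the rest. The topology, task graph, search method (exhaustive rather than the paper's algorithm), and per-link power figure below are all illustrative assumptions.

```python
from itertools import permutations

# Toy model: 4 FPGA boards in a ring, 4 application tasks with a known
# communication pattern. We exhaustively search task-to-board mappings that
# place every communicating pair on directly connected boards while using
# as few links as possible; the remaining links can then be powered down.
board_links = {(0, 1), (1, 2), (2, 3), (0, 3)}   # ring of 4 boards
task_comm = {("src", "fir"), ("fir", "sum")}     # task-pipeline edges
tasks = ["src", "fir", "sum", "idle"]

def links_used(mapping):
    used = set()
    for a, b in task_comm:
        pair = tuple(sorted((mapping[a], mapping[b])))
        if pair not in board_links:
            return None          # infeasible: pair not directly connected
        used.add(pair)
    return used

best = None
for perm in permutations(range(4)):
    used = links_used(dict(zip(tasks, perm)))
    if used is not None and (best is None or len(used) < len(best)):
        best = used

watts_per_link = 2.5   # assumed interconnect power per active link
saved = (len(board_links) - len(best)) * watts_per_link
print("active links:", sorted(best))
print("power saved by disabling idle links (W):", saved)
```

    The same structure, a feasibility predicate over the board topology plus a cost counting active links, carries over to the framework's optimization-based mapping, where exhaustive search is replaced by a scalable algorithm.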
