IPSJ Transactions on System and LSI Design Methodology

Message from the Editor-in-Chief

Hiroyuki Tomiyama

Article type: Editorial
Subject area: Editorial
2012 Volume 5 Page 1
Published: 2012
Released on J-STAGE: 2012/02/21

DOI: https://doi.org/10.2197/ipsjtsldm.5.1

Journal, free access

A Stackable LTE Chip for Cost-effective 3D Systems

Walid Lafi, Didier Lattard, Ahmed Jerraya

Article type: Invited Paper
Subject area: System-Level Design
2012 Volume 5 Pages 2-13
Published: 2012
Released on J-STAGE: 2012/02/21

DOI: https://doi.org/10.2197/ipsjtsldm.5.2

Journal, free access

To address the prohibitive cost of advanced fabrication technologies, one solution is to reuse masks across a wide range of ICs. This could be achieved with a modular circuit that can be stacked to build TSV-based 3D systems with processing performance adapted to several applications. This paper focuses on 4G wireless telecom applications. We propose a basic circuit that supports the SISO (Single Input Single Output) transmission mode. By stacking multiple instances of this same circuit, it becomes possible to address several MIMO (Multiple Input Multiple Output) modes. The proposed circuit is composed of several processing units interconnected by a 3D NoC and controlled by a host processor. Compared to a 2D reference platform, the proposed circuit maintains at least the same performance and power consumption in the context of 4G telecom applications, while reducing total cost. More generally, our cost analysis shows that 3D integration efficiency depends on the size of the circuit and the stacking option (die-to-die, die-to-wafer, or interposer-based stacking).

Advances in PCB Routing

Tan Yan, Qiang Ma, Martin D.F. Wong

Article type: Invited Paper
Subject area: Board-Level Routing
2012 Volume 5 Pages 14-22
Published: 2012
Released on J-STAGE: 2012/02/21

DOI: https://doi.org/10.2197/ipsjtsldm.5.14

Journal, free access

The increasing complexity of electronic systems has made PCB routing a difficult problem. A large amount of research effort has been dedicated to the study of this problem. In this paper, we provide an overview of recent research results on the PCB routing problem. We focus on the escape routing problem and the length-matching routing problem, which are the two most important problems in PCB routing. Other relevant works are also briefly introduced.

DVB-T2 LDPC Decoder with Perfect Conflict Resolution

Xiongxin Zhao, Zhixiang Chen, Xiao Peng, Dajiang Zhou, Satoshi Goto

Article type: Regular Paper
Subject area: Architectural Design
2012 Volume 5 Pages 23-31
Published: 2012
Released on J-STAGE: 2012/02/21

DOI: https://doi.org/10.2197/ipsjtsldm.5.23

Journal, free access

Currently, most LDPC decoders are implemented with the so-called layered algorithm because of its implementation efficiency and relatively high decoding performance. However, not all structured LDPC codes can be implemented with the layered algorithm directly, because of message updating conflicts within layers in the a-posteriori information memory. In this paper, we focus on resolving this kind of conflict for DVB-T2 LDPC decoders. Unlike previous resolutions, we directly implement the layered algorithm without modifying the parity-check matrices (PCMs) or the decoding algorithm. A DVB-T2 LDPC decoder architecture is also proposed in this paper, with two new techniques that guarantee conflict-free layered decoding. The PCM Rearrange technique reduces the number of conflicts and eliminates all data dependency problems between layers to ensure high pipeline efficiency. The Layer Division technique deals with all remaining conflicts with a well-designed decoding schedule. Experimental results show that, compared to state-of-the-art works, we achieve a slight error-correcting performance gain for DVB-T2 LDPC codes.
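
The message updating conflicts mentioned above can be pictured with a minimal sketch. It assumes, as a simplification not stated in the abstract, that a layer is a group of parity-check rows processed together and that a conflict occurs whenever one column (one a-posteriori memory entry) is used by more than one row of that layer. The function name `layer_conflicts` and the toy layer are hypothetical. The sketch only detects conflicts and does not implement the paper's PCM Rearrange or Layer Division techniques.

```python
from collections import Counter


def layer_conflicts(layer_rows):
    """Return the columns touched by more than one row of a layer.

    layer_rows: list of rows, each row a list of column indices holding a 1
    in the parity-check matrix (hypothetical toy representation).
    """
    # Count each column once per row, then keep columns seen in 2+ rows.
    counts = Counter(col for row in layer_rows for col in set(row))
    return sorted(col for col, n in counts.items() if n > 1)


# Hypothetical layer of three check rows; columns 3 and 5 are shared.
toy_layer = [[0, 3, 5], [1, 3, 6], [2, 4, 5]]
print(layer_conflicts(toy_layer))
```

In this toy layer, columns 3 and 5 would each be updated by two rows at once, which is the kind of situation a conflict-free schedule has to avoid.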

0.5-V 4-MB Variation-Aware Cache Architecture Using 7T/14T SRAM and Its Testing Scheme

Yohei Nakata, Shunsuke Okumura, Hiroshi Kawaguchi, Masahiko Yoshimoto

Article type: Regular Paper
Subject area: Architectural Design
2012 Volume 5 Pages 32-43
Published: 2012
Released on J-STAGE: 2012/02/21

DOI: https://doi.org/10.2197/ipsjtsldm.5.32

Journal, free access

This paper presents a novel cache architecture using 7T/14T SRAM, which can dynamically improve its reliability through control lines. Our proposed 14T word-enhancing scheme can enhance the operating margin at word granularity by combining two words in a low-voltage mode. Furthermore, we propose a new testing method that maximizes the efficiency of the 14T word-enhancing scheme. In a 65-nm process, it can reduce the minimum operating voltage (V_min) to 0.5 V, a level that is 42% and 21% lower, respectively, than those of a conventional 6T SRAM and a cache word-disable scheme. Measurement results show that the 14T word-enhancing scheme can reduce the V_min of the 6T SRAM and 14T dependable modes by 25% and 19%, respectively. The respective dynamic power reductions are 89.2% and 73.9%. The respective total power reductions are 44.8% and 20.9%.

A Fast Performance Estimation Framework for System-Level Design Space Exploration

Seiya Shibata, Yuki Ando, Shinya Honda, Hiroyuki Tomiyama, Hiroaki Tak ...

Article type: Regular Paper
Subject area: System-Level Performance Analysis
2012 Volume 5 Pages 44-54
Published: 2012
Released on J-STAGE: 2012/02/21

DOI: https://doi.org/10.2197/ipsjtsldm.5.44

Journal, free access

This paper presents a fast performance estimation framework and a performance estimation method for design space exploration at the system level. As the complexity of embedded systems grows, design space exploration at the system level plays a more important role than before. In system-level design, system designers start by describing the functionalities of the system as processes and channels, and then decide how to map them to various Processing Elements (PEs), including processors and dedicated hardware modules. A mapping decision is evaluated by simulation or FPGA-based prototyping. Designers iterate mapping and evaluation until all design requirements are met. In order to shorten the evaluation time, we have developed a fast design space exploration framework which combines our system-level design tool, named SystemBuilder, and a newly developed fast performance estimation tool, named SystemPerfEst. SystemPerfEst is based on a trace-based simulation method. The trace is obtained as the result of SystemBuilder and is fed to SystemPerfEst smoothly. Since the estimation of a design candidate finishes in about one second, design space exploration of a number of design candidates can be performed with SystemPerfEst in a practical time. A case study on design space exploration of a JPEG decoder system demonstrates the effectiveness of our framework.

A Robust Algorithm for Pessimistic Analysis of Logic Masking Effects in Combinational Circuits

Taiga Takata, Yusuke Matsunaga

Article type: Regular Paper
Subject area: Logic-Level Reliability Analysis
2012 Volume 5 Pages 55-62
Published: 2012
Released on J-STAGE: 2012/02/21

DOI: https://doi.org/10.2197/ipsjtsldm.5.55

Journal, free access

Analyzing logic masking effects is key to evaluating the soft error tolerance of circuits. The computational complexity of analyzing logic masking effects exactly is proportional to the square of circuit size, which is unacceptable for a scalable analyzer. This paper presents a robust algorithm to analyze logic masking effects pessimistically with multiple CODCs (Compatible combinations of Observability Don't Cares). The proposed algorithm is guaranteed to estimate an upper bound of the susceptibility of each gate. The computational complexity of the proposed algorithm is proportional to circuit size. Experimental results show that the proposed algorithm runs about 91 times faster than an algorithm which analyzes logic masking effects exactly with fault simulation. The proposed algorithm estimates an average susceptibility 11.5% larger than that of the exact algorithm for circuits in the ITC'99 benchmark set. The state-of-the-art heuristic AnSER underestimates average susceptibility by 96% on average for circuits protected with partial TMR (Triple Modular Redundancy), which can be a fatal error in soft error tolerance evaluation. In contrast, the proposed algorithm overestimates average susceptibility by 37.9% on average for such circuits. The proposed algorithm is useful for quickly estimating an upper bound of the susceptibility of each gate.
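
To make the idea of logic masking concrete, here is a minimal sketch for a toy circuit, `y = (a AND b) OR c`. It injects a bit flip at the AND gate output and counts, over all input vectors, how often the flip reaches the primary output. The circuit, the function names, and the use of exhaustive enumeration are assumptions made for illustration. The abstract does not define susceptibility in detail, and this is neither the paper's CODC-based algorithm nor its fault-simulation baseline.

```python
from itertools import product


def circuit(a, b, c, flip_and=False):
    """Toy circuit y = (a AND b) OR c, with an optional flip at the AND output."""
    g1 = (a & b) ^ int(flip_and)  # inject a single bit flip at the AND gate
    return g1 | c


def and_gate_propagation_rate():
    """Fraction of input vectors for which a flip at the AND gate changes y."""
    vectors = list(product((0, 1), repeat=3))
    propagated = sum(
        circuit(*v) != circuit(*v, flip_and=True) for v in vectors
    )
    return propagated / len(vectors)


print(and_gate_propagation_rate())
```

Whenever `c` is 1, the OR gate masks the flipped value, so the error propagates for only half of the eight input vectors. Exhaustive enumeration grows exponentially with the number of inputs, which is why scalable analyzers rely on other methods.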

An Exact Estimation Algorithm of Error Propagation Probability for Sequential Circuits

Masayoshi Yoshimura, Yusuke Akamine, Yusuke Matsunaga

Article type: Regular Paper
Subject area: Logic-Level Reliability Analysis
2012 Volume 5 Pages 63-70
Published: 2012
Released on J-STAGE: 2012/02/21

DOI: https://doi.org/10.2197/ipsjtsldm.5.63

Journal, free access

In advanced integrated circuit technology, the soft error tolerance is low. Soft errors ultimately lead to failure in VLSIs. We propose a method for the exact estimation of error propagation probabilities in sequential circuits whose flip-flops (FFs) latch failure values. The failure due to soft errors in sequential circuits is defined using the modified product machine. The modified product machine monitors whether failure values appear at any primary output. The behavior of the modified product machine is analyzed with the Markov model. The probabilities that the failure values latched into the FFs appear at any primary output are calculated from the state transition probabilities of the modified product machine. The time required for solving simultaneous linear equations accounts for a large portion of the execution time. We also propose two acceleration techniques to enable the application of our estimation method to larger-scale circuits. These acceleration techniques reduce the number of variables in the simultaneous linear equations. We apply the proposed method to ISCAS'89 and MCNC benchmark circuits and estimate error propagation probabilities for sequential circuits. Experimental results show that total execution times for the proposed method with the two acceleration techniques are up to 10 times shorter than the total execution times for a naive implementation.
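
As a rough illustration of the Markov-model step described in this abstract, the sketch below takes a small absorbing Markov chain and computes the probability of eventually reaching a "failure observed at an output" state by solving simultaneous linear equations. The two-state toy chain, its transition probabilities, and the function names are hypothetical and not taken from the paper. The modified product machine and the two acceleration techniques are not reproduced.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination over exact fractions.

    Assumes the system is non-singular (no handling for the singular case).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        inv = 1 / m[col][col]
        m[col] = [v * inv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def observe_probabilities(q, r):
    """Solve x = Q x + r, i.e. (I - Q) x = r.

    q[i][j]: probability of moving from transient state i to transient state j.
    r[i]:    probability of moving from state i directly to the observed state.
    """
    n = len(q)
    a = [
        [(Fraction(1) if i == j else Fraction(0)) - q[i][j] for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(a, r)


# Hypothetical chain: the remaining probability mass in each row goes to a
# "masked" state in which the error disappears.
q = [[Fraction(0), Fraction(1, 2)], [Fraction(1, 4), Fraction(0)]]
r = [Fraction(1, 4), Fraction(1, 2)]
print(observe_probabilities(q, r))
```

For this toy chain the probabilities are 4/7 and 9/14. Each transient state contributes one unknown, so reducing the number of variables directly shrinks the linear system.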

System-On-Chip for Biologically Inspired Vision Applications

Sungho Park, Ahmed Al Maashri, Kevin M. Irick, Aarti Chandrashekhar, M ...

Article type: Invited Paper
Subject area: System-Level Design
2012 Volume 5 Pages 71-95
Published: 2012
Released on J-STAGE: 2012/08/06

DOI: https://doi.org/10.2197/ipsjtsldm.5.71

Journal, free access

Neuromorphic vision algorithms are biologically inspired computational models of the primate visual pathway. They promise robustness, high accuracy, and high energy efficiency in advanced image processing applications. Despite these potential benefits, realizations of neuromorphic algorithms typically exhibit low performance even when executed on multi-core CPU and GPU platforms. This is due to the disparity between the computational modalities prominent in these algorithms and those most exploited in contemporary computer architectures. In essence, acceleration of neuromorphic algorithms requires adherence to specific computational and communicational requirements. This paper discusses these requirements and proposes a framework for mapping neuromorphic vision applications onto a System-on-Chip (SoC). Neuromorphic object detection and recognition on a multi-FPGA platform is presented, with performance and power efficiency comparisons to CMP and GPU implementations.

A Fast Weighted Adder by Reducing Partial Product for Reconstruction in Super-Resolution

Hiromine Yoshihara, Masao Yanagisawa, Nozomu Togawa

Article type: Regular Paper
Subject area: Arithmetic Design
2012 Volume 5 Pages 96-105
Published: 2012
Released on J-STAGE: 2012/08/06

DOI: https://doi.org/10.2197/ipsjtsldm.5.96

Journal, free access

In recent years, there has been a growing need to convert conventional low-resolution images to high-resolution ones at low cost. Super-resolution is a technique to remove the noise of observed images and restore their high frequencies. We focus on reconstruction-based super-resolution. Reconstruction has a large computation cost since it requires many images. In this paper, we propose a fast weighted adder for reconstruction-based super-resolution. From the viewpoint of reducing partial products, we propose two approaches to speed up a weighted adder. First, we use selector logic to halve its partial products. Second, we propose a weights-range limit method utilizing a negative term. By applying our proposed approaches to a weighted adder, we can reduce carry propagations and implement our weighted adder as a faster circuit than conventional ones. Experimental evaluations demonstrate that our weighted adder reduces its delay time by a maximum of 25.29% and its area to a maximum of 1/3, compared to conventional implementations.

Energy-efficient High-level Synthesis for HDR Architectures

Shin-ya Abe, Masao Yanagisawa, Nozomu Togawa

Article type: Regular Paper
Subject area: Low-Power Behavioral Synthesis
2012 Volume 5 Pages 106-117
Published: 2012
Released on J-STAGE: 2012/08/06

DOI: https://doi.org/10.2197/ipsjtsldm.5.106

Journal, free access

As battery runtime and overheating problems for portable devices can no longer be ignored, energy-aware LSI design is strongly required. Moreover, interconnection delay should be explicitly considered because it exceeds gate delay as semiconductor devices are downsized. We must take account of energy efficiency and interconnection delays even in high-level synthesis. In this paper, we first propose a huddle-based distributed-register architecture (HDR architecture), an island-based distributed-register architecture for multi-cycle interconnect communications in which we can develop several energy-saving techniques. Next, we propose an energy-efficient high-level synthesis algorithm for HDR architectures focusing on multiple supply voltages. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each composed of functional units, registers, a controller, and level converters, are naturally generated using floorplanning results. By assigning a high supply voltage to critical huddles and a low supply voltage to non-critical huddles, we finally obtain energy-efficient floorplan-aware high-level synthesis. Experimental results show that our algorithm achieves 45% energy savings compared with conventional distributed-register architectures and conventional algorithms.
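
The benefit of giving only critical huddles the high supply voltage can be sketched with the standard first-order model in which dynamic switching energy scales with C·V². The huddle capacitances, the voltages, and the function name below are hypothetical values chosen for illustration. The sketch ignores level-converter overhead, leakage, and the delay impact of the lower voltage, and its numbers are unrelated to the paper's reported results.

```python
def dynamic_energy(huddles, v_high, v_low):
    """Sum C * V^2 over huddles (arbitrary units).

    huddles: list of (switched_capacitance, is_critical) pairs.
    Critical huddles run at v_high, all others at v_low.
    """
    return sum(
        cap * (v_high if critical else v_low) ** 2
        for cap, critical in huddles
    )


# Hypothetical design: one critical huddle and two non-critical ones.
toy_huddles = [(4.0, True), (1.0, False), (1.0, False)]

single_vdd = dynamic_energy(toy_huddles, v_high=1.0, v_low=1.0)
dual_vdd = dynamic_energy(toy_huddles, v_high=1.0, v_low=0.5)
print(single_vdd, dual_vdd)
```

Because energy falls with the square of the voltage, lowering the supply of non-critical huddles reduces their contribution sharply while the critical huddle keeps its speed.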

Optimized Communication and Synchronization for Embedded Multiprocessors Using ASIP Methodology

Hao Xiao, Tsuyoshi Isshiki, Dongju Li, Hiroaki Kunieda, Yuko Nakase, S ...

Article type: Regular Paper
Subject area: Architectural Design
2012 Volume 5 Pages 118-132
Published: 2012
Released on J-STAGE: 2012/08/06

DOI: https://doi.org/10.2197/ipsjtsldm.5.118

Journal, free access

Inter-processor communication and synchronization are critical problems in embedded multiprocessors. In order to achieve high-speed communication and low-latency synchronization, most recent designs employ dedicated hardware engines to support these communication protocols individually, which is complex, inflexible, and error-prone. Thus, this paper motivates the optimization of inter-processor communication and synchronization by using application-specific instruction-set processor (ASIP) techniques. The proposed communication mechanism is based on a set of custom instructions coupled with a low-latency on-chip network, which provides efficient support for both data transfer and process synchronization. By using state-of-the-art ASIP design methodology, we embed the communication functionalities into a base processor, giving the proposed mechanism ultra-low overhead. More importantly, industry-standard compatible programming interfaces supporting both message-passing and shared-memory paradigms are exposed to end users to ease software porting. Experimental results show that the bandwidth of the proposed message-passing protocol can reach up to 703 Mbyte/s at 200 MHz, and the latency of the proposed synchronization protocol can be reduced by more than 81% when compared with the conventional approach. Moreover, as a case study, we also show the effectiveness of the proposed communication mechanism in a real-life embedded application, WiMedia UWB MAC.

Efficient Algorithms for Extracting Pareto-optimal Hardware Configurations in DEPS Framework

Hirotaka Kawashima, Gang Zeng, Hideki Takase, Masato Edahiro, Hiroaki ...

Article type: Regular Paper
Subject area: System-Level Energy Optimization
2012 Volume 5 Pages 133-142
Published: 2012
Released on J-STAGE: 2012/08/06

DOI: https://doi.org/10.2197/ipsjtsldm.5.133

Journal, free access

A dynamic energy performance scaling (DEPS) framework has been proposed as a generalization of dynamic voltage frequency scaling (DVFS). The DEPS framework selects an energy-optimal hardware configuration at runtime. To reduce runtime overhead, Pareto-optimal combinations of hardware configurations should be provided via DEPS profiling during the design phase. The challenge of DEPS profiling lies in extracting the Pareto-optimal combinations efficiently from the exponential search space. We propose two exact algorithms to reduce the number of calculations in DEPS profiling. These algorithms can be used with common search algorithms. We also propose a heuristic algorithm for searching Pareto-optimal configurations efficiently. Extensive experiments are performed, and they demonstrate that the proposed algorithms can complete DEPS profiling within a reasonable amount of time and generate optimal DEPS profiles. It is believed that the proposed algorithms will enable easy application of the DEPS framework in practice.
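
For readers unfamiliar with the term, the sketch below shows what extracting Pareto-optimal points means when each hardware configuration is scored by execution time and energy, with lower being better for both. It uses a simple sort-and-sweep filter. The configuration names, the numbers, and the function name are hypothetical, and this is not one of the exact or heuristic algorithms proposed in the paper, which address an exponential space of configuration combinations.

```python
def pareto_front(configs):
    """Keep the configurations not dominated in (time, energy).

    configs: list of (name, time, energy) tuples; lower is better for both.
    After sorting by time, a point survives only if its energy is strictly
    lower than every earlier point, so exact duplicates are kept once.
    """
    front = []
    best_energy = float("inf")
    for name, time, energy in sorted(configs, key=lambda c: (c[1], c[2])):
        if energy < best_energy:
            front.append((name, time, energy))
            best_energy = energy
    return front


# Hypothetical measurements for five hardware configurations.
toy_configs = [
    ("A", 10, 50),
    ("B", 12, 40),
    ("C", 12, 45),
    ("D", 15, 42),
    ("E", 20, 30),
]
print([name for name, _, _ in pareto_front(toy_configs)])
```

Here C and D are dropped because B is at least as fast and uses less energy than either of them, leaving A, B, and E as the trade-off points a runtime selector would need to consider.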

NBTI-Induced Delay Degradation Analysis of FPGA Routing Structures

Michitarou Yabuuchi, Kazutoshi Kobayashi

Article type: Regular Paper
Subject area: Logic-Level Performance Analysis
2012 Volume 5 Pages 143-149
Published: 2012
Released on J-STAGE: 2012/08/06

DOI: https://doi.org/10.2197/ipsjtsldm.5.143

Journal, free access

Reliability issues, such as soft errors, process variations, and Negative Bias Temperature Instability (NBTI), become dominant in Field Programmable Gate Arrays (FPGAs) fabricated in a nanometer process. We focus on aging degradation by NBTI, which causes threshold voltage shifts in PMOS transistors. We characterize delay degradation in the routing structures of FPGAs. The rising and falling delays vary due to NBTI and depend heavily on circuit configurations. In an independent routing switch, the delay fluctuation due to NBTI can be minimized by transistor sizing. The falling delay does not change after 10 years of degradation. In routing structures composed of routing switches and wires, the delay fluctuation depends on the wire length and can be minimized by optimizing the wire length. We also show that signal flipping can reduce the delay degradation from 11.3% to 2.76% on the routing resources.
