IPSJ Transactions on System and LSI Design Methodology

Message from the Editor-in-Chief

Hidetoshi Onodera

Article type: Editorial
Subject area: Editorial
2010 Volume 3 Pages 1
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.1

JOURNAL FREE ACCESS

Advances and Challenges in 3D Physical Design

Jason Cong, Guojie Luo

Article type: Invited Paper
Subject area: Physical Synthesis
2010 Volume 3 Pages 2-18
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.2

JOURNAL FREE ACCESS

The task of 3D physical design is to map a circuit from a netlist (structural) representation into a geometric (physical) representation according to a specific 3D IC technology with multiple active device layers. This paper discusses the recent progress made on the major steps in 3D physical design, including 3D floorplanning, 3D placement, 3D routing and thermal through-silicon via (TS via) planning, and outlines the challenges ahead.

Recent Advances in Analog, Mixed-Signal, and RF Testing

Kwang-Ting (Tim) Cheng, Hsiu-Ming (Sherman) Chang

Article type: Invited Paper
Subject area: Analog Testing
2010 Volume 3 Pages 19-46
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.19

JOURNAL FREE ACCESS

Due to the lack of widely applicable fault models, testing for analog, mixed-signal (AMS), and radio frequency (RF) circuits has been, and will continue to be, primarily based on checking their conformance to the specifications. However, with the higher level of integration and increased diversity of specifications for measurement, specification-based testing is becoming increasingly difficult and costly. As a result, design for testability (DfT), combined with automatic test stimuli generation, has gradually become a necessity to ensure test quality at an affordable cost. This paper provides an overview of cost-effective test techniques that either enhance circuit testability or enable built-in self-test (BIST) for integrated AMS/RF frontends. In addition, we introduce several low-cost testing paradigms, including loopback testing, alternate testing, and digitally-assisted testing, that offer the promise of significant test cost reduction with little or even no compromise in test quality. Moving forward, in addition to screening the defective parts, testing will play an increasingly important role in supporting other post-silicon quality assurance functions such as post-silicon validation, tuning, and in-field reliability of system chips.

Performance Evaluation of a Dynamically Switchable SIMD/MIMD Processor by Using an Image Recognition Application

Shohei Nomoto, Shorin Kyo, Shinichiro Okazaki

Article type: Regular Paper
Subject area: Architectural Performance Analysis
2010 Volume 3 Pages 47-56
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.47

JOURNAL FREE ACCESS

We have developed an “XC core” processor that achieves low cost, high performance, and low power consumption through the use of a highly parallel SIMD architecture (SIMD mode), and achieves high flexibility by morphing into a MIMD architecture (MIMD mode). In this paper, we evaluate the effectiveness of the MIMD mode by using a white line detection algorithm for open roads. Our evaluation shows that the algorithm can be processed in real time (less than 33 ms) by using the MIMD mode to execute the verification of white line segments, a part of the algorithm that is not suited to execution in the SIMD mode. We also show that the verification can be executed five times faster by using region of interest (ROI) transfer instructions to efficiently transfer the ROI of an image. Furthermore, we measured the execution time in the MIMD mode while varying the number of processing units (PUs) used: 2, 4, 8, 16, and 32. The measured results show that the performance improvement rate slows down when using more than 16 PUs in the MIMD mode, mainly due to insufficient parallelism in the verification process. Overall, a 10.68 times speedup was achieved by using 32 PUs in the MIMD mode, compared with using only the SIMD mode.

Custom Instruction Generation for Configurable Processors with Limited Numbers of Operands

Kenshu Seto, Masahiro Fujita

Article type: Regular Paper
Subject area: Architectural Optimization
2010 Volume 3 Pages 57-68
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.57

JOURNAL FREE ACCESS

This paper presents a novel framework for generating efficient custom instructions for common configurable processors with limited numbers of I/O ports in the register files and fixed-length instruction formats, such as RISCs. Unlike previous approaches, which generate a single custom instruction from each subgraph, our approach generates a sequence of multiple custom instructions from each subgraph by applying high-level synthesis techniques such as scheduling and binding to the subgraphs. Because of this feature, our approach can provide both of the following advantages simultaneously: (1) generation of effective custom instructions from Multiple Inputs Multiple Outputs (MIMO) subgraphs without any change in the configurable processor hardware and the instruction format, and (2) resource sharing among custom instructions. We performed synthesis, placement and routing of the automatically generated Custom Functional Units (CFUs) on an FPGA. Experimental results showed that our approach could generate custom instructions with significant speedups of 28% on average compared to a state-of-the-art framework of custom instruction generation for configurable processors with limited numbers of I/O ports in the register file and fixed-length instruction formats.

Performance Estimation with Automatic False-Path Detection for System-Level Designs

Takeshi Matsumoto, Tasuku Nishihara, Masahiro Fujita

Article type: Regular Paper
Subject area: System-Level Performance Analysis
2010 Volume 3 Pages 69-80
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.69

JOURNAL FREE ACCESS

When designing today's highly complicated systems consisting of several hardware and software modules, it is essential to estimate performance, such as worst-case or best-case execution time, in early design stages. Such estimation is essential to explore architecture and hardware/software partitioning in system-level design. A maximum execution time estimated topologically without considering false-paths is longer than the real one. In this paper, we propose a static estimation method of maximum execution time in system-level designs, considering false-paths. Also, we adopt an approximation approach in order to avoid the path explosion problem. The experimental results show that our method can provide a much smaller estimated maximum execution time than the method without considering false-paths. At the same time, the results show that the maximum execution time can be estimated to within a very small range by applying both a simulation-based method and our static method.
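
The gap between a topological estimate and one that excludes false paths can be illustrated with a minimal sketch. The toy program, its costs, and the function names below are hypothetical and are not taken from the paper; the sketch only assumes two sequential branches guarded by the same condition, so that the two expensive arms can never execute together.

```python
# Hypothetical toy program (an assumption for illustration):
#   if c:     A (cost 10)  else: B (cost 1)
#   if not c: C (cost 10)  else: D (cost 1)
# Each branch maps the value of the shared condition c to the cost of the arm taken.
BRANCHES = [
    {True: 10, False: 1},   # first branch: A when c is True, B otherwise
    {False: 10, True: 1},   # second branch: C when c is False, D otherwise
]


def topological_max_time(branches):
    """Longest path ignoring conditions: take the costlier arm of every branch."""
    return sum(max(arm.values()) for arm in branches)


def feasible_max_time(branches):
    """Longest path over consistent values of c, which excludes the false path A then C."""
    return max(sum(arm[c] for arm in branches) for c in (True, False))


print(topological_max_time(BRANCHES))  # 20
print(feasible_max_time(BRANCHES))     # 11
```

Enumerating condition values as done here does not scale to real designs, which is presumably why the abstract mentions an approximation approach for the path explosion problem.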

Globally Optimal Time-multiplexing of Inter-FPGA Connections for Multi-FPGA Prototyping Systems

Masato Inagi, Yasuhiro Takashima, Yuichi Nakamura

Article type: Regular Paper
Subject area: System-level Verification
2010 Volume 3 Pages 81-90
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.81

JOURNAL FREE ACCESS

Multi-FPGA prototyping systems are widely used to verify logic circuit designs. To implement a large circuit using such a system, the circuit is partitioned into multiple FPGAs. Subsequently, sub-circuits assigned to FPGAs are connected using interconnection resources among the FPGAs. Because of limited resources, time-multiplexed I/Os are used to accommodate all signals in exchange for system speed. In this study, we propose an optimization method of inter-FPGA connections for multi-FPGA systems with time-multiplexed I/Os to shorten the verification time by accelerating the systems. Our method decides whether each inter-FPGA signal is transferred by a normal I/O or a time-multiplexed I/O, which is slower than a normal I/O but can transfer multiple signals. Our method optimizes inter-FPGA connections not only between a single FPGA pair, but among all the FPGAs. Experiments showed that for four-way partitioned circuits, our method obtains an average system clock period 16.0% shorter than that of a conventional method.

High-level Synthesis Challenges for Mapping a Complete Program on a Dynamically Reconfigurable Processor

Takao Toi, Noritsugu Nakamura, Yoshinosuke Kato, Toru Awashima, Kazuto ...

Article type: Regular Paper
Subject area: Behavioral Synthesis
2010 Volume 3 Pages 91-104
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.91

JOURNAL FREE ACCESS

This paper presents a high-level synthesizer to map a complete program efficiently on a dynamically reconfigurable processor (DRP). Initially, we introduce our DRP architecture, which is suitable for control-intensive programs since it has a stand-alone finite state machine that switches “contexts” consisting of many processing elements (PEs). Then, we propose three new techniques optimized for our DRP. Firstly, we explain how synthesized control steps are mapped onto the contexts. Several control steps are combined into a context to utilize PEs efficiently, since the control steps do not all require the same number of operational units. Secondly, we describe a modulo scheduling algorithm for loop pipelining, considering both spatial and time dimensions of our DRP. Lastly, we explain a scheduling technique to optimize clock frequency, which can take multiplexer, wire and routing switch delays into account. We have demonstrated a JPEG-based image decoder example to evaluate our methods. Experimental results show that high area efficiency is achieved by balancing the number of PEs between contexts. With pipelining, overall performance increased to 3.6 times that without pipelining, while the number of operational units increased by a factor of 2.2. The clock frequency is maximized with accurate delay estimation.

Approximate Invariant Property Checking Using Term-Height Reduction for a Subset of First-Order Logic

Hiroaki Shimizu, Kiyoharu Hamaguchi, Toshinobu Kashiwabara

Article type: Regular Paper
Subject area: Behavioral Formal Verification
2010 Volume 3 Pages 105-117
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.105

JOURNAL FREE ACCESS

The use of a subset of first-order logic, called EUF, in model checking can be an effective abstraction technique for verifying larger and more complicated systems. The EUF model checking problem is, however, undecidable. In this paper, in order to guarantee the termination of state enumeration in the EUF-based model checking, we introduce a technique called term-height reduction. This technique is used to generate a finitely represented over-approximate set of states including all the reachable states. By checking a specified invariant property for this over-approximate set of states, we can safely assure that the invariant property always holds for the design, when verification succeeds. We also show some experimental results for a simple C program and a DSP design.

Programmable Architectures and Design Methods for Two-Variable Numeric Function Generators

Shinobu Nagayama, Tsutomu Sasao, Jon T. Butler

Article type: Regular Paper
Subject area: Arithmetic Circuit Synthesis
2010 Volume 3 Pages 118-129
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.118

JOURNAL FREE ACCESS

This paper proposes programmable architectures and design methods for numeric function generators (NFGs) of two-variable functions. To realize a two-variable function in hardware, we partition a given domain of the function into segments, and approximate the function by a polynomial in each segment. This paper introduces two planar segmentation algorithms that efficiently partition a domain of a two-variable function. This paper also introduces a design method for symmetric two-variable functions (i.e., f(X, Y) = f(Y, X)). This method can reduce the memory size needed for symmetric functions by nearly half with a small speed penalty. The proposed architectures allow a systematic design of various two-variable functions. We compare our approach with one based on a one-variable NFG. FPGA implementation results show that, for a complicated function, our NFG requires 57% of the memory size and 60% of the delay time of a circuit designed based on a one-variable NFG.
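
Why symmetry saves nearly half the memory can be seen in a minimal sketch. It is not the paper's architecture: it assumes a plain lookup table over an n-by-n integer grid instead of per-segment polynomial coefficients, and the names `build_symmetric_table` and `lookup` are hypothetical. Only entries with x <= y are stored, and the operands are swapped when x > y.

```python
def build_symmetric_table(f, n):
    """Store f(x, y) only for 0 <= x <= y < n (upper triangle, row by row)."""
    return [f(x, y) for x in range(n) for y in range(x, n)]


def lookup(table, n, x, y):
    """Read f(x, y) from the triangular table, using f(x, y) = f(y, x)."""
    if x > y:
        x, y = y, x  # the operand swap is the small extra step on the access path
    # Rows 0..x-1 hold n, n-1, ..., n-x+1 entries, i.e. x*n - x*(x-1)//2 in total.
    index = x * n - x * (x - 1) // 2 + (y - x)
    return table[index]


n = 8
f = lambda x, y: x * x + x * y + y * y  # an arbitrary symmetric function
table = build_symmetric_table(f, n)
print(len(table), n * n)  # 36 64
```

The table holds n(n+1)/2 entries instead of n^2, which approaches one half as n grows. The swap before the lookup is one plausible reading of the small speed penalty the abstract mentions.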

Effect of Regularity-Enhanced Layout on Variability and Circuit Performance of Standard Cells

Hiroki Sunagawa, Haruhiko Terada, Akira Tsuchiya, Kazutoshi Kobayashi, ...

Article type: Regular Paper
Subject area: Physical-Level Yield Optimization
2010 Volume 3 Pages 130-139
Published: 2010
Released on J-STAGE: February 15, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.130

JOURNAL FREE ACCESS

As the minimum feature size shrinks far below sub-wavelength, Design for Manufacturability, or layout regularity, plays an important role in maintaining pattern fidelity in photolithography. However, it also incurs overheads in circuit performance due to parasitic capacitance. In this paper, we examine the effect of layout regularity on printability and circuit performance by lithography simulation and transistor-level simulation. It is shown that regularity-enhanced cells provide better Critical Dimension (CD) stability under defocus but lead to a delay increase. Then we evaluate the effect of layout regularity by real chip measurements in 90 nm, 65 nm and 45 nm processes. For example, in a 65 nm process, inverter Ring Oscillators (ROs) that have the smallest poly pitch with dummy-poly insertion exhibit a 19% reduction of WID and D2D variation with a delay overhead of 2.5%, compared to the ROs without dummy-poly insertion. However, we have observed that the effect of layout regularity varies depending on fabrication processes and circuit structures. It is therefore important to obtain the best trade-off between performance overhead and variability reduction for each process technology.

Design and Run-time Reliability at the Electronic System Level

Björn Sander, Andreas Bernauer, Wolfgang Rosenstiel

Article type: Invited Paper
Subject area: System-Level Reliability Optimization
2010 Volume 3 Pages 140-160
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.140

JOURNAL FREE ACCESS

The ongoing scaling of CMOS technology facilitates the design of systems with continuously increasing functionality but also raises the susceptibility of these systems to reliability issues. These can be caused, for example, by high power densities and temperatures. At the moment, it is still possible to cope with these challenges in an affordable manner, but in the future, a combination of design and run-time measures will become necessary in order to guarantee that reliability guidelines are met. For reasons of complexity, the Electronic System Level (ESL) is gaining importance as the starting point of design. Design alternatives are evaluated at ESL with respect to several design objectives, lately also including reliability. In this paper, the most important phenomena threatening reliability are introduced and the current status of related research work and tools is presented. After that, a high-level design space exploration considering performance, energy and reliability trade-offs in multi-core systems is introduced. Finally, it is shown how reliability can be further improved during run-time by the application of a machine learning system.

Advantage and Possibility of Application-domain Specific Instruction-set Processor (ASIP)

Masaharu Imai, Yoshinori Takeuchi, Keishi Sakanushi, Nagisa Ishiura

Article type: Invited Paper
Subject area: Architectural Synthesis
2010 Volume 3 Pages 161-178
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.161

JOURNAL FREE ACCESS

This paper introduces the concept and technology of the Application-domain Specific Instruction-set Processor (ASIP). First, the VLSI design trend over the decades is reviewed, and processors are shown to be expected to become one of the main components in System Level Design. Then, the advantage of ASIP over the General Purpose Processor (GPP) and the Application Specific Integrated Circuit (ASIC) is illustrated. Next, processor hardware description synthesis technology, application program development tool set generation technology, and processor architecture optimization technology are outlined. Then, as an example of an ASIP development environment, ASIP Meister is explained. Next, an application of ASIP to the medical and healthcare field is introduced. Finally, the possibility of ASIP as an important component of Multi Processor SoC (MPSoC) is discussed.

Efficient Design Space Exploration at System Level with Automatic Profiler Instrumentation

Seiya Shibata, Yuki Ando, Shinya Honda, Hiroyuki Tomiyama, Hiroaki Tak ...

Article type: Regular Paper
Subject area: System-Level Performance Analysis
2010 Volume 3 Pages 179-193
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.179

JOURNAL FREE ACCESS

As the complexity of embedded systems grows, design space exploration at the system level plays a more important role than before. In system-level design, system designers start by describing the functionalities of the system as processes and channels, and then decide their mapping to various Processing Elements (PEs), including processors and dedicated hardware modules. A mapping decision is evaluated by simulation or FPGA-based prototyping. Designers iterate mapping and evaluation until all design requirements are met. We have developed two profilers, a process profiler and a memory profiler, for FPGA-based performance analysis of design candidates. The process profiler records a trace of process activations, while the memory profiler records a trace of channel accesses. According to the mapping of processes to PEs, the profilers are automatically configured and instrumented into FPGA-based system prototypes by a system-level design tool that we have developed. Designers therefore need to manually modify neither the system description nor the profilers upon each change of process mapping. In order to demonstrate the effectiveness of our profilers, two case studies are conducted in which the profiles are used for design space exploration of AES encryption and MPEG4 decoding systems.

A Unified Performance Estimation Method for Hardware and Software Components in Multiprocessor System-On-Chips

Arif Ullah Khan, Tsuyoshi Isshiki, Dongju Li, Hiroaki Kunieda

Article type: Regular Paper
Subject area: System-Level Performance Analysis
2010 Volume 3 Pages 194-206
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.194

JOURNAL FREE ACCESS

With the growing complexity of consumer embedded products and the improvements in process technology, multiprocessor system-on-chip (MPSoC) architectures have become widespread. These MPSoCs include not only multiple processors but also multiple dedicated hardware accelerators that can be designed from software programs, written in high-level languages like C, using high-level synthesis (HLS) tools. Traditional techniques of HW/SW co-simulation are very slow and time consuming when used for exploring HW/SW partitioning strategies. There is a strong need for methodologies that quickly and accurately estimate the performance of such complex systems. In this paper, we present a system-level performance estimation method for exploring the trade-off between hardware and software implementations in such “hybrid” MPSoC architectures. The key feature of our performance estimation is the unified timing model, in the form of a program trace graph (PTG), for both software executions on processors and the hardware blocks (finite state machines) synthesized by an HLS tool. The RTL code from the HLS tool is analyzed and its state transition graph is transformed into the PTG, which was originally developed for software timing annotations. These PTGs represent the workload of the computation that is driven by program execution traces in the form of ‘Branch Bitstreams’. Our methodology allows highly accurate performance estimation in the presence of data-dependent behavior of software and hardware components.

Software Development Tool Generation Method Suitable for Instruction Set Extension of Embedded Processors

Takahiro Kumura, Soichiro Taga, Nagisa Ishiura, Yoshinori Takeuchi, Ma ...

Article type: Regular Paper
Subject area: Architectural Synthesis
2010 Volume 3 Pages 207-221
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.207

JOURNAL FREE ACCESS

This paper proposes a method of software development tool generation suitable for instruction set extension of existing embedded processors. The key idea in the proposed method is to enhance a base processor's toolchain by adding plugins, which are software components that handle additional instructions and registers. The proposed method can generate a compiler, assembler, disassembler, and instruction set simulator. Generated compilers with the plugins provide intrinsic functions that are translated directly into the new instructions. To demonstrate that the proposed method works effectively, this paper presents an experimental result of the proposed method in a study of adding SIMD instructions to the embedded microprocessor V850. In the experiment, by using intrinsic functions, the compiler generated good code with only a 7% increase in the number of instructions compared with the hand-optimized assembly code.

A Low-power ASIP Generation Method by Extracting Minimum Execution Conditions

Hirofumi Iwato, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai

Article type: Regular Paper
Subject area: Architectural Synthesis
2010 Volume 3 Pages 222-233
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.222

JOURNAL FREE ACCESS

This paper proposes a low-power ASIP generation method that automatically extracts minimum execution conditions of pipeline registers for clock gating. For highly effective power reduction by clock gating, it is important to create minimum execution conditions, which can shut off redundant clock supplies for registers. To automatically extract the conditions, our proposed method employs micro-operation descriptions (MODs) that specify the ASIP architecture. Utilizing MODs through the ASIP generation processes, our proposed method automatically extracts the minimum execution conditions. Experimental results show that the power consumption of the pipeline registers in ASIPs generated with the proposed method is reduced by about 80% compared to ASIPs that are not clock gated, and by about 60% compared to ASIPs that are clock gated by Power Compiler, with negligible delay and area overhead.

Semi-Automatic Control Unit Generation for Complex VLSI Designs

Benjamin Carrion Schafer, Majid Sarrafzadeh

Article type: Regular Paper
Subject area: Architectural Synthesis
2010 Volume 3 Pages 234-243
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.234

JOURNAL FREE ACCESS

This paper presents a semi-automated way to generate control units for complex VLSI hardware designs based on a massively parallel micro-controller. This micro-controller can execute as many instructions in parallel as needed by the hardware design and has an unlimited number of input and output ports. Two versions of this control unit are presented in this paper: a generic one, which is generated from a set of parameters given by the designer, and an optimized version, which parses the control program that will run on the control unit in order to generate an optimized micro-controller. Results show that area savings of up to 60% can be achieved using the optimized controller unit instead of the generic one. The presented controller was validated using a previously developed SoC design with an FSM-based control unit, showing that the functionality can be completely replicated at the expense of incurring area overheads of 7.2% and 15.4%, respectively.

Automatic Pipeline Construction Focused on Similarity of Rate Law Functions for an FPGA-based Biochemical Simulator

Hideki Yamada, Yui Ogawa, Tomonori Ooya, Tomoya Ishimori, Yasunori Osa ...

Article type: Regular Paper
Subject area: Behavioral Area Optimization
2010 Volume 3 Pages 244-256
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.244

JOURNAL FREE ACCESS

For FPGA-based scientific simulation systems, a hardware design technique that can reduce the required amount of hardware resources is a key issue, since the size of the simulation target is often limited by the size of the FPGA. Focusing on FPGA-based biochemical simulation, this paper proposes a hardware design methodology that finds and combines common datapaths for similar rate law functions appearing in simulation target models, so as to generate area-effective pipelined hardware modules. In addition, similarity-based clustering techniques for rate law functions are presented in order to alleviate negative effects on the performance of combined pipelines. Empirical evaluation with a practical biochemical model reveals that our method enables the simulation with 66% of the original hardware resources at a reasonable cost of 20% performance overhead.

A Resource Binding Method to Reduce Data Communication Power Dissipation on LSI

Hidekazu Seto, Kazuhito Ito

Article type: Regular Paper
Subject area: Behavioral Low-Power Synthesis
2010 Volume 3 Pages 257-267
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.257

JOURNAL FREE ACCESS

The energy dissipated by data communications on an LSI chip depends on the layout of modules as well as on how data are communicated among modules. The requirements of data communications are determined by the schedule of computations and by the resource binding of computations to functional units and of data to registers. In this paper, a resource binding method is proposed to derive a binding that makes it possible to obtain a floorplan with reduced energy dissipation by data communications.

Approximate Model Checking Using a Subset of First-order Logic

Kiyoharu Hamaguchi, Kazuya Masuda, Toshinobu Kashiwabara

Article type: Regular Paper
Subject area: Formal Logic Verification
2010 Volume 3 Pages 268-282
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.268

JOURNAL FREE ACCESS

In order to reduce the computational complexity of model checking, we can use a subset of first-order logic, called EUF, but the model checking problem using EUF is generally undecidable. In our previous work, we proposed a technique for checking an invariant property for an over-approximate set of states including all the reachable states. In this paper, we extend this technique to handle not only invariants but also temporal properties written in computation tree logic with an EUF extension. We show that model checking becomes possible for designs that are hard to handle without the proposed technique.

On Delay Test Quality for Test Cubes

Shinji Oku, Seiji Kajihara, Yasuo Sato, Kohei Miyase, Xiaoqing Wen

Article type: Regular Paper
Subject area: Delay Testing
2010 Volume 3 Pages 283-291
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.283

JOURNAL FREE ACCESS

This paper proposes a method to compute delay values in 3-valued fault simulation for test cubes, which are test patterns with unspecified values (Xs). Because the detectable delay size of each fault by a test cube is not fixed before logic values are assigned to the Xs in the test cube, the proposed method computes only a range of the detectable delay values of the test patterns covered by the test cubes. By using the proposed method, we derive the lowest and the highest test quality of the test patterns covered by the test cubes. Furthermore, we also propose a GA (genetic algorithm)-based method to generate fully specified test patterns with high test quality from test cubes. Experimental results for benchmark circuits show the effectiveness of the proposed methods.
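
The two notions the abstract relies on, 3-valued logic and the set of patterns a test cube covers, can be sketched as follows. This covers only standard 0/1/X gate evaluation and cube expansion, not the paper's delay-range computation; the names `and3`, `or3`, `not3`, and `covered_patterns` are hypothetical.

```python
from itertools import product

X = "X"  # the unspecified value


def and3(a, b):
    """3-valued AND: a controlling 0 decides the output even if the other input is X."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def or3(a, b):
    """3-valued OR: a controlling 1 decides the output even if the other input is X."""
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return X


def not3(a):
    """3-valued NOT: X stays X."""
    return X if a == X else 1 - a


def covered_patterns(cube):
    """All fully specified test patterns obtained by assigning 0 or 1 to every X."""
    slots = [(0, 1) if v == X else (v,) for v in cube]
    return list(product(*slots))


print(and3(0, X), and3(1, X), or3(1, X), not3(X))  # 0 X 1 X
print(covered_patterns((1, X, 0, X)))
# [(1, 0, 0, 0), (1, 0, 0, 1), (1, 1, 0, 0), (1, 1, 0, 1)]
```

A cube with k unspecified bits covers 2^k patterns, so a property such as detectable delay size can only be bounded by a range until the Xs are filled.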

A High Parallelism LDPC Decoder with an Early Stopping Criterion for WiMax and WiFi Application

Zhixiang Chen, Xiongxin Zhao, Xiao Peng, Dajiang Zhou, Satoshi Goto

Article type: Regular Paper
Subject area: Architectural Design
2010 Volume 3 Pages 292-302
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.292

JOURNAL FREE ACCESS

In this paper, we propose a synthesizable LDPC decoder IP core for WiMax and WiFi applications. Two new techniques are applied in the proposed decoder to improve the decoding performance. Firstly, a high parallelism permutation network (PN) is proposed to perform the circulant shift according to the parity check matrix (PCM) defined in the WiMax and WiFi standards. By using the proposed PN, at most four independent code frames with small code length are decoded concurrently, which largely improves the decoding throughput (2-4 times). Secondly, a fast early stopping criterion specialized for WiMax and WiFi LDPC codes is proposed to reduce the average number of iterations. Unlike earlier work, with our proposed stopping criterion the decoding is stopped when all the information bits of a code frame are corrected, even if there are still some errors in the redundant part. Experimental results show that it can reduce the number of iterations by up to 20% compared to a commonly used stopping criterion.
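
For context, the sketch below shows the conventional parity-check (syndrome) stopping test that iterative LDPC decoders commonly apply after each iteration. It is not the specialized criterion proposed in the paper, which the abstract does not spell out. The small Hamming(7,4) parity-check matrix stands in for a real WiMax or WiFi PCM, and the function name `syndrome_is_zero` is hypothetical.

```python
# Stand-in parity-check matrix (Hamming(7,4)); a real WiMax/WiFi PCM is far larger.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome_is_zero(H, hard_bits):
    """Return True when every parity check is satisfied over GF(2)."""
    return all(
        sum(h * b for h, b in zip(row, hard_bits)) % 2 == 0
        for row in H
    )


valid = [1, 1, 1, 0, 0, 0, 0]     # satisfies all three checks
corrupted = [1, 1, 1, 0, 0, 0, 1]  # last bit flipped

print(syndrome_is_zero(H, valid))      # True
print(syndrome_is_zero(H, corrupted))  # False
```

This test demands that the whole frame, redundant bits included, be error free before decoding stops. The abstract's criterion instead stops once the information bits are corrected, which is where the reported reduction in iterations comes from.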

High Profile Intra Prediction Architecture for UHD H.264 Decoder

Xun He, Dajiang Zhou, Jinjia Zhou, Satoshi Goto

Article type: Regular Paper
Subject area: Architectural Design
2010 Volume 3 Pages 303-313
Published: 2010
Released on J-STAGE: August 16, 2010

DOI: https://doi.org/10.2197/ipsjtsldm.3.303

JOURNAL FREE ACCESS

This paper presents a new architecture for high profile intra prediction in the H.264/AVC video coding standard. Our goal is to design an intra prediction engine for a 4Kx2K@60fps Ultra High Definition (UHD) decoder. The proposed architecture provides very stable throughput and can predict any H.264 intra prediction mode within 66 cycles. Compared with previous designs, this feature guarantees that the whole decoding pipeline works efficiently. The intra prediction engine is divided into two parallel pipelines: one is used for 4x4 block prediction loops and the other is used to prepare data for MB loops. It overlaps data preparation time with prediction time, so that data loading and storing finish within 2 cycles. Compared with an MB-pipeline-only architecture, it achieves more than 3.2 times higher throughput at a cost of 29.8K gates. The proposed architecture is verified to work at 175 MHz for our UHD decoder by using TSMC 90G.
