IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

Special Section on VLSI Design and CAD Algorithms

FOREWORD

Atsushi TAKAHASHI

2011 Volume E94.A Issue 12 Pages 2481
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2481

Greedy Algorithm for the On-Chip Decoupling Capacitance Optimization to Satisfy the Voltage Drop Constraint

Mikiko SODE TANAKA, Nozomu TOGAWA, Masao YANAGISAWA, Satoshi GOTO

Article type: PAPER
Subject area: Physical Level Design
2011 Volume E94.A Issue 12 Pages 2482-2489
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2482

With the progress of process technology in recent years, low-voltage power supplies have become predominant. As a result, the voltage margin has decreased, and on-chip decoupling capacitance optimization that satisfies the voltage drop constraint has become more important. In addition, reducing the on-chip decoupling capacitance area reduces the chip area and, therefore, manufacturing costs. Hence, we propose an algorithm that satisfies the voltage drop constraint while minimizing the total on-chip decoupling capacitance area. The proposed algorithm borrows an idea from network algorithms: it finds the path that has the most influence on the voltage drop and improves the voltage drop by adding on-chip capacitance to a node on that path. The algorithm is efficient because it adds on-chip capacitance where it has the greatest influence on the voltage drop. Experimental results demonstrate that, with the proposed algorithm, a real-size power/ground network can be optimized in just a few minutes, which is quite practical. Compared with the conventional algorithm, we confirmed that the total on-chip decoupling capacitance area of the power/ground network can be reduced by about 40 to 50%.

Leakage-Aware TSV-Planning with Power-Temperature-Delay Dependence in 3D ICs

Kan WANG, Sheqin DONG, Yuchun MA, Yu WANG, Xianlong HONG, Jason CONG

Article type: PAPER
Subject area: Physical Level Design
2011 Volume E94.A Issue 12 Pages 2490-2498
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2490

Due to increased power density and lower thermal conductivity, 3D ICs face serious heat dissipation and temperature problems. TSVs (Through-Silicon Vias) have been shown to be an effective aid to heat removal, but they also introduce several issues related to cost and reliability. Previous research on TSV planning paid little attention to the impact of leakage power, which introduces errors in the estimation of temperature, TSV number, and critical path delay. The leakage-temperature-delay dependence can potentially negate the performance improvement of 3D designs. In this paper, we analyze the impact of leakage power on TSV planning and integrate the leakage-temperature-delay dependence into thermal via planning of 3D ICs. A weighted via insertion approach, considering the influence on both module delay and wire delay, is proposed to achieve the best balance among temperature, via number, and performance. Experimental results show that, with leakage power and resource constraints considered, the temperature and the required via number can be quite different, and that the weighted TSV insertion approach with an iterative process can obtain a trade-off among factors including temperature, power consumption, via number, and performance.

Sleep Transistor Sizing Method Using Accurate Delay Estimation Considering Input Vector Pattern and Non-linear Current Model

Seidai TAKEDA, Kyundong KIM, Hiroshi NAKAMURA, Kimiyoshi USAMI

Article type: PAPER
Subject area: Physical Level Design
2011 Volume E94.A Issue 12 Pages 2499-2509
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2499

In the era beyond deep sub-micron, Power Gating (PG) is one of the most effective techniques for reducing the leakage power of circuits. The most important issue in PG circuit design is how to decide the width of the sleep transistor. A smaller total sleep transistor width yields smaller leakage power in standby mode; however, insufficient sleep transistor insertion causes significant performance degradation. In this paper, we present an accurate and fast gate-level delay estimation method for PG circuits and a novel sleep transistor sizing method that uses our delay estimation for module-based PG circuits. The method achieves high accuracy within acceptable computation time by using accurate discharge current estimation based on delayed logic simulations with limited input vector patterns and by modeling precise current characteristics for logic gates and sleep transistors. Experimental results show that our delay estimation achieves high accuracy and avoids the overestimation and underestimation seen in the conventional method. In addition, our sleep transistor sizing method reduces the width of sleep transistors by 40% on average compared with conventional methods, within an acceptable computation time.

Single-Layer Trunk Routing Using Minimal 45-Degree Lines

Kyosuke SHINODA, Yukihide KOHIRA, Atsushi TAKAHASHI

Article type: PAPER
Subject area: Physical Level Design
2011 Volume E94.A Issue 12 Pages 2510-2518
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2510

In recent Printed Circuit Boards (PCBs), design size and density have increased, and improved PCB routing tools are required. Several routing tools generate high-quality routing patterns when the connection requirements can be realized by horizontal and vertical segments only. However, in high-density PCBs, the connection requirements cannot be realized when only horizontal and vertical segments are used; up to one third of the nets cannot be realized if no non-orthogonal segments are used. In this paper, we introduce a routing method for a single-layer routing area that handles higher-density designs by using 45-degree segments locally to relax the routing density. In the proposed method, critical zones in which non-orthogonal segments are required to realize the connection requirements are extracted, and 45-degree segments are used only in these zones. Extracting minimal critical zones maximizes the remaining area, which can be used to improve the quality of the routing pattern without concern for connectivity. The proposed method can use, as subroutines, routing methods that generate high-quality routing patterns even if they handle only horizontal and vertical segments. Experiments show that the proposed method analyzes a routing problem properly and that the routing is realized by using 45-degree segments effectively.

Low Power Placement and Routing for the Coarse-Grained Power Gating FPGA Architecture

Ce LI, Yiping DONG, Takahiro WATANABE

Article type: PAPER
Subject area: Physical Level Design
2011 Volume E94.A Issue 12 Pages 2519-2527
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2519

Because an FPGA consumes more power than an ASIC that performs the same function at the same technology scaling, the application of FPGAs is limited, especially in portable electronic devices. In this paper, we propose a novel low-power FPGA architecture based on coarse-grained power gating to reduce power consumption. A new placement algorithm and a routing resource graph for sleep regions are also presented. After enhancing the CAD framework, we give a detailed discussion of the different region sizes supported by the new FPGA architecture. As a result, our proposed FPGA architecture combined with the new placement and routing algorithm reduces total power consumption by 19.4% compared with the traditional FPGA. With the proposed method, FPGAs are promising candidates for wide application in portable devices.

A Statistical Maximum Algorithm for Gaussian Mixture Models Considering the Cumulative Distribution Function Curve

Shuji TSUKIYAMA, Masahiro FUKUI

Article type: PAPER
Subject area: Device and Circuit Modeling and Analysis
2011 Volume E94.A Issue 12 Pages 2528-2536
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2528

Statistical static timing analysis has been studied intensively in the last decade to deal with process variability, and various techniques have been proposed to represent distributions of timing information such as gate delay, signal arrival time, and slack. Among them, the Gaussian mixture model is distinguished from the others in that it can easily handle various correlations, non-Gaussian distributions, and slew distributions. However, the previous algorithm for computing the statistical maximum of Gaussian mixture models, which is one of the key operations in statistical static timing analysis, has a defect: in certain cases it produces a distribution similar to a Gaussian although the correct distribution is far from Gaussian. In this paper, we propose a new algorithm for the statistical maximum (minimum) operation on Gaussian mixture models. It takes the cumulative distribution function curve into consideration so as to compute accurate criticalities (probabilities of timing violation), which are important for detecting delay faults and for circuit optimization using statistical approaches. We also show experimental results that evaluate the performance of the proposed method.
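
As background on why the shape of the CDF matters, the following minimal sketch (not the authors' algorithm) evaluates the exact CDF of the maximum of two arrival times under the simplifying assumption that they are statistically independent, in which case the CDF of the maximum is the product of the individual CDFs. The function names, the (weight, mean, standard deviation) tuple layout, and the example numbers are illustrative assumptions.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def mixture_cdf(x, components):
    """CDF of a Gaussian mixture; components is a list of
    (weight, mean, std) tuples whose weights sum to 1."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in components)

def max_cdf(x, mix_a, mix_b):
    """CDF of max(A, B), assuming A and B are independent (an assumption
    of this sketch, not of the paper)."""
    return mixture_cdf(x, mix_a) * mixture_cdf(x, mix_b)

def criticality(deadline, mix_a, mix_b):
    """Probability that max(A, B) exceeds the deadline."""
    return 1.0 - max_cdf(deadline, mix_a, mix_b)

# Hypothetical arrival-time distributions (arbitrary time units).
mix_a = [(0.5, 1.0, 0.1), (0.5, 2.0, 0.1)]   # bimodal, far from Gaussian
mix_b = [(1.0, 1.5, 0.1)]                    # single Gaussian
print(criticality(1.8, mix_a, mix_b))
```

A single-Gaussian approximation of such a bimodal maximum can match the mean and variance and still misjudge the tail probability at a given deadline, which is the kind of error the abstract attributes to the previous algorithm.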

Extracting Device-Parameter Variations with RO-Based Sensors

Ken-ichi SHINKAI, Masanori HASHIMOTO, Takao ONOYE

Article type: PAPER
Subject area: Device and Circuit Modeling and Analysis
2011 Volume E94.A Issue 12 Pages 2537-2544
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2537

Device-parameter estimation sensors inside a chip are gaining importance as post-fabrication tuning becomes practical. When estimating variational parameters with on-chip sensors, it is often assumed that the outputs of variation sensors are not affected by random variations. However, random variations can deteriorate the accuracy of the estimation result. In this paper, we propose a device-parameter estimation method with on-chip variation sensors that explicitly considers random variability. The proposed method derives the global variation parameters and the standard deviation of the random variability using maximum likelihood estimation. We experimentally verified that the proposed method improves the accuracy of device-parameter estimation by 11.1 to 73.4% compared with the conventional method that neglects random variations.
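
To illustrate the estimation step in its simplest form, the sketch below assumes a single global parameter observed through several identical sensors whose readings equal the global value plus independent zero-mean Gaussian noise. Under that assumption the maximum likelihood estimates have a closed form: the sample mean and the 1/n-normalised standard deviation. This is a deliberately simplified model with hypothetical names, not the sensor model or the estimator used in the paper.

```python
import math

def mle_global_and_sigma(readings):
    """Maximum likelihood estimates for readings x_i = g + e_i,
    with e_i ~ N(0, sigma^2) independent (assumed model).

    Returns (g_hat, sigma_hat).
    """
    n = len(readings)
    if n == 0:
        raise ValueError("at least one reading is required")
    g_hat = sum(readings) / n
    # The MLE of the variance divides by n, not by n - 1.
    var_hat = sum((x - g_hat) ** 2 for x in readings) / n
    return g_hat, math.sqrt(var_hat)

# Hypothetical sensor readings in arbitrary units.
g_hat, sigma_hat = mle_global_and_sigma([1.0, 2.0, 3.0])
print(g_hat, sigma_hat)
```

Estimating the noise spread jointly with the global value, rather than treating each reading as noise-free, is the point the abstract makes about explicitly considering random variability.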

Stress Probability Computation for Estimating NBTI-Induced Delay Degradation

Hiroaki KONOURA, Yukio MITSUYAMA, Masanori HASHIMOTO, Takao ONOYE

Article type: PAPER
Subject area: Device and Circuit Modeling and Analysis
2011 Volume E94.A Issue 12 Pages 2545-2553
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2545

PMOS stress (ON) probability has a strong impact on circuit timing degradation due to the NBTI effect. This paper evaluates how the granularity of stress probability calculation affects NBTI prediction using a state-of-the-art long-term prediction model. Experimental evaluations show that the stress probability should be estimated at the transistor level to accurately predict the increase in delay, especially when the circuit operation and/or inputs are highly biased. We then devise and evaluate two methods of annotating stress probability to gate-level timing analysis: one guarantees the pessimism desirable for timing analysis, and the other aims to obtain a result close to that of transistor-level timing analysis. Experimental results show that gate-level timing analysis with transistor-level stress probability calculation estimates the increase in delay with 12.6% error.

A 65-nm CMOS Fully Integrated Shock-Wave Antenna Array with On-Chip Jitter and Pulse-Delay Adjustment for Millimeter-Wave Active Imaging Application

Nguyen Ngoc MAI KHANH, Masahiro SASAKI, Kunihiro ASADA

Article type: PAPER
Subject area: Device and Circuit Modeling and Analysis
2011 Volume E94.A Issue 12 Pages 2554-2562
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2554

This paper presents a 65-nm CMOS 8-antenna array transmitter operating in the 117-130-GHz range for short-range and portable millimeter-wave (mm-wave) active imaging applications. Each antenna element is a new on-chip antenna located on the top metal. Through an on-chip transformer, the pulse output of each resistor-less mm-wave pulse generator (PG) is sent to its integrated antenna. To adjust pulse delays for pulse beam-forming, a 7-bit digitally programmable delay circuit (DPDC) is added to each PG. Moreover, in order to dynamically adjust pulse delays among the eight SW outputs, we implemented an on-chip jitter and relative skew measuring circuit with a 20-bit digital output to obtain cumulative distribution (CDF) and probability density (PDF) functions, from which the DPDC input codes are decided to align the eight antennas' output pulses. Two measured radiation peaks after relative skew alignment are obtained at (θ; φ) angles of (-56°; 0°) and (+57°; 0°). Measurement results show that the beam-forming angles of the fully integrated antenna array can be adjusted by digital input codes and by the on-chip skew adjustment circuit for active imaging applications.

Flexible Test Scheduling for an Asynchronous On-Chip Interconnect through Special Data Transfer

Tsuyoshi IWAGAKI, Eiri TAKEDA, Mineo KANEKO

Article type: PAPER
Subject area: Logic Synthesis, Test and Verification
2011 Volume E94.A Issue 12 Pages 2563-2570
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2563

This paper proposes a test scheduling method for stuck-at faults in a CHAIN interconnect, which is an asynchronous on-chip interconnect architecture, with scan ability. Special data transfer, which is permitted only during test, is exploited to realize a more flexible test schedule than that of a conventional approach. Integer linear programming (ILP) models that take such special data transfer into account are developed according to the types of modules under test in a CHAIN interconnect. The obtained models are processed by an ILP solver. This framework can not only obtain optimal test schedules but also easily accommodate additional constraints such as a test power budget. Experimental results using benchmark circuits show that the proposed method can reduce test application time compared with the conventional method.

Hybrid Test Application in Partial Skewed-Load Scan Design

Yuki YOSHIKAWA, Tomomi NUWA, Hideyuki ICHIHARA, Tomoo INOUE

Article type: PAPER
Subject area: Logic Synthesis, Test and Verification
2011 Volume E94.A Issue 12 Pages 2571-2578
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2571

In this paper, we propose a hybrid test application in partial skewed-load (PSL) scan design. The PSL scan design in which some flip-flops (FFs) are controlled as skewed-load FFs and the others are controlled as broad-side FFs was proposed in [1]. We notice that the PSL scan design potentially has a capability of two test application modes: one is the broad-side test mode, and the other is the hybrid test mode which corresponds to the test application considered in [1]. According to this observation, we present a hybrid test application of the two test modes in the PSL scan design. In addition, we also address a way of skewed-load FF selection based on propagation dominance of FFs in order to take advantage of the hybrid test application. Experimental results for ITC'99 benchmark circuits show that the hybrid test application in the proposed PSL scan design can achieve higher fault coverage than the design based on the skewed-load FF selection [1] does.

Multi-Operand Adder Synthesis Targeting FPGAs

Taeko MATSUNAGA, Shinji KIMURA, Yusuke MATSUNAGA

Article type: PAPER
Subject area: Logic Synthesis, Test and Verification
2011 Volume E94.A Issue 12 Pages 2579-2586
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2579

A multi-operand adder, which calculates the sum of more than two operands, usually consists, in ASIC implementations, of a compressor tree that reduces the number of operands to two without any carry propagation and a carry-propagate adder for the two remaining operands. The former part is usually realized using full adders, or (3;2) counters, as in Wallace trees in ASICs, while adder trees or dedicated hardware are used in FPGAs. In this paper, an approach to realizing compressor trees on FPGAs is proposed. On an FPGA with m-input LUTs, any counter with up to m inputs can be realized with one LUT per output. Our approach uses generalized parallel counters (GPCs) with up to m inputs and synthesizes high-performance compressor trees by setting intermediate height limits in the compression process, as in Dadda multipliers. Experimental results show that the number of GPCs is reduced by up to 22% compared with the existing heuristic. Its effectiveness in reducing delay is also shown against existing approaches on Altera's Stratix III.
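
For readers unfamiliar with the building block, the following behavioral sketch models a generalized parallel counter: it takes input bits grouped by weight and returns their weighted sum in binary. A (3;2) counter (a full adder) is the special case with three inputs of weight 1. The function name and the list-of-columns input layout are illustrative assumptions; the sketch says nothing about the synthesis heuristic proposed in the paper.

```python
def gpc(columns):
    """Behavioral model of a generalized parallel counter (GPC).

    columns[i] holds the input bits (0 or 1) of weight 2**i.
    Returns the weighted sum of all inputs as a list of bits, LSB first.
    """
    total = sum(bit << rank for rank, col in enumerate(columns) for bit in col)
    # The largest possible sum fixes the number of output bits.
    max_total = sum(len(col) << rank for rank, col in enumerate(columns))
    width = max(1, max_total.bit_length())
    return [(total >> k) & 1 for k in range(width)]

# (3;2) counter: three bits of weight 1 in, two bits out.
full_adder_out = gpc([[1, 1, 1]])
# A five-input GPC: three bits of weight 1 and two bits of weight 2.
mixed_out = gpc([[1, 0, 1], [1, 1]])
print(full_adder_out, mixed_out)
```

Each output bit is a Boolean function of all the input bits, which is why a counter with at most m inputs fits one m-input LUT per output, as the abstract states.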

A 6.72-Gb/s 8pJ/bit/iteration IEEE 802.15.3c LDPC Decoder Chip

Zhixiang CHEN, Xiao PENG, Xiongxin ZHAO, Leona OKAMURA, Dajiang ZHOU, ...

Article type: PAPER
Subject area: High-Level Synthesis and System-Level Design
2011 Volume E94.A Issue 12 Pages 2587-2596
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2587

In this paper, we introduce an LDPC decoder design for decoding the length-672 multi-rate code adopted in the IEEE 802.15.3c standard. The proposed decoder features high performance in both data rate and power efficiency. A macro-layer-level fully parallel layered decoding architecture is proposed to support the throughput requirement of the standard. The proposed decoder takes only 4 clock cycles to process one decoding iteration. As parallelism increases, the chip routing congestion problem becomes more severe because a more complicated interconnection network is needed for message passing during the decoding process. This problem is solved by our proposed efficient message permutation scheme, which exploits features of the parity check matrix. The proposed message permutation network features high compatibility and a zero-logic-gate VLSI implementation, which contribute to remarkable improvements in both area utilization ratio and total gate count. Meanwhile, frame-level pipeline decoding is applied in the design to shorten the critical path. To verify the above techniques, the proposed decoder is implemented on a chip fabricated in a Fujitsu 65 nm 1P12L LVT CMOS process. The chip occupies a core area of 1.30 mm² with an area utilization ratio of 86.3%. According to the measurement results, working at 1.2 V, 400 MHz, and 10 iterations, the proposed decoder delivers a data throughput of 6.72 Gb/s and dissipates 537.6 mW, resulting in an energy efficiency of 8.0 pJ/bit/iteration. Moreover, a decoder of the same architecture but with no pipeline stage, for low-profile applications, is also implemented and evaluated at the post-layout level.
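
The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that one 672-bit codeword is completed every (cycles per iteration × iterations) clock cycles; that steady-state assumption and the variable names are ours, while the numbers are taken from the abstract.

```python
# Figures quoted in the abstract.
CODE_LENGTH_BITS = 672
CLOCK_HZ = 400e6
CYCLES_PER_ITERATION = 4
ITERATIONS = 10
POWER_W = 537.6e-3

def throughput_bps(bits, clock_hz, cycles_per_iteration, iterations):
    """Bits decoded per second, assuming one codeword finishes every
    cycles_per_iteration * iterations clock cycles."""
    return bits * clock_hz / (cycles_per_iteration * iterations)

def energy_pj_per_bit_per_iteration(power_w, bps, iterations):
    """Energy per decoded bit per iteration, in picojoules."""
    return power_w / bps / iterations * 1e12

bps = throughput_bps(CODE_LENGTH_BITS, CLOCK_HZ, CYCLES_PER_ITERATION, ITERATIONS)
energy = energy_pj_per_bit_per_iteration(POWER_W, bps, ITERATIONS)
print(f"{bps / 1e9:.2f} Gb/s, {energy:.1f} pJ/bit/iteration")
```

Under this assumption the computed values agree with the 6.72 Gb/s and 8.0 pJ/bit/iteration reported above.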

Implementation of Stack Data Placement and Run Time Management Using a Scratch-Pad Memory for Energy Consumption Reduction of Embedded Applications

Lovic GAUTHIER, Tohru ISHIHARA

Article type: PAPER
Subject area: High-Level Synthesis and System-Level Design
2011 Volume E94.A Issue 12 Pages 2597-2608
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2597

Memory accesses are a major cause of energy consumption in embedded systems. This paper presents the implementation of a purely software technique that places stack and static data in a scratch-pad memory (SPM) in order to reduce the energy consumed by the processor while accessing them. Since an SPM is usually too small to hold all these data, some of them must be left in the external main memory (MM). Therefore, further energy reduction is achieved by moving some stack data between the two memories at run time. The technique employs integer linear programming to find, at compile time, the optimal placement of static data and management of the stack, and implements it by inserting stack operations into the code. Experimental results show that with an SPM of only 1 KB, our technique reduces the energy consumption related to static and stack data accesses by more than 90% for several applications, and by 57% on average, compared with the case where these data are placed entirely in the main memory.

A 98 GMACs/W 32-Core Vector Processor in 65 nm CMOS

Xun HE, Xin JIN, Minghui WANG, Dajiang ZHOU, Satoshi GOTO

Article type: PAPER
Subject area: High-Level Synthesis and System-Level Design
2011 Volume E94.A Issue 12 Pages 2609-2618
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2609

This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. The SIMD cores support 8/16-bit SIMD MAC instructions and vertical vector access. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster. Four clusters are connected by a mesh network. This hierarchical network provides more than 192 GB/s of low-latency inter-core bandwidth (BW) on average. The 4-port L2 cache architecture is also designed to provide 192 GB/s of L2 cache BW. To reduce coherence operations in large-scale SMP, an application-specified protocol is proposed. Compared with MOESI, 67.8% of L1 cache energy can be saved in the 32-core case. The whole system, including 32 vector cores, a 256 KB L2 cache, a 64-bit DDRII PHY, and two PLL units, occupies 25 mm² in 65 nm CMOS. It achieves a peak performance of 375 GMACs and 98 GMACs/W at 1.2 V.

Iterative Synthesis Methods Estimating Programmable-Wire Congestion in a Dynamically Reconfigurable Processor

Takao TOI, Takumi OKAMOTO, Toru AWASHIMA, Kazutoshi WAKABAYASHI, Hideh ...

Article type: PAPER
Subject area: High-Level Synthesis and System-Level Design
2011 Volume E94.A Issue 12 Pages 2619-2627
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2619

Congestion-aware iterative synthesis methods are proposed for a multi-context dynamically reconfigurable processor (DRP) with a large number of processing elements (PEs) and programmable-wire connections. Although complex data paths can be synthesized using the programmable wires, their delay is long, especially when wire connections are congested. We propose two iterative synthesis techniques between a high-level synthesizer (HLS) and the place & route tool to shorten the prolonged wire delay. First, we feed back wire delays for each context to a scheduler in the HLS. The experimental results showed that the critical-path delay was shortened by 21% on average for applications with timing closure problems. Second, we skip the routing and estimate wire delays based on the congestion. The synthesis time was shortened to 1/3, at the cost of degrading the delay improvement rate by two points on average.

Compact Architecture for ASIC and FPGA Implementation of the KASUMI Block Cipher

Dai YAMAMOTO, Kouichi ITOH, Jun YAJIMA

Article type: PAPER
Subject area: High-Level Synthesis and System-Level Design
2011 Volume E94.A Issue 12 Pages 2628-2638
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2628

Compact design is very important for embedded systems such as wireless sensor nodes, RFID tags, and mobile devices because of their limited hardware (H/W) resources. This paper proposes a compact H/W implementation of the KASUMI block cipher, which is the 3GPP standard encryption algorithm. In [8] and [9], Yamamoto et al. proposed a method of reducing the register size for the MISTY1 FO function (YYI-08) and implemented very compact MISTY1 H/W. In this paper, we aim to implement the smallest KASUMI H/W to date by applying a YYI-08 configuration to KASUMI, whose FO function has a structure similar to that of MISTY1. However, we discovered that a straightforward application of YYI-08 raises problems. We therefore propose a new YYI-08 configuration improved for KASUMI, together with a compact H/W architecture. The new YYI-08 configuration consists of new FL function calculation schemes and a suitable calculation order. According to our logic synthesis on a 0.11-µm ASIC process, the gate size is 2.99 K gates, which, to our knowledge, is the smallest to date.

A New Recovery Mechanism in Superscalar Microprocessors by Recovering Critical Misprediction

Jiongyao YE, Yu WAN, Takahiro WATANABE

Article type: PAPER
Subject area: High-Level Synthesis and System-Level Design
2011 Volume E94.A Issue 12 Pages 2639-2648
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2639

Current trends in modern out-of-order processors involve deeper pipelines and a larger instruction window to achieve high performance, which makes the penalty of branch misprediction recovery a critical factor in overall processor performance. Multipath execution has been proposed to reduce this penalty by executing both paths following a branch simultaneously. However, this mechanism has drawbacks, such as the design complexity caused by processing both paths after a branch and the performance degradation due to hardware resource competition between the two paths. In this paper, we propose a new recovery mechanism, called Recovery Critical Misprediction (RCM), to reduce the penalty of branch misprediction recovery. The mechanism uses a small trace cache to save the decoded instructions from the alternative path following a branch. Then, during subsequent predictions, the trace cache is accessed. If there is a hit, the processor forks the second path of this branch at the rename stage, so that the design complexity in the fetch and decode stages is alleviated. The main contribution of this paper is that the proposed mechanism employs critical path prediction to identify the branches that will be most harmful if mispredicted. Only a critical branch can save its alternative path into the trace cache, which not only increases the usefulness of a limited-size trace cache but also avoids the performance degradation caused by forking non-critical branches. Experimental results on the SPECint 2000 benchmark show that a processor with the proposed RCM improves IPC by 10.05% compared with a conventional processor.

Maximal Interconnect Resilient Methodology for Fault Tolerance, Yield, and Reliability Improvement in Network on Chip

Katherine Shu-Min LI, Chih-Yun PAI, Liang-Bi CHEN

Article type: PAPER
Subject area: High-Level Synthesis and System-Level Design
2011 Volume E94.A Issue 12 Pages 2649-2658
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2649

This paper presents an interconnect resilient (IR) methodology with maximal interconnect fault tolerance, yield, and reliability for both single and multiple interconnect faults under stuck-at and open fault models. By exploiting the multiple routes inherent in an interconnect structure, this method can tolerate faulty connections by efficiently finding alternative paths. The proposed approach is compatible with previous interconnect detection and diagnosis methods under oscillation ring schemes, and together they can be applied to implement a robust interconnect structure that may still provide correct communication even under multiple link faults in Networks-on-Chip (NoCs). With such knowledge, designers can significantly improve interconnect reliability by augmenting vulnerable interconnect structures in NoCs. The experimental results show that alternative paths in NoCs can be found for almost all paths. Hence, the proposed method provides a good way to achieve fault tolerance and reliability/yield improvement.

Two-Stage Configurable Decoder Model for Domain Specific FEC Decoder Design

Ittetsu TANIGUCHI, Ayataka KOBAYASHI, Keishi SAKANUSHI, Yoshinori TAKE ...

Article type: PAPER
Subject area: High-Level Synthesis and System-Level Design
2011 Volume E94.A Issue 12 Pages 2659-2668
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2659

Forward error correction (FEC) is one of the important and heavy tasks in wireless communication. Leading-edge mobile embedded systems usually support not just one FEC standard but multiple FEC standards in order to adapt to various wireless communication standards. In this paper, we propose a two-stage configurable decoder model (2-Stage CDM) for multiple FEC standards for Viterbi and Turbo coding, which vary in constraint length, coding rate, and so on. The proposed decoder model realizes a decoder instance that supports a dedicated set of FEC standards, enabling rapid design of domain-specific decoders. The proposed decoder model is configurable in two stages, at hardware generation time and at runtime, and designers can easily specify these configurations through various design parameters. Experimental results show that the proposed two-stage configurable decoder model supports various domain-specific FEC decoders, including an existing decoder, and that decoder instances based on the proposed 2-Stage CDM have sufficient throughput for each communication standard and reasonable area overhead compared with the existing decoder.

Variation-Tolerance of a 65-nm Error-Hardened Dual-Modular-Redundancy Flip-Flop Measured by Shift-Register-Based Monitor Structures

Chikara HAMANAKA, Ryosuke YAMAMOTO, Jun FURUTA, Kanto KUBOTA, Kazutosh ...

Article type: PAPER
Subject area: High-Level Synthesis and System-Level Design
2011 Volume E94.A Issue 12 Pages 2669-2675
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2669

JOURNAL RESTRICTED ACCESS

We show measurement results of variation-tolerance of an error-hardened dual-modular-redundancy flip-flop fabricated in a 65-nm process. The proposed error-hardened FF called BCDMR is very strong against soft errors and also robust to process variations. We propose a shift-register-based test structure to measure variations. The proposed test structure has features of constant pin count and fast measurement time. A 65nm chip was fabricated including 40k FFs to measure variations. The variations of the proposed BCDMR FF are 74% and 55% smaller than those of the conventional BISER FF on the twin-well and triple-well structures respectively.

A High-Efficiency On-Chip DC-DC Down-Conversion Using Selectable Supply-Voltage Charge-Recycling

Byung-Do YANG

Article type: PAPER
Subject area: Circuit Design
2011 Volume E94.A Issue 12 Pages 2676-2684
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2676

JOURNAL RESTRICTED ACCESS

This paper proposes a high-efficiency on-chip DC-DC down-conversion technique using selectable supply-voltage charge-recycling. This technique converts an external high supply-voltage (2×V_DD) to an on-chip low supply-voltage (V_DD) by using charge-recycling. It partitions the original logic using V_DD into high logic (H-logic) and low logic (L-logic), consuming nearly the same amount of power. The H-logic uses a higher supply-voltage (2×V_DD and V_DD). The L-logic uses a lower supply-voltage (V_DD and ground). The charge used in the H-logic is recycled in the L-logic. In order to reduce a charge mismatch between the H-logic and the L-logic, this scheme dynamically changes the ratio between the H-logic and the L-logic by selecting the supply-voltages used by the divided logic blocks. To verify the DC-DC down-conversion using the proposed charge-recycling scheme, a test chip was fabricated using a 0.35µm CMOS technology. Its power efficiency was measured at 93%.

A High Efficiency Hybrid Step-Up/Step-Down DC-DC Converter Using Digital Dither for Smooth Transition

Yanzhao MA, Hongyi WANG, Guican CHEN

Article type: PAPER
Subject area: Circuit Design
2011 Volume E94.A Issue 12 Pages 2685-2692
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2685

JOURNAL RESTRICTED ACCESS

This paper presents a step-up/step-down DC-DC converter using a digital dither technique to achieve high efficiency and small output voltage ripple for portable electronic devices. The proposed control method minimizes not only the switching loss by operating like a pure buck or boost converter, but also the conduction loss by reducing the average inductor current even when four switches are used. Digital dither control is introduced to implement a buffer region for smooth transition between buck and boost modes. A minimum ripple dither with higher fundamental frequency is adopted to decrease the output voltage ripple. A window delay-line analog-to-digital converter (ADC) with delay calibration is implemented to digitize the control voltage. The step-up/step-down DC-DC converter has been designed with a standard 0.5µm CMOS process. The output voltage is regulated over an input voltage range from 2.5V to 5.5V, and the output voltage ripple is reduced to less than 25mV during the mode transition. The peak power efficiency is 96%, and the maximum load current can reach 800mA.

7T SRAM Enabling Low-Energy Instantaneous Block Copy and Its Application to Transactional Memory

Shunsuke OKUMURA, Yuki KAGIYAMA, Yohei NAKATA, Shusuke YOSHIMOTO, Hiro ...

Article type: PAPER
Subject area: Circuit Design
2011 Volume E94.A Issue 12 Pages 2693-2700
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2693

JOURNAL RESTRICTED ACCESS

This paper proposes a 7T SRAM that realizes a block-level simultaneous copying feature. The proposed SRAM can be used for data transfer between local memories, for example in checkpoint data storage and transactional memory. The 1-Mb SRAM comprises 32-kb blocks, in which 16-kb data can be copied in 33.3ns at 1.2V. The proposed scheme reduces the energy consumed in copying by 92.7% compared to the conventional read-modify-write manner. When the proposed scheme is applied to transactional memory, the number of write-back cycles can potentially be reduced by 98.7% compared with the conventional memory system.

A Low-Power Multi-Phase Oscillator with Transfer Gate Phase Coupler Enabling Even-Numbered Phase Output

Toshihiro KONISHI, Hyeokjong LEE, Shintaro IZUMI, Takashi TAKEUCHI, Ma ...

Article type: PAPER
Subject area: Circuit Design
2011 Volume E94.A Issue 12 Pages 2701-2708
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2701

JOURNAL RESTRICTED ACCESS

We propose a transfer gate phase coupler for a low-power multi-phase oscillator (MPOSC). The phase coupler is an nMOS transfer gate, which does not waste charge to the ground and thus achieves low power. The proposed MPOSC can set the number of outputs to an arbitrary number. The test circuits in a 180-nm process and a 65-nm process exhibit 20 phases, including 90° different angles. The designs in a 180-nm CMOS process and a 65-nm CMOS process were fabricated to confirm process scalability; in the respective designs, we observed 36.6% and 38.3% improvements in power-delay product compared with the conventional MPOSCs using inverters and nMOS latches. In a 65-nm process, the measured DNL and 3σ period jitter are less than ±1.22° and 5.82ps, respectively. The power is 284µW at 1.85GHz.

Special Section on Wideband Systems

FOREWORD

Makoto ITAMI

2011 Volume E94.A Issue 12 Pages 2709
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2709

JOURNAL RESTRICTED ACCESS

Effective Transmit Weight Design for DPC with Maximum Beam in Multiuser MIMO OFDM Downlink

Cong LI, Yasunori IWANAMI

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2710-2718
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2710

JOURNAL RESTRICTED ACCESS

In this paper, we consider the signal processing algorithm on each subcarrier for the downlink of a Multi-User Multiple-Input Multiple-Output Orthogonal Frequency Division Multiplexing (MU-MIMO OFDM) system. A novel transmit scheme is proposed for the cancellation of Inter-User Interference (IUI) at the Base Station (BS). The performance of each user is improved by optimizing the transmit scheme on each subcarrier, where the Particle Swarm Optimization (PSO) algorithm is employed to solve the constrained nonlinear optimization problem. Compared with the conventional Zero Forcing Dirty Paper Coding (ZF-DPC), which has only a single receive antenna at each Mobile Station (MS), the proposed scheme also applies the principle of DPC to cancel the IUI, but the MS users can be equipped with multiple receive antennas, which increases their receive SNRs. With the Channel State Information (CSI) known at the BS and the MS, the eigenvalues of all the user channels are calculated first, and the user with the maximum eigenvalue is selected as the first user. The remaining users are ordered and sequentially processed, where the transmit weights are generated from the previously selected users by the PSO algorithm, which makes the transmit gain for each user as large as possible. The computational complexity analysis, BER performance and achievable sum-rate analysis of the system verify the effectiveness of the proposed scheme.

Open Loop DPC Beamforming Effective for Multiuser MIMO Transmissions in FDD Systems

Tomoko MATSUMOTO, Yasuyuki HATAKAWA, Satoshi KONISHI

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2719-2727
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2719

JOURNAL RESTRICTED ACCESS

This paper proposes an open loop beamforming scheme for downlink multiuser multiple-input multiple-output (MIMO) transmissions in frequency division duplex (FDD). The proposed scheme uses uplink direction of arrival (DOA) estimation, and then generates the beamforming weight such that the interference caused by the overlapping beams is removed by applying the dirty-paper coding (DPC) principle. The simulation results show that the proposed scheme provides a gain of at least 32.3% in spectral efficiency at the CDF of 50% compared to the conventional DOA based beamforming scheme. In addition, it is shown that the proposed scheme outperforms a closed loop scheme with limited feedback information.

Orthogonal and ZCZ Sets of Real-Valued Periodic Orthogonal Sequences from Huffman Sequences

Takahiro MATSUMOTO, Shinya MATSUFUJI, Tetsuya KOJIMA, Udaya PARAMPALLI

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2728-2736
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2728

JOURNAL RESTRICTED ACCESS

This paper presents a method of generating sets of orthogonal and zero-correlation zone (ZCZ) periodic real-valued sequences of period 2ⁿ, n ≥ 1. The sequences admit a fast correlation algorithm and the sets of sequences achieve the upper bound on family size. A periodic orthogonal sequence has a periodic autocorrelation function with zero sidelobes, and an orthogonal set is a set of sequences whose mutual periodic crosscorrelation function at zero shift is zero. Similarly, a ZCZ set is a set of sequences with a zero-correlation zone. In this paper, we derive the real-valued periodic orthogonal sequences of period 2ⁿ from a real-valued Huffman sequence of length 2^ν + 1, ν being a positive integer and ν ≥ n, whose aperiodic autocorrelation function has zero sidelobes except possibly at the left and right shift-ends. The orthogonal and ZCZ sets of real-valued periodic orthogonal sequences are useful in various systems, such as synchronous code division multiple access (CDMA) systems, quasi-synchronous CDMA systems and digital watermarking.
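
The defining property of a periodic orthogonal sequence (zero sidelobes in the periodic autocorrelation) can be checked numerically. The sketch below only tests the property; it does not implement the Huffman-based construction, and the function names and the example sequence (1, 1, 1, -1) of period 4 are illustrative choices rather than sequences taken from the paper.

```python
def periodic_autocorrelation(seq):
    """Return R(tau) = sum_i seq[i] * seq[(i + tau) mod N] for tau = 0..N-1."""
    n = len(seq)
    return [sum(seq[i] * seq[(i + tau) % n] for i in range(n)) for tau in range(n)]


def is_periodic_orthogonal(seq, tol=1e-9):
    """True if every sidelobe (tau != 0) of the periodic autocorrelation is zero."""
    return all(abs(r) <= tol for r in periodic_autocorrelation(seq)[1:])


def zero_shift_crosscorrelation(a, b):
    """Periodic crosscorrelation of two equal-length sequences at zero shift."""
    return sum(x * y for x, y in zip(a, b))


# Illustrative real-valued sequence of period 2**2 with zero sidelobes.
example = [1, 1, 1, -1]
sidelobes_vanish = is_periodic_orthogonal(example)   # True for this sequence
```

For the example sequence the autocorrelation is 4 at zero shift and 0 at shifts 1, 2 and 3, whereas the all-ones sequence of the same period fails the check.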

A Tracking System Using a Differential Detector for M-ary Bi-orthogonal Spread Spectrum Communication Systems

Junya KAWATA, Kouji OHUCHI, Hiromasa HABUCHI

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2737-2745
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2737

JOURNAL RESTRICTED ACCESS

One application of the direct sequence spread spectrum (SS) communication system is the M-ary bi-orthogonal SS communication system. In this system, several spreading sequences (bi-orthogonal sequences) are used on a code shift keying basis. Hence, design of the spreading code synchronization system has been an issue in M-ary bi-orthogonal SS systems. In this paper, the authors focus on a code tracking system using a differential detector and a Delay Lock Loop (DLL). They investigate the tracking performance of their code tracking system by theoretical analysis. In addition, a multi-stage interference canceler is applied to the M-ary bi-orthogonal SS system. As a result, it is shown that the tracking performance obtained by theoretical analysis is almost the same as that of computer simulations in a multi-user environment. It is also shown that the multi-stage interference canceler is effective in improving the BER performance.

Multi-User Scheduling Method Considering Fairness and Mitigation of Multi-Cell Interference for Multi-Hop Cellular System

Yuji OKAMOTO, Takeo FUJII

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2746-2752
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2746

JOURNAL RESTRICTED ACCESS

In order to improve the cell boundary throughput performance and to extend the coverage area, relaying transmission with relay stations (RSs) is becoming a promising architecture for next generation cellular systems. However, if RSs are operated in every cell, the interference between cells increases and the throughput improvement obtained with RSs tends to be restricted. In this paper, we propose a scheme that reduces the interference from other cells by using packet transmission control. This packet transmission control is realized by a compound scheduling technique that combines Proportional Fair (PF) scheduling and Maximum Carrier-to-Interference power Ratio (Max CIR) scheduling. The proposed scheme can improve the throughput around the cell boundary by controlling the transmission timing of each cell with appropriate power and user assignment. The simulation results show that the proposed method can also improve the fairness of user throughput and the system throughput across the users of the whole cell.
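
The two scheduling rules named in the abstract can be contrasted with a short sketch. It shows only the textbook forms of PF and Max CIR selection, not the compound scheme of the paper, whose combination rule is not given here; the function names, the dictionary-based interface and the averaging window are assumptions.

```python
def max_cir_pick(cir):
    """Max CIR rule: serve the user with the largest instantaneous CIR."""
    return max(cir, key=cir.get)


def proportional_fair_pick(inst_rate, avg_throughput):
    """PF rule: serve the user maximizing instantaneous rate / average throughput."""
    return max(inst_rate, key=lambda u: inst_rate[u] / avg_throughput[u])


def update_average(avg_throughput, served_user, inst_rate, window=100.0):
    """Exponential moving average commonly paired with PF (window is an assumption)."""
    return {
        u: (1 - 1 / window) * avg + (inst_rate[u] / window if u == served_user else 0.0)
        for u, avg in avg_throughput.items()
    }


# A cell-center user with a strong channel and a cell-edge user with a weak one.
cir = {"center": 20.0, "edge": 3.0}
inst_rate = {"center": 10.0, "edge": 2.0}
avg = {"center": 8.0, "edge": 1.0}

by_cir = max_cir_pick(cir)                        # "center"
by_pf = proportional_fair_pick(inst_rate, avg)    # "edge" (2.0/1.0 > 10.0/8.0)
```

Max CIR favors raw throughput and keeps choosing the cell-center user, while PF picks the cell-edge user once its average throughput has fallen far enough, which is the fairness effect the abstract refers to.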

BER Evaluation of CDMA-Based Wireless Services Transmission over Aperture Averaged FSO Links

Chedlia BEN NAILA, Kazuhiko WAKAMORI, Mitsuji MATSUMOTO

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2753-2761
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2753

JOURNAL RESTRICTED ACCESS

Radio frequency on free-space optical (RoFSO) technology is regarded as a new universal platform for enabling seamless convergence of fiber and FSO communication networks, thus extending broadband connectivity to underserved areas. In this paper, we characterize the transmission performance of code division multiple access (CDMA) based wireless signals over a RoFSO system using the aperture averaging (AA) technique under strong turbulence conditions. An analytical model including a modified carrier-to-noise-plus-interference ratio (CNIR) form and a novel closed-form expression for the bit-error rate (BER) is derived. Unlike earlier work, our model takes into consideration the effect of using the AA technique modeled by the gamma-gamma distribution, the optical noises, the intermodulation distortion term due to the laser diode non-linearity and the multiple access interference. By investigating the impact of AA on our model in the strong turbulence regime, we show that there is a design trade-off between the receiver lens aperture and the number of users to achieve a required CNIR ensuring a substantial scintillation fade reduction. The presented work can be used as a baseline for the design and performance evaluation of the RoFSO system's ability to transmit different broadband wireless service signals over turbulent FSO links in real scenarios.

Interference Mitigation Capability of a Low Duty DS-Multiband-UWB System in Realistic Environment

Chin-Sean SUM, Shigenobu SASAKI, Hiroshi HARADA

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2762-2772
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2762

JOURNAL RESTRICTED ACCESS

In this paper, the performance of a low duty factor (DF) hybrid direct sequence (DS) multiband (MB)-pulsed ultra wideband (UWB) system is evaluated over realistic propagation channels to highlight its capability of interference mitigation. The interference mitigation techniques incorporated in the DS-MB-UWB system form a novel design that includes the utilization of the frequency-agile multiple sub-band configuration and the coexistence-friendly low DF signaling. The system design consists of a Rake type receiver over a multipath and multi-user channel in the presence of a coexisting narrowband interferer. The propagation channels are modeled based on actual measurement data. Firstly, by suppressing the power in the particular sub-band coexisting with the narrowband signal, performance degradation due to narrowband interference can be reduced. It is observed that by fully suppressing the sub-band affected by the narrowband signal, a typical 1-digit performance improvement (e.g. BER improves from 10^-3 to 10^-4) can be achieved. Secondly, by employing lower DF signaling, self interference (SI) and multi-user interference (MUI) can be mitigated. It is found that a typical 3dB improvement is achieved by reducing the DF from 0.5 to 0.04. Together, the sub-band power suppression and low DF signaling are shown to be effective mitigation techniques in environments with SI, MUI and narrowband interference.

Power Supply Overlaid Communication with Common Clock Delivery for Cooperative Motion Control

Fumikazu MINAMIYAMA, Hidetsugu KOGA, Kentaro KOBAYASHI, Masaaki KATAYA ...

Article type: LETTER
2011 Volume E94.A Issue 12 Pages 2773-2775
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2773

JOURNAL RESTRICTED ACCESS

A communication system is proposed for the control of multiple servomotors in a humanoid robot. In the system, DC electric power, command/response signals and a common clock signal for precise synchronous movement of the servomotors are transmitted via the same wiring with a multi-drop bus. Because of the bandwidth limitation, the common clock signal and the command/response signals overlap each other. It is confirmed that the coexistence of both signals is possible by using interference cancellation at the reception of command/response signals.

Subcarrier Mapping for Single-User SC-FDMA Relay Communications

Yu HEMMI, Koichi ADACHI, Tomoaki OHTSUKI

Article type: LETTER
2011 Volume E94.A Issue 12 Pages 2776-2779
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2776

JOURNAL RESTRICTED ACCESS

A combination of single-carrier frequency-division multiple-access (SC-FDMA) and relay transmission is effective for performance improvement in uplink transmission. In SC-FDMA, the mapping strategy of a user's spectrum has an enormous impact on system performance. In relay communication, the optimum mapping strategy may differ from that in direct communication because of the independently distributed channels among nodes. In this letter, we study how each link should be considered in subcarrier mapping and present the impact of mapping strategies on the average bit error rate (BER) performance of single-user SC-FDMA relay communications.

Iterations of FB-MSDSD and Turbo Codes over the Correlated Flat Fading Channel

Chien-Sheng CHEN, Ching-Chi LO

Article type: LETTER
2011 Volume E94.A Issue 12 Pages 2780-2786
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2780

JOURNAL RESTRICTED ACCESS

Over a correlated flat fading channel, multiple-symbol differential detection can enhance the performance of coded differential phase shift keying (DPSK) systems but with exponential complexity. For iterative decoding schemes, the soft-input soft-output (SISO) multiple-symbol differential sphere decoding (MSDSD) can offer suboptimal performance and its complexity is quadratic with detection length. To further reduce the complexity, this paper proposes a Forward/Backward MSDSD (FB-MSDSD) for coded DPSK systems. The key idea is that the detection interval is split into two subintervals which are processed in the forward and backward directions respectively. Simulation results show that the proposed scheme has almost the same performance and lower complexity when compared with the SISO-MSDSD scheme with the same detection length.

Special Section on Mathematical Systems Science and its Applications

FOREWORD

Toshimitsu USHIO

2011 Volume E94.A Issue 12 Pages 2787
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2787

JOURNAL RESTRICTED ACCESS

A Verification and Analysis Tool Set for Embedded System Design

Yuichi NAKAMURA

Article type: INVITED PAPER
2011 Volume E94.A Issue 12 Pages 2788-2793
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2788

JOURNAL RESTRICTED ACCESS

This paper presents a verification and analysis tool set for embedded systems. Recently, the development scale of embedded systems has been increasing since they are used for mobile systems, automobile platforms, and various consumer systems with rich functionality. This has increased the amount of time and cost needed to develop them. Consequently, it is very important to develop tools to reduce development time and cost. This paper describes a tool set consisting of three tools to enhance the efficiency of embedded system design. The first tool is an integrated tool platform. The second is a remote debugging system. The third is a clock-accurate verification system based on a field-programmable gate array (FPGA) for custom embedded systems. This tool set promises to significantly reduce the time and cost needed to develop embedded systems.

Checking On-the-Fly Universality and Inclusion Problems of Visibly Pushdown Automata

Nguyen VAN TANG, Hitoshi OHSAKI

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2794-2801
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2794

JOURNAL RESTRICTED ACCESS

Visibly pushdown automata (VPA), introduced by Alur and Madhusudan in 2004, are a subclass of pushdown automata whose stack behavior is completely determined by the input symbol according to a fixed partition of the input alphabet. Since their introduction, VPA have been shown to be useful in various contexts, e.g., as a specification formalism for verification and as an automaton model for processing XML streams. However, implementing formal verification based on the VPA framework is a challenge. In this paper, we propose on-the-fly algorithms to test universality and inclusion problems of this automata class. In particular, we first present a slight improvement on the upper bound for determinization of VPA. Next, in order to check universality of a nondeterministic VPA, we simultaneously determinize this VPA and apply the P-automata technique to compute a set of reachable configurations of the target determinized VPA. When a rejecting configuration is found, the checking process stops and reports that the original VPA is not universal. Otherwise, if all configurations are accepting, the original VPA is universal. Furthermore, to strengthen the algorithm, we define a partial ordering over transitions of the P-automaton, and only minimal transitions are used to incrementally generate the P-automaton. The purpose of this process is to keep the implicit determinization step for generating reachable configurations as small as possible. This improvement helps to reduce not only the size of the P-automaton but also the complexity of the determinization phase. We implement the proposed algorithms in a prototype tool, named VPAchecker. Finally, we conduct experiments on randomly generated VPA. The experimental results show that the proposed method outperforms the standard one by several orders of magnitude.
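
The "visibly" property, where the class of the input symbol alone decides whether the stack is pushed, popped or left untouched, can be sketched as follows. This is a minimal simulator for a deterministic VPA with acceptance by final state; it does not implement the on-the-fly universality or inclusion checks, and the transition-table encoding, the use of None for the empty stack and the example automaton are assumptions.

```python
def run_vpa(word, calls, returns, delta_call, delta_ret, delta_int, start, accepting):
    """Run a deterministic VPA; a missing transition rejects the word.

    delta_call: (state, symbol) -> (next_state, pushed_stack_symbol)
    delta_ret:  (state, symbol, popped_symbol_or_None) -> next_state
    delta_int:  (state, symbol) -> next_state
    """
    state, stack = start, []
    for sym in word:
        if sym in calls:                     # call symbol: always push
            step = delta_call.get((state, sym))
            if step is None:
                return False
            state, pushed = step
            stack.append(pushed)
        elif sym in returns:                 # return symbol: always pop
            top = stack.pop() if stack else None   # None models the empty stack
            state = delta_ret.get((state, sym, top))
        else:                                # internal symbol: stack untouched
            state = delta_int.get((state, sym))
        if state is None:
            return False
    return state in accepting


# Example automaton (hypothetical): rejects any word with an unmatched ")".
calls, returns = {"("}, {")"}
delta_call = {("q", "("): ("q", "X")}
delta_ret = {("q", ")", "X"): "q"}       # no entry for the empty stack
delta_int = {("q", "a"): "q"}
automaton = (calls, returns, delta_call, delta_ret, delta_int, "q", {"q"})

accepted = run_vpa("(a(a))", *automaton)   # True
rejected = run_vpa("a)", *automaton)       # False
```

Because the push or pop decision never depends on the current state, two VPA over the same alphabet partition always keep their stacks at the same height on the same input, which is the property that makes the class determinizable.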

Decentralized Supervisory Control of Timed Discrete Event Systems

Masashi NOMURA, Shigemasa TAKAI

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2802-2809
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2802

JOURNAL RESTRICTED ACCESS

In the framework of supervisory control of timed discrete event systems (TDESs), a supervisor decides the set of events to be enabled to occur and the set of events to be forced to occur in order for a given specification to be satisfied. In this paper, we consider decentralized supervisory control of TDESs where enforcement decisions of local supervisors are fused by the AND rule or the OR rule. We derive existence conditions of a decentralized supervisor under these decision fusion rules.
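
The two decision fusion rules can be illustrated with a small sketch. It assumes that each local supervisor reports the set of events it wants forced and that fusion acts on those sets; the function names and event labels are hypothetical, and the existence conditions derived in the paper are not modeled.

```python
def fuse_and(local_decisions):
    """AND rule: an event is forced only if every local supervisor forces it."""
    return set.intersection(*(set(d) for d in local_decisions))


def fuse_or(local_decisions):
    """OR rule: an event is forced if at least one local supervisor forces it."""
    return set.union(*(set(d) for d in local_decisions))


# Two local supervisors with different views of which events to force.
decisions = [{"start", "tick_preempt"}, {"tick_preempt", "stop"}]

forced_and = fuse_and(decisions)   # {"tick_preempt"}
forced_or = fuse_or(decisions)     # {"start", "tick_preempt", "stop"}
```

Both functions expect at least one local decision; the AND rule is the conservative choice, while the OR rule lets any single supervisor trigger enforcement.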

Option-Based Monte Carlo Algorithm with Conditioned Updating to Learn Conflict-Free Task Allocation in Transport Applications

Alex VALDIVIELSO, Toshiyuki MIYAMOTO

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2810-2820
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2810

JOURNAL RESTRICTED ACCESS

In automated transport applications, the design of a task allocation policy becomes a complex problem when there are several agents in the system and conflicts between them may arise, affecting the system's performance. In this situation, to achieve a globally optimal result would require the complete knowledge of the system's model, which is infeasible for real systems with huge state spaces and unknown state-transition probabilities. Reinforcement Learning (RL) methods have done well approximating optimal results in the processing of tasks, without requiring previous knowledge of the system's model. However, to our knowledge, there are not many RL methods focused on the task allocation problem in transportation systems, and even fewer directly used to allocate tasks, considering the risk of conflicts between agents. In this paper, we propose an option-based RL algorithm with conditioned updating to make agents learn a task allocation policy to complete tasks while preventing conflicts between them. We use a multicar elevator (MCE) system as test application. Simulation results show that with our algorithm, elevator cars in the same shaft effectively learn to respond to service calls without interfering with each other, under different passenger arrival rates, and system configurations.

Polynomial Time Verification of Behavioral Inheritance for Interworkflows Based on WfMC Protocol

Shingo YAMAGUCHI, Tomohiro HIRAKAWA

Article type: PAPER
2011 Volume E94.A Issue 12 Pages 2821-2829
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2821

JOURNAL RESTRICTED ACCESS

The Workflow Management Coalition, WfMC for short, has given a protocol for interorganizational workflows, interworkflows for short. In the protocol, an interworkflow is constructed by connecting two or more existing workflows; and there are three models to connect those workflows: chained, nested, and parallel synchronized. Business continuity requires the interworkflow to preserve the behavior of the existing workflows. This requirement is called behavioral inheritance, which has three variations: protocol inheritance, projection inheritance, and life-cycle inheritance. Van der Aalst et al. have proposed workflow nets, WF-nets for short, and have shown that the behavioral inheritance problem is decidable but intractable. In this paper, we first show that all WF-nets of the chained model satisfy life-cycle inheritance, and all WF-nets of the nested model satisfy projection inheritance. Next we show that soundness is a necessary condition of projection inheritance for an acyclic extended free choice WF-net of the parallel synchronized model. Then we prove that the necessary condition can be verified in polynomial time. Finally we show that the necessary condition is a sufficient condition if the WF-net is obtained by connecting state machine WF-nets.

Simplified Relative Model to Measure Visual Fatigue in a Stereoscopy

Jae Gon KIM, Jun-Dong CHO

Article type: LETTER
2011 Volume E94.A Issue 12 Pages 2830-2831
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2830

JOURNAL RESTRICTED ACCESS

In this paper, we propose a quantitative metric of measuring the degree of the visual fatigue in a stereoscopy. To the best of our knowledge, this is the first simplified relative quantitative approach describing visual fatigue value of a stereoscopy. Our experimental result shows that the correlation index of more than 98% is obtained between our Simplified Relative Visual Fatigue (SRVF) model and Mean Opinion Score (MOS).

Verifying Structurally Weakly Persistent Net Is Co-NP Complete

Atsushi OHTA, Kohkichi TSUJI

Article type: LETTER
2011 Volume E94.A Issue 12 Pages 2832-2835
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2832

JOURNAL RESTRICTED ACCESS

The Petri net is a powerful modeling tool for concurrent systems. Subclasses of Petri nets have been suggested to model certain realistic applications with less computational cost. The structurally weakly persistent net (SWPN) is one such subclass, for which liveness is verified in deterministic polynomial time. This paper studies the computational complexity of verifying whether a given net is an SWPN. The 3UNSAT problem is reduced to the problem of verifying whether a net is not an SWPN. This implies co-NP completeness of the verification problem of SWPN.

Regular Section

Flicker Parameters Estimation in Old Film Sequences Containing Moving Objects

Xiaoyong ZHANG, Masahide ABE, Masayuki KAWAMATA

Article type: PAPER
Subject area: Digital Signal Processing
2011 Volume E94.A Issue 12 Pages 2836-2844
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2836

JOURNAL RESTRICTED ACCESS

The aim of this study is to improve the accuracy of flicker parameters estimation in old film sequences in which moving objects are present. Conventional methods tend to fail in flicker parameters estimation due to the effects of moving objects. Our proposed method firstly utilizes an adaptive Gaussian mixture model (GMM)-based method to detect the moving objects in the film sequences, and combines the detected results with the histogram-matched frames to generate reference frames for flicker parameters estimation. Then, on the basis of a linear flicker model, the proposed method uses an M-estimator with the reference frames to estimate the flicker parameters. Experimental results show that the proposed method can effectively improve the accuracy of flicker parameters estimation when the moving objects are present in the film sequences.

Self-Organizing Digital Spike Maps for Learning of Spike-Trains

Takashi OGAWA, Toshimichi SAITO

Article type: PAPER
Subject area: Nonlinear Problems
2011 Volume E94.A Issue 12 Pages 2845-2852
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2845

JOURNAL RESTRICTED ACCESS

This paper presents a digital spike map and its algorithm for learning spike-trains. The map is characterized by a swarm of particles on lattice points. When a teacher signal is applied, the algorithm finds a winner particle. The winner and its neighbor particles move in a manner similar to self-organizing maps. A new particle can be born, and the particle swarm can grow, depending on the properties of the teacher signals. If the learning parameters are selected suitably, the map can evolve to approximate a class of teacher signals. The efficiency of the algorithm is confirmed through basic numerical experiments.

On Structural Analysis and Efficiency for Graph-Based Rewiring Techniques

Fu-Shing CHIM, Tak-Kei LAM, Yu-Liang WU, Hongbing FAN

Article type: PAPER
Subject area: VLSI Design Technology and CAD
2011 Volume E94.A Issue 12 Pages 2853-2865
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2853

JOURNAL RESTRICTED ACCESS

The digital logic rewiring technique has been shown to be one of the most powerful logic transformation methods. It has been proven that rewiring can further improve some already excellent results on many EDA problems, ranging from logic minimization and partitioning to FPGA technology mapping and final routing. Previous studies have shown that ATPG-based rewiring is one of the most powerful tools for logic perturbation, while a graph-based rewiring engine is able to cover nearly one fifth of the target wires with a 50-fold runtime speedup. For problems that only require good-enough and very quick solutions, this new rewiring technique may serve as a useful and more practical alternative. In this work, essential elements of graph-based rewiring, such as rewiring patterns, pattern size, and locality, have been studied to understand their relationship with rewiring performance. A structural analysis of the target-alternative wire pairs discovered by ATPG-based and graph-based engines has also been conducted to identify the structural characteristics that favor the identification of alternative wires. We have also developed a hybrid rewiring approach that combines the advantages of both ATPG-based and graph-based rewiring. Experimental results suggest that our hybrid engine achieves about 50% of the alternative wire coverage of the state-of-the-art ATPG-based rewiring engine with only 4% of the runtime. By applying our hybrid rewiring approach to the FPGA technology mapping problem, we achieved similar reductions in depth level and look-up table count with a much shorter runtime. This shows that the fast runtime of our hybrid approach does not sacrifice quality in certain rewiring applications.

Adaptive Go-Back-N ARQ Protocol over Two Parallel Channels with Slow State Transition

Chun-Xiang CHEN, Kenichi NAGAOKA, Masaharu KOMATSU

Article type: PAPER
Subject area: Reliability, Maintainability and Safety Analysis
2011 Volume E94.A Issue 12 Pages 2866-2873
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2866

JOURNAL RESTRICTED ACCESS

In this paper, we propose an adaptive Go-Back-N (GBN) ARQ protocol over two parallel channels with slow state transition. The proposed protocol uses channel-state feedback information to determine the priority order in which the channels are used for sending packets. We analyze the throughput efficiency of the protocol exactly and obtain its closed-form expression under the assumption that the time-varying channel is modeled by a two-state Markov chain, which is characterized by the packet error rate and the decay factor. The analytical results and numerical examples show that, for a given round-trip time, the throughput efficiency depends on both the average packet error rate and the decay factor. Furthermore, it is shown that the throughput efficiency of the proposed protocol is superior to that of the non-adaptive Go-Back-N protocol, which uses the two channels in a fixed order, in the case of slow state transition (i.e., when the decay factor is large and positive).
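
For readers unfamiliar with the channel model, the following sketch computes the two quantities the abstract names for a generic two-state (good/bad) Markov channel. It assumes the common parametrization in which the decay factor is the second eigenvalue of the transition matrix, `1 - p_gb - p_bg`; the paper's own definitions may differ, and the function name, parameter names, and numeric values are hypothetical. It does not reproduce the paper's throughput expression.

```python
def two_state_channel_summary(p_gb, p_bg, per_good, per_bad):
    """Summarize a two-state Markov channel.

    p_gb: probability of moving from the good state to the bad state per slot
    p_bg: probability of moving from the bad state to the good state per slot
    per_good, per_bad: packet error rate in each state
    """
    total = p_gb + p_bg
    if total <= 0:
        raise ValueError("at least one transition probability must be positive")
    # Stationary distribution of the chain.
    pi_good = p_bg / total
    pi_bad = p_gb / total
    return {
        "pi_good": pi_good,
        "pi_bad": pi_bad,
        # Long-run average packet error rate.
        "average_per": pi_good * per_good + pi_bad * per_bad,
        # Second eigenvalue of the transition matrix (assumed meaning of
        # "decay factor"); values near 1 mean the state changes slowly.
        "decay_factor": 1.0 - total,
    }


summary = two_state_channel_summary(p_gb=0.01, p_bg=0.04, per_good=0.001, per_bad=0.5)
print(summary)
```

With these illustrative values the chain spends 80% of the time in the good state, and the decay factor of 0.95 corresponds to the slow-transition regime the abstract refers to.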

New Constructions of Binary Sequences with Good Autocorrelation Based on Interleaving Technique

Xiuwen MA, Qiaoyan WEN, Jie ZHANG, Xuan ZHANG

Article type: PAPER
Subject area: Cryptography and Information Security
2011 Volume E94.A Issue 12 Pages 2874-2878
Published: December 01, 2011
Released on J-STAGE: December 01, 2011

DOI: https://doi.org/10.1587/transfun.E94.A.2874

JOURNAL RESTRICTED ACCESS

In this paper, we propose new constructions of binary sequences based on an interleaving technique. In our constructions, we use any binary sequence with ideal 2-level autocorrelation, a special shift sequence, and either the perfect binary sequence or the sequence (0,1,1) in the interleaved structure to obtain the new sequences. Apart from the bulk of the autocorrelation values of the new sequences, we find that the unexpected autocorrelation values occur only two or four times in each period, no matter how long the period is. In this sense, we say that the sequences have good autocorrelation. In particular, the autocorrelation distribution of our sequences is determined.
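
The autocorrelation properties mentioned above can be checked numerically for small building blocks. The sketch below computes the periodic autocorrelation of a 0/1 sequence under the usual mapping of bit b to (-1)^b. The period-7 m-sequence and the length-4 sequence (0,1,1,1) are standard textbook examples chosen here for illustration, and the function name is hypothetical; the sketch does not implement the interleaved constructions of the paper.

```python
def periodic_autocorrelation(bits):
    """Return the periodic autocorrelation of a 0/1 sequence for every shift."""
    n = len(bits)
    signs = [1 - 2 * b for b in bits]  # map 0 -> +1 and 1 -> -1
    return [
        sum(signs[i] * signs[(i + tau) % n] for i in range(n))
        for tau in range(n)
    ]


m_sequence = [1, 1, 1, 0, 1, 0, 0]  # period-7 m-sequence
perfect = [0, 1, 1, 1]              # length-4 perfect binary sequence
short = [0, 1, 1]                   # the sequence (0,1,1) named in the abstract

print(periodic_autocorrelation(m_sequence))  # [7, -1, -1, -1, -1, -1, -1]
print(periodic_autocorrelation(perfect))     # [4, 0, 0, 0]
print(periodic_autocorrelation(short))       # [3, -1, -1]
```

The m-sequence shows ideal 2-level autocorrelation (the period at shift 0 and -1 elsewhere), and the perfect binary sequence has zero autocorrelation at every nonzero shift.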
