Scalar replacement is an effective technique for improving the performance of RTL code generated by high-level synthesis (HLS) from C programs with intensive array accesses. In scalar replacement, data read from arrays are stored in shift registers, and later accesses to the same data are served from the shift registers instead of the arrays. Since arrays in C programs are usually mapped to RAMs with a limited number of ports, reducing array accesses through scalar replacement reduces memory accesses, which in turn improves the performance of the resulting RTL code. In real-life C programs, shift registers must sometimes be initialized conditionally using multiple array accesses, which increases the number of array accesses in main loops. To reduce the conditional array accesses in the main loops, the previous scalar replacement method proposed the use of a loop transformation called loop peeling. However, loop peeling significantly increases code size, which negatively affects the performance or circuit area of the synthesized hardware. In this paper, we propose a new method to initialize shift registers without loop peeling. The proposed method works as a preprocessing step applied to the input C program prior to scalar replacement. Experimental results demonstrate that the proposed method reduces the number of execution cycles of the synthesized hardware compared to the previous method.
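The basic idea of scalar replacement can be illustrated with a minimal Python sketch. It is not the method proposed in the abstract: the three-point window sum, the `CountingArray` wrapper (a stand-in for a port-limited RAM), and all function names are assumptions chosen for illustration.

```python
class CountingArray:
    """Wraps a list and counts element reads (hypothetical stand-in for a RAM)."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0

    def __getitem__(self, i):
        self.reads += 1
        return self._data[i]

    def __len__(self):
        return len(self._data)


def window_sum_naive(a):
    """b[i] = a[i] + a[i+1] + a[i+2], with three array reads per iteration."""
    return [a[i] + a[i + 1] + a[i + 2] for i in range(len(a) - 2)]


def window_sum_scalar_replaced(a):
    """Same result, but every array element is read only once."""
    n = len(a)
    if n < 3:
        return []
    # Initialize the two-stage shift register before the main loop.
    r0, r1 = a[0], a[1]
    out = []
    for i in range(n - 2):
        r2 = a[i + 2]        # the only array read inside the main loop
        out.append(r0 + r1 + r2)
        r0, r1 = r1, r2      # shift the registers by one position
    return out
```

For an 8-element input, the naive version performs 18 array reads while the scalar-replaced version performs 8, which is the kind of memory access reduction the abstract refers to.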
Stochastic computing is a computation method that implements arithmetic operations with simple logic circuits. It operates on stochastic numbers, whose values are defined by the rate at which 1's appear in their bit streams. By the nature of stochastic computing, changing the length of the input stochastic numbers changes the accuracy of the whole circuit. However, in some implementations with re-convergence paths, the circuit itself causes errors; that is, changing the length of the input stochastic numbers does not change the circuit's accuracy. This paper proposes a stochastic number duplicator whose outputs differ every time and are all independent. The duplicator has a scalable structure in which the number of flip-flops used for bit re-arrangement can be changed. Through experimental evaluations and discussions, we clarify that the proposed stochastic number duplicator enables accuracy-flexible circuits.
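The re-convergence problem can be reproduced in a few lines of Python. The sketch below assumes the common unipolar encoding, in which an AND gate multiplies two independent bit streams; it does not model the proposed duplicator, and the helper names are hypothetical.

```python
import random


def to_stream(p, length, rng):
    """Encode probability p as a bit stream whose rate of 1's approximates p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def value(stream):
    """Decode a stochastic number: the fraction of 1's in the stream."""
    return sum(stream) / len(stream)


def and_multiply(s1, s2):
    """Bitwise AND; multiplies the encoded values only if the streams are independent."""
    return [a & b for a, b in zip(s1, s2)]


rng = random.Random(0)
x = 0.5
length = 4096
s = to_stream(x, length, rng)
s_indep = to_stream(x, length, rng)

# Re-convergent path: the same stream feeds both inputs, so the output encodes x.
correlated = value(and_multiply(s, s))
# Independent copy: the output approximates x * x, and longer streams improve it.
independent = value(and_multiply(s, s_indep))
```

Feeding the same stream to both inputs yields `x` rather than `x * x` regardless of stream length, which is why a duplicator with independent outputs is needed before accuracy can be traded against stream length.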
With the progress of semiconductor process miniaturization, delay degradation caused by aging increases and threatens the reliability of fabricated chips. The amount of delay degradation is known to be circuit and workload dependent, but previous evaluations are based on simulations, and delay degradation measurement of a real circuit under a realistic workload has not yet been reported. This paper proposes a real-circuit delay measurement method that is accurate enough to measure circuit- and workload-dependent delay degradation. In the proposed method, an on-chip oscillator supplies a fine-resolution variable-frequency clock to the internal circuit. The internal circuit executes a test pattern that activates critical paths at various frequencies, and the maximum frequency at which correct results are obtained is determined. This maximum frequency corresponds to the delay of the critical paths activated by the test pattern. Clock multiplication improves delay resolution, and repetitive measurement reduces the measurement error caused by time-dependent random delay variation. The proposed method has been implemented on a test chip in a 65 nm low-power process. The variable-frequency oscillator uses only standard cells and is designed with an automatic layout flow without any timing tuning. The area overhead of the proposed method is 0.09% of the total random logic. The evaluation results show that an average measurement accuracy of 0.18% has been achieved.
Matrix-vector multiplication is being computed more and more often due to the growing use of neural networks. We previously proposed a method to automatically synthesize the optimum parallel algorithm for a given environment, and used it to synthesize an algorithm for matrix-vector multiplication of a specific matrix size on 4 nodes connected in a one-way ring. This paper proposes a method to generalize the synthesized algorithm to matrices of any size. We generalized the algorithm synthesized for a 32×32 matrix to compute N × N matrix-vector multiplication.
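To make the setting concrete, the following Python sketch simulates a textbook block-rotation scheme for N × N matrix-vector multiplication on nodes connected in a one-way ring. It is not the synthesized algorithm from the abstract, whose details are not given here; the data distribution (row blocks of A plus one block of x per node), the requirement that N be divisible by the node count, and the function name `ring_matvec` are all assumptions.

```python
def ring_matvec(A, x, nodes=4):
    """Compute y = A x by rotating blocks of x around a one-way ring of nodes."""
    n = len(x)
    if n % nodes != 0:
        raise ValueError("this sketch assumes N is divisible by the node count")
    b = n // nodes
    # Node p initially owns block p of x and rows p*b .. (p+1)*b - 1 of A.
    held = [(p, x[p * b:(p + 1) * b]) for p in range(nodes)]
    y = [0] * n
    for _ in range(nodes):
        for p in range(nodes):
            q, xb = held[p]  # node p currently holds block q of x
            for i in range(p * b, (p + 1) * b):
                y[i] += sum(A[i][q * b + j] * xb[j] for j in range(b))
        # One-way ring: node p passes its block to node (p + 1) % nodes.
        held = [held[(p - 1) % nodes] for p in range(nodes)]
    return y
```

After as many steps as there are nodes, every node has seen every block of x exactly once, so each node holds its complete slice of the result.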
In this paper, we propose a logic optimization method to remove redundancy in a circuit. An incremental Automatic Test Pattern Generation method is used to find redundant multiple faults. In order to remove as many redundancies as possible, instead of removing the redundant single faults first, we remove redundant faults in order from higher cardinality to lower cardinality. The experiments show that the proposed method eliminates more redundancies than the redundancy removal command in the synthesis tool SIS.
This paper presents experimental evaluations of operating speed, power consumption, and chip temperature under various load conditions on commercial FPGAs. To measure the operating speed of FPGAs, we count the oscillation frequency of a ring oscillator (RO) implemented on one 4-input look-up table (LUT). The load condition on an FPGA is controlled by the number of ROs activated in parallel around the measurement RO. The measurement results show that the operating speed may decrease by more than 10% with 142 ROs as load circuits, and that the degradation rate depends on the measurement location in the FPGA. The evaluation of die-to-die variations using four more FPGA boards shows that there is a non-negligible speed variation.
In this article, we present recent advances in VLSI reliability effects, with a focus on the electromigration (EM) failure/aging effect on interconnects, which is one of the most important reliability concerns for VLSI systems, especially in the nanometer regime. One of the most important advances in EM analysis in recent years is the recognition that EM failure analysis can no longer depend on a single wire segment, as is done in the traditional Black- and Blech-based methods. A new generation of EM modeling and design must consider all the wire segments in an interconnect, since the hydrostatic stresses in those wire segments affect each other. This recognition brings both challenges and opportunities. We start with physics-level stress-oriented characterization of EM failure effects and recently proposed three-phase EM models. We then present a new EM immortality check at the circuit level that considers multi-segment interconnects and void saturation volumes. After this, we present how to accelerate EM aging effects for fast EM validation at the circuit level under normal working conditions using advanced structure-based techniques. Finally, we present a new EM sign-off analysis tool, called EMspice, at the full-chip power grid level, which considers the interplay between resistance changes from post-voiding processes and current density changes from power grids over the aging process. A number of other relevant works are reviewed and compared as well.
In this paper, a simple and compact Negative Bias Temperature Instability (NBTI) model is proposed. It is based on the reaction-diffusion (t^n) and hole-trapping (log(t)) theories. A single shot of DC stress and recovery data is used to express the duty cycle dependence of NBTI degradation and recovery. Parameter fitting proceeds under the constraint that the amount of recovery cannot be larger than the stress degradation. The proposed model is applied to two types of transistors fabricated in different process technologies to evaluate its feasibility and demonstrate its universality. It successfully replicates stress and recovery at various duty cycles.
Energy consumption is one of the most critical concerns in delivery services using drones. This paper studies a routing problem for minimizing the energy consumed by delivery drones. It formally defines the Energy Minimizing Vehicle Routing Problem (EMVRP) and proposes a dynamic programming algorithm to solve the problem efficiently. Experiments show the effectiveness of the proposed algorithm in terms of both quality of results and algorithm runtime.
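As an illustration of how dynamic programming applies to load-dependent routing, the Python sketch below solves a simplified single-drone variant with a subset DP. The abstract does not state the energy model or the DP formulation, so everything here is an assumption: the energy of a leg is taken to be distance × (drone weight + payload still on board), and `min_energy_route`, `dist`, `demand`, and `tare` are hypothetical names.

```python
def min_energy_route(dist, demand, tare):
    """Minimum energy for one drone to leave depot 0, serve all customers, and return.

    Assumed energy model: leg energy = distance * (tare + payload still carried).
    """
    n = len(dist)
    customers = n - 1
    if customers == 0:
        return 0.0
    full = (1 << customers) - 1
    total = sum(demand[1:])
    # Payload remaining after delivering to the customers in each subset.
    load = [
        total - sum(demand[c + 1] for c in range(customers) if (mask >> c) & 1)
        for mask in range(full + 1)
    ]
    inf = float("inf")
    # dp[mask][last]: least energy to have served `mask`, currently at node `last`.
    dp = [[inf] * n for _ in range(full + 1)]
    dp[0][0] = 0.0
    for mask in range(full + 1):
        for last in range(n):
            cur = dp[mask][last]
            if cur == inf:
                continue
            for c in range(customers):
                if (mask >> c) & 1:
                    continue
                nxt = c + 1
                cost = dist[last][nxt] * (tare + load[mask])
                new_mask = mask | (1 << c)
                if cur + cost < dp[new_mask][nxt]:
                    dp[new_mask][nxt] = cur + cost
    # Return to the depot with an empty payload.
    return min(dp[full][last] + dist[last][0] * tare for last in range(1, n))
```

Because the remaining payload is determined by the set of customers already served, the state (served set, current node) is sufficient, and the table has 2^k × (k + 1) entries for k customers. This is practical only for small k.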
Graph neural networks are a type of deep-learning model for classification in graph domains. To infer arithmetic functions in a netlist, we applied relational graph convolutional networks (R-GCN), which can directly treat relations between nodes and edges. However, because the original R-GCN supports only node-level labeling, it cannot be directly used to infer the set of functions in a netlist. In this paper, by considering the distribution of labels for each node, we present an R-GCN-based function inference method and a data augmentation technique for netlists having multiple functions. According to our results, 91.4% accuracy is obtained from 1,000 training data, demonstrating that R-GCN-based methods can be effective for graphs with multiple functions.