Research on the placement problem in physical design has evolved timely in the recent few decades from traditional wirelength-driven, to routability-driven and then to detailed-routability driven. In this paper, we will focus on the interconnect and routing issues in placement, and study and survey on the development and progress of related works on this important problem.
For the purpose of analyzing the cause of delay in modern digital circuits, efficient diagnosis methods for delay faults need to be developed. This paper presents diagnosis methods for gate delay faults by using a fault dictionary approach. Although a fault dictionary is created by fault simulation and for a specific amount of delay, the proposed method using it can deduce candidate faults successfully even when the amount of delay in a circuit under diagnosis is different from that of the delay assumed during the fault simulation. In this paper, we target diagnosing the presence of single gate delay faults and double gate delay faults. Experimental results for benchmark circuits demonstrate the effectiveness of the proposed methods.
This paper presents new methods of detecting missed arithmetic optimization opportunities for C compilers by random testing. For each iteration of random testing, two equivalent programs are generated, where the arithmetic expressions in the second program are more optimized in the C program level. By comparing the two assembly codes compiled from the two C programs, lack of optimization on either of the programs is detected. This method is further extended for detecting erroneous or insufficient optimization involving volatile variables. Two random programs differing only on the initial values for volatile variables are generated, and the resulting assembly codes are compared. Random test systems implemented based on the proposed methods have detected missed optimization opportunities on several compilers, including the latest development versions of GCC-5.0.0 and LLVM/Clang-3.6.
This paper presents a novel calibration method for Delay Value Measurement Circuit (DVMC), a class of embedded time to digital converter (TDC), using a variable clock generator for accurate delay measurement. The proposed method uses a design for calibration as well as a variable clock generator. The design utilizes a delay time controllable (DTC) inverter. It also uses two OR-NAND gates which work as selectors; we reconfigure the construction of the ring oscillator (RO) in DVMC when calibrating the DTC inverter. The proposed scheme accomplishes more accurate calibration compared to the traditional calibration which only uses the variable clock generator. For example, when using a variable clock generator with the resolution of 5.2ps, the resolution of the proposed method is 0.58ps while the traditional method is 5.2ps.
While Multiprocessor System-On-Chips (MPSoCs) are becoming widely adopted in embedded systems, communication architecture analysis for MPSoCs becomes ever more complex. There is a growing need for faster and accurate performance estimation techniques for on-chip bus architecture. This paper presents a novel fast statistical based bus stall prediction model that enables estimating the effects of bus-contention stall on the cycle-count of an application program on a subject MPSoC architecture. Our technique fills the gap in existing techniques for bus performance estimation, that are either not accurate enough (e.g. static techniques) or too slow to be used in iterative analysis (e.g. cycle by cycle arbitration simulation on every bus access). First we formulate a model named “single blocking model” that models blocking of a single bus request due to a single bus transfer on another bus master at a time. Furthermore we augment this model with a “burst blocking model” that models bus stall incurred due to burst bus transfers. Together these two models give us a very fast way to predict bus stalls on an MPSoC bus. It is assumed that each Processor in the system has a distinct fixed priority, and arbitration is based on priority. The proposed technique makes use of accumulated “workload statistics” to accurately predict the “stall cycle counts” caused due to bus contention. This eliminates the need to simulate arbitration on every bus access, resulting in substantial speed-up. Proposed technique is verified by experiments on applications such as “synthetic traffic generators”, “Newton-Euler dynamic control calculation for the 6-degrees-of-freedom Stanford manipulator benchmark”, “Random sparse matrix solver for electronic circuit simulations benchmark”, “Fast Fourier Transform with 1024 inputs of complex numbers” and “SPEC95 Fpppp which is a chemical program performing multi-electron integral derivatives”. Experimental results show that the proposed method delivers a speed-up factor of 1.33, 1.7, 74 and 6 against the simulation method for the four benchmark applications respectively, while average estimation error is 7% for benchmark application, “Fast Fourier Transform with 1024 inputs of complex numbers” and under 1% for other benchmarks.
While customizing the memory system design or picking the most fitting design for applications is very critical, many software vendors refrain from releasing their software for several reasons. First, many applications are proprietary, hence releasing them to hardware architects or vendors is not desired. Second, applications such as defense and nuclear simulations are very sensitive, hence accessing them is very restricted. Nonetheless, customizing the hardware for such applications is still important and highly desired. Workload cloning is the technique of generating synthetic clones from the original workload. The clones mimic the memory access behavior of the original workloads, hence enable exploring the design space with high level of accuracy. In this article, we survey the state-of-art cloning techniques of the memory access behavior and their uses.
As complexity of system LSI design is increased significantly, efficient verification methodology is mandatory to achieve reliable system and to speed up development time. HW/SW co-verification, nowadays, is interesting and practical as a tool for system verification because it allows covering large number of verification scenarios in acceptable time. In this paper, we present an efficient and unified framework of HW/SW co-verification methodology for large scale system, particularly high throughput wireless communication system. The proposed methodology combine system level simulation (e.g., MATLAB or C/C++) and physical level verification (e.g., FPGA). It allows performing fast HW/SW verification, as well as fast turn-around design exploration. The proposed methodology has been successfully employed to our case study which is 4x4 MIMO wireless communication system. Experimental results show that our case study is able to run in near real-time processing, resulting in an improvement of simulation time orders of magnitude faster than software based simulation. Moreover, the proposed verification platform can be used for complete characterization of communication performance of a MIMO wireless system employing MLD MIMO decoder for various operation modes and channel models.
In this paper we outline a transistor size optimization technique for logic circuits that takes into account BTI (Bias Temperature Instability) and process variations. We demonstrate the accuracy of our results with statistical analysis. Since variations have a large impact on the scaling process, dependable circuit designs should include a quantitative analysis if they are to become more reliable in the future. In this study we used an algorithm to prove that with our technique we efficiently lowered the timing margin of the logic path by 4.4% below the margin achieved by conventional techniques. We also observed that the lifetime of the optimized circuits extended without any area overhead.
In this paper, in order to realize 0.4V operation of STT-MRAM, we propose the counter base read circuit. The proposed read circuit has tolerance for process variation and temperature fluctuation by changing dynamically the load curve in a time-axis at the read operation. We confirmed that the proposed read circuit can operate at the conditions of five process corners (TT, FF, FS, SF, and SS) and three temperatures (-20°C, 25°C, and 100°C) by HSPICE simulations. At the condition of TT corner and 25°C, read time of the proposed circuit is 271ns, and energy consumption is 1.05pJ at “1” read operation and 1.23pJ at “0” read operation.