In advanced technology nodes, unidirectional layout is strongly preferred for high-density metal layers for better manufacturability. The lithographic printing of unidirectional layouts can be tightly controlled with illumination source optimization and resolution enhancement techniques. Moreover, with unidirectional layout, self-aligned double and quadruple patterning can be applied to achieve finer pitches beyond the resolution limits of 193nm lithography tools. However, the unidirectional layout style significantly impacts the physical design flow. Notably, unidirectional routing limits standard cell pin accessibility, which makes manual cell layout design and optimization increasingly challenging under modern standard cell architectures. In the routing phase, routing densities and resource competition on lower metal layers are becoming increasingly high, so intelligent approaches are needed to resolve routing resource competition for unidirectional routing closure. Moreover, post-routing optimization is necessary to avoid significant engineering change efforts for unidirectional routing under advanced manufacturing constraints. In this paper, we present a holistic approach, including standard cell pin access evaluation/optimization, pin access and routing co-optimization, and post-routing optimization, to enable unidirectional routing closure in advanced technology nodes.
Accurate and fast performance estimation methods for modern and future multi-core systems are the focal point of much research due to the complexity of such architectures. The communication architecture of such systems has a huge impact on the performance and power of the whole system. Architects need to explore many design possibilities using performance estimation techniques at early stages of design, so that design decisions can be made earlier in the design cycle. Software developers, meanwhile, need to develop and test applications for the target architecture and gather performance measurements as early in the design cycle as possible. Full-system simulation techniques provide accurate performance values but are extremely time consuming. Static analysis techniques are fast but cannot capture the dynamic behavior associated with shared-resource contention and arbitration. Synthetic traffic patterns have also been used to analyze the communication architecture; however, such patterns are not realistic enough. We propose a statistics-based model to predict the dynamic cost of bus arbitration on the performance of a bus architecture. The proposed model uses workload traces of actual applications and benchmarks to capture real application traffic behavior. Statistics on the traffic patterns are collected and fed into the analytical model, which calculates performance values for the communication architecture under consideration. Knowing these performance measures, designers can avoid over- and under-design of the communication architecture. This paper builds on a previously developed performance estimation model. The previous work modeled single and burst bus transfers, but considered only one interfering bus master at a time for each blocked bus request.
The proposed improved-accuracy model considers multiple interfering masters for each blocked request, improving estimation accuracy especially for traffic-intensive applications and architectures with many processing elements (PEs). Experiments are performed on two architectures: 4 PEs connected via a shared bus and 8 PEs connected via a shared bus. Results show no significant difference in accuracy compared to the previously developed model for the low-traffic applications SPARSE and ROBOT, but a notable accuracy improvement for traffic-intensive applications. The maximum estimation error is reduced from 1.75% to 0.6% for FPPPP and from 13.91% to 8.8% for FFT on the 4-PE architecture. On the 8-PE architecture, the maximum estimation error is reduced from 11.8% to 2.7% for the FPPPP benchmark. Moreover, the speed-up of the proposed technique over the simulation method is reported.
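To illustrate the general style of such analytical contention models (a minimal first-order sketch under assumed inputs, not the paper's actual formulas), the expected blocking delay of a request from one master can be estimated by summing the probability-weighted residual transfer times of all other masters sharing the bus:

```python
def expected_blocking_cycles(request_rates, transfer_lengths, me):
    """Illustrative sketch of a multi-master contention estimate.

    request_rates[i]: fraction of cycles master i occupies the bus
                      (a utilization derived from its traffic trace).
    transfer_lengths[i]: mean cycles per bus transfer of master i.
    me: index of the master whose blocked request we estimate.

    All names and the residual-time approximation are assumptions for
    illustration; the paper's model is derived from trace statistics.
    """
    wait = 0.0
    for i, (rho, length) in enumerate(zip(request_rates, transfer_lengths)):
        if i == me:
            continue
        # If master i is using the bus (probability rho), on average
        # half of its transfer remains to be completed (residual time).
        wait += rho * (length / 2.0)
    return wait


# Example: master 0 blocked on a 4-PE shared bus.
w = expected_blocking_cycles([0.3, 0.2, 0.1, 0.1], [8, 8, 4, 4], me=0)
print(w)  # ~1.2 expected waiting cycles in this example
```

Accounting for several interfering masters at once, rather than a single one, is what distinguishes the improved model described above.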
Optical Proximity Correction (OPC) remains the mainstream approach for printing sub-16nm technology nodes in optical micro-lithography. However, long computation times are required to generate mask solutions with acceptable wafer image quality. The Intensity Difference Map (IDM) has recently been proposed as a fast methodology to shorten OPC computation time while preserving acceptable wafer image quality. However, IDM has been evaluated only under a relatively relaxed Edge Placement Error (EPE) constraint on the final mask solution. Such an evaluation does not satisfactorily confirm the effectiveness of IDM when strict EPE constraints are imposed. In this paper, the accuracy of IDM is analyzed in depth to confirm its validity in terms of wafer image estimation accuracy, along with its efficiency in shortening computation time. Thereafter, the stability of IDM accuracy against increases in pattern area/density is confirmed. Finally, the regions suffering from a lack of accuracy are analyzed for further enhancement. Experimental results show that congestion in the mask pattern is a principal source of inaccuracy, which is compensated for through optimized selection of the kernels included in IDM.
This paper describes an acceleration technique for SAT-based ATPG (automatic test pattern generation). The main idea of the proposed algorithm is to represent multiple test generation problems as a single CNF formula by introducing control variables, which reduces CNF generation time. Furthermore, learnt clauses from previously solved problems are effectively shared when solving the remaining problems, so that SAT solving time is also reduced. Experimental results show that the proposed algorithm runs more than 3 times faster than the original SAT-based ATPG algorithm.
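The control-variable idea resembles assumption-based incremental SAT: each problem's clauses are guarded by a selector literal, so one combined CNF is built once and solved repeatedly by asserting the selector of the target problem. A toy sketch (with a brute-force solver and made-up two-variable formulas, purely illustrative of the encoding, not the paper's ATPG engine):

```python
from itertools import product

def solve(clauses, n_vars, assumptions=()):
    """Brute-force SAT: return an assignment satisfying all clauses plus
    the unit assumptions, or None. Literals are +/- 1-based variable ids."""
    clauses = list(clauses) + [[lit] for lit in assumptions]
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return assign
    return None

# Two toy "test generation" problems over shared variables x1, x2 (ids 1, 2).
# Control variables s1 = id 3, s2 = id 4 guard each problem's clauses:
# a clause C of problem i becomes (NOT s_i OR C), inert unless s_i is asserted.
prob1 = [[-3, 1], [-3, 2]]    # active iff s1: requires x1 AND x2
prob2 = [[-4, -1], [-4, 2]]   # active iff s2: requires (NOT x1) AND x2
cnf = prob1 + prob2           # one combined CNF, generated once

m1 = solve(cnf, 4, assumptions=[3, -4])   # activate problem 1 only
m2 = solve(cnf, 4, assumptions=[-3, 4])   # activate problem 2 only
print(m1[1], m1[2])  # → True True
print(m2[1], m2[2])  # → False True
```

In a real incremental solver, clauses learnt while solving one activated problem remain in the clause database and can prune the search for the following problems, which is the second source of speed-up the abstract describes.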
Three-dimensional (3D) integration of electronic chips has been advocated by both industry and academia for many years. It is acknowledged as one of the most promising approaches to meet ever-increasing demands on performance, functionality, and power consumption. Furthermore, 3D integration has been shown to be most effective and efficient once large-scale integration is targeted. However, a multitude of challenges has thus far obstructed the mainstream transition from “classical 2D chips” to such large-scale 3D chips. In this paper, we survey all popular 3D integration options available and argue that using an interposer as a system-level integration backbone is the most practical choice for large-scale industrial applications and design reuse. Among the various 3D options, we review the major design (automation) challenges and related promising solutions for interposer-based 3D chips in particular. In doing so, we outline (i) the need for a unified workflow, especially once full-custom design is considered, (ii) the current design-automation solutions and future prospects for both classical (digital) and advanced (heterogeneous) interposer stacks, (iii) the state of the art and open challenges in testing 3D chips, and (iv) the challenges of securing hardware in general and the prospects for large-scale and trustworthy 3D chips in particular.
Field-programmable gate arrays (FPGAs) are a promising technology for implementing high-performance and power-efficient cloud computing, serving as dedicated hardware co-processors that accelerate workloads otherwise run on CPUs. However, developing an FPGA-based system is challenging because of the complexity of hardware/software co-design. In this paper, we propose a platform named hCODE to simplify the design, sharing, and deployment of FPGA accelerators. First, we adopt a shell-and-IP design pattern to improve the reusability and portability of accelerator designs. Second, we implement an open accelerator repository to bridge hardware development and software development on one platform. On the hCODE platform, hardware developers can provide designs that follow hCODE specifications, allowing software engineers to easily search, download, and integrate accelerators into their applications without needing to know the hardware details.