Semiconductor design, manufacturing, and system deployment are being challenged on many fronts by technology scaling, process variability, device aging effects, ever-increasing performance expectations, and the continued reduction in time to volume. Data centers and their applications have stringent reliability, availability, and serviceability (RAS) requirements that are straining under the massive scale of compute today. Silent data corruption (SDC) has become a problem for all semiconductor suppliers in large-scale compute. Automotive OEMs (original equipment manufacturers) are accelerating adoption of advanced process nodes to address the compute required for fully autonomous transportation while still meeting stringent functional safety (FuSa) requirements. In this paper, we describe the challenges, potential causes, and mitigation techniques for meeting modern RAS requirements, including SDC. We also explain Silicon Lifecycle Management (SLM), which involves the insertion of in-chip monitors, Electronic Design Automation (EDA) tools, and data analytics solutions in the cloud, at the edge, and embedded in the SoC. SLM monitors, collects, and stores device data throughout a system's life, and it provides insights through purpose-built analytics spanning production data through the mission mode of operation.