Reliability issues in deep deep sub-micron technologies: time-dependent variability and its impact on embedded system design

Antonis Papanikolaou, Hua Wang, Miguel Miranda, Francky Catthoor
2007 13th IEEE International On-Line Testing Symposium (IOLTS 2007)  
Embedded system design is especially demanding in terms of requirements that need to be satisfied, e.g. real-time processing, cost effectiveness, low energy consumption and reliable operation. These requirements have to be properly balanced until a financially viable global solution is found. Novel mobile multimedia and communication applications pose extremely severe requirements on the amount of storage, processing and functionality capabilities of the system. Near future embedded systems
have to combine interactive gaming with advanced 3D and video codecs together with leading-edge wireless connectivity standards, such as software-defined radio front-ends and protocol stacks for cognitive radio. This will increase the platform requirements by at least a factor of 10. Meanwhile, battery capacity is increasing by only about 7% per year, while users demand longer times between battery recharges. Optimizing any one of these requirements by compromising on another is a rather straightforward design task; in embedded system design, however, the solution must satisfy the constraints on all four requirement axes.

Products containing embedded systems that target safety-critical applications (e.g. the advanced braking and traction control systems of modern cars, biomedical devices, etc.) have aggressive design constraints, especially in terms of meeting reliability and fail-safe operation targets during the guaranteed product lifetime. This translates into very low field-return targets during that time, since failures can have dire financial consequences or catastrophic results. Systems that belong to the low-end consumer electronics market, on the other hand, are also subject to tight lifetime and reliability targets. They are usually deployed in very large volumes, so even a small percentage of failures can lead to a large number of field returns that cost both financially and in terms of customer loyalty and company image. For all these reasons, fail-safe, reliable operation throughout a guaranteed product lifetime becomes a strategically important property.

Technology scaling has traditionally enabled improvements in three of these design quality metrics: increased processing power, lower energy consumption per task and lower die cost. Reliability targets were also guaranteed at the technology level by using well-controlled processes and well-characterized materials.
Unfortunately, this "happy scaling" scenario, in which technology and design could be kept decoupled, is coming to an end. New technologies are far less mature than earlier ones: the nanometer-range feature sizes require the introduction of new materials and process steps that are not properly characterized by the time they start being used in commercial products, leading to potentially less reliable products. At the same time, progressive degradation of transistors and wires, instead of abrupt failure, becomes a reality as an intrinsic consequence of the smaller feature sizes and interfaces as well as the increasing electric fields and operating temperatures. Effects considered second-order in the past now become a clear threat to the correct operation of circuits and systems, since they start affecting their parametric features (e.g. timing, but also energy dissipation) while the functionality remains unaltered. Moreover, the combined impact of manufacturing uncertainty and reliability degradation results in time-dependent variability: the electrical characteristics of the transistors and wires will vary statistically, both spatially and temporally, directly translating into design uncertainty during fabrication and even during operation in the field. Unfortunately, current reliability models based on traditional worst-case stress analysis are not sufficient to capture these more dynamic system-level interactions, resulting in over-pessimistic implementations. On the solution side, a number of conventional techniques already exist for dealing with uncertainty, but most of them rely on the introduction of worst-case design slacks at the process technology, circuit and system levels in order to absorb the unpredictability of transistor and interconnect performance and to provide implementations with predictable parametric features.
Trade-offs are always involved in these decisions, resulting in excessive energy consumption and/or cost and leading to infeasible design choices. From the designer's perspective, reliability degradation mechanisms manifest themselves as time-dependent uncertainties in the parametric performance metrics of the devices. In the future sub-45 nm regime, these uncertainties will be far too large to be handled with existing worst-case design techniques without incurring significant penalties in terms of area, delay and energy. As a result, reliability degradation becomes a major threat to the design of complex digital system-on-chip (SoC) implementations. Addressing it will require the development of novel reliability models at all three levels, namely the device, circuit and system levels, capable of capturing the impact of the application functionality on the system, as well as new design paradigms for embedded system design in order to build reliable systems using technology that will be largely unpredictable in nature. A shift toward Technology-Aware Design solutions will be required to keep designing successful systems in future aggressively scaled technologies.
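As a rough illustration of why worst-case design slacks become so expensive as variability grows (a hypothetical numerical sketch, not an example from the paper), consider an N-stage critical path with independent Gaussian per-stage delay variation. Margining every stage at its individual 3-sigma point reserves N·3σ of slack, while the 3-sigma point of the whole path grows only with √N, so per-stage worst-case margining over-reserves slack by roughly √N:

```python
import math
import random

# Illustrative sketch (all numbers are assumed, not taken from the paper):
# compare the slack reserved by per-stage worst-case margining on an
# N-stage critical path with the slack actually needed to cover 3 sigma
# of the total path delay, assuming independent Gaussian stage delays.

N = 20              # stages on the critical path (assumed)
nominal = 100.0     # nominal per-stage delay in ps (assumed)
sigma = 10.0        # per-stage delay standard deviation in ps (assumed)

# Worst-case design: margin every stage by 3 sigma independently.
worst_case_margin = N * 3 * sigma

# Statistical view: independent variations add in quadrature, so the
# 3-sigma margin of the whole path grows only with sqrt(N).
statistical_margin = 3 * sigma * math.sqrt(N)

print(f"worst-case margin:  {worst_case_margin:.1f} ps")
print(f"statistical margin: {statistical_margin:.1f} ps")
print(f"over-design factor: {worst_case_margin / statistical_margin:.2f}x")

# Monte Carlo check: fraction of sampled paths still covered by the
# much smaller statistical margin (should be close to 99.87%).
random.seed(0)
trials = 100_000
covered = sum(
    sum(random.gauss(nominal, sigma) for _ in range(N))
    <= N * nominal + statistical_margin
    for _ in range(trials)
)
print(f"coverage of statistical margin: {covered / trials:.4f}")
```

With these assumed numbers the worst-case approach reserves about 4.5x more slack than is needed for 3-sigma coverage of the path, which is the kind of over-pessimism that statistical, technology-aware design approaches aim to recover.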
doi:10.1109/iolts.2007.55 dblp:conf/iolts/PapanikolaouWMC07