HALT is a BIG change
Implementing a new reliability development paradigm in a company that uses traditional, standards-based testing can be a perilous journey. This is especially true when introducing HALT (Highly Accelerated Life Test), in which the new metric is strength against stress rather than a quantified electronics lifetime. Because of this significant change in test orientation, success depends on first educating the company’s top technical and financial stakeholders on why and how HALT is so effective for rapid reliability development. Unless upper management comes to understand the big picture of the HALT paradigm shift in parallel, educating each skeptical key player one at a time will cost much more time and put the success of HALT at significant risk.
To illustrate, let’s imagine that you are your electronics systems company’s reliability engineer, or that you have been involved in reliability qualification or validation testing of its products for several years. You have experienced field failures that resulted from design margin issues overlooked during development, as well as some from mistakes in manufacturing. Reliability development in your company consists either of running tests that simulate LCEP (Life Cycle Environmental Profile) estimates (guesstimates?), or of design engineers applying limited stress up to a predefined “that’s good enough” level or to their guess at the worst-case environmental stress the product may see.
You have just learned the generic methodology and some of the benefits of HALT from reading a book on the subject or attending a class or webinar. You now want to try HALT on your company’s new product, so you find an outside test lab that has a HALT chamber. That would seem to be the most straightforward path to getting started with HALT. Or is it?
Teaching HALT in Series takes longer
Let’s consider the following scenario.
You find an outside test lab a few miles from your company that can perform HALT. You have five samples of the new product, support equipment to operate and monitor the UUTs (Units Under Test), and possibly a technician (or, if you are lucky, a design engineer for the product you are going to test) to go with you. The environmental design specifications for the product are 0°C to 35°C.
The HALT lab helps you set up the first sample of the product and proceeds to find the lower temperature operational limit and the upper temperature operational limit. Since the product in this case is a digital system, there are no thermal destruct levels found as the system will not operate above or below the temperature operational limits. In the five samples used for HALT, you find upper temperature operational limits at 70, 72, 90, 117, and 110°C. The lower temperature operational limits for the five samples are found to be -55, -45, -50, -58, and -47°C.
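To make the scenario’s numbers easier to compare, here is a minimal Python sketch that computes the worst-case margin to specification and the sample-to-sample spread for each set of limits. The limit values and the 0°C to 35°C specification come from the scenario above; the function and variable names are only illustrative.

```python
# Operational limits reported in the HALT scenario above (degrees C).
upper_limits = [70, 72, 90, 117, 110]
lower_limits = [-55, -45, -50, -58, -47]

# Design specification from the scenario: 0 to 35 degrees C.
SPEC_LOW, SPEC_HIGH = 0, 35


def summarize(limits, spec, upper=True):
    """Return (worst-case margin to spec, sample-to-sample spread) in degrees C."""
    if upper:
        worst = min(limits)   # the lowest upper limit is the weakest sample
        margin = worst - spec
    else:
        worst = max(limits)   # the least-cold lower limit is the weakest sample
        margin = spec - worst
    spread = max(limits) - min(limits)
    return margin, spread


upper_margin, upper_spread = summarize(upper_limits, SPEC_HIGH, upper=True)
lower_margin, lower_spread = summarize(lower_limits, SPEC_LOW, upper=False)

print(f"Upper: worst-case margin {upper_margin} C, spread {upper_spread} C")
print(f"Lower: worst-case margin {lower_margin} C, spread {lower_spread} C")
```

With these five samples the upper limits spread over 47°C while the lower limits spread over only 13°C, so the variation, and not only the worst-case margin, is what stands out on the hot side.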
The final stress used in HALT is vibration, and two of the samples fail when the vibration reaches the maximum level of the HALT chamber. The failure mechanism on both is a broken lead on a capacitor mounted high off the PWB (printed wiring board). You and the design engineer repair the capacitor lead and glue the capacitor down to the PWB. To verify the improvement, you apply HALT vibration up to the same maximum level, and the glued capacitors do not fail.
Improving margins beyond spec – the bigger challenge
After you complete HALT at the outside lab and come back to your company, you wonder why the upper temperature operational limits differ by nearly 50°C (from 70°C to 117°C) across the five samples. You realize that such wide variation in limits may indicate an inconsistent manufacturing process for some component, or a significant sensitivity to a component’s inherent parametric variation; if that variation increases, it could significantly impact field reliability. You hope that the manager of design engineering will support an investigation into the cause of the wide variation in upper thermal limits between the samples. When you meet with him he tells you that his department is very busy with the next design and his limited resources will not be available because:
- “The product meets the design specifications, and even the worst sample has 35°C margin above design specifications.”
- “The product will never see 70°C in its worst case use, therefore if it does fail it’s the customer’s fault.”
- “We do not have time to re-design the product to meet your HALT stress requirements.”
How do you address these obstacles from the design engineering manager for resources needed to identify the weaknesses and potentially improve the product robustness and reliability?
Let’s say you spend an hour with the design manager, overcome his objections, and get help from the design engineers. With the help of a couple of design engineers, you determine that a ten-watt FET is the most likely cause of the upper temperature operational limit. Fortunately, you find a twenty-watt FET with the same package size and voltage rating and use it to replace the ten-watt FET. You return to the HALT lab two weeks after your first HALT and find that all three new samples have an upper operational limit above 115°C, and again no thermal destruct limit is found.
While you were busy in the Lab
Later in the week you find that while you were doing the HALT at the local environmental test lab, a skeptical design engineer you have yet to speak with heard that you want to “over-design the product for stresses it will never see” and has spoken to others in the design and procurement departments. Now you start hearing that engineers you have not spoken with are commenting on your desire to add product cost to over-design a product for an irrelevant failure mode. You find yourself teaching the HALT paradigm shift to each skeptic as you find them and convincing some, but the ones you have not yet reached keep spreading the same “fear of over-design.”
When you go to the purchasing department, you find out that the higher-wattage FET will add 50 cents of cost to a product that retails at $700.00. Now the Vice President of Engineering hears about the additional product cost if the FETs are changed. Since it will reduce the profit margin on an already competitively priced product, the VP asks very similar questions to those the design engineering manager asked previously. The design manager attempts to explain the reasoning in his half-hour meeting with the VP, but doesn’t succeed.
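To put the cost objection in proportion, the arithmetic can be sketched in a few lines of Python. The 50 cents and the $700.00 retail price come from the scenario; the cost of a warranty return is a purely hypothetical figure used only to show how a break-even point could be estimated, and the function names are illustrative.

```python
# Figures from the scenario above.
ADDED_COST_PER_UNIT = 0.50   # dollars, higher-wattage FET
RETAIL_PRICE = 700.00        # dollars

# Hypothetical assumption, NOT from the scenario: average cost of one warranty return.
ASSUMED_COST_PER_RETURN = 100.00  # dollars


def cost_share(added_cost, price):
    """Added part cost as a percentage of the retail price."""
    return 100.0 * added_cost / price


def break_even_return_reduction(added_cost, cost_per_return):
    """Reduction in return rate (fraction of units shipped) at which the
    avoided warranty cost equals the added part cost."""
    return added_cost / cost_per_return


share = cost_share(ADDED_COST_PER_UNIT, RETAIL_PRICE)
break_even = break_even_return_reduction(ADDED_COST_PER_UNIT, ASSUMED_COST_PER_RETURN)

print(f"Added cost is {share:.3f}% of the retail price")
print(f"Break-even reduction in return rate: {break_even:.2%}")
```

The added part is roughly 0.07% of the retail price. Under the assumed $100 cost per return, the change would pay for itself if it prevented returns on one unit in every two hundred shipped; your own warranty cost figures would replace that assumption.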
Product Launch Time – too late, but now you may get the field failure data
Ultimately, the higher-wattage replacement for the FET that limits thermal operation is not implemented, and the product is released to market. In a year or two, you may be able to accumulate the warranty return data and find that FET failures are the second highest cause of warranty returns, after NFF (No Fault Found). You bring this information to the new design engineering manager, who joined the company, or moved from another position during a reorganization, only five months ago.
You are back to square one, teaching HALT to the new design engineering manager and then to the many design engineers, reliability engineers, and department managers who have arrived since you began your path to introducing HALT at your company two years ago. Do you have the energy to show and explain HALT to the new key players? Are you still a reliability engineer at the same company?
A “HALT Battlefield” Experienced Consultant can accelerate understanding
It is critical to win the “hearts and minds” of top management when introducing any new engineering concept. Introducing HALT methods, a relatively new and still widely misunderstood paradigm in electronics reliability development, can be a challenging path. An experienced HALT consultant can educate the top executives and key management personnel at the same time, helping them understand the fundamentals of the HALT paradigm shift. They can provide data and examples showing how and why HALT is so effective, along with pitfalls to avoid. A HALT consultant can review the causes of your field failures and identify those that would likely have been found during HALT, so that you can show the potential future ROI of different test strategies. They can also offer multiple paths to HALT adoption, such as demonstration tests on a known weakness or showing how HALT can reduce the NFF warranty returns problem.
A HALT consultant can provide a short overview presentation to the company executives, and their staff can ask questions about HALT with all hearing the same answers and explanations. You will still have questions to answer about the new methods going forward, but with an experienced HALT consultant, you and your company will have someone who has heard the questions many times before and likely has the data to support the answers or knows where to find it.
Of course there are many other benefits of having an experienced leader in HALT guiding your way, such as planning, test monitoring, procedure writing, and most importantly the continued teaching and coaching of new engineers and skeptics while you are accumulating real field data showing the benefits of HALT in your products. But, without a common understanding at the highest levels of the company, you may “win several battles but lose the war” of introducing the most efficient approach to making a reliable and robust electronic system.
What has convinced you that HALT is worth it or that it is not worth it? What has been your most effective tool for accelerating the adoption of “stress to limits” (HALT) methods? Please leave your questions and comments or stories of success or failure with accelerated stress testing below or contact Accelerated Reliability Solutions, L.L.C. for more information.
Many thanks to Chet Haibel for improving this blog post.
Virginia Hobbs says
Kirk – Thank you for taking the time to write this great article that should be read by anyone not doing HALT. Virginia
Kelvin Adams says
Kirk, thank you for sharing such helpful information! HALT is a test-to-fail strategy. The purpose of HALT is to proactively detect and repair weak points in the product to make it more dependable. The ideal time to start HALT testing is during the product development phase, when prototypes become available, because HALT is intended to expose flaws and weaknesses in a product. A successful test will identify areas for product improvement.
Kirk Gray says
Thanks Kelvin. You can hear more about HALT from me a couple of times a month on the Podcast “Speaking of Reliability” which you can find on your favorite podcast app.