This post is a conversation first held on the LinkedIn group No MTBF. I’m capturing a portion of the contributions here to continue the discussion and to widen the audience. It reminds me of always assuming 95% confidence is the right value when designing a test, or assuming a constant failure rate. So, let the conversation continue, starting with the original post.
Where does “0.7eV” come from?
Most manufacturers are still using an Arrhenius law (shouldn’t we rather call it the “Erroneous law”? 😉) with an activation energy of 0.7eV to extrapolate High Temperature Operating Life test data to use conditions for every kind of electronic component (from complex ICs to simple passives). It is often claimed that 0.7eV is based on “historical data.” I have actually never seen any paper or publication where this activation energy has really been measured. The use of a constant hazard rate (lambda) was originally justified by the fact that electronic boards have many different components with different failure mechanisms. The argument was that different Weibull distributions with different activation energies yield, on average, a roughly constant hazard rate for which an apparent activation energy can be defined. Many manufacturers now seem convinced that the other way round must work too! Since a constant hazard rate with Ea=0.7eV was once claimed for electronic boards, every single component must follow the same law too! It is just amazing how errors propagate!
by Enrico
Well done young man. I do agree with you 100%. I do wish that other “reliability” experts had the knowledge and ability to ask the same question. Keep asking and keep in touch.
by Dr K
Great observation and insight. I’d have to check the math, yet I seem to recall that 0.7eV roughly doubles the failure rate with an increase of 10°C (though I’m not certain of that as I think about it). Playing with a few numbers just now, I found, as expected, that it depends on which temperature range the 10°C steps cover, yet very roughly the failure rate does double with each 10°C. The nice round number, and the fact that the rule is easy to remember, are probably where it came from. In short, as you noticed, do the math and check the assumptions. Getting the right activation energy is too important to simply guess.
by Fred
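Here is a quick, minimal sketch of the arithmetic behind that rule of thumb, assuming the standard Arrhenius acceleration factor and Ea = 0.7 eV (the helper function and temperature points are illustrative only):

```python
# Sketch: how much a 10 degC rise accelerates an Arrhenius-driven mechanism
# when the activation energy is assumed to be 0.7 eV.
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def acceleration_factor(ea_ev, t_low_c, t_high_c):
    """Arrhenius acceleration factor between two temperatures given in Celsius."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_low_k - 1.0 / t_high_k))

for t_c in (25, 45, 65, 85, 105):
    af = acceleration_factor(0.7, t_c, t_c + 10)
    print(f"{t_c} -> {t_c + 10} degC: factor of about {af:.2f}")

# Runs from roughly 2.4x near room temperature down to about 1.7x near 105 degC:
# "roughly doubles", but only roughly, and only over typical temperature ranges.
```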
I would recommend the following:
- Check Dimitri Kececioglu’s “Burn-In Testing: Its Quantification and Optimization”. I referred to the text around a year back for activation energy values for different components.
- ASM International’s EDFAS references. I read several references on this subject on their website some time back.
by Vinod Pal
Enrico, very good observation. I have seen no evidence that there is any valid basis for the 0.7 eV, as I have never heard which physical mechanism the activation energy is being referenced to. Is it oxide breakdown, electromigration, diffusion? It makes no sense when it is referring to the propagation of a solder crack, or component package delamination, as those mechanisms are driven by thermal cycles and vibration. So much of reliability prediction of electronics is smoke and mirrors, and the real causes of unreliability are mistakes or overlooked design margin errors, errors in manufacturing, or abuse by customers. These causes are not predictable and do not follow some pattern that can be modeled. It is a much better use of engineering resources to make a reliable electronics system by using stress to rapidly expose these overlooked margin errors, and errors in manufacturing, before units are produced in mass quantities or shipped to the customer. The vast majority of electronics have more than enough life; we really just need to remove the unreliable elements (from the causes previously mentioned) and we have a robust system that will exceed its technologically useful life.
by Kirk
To obtain the activation energy, the Arrhenius model is simply fit to time to failure versus temperature and the activation energy is solved for. Specific failure mechanisms are typically looked for, except in the case of basic material evaluations where failure is defined as a 50% loss of tensile strength. The value of 0.7 is a common rule of thumb, remembering that the lower this number, the less time compression one gets for an increase in stress. If you look at polymer materials, suppliers will often perform aging tests and publish activation energies. A value of 0.7 eV is reasonable for highly glass-filled nylon (45%), but for unfilled nylon the activation energy is about 1.0 eV, and low to medium levels of glass fill run 0.9 to 0.8 eV.

For electronics there are many studies that have looked at various failure mechanisms, most chemical in nature, and documented the activation energy; as expected there are ranges of values. High temperature does produce grain growth in solder, which reduces strength and can reduce thermal cycle life. Research is ongoing for lead-free solders compared to tin-lead; see Joe Smetana’s published work here. Some published values (eV):
- Silicon semiconductor devices: silicon oxide 1–1.05; electromigration 0.5–1.2; corrosion 0.3–0.6 (0.45 typical); intermetallic growth Al/Au 1–1.05
- FAMOS transistors: charge loss 0.8; contamination 1.4; oxide effects 0.3
- IC MOSFETs: threshold voltage shift 1.2
- Plastic encapsulated transistors: 0.5
- MOS devices: 1.1–1.3; weak populations 0.3–0.9
- Flexible printed circuits: below 75°C 0.4; above 75°C 1.4
- Opto-electric devices: 0.4
- Photo transistors: 1.3
- Carbon resistors: 0.6
- LEDs: 0.8
- Linear op amps: 1.6–1.8; weak populations 0.7–1.1

In general, damage models provide some technical basis for accelerated tests but never decimal-point accuracy, even though we get precise answers. It is quite easy for people to get comfortable with a common number such that it becomes a sacred cow and nobody knows where the information came from or its proper application.
by Dustin
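As a companion to Dustin’s first point, here is a minimal sketch of how an activation energy is extracted from test data, using made-up failure times at three oven temperatures (the data and variable names are purely illustrative): regress ln(time to failure) against 1/T, and the slope times Boltzmann’s constant gives the estimated Ea.

```python
# Sketch: fit the Arrhenius model to (temperature, time-to-failure) data and
# solve for the activation energy. The lives below are invented for illustration.
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

temps_c = [85.0, 105.0, 125.0]      # hypothetical test temperatures (Celsius)
lives_h = [9000.0, 3100.0, 1200.0]  # hypothetical characteristic lives (hours)

x = [1.0 / (t + 273.15) for t in temps_c]  # 1/T in 1/K
y = [math.log(life) for life in lives_h]   # ln(time to failure)

# Ordinary least-squares slope of ln(t) = ln(A) + (Ea/k) * (1/T)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))

ea_ev = slope * BOLTZMANN_EV
print(f"Estimated activation energy: {ea_ev:.2f} eV")  # about 0.6 eV for this data
```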
There is still a lot of ignorance in the world. I agree electronic components are not a major issue. The eV value is determined by simply fitting a model to a scattered set of measurements for an average value. One can define a strategy to use them all, but it is just a waste of effort; these values are best used for specific issues noted in the field. For a test program, one might choose the conservative grand average. If an aging issue exists, it is often due to a weak sub-population, thus the need to understand whether there is a difference (which there can be), but it can simply be a scaling factor of life with a consistent activation energy. For a test program one considers where the most risk is, and you are correct that it is not in the components nor in aging damage.

It appears that some blindly believe that Arrhenius is all you need to consider. Wrong. Aging can reduce the strength of solder joints and thus reduce solder joint thermal cycle life. This is not consistently true for lead-free solder, where it is a mixed bag still being researched. Hence, to guide the aging target and exercise due care, a conservative average activation energy is often chosen. Aging does change the strength of engineering polymers, so for those needs this is a valid use.

Activation energies have value after root cause failure analysis. If a particular mode appears that could be caused by an aging mechanism, analyses are possible to determine the population risk for that failure, considering what we know about the failed part, where it is used, how it was used, etc., and analyzing in light of usage and environmental variation. These values do not help make reliable products, but they can help prioritize where design enhancement focus should occur. A mechanism with a high activation energy is more sensitive from a stress perspective. There is false precision with any of these models. The weakest things fail first, and if everything fails for a non-aging reason one will never observe an aging failure in the field. Hence, if the thermal cycling mode occurs much earlier, it dominates, and one concentrates on minimizing CTE mismatch, adding strain relief, or other design strategies. Experience often gives a hint of which stressors will produce failures within the design life.

Poor quality can be detected using burn-in. Here we just want to be sure we do not remove too much life from the product, and for that an Arrhenius analysis can be useful. For the most part, thermal cycling is consistently the most effective stressor for electronics. Focusing too much on Arrhenius is incorrect, as the aging mechanisms are not the primary cause of field failure. However, to be true to a simulation need, it is reasonable to provide a requirement with some merit for the aging portion in addition to the thermal cycling requirement, if aging is a major stressor in the field environment. The thermal cycling requirement is also subject to similar hand waving with various models, exponents, material constants, and dependencies upon dwell times, rates, catalytic effects, etc. In real life one often has a mix of failure modes that occur randomly, resulting in the overall observation of an exponential failure distribution. One lumps them all together and possibly the Arrhenius model fits; even though it is not true to the physics, it still may be useful for an engineering need. Attach a named formula to it, communicate confidently, and credibility goes way up with management.

Consideration of aging, thermal fatigue, structural fatigue, corrosion, and other damage mechanisms is prudent for any reliability engineer. With all these competing modes we want to focus on the ones with the highest risk. The models help us decide where to focus our efforts, and sometimes comprehend the physics, but they really only tell us within an order of magnitude when the product might fail. Analysis simply bounds our uncertainty to a perceived tolerable level.
Paul says
It may well be true that activation energy isn’t constant with temperature, and it is almost certainly true that each physical failure mode has its own activation energy.
Direct measurement by test is the only way to determine activation energy, but the tester must be vigilant. Frank Nash of Bell Labs pointed this out many years ago in a talk he gave. He agreed that you can do valid accelerated testing, but noted that at very high temperatures (or other high stresses) there is the risk of activating a failure mode that would take a very long time, if ever, to appear at use temperatures. As I recall, he had test data that predicted that a certain component had an expected life greater than the known age of the universe, based on very highly accelerated testing.
Fred Schenkelberg says
Hi Paul,
thanks for another great comment.
When I first learned about the Arrhenius rate equation it was from a chemist. He said it’s not just temperature; it also includes the availability of reactants. For an oxidation process we can accelerate it with higher temperature and/or with more O2 available (so we increased pressure at room temperature for an ALT, which proved to work very well).
Going back to first principles and fully understanding the failure mechanism is really our best course.
As you mention, there are limits. And, we must ‘stay real’ and cautious as we attempt to predict the future.
Cheers,
Fred
marzieh says
Hi, I am studying the reliability of photovoltaic systems. I want to know the relationship between the failure rate of an electronic board and ambient temperature. Can I use the Arrhenius model?
Fred Schenkelberg says
Hi Piri,
A PV system may have dozens if not hundreds of failure mechanisms, some of which respond to temperature according to the Arrhenius rate equation. First determine which failure mechanisms you wish to model, then determine the appropriate formula.
For electronics, temperature is a common stress. And, again, there are many ways that temperature can cause failure – the Arrhenius equation relies on the activation energy term, which is determined by the failure mechanism’s rate change with temperature.
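For reference, the usual form of the Arrhenius acceleration factor between a use temperature and a higher test temperature (both in kelvin, with Boltzmann’s constant k ≈ 8.617 × 10⁻⁵ eV/K) is:

$$AF = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{\text{use}}} - \frac{1}{T_{\text{test}}}\right)\right]$$

Everything in the extrapolation hinges on the value of Ea, which is why a guessed activation energy carries straight through to the predicted life.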
Using Arrhenius in general with a general activation energy is a common and faulty approach. You’ll calculate something and it may or may not be meaningful. Do the research first.
Cheers,
Fred
Paul says
0.7 eV is a commonly accepted value, but it is only a general guide. If you want to do accurate accelerated life testing, you should look into developing your own test plans and data analysis. Wayne Nelson’s book “Accelerated Testing” is a good reference.
A couple of points should be kept in mind. First, the Arrhenius model was developed in chemistry to describe how the rate of chemical reactions changes with temperature. Also, Frank Nash (formerly of Bell Labs) did some excellent work on accelerated testing indicating that you cannot just raise the temperature to an arbitrary level, since you may activate failure modes at very high temperatures that don’t appear as field failures.
There are other nasties lurking beneath the stones to be aware of. Accelerated testing can be very useful and there is no reason to avoid it. But it’s not “one size fits all.”
Paul says
Also, you may find that current, voltage, or other stresses besides or in conjunction with temperature may be useful in looking for failure modes.
Jack says
Hi Fred,
I am new to the reliability field and have done a lot of research on the subject. I was hired to develop various test plans. Recently a customer demanded a test lasting a few thousand hours at a certain temperature and RH=85%, which I see as a DH (Damp Heat) test. The equipment is capable of doing these tests, but we want to help the customer accelerate this test. I researched the Arrhenius model, which could accelerate the test significantly. I would probably end up using 0.7eV for the activation energy since relevant data are almost impossible to find. My question is, can the Arrhenius model be used when the RH is held constant (but high, 85%)? Why?
Fred Schenkelberg says
Hi Jack, good question and one that may require a textbook or two to fully answer. While I work on the books, I recommend looking at the presentation by Bhanu Sood, NASA, at http://ntrs.nasa.gov/search.jsp?R=20160003384 which goes into detail on failure analysis. Setting up an accelerated test requires knowing which failure mechanism you are evaluating… thus you can select the appropriate model and activation energy. Not knowing the activation energy is not a justification to assume it’s 0.7eV – that is worse than simply guessing, imho.
Second, as another example of failure mechanisms and models, see the paper by Disch and Durant, SEMATECH, at http://www.sematech.org/docubase/document/3955axfr.pdf
They highlight the many ways ICs can fail and the many models (up till 2000, and there are more in the intervening years) for the various material and process combinations that impact failure mechanism behaviors.
You can just do the test, everything passes, and your customer is happy, although misled on whether or not their device is reliable.
Or you can help them understand what is likely to fail given their technology and application, then design a test, confirm it is right, and conduct a meaningful accelerated test.
Testing is done to learn or confirm something. It helps people make decisions. Doing a poor job at this wastes money and reputation.
I’m pretty sure this isn’t the answer you wanted to hear – tough. Reliability testing is expensive and should result in meaningful information that adds value. It takes understanding and work to get right.
Getting started in the field just means you have some more reading and research to do – it’s possible and the information is out there… just keep asking questions and move forward. Cheers, Fred
Paul Franklin says
Well said, Fred. I would also note that the Arrhenius relationship was developed for chemical reactions. To claim that it transfers to electronics is likely to lead to the “Erroneous Relationship.” That’s not to say that temperature relationships don’t exist; they almost surely do. But even if a failure mode is Arrhenius, it does not follow that the activation energy is known – as you correctly point out, it has to be measured.
As Wayne Nelson points out in his papers and books, the only way to know is to measure. If you don’t measure, you don’t know.
If you are starting with analysis, then it is most appropriate to think about failure modes for components, and roll that understanding up into the product. Since these failure modes are all competing, it is essential to figure out which ones are important (i.e., likely to result in failures and at what point in the life cycle), and use whatever models seem appropriate to understand why and how they can happen. When that’s done, designers have useful feedback because the reliability engineer can identify the failure mode and its drivers.
Fred Schenkelberg says
Hi Paul, spot on for the need to measure activation energy (chemists are our friends here). Plus, many failure mechanisms within electronics are chemical reactions. Diffusion, bond breaking, migration, creep, embrittlement, etc. all can have chemical elements that are accurately modeled with Arrhenius.
Plus, the Arrhenius model does work in many cases as a simple empirical model – it’s pretty flexible… yet it should be used with caution and sufficient characterization.
Cheers,
Fred
Fred Schenkelberg says
Well said Paul, thanks!
Hilaire Perera says
Failure Mechanisms and Models for Semiconductor Devices, JEP122E, March 2009 (Revision of JEP122D, October 2008). Originally published as JEP122D.01.
This publication provides a list of failure mechanisms and their associated activation energies or acceleration factors that may be used in making system failure rate estimations when the only available data is based on tests performed at accelerated stress test conditions. The method to be used is the Sum-of-the-Failure-Rates method.
The models apply primarily to the following:
a) Aluminum (doped with small amounts of Cu and/or Si) and copper alloy metallization
b) Refractory metal barrier metals with thin anti-reflection coatings
c) Doped silica or silicon nitride interlayer dielectrics, including low dielectric constant materials
d) Poly silicon or “salicide” gates (metal-rich silicides such as W, Ni & Co to decrease resistivity)
e) Thin SiO2 gate dielectric
f) Silicon with p-n junction isolation
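As a rough illustration of the Sum-of-the-Failure-Rates method mentioned above, here is a minimal sketch: each mechanism gets its own activation energy and acceleration factor, the test-condition failure rates are de-rated to use conditions, and the results are summed. The mechanism names, Ea values, and FIT numbers below are placeholders for illustration, not values taken from JEP122.

```python
# Sketch of the sum-of-the-failure-rates idea: scale each mechanism's failure
# rate from test to use conditions with its own Arrhenius factor, then sum.
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def accel_factor(ea_ev, t_use_c, t_test_c):
    t_use_k, t_test_k = t_use_c + 273.15, t_test_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_test_k))

# (mechanism, assumed Ea in eV, failure rate observed at test conditions in FIT)
mechanisms = [
    ("electromigration",     0.9,  40.0),
    ("gate-oxide breakdown", 0.7,  25.0),
    ("corrosion",            0.45, 10.0),
]

t_use_c, t_test_c = 55.0, 125.0
total_fit = 0.0
for name, ea, fit_at_test in mechanisms:
    af = accel_factor(ea, t_use_c, t_test_c)
    fit_at_use = fit_at_test / af  # de-rate test data to use conditions
    total_fit += fit_at_use
    print(f"{name}: AF ~ {af:.0f}, ~{fit_at_use:.3f} FIT at use conditions")

print(f"Estimated system failure rate: ~{total_fit:.3f} FIT")
```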
Fred Schenkelberg says
Hi Hilaire, just to add the link
http://www.jedec.org/sites/default/files/docs/JEP122F.pdf
You will have to register with JEDEC to be able to download the standard.
If you are working with recent silicon technology (smallest gate sizes), check the current literature for updates to the information on specific failure mechanisms – as the technology progresses, the activation energies and what dominates various failure mechanisms change.
Cheers,
Fred
John Cerny says
As someone who is new to the reliability realm after 25 years of RF design (2-way public safety radios), I am seeking solid resources to help me get up to speed quickly. I have been asked to find the MTTF and FIT for a couple of my company’s products, and I don’t have the software to make the calculations. I spoke with reps from three companies, including DfR, Relyence, and t-cubed, and it appears there is growing disagreement surrounding the accuracy of the old MIL standards used to run the calculations. As someone who is new, please suggest how to navigate this area.
By the way, I know the number of failures for our product over 283k hours, assuming the part was running at 85C. Based on my rudimentary understanding, since the parts were not tested at, say, 125C, I can’t use the Arrhenius equation to calculate MTTF and FIT. Is this correct? Thanks for your consideration. It’s difficult starting over!
Regards,
John
Fred Schenkelberg says
Hi John,
I wasn’t sure how to respond to your note and, frankly, didn’t remember that the note needed a response until now.
Welcome to the realm of reliability. Much of your design work likely touched many aspects of reliability, so I suspect you will find many things familiar.
You really don’t need software to make MTBF or FIT calculations, as the math is very simple and often done by hand or with a simple spreadsheet.
I suspect I know where the various entities land concerning the Mil Std and similar parts count approaches – and I suggest not wasting your time with such silliness. Mil Hdbk 217 is nearing 25 years without an update, and the methodology has long been shown to be a very poor and misleading approach to estimating product or system reliability. In short, it is worthless, and in many situations less than worthless. In my opinion, there is no debate around the accuracy of parts count methods – there is no foundation for them, and plenty of evidence of essentially no accuracy for parts count, parts stress, or similar schemes for reliability prediction – unless you are selling software supporting the use of parts count.
The issue around the use of the Arrhenius equation has more to do with understanding the specific failure mechanism and the associated activation energy, not what temperatures you used for testing. Just using the Arrhenius without full knowledge of the failure mechanism at play and the associated activation energy for the chemistry involved is risky – given the nature of the equation a small change in activation energy can lead to wildly different results.
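To put a number on how sensitive the result is to that assumption, here is a minimal sketch comparing the acceleration factor for an extrapolation from 125°C back to 55°C across a range of assumed activation energies (the temperatures and Ea values are chosen only for illustration):

```python
# Sketch: the same temperature extrapolation with different assumed activation
# energies gives acceleration factors spanning more than two orders of magnitude.
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K
t_use_k, t_test_k = 55.0 + 273.15, 125.0 + 273.15
inv_t_diff = 1.0 / t_use_k - 1.0 / t_test_k

for ea in (0.3, 0.5, 0.7, 0.9, 1.1):
    af = math.exp((ea / BOLTZMANN_EV) * inv_t_diff)
    print(f"Ea = {ea:.1f} eV -> acceleration factor of about {af:,.0f}")

# Roughly 6x at 0.3 eV, ~80x at 0.7 eV, and ~900x at 1.1 eV for this pair of
# temperatures -- the assumed Ea dominates the answer.
```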
Check out accendoreliability.com for loads of information around reliability – especially the series of short tutorials I write under CRE Preparation – that may help you come up to speed with reliability topics.
Cheers,
Fred
Paul Franklin says
Even more than that, at least in my experience. Two things to keep in mind. First, the physics and chemistry of a failure mode may not follow the Arrhenius model. Second, even if the Arrhenius equation applies, each failure mode will generally have its own activation energy. You can make the general case that increased thermal stress increases the rate of occurrence of failures, and in the absence of data, Arrhenius provides a rough guide (but relying on an acceleration factor of 8.2–or whatever–is unlikely to be accurate).
It’s also worth keeping in mind that people often misinterpret the results of accelerated tests. It’s tempting to say that if a test runs 30 days, with an acceleration factor of 8, and 50 units under test with no failures, then you’ve demonstrated that the MTBF exceeds 288,000 hours. That’s misleading. What you have done is look for early life failures (8 months, if you believe your acceleration factor) with 50 trials. And that’s not an exponential model, but a Poisson model.
That said, I believe that RDT is a useful tool, and you can stress a product to make failures occur faster. Temperature doesn’t accelerate every failure mode, and so you have to be aware of the physics and chemistry and work accordingly.
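A minimal sketch of the arithmetic in Paul’s example, with a success-run confidence bound added purely to illustrate the second reading (the 90% confidence level and the bound formula are not from the thread):

```python
# Sketch: 50 units, 30 days on test, assumed acceleration factor of 8, no failures.
units, test_hours, accel = 50, 30 * 24, 8
equiv_hours_per_unit = test_hours * accel        # ~5,760 h, about 8 months of use
total_unit_hours = units * equiv_hours_per_unit  # 288,000 pooled unit-hours

print(f"Equivalent exposure per unit: {equiv_hours_per_unit:,} h")
print(f"Pooled unit-hours: {total_unit_hours:,}")

# Naive reading: quote "MTBF > 288,000 h" from the pooled hours. That only holds
# if the failure rate really is constant well beyond 5,760 h per unit.

# More defensible reading: a zero-failure test of 50 trials, each ~5,760 equivalent
# hours long, bounds reliability at that age only (success-run bound).
confidence = 0.90
reliability_bound = (1 - confidence) ** (1 / units)
print(f"At {confidence:.0%} confidence, R({equiv_hours_per_unit:,} h) >= {reliability_bound:.3f}")
```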