While aerospace and defense lagged in 2019, analysts at Deloitte have said that demand in these industries has grown in 2020—despite initial COVID-19-related setbacks. Reliability in systems engineering takes center stage in A&D, automotive, medical, and other mission-critical industries.
But lifespan and failure rates aren’t only important in these high-stakes fields. Designers of any electronic system must consider component reliability, though how they prioritize reliability varies among applications.
In electrical engineering, reliability is the probability that a component or system performs its required function, under stated conditions, for a stated period of time. And, as mentioned, reliability is paramount in mission-critical systems, where the failure rate needs to be as low as possible.
Failure rate vs. time, often referred to as a “bathtub curve.” Image used courtesy of Susan Stanley
In general, every reliability prediction has a basis in failure rates. These rates tell us how many times a component or system is expected to fail within a given time period. Reliability engineers estimate them with models that account for temperature, mechanical stress, and other operating data.
In this article, we’ll look at different metrics used to quantify reliability.
Popular Metrics of Reliability
Some of the most common terms used in product lifecycle predictions are Mean Time Between Failure (MTBF), Mean Time to Repair (MTTR), Mean Time to Failure (MTTF), and Failure in Time (FIT). Each of these metrics is a numeric value, derived from compiled test or field data, that quantifies a failure rate or the resulting time of expected performance.
- MTBF: Mean Time Between Failure is the average operating time between successive failures of a repairable product, usually stated in hours. Its inverse is the failure rate, which is often quoted in failures per million hours. Arguably the most common inquiry about a product’s life span, MTBF is especially important in mission-critical applications.
- MTTR: Mean Time to Repair describes the time needed to repair a failed hardware module. Generally speaking, repairing means physically replacing a failed hardware part. Repair times grow, and costs with them, when the system sits idle waiting for the new part to arrive and for the installation to be scheduled.
- MTTF: Mean Time to Failure is used to describe reliability for non-repairable systems. It represents the average time expected until a component’s first failure. MTTF is a statistical value meant to represent the mean over a long period of time and a large number of units. Strictly speaking, MTBF should be used only to describe repairable items, while MTTF should be used for non-repairable items.
- FIT: Closely related to MTBF, Failure In Time reports the number of expected failures per one billion device-hours of operation. For a constant failure rate, FIT equals one billion divided by the MTBF in hours. This term is used very commonly in the semiconductor industry.
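Under the common simplifying assumption of a constant failure rate (the flat bottom of the bathtub curve), these metrics are simple conversions of one another. The Python sketch below illustrates this. The function names and the 100 FIT, 8-hour-repair example values are hypothetical and chosen only for illustration. The availability formula treats MTBF as the mean up time between failures.

```python
HOURS_PER_BILLION = 1e9  # FIT is defined per one billion device-hours


def fit_to_mtbf_hours(fit: float) -> float:
    """Convert a FIT rate to MTBF (or MTTF) in hours, assuming a constant failure rate."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_BILLION / fit


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Convert MTBF (or MTTF) in hours to a FIT rate, assuming a constant failure rate."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return HOURS_PER_BILLION / mtbf_hours


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a repairable system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    mtbf = fit_to_mtbf_hours(100.0)        # hypothetical 100 FIT part
    print(f"MTBF: {mtbf:.3g} hours")
    print(f"Availability with an 8-hour MTTR: {availability(mtbf, 8.0):.9f}")
```

A low FIT number and a high MTBF number say the same thing; MTTR only enters the picture when the system is repairable and downtime matters.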
An Example of Reliability Metrics from Transphorm
As an example of how these terms come into play in real system designs, Transphorm, a company that produces high-performance GaN transistors for power systems, recently made headlines in the reliability engineering world.
The company announced that its GaN platform offers a FIT rate of less than 1, meaning fewer than one failure per billion device-hours, in real-world applications. This is notable news since SiC transistors, a competing power transistor type, are reported to have a FIT of about 4.1.
Transphorm’s GaN reliability ratings. Image used courtesy of Transphorm
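To put the two FIT figures in perspective, the sketch below estimates how many failures a fleet would be expected to see at each rate. It assumes a constant failure rate; the fleet size and service life are hypothetical, and 1.0 is used as a stand-in upper limit for the "less than 1" GaN figure.

```python
def expected_failures(fit: float, devices: int, hours_per_device: float) -> float:
    """Expected failure count = failure rate per hour x total device-hours."""
    return fit * 1e-9 * devices * hours_per_device


FLEET_SIZE = 100_000        # hypothetical number of deployed devices
SERVICE_HOURS = 10 * 8760   # hypothetical: ten years of continuous operation

for label, fit in (("FIT = 1.0 (upper limit quoted for GaN)", 1.0),
                   ("FIT = 4.1 (figure quoted for SiC)", 4.1)):
    count = expected_failures(fit, FLEET_SIZE, SERVICE_HOURS)
    print(f"{label}: about {count:.1f} expected failures")
```

Because the expected count scales linearly with FIT, the ratio between the two results is the same roughly fourfold gap as between the FIT figures themselves, regardless of the fleet size chosen.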
One confounding variable in the finding is that SiC transistors are a much more mature technology. Transphorm’s FIT calculation was based on about 10 billion field hours of operation, whereas the SiC FIT was based on over 1 trillion field hours. While 10 billion is not a small number, it remains to be seen how the GaN figure holds up as more field hours accumulate.
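The effect of accumulated hours on a FIT estimate can be sketched with a one-sided upper confidence bound on a Poisson failure count. Everything numeric here is an assumption for illustration: the source does not state how many failures were observed or what confidence level was used, so the zero-failure count and the 60% confidence level below are hypothetical, and this is not a reconstruction of either vendor's calculation.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def fit_upper_bound(failures: int, device_hours: float, confidence: float = 0.6) -> float:
    """One-sided upper confidence bound on FIT, assuming a constant failure rate.

    Finds, by bisection, the Poisson mean at which seeing `failures` or fewer
    events has probability 1 - confidence, then scales it to failures per 1e9 hours.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    target = 1.0 - confidence
    lo, hi = 0.0, 10.0 * (failures + 1) + 50.0  # generous bracket for the mean
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > target:
            lo = mid   # mean too small: observed count is still too likely
        else:
            hi = mid
    return (lo + hi) / 2.0 / device_hours * 1e9


if __name__ == "__main__":
    # Hypothetical: zero observed failures at two different exposure levels.
    for hours in (1e10, 1e12):
        print(f"{hours:.0e} device-hours: FIT upper bound {fit_upper_bound(0, hours):.4f}")
```

For the same observed failure count, one hundred times more device-hours tightens the bound by a factor of one hundred, which is why a FIT figure backed by a trillion hours carries more statistical weight than one backed by ten billion.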
Making Sense of Metrics
In the world of mission-critical applications like aerospace and defense, the importance of reliability cannot be overstated. Even in consumer applications, reliability will affect device lifetime and customer satisfaction. Component reliability is especially important in power systems, because the rest of the design cannot run if the power subsystem fails. Showing a significantly more reliable device (roughly a fourfold improvement in FIT) can be a game-changer for reliable system design.
Understanding basic terms in reliability engineering can help all designers as they solidify their BOM. It can also help engineers make sense of new metrics, like this one from Transphorm, that suggest an advantage for one major category of component (in this case, GaN transistors) over another (like SiC transistors).