Robert V. Binder

Competent, Mediocre, or Dangerous?

Homer Simpson and the Quality Assurance Engineer

A Chicago Tribune article recounts how a software bug in an infusion pump led to brain death for a patient in 2009 (“Medical Industry Taking Hard Look at Software Faults,” Christine Mai-Duc, Chicago Tribune, August 31, 2011, p. 19).

It reports that the US Food and Drug Administration (FDA), which regulates and monitors the development, testing, and marketing of medical devices, has been criticized for not requiring more testing and for allowing the release of untested products. The article notes that US Federal Aviation Administration (FAA) regulations for aviation software are more rigorous, and quotes UMass professor Kevin Fu as stating that the FDA “was caught off-guard as to how significant software would be.” I’ve worked on several FDA-regulated software projects over the last fifteen years, and I certainly would not characterize the FDA as failing to understand and act on the hazards of software defects. FDA guidance for software development is quite clear about the need for comprehensive testing.

Both the FAA and the FDA specify general requirements for software process and artifacts that may be met in many ways. The FDA calls for a documented quality management system that includes certain basic software development activities, including software testing. Producers must be able to prove that the documented process meets all the general requirements and is routinely and correctly followed. The software process and artifacts required by FAA DO-178B are very similar.

However, DO-178B is prescriptive about one narrow aspect of testing: for the most critical software it requires Modified Condition/Decision Coverage (MC/DC), meaning the tests must show that every condition in every decision independently affects that decision’s outcome. Interestingly, the FDA regulations call for thorough in-use (clinical) testing (“validation”), which has no direct analog in the FAA regulation.
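To make the criterion concrete, here is a minimal sketch in Python (the alarm decision and its three conditions are invented for illustration). MC/DC is satisfied when, for each condition, the suite contains a pair of tests that differ only in that condition and produce different decision outcomes; for three conditions this needs only four tests, far fewer than the eight required to cover every combination.

```python
def alarm_required(pressure_high, sensor_ok, manual_override):
    # Hypothetical guard with three conditions in one decision.
    return (pressure_high and sensor_ok) or manual_override

# MC/DC test set: N + 1 = 4 tests for N = 3 conditions.
# Rows: (pressure_high, sensor_ok, manual_override, expected outcome)
MCDC_TESTS = [
    (True,  True,  False, True),   # baseline
    (False, True,  False, False),  # differs from row 1 only in pressure_high
    (True,  False, False, False),  # differs from row 1 only in sensor_ok
    (False, True,  True,  True),   # differs from row 2 only in manual_override
]

for a, b, c, expected in MCDC_TESTS:
    assert alarm_required(a, b, c) == expected
```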

The Tribune article concludes by noting that “Fu, Pfleeger, and others believe implementing testing standards and engineering practices prevalent in other software-reliant industries, such as avionics, could diminish some of the risks.”

Although requiring the FAA’s higher code coverage could diminish some risks, I doubt that it alone would make an appreciable difference in the number of serious software defects in medical devices. The Tribune story states that the pump failure was a result of a “memory buffer overflow” that occurred when “all three of the pump’s channels failed.” Compliance with DO-178B would not necessarily have revealed that bug. Also, DO-178B predates object-oriented programming, and the condition coverage it calls for does not provide any better chance of revealing the kinds of bugs that are specific to object-oriented programming languages like C++ or Java. (A revision to address these and other issues, DO-178C, is near release.) The article also confuses post-release product evaluation under the FDA’s 510(k) regulation with pre-release software testing. The 510(k) process calls for comparing a new product with comparable products after it has completed full development testing. Similarly, the FAA certification process requires that flight software be developed in compliance with DO-178B before certification of an entire aircraft.

Although it is certainly true that quality cannot be tested into software systems, very high-reliability software in critical systems can be achieved, and software testing is essential to achieving it. None of the tools and methods needed to do this are secret, proprietary, extraordinarily expensive, or impractical. Nor are they the exclusive province of avionics developers or a byproduct of regulatory compliance. In every case I know of, they result from applying well-known lessons of software engineering consistently and effectively.

  • Some firms producing medical devices, avionics, and other critical systems routinely follow practices that produce fielded devices with very high reliability. In my experience, they are a minority. Let’s call them competent.
  • Most firms slouch into a haphazard mediocrity but usually avoid serious consequences by dumb luck. Let’s call them mediocre.
  • And then there are the firms whose culture of ignorance, incompetence, sloppiness, and/or cost-expediency routinely produces shoddy software. Software produced this way has killed people and caused huge financial losses. Let’s call them dangerous.

Is it predestined that only a lucky few are competent? That the serious risk posed by mediocre or dangerous producers is no more avoidable than bad weather and earthquakes? In a word, no.

What does it take to be competent?  My list of software development practices for very high reliability follows. I cannot think of any merely annoying or catastrophic software bug inflicted on society that would not have been found and removed before it escaped, had the producer done all of the following.

  • There is a requirements specification that defines every behavior of the system, including all “non-functional” requirements.
  • The requirements specification complies with IEEE standard 830.
  • Every requirement that specifies an “effect” is linked to all of the requirements that can “cause” this effect.
  • An acceptable response to the absence or corruption of each input or necessary resource is defined.
  • The source code of the system has been evaluated by one or more static analyzers and all source code anomalies have been corrected, or deemed harmless by rigorous inspection.
  • Every source code module has an automated test suite.
  • In all circumstances where there are bounded inputs, module test suites exercise every on-point and off-point boundary value at least once.
  • In all circumstances where two or more input variables may be given in combination, module test suites include at least one test for every pair-wise combination (a sketch of boundary and pairwise test generation appears after this list).
  • The test suite achieves 100% branch and loop coverage of all controllable functions.
  • Functions that cannot be reached in test are proven to be correct and/or rigorously inspected.
  • All module test suites are run automatically after every build.
  • All requirements, source code, test suites, and documentation are maintained with a configuration management system.
  • There is an operational (usage or use case) profile for the system with estimated usage frequencies.
  • All requirements are mapped to at least one operation.
  • Sequentially constrained operations (e.g., must log on before logging off) are specified with one or more state machines (a state machine testing sketch appears after this list).
  • The operational profile has been validated with objective customer/user review.
  • A minimum system reliability for release is set (e.g., no more than one failure in one million inputs).
  • A system test suite is produced and executed such that:
    • If there are very low-frequency critical operational modes (e.g., plant shutdown owing to an earthquake), these modes are tested separately.
    • There is at least one test case for every requirement (including “non-functional” requirements) and it passes.
    • Every input boundary is exercised at least once.
    • Every pair-wise input combination is exercised at least once.
    • Every test case is assigned to an operation, and test cases are allocated in proportion to the operations’ relative frequencies.
    • Every round-trip constraint path is tested at least once.
    • Every constraint sneak path is tested at least once.
    • Every cause-effect linkage is tested at least once.
  • Performance testing is done to evaluate response and capacity requirements at nominal, peak, and overload levels. The variation in load for these levels also follows the operational profile (multi-dimensional testing).
  • The system is released only when the failure rate for test suites generated under an operational profile for each mode indicates an acceptable probability that the target reliability will be achieved (see the reliability sketch after this list).
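To make the boundary and pairwise items concrete, here is a minimal sketch in Python. The parameter names and ranges are invented for illustration, and the pairwise generator is a simple greedy cover rather than a minimal one; the point is that both kinds of test cases can be generated mechanically from the specification.

```python
from itertools import combinations, product

def boundary_values(lo, hi, step):
    """On-point and off-point values for a closed range [lo, hi]."""
    return [lo - step, lo, hi, hi + step]

def pairwise_suite(params):
    """Greedily pick test cases until every pair of parameter values
    appears together in at least one case (covering, not minimal)."""
    names = list(params)
    uncovered = {((p, v), (q, w))
                 for p, q in combinations(names, 2)
                 for v in params[p]
                 for w in params[q]}
    suite = []
    for values in product(*(params[n] for n in names)):
        case = dict(zip(names, values))
        hits = {pair for pair in uncovered
                if all(case[name] == value for name, value in pair)}
        if hits:
            suite.append(case)
            uncovered -= hits
        if not uncovered:
            break
    return suite

# Hypothetical bounded inputs for one infusion pump channel.
params = {
    "rate_ml_per_h": boundary_values(0.1, 999.0, 0.1),
    "volume_ml":     boundary_values(1, 9999, 1),
    "channel":       ["A", "B", "C"],
}
for case in pairwise_suite(params):
    print(case)  # feed each case to the module under test
```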
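The state machine, round-trip path, and sneak path items can be sketched the same way. The session protocol below is a toy stand-in for any sequentially constrained operation; the class and transition table are assumptions made for illustration.

```python
# Specified transitions for the "must log on before logging off" constraint.
TRANSITIONS = {
    ("LOGGED_OUT", "login"):  "LOGGED_IN",
    ("LOGGED_IN",  "query"):  "LOGGED_IN",
    ("LOGGED_IN",  "logoff"): "LOGGED_OUT",
}
EVENTS = {"login", "query", "logoff"}
STATES = {"LOGGED_OUT", "LOGGED_IN"}

class Session:
    """Toy implementation under test (hypothetical)."""
    def __init__(self):
        self.state = "LOGGED_OUT"

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {self.state}")
        self.state = TRANSITIONS[key]

def test_round_trip_path():
    # Exercise every specified transition on a path that returns to the
    # initial state (a round-trip path).
    s = Session()
    for event in ("login", "query", "logoff"):
        s.handle(event)
    assert s.state == "LOGGED_OUT"

def test_sneak_paths():
    # Every unspecified (state, event) pair is a sneak path; the system
    # must reject it rather than silently accept it.
    for state in STATES:
        for event in EVENTS:
            if (state, event) in TRANSITIONS:
                continue
            s = Session()
            s.state = state
            try:
                s.handle(event)
                assert False, f"sneak path accepted: {event} in {state}"
            except ValueError:
                pass  # expected: illegal event rejected

test_round_trip_path()
test_sneak_paths()
```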
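Finally, the operational profile and the reliability release criterion reduce to straightforward arithmetic. The profile below is invented; the sample-size formula is the standard reliability-demonstration relationship between a failure-free test count, a target failure probability per input, and a confidence level.

```python
import math

# Hypothetical operational profile: operation -> estimated usage fraction.
PROFILE = {
    "program_basic_infusion": 0.55,
    "adjust_running_rate":    0.25,
    "pause_and_resume":       0.15,
    "respond_to_occlusion":   0.05,
}

def tests_for_reliability(max_failure_prob, confidence):
    """Failure-free, profile-driven test inputs needed to claim, with the
    given confidence, that the per-input failure probability is at most
    max_failure_prob:  n >= ln(1 - confidence) / ln(1 - max_failure_prob)."""
    return math.ceil(math.log(1 - confidence) / math.log1p(-max_failure_prob))

def allocate_tests(profile, total):
    """Allocate test cases to operations in proportion to usage frequency."""
    return {op: round(freq * total) for op, freq in profile.items()}

n = tests_for_reliability(max_failure_prob=1e-6, confidence=0.95)
print(n)                          # roughly 3 million failure-free inputs
print(allocate_tests(PROFILE, n))
```

The size of that number also explains why automated, profile-driven test generation and execution is a practical necessity at this reliability level.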


Achieving these goals will result in very reliable software systems. Nothing in this list is new, exotic, proprietary, or prohibitively expensive. None of it is specific to any programming language or platform. By the way, none of this is required or even suggested by Agile development practices.

  • Competent producers do most of the above (and more) while achieving economic success.
  • Mediocre producers are wasting money and unnecessarily carrying substantial business risk. They can be transformed into competent producers (I have assisted in this many times), but they have to be willing to support and pay for the transformation, and then make it stick. This is difficult because it requires leadership to sustain the investment, and to weather the difficulties that occur, before the payoff is achieved and the improvement becomes routine.
  • Dangerous producers can be improved, but that typically doesn’t happen until and unless the management is replaced or has a quality epiphany. The epiphany is often preceded by a high-profile software failure.

Here’s a very simple self-assessment for software producers. If you’re doing at least 80% of the list, consider yourself competent. If you’re doing at least half, mediocre. Less than half, dangerous. If you’re close to a cutoff, assume the lower side unless you can make a strong case to the contrary. If you work for a dangerous firm that produces safety-critical products, please let us know, so we can stop using those products.

 



3 Comments


  1. Hi Robert, very nice post! I agree with all the software development practices you listed above, even if they may differ in both priority and cost. Perhaps the list is not complete: I do not see any direct measure of overall design quality. I perceive a shift in your list toward requirements, external quality characteristics, and (black-box/system) testing. In my experience, especially in SMEs, an indirect (but relevant) cause of cost is the absence of an explicit design. All the design decisions are hidden in the code, becoming implicit or emergent, as agile developers love to say. In the real world, however, this type of emergence is more a symptom of “not keeping responsibility for the design.” Thus, I think it would be interesting to also list some internal quality characteristics. Software systems will be maintained, bugs will be fixed, requirements will be updated or extended, and all these activities also have an impact on internal quality. What do you think?

    – Andrea

    • I completely agree with you about the importance of design practices and good design. The Agile approach to software development is focused on social psychology — it has nothing to say about technical matters. Although it could be argued that this is technology neutral, in practice it encourages sloppy, ad hoc, and idiosyncratic development. These problems are not new — I’ve been fighting other factors that have encouraged sloppy, ad hoc, and idiosyncratic development since I started advocating structured analysis and design, 30 years ago.

      There are many metrics and aspects of internal quality; I think Grady Booch’s recent work on characterizing architecture provides some great insights. Requirements and design are two of the six main determinants of testability.

      Poorly designed and organized software systems result in unnecessary development and maintenance cost. They contain more bugs and fail more often. The time, money, and human energy spent on this have an opportunity cost as well, as they could have been spent on creating other features or getting a product to market sooner.

      Thanks for your thoughtful comments.
      Bob

  2. Bob – nice blog. I’d like to provide a few comments taken from some reports that I’ve produced on this subject.

    One of the most highly regarded coverage measures, widely used as safety evidence on DO-178B projects, is MC/DC. Some experts doubt that MC/DC testing is sufficient verification of safety-critical software, because a test suite that is 100% MC/DC-compliant does not guarantee execution of every possible input value. Other criticisms from industry concern the amount of effort, and the associated high cost, needed to derive test cases that meet full 100% MC/DC coverage. Dupuy and Leveson indicate that the MC/DC measure is a superior test-effectiveness indicator, and that the additional cost is not significantly higher than that of less rigorous structural testing [Dupuy 2000]. However, it is possible to misplace test effort into meeting the MC/DC criterion rather than testing the system’s safety requirements or constraints.

    Arguments have been made that the MC/DC criterion is unrelated to the safety of the software and does not find errors that are not detected by functional testing. However, [Dupuy 2000] and [Leveson 1986] found that the test cases generated to satisfy the MC/DC coverage requirement detected important errors not detectable by functional testing. In addition, they found that although MC/DC coverage testing took a considerable amount of resources (about 40% of the total testing time), it was not significantly more difficult to satisfy than a less rigorous structural coverage criterion, and it found errors that could not have been found with that lower level of structural coverage [Dupuy 2000].

    Finally, DO-178C is supposed to address guidelines for object-oriented coding, but in the meantime there are guidelines in the FAA’s Handbook for Object-Oriented Technology in Aviation (OOTiA), provided here:
    http://www.faa.gov/aircraft/air_cert/design_approvals/air_software/oot/

    [Dupuy 2000] Dupuy, A., and Leveson, N. “An Empirical Evaluation of the MC/DC Coverage Criterion on the HETE-2 Satellite Software.” In Proceedings of the Digital Avionics Systems Conference (DASC), October 2000.

    [Leveson 1986] Leveson, N.G. “Software Safety: Why, What, and How.” Computing Surveys 18, 2 (June 1986).

