Failing badly

Failing badly and failing well are concepts in systems security and network security (and engineering in general) describing how a system reacts to failure. The terms have been popularized by Bruce Schneier, a cryptographer and security consultant.[1][2]

Failing badly

A system that fails badly is one that has a catastrophic result when failure occurs. A single point of failure can thus bring down the whole system. Examples include:

  • Databases (such as credit card databases) protected only by a password. Once this security is breached, all data can be accessed.
  • Fracture-critical structures, such as buildings or bridges that depend on a single column or truss, whose removal would cause a chain-reaction collapse under normal loads.
  • Security checks which concentrate on establishing identity, not intent (thus allowing, for example, suicide attackers to pass).
  • Internet access provided by a single service provider. If the provider's network fails, all Internet connectivity is lost (a minimal sketch of this single point of failure follows this list).
  • Systems, including social ones, that rely on a single person whose absence or permanent unavailability halts the entire system.
  • Brittle materials, such as "over-reinforced concrete", which fail suddenly and catastrophically when overloaded, giving no warning.
  • Keeping the only copy of data in one central place. The data is lost forever if that place is damaged, as happened in the 1836 U.S. Patent Office fire, the 1973 U.S. National Personnel Records Center fire, and the destruction of the Library of Alexandria.
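
The single-provider example above can be made concrete with a short sketch. The following Python fragment is only illustrative; the provider name and the fetch_via helper are hypothetical stand-ins for a real network path, not part of any cited source. Because only one path is configured, an outage of that provider fails the whole request.

    def fetch_via(provider: str, url: str) -> bytes:
        """Hypothetical stand-in for a network call through one provider."""
        raise ConnectionError(f"{provider} is unreachable")  # simulate an outage

    def fetch(url: str) -> bytes:
        # Single point of failure: only one path to the Internet is configured,
        # so any failure of that provider fails the entire request.
        return fetch_via("provider-a", url)

    try:
        fetch("https://example.org/data")
    except ConnectionError as err:
        print(f"Total failure, no fallback available: {err}")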

Failing well

A system that fails well is one that compartmentalizes or contains its failure. Examples include:

  • Compartmentalized hulls in watercraft, ensuring that a hull breach in one compartment will not flood the entire vessel.
  • Databases that do not allow all data to be downloaded in one attempt, limiting the amount of data that can be compromised in a single breach.
  • Structurally redundant buildings designed to resist loads beyond those expected under normal circumstances, or to carry loads even when part of the structure is damaged.
  • Computer systems that restart or proceed to a stopped state when an invalid operation occurs.[3]
  • Access control systems that are locked when power is cut to the unit.[3]
  • Concrete structures which show fractures long before breaking under load, thus giving early warning.
  • Armoured cockpit doors on airplanes, which confine a potential hijacker within the cabin even if they are able to bypass airport security checks.[1]
  • Internet connectivity provided by more than one vendor or discrete path, known as multihoming (see the sketch after this list).
  • Star or mesh networks, which can continue to operate when a node or connection has failed (though for a star network, failure of the central hub will still cause the network to fail).
  • Ductile materials, such as "under-reinforced concrete", which fail gradually when overloaded: they yield and stretch, giving some warning before ultimate failure.
  • Making a backup copy of all important data and storing it in a separate place. If either location is damaged, the data can be recovered from the other.
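
As a counterpart to the single-provider sketch above, the following illustrative Python fragment (again using the hypothetical fetch_via helper and made-up provider names) shows the multihoming idea in miniature: the request is retried over each configured path in turn, so the failure of one provider is contained rather than fatal.

    def fetch_via(provider: str, url: str) -> bytes:
        """Hypothetical stand-in for a network call through one provider."""
        if provider == "provider-a":
            raise ConnectionError(f"{provider} is unreachable")  # simulate an outage
        return b"response data"

    def fetch_multihomed(url: str, providers: list[str]) -> bytes:
        # Failing well: try each discrete path in turn, so the failure of one
        # provider is contained instead of taking down the whole request.
        errors: list[ConnectionError] = []
        for provider in providers:
            try:
                return fetch_via(provider, url)
            except ConnectionError as err:
                errors.append(err)
        raise ConnectionError(f"all providers failed: {errors}")

    print(fetch_multihomed("https://example.org/data", ["provider-a", "provider-b"]))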

Designing a system to 'fail well' has also been argued to be a better use of limited security funds than the typical quest to eliminate all potential sources of error and failure.[4]

See also

  • Fail-safe – Design feature or practice
  • Fault tolerance – Resilience of systems to component failures or errors
  • Fail-deadly – Concept in nuclear military strategy
  • Resilience (network) – Systems with high up-time, a.k.a. "always on"
  • Resilience (engineering and construction) – Infrastructure design able to absorb damage without suffering complete failure

References

  1. ^ a b Homeland Insecurity Archived 2011-09-28 at the Wayback Machine, Atlantic Monthly, September 2002
  2. ^ David Hillson (29 March 2011). The Failure Files: Perspectives on Failure. Triarchy Press. p. 146. ISBN 9781908009302.
  3. ^ a b Eric Vanderburg (February 18, 2013). "Fail Secure – The right way to fail". PC Security World. Archived from the original on October 27, 2014. Retrieved November 11, 2014.
  4. ^ Young, William (2003). Failing Well with Information Security. Apogee Ltd Consulting. Archived 2008-10-14 at the Wayback Machine.