Single point of failure

A part whose failure will disrupt the entire system
In this diagram the router is a single point of failure for the communication network between computers.

A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working.[1] SPOFs are undesirable in any system with a goal of high availability or reliability, be it a business practice, software application, or other industrial system.

Overview

Systems can be made robust by adding redundancy in all potential SPOFs. Redundancy can be achieved at various levels.

The assessment of a potential SPOF involves identifying the critical components of a complex system that would provoke a total systems failure in case of malfunction. Highly reliable systems should not rely on any such individual component.

For instance, the owner of a small tree care company may only own one woodchipper. If the chipper breaks, they may be unable to complete their current job and may have to cancel future jobs until they can obtain a replacement. The owner of the tree care company may have spare parts ready for the repair of the wood chipper, in case it fails. At a higher level, they may have a second wood chipper that they can bring to the job site. Finally, at the highest level, they may have enough equipment available to completely replace everything at the work site in the case of multiple failures.

  • Possible SPOFs in a simple setup
    Possible SPOFs in a simple setup
  • Using redundancy to avoid some SPOFs
    Using redundancy to avoid some SPOFs
  • Completely redundant system without SPOFs (note: assumes generator and grid sources are each rated at N, each UPS is rated at N, and "A/C" and "Electrical" are in and of themselves completely fault tolerant systems)
    Completely redundant system without SPOFs (note: assumes generator and grid sources are each rated at N, each UPS is rated at N, and "A/C" and "Electrical" are in and of themselves completely fault tolerant systems)

Computing

A fault-tolerant computer system can be achieved at the internal component level, at the system level (multiple machines), or site level (replication).

One would normally deploy a load balancer to ensure high availability for a server cluster at the system level. In a high-availability server cluster, each individual server may attain internal component redundancy by having multiple power supplies, hard drives, and other components. System-level redundancy could be obtained by having spare servers waiting to take on the work of another server if it fails.

Since a data center is often a support center for other operations such as business logic, it represents a potential SPOF in itself. Thus, at the site level, the entire cluster may be replicated at another location, where it can be accessed in case the primary location becomes unavailable. This is typically addressed as part of an IT disaster recovery program.

Paul Baran and Donald Davies developed packet switching, a key part of "survivable communications networks". Such networks – including ARPANET and the Internet – are designed to have no single point of failure. Multiple paths between any two points on the network allow those points to continue communicating with each other, the packets "routing around" damage, even after any single failure of any one particular path or any one intermediate node.

Software engineering

In software engineering, a bottleneck occurs when the capacity of an application or a computer system is limited by a single component. The bottleneck has lowest throughput of all parts of the transaction path.

Performance engineering

Tracking down bottlenecks (sometimes known as hot spots – sections of the code that execute most frequently – i.e., have the highest execution count) is called performance analysis. Reduction is usually achieved with the help of specialized tools, known as performance analyzers or profilers. The objective is to make those particular sections of code perform as fast as possible to improve overall algorithmic efficiency.

Computer security

A vulnerability or security exploit in just one component can compromise an entire system.

Other fields

The concept of a single point of failure has also been applied to fields outside of engineering, computers, and networking, such as corporate supply chain management[2] and transportation management.[3]

Design structures that create single points of failure include bottlenecks and series circuits (in contrast to parallel circuits).

In transportation, some noted recent examples of the concept's recent application have included the Nipigon River Bridge in Canada, where a partial bridge failure in January 2016 entirely severed road traffic between Eastern Canada and Western Canada for several days because it is located along a portion of the Trans-Canada Highway where there is no alternate detour route for vehicles to take;[4] and the Norwalk River Railroad Bridge in Norwalk, Connecticut, an aging swing bridge that sometimes gets stuck when opening or closing, disrupting rail traffic on the Northeast Corridor line.[3]

The concept of a single point of failure has also been applied to the fields of intelligence. Edward Snowden talked of the dangers of being what he described as "the single point of failure" – the sole repository of information.[5]

Life-support systems

A component of a life-support system that would constitute a single point of failure would be required to be extremely reliable.

See also

Concepts

  • Cascading failure – Systemic risk of failure
  • Redundancy – Duplication of critical components to increase reliability of a system
  • Bus factor – Concept in risk management
  • Lusser's law – The probability product law of series components
  • Service-level agreement – Official commitment between a service provider and a customer

Applications

  • Kill switch – Safety mechanism to quickly shut down a system
  • Jesus nut – Slang term for the main rotor-retaining nut of some helicopters
  • Reliability engineering – Sub-discipline of systems engineering that emphasizes dependability
  • Safety engineering – Engineering discipline which assures that engineered systems provide acceptable levels of safety
  • Dead man's switch – Equipment that activates or deactivates upon the incapacitation of operator

In literature

  • Achilles' heel – Critical weakness which can lead to downfall in spite of overall strength
  • Hamartia – Protagonist's error in Greek dramatic theory

References

  1. ^ 1: Designing Large-scale LANs – Page 31, K. Dooley, O'Reilly, 2002
  2. ^ Gary S. Lynch (Oct 7, 2009). Single Point of Failure: The 10 Essential Laws of Supply Chain Risk Management. Wiley. ISBN 978-0-470-42496-4.
  3. ^ a b "Crucial, Century-Old, And Sometimes Stuck: Connecticut Bridge Is Key To Northeast Corridor". Connecticut Public Radio, August 8, 2017.
  4. ^ "The Nipigon River Bridge and other Trans-Canada bottlenecks". Global News, January 11, 2016.
  5. ^ "Edward Snowden: the true story behind his NSA leaks". Telegraph.co.uk. Archived from the original on 2022-01-12. Retrieved 2016-12-13.
  • v
  • t
  • e
Basic equipment
Breathing gas
Buoyancy and
trim equipment
Decompression
equipment
Diving suit
Helmets
and masks
Instrumentation
Mobility
equipment
Safety
equipment
Underwater
breathing
apparatus
Open-circuit
scuba
Diving rebreathers
Surface-supplied
diving equipment
Diving
equipment
manufacturers
Access equipment
Breathing gas
handling
Decompression
equipment
Platforms
Underwater
habitat
Remotely operated
underwater vehicles
Safety equipment
General
Activities
Competitions
Equipment
Freedivers
Hazards
Historical
Organisations
Occupations
Military
diving
Military
diving
units
Underwater
work
Salvage diving
  • SS Egypt
  • Kronan
  • La Belle
  • SS Laurentic
  • RMS Lusitania
  • Mars
  • Mary Rose
  • USS Monitor
  • HMS Royal George
  • Vasa
Diving
contractors
Tools and
equipment
Underwater
weapons
Underwater
firearm
Specialties
Diver
organisations
Diving tourism
industry
Diving events
and festivals
Diving
hazards
Consequences
Diving
procedures
Risk
management
Diving team
Equipment
safety
Occupational
safety and
health
Diving
disorders
Pressure
related
Oxygen
Inert gases
Carbon dioxide
Breathing gas
contaminants
Immersion
related
Treatment
Personnel
Screening
Research
Researchers in
diving physiology
and medicine
Diving medical
research
organisations
Law
Archeological
sites
Underwater art
and artists
Engineers
and inventors
Historical
equipment
Diver
propulsion
vehicles
Military and
covert operations
  • Raid on Alexandria (1941)
  • Sinking of the Rainbow Warrior
Scientific projects
Awards and events
Incidents
Dive boat incidents
  • Sinking of MV Conception
Diver rescues
Early diving
Freediving fatalities
Offshore
diving incidents
  • Byford Dolphin diving bell accident
  • Drill Master diving accident
  • Star Canopus diving accident
  • Stena Seaspread diving accident
  • Venture One diving accident
  • Waage Drill II diving accident
  • Wildrake diving accident
Professional
diving fatalities
Scuba diving
fatalities
Publications
Manuals
  • NOAA Diving Manual
  • U.S. Navy Diving Manual
  • Basic Cave Diving: A Blueprint for Survival
  • Underwater Handbook
  • Bennett and Elliott's physiology and medicine of diving
  • Encyclopedia of Recreational Diving
  • The new science of skin and scuba diving
  • Professional Diver's Handbook
  • Basic Scuba
Standards and
Codes of Practice
General non-fiction
Research
Dive guides
Training and registration
Diver
training
Skills
Recreational
scuba
certification
levels
Core diving skills
Leadership skills
Specialist skills
Diver training
certification
and registration
organisations
Commercial diver
certification
authorities
Commercial diving
schools
Free-diving
certification
agencies
Recreational
scuba
certification
agencies
Scientific diver
certification
authorities
Technical diver
certification
agencies
Cave
diving
Military diver
training centres
Military diver
training courses
Surface snorkeling
Snorkeling/breath-hold
Breath-hold
Open Circuit Scuba
Rebreather
  • Underwater photography
Sports governing
organisations
and federations
Competitions
Pioneers
of diving
Underwater
scientists
archaeologists and
environmentalists
Scuba record
holders
Underwater
filmmakers
and presenters
Underwater
photographers
Underwater
explorers
Aquanauts
Writers and journalists
Rescuers
Frogmen
Commercial salvors
Diving
physics
Diving
physiology
Decompression
theory
Diving
environment
Classification
Impact
Other
Deep-submergence
vehicle
  • Aluminaut
  • DSV Alvin
  • American submarine NR-1
  • Bathyscaphe
    • Archimède
    • FNRS-2
    • FNRS-3
    • Harmony class bathyscaphe
    • Sea Pole-class bathyscaphe
    • Trieste II
  • Deepsea Challenger
  • Ictineu 3
  • JAGO
  • Jiaolong
  • Konsul-class submersible
  • Limiting Factor
  • Russian submarine Losharik
  • Mir
  • Nautile
  • Pisces-class deep submergence vehicle
  • DSV Sea Cliff
  • DSV Shinkai
  • DSV Shinkai 2000
  • DSV Shinkai 6500
  • DSV Turtle
  • DSV-5 Nemo
Submarine rescue
Deep-submergence
rescue vehicle
Submarine escape
Escape set
Special
interest
groups
Neutral buoyancy
facilities for
Astronaut training
Other