Root cause analysis

Root cause analysis (RCA) is a class of problem solving methods aimed at identifying the root causes of problems or events.

Root Cause Analysis is any structured approach to identifying the factors that resulted in the nature, the magnitude, the location, and the timing of the harmful outcomes (consequences) of one or more past events in order to identify what behaviors, actions, inactions, or conditions need to be changed to prevent recurrence of similar harmful outcomes and to identify the lessons to be learned to promote the achievement of better consequences.

The practice of RCA is predicated on the belief that problems are best solved by attempting to address, correct or eliminate root causes, as opposed to merely addressing the immediately obvious symptoms. By directing corrective measures at root causes, it is more probable that problem recurrence will be prevented. However, it is recognized that complete prevention of recurrence by one corrective action is not always possible.

Nevertheless, in the U.S. nuclear power industry the NRC requires that "In the case of significant conditions adverse to quality, the measures shall assure that the cause of the condition is determined and corrective action taken to prevent repetition." [10 CFR 50, Appendix B, Criterion XVI, Sentence 2] In practice, more than one "cause" is allowed and more than one corrective action is not forbidden.

Conversely, there may be several effective measures (methods) that address the root causes of a problem. Thus, RCA is often considered to be an iterative process, and is frequently viewed as a tool of continuous improvement.

RCA is typically used as a reactive method of identifying the causes of events, revealing problems, and solving them. Analysis is done after an event has occurred. Insights gained from RCA can also make it useful as a proactive method, in which case it can be used to forecast probable events before they occur. Although one follows the other, RCA is a completely separate process from Incident Management.

Root cause analysis is not a single, sharply defined methodology; there are many different tools, processes, and philosophies for performing RCA. However, several very broadly defined approaches or "schools" can be identified by their basic approach or field of origin: safety-based, production-based, process-based, failure-based, and systems-based.

Despite the different approaches among the various schools of root cause analysis, there are some common principles. It is also possible to define several general processes for performing RCA.


General principles of root cause analysis

  1. The primary aim of RCA is to identify the factors that resulted in the nature, the magnitude, the location, and the timing of the harmful outcomes (consequences) of one or more past events in order to identify what behaviors, actions, inactions, or conditions need to be changed to prevent recurrence of similar harmful outcomes and to identify the lessons to be learned to promote the achievement of better consequences. ("Success" is defined as the near-certain prevention of recurrence.)
  2. To be effective, RCA must be performed systematically, usually as part of an investigation, with conclusions and root causes identified backed up by documented evidence. Usually a team effort is required.
  3. There may be more than one root cause for an event or a problem; the difficult part is demonstrating the persistence and sustaining the effort required to develop them.
  4. The purpose of identifying all solutions to a problem is to prevent recurrence at lowest cost in the simplest way. If there are alternatives that are equally effective, then the simplest or lowest cost approach is preferred.
  5. Root causes identified depend on the way in which the problem or event is defined. Effective problem statements and event descriptions (as failures, for example) are helpful, or even required.
  6. To be effective, the analysis should establish a sequence of events or timeline to understand the relationships between contributory (causal) factors, root cause(s) and the defined problem or event to prevent in the future.
  7. Root cause analysis can help to transform a reactive culture (that reacts to problems) into a forward-looking culture that solves problems before they occur or escalate. More importantly, it reduces the frequency of problems occurring over time within the environment where the RCA process is used.
  8. RCA is a threat to many cultures and environments. Threats to cultures often meet with resistance. Other forms of management support may be required to achieve RCA effectiveness and success. For example, a "non-punitive" policy towards problem identifiers may be required.

Evaluating root cause analysis

Root cause analysis reports, like other 'deliverables', can vary in quality. Each stakeholder can have their own qualitative and quantitative acceptance criteria. There are some general possibilities for evaluating root cause analysis outputs.

First: Is it readable? If it is readable it will be grammatically correct, the sentences will make sense, it will be free of internal inconsistencies, terms will be defined, it will contain appropriate graphics, and the like.

Second: Does it contain a complete set of all of the causal relationships? If it did contain a "complete set of all of the causal relationships" one could (at least):

  1. Trace the causal relationships from the harmful outcomes to the deepest conditions, behaviors, actions, and inactions.
  2. Show that the important attributes of the harmful outcomes were completely explained by the deepest conditions, behaviors, actions, and inactions.
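
The tracing test above can be illustrated with a small sketch. It assumes the report's causal relationships have been written down as a mapping from each effect to its direct causes, and that the mapping contains no cycles. The function names and the pump example are hypothetical and are not taken from any particular RCA method.

```python
from typing import Dict, List

# Hypothetical causal map: each effect points to the factors that directly caused it.
# A factor with an empty list has no recorded deeper cause.
CAUSES: Dict[str, List[str]] = {
    "pump seized": ["bearing overheated"],
    "bearing overheated": ["no lubrication", "alarm ignored"],
    "no lubrication": ["maintenance task skipped"],
    "maintenance task skipped": [],
    "alarm ignored": ["alarm floods are routine"],
    "alarm floods are routine": [],
}


def trace_chains(effect: str, causes: Dict[str, List[str]]) -> List[List[str]]:
    """Return every causal chain from `effect` down to a factor with no recorded cause.

    Assumes the causal map is acyclic; a cycle would recurse forever.
    """
    direct = causes.get(effect, [])
    if not direct:
        return [[effect]]
    chains: List[List[str]] = []
    for cause in direct:
        for chain in trace_chains(cause, causes):
            chains.append([effect] + chain)
    return chains


def deepest_factors(effect: str, causes: Dict[str, List[str]]) -> List[str]:
    """Return the sorted set of factors at which the chains from `effect` end."""
    return sorted({chain[-1] for chain in trace_chains(effect, causes)})


for chain in trace_chains("pump seized", CAUSES):
    print(" <- ".join(chain))
```

A chain that stops at an intermediate factor (one a reviewer would still ask "why?" about) suggests the set of causal relationships is incomplete. The sketch only checks traceability; whether the deepest factors fully explain the attributes of the harmful outcome remains a judgment for the reviewer.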

General process for performing and documenting an RCA-based Corrective Action

Notice that RCA (in steps 3, 4 and 5) forms the most critical part of successful corrective action, because it directs the corrective action at the true root cause of the problem. The root cause is secondary to the goal of prevention, but without knowing the root cause, we cannot determine what an effective corrective action for the defined problem will be.

  1. Define the problem or describe the event factually. Include the qualitative and quantitative attributes (properties) of the harmful outcomes. This usually includes specifying the natures, the magnitudes, the locations, and the timings.
  2. Gather data and evidence, classifying it along a timeline of events leading to the final failure or crisis. For every behavior, condition, action, and inaction, specify in the "timeline" what should have happened where it differs from what actually happened.
  3. Ask "why" and identify the causes associated with each step in the sequence towards the defined problem or event. "Why" is taken to mean "What were the factors that directly resulted in the effect?"
  4. Classify causes into causal factors that relate to an event in the sequence, and root causes which, if eliminated, can be agreed to have interrupted that step of the sequence chain.
  5. If there are multiple root causes, which is often the case, reveal those clearly for later optimum selection. Identify all other harmful factors that have equal or better claim to be called "root causes."
  6. Identify corrective action(s) that will with certainty prevent recurrence of each harmful effect, including outcomes and factors. Check that each corrective action would, if pre-implemented before the event, have reduced or prevented specific harmful effects.
  7. Identify solutions that are effective, prevent recurrence with reasonable certainty and with the consensus agreement of the group, are within your control, meet your goals and objectives, and do not introduce other new, unforeseen problems.
  8. Implement the recommended root cause correction(s).
  9. Ensure effectiveness by observing the implemented solutions.
  10. Other methodologies for problem solving and problem avoidance may be useful.
  11. Identify and address the other instances of each harmful outcome and harmful factor.
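
Step 2 of the process above (placing evidence on a timeline and noting where the actual differs from what should have been) can be sketched as follows. This is a minimal illustration, not a prescribed format: the `TimelineEntry` fields, the `deviations` helper, and the tank-transfer events are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimelineEntry:
    time: str      # "HH:MM" on a single day, so string order is chronological
    actual: str    # what actually happened
    expected: str  # what should have happened


def deviations(timeline: List[TimelineEntry]) -> List[TimelineEntry]:
    """Return entries, in time order, where the actual differs from the expected."""
    ordered = sorted(timeline, key=lambda entry: entry.time)
    return [entry for entry in ordered if entry.actual != entry.expected]


# Hypothetical evidence, deliberately entered out of order.
timeline = [
    TimelineEntry("09:40", "tank overflowed", "tank level held below limit"),
    TimelineEntry("09:05", "operator started transfer", "operator started transfer"),
    TimelineEntry(
        "09:20",
        "high-level alarm acknowledged, no action",
        "transfer stopped on high-level alarm",
    ),
]

for entry in deviations(timeline):
    print(f"{entry.time}: actual={entry.actual!r}; expected={entry.expected!r}")
```

Each deviation returned is a candidate starting point for the "why" questions in step 3; entries where actual and expected agree are kept in the timeline for context but need no further explanation.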

Root cause analysis techniques

  • Re-enactment - for example, having the participants in the event do it over the way it was done (with due care to avoid the same harmful outcomes).
  • Re-enactment using a computer or a simulator.
  • Comparative re-enactment - doing it over the right way as well as the way it was actually done.
  • Re-construction - reassembling all of the available accident debris to look for clues as to how the disassembly occurred.
  • Barrier analysis - a technique often used in process industries. It is based on tracing energy flows, with a focus on barriers to those flows, to identify how and why the barriers did not prevent the energy flows from causing harm.
  • Bayesian inference
  • Change analysis - an investigation technique often used for problems or accidents. It is based on comparing a situation that does not exhibit the problem to one that does, in order to identify the changes or differences that might explain why the problem occurred.
  • "Delta Work" - comparing the way an episode did happen with the way it was intended to happen.
  • Current Reality Tree - a method developed by Eliahu M. Goldratt in his theory of constraints that guides an investigator to identify and relate all root causes using a cause-effect tree whose elements are bound by rules of logic (Categories of Legitimate Reservation). The CRT begins with a brief list of the undesirable things we see around us, and then guides us towards one or more root causes. This method is particularly powerful when the system is complex, there is no obvious link between the observed undesirable things, and a deep understanding of the root cause(s) is desired.
  • Failure mode and effects analysis
  • Fault tree analysis
  • Five whys emphasizes recursive depth, using the heuristic that you're probably not done until you've looked five levels deep.
  • Ishikawa diagrams emphasize initial breadth, using a checklist of types of causes that should be considered.
  • Why-Because analysis emphasizes recursive breadth, using the concepts of necessary and sufficient causes.
  • Pareto analysis ("80/20 rule")
  • RPR Problem Diagnosis - An ITIL-aligned method for diagnosing IT problems.
  • Kepner-Tregoe Approach
  • Project Management Approaches. An adverse event can be viewed as a project whose final product was harm. The event can be understood by re-casting it in the classical Project Management devices such as Work Breakdown Structure, Gantt Chart, and Planning Logic Network.
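
As an illustration of one technique from the list above, Pareto analysis ranks causes by how often they occur and keeps the few that account for most occurrences. The sketch below assumes that each recorded problem has already been assigned a single cause label. The function name `pareto_vital_few`, the 0.8 default threshold, and the sample labels are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pareto_vital_few(
    causes: Iterable[str], threshold: float = 0.8
) -> List[Tuple[str, int, float]]:
    """Return (cause, count, cumulative share) for the most frequent causes.

    Causes are taken in descending order of frequency until their cumulative
    share of all occurrences reaches `threshold`.
    """
    counts = Counter(causes)
    total = sum(counts.values())
    vital_few: List[Tuple[str, int, float]] = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        vital_few.append((cause, count, cumulative / total))
        if cumulative / total >= threshold:
            break
    return vital_few


# Hypothetical cause labels from ten recorded problems.
observed = (
    ["no or poor procedures"] * 5
    + ["training lacking"] * 3
    + ["defective tool"]
    + ["stress"]
)

for cause, count, share in pareto_vital_few(observed):
    print(f"{cause}: {count} occurrences, cumulative share {share:.0%}")
```

Frequency is only one possible weighting. Ranking by cost or severity instead would use the same cumulative logic with different weights.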

Common cause analysis (CCA) and common modes analysis (CMA) are evolving engineering techniques for complex technical systems to determine if common root causes in hardware, software or highly integrated systems interaction may contribute to human error or improper operation of a system. Systems are analyzed for root causes and causal factors to determine the probability of failure modes, fault modes, or common mode software faults due to escaped requirements. Complete testing and verification are also used to ensure that complex systems are designed with no common causes that lead to severe hazards. Common cause analysis is sometimes required as part of the safety engineering tasks for theme parks, commercial/military aircraft, spacecraft, complex control systems, large electrical utility grids, nuclear power plants, automated industrial controls, medical devices or other safety-critical systems with complex functionality.

A major issue with common cause analysis is that it often depends on previously completed weak, ineffective, and erroneous root cause analyses on individual events.
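
The core idea of looking for causes shared across events can be sketched very simply. The example assumes each event already has a set of root causes from its own analysis, so, as noted above, the result is only as good as those individual analyses. The function name, the `min_events` parameter, and the event data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def common_causes(
    event_causes: Dict[str, Set[str]], min_events: int = 2
) -> Dict[str, List[str]]:
    """Map each root cause to the events it appears in, keeping only causes
    shared by at least `min_events` events."""
    events_by_cause: Dict[str, List[str]] = defaultdict(list)
    for event, causes in event_causes.items():
        for cause in causes:
            events_by_cause[cause].append(event)
    return {
        cause: sorted(events)
        for cause, events in events_by_cause.items()
        if len(events) >= min_events
    }


# Hypothetical root causes taken from three separate event analyses.
findings = {
    "event A": {"shared power supply", "operator fatigue"},
    "event B": {"shared power supply", "software requirement missed"},
    "event C": {"software requirement missed", "shared power supply"},
}

for cause, events in sorted(common_causes(findings).items()):
    print(f"{cause}: {', '.join(events)}")
```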

Basic elements of root cause using Management Oversight Risk Tree (MORT) Approach Classification

  • Materials
    • Defective raw material
    • Wrong type for job
    • Lack of raw material
  • Man Power
    • Inadequate capability
    • Lack of Knowledge
    • Lack of skill
    • Stress
    • Improper motivation
  • Machine / Equipment
    • Incorrect tool selection
    • Poor maintenance or design
    • Poor equipment or tool placement
    • Defective equipment or tool
  • Environment
    • Disordered workplace
    • Poor job design and/or layout of work
    • Surfaces poorly maintained
    • Inability to meet physical demands of the task
    • Forces of nature
  • Management
    • Lack of management involvement
    • Inattention to task
    • Task hazards not dealt with properly
    • Other (horseplay, inattention, ...)
    • Stress demands
    • Lack of Process
    • Lack of Communication
  • Methods
    • No or poor procedures
    • Practices are not the same as written procedures
    • Poor communication
  • Management system
    • Training or education lacking
    • Poor employee involvement
    • Poor recognition of hazard
    • Previously identified hazards were not eliminated

See also


Department of Energy, Management Oversight Risk Tree (MORT) NASA

Wikimedia Foundation. 2010.
