Business Analysis Guidebook/Root Cause Analysis

What it is/Why Important
Root cause analysis, simply put, is a careful examination of a particular situation to discern the underlying reasons for a specific problem or variance. The Business Analysis Body of Knowledge (BABOK) defines this as a “structured examination of the aspects of a situation to establish the root causes and resulting effects of the problem”. Depending upon the rigor of the examination conducted, it is possible to identify several layers of symptoms before reaching the underlying cause or causes of a particular situation.

When is it Used
Root cause analysis is most commonly associated with problem solving, although it can also be applied to organizational analysis, variance analysis, process improvement and software bug fixing. Essentially, whenever an outcome is less than ideal, it is generally possible to find one or more causal relationships. Because some of the tools used to discern root causes are subjective, identifying the underlying contributing factor behind the variance is often a judgment call, especially when the system under evaluation is complex.

Questions to Consider
When the root cause of a particular problem is carefully sought out and then mitigated, the problem generally goes away. When only the symptoms are treated, the underlying problem is likely to manifest itself in a new way rather than go away. Take, for instance, a printer that jams repeatedly: clearing each jam treats the symptom, but the jams will return until the cause behind them is found and addressed. Not every problem is severe enough to warrant rigorous evaluation. In deciding how deep or how quickly to dive for root causes, here are some questions to consider:


 * What are the consequences of this issue/problem? Is it front-page news? Is it life-threatening or merely annoying?
 * Is this a single occurrence or has it happened before?
 * What is the probability of the situation occurring again?
 * Were there events leading up to the problem/issue that could have served as an early warning signal?
 * Was there a recent change prior to the occurrence which may have directly or indirectly facilitated its occurrence?
 * Is this a system-wide issue or is it limited to a single office or department?
 * Are there controls in place to detect this type of issue/problem?

Sources of Problems
The range of possible root causes can be vast. Often the cause is a series of small problems rather than one single problem. The following list was adapted from Paul Wilson et al.’s book “Root Cause Analysis”, published in 1993. A list of contributing factors can often help with identifying the actual root cause.
 * Training (formal and informal)
 * Management Methods (resource and schedule planning)
 * Change Management (modifications to existing processes)
 * Communication (effective or not)
 * External (factors outside of the control of the agency)
 * Design (equipment and systems that support the work)
 * Work Practices (methods used to achieve the task)
 * Work Organization (organizing performance and sequence of tasks)
 * Physical Conditions (factors impacting performance)
 * Procurement (getting necessary resources)
 * Documentation (instructions and procedures)
 * Maintenance/Testing (including preventative maintenance)
 * Man/Machine relationships (alarms and controls in place)

It is important to note that these potential root causes could be symptoms instead, and in some situations, it is possible to have multiple root causes.

Methods
A number of tools can be used to determine the root cause. Each is explained briefly below, with a sample chart to illustrate the concept and guidance on construction and interpretation. The tools profiled include:
 * Fishbone diagram
 * Ask why 5 times
 * Check Sheets/Pareto Chart
 * Interrelationship Diagram
 * RPR (Rapid Problem Resolution)

Fishbone Diagram
This tool is also referred to as the Ishikawa Diagram or the Cause-and-Effect Diagram. It is named for Kaoru Ishikawa, who pioneered TQM processes in the 1960s at the Kawasaki shipyards. It derives its common name from the shape of the diagram, as shown in the figure below:



To construct a Fishbone Diagram, start with the problem you are addressing near the eye of the fish. From there, identify the primary causes of the problem. Typically, these are the 4 Ps or the 4 Ms, but they can be whatever makes sense for the particular problem at hand. The 4 Ps are People, Procedure, Policy and Plant. The 4 Ms are Man, Machinery, Methods and Material. A sample chart showing why a cup of coffee “could” be bad is as follows:



These diagrams can easily be constructed with pen and paper or with charting tools such as Visio (see business processes/cause and effect diagram). To analyze the results, look for common themes. Is something listed several times? In this instance, no training and poor-quality inputs (e.g. bad sugar, dirty cups, etc.) appear to be very common themes to explore further. This is an excellent tool to use in a group setting.
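
Once a diagram has been transcribed, the "is something listed several times?" check can also be done mechanically. The following is a minimal Python sketch, assuming the diagram is stored as a dictionary that maps each bone (category) to the causes listed on it. The entries are hypothetical stand-ins for the coffee example, not a copy of the original figure, and the function name `recurring_causes` is made up for illustration.

```python
from collections import Counter

# Hypothetical transcription of a fishbone diagram for "bad coffee".
# Keys are the 4 Ms; the causes are illustrative, not the original figure.
fishbone = {
    "Man": ["no training", "rushed preparation"],
    "Machinery": ["dirty pot", "no training"],
    "Methods": ["no training", "wrong coffee-to-water ratio"],
    "Material": ["bad sugar", "dirty cups", "stale beans"],
}


def recurring_causes(diagram, min_count=2):
    """Return (cause, count) pairs for causes listed on at least min_count bones."""
    counts = Counter()
    for causes in diagram.values():
        # set() so a cause repeated on one bone is counted once for that bone
        counts.update(set(causes))
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]


print(recurring_causes(fishbone))
```

With this made-up data, only "no training" appears on more than one bone, which flags it as a theme to explore further with the group.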

Ask Why 5 Times
While it is easy to jump to a solution, it is often more difficult to pinpoint why something occurred. One of the most commonly used root cause analysis tools is referred to as the “5 Whys”. It is based on the premise of continually asking why. Using the bad coffee from the previous illustration, you could start with the apparent problem that the coffee is bad. When asking “Why is the coffee bad?”, one of the first responses could be that it is weak. The next question would be “Why is the coffee weak?”, and the reply could be that not enough coffee was used. When asking “Why was not enough coffee used?”, the reply could be that we ran out. The asking of “why” continues until you get to possible root causes. To illustrate this concept, see the figure below:



A real-life example of using this tool can be found in Washington, DC at the Jefferson Memorial. The National Park Service noted that this monument was deteriorating at a faster rate than other DC monuments. By repeatedly asking why, they were able to get at the root of the problem as follows:
 * Why is the memorial deteriorating faster? Because it was being washed more frequently.
 * Why was the monument being washed more frequently? Because there were a lot of bird droppings.
 * Why were there more bird droppings on the monument? Because birds were very attracted to the monument.
 * Why were birds more attracted to the Jefferson Memorial? Because of the number of fat spiders in and around the monument.
 * Why are there a lot of spiders? Because of the number of insects that fly around the monument during evening hours.
 * Why more insects? Because the monument's illumination attracted more insects.

In evaluating various solutions to this problem (e.g. pesticides, special coatings, different light, etc.), groups will identify different areas to focus on. In this particular case study, the Park Service chose to turn on the lights an hour later every evening. This one change reduced the bird dropping problem by 90%.

When using this technique, it is possible to follow different paths and derive different solutions. Should this occur, several factors can be considered when identifying the appropriate solution, such as what is within the group's ability to control. In the case of the Jefferson Memorial, the Park Service had the ability to control lighting and selected a no-cost option that addressed the problem.
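
Because a single "why" can have more than one answer, the questioning forms a tree rather than a single chain. The sketch below is one possible way to record that tree in Python and list every path from the problem down to a candidate root cause. The "coffee is weak" branch follows the coffee example above; the "coffee is cold" branch and the names `why_tree` and `why_paths` are hypothetical additions for illustration. It assumes the answers never loop back on themselves.

```python
# Hypothetical "why" tree: each statement maps to the answers given when
# the group asked "why?". Statements with no entry are candidate root causes.
why_tree = {
    "coffee is bad": ["coffee is weak", "coffee is cold"],
    "coffee is weak": ["not enough coffee used"],
    "not enough coffee used": ["we ran out"],
    "coffee is cold": ["pot was left off the warmer"],
}


def why_paths(tree, problem):
    """Yield every chain of whys from the problem to a candidate root cause."""
    answers = tree.get(problem, [])
    if not answers:
        # Nobody could answer "why?" any further: treat as a candidate root cause.
        yield [problem]
        return
    for answer in answers:
        for tail in why_paths(tree, answer):
            yield [problem] + tail


for path in why_paths(why_tree, "coffee is bad"):
    print(" -> ".join(path))
```

Listing the paths side by side makes it easier for the group to compare the candidate root causes and decide which ones are within its ability to control.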

This technique is also used for requirements elicitation, particularly when interviewing subject matter experts. See the section Documenting and Managing Requirements.

Check Sheets/Pareto Chart
There is an old saying: “what gets measured, gets done.” In root cause analysis, creating a simple check sheet to collect data from observations or occurrences and charting the results on a Pareto Chart can help pinpoint problem areas. In the absence of data, perceived or apparent problems can often lead you down the wrong path. By observing and recording the frequency of an occurrence over a specific period of time, it is possible to determine relative severity. See the figure below for an example of a check sheet.

Constructing a check sheet is as simple as identifying the things you want to count and then counting them as they occur. After a reasonable period of time, just total the occurrences. In this example, the errors identified point to a paper jam (a problem with paper and equipment) and incorrect information entered by the operator. For the paper jam, the cause could be the printer or it could be the material you are trying to print on (weight, material, coating, etc.). To address this problem, it will be necessary to run several trial tests to help discern the true root cause. For the “incorrect info entered” errors, the remedy may be as simple as retraining cashier B or adding some behind-the-scenes edit checks to look for common errors. In all instances, it is best to focus on items within your immediate control and environment first, before trying to throw technology at the problem.

Once the data has been collected, one powerful tool you can use to document the results is the Pareto Chart. The Pareto Chart displays the relative importance of problems or occurrences and is based on the principle that 80% of the problems result from 20% of the causes. The 80-20 rule is named for an Italian economist, Vilfredo Pareto, who noted that 80% of the land was owned by 20% of the people. Applying the results from the check sheet above gives the sample diagram below:



Note that the figure has two vertical axes. The one on the left gives the count of occurrences, while the one on the right shows the cumulative percentage of total occurrences. By focusing problem-solving efforts on the categories with the largest volume, the total number of errors can be reduced significantly.
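
The numbers behind both axes can be computed directly from check sheet tallies. The following Python sketch is an illustration only: the observation counts are invented (they are not the figures from the sample check sheet), and `pareto_table` is a hypothetical helper name. It sorts categories by count, which gives the left axis, and accumulates the running percentage, which gives the right axis.

```python
from collections import Counter

# Hypothetical check sheet tallies (one list entry per observed error).
observations = (
    ["paper jam"] * 12
    + ["incorrect info entered"] * 8
    + ["out of toner"] * 3
    + ["wrong form used"] * 2
)


def pareto_table(items):
    """Return rows of (category, count, cumulative percent), largest count first."""
    counts = Counter(items)
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


for category, count, cum_pct in pareto_table(observations):
    print(f"{category:<24}{count:>4}{cum_pct:>8}%")
```

In this made-up data set, the first two categories account for 80% of all recorded errors, so they would be the natural place to focus first.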

Interrelationship Diagram
An interrelationship diagram is another valuable tool that helps to compare related issues in order to determine which ones are driving forces (root causes) and which ones are being influenced by others (symptoms). This exercise is best done in a group setting where you have a variety of perspectives. The matrices can take some time to get through, but typically provide valuable insights once completed. Using a list of symptoms/root causes, create a matrix (we are using a 5x5 example here), and then add 3 additional summary columns to the right.

For this example, we will look at causes of ineffective meetings. The 5 potential symptoms or root causes are: lack of an agenda, lack of facilitation, wrong people at the meeting, airtime dominated by a few and rehashing same stuff. The symptoms will vary by group and organization, and no doubt with group input it is possible to come up with more items. The small number is used here to demonstrate how to construct, facilitate and evaluate the results. A completed matrix is below:

EDIT NOTE: Insert table representing interrelationship diagram for poor meetings

For each issue identified, ask the group about the impact of each item on another. Starting with A. Lack of Agenda against B. Lack of Facilitation, ask the group: does A drive or influence B, or does B influence A? Typically, if you have a facilitator, you often have an agenda, so in this instance an “up” arrow is placed in row A/column B to show the impact of A on B, and an “in” arrow is placed in row B/column A to show the influence of A onto B. Next, look at Lack of an Agenda and Wrong People at the Meeting. While you “could” make a weak case that if you had an agenda, it would be obvious that you have the wrong people at the meeting, there are other drivers for this, so in this case we will put in a “-” dash signifying no relationship. Each pair receives a relationship mark in the matrix.

Once the matrix is completed, it is time to add things up. All arrows pointing inwards (items being influenced) are added for each row and the sum is reported in the “In” column. All arrows pointing upwards (items influencing) are added for each row and the sum is reported in the “Out” column. The in and out counts are then added together for the total. In evaluating the results, look for the largest number of “out” arrows as your root cause. In this example, the lack of a facilitator leads to rehashing the same stuff (meeting after meeting).

This tool is also very good for determining critical processes, as well as root causes. Instead of listing problems or issues, record all of your processes with letters and evaluate which processes influence or directly impact other processes.
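
The adding-up step can be sketched in a few lines of Python. This is an illustration under stated assumptions: the item labels come from the meeting example, but the list of "drives" pairs is a hypothetical set of group judgments, not the guidebook's completed matrix, and the names `drives` and `tally` are invented here. Each pair records that the first item influences the second, which corresponds to one "up" arrow in the driver's row and one "in" arrow in the driven item's row. Pairs with no relationship (the dash) are simply left out.

```python
# Items from the meeting example; the "drives" pairs below are hypothetical
# group judgments, not the guidebook's completed matrix.
items = {
    "A": "Lack of an agenda",
    "B": "Lack of facilitation",
    "C": "Wrong people at the meeting",
    "D": "Airtime dominated by a few",
    "E": "Rehashing same stuff",
}

# (driver, driven): the first item influences the second.
drives = [("A", "B"), ("A", "E"), ("B", "C"), ("B", "D"), ("B", "E"), ("D", "E")]


def tally(items, drives):
    """Return {item: (in_count, out_count, total)} from the driver/driven pairs."""
    ins = {key: 0 for key in items}
    outs = {key: 0 for key in items}
    for driver, driven in drives:
        outs[driver] += 1
        ins[driven] += 1
    return {key: (ins[key], outs[key], ins[key] + outs[key]) for key in items}


summary = tally(items, drives)
for key, (n_in, n_out, total) in summary.items():
    print(f"{key}. {items[key]:<28} in={n_in} out={n_out} total={total}")

# The item with the most "out" arrows is the candidate root cause.
root = max(summary, key=lambda key: summary[key][1])
print("Candidate root cause:", items[root])
```

With these invented judgments, item B has the most "out" arrows and item E the most "in" arrows, so B would be read as a driver and E as a symptom. A real session would use the arrows the group actually agreed on.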

Rapid Problem Resolution (RPR)
This technique was designed specifically to identify the root cause of IT problems. While it is aligned with the ITIL Problem Management process, it requires that the problem be replicated, and the method is designed to focus on a single symptom at a time until a root cause is identified. The method consists of three steps: 1) Discover, 2) Investigate and 3) Fix. During the Discover phase, it is important to obtain as much information about the problem as possible (what the problem is, when it occurs, in what environment, how frequently, etc.) and to settle on the problem to be solved. The Investigate phase focuses on replicating the problem so that it is possible to discern what is causing it. This requires developing and executing a diagnostic data capture tool so that results can be obtained to identify the cause. Reviewing the diagnostic data then makes it possible to trace where the problem occurs. Once the root cause is found, a fix needs to be developed and implemented, and the solution verified.

In Summary
Root cause analysis is a critical component of problem solving. If you do not treat the root cause(s) of a problem, it is likely that the problem will not go away. When only symptoms are treated, the problem often manifests itself differently, offering a new set of symptoms. Since the time and resources available to solve problems vary, it is good to have several tools available for seeking out root causes.