Phase 2: Analysis
Analysis is the second phase in the troubleshooting process. In this phase, you need to get a solid understanding of the problem. You cannot successfully carry out the third phase, Implementation, until you completely understand the problem.
The Analysis phase is composed of the following steps:
Identify as many possible causes of the problem as you can. Be disciplined: think through each aspect of the issue you are troubleshooting. Determine whether any possible cause can be ruled out based on the information you have gathered.1 For example, if someone tells you that their television picture is grainy and choppy, you can immediately rule out "no power" as a possible cause.
Using the resources at your disposal, determine whether any of the information that you gathered in the Investigation phase points to a known issue. Product documentation and Quantum information sources (such as TSBs, CSWeb, the Knowledge Base, and Qwikipedia) can be helpful.
If the problem does not match a known issue and you work in a team environment, it may be helpful to collaborate with your peers to identify possible causes. They may have additional ideas, or might approach the troubleshooting problem from a different perspective. It's hard to think of everything, so keep an open mind about your peers' ideas.
Example
Using the car example, a list of possible reasons why the car would not start might include a problem with the ignition switch, the battery, the alternator, or the starter.
Consider how likely each potential cause is. Do not eliminate a possible cause until you absolutely disprove it.1
Apply Falsification
Apply falsification to eliminate possible causes. The idea behind using falsification is to treat your initial conclusions about a complex troubleshooting problem as being untrustworthy. Determine what evidence disproves a possible cause, rather than looking for something to confirm what you think might have caused the problem.
By disproving a possible cause, you can save a lot of time. You can discard that cause and move on to the next one. If you cannot find any evidence that a possible cause is wrong, you will have more confidence that you may be on the right track.2
Using the car example mentioned above, you could perform falsification testing on the ignition switch, battery, and alternator. Here are some things you might find.
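The elimination logic behind falsification can be sketched in a few lines of Python. This is only an illustration: the observations, thresholds, and check functions below are hypothetical assumptions, not findings or procedures from this page. Each possible cause is paired with a check that returns True when the evidence disproves that cause, and only the causes that survive every check are kept.

```python
# Hypothetical observations for the car example; these values are
# illustrative assumptions, not findings taken from this page.
observations = {
    "dashboard_lights_on": True,
    "battery_voltage": 12.6,
}

# Each possible cause maps to a check that returns True when the
# evidence DISPROVES that cause (assumed, simplified checks).
falsifiers = {
    "ignition switch": lambda obs: obs["dashboard_lights_on"],
    "battery": lambda obs: obs["battery_voltage"] >= 12.4,
    "alternator": lambda obs: obs["battery_voltage"] >= 12.4,
    "starter": lambda obs: False,  # no disproving evidence found yet
}


def remaining_causes(falsifiers, observations):
    """Return the causes that the evidence has not disproved."""
    return [
        cause
        for cause, is_disproved in falsifiers.items()
        if not is_disproved(observations)
    ]


print(remaining_causes(falsifiers, observations))
```

Note that the sketch never tries to confirm a cause. A cause stays on the list only because nothing has disproved it so far.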
Consider the Root Cause
In addition to applying falsification, make finding the root cause your goal when you identify the most likely causes. The practice of root cause analysis is predicated on the belief that problems are best solved by attempting to address, correct, or eliminate root causes, as opposed to merely addressing the immediately obvious symptoms. Directing corrective measures at root causes makes it more likely that the problem will not recur.3
For most problems, you can get to the root cause by drilling into proposed explanations and repeatedly asking "Why?" The "5 Whys" method was developed by the Toyota Motor Corporation. It is based on the observation that five iterations of asking "Why?" are usually enough to get to the root cause of most real-world problems. The answer to each "Why?" adds to the overall picture and brings you closer to the root cause. For example:4
Root cause: We did the heat-load projections ourselves, rather than bringing in a qualified expert.
Rank Likely Causes
After performing falsification and considering the root cause, rank the remaining likely causes, from most likely to least likely.
In the car example, falsification was used to rule out the other three possible causes, so the only likely cause left is that the starter is faulty. This means that you can test this cause in the next phase. If more than one possible cause had remained, you would need to rank them in order of probability.
As mentioned earlier, check the available documentation, such as user guides, service manuals, and other Quantum resources. These often include recommendations about how to test. Sometimes there are built-in testing facilities, and sometimes there are hardware-specific issues to consider, which may be covered in the documentation or other resources.
If possible, always back up data before testing. Then test the remaining causes, starting with the most likely ones and using the least disruptive method possible. Follow up with less likely causes. If non-disruptive tests are available, always start with those.
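One possible reading of this ordering can be sketched in Python. The `Candidate` class, the likelihood estimates, the disruption scale, and the sample causes are all hypothetical; the page does not prescribe a scoring scheme. The sketch places non-disruptive tests first and, within each group, the most likely causes first.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cause: str
    likelihood: float  # rough estimate between 0 and 1 (assumed scale)
    disruption: int    # 0 = non-disruptive test, higher = more disruptive


def plan_tests(candidates):
    """Order candidates: non-disruptive tests first, then most likely first."""
    return sorted(
        candidates,
        key=lambda c: (c.disruption > 0, -c.likelihood, c.disruption),
    )


# Hypothetical remaining causes for a server problem.
candidates = [
    Candidate("faulty memory module", likelihood=0.6, disruption=2),
    Candidate("loose cable", likelihood=0.3, disruption=0),
    Candidate("firmware bug", likelihood=0.1, disruption=1),
]

for candidate in plan_tests(candidates):
    print(candidate.cause)
```

Here the loose cable is checked first because its test is non-disruptive, even though the memory module is the more likely cause. A team that weighs likelihood more heavily would change the sort key accordingly.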
Depending on the situation, it may even be appropriate to test the likely cause by directly applying a recommended fix for that problem. If you do this, always apply only one fix at a time. If the fix fails to solve the problem, remove it (back out of it) before you test the next fix. Otherwise, applying multiple fixes may keep you from getting a good handle on the root cause of the problem.1
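The one-fix-at-a-time rule can be sketched as a simple loop. Everything here is a hypothetical placeholder: the fix names, the `apply` and `back_out` callables, and the `problem_solved` check stand in for whatever procedures the product documentation recommends.

```python
def try_fixes(fixes, problem_solved):
    """Apply fixes one at a time, backing out each one that does not help.

    `fixes` is a list of (name, apply, back_out) tuples and `problem_solved`
    is a callable that re-tests the system. All are hypothetical placeholders.
    Returns the name of the fix that worked, or None if none did.
    """
    for name, apply, back_out in fixes:
        apply()
        if problem_solved():
            return name
        back_out()  # remove the failed fix before trying the next one
    return None


# Simulated system state, used only to make the sketch runnable.
state = {"applied": set()}


def make_fix(name):
    """Build a (name, apply, back_out) tuple that records the fix in `state`."""
    return (
        name,
        lambda: state["applied"].add(name),
        lambda: state["applied"].discard(name),
    )


fixes = [make_fix("reseat memory module"), make_fix("replace memory module")]


def solved():
    # In this simulation, only replacing the module solves the problem.
    return "replace memory module" in state["applied"]


result = try_fixes(fixes, solved)
print(result, state["applied"])
```

Because each failed fix is backed out, only the fix that actually solved the problem remains applied at the end, which keeps the root cause identifiable.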
Remember, it is important to emerge from the Analysis phase with a solid understanding of the problem. Do not move ahead to the Implementation phase before you understand the issue at hand and the possible reasons for the problem.
Example
A bad memory module was identified as a possible cause of a server problem. The vendor's product documentation contained step-by-step procedures for verifying whether the memory module was faulty, and it included procedures for properly removing the old module, installing a new module, and testing the new module. Starting with these documented procedures is clearly a good idea.
Notes
This page was generated by the BrainKeeper Enterprise Wiki, © 2018