Phase 3: Implementation

 

 

 

After you understand the problem and have tested the possible causes, you can move to the Implementation phase. This phase includes the following steps:

 

  1. Apply the fix.
  2. Verify the fix.
    • Perform a root cause analysis
  3. Document the resolution.

Apply the Fix

Apply the fix in the least disruptive, lowest-cost manner possible. If you previously tested the fix in a test environment, you now apply it to the production environment. Ideally, apply the fix in a way that lets you completely verify that it has resolved the problem. [1]

 


Verify the Fix

Check that the problem has been resolved, and verify that you have not introduced any new problems. [1] If the chosen fix does not correct the problem, back it out (for example, remove a replacement part and reinstall the original) and return to analyzing the problem. If failed fixes are left in place, it may be difficult or impossible to identify the root cause, and there is a greater chance that one of them has caused new problems.
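
The back-out rule above can be sketched in Python. This is only an illustration under stated assumptions, not part of the cited methodology: the `Fix` class, the `apply_one_at_a_time` function, the two check callbacks, and the simulated `system` dictionary are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Fix:
    """One candidate fix (hypothetical structure for illustration)."""
    name: str
    apply: Callable[[], None]     # makes the change
    back_out: Callable[[], None]  # restores the original state


def apply_one_at_a_time(
    fixes: Iterable[Fix],
    problem_resolved: Callable[[], bool],
    no_new_problems: Callable[[], bool],
) -> Optional[Fix]:
    """Try fixes one by one; back out any fix that fails verification.

    Returns the fix that passed verification, or None if none did.
    """
    for fix in fixes:
        fix.apply()
        if problem_resolved() and no_new_problems():
            return fix
        # The fix did not help (or caused a new problem): undo it before
        # trying the next one, so only one change is ever in effect.
        fix.back_out()
    return None


# Simulated system: the drive is the real fault, the cable is not.
system = {"cable": "old", "drive": "faulty"}
fixes = [
    Fix("replace cable",
        lambda: system.update(cable="new"),
        lambda: system.update(cable="old")),
    Fix("replace drive",
        lambda: system.update(drive="ok"),
        lambda: system.update(drive="faulty")),
]
chosen = apply_one_at_a_time(
    fixes,
    problem_resolved=lambda: system["drive"] == "ok",
    no_new_problems=lambda: True,  # assumed check; always passes in this toy example
)
print(chosen.name if chosen else "no fix worked", system)
```

Because the cable replacement is backed out before the drive is replaced, the final state of the system differs from the original by exactly one change, which is what makes it possible to attribute the resolution to that change.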

 

Perform a Root Cause Analysis

 

Make sure that the fix is not just a band-aid solution or workaround. [1] As part of the verification process, perform a root cause analysis again. This helps ensure that you have actually resolved the root cause of the problem, so that the problem does not recur. [2]

 

For example, suppose a tape drive is replaced, but the root cause turns out to be overly abrasive media rather than a failure of the drive itself. The new drive works for a period of time, until the abrasive media wears down its head and the drive fails again. Until the abrasive media is removed from the system, the problem will persist.

 

 


Document the Resolution

Be sure to document the problem and how you resolved it. This record can be a valuable resource if the same problem occurs in the future. It can be used to track recurring problems over time, which helps with future root cause analyses, or to continue the troubleshooting process if it turns out that the problem was not really resolved. [1]

 

Resolution information can also be used to improve the performance and reliability of future versions of a product, or the design of new products. 
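
As one possible way to make resolution records usable for tracking recurring problems, the sketch below stores each resolution as a small record and counts how often each component reappears. The `ResolutionRecord` fields, the `recurring_components` function, and the sample log entries are all assumptions made up for illustration. They do not come from the cited references or from any real incident data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ResolutionRecord:
    """One documented problem and its resolution (hypothetical fields)."""
    resolved_on: date
    component: str
    symptom: str
    root_cause: str
    fix_applied: str


def recurring_components(records, threshold=2):
    """Return components that appear in at least `threshold` records."""
    counts = Counter(r.component for r in records)
    return {c: n for c, n in counts.items() if n >= threshold}


# Invented sample data, loosely modeled on the tape drive example.
log = [
    ResolutionRecord(date(2011, 1, 5), "tape drive 1", "write errors",
                     "worn drive head", "replaced drive"),
    ResolutionRecord(date(2011, 1, 20), "network switch", "link drops",
                     "loose cable", "reseated cable"),
    ResolutionRecord(date(2011, 3, 2), "tape drive 1", "write errors",
                     "worn drive head", "replaced drive"),
]
repeats = recurring_components(log)
print(repeats)
```

A component that shows up repeatedly with the same recorded root cause, as the tape drive does in this made-up log, suggests that the earlier fix addressed a symptom and that the root cause analysis should be reopened.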

 

Examples

 


What's Next?

Case Study: My Car Won't Start >

 


Additional Resources


References

  1. Cromar, Scott. "Troubleshooting Methodology," Princeton University Enterprise Servers and Storage, 2007; available from http://www.princeton.edu/~unix/Solaris/troubleshoot/methodology.html; accessed on January 4, 2011.
  2. Cromar, Scott. "Root Cause Analysis," Princeton University Enterprise Servers and Storage, 2007; available from http://www.princeton.edu/~unix/Solaris/troubleshoot/rca.html; accessed on January 10, 2011.

 

 

 

 
Notes

Ed, good point about RCA. I've kept some information about RCA in this phase, but also added information to the Analysis phase.

Note by Tom Sajbel on 02/15/2011 12:24 PM

As I mentioned in a note on the previous article, on Analysis, RCA also seems like something for earlier phases, especially for Analysis, so that all possible causes are found before implementation of a fix. 

 

The article on RCA referred to in the Analysis article does say that RCA is an iterative process. If you mention it in the previous article, refer to the detailed description of it in this article (or move it there).

Note by Ed Winograd on 02/14/2011 03:54 PM

Tim, good idea about "backing out" the fix if it doesn't work. I added that info to the "Verify the Fix" section instead of the "Apply the fix" section. We can talk more about this when we meet.

 

Also note that we do cover this in the "Testing the Fix" section. We say that if they need to apply the fix in order to test it, they need to back out the fix before testing another fix.

Note by Tom Sajbel on 02/10/2011 03:34 PM

Apply the Fix

Herbst:  This is where we should state something like...replace one part at a time.  If the part does not fix the problem remove it and replace the original.  While replacing many parts at the same time might 'fix' a problem, root cause will be difficult if not impossible to determine and there is an increased possibility that new problems may be induced.

Note by Tim Herbst on 02/10/2011 01:52 PM

Tim, I liked your abrasive media example. I added it to the Verify the Fix/Root Cause section.

Note by Tom Sajbel on 01/30/2011 01:52 PM

Tim, I updated the Document the Resolution section with your comment that this information could be used to improve the product.

Note by Tom Sajbel on 01/30/2011 01:37 PM

Document the Resolution

Herbst:  Resolution information can also be used to better a product.  It can lead to proactive measures, not just better/faster reactive measures.

Note by Tim Herbst on 01/27/2011 03:32 PM

Verify the Fix

Herbst:  Possible example...replacing a tape drive when the root cause is overly abrasive media.  When the drive is replaced, it would work for a period of time, until the abrasive media wore the drive head down and another drive failure occurred.  Until the abrasive media is removed, the problem will never be resolved.

Note by Tim Herbst on 01/27/2011 03:29 PM


This page was generated by the BrainKeeper Enterprise Wiki, © 2018