
This technique really shows its power in systems of
several hundred thousand components. For instance, binary search could
find a single component in a system of 1,048,576 components (a moderately
sized automated system) using only 20 tests, because each test halves the
search area and 2^20 = 1,048,576.
NOTE: Implicit in all this is that if you keep narrowing it down, whether binary or not, as long as you don't repeatedly double back in areas you've already tested, it is a MATHEMATICAL CERTAINTY you'll eventually solve the problem.
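The 20-test figure can be checked with a short sketch. It assumes a simplified model that is not part of the process itself: the components sit in a known order, exactly one is faulty, and a hypothetical test `fault_at_or_before(k)` reliably reports whether the fault lies at or before position k.

```python
def locate_fault(n_components, fault_at_or_before):
    """Binary search over components 0..n_components-1.

    fault_at_or_before(k) is a hypothetical, perfectly reliable test that
    returns True when the faulty component is at position k or earlier.
    Returns (position_of_fault, number_of_tests_used).
    """
    lo, hi = 0, n_components - 1   # the fault is somewhere in lo..hi
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if fault_at_or_before(mid):
            hi = mid               # fault is in the first half
        else:
            lo = mid + 1           # fault is in the second half
    return lo, tests


hidden = 777_777                   # made-up location of the bad component
position, tests = locate_fault(1_048_576, lambda k: hidden <= k)
print(position, tests)             # 777777 20
```

Every pass through the loop rules out half of what remains and never revisits a ruled-out area, which is why the search must end.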
Intermittence:
Intermittence invalidates most tests that could
split the search area: a test that passes may simply have been run while the
symptom was absent, so it cannot rule an area out, and you end up backtracking.
It thus renders binary search a useful but insufficient tool for troubleshooting.
Intermittence eliminates the mathematical certainty of solution. Indeed, many
intermittents remain unsolved.
There are several techniques to maximize your chance of solving an intermittent.
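To see why a passing test proves so little on an intermittent, here is a rough sketch of the arithmetic. It is not one of the techniques referred to above. It assumes, purely for illustration, that the symptom shows up independently on each run with a fixed probability `p_symptom`; real intermittents often depend on heat, vibration, load or timing and break that assumption.

```python
import math


def false_clear_probability(p_symptom, runs):
    """Chance that `runs` repeats of a test all pass although the fault is there.

    Assumes each run shows the symptom independently with probability p_symptom.
    """
    if not 0.0 < p_symptom < 1.0:
        raise ValueError("p_symptom must be strictly between 0 and 1")
    return (1.0 - p_symptom) ** runs


def runs_needed(p_symptom, acceptable_risk):
    """Clean runs needed before the risk of a false 'all clear' drops to acceptable_risk."""
    if not 0.0 < p_symptom < 1.0:
        raise ValueError("p_symptom must be strictly between 0 and 1")
    if not 0.0 < acceptable_risk < 1.0:
        raise ValueError("acceptable_risk must be strictly between 0 and 1")
    return math.ceil(math.log(acceptable_risk) / math.log(1.0 - p_symptom))


# A symptom that appears on 1 run in 10: how many clean runs before we are
# willing to accept a 5% chance of wrongly clearing the area?
print(runs_needed(0.1, 0.05))      # 29
```

Even under these generous assumptions the risk never reaches zero, which is the lost certainty described above.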
Ordered Set:
Remember that the order comes from your knowledge of the
system, and nobody knows everything. Even the system documentation is
incomplete. The less complete your knowledge, the more trial and error is necessary.
Nevertheless, in real life even a minimum of knowledge allows a reasonable
approximation of binary search, so this isn't much of a limitation.
Quadruple Tradeoff: Ease vs. Likelihood vs. Even Divisions vs. Safety:
A more significant limitation is that troubleshooting tests are
often time-consuming and risky. The test that would most exactly split the
remaining search area in half is often the toughest. Thus we temper our desire
for even divisions with the reality that we need to minimize time and risk.
Often our troubleshooting instinct tells us it's likely the problem resides in a
tiny portion of the remaining search area. In that case it's perfectly
permissible to test to prove or disprove it's in the tiny area, but it's never
permissible just to assume it. If a test carries a credible risk of harm to the
system, property or person, try to find a safer test, even if it's harder, less
likely, and doesn't divide the remaining problem scope as evenly.
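One way to picture the tradeoff is the sketch below. It is only an illustration: the `Candidate` fields, the example tests and the scoring rule are all assumptions made up for this example. Safety is applied first as a filter. Likelihood and even division are then folded into a single number, the expected information from the test in bits, which peaks when the fault is equally likely to be on either side. Ease is the estimated time in minutes.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    minutes: float         # ease: estimated time to run the test
    p_fault_inside: float  # your estimate that the fault is in the area the test isolates
    safe: bool             # no credible risk to system, property or person


def expected_bits(p):
    """Binary entropy: 1.0 for a 50/50 split, 0.0 when the outcome is certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def pick_test(candidates):
    """Discard unsafe tests, then pick the most information per minute."""
    safe = [c for c in candidates if c.safe]
    if not safe:
        raise ValueError("no safe test listed; look for a safer one first")
    return max(safe, key=lambda c: expected_bits(c.p_fault_inside) / c.minutes)


tests = [
    Candidate("swap main board", minutes=60, p_fault_inside=0.5, safe=True),
    Candidate("reseat suspect connector", minutes=2, p_fault_inside=0.3, safe=True),
    Candidate("probe live high-voltage rail", minutes=1, p_fault_inside=0.5, safe=False),
]
print(pick_test(tests).name)       # reseat suspect connector
```

Note that the quick test of a small but likely area wins here, yet its result is still a test result, not an assumption.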
Letting the Problem Out of the Box:
The Divide and Conquer process can be thought of as continually forcing the problem
into ever smaller boxes, until it's trapped. Some of the worst troubleshooting
debacles I've seen involved the problem escaping the box. In other words, the
troubleshooter thought he had proved it was in one area, when it was really in
another. When that happens, tests become inconclusive and the troubleshooter
starts to doubt himself. Whole days can be wasted. Take every precaution to
avoid this -- don't skip steps.
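A minimal sketch of "keeping the problem in the box" follows. It assumes a hypothetical ordered signal path with numbered test points and a hypothetical test `signal_good_at(k)`. The point is that the walls of the box are proven by test before the narrowing starts and confirmed again before anyone acts on the result.

```python
class ProblemEscaped(Exception):
    """The evidence no longer supports the box we think the problem is in."""


def checked_bisect(n_points, signal_good_at):
    """Narrow a fault to the gap between two adjacent test points.

    Test points are numbered 0..n_points-1 along the signal path, and
    signal_good_at(k) is a hypothetical test returning True when the signal
    is still good at point k. Returns (last_good_point, first_bad_point).
    """
    lo, hi = 0, n_points - 1
    # Prove the walls of the box by test instead of assuming them.
    if not signal_good_at(lo) or signal_good_at(hi):
        raise ProblemEscaped("signal must be good at the start and bad at the end")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if signal_good_at(mid):
            lo = mid               # fault is downstream of mid
        else:
            hi = mid               # fault is at or upstream of mid
    # Confirm the final box before acting on it.
    if not signal_good_at(lo) or signal_good_at(hi):
        raise ProblemEscaped("final box no longer holds; retrace your steps")
    return lo, hi


print(checked_bisect(10, lambda k: k < 6))   # (5, 6)
```

If either check fails, the sketch stops instead of continuing to narrow a box the problem was never in.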
NOTE: The March 1998 issue of Troubleshooting Professional Magazine, themed "Bottleneck Analysis", is essential reading for narrowing problems in systems whose symptom description includes words like "too" or "insufficient". You can see it at http://www.troubleshooters.com/tpromag/9803.htm. The December 1998 TPM describes the narrowing process on intermittent problems, and can be read at http://www.troubleshooters.com/tpromag/9812.htm.