Zero Defects => Zero Productivity!

Dead flies cause the ointment of the apothecary to send forth a stinking savour: so doth a little folly him that is in reputation for wisdom and honour. Ecclesiastes 10:1

I'm sure that when 'The Preacher' penned those words regarding the stink of flies within an ointment there were many Hebrew ladies nodding their heads in agreement. Preserving substances, especially precious ones, was as big an issue then as preserving our environment is today. The point he was trying to make, though, was that the 'folly of wisdom' is a similar problem. However smart we are, however well learned, however careful, we will eventually say, or do, something that will make us look complete idiots. The more so for the contrast to our normal behaviour. The reality of these words is so strong that almost 3,000 years later this phrase is still preserved in the saying 'the fly in the ointment'. We have come to accept that in our greatest endeavours there will inevitably be something that spoils the effect.

Interestingly though, this verse has actually been mis-read (this is true of many biblical verses that have found their way into common parlance). The issue being discussed is not some isolated blemish (or blemishes) that detract from the appearance of the whole. The issue is that the dead flies have caused the ointment itself to stink. The result may be similarly repellent but the distinction is extremely important. Dead flies can be fished out, off ointment is off ointment.

In defence of bugs

In the same way that our verse has been mis-represented for many years I wonder if bugs are similarly mis-understood. Our energies tend to be directed towards minimising, or fishing out, dead flies. But software still stinks. It is larger, slower and just as buggy (if not buggier) than it was 10 years ago. In this article I want to investigate the extent and nature of bugs (and bugginess) and suggest some broad strategies for dealing with them. The angle I will take is that the key is not worrying (so much) about dead flies but worrying about stopping the ointment from going off.

But first I probably need to squash (or attempt to squash) a myth that is extremely dangerous: the myth of zero defects. I can well understand how this myth gets about; many people have a lot of $$$ invested in persuading the populace that zero defects can be achieved. In my previous article I tackled this from the design angle: you cannot get a perfect program because it is modelling a world that is not completely understood (some would argue not even deterministic). I believe you can also prove it from a commercial angle. If your programmers are not putting in bugs then they are not trying hard enough. The aim of any program is to make progress; similarly the aim of any car driver is to make progress. It is generally true that the slower you drive a car the safer you are, yet if you are aiming to drive a long distance then typically you will drive with some speed. If you don't then you are actually failing your objective. Computer software is no different. It has an objective, part of achieving that objective is that the program has to be deployed in a timely fashion, and that means zero defects is not the only priority. Most programmers accept this in the code they write (though not the code they buy of course!).

A statistical proof

Perhaps one of the reasons zero defects has not caught on is that if ever you find a bug in a code base which is aiming for zero defects then you have to scrap the lot and start again. The proof for this is statistical and therefore subject to dispute (a statistician is someone who can go from an unwarranted assumption to a preconceived conclusion in the shortest possible number of steps) but here goes.

  1. Let us assume that the arrival of bug reports follows a negative exponential distribution. That may sound like a reach but if you ponder it, it becomes reasonable. Assuming a finite number of bugs then for each one found it will take that little bit longer to find the next. Let us assume (for ease of computation) that each bug takes 10% longer to find than the previous.
  2. Let us assume that we have 10% of our customers in the beta test and that each tester tests for on average 1 hour per day (compared with using the product for 8).
  3. Our boss is very bug averse and states that we will not ship unless the product has been free of serious defects (show stoppers) for 2 weeks.

These are approaching ideal conditions, few people in a commercial environment could hope to achieve this, but now roll the numbers forward until after the ship date.

Given our 'bug flow' model is correct we should expect to wait 2 weeks for a serious defect. But wait, that is 2 weeks for 10% of our customer base (the beta testers), each working for 1/8 of a day. Put the whole customer base on the product for that same 1/8 of a day and the 2 weeks (10 working days) shrinks to 1 working day; let them use it for the full 8 hours and it shrinks to 1/8 of a day. In other words the first show-stopping bug will be found before coffee on day 1!

Is that acceptable? No? Then maybe we can throw money at it. We'll pay 10% of our customer base to beta-test 8 hours a day. That still means a bug found before the end of the first day.

So let's lengthen the beta cycle. Let us wait a couple of months from the last serious defect. This gives us a week (almost) from launch before the first serious defect is found. But now look at the timing problems. 60 days from the last serious defect means 54 from the previous, 49 from the one before that, 45 from the one before that etc. I'll leave you to compute the actual length of the beta cycle but simply ask the question, "do you really think marketing are going to hang around 6 months waiting for the last three bugs?". So maybe we can throw some more money at it; let's pay 33% of our customers to beta test. That probably crams the beta cycle into 6 months and still gives us a week before the first serious defect is found. But now look at the commercial problems. The QA costs of this product alone amount to 3 months' wages of the person buying the application. How many people would pay $15,000 for a product upgrade that went wrong in the first week?
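
The arithmetic above can be rolled forward mechanically. What follows is only a sketch in Python, with function names invented for illustration. It assumes, as the argument does, that each gap between serious defects is 10% longer than the one before, and that defects surface in proportion to the total hours the product is in use.

```python
def gaps_before_ship(final_gap_days, growth=1.10, count=4):
    """Lengths of the last `count` gaps between serious defects, newest first.

    Assumes each gap is `growth` times longer than the one before it,
    so walking backwards from the final quiet period divides by `growth`.
    """
    return [final_gap_days / growth ** k for k in range(count)]


def days_to_first_field_defect(quiet_days, tester_fraction,
                               tester_hours_per_day, user_hours_per_day=8):
    """Scale the quiet period seen in beta up to the whole customer base.

    Assumes defects turn up in proportion to total hours of use. Like the
    text, it ignores the extra 10% that the next gap should gain.
    """
    beta_hours_per_day = tester_fraction * tester_hours_per_day
    field_hours_per_day = 1.0 * user_hours_per_day   # every customer, all day
    return quiet_days * beta_hours_per_day / field_hours_per_day


# Two quiet weeks (10 working days), 10% of customers testing 1 hour a day.
print(days_to_first_field_defect(10, 0.10, 1))   # about 0.125 days: before coffee
# Same quiet period, but the testers are paid to test 8 hours a day.
print(days_to_first_field_defect(10, 0.10, 8))   # about 1 day
# A 60 day quiet period with full-time testers.
print(days_to_first_field_defect(60, 0.10, 8))   # about 6 days: almost a week
# The last few gaps of a beta cycle that ends with 60 quiet days.
print([int(g) for g in gaps_before_ship(60)])    # [60, 54, 49, 45]
```

On these assumptions the last three gaps alone (60, 54 and 49 days) add up to roughly 164 days, which is in line with the six months quoted above.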

Beta testers do it badly

One of the most demoralising things for any developer is to write a program, test it thoroughly, and then watch it fall over and die as soon as you hand it to anyone else. It is generally assumed this is because programmers under-test; I suspect this is not the primary cause. Occasionally I get e-mail of the form "this ******** compiler is useless, don't you ever test it? I gave it this simple program to compile and it fell over, how come?" My standard response is "it died laughing". The main reason for my reply is negative reinforcement: tart questions get tart retorts. But it actually embodies a truth that goes far deeper. Producing an application that does the right thing well is actually quite easy; producing a program that does everything well is nigh-on impossible.

Let me illustrate with an example from the compiler, but let us simplify things radically. Let us assume the compiler has a three line limit and a 1 character identifier limit (remember: smart-Alec e-mails will get smart-Alec retorts). Further, let us assume semicolons are not allowed. Now how many valid programs can you write? Well, two of the lines are fixed: PROGRAM and CODE. Your choice is thus a labelled data item on line two (with a fixed number of types to choose from) or any one procedure call on line 3. Given time (and inclination) I could probably list every possibility and check they worked. Now how many invalid programs can you write in 3 lines? Well, basically there are 256 choices for column 1 line 1, 256 more for column 2 line 1 etc. Even with a 1024 column limit (which there isn't!) the number of invalid programs has disappeared beyond reasonable computability.
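
To put a rough size on that, here is a small Python sketch. It is an illustration only: it treats every character position as one of 256 byte values and counts every text that fills all three lines to the hypothetical 1024 column limit, so the constant names and the fixed-width simplification are my assumptions rather than anything the compiler defines.

```python
import math

ALPHABET = 256      # possible values for each character position
LINES = 3           # the imaginary three line limit
COLUMNS = 1024      # the hypothetical column limit

positions = LINES * COLUMNS
texts = ALPHABET ** positions          # every possible fixed-width input

# Count the decimal digits with a logarithm rather than printing the number.
decimal_digits = math.floor(math.log10(texts)) + 1

print(positions)        # 3072
print(decimal_digits)   # 7399
```

A number with over seven thousand digits is the space the handful of valid programs has to be found in.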

The point is that the range of all possible inputs to any (sizeable) program is vast and only a very small sub-section of these inputs is actually reasonable. The truth is more selective yet: an experienced computer user will often develop a style (or technique) that utilises only a very tiny proportion of the code in the underlying application. AND THE WORST OFFENDERS ARE BETA TESTERS. The reasons are subtle but clear. A diligent beta tester will attempt to perform (or simulate) real work with your application. They will find bugs, and in order to keep going they will need to work around the bugs. If the bugs are nasty (machine-lock, GPF, corruption etc) they will rapidly learn to avoid the dodgy product areas almost by reflex. Thus when the next release comes out it will be tested by a group of people who are conditioned not to test those parts of the product you've had to change. Clearly, some people will be brave and wade back in but over time your beta testers become a non-representative sample of your customer base.

The result is that any main-stream application will have a selection of heavily trafficked, well tested pathways and vast acres of uncharted bug-ridden swamp. Which is actually OK. Driving on a highway through a swamp is relatively safe, provided you know where the road is. Of course the people who don't know where the road is are those people who buy into the application once it is gold. Your paying customers. They will simply charge off in the direction they wish to go and find they get bitten. How come no-one tested this thing?

Shrink-wrapping flies

Having convinced you (I hope) that dead flies are inevitable we can turn to the main event, stopping the ointment from stinking. In order to solidify my argument, and make it more relevant, I would like to distinguish a couple of ideas, bugs and bugginess.

A program has bugs if there are parts of the product that do not function as expected, a program is buggy if you hit a bug whilst working around a bug you found earlier.

Although it can be argued that the distinction is fine I believe the difference is crucial to the usability of the application. Human beings (and thus most computer users) are resourceful. If they hit a road-block they'll find another route. They may curse you but provided the new route is not too obscure it will not deflect them from their long-term goals. Conversely, if they find themselves hemmed in, unable to make progress in the direction they require then the long-term goals suffer.

You can actually take the analogy further. The actual rate of progress is not simply determined by the number of road blocks (bugs) but also by the speed of the roads (ease of use and power). It may even be better to produce fast roads with some road-blocks than a selection of immaculate dirt tracks.

So what does this really mean? It means our primary concern should not be the number of flies in the ointment but the diseases and microbes they are carrying. It also means a dead bug may be fouling our code long after the bug report is forgotten.

Hence the title of this section. Suppose our apothecary had had access to a shrink-wrap machine and had managed to trap and seal every fly in his laboratory before carefully mixing them into his ointment. Our Hebrew ladies would still, no doubt, have objected to having to fish each fly out but the product would still be useable, this is our aim.

The problem, of course, is that our target is unreachable. The flaw in the previous paragraph comes when we see our chemist busily mixing each shrink-wrapped fly into the potion; he wouldn't do it. If you can shrink-wrap flies then you can stop them getting in, period. We can't stop them getting in so we can't shrink-wrap them (reductio ad absurdum) so we're back to square one.

Divide and sacrifice

Almost. Suppose we adopt a different strategy. Let us divide our ointment into 100 different pots. On average we ship with 10 flies. That means 90% of our ointment gets through clean, minimum. Sure, we have more packing costs but we also have some degree of success almost guaranteed. That means we can be relied upon, that means we can compete better which means we can charge more.
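
The pot arithmetic is easy to check by simulation. This Python sketch is illustrative only; the function name is made up, and it assumes each fly lands in a pot chosen uniformly at random.

```python
import random


def clean_fraction(pots, flies, rng):
    """Drop each fly into a random pot and return the fraction of pots left clean."""
    fouled = {rng.randrange(pots) for _ in range(flies)}
    return (pots - len(fouled)) / pots


rng = random.Random(1)                 # fixed seed so runs are repeatable
print(clean_fraction(100, 10, rng))    # never below 0.9: ten flies foul at most ten pots
print(clean_fraction(1, 10, rng))      # 0.0: one big pot, and every fly lands in it
```

Two flies may share a pot, which is why 90% is the minimum rather than the expected figure.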

Does this strategy apply to code? I think it does. Perhaps even better than it does in our example. Programming is about complexity; a modern program will often contain a level of complexity that is beyond the grasp of any one person. The very act of focusing on one part of a problem is probably enough to make us forget another part. By dividing the application down into manageable sections we don't only reduce the 'cross-contamination' possibilities but we potentially reduce the chances of screwing up in the first place.

Now to work there has to be genuine division between these code 'lumps'. In future articles I will discuss in depth the nature and mechanism of this 'encapsulation' but for now let us just assume it is possible, but it will hurt. We are going to put up walls deliberately to stop 'errant' information (bugs) from flowing, but we must expect these walls to restrict potentially useful information flow too. The name used for this 'restriction' is the interface.

Those of you who believe in interfaces may wonder why I have spent >2000 words working my way around to introducing them. Put simply, to many people interfaces are not intuitively popular. This is because programming without an interface is like free-fall parachuting without a parachute. You make rapid progress, there is freedom of movement and the ride is exhilarating … right up until the moment of impact. Then it is down to someone else to scoop up the mess.

Using an interface exemplifies one of my favourite notions, the up-front hit. I am prepared to sacrifice ultimate productivity now for better productivity down the line.

Evolution or Entropy?

As this article draws towards a close I would like to challenge one more commonly held belief. The code-base of an application gets better as you fix it. Let us imagine our application divided up into little compartments, for now let us assume each compartment is water tight and autonomous. Let us also assume this application has been around for a while and the compartments vary in author, age and quality. In fact, let us forget our mythical application and think of the ones sitting on our machines. I'm sure that if I asked you then you could point to some parts of your system that you are (rightly) proud of, great ideas beautifully implemented. But I'm just as sure that if you were honest there are some bits that really suck. A mixture of half-baked ideas that almost worked and that you blue-tacked together to meet some completely unrealistic deadline.

Now suppose you have to add some new feature (or tweak) to your product. Let us also suppose that it so happens that you could do the bulk of the work in one of the nice, clean, modern parts of the system you wrote or down in some appalling pile of spaghetti code written by an idiot you long since fired. Where are you going to put the code? My guess is you went for the clean module and 'left the can of worms un-opened'. If you did then your code will suffer from what most code bases I've ever seen suffer from: Entropy, or "survival of the weakest". Basically the only code that manages to survive unscathed on your machine is code that is so bad you can't face it. And any code that is good you will fearlessly mutilate until such time as it becomes so bad you can't face it any more.

It also turns out that this code degeneration doesn't just happen within a compartment, there is every danger of the 'stink' going across the interface. Time and again we see good parts of the system jumping through hoops to baby 'poor' parts of the system.

This leads to one of my most cherished (but most unpopular) beliefs, there comes a point where code gets so rotten you have to throw it away and start it again. In extreme cases it can even be beneficial to get the module written by someone that has never seen the predecessor to avoid all contamination.

Dirty foot-prints

Much of this article has been by allegory; this is deliberate. Many of us have been dealing with these issues directly for many years and in order to get some in-built assumptions re-evaluated I felt a fresh tack may be useful. However, it is not my intention to leave this discussion as an academic exercise; I want to provide some pointers that may be useful. So I want to turn all the above on its head and suggest that with thought you can turn the 'souring effect of dead flies' to your advantage. Dead flies are hard to find, rancid fat isn't. So, the argument goes, if you learn to spot rotten code you have a good way of finding dead flies. Here are a few ideas I have used:

  1. Count the bug fixes. This may seem obvious but not everyone does it. I keep a log of the line-number and file of every bug I fix. You can then plot a distribution graph of where the trouble spots are. Many bug-fixes tend to mean duff code.
  2. Count the bugs. Same as 1? Not quite. 1) spots a bad-hair-day on the coding front. 2) spots a bad-hair-day on the design front. If the users find 10 bugs in the same 'area' which are fixed in 10 different places then your underlying application design does not match the program usage.
  3. Count the lines of code. If some part of your system is bloating faster than the rest the chances are it doesn't work. Either the programmer is scared of the procedures inside and is writing new ones or there is some desperate patching up going on to try to get the required functionality from a flawed design.
  4. Put all the procedure names in one file and sort them, look for duplicates. A superb way of spotting a duff library is when it starts cloning itself all over the system like a virus. What is basically happening is the 'user' knows he should be calling the library but can't because it doesn't quite work, therefore he has produced his own 'potted' version. This is the #1 method of ruining a system, you've actually managed to get dead flies to breed.
  5. Sort the code in your source modules and look for duplicates. This can be useful combined with 3). It basically spots situations where you have not correctly abstracted the problem into procedures. This one does not necessarily find bugs but if it does find bugs it will suggest they're going to be hard to fix.
  6. Look for patterns. There is no easy way to automate this but you should be alert for repeating sequences of 3-10 line code fragments, especially if they surround an inter-module call. This is a sure sign of 'babying' in action: the bug is almost certainly in the routine being called, and the 3-10 line fragment is the 'hocus-pocus' required to make the bad procedure work. Your instinct will be to wrap the ten lines into a procedure call; don't do it. This is the 'onion skin' approach favoured by a world-famous software house. The correct solution is to roll up your sleeves and go and fix the routine that is mis-behaving.
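
Ideas 1, 4 and 5 in the list above are mechanical enough to script. The Python sketch below is one possible way to do it, not a description of any particular tool. It assumes the bug-fix log, the list of procedure names and the source text have already been extracted from your system, and every file, module and procedure name in it is made up for illustration.

```python
from collections import Counter, defaultdict


def fixes_per_file(fix_log):
    """Idea 1: count logged bug fixes per file, worst offender first."""
    return Counter(file for file, _line in fix_log).most_common()


def cloned_procedures(procedures):
    """Idea 4: procedure names that are defined in more than one module."""
    modules_by_name = defaultdict(set)
    for module, name in procedures:
        modules_by_name[name].add(module)
    return {name: sorted(mods)
            for name, mods in modules_by_name.items() if len(mods) > 1}


def repeated_lines(sources, min_length=20):
    """Idea 5: non-trivial source lines that occur more than once."""
    counts = Counter()
    for text in sources.values():
        for line in text.splitlines():
            stripped = line.strip()
            if len(stripped) >= min_length:   # skip short, boilerplate lines
                counts[stripped] += 1
    return {line: n for line, n in counts.items() if n > 1}


# Hypothetical data: (file, line number) for each fix, (module, procedure) pairs.
fix_log = [("invoice", 120), ("invoice", 340), ("report", 15), ("invoice", 122)]
procedures = [("strings", "TrimAll"), ("invoice", "TrimAll"), ("report", "Print")]
sources = {
    "invoice": "x = compute_total(order, tax_rate)\nreturn x\n",
    "report": "y = 1\nx = compute_total(order, tax_rate)\n",
}

print(fixes_per_file(fix_log))        # [('invoice', 3), ('report', 1)]
print(cloned_procedures(procedures))  # {'TrimAll': ['invoice', 'strings']}
print(repeated_lines(sources))        # {'x = compute_total(order, tax_rate)': 2}
```

The output only tells you where to look. Deciding whether a hot spot is a coding problem or a design problem still needs a human.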

