
A failure as an opportunity

Arseni Mourzenko
Founder and lead developer, specializing in developer productivity and code quality
April 4, 2012

Failures happen, especially in the software development industry, where multiple factors can induce them, from unrealistic customer expectations to incompetent staff to severe communication issues.

In general, studies show that around 50% of IT projects fail. As is, the number is of course meaningless: what did the study measure? What counts as a failed project? How was the study done? Still, it’s a good hint, and a good indication that approximately half of all projects will either fail completely, or at least end up over budget and/or late.

Personally, observing the software development industry in France, I would expect more than 99% of projects to end up as failures, or at least over budget or late. Even projects done for multinational corporations based in France have too many flaws: legacy systems, code bases too ugly to attract competent people, management issues, communication issues, lack of standardization, etc. Add to this a multitude of tiny companies which don’t have money for their IT projects, and you understand that a 99% failure rate is not an overestimation.

Failures being inevitable, they are at the same time a good opportunity to improve. Instead of being fatalistic and throwing away a failed project to quickly start failing at another one, it would be much better to benefit from failures by studying and understanding them.

Study your failures

When a project fails, there is sometimes one reason, but often several. The reasons are too numerous to be listed here, but I can quote a few examples:

  • Lack of communication. For example, developers of a team are unable to speak to each other without screaming.

  • Lack of processes and workflows. For example, the meetings are organized randomly, without a predefined workflow.

  • Lack of documentation. For example, there is no documentation explaining how the solution is deployed and by whom.

  • Lack of elementary tools. For example, two developers work on the same code but don’t use source control.

  • Lack of understanding of the software development process. For example, developers are expected to start by writing code, with no requirements, architecture, or design phase.

  • Lack of QA. For example, the project has zero unit tests.

  • etc.

Those are the first-class reasons, which may be caused by second-class reasons such as the budget being too low, the company scoring low on the Joel Test, or the staff being incompetent.

When studying failures, it is mostly the first-class reasons that are useful to identify. Nobody cares about the others, except the marketing folks or the person who will have to explain the failure to his boss. In order to improve, the first-class reasons are the ones you need.

First-class reasons show what you need to improve. Second-class reasons show what will help you to improve it.

For example, the fact that the project failed because of the lack of tools such as a bug tracking system or version control (first-class) is relevant technically. It means that unless you set up a bug tracking system and version control, you will fail stupidly again and again. The fact that you don’t have those elementary tools because you don’t have the money to deploy and maintain them (second-class) is irrelevant technically, but it helps when asking for more money for the IT department for this sort of task.

Another example: the fact that the project failed because of the lack of communication between the QA department and the developers (first-class) is relevant technically: unless you improve the communication between the two departments, there will be more and more failures. The fact that the communication is so bad because the leader of the QA department is a moron nobody can bear (second-class) is a reason to fire him and hire a better person instead.

Technically speaking, first-class reasons are crucial for improving the quality of your software. Large, fatal failures are no more than hints that something is going wrong. Not totally wrong, but wrong anyway. It’s not totally wrong to never document the deployment of your product to the server, but if it costs you a $50,000 project and you’re a small company, well, you’d better create a deployment plan for your next projects.

Prepare for failure

You don’t have to wait for a fatal failure to see that you’re failing. As I previously said, fatal failures are a hint that something is going wrong, but they are hints among others. Because of their consequences and their impact on the business, we perceive them as something very special and very important, but as indicators, they are similar to any other hint.

Consider the following phone conversation between two developers:

Lucy: Sorry, have you modified the AutomationWorkflow.cs file?
Robert: I have. Is there something wrong?
Lucy: I believe you’ve erased my work from yesterday.
Robert: Oh… em… I’m sorry. Do you have a backup copy?
Lucy: No, I don’t. Do you?
Robert: I don’t. You know, I never back up the source code.
Lucy: OK. I guess I’ll have to redo the four hours of work I did yesterday. Bye.
Robert: Good luck. Bye.

The project hasn’t failed just because Lucy lost four hours of her work. She can still do the same work again, and Lucy and Robert can deliver the project on time. But this is a good hint which clearly shows that something is going terribly wrong in their company, and it is not really different from a total failure of the entire project caused by the lack of source control and a backup plan.
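The scenario above is exactly what even the most basic source control prevents. A minimal sketch, assuming git as the tool (the file name comes from the dialog; its contents and the commit details are illustrative):

```shell
# Set up a repository for the project (one-time step).
mkdir demo && cd demo
git init -q

# Lucy commits her four hours of work.
echo "Lucy's four hours of work" > AutomationWorkflow.cs
git add AutomationWorkflow.cs
git -c user.name=lucy -c user.email=lucy@example.com \
    commit -q -m "Automation workflow"

# Robert accidentally overwrites the file...
echo "Robert's unrelated change" > AutomationWorkflow.cs

# ...but Lucy's committed version is one command away.
git checkout -q HEAD -- AutomationWorkflow.cs
cat AutomationWorkflow.cs
```

Had Robert committed his change instead of overwriting the file directly, both versions would sit in the history and could be merged; either way, no work is lost.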

This means that instead of waiting until the project fails, you can study failure right now, before it happens for real. Document the lost productivity, document the loss of data or any other issue, and study them. And if one day the whole project fails, study not only the failure itself, but also how you could have predicted it.

Remember: the earlier you fail, the less money is wasted. Failing early enough also gives you the opportunity to start over on an improved basis.

It still fails

Continuous failure assessment is a good way to reduce the failure rate, since it shows what you are doing wrong. It’s like profiling a program: it shows where the program is slow.

But just as knowing the results of profiling doesn’t magically improve your code, continuous failure assessment doesn’t by itself reduce your failure rate in every case. There are a few reasons for this:

  • The failure may be outside the scope of your business. If the database administrator sucks but is your boss’s brother, there is nothing you can do.

  • You may not be able to influence the second-class reasons. For example, if your boss asks you to migrate your data center to a new location in two weeks, your failure rate will be 100%, no matter what. You may be the most competent IT professional, knowing the most advanced techniques of data center migration, but you also know that you need at least six months to do it.

  • The failure is acceptable from the business point of view. For example, you may need two weeks to study the failure, but you know that the team who worked on the failed project has been fired and, in any case, the project was so specific that you’ll never see one like it again. Technically, it would still be interesting to find the first-class reasons for the failure, but from the business point of view, you really shouldn’t waste two weeks on it.

  • You’re doing it wrong. For example, your assessment covers unrelated elements, or you’re drawing the wrong conclusions about how to improve from your observations.