Take risks, fail fast
In many software development teams, the source code is sacred. This sacredness comes from two factors:
- The code is the ultimate expression of the intellectual work of the developers. If you have anything against the code, it means that you're against its author.
- If you touch the code, anything can happen. The service may crash, and your pager will ring, because it's all your fault.
This sacredness also has two effects:
- You should not modify code unless you have a serious reason to do so.
- Everything you put into the source code is nearly permanent.
Naturally, the first effect leads to code rot and ever-increasing technical debt. The second effect, however, is more subtle, and probably more interesting. This article is about that second effect.
There are two approaches to solving a problem in software development: either you experiment, or you think ahead.
When you experiment, you try different approaches, moving in iterations towards the goal. Sometimes you'll succeed; other times you'll fail, maybe even miserably, requiring you to back up a few iterations and restart with a different approach. Facing the unknown, you try different things and receive fast, sometimes immediate, feedback that helps you decide whether to continue. For instance, facing a specific logical challenge, you can proceed by iterations: in every iteration you write a test, then modify your code to make the test pass, then refactor and ask yourself whether you're on the right track. Sometimes you'll have to remove a bunch of tests, because you'll find that you misunderstood the problem, or that the solution you had in mind won't work.
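To make this concrete, here is a minimal sketch of one such iteration in Python; the `shipping_cost` function and its pricing rule are hypothetical, invented purely for illustration. The test is written first, the implementation is adjusted until it passes, and then both are reconsidered.

```python
# One red-green-refactor iteration. The pricing rule is invented for this example.

def shipping_cost(weight_kg: float) -> float:
    """Flat rate up to 1 kg, then a per-kilogram surcharge on the excess."""
    if weight_kg <= 1.0:
        return 5.0
    return 5.0 + (weight_kg - 1.0) * 2.0

# These tests were written before the code above existed; running them (e.g. with
# pytest) tells you immediately whether the current attempt is on the right track.
def test_light_parcel_pays_flat_rate():
    assert shipping_cost(0.5) == 5.0

def test_heavy_parcel_pays_surcharge_on_excess_weight():
    assert shipping_cost(3.0) == 9.0
```

If the next test reveals that the whole approach was wrong, the tests and the implementation go into the bin together, and you have lost minutes, not weeks.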
When you think ahead, you don't start by writing code. You start by holding meetings. And writing documents. And searching for all the possible solutions. And taking great care to select the best solution to the problem. Only when you have chosen one can you get back to your keyboard.
The distinction looks a lot like the one between Agile methodologies and the classical Waterfall and V-Model approaches. Indeed, that's essentially what it is. Do you recall why Agile methodologies appeared in the first place? One of the reasons is that the classical approaches just don't work for most projects. When you think ahead and come up with your beautiful solution which looks so good on paper, as soon as you start coding you discover things you hadn't thought about. Once you discover them, what should you do? Hold all those meetings again, and rewrite all the documentation and the UML diagrams? Probably not. So you adapt the design to the constraints you discovered during the coding step, and then you do it again, and again, and again, and when the code finally ships, it doesn't look at all like the original UML diagrams. In that case, what was the value of all those meetings and documents? Why not spend those wasted hours doing something which... I don't know... brings some value to the company?
This also means that it would be pretentious to believe that you can find the best solution to a programming problem by holding a bunch of meetings. You did the meetings and you drew the UML diagrams, and once you started coding, you found that those UML diagrams made no sense, right? So what makes you think that you actually chose the right solution in the first place? And if you admit that you didn't choose the best solution, then what the heck were all those meetings about?
Experimentation is just so much easier. You try something. If it works, great. If it doesn't, well, you throw the code away and start over; no big deal. With the think-ahead approach, things are very different. Imagine that eight people spent four hours in meetings talking about the problem. Then two people spent six hours each producing a bunch of documents and UML diagrams explaining how the feature should be implemented. Then you try to implement it that way, and it looks like it won't work, or it will, but not quite as well as you thought. What do you do? By producing the documentation and the UML diagrams, by deciding, all together, to go with this solution, you passed the point of no return. The sunk cost fallacy works extremely well here: you can't possibly throw everything away, because you would be throwing away not just your own hour of work, but the forty-four hours of work of your entire team. This is not an easy decision to take, and peer pressure won't make it easier (“Sure, you encountered some difficulties, but have you tried hard enough to solve them? Remember, we decided all together, including you, that this is the right solution; you can't just decide, alone, that it's not a good one!”)
Note that I intentionally avoided the term prototyping in this article. Prototyping means “writing crappy code which you know you'll throw away anyway.” Here, on the other hand, I don't limit myself to actual prototypes: experimentation can work at any scale, up to whole projects. Google does that all the time. Other companies do it too, some intentionally, some forced by economic pressure. This means that the code produced by an experiment could (and often will) find itself running in production. After all, some ideas which look great during development turn out to be disastrous when faced with actual users. Therefore, it should be possible to undo an experiment at any stage. This, in turn, leads to the discussion about the quality of the architecture.
Too many products are written as if the code will stay in its current form forever. There is no separation of concerns, everything is entangled, and interfaces, when they exist, are leaky. In such software products, every change could lead to a disaster. Such environments stifle experimentation. The funny thing is that reliance on the think-ahead approach, in turn, leads to poor architecture, which discourages experimentation even more. While I haven't done any formal studies, my personal observation from dozens of projects is that teams who heavily rely on experimentation have much better architecture, and teams who never experiment have no architecture whatsoever (badly implemented dependency injection doesn't count as architecture, by the way).
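To make “separation of concerns” and “leaky interfaces” a bit more concrete, here is a hypothetical Python sketch; the entities and the query are invented for illustration. In the leaky version, the business layer drives a database cursor itself, so every storage change ripples upward; in the isolated version, it depends only on a small contract that an experiment can re-implement freely.

```python
from dataclasses import dataclass
from typing import Protocol

# Leaky boundary: the business logic manipulates the cursor and the SQL directly,
# so the schema, the SQL dialect, and even the choice of a relational database
# leak into every caller.
def top_customers_leaky(cursor, limit: int):
    cursor.execute("SELECT id, name FROM customers ORDER BY revenue DESC")
    return cursor.fetchmany(limit)  # raw driver rows escape into the business layer

# Isolated boundary: the business logic only knows this small contract.
@dataclass
class Customer:
    id: str
    name: str

class CustomerRepository(Protocol):
    def top_by_revenue(self, limit: int) -> list[Customer]: ...

def top_customers(repo: CustomerRepository, limit: int) -> list[Customer]:
    # Swapping the storage engine means writing a new repository, not touching this.
    return repo.top_by_revenue(limit)
```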
You've probably heard of the “if it hurts, do it more often” rule in relation to automation in particular and workflows in general. Here, it's quite similar: if it's so painful to swap parts of the application, then you should start swapping a lot. If you're a technical lead or a project manager, you can train your team this way. When you need a feature, ask the team to implement it, but slightly modify the request: tell them that the details are likely to change later. Once the feature is implemented, ask for the changes, and look at the reaction (“Are you kidding me?! We can't change that now!”) and at the time the developers spend implementing the change. Then do the retrospective: what processes or elements inherent to the software product explain why it took so long? And, more importantly, how can the risk of similar delays be mitigated in the future? With enough insistence and repeated exposure to volatile requirements, the team will eventually get into the habit of keeping a decent architecture, with proper isolation between the layers and all the techniques which make it possible to modify the source code without fear of causing mayhem. Naturally, if the team is not skillful enough in architecture, don't do this: you'll just make things worse.
Practical example
Let me finish with a practical (and classical) example of experimentation and its relation to architecture. Imagine you were asked to develop a website based on the corporate Oracle database. It's a small website, but it does some fancy stuff which requires a bunch of data-level optimizations; in other words, there is much more to it than CRUD.
By the way, while your customer believes that his corporate Oracle database can be used by your app, there is an ongoing discussion about the high cost of that database, so there is a slight risk that the project might have to use a PostgreSQL cluster in AWS instead. What changes would you make to the architecture? Would you isolate the data access layer from the business layer more strictly?
But wait: you started to develop the application, and, frankly, a relational database may not be the best choice. You are pretty sure you would achieve better performance at a much lower cost with ElasticSearch. You run a small experiment, which appears conclusive. You talk to your client, who is amazed by the results and accepts the idea of moving part of the data to ElasticSearch. Now that you have two very different data sources, how is that reflected in the architecture? Was the move difficult? If so, do you think the difficulty is related to shortcuts taken at the architecture level?
More importantly, how difficult would it be to perform these successive changes:
- If, originally, you thought ahead, believing that the code would never change?
- If, originally, you knew that things would change, often radically?
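In the second case, the boundary between the business layer and the data might look something like the following sketch. All names and queries here are hypothetical, and the ElasticSearch call assumes the 8.x Python client; treat it as an illustration of the shape of the boundary, not as a reference implementation. Moving from Oracle to PostgreSQL, or moving part of the data to ElasticSearch, then means adding or rewriting one repository class, while everything built on top of `ProductRepository` stays untouched.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Product:
    id: str
    name: str
    price: float

class ProductRepository(Protocol):
    """The only thing the business layer knows about data storage."""
    def search(self, text: str) -> list[Product]: ...

class SqlProductRepository:
    """Backed by the corporate Oracle database. A PostgreSQL cluster would change
    the connection, the bind style and a few SQL details, but nothing above this class."""
    def __init__(self, connection):
        self._connection = connection  # any DB-API 2.0 connection

    def search(self, text: str) -> list[Product]:
        cursor = self._connection.cursor()
        cursor.execute(
            "SELECT id, name, price FROM products WHERE LOWER(name) LIKE :pattern",
            {"pattern": f"%{text.lower()}%"},
        )
        return [Product(id=r[0], name=r[1], price=r[2]) for r in cursor.fetchall()]

class ElasticProductRepository:
    """Backed by ElasticSearch; the client is injected, so the business layer never sees it."""
    def __init__(self, client):
        self._client = client

    def search(self, text: str) -> list[Product]:
        response = self._client.search(index="products", query={"match": {"name": text}})
        return [Product(**hit["_source"]) for hit in response["hits"]["hits"]]

# The business layer is written once, against the protocol, and survives the swap.
def cheapest_match(repo: ProductRepository, text: str) -> Product | None:
    results = repo.search(text)
    return min(results, key=lambda p: p.price) if results else None
```

And if the ElasticSearch experiment turns out to be a bad idea once real users hit it, undoing it is equally cheap: delete one repository class and change one line of wiring.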