Take risks, fail fast

Arseni Mourzenko
Founder and lead developer
April 11, 2019
Tags: productivity, quality, refactoring, rant, requirements

In many software development teams, the source code is sacred. Its sacredness is caused by two factors:

  1. The code is the ultimate expression of the intellectual work of the developers. If you have anything against the code, it means that you're against its author.
  2. If you touch the code, anything can happen. The service may crash, and your pager will ring, because it's all your fault.

It also has two effects:

  1. You should not modify code unless you have a serious reason to do so.
  2. Everything you put in the form of source code is nearly permanent.

Naturally, the first effect leads to code rot and ever-increasing technical debt. The second effect is more subtle, however, and probably more interesting. This article is about that effect.

There are two approaches to solving a problem in software development: either you experiment, or you think ahead.

When you experiment, you try different approaches, moving toward the goal in iterations. Sometimes you'll succeed; other times you'll fail, maybe even miserably, requiring you to back up a few iterations and restart with a different approach. Facing the unknown, you try different approaches and receive sometimes immediate, usually fast feedback that helps you decide whether to continue. For instance, facing a specific logical challenge, you can proceed in iterations: in each one, write a test, modify your code to make the test pass, then refactor and ask yourself whether you're on the right track. Sometimes you'll have to remove a bunch of tests, because you'll find that you misunderstood the problem, or that the solution you had in mind won't work.
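
One such iteration might look like the sketch below. The function name and requirements are hypothetical; the point is the shape of the loop: the tests come first, the code exists only to satisfy them, and both are cheap to throw away if the next iteration proves the approach wrong.

```python
# Hypothetical TDD iteration: each test captures one small requirement
# and drives the implementation that follows.

# Step 1: write the tests first; they fail until the code exists.
def test_slugify_replaces_spaces():
    assert slugify("hello world") == "hello-world"

def test_slugify_lowercases():
    assert slugify("Hello World") == "hello-world"

# Step 2: the simplest implementation that makes the tests pass.
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")

# Step 3: run the tests and decide whether to continue. If the
# assumption behind a test was wrong, delete it along with the code:
# the cost of backing up is one short iteration, not weeks of design.
test_slugify_replaces_spaces()
test_slugify_lowercases()
```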

When you think ahead, you don't start by writing code. You start by holding meetings. And writing documents. And searching for all the possible solutions. And taking great care to select the best solution to the problem. Only once you have chosen one can you get back to your keyboard.

The distinction looks a lot like the one between Agile methodologies and the classical Waterfall and V-Model approaches. Indeed, that's exactly what it is about. Do you recall why Agile methodologies appeared in the first place? One of the reasons is that classical approaches just don't work for most projects. When you think ahead and come up with your beautiful solution which looks so good on paper, as soon as you start coding, you discover things you hadn't thought about. Once you discover them, what should you do? Hold all those meetings again, and rewrite all the documentation and the UML diagrams? Probably not. So you adapt the design to the constraints you discovered during the coding step, then you do it again, and again, and again, and when the code is finally shipped, it doesn't look at all like the original UML diagrams. In this case, what was the value of all those meetings and documents? Why not spend those wasted hours doing something which... I don't know... brings some value to the company?

This also means it would be pretentious to believe that you can find the best solution to a programming problem through a bunch of meetings. You did the meetings and you drew the UML diagrams, and once you started coding, you found that those diagrams make no sense, right? So what makes you think that you actually chose the right solution in the first place? And if you admit that you didn't choose the best solution, then what the heck were all those meetings about?

Experimentation is just so much easier. You try something. If it works, great. If it doesn't, well, you throw the code away and start over; no big deal. With the think-ahead approach, however, things are very different. Imagine that eight people spent four hours talking about the problem in meetings. Then two people worked six hours each to produce a bunch of documents and UML diagrams explaining how the feature should be implemented. Then you try to implement it that way, and it looks like it won't work, or it will, but not quite as well as you thought. What do you do? By producing the documentation and the UML diagrams, by deciding, all together, to go with this solution, you passed the point of no return. The sunk cost fallacy works extremely well here: you can't possibly throw everything away, because you wouldn't be throwing away just your own past hour of work, but the forty-four hours of work from everyone on your team. This is not an easy decision to make, and peer pressure won't make it easier ("Sure, you encountered some difficulties, but have you tried hard enough to solve them? Remember, we decided all together, including you, that this is the right solution; you can't just decide, alone, that it's not a good one!")

Note that I intentionally avoided the term prototyping in this article. Prototyping means "writing crappy code which you know you'll throw away anyway." I, on the other hand, don't limit myself to actual prototypes: experimentation can work at any scale, up to and including whole projects. Google does that all the time. Other companies do it too, some intentionally, some forced by economic pressure. This means that code issued from experimentation could (and often will) find itself running in production. After all, some ideas which look great during development turn out to be disastrous when faced with actual users. Therefore, it should be possible to undo the experiment at any stage. This, in turn, leads to a discussion about the quality of the architecture.
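
One common way to keep an experiment reversible in production is a feature flag. The sketch below is a minimal illustration under assumed names (the flag, the environment variable, and the pricing rule are all invented for the example): turning the flag off restores the old behavior without touching the code.

```python
import os

def checkout_total(items, experimental_pricing=None):
    """Compute an order total, with an experimental pricing path
    behind a flag. Flag name and mechanism are illustrative."""
    if experimental_pricing is None:
        # In production the flag would come from configuration;
        # here, a hypothetical environment variable.
        experimental_pricing = os.environ.get("NEW_PRICING") == "1"
    if experimental_pricing:
        # The experiment: a flat 5% discount. If users hate it,
        # unset the flag and the old path below is back instantly.
        return round(sum(i["price"] for i in items) * 0.95, 2)
    return sum(i["price"] for i in items)
```

Because both paths coexist, "undoing the experiment" is a configuration change rather than a revert, which keeps the cost of failure low.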

Too many products are written as if the code will stay in its current form forever. There is no separation of concerns, everything is mangled together, and interfaces, when they exist, are leaky. In such software products, every change could lead to a disaster. Such environments stifle experimentation. The funny thing is that reliance on the think-ahead approach, in turn, leads to poor architecture, which discourages experimentation even more. While I haven't done actual studies, my personal observation from dozens of projects is that teams who rely heavily on experimentation have much better architecture, and teams who never experiment have no architecture whatsoever (badly implemented dependency injection doesn't count as architecture, by the way).

You've probably heard of the "if it hurts, do it more often" rule, in relation to automation in particular and workflows in general. Here it's quite similar: if swapping parts of the application is so painful, then you should start swapping a lot. If you're a technical lead or a project manager, you can train your team like this. When you need a feature, ask the team to implement it, but slightly modify the request: tell them that the details are likely to change later. Once the feature is implemented, ask for the changes, then watch the reaction ("Are you kidding me?! We can't change that now!") and the time the developers spend implementing the change. Then hold a retrospective: what processes or elements inherent to the software product could explain why it took so long? And, more importantly, how can the risk of similar delays be mitigated in the future? With enough insistence and repeated exposure to volatile requirements, the team will finally get into the habit of maintaining a decent architecture, with proper isolation between the layers and all the techniques which make it possible to modify the source code without fearing mayhem. Naturally, if the team is not skillful enough in architecture, don't do this: you'll just make things worse.

Practical example

Let me finish with a practical (and classical) example of experimentation and its relation to architecture. Imagine you were asked to develop a website based on the corporate Oracle database. It's a small website, but it does some fancy stuff which requires a bunch of optimizations when it comes to the data; in other words, there is much more to it than CRUD.

By the way, while your customer believes that his corporate Oracle database can be used by your app, there is an ongoing discussion about the high cost of the database, so there is a slight risk that the project might have to use a PostgreSQL cluster in AWS instead. What changes would you make in terms of architecture? Would you build better isolation between the data access layer and the business layer?
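
That isolation might look something like the sketch below (all names are hypothetical). The business layer depends only on a small repository interface; swapping Oracle for PostgreSQL then means writing one new repository class, not rewriting the business logic.

```python
from abc import ABC, abstractmethod

class OrderRepository(ABC):
    """The only data-access type the business layer is allowed to see."""
    @abstractmethod
    def find_recent(self, limit: int) -> list:
        ...

class OracleOrderRepository(OrderRepository):
    """Would issue Oracle-specific SQL; a PostgresOrderRepository
    implementing the same interface could replace it later."""
    def find_recent(self, limit):
        raise NotImplementedError("requires a real Oracle connection")

class InMemoryOrderRepository(OrderRepository):
    """Used in tests, and a standing proof that the backend is swappable."""
    def __init__(self, rows):
        self._rows = rows
    def find_recent(self, limit):
        return self._rows[:limit]

def recent_order_report(repo: OrderRepository, limit: int = 3) -> list:
    # Business logic: it works against the interface and never learns
    # which database (if any) sits behind it.
    return [row["id"] for row in repo.find_recent(limit)]
```

The design choice here is deliberate narrowness: the interface exposes only what the business layer actually needs, so no Oracle-specific behavior can leak through it.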

But wait: you started to develop the application, and, frankly, a relational database may not be the best choice. You are pretty sure you would achieve better performance at a much lower cost using ElasticSearch. You run a small experiment, which turns out to be conclusive. You talk to your client, who is amazed at the results you obtained and accepts the idea of moving part of the data to ElasticSearch. Now that you have two very different data sources, how is that reflected in the architecture? Was the move difficult? If so, do you think the difficulty is related to the shortcuts which were taken at the architecture level?
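
With two backends, one workable shape (sketched below with invented names; the real classes would wrap ElasticSearch and the relational database) is a single facade that the business layer talks to, with each backend hidden behind its own narrow interface:

```python
class SearchIndex:
    """Stand-in for an ElasticSearch wrapper: term -> matching ids."""
    def __init__(self, ids_by_term):
        self._ids = ids_by_term
    def matching_ids(self, term):
        return self._ids.get(term, [])

class ProductStore:
    """Stand-in for the relational store: id -> full record."""
    def __init__(self, rows):
        self._rows = rows
    def load(self, product_id):
        return self._rows[product_id]

class Catalog:
    """The only type the business layer sees. Swapping either backend
    later touches this file, not the callers."""
    def __init__(self, index, store):
        self._index = index
        self._store = store
    def search(self, term):
        # Search resolves ids in the index, then loads records
        # from the primary store.
        return [self._store.load(pid) for pid in self._index.matching_ids(term)]
```

If the earlier Oracle-to-PostgreSQL isolation was done properly, adding the second data source is mostly a matter of introducing this facade; if shortcuts were taken, this is exactly where they start to hurt.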

More importantly, how difficult would it be to perform the successive changes: