Written in stone

Arseni Mourzenko
Founder and lead developer
January 14, 2015
Tags: short, rant, performance

On most projects, a huge amount of time is spent hypothesizing about subjects which not only don't matter, but often don't have a definitive answer. I often see programmers arguing about which of two implementations is faster and which will scale better, without bothering to ask themselves whether the project will ever be popular enough for questions of performance or scale to actually matter.

This is not exclusively a problem of education. A more serious issue is the fear that decisions made today, and code written right now, will haunt the team forever, which actually happens more often than it should.

For instance, how many projects can migrate with ease from Entity Framework to Orseis once it's found that it will make the product faster? In most projects, such a change will certainly not be welcomed by the programmers, and would impact the whole code base.

How long would it take to migrate from a home-grown solution to an industrial-grade message queue service?

When such changes are truly painful, it becomes understandable why programmers may spend a few days arguing about which ORM they should use, or whether they should pick an existing message queue service or roll their own (the answer to the second question, by the way, is always “just stop reinventing the wheel”).

The fact is, if the architecture of the project is bearable, comparing the hypothetical merits of different approaches stops making sense:

The project I'm working on right now is a perfect example. A few months ago, I didn't know Linux well enough and was suspicious about centralized logging with rsyslog, so I made the mistake of implementing my own solution, which used RabbitMQ. When I finally understood that rsyslog is the right way to go, it took just a few minutes to switch the logging mechanism. And I consider this too long, because I screwed up a few things design-wise. Otherwise, it would have consisted of changing two lines of code.
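As a rough illustration of what such a small switch can look like (a Python sketch, not the code of the project described above), the application could obtain its logger from a single factory, so that the transport is chosen in exactly one place. The function names, the `app` logger name, the backend labels, and the syslog address are all assumptions made for the example.

```python
import logging
import logging.handlers


def build_handler(backend: str) -> logging.Handler:
    """Return the handler for the chosen backend (backend names are hypothetical)."""
    if backend == "syslog":
        # Sends records over UDP to a syslog daemon; host and port are assumptions.
        return logging.handlers.SysLogHandler(address=("localhost", 514))
    if backend == "memory":
        # Stand-in for any other transport, such as a message queue publisher.
        return logging.handlers.BufferingHandler(capacity=1000)
    raise ValueError(f"unknown logging backend: {backend}")


def configure_logging(backend: str) -> logging.Logger:
    """Attach exactly one handler to the application logger."""
    logger = logging.getLogger("app")
    logger.handlers.clear()
    logger.addHandler(build_handler(backend))
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With this shape, the rest of the code base only ever calls `logging.getLogger("app")`, and moving from one transport to another means changing the argument passed to `configure_logging` at startup.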

Another example is Apache logging. Blogs I've read recommend using syslog for Apache access logs as well. But wouldn't it become a bottleneck when the logging servers are hit by a dozen Apache-powered servers?

One way is to start asking questions on ServerFault, search for additional blogs, and waste days trying to figure out how fast centralized syslog actually is. That question has no answer, because it depends on the actual performance of the virtual machines, the number of requests on the servers, the patterns, the speed of data transfer between the web servers and the log servers, the ability to distribute the load, etc.

Another way is to actually do the damn job and implement logging. If it works, great. If it doesn't, only then will I search for an alternative, based on actual statistical data.
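Gathering that kind of data does not need to be elaborate. The following Python sketch times individual log calls against whatever handler is configured and summarizes them. The function name, the sample message, and the choice of median, 95th percentile, and maximum are assumptions for illustration; real numbers would have to come from the production servers under real traffic.

```python
import logging
import statistics
import time


def measure_log_latency(logger: logging.Logger, samples: int = 1000) -> dict:
    """Time individual log calls and summarize them in milliseconds."""
    durations = []
    for i in range(samples):
        start = time.perf_counter()
        logger.info("GET /index.html 200 sample=%d", i)
        durations.append((time.perf_counter() - start) * 1000)
    return {
        "samples": samples,
        "median_ms": statistics.median(durations),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        "p95_ms": statistics.quantiles(durations, n=20)[-1],
        "max_ms": max(durations),
    }
```

Running this against the logger that is actually deployed, rather than a synthetic setup, is what turns the bottleneck question from speculation into something that can be answered.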