
Written in stone

Arseni Mourzenko
Founder and lead developer, specializing in developer productivity and code quality
January 13, 2015

On most projects, a huge amount of time is spent hypothesizing about subjects which not only don't matter, but often don't have a definitive answer. I often see programmers arguing about which of two implementations is faster and which one will scale better, without bothering to ask themselves whether the project will ever be so popular that questions of performance or scale would actually matter.

This is not exclusively a problem of education. A more serious issue is the fear that decisions made today, code written right now, will haunt the team forever, which actually happens more often than it should.

For instance, how many projects can migrate with ease from Entity Framework to Orseis once it's found that it will make the product faster? In most projects, such a change would certainly not be welcomed by the programmers, and would impact the whole code base.

How long would it take to migrate from a home-grown solution to an industrial-grade message queue service?

When such changes are truly painful, it becomes understandable why programmers may spend a few days arguing about which ORM they should use, or whether they should pick an existing message queue service or roll their own (the answer to the second question, by the way, is always “just stop reinventing the wheel”).

The fact is, if the architecture of the project is bearable, comparing the hypothetical merits of different approaches stops making sense:

  • It takes too much time. It's faster to implement one approach, then, and only if it appears to be the wrong one, implement another one and compare both.

  • Most arguments in favor of an alternative given early in the development of the product would be wrong anyway.

  • It simply doesn't matter.

The project I'm working on right now is a perfect example. A few months ago, I didn't know Linux well enough and was suspicious about centralized logging with rsyslog, so I made the mistake of implementing my own solution which used RabbitMQ. When I finally understood that rsyslog is the right way to go, it took just a few minutes to switch the logging mechanism. And I consider this too long, because I screwed up a few things design-wise. Otherwise, it would have consisted of changing two lines of code.
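To illustrate what a "two lines of code" switch can look like, here is a minimal sketch in Python using the standard `logging` module. It is not the code from the project described above: the names `configure_logging`, `syslog_handler`, `handle_request` and the `"app"` logger are hypothetical, and the syslog address is an assumption. The idea is only that application code never names the transport, so replacing a home-grown handler with a syslog one touches a single call at startup.

```python
import logging
import logging.handlers

APP_LOGGER_NAME = "app"  # hypothetical logger name used by the whole application


def configure_logging(handler: logging.Handler) -> None:
    """Attach the one transport handler; the only place that knows about it."""
    logger = logging.getLogger(APP_LOGGER_NAME)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)


def syslog_handler(host: str = "localhost", port: int = 514) -> logging.Handler:
    """Build a handler that sends records to a syslog daemon over UDP.

    The host and port are assumptions; a real setup would point this at
    the central log server.
    """
    return logging.handlers.SysLogHandler(address=(host, port))


def handle_request(path: str) -> None:
    """Application code: it logs, but has no idea where records end up."""
    logging.getLogger(APP_LOGGER_NAME).info("served %s", path)
```

At startup, the application would call `configure_logging(syslog_handler())`. Moving from a custom queue-based handler to syslog, or back, means changing that one argument, while `handle_request` and the rest of the code base stay untouched.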

Another example is Apache logging. Blogs I've read recommend using syslog for Apache access logs as well. But wouldn't it become a bottleneck when the logging servers are hit by a dozen Apache-powered servers?

One way is to start asking questions on ServerFault, search for additional blogs and waste days trying to figure out how fast centralized syslog actually is. This is a question which has no answer, because it depends on the actual performance of the virtual machines, the number of requests on the servers, the patterns, the speed of data transfer between the web servers and the log servers, the ability to distribute the load, etc.

Another way is to actually do the damn job and implement logging. If it works, great. If it doesn't, only then will I search for an alternative, based on actual statistical data.
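Gathering that data does not have to be elaborate. As a rough sketch (again in Python, with a hypothetical helper name `measure_log_latency` and a made-up log message), one could time individual log calls against whichever logger is already in place and look at the distribution. This only measures the cost seen by the caller, not the capacity of the log server, so it is a starting point rather than a benchmark.

```python
import logging
import math
import statistics
import time


def measure_log_latency(logger: logging.Logger, samples: int = 1000) -> dict:
    """Time `samples` individual log calls; return a summary in microseconds.

    `samples` must be at least 1.
    """
    durations = []
    for i in range(samples):
        start = time.perf_counter()
        logger.info("GET /page/%d 200", i)  # made-up access-log style message
        durations.append((time.perf_counter() - start) * 1e6)

    durations.sort()
    # Nearest-rank 99th percentile: the smallest value that is greater than
    # or equal to 99% of the samples.
    p99_index = min(len(durations) - 1, math.ceil(0.99 * len(durations)) - 1)
    return {
        "median_us": statistics.median(durations),
        "p99_us": durations[p99_index],
        "max_us": durations[-1],
    }
```

Running this once with the current handler, and once with the candidate replacement if the numbers look bad, gives a comparison grounded in the actual machines and network instead of in somebody's blog post.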