Don't use it, it's slow
Recently, I received an e-mail like this from a colleague of mine, sent to a few people in the company:
“FYI, we have removed the SQL computed columns used when computing the gauges. The gain is impressive: the prices now load immediately, while before, they took up to fifteen seconds to load. Therefore, I recommend not using computed columns in SQL Server.”
The only acceptable reaction to that would be:
WHAT THE…! ARE YOU CRAZY, DUDE?!
Surprisingly, some companies have the opposite reaction and welcome such nonsense.
I am starting to believe that there are people who don’t get performance and benchmarking, and that nothing will help them. For them, benchmarking is:
A buzzword. It doesn’t matter what it means. It sounds nice, so why not use it?
A sort of voodoo magic that consists of randomly changing code until you get some performance gain somewhere. It doesn’t matter whether that gain indicates a real optimization, or whether anybody even cares about it. Nor does it matter whether those random changes negatively affected the project somewhere else.
A way to seek approval for their work (if we accept that wasting days or weeks doing useless crap can be called “work”), with no risk of being criticized. Something was “slow”, and now it’s “fast”. We don’t care about the details. We don’t care what it means. We don’t care about the implications. It was “bad”, and now it’s “good”. Objectivity rocks.
This deep misunderstanding is particularly harmful on three levels.
The first thing to be harmed is the code base itself. Voodoo optimization often degrades the code base and causes serious issues elsewhere in the project. Often, those “optimizations” slow the product down significantly: unless the change actually fixes a bug or replaces an inappropriate algorithm with an appropriate one, a decrease in performance is the usual outcome.
The second thing to be harmed is the project itself. Voodoo optimization performed by inexperienced programmers usually takes time. Randomly changing code in order to move from “bad” to “good” can take days or weeks. Since there is no profiling, the process can only be random, so instead of working on the 4% of the code base that causes 50% of the performance issues, one works on 100% of the code base, i.e. 25 times more than needed. The lack of both proper tools and proper techniques makes things even more difficult. If we also consider that the “optimization” slows down the product and decreases the quality of the code base without bringing anything useful, it’s easy to see how harmful such tasks can be.
The third thing to be harmed is the company. By accepting such practices, the company builds a culture based on myths rather than facts. When unverified, unfounded assertions of this type become accepted practice, serious issues will appear sooner or later, with dreadful consequences for the company’s ability to make reasonable choices. Anything may become a fact, and nonsense such as “I remember working with Python. It’s terribly slow. We should really use Java instead.” becomes a commonly accepted fact, since nobody questions assertions related to performance.
Not only do people stop questioning the assertion itself, they also stop thinking about the generalization. Since we don’t have any actual statistical data, we can’t tell what the context is. Somebody did an ad-hoc performance comparison of A versus B in a specific context, using specific tools, under specific circumstances. Given the lack of details, generalization occurs, and the accepted fact becomes “A is always, globally slower than B”. It has nothing to do with the original comparison, but who cares?
I try to imagine a scientist attempting to publish a report such as:
“I’ve done some research, and found that eating ice cream every day increases the risk of a heart attack. I advise everyone to stop eating ice cream: a person who never eats ice cream cannot have a heart attack.”
Such nonsense is impossible in the scientific community. Yet similar nonsense is common in many IT companies.
What would a correct performance report look like?
If we were comparing one technical choice against another, we could indeed find that alternative A is faster than alternative B. A report could be made from this observation. This report would mention:
How the profiling was done. Your way of gathering the data can be wrong, so I want to be able to verify it. For example, timing a method with Stopwatch in C# is not the correct way to measure its performance (see the sketch after this list for what a more rigorous measurement could look like).
The exact metrics. Your interpretation of the data should be verifiable, so I want to see the data.
The context. There is no such thing as context-agnostic profiling or context-agnostic benchmarking. Profiling of an algorithm done in a given context is irrelevant for another language, another platform, another framework, another piece of hardware, or another compiler.
The studied case. It makes no sense to compare two programming languages or two ORMs in the abstract. What was actually tested? Take a comparison between sort algorithms: was the benchmark done on multiple sets of a few thousand values, or on a single set of trillions of entries? Was it comparing numbers or blobs?
The implementation itself. What if the alternative that, according to your claims, is slow is simply implemented with a bug?
The possible side effects. Take a comparison between sort algorithms: was the priority higher speed or a lower memory footprint? What if the “faster” algorithm uses an enormous amount of memory, and thus cannot be used for large sets?
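To illustrate the point about Stopwatch, here is a minimal sketch of what a more rigorous measurement could look like in C#, using BenchmarkDotNet as one possible tool. The class name, the method names and the input sizes below are hypothetical, chosen only to echo the sort-algorithm example from the list; the point is that such a tool handles warm-up, repeated iterations and statistical reporting, which a naive Stopwatch loop does not.

    using System;
    using System.Linq;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    // Hypothetical benchmark comparing two ways of sorting an array of integers.
    public class SortBenchmarks
    {
        private int[] _data;

        // Measure on several input sizes, since the results may differ between them.
        [Params(1_000, 1_000_000)]
        public int Size;

        [GlobalSetup]
        public void Setup()
        {
            var rng = new Random(42);   // fixed seed, so that runs are reproducible
            _data = Enumerable.Range(0, Size).Select(_ => rng.Next()).ToArray();
        }

        [Benchmark(Baseline = true)]
        public int[] ArraySort()
        {
            var copy = (int[])_data.Clone();   // sort a copy, not the shared input
            Array.Sort(copy);
            return copy;
        }

        [Benchmark]
        public int[] LinqOrderBy()
        {
            return _data.OrderBy(x => x).ToArray();
        }
    }

    public static class Program
    {
        public static void Main() => BenchmarkRunner.Run<SortBenchmarks>();
    }

Run in Release mode, such a benchmark reports the mean, the error margin and the ratio to the baseline for each input size, which is exactly the kind of raw, verifiable data the elements above ask for.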
If the report lacks one of those elements, its credibility should be questioned, and the author can be invited to work on it a bit more. If the report lacks two or more of those elements, it is pretty useless and should be thrown away.