Most performance questions are wrong

Arseni Mourzenko

Founder and lead developer

March 15, 2015

I can't recall a single performance-related question posted on SoftwareEngineering.SE that was a good, on-topic question.

I remember questions which were completely wrong or simply unanswerable.

Performance-related discussions tend to be purely speculative. They could be very concrete, but that requires great effort and a lot of experimentation.

Questions asked too soon

In general, people start asking questions too soon. The worst case is when a person starts worrying before even hitting a performance problem. This usually takes two forms: how to make a given (unprofiled) piece of code faster, and which approach to choose.

How to make a given code faster?

Many programmers start asking this question too soon, that is, before they have any reason to suspect a performance issue. This leads to premature optimization, which in turn leads to unreadable (and, ironically, often slower) code.

What approach to choose?

We mistakenly believe that core choices, such as the choice of a programming language or a database, have substantial consequences on the project, and may be the thing which makes all the difference. I already explained why this belief is completely flawed.

This is why I'm particularly disappointed by questions such as "What is better for scalability for this specific dataset, MongoDB or MySQL?" The discussion is irrelevant and speculative. I can answer by describing a great experience from a similar project where Oracle Database provided outstanding performance and excellent scalability. Another person may say that CouchDB was the answer to all his scalability and performance issues in a very similar project. Both answers may be based on statistical data, that is, objective facts, and still be completely irrelevant, because they are context-dependent and restricted to the scope of a specific project.

There is no absolute answer to questions like this. What the author should do instead is focus on what actually matters: a proper abstraction of the data access layer, so that the choice of database can be revisited later.
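A minimal sketch of what such an abstraction could look like in C#. Every name here (`Customer`, `ICustomerStore`, `InMemoryCustomerStore`, `Greeter`) is hypothetical and invented for illustration. The point is only that business code depends on an interface, so a MongoDB-backed or MySQL-backed implementation could be swapped in later, once real measurements exist.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain type, used only for illustration.
public sealed class Customer
{
    public Customer(int id, string name) { Id = id; Name = name; }
    public int Id { get; }
    public string Name { get; }
}

// The rest of the application depends on this interface only.
public interface ICustomerStore
{
    Customer Find(int id);        // returns null when the id is unknown
    void Save(Customer customer);
}

// In-memory implementation. A class backed by a real database
// would implement the same interface.
public sealed class InMemoryCustomerStore : ICustomerStore
{
    private readonly Dictionary<int, Customer> customers =
        new Dictionary<int, Customer>();

    public Customer Find(int id)
    {
        Customer customer;
        return customers.TryGetValue(id, out customer) ? customer : null;
    }

    public void Save(Customer customer)
    {
        customers[customer.Id] = customer;
    }
}

public static class Greeter
{
    // Business code receives the abstraction, never a concrete database client.
    public static string Greet(ICustomerStore store, int id)
    {
        Customer customer = store.Find(id);
        return customer == null ? "Unknown customer" : "Hello, " + customer.Name;
    }
}
```

With this shape, replacing the storage engine means writing one new class that implements `ICustomerStore`. The callers of `Greeter.Greet` do not change.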

Both situations lead to a paradox: we believe that making a choice right now is easier than postponing it and dealing with the consequences later, but by dealing with it right now, we just shift our focus from important things to stuff we can't predict.

Or can we? This leads to the problem of knowing optimization techniques, which is a different kind of question on Programmers.SE.

Optimization techniques knowledge

Imagine creating a C# application which deals with pictures doing pixel-based stuff such as a diff between two images for motion detection. Also imagine that there are no libraries which do the thing, so we need to do it from scratch.

One way is to access pixels using .NET Framework's GetPixel() method. Simple and straightforward, this approach will quickly result in terrible performance.
Another way is to use unsafe and deal with pointers. Very scary, but still easy and much faster.
Finally, the most brutal way is to delegate the job to a GPU, obtaining excellent performance but at a cost of much greater effort from an ordinary C# programmer.
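The first two approaches can be sketched as follows. This is an illustration under stated assumptions, not a tuned implementation: both bitmaps are assumed to have the same dimensions and a non-premultiplied pixel format, the project must be compiled with unsafe code enabled, and on recent .NET versions `System.Drawing` comes from the `System.Drawing.Common` package and is supported on Windows only. The `PixelDiff` class and its method names are hypothetical.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

public static class PixelDiff
{
    // Straightforward version: one GetPixel() call per pixel per image.
    public static int CountChangedSlow(Bitmap a, Bitmap b)
    {
        int changed = 0;
        for (int y = 0; y < a.Height; y++)
            for (int x = 0; x < a.Width; x++)
                if (a.GetPixel(x, y).ToArgb() != b.GetPixel(x, y).ToArgb())
                    changed++;
        return changed;
    }

    // Pointer version: lock each bitmap once, then compare raw 32-bit pixels.
    public static unsafe int CountChangedFast(Bitmap a, Bitmap b)
    {
        var area = new Rectangle(0, 0, a.Width, a.Height);
        BitmapData da = a.LockBits(area, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        try
        {
            BitmapData db = b.LockBits(area, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
            try
            {
                int changed = 0;
                for (int y = 0; y < area.Height; y++)
                {
                    // Stride is the byte length of one row, padding included.
                    int* rowA = (int*)((byte*)da.Scan0.ToPointer() + y * da.Stride);
                    int* rowB = (int*)((byte*)db.Scan0.ToPointer() + y * db.Stride);
                    for (int x = 0; x < area.Width; x++)
                        if (rowA[x] != rowB[x]) changed++;
                }
                return changed;
            }
            finally { b.UnlockBits(db); }
        }
        finally { a.UnlockBits(da); }
    }
}
```

The pointer version is longer mostly because of the lock and unlock bookkeeping. The comparison loop itself is no harder to read than the `GetPixel()` one.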

Choosing the last option looks like premature optimization: why use an overly complicated solution when the image manipulation may not even be the bottleneck?

On the other hand, the first two alternatives are interesting. It doesn't take much effort to implement the second one from the beginning, and it is likely that the first one will indeed become the bottleneck. Is that premature optimization?

I'm hesitant. If I worked with a C# programmer who wanted to write such an application, I would advise him to use pointers from the beginning. The real performance benefit far outweighs the slightly more complicated code, even in a context where the bottleneck might be somewhere else.

But this is not that important. Imagine a programmer who doesn't know that GetPixel() is inherently slow. So what? He tries the first solution, hits a performance issue, profiles his application and immediately sees that the problem is GetPixel(). He spends the next half hour replacing the current code with a version which uses pointers, the benefit being that he learnt something interesting today (and learning by practicing is much more interesting than learning from someone). Wasting 30 minutes is not the most terrible thing which can happen to a project.

On Programmers.SE, questions of this type are problematic for two reasons:

They are often too localized and may become invalid. What if GetPixel() becomes much faster in the next version of .NET Framework? What if the next version of the framework adds an elegant way to work with bitmaps, hiding all pointer-related stuff?

I remember, a long time ago, replacing a = a + 3; with a++; a++; a++; in C programs. Back then, this simple optimization had an effect which actually mattered. Today, I would immediately flag any a++; a++; a++; in C# during a code review as a mistake. Yes, a mistake. It makes the code difficult to maintain, while having zero impact on performance due to JIT compiler optimizations.

They may attract opinionated answers which are not sufficiently supported by facts, or are simply irrelevant. Instead of helping the person, those answers only make it more difficult to understand the problem and to focus on the important things. It's often simpler to just do the profiling.

Exploring optimization techniques

The last category of performance questions consists of asking not which thing is faster or how to make it faster, but why one thing is faster than another.

Those discussions are usually missing context or don't go deep enough into the implementation details. For example, a question may ask about an optimization technique in Java while forgetting to specify the version of the Java compiler, as well as its options. Different versions may apply different optimizations, and different options may affect performance as well.

A relevant question will list all the specifics (language version, compiler version, compiler options, platform, architecture and how benchmarks were done), and a relevant answer will explain, based on the source code of the compiler, its official documentation or other reliable sources, why some piece of code is faster than another. This is not even close to what happens on SoftwareEngineering.SE.
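As a rough illustration of "listing the specifics", here is a small C# sketch that prints the environment next to the measurement, so both can be pasted into a question together. The `BenchmarkReport` class and its methods are hypothetical names. The sketch also assumes a runtime where `RuntimeInformation` is available (.NET Core, or .NET Framework 4.7.1 and later). Compiler options, such as a Debug versus Release build, are not detected here and would still have to be stated by hand.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class BenchmarkReport
{
    // Describes the runtime and machine which produced the numbers.
    public static string DescribeEnvironment()
    {
        return string.Join(Environment.NewLine,
            "Runtime:      " + RuntimeInformation.FrameworkDescription,
            "OS:           " + RuntimeInformation.OSDescription,
            "Architecture: " + RuntimeInformation.ProcessArchitecture,
            "Processors:   " + Environment.ProcessorCount,
            "Debugger:     " + Debugger.IsAttached);
    }

    // Runs the action once as an unmeasured warm-up (so JIT compilation is
    // not timed), then returns the middle sample of the measured runs, in
    // milliseconds. The runs argument must be at least 1.
    public static double MedianMilliseconds(Action action, int runs)
    {
        action();
        var samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            samples[i] = watch.Elapsed.TotalMilliseconds;
        }
        Array.Sort(samples);
        return samples[runs / 2];
    }
}
```

A hand-rolled harness like this is only a starting point. A dedicated benchmarking tool handles warm-up, outliers and statistical noise far more carefully.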
