Why is positive premature optimization still wrong?
When junior programmers engage in premature optimization, it is easy to see how it harms the project. They guess that a given change will make their code faster, and since they don't measure the actual performance, more often than not they end up with code that is not just harder to maintain, but often slower as well.
Now, some developers with a bit more experience in a given language are often tempted to write code in a way that makes it optimized. As they have extensive knowledge of the inner workings of a language or its framework, they also pick up many tricks related to performance along the way: little techniques that make a great ice breaker when they meet their fellows during mandatory corporate team building sessions.
Those little optimizations often do work; that is, they do make the code faster.
That's a good thing, right?
Recently, I had an opportunity to work with such developers.
Did you know that, in C#, `Count > 0` is faster than `Any()`? They do.
Or did you know that it's faster to use `x = Array.Empty<int>();` than `x = [];`? They do.
What's the actual impact, in terms of memory, when you replace a conditional inside a loop with a `Where()` at its source? I can't tell you that, but they can.
Let's do a few benchmarks.
As I said, one of the optimizations is to rely on `Array.Empty<T>()` when creating an empty collection. Let's compare it to `[]`.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<ArrayEmpty>();

[SimpleJob(RuntimeMoniker.Net10_0)]
[PlainExporter]
public class ArrayEmpty
{
    [Benchmark] public ICollection<int> ArrayDotEmpty() => Array.Empty<int>();
    [Benchmark] public ICollection<int> SquareSquare() => [];
}
```
| Method | Mean | Error | StdDev |
|---|---|---|---|
| ArrayDotEmpty | 0.0000 ns | 0.0000 ns | 0.0000 ns |
| SquareSquare | 4.4372 ns | 0.0310 ns | 0.0290 ns |
For some reason, I always thought that the compiler would optimize `[]` by replacing it with something similar to `Array.Empty<T>()`. It appears that not only does it not do that, but the `[]` approach actually creates an instance of a collection, an operation that costs 4.4 nanoseconds on my machine.
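You can see this for yourself with a couple of lines. The concrete type behind `[]` is an implementation detail of the compiler, which is exactly why the sketch below prints it instead of hardcoding it:

```csharp
using System;
using System.Collections.Generic;

ICollection<int> fromArrayEmpty = Array.Empty<int>();
ICollection<int> fromCollectionExpression = [];

Console.WriteLine(fromArrayEmpty.GetType());           // System.Int32[]
Console.WriteLine(fromCollectionExpression.GetType()); // whatever mutable collection the compiler picked

// Array.Empty<T>() hands out the same cached instance every time; that's why it never allocates.
Console.WriteLine(ReferenceEquals(Array.Empty<int>(), Array.Empty<int>())); // True
```

The target type matters here: because `ICollection<int>` is a mutable interface, the compiler has to produce a collection that supports `Add()`, so it cannot reuse a cached empty array.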
Let's try another one. Do you know about CA1860? The one that says that whenever you have a collection, you should never ever use `Any()`, unless “performance isn't a concern.”
```csharp
public class CountOrAny
{
    private ICollection<int> collection;

    [Params(0, 1, 10)] public int length;

    [GlobalSetup] public void Setup() => collection = [..Enumerable.Range(0, length)];

    [Benchmark] public bool Any() => collection.Any();
    [Benchmark] public bool Count() => collection.Count > 0;
}
```
| Method | Length | Mean | Error | StdDev |
|---|---|---|---|---|
| Any | 0 | 0.3292 ns | 0.0040 ns | 0.0037 ns |
| Count | 0 | 0.1827 ns | 0.0043 ns | 0.0040 ns |
| Any | 1 | 0.2503 ns | 0.0032 ns | 0.0025 ns |
| Count | 1 | 0.2246 ns | 0.0019 ns | 0.0016 ns |
| Any | 10 | 0.2525 ns | 0.0091 ns | 0.0081 ns |
| Count | 10 | 0.2216 ns | 0.0070 ns | 0.0058 ns |
On a collection that has elements, using `.Count > 0` instead of `.Any()` saves up to 0.03 nanoseconds. On an empty collection, it saves 0.15 nanoseconds.
What about using LINQ? Those guys really despise LINQ, as, according to them, it has a negative impact on memory. During a recent review, a developer had a method that, given a collection, would filter its elements based on a criterion and return the sum of the elements that passed the filter. He used LINQ's `Where()` and `Sum()`, and was invited by a reviewer to rewrite the code using a plain loop and a conditional statement.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var values = Enumerable.Range(0, 100);

Console.WriteLine($"Loop: {MeasureMemory(() => Loop(values))} bytes.");
Console.WriteLine($"Partial: {MeasureMemory(() => Partial(values))} bytes.");
Console.WriteLine($"Linq: {MeasureMemory(() => Linq(values))} bytes.");

static long MeasureMemory(Action action)
{
    action(); // Warm up the JIT so compilation doesn't count as allocation.
    var before = GC.GetAllocatedBytesForCurrentThread();
    action();
    var after = GC.GetAllocatedBytesForCurrentThread();
    return after - before;
}

static int Loop(IEnumerable<int> collection)
{
    var sum = 0;
    foreach (var x in collection)
    {
        if (x % 2 == 0 || x % 3 == 0)
        {
            sum += x;
        }
    }
    return sum;
}

static int Partial(IEnumerable<int> collection)
{
    var sum = 0;
    foreach (var x in collection.Where(x => x % 2 == 0 || x % 3 == 0))
    {
        sum += x;
    }
    return sum;
}

static int Linq(IEnumerable<int> collection) =>
    collection.Where(x => x % 2 == 0 || x % 3 == 0).Sum();
```
| Method | Memory usage |
|---|---|
| Loop | 40 bytes |
| Partial | 96 bytes |
| Linq | 96 bytes |
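Out of curiosity, one can isolate where the extra bytes come from. The sketch below measures the enumeration alone, then the same enumeration with a `Where()` in front. The exact figures depend on the runtime version, and a non-capturing lambda's delegate is typically cached after the warm-up call, so most of the difference is the iterator object that `Where()` allocates:

```csharp
using System;
using System.Linq;

static long Measure(Action action)
{
    action(); // Warm up: JIT compilation and cached delegates.
    var before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

var source = Enumerable.Range(0, 100);

Console.WriteLine($"Plain enumeration: {Measure(() => { foreach (var _ in source) { } })} bytes");
Console.WriteLine($"With Where():      {Measure(() => { foreach (var _ in source.Where(x => x % 2 == 0)) { } })} bytes");
```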
Essentially, every one of their suggestions was technically correct: the solution they suggested was indeed either faster or lighter on memory.
And still, they are completely wrong.
It's like those people who unplug their USB charger when it's not in use and feel good about doing something great for the planet and their electricity bill, while overlooking the old 2 kW oven that makes them wait an extra ten minutes to heat anything.
Do you remember the last time you spent a week trying to figure out why a nasty task takes hours in production on beefy servers, but happily runs in a matter of minutes on your decade-old laptop? Compared to the painful hours spent on a tricky problem, a straightforward rule such as CA1860 looks like a bargain. But it is by no means zero-cost.
And I'm not even talking about the time it takes to type `Array.Empty<T>()`. I'm pretty sure there is a way to make an IDE do such a change for you.
The actual cost comes from how puzzling the code is.
A simple `x = [];` is great. It is not entirely transparent; if it were, I wouldn't have been puzzled by the actual types the compiler uses under the hood. But it is at least a very intuitive abstraction, intuitive even for beginners, in a sort of magic way where every time you use it, it does what you intended.
On the other hand, `Array.Empty<T>()` is not great. It's not an abstraction; it's low-level stuff creeping into the code. Not only that, but it also presents a risk of propagating arrays throughout the code, which is a bad thing.
How many times do you use AVX2 in your C# code? I'm pretty sure there are lots of places where you could save much more than 0.15 nanoseconds by using SIMD. And there are opportunities where introducing SIMD would be as “easy and straightforward” as adding `Array.Empty<T>()`. But we don't do that, because it's low level. Well, we do, but in extremely rare cases where the profiler has shown that moving to SIMD would make a specific computation much faster.
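For the curious, here is roughly what “moving to SIMD” looks like: a minimal sketch using the portable `System.Numerics.Vector<T>` API (real AVX2 intrinsics live in `System.Runtime.Intrinsics.X86` and are even more involved):

```csharp
using System;
using System.Numerics;

// Sums an array by processing Vector<int>.Count lanes at a time,
// then folding the vector accumulator and handling the leftovers one by one.
static int SimdSum(int[] values)
{
    var width = Vector<int>.Count;    // lane count depends on the hardware
    var acc = Vector<int>.Zero;
    var i = 0;
    for (; i <= values.Length - width; i += width)
        acc += new Vector<int>(values, i);
    var sum = Vector.Sum(acc);        // horizontal add of all lanes
    for (; i < values.Length; i++)    // scalar tail
        sum += values[i];
    return sum;
}

Console.WriteLine(SimdSum(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 })); // 55
```

Compare the amount of ceremony with a one-line scalar loop: this is the kind of change that only earns its keep once a profiler has singled out the computation.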
And this is it. The key is how much is being saved, and whether it is worth it. Aside from funny case studies, low-level optimization has its place only when the profiler has shown that introducing, and then maintaining, low-level code brings a gain that makes it worth the effort. Otherwise, a professional developer is expected to write code that is clear and clean, code that communicates intent and relies on abstractions, as long as they are intuitive and natural. And not prone to errors; that's important too. Consider the CA1860 example:
`collection.Any()`
is as simple and abstract as it could get. There is zero chance to make a mistake here. Not so much for:
`collection.Count > 0`
as a `>` misspelled as `>=` could wreak havoc. “What havoc?” you would ask. “The error would be caught immediately by unit tests!” Well, first, not immediately, but when and if someone runs them. And then, going from the failing test back to the code to search for the mistake could easily waste minutes.
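To make the havoc concrete, here is a sketch of the slip: a guard where `>` has degraded into `>=`. Since `Count` is never negative, the condition is simply always true, and the compiler has nothing to complain about:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var batch = new List<int>();    // empty: nothing should be processed

if (batch.Count >= 0)           // the typo: meant '>', compiles fine, always true
{
    Console.WriteLine("Processing the batch...");   // runs even though the batch is empty
}

Console.WriteLine(batch.Any() ? "has items" : "empty");  // Any() leaves no room for this kind of mistake
```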
Let's say a test did fail and the programmer wasted two minutes noticing the problem, finding the culprit, fixing it, and rerunning the failed test. Let's also imagine that at runtime, half of the time, the collection is empty. On average, relying on `Count > 0` would then save us 0.09 nanoseconds per call, which one can round to 0.1 ns. Let's also pretend that the company pays $100/hour to the programmer, and the same amount for a server farm of ten servers. If each server calls this method in a loop a whopping one million times per hour, it would take about 120,000 hours, that is, about 13.7 years, to “compensate” for the original loss of two minutes. That's money well spent!
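The arithmetic behind those 13.7 years, spelled out with the same assumptions as above (0.1 ns saved per call, ten servers each making a million calls per hour, two minutes lost):

```csharp
using System;

const double savedPerCallNs = 0.1;          // average saving of Count > 0 over Any()
const double callsPerHourPerServer = 1_000_000;
const int servers = 10;
const double secondsLost = 2 * 60;          // the two minutes spent chasing the typo

var savedPerHourNs = savedPerCallNs * callsPerHourPerServer * servers;  // 1,000,000 ns/hour, i.e. 1 ms
var lostNs = secondsLost * 1e9;             // 120,000,000,000 ns

var hours = lostNs / savedPerHourNs;        // 120,000 hours
Console.WriteLine($"{hours:N0} hours, about {hours / (24 * 365):F1} years");  // about 13.7 years
```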
Nano-optimization has an additional negative impact: it makes it look like the job is being done, and so steals time from the actual optimization efforts.
As an illustration, the software product developed by the aforementioned team is just terrible. Perfectly unusable. On a 32 GB laptop, it frequently fills all the memory. Pretty much any action is slow as hell. Just starting it requires waiting several minutes while it eats a few gigabytes of memory, just for the sake of showing an empty screen.
And nano-optimization is one of the reasons for such a mess.
From the perspective of management, it looks like optimization is being done. After all, one guy had to rewrite his code to stop using LINQ, another one was chasing `collection.Any()`; that's effort, right? It is. And it gives the impression that the team cares about performance. But by not focusing on what matters, they end up with an unusable product. Instead, they should have been profiling, gathering data about how long their users wait for the software, and listening to the actual users.