Do we think enough about coding standards we use?
What is wrong about coding standards and “best practices” in general is that they throw the thinking process away. This may be a good idea for manual workers, but is quite problematic for developers.
Coding standards, for instance, usually take a form of a set of mandatory rules pushed down to the developers by a manager. If those coding standards are written by a skilled person, everything goes right. If not, the document does more harm than good.
Let me start by explaining a model which sorts the rules you may find in a coding standards document by objectivity degree.
At the very left, you find choices between two or more equally valid alternatives. None of those choices is plainly right or plainly wrong, and one will pick one over another not because of its intrinsic benefits, but either because personal preferences, habit, or some reasons which are not that important.
Those choices should be made for a reason of consistency across the code. What is important here is to actually make a choice, and not which choice was made.
Tabs vs. spaces is an archetype of such choices. It seems that more and more people agree that spaces make more sense, but it also makes complete sense to chose tabs. What doesn't make sense is to let developers mix tabs and spaces through the code base.
At the very right, there are best practices which follow a precise reasoning. Any other choice is absolutely wrong, either because it doesn't make sense, or because it will lead to bugs with impressive consistency.
For instance, if you declare variables in the middle of a function in JavaScript, you're doing it wrong. If you omit semicolons, you're absolutely, completely wrong too.
Some guidelines are somewhere in the middle. For instance, putting comma at the end of the last line of an array initialization in C#, like this:
var numbers = new []
{
182,
54,
317,
220, // ← Noticed the comma here?
};
makes sense in a context of a commit: if a developer adds an element, the commit's diff
shows that only one line changed. If there is no comma, the diff
will show that two lines changed, which is not exactly representative of the actual change.
Unfortunately, some coding standards are written without in-depth knowledge of the domain. This leads to rules which either don't make sense or make sense only in a limited scope. While reviewing a .NET coding standards document of a company, I found a great example.
The document encourages the use StringBuilder
. Indeed, many developers forget that doing thousands of concatenations to an already long string could lead to performance problems through high memory usage, and StringBuilder
is a great solution. Unfortunately, the statement in the document doesn't give any details:
Use StringBuilder instead of using "+" on the String class.
Now this is problematic. Without providing enough explanation and without specifying the scope of the rule, the statement becomes harmful. By blindly following it, one would use StringBuilder
for simple concatenation of short strings. This leads to several problems.
C# compiler is clever enough to optimize concatenation in some cases, but in similar cases, it won't be able to optimize much if StringBuilder
is used. For instance, this piece of code:
const string Name = "Jeff";
var text = "Hello, " + Name + "!";
Console.WriteLine(text);
becomes:
IL_0000: ldstr "Hello, Jeff!"
IL_0005: call System.Console.WriteLine
IL_000A: ret
while:
const string Name = "Jeff";
var text = new StringBuilder()
.Append("Hello, ")
.Append(Name)
.Append("!")
.ToString();
Console.WriteLine(text);
is compiled into:
IL_0000: newobj System.Text.StringBuilder..ctor
IL_0005: ldstr "Hello, "
IL_000A: call System.Text.StringBuilder.Append
IL_000F: ldstr "Jeff"
IL_0014: callvirt System.Text.StringBuilder.Append
IL_0019: ldstr "!"
IL_001E: callvirt System.Text.StringBuilder.Append
IL_0023: callvirt System.Object.ToString
IL_0028: call System.Console.WriteLine
IL_002D: ret
Looks ugly, right? What bothers me more is C# code. It became less readable, and this alone is a good reason to avoid StringBuilder
unless there is an actual proof, obtained through actual profiling, that string concatenation is the actual bottleneck.
I cheated. In most cases, concatenation wouldn't involve just constants. What if we use a parameter instead?
var text = "Hello, " + i + "!";
Console.WriteLine(text);
gives:
IL_0000: ldstr "Hello, "
IL_0005: ldarg.1
IL_0006: box System.Int32
IL_000B: ldstr "!"
IL_0010: call System.String.Concat
IL_0015: call System.Console.WriteLine
IL_001A: ret
while a similar piece of code which uses StringBuilder
looks a lot like the one which was concatenating constants, the only difference being at IL_000F
:
IL_0000: newobj System.Text.StringBuilder..ctor
IL_0005: ldstr "Hello, "
IL_000A: call System.Text.StringBuilder.Append
IL_000F: ldarg.1
IL_0010: callvirt System.Text.StringBuilder.Append
IL_0015: ldstr "!"
IL_001A: callvirt System.Text.StringBuilder.Append
IL_001F: callvirt System.Object.ToString
IL_0024: call System.Console.WriteLine
IL_0029: ret
Performance suffers too, instead of improving. StringBuilder
variant takes two to four more CPU cycles compared to the concatenation. This is because in the basic case where strings are short and there are few strings to concatenate, concatenation is faster. Under the hood, the concatenation operator “+” is transformed into string.Concat
. If you look at the implementation of string.Concat
, you can easily see why. The method computes the total length of the resulting string, and then calls C++'s FastAllocateString
. Not bad. Instead of recreating a string at every concatenation, it just creates one big string and puts the results inside by doing:
fixed(char *pDest = &dest.m_firstChar)
fixed (char *pSrc = &src.m_firstChar) {
wstrcpy(pDest + destPos, pSrc, src.Length);
}
This means that following the guidelines without understanding what happens under the hood could lead to wrong decisions. In the example, it leads to code which is more difficult to read and maintain, which is slower, and which prevents the compiler from optimizing it.
Questioning stylistic choices (the rules which are at the left in the model I was talking about at the beginning of the article) is out of question. Those choices should be done at the beginning of the project, made mandatory, and never ever questioned since then. What you should question, on the other hand, is the reasoning behind the rules which are on the other side. Was the author skilled enough to draft those rules? Are they still valid? Do they apply to any situation, or just some edge cases? For instance, Crockford's Javascript: The Good Parts is exactly what I expect from any author of a coding standard: an in-depth explanation of why jslint
's standards are the way they are. Not reading the book and not understanding the underlying reasoning makes developers adopt tools such as jshint
which cause bugs in their code. Understanding what's happening under the hood helps taking the right decisions.