
Linters rock, but they are slow

Arseni Mourzenko
Founder and lead developer, specializing in developer productivity and code quality
November 11, 2014

I love linters. I love them so much that I'm actually considering using some of them at the pre-commit stage to reject commits which contain errors. The only reason I hesitate is that some of them are quite slow.

Speed matters

Currently, my pre-commit process is the following:

  • Verify the log message. Here, several checks are made to mitigate the risk of wrong commits—commits made by mistake, when the person intended to do something else.

    For example, the log message should end with a period; enforcing this helped me in situations when I pressed Enter by mistake after entering only a part of the message, such as svn ci -m "Implemented syntax highlighting for live previews using ". Since I don't remember what I was using and need to check the exact name, there is a risk, when coming back to the terminal, of committing as-is instead of continuing to write the message.

    Another example is that the log message cannot be the same as one of the five previous messages, so if revision 2044 is “Made it possible to measure the length of the stories.” and at revision 2047 I run svn ci -m "Made it possible to measure the length of the stories.", the pre-commit hook will reject it. Same logic: if the message is the same, I've probably just gone up the history in the terminal and pressed Enter inadvertently. I do it all the time, by the way, given that I often use a separate terminal window both for SVN commits and for something else (such as running the Node.js application).

    Some have noted that it's strange not to let the commit log message end with an exclamation point or an ellipsis. This is because I don't want such messages in my repository. Exclamation points are a sign that the message is not neutral, descriptive and formal; if the person doing the commit is too excited, he shouldn't do the commit in the first place. An ellipsis is another indication that something is wrong with the message. A message should be complete and self-sufficient. It should tell everything the reader needs to know. In this context, an ellipsis doesn't make sense.

  • Enforce style. I've already explained why enforcing style at the pre-commit stage is crucial. This is so important that I'll probably write a dedicated article about it. Not enforcing style at pre-commit always leads to a poor code base, poor productivity, communication issues and wasted time. There are no situations where not doing it is beneficial¹.

    Prototypes are given special treatment here: the checker skips directories named “[Pp]rototype”.
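The checks described above can be sketched in a few lines of Python. This is only an illustration, not the actual hook: the function names, the rejection texts and the assumption that previous_messages is ordered oldest first are all hypothetical, and the path check assumes file paths with forward slashes.

```python
import re
from pathlib import PurePosixPath

# The text says the message is compared with the five previous ones.
RECENT_HISTORY = 5


def check_log_message(message, previous_messages):
    """Return a list of reasons to reject the message (empty if acceptable).

    Assumption: previous_messages is ordered oldest first.
    """
    problems = []
    text = message.strip()
    if text.endswith("...") or text.endswith("\u2026"):
        problems.append("The log message must not end with an ellipsis.")
    elif text.endswith("!"):
        problems.append("The log message must not end with an exclamation point.")
    elif not text.endswith("."):
        problems.append("The log message must end with a period.")
    recent = [m.strip() for m in previous_messages[-RECENT_HISTORY:]]
    if text in recent:
        problems.append("The log message repeats one of the last five messages.")
    return problems


def is_prototype_path(path):
    """True if any directory in the file path is named Prototype or prototype."""
    directories = PurePosixPath(path).parts[:-1]  # drop the file name
    return any(re.fullmatch(r"[Pp]rototype", d) for d in directories)
```

A hook built this way would reject the commit when check_log_message returns a non-empty list, and would run the style checker only on changed files for which is_prototype_path is false.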

That's everything that happens here. Replication to two other SVN servers and the backup strategy are out of scope, because they happen after the commit, not before.

One of the reasons to keep it basic is to ensure commits are extremely fast. Speed is important: if commits are slow, developers will be encouraged to skip opportunities to commit and do fat commits instead. What may not be obvious is what I mean by extremely fast: not merely that a commit which takes five seconds on average is slow, but that a 900 ms commit is already too slow. It is slow because the person has to wait, i.e. remain passive. When the developer writes code, does a review, adds constructive comments, refactors, checks the changes made since the last commit or writes a descriptive log message, he spends much more than 900 ms, but he is active, so it doesn't matter. Being passive for a few hundred milliseconds, on the other hand, is really annoying, and the variability of the delay, i.e. the fact that you don't even know how long it will take, only makes things worse.

But linters are nice to have

Despite the time they take, linters are nice to have at the pre-commit stage, because they are extremely valuable. I've stopped counting the cases where linters saved me hours of painful debugging. This is expected, since, unlike a style checker, which cares only about style, linters check for suspicious things which can result in bugs.

The logic behind setting them at pre-commit level is the same as for style: developers (including myself) are too lazy to check for things on a regular basis. If I'm not forced to write code compliant with style rules, I won't, no matter how important I think style is. I just won't check for errors regularly enough, and then, when I check a week later, the number of errors to correct will be just too overwhelming.

In the same way, when working with C#, I don't run Code analysis regularly enough unless it runs whenever the project is compiled. But then, the same speed problem occurs: it is too slow (one second for a small project, much more for larger ones), so I find myself compiling less frequently. When Code analysis doesn't run automatically, I might run it a few times a week, but then I find myself spending hours debugging errors that I could have found within seconds with Code analysis.

Of course, some linters, such as JSLint, are extremely fast. Partly, this is because they don't do very powerful analysis. Google Closure Compiler, for example, goes further, but gosh, it is terribly slow.

Let CI server call them

The only solution I can see is to move linters to the same level as system and functional tests (both types often being very slow).

But this creates an additional problem which is solved in large-scale projects, but not in small companies: how to inform developers that their code contains errors?

  • In large projects, the system can push notifications to developers' machines, or even use SMS.

  • In small projects, there is no formal way to actively notify developers. Either they are asked to visit the CI website on a regular basis, or the manager does it and talks with specific developers when things go wrong.

This means that for small projects, running linters within the CI workflow doesn't really bring the benefit of nearly real-time information about the potential problems with the code.
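To make the notification problem more concrete, here is a minimal sketch of the part that is independent of the delivery channel: grouping the linter findings by the author of the offending revision and building one message per developer. The Finding type, its fields and the message format are hypothetical; how the message then reaches the developer (email, desktop notification, SMS) is left open.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One linter warning, attributed to the author of the revision (hypothetical type)."""
    author: str
    revision: int
    path: str
    line: int
    message: str


def group_by_author(findings):
    """Collect findings per author so each developer gets a single notice."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding.author].append(finding)
    return dict(grouped)


def format_notice(author, findings):
    """Build the plain-text body of the notice for one developer."""
    lines = [f"{author}, the linter reported {len(findings)} issue(s):"]
    ordered = sorted(findings, key=lambda item: (item.revision, item.path, item.line))
    for item in ordered:
        lines.append(f"  r{item.revision} {item.path}:{item.line} {item.message}")
    return "\n".join(lines)
```

The CI job would call group_by_author on the linter output of each build and pass every resulting notice to whatever channel the team already reads.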

Tests, in this respect, are somewhat different. If I break a few tests, this doesn't mean the code I committed is wrong. This doesn't mean I inadvertently did something I shouldn't. This doesn't mean anything. I may have consciously broken tests, because I know that they'll pass again this evening, or maybe I just don't care about those tests, since they deal with legacy code. Regression testing, while useful, too often leads to changes in tests, not in code. On the other hand, if the linter tells me that I'm wrong, chances are that I really am wrong.

I'm not sure if I can push the distinction further, but I also have the impression that I need to know that a linter found a problem in my code much sooner than I need to know about failing tests. Two reasons:

  1. Essentially, it comes down to the importance of being notified very frequently of all style errors. This helps to correct those errors unobtrusively: if I'm stuck after a hard day of work with one hundred errors, this is much more obtrusive than if I have to correct five errors every thirty minutes.

  2. Having immediate feedback on possible bugs helps reduce debugging. Unfortunately, I don't have hard data, but my very subjective impression is that many bugs found by linters are debugged immediately after the code is written, and many bugs debugged immediately after the code is written can be found by linters.

Regression testing, on the other hand, doesn't help much when I need to debug newly written code; it comes in handy when newly written code breaks existing functionality somewhere else.

This makes linters special. On the one hand, they have the same importance as style checkers and must give immediate feedback. On the other hand, they are slow and, performance-wise, closer to system and functional tests.

Where would you put them? How would you design the notification mechanism?

¹ Of course, I'm talking about ordinary projects. Prototypes shouldn't enforce any given style, because in a prototype, we couldn't care less. Rapid development projects, i.e. projects where the priority is speed, are not an exception, by the way: for such projects, you still have to enforce style, because this helps to reduce the time wasted later when reading and maintaining the code. The difference with a prototype is that prototypes are usually small, and their code will be neither read a lot nor maintained. The code of a rapid development project will not be thrown away a few weeks later.