Gated checkins are evil
I was recently talking with a developer about a pre-build strategy he wants to set up in his team, which uses trunk-based development. His idea is that in order to prevent problematic code from hitting the trunk in version control, a client-side pre-commit hook (i.e. a hook which runs on the developer's machine) will run the linter, compile the application and run some of the tests. The problem he's trying to solve is that the developers don't regularly look at the state of the build; doing those early tasks in a pre-commit hook should make them more aware of build problems.
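Such a hook is easy to sketch. A minimal version of what he had in mind might look like the following; the `stylecheck`, `lint` and `make` commands are hypothetical placeholders for the team's actual tools:

```python
"""Sketch of the proposed "heavy" client-side pre-commit hook.

The commands below are illustrative placeholders, not a specific
toolchain; the script would be saved as .git/hooks/pre-commit and
made executable.
"""
import subprocess

STEPS = [
    ["stylecheck", "."],    # style check (fast)
    ["lint", "."],          # linter (slower)
    ["make", "build"],      # compile the application
    ["make", "test-fast"],  # a subset of the tests (slowest)
]

def run_steps(steps, runner=subprocess.call):
    """Run each step in order; the first non-zero exit code aborts the commit."""
    for cmd in steps:
        status = runner(cmd)
        if status != 0:
            print("pre-commit: step failed: %s" % " ".join(cmd))
            return status
    return 0

# When installed as a hook, the script would end with:
#     sys.exit(run_steps(STEPS))
```

As the rest of this article argues, everything below the style check is precisely what should not be in this hook.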
I already explained why doing anything other than style checking in a server-side pre-commit hook is not an option. I then claimed that a commit which takes 900 ms is too slow. In an earlier article on SoftwareEngineering.SE, I specifically addressed the subject of unit tests in a server-side pre-commit hook.
Important note: in the linked article and answer, I was talking about the pre-commit hook which runs on the server, i.e. on the same machine which hosts the repository. In the current article, I'm talking about pre-commit hooks which run on developers' machines. When it comes to server-side pre-commit hooks, running anything which involves the execution of custom source code is crazy anyway, for obvious security reasons. This includes unit tests, but excludes linters.
In this article, I'll start by explaining in more detail why commits should be fast, and then explain how to decide which steps of a build pipeline can be moved to a client-side pre-commit hook.
Why commits should be fast
Long commits break the flow
I don't mean that a commit which takes on average five seconds is slow; what I mean is that a 900 ms commit is already too slow. It is slow because the person has to wait, i.e. remain passive.
For software developers, breaking the flow leads to a huge productivity loss. When a commit takes time, it breaks the flow. If a commit takes a few seconds, the developer remains in front of the monitor, but the flow breaks anyway. If a commit takes a minute, the developer switches to a different task, which necessarily breaks the flow. If a commit takes several minutes, the developer leaves his desk and goes for a cup of coffee or a walk.
Getting back in the zone takes time: fifteen minutes according to Joel Spolsky, and I see no reason not to believe this metric. It may not be particularly important for teams where developers cannot get in the zone in the first place—such as for teams which work in an open space—but for developers who may actually be productive from time to time, this is a huge issue.
Given that it is not unusual to make on average ten commits per day, slow commits can waste two and a half hours per day per developer, not counting the time of the commits themselves. I don't know about you, but that seems like... a lot, given the hourly rate of a professional developer.
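The arithmetic behind this claim is straightforward:

```python
# Back-of-the-envelope cost of slow commits, using the numbers above.
commits_per_day = 10   # a plausible daily commit count
refocus_minutes = 15   # Spolsky's figure for getting back in the zone
wasted_hours = commits_per_day * refocus_minutes / 60.0
print(wasted_hours)    # → 2.5 hours per developer per day
```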
Long commits discourage committing frequently
Waiting sucks. If developers have to wait at each commit, this will inevitably push some of them to postpone their commits. This leads to multiple issues:
Postponed commits lead to painful merges. As simple as that.
Developers making minor changes—refactoring, fixing style, correcting typos—will be inclined to put those changes in a bigger commit, such as the fix of a bug. This pollutes those bigger commits, complicates code review and makes it easier for issues to creep in. When such a commit breaks the build, it takes more time to see what caused the regression, especially when it is caused by something small, such as a variable renamed during refactoring.
Some developers will commit when they can afford to wait, such as at noon or in the evening, instead of committing after finishing an atomic change. Needless to say, such commits are perfectly meaningless, and heavily decrease the usefulness of the version control system.
What could be put in pre-commit hooks
The build pipeline can usually have numerous and diverse steps. Here's an example of a build pipeline:
The pipeline can be broken in two: some steps can be performed locally, launched by a pre-commit hook; others will remain on the build server.
The question is where to put the line. If each task is now represented proportionally to its duration, should the limit be put here?
Or maybe here?
Or here?
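One way to reason about where the line goes is by latency budget: keep a prefix of the pipeline in the hook as long as its cumulative duration stays under some threshold, and push everything after it to the build server. A sketch, with made-up step durations:

```python
def split_pipeline(steps, budget_ms):
    """Greedy split: keep the leading steps whose cumulative duration
    fits in the latency budget; everything after goes to the build server."""
    local, remote, total = [], [], 0
    for name, duration_ms in steps:
        if not remote and total + duration_ms <= budget_ms:
            local.append(name)
            total += duration_ms
        else:
            remote.append(name)
    return local, remote

# Hypothetical durations; a 900 ms budget only fits the fastest checks.
steps = [("style check", 150), ("linter", 600),
         ("compile", 40000), ("unit tests", 120000)]
print(split_pipeline(steps, 900))
# → (['style check', 'linter'], ['compile', 'unit tests'])
```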
I explained in the previous section that a commit which takes 900 ms is already too slow. This alone doesn't leave us many options: the choice is probably between no steps in the pre-commit hook, just a style check, or a style check and a linter. In the article I already quoted above, I explained the difficulties I encountered when deciding where to put the linters. Years later, I still keep linters in the build pipeline, and the pre-commit hook only checks the style (the exception being Douglas Crockford's jslint, given how fast the tool is). Putting linters first in the pipeline, by the way, ensures that I get feedback within seconds when I break one of their rules.
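If fast checks are the only ones allowed to stay in the hook, it's worth making the hook police its own latency. A sketch, using the ~900 ms threshold discussed above:

```python
import subprocess
import time

BUDGET_SECONDS = 0.9  # the threshold above: 900 ms is already too slow

def timed_check(cmd, runner=subprocess.call, clock=time.monotonic):
    """Run one pre-commit check and complain when it blows the budget,
    which is the cue to move it to the build pipeline instead."""
    start = clock()
    status = runner(cmd)
    elapsed = clock() - start
    if elapsed > BUDGET_SECONDS:
        print("warning: %s took %.2f s; move it to the build pipeline"
              % (" ".join(cmd), elapsed))
    return status
```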
The duration argument put aside, what else can help us decide what to put in the build pipeline, and what should be done in a pre-commit hook?
One of the goals of the build pipeline is to catch things which were broken by a commit. It could be a bug caught by a unit test, a performance issue spotted by a stress test, a problematic pattern detected by a linter, or a style error found by a style checker. A benefit of a pre-commit hook over a build server is that whenever something gets caught, the problematic code doesn't stand a chance of reaching version control (and affecting other developers). This is just huge, and if we could run the entire pipeline locally, in a matter of a few dozen milliseconds, I couldn't be happier. Just do svn ci -m "...", and instantly, either the change is deployed on the servers, or you're presented with a message telling you that your commit was rejected because it would break something.
In real life, it doesn't happen like this, and not only because the build takes time, but also because some tasks cannot be performed on a developer's machine, don't scale well, or are simply unreliable. Let's discuss those three aspects in detail.
Some tasks cannot be performed on a developer's machine. A smoke test or a system test, for instance, requires substantial effort to set up the infrastructure needed to run it. For any but the most basic projects, it is impossible to host such infrastructure on a developer's PC.
Security is another important factor here; even if one trusts a developer enough to let him write code which will be pushed to production, it may not be a good idea to give developers' machines the access required to rebuild production servers.
Large projects may require slightly more than an ordinary developer's desktop to be built. It is not unusual to have dozens of beefy servers working in parallel on a single build. Assuming that the build can run on a single machine prevents the approach from scaling.
For smaller projects which can technically be built on an average development workstation, it's still usually a good idea to throw more hardware at the problem in order to reduce the build time from, say, twenty minutes down to two. The faster the developer is informed of an issue introduced by his commit, the cheaper it is to solve, and this short cycle alone can justify spending thousands of dollars on dedicated servers.
“It works on my machine.” That's great, but it doesn't mean that it will work anywhere else. One of the issues I encounter regularly, for instance, is that I use a Python package in my app, but forget to declare the dependency on the package. It works perfectly fine locally and all the tests pass, but when I commit the changes, the build server starts saying offensive things about my intellectual capacity. On one side, it is obvious that I am the problem here; on the other side, if I were never the problem, I wouldn't need regression testing in the first place, would I?
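The only reliable way I know to catch an undeclared dependency before the build server does is to reproduce the server's clean environment: install nothing but the declared dependencies into a throwaway virtualenv and run the tests there. A sketch, assuming a requirements.txt and pytest (both illustrative):

```python
import subprocess
import sys

def clean_env_commands(env_dir, requirements="requirements.txt"):
    """Commands for a 'clean machine' check: a fresh virtualenv gets
    only the *declared* dependencies, so an undeclared import fails
    here instead of on the build server."""
    return [
        [sys.executable, "-m", "venv", env_dir],
        [env_dir + "/bin/pip", "install", "-r", requirements],
        [env_dir + "/bin/python", "-m", "pytest"],
    ]

def run_clean_env_check(env_dir, runner=subprocess.check_call):
    """Run the commands in order; check_call raises on the first failure."""
    for cmd in clean_env_commands(env_dir):
        runner(cmd)
```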
Build servers are based on scripts. They are destroyed and rebuilt several times per day. They run on standardized virtualized hardware. They are clones. Developers' workstations are slightly more volatile, which means that every such machine has its quirks, its versions of packages, its personality. Even if they were reinitialized on a regular basis, say once per week (which would be unbearable for their users, but this is a different subject), discrepancies would still be introduced, especially when working on a hard-to-reproduce bug or a tricky feature. If I made something work after two hours of head-banging, it doesn't mean it actually works everywhere; it may also mean that I adjusted my workstation so that the feature works locally, and wouldn't be able to reproduce what I did on a different machine.
Here's a funny story: a year ago I introduced a regression which was caught at the very end of the chain, by the canary tests. I forgot to replicate a change in DNS records in the application I was working on, so the application was still using the old domain name of a dependency service. Nevertheless, it worked perfectly fine locally, where the old DNS records were still in cache, as well as on the build server, which also had the DNS records in cache. The problem appeared only in production, where the newly deployed servers with the new version of the app and an empty cache were trying to access a domain which didn't exist, leading to lots of HTTP 500 responses. This was the hard way to learn that (1) the build server should never run the tests itself, and should delegate them to volatile test servers rebuilt after they perform their job, (2) serving HTTP 500 just because a dependent service is unreachable is unprofessional, (3) changing DNS records sucks in all cases.
So here again, it doesn't leave us many options. Since tests—even unit tests—cannot be trusted when run locally, they have to be part of the build pipeline, running on dedicated and volatile test servers. What about linters? It depends. For some languages it may work; for others, it may be difficult to ensure that they produce the same result on every developer's machine. Style checking remains the one basic thing which can relatively easily be performed locally.
Or there is another way. One can perform all those tasks on dedicated virtual machines anyway—just like the build does—and simply do the orchestration from the developer's machine instead of a build server. This would mean that linters, tests and deployment would be as reliable as if they were launched by the build server, and would scale perfectly well. In this case, what is really the difference between orchestrating the process from a build server and doing it from a developer's workstation? It all comes down to two factors:
Doing it locally prevents faulty code from reaching version control. You may ask how that could possibly be bad. Easily. Failed builds are important in terms of data analysis and publicity. It is very interesting (and representative of the team and the company) to watch how the ratio of failed builds evolves over time, and which errors lead to failures. Moreover, the fact that a build failed doesn't mean the commit has no place in this world. If I need the latest modifications of my colleague right now to avoid a painful merge later, I don't care if they break some unit tests. If a developer needs to commit right now because in five minutes he's leaving for a two-week vacation in Dubai, it is much more important for the team to get his modifications and fix a few broken tests later than to redo what he did during the last two hours.
Doing it locally forces the developer to be informed if the build breaks. Putting aside my explanation of the reasons commits should be fast, blocking the commit is not the best way to notify a developer of anything. There are much more effective ways to do it: show the build in red on a monitoring screen, send an SMS to the developer, or assault him physically (don't do the last one; had that thing in my eye once, didn't like it). A commit which stalls for two minutes won't be watched by the developer anyway: the developer will launch it, go work on something else, and see a possible error message much later than he would with a proper notification mechanism.
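For completeness, the orchestrate-from-the-workstation approach can be as thin as a script that triggers the same jobs, on the same build farm, that the build server would trigger. A sketch; the host and job names are hypothetical:

```python
import subprocess

BUILD_FARM = "build-farm.example.com"        # hypothetical build farm
PIPELINE = ["lint", "unit-tests", "deploy"]  # hypothetical job names

def orchestrate(jobs=PIPELINE, runner=subprocess.call):
    """Run each pipeline job remotely over SSH, stopping at the first
    failure; the heavy lifting happens on the farm, not the workstation."""
    for job in jobs:
        if runner(["ssh", BUILD_FARM, "run-job", job]) != 0:
            return job   # the job that failed
    return None          # everything passed
```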
To conclude, if you're hesitant about moving some of the steps from the build pipeline to the pre-commit hook, just don't. While style checkers can (and usually should) run in a pre-commit hook, anything else, including the linters, belongs in the build pipeline itself. Doing otherwise will necessarily have a negative impact on the productivity of the team, while the benefits are very questionable.