How metrics impact the way we work

Arseni Mourzenko

December 8, 2020

A long time ago, I started studying how quality and productivity can be measured. I wrote articles about it, and I convinced a few dozen entrepreneurs of the importance of measurement in their respective companies. In most cases, the difficult part was not explaining that measurement has an impact, but making them understand how to measure, and how measurements influence the system being measured.

One of the things which seem difficult to grasp is that when an employee knows that the company is measuring a given factor, he will optimize his work to perform better on this specific factor. And, naturally, the other factors may decline. This means two things. The first is that measurement itself is an active tool. You don't just probe a system. You influence it. The second, more subtle, is that the influence may be good or bad, whatever good and bad mean. Ideally, measurements should lead to better quality and productivity. But in practice, they don't always have this positive effect.

I'm not even talking here about measurements which are obviously bad. A classic example is the manager who pays his programmers depending on how many lines of code they write per month. That said, the fact that this metric was and still is quite popular in some companies indicates either the complete dumbness of some managers or, more likely, that it's not that easy to see what's good and what's not so good when you are yourself part of the system.

Instead, I'm talking about metrics which seem to flow naturally from very obvious factors, such as how many copies of the software the company can sell, or how few employees the company hires. To fully grasp the impact of a given measurement, it is not enough to have a good understanding of the system, of the measurement itself, and of the supposed impact of the measurement on the system. It is also crucial to understand how the measurement should be performed in order to obtain results which actually have a meaning.

Here's a practical example of such a difficulty, recently encountered by a colleague. The original problem in his team was the constant waste of time and energy caused by merge conflicts. Every programmer within the team was working on his own branch, and on a regular basis, the branch was merged to the trunk. More often than not, this merge resulted in conflicts: given the nature of the software product and of the team, multiple programmers very often found themselves working on the same files, and restructuring the project or the team to avoid that wasn't a good idea.

When my colleague had enough of the complaints about how a bad merge wasted two hours of work, he discovered that there were two additional problems with the way the version control was used.

First, some programmers misunderstood what a commit should contain, and were committing sporadically and writing incomplete commit messages such as “WIP” or “Fixed bug.” Some of them treated commits as a way to save their changes at the end of the day, and obviously, writing any meaningful commit message in such cases was practically impossible. One guy had written “A bunch of changes to some files” as a commit message; quite illustrative, I think. All this made it challenging for others to understand what exactly those programmers were doing.

Second, some other programmers committed extremely rarely, sometimes not making any commits for a week or more. Their commit messages were clearer, since they were simply referencing the feature or the bug on which they were working. However, the massive number of changes in the commit made it very challenging to merge later as well.

The management created three measurements:

The number of commits. The metric was displayed and discussed at stand-ups, and once per sprint, the persons who didn't commit often enough had to explain what happened. Sometimes, it made sense for them to have few commits. Sometimes, it didn't, and they were asked to focus on it.
The number of merges to the trunk. Every time someone merged his changes to the trunk, if the build was green, the author received a virtual cookie from the build system. Ten cookies could be exchanged for one very real and physical cookie. As a result, the manager spent a few Euros for a few boxes of cookies shared during retrospectives, and programmers gained weight and became happier overall.
The quality of the commit messages. At the stand-up, the commit messages were quickly reviewed, and people other than the author had to say whether they found a given message clear enough. The score was noted by the team leader in an Excel file.
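To make the first metric concrete, here is a minimal sketch of how the number of commits per author might be collected. It assumes the repository is in Git (the team's actual version control system isn't named here), and the function names and the sample author names are hypothetical.

```python
from collections import Counter
import subprocess


def commits_per_author(log_text: str) -> Counter:
    """Count commits per author from log output with one author name per line."""
    names = (line.strip() for line in log_text.splitlines())
    # Skip blank lines so that stray newlines do not count as an author.
    return Counter(name for name in names if name)


def read_author_log(repo_path: str, since: str = "2 weeks ago") -> str:
    """Return one author name per commit. Assumes a Git repository at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--pretty=format:%an"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical sample, standing in for the output of read_author_log().
    sample = "Alice\nBob\nAlice\n\nCarol\nAlice\n"
    for author, count in commits_per_author(sample).most_common():
        print(f"{author}: {count}")
```

Such a count only says who commits and how often. It says nothing about the size of the commits or the clarity of the messages, which is why the third metric relied on people reading the messages.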

Three months later, it was time to see how those metrics influenced the project. It was clear that the number of commits increased: actually, it was about seven times higher now than back then. But the number of commits is a metric, not a goal. In other words, if one developer makes twenty commits per day, and another one makes only five, it doesn't automatically mean that the first one produces more, or is a better developer, or is more productive, or more valuable for the company. As a standalone metric, it doesn't mean anything.

The original goal was to reduce the pain of merge conflicts, and this is the actual thing which mattered to the company, because the time programmers spend on merge conflicts is the time they don't spend producing value: in other words, those merge conflicts represent a waste which should either be eliminated, or at least reduced as much as possible in order to increase productivity.

But how do you measure the time actually wasted handling merge conflicts? My colleague came up with the idea of asking the developers themselves.

Are you impacted less or more by merge conflicts for the last three months?

The response was rather unexpected for him. Out of twelve programmers, ten answered that they are impacted more now. One said he's impacted less. Another one couldn't answer, because he joined the company three and a half months ago.

Such a result was especially unexpected given that my colleague himself could see that there was no longer any blaming or arguing in the office about a particularly challenging merge conflict.

When my colleague complained to me, I suggested asking a differently phrased question the next day:

How painful are the merge conflicts for the last three months?

This time, the response was enthusiastic. All eleven programmers expressed their happiness about the merges.

What happened is that with more commits and more merges, programmers got many more merge conflicts. Previously, they were having one or two merge conflicts per week (and spending perhaps hours trying to resolve them); now they were getting tiny little conflicts all the time. One programmer, for instance, complained that the other day, he got eight conflicts in a row: he was fixing some nasty bug, which was difficult to understand, while another person was refactoring and committing all the time. This created a situation where the first person was getting a conflict on every merge.

The major difference, however, is that most of the time, those conflicts were solved in a matter of seconds. Look at the diff. Understand what's going on. Check the right option. Enjoy. As easy as that.

This is why they answered the first question the way they did: indeed, they were impacted more, not less, by the conflicts now. Encountering a conflict eight times in a row is something one would remember for a long time, and talk about, because it's funny. However, the pain of the original, huge conflicts was gone, and so the change was deemed positive for the company.
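The gap between the two answers can be illustrated with a little arithmetic. The sketch below uses purely hypothetical numbers (nobody on the team measured these values); it only shows how the number of conflicts can go up while the time lost to them goes down.

```python
def weekly_conflict_cost(conflicts_per_week: int, minutes_per_conflict: float) -> float:
    """Total minutes per week a programmer spends resolving merge conflicts."""
    return conflicts_per_week * minutes_per_conflict


# Hypothetical figures, chosen only for illustration.
before = weekly_conflict_cost(conflicts_per_week=2, minutes_per_conflict=90.0)
after = weekly_conflict_cost(conflicts_per_week=40, minutes_per_conflict=0.5)

if __name__ == "__main__":
    print(f"Before: {before} minutes per week")  # Before: 180.0 minutes per week
    print(f"After: {after} minutes per week")    # After: 20.0 minutes per week
```

With these assumed figures, a question about how often conflicts happen gets the answer "more", while a question about how much they hurt gets the answer "less". Both answers are honest; they simply measure different things.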