Using LOCs to validate hypotheses
In my previous article, I described a tool which gathers the diffs from version control commits and uses them to compute the number of lines of code (LOC) per language over time, making it possible to extract some cool stuff from the data.
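The core of such a tool can be sketched from `git log --numstat` output, which lists added and removed lines per file for every commit. This is only a minimal illustration, not the actual implementation from the previous article; the extension-to-language mapping in particular is an assumption:

```python
import subprocess
from collections import defaultdict

# Extension-to-language map: an assumption for illustration,
# not the actual mapping used by the tool from the article.
LANGUAGES = {".cs": "C#", ".ts": "TypeScript", ".less": "LESS", ".sql": "SQL"}

def parse_numstat(text):
    """Turn `git log --numstat` output into {language: [added, removed]}."""
    totals = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":  # skip blanks and binary files
            continue
        added, removed, path = parts
        ext = "." + path.rsplit(".", 1)[-1] if "." in path else ""
        if ext in LANGUAGES:
            totals[LANGUAGES[ext]][0] += int(added)
            totals[LANGUAGES[ext]][1] += int(removed)
    return dict(totals)

def loc_per_language(repo_path):
    """Run git on a repository and aggregate its whole history."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)
```

Grouping by author (via `--pretty=format:%an` and tracking the current author while parsing) is a small extension of the same idea, and is what makes per-contributor comparisons like the ones below possible.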
For fun, I decided to run the tool on a client's code base, where I had been working for the past year and a half in a team of two. I gathered the metrics both for myself and for my colleague, Nicolas. Although I would be very skeptical about comparing two people through the LOC metric, there were still a few interesting things I discovered and a few hypotheses I confirmed or refuted.
Work from home impact
Hypothesis: since we started working from home, I have committed more code than before, thanks to better working conditions; Nicolas has committed less code, because he has a small child at home.
I see absolutely no difference, visually, between the period where we were working at the office and the period where we worked from home. I was, in fact, truly convinced that I would see, for my part, an increase by a factor of at least 1.5 between those periods, but it is impossible to assert that there is even a small increase. I would imagine that either the logarithmic scale cannot represent such differences correctly, or, more likely, that the number of LOCs actually remained the same.
Client side, server side
Hypothesis: I work mostly on the server side, and Nicolas on the client side. He'll outperform me in terms of TypeScript LOCs, and I'll outperform him for the C# part.
By comparing our LOCs for specific languages, namely C#, LESS, SQL, and TypeScript, he outperforms me in the number of SQL lines added, but not in the other categories. He gets the expected 24% for added C# lines (i.e. for every four lines of C# code I add, he adds one line), but only 61% for added TypeScript.
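The percentages above are simply one contributor's count expressed relative to the other's. A trivial sketch, where 44,705 is my added C# count from Table 1 and 10,729 is a hypothetical value chosen only to reproduce the 24% figure:

```python
def contributor_ratio(theirs, mine):
    """A colleague's added LOCs relative to mine, as a rounded percentage."""
    return round(100 * theirs / mine)

# 44,705 comes from the article; 10,729 is a hypothetical
# colleague count picked to illustrate the reported 24%.
ratio = contributor_ratio(10_729, 44_705)
```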
Table 1 Comparing LOCs between two contributors. The everything line is not exactly the sum of the previous lines, as it also includes languages such as Bash or Python which weren't listed in the table.
Keeping code base small
Hypothesis: a rather worrisome aspect is that Nicolas doesn't refactor his code. I imagined that I would have nearly as many lines removed as added, while he would have a very low score on removed lines.
This is clearly visible in Table 1, where I'm removing twenty times as much C# code as he does, and even more for LESS, no pun intended. Overall, he doesn't make enough effort cleaning stuff up. This could be sustainable for now, since I remove more lines than he adds (82,754 versus 39,318), but if he finds himself all alone on this project, or with another colleague who also doesn't refactor enough, the code base would grow much faster than it does now.
The number of lines I remove compared to the lines I add represents 89%, which is not bad. More importantly, it is 91% for C#, 98% for SQL, and 95% for TypeScript. Ideally, I think that for a legacy code base, the value should be above 100% (in other words, there should be more lines removed than added), so I should try to improve that.
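This removed-to-added ratio is straightforward to compute. A minimal sketch, checked against the overall totals quoted later in the article (93,250 added, 82,754 removed):

```python
def removal_ratio(removed, added):
    """Removed-to-added LOCs as a rounded percentage.

    A value above 100% means the contributor shrank the code base.
    """
    return round(100 * removed / added)

# Overall totals from the article: 82,754 removed vs 93,250 added.
overall = removal_ratio(82_754, 93_250)  # matches the 89% in the text
```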
I write mostly C#
Hypothesis: probably half of the lines I add should be C# code, as I don't do a lot of client side programming, and I don't do much SQL either.
Indeed, the 44,705 lines of C# code added represent 48% of all lines of code I add. For the removals, it goes up to 49%, almost a perfect half. I'm not surprised either to discover that I barely work on LESS code, but the SQL metrics are quite surprising: so surprising, actually, that I suspect a flaw, such as some generated code being included by mistake in the metrics.
Paying enough attention to personal projects
Hypothesis: over the period I was working on this customer's project, I also remained active on my own projects, with a similar LOC-per-year rate for both.
According to Table 1, I added 93,250 and removed 82,754 LOCs. Those metrics correspond to a time span going from February 2020 to May 2021. For my personal projects, if I isolate the section which goes from the 1st of February 2020 to May 18th, 2021, the metrics are +86,901, −31,864. Sure, I haven't removed that many lines, but the first number seems to indicate that I'm quite active on my own projects.
Statistics based on LOCs are not only limited, but inherently dangerous. For every metric, one has to ask oneself: is it the best metric? Is it even the correct one? How could it be gamed? What could be the negative effect of measuring it? But LOCs are particular in that they are really easy to measure and, historically, were used to do terrible things, because they are so tempting and so easy to misuse.
Particular care should be taken when comparing two people through a metric which relies on LOCs. The thing is really dangerous, because it encourages you to believe things that are not true, and to make flawed decisions. It can be as basic and stupid as saying that the person who produces twice as many LOCs as his colleague should be paid twice as much, but it can also be much more subtle and convoluted.
There are, however, valid cases where LOCs can be applied. For instance, one could use them to get a hint of how fast the code base grows, in order to react before it's too late. Or one can target a specific removed-vs-added ratio in order to encourage programmers to refactor their code. In my case, I used the metric to do several things:
I compared two periods of time during which a given factor, namely the work location, changed. This would be problematic if LOCs were measured continuously and programmers were aware that the project manager looked at this metric and made work-from-home decisions based on it: people who want to stay home would inflate their commits by trying to push more LOCs when working from home, while others who want to go back to the office could end up doing less work in order to show how depressed they are. In my case, the measurement was not done over time, and therefore nobody, not even me, thought about gaming it, which makes it quite representative here.
I checked how Nicolas and I were using the different languages. While a comparison between two languages for the same person could be problematic because of the compactness factor I explained in my previous article (the same code written in different languages could end up with a very different LOC metric), I don't think there is anything wrong in measuring how two people use a given language, aside from the fact that each person's style may introduce slight variations in the results.
I measured the difference between added LOCs and removed ones. This is valid when you want to see how the size of the code base evolves over time, and also valid when you want to ensure all programmers do their best to keep the code base small. I would be cautious, however, if the metric is actually used to change behavior. In fact, one has to switch to something more meaningful, such as complexity points, to encourage the team to reduce complexity. Negative LOCs have the drawback of encouraging programmers to compact their code, making it more difficult to read, and nobody wants that. By comparison, metrics such as cyclomatic complexity (where less is better) are more difficult to trick.
I put a specific language into the more global context of all LOCs added or removed. Maybe not the wisest thing to do because of the compactness factor; nevertheless, it gives a rough view of things.
I compared two code bases. This could possibly be the weirdest thing to do, as there are so many differences between them, but here again, I needed only a very approximate number to understand how much I work on personal projects.
Those five illustrations show how LOCs, enriched with additional information about the language or the author, can be used to extract useful information from version control. They extend the observations I made in my previous article to multiple repositories, with multiple people working on the code base. They made it possible to confirm or refute my hypotheses unambiguously, and although the value of those metrics is limited, they can be used effectively in lots of cases.