Those things we measure

Arseni Mourzenko

Founder and lead developer

May 20, 2017

Tags: quality

A recent change to the Bookshelf service removed the counter that showed how many books were displayed on the page. To be more precise, the counter is now visible only to the owner of the bookshelf, not publicly.

This is essentially a shift from quantity to substance.

Imagine you're looking at the profiles of two people. The first profile contains ten books. The second one contains one hundred books. Does it mean that the second person is ten times… better? Cleverer? More knowledgeable? More skillful? How is the number of books relevant?

A long time ago, I was interviewed by Azeo. When I started talking about an exciting project I was working on, the interviewer interrupted me and asked how many projects I had worked on. She wrote my answer in her notepad and apparently decided that the details were irrelevant. But why? Why would a person who built fifty e-commerce websites that all look alike be ten times more valuable than a person who, over her whole career, worked on five large-scale projects, each of them different? And yet, for Azeo, that person is, which is why anyone who cares about working with skillful colleagues should avoid this company like the plague.

Many, many companies make this mistake. Recently, I was surprised to learn that a company I deeply respect actually bases its promotion decisions on the number of Pluralsight videos an employee has watched. Aside from the stupidity of this approach, it creates an additional problem: it encourages employees to play the videos without actually watching them, just to make the statistics grow.

Companies prefer quantity over substance because it's much easier to gather irrelevant numbers than to measure substance. Then, by giving such a measure a misleading name, they make it look meaningful enough to base decisions on.

Let's take a few examples.

The number of hours in the office

The most popular metric in France for telling good developers from bad ones is the number of hours they spend in the office. Someone who stays late in the evening and comes in early is automatically considered a valuable employee. Someone who doesn't work the minimum number of hours will have an uncomfortable talk with management and, sooner or later, will be fired.

The uselessness of this metric is blatant. I can spend eight hours making it look like I'm working; that doesn't mean I actually do any useful work.

The number of lines of code

This metric was popular in the USA and is now quite common in India: in some companies, more lines of code is better. This is probably the dumbest metric of all, not only because it is irrelevant, but also because it forces programmers to write ugly code. You see, a piece of code that would usually be written on one line can often be spread over two, three, four, or a dozen lines. Those artificial line breaks add nothing and only make the code more confusing. For instance, this piece of code:

@property
def syscall(self):
    logger = logging.getLogger("engine.system_calls")
    return self.syscall_nolog.on_stdout(logger.info)

can also be written like this:

@property
def syscall(
    self
):
    logger = \
        logging.getLogger(
            "engine.system_calls"
        )

    return self.\
        syscall_nolog.\
        on_stdout(
            logger.info
        )

Quite unreadable, but if this is what the employer wants… More importantly, instead of 4 LOC, there are now 13 lines, more than three times as many.
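To make the arithmetic concrete, here is a minimal sketch of the kind of naive counter such a policy relies on, assuming it counts every non-blank physical line (real tools differ in how they treat blank lines and comments). The helper name `count_loc` is hypothetical; the two snippets are the ones above, stored as strings.

```python
import textwrap


def count_loc(source: str) -> int:
    """Naive LOC metric (hypothetical helper): count non-blank physical lines."""
    return sum(1 for line in source.splitlines() if line.strip())


ORIGINAL = textwrap.dedent('''\
    @property
    def syscall(self):
        logger = logging.getLogger("engine.system_calls")
        return self.syscall_nolog.on_stdout(logger.info)
''')

# Same code, padded with artificial line breaks (backslashes are escaped
# because this is a regular, non-raw string literal).
PADDED = textwrap.dedent('''\
    @property
    def syscall(
        self
    ):
        logger = \\
            logging.getLogger(
                "engine.system_calls"
            )

        return self.\\
            syscall_nolog.\\
            on_stdout(
                logger.info
            )
''')

print(count_loc(ORIGINAL), count_loc(PADDED))  # prints: 4 13
```

The code itself never runs here; only its text is measured, which is precisely why the metric is so easy to game.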

Some companies noticed this pattern and, in order to prevent programmers from adding artificial lines, made things even worse by counting logical lines of code (LLOC) instead. The LLOC metric cannot be gamed with artificial line breaks, although radon, a popular tool for computing metrics on Python code, still reports an increase: 8 LLOC for the second piece of code, against 5 LLOC for the first one. In any case, where LOC encouraged artificial line breaks, LLOC encourages artificial logical statements. Like this:

@property
def syscall(self):
    parts = ["engine", "system_calls"]
    name = ""
    for p in parts:
        name += "."
        name += p

    no_dot = name[1:]
    logger = logging.getLogger(no_dot)

    tmp = True
    level = logger.info if tmp else logger.debug
    if tmp:
        return self.syscall_nolog.on_stdout(level)

This piece of code does the same thing as the original one, but in a very convoluted way. Its LLOC is 17, against 5 for the original. And it's a complete mess.
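Logical lines can be approximated with the standard library alone: Python's `tokenize` module emits a `NEWLINE` token only where a logical line ends, while line breaks inside brackets produce `NL` tokens and backslash continuations produce no token at all. The sketch below relies on that; the helper name `count_logical_lines` is hypothetical, and this is not radon's LLOC algorithm, so its numbers differ from the 5, 8 and 17 quoted above. It does show both halves of the argument, though: it gives 4 for the original, still 4 for the padded version, and 13 for the convoluted one.

```python
import io
import textwrap
import tokenize


def count_logical_lines(source: str) -> int:
    """Approximate LLOC (hypothetical helper) by counting NEWLINE tokens.

    NEWLINE marks the end of a logical line; breaks inside brackets are NL
    tokens and backslash continuations emit nothing, so neither is counted.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)


ORIGINAL = textwrap.dedent('''\
    @property
    def syscall(self):
        logger = logging.getLogger("engine.system_calls")
        return self.syscall_nolog.on_stdout(logger.info)
''')

PADDED = textwrap.dedent('''\
    @property
    def syscall(
        self
    ):
        logger = \\
            logging.getLogger(
                "engine.system_calls"
            )

        return self.\\
            syscall_nolog.\\
            on_stdout(
                logger.info
            )
''')

CONVOLUTED = textwrap.dedent('''\
    @property
    def syscall(self):
        parts = ["engine", "system_calls"]
        name = ""
        for p in parts:
            name += "."
            name += p

        no_dot = name[1:]
        logger = logging.getLogger(no_dot)

        tmp = True
        level = logger.info if tmp else logger.debug
        if tmp:
            return self.syscall_nolog.on_stdout(level)
''')

for label, src in [("original", ORIGINAL), ("padded", PADDED), ("convoluted", CONVOLUTED)]:
    print(label, count_logical_lines(src))
```

Tokenizing only requires the text to be lexically valid, so the undefined `logging` and `syscall_nolog` names do not matter here.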

The number of closed tickets

As if the two previous metrics weren't enough, some employers invented another one: the number of tickets a person closes per month.

An Indian colleague who worked in a company where promotions depended directly on the number of tickets closed sent me a few screenshots of tickets created by himself and his colleagues. It is barely believable that an employer could be so stupid as to force employees to waste their time this way. One ticket, for instance, was about adding a comment to a function. You see, I can either write a function and its comment at the same time, or I can write the function and leave it without a comment. Once the function is committed to version control and the original ticket is closed, I can come back, declare that it's not okay to leave the function without a comment, and create a new ticket, artificially increasing the ticket count. The most terrifying part was that those programmers were using a primitive bug tracking system that took several minutes to create a single ticket. I bet they spend more time dealing with the bug tracker than actually writing code.

The code coverage

This is another popular metric. For some reason, some managers decided that 100% code coverage is great and anything less is not. And so, instead of measuring things that matter, they push for a 100% code coverage target.

The fact is, some blocks of code need to be tested in depth, while others are not particularly interesting to test. Covering getters and setters with unit tests, for instance, doesn't bring anything useful, but it does increase code coverage.
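To illustrate, here is the kind of test a blanket 100% target tends to produce. The `Book` class, its `title` property and the test are all hypothetical: the test executes every line of the getter and setter, so it raises coverage, yet it checks little more than that a stored value can be read back.

```python
import unittest


class Book:
    """Hypothetical class with a trivial getter and setter."""

    def __init__(self, title: str) -> None:
        self._title = title

    @property
    def title(self) -> str:
        return self._title

    @title.setter
    def title(self, value: str) -> None:
        self._title = value


class BookTitleTests(unittest.TestCase):
    # Executes every line of the getter and setter (so coverage goes up),
    # but only verifies that assignment followed by reading works.
    def test_title_round_trip(self) -> None:
        book = Book("Dune")
        book.title = "Neuromancer"
        self.assertEqual(book.title, "Neuromancer")
```

Run with the standard `python -m unittest` runner, it passes and makes the coverage report greener, while the parts of the code that actually deserve in-depth testing remain exactly as tested as before.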

Conclusion

In order to increase the quality of the product and the productivity of a team, one has to measure things. However, while some measurements are valuable, others can be not just meaningless, but actively harmful. Don't just measure things that are easy to measure: those metrics are usually uninteresting, and many of them are dangerous. Always think about the possible effects of a given metric, observe its actual effects, and act accordingly.
