Those things we measure

Arseni Mourzenko
Founder and lead developer
May 20, 2017
Tags: quality 36

A recent change in the Bookshelf service removed the counter which indicated how many books were displayed on the page. To be more precise, the counter is now visible only to the profile's owner, not publicly.

This is essentially a shift to substance over quantity.

Imagine you're looking at the profiles of two people. The first profile contains ten books. The second one contains one hundred books. Does it mean that the second person is ten times… better? Cleverer? More knowledgeable? More skillful? How is the number of books relevant?

A long time ago, I was interviewed by Azeo. When I started to talk about an exciting project I was working on, the interviewer interrupted me and asked how many projects I had worked on. She noted my answer in her notepad and seemingly decided that the details were not relevant. But why? Why would a person who built fifty e-commerce websites which all look similar be ten times more valuable than a person who worked in her whole career on five large-scale projects which are all different? And yet, for Azeo, they are, which is why this company should be avoided like the plague by anyone who cares about working with skillful colleagues.

This mistake is made by many, many companies. Recently, I was surprised to learn that a company I deeply respect actually bases its promotion decisions on the number of videos a person has watched on Pluralsight. Aside from the stupidity of this approach, there is the additional problem of encouraging employees to run the videos without actually watching them, just to make the statistics grow.

The reason which pushes companies to prefer quantity over substance is that it's much easier to gather some irrelevant numbers than to measure substance. Then, by giving the measure a misleading name, they make it look meaningful enough to be used in making decisions.

Let's take a few examples.

The number of hours in the office

The most popular metric in France, used to differentiate good developers from bad ones, is the number of hours they spend in the office. Someone who stays late in the evening and comes in early is automatically considered a valuable employee. Someone who doesn't work the minimum number of hours will have an uncomfortable talk with management, and will sooner or later be fired.

The uselessness of this metric is blatant. I can spend eight hours making it look like I'm working. This doesn't mean I actually do any useful work.

The number of lines of code

This metric was popular in the USA, and is now quite frequent in India: in some companies, more lines of code is better. This is possibly the dumbest metric there is, not only because of its irrelevance, but also because it forces programmers to write ugly code. You see, a piece of code which would usually be written in one line can often be written using two, three, four, or a dozen lines. Those artificial line breaks don't add anything, and just make the code much more confusing. For instance, this piece of code:

def syscall(self):
    logger = logging.getLogger("engine.system_calls")
    return self.syscall_nolog.on_stdout(

can also be written like this:

def syscall(
    logger = \

    return self.\

Quite unreadable, but if this is what the employer wants… More importantly, instead of 4 LOC, there are now 13 lines, more than three times as many.

Some companies noticed this pattern, and in order to prevent programmers from adding artificial lines, they decided to make it even worse: to count the logical lines of code. In principle, the LLOC metric cannot be tricked through artificial line breaks, although radon, a popular tool to compute metrics for Python, still reports an increase: 8 LLOC for the second piece of code, against 5 LLOC for the first one. Anyway, where the LOC measurement encouraged adding artificial line breaks, LLOC encourages adding artificial logical statements. Like this:

def syscall(self):
    parts = ["engine", "system_calls"]
    name = ""
    for p in parts:
        name += "."
        name += p

    no_dot = name[1:]
    logger = logging.getLogger(no_dot)

    tmp = True
    level = if tmp else logger.debug
    if tmp:
        return self.syscall_nolog.on_stdout(level)

This piece of code does the same thing as the original one, but in a very convoluted way. Its LLOC is 17, against the original 5. And it's a complete mess.
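To see both tricks side by side, here is a minimal sketch using only Python's standard `ast` module. The three `total` snippets are hypothetical stand-ins written for this illustration, not the code from the examples above, and `statement_count` is only a rough approximation of LLOC: it is not radon's algorithm and will not reproduce radon's figures.

```python
import ast


def physical_loc(source: str) -> int:
    """Count non-blank physical lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def statement_count(source: str) -> int:
    """Count statement nodes in the syntax tree (a rough stand-in for LLOC)."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))


# Hypothetical snippet: the straightforward version.
COMPACT = '''\
def total(prices):
    return sum(prices)
'''

# Same statements, padded with artificial line breaks.
LINE_BROKEN = '''\
def total(
        prices
):
    return \\
        sum(
            prices
        )
'''

# Same behavior, padded with artificial statements.
PADDED = '''\
def total(prices):
    result = 0
    for price in prices:
        result = result + price
    final = result
    return final
'''

for name, source in [("compact", COMPACT),
                     ("line-broken", LINE_BROKEN),
                     ("padded", PADDED)]:
    print(name, physical_loc(source), statement_count(source))
```

In this sketch, line breaks inflate only the physical count, while extra statements inflate both. Whichever number is rewarded, the code that scores best is the worst of the three.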

The number of closed tickets

As if the two previous metrics weren't enough, some employers invented another one: the number of tickets a person closes per month.

An Indian colleague who was working in a company where his promotion depended directly on the number of tickets he closed sent me a few screenshots of the tickets created by himself and his colleagues. It is barely believable that an employer could actually be so stupid as to force the employees to waste their time this way. One ticket, for instance, was about adding a comment to a function. You see, I can either write a function and its comment at the same time, or I can write the function and leave it without a comment. Once the code is committed to version control and the original ticket is closed, I may come back and say that it's not okay to leave the function without a comment; and so, a ticket will be created, artificially increasing the number of tickets. The most terrifying aspect was that those Indian programmers were using a primitive bug tracking system, which required several minutes to create a single ticket. I bet the programmers are spending more time dealing with the bug tracking system than actually writing code.

The code coverage

This is another popular metric. For some reason, some managers decided that 100% code coverage is great, and anything less than that is not. And so, instead of measuring things that matter, they push their 100% code coverage target.

The fact is, there are blocks of code which need to be tested in depth, and there are blocks of code which are not particularly interesting to test. Covering getters and setters with unit tests, for instance, doesn't bring anything useful, but does increase code coverage.
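A small sketch of the difference, with a hypothetical `Account` class invented for this illustration. Both tests execute lines of the class and would therefore raise a coverage figure, but only the second one checks rules that could plausibly be broken.

```python
class Account:
    """Hypothetical class: a trivial accessor pair plus one method with rules."""

    def __init__(self, balance=0):
        self._balance = balance

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        self._balance = value

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
        return self._balance


def test_accessors():
    # Executes the getter and the setter, so the covered-lines count grows,
    # yet it can fail only if plain attribute assignment stops working.
    account = Account()
    account.balance = 50
    assert account.balance == 50


def test_withdraw_rules():
    # Exercises the branches where a real mistake could hide.
    account = Account(100)
    assert account.withdraw(30) == 70
    for bad_amount in (0, -5, 71):
        try:
            account.withdraw(bad_amount)
        except ValueError:
            pass
        else:
            raise AssertionError(f"withdraw({bad_amount}) should have failed")
    assert account.balance == 70


test_accessors()
test_withdraw_rules()
```

A team chasing a 100% target is pushed to write many tests of the first kind, since they are the cheapest way to move the number.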


In order to increase the quality of the product and the productivity of a team, one has to measure things. However, while some measurements are valuable, others can be not just meaningless, but actively harmful. Don't just measure things that are easy to measure: usually, those metrics are not interesting, and many of them are dangerous. Always think about the possible effects of a given metric, observe the actual effects, and act accordingly.