Utility classes are wrong

Arseni Mourzenko
Founder and lead developer
January 20, 2017
Tags: rant, terminology

Everyone has somewhere on a PC a directory called “miscellaneous”, or simply “misc.” This is where we put orphaned stuff which hasn't found its place somewhere else, grouped with similar files in a well-named directory. For some people, the desktop directory is the actual location for miscellaneous stuff.

In a similar manner, most projects I've seen developed a tumor over time and gave it the ugly name “utility.” What happens is that the project grows organically, and the architecture and overall code organization don't necessarily keep up, either because the team doesn't have time for refactoring, or because nobody has enough knowledge of the overall project structure, or simply because no one cares. Discrepancies appear, and sooner or later, someone creates a class which doesn't fit well anywhere. So the class ends up in a namespace or package which has “utility” in its name.

The second kind of “miscellaneous”-type tumor doesn't have a single project as its scope, but either a group of projects or, potentially, all the projects of a company. What happens is that someone, somewhere, needs to use in one project some code which was already written for a different project within the same company. With all the good intentions, the developer, instead of duplicating the code, makes a shared library or package and, to highlight the reusable aspect of the new library, calls it “utility.”

In the following two sections, I'll explain why both usages of the word “utility” are problematic by themselves and indicative of a larger issue.

“Util­i­ty” as a part of a pro­ject

When I was talking earlier in this article about the “miscellaneous” directory, I said nothing about the reasons which lead to the creation of such a directory in the first place. The mutually exclusive nature of directories (a file lives in exactly one of them), coupled with a flawed tree structure, is the major reason we end up with the mess. Given the rigidity of file system tree structures and the fact that an established structure doesn't evolve easily, it is not surprising that many files cannot find their place in the hierarchy.

Source code, too, takes the form of a tree structure with limited depth and typed levels. Libraries contain packages. Packages contain classes. Classes contain methods. As with the file structure, some classes and methods will fit perfectly well within the existing structure, while others will tend to end up on the margins.

When the classes or methods which don't find their place in the current code structure are labeled as “utility,” this simply means that the team was unable to adapt the structure to the newcomer. As I said at the beginning of this article, there may be multiple reasons for that, but whatever the reason, the presence of such classes or methods is indicative of a legacy structure which wasn't refactored over time. This shouldn't happen in a correctly maintained project.
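As an illustration, here is a minimal Python sketch of what adapting the structure to the newcomer could look like. Everything in it is hypothetical: the `Utility`, `Price` and `EmailAddress` names and their methods are invented for the example and don't come from any real project.

```python
from dataclasses import dataclass


class Utility:
    """Before: unrelated helpers with no home, collected under a catch-all name."""

    @staticmethod
    def format_price(cents: int) -> str:
        return f"{cents // 100}.{cents % 100:02d}"

    @staticmethod
    def normalize_email(raw: str) -> str:
        return raw.strip().lower()


@dataclass(frozen=True)
class Price:
    """After: price formatting lives with the concept it belongs to."""

    cents: int

    def __str__(self) -> str:
        return f"{self.cents // 100}.{self.cents % 100:02d}"


@dataclass(frozen=True)
class EmailAddress:
    """After: e-mail normalization gets its own well-named type."""

    value: str

    @classmethod
    def parse(cls, raw: str) -> "EmailAddress":
        # Same normalization as Utility.normalize_email, but under a name
        # which tells the reader where to look for it.
        return cls(raw.strip().lower())
```

The behavior is the same in both versions. What changes is that a developer looking for price formatting would search for something called `Price`, and would probably never think of opening a class called `Utility`.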

An­oth­er pos­si­bil­i­ty is that the thing which was la­beled as util­i­ty was ac­tu­al­ly put in the right place with­in the code hi­er­ar­chy, but just in­cor­rect­ly named. This, in­deed, hap­pens a lot. There are two rea­sons for that:

“Util­i­ty” as a li­brary shared across pro­jects

Sharing code between projects prevents duplication, which is a Good Thing™ in any situation. And so, companies create utility libraries which grow, and grow, and grow over time. Their growth is usually horizontal; in other words, new components are added to a collection of old ones. Since all components are relatively independent (there are really few things in common between serializing an object to JSON, logging an event, reading a file and creating a transaction over a network), the term “utility” is often used for lack of a better alternative. Personally, I prefer “Ali Baba,” but since I recently explained that cultural references have no place in our code, I'm afraid that the term won't stick well within the community.

The problem here is that as the library grows, it becomes more and more cumbersome. Instead of explaining the concept itself, I will merely give two examples from two ecosystems.

This mess is ex­act­ly what com­pa­nies tend to pro­duce when they cre­ate the “util­i­ty” pro­ject for every­thing which is used by two or more pro­jects.

The solution? Create separate packages or libraries. If two projects share code which produces EDI invoices, create an EDIFACT package for that. Name it EDIFACT. The next time you write code which generates secure passwords based on some fancy criteria, create a separate package for that and name it accordingly. Don't mix it with EDI under the “utility” umbrella, because developers who need to generate invoices may not necessarily need to generate passwords, and developers who work with passwords may never hear about EDIFACT.
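The difference can be sketched in a few lines of Python. This is only a toy model under invented assumptions: the project names, the component names and the third-party dependency names are all hypothetical. It compares what a project ends up pulling in when everything is shipped as one “utility” package with what it pulls in when each component is a separate package.

```python
# Hypothetical shared components and the third-party dependencies each one needs.
COMPONENT_DEPENDENCIES = {
    "edifact": {"edi-parser"},
    "passwords": {"crypto-lib"},
    "json_serialization": {"json-lib"},
}

# Hypothetical projects and the shared components they actually use.
PROJECT_NEEDS = {
    "billing": {"edifact"},
    "accounts": {"passwords", "json_serialization"},
}


def dependencies_with_utility_package(project: str) -> set:
    """One big "utility" package: every project gets every dependency."""
    pulled = set()
    for dependencies in COMPONENT_DEPENDENCIES.values():
        pulled |= dependencies
    return pulled


def dependencies_with_separate_packages(project: str) -> set:
    """Separate packages: a project gets only what its components need."""
    pulled = set()
    for component in PROJECT_NEEDS[project]:
        pulled |= COMPONENT_DEPENDENCIES[component]
    return pulled
```

In this model, the hypothetical `billing` project only needs `edi-parser` when the packages are separate, while the single “utility” package forces the password and JSON dependencies on it as well.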

The ability to run private npm, PyPI or NuGet repositories means that you don't even need to handle dependencies yourself, even for projects which are not open source. So for your own good, please, don't put together things which have nothing to do with each other: keep them separate, and reuse the ones you need to, when you need to.