Elaboration through sharing

Arseni Mourzenko
Founder and lead developer
December 13, 2014
Tags: stack-exchange, communication, short

The risk of working alone is that one may settle on a simplified vision of a problem and become unable to see beyond this simple model. This happens to me a lot; today, I have an excellent illustration of the problem.

Everything started with the question "How should I handle logger failures?" The author of the question was wondering how to deal with exceptions that occur within the exception logger itself.

Easy peasy: I had handled this in dozens of projects before, including a legacy in-house logging and reporting system that collected exceptions from server applications and services and let developers process the list of exceptions and deal with each one. When logging itself failed (for example, because the database was down), the exceptions were stacked locally on disk and reported later, once the logger was working again. An exception within the logger wrapped the originally reported exception through InnerException. There were tests for the fallback to ensure it worked as expected, so I was confident that the approach was good enough for any business app.
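
A minimal sketch of that kind of fallback, written in Python for illustration, could look like the following. It is not the code of the system described above: FallbackLogger, LoggerFailure, the send callable, and the JSON-lines spool file are all hypothetical names and choices, and Python's `__cause__` merely stands in for the role InnerException plays.

```python
import json
import time
from pathlib import Path


class LoggerFailure(Exception):
    """The log sink failed; the originally reported exception is in __cause__."""


class FallbackLogger:
    def __init__(self, send, spool_path):
        self._send = send                # hypothetical callable delivering one entry
        self._spool = Path(spool_path)   # local file used while the sink is down

    def report(self, original):
        """Report an exception; spool it on disk if the sink itself fails."""
        entry = {"time": time.time(), "error": repr(original)}
        try:
            self._send(entry)
        except Exception as sink_error:
            failure = LoggerFailure(repr(sink_error))
            failure.__cause__ = original     # keep the original exception attached
            entry["logger_error"] = str(failure)
            with self._spool.open("a", encoding="utf-8") as spool:
                spool.write(json.dumps(entry) + "\n")

    def flush(self):
        """Resend spooled entries; return how many were delivered."""
        if not self._spool.exists():
            return 0
        lines = self._spool.read_text(encoding="utf-8").splitlines()
        delivered = 0
        for line in lines:
            try:
                self._send(json.loads(line))
            except Exception:
                break                        # sink still down: keep the rest for later
            delivered += 1
        remaining = lines[delivered:]
        if remaining:
            self._spool.write_text("\n".join(remaining) + "\n", encoding="utf-8")
        else:
            self._spool.unlink()
        return delivered
```

Calling flush() periodically, or before each new report, is one way to replay the spooled entries once the sink is reachable again.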

That was what my answer on Stack Exchange was about.

Then, Jon Raynor added a different view of the subject. In particular, he mentioned two things I had never thought of:

  1. The distinction between critical and non-critical logging. In the case of critical logging, the application should simply stop when logging fails, which, indeed, is the only acceptable solution. Since I had worked exclusively on applications with non-critical logging, I had never thought about the difference.

  2. The importance of log message frequency. In other words, if an application has been reporting 5 messages per minute on average for the last six months, but hasn't reported any message for the last two days, chances are something is wrong with the application or the logging.

    I could have grasped that aspect, since one of the visualizations I often use for logging is a chart showing the number of messages per minute or per hour. But the primary goal of those charts was to react quickly to a sudden spike. This happened twice, when Continuous Deployment pushed to production an app that wasn't tested enough, which caused thousands of errors to hit the logging platform within the next few minutes.
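
Both points can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the write callable, and the factor of 50 used as a silence threshold are hypothetical and would need tuning for a real application.

```python
def log_with_policy(write, message, critical):
    """Write a message; when logging is critical, a failure stops the application."""
    try:
        write(message)
    except Exception as error:
        if critical:
            # Critical logging: refuse to keep running without a log record.
            raise SystemExit(f"critical logging failed: {error!r}")
        # Non-critical logging: the application carries on without this message.


def is_suspiciously_silent(last_message_at, now, average_per_minute, factor=50.0):
    """Return True when the gap since the last message dwarfs the usual gap.

    Times are in seconds; `factor` is an arbitrary, hypothetical threshold.
    """
    if average_per_minute <= 0:
        return False                      # no baseline to compare against
    expected_gap = 60.0 / average_per_minute
    return (now - last_message_at) > factor * expected_gap
```

With an average of 5 messages per minute, the usual gap is 12 seconds, so two days of silence is far beyond any reasonable threshold, while a 30-second pause is not.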

Then, Aaronaught commented on my answer, adding two other things I had never really thought about:

That's really disturbing, because I believed I had thought enough about handling exceptional behavior within the logger itself, yet I had never taken those two things into consideration (nor did I know what exponential backoff was before looking it up on Wikipedia).
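
For readers in the same situation: exponential backoff means waiting progressively longer between retries, typically multiplying the delay by a constant factor up to some cap. Below is a minimal Python sketch of the general technique, not something taken from the discussion; backoff_delays, send_with_backoff, and all default values are hypothetical.

```python
import time


def backoff_delays(base=0.5, factor=2.0, cap=60.0, attempts=6):
    """Yield the wait before each retry: base, base*factor, ... never above cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def send_with_backoff(send, record, sleep=time.sleep, **delay_options):
    """Call send(record); on failure, retry after exponentially growing delays."""
    try:
        return send(record)
    except Exception as error:
        last_error = error
    for delay in backoff_delays(**delay_options):
        sleep(delay)                      # wait longer after every failed attempt
        try:
            return send(record)
        except Exception as error:
            last_error = error
    raise last_error                      # every retry failed: surface the last error
```

Passing sleep as a parameter keeps the sketch testable without real waiting. Production implementations often add random jitter to the delays as well, which is left out here.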

Now, the next time I implement logging, I'll have at least three things to reconsider and do differently.

This is a good example of why it is so crucial to share one's own ideas and solutions with others and see what they think. No matter how confident a person is that their approach is fine, they have probably missed a lot of aspects and details.