Ascending and descending representations of quality

Arseni Mourzenko

Founder and lead developer

November 6, 2014

Many projects start with very low expectations for the quality of the source code and the project organization. The idea is to get something which simply works, start selling it, and then, either continuously or in one brutal step, refactor the code to achieve better quality and reorganize what should be reorganized at project level. In practice, it rarely works out that way.

Code bases tend to degrade over time. This applies to the code base of a product written by a team of inexperienced programmers who don't care about the product; it applies just as well to a code base maintained by an experienced team working in an Agile environment where strict standards are required and enforced nearly in real time. What differentiates the two is that in the first case the quality, already much lower at the beginning, will degrade much faster.

There are multiple factors which accelerate or slow down the decay. Some, such as coding standards enforced at commit or regression testing, slow the decay continuously. Others, such as code reviews or refactoring, increase the quality at a precise moment in time, giving some margin for further degradation.

Improving an already bad project is extremely complicated.

Firstly, the resistance against change may come from several places:

Developers themselves. Changing any habit, good or bad, is hard. When you're accustomed to seeing a project as something which doesn't deserve care, it takes a lot to convince you that from this moment on, quality is the first priority of the project.
Business people. A team released a product. It works, and it brings money to the company. Would it be more interesting to work on new exciting features, or to spend months rewriting what already works?
Customers. They want the product to work correctly. They want new features. They don't care about code coverage or continuous integration, nor do they care about what it costs the company to fix something which was originally built broken.

Secondly, introducing measures intended to raise quality in the middle of a project may be painful. At Pelican Design & Development, it is impossible to commit Python code which is not PEP 8 compliant. When making extensive changes, it is not unusual to see five to ten errors during commit. It hurts, especially when the commit is done late at night and all you want is to go to bed. While it hurts, it's still relatively easy to correct even ten errors, most of them simply related to indentation whitespace in empty lines. By comparison, I have a project I wrote in C# four years ago. When working on it, I was unaware of StyleCop. Today, a StyleCop scan stops at 1 000 warnings: that's the limit used by default. Coming back now and fixing all those coding standard errors would be not only time-consuming, but also risky: such monotonous work easily leads to human errors, and having to solve newly introduced bugs on a legacy project is out of the question.
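To make the idea of a commit-time check concrete, here is a minimal sketch. It is not the hook used at Pelican Design & Development, which the article does not describe, and it is not a PEP 8 checker: it covers only the two whitespace rules mentioned above, reported under the codes pycodestyle uses for them (W291 for trailing whitespace, W293 for whitespace on a blank line). The function name `whitespace_violations` is hypothetical.

```python
def whitespace_violations(source: str) -> list[tuple[int, str]]:
    """Return (line_number, code) pairs for two whitespace rules.

    Hypothetical stand-in for a real style checker: it only knows
    W291 (trailing whitespace) and W293 (whitespace on a blank line).
    """
    violations = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.rstrip(" \t")
        if stripped == line:
            continue  # no trailing spaces or tabs on this line
        # A line made only of whitespace is W293; anything else is W291.
        code = "W293" if stripped == "" else "W291"
        violations.append((number, code))
    return violations


sample = "def f():\n    return 1\n    \nx = 2  \n"
for number, code in whitespace_violations(sample):
    print(f"line {number}: {code}")
```

A commit hook built on this would run the function over each file being committed and exit with a non-zero status whenever the list is not empty, which is what makes the version control system refuse the commit. In practice one would call a complete checker rather than maintain such rules by hand.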

Imagine one introduces coding standards in the middle of the project. Are they enforced at commit?

If yes, this leads to a serious problem. With thousands and thousands of errors, the team would have to stop working and spend days solving those errors, then more days solving the newly introduced ones.
If no, nobody would care. There is a rule: anything which demands an effort and which is neither measured, nor enforced, won't be done. Nobody cares about tabs vs. spaces, unless you can't commit a piece of source code which uses tabs. Nobody cares about indentation, unless the IDE deals automatically with it. And nobody cares about comments, unless code which is not commented enough is rejected during code review.

On that subject, I recently had a case involving an SQL script which contained some data to insert. This SQL script was used by a PowerShell script during deployment. It was essential to keep it readable, and aligning values to form visual columns improved readability greatly. The alignment was done with spaces, but given that the file was modified by several people using different editors, tabs quickly appeared, creating a mess.

There was one effective way to prevent tabs. I added a check to the PowerShell script which, if tabs were found, stopped and alerted the user that there was a tab on a given line. During the first days, there were exchanges such as:

— The PowerShell script doesn't work!
— What's the problem with it?
— It shows an error.
— Which error?
— That there is a... wait, I think I can fix it myself.

Then the problem disappeared for good, and everything in the SQL script has been perfectly aligned ever since.
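The original check lived in the PowerShell deployment script and is not reproduced in this article. As a rough illustration of the same idea, here is a sketch in Python; the names `find_tabs` and `check_script` are hypothetical, and the error message is invented for the example.

```python
def find_tabs(text: str) -> list[tuple[int, int]]:
    """Return (line, column) of the first tab on each line that has one.

    Both numbers are 1-based, so they match what an editor displays.
    """
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        column = line.find("\t")
        if column != -1:
            hits.append((line_number, column + 1))
    return hits


def check_script(path: str) -> None:
    """Stop the deployment if the SQL file at `path` contains a tab."""
    with open(path, encoding="utf-8") as handle:
        hits = find_tabs(handle.read())
    if hits:
        details = ", ".join(f"line {line}, column {col}" for line, col in hits)
        # SystemExit with a string prints the message and exits with status 1.
        raise SystemExit(f"{path}: tab found at {details}; align with spaces.")
```

The important design choice is that the check fails loudly and names the exact line: the person who introduced the tab learns about it immediately and can fix it without asking anyone.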

Conclusion: start with the highest possible expectations; if you don't do it right now, you won't do it later. Then, stick with them as long as possible, using automated enforcement when possible.