Comments classification
This is a bad comment:
int i = 0;
i + 1; // adding 1 to i should work but dont work i dont know why!!!!
print(i);
For every good comment, I've seen hundreds of bad comments. They harm the code base and give the wrong impression to the author that he have made the code better. The following classification may help those programmers to understand why they should avoid writing most of their comments. Once they understand that, learning to write good comments wouldn't be too difficult.
Name | Harm level | Steps to remove | Indicative of programmer level |
---|---|---|---|
Extract-friendly comments | harmful | refactoring | beginner |
Ownership comments | harmful | immediate removal | newbie |
Misleading comments | harmful | immediate removal | N/A |
Redundant comments | annoying | immediate removal | newbie |
Code comments | annoying | immediate removal | newbie |
Naming comments | harmful | refactoring | beginner |
Lolcat comments | annoying | immediate removal | beginner |
Drunken man comments | harmful | immediate removal | newbie |
Prose comments | annoying | reformulation or removal | N/A |
I have a problem comments | annoying | reformulation or removal | N/A |
Explanatory comments | none | none | none |
Extract-friendly comments
A few months ago, I was talking with a beginner programmer who told me that he was proud of his recent progress, one of the enhancements to his code being the fact that he started to put more comments.
I've seen his code. It looks like this:
public class Product
{
// Other properties and methods.
public string ComputePrice(
Customer customer, PricingOptions options, bool includeRebate,
decimal defaultVAT, bool includeRanges,
List<PricingRange> pricingRanges, DateTime overrideStartTime)
{
// Let's get the base price.
<<Ten to twenty lines of code which compute the base price; no comments>>
if (includeRanges && overrideStartTime != DateTime.MinValue)
{
// The price depends on the princing ranges. Let's calculate it.
foreach (PricingRange r in pricingRanges)
{
if (r.Start >= DateTime.Now && r.End <= DateTime.Now)
{
<<Thirty lines of difficult to understand code; no comments>>
}
}
}
else if (includeRanges && overrideStartTime == DateTime.MinValue)
{
// Use the start time instead of now.
foreach (PricingRange r in pricingRanges)
{
if (r.Start >= DateTime.Now && r.End <= DateTime.Now)
{
<<Thirty lines of difficult to understand code which looks
like the code from the previous block, but has slight,
barely noticeable differences; no comments>>
}
}
}
// Including VAT.
<<Ten to twenty lines of code which compute VAT; no comments>>
if (includeRebate)
{
// Substract the rebate.
<<Ten to twenty lines of code which substract the rebate; no comments>>
}
}
return price;
}
Let's be fair: this programmer did include some comments. Are they useful? Sort of. Does it mean he understand basics of programming? Nah, not really.
There are many negative things to say about the code above, but I want to focus on a single thing: the lack of refactoring and how is this related to comments. When inspecting the signature and the body of the method, it becomes obvious how it grew. Originally, the method was short with few arguments; something like:
public string ComputePrice(
Customer customer, PricingOptions options, bool includeRebate)
Then there was a need to add default VAT. Then the ranges. And finally the ability to override the start time, at which time the method becomes particularly creepy.
Each addition was done without carying much about the readability. Instead of keeping the signature transparent, the arguments were just appended to the others. Today, without looking at the code, there is no way to even try to guess what overrideStartTime
is used for. The same careless, uncontrolled organic growth is clearly visible in the body of the method. It looks like when the programmer added overrideStartTime
, the code was already particularly unreadable, which explains (but not excuses) the ugly copy-paste of thirty LOC followed by small changes in the copied code.
Let's do the basic refactoring work which should have been done from the beginning. The first refactoring would consist of extracting the parts of the method. This is when I should admit that actual comments are not totally useless: they show pretty well the locations where the code should be cut out. After the refactoring, those comments could be removed with no effect rather than cleaning up the code base.
public class Product
{
// Other properties and methods.
public string ComputePrice(
Customer customer, PricingOptions options, bool includeRebate,
decimal defaultVAT, bool includeRanges,
List<PricingRange> pricingRanges, DateTime overrideStartTime)
{
// Ugly type here. We can't do anything with it for the moment, since
// both the decimal and the string are used later when including the VAT
// and the rebate.
Tuple<decimal, string> basePrice = this.ComputeBasePrice();
if (includeRanges && overrideStartTime != DateTime.MinValue)
{
// The price depends on the princing ranges. Let's calculate it.
foreach (PricingRange r in pricingRanges)
{
if (r.Start >= DateTime.Now && r.End <= DateTime.Now)
{
basePrice = this.ComputeBasePriceWithinRange(
customer, options, pricingRanges);
}
}
}
else if (includeRanges && overrideStartTime == DateTime.MinValue)
{
// Use the start time instead of now.
foreach (PricingRange r in pricingRanges)
{
if (r.Start >= DateTime.Now && r.End <= DateTime.Now)
{
basePrice = this.ComputeBasePriceWithinRange(
customer, options, pricingRanges, overrideStartTime);
}
}
}
Tuple<decimal, string> priceWithTaxes = this.IncludeVAT(
basePrice, customer, options, defaultVAT);
if (includeRebate)
{
return this.IncludeRebate(priceWithTaxes.Item1, customer);
}
return priceWithTaxes.Item2;
}
private Tuple<decimal, string> ComputeBasePrice()
{
<<Ten to twenty lines of code which compute the base price; no comments>>
}
private Tuple<decimal, string> ComputeBasePriceWithinRange(
Customer customer, PricingOptions options, List<PricingRange> pricingRanges)
{
<<Thirty lines of difficult to understand code; no comments>>
}
private Tuple<decimal, string> ComputeBasePriceWithinRange(
Customer customer, PricingOptions options,
List<PricingRange> pricingRanges, DateTime overrideStartTime)
{
<<Thirty lines of difficult to understand code which looks like the code
from the overload, but has slight, barely noticeable differences; no
comments>>
}
private Tuple<decimal, string> IncludeVAT(
Tuple<decimal, string> price, Customer customer, PricingOptions options,
decimal defaultVAT)
{
<<Ten to twenty lines of code which compute VAT; no comments>>
}
private string IncludeRebate(decimal price, Customer customer)
{
<<Ten to twenty lines of code which substract the rebate; no comments>>
}
}
Those thirty lines of duplicated code in ComputeBasePriceWithinRange
is really annoying. Hopefully, refactoring can solve that.
public class Product
{
// Other properties and methods.
public string ComputePrice(
Customer customer, PricingOptions options, bool includeRebate,
decimal defaultVAT, bool includeRanges,
List<PricingRange> pricingRanges, DateTime? overrideStartTime)
{
// Ugly type here. We can't do anything with it for the moment, since
// both the decimal and the string are used later when including the VAT
// and the rebate.
Tuple<decimal, string> basePrice = this.ComputeBasePrice();
if (includeRanges)
{
foreach (PricingRange r in pricingRanges)
{
if (r.Start >= DateTime.Now && r.End <= DateTime.Now)
{
basePrice = this.ComputeBasePriceWithinRange(
customer, options, pricingRanges, overrideStartTime);
}
}
}
Tuple<decimal, string> priceWithTaxes = this.IncludeVAT(
basePrice, customer, options, defaultVAT);
if (includeRebate)
{
return this.IncludeRebate(priceWithTaxes.Item1, customer);
}
return priceWithTaxes.Item2;
}
private Tuple<decimal, string> ComputeBasePrice()
{
<<Ten to twenty lines of code which compute the base price; no comments>>
}
private Tuple<decimal, string> ComputeBasePriceWithinRange(
Customer customer, PricingOptions options,
List<PricingRange> pricingRanges, DateTime? overrideStartTime = null)
{
<<Thirty lines of difficult to understand code; no comments>>
}
private Tuple<decimal, string> IncludeVAT(
Tuple<decimal, string> price, Customer customer, PricingOptions options,
decimal defaultVAT)
{
<<Ten to twenty lines of code which compute VAT; no comments>>
}
private string IncludeRebate(decimal price, Customer customer)
{
<<Ten to twenty lines of code which substract the rebate; no comments>>
}
}
Another particularly annoying thing is the lack of appropriate types, which became only worse with the introduction of tuples during the refactoring.
public class Product
{
// Other properties and methods.
public Price ComputePrice(
Customer customer, PricingOptions options, bool includeRebate,
decimal defaultVAT, RangeOptions rangeOptions)
{
var basePrice = this.ComputeBasePrice();
if (rangeOptions.IncludeRanges)
{
foreach (PricingRange r in rangeOptions.PricingRanges)
{
if (r.Start >= DateTime.Now && r.End <= DateTime.Now)
{
basePrice = this.ComputeBasePriceWithinRange(
customer, options, pricingRanges,
rangeOptions.OverrideStartTime);
}
}
}
var priceWithTaxes = this.IncludeVAT(
basePrice, customer, options, defaultVAT);
return includeRebate ?
this.IncludeRebate(priceWithTaxes, customer) :
priceWithTaxes;
}
private Price ComputeBasePrice()
{
<<Ten to twenty lines of code which compute the base price; no comments>>
}
private Price ComputeBasePriceWithinRange(
Customer customer, PricingOptions options,
List<PricingRange> pricingRanges, DateTime? overrideStartTime = null)
{
<<Thirty lines of difficult to understand code; no comments>>
}
private Price IncludeVAT(
Price price, Customer customer, PricingOptions options, decimal defaultVAT)
{
<<Ten to twenty lines of code which compute VAT; no comments>>
}
private Price IncludeRebate(Price price, Customer customer)
{
<<Ten to twenty lines of code which substract the rebate; no comments>>
}
}
Do all those methods need to know what they know from the parameters? Probably not.
public class Product
{
// Other properties and methods.
public Price ComputePrice(
Customer customer, PricingOptions options, bool includeRebate,
decimal defaultVAT, RangeOptions rangeOptions)
{
var basePrice = this.ComputeBasePrice();
if (rangeOptions.IncludeRanges)
{
foreach (PricingRange r in rangeOptions.PricingRanges)
{
if (r.Start >= DateTime.Now && r.End <= DateTime.Now)
{
basePrice = this.ComputeBasePriceWithinRange(
customer.RegisterDate, options.EnableVipStatus,
pricingRanges, rangeOptions.OverrideStartTime);
}
}
}
var priceWithTaxes = this.IncludeVAT(
basePrice, customer.vatOverride, options.taxMap, defaultVAT);
return includeRebate ?
this.IncludeRebate(priceWithTaxes, customer.RetrieveRebateFor(this)) :
priceWithTaxes;
}
private Price ComputeBasePrice()
{
<<Ten to twenty lines of code which compute the base price; no comments>>
}
private Price ComputeBasePriceWithinRange(
DateTime customerRegisterDate, bool enableVipStatus,
IEnumerable<PricingRange> pricingRanges, DateTime? overrideStartTime = null)
{
<<Thirty lines of difficult to understand code; no comments>>
}
private Price IncludeVAT(
Price price, decimal customerVAT, TaxMap vatMap, decimal defaultVAT)
{
<<Ten to twenty lines of code which compute VAT; no comments>>
}
private Price IncludeRebate(Price price, Rebate customerRebate)
{
<<Ten to twenty lines of code which substract the rebate; no comments>>
}
}
The ComputePrice
method is now readable enough for this article, but let's just push the refactoring a bit further by introducing a builder pattern.
internal class PriceBuilder
{
public PriceBuilder(Product product)
{
this.
}
public FindBasePrice()
{
<<Ten to twenty lines of code which compute the base price; no comments>>
}
public FindBasePriceWithinRange(
RangeOptions rangeOptions, DateTime customerRegisterDate,
bool enableVipStatus, IEnumerable<PricingRange> pricingRanges,
DateTime? overrideStartTime = null)
{
if (!rangeOptions.IncludeRanges)
{
return this;
}
foreach (PricingRange r in rangeOptions.PricingRanges)
{
if (r.Start >= DateTime.Now && r.End <= DateTime.Now)
{
<<Thirty lines of difficult to understand code; no comments>>
}
}
}
private PriceBuilder IncludeVAT(
decimal customerVAT, TaxMap vatMap, decimal defaultVAT)
{
<<Ten to twenty lines of code which compute VAT; no comments>>
}
private PriceBuilder IncludeRebate(Rebate customerRebate)
{
if (customerRebate)
{
return this;
}
<<Ten to twenty lines of code which substract the rebate; no comments>>
}
private Price Build()
{
// Return the price.
}
}
public class Product
{
// Other properties and methods.
public Price ComputePrice(
Customer customer, PricingOptions options, bool includeRebate,
decimal defaultVAT, RangeOptions rangeOptions)
{
var builder = rangeOptions.IncludeRanges ?
new PriceBuilder().FindBasePriceWithinRange() :
new PriceBuilder().FindBasePrice();
return builder
.IncludeVAT(customer.vatOverride, options.taxMap, defaultVAT)
.IncludeRebate(includeRebate ? customer.RetrieveRebateFor(this) : null)
.Build();
}
}
If we look at the way the base price is computed and think twice about object oriented programming, we can't avoid noticing that this is the classical case where if logic can be replaced by inheritance. For that, we make PriceBuilder
abstract and create two children: OrdinaryPriceBuilder
and RangePriceBuilder
. The chain can then be replaced by:
return builder
.FindTaxFreePrice()
.IncludeVAT(customer.vatOverride, options.taxMap, defaultVAT)
.IncludeRebate(includeRebate ? customer.RetrieveRebateFor(this) : null)
.Build();
simplifying even more the code.
The ugliest part is now the following piece of code:
foreach (PricingRange r in rangeOptions.PricingRanges)
{
if (r.Start >= DateTime.Now && r.End <= DateTime.Now)
{
<<Thirty lines of difficult to understand code; no comments>>
}
}
This is solvable as well. First, PricingRange
may include a method Contains
which takes a date as its single parameter. With correctly renamed variables, the code now becomes:
foreach (PricingRange range in rangeOptions.PricingRanges)
{
if (range.Contains(DateTime.Now))
{
<<Thirty lines of difficult to understand code; no comments>>
}
}
The property IsCurrent
may then be created within the PricingRange
:
public bool IsCurrent
{
get
{
return this.Contains(DateTime.Now);
}
}
and, unless the code was object of profiling and optimization, the further readability enhancement should be done by removing one level of indentation:
foreach (PricingRange range in rangeOptions.PricingRanges.Where(r => r.IsCurrent))
{
<<Thirty lines of difficult to understand code; no comments>>
}
Optionally, one can push this even further, by encompassing the list of pricing ranges within a class such as class PricingRanges : IEnumerable<PricingRange>
, and create a specific method which selects only the current ranges. The foreach
then becomes:
foreach (PricingRange range in rangeOptions.PricingRanges.Current)
{
<<Thirty lines of difficult to understand code; no comments>>
}
What we obtain now? A mix between clear, object oriented code, and blocks of ten to thirty lines of not always obvious code. Each of those blocks should be refactored in the same way we did above. There are few chances to have a few lines which are too obscure and practically impossible to make better through refactoring. Then and only then, comments may be used to enhance the readability.
We have seen on this example one type of comment: the one which tries to explain what the following block of code is doing, but mostly cries for refactoring. I call it extract-friendly comment. What are other types of comments?
Let's study the method which subtracts the rebate in the following sections.
private PriceBuilder IncludeRebate(Rebate customerRebate)
{
if (customerRebate)
{
return this;
}
var result = this.Clone();
foreach (var productRebate in this.Product.Rebates)
{
// Ensure the rebate applies to this product.
if (productRebate.Id == customerRebate.Id)
{
// Ensure the rebate is gold and applies globally.
if (productRebate.AppliesGlobally ||
(customerRebate.IsGold &&
customerRebate.CanEnlistFor(this.Product)))
{
var errors = customerRebate.Enlist(this.Product);
if (!errors.Any())
{
//rebate.MarkAsApplied();
//if (!rebate.IsApplied)
//{
// throw new Exception("Not working!");
//}
result.Price.Amount -= customerRebate.Amount;
}
}
else
{
// Changed on 2013-10-08 by Jeff.
if (this.Product.CanAcceptExceptionalRebates() &&
(customerRebate.IsSilver || customerRebate.IsGold))
{
// In the current case, the rebate can apply to the products
// with the price superior or equal to $20.
if (result.Price.Amount > customerRebate.Amount &&
result.Price.Amount > 20)
{
// Substracting the amount.
result.Price.Amount -= customerRebate.Amount;
}
}
}
}
}
return result;
}
Ownership comments
The comment // Changed on 2013-10-08 by Jeff.
is an ownership comment. Instead of helping understanding the code, it pollutes it with useless information and shows that the author doesn't understand or doesn't use version control.
If the author doesn't use version control, there is no point talking about usefulness of comments: there are more important things to do, i.e. starting to use tools which are mandatory for every team of one or more programmers. If the author uses version control, this comment is either redundant (the log already shows the author and the date), or wrong (for example if the code was copy-pasted by Nick in January 2014).
Such comments are harmful and should be killed immediately.
Misleading comments
Some comments may simply be wrong. They are particularly harmful, since they can lead to bugs and force programmers to spend more time reading code than they need to. This is the case of the comment // Ensure the rebate is gold and applies globally.
. The condition which follows doesn't check for that. It checks whether the product rebate applies globally, or the customer rebate has a gold status or can be enlisted for the specific product. This has nothing to do with the comment.
The comment // In the current case, the rebate can apply to the products with the price superior or equal to $20.
falls into this category as well, and is even more dangerous. While it's easy to see that the first one is misleading, it takes some time to notice the slight mistake in the second one: the price is not superior or equal, but just superior. Comments which contain subtle errors are the most dangerous.
Such comments are harmful and should be killed immediately.
Redundant comments
Lots of comments I see when inspecting code bases are simply repeating code. Even when they don't fall into the previous category, they add too much noise to deserve any screen space.
// Subtracting the amount.
, for example, is a redundant comment. It doesn't bring anything useful, because any programmer who knows the meaning of -=
understands what the following line is doing. The comment // In the current case, the rebate can apply to the products with the price superior or equal to $20.
, aside being misleading, is redundant as well. It tries to prove its usefulness by specifying the currency, but would become completely useless after basic refactoring which will replace the magical 20
by either a property or a constant such as MinProductPriceForExceptionalRebate
.
Such comments are useless and make too much noise. They should be removed.
Code comments
Commented code is another source of noise currently seen in the code of newbie programmers.
//rebate.MarkAsApplied();
//if (!rebate.IsApplied)
//{
// throw new Exception("Not working!");
//}
Is problematic for several reasons:
It doesn't help understanding the rest of the code.
Nobody knows why this piece of code was kept.
The information is redundant with version control.
The code is obsolete and cannot be used anyway. In the current case, it also has important issues, which may indicate that the code was only temporary, but may also mean that the author doesn't know what he does.
Commented code usually wastes an important amount of place. I've seen blocks of code spanning hundreds of lines.
Such comments should be treated similarly to redundant comments: be removed.
Naming comments
The other comments which invite refactoring are the comments which are here to explain a variable or a method with a name which is not clear enough. The refactoring is obvious: name the variable or the name differently and remove the comment.
Example:
// Finding the highest rebate for the current product to be used as primary rebate.
var pr = this.BestMatch();
should be refactored into:
var primaryRebate = this.FindHighestRebate();
Lolcat comments
Some comments, while being written with a good intention, are simply too badly written to make the code easier to read.
I already gave a few examples of such comments in my answer to Should sanity be a property of a programmer or a program? question. In general, those comments include:
Excessive punctuation, especially the exclamation mark. I'm happy for the original author to be so excited about the comment, but I would appreciate more if he took more care writing the comment itself.
Example:
// Got the final result!!!!
All capitals or the lack of capitals when needed. This doesn't include words and names which are not obvious to write (is it jQuery or JQuery? Is it LESS and Sass or Less and SASS?), but is more about a complete lack of attention to capitalization rules.
Example:
// the transaction is about to be COMMITTED! if it fails, the counters will not be affected and the ajax call will fail with http 304.
Grammar and spelling.
Example:
// in some case there are no enouh 3 space on harddisk so file cannot save
Mix of languages. This is proper to foreign programmers. I had to read code of many French programmers and found some magnificent sentences. While many beginners don't understand that it is mandatory to know English to do programming, people who know English often use this argument to explain why are they mixing two languages together. They are not being honest; the fact is, they are too lazy to write a complete sentence in either English or French, and mixing multiple languages enables them to not making any effort to formulate the sentence in the first place.
Example:
// Le 2ème product sera traité seulement si enabled est à true.
Lolcat comments give a bad image of the code base and distract the programmers who read them. Such comments should be transformed into well formulated, correctly written pieces of text or removed.
Drunken man comments
Drunken man comments are similar to lolcat comments, but combine usually all the elements of lolcat comments inside a badly constructed, usually meaningless sentence. It looks like the author was writing it at 3 AM and is not clear why he did it or for whom.
Sometimes, those comments are fruit of people who really don't speak English. Sometimes, they are indeed written at 3 AM after six bottles of beer.
Example: //not doin' but since same value it still swap!!
Such comments are harmful and should be killed immediately.
Prose comments
Some programmers are too afraid writing lolcat comments, so they do the opposite. They start telling too much, instead of being concise.
Example:
// Since both values (the value of “original” and the value of “current”) can be compared together, we can start comparing them in order to know if the value was changed since the last time the entity was loaded. This comparison will work correctly at this point, since we already compared the types before, which means that we just have to compare the values themselves. Once this is done, we can decide whether we should continue or not.
A presence of a comment indicates that the piece of code is not as clear as it could be, i.e. a flaw. There is no need to focus the attention of the reader on this flaw by making long comments. The comment should be as short as possible: it should contain all the information the reader could need right now, but only this information. Any additional verbosity is harmful.
In the same way, there is no need to use figures of speach or rich vocabulary.
If a programmer wants to show things he learnt taking writing lessons, he can share his thoughts in a blog post or a book. Comments are not a good place for that, and anything not essential and not constructive should be removed.
I have a problem comments
A few programmers don't write prose or poetry between lines of code, but still love talking about their problems. While it might be helpful, we don't really care about their problems, so the formulation is not the best one.
The danger in those comments is that while some reflect problems within the code, others are not constructive and unrelated to code.
Example of a constructive but badly formulated comment:
// I can't find a way to make this work for values inferior or equal to 0. if (quantity <= 0) { throw new NotImplementedException(); }
This comment indicates there is an issue within the code: it doesn't work for some input values. The next programmer would find such comment helpful, and may react to it by inspecting the requirements (should negative or zero quantity be a valid input?) and eventually the code (why the code fails for values inferior or equal to zero?)
While useful, the comment may be reformulated to be more neutral:
// For values inferior or equal to 0, the code fails for an unknown reason.
This helps differentiating it better from the second type of "I have a problem" comments: inconstructive ones.
Example of an inconstructive comment:
// There is a second argument, but I don't understand what it does. this.Validate(action, 0);
This is not helpful. Nobody cares that the original author understands or not the code, and usually, the lack of understanding is visible enough through the code quality to require additional comments from the author.
Explanatory comments
The final category represents the comments which explain in a constructive and concise way things which are not explicit in the current code and can't be made explicit through easy refactoring.
Example:
// When using Unicode, strings normalized differently would be considered different, while users would expect them to be considered equal. For that reason, the strings are denormalized first.
This is the only category of comments which should be kept. Well, until the code is refactored further, or the language or framework evolves to enable more expressiveness within the source.
How to avoid bad comments in the code base?
Code reviews are a precious way to get rid of bad comments and teach beginners why those comments are bad. The primary question which should be asked during the code review when looking specifically for bad comments is:
What does the comment bring to the clarity of the code, and why would it be overwhelming to change the code in a way that it would be self-documenting?
This would lead to four statuses:
Useless comments. Those ones can simply be removed immediately with no loss of information for the readers of the code. Most categories are described below are useless comments.
Comments which can be removed after changing code. Those ones are useful right now, but won't be once the code is properly refactored. Instead of being truly useful, they are more indicative of the issues within the code itself.
Useful comments which are formulated badly. Those ones are useful in a context of a reasonably written piece of code, but their form is not appropriate for a comment. The current formulation may lead the reviewers to think that the comment is useless, but closer inspection will show that it can't be simply removed.
Useful, correctly phrased comments. Those ones are the rare comments which are short, sharp and constructive comments which bring light to an obscure piece of code which can't be refactored easily.
Aside code reviews, showing examples of bad and good comments may be helpful. I strongly recommend looking at .NET source, since it is an excellent example of high-quality comments. Inspecting different open source projects written by expert developers may be helpful as well.