Reviewing code through dialogue
Recently, I tried a new technique which mixes pair programming and code review. I don't know if this technique has a name, and I don't even know if it is used by somebody else—sure, I haven't seen it elsewhere.
My colleague and I picked a piece of code he worked on. As he's a junior programmer, unfamiliar with many of the details of C# and .NET Framework, there were two goals in what I did. First, I wanted to elucidate the misconceptions and misunderstandings he has with the technologies he tries to use. Second, and much more important, I wanted to show him what exactly should he put focus on, how should he think, when he codes, what questions should he ask to himself. I miserably failed at both points; perhaps because it was new for me, likely too because he wasn't ready for this sort of things. Nevertheless, I believe that the technique can be very effective in some cases—it would take me some time to identify, what cases exactly, and what are the limits.
It goes like this. We take one line of code—not necessarily the first one; it may be anything anywhere in the newly written code—and then we talk about it. Say, we take this line of code:
// Counting opportunities which are active. int counter = this.Opportunities .Where(opportunity => opportunity.Enabled) .ToList() .Count();
When I see this, I have some questions which pop in my head. So I start with the first one, and ask my colleague:
— What do you think
ToList() is doing?
— No idea.
— Do you know the difference between collections and sequences in C#?
— I don't think so.
— Do you know about lazy evaluation?
— I do, I use it all the time in Python, you know, the square brackets and the parenthesis...
— Exactly. So you do have the same thing in C#, but it is a bit implicit. There are sequences, which are based on an iterator, and there are collections, which have their elements stored in-memory. Some operations in LINQ expressions are lazy and return a sequence; others evaluate the underlying sequence, and return a collection. Others, finally, evaluate a sequence and return something else—a number, a boolean, whatever. The
ToList() returns a collection. Do we really need that?
— I imagine that we don't.
— Don't imagine. If you think we don't need it, just explain why.
— Because it would store the elements in memory, just in order to count them later on.
— Exactly. As
ToList() is directly followed by
Count(), there is absolutely no need to do that, and it wastes memory, and CPU cycles.
Count() can simply iterate through the elements of the sequence, and get to the same result faster, with less memory being used. Let's change our code.
// Counting opportunities which are active. int counter = this.Opportunities.Where(opportunity => opportunity.Enabled).Count();
— Okay, so you have a
Where() clause followed by a
Count(). Do you see how we could compact this?
— I don't know.
— Do you know the overloads of
— Is the term “overload” clear?
— Yes, of course.
— What does it mean?
— It means that a function can have optional arguments.
— Not arguments, but parameters; and it's not about optional parameters per se, but rather about the ability to have multiple methods that share the same name, but differ by their parameters, allowing the compiler to figure out which one of the overloads to call at a given moment. So, how do you know what are the available overloads of a method in C#?
— I suppose I can get this from the documentation.
— You suppose?
— I can get this from the documentation.
— Please, do.
— Let's search on the internet...
— Is there an easier way?
— You have your Visual Studio opened. Wouldn't it show you the overloads?
— Oh, sure. Let's check. So, there is a parameterless one, and another one which seems to ask for... a predicate?
— Seems? You don't look sure. It is because you don't know what a predicate is?
— Yeah, the term is a bit unclear.
— A predicate is a condition which needs to be fulfilled. In LINQ, the term predicate is used to denote that only the elements of a sequence which do match a specified condition will pass to the next step of the expression—or be part of the final result. When you used
Where() in the expression, you actually specified a predicate. Now, it happens that
Count(), too, accepts a predicate. Do you remember what was our original question we were asked ourselves?
— I don't think so...
— It was whether the expression can be simplified or not. Do you see a way now?
— I could put the predicate into the
— And get rid of
— Is this a question?
— And I could get rid of
— Terrific. Let's do it.
// Counting opportunities which are active. int counter = this.Opportunities.Count(opportunity => opportunity.Enabled);
opportunity look appealing to you?
— It has a merit of being clear.
— Sure it is. My personal taste is to use short names for narrow scopes. Essentially, here, I would go with an ugly and unclear
x, with its sole benefit of being really short. But it's up to you to pick the best solution.
— I think it's a good idea to use short names. Let's go with
// Counting opportunities which are active. int counter = this.Opportunities.Count(x => x.Enabled);
— Nice. What's the type of the return value of
— It's the number of opportunities marked as enabled.
— My question is rather about the type...
— Right. An integer, I would imagine?
— How could you check whether your belief is correct or not?
— I don't know...
— Visual Studio could provide some help. What happens if you move your cursor right on
— Oh, I see, it's an integer.
— In this case, you don't need an explicit
int, and can use
var, that the compiler would replace by
// Counting opportunities which are active. var counter = this.Opportunities.Count(x => x.Enabled);
— Lovely. Do you think the comment is valuable?
— Of course.
— Please, continue.
— Well, it tells what the code is doing.
— Is this the goal of a comment?
— What would it be?
— Well, let's think together about it. If a comment repeats what the code is doing, what would you learn about the code when reading it? After all, you could simply read the actual code.
— Why should a comment teach me something?
— If it doesn't, what is its purpose?
— To help me understand the code.
— To help you understand the code. Interesting. Is this the case where the comment helps you to understand the code?
— Well... I'm afraid, it doesn't. Let's remove it.
— Wait, wait, wait. First things first, let's read it again. “Counting opportunities which are active.” Active? Really? It is not exactly what the code is doing, as it checks for the opportunities which are enabled, not active. What's the difference?
— Enabled opportunities are the ones which are active in the GUI.
— So it's the term “active” which is being used in the user interface?
— Yes, the business analysts asked us to use this precise term, as it is more explicit to the users.
— And to us?
— What to us?
— Is it more explicit to us, developers?
— I think it sure is, as “enabled” refers to whether the opportunity is available or not, whereas “active” means that it is possible to use it at this given moment—at least this is how the business analysts, and I, understand it.
— Honestly, the distinction is unclear for me, but if it is clear for you, and if you think that “active” is the correct term here, let's rename the property. And now that we did it, the comment can be removed, as it became truly redundant.
var counter = this.Opportunities.Count(x => x.Active);
As you can see, several iterations were needed in order to go from a piece of code which has quite a lot of issues, to a piece of code which is nearly perfect—I leave to you the job of deciding whether
counter is a correct name for a variable or not. Anyway, the example shows not only how the code can be improved, but more importantly how the knowledge and the beliefs of a person can be challenged, while giving a learning opportunity. It is not just about knowing things: the essential part here is to train the person to work in a given way, to leverage the tools he has, to know what to check, to understand the code he writes.
An ordinary code review would be shorter, but much less effective. Never use
Count(). Don't put explicit types when they match the ones which would be deduced by the compiler. Use short names in tiny scopes. Those are statements that do give an opportunity to learn, but not that much; moreover, they don't even try exploring the limits of knowledge of the author of the piece of code under review.
During the hiring interviews, I often launch a discussion, even on questions where a simple answer is required. Even a nasty:
What is the value of
var n = Enumerable.Range(1, 10).Count(x => x % 2 == 0);?
Could make it possible to talk for quite a long time about the explicitness of the parameters (right, here, it's not about talking, but more about ranting, as this is a perfect example where a confusion is really easy to do, which would make one believe that the return value is 4, rather than 5), the different operands available in C#, the magnificent beauty of LINQ with its method chaining, the way method chaining is implemented under the hood, the importance of extension methods, the dangers of custom extension methods on common types and interfaces, such as
IEnumerable<T>, and dozens and dozens of other subjects which could progressively be derived from a simple piece of code. And, actually, it's those discussions which always allowed me to have a clear understanding of the level of a given programmer, rather than the actual answers.
I do believe that the technique I described here could be used in lieu of ordinary training courses, as soon as the environment allows it. It could be a valuable approach in mentoring.
For now, one thing that I find unfortunate about this technique is how demanding it is towards the junior programmer. It's its psychological impact which should be taken in account, as not everyone can accept that the limits of his knowledge would be brought up so clearly. I think that it could be very demoralizing for programmers who lack self-confidence, or who simply don't have a good enough relation with the more skillful developer.
Another drawback, which is inherent to myself and the persons who have the tendency to work in the same way I do, is the attention to the exact words, and a further analysis of those words. Its effect can be compared to the effect of psychoanalysis, where everything you say could be scrutinized, and you may not exactly be willing to think about the exact words you use. This creates a sort of incompatibility with the persons who don't think too much about the words they use, as well as the persons who are in general not very affirmative, not convinced about the things they say. If one is not careful, this could quickly shift the communication from the actual code to the way things are getting formulated. While being interesting from a linguistic perspective, this has no value—and no real place when working on code, and only distracts both persons from the subject. With the colleague, I have found myself in such situation during our session. The fact is, my colleague has this habit of asserting something, while immediately backing up.
— Was Monday's issue fixed in the last release?
— Sure it was, absolutely. Well, I think so. I'm not sure, maybe we forgot.
In some circumstances, this is extremely annoying and dangerous. When you're in a middle of a crisis because the product is not working any longer, and the business is losing money, it is essential to be factual. It's really not about what somebody thinks of believes. Either you assert something you can verify—or that you know for sure—or you don't. When it's the code which is being reviewed, commented, and modified, imprecise language is possibly less of an issue, as it's not an issue to call a method a “function,” or to mix parameters and arguments—no need to be overly pedantic.
This being said, the exact formulations and words being used by a junior programmer do still matter, as it could lead to a discovery of some misunderstandings, that would be difficult to spot otherwise. For now, it remains unclear how one would do that, while avoiding to appear pedantic, or just an unlikable person.