Trimming strings
Regularly, I find interesting questions on Stack Overflow, that look basic at the first sight, but appear to be not that simple after all. Today, it was a question on the Russian version of Stack Overflow which got my attention. The author wants to implement, just as an experiment, a trim
function in JavaScript (which removes spaces only, ignoring all other whitespace characters). He tried to draft it, but he got unexpected results, and asked for help.
The more I thought about it, the more I loved the question. There are at least three aspects I find interesting:
- There are plenty of ways to face this problem.
- It is a perfect exercise when it comes to algorithms.
- Through this question, one can learn a lot about the practices which help solving this sort of problems.
For the sake of experiment, let's implement ourselves the piece of code, and let's do it with TDD. As I want to keep it as easy as possible, I won't use any testing framework, nor Node.js. One should be able to run the code in the browser by copy-pasting it to the console. Let's start, therefore, with a function which will help us to perform the tests:
function assertEquals(expected, actual) {
if (expected === actual) {
console.log(`Case ${actual} passed.`);
} else {
console.warn(`Case ${actual} failed: expected ${expected}.`);
}
}
One can copy this function to the browser console and play with it. Entering the same strings, it tells that the test passes. If the strings are different, a warning is shown. Lovely.
Now the actual code. What's the simplest case one can test? Right, a string which neither starts nor ends with a space. The expected result: the string itself.
function trim(text) {
return text;
}
assertEquals("abc", trim("abc"));
The code passes, so time to move to something more complex. Such as a string which starts with spaces: assertEquals("abc", trim(" abc"))
.
Being completely lazy, I certainly don't want to overthink it:
function trim(text) {
return text.split("").filter(c => c !== " ").join("");
}
Yep, that's right, let's just get rid of all the spaces. This leads to the next test, where spaces are inside the string, with the expectation that they need to be preserved: assertEquals("abc def", trim(" abc def"))
. If this was C#, I could have done a minor transformation of the code by adding a bunch of LINQ statements, and call it a day. No problem, I can either use an existent library which adds a functionality similar to LINQ, or I can draft my own function:
function* skipUntil(sequence, predicate) {
let skipping = true;
for (const element of sequence) {
if (!predicate(element)) {
skipping = false;
}
if (!skipping) {
yield element;
}
}
}
function trim(text) {
const characters = text.split("");
const matches = skipUntil(characters, c => c === " ");
return Array.from(matches).join("");
}
In production code, I would put skipUntil()
function in a separate module, and test it separately. As this is just an example, let's forget about this function, and focus on trim()
. As the function we wrote trims only the beginning of the string, let's rename it to ltrim()
, and implement trim()
as a simple call to ltrim()
—at least for now.
What's next? Right, trimming of spaces at the end of a string. Now that's where things start to be interesting. With just one test, assertEquals("abc def", trim("abc def "))
, there are multiple possibilities to make this test pass, and all involve an approach radically different from what we had for the left trim.
One possible approach is to loop through the elements and remember the last occurrence of a non-space character:
function rtrim(text) {
let i = 0;
let end = 0;
for (const c of text) {
if (c !== " ") {
end = i;
}
i += 1;
}
return text.slice(0, end + 1);
}
function ltrim(text) {
[...]
}
function trim(text) {
return rtrim(ltrim(text));
}
While it works and all four tests pass, the trim()
function is way too complex. Aside, I don't like having to loop through the whole string in order to manipulate something which is necessarily at the very end of it. Let's go for an ordinary loop instead of a for...of
statement:
function rtrim(text) {
for (let i = text.length - 1; i >= 0; --i) {
if (text[i] !== " ") {
return text.slice(0, i + 1);
}
}
return "";
}
Now that I compare ltrim()
and rtrim()
, they look very different. Sometimes this is not a problem: two algorithms which do two very different things don't necessarily need to look the same just because the name of the containing function looks the same. Here, however, I have an impression that the solution with the generators is a bit too smart for what it's worth. Having a dedicated skipUntil()
is really an overkill here, and the whole approach looks wrong. So I can rewrite it, using the very same approach as I used for rtrim()
:
function ltrim(text) {
for (let i = 0; i < text.length; ++i) {
if (text[i] !== " ") {
return text.slice(i);
}
}
return "";
}
Now, the two look exactly the same. So much that we can extract the algorithm in a common function, like this:
function sliceOn(text, from, to, step, slice) {
for (let i = from; to(i); i += step) {
if (text[i] !== " ") {
return slice(text, i);
}
}
return "";
}
const ltrim = (text) => sliceOn(
text, 0, i => i < text.length, 1, (t, i) => t.slice(i));
const rtrim = (text) => sliceOn(
text, text.length - 1, i => i >= 0, -1, (t, i) => t.slice(0, i + 1));
const trim = (text) => rtrim(ltrim(text));
I'm quite hesitant as whether this last change is an actual improvement or not. On one side, a loop with a condition inside of it wasn't particularly elegant. On the other hand, it takes more time to understand what's going on when looking at the new code. Similarly, one could move the condition to a different function, isSpace()
, but is it worth it? I don't know.
Here's an alternative which doesn't involve any separation between the left trim and the right trim, and reuses the approach I started to implement above:
function trim(text) {
let start = -1;
let end = 0;
for (let i = 0; i < text.length; ++i) {
if (text[i] !== " ") {
start = start === -1 ? i : start;
end = i;
}
}
return text.slice(start, end + 1);
}
This one goes in a single loop, and records the positions of the first and the last characters it finds. At first, it seems more optimized than the earlier version. However, it all depends on the actual strings being trimmed. If you give it a very long string with spaces neither at the beginning, nor at the end, the algorithm will perform actually rather poorly. When the earlier solution would stop the loop immediately, the second algorithm will walk through the entire string, needlessly.
Aside performance considerations, I would think about the actual interface and how would it be used. In fact, the early approach has a benefit of being able to provide, separately, the left and right trimming functions. This can become handy, as sometimes, one doesn't want to trim on both sides, but only one. This makes the first approach my preferred one.