Trimming strings

Arseni Mourzenko
Founder and lead developer
163
articles
June 21, 2021
Tags: refactoring 12 short 49 javascript 2

Reg­u­lar­ly, I find in­ter­est­ing ques­tions on Stack Over­flow, that look ba­sic at the first sight, but ap­pear to be not that sim­ple af­ter all. To­day, it was a ques­tion on the Russ­ian ver­sion of Stack Over­flow which got my at­ten­tion. The au­thor wants to im­ple­ment, just as an ex­per­i­ment, a trim func­tion in JavaScript (which re­moves spaces only, ig­nor­ing all oth­er white­space char­ac­ters). He tried to draft it, but he got un­ex­pect­ed re­sults, and asked for help.

The more I thought about it, the more I loved the ques­tion. There are at least three as­pects I find in­ter­est­ing:

  1. There are plen­ty of ways to face this prob­lem.
  2. It is a per­fect ex­er­cise when it comes to al­go­rithms.
  3. Through this ques­tion, one can learn a lot about the prac­tices which help solv­ing this sort of prob­lems.

For the sake of ex­per­i­ment, let's im­ple­ment our­selves the piece of code, and let's do it with TDD. As I want to keep it as easy as pos­si­ble, I won't use any test­ing frame­work, nor Node.js. One should be able to run the code in the brows­er by copy-past­ing it to the con­sole. Let's start, there­fore, with a func­tion which will help us to per­form the tests:

function assertEquals(expected, actual) {
    if (expected === actual) {
        console.log(`Case ${actual} passed.`);
    } else {
        console.warn(`Case ${actual} failed: expected ${expected}.`);
    }
}

One can copy this func­tion to the brows­er con­sole and play with it. En­ter­ing the same strings, it tells that the test pass­es. If the strings are dif­fer­ent, a warn­ing is shown. Love­ly.

Now the ac­tu­al code. What's the sim­plest case one can test? Right, a string which nei­ther starts nor ends with a space. The ex­pect­ed re­sult: the string it­self.

function trim(text) {
    return text;
}

assertEquals("abc", trim("abc"));

The code pass­es, so time to move to some­thing more com­plex. Such as a string which starts with spaces: assertEquals("abc", trim(" abc")).

Be­ing com­plete­ly lazy, I cer­tain­ly don't want to over­think it:

function trim(text) {
    return text.split("").filter(c => c !== " ").join("");
}

Yep, that's right, let's just get rid of all the spaces. This leads to the next test, where spaces are in­side the string, with the ex­pec­ta­tion that they need to be pre­served: assertEquals("abc def", trim(" abc def")). If this was C#, I could have done a mi­nor trans­for­ma­tion of the code by adding a bunch of LINQ state­ments, and call it a day. No prob­lem, I can ei­ther use an ex­is­tent li­brary which adds a func­tion­al­i­ty sim­i­lar to LINQ, or I can draft my own func­tion:

function* skipUntil(sequence, predicate) {
    let skipping = true;
    for (const element of sequence) {
        if (!predicate(element)) {
            skipping = false;
        }
        
        if (!skipping) {
            yield element;
        }
    }
}

function trim(text) {
    const characters = text.split("");
    const matches = skipUntil(characters, c => c === " ");
    return Array.from(matches).join("");
}

In pro­duc­tion code, I would put skipUntil() func­tion in a sep­a­rate mod­ule, and test it sep­a­rate­ly. As this is just an ex­am­ple, let's for­get about this func­tion, and fo­cus on trim(). As the func­tion we wrote trims only the be­gin­ning of the string, let's re­name it to ltrim(), and im­ple­ment trim() as a sim­ple call to ltrim()—at least for now.

What's next? Right, trim­ming of spaces at the end of a string. Now that's where things start to be in­ter­est­ing. With just one test, assertEquals("abc def", trim("abc def ")), there are mul­ti­ple pos­si­bil­i­ties to make this test pass, and all in­volve an ap­proach rad­i­cal­ly dif­fer­ent from what we had for the left trim.

One pos­si­ble ap­proach is to loop through the el­e­ments and re­mem­ber the last oc­cur­rence of a non-space char­ac­ter:

function rtrim(text) {
    let i = 0;
    let end = 0;

    for (const c of text) {
        if (c !== " ") {
            end = i;
        }

        i += 1;
    }

    return text.slice(0, end + 1);
}

function ltrim(text) {
    [...]
}

function trim(text) {
    return rtrim(ltrim(text));
}

While it works and all four tests pass, the trim() func­tion is way too com­plex. Aside, I don't like hav­ing to loop through the whole string in or­der to ma­nip­u­late some­thing which is nec­es­sar­i­ly at the very end of it. Let's go for an or­di­nary loop in­stead of a for...of state­ment:

function rtrim(text) {
    for (let i = text.length - 1; i >= 0; --i) {
        if (text[i] !== " ") {
            return text.slice(0, i + 1);
        }
    }

    return "";
}

Now that I com­pare ltrim() and rtrim(), they look very dif­fer­ent. Some­times this is not a prob­lem: two al­go­rithms which do two very dif­fer­ent things don't nec­es­sar­i­ly need to look the same just be­cause the name of the con­tain­ing func­tion looks the same. Here, how­ev­er, I have an im­pres­sion that the so­lu­tion with the gen­er­a­tors is a bit too smart for what it's worth. Hav­ing a ded­i­cat­ed skipUntil() is re­al­ly an overkill here, and the whole ap­proach looks wrong. So I can rewrite it, us­ing the very same ap­proach as I used for rtrim():

function ltrim(text) {
    for (let i = 0; i < text.length; ++i) {
        if (text[i] !== " ") {
            return text.slice(i);
        }
    }

    return "";
}

Now, the two look ex­act­ly the same. So much that we can ex­tract the al­go­rithm in a com­mon func­tion, like this:

function sliceOn(text, from, to, step, slice) {
    for (let i = from; to(i); i += step) {
        if (text[i] !== " ") {
            return slice(text, i);
        }
    }

    return "";
}

const ltrim = (text) => sliceOn(
    text, 0, i => i < text.length, 1, (t, i) => t.slice(i));

const rtrim = (text) => sliceOn(
    text, text.length - 1, i => i >= 0, -1, (t, i) => t.slice(0, i + 1));

const trim = (text) => rtrim(ltrim(text));

I'm quite hes­i­tant as whether this last change is an ac­tu­al im­prove­ment or not. On one side, a loop with a con­di­tion in­side of it wasn't par­tic­u­lar­ly el­e­gant. On the oth­er hand, it takes more time to un­der­stand what's go­ing on when look­ing at the new code. Sim­i­lar­ly, one could move the con­di­tion to a dif­fer­ent func­tion, isSpace(), but is it worth it? I don't know.

Here's an al­ter­na­tive which doesn't in­volve any sep­a­ra­tion be­tween the left trim and the right trim, and reuses the ap­proach I start­ed to im­ple­ment above:

function trim(text) {
    let start = -1;
    let end = 0;
    for (let i = 0; i < text.length; ++i) {
        if (text[i] !== " ") {
            start = start === -1 ? i : start;
            end = i;
        }
    }

    return text.slice(start, end + 1);
}

This one goes in a sin­gle loop, and records the po­si­tions of the first and the last char­ac­ters it finds. At first, it seems more op­ti­mized than the ear­li­er ver­sion. How­ev­er, it all de­pends on the ac­tu­al strings be­ing trimmed. If you give it a very long string with spaces nei­ther at the be­gin­ning, nor at the end, the al­go­rithm will per­form ac­tu­al­ly rather poor­ly. When the ear­li­er so­lu­tion would stop the loop im­me­di­ate­ly, the sec­ond al­go­rithm will walk through the en­tire string, need­less­ly.

Aside per­for­mance con­sid­er­a­tions, I would think about the ac­tu­al in­ter­face and how would it be used. In fact, the ear­ly ap­proach has a ben­e­fit of be­ing able to pro­vide, sep­a­rate­ly, the left and right trim­ming func­tions. This can be­come handy, as some­times, one doesn't want to trim on both sides, but only one. This makes the first ap­proach my pre­ferred one.