Duplicating database constraints in code

Arseni Mourzenko
Founder and lead developer
176
articles
November 30, 2016
Tags: database 2

When I re­view data­base mod­els for RDBMS, I'm usu­al­ly sur­prised to find lit­tle to no con­straints. For in­stance, per­cent­age is of­ten stored in a col­umn of type int (while tinyint would be more ap­pro­pri­ate) and there is no CHECK con­straint to re­strict the val­ue to 0..100 range.

In my own pro­jects where I use re­la­tion­al data­bas­es, I have a habit to de­fine a very strict schema, in or­der to en­sure the data is valid as much as pos­si­ble. When I adopt a more le­nient at­ti­tude is when per­for­mance is at stake, usu­al­ly when it comes to pre­vent­ing prop­er de­nor­mal­iza­tion. In all cas­es, when I deal with data as sim­ple as a per­cent­age, there is no way I wouldn't in­tro­duce a lim­it con­straint on it with­in the data­base it­self.

When talk­ing with oth­er de­vel­op­ers about con­straints, they usu­al­ly tell that:

There seems to be an im­por­tant mis­un­der­stand­ing on both points, and I'll ex­plain it in this ar­ti­cle.

SSOT and du­pli­cat­ed con­straints

Imag­ine you're writ­ing a web ap­pli­ca­tion which has a form. In this form, one of the fields re­quires the user to spec­i­fy her age. It is im­por­tant to han­dle a case where the user spec­i­fies some­thing which is not a num­ber, or a neg­a­tive num­ber, or zero, or a num­ber su­pe­ri­or to an ar­bi­trary lim­it of one hun­dred and twen­ty.

What would hap­pen if you write this ap­pli­ca­tion with no fan­cy frame­works (for in­stance, you can use the ba­sic func­tion­al­i­ty of PHP serv­er-side, and just jQuery on client-side)?

Or maybe you know some fan­cy frame­work such as ASP.NET MVC which does all the hard work for you: the only thing you need to do is to spec­i­fy the rules in a sin­gle place, such as a data an­no­ta­tion at­tribute, and the frame­work pro­duces the cor­re­spond­ing C# and JavaScript code for you.

So:

  1. There will al­ways be a val­i­da­tion at the serv­er lev­el,
  2. There could be a du­pli­cat­ed val­i­da­tion at the client lev­el,
  3. If tools al­low, it's bet­ter to avoid du­pli­ca­tion by spec­i­fy­ing the val­i­da­tion rules once, and let­ting the tools prop­a­gate them both at the serv­er and the client lev­el.

This is ex­act­ly what should hap­pen when us­ing the data­base.

The data­base doesn't have to trust the ap­pli­ca­tions, so spec­i­fy­ing con­straints at data­base lev­el makes per­fect sense. In my pre­vi­ous per­cent­age ex­am­ple, this means hav­ing those con­straints:

alter table [Shop].[Rebate]
add constraint [CK_PercentageRange] check ([Amount] <= 100)

The low lim­it shouldn't be checked as soon as the type of the Rebate col­umn is tinyint (one byte, val­ues rang­ing from 0 to 255).

From there, if you in­ter­act with the data­base di­rect­ly (for in­stance by writ­ing SQL queries by hand), then the data­base log­ic will be du­pli­cat­ed. This means du­pli­cat­ing the struc­ture of the data­base it­self, would it be ta­bles, views, columns or col­umn types, but also its con­straints, which is of­ten not very dif­fer­ent from types them­selves. In the pre­vi­ous ex­am­ple, this would lead to class­es such as:

public class Percentage
{
    private readonly int value;

    public Percentage(int value)
    {
        if (value < 0 || value > 100) throw new ArgumentOutOfRangeException(...);

        this.value = value;
    }

    public int Value
    {
        get { return this.value; }
    }
}

public class Rebate
{
    public Percentage Amount { get; set; }
}

If man­u­al du­pli­ca­tion is an­noy­ing, then code gen­er­a­tion tools should ex­ist, trans­pos­ing the en­tire data­base schema to data struc­tures in code. And I'm not talk­ing here about ORMs, which hi­jack the ac­cess to the data­base, but about a tool which only con­structs the data struc­tures and their cor­re­spond­ing con­straints. In the pre­vi­ous ex­am­ple, the is­sue is that the tool won't prob­a­bly be able to guess that it should cre­ate a Percentage class, and will push the val­i­da­tion log­ic into the Rebate class it­self, lead­ing even­tu­al­ly to se­vere code du­pli­ca­tion. While this is not an is­sue for gen­er­at­ed code, it is still a heavy lim­i­ta­tion if we care about en­cap­su­la­tion and the sin­gle re­spon­si­bil­i­ty prin­ci­ple.

Han­dling the vi­o­la­tion ex­cep­tions

In a lan­guage such as C#, what is the dif­fer­ence be­tween ex­cep­tions and as­ser­tions?

While vi­o­la­tions of data­base con­straints will usu­al­ly be re­port­ed as ex­cep­tions by the SQL li­brary, it is es­sen­tial to un­der­stand that those vi­o­la­tions are not ex­cep­tion­al per se: in­stead, they in­di­cate a crit­i­cal bug in the sys­tem—ei­ther the code was flawed, lead­ing to a sit­u­a­tion where it vi­o­lat­ed the con­straint, or the con­straint it­self is wrong (which could hap­pen for in­stance if the change in re­quire­ments was re­flect­ed in code, but not in the data­base.) There­fore, this sit­u­a­tion should be treat­ed ex­act­ly as a vi­o­lat­ed as­ser­tion: it should be logged, and then the code should be fixed.

This means that there is noth­ing to han­dle at any ap­pli­ca­tion lev­el here. The data­base lay­er is un­able to han­dle this ex­cep­tion: it won't, for ex­am­ple, retry the query—some­thing it could do for in­stance if the ex­cep­tion re­ports that the data­base is tem­po­rary un­avail­able. Sim­i­lar­ly, the busi­ness lay­er doesn't know how to han­dle the ex­cep­tion ei­ther. Nor would it be pos­si­ble at the pre­sen­ta­tion lay­er. The ex­cep­tion will sim­ply reach the glob­al han­dler which will log it and let the ap­pli­ca­tion crash, even­tu­al­ly grace­ful­ly.

But NoSQL...

And then there are pro­po­nents of NoSQL who claim that there are no CHECK con­straints in Mon­goDB or Elas­tic­Search, and that NoSQL data­bas­es are great with­out them, and that log­i­cal­ly, there is no use for con­straints in RDBMS.

While I find the log­ic it­self par­tic­u­lar­ly dis­turb­ing (I mean, should we start mak­ing all vari­ables in C# dynamic just be­cause Python is a dy­nam­i­cal­ly-typed lan­guage and works great?), there is an im­por­tant as­pect which those de­vel­op­ers for­get: the way NoSQL data­bas­es are of­ten used to­day.

In most cas­es where a NoSQL data­base is in­volved, there is ei­ther a sin­gle ap­pli­ca­tion which uses it, or a ser­vice in an SOA sys­tem.

With SOA, the sit­u­a­tion is very sim­ple. The same con­straints ex­ist, but at a dif­fer­ent lev­el. In­stead of be­ing with­in the data­base, they are sit­u­at­ed at the lev­el of the ser­vice, in code. The re­sult is the same: a client, in­stead of con­nect­ing to a RDBMS, con­nects to a REST/SOAP ser­vice as if it was a data­base. All the ap­pli­ca­tions in­ter­act ex­clu­sive­ly through the end­point, which also con­trols the data which comes in.

In a case of a sin­gle ap­pli­ca­tion, the log­ic is moved to the data ac­cess lay­er. Di­rect ac­cess to the data hap­pens rarely, if at all, and scripts such as cron jobs usu­al­ly tend to reuse the data ac­cess lay­er of the ap­pli­ca­tion it­self. This doesn't mean that prob­lem­at­ic cas­es where the wrong data reach­es the data­base are im­pos­si­ble, it sim­ply means that the habit of play­ing with data di­rect­ly as de­vel­op­ers of­ten do with RDBMS is not there yet.

But this is busi­ness log­ic...

One point I'm used to hear from col­leagues is that con­straints are busi­ness log­ic, and busi­ness log­ic doesn't be­long in data­base.

This is a valid point for stored pro­ce­dures. In­deed, SQL is rarely the right tool for com­plex busi­ness log­ic, and the fact that it's pos­si­ble to use SQL for this pur­pose doesn't mean we should do it.

Con­straints, how­ev­er, are a dif­fer­ent beast.

What hap­pens when you make a choice be­tween a tinyint and a smallint? You're choos­ing the most mem­o­ry-ef­fi­cient so­lu­tion which match­es the re­quire­ments. In oth­er words, if you know that you need to store val­ues from 1 to 2000, tinyint would be the wrong type.

Many con­straints, in essence, are just ex­ten­sions of types. My ex­am­ple of 0..100 range for a per­cent­age is just that: a cus­tom type which has a tinyint foot­print but ac­cepts only a sub­set of its val­ues. In Ada, for in­stance, the range of ac­cept­ed val­ues can be part of the type de­f­i­n­i­tion:

subtype Percentage is Integer range 0 .. 100;

In­clud­ing pieces of code in Ada in a blog ar­ti­cle in­creas­es kar­ma.

Those of the con­straints that are not ex­ten­sions of types could be con­sid­ered as data con­tracts. For in­stance, hav­ing a con­straint which in­di­cates that a re­bate should be ex­pressed ei­ther as an ab­solute val­ue or a per­cent­age, but nev­er both, is nat­u­ral­ly a busi­ness rule, but also a data con­tract. Its goal is to ex­tend the ba­sic data­base schema by pro­vid­ing ad­di­tion­al in­for­ma­tion which can­not be de­scribed by oth­er means. This is not very dif­fer­ent from hav­ing the data­base schema in the first place: defin­ing that the table Person con­tains columns a non-null FirstName and a non-null LastName con­sists of defin­ing spe­cif­ic busi­ness rules, which could have to change lat­er (for in­stance be­cause one no­tices that not every­one has a first and a last name). If you want to keep your re­la­tion­al data­base free of busi­ness rules, your only choice is to have a sin­gle table with a sin­gle nvarchar(max) col­umn con­tain­ing a sin­gle row which stores all the data.

Con­clu­sion

Over­all, con­straints are an ex­cel­lent tool which, while not cru­cial, makes your data­base a slight­ly safer place. What hap­pens is that when both the num­ber and the com­plex­i­ty of the apps ac­cess­ing a giv­en data­base start to grow, the risk of in­tro­duc­ing in­valid data grows as well. Some prac­tices, such as ac­cess­ing the data­base di­rect­ly by hand in or­der to fix stuff, clean things, etc., are in­creas­ing this risk even more; while they should be avoid­ed at all costs, prag­mat­ic ap­proach makes it some­times im­pos­si­ble to pro­hib­it them.

While the same con­straints should be repli­cat­ed with­in the ap­pli­ca­tions, it should be not­ed that such repli­ca­tion could be au­to­mat­ed through some clever code gen­er­a­tion tools—some­thing which is tech­ni­cal­ly fea­si­ble, al­though I don't re­mem­ber see­ing such tools yet, maybe be­cause of the pop­u­lar­i­ty of ORMs.