Bulky, heavy, fat ORMs

Arseni Mourzenko

December 31, 2017

Tags: productivity

A long time ago, I wrote the following in the description of Orseis.

I never liked ORMs. They are bulky, heavy, fat. They build SQL queries behind my back, and I don't like it. I want to be in control.

Recently, someone asked on SE.SE how it is possible that we still write SQL by hand, despite the availability of many very capable ORMs. An interesting aspect of this question is the comparison with other domains where developers rely heavily on an abstraction layer: in general, one doesn't send an e-mail by playing with SMTP, and doesn't write web applications by processing raw HTTP.

This leads to the subject of abstractions: what they bring, how they leak, and what that means for business. Allow me to start with a short story which has nothing to do with ORMs: my experience with the ASP.NET WebHooks library.

A few months ago, I was contributing to an API which needed to receive some events from the underlying API. Since WebHooks were the hype of the moment, it was decided to use them for the task, and since the API uses ASP.NET Web API, anything with “ASP.NET” in its name was a mandatory choice anyway. So I struggled with ASP.NET WebHooks for a week before re-implementing by hand a large part of the functionality we needed.

From there, it appeared that:

The library is very badly documented, which is not surprising for anything released by Microsoft in the last five years, as opposed to the early .NET Framework assemblies, which usually had excellent and comprehensive documentation.
The library is designed specifically for narrow “marketing” situations. Want to receive notifications from Slack or Facebook? You can do it within minutes, writing a minimal amount of code. Want to do something outside those showcase situations? You either need to tweak and tear the library apart, or you end up on your own.
The library makes easy things extremely complicated. By adding things such as HMAC verification (which we didn't need anyway, but which couldn't be disabled), the library makes it very complicated to interact with it from anything other than the library itself. In other words, I wasted literally days trying to make my own implementation on one side interact with Microsoft's library on the other, while I could have implemented a simple notification mechanism myself in a matter of hours.
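For readers unfamiliar with the HMAC verification mentioned above, here is a minimal sketch of what such a check usually involves, using only the Python standard library. The secret, the payload and the function names are hypothetical, and this is not necessarily the exact scheme ASP.NET WebHooks applies; it only illustrates that both sides must agree on the secret, the hash function and the exact bytes being signed, which is where interoperability tends to break.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare it with the one the sender provided."""
    expected = sign(secret, body)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, received_signature)


secret = b"shared-secret"        # hypothetical secret known to both sides
body = b'{"event": "created"}'   # hypothetical notification payload
signature = sign(secret, body)   # what the sender would attach to the request

print(verify(secret, body, signature))
print(verify(secret, body + b" ", signature))
```

A single extra byte in the body, a different secret or a different encoding of the signature is enough for the check to fail.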

This is an excellent example of a harmful abstraction. It pretends to simplify things, and it actually does as long as you stay within its limits, that is, within the area the abstraction was explicitly designed for. Within this area, it saves you time, and a lot of it. Instead of spending hours writing code which doesn't necessarily have enough tests to be relied upon, you write two or three lines of code, and that's it; it works, flawlessly. However, as soon as you need something else, the abstraction harms you and makes you waste days or weeks, or simply makes it impossible to do what you want.

This phenomenon is widespread, and practically every library or framework has this flaw. Helping a team who uses Angular, I'm often surprised that they prefer to have a poor user interface, because doing otherwise would offend Angular. If the choice of a framework means that I can't do certain things I could do with a dozen lines of jQuery code, I say: “throw the framework away.” Users matter more than frameworks.

Every abstraction can shorten the time it takes to develop some parts of the software, and lengthen the time it takes to develop others. Take a simple example. Green bars indicate the time it takes to develop six parts of a software product without using a specific framework or library. Blue bars show the time it takes to develop the same parts while using the framework or library.

Any successful salesman would emphasize how the library helps with part E. It's fantastic: with the help of the library, instead of spending time developing all the stuff yourself, you just write a line of code and, magically, it works.

The same salesman won't mention anything about parts A and C, which take much more time, but where the library doesn't help. And, obviously, he will stay silent on the negative consequences of the library for parts B, D and F. But if you do the math, it appears that the library is maybe not that fantastic after all:

Agreed, it saved you time because you didn't have to develop the part E, but overall, you spent more time adapting the product to this library. Not nice.
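The arithmetic can be sketched with a few lines of Python. The hours below are invented purely for illustration (they are not read from the charts); they only reproduce the shape of the argument: a large saving on part E, no change for A and C, and extra work on B, D and F.

```python
# Hypothetical development times in hours, invented for illustration only.
without_library = {"A": 40, "B": 10, "C": 35, "D": 8, "E": 20, "F": 6}
with_library = {"A": 40, "B": 18, "C": 35, "D": 16, "E": 1, "F": 13}

total_without = sum(without_library.values())
total_with = sum(with_library.values())

# The saving the salesman talks about.
saved_on_e = without_library["E"] - with_library["E"]

# The extra time he stays silent about.
extra_on_bdf = sum(with_library[p] - without_library[p] for p in "BDF")

print(f"Saved on part E: {saved_on_e} h")
print(f"Extra time on parts B, D and F: {extra_on_bdf} h")
print(f"Total without the library: {total_without} h")
print(f"Total with the library: {total_with} h")
```

With these made-up figures, the 19 hours saved on part E are more than offset by the 23 extra hours spent on parts B, D and F.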

There are situations where a given library only wastes your time. My experience with ASP.NET WebHooks is exactly that: days of work instead of hours. Several factors can cause that. In some rare cases, it comes from poorly written, poorly documented libraries. More frequently, it comes from a mismatch between the tasks the library is designed for and the context where the library is used. This is why it is essential for the authors of a library to be very specific about the library's limits. This is exactly what I did when releasing the nicecall library:

Note that nicecall is not a substitute to subprocess, because much of subprocess functionality doesn’t exist. For instance, one can’t use stdin or pipes with nicecall. The goal is not to replace subprocess, but only to provide an easy way to do the most common tasks.
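To make the scope statement concrete, here is a small sketch of the kind of subprocess functionality the quote says is left out: feeding stdin to a child process and connecting two processes with a pipe. It uses only the standard subprocess module; the child commands are hypothetical one-liners run through the current Python interpreter so that the example stays portable.

```python
import subprocess
import sys

# Feed data to a child process through stdin and capture its stdout.
upper = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    input="hello",
    capture_output=True,
    text=True,
    check=True,
)

# Connect two processes with a pipe: the stdout of the first one
# becomes the stdin of the second one.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('b'); print('a')"],
    stdout=subprocess.PIPE,
)
consumer = subprocess.run(
    [sys.executable, "-c", "import sys; print(''.join(sorted(sys.stdin)))"],
    stdin=producer.stdout,
    capture_output=True,
    text=True,
    check=True,
)
producer.stdout.close()
producer.wait()

print(upper.stdout.strip())
print(consumer.stdout.split())
```

Neither use case is exotic, but both fall outside "the most common tasks", which is precisely what the note tells the reader up front.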

In the same way, the authors of the ASP.NET WebHooks library should have specified that the library is good for receiving notifications from GitHub or Azure, but is a poor choice for anything outside the very basic cases. Obviously, many library authors are reluctant to state the library's scope clearly, because of a salesperson mentality.

How a library or a framework would impact your software product depends on horizontal and vertical adequacy and on horizontal and vertical cost.

Horizontal adequacy means that the dependency helps you with a broad range of tasks. .NET Framework or J2EE are good examples: they are huge, and have so much stuff in them that you rely on them for nearly everything, from creating a substring from a string to streaming data to creating ZIP archives. Web frameworks such as ASP.NET MVC or Flask generally have good horizontal adequacy for web applications. Now, if you're writing a desktop application, the horizontal adequacy of ASP.NET and Flask is zero.
Vertical adequacy means that the dependency helps you a lot with a given task. Take a library which sends e-mails. Its scope is very narrow: it won't create CSS bundles or help me implement CSP for my web application, meaning that its horizontal adequacy is low. However, its benefits are tremendous if my goal is to send occasional e-mails: I don't need to know the SMTP protocol to send them. If my goal is, instead, to broadcast thousands of e-mails per second, the vertical adequacy is close to zero: there is no real help from the library, as I would still have to learn the underlying protocols, spam filtering, and dozens of other aspects if I want to succeed.
Horizontal cost is the size of the impact of the abstraction on your application. If you use a library to send e-mails from a web application, the horizontal cost is close to zero: you only have an additional dependency to maintain. If, on the other hand, you use an MVC framework, a significant part of your application is affected, since it forces you to use the MVC pattern. Another example is SharePoint: have you ever tried to get rid of SharePoint in an application written for it in the first place?
Vertical cost is the cost added to the task by the abstraction. With an e-mail library, this cost is very low. With libraries such as RxJS, the cost is very high, because you have to learn a new paradigm (if you already know the paradigm and every person who joins your team in the future knows it too, then the cost drops to nearly zero).

We would all prefer dealing with abstractions which have high horizontal and vertical adequacy (i.e. they have all we need, and do it well), and no horizontal or vertical cost (i.e. they don't constrain us in the parts of the application where we don't need those abstractions and they are easy to set up). But more often than not, abstractions tend to be costly, and bring questionable benefits in narrow situations.

What does all that have to do with ORMs?

ORMs have a small horizontal cost but a considerable vertical cost.

Small horizontal cost: high-quality ORMs such as Entity Framework know how to avoid impacting your whole application; they restrict themselves to the areas where you want them to be, and even make it possible for you to bypass them when you need to.
Considerable vertical cost: ORMs are complex. LINQ to SQL was simple, but not very powerful. With power comes complexity, as Entity Framework shows us. You can do a lot, but there is a learning curve for that. WCF was another great example: it was very powerful, but it could lead you to hours of head-banging and desperate Stack Overflow browsing.

As for their adequacy:

As the horizontal adequacy grows, the vertical cost grows too. This is a consequence of the fact that ORMs are a leaky abstraction: they do tend to let you do lots of things, but in order to do those things, you have to know the underlying technology well (in this case, SQL and the database), and, in addition, you have to learn how this or that feature is implemented in the ORM itself. For instance, if I need to optimize a slow search on a table containing lots of rows, not only do I need to know what indexes are and how to use them, but I also need to find out how to tell Entity Framework to put an index on a specific column.
The vertical adequacy could be good for very narrow situations: the simple ones, the academic ones. Selecting all products whose price is between this and that? Can do that. As soon as one starts moving to more complex situations, the vertical adequacy decreases, and in more and more cases, you'll find that it's just easier to write the SQL query by hand than to tweak your favorite ORM.
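As a point of comparison, here is a minimal sketch of the hand-written route for the two cases mentioned above: the price range query and the index on a specific column. It uses Python's standard sqlite3 module with an in-memory database; the product table, its columns, the sample rows and the index name are all hypothetical. With an ORM, the same knowledge of SQL and indexes is still required, plus the knowledge of how the ORM expresses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("pen", 2.5), ("notebook", 7.0), ("backpack", 45.0), ("lamp", 19.9)],
)

# The "academic" query: all products whose price is within a range.
query = "SELECT name FROM product WHERE price BETWEEN ? AND ? ORDER BY price"
names = [row[0] for row in conn.execute(query, (5, 20))]


def query_plan() -> str:
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, (5, 20))
    return " | ".join(row[3] for row in rows)


plan_before = query_plan()

# The optimization step: one line of SQL to index the searched column.
conn.execute("CREATE INDEX ix_product_price ON product (price)")
plan_after = query_plan()

print(names)
print("Before:", plan_before)
print("After: ", plan_after)
```

The exact wording of the query plan depends on the SQLite version, so it is printed rather than asserted here; the point is that the plan can be inspected directly, without a layer in between.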

I would love an ORM which could figure out how to store the data, and be able to cross the boundaries of SQL and move to databases such as MongoDB or Redis as well. I would love it to analyze the data and the requests and determine how the schema should be optimized based on the current usage. I would love to simply say: “Hey, ORM, please store this object for me” or “Hey, ORM, could you give me all the records of this type for the last two weeks” and never have to look under the hood. Unfortunately, this isn't happening soon.

What is happening instead is that ORMs remain among the great tools which do a great job in specific situations, for specific projects. Just as I don't develop in Assembler, because Java or Python compilers do a much better job for me, I hardly see myself writing SQL by hand for an ordinary business application. But there are cases where you need to dive into Assembler because you need specific performance optimizations, and there are cases where you need to write SQL by hand. During my career, it turned out that none of my projects led me to write Assembler code, and that nearly every project I did led me to hand-written SQL. Possibly developers working on embedded software do use Assembler and don't write SQL. YMMV.