For starters, this post is not a theoretical argument about whether or not you should use ORMs (Object-Relational Mappers) in your projects. It’s a description of why, in my own context, I decided to stop using them. The ambition is to offer one particular point of view that may help people working in tech make up their own minds.
First, the context. I’ve been working for 6 years as a tech startup CTO, at times as a cofounder, at times as a freelancer. I’ve worked in or with a dozen tech teams, using various kinds of technologies. Some teams were venture-backed, some were not.
If I had to sum up the context for the subject at hand, I would say that my point of view on ORMs is relevant for startup tech teams of fewer than 30 people.
When I started coding, like most people, I started with a specific language, and that language was not SQL. So, as soon as I needed a database, I naturally used ORMs to handle the database stuff. If we’re being honest here, most people don’t research the ORM topic thoroughly, and just end up using one because they feel they can get things done quicker, in a language they’re familiar with. I myself began coding in Python, and I quickly ended up using Django.
As time passed, I began appreciating several things about it, especially the migrations management part. Having all data schema modifications done in code (and therefore versioned in Git) felt safer, especially when working with other people on that data schema. As my projects grew, the backend code became more and more complex, but I didn’t give it a second thought, since I had always used ORMs.
One thing that bothered me was that each time a slightly complex requirement came up on a project, I had to dig very deep into the ORM documentation, as if I were cursed and every customer request invariably landed on a corner case where the ORM fell short.
One day, I started taking on data science projects, and I started learning SQL, since I had to run a lot of queries on structured data.
I did my research and settled on Postgres as my database of choice, because it is open source, has awesome views (named, reusable SQL queries), and offers PostGIS, its GIS extension.
One thing led to another: I worked on multiple data science projects in a row and became quite good at SQL. It started to dawn on me how quickly I could write complex queries, in very few lines of SQL.
I then went back to web projects and, with my experience in SQL, decided that this time I would still write the backend in Python, but without an ORM, using a database driver (Psycopg2) instead to interact with the DB (database).
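To give an idea of what "a driver instead of an ORM" looks like, here is a minimal sketch, not code from any of my projects. It uses the standard-library sqlite3 module as a stand-in so that it runs without a Postgres server; Psycopg2 follows the same DB-API shape (connect, cursor, execute, fetchall) but uses %s placeholders where sqlite3 uses ?. The customers and orders tables and the top_customers and demo_connection helpers are hypothetical names made up for the illustration.

```python
import sqlite3  # stand-in for psycopg2; both follow the Python DB-API 2.0


def top_customers(conn, min_total):
    """Return (name, total) rows for customers whose orders sum to at least min_total."""
    sql = """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        HAVING SUM(o.amount) >= ?
        ORDER BY total DESC
    """
    # The parameter is passed separately from the SQL text, never formatted into it.
    cur = conn.cursor()
    cur.execute(sql, (min_total,))
    return cur.fetchall()


def demo_connection():
    """Build a throwaway in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount INTEGER NOT NULL
        );
        INSERT INTO customers (id, name) VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO orders (customer_id, amount) VALUES (1, 30), (1, 50), (2, 20);
    """)
    return conn


if __name__ == "__main__":
    for name, total in top_customers(demo_connection(), 25):
        print(name, total)
```

The whole data access layer is the SQL string plus one function call, which is the point: the query you read is the query the database runs.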
I have now been working without ORMs for a few years, and I think my productivity has more than doubled. I’m currently a freelance CTO, and I’m responsible (among other things) for the production infrastructure, including database schemas, of 4 startups.
I tried to analyze retrospectively what happened and here are a few thoughts.
There is no Free Lunch
ORMs make one promise: that they will abstract the DB away. Solving a problem in a generic way that works for everybody can mean two things: either it’s a technological breakthrough, or someone made some kind of middle-ground trade-off with existing tech. ORMs clearly fall into the latter category.
ORMs stack up so many abstractions that getting started may be easier, but as soon as you encounter more involved real-world use cases, you end up spending your time reading documentation to understand those abstractions and how to configure them the way you need.
When a piece of software abstracts a problem away for you and the scope of that problem is well defined, you can get by with sensible default config parameters and a few variants of them. In that case, abstraction is a winning proposition.
The problem with ORMs is that the scope of what they usually try to achieve is too big, and in a real-world situation, when complexity arises, you often end up reading dark corners of your ORM documentation to find out whether your situation can fit in their abstractions. A great blog post sums this up:
“Although it may seem trite to say it, Object/Relational Mapping is the Vietnam of Computer Science. It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy.”
ORMs seemed at first to be a winning proposition, but as time passed, I spent more and more time reading complex documentation for seemingly simple real-world situations, and I realized that the scope of what ORMs try to accomplish is just too large. When life keeps throwing different use cases with many little variants at you (and rest assured, this never stops), any endeavour to abstract it all quickly turns into a quagmire.
Requirements You don’t have
A huge problem with ORMs is that they abstract away the database, which means that if you have a favorite database, you can’t use its specific features.
If we’re being honest, in the context I’m talking about (a startup with a tech team of fewer than 30 people), who really needs DB portability?
The DB field is competitive and different DBs do differentiate themselves with great features (MySQL, Postgres, Mongo…). Abstracting the DB away means you have to settle for a common denominator, which is usually frankly quite mediocre.
For example, if you need to perform a classic array aggregation in Django, you end up using the django.contrib.postgres.aggregates module, and you lose DB portability right away. Array aggregations are useful in many situations, but not all DBs offer them, so you end up using DB-specific modules anyway.
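As a rough illustration of why this aggregation is not portable, the sketch below puts the Postgres spelling (array_agg) next to the closest thing SQLite offers (group_concat, which returns a delimited string rather than an array). Only the SQLite query is executed, through the standard-library sqlite3 module, so the example runs without a server. The posts table, its columns, and the helper names are hypothetical.

```python
import sqlite3

# Postgres spelling, shown for comparison only (not executed in this sketch).
POSTGRES_SQL = """
    SELECT author, array_agg(title ORDER BY title) AS titles
    FROM posts
    GROUP BY author
"""

# SQLite has no array type; group_concat returns one delimited string per group.
SQLITE_SQL = """
    SELECT author, group_concat(title, '|') AS titles
    FROM posts
    GROUP BY author
    ORDER BY author
"""


def demo_connection():
    """Build a throwaway in-memory database with a hypothetical posts table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (author TEXT NOT NULL, title TEXT NOT NULL);
        INSERT INTO posts (author, title) VALUES
            ('ann', 'views'), ('ann', 'joins'), ('bo', 'indexes');
    """)
    return conn


def titles_by_author(conn):
    """Map each author to a sorted list of titles."""
    rows = conn.execute(SQLITE_SQL).fetchall()
    # The string has to be split and sorted in Python, because the order of
    # elements inside group_concat is not guaranteed by this query.
    return {author: sorted(titles.split("|")) for author, titles in rows}


if __name__ == "__main__":
    print(titles_by_author(demo_connection()))
```

Whichever database you pick, the query ends up written in that database’s dialect, which is why the portability promise does not survive this kind of requirement.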
The Performance Problem
ORMs usually come with a SQL generator that you’re never supposed to see. The problem is that, in real life, your DB schema will eventually get complex enough that you start having performance issues.
Business requirements change over time, backward-compatibility requirements pile up, and there is no way you can keep your DB schema theoretically optimal for your company’s needs at time t.
This fact translates into increased complexity for your DB queries, and a need to be smart about your queries.
When this complexity increases a bit, there is simply no way the ORM SQL generator can output optimized queries. Things will get slow, and I’ve yet to see an efficient way of querying a complex DB other than by writing the SQL yourself.
DB Schema Consistency
Initially I loved ORMs for the fact that data structures were defined in code, which means versioned in Git, which means people could collaborate on them. Over the years I’ve changed my mind completely about this. Having multiple people make changes to a DB schema invariably leads to huge inconsistencies, suboptimal choices and a huge legacy burden.
This is why I enjoy working in small teams. I know it’s not possible for big corporations to have only 1 or 2 guys be the data masterminds, and I think that’s the reason why software quality drops so rapidly with the size of an organization. We just don’t know how to work efficiently in tech when the problem becomes too big for one brain (and by the way I think this is one of the biggest challenges to the advancement of humankind but that’s a story for another time).
I think that in the context of a startup with a tech team of fewer than 30 people, you should not have multiple people making decisions about the DB schema. There should be 1 owner for this (which does not mean there can’t be collaborative discussions, of course).
Currently I usually am the owner of DB schemas, and I manage SQL migrations by hand, via SQL scripts that are versioned in git, separately from backend code repos. I’m still regularly surprised by how many bad product/code decisions become obvious when considering DB schema issues, with context and history in mind.
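For readers wondering what hand-managed SQL migrations can look like in practice, here is one possible sketch of a tiny runner. It is an assumption for illustration, not a description of my actual tooling: migrations are named SQL scripts applied in name order, and a bookkeeping table records which ones have already run. It uses the standard-library sqlite3 module so it runs as-is; the MIGRATIONS list, the schema_migrations table and the apply_migrations function are hypothetical names. In a real setup each entry would be a .sql file in a git repository.

```python
import sqlite3

# Hypothetical migrations; each tuple stands in for one versioned .sql file.
MIGRATIONS = [
    ("001_create_users",
     "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);"),
    ("002_add_users_created_at",
     "ALTER TABLE users ADD COLUMN created_at TEXT;"),
]


def apply_migrations(conn, migrations):
    """Apply, in name order, every migration not yet recorded; return the names applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    applied = []
    for name, sql in sorted(migrations):
        if name in done:
            continue  # already applied on this database
        # Note: this sketch does not wrap the script and the bookkeeping
        # insert in a single transaction.
        conn.executescript(sql)
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        conn.commit()
        applied.append(name)
    return applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(apply_migrations(conn, MIGRATIONS))  # first run applies everything
    print(apply_migrations(conn, MIGRATIONS))  # second run has nothing to do
```

Running it twice on the same database is safe, because the second pass finds every name already recorded and skips it.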
I had the opportunity to discuss the ORM topic with friends, and many had experienced more or less the same things. Some of them, however, had an interesting take on the matter: they kept using ORMs, but only for particular things such as migrations management, handling DB connection pools (for concurrent querying), input validation… To come back to the Vietnam analogy, I think it’s perfectly OK to use ORMs if you know exactly what you’re using them for, with a clear goal in mind and a well-defined perimeter.
The stack I use now
The resulting backends, if you know SQL, are insanely quick to write, easy to read, and thus to maintain.