A couple of years ago I broke my basic rule of sticking to practical how-to and general programming philosophy and wrote Why I Do Not Use ORM. It sure got a lot of hits, and is read every day by people searching such things as "orm bad" or "why use orm". But I have never been satisfied with that post, and so I decided to take another stab from another angle. There are legitimate problems that led to ORM, and those problems need to be looked at even if we cannot quite agree on what they are or if ORM is the answer.
UPDATE: In response to comments below and on reddit.com, I have a new post that gives a detailed analysis of an algorithm implemented as a sproc, in app code with embedded SQL, and in ORM.
Here then, is one man's short history of commercial database application programming, from long before the ORM system, right up to the present.
The Way Back Machine
When I began my career the world was a different place. No Web, no Java, and Object Orientation had not yet entered the mainstream. My first application was written on a timeshare system (a microVAX) and writing LAN applications made me a good living for awhile before I graduated to client/server.
In those days there were three things a programmer (We were not "software engineers" yet, just programmers) had to know. Every programmer I knew wanted to master all of these skills. They were:
- How to design a database schema for correctness and efficiency.
- How to code an application that could process data from the database, correctly and efficiently.
- How to make a good UI, which came down to hotkeys and stuffing the screen with as much info as possible.
In this essay we are going to look at those first two.
My own experience may be somewhat peculiar in that I have never worked on a team where the programmers were separated from the database. (OK, one exception, in my current assignment there is an iron curtain between the two, but happily it is not my problem from where I sit). Coders made tables, and "tablers" wrote code. So this focus on being a good developer by developing both skills may be rare, enjoyed by those who have the same ecumenical background that I enjoyed.
Some Changes That Did Not Matter
Things changed rapidly, but most of those changes did not really affect application development.
When Windows 95 came out, being "almost as good as a Mac", we recoded our DOS apps into Windows apps without too much trouble and life went on as before.
Laser printers replaced dot-matrix for most office use, CPUs kept getting faster (and Windows kept getting slower), each year there were more colors on the screen, disks got bigger and RAM got cheaper.
Only the internet and the new stateless programming required any real adjustment, but it was easy for a database guy because good practice had always been to keep your transactions as short as possible. The stateless thing just kind of tuned that to a sharp edge.
Finally, with the internet, the RDBMS finally lost its place as sole king of the datastore realm, but those new datastores will have to wait for another day, lest we get bogged down.
Enter Object Orientation
Arguably nothing changed programming more than Object Orientation. Certainly not Windows 95, faster graphics or any of those other Moore's Law consequences. I would go so far as to say that even the explosion of the web just produced more programming, and of different kinds of apps, and even that did not come close to the impact of Object Orientation. Disagree if you like, but as it came in, it was new, it was strange, it was beautiful, and we were in love.
Now here is something you may not believe. The biggest question for those of us already successfully developing large applications was: What is it good for? What does it give me that I do not already have? Sure its beautiful, but what does it do?
User interfaces were for me the easiest first place to see the benefits. When the widgets became classes and objects, and we empolyed encapsulation, inheritance and composition, the world changed and I don't know anybody who ever looked back.
OOP, Data, and Data Structures
But in the matter of processing data, things were not so clear cut. The biggest reason may have been that all languages back then had specialized data structures that were highly tuned to handling relational data. These worked so well that nobody at first envisioned anything like ActiveRecord because we just did not need it.
With these structures you could write applications that ran processes involving dozens of tables, lasting hours, and never wonder, "Gosh, how do I map this data to my language of choice?" You chose the language you were using precisely because it knew how to handle data!
I would like to throw in just one example to show how OOP was not relevant to getting work done back then. I was once asked to optimize something called "ERP Allocation" that ran once/day, but was taking 26 hours at the largest customer site, obviously a big problem. It turned out there was a call to the database inside of a tightly nested loop, and when I moved the query outside of the loop the results were dramatic. The programmers got the idea and they took over from there. The main point of course is that it was all about how to efficiently use a database. The language was OOP, and the code was in a class, but that had nothing to do with the problem or the solution. Going further, coding a process so data intensive as this one using ActiveRecord was prima facia absurd to anybody who knew about data and code.
Java and the Languages of The Internet
But the web had another impact that was far more important than just switching to stateless programming. This was the introduction of an entirely new family of languages that took over the application space, listed here in no particular order: Perl, PHP, Python, Ruby, and the king of them all: Java.
All of these languages have one thing in common that positively jumps out at a veteran: a complete lack of data structures specialized for handling relational data. So as these languages exploded in popularity with their dismal offerings in data handling, the need to provide something better in that area became rapidly clear.
Java has a special role to play because it was pure OOP from the ground up. Even the whitespace is an object! The impact of Java is very important here because Object Orientation was now the One True Faith, and languages with a more flexible approach were gradually demoted to mere 'scripting' languages. ( Of course proponents will quickly point out that 1/12 of the world's population is now using a single application written in one of those 'scripting' languages).
So the explosion of languages without decent data handling abilities, coupled with a rise in OOP-uber-alles thinking led us quite naturally to:
The First Premise of ORM: The Design Mismatch
The first premise of ORM is that there is a design mismatch between OOP and Relational, which must resolved before any meaningful work can be done.
This view is easy to sympathize with, even if you disagree, when you consider the points raised in the above sections, that the languages in play lack any real specialized data structures, and that a certain exclusive truthiness to OOP has arisen that is blind to entire classes of solutions.
So we must grant the ORM crowd their first premise, in modified form. It is not that there is a design mismatch, it is that there is something missing, something that was in older systems that is just not there in the newer languages. Granting that this missing feature is an actual mismatch requires a belief in the Exclusive Truth of OOP, which I do not grant. OOP is like the computer itself, of which Commander Spock said, "Computers make excellent servants, but I have no wish to be servant to a computer."
But anyway, getting back to the story, the race was on to replace what had been lost, and to do it in an OOPy way.
The Second Premise of ORM: Persistence
Fast forward and we soon have an entire family of tools known as Object-Relational-Mappers, or ORM. With them came an old idea: persistence.
The idea has always been around that databases exist to persist the work of the programmer. I thought that myself when I was, oh, about 25 or so. I learned fast that my view of reality was, *cough*, lacking, and that in fact there are two things that are truly real for a developer:
- The users, who create the paycheck, and
- The data, which those users seemed to think was supposed to be correct 100% of the time.
From this perspective, the application code suddenly becomes a go-between, the necessary appliance that gets data from the db to the user (who creates the paycheck), and takes instructions back from the user and puts them in the database (correctly, thank you, and don't make the user wait). No matter how beautiful the code was, the user would only ever see the screen (or page nowadays) and you only heard about it if it was wrong. Nobody cares about my code, nobody cares about yours.
However, in the ORM world the idea of a database as the persistence layer now sits on a throne reserved for axiomatic truth. Those who disagree with me on this may say that I have the mistaken perspective of an outsider, to which I could say only that it is this very idea that keeps me an outsider.
An Alternative World View
There are plenty of alternatives to ORM, but I would contend that they begin with a different world view. Good business recognizes the infinite value of the users as the generators of the Almighty Paycheck, and the database as the permanent record of a job well done.
This worldview forces us into a humble position with respect to our own application code, which is that it is little more than a waiter, carrying orders to the kitchen and food back to the patrons. When we see it this way, the goal becomes to write code that can efficiently get data back and forth. A small handful of library routines can trap SQL injection, validate types, and ship data off to the database. Another set can generate HTML, or, can simply pass JSON data up to those nifty browser client libraries like ExtJS (now "Sencha" for some reason).
This covers a huge amount of what an application does, if you do not have much in the way of business logic.
But How Do You Handle Business Logic?
I have an entire essay on this about half-written, but in short, it comes down to understanding what business logic really is. Update: This post is now available
The tables themselves are the bottom layer of business logic. The table design itself implements the foundation for all of the business rules. This is why it is so important to get it right. The tables are organized using normalization to have a place for everything and everything in its place, and after that the application code mostly writes itself.
The application code then falls into two areas: value-add and no value-add. There is no value-add when the application simply ships data off to the user or executes a user request to update the database. Those kinds of things should be handled with the lightest possible library that gets the job done.
But the value-add stuff is different, where a user's request requires lookups, possibly computations and so forth. The problem here is that a naive analysis of requirements (particulary the transliteration error (Scroll down to "The Customer Does Not Design Tables) will tend to generate many cases of perceived need for value-add where a simpler design can reduce these cases to no value-add. But even when the database has been simplified to pristine perfection, there are jobs that require loops, multiple passes and so forth, which must be made idempotent and robust, which will always require some extra coding. But if you know what you are doing, these always turn out to be the ERP Allocation example given above: they are a lot more about the data than the classes.
Another huge factor is where you come down on the normalization debate, particularly on the inclusion of derived values. If you keep derived values out of the database, which is technically correct from a limited perspective, then suddenly the value-add code is much more important because without it your data is incomplete. If you elect to put derived values into your database than value-add code is only required when writing to the database, so huge abstractions meant to handle any read/write situation are unnecessary. (And of course, it is extremely important to Keep denormalized values correct ).
And the Rest of It
This essay hardly covers the entirety of making code and data work together. You still have to synchronize schema changes to code, and I still think a data dictionary is the best D-R-Y way to do that.
I hope this essay shows something of why many programmers are so down on ORM, but much more importantly that there are coherent philosophies out there that begin with a different worldview and deliver what we were all doing before ORM and what we will all still be doing after ORM: delivering data back and forth between user and database.