Relational Databases Are Not Relational
I’ve been meaning to write about this ever since reading Databases in Depth, a punchy presentation of relational theory by C.J. Date. I’m sure you already know this, but Date worked closely for many years with E.F. Codd, the man who introduced the relational model in his 1969 paper Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks.
Anyway, here are some highlights from the book:
- The term “relational” has nothing to do with relating two tables on a common set of columns.
- Relations are multi-dimensional. Don’t let the term “table” fool you into thinking they are two-dimensional.
- Nulls are not values.
- SQL is not a set-oriented language (it is bag-oriented).
I used to think I had a pretty good understanding of databases until I read that book. Now at least I know how little I know. I think that’s progress.
If you work with databases — and that includes anybody writing Rails code — I recommend you read the book. Think of it as part of your edumacation. After all, I doubt you’ve read many other database books recently :-)
So why write this now? Well, I was reading about CouchDB and found in the comments a highly intelligent¹ contribution from somebody called Breton. Here it is:
There’s a few myths floating around in this article and in the comments. I normally don’t have much luck diffusing myths, but hopefully if the people read this have the ability to think critically, and do their own research instead of knee-jerk react, then it should go alright.
Myth 1: SQL is relational. It turns out this is rubbish. SQL is so much not relational that the committee in charge of SQL had to remove the words “relation” and “relational” from the standard. This is easily verified: find the standard and do a simple find. What’s not relational about it? In short: in SQL, tables and rows are not sets; the order of records and columns are significant; SQL allows duplicates, and introduces three valued logic with the NULL — all violations of the relational model which introduce bugs, inconsistencies, and a significant risk of data corruption.
Myth 2: Relational Databases are flawed. Truth: There’s no way to know since there’s never been a widely used relational database. There are some who argue that should the relational model ever be implemented in a DBMS system, it would outperform current database systems in terms of speed and stability. It seems to me this is probably true, but then, what do I know? (Though some vendors claim that they produce relational database systems, this could not possibly be true as long as such systems use SQL).
Myth 3: Systems like CouchDB are in every way better than relational database systems.
Again, there’s no way to know since there isn’t a widely available and usable relational database. But a truly relational database system would hold significant advantages over a “flat” data storage system such as CouchDB. Assuming, of course, that it is utilized properly, a relational database is essentially a large scale inference engine. Meaning that if you can reason about a subject using formal logic, you can reason about data on the subject on a large scale using the relational algebra or relational calculus. This is amazingly powerful, but since most people never bother learning about the relational model they never take advantage of it.
And then, to wrap it up:
A good analogy might be claiming that Web Standards, semantic html, and seperation of presentation and content are unworthy concepts because they happen to be difficult to achieve in IE6. Don’t blame the inadequacies of the implementation on the concept itself.
¹ I know it was intelligent because it agreed with me.