DaveWentzel.com All Things Data
What exactly is wrong with SQL and RDBMSs?
This is my third post on my evaluation of NoSQL products for a client who wishes to replace their (rather expensive) SQL Servers. Many relational guys ask themselves, "why all the fuss about NoSQL? What is wrong with relational data stores?" I felt the same way 15 years ago when people wanted to replace RDBMS with XML stores. But the movement is underway because people feel the relational model is flawed. In my mind, NoSQL is not a direct replacement for relational data stores, NoSQL has its place, just like relational does. But, we relational guys are making it tough on ourselves. We don't want to understand WHY people like relational alternatives.
This post is a series of things that annoy me with relational people. These annoyances bother lots of people...the same people who control the decision making powers of the data persistence engines that are chosen for new projects. If we relational guys don't change our ways we are going to be out of jobs. My complaints, in no particular order:
Relational database Schemas are Inflexible
I've been to numerous NoSQL presentations and this is the key point that these vendors drive home.
How many times have you asked your DBA to add a new column to a table and they refused? With <insert NoSQL product here> we are schema-less. We don't care if you add new data elements to your schema or not, because we have no schema. Add whatever data elements you want, whenever you want. We'll support it.
The fact is, relational databases can do this too. It's not the RDBMS that is inflexible, it is the DBA. More on that in the next section.
There are no valid reasons why a relational database schema has to be inflexible. The reasons are man-made contrivances. I have many blog posts where I show that you can add new columns and value them while a system is up with ZERO impact to the users. Some examples are here,here,here, and most importantly here.
There are some schema changes that are disallowed by the major RDBMS vendors. For instance, adding a non-nullable column to a billion row table generally means some downtime. But that is not a flaw of the relational model, that is a flaw in the vendor's implementation of the relational model. There is nothing in relational theory that limits schema flexibility. The problem is that the data modeler and data architect are not sufficiently well-versed in how to decouple the logical from the physical model such that evolutionary relational databases can be a reality. These people want to tie table structure to its on-disk representation. Data architects and data modelers need to update their skills.
DBAs, Data Architects, and Database Developers SUCK
I don't feel this way, but many of our fellow software developers do. Sorry, but it's true.
DBAs tend to ask too many questions and we make the developers think differently about their data. They don't like that. We data professionals like to think we are doing our employers a valuable service by asking the difficult questions about new data requirements, but in reality we are not. Developers are fighting us at every turn and we don't even realize it.
We've all seen this happen: A developer/analyst requests a new column in a table. What does the DBA do?
- We ask tons of questions about WHY the data is needed.
- We deny the request because
- the requestor wanted to call it "Status" and our naming conventions require the name to be StsVal. We direct the requestor to read the 2000 page "Data Standards and Naming Conventions Document, V3.15". I'm not against standards, I'm against the expectation that EVERYONE knows EVERY standard. Let's be a little more helpful.
- the requestor wanted to use the BIT datatype but our standard is to use TINYINT.
- We did not specify an appropriate DEFAULT for the new column.
- The new column was requested to be NOT NULLable and that is disallowed.
- Will require too much data conversion code to be written
- After the request is DENIED we ask that the requestor resubmit the request, making the necessary changes, and submit it for the next Change Review Board meeting, which is held the first Tuesday of every month. So basically, wait a month.
Get the picture? DBAs are notorious for not being helpful. Sorry, but that's my opinion, and the perception of many others too.
Don't believe me. Then why do we all see so many of the following traits in our databases:
- varchar(max) columns everywhere because if the requestor asks for varchar(50) and later needs it to be varchar(100) that the DBAs will deny the request. Best to ask for "too much" than "too little".
- varchar(max) columns that end up storing JSON, XML, or data in some homespun markup language. Developers do this to avoid data modeling and DBAs.
- EAVs. An EAV has almost infinite schema flexibility.
- Tables with ZERO DRI. The developers don't want to admit that there might be relationships to existing data because then they'll need to deal with DRI.
- Data Hoarding. We constantly see tables with no data lifecycle management. It's easier for a developer to say that data must always be maintained without being honest about the data retention requirements with the DBAs.
To my DBA friends, PLEASE change your attitudes.
I Can't "Discover" My Schema in a Relational Database
This is another complaint I hear too often. Developers want to experiment and discover their schemas without a lot of up-front formal modeling. And I agree. And I do that every day too. Nobody says you need to have DBA-approval to build some tables on your scrum server and do a proof-of-concept. When you are ready then you should get proper change management approval. This argument is another function of rigid policies and processes.
Many companies practice rigid "waterfall" development...they MUST do data modeling before any coding is done. And that is often WRONG. In these companies the model is ALWAYS determined to be lacking late in the development phase. The schema can't support all of the data requirements perfectly. But again, the developers and analysts fear the DBAs so they "work around" the schema deficiences.
- Structures that are difficult to query
- Structures with poor referential integrity
- Poor performance
Did you ever notice there aren't a lot of jobs for NoSQL DBAs? This is because the NoSQL vendors don't want DBAs. DBAs limit the creativity process. Flexibility to assist in prototyping and experimentation is not a function solely of NoSQL.
SQL is hard to use and is not expressive enough
This is true. But it is getting better. Doing "paging" in SQL 10 years ago required lots of lines of code and performed poorly. Now we have constructs like TOP and LIMIT that are easier to use. And there will be even more improvements to the SQL language that will make this even easier, for instance, we'll likely soon have the ability to use a TOP without needing a CTE first. That's just a guess.
The NATURAL JOIN syntax would be a welcome addition too.
Here are some other things SQL really needs:
- array structures
- macros to remove code duplication efficiently
- performant scalar functions
It is getting better. And I defy you to tell me that Hive or Pig is more expressive than SQL.
And, of course, we always have ORMs to help us avoid hand-crafting SQL for mundane, repetitive tasks.
There are good reasons to use a NoSQL solution. There are also bad reasons. This blog post was an attempt to outline a few egregious reasons why people choose NoSQL solutions. These are all issues of perception and education.