DaveWentzel.com            All Things Data

MD3

The OO Design Principles of MD3

Over the last few posts we've covered how MD3 handles stateful database object deployments.  You may still be wondering why you should care about deploying your database code using metadata instead of hand-crafted DDL.  In this post I'm going to take a detour from the technical aspects of MD3 and instead cover the design principles behind MD3 that are not found in (m)any other database deployment tools.  Even if you decide MD3 isn't for you, you may learn a few things that help you become a better data developer.  

I designed MD3 to follow standard Object-oriented design principles...using TSQL.  I haven't always succeeded, but the OO principles in MD3 make it so much more robust and reliable than standard DDL.  Let's cover a few of these principles, starting with the most important first (IMHO).  

"Tell, Don't Ask"  

...is an object-oriented programming principle that helps programmers remember that they should strive to tell their functions what to do rather than ask them what their state is, then ask them to perform some action based on that.  We need more of this in SQL and DDL.  Procedural code, such as DDL, does far too much "asking" for information and then making decisions based on the response.  It would be nice if we database developers could just tell our RDBMS what we wanted and the RDBMS would just "do the needful" and make it so.  

Here's an example of some code that probably looks familiar to you...it builds an index if it doesn't exist:  
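A minimal sketch of that pattern, assuming a hypothetical dbo.Example table with a Foo column (the table, column, and index names are illustrative):

```sql
-- "Ask": does an index with this name already exist on the table?
IF NOT EXISTS (
    SELECT 1
    FROM sys.indexes
    WHERE object_id = OBJECT_ID(N'dbo.Example')
      AND name = N'nciFoo'
)
BEGIN
    -- "Tell": only now do we create it.
    CREATE NONCLUSTERED INDEX nciFoo ON dbo.Example (Foo);
END;
```

Note that this only checks the index name.  If nciFoo already exists with different columns or options, the script silently does nothing.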

This is an example of "Ask, then Tell" which is terrible.  As data developers we write code like this because we need to ensure the index doesn't already exist before we create it.  This is a code pattern that you have to remember and may only use a few times a year.  Variants of this code will be scattered in tons of index creation scripts.  All we really should need to do is "tell" our RDBMS what the index should look like and the RDBMS should be able to figure out how to get us that index in the most efficient manner possible.  I call this "declarative DDL".  This is what MD3 does for you.  This graphic shows how MD3 handles index creation declaratively. Very simple...we tell MD3 what our index should look like and it ensures that after EVERY deployment we get an index called nciFoo with the declared properties.  It doesn't matter if this is a "net new" database, a database updated from a 5 year old version of our application, or a database upgraded from yesterday's build.     

DRY Principle

Index creation DDL with an "if exists check" also violates the DRY Principle (Don't Repeat Yourself).  As mentioned before, you may have variants of that code in lots of sql scripts.  Let's suppose you need to modify all of those scripts because you decide to implement ROW compression.  You may need to change HUNDREDS of scripts.  If you use MD3 you simply modify MD3.CreateIndex to default all indexes to use ROW compression.  Very easy!  Change one script instead of hundreds.  
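To make the DRY argument concrete, here is a rough sketch of a wrapper procedure where the compression default lives in exactly one place.  This is NOT the MD3.CreateIndex source; dbo.EnsureIndexSketch and all of its parameters are hypothetical names invented for illustration, and it handles only a single key column.

```sql
-- Hypothetical wrapper (not part of MD3): every index script calls this,
-- so a policy change such as "default to ROW compression" is a one-line edit.
CREATE PROCEDURE dbo.EnsureIndexSketch
    @SchemaName      sysname,
    @TableName       sysname,
    @IndexName       sysname,
    @KeyColumn       sysname,
    @DataCompression varchar(10) = 'ROW'   -- the single place the default lives
AS
BEGIN
    SET NOCOUNT ON;

    -- Whitelist the option because it is concatenated into dynamic SQL below.
    IF @DataCompression NOT IN ('NONE', 'ROW', 'PAGE')
    BEGIN
        RAISERROR('Unsupported compression setting.', 16, 1);
        RETURN;
    END;

    -- Nothing to do if an index with this name already exists on the table.
    IF EXISTS (
        SELECT 1
        FROM sys.indexes
        WHERE object_id = OBJECT_ID(QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName))
          AND name = @IndexName
    )
        RETURN;

    DECLARE @sql nvarchar(max) =
          N'CREATE NONCLUSTERED INDEX ' + QUOTENAME(@IndexName)
        + N' ON ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName)
        + N' (' + QUOTENAME(@KeyColumn) + N')'
        + N' WITH (DATA_COMPRESSION = ' + @DataCompression + N');';

    EXEC sys.sp_executesql @sql;
END;
```

A caller would then write something like EXEC dbo.EnsureIndexSketch @SchemaName = N'dbo', @TableName = N'Example', @IndexName = N'nciFoo', @KeyColumn = N'Foo'; and never mention compression at all.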

Quick Links
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

So why do the ANSI SQL standards not specify a Declarative DDL syntax like this?  Probably because the ANSI standards don't specify concepts like "indexes" and "clustering keys" and "fillfactors".  DDL is, in fact, declarative in that you "declare" things like column names and data types, but the process to implement those changes is still a series of imperative verbs that carry out those actions (i.e., CREATE and ALTER).  The ANSI specifications allow the "hows" to be vendor-implementation-specific.  And, of course, SQL and DDL were created before object-oriented programming became the de facto standard.  

So why do none of the major (R)DBMS vendors provide native declarative DDL like MD3?  I have no idea.  They really should.  If you look at tools like Microsoft's Visual Studio database projects you can see that there is an attempt to make things more "tell, don't ask", but it is half-baked at best.  In VS database projects you declare the index using standard CREATE INDEX DDL, like this, which I pulled directly from one of my VS database projects:  

This is certainly much easier than having to write an IF EXISTS statement, but it could be better.  The way VS database projects work is that if you need to ALTER an index you simply change the CREATE INDEX command in the index script.  So, to add an INCLUDE clause to this index is simply a matter of adding that clause to the above command.  When the deployment runs it will run the necessary commands to get you what you requested, whether that is a DROP followed by CREATE, a CREATE WITH DROP_EXISTING, or whatever.  But again, if you need to make bulk changes to all indexes, such as adding a FILLFACTOR or COMPRESSION you can't.  That's why I feel it is half-baked.  

Hopefully one day we'll have full declarative DDL in our RDBMSs.  Even the big NoSQL implementations don't handle DDL any better.   If you want to manipulate an index in Hadoop/Hive you still need to use CREATE/ALTER/DROP syntax.  No improvement there.  Mongo is a bit better.  You don't use DDL at all, instead you call a .ensureIndex() method that will ensure your index "looks" a specific way before it allows your job to run.  There are problems with this from an operational standpoint...such as the fact that this may cause your entire collection to serialize while the index is built...but syntactically this is a great improvement over standard SQL DDL and is getting very close to full declarative DDL.  

The Single Responsibility Principle

The SRP is a restatement of Unix's philosophy of Do One Thing and Do It Well.  I've tried to make MD3 adhere to the SRP as much as possible.  As I've shown already, unlike other deployment tools that make you use all or nothing of their solution, MD3 does not.  I've had customers write their own deployer (covered soon in a post about RunMD3.ps1) to avoid using/learning PowerShell.  I've had others use VS database projects to deploy but use the MD3.Create procedures to manage their stateful objects.  And every client loves the MD3 Model Data Patterns scripts.  

Most deployment tools, such as RunMD3.ps1 and VS database projects, are single-threaded...meaning they are designed to run only one script at a time, serially.  I had a client that was EXTREMELY sensitive to downtime and wanted to multi-thread their deployments, yet not have to deal with object dependency issues.  Since MD3 is architected as independent services we were able to make the deployment multi-threaded very quickly simply by running the single-threaded RunMD3.ps1 concurrently with itself.  We made the deployment code "multi-threaded" in two days and cut their deployment times by a factor of 4.  

YAGNI

"You ain't gonna need it" is a principle of eXtreme programming that states that you should never add a feature unless it is needed.  I totally agree with this.  For instance, I only added the ability to use index partitioning with MD3 in 2012 after I finally found a use case where partitioning would be the best solution to a problem.  Prior to that I always solved my problems using other solutions that were at least equal to the task.  

I've had co-workers and clients complain that MD3.CreateIndex does not support critical features they think they need.  An oft-cited example is filtered indexes.  My retort is quite simple...I've yet to see a use case where filtered indexes are the best solution.  I'm not saying filtered indexes are useless, I'm merely saying I, professionally, have yet to need them.  Perhaps I'm just lucky and have never worked on a system that truly needed filtered indexes.  Rest assured that the first time I need a filtered index I'll refactor MD3.CreateIndex so it properly supports this feature.   So, if there are features you need either implement them yourself or contact me.  

Separate Interface from Implementation

Some folks call this ISP (Interface Segregation Principle).  I prefer to call it "separating the interface from the implementation" because it seems to make more sense to people.  The "interface" is how you interact with your object, the "implementation" is what your object does under-the-covers.  You should always try to isolate your implementation details from your users.  As we've seen time and again, standard DDL mixes the "interface" (what you want to do) with the "implementation" (how you do it).  When you use the MD3 procedures you are not exposed to the implementation details.  You don't need to worry about what to do if you need to CREATE your object vs ALTER it, nor do you need to worry about dependent objects.  Everything is handled for you.  Yet since MD3 is implemented in TSQL you have the ability to see how the implementation is handled by MD3 and of course you can change that if it doesn't suit your requirements.  The implementation details are hidden, until you need them.  

Convention over Configuration

This is sometimes known as "coding by convention".  CoC strives to keep the number of configuration options a developer/admin has to worry about to a minimum and supply sensible defaults instead.  Certainly having lots of configuration options is cool, but have you ever had a situation where you've changed a bunch of configurations only to find out that your software doesn't work and can't figure out which buttons and knobs you changed that are affecting the behavior?  

MD3 has NO configuration file.  None.  Configuration isn't needed.  When you look at your folder hierarchy you should be able to tell the order that scripts will deploy in based on their naming convention.  All folders and scripts execute alphanumerically.  This is covered further in my post on the MD3 executor, coming up soon.  

I've had clients add various knobs and settings to customize MD3 for their specific needs and what they thought was a better implementation.  That's fine.  But the goal of good software is to not have to worry about configuring the software to have it do what you need.  

So how do you configure MD3 without an XML, ini, or settings table?  You use the MD3 patterns and conventions I've already talked about in these posts and will talk about in future posts.  We don't need a configuration option that allows you to execute .sql and .sqlx files (whatever that might be).  We use the most common option that everyone is familiar with...the convention of using .sql file extensions to indicate this is a SQL file.  Of course you can change this if you need to, that's totally up to you.  

Write Shy Code

This is sometimes called the Law of Demeter.  I hate that name because it is an unnecessarily wordy, confusing, and obtuse way to explain a simple concept.  Do you even know who Demeter was or what she is famous for?  Probably not.  The LoD can be summed up by stating that your objects should be "loosely coupled".  In other words, modules should not reveal anything unnecessary to other modules, and should not rely on other modules' implementations.  An example...if you need to change an object's state, get the object to do it for you.  This way your code remains isolated from the other code's implementation and increases the chances that you'll remain orthogonal.  Orthogonal is a fancy way of saying your code will "stay at right angles to other code"...which is another wordy way of saying you won't introduce "side-effects" if you change something.  

We accomplish this in MD3 quite simply...all code has an interface in the form of MD3.Create procedures.  These procedures take a given set of "properties" as arguments and nothing else.  All state changes are hidden behind those interfaces.  If those interfaces change, such as if we decide to add a new "property" to an object (perhaps adding @FilterClause to MD3.CreateIndex) we can rest assured that we do not have to worry about that code having unintended side-effects that may break other modules.  And we can always test this with our CI build loops (Continuous Integration Testing with MD3).  

Summary
This post was a divergence from my regular posts on the technical details of the MD3 system.  Understanding the design principles of MD3 will help you quickly and safely make your system extensible later, if needed.  I'll cover examples of this in a future post (MD3 Extensibility Stories).  The goal of MD3 is to think declaratively (specifying what is to be done, not how) and create a highly dynamic and adaptable deployment tool.  
 

You have just read "The OO Design Principles of MD3" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 Script Idempotency and MD3.CreateColumn


In the last few posts of my MD3 blog series I've focused on how MD3 handles "stateful" database object deployments.  Today I'm going to cover the most important stateful object in your database...the table.  You really want to make sure you get your table scripts correct otherwise you risk data loss.  Basic CREATE TABLE DDL is already really simple to understand so MD3 doesn't get in your way with some cumbersome new syntax that you need to learn.  However, MD3 adds some scaffolding code around the DDL to help you make your deployments bulletproof.  Here's an example:  
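As a rough sketch of the shape such a table script takes (the dbo.Example table and its columns are assumptions made up for illustration, and the actual MD3 scaffolding may differ):

```sql
-- Hypothetical table used for illustration only.
IF OBJECT_ID(N'dbo.Example', N'U') IS NULL
BEGIN
    -- "Net new" database: create the table in its current shape.
    CREATE TABLE dbo.Example
    (
        ExampleId int          NOT NULL,
        Name      varchar(50)  NOT NULL,
        GroupName varchar(500) NOT NULL
    );
END
ELSE
BEGIN
    -- Upgraded database: the table is already there, so just say so.
    PRINT 'dbo.Example already exists';
END;

-- Alter section: the stateful history of changes made to this table
-- since it was first released goes below (empty so far).
```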

MD3 includes a script generator that will put your existing tables into the above format automatically and will be covered in How to Reverse-Engineer Your Database with MD3.  But the script generator will not handle the "Alter" section of the script since this is the stateful history that you must provide.  Sorry, but there's no way around that...no tool will provide this for you.  

First, All MD3 scripts must be idempotent

An idempotent operation is one that can be applied multiple times without changing the result beyond the initial application.  In SQL this is an operation that can be "rerun" without having any unintended side effects.  For instance, this piece of SQL is NOT idempotent:  

UPDATE dbo.Employees SET Salary = Salary * 1.10

Every time you run this code it will give an additional 10% raise to every employee.  

Every MD3 script must be idempotent, or "rerunnable", without "side-effects".  This is because MD3 runs ALL scripts during each deployment.  Some folks struggle with this concept.  Two suggestions:  ensure that you create proper CI testing (see Continuous Integration Testing with MD3) and whenever you write a script simply press F5 twice, then see if the results are what you expected.  Some of my clients call this the "F5 F5 test".  After a few weeks you'll be writing rerunnable scripts without even realizing it.  
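One way to make the salary raise above pass the F5 F5 test is to record that it has been applied.  This is only a sketch, and it assumes a hypothetical RaiseApplied bit column (NOT NULL, default 0) on dbo.Employees that is not part of the original example:

```sql
-- Rerunnable version: each employee gets the 10% raise exactly once.
-- RaiseApplied is a hypothetical marker column added for this sketch.
UPDATE dbo.Employees
SET Salary       = Salary * 1.10,
    RaiseApplied = 1
WHERE RaiseApplied = 0;
```

The second execution matches zero rows, so the result is the same no matter how many times the script runs.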

MD3.CreateColumn

...will help you add or alter a column based on the properties you provide.  Here are the parameters:  

There are some benefits to using MD3.CreateColumn vs using standard DDL:  

  • removes constraints (unique, foreign key, check, and PK) if the column change requires those objects to be dropped
  • removes indexes from the col, if needed
  • less esoteric DDL to remember

When you drop a column with @DropColumn (instead of using ALTER TABLE...DROP) MD3 will automatically remove the dependent objects and will provide the developer with a message like this to remind them to remove those objects from source control.  

Back to our Example

All MD3 scripts must be rerunnable.  Note that the table script pattern above isn't required with MD3, but I've found this is the best pattern to use.  In this case the script is rerunnable because the table will only be created on the first execution of the script.  On subsequent executions only the PRINT statement will be output.  

Let's assume that you have a new requirement to add a URL column to this table.  Using the above pattern this is how I would do it.  Note that we add the URL code to two places...the CREATE TABLE statement (covers a "net new" database) and as a call to MD3.CreateColumn (covers upgraded databases).  On the second and subsequent executions of this script nothing will happen since the URL column has already been created as declared.  
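A sketch of the "two places" idea, again using a hypothetical dbo.Example table.  The guarded ALTER TABLE below is plain DDL standing in for the MD3.CreateColumn call, whose exact parameters are not reproduced here; the column type is also an assumption.

```sql
IF OBJECT_ID(N'dbo.Example', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.Example
    (
        ExampleId int          NOT NULL,
        Name      varchar(50)  NOT NULL,
        GroupName varchar(500) NOT NULL,
        URL       varchar(255) NULL   -- place 1: net new databases get the column here
    );
END
ELSE
BEGIN
    PRINT 'dbo.Example already exists';
END;

-- Alter section.
-- Place 2: upgraded databases get the column here.  COL_LENGTH returns NULL
-- when the column does not exist, so this is a no-op on every later run.
IF COL_LENGTH(N'dbo.Example', N'URL') IS NULL
BEGIN
    ALTER TABLE dbo.Example ADD URL varchar(255) NULL;
END;
```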

Interspersing DDL with data migration code

Some shops absolutely insist on separating DDL from data migration code.  This is wrong.  In many cases you must run data migration code either before or after your DDL runs.  In these situations the data migration code is tightly coupled with the DDL and should be stored together.  This aids in documentation and understanding for future developers.  

Let's look at a simple, contrived example.  For some reason you have a requirement to shrink the GroupName column from varchar(500) to varchar(100).  Further for existing data the requirement is simply to retain the first 100 characters.  Here's how you would accomplish that using my MD3 patterns.   First, you would change the definition in the CREATE TABLE statement (omitted from screenshot).  Then you would run the UPDATE statement to retain existing data, then you would make the call to MD3.CreateColumn changing the parameter for @Length.  As a side note, the UPDATE statement must be placed in dynamic sql otherwise the script will not compile on net new databases because the table does not actually exist.  This is goofiness with TSQL, not with MD3.  You can of course add any additional code you may want here.  A backup of the table (or at least the GroupName column) would be a wise idea, just in case.  
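The Alter section for that change might look roughly like this.  It is a sketch under assumptions: dbo.Example is a hypothetical table, GroupName is assumed to be NOT NULL, and a plain ALTER COLUMN stands in for the MD3.CreateColumn call with the new @Length.

```sql
-- Only act while the column is still wider than 100 bytes.  COL_LENGTH
-- returns NULL if the table or column is missing, which skips the block.
IF COL_LENGTH(N'dbo.Example', N'GroupName') > 100
BEGIN
    -- Data migration first: keep the first 100 characters of existing values.
    -- Dynamic SQL, per the note above about compilation.
    EXEC sys.sp_executesql
        N'UPDATE dbo.Example
          SET GroupName = LEFT(GroupName, 100)
          WHERE DATALENGTH(GroupName) > 100;';

    -- Then the DDL: shrink the column now that nothing will be truncated.
    ALTER TABLE dbo.Example ALTER COLUMN GroupName varchar(100) NOT NULL;
END;
```

Keeping the UPDATE and the DDL inside the same guard is what makes the pair rerunnable: once the column is varchar(100) neither statement runs again.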

Add a non-nullable column to a table

This is another common use case that I see.  This is a three step process:  

  1. Add a nullable column to the table
  2. UPDATE the column to the default value for existing data per your requirements.
  3. Set the column to NOT NULL.  

Here's how we do it with MD3.  If you are sensitive to downtime and this is a large table we can do most of this work while the system is online, if we prepare properly and write the code to accommodate this.  

Here we've added an AccountingCode column and populated existing data with a "default" code, then set the column to NOT NULL.  
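The three steps can be sketched in plain, rerunnable T-SQL as follows.  The dbo.Example table, the varchar(10) type, and the 'UNKNOWN' default code are assumptions for illustration; with MD3 the two ALTER statements would be calls to MD3.CreateColumn instead.

```sql
-- Step 1: add the column as nullable, only if it is not there yet.
IF COL_LENGTH(N'dbo.Example', N'AccountingCode') IS NULL
BEGIN
    ALTER TABLE dbo.Example ADD AccountingCode varchar(10) NULL;
END;

-- Step 2: backfill existing rows.  Dynamic SQL is used because the column
-- may not exist yet when this batch is compiled.
EXEC sys.sp_executesql
    N'UPDATE dbo.Example
      SET AccountingCode = ''UNKNOWN''
      WHERE AccountingCode IS NULL;';

-- Step 3: tighten the column to NOT NULL, only if it is still nullable.
IF EXISTS (
    SELECT 1
    FROM sys.columns
    WHERE object_id = OBJECT_ID(N'dbo.Example')
      AND name = N'AccountingCode'
      AND is_nullable = 1
)
BEGIN
    ALTER TABLE dbo.Example ALTER COLUMN AccountingCode varchar(10) NOT NULL;
END;
```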

Summary

All MD3 scripts must be rerunnable.  This is, frankly, a good habit to get into.  You never know who'll accidentally rerun your script without you knowing so scripts you write should never have "side-effects".  We then looked at the pattern to use for MD3 table scripts.  You don't have to use this pattern, but in my experience this pattern works best.  There are scripts available with MD3 that will reverse-engineer your database tables into this pattern.  

In the next post we'll take a break from the technical aspects of MD3 and look at The OO Design Principles of MD3.  When you understand these principles you'll see how a tool like MD3 can solve a lot of problems that DDL can't.  



MD3.CreatePrimaryKey in Action

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last few posts we covered MD3 and "state" which showed how MD3 deploys "stateful database objects" using The Four Rules.  In that post we covered how MD3.CreateIndex handles stateful index deployments.  In The Other MD3.CreateProcedures we covered the other stateful object deployment procedures briefly.  In this post I want to cover some use cases where MD3.CreatePrimaryKey can really help you evolve your database without having to code complex, error-prone DDL.  

In the AdventureWorks sample database there is a table named HumanResources.Department.  The PK is clustered and there is also one non-clustered index called AK_Department_Name.  

After doing some performance testing you determine that the table should really be clustered on Name and the PK should instead be non-clustered.  That seems like an easy request so you start by trying to drop the existing clustered PK first.  

But as you can see, that won't work because there is an FK that references that PK that would need to be dropped first.  Dropping the FKs, in turn, will require you to recreate those FKs later, and any associated indexes, which is a lot of code to write, test, and deploy.  
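To give a feel for how much hand-written DDL that is, here is a sketch of the full sequence.  The constraint names are assumed to be those of the stock AdventureWorks sample, and it assumes only one FK references the PK; check both against your own copy before running anything like this.

```sql
-- 1. Drop the FK that references the PK (otherwise step 2 fails).
ALTER TABLE HumanResources.EmployeeDepartmentHistory
    DROP CONSTRAINT FK_EmployeeDepartmentHistory_Department_DepartmentID;

-- 2. Drop the existing clustered PK.
ALTER TABLE HumanResources.Department
    DROP CONSTRAINT PK_Department_DepartmentID;

-- 3. Rebuild AK_Department_Name as the clustered index on Name.
DROP INDEX AK_Department_Name ON HumanResources.Department;
CREATE UNIQUE CLUSTERED INDEX AK_Department_Name
    ON HumanResources.Department (Name);

-- 4. Put the PK back, this time non-clustered.
ALTER TABLE HumanResources.Department
    ADD CONSTRAINT PK_Department_DepartmentID
    PRIMARY KEY NONCLUSTERED (DepartmentID);

-- 5. Recreate the FK.  WITH CHECK validates existing rows so it stays trusted.
ALTER TABLE HumanResources.EmployeeDepartmentHistory WITH CHECK
    ADD CONSTRAINT FK_EmployeeDepartmentHistory_Department_DepartmentID
    FOREIGN KEY (DepartmentID)
    REFERENCES HumanResources.Department (DepartmentID);
```

None of this is rerunnable as written.  Every step would also need its own existence check before it could pass the F5 F5 test.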

In MD3 this is much simpler.  In MD3 and "state" I showed how there are really two ways to deploy stateful objects...by calling the MD3 procedures directly or by populating a metadata table with the desired properties.  

Using the metadata table script

This is by far the easiest method.  The metadata table that holds PK information is called MD3.PrimaryKeys.  Here is what the (abbreviated) deployment script looks like for AdventureWorks' primary keys:

That's a little hard to read so here are just the metadata entries specific to HumanResources.Department:  

Since MD3 is declarative I only need to change the entry for the IsClustered property and I'm done.  Here is the code I changed...the IsClustered property was changed from 1 to 0:

Now let's execute that script just like a standard deployment.  Here is the output:  

This is visible to the developer executing the script on her local development database, as well as in the logs when this is finally deployed to your production databases.  Since we are modifying a PK that requires the associated FKs to be dropped a message is logged to remind the developer that either the FK MD3 scripts will need to be rerun, or if running a full MD3 deployment this dependency is handled automatically and the FKs will be run.  No custom DDL...very fast.  

For a point of reference, here is the FK MD3 script that the developer would run, or modify as necessary.  And of course all of this can be run automatically using the MD3 deployer (RunMD3.ps1:  The MD3 Executor).  

And when you run the FK script you will see this output:  

Note that all FKs already exist EXCEPT the FK that was automatically removed when we altered the PK.  In the post The Other MD3.Create Procedures I mentioned that all FKs will try to first be applied as trusted, and you can see that this is in fact the case.  

Using a direct call to MD3.CreatePrimaryKey

About half of my MD3 clients don't like keeping all of their PK properties in one metadata table (see the screenshots above).  Instead, some like to create their stateful objects using calls to the MD3 procedures directly.  They keep these calls in .sql files segregated by table name.  There is a scripting utility that will script your existing stateful database objects into either the metadata format or direct MD3 calls (How to Reverse-Engineer Your Database with MD3).  

HumanResources.Department is currently configured with its PK as non-clustered.  This is what the direct call would look like.  Note that when I executed this I simply see "Command(s) completed successfully" because, according to MD3 and The Four Rules, the object already exists with the same name and properties, so it silently continues.

Let's say that after a while you determine that your decision to make DepartmentId a non-clustered PK was a bad idea and it really should in fact be a clustered index.  What do we do?  Write a bunch of hand-crafted, error-prone DDL?  Nope, we simply change the @IsClustered flag.   When a developer does this she sees these messages:  

Note that the FKs were automatically dropped and a helpful message tells the developer to update the FK scripts if necessary.  These are the same messages that will be seen during standard deployments as well.  

Summary and Benefits of Using MD3

The goal of this post was to cover how the MD3 scripts in general, and MD3.CreatePrimaryKey in particular, can spare you from writing a lot of complex, error-prone DDL.  It is much faster to make these changes using MD3 and then have the proper CI testing in place (Continuous Integration Testing with MD3) to ensure all valid deployment paths are covered.  

There are certainly lots of GUI-based tools that will also drop and recreate your dependent FKs whenever a PK changes.  However, most of the implementation details are hidden from the developer and DBA.  What if you need a custom deployment mechanism that those tools don't provide?  MD3 can be customized...MD3.CreatePrimaryKey is just TSQL.  In a future post I'll give you a bunch of common use cases where MD3 can provide a quick solution where these tools cannot.  Further, since MD3 is just a bunch of TSQL calls these scripts can be properly versioned so you can quickly see the evolution of your database from release to release.  That's much harder to do with a GUI compare tool.  

 

 



The Other MD3.Create Procedures

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last few posts we covered MD3 and "state", which showed how MD3 deploys "stateful database objects" using The Four Rules.  If you handle The Four Rules correctly (and MD3 can greatly help you with this) then your database deployments should be a breeze.  No more hand-crafted, error-prone DDL.  In this post we are going to cover the other MD3.Create* procedures which help with database deployments and The Four Rules.  

The Other Stateful Database Objects

Here are the other stateful database objects and how MD3 handles their deployment.  In most cases this is handled with an MD3.Create* procedure.  These procedures deploy your stateful objects "smartly".  

Stateful Object | MD3's deployment mechanism
Tables/columns | Tables are created using standard DDL and can be scripted using SSMS tools.  Columns can be added or altered using MD3.CreateColumn; this will be discussed further in the post MD3 Script Idempotency and MD3.CreateColumn (source code).
Indexes | This was covered in the post MD3 and "state" using MD3.CreateIndex.  Source code.
Primary Keys | We'll cover this below.  Source code for MD3.CreatePrimaryKey.
Foreign Keys | We'll cover this below.  Source code for MD3.CreateForeignKey.
CHECK constraints | We'll cover this below.  Source code for MD3.CreateCheckConstraint.
DEFAULT constraints | Handled with MD3.CreateDefaultConstraint, and covered below.  This may not seem like a "stateful" database object, but it is.  Source code.
Model Data (states/provinces, valid order statuses, other "lookup" data) | Covered in a separate post.

Each of the above referenced MD3.Create procedures handles The Four Rules correctly.  Let's look at these routines.  

MD3.CreatePrimaryKey

Conceptually, primary keys are very similar to indexes.  You can cluster them, give them a name, give them compound keys, put them on a filegroup, etc.  Therefore it shouldn't be surprising that MD3.CreatePrimaryKey takes the exact same parameters as MD3.CreateIndex.  

The difference is what each procedure does under-the-covers.  If you change a PK you may actually have to deal with objects that depend on it.  Here's a list of objects that are dependent on PKs and how MD3 handles PK changes.  You can of course change this functionality to suit your requirements:  

  • If the table is replicated then the table must be removed from any subscribed publications.  Currently MD3 merely throws an error if this situation exists. 
  • Removes all FKs that reference the PK.  MD3.CreatePrimaryKey simply removes them.  Later, when MD3.CreateForeignKey is executed (remember, MD3 handles the dependent ordering of objects for you) the FKs will be recreated properly with the new keys.  
  • If we are changing the PK to be clustered we first drop an existing clustered index that already exists on the table.  As with foreign keys, MD3 handles the dependencies for you...meaning the dropped clustered index can be recreated later as a non-clustered index if required.  

In the next post, MD3.CreatePrimaryKey in Action I'll show you specific use cases where MD3 can save you vast amounts of time and rework when you have to modify a primary key.  Stay tuned.  

MD3.CreateForeignKey

FKs are stateful database objects because they require a properly-applied "parent" key to be in place first.  FKs can also be very expensive to recreate if the associated tables are large.  Here are the parameters (properties) for MD3.CreateForeignKey:

Like the other stateful object creator routines, MD3.CreateForeignKey properly implements The Four Rules.  The parameters should be self-explanatory, except maybe @AllowNoCheck.  Constraints can be either TRUSTED or UNTRUSTED.  When you create a FK using ALTER TABLE...WITH CHECK ADD CONSTRAINT you are saying that existing data in the table must adhere to the constraint, as well as any future data.  When you create a FK using ALTER TABLE...WITH NOCHECK ADD CONSTRAINT you are allowing existing data to violate the constraint; it will be enforced for future data only.  This is an untrusted constraint.  There are 2 reasons why untrusted constraints are frowned upon:  

  1. It means you may have data that violates your constraint and you should probably determine why that is, and fix it.  
  2. Some query plans will be less-optimal with an untrusted constraint, leading to possible performance issues.  

MD3 by default does NOT allow untrusted FKs to be created.  However, the @AllowNoCheck parameter allows you to override that, if needed.  MD3 will try first to create the constraint using WITH CHECK and then will fall back to WITH NOCHECK if you have @AllowNoCheck set.  MD3 throws a WARNING in this case that you can then research later.  
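The try-trusted-first behavior can be sketched in plain T-SQL like this.  It is not the MD3.CreateForeignKey source: dbo.Parent, dbo.Child, and FK_Child_Parent are hypothetical names, and the local @AllowNoCheck variable merely stands in for the MD3 parameter of the same name.

```sql
DECLARE @AllowNoCheck bit = 1;   -- stand-in for the MD3 parameter

IF NOT EXISTS (
    SELECT 1
    FROM sys.foreign_keys
    WHERE name = N'FK_Child_Parent'
      AND parent_object_id = OBJECT_ID(N'dbo.Child')
)
BEGIN
    BEGIN TRY
        -- First attempt: validate existing rows so the FK is trusted.
        ALTER TABLE dbo.Child WITH CHECK
            ADD CONSTRAINT FK_Child_Parent
            FOREIGN KEY (ParentId) REFERENCES dbo.Parent (ParentId);
    END TRY
    BEGIN CATCH
        IF @AllowNoCheck = 1
        BEGIN
            -- Fallback: enforce the FK for future data only (untrusted).
            ALTER TABLE dbo.Child WITH NOCHECK
                ADD CONSTRAINT FK_Child_Parent
                FOREIGN KEY (ParentId) REFERENCES dbo.Parent (ParentId);
            PRINT 'WARNING: FK_Child_Parent was created WITH NOCHECK and is untrusted.';
        END
        ELSE
        BEGIN
            PRINT 'ERROR: FK_Child_Parent could not be created WITH CHECK.';
            THROW;   -- surface the original error
        END;
    END CATCH;
END;

-- Research list: every FK in the database that is currently untrusted.
SELECT name, OBJECT_NAME(parent_object_id) AS table_name
FROM sys.foreign_keys
WHERE is_not_trusted = 1;
```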

MD3.CreateCheckConstraint

You may wonder why CHECK constraints are stateful database objects.  The reason is that recreating a CHECK constraint on a large table can be EXTREMELY time consuming as all data must be read and evaluated.  We also need to take into consideration whether the constraint is trusted or not, just like with foreign keys.  Here is the set of parameters that MD3.CreateCheckConstraint uses:  

 

MD3.CreateDefaultConstraint

Defaults are also stateful objects only because they are declared very similarly to CHECK constraints and the code is very similar.  

Summary

In MD3 and "state" I covered how MD3 handles stateful database objects, specifically indexes.  In this post I covered how MD3 deploys the other stateful database objects like PKs, FKs, CHECK constraints, and defaults.  In MD3 Script Idempotency and MD3.CreateColumn I'm going to cover what is probably the most important stateful database object...the table.  In the next post, MD3.CreatePrimaryKey in Action, I'll show you specific use cases where using MD3 instead of hand-crafted DDL can save you a lot of grief.  



MD3 and The Four Rules

In the last few posts I covered what MD3 is (Metadata Driven Database Deployments) and why it is better than hand-crafted DDL.  In this post we are going to cover The Four Rules of Database Deployments.  Regardless of whether you decide to use MD3 or not, if you understand The Four Rules your database deployments will run much smoother.  

 
Some Background
In the last post we covered a scenario where we had to write really complex DDL if we wanted to deploy an index change properly.  We need an index, called MyIndex, created on dbo.Example with a fill_factor of 90, on column Foo, with included_column Bux.  Here is the DDL:
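A simplified sketch of what that hand-crafted DDL tends to look like.  It compares only the fill factor; a complete version would also have to compare the key columns and included columns through sys.index_columns, which is exactly where the complexity explodes.

```sql
IF NOT EXISTS (
    SELECT 1
    FROM sys.indexes
    WHERE object_id = OBJECT_ID(N'dbo.Example')
      AND name = N'MyIndex'
)
BEGIN
    -- The index is missing entirely: create it.
    CREATE NONCLUSTERED INDEX MyIndex
        ON dbo.Example (Foo)
        INCLUDE (Bux)
        WITH (FILLFACTOR = 90);
END
ELSE IF EXISTS (
    SELECT 1
    FROM sys.indexes
    WHERE object_id = OBJECT_ID(N'dbo.Example')
      AND name = N'MyIndex'
      AND fill_factor <> 90
)
BEGIN
    -- The index exists by name but with a different fill factor:
    -- rebuild it in place rather than DROP followed by CREATE.
    CREATE NONCLUSTERED INDEX MyIndex
        ON dbo.Example (Foo)
        INCLUDE (Bux)
        WITH (FILLFACTOR = 90, DROP_EXISTING = ON);
END;
-- Otherwise the index already matches (as far as this check goes): do nothing.
```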
 
DDL is kinda esoteric and difficult to remember, as seen above.  DML for master/seed/model/system data that you may need to deliver is similarly complex (we'll look at this problem in a separate post).  In MD3 DDL is much simpler.  We just "declare" the properties of our object and let MD3 determine what needs to be done to get us to "model".  Basically, we enter "metadata" about our object into an MD3 table (for instance MD3.Indexes), which looks very similar to the "sys" system tables (in this case sys.indexes).  Here's a screenshot that shows how to add an index to the HumanResources.Department table:   
 
 
Pretty simple.  We have a metadata table called MD3.Indexes that looks VERY similar to sys.indexes.  We "declare" our new index with its desired properties right into that table, then we run MD3.  Under the covers MD3 is making calls to MD3.CreateIndex (which you can call instead of populating the MD3.Indexes table) similar to the nearby graphic that is adding the index nciFoo to dbo.Example.  
 
Each parameter to MD3.CreateIndex (and each column in the MD3.Indexes table; see the screenshot above) is a metadata property that we can declare for an index.  These properties cover just about every option you can think of regarding an index.  There are some properties missing...for instance, there is no option for filtered indexes because I've never really found a good use case in my career where I absolutely needed a filtered index.  Since MD3 is open source you could always add this functionality if you needed it.  Here are some other index properties that are missing and why:

Index Property | Why is it missing?
WHERE <filter predicate> | Filtered indexes.  You can always add this yourself.
IGNORE_DUP_KEY | I've never found a good use for this option.
STATISTICS_NORECOMPUTE | I've never found a need for this.
DROP_EXISTING = ON|OFF | MD3.CreateIndex will smartly determine if this should be set to ON or OFF based on what is being asked.
ONLINE = ON|OFF | MD3.CreateIndex will determine whether this option should be used or not for you, based on whether the index follows the rules for ONLINE index creation.  Here is another case where MD3 is a huge help.

So the question is...how does MD3 actually build the DDL under the covers and what makes it so much better than hand-crafted DDL? 

The Four Rules of Stateful Object Deployment
 
If you always follow The Four Rules when you code your DDL you will always have a successful deployment (no failures!) and you will never do unnecessary DDL. MD3 does all of this for you.  Assume you have some DDL that you need to apply:  
  1. If the object exists by name and by "properties", then silently continue; there is no work to do.  In other words, there was no change in the object between deployments.  The "properties" are the parameters to the MD3.Create procedures (see the screenshot above).  
  2. If an object exists with the same "properties", but the name is different, then RENAME the object.  Example:  I have an index with the same column list, order, fill factor, on the same filegroup, with the same compression, etc...but it is named differently.  All MD3 does is run sp_rename.  This scenario occurs, for instance, when your DBA adds an emergency index to a table and you check in the same index structure, but you use your naming conventions.  
  3. If the object exists by name, but the "properties" are different, ALTER the object accordingly.  For example, you have an index with the same name and mostly the same properties, but you need to change the included column list and FILLFACTOR.  MD3 handles this without esoteric, hand-crafted DDL.  Usually an ALTER can bring the current object up to the desired properties; otherwise DROP/CREATE is used.  The most important thing to remember is that YOU don't have to code the DDL and worry about covering all possible scenarios.  
  4. Else, CREATE the object with the properties as defined.  At this point MD3 realizes this is a missing object and creates it according to your declarations.  
If you don't do The Four Rules correctly you risk, at a minimum, recreating expensive objects at every deployment.  You don't need MD3 to do a DDL deployment, but it makes it a lot easier and less error-prone.  
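To make the rules concrete, here is a rough hand-coded sketch of how The Four Rules could be applied to a single index.  This is not MD3's actual implementation.  It assumes a non-partitioned dbo.Example table with a Foo column, and a desired non-unique, nonclustered index nciFoo on Foo with a fill factor of 90 and no included columns.  The variable names are made up for illustration.

```sql
-- Sketch of The Four Rules for one index: nciFoo on dbo.Example (Foo), fill factor 90.
DECLARE @tbl   int = OBJECT_ID(N'dbo.Example');
DECLARE @match sysname;          -- an existing index that already has the declared properties
DECLARE @old   nvarchar(600);    -- three-part name handed to sp_rename

-- Find an index whose properties match the declaration, whatever it is called.
SELECT TOP (1) @match = i.name
FROM sys.indexes AS i
WHERE i.object_id = @tbl
  AND i.type = 2                 -- nonclustered
  AND i.is_unique = 0
  AND i.fill_factor = 90
  AND 1 = (SELECT COUNT(*)
           FROM sys.index_columns AS ic
           WHERE ic.object_id = i.object_id AND ic.index_id = i.index_id)
  AND EXISTS (SELECT 1
              FROM sys.index_columns AS ic
              JOIN sys.columns AS c
                ON c.object_id = ic.object_id AND c.column_id = ic.column_id
              WHERE ic.object_id = i.object_id AND ic.index_id = i.index_id
                AND ic.key_ordinal = 1 AND c.name = N'Foo')
ORDER BY CASE WHEN i.name = N'nciFoo' THEN 0 ELSE 1 END;   -- prefer an exact name match

IF @match = N'nciFoo'
    -- Rule 1: same name, same properties.
    PRINT 'Rule 1: nciFoo already matches, nothing to do.';
ELSE IF @match IS NOT NULL
     AND NOT EXISTS (SELECT 1 FROM sys.indexes WHERE object_id = @tbl AND name = N'nciFoo')
BEGIN
    -- Rule 2: same properties, different name.  Rename instead of rebuilding.
    SET @old = N'dbo.Example.' + QUOTENAME(@match);
    EXEC sp_rename @objname = @old, @newname = N'nciFoo', @objtype = N'INDEX';
END
ELSE IF EXISTS (SELECT 1 FROM sys.indexes WHERE object_id = @tbl AND name = N'nciFoo')
    -- Rule 3: same name, different properties.  Rebuild in place.
    CREATE NONCLUSTERED INDEX nciFoo ON dbo.Example (Foo)
        WITH (FILLFACTOR = 90, DROP_EXISTING = ON);
ELSE
    -- Rule 4: the index is missing, so create it.
    CREATE NONCLUSTERED INDEX nciFoo ON dbo.Example (Foo)
        WITH (FILLFACTOR = 90);
```

The script can be rerun safely: after the first successful run, every later run falls into Rule 1.  Notice how much code it takes for one simple index, and it still ignores filegroups, compression, and included columns.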
 
An Example
Let's take our example code again and determine what MD3 will do in various situations regarding The Four Rules.  The table below is kinda like the "unit tests" for MD3.CreateIndex (tSQLt unit tests available on request).  
 
Database State | Expected Result | Why? | Covers which of The Four Rules
dbo.Example table does not exist | FAILURE | This scenario will be caught during continuous integration testing and will never happen during a production deployment.  (See nodetitle) | None
index nciFoo does not exist | Index created | The index will be created with the declared properties. | 4
An index exists called dbaFoo with the same properties | Index renamed | The index will be renamed using sp_rename to nciFoo.  This ensures that a duplicate index is not created with the same properties. | 2
nciFoo exists, but with different "properties" than what is declared | Index altered | When possible the index will be ALTERed.  Otherwise the existing index will be dropped and the desired index will be created.  In some cases, depending on what is being changed, an index cannot be ALTERed. | 3
nciFoo exists with identical properties | Silently continue | There is no work to do. | 1
 
Summary
In this post we looked at The Four Rules of stateful database object deployments.  If you follow The Four Rules when you deploy your objects you will find that you have fewer errors and more reliable, repeatable deployments.  You don't have to use MD3 to use The Four Rules.  You can use any tool, even plain ol' DDL.  MD3 just makes everything a whole lot easier.  
 
In the next post, nodetitle, we'll cover how The Four Rules are applied to other stateful database objects.  In a later post we'll cover how MD3's model data patterns work with The Four Rules to ensure your model/system/lookup data deployments work just like your other stateful object deployments.  

MD3 and "state"

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last post (Why do we need another database deployment tool like MD3?) I covered some reasons why database deployments give DBAs and developers fits.  In a nutshell, it's because we write too much hand-crafted DDL.  Even when we use third-party GUI tools to manage our deployments we still hit edge cases where the migration isn't smooth.  So why are schema changes so difficult?  

Database Object State and Builds
 
Quick Links
nodetitle
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

When you "build" your Java app (...or C# or whatever...) you are really just compiling the code on your machine.  If compilation is successful then you can rest assured that it will compile on, basically, any machine (OS and processor architecture might be exceptions).  The deployment is simply a matter of copying your compiled code.  In the .NET world you usually build a Windows Installer package or perform an "xcopy deployment".  Build and deploy is really easy, as is building the necessary "CI loops" (...they run your unit tests...) to ensure your builds aren't broken when a developer checks in a change.  

 
Databases work differently.  Just because your scripts compile and deploy on your machine/database does not mean they will deploy anywhere else.
 
Why?
 
Java code, when it compiles, couldn't care less about the "state" of the previous build.  The code is “stateless”.  Some database code, on the other hand, is “stateless” while some is “stateful”.  Database developers need to worry about “history”, or "state", which is the existing data in the tables that must be maintained into the next release.  The code must do more than just compile, it must be respectful of existing data.  And data isn't the only stateful database object.  You don't want to recreate a bunch of indexes needlessly either.
 
Stateless Objects | Stateful Objects
Views | Tables/columns
Procedures | Indexes
Functions | Constraints (PKs/FKs/CHECK)
Model Data (sometimes called seed/lookup/master data).  An example is a list of states and abbreviations.  This data rarely changes and when it does it changes via an approved process (i.e., a Master Data Management update). | Model Data, if the data can be altered by the customer.  An example is order status codes.  If you allow your customer to change (or add to) their model data then you don't want to "reset" their model data with your updates...or...maybe you do depending on the scenario.
 
Stateless Object Patterns
Stateless db objects are the easiest to deploy...you simply compile them.  You do not care about what the previous version of the object looks like, you simply create the new version, overwriting the old.  Just like you would overwrite the last Java build with the current one.  Traditionally you code stateless objects using a pattern similar to the graphic to the right.  
 
Since the object, in this case a stored procedure, is stateless we can simply drop it and re-create it.  When doing that we lose our permissions so we must re-apply them.  It would be nice if we had the CREATE OR REPLACE pattern in Transact-SQL like Oracle does.  CREATE OR REPLACE is simplified syntax that obviates the need for re-applying permissions and checking if the object exists.  Oh well...maybe in the next release of SQL Server.  
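A minimal sketch of that drop, re-create, and re-grant pattern is below.  The procedure name dbo.GetExample and the role AppUser are placeholders I made up for illustration, and the script assumes the role already exists.

```sql
-- Stateless object pattern: drop if present, re-create, re-apply permissions.
IF OBJECT_ID(N'dbo.GetExample', N'P') IS NOT NULL
    DROP PROCEDURE dbo.GetExample;
GO

CREATE PROCEDURE dbo.GetExample
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Foo FROM dbo.Example;
END;
GO

-- The DROP discarded any permissions on the old procedure, so grant them again.
GRANT EXECUTE ON OBJECT::dbo.GetExample TO AppUser;
GO
```

On SQL Server 2016 SP1 and later, CREATE OR ALTER PROCEDURE covers much the same ground in a single statement and keeps the existing permissions.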
 
Generally you should deploy your stateless objects near the end of your deployment.  There are some exceptions to this rule:  
  1. You may have object dependencies which require certain stateless objects to be deployed in a specific order.  For instance, if viewA references functionB which in turn references viewC then you have to ensure those objects are applied in a certain order.  
  2. If you have SCHEMABINDING applied to any of your objects and you try to first manipulate a stateful object (such as a table) you'll need to drop the object with the SCHEMABINDING clause first.  

MD3 handles all of this for you and in the demo files I give examples of this and how it is very simple to handle these edge cases. 

Stateful Objects and Patterns
As mentioned before, Java has no "stateful" code...if the code compiles you are good...if it doesn't compile you have problems.  In the database world it is the stateful code that causes database developers fits.  
 
An example of "stateful" database code that can be maddening is index DDL.  Let's see why that is.  You can download the example code here.   We start by creating a very simple table with a few columns.  Most developers just create their tables and then either worry about the indexes later or let the DBAs worry about it when performance becomes a problem.  At a later time the developers realize that they need to add an index to the Foo column.  If you are using a tool like Visual Studio db projects then you can use very simple CREATE INDEX syntax.  Otherwise you need to do an "existence check" to ensure that a subsequent execution of your script won't generate an "object already exists" error.  
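Here is a minimal sketch of that starting point.  The column list and data types of dbo.Example are my assumptions for illustration; only the table name, the Foo column, and the index name nciFoo come from the example.

```sql
-- Assumed shape of the example table; column types are guesses.
IF OBJECT_ID(N'dbo.Example', N'U') IS NULL
    CREATE TABLE dbo.Example
    (
        Id  int         NOT NULL,
        Foo varchar(50) NOT NULL,
        Bar varchar(50) NULL,
        Bux varchar(50) NULL
    );
GO

-- "Existence check": only create the index if nothing with that name is there yet.
IF NOT EXISTS (SELECT 1
               FROM sys.indexes
               WHERE object_id = OBJECT_ID(N'dbo.Example')
                 AND name = N'nciFoo')
    CREATE NONCLUSTERED INDEX nciFoo ON dbo.Example (Foo);
GO
```

The check only looks at the name, so it can be rerun without error, but it says nothing about whether the existing nciFoo has the properties you want.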
 
 
The "existence check" syntax isn't terribly difficult to handle...yet.  Now let's say you notice a lot of page splitting and realize that your table design and data access patterns indicate that you need to change the fillfactor to 90.  Quick...do you remember where the fillfactor settings are stored in the metadata tables so you can modify the "existence check"?  You could just code up something quickly like this:  
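A quick-and-dirty version, sketched under the assumption that the index is still called nciFoo on dbo.Example (Foo), might be:

```sql
-- Quick and dirty: always drop the index if it exists, then rebuild it
-- with the new fill factor.
IF EXISTS (SELECT 1
           FROM sys.indexes
           WHERE object_id = OBJECT_ID(N'dbo.Example')
             AND name = N'nciFoo')
    DROP INDEX nciFoo ON dbo.Example;
GO

CREATE NONCLUSTERED INDEX nciFoo ON dbo.Example (Foo)
    WITH (FILLFACTOR = 90);
GO
```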
 
 
The problem is that you will drop and recreate the index at every deployment.  Again, a tool like VS db projects will obviate the need for this.  Better "existence checking" code would be something like this:    
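One way to sketch that better check is to compare against sys.indexes.fill_factor, so the index is only dropped when the stored fill factor differs from 90.  As before, nciFoo on dbo.Example (Foo) is assumed.

```sql
-- Drop the index only when it exists with the wrong fill factor.
IF EXISTS (SELECT 1
           FROM sys.indexes
           WHERE object_id = OBJECT_ID(N'dbo.Example')
             AND name = N'nciFoo'
             AND fill_factor <> 90)
    DROP INDEX nciFoo ON dbo.Example;
GO

-- Create it only when it is missing (either never built or just dropped).
IF NOT EXISTS (SELECT 1
               FROM sys.indexes
               WHERE object_id = OBJECT_ID(N'dbo.Example')
                 AND name = N'nciFoo')
    CREATE NONCLUSTERED INDEX nciFoo ON dbo.Example (Foo)
        WITH (FILLFACTOR = 90);
GO
```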
 
 
Our code is getting a little more "advanced" and much more "hand-crafted" than I am comfortable with.  
 
Here's a common scenario...once your index is released to production your DBAs notice that you have a bunch of Bookmark Lookups causing customers to report application slowness and they believe that they should add a couple of included columns to your index.  The DBAs are nice enough to tell you that they've done this and that you need to make this new index "model code".  You determine this is the new index creation code you need: 
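As a sketch, assume the DBAs added a single included column named Bux (the column name is my assumption; the scenario doesn't say which columns were added).  The existence check now has to look at sys.index_columns as well as sys.indexes:

```sql
-- Drop nciFoo if the fill factor is wrong OR the included column is missing.
IF EXISTS (
    SELECT 1
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID(N'dbo.Example')
      AND i.name = N'nciFoo'
      AND (   i.fill_factor <> 90
           OR NOT EXISTS (SELECT 1
                          FROM sys.index_columns AS ic
                          JOIN sys.columns AS c
                            ON c.object_id = ic.object_id
                           AND c.column_id = ic.column_id
                          WHERE ic.object_id = i.object_id
                            AND ic.index_id = i.index_id
                            AND ic.is_included_column = 1
                            AND c.name = N'Bux')))
    DROP INDEX nciFoo ON dbo.Example;
GO

-- Create it when it is missing (either never built or just dropped).
IF NOT EXISTS (SELECT 1
               FROM sys.indexes
               WHERE object_id = OBJECT_ID(N'dbo.Example')
                 AND name = N'nciFoo')
    CREATE NONCLUSTERED INDEX nciFoo ON dbo.Example (Foo)
        INCLUDE (Bux)
        WITH (FILLFACTOR = 90);
GO
```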
 
 
Yikes!  That's getting to be really complicated code.  
 
Here's a slightly different example.  Your DBA calls you and tells you that he saw a gigantic "blocking storm" and needed to add an index on the Bar column.  You agree with his analysis and you both decide to make this a model index. The first block of code is what the DBA added (note the index name doesn't follow your naming conventions) and the second block is the code you made "model".  Do you see the problem?  
 
 
You really should've renamed his index, if his index exists.  Now you risk a duplicate index during a deployment.  
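A hand-coded fix could look something like the sketch below.  The names dbaBarBlockingFix and nciBar are hypothetical; I'm assuming the DBA's index is a plain nonclustered index on Bar and that nciBar is the name your conventions call for.

```sql
-- If the DBA's emergency index is there and the model index is not,
-- rename it rather than building a second, identical index.
IF EXISTS (SELECT 1
           FROM sys.indexes
           WHERE object_id = OBJECT_ID(N'dbo.Example')
             AND name = N'dbaBarBlockingFix')
   AND NOT EXISTS (SELECT 1
                   FROM sys.indexes
                   WHERE object_id = OBJECT_ID(N'dbo.Example')
                     AND name = N'nciBar')
    EXEC sp_rename N'dbo.Example.dbaBarBlockingFix', N'nciBar', N'INDEX';
GO

-- Databases that never got the emergency index still need it created.
IF NOT EXISTS (SELECT 1
               FROM sys.indexes
               WHERE object_id = OBJECT_ID(N'dbo.Example')
                 AND name = N'nciBar')
    CREATE NONCLUSTERED INDEX nciBar ON dbo.Example (Bar);
GO
```

This only works because we happen to know the name the DBA used.  Matching on the index's properties instead of its name is the more general answer.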
 
A Better Way to Handle Stateful Objects
 
DDL is kinda esoteric and difficult to remember, as seen above.  DML for master/seed/model/system data that you may need to deliver is similarly complex (we'll look at this problem later).  In MD3 DDL is much simpler.  Wouldn't it be nice if you could just enter the "properties" of the desired object in a metadata table and SQL Server just did what it needed to do to make that object for you?  MD3 does that for you.  Here's a screenshot that shows how to add an index to the HumanResources.Department table:   
 
 
Pretty simple.  We have a metadata table called MD3.Indexes that looks VERY similar to sys.indexes.  We "declare" our new index with its desired properties right into that table, then we run MD3.  
 

Now let's say you want to alter that index to change your compression settings from NONE to ROW...just update the entry in MD3.Indexes and rerun MD3. Need to add an Included column?  Change the NULL above to the name of the column you want to include and rerun MD3.  Maybe the index is no longer needed...just change the DropIndex column to 1.  
Under-the-covers MD3 is simply cursor-ing over the metadata table (MD3.Indexes) and determining if there is any work to do (more on how this is done in the next section).  It does this by calling the MD3 stored procedures that do the actual work...for instance, EXEC MD3.CreateIndex.  You can call these procedures and use them yourself without using the MD3.Indexes metadata table.  Here's an example call that creates nciFoo.  Let's say you decide to change this index to have a FillFactor of 90...how do you think you would do that?  That's right, change the parameter and run the procedure.  Want to change the filegroup (or partition)...take a guess at the process?  No complex hand-crafted DDL is required.  
 
So the question is...how does MD3 actually build the DDL under the covers and what makes it so much better than hand-crafted DDL?  MD3 uses The Four Rules.  We'll cover nodetitle in the next post.

Why do we need another database deployment tool like MD3?

This is the next post in my MD3 series (Metadata Driven Database Deployment).  MD3 is my build-and-deploy system that I've evolved over the past 15 years. Quick history...I worked at an ISV and we supported 4 versions of our software (Version X back to Version X-3) and we needed to support upgrading any of those versions to "current". We had about 40 customers with the average db size ~40GB. To make matters even more complicated we supported either SQL Server or Oracle.  Both had the same schema...mostly.


So we needed a tool to upgrade a database reliably without having non-database developers writing complex, error-prone, hand-crafted DDL. Most of our data developers knew Oracle or SQL Server, but few knew both.  I learned a lot about what works and what doesn't when upgrading a database in a complex data environment like this. I created a database CI loop (also included with MD3) before CI loops were sexy and ubiquitous. Every day we tested our db scripts against real customer databases and ensured everything upgraded and functioned properly. We never worried about the actual customer production upgrade event because our CI loops already upgraded the customer hundreds of times (albeit on older, stripped, and scrubbed databases).  

I've deployed versions of this tool at many different employers and clients. Each has been radically successful.  

  • The developers love how easy it is to make schema changes without hand-crafting DDL.  It is totally flexible and customizable to whatever you need.
  • DBAs and app admins love it because it is scriptable, has customizable logging, and shows the SQL that is being executed.  Nothing is hidden from the DBA.   
  • Management likes the fact that it requires zero training and creates a repeatable, reliable process.  

The latest incarnation, which I haven't yet open-sourced, is a version that will do a near Zero Downtime database upgrade. Contact me if you might be interested in this.  

Why do we need another database deployment tool?

There are lots of database deployment tools out there.  They each work great...mostly.  But development teams and DBAs don't trust them when it comes to complex schema changes.  Don't believe me?  Assume the following scenario...you need to store new data in your database due to a change in business requirements.  There are lots of ways you could do this but you honestly believe the best way is to refactor and split the largest table in your database into 3 new tables.  
 
How likely are you to propose this solution to your team?  My guess is not very likely.  The reasons are simple: 
  • Modifying a table's schema requires you to "port" the old database structure to the new one.  And also the data.  This is fraught with danger.  Perhaps each version of the table has also experienced schema changes which you also need to take into consideration.  You have to then change keys (both primary and foreign) and clustering as well as all of the non-clustered indexes.  What if each customer/version has a different set of indexes and keys?  
  • Then you need to modify each database object that accesses the table.  All of your stored procs, functions, and views.  Then you need to modify your Java (or C# or whatever).  All of this scares the average database developer.  
  • Since this is a large table how can we guarantee this won't cause our customer excessive downtime while we migrate the data?  How do we verify the customers' data afterwards?  
NoSQL:  "RDBMS schemas are too hard to change"
 
Altering schemas scares developers so much that an entire cottage industry grew around the fact that relational databases are too darn hard to "evolve" with ever-changing business rules.  One of the primary reasons the NoSQL movement (whenever I hear this I think, "technology or bowel?") grew, especially document and key-value stores, is that they touted the fact that their technologies allowed you to modify your schemas without needing to worry about modifying all of that data access code that scares us all.  And there was no need to "port" the data from old-to-new schema.  
 
But relational schemas really aren't that difficult to evolve.  You really don't need to write a bunch of complex hand-crafted DDL.  And you don't need a third-party tool either where you hit edge cases where your data won't migrate to the new schema properly.  So why are schema changes so difficult?  
 
Just Enough to Whet Your Appetite
MD3 deployments are "declarative", meaning you simply declare what you want your object to look like and MD3 figures out the best DDL commands to run (and you can review them) to get your schema to that destination.  Here is a simple example.  Assume we have a table named dbo.Example and we decide a non-clustered index on Foo would be beneficial to performance.  See the nearby graphic to see how easy this is with MD3.  (Actually, the command is even easier than that, and I'll show you that in the next post.)  But MD3 gets even better.  Let's say a DBA already noticed a performance problem and indexed Foo already but named the index dbaFooPerfProblem.  In that case the index will be renamed.  You won't get a duplicate index.  Let's say later you decide to make nciFoo into a compound index with Foo,Bar as the keys.  Simply changing Line 21 to 'Foo,Bar' will automatically change the existing nciFoo, if it exists, or build a new nciFoo with the declared properties, if it doesn't exist.  
 
In the next post I'll cover MD3 and "state", which is the key to a good database deployment routine.  
 


Introduction to Metadata Driven Database Deployments

MD3 (Metadata Driven Database Deployment) is my build-and-deploy system that I've evolved over the past 15 years. Deploying database code tends to be challenging for most organizations, especially ISVs that may support multiple versions of their software at many customers.  MD3 handles this all very elegantly.  Over the next few posts I'll cover why I think MD3 is the best choice for your database deployments.  

Table of Contents

Page | Summary
Introduction to Metadata Driven Database Deployments | This page.
Why do we need another database deployment tool like MD3? | Covers some scenarios where traditional database deployment tools fail and how MD3 makes hand-crafted DDL a thing of the past.
MD3 and "state" | In this post we cover stateful vs stateless database objects.  It is the stateful objects (those with "history" that we need to consider) that cause our database deployments grief.  We'll cover how MD3 handles this problem.
MD3 and The Four Rules | The Four Rules will ensure you always deploy your stateful database objects correctly.  You don't need MD3 to use The Four Rules, but it helps.
The Other MD3.Create Procedures | Covers the other "stateful" database objects and how MD3 handles their deployment.
MD3.CreatePrimaryKey in Action | Shows some use cases with primary keys where MD3 can really save you a lot of time and rework.  We also further demonstrate how The Four Rules work.
MD3 Script Idempotency and MD3.CreateColumn | We cover the last MD3.Create procedure in this post and cover the one aspect of MD3 scripts that is mandatory...scripts must be idempotent, which is just a fancy way of saying that all scripts must be rerunnable.
The OO Design Principles of MD3 | There are certain object-oriented design principles I incorporated into MD3.  These principles are what sets MD3 apart from other database deployment tools.
nodetitle | Model data is the data, usually static, that you deliver with your application.  MD3 deploys this data in much the same way as other stateful objects are deployed.  We cover how this is done in this post.
RunMD3.ps1: The MD3 Executor | RunMD3.ps1 is a short PowerShell script that simply executes each .sql file in each subfolder in your deployment folder.  In this post I'll cover how it does what it does, and why.
MD3 Deployments in Action | In this post we put everything together and actually show how deployments work with MD3.
nodetitle | The hardest part of any new deployment tool is getting your database migrated into the new structure.  This is actually easy with MD3.  I'll demo how to reverse-engineer AdventureWorks in the MD3 format, which takes about a couple of hours to do.
nodetitle | CI Loops are a definite development best practice.  Yet few shops do this with their database code.  And if they do CI Loops they may not be doing them correctly.  In this post I'll share some secrets to doing really good CI testing with your database code and then I'll show you how to make it really great with MD3.
MD3: Why you should care about declarative database deployments | If you still don't see the value of declarative/properties/metadata driven database deployments I'll give you some additional arguments in this post.
nodetitle | I've deployed MD3 at numerous employers and clients and it has always been a success.  In this post I'll recount some stories about how I saved tons of man-hours refactoring database schemas using MD3.
What are the benefits of using MD3 over my existing solution? | You don't have to use the entirety of MD3 to get some of its benefits.  You can pull out just the pieces that you feel may help you.  This is a summary post that reviews all of the benefits of the tool.

 


Presentation on Metadata Driven Database Deployments is tonight

As a reminder, I am giving a presentation on metadata driven database deployments using my tool at 5:30 tonight at Microsoft's Malvern office.  The presentation and source code can be downloaded from CodePlex.  

See you there.

--dave

Presenting at the December meeting of PSSUG

I am presenting at the December meeting of PSSUG.  The topic is "Metadata Driven Database Deployments."  I have a mechanism I have been using for about 13 years where we can deploy a new database or upgrade an existing database in a fast, reliable, repeatable manner.  And the best part is, you don't need to know esoteric DDL commands.  Certainly you can use things like "scripting wizards" and "SQL compare" tools to do this but I'll show you some things a custom DDL deployer can handle that other tools cannot.   

Feel free to come out and learn about my deployment techniques.  Or just come out to heckle me.
