DaveWentzel.com            All Things Data

Dave Wentzel's blog

How to Reverse-Engineer Your Database with MD3

Metadata Driven Database Deployments (MD3) is my build-and-deploy system for SQL Server databases.  In the last couple of posts we covered what problems MD3 solves and how it solves them.  In this post in my MD3 blog series I'm going to cover how to reverse-engineer your database using MD3's native capabilities.  

One of the biggest hurdles when migrating to a new build/deploy system, VCS, or data development tool is migrating all of your existing data and code into the new system.  This is actually very simple to do in MD3.  But first remember that MD3 is modular...if you don't want to use the MD3.Create* procedures then you don't need to run the migration scripts for those objects.  You only need to reverse-engineer those objects that you want to use with MD3.  The full reverse-engineering documentation is 32 lines of instructions...that's how easy MD3 is to get up and running.  Here's the process:  

  1. Find a pristine copy of your database to reverse-engineer.  If you work in a shop with one production database then this is simple...script that database.  But if you are an ISV (which MD3 is geared towards) then finding a pristine copy of a database to reverse-engineer can be difficult.  Especially if your database was not properly versioned in the past.  I can help you with this process if you contact me.  
  2. Deploy the MD3 objects into a copy of that database.  Basically just run MD3.ps1 with a clean copy of MD3.  This will deploy everything you need and will not introduce any risk into your existing database.  
  3. Run the MD3.Repopulate* procedures for each object type you want to migrate into MD3.  
  4. Run the MD3.Script* procedures which actually generate the .sql files with the MD3 calls using simple PRINT statements.  
  5. Check your code into source control and begin working on your CI Loops.  
  6. If something isn't working correctly you can simply re-script those items as many times as you need to, or you can manually update the metadata yourself.  

It's that simple.  

Let's look at what the Repopulate and Script procedures are doing under-the-covers.  

Deep Dive - MD3.Repopulate Procedures 

As mentioned in other posts in this series, stateful object properties are stored in tables that look very much like the underlying catalog views where SQL Server maintains its own running metadata (for instance, sys.indexes).  The MD3.Repopulate procedures rebuild MD3's version of these metadata tables.  Here's the code that handles index repopulation:

The bulk of the code starts on Line 25 and merely repopulates MD3.Indexes with metadata from your database.  Notice how succinct the code is...database deployments are simple when you think through the actual requirements and understand The Four Rules.  

Line 21 might be confusing...we blow away what was last scripted...except cases where DropIndex was set.  Why do we do maintain DropIndex data? Remember that indexes are stateful data and we must maintain their history.  MD3 will script our current database, but it knows nothing about the history that existed in prior versions of your database.  So, for instance, perhaps you had some old indexes in previous releases that you determined served no real purpose (ie, all activity was writes with no reads) and you dropped them.  Now you script your current database into MD3 but you want to use MD3 to drop those old indexes from older customer databases.  By manually adding the DropIndex metadata (just the index name) you can be sure that any databases that still have those indexes will have them properly dropped.  

Essentially you are telling MD3 what the history of your database really was.  No database reverse-engineering tool on the market can handle recreating history for dropped objects, and neither can MD3.  You must supply this history, if you need it.  

By using this method you can re-script your database numerous times while you test your MD3 migration with CI Build Loops without interrupting your current developers who are still actively using your previous deployment method.  

All of the MD3.Repopulate procedures work similarly to MD3.RepopulateIndexes.  Here is an example demonstrating how CHECK CONSTRAINT repopulation works:  

 

Deep Dive - MD3.Script Procedures 

Once you have the metadata scripted to the MD3 metadata tables you need a method to get that metadata into .sql files so you can properly place it under source control.  To do this we have stored procedures that use PRINT statements to "generate SQL from SQL".  You simply save the output to your indexes folder.  The output is a MD3-formatted script that follows the ModelData Pattern discussed earlier.  The output will look something like this (this is from Adventureworks):  

The screenshot shows the last index in AdventureWorks that was scripted (Line 110).  Note that after we populate MD3.Indexes with simple INSERT statements during a deployment we immediately apply the indexes on Line 113 by calling MD3.RunIndexes, which does all of the heavy lifting.  Very simple. 

Putting It All Together...Reverse-Engineering AdventureWorks

I have included a copy of the reverse-engineered AdventureWorks2012 SQL scripts in MD3 format with the source code (click here for the code).  The scripts include all necessary code to reproduce a "net new" (shell database without the data) copy of AdventureWorks2012.  And this only takes about 10 minutes to reverse-engineer on your own.  


You have just read "How to Reverse-Engineer Your Database with MD3" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 Deployments in Action

Over the last few posts we've covered how MD3 handles stateful database object deployments.  We've covered all of the MD3.Create procedures and how they help you avoid writing error-prone, hand-crafted DDL.  We then looked at RunMD3.ps1 which is the MD3 executor script.  In this post we are going to look at various outputs you can expect from RunMD3.ps1.  

Why should we build and deploy EVERY database script EVERY time, even if the object hasn't changed?  

In MD3 Script Idempotency and MD3.CreateColumn we looked at why all sql scripts in MD3 must be "rerunnable", meaning that a script cannot have "side-effects" if it is subsequently executed multiple times.  With MD3 every SQL script is run during every deployment.  Folks that are new to MD3 question this approach.  

Quick Links
nodetitle
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

Just suspend disbelief and you'll see the benefits of doing this.  

  • It’s the safest thing to do…you always deploy what you tested and what you expect.  You never know who altered a stored proc on a database without going through the proper build/deploy mechanism.  Simply redeploying all objects ensures that your database is exactly how you want it.  Developers and QA folks can manipulate their environments and can later "reset" them by simply getting the latest code and deploying it.  
  • It only takes a few milliseconds to check metadata “properties” for an object and determine there is no net change. Even with thousands of database objects that haven't changed in years you will only spend at most a few minutes checking the metadata.  Trust me.  I've run MD3 scripts against HUGE databases where no objects have changed since the last deployment...this still only takes a few minutes.  
  • We are not locked into build-by-build deployments where we run only the scripts that we think have changed. We can upgrade any database to “current” without having to worry about the starting point.  If you have customers who are behind by several releases you can rest assured that they will upgrade and have identical schemas to your newest customer.  
  • Frankly, this is how every other language does build/deploy.  This is an industry standard best practice that encourages continuous integration.  SQL should be no different.  MD3 enables this.  

Running RunMD3.ps1

There are lots of ways to run PoSh scripts.  I like to open a command prompt and start powershell.  You then cd to the root directory where your MD3 deployment scripts reside.  Then you simply call the deployment executor using PoSh dot notation, like this:  .\RunMD3.ps1.  There are two parameters -ServerName and -DatabaseName.  

Prerequisites

  • You can run the PoSh script from any machine with the sql tools installed.  There are no special powershell requirements.  
  • If this is the first time you are running a PoSh script on the given machine then you must run set-ExecutionPolicy Unrestricted first.  This allows you to run PoSh scripts.  This isn't my requirement, rather a requirement of powershell itself.  
  • The specified database must exist.  This is the default behavior.  You can of course issue the command to create your database first, if it doesn't exist, by placing the relevant SQL in 00Pre-install\00CreateDatabase.sql.  In fact, this is a good idea if you are going to create CI Build loops (see Continuous Integration Testing with MD3).  But by default, for safety reasons, MD3 requires the db to exist.  If it doesn't you'll see an error similar to this:  

Errors

Whenever an error is thrown with a Severity >= 16 the deployment will stop and an error message will be displayed.  This works EXACTLY like SSMS.  

In this screenshot we can see that the failure occured in 3Tables\HumanResources.Employee.sql because the dbo.Flag user-defined data type does not exist.  By using start-transcript, or by piping RunMD3.ps1 output, you can capture the execution log seen above to a text file as well.  Execution logs are also saved to a table named MD3.Log.  This table is truncated at the start of every deployment, so if you wish to archive the data from every deployment you must either alter the MD3 system or squirrel the table away to another location.  

The following graphic shows the contents of MD3.Log after an error was thrown:  

In this case we can see the error that was thrown.  The constraint could not be created.  The actual SQL statement that errored, that was generated by the MD3 system, based on the properties passed to MD3.CreatePrimaryKey (in the example), is always logged just before the error.  This will help you determine what went wrong.  These errors are extremely rare and should always be caught during continuous integration testing.   

Here is what you can expect to see when MD3 succeeds:  

Warnings

Any WARNINGs will be displayed just before the success message, or can be queried from MD3.Log by looking at WHERE Severity = 'W'.  Out-of-the-box the only WARNINGs MD3 will throw are untrusted constraint violations (if you allow them).  You can code your own by tactically adding calls to MD3.InsertLogMessage with the @Severity = 'W' option.  

Summary

In this post we showed how to run the MD3 executor PoSh script, RunMD3.ps1.  This utility executes any .sql file it finds in its folder or any subfolder.  If a .sql script throws any errors those are bubbled up and logged to MD3.Log and are shown to the user running the PoSh script.  In the next post we'll look at nodetitle.  


You have just read "MD3 Deployments in Action" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

RunMD3.ps1: The MD3 Executor

Metadata Driven Database Deployments (MD3) is my build-and-deploy system for SQL Server databases.  In the last couple of posts of this blog series I covered what MD3 and why you should care about metadata-driven, declarative database deployments.  In this post I'm going to cover how MD3, by default, executes its scripts.

RunMD3.ps1

First, there is no requirement to use the script executor, called RunMD3.ps1.  You can always use your own sql script deployment tool, such as Visual Studio db projects.  Or you can write your own.  I've written script deployment tools using batch files, osql, and vbscript.  Do whatever works for you.  For the remainder of this post I'll assume you are using RunMD3.ps1. Before we cover what this script does let's cover how we should deploy sql objects.  

Database Deployment Best Practices

Here are some best practices for any database deployment tool:  

  1. deploy ALL objects with EVERY deployment.  This aids in CI Loops (nodetitle) where we should compare upgraded databases to "net new" (shell) databases.  
  2. Need to order the object deployments such that dependency errors are not thrown.  For instance, we should apply tables before their indexes.  
  3. A "status" should be available that tells the release engineer/DBA exactly what is currently being deployed.  This helps if a given script, for some reason, decides to execute for longer than it should.  
  4. Errors should be easy to recognize and should be logged.  The errors should not be caught by the tool and obfuscated into the tool's error message.  The actual SQL error message should be returned.  

How MD3 handles these requirements

You can of course customize this process to suit YOUR needs.  

  • RunMD3.ps1 only executes .sql files.  This means you can intersperse unit test files (such as tsqlt scripts) or documentation files with your .sql files.  Simply give them a different file extension.  
  • Files and folders are run alphabetically.  This means that you can simply look at your sql script folder structure and determine the order that scripts will execute.  See the folder structure I use in the nearby graphic.  Note that we deploy the MD3 system first (1MD3Objects) before we build any "user" objects like tables (3Tables).
    • Sometimes (very rarely) you'll get dependency errors and this execution method won't work.  For instance, you may have a view (4Views folder) called Bar (deployed in 4Views\Bar.sql) that calls another view called Foo (in 4Views\Foo.sql).  This will fail during your CI testing.  No problem...two possible solutions:  
      • rename Bar.sql to 00Bar.sql and create a new Bar.sql with a comment that states the file has moved due to an ordering dependency.  
      • use subfolders
  • Within a given folder, the .sql scripts are run PRIOR to any subfolders.  For instance, in the 1MD3Objects folder you'll see the objects and subfolders in the nearby graphic.  
    • In this case CreateMD3Schema.sql will run BEFORE any .sql files that are in the Functions folder.  All MD3 objects are deployed to the MD3 schema, hence why CreateMD3Schema.sql is run first.  Otherwise, you'll get a dependency error (mentioned above).  
    • This convention allows you to structure your folders into "software modules" folders.  For instance, you may have thousands of stored procedures under 7StoredProcedures.  You may want to organize those into subfolders like Accounting, HR, ETL, etc.  
  • You can use whatever folder structure you like, and nest as deep as you like.  
  • If you wish to skip a folder from execution, preface the folder with "_".  For instance, notice the _Documentation in the screenshot above.  That folder contains example .sql files.  Those will NOT be executed.  
  • If you wish to skip a file from execution either "zero out" (blank) the file or rename it to something other than .sql.
  • Any valid TSQL will be executed. You don't need any special formatting or syntax.  

If you use MD3's suggested folder hierarchy then ordering dependencies are minimized.  

"Special" Folders and Files

These items break with the "convention over configuration" paradigm that I've designed with MD3 (discussed in The OO Design Principles of MD3).  

  • PostApply folder:  anything placed in this folder will run at the end of an MD3 deployment.  This is useful for backup scripts, notification scripts, scripts to restart replication, SQLAgent jobs, etc.  
  • .\00Pre-install\00CreateDatabase.sql ALWAYS runs FIRST.  This script will CREATE DATABASE if needed.  This is useful for CI Build Loops (discussed in the post Continuous Integration Testing with MD3).  You can also put backups scripts here or anything else that needs to run at the start of a deployment.  

Errors, Logging, and Rerunnability

Since every script is (re-)run with each deployment you have to write your code to ensure it is idempotent/rerunnable.   (MD3 Script Idempotency).  This is easy to test with proper MD3 Continuous Integration Tests.

When an ERROR occurs (Severity >= 11) in one of your scripts then MD3 aborts and displays and logs an error. Simply fix the error and redeploy.  No need to restore the database first due to a failed deployment.  

Back to RunMD3.ps1

Now that I've covered what RunMD3.ps1 does and why, let's cover how it does it.  At its core it is a little Powershell script that recursively calls all folders and sql scripts starting in the root folder from where it is called.  It is not written in something like Python or Perl which many database professionals may not understand. Other than this little script, MD3 is written entirely in TSQL.  (I have an Oracle version as well, also entirely written in PL/SQL.  Contact me for details).  

It requires 2 parameters, the $ServerName and $DatabaseName of the database to be installed/upgraded.   The source code is available here.  

It uses NT authentication.  It can be run remotely, but the machine executing RunMD3.ps1 must have the sql client installed.  The entire script is about 200 lines of Powershell.  The bulk of the code is:  

Line 181 runs  .\00Pre-install\00CreateDatabase.sql.  This is discussed above.  If you don't need this functionality simply remove this line of code.  However, this code really helps with CI Testing by building your shell database.  Line 186 calls each .sql script in every subfolder starting with the folder where RunMD3.ps1 lives.  

That's it.  

In the next post we'll take a look at MD3 Deployments in Action.  


You have just read "MD3.ps1: The MD3 Executor" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 Model Data Patterns

Metadata Driven Database Deployments (MD3) is my build-and-deploy system for SQL Server database applications.  In the last couple of posts we covered what problems MD3 solves and how it solves them, specifically involving stateful database objects.  In this post I'm going to cover how MD3 handles deploying "model data".  I'm not aware of ANY database deployment tool by ANY vendor that has the ability to deploy model data in a "stateful" way.  In fact, I have a few MD3 users who started their journey to properties-based, metadata-driven evolutionary databases by first implementing MD3's model data deployers.  

What is Model Data?

You may be familiar with model data by a different name.  I've had clients that have used these terms as well:  

Quick Links
nodetitle
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)
  • lookup data
  • allowable values
  • system data
  • seed data
  • master data

I'll use the term "model data" for the remainder of this post, but the terms are interchangeable. A quick example of model data is any data that YOU deliver with your database.  For instance, you may have a table named dbo.StateProvince that holds a row for each state or province in the United States.  This data rarely, if ever, changs.  

Model data may:  

  • never change
  • rarely change
  • only be modifiable by the customer
  • only be modifiable by YOU (the vendor)
  • be modified by either YOU or the customer.  

Classes of Model Data

Not all model data is created equal.  There are classes of model data based on who may modify the data.  I've invented these terms, they aren't industry standards, but I think by describing model data in terms of its "type" it is easier to understand its requirements for deployment.  

Type Model Data Type Description Example Vendor-modifiable Customer-modifiable MD3 implementation
Type1 Vendor-supplied model data model data may not be changed by the customer.  It's likely this data rarely changes, but if it does it is the vendor's responsibility to migrate the database "state".   dbo.StateProvince data is not allowed to be changed by the customer. x   fully supported
Type2 customer-only modifiable model data this model data is "seeded" by the vendor but after that is modifiable only by the customer dbo.StateProvince data initially delivered to the customer, but after that the customer maintains the data   x fully supported
Type3 fully-modifiable model data data is "seeded" by the vendor and can be modified later by either party dbo.CustomerType is seeded with "Prospect", "Active", "Archived" by the vendor.  Customer may add or change these, and based on the business requirements the customer changes are either honored or discarded by later vendor data changes x x MD3 provides simple templates but the implementation is based on YOUR business requirements

Often your model data is not as black-and-white as the types I've listed above.  It is common for some metadata to be a combination of types 2 and 3.  That's OK, MD3 can still handle it.  

I've implemented MD3 model data patterns for several different ISVs (Independent Software Vendors).  In every case determining the model data type for each piece of model data was the most difficult part of the migration.  Software vendors just don't understand their data I guess.   Don't be surprised when model data becomes the Achilles' Heal of your MD3 migration.  But don't let that scare you.  In the process of getting your model data correct you'll learn A LOT about your database and application.  

Scripting your Model Data

Let's look at how MD3 handles model data.  We'll use Sales.Currency as our example model data.  Out-of-the-box this table contains 105 currency rows.  

Let's start by scripting this model data out using MD3's scripting tools.  We'll cover how to reverse-engineer your entire database in the post nodetitle, but for now let's start our look at model data by reverse-engineering Sales.Currency.  

Scripting model data is as simple as running a stored procedure, MD3.ScriptModelData  (code).  Here is the list of all scripting options.  

This procedure simply generates output that can be saved to a .sql file that will contain all of the commands necessary to generate a model data script for the given table that you can then modify as needed for the various "Types" of model data (discussed above).  There are lots of options you can call this procedure with that will alter the output to fit your needs.  You can regenerate your model data as many times as you like until you get it in the correct format to meet your requirements.  Here is the basic command to script Sales.Currency.  Before you execute the script press Ctl + T to ensure your output is in text mode.  

Simply save the output wherever you want to store your model data, preferably under version control.  If you follow the out-of-the-box MD3 patterns then this would be saved to 8ModelData\Sales.Currency_modeldata.sql.  

The Model Data Pattern

Now that the data is scripted, let's look at the model data pattern in detail. The graphic above shows the start of every model data script.  The first thing we do is temporarily disable all FK constraints on the tables associated with this model data table.  We do this so we can load and manipulate the model data without having to execute the scripts in dependency order.  Next we build a temp table that is schematically identical with the underlying table.  The WHERE 1 = 0 clause quickly gives us the structure without inserting any data.  Next we populate the temp table using the data in the table as it existed when it was scripted.  Every row will be scripted.  

The graphic below shows the code after we build the temp table with the scripted model data.  Note that there is an "insert section", an "update section", and a "delete section".  These sections simply compare what is in the temp table vs what is in the actual table.  If a row is found in the temp table that is not in the actual table, by key, then that row is a new row and is inserted.  If a row is in the actual table, by key, and is not in the temp table, then that row is removed from the actual table.  

The "update section" does a binary_checksum against non-key columns then compares that using the key and updates the actual table wherever the binary_checksum differs.  

Finally we drop the table and print a message that we should re-enable all FK constraints.  This is a message that is helpful to the developer.  If using MD3's executor then the FKs are automatically re-enabled and any error messages (DRI errors) will be thrown as normal.  

This is the basic "Type1" model data pattern that MD3 provides out-of-the-box.  

Model Data Use Cases

Assume you have a requirement to rename ZWD's description from "Zimbabwe Dollar" to "Zimbabwe DollarBill".  All you need to do is change the model data entry as shown below.  

You can see the output.  No rows inserted, none deleted, and one row updated.  

Suppose the requirement is to add a new currency called ZZZ FooBar Dollar.  Here is the change:  

Note that when executed we have one new row inserted this time, with no updates and no deletes.  Just what we expected.  

Finally, if you wanted to delete the metadata for ZAR Rand you could either remove that row or simply comment it so you had a "history" of that data element for future reference:  

Type2 and Type3 Model Data

These examples assumed you only needed to handle Type1 (vendor-only modifiable) model data.  MD3.ScriptModelData only scripts Type1 model data. The reason is simple...Type2 and Type3 model data are highly customized and are specific to your given use case.  You need to handle these cases by altering the output from the reverse-generated script.    How can we handle Type2 and Type3 model data?

Type2 model data is quite simple.  In this case the vendor initially supplies "seed data" that is never subsequently modified by the vendor.  We simply take the scripted model data and remove the "update" section.  

Type3 model data is totally dependent on your exact requirements.  I can give you ideas on what needs to change in the reverse-engineered data, but you can probably figure out what you need to do yourself if you fully understand your model data.  I can't stress that enough...you need to understand your data.  After that you simply modify the model data pattern to fit your needs.  

Summary

In this post we covered model data, or data that is delivered with your database application.  I (and MD3) view model data as just an extension of your database schema.  Model data needs to be "versioned" just like your schema.  It is a "stateful" database object, just like a table is.  When we modify model data we need to take care to ensure we've handled all of the historical data that may reference that model data.  The model data patterns in MD3 can be used without implementing any other MD3 features.  Implementing model data in a repeatable manner is by far the most difficult aspect of deploying MD3.  This is because most developers truly do not understand the model data in their applications.  Can the data be altered later?  Can it be altered by a customer?  How do we handle model data that was altered by the customer?  These are difficult questions when you don't understand your data.  Thankfully, once you understand your data it is very easy to script it into the MD3 model data pattern.  

 


You have just read "nodetitle" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

The OO Design Principles of MD3

Over the last few posts we've covered how MD3 handles stateful database object deployments.  You may still be wondering why you should care about deploying your database code using metadata instead of hand-crafted DDL.  In this post I'm going to take a detour from the technical aspects of MD3 and instead cover the design principles behind MD3 that are not found in (m)any other database deployment tools.  Even if you decide MD3 isn't for you you may learn a few things that help you become a better data developer.  

I designed MD3 to follow standard Object-oriented design principles...using TSQL.  I haven't always succeeded, but the OO principles in MD3 make it so much more robust and reliable than standard DDL.  Let's cover a few of these principles, starting with the most important first (IMHO).  

"Tell, Don't Ask"  

...is a object-oriented programming principle that helps programmers remember that they should strive to tell their functions what to do rather than ask them what their state is, then ask them to perform some action based on that.  We need more of this in SQL and DDL.  Procedural code, such as DDL, does far too much "asking" for information and then making decisions based on the response.  It would be nice if us database developers could just tell our RDBMS what we wanted and the RDBMS would just "do the needful" and make it so.  

Here's an example of some code that probably looks familiar to you...it builds an index if it doesn't exist:  

This is an example of "Ask, then Tell" which is terrible.  As data developers we write code like this because we need to ensure the index doesn't already exist before we create it.  This is a code pattern that you have to remember and may only use a few times a year.  Variants of this code will be scattered in tons of index creation scripts.  All we really should need to do is "tell" our RDBMS what the index should look like and the RDBMS should be able to figure out how to get us that index in the most efficient manner possible.  I call this "declarative DDL".  This is what MD3 does for you.  This graphic shows how MD3 handles index creation declaratively. Very simple...we tell MD3 what our index should look like and it ensures that after EVERY deployment we get an index called nciFoo with the declared properties.  It doesn't matter if this is a "net new" database, a database updated from a 5 year old version of our application, or a database upgraded from yesterday's build.     

DRY Principle

Index creation DDL with an "if exists check" also violates the DRY Principle (Don't Repeat Yourself).  As mentioned before, you may have variants of that code in lots of sql scripts.  Let's suppose you need to modify all of those scripts because you decide to implement ROW compression?  You may need to change HUNDREDS of scripts.  If you use MD3 you simply modify MD3.CreateIndex to default all indexes to use ROW compression.  Very easy!  Change one script instead of hundreds.  

Quick Links
nodetitle
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

So why do the ANSI SQL standards not specify a Declarative DDL syntax like this?  Probably because the ANSI standards don't specify concepts like "indexes" and "clustering keys" and "fillfactors".  DDL is, in fact, declarative in that you "declare" things like column names and data types, but the process to implement those changes is still a series of imperative verbs that carry out those actions (ie, CREATE and ALTER).  The ANSI specifications allow the "hows" to be vendor-implementation-specific.  And, of course, SQL and DDL were created before object-oriented programming became de facto.  

So why do none of the major (R)DBMS vendors provide native declarative DDL like MD3?  I have no idea.  They really should.  If you look at tools like Microsoft's Visual Studio database projects you can see that there is an attempt to make things more "tell, don't ask", but it is half-baked at best.  In VS database projects you declare the index using standard CREATE INDEX DDL, like this, which I pulled directly from one of my VS database projects:  

This is certainly much easier than having to write an IF EXISTS statement, but it could be better.  The way VS database projects work is that if you need to ALTER an index you simply change the CREATE INDEX command in the index script.  So, to add an INCLUDE clause to this index is simply a matter of adding that clause to the above command.  When the deployment runs it will run the necessary commands to get you what you requested, whether that is a DROP followed by CREATE, a CREATE WITH DROP_EXISTING, or whatever.  But again, if you need to make bulk changes to all indexes, such as adding a FILLFACTOR or COMPRESSION you can't.  That's why I feel it is half-baked.  

Hopefully one day we'll have full declarative DDL in our RDBMSs.  Even the big NoSQL implementations don't handle DDL any better.   If you want to manipulate an index in Hadoop/Hive you still need to use CREATE/ALTER/DROP syntax.  No improvement there.  Mongo is a bit better.  You don't use DDL at all, instead you call a .ensureIndex() method that will ensure your index "looks" a specific way before it allows your job to run.  There are problems with this from an operational standpoint...such as the fact that this may cause your entire collection to serialize while the index is built...but syntactically this is a great improvement over standard SQL DDL and is getting very close to full declarative DDL.  

The Single Responsibility Principle

The SRP is a restatement of Unix's philosophy of Do One Thing and Do It Well.  I've tried to make MD3 adhere to the SRP as much as possible.  As I've shown already, unlike other deployment tools that make you use all of nothing of their solution, MD3 does not.  I've had customers write their own deployer (covered soon in a post about RunMD3.ps1) to avoid using/learning Powershell.  I've had others using VS database projects to deploy but use the MD3.Create procedures to manage their stateful objects. And every client loves the nodetitle scripts.    

Most deployment tools, such as RunMD3.ps1 and VS database projects, are single-threaded...meaning they are designed to run only one script at a time, serially.  I had a client that was EXTREMELY sensitive to downtime and wanted to multi-thread their deployments, yet not have to deal with object dependency issues.  Since MD3 is architected as independent services we were able to make the deployment multi-threaded very quickly simply by running the single-threaded RunMD3.ps1 concurrently with itself.  We made the deployment code "multi-threaded" in two days and cut their deployments times by a factor of 4.  

YAGNI

"You ain't gonna need it" is a principle of eXtreme programming that states that you should never add a feature unless it is needed.  I totally agree with this.  For instance, I only added the ability to use index partitioning with MD3 in 2012 after I finally found a use case where partitioning would be the best solution to a problem.  Prior to that I always solved my problems using other solutions that were at least equal to the task.  

I've had co-workers and clients complain that MD3.CreateIndex does not support critical features they think they need.  An oft-cited example is filtered indexes.  My retort is quite simple...I've yet to see a use case where filtered indexes are the best solution.  I'm not saying filtered indexes are useless, I'm merely saying I, professionally, have yet to need them.  Perhaps I'm just lucky and have never worked on a system that truly needed filtered indexes.  Rest assured that the first time I need a filtered index I'll refactor MD3.CreateIndex so it properly supports this feature.   So, if there are features you need either implement them yourself or contact me.  

Separate Interface from Implementation

Some folks call this ISP (Interface Separation Principle).  I prefer to call it "separating the interface from the implementation" because it seems to make more sense to people.  The "Interface" is how you interact with your object, the "implementation" is what your object does under-the-covers.  You should always try to isolate your implementation details from your users.  As we've seen time and again, standard DDL mixing the "interface" (what you want to do) with the "implementation" (how you do it).  When you use the MD3 procedures you are not exposed to the implementation details.  You don't need to worry about what to do if you need to CREATE your object vs ALTER it, nor do you need to worry about dependent objects.  Everything is handled for you.  Yet since MD3 is implemented in TSQL you have the ability to see how the implementation is handled by MD3 and of course you can change that if it doesn't suit your requirements.  The implementation details are hidden, until you need them.  

Convention over Configuration

This is sometimes known as "coding by convention".  CoC strives to keep the number of configuration options a developer/admin has to worry about to a minimum and supply sensible defaults instead.  Certainly having lots of configuration options is cool, but have you ever had a situation where you've changed a bunch of configurations only to find out that your software doesn't work and can't figure out which buttons and knobs you changed that are affecting the behavior?  

MD3 has NO configuration file.  None.  Configuration isn't needed.  When you look at your folder hierarchy you should be able to tell the order that scripts will deploy in based on their naming convention.  All folders and scripts execute alphanumerically.  This is covered further in my post on the MD3 executor, coming up soon.  

I've had clients add various knobs and settings to customize MD3 for their specific needs and what they thought was a better implementation.  That's fine.  But the goal of good software is to not have to worry about configuring the software to have it do what you need.  

So how do you configure MD3 without an XML, ini, or settings table?  You use the MD3 patterns and conventions I've already talked about in these posts and will talk about in future posts.  We don't need a configuration option that allows you to execute .sql and .sqlx files (whatever that might be).  We use the most common option that everyone is familiar with...the convention of using .sql file extensions to indicate this is a SQL file.  Of course you can change this if you need to, that's totally up to you.  

Write Shy Code

This is sometimes called the Law of Demeter.  I hate that name because it is an unnecessarily wordy, confusing, and obtuse way to explain a simple concept.  Do you even know who Demeter was or what she is famous for?  Probably not.  The LoD can be summed up by stating that your objects should be "loosely coupled".  In other words, modules should not reveal anything unnecessary to other modules that don't rely on other modules' implementations.  An example...if you need to change an object's state, get the object to do it for you.  This way your code remains isolated from the other code's implementation and increases the chances that you'll remain orthogonal.  Orthogonal is a fancy way of saying your code will "stay at right angles to other code"...which is another wordy way of saying you won't introduce "side-effects" if you change something.  

We accomplish this in MD3 quite simply...all code has an interface in the form of MD3.Create procedures.  These procedures take a given set of "properties" as arguments and nothing else.  All state changes are hidden behind those interfaces.  If those interfaces change, such as if we decide to add a new "property" to an object (perhaps adding @FilterClause to MD3.CreateIndex) we can rest assured that we do not have to worry about that code having unintended side-effects that may break other modules.  And we can always test this with our CI build loops (nodetitle).  

Summary
This post was a divergence from my regular posts on the technical details of the MD3 system.  Understanding the design principles of MD3 will help you quickly and safely make your system extensible later, if needed.  I'll cover examples of this in a future post (MD3 Extensibility Stories).  The goal of MD3 is to think declaratively (specifying what is to be done, not how) and create a highly dynamic and adaptable deployment tool.  
 

You have just read "nodetitle" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 Script Idempotency and MD3.CreateColumn

Quick Links
nodetitle
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

In the last few posts of my MD3 blog series I've focused on how MD3 handles "stateful" database object deployments.  Today I'm going to cover the most important stateful object in your database...the table.  You really want to make sure you get your table scripts correct otherwise you risk data los.  Basic CREATE TABLE DDL is already really simple to understand so MD3 doesn't get in your way with some cumbersome new syntax that you need to learn.  However, MD3 adds some scaffolding code around the DDL to help you make your deployments bullet proof.  Here's an example:  

MD3 includes a script generator that will put your existing tables into the above format automatically and will be covered in nodetitle.  But the script generator will not handle the "Alter" section of the script since this is the stateful history that you must provide.  Sorry, but there's no way around that...no tool will provide this for you.  

First, All MD3 scripts must be idempotent

An idempotent operation is one that can be applied multiple times without changing the result beyone the initial application.  In SQL this is an operation that can be "rerun" without having any unintended side affects.  For instance, this piece of SQL is NOT idempotent:  

UPDATE dbo.Employees SET Salary = Salary * 1.10

Every time you run this code it will give an additional 10% raise to every employee.  

Every MD3 script must be idempotent, or "rerunnable", without "side-effects".  This is because MD3 runs ALL scripts during each deployment.  Some folks struggle with this concept.  Two suggestions:  ensure that you create proper CI testing (see nodetitle) and whenever you write a script simply press F5 twice, then see if the results are what you expected.  Some of my clients call this the "F5 F5 test".  After a few weeks you'll be writing rerunnable scripts without even realizing it.  

MD3.CreateColumn

...will help you add or alter a column based on the properties you provide.  Here are the parameters:  

There are some benefits to using MD3.CreateColumn vs using standard DDL:  

  • removes constraints (unique, foreign key, check, and PK) if the column change requires those objects to be dropped
  • removes indexes from the col, if needed
  • less esoteric DDL to remember

When you drop a column with @DropColumn (instead of using ALTER TABLE...DROP) MD3 will automatically remove the dependent objects and will provide the developer with a message like this to remind them to remove those objects from source control.  

Back to our Example

All MD3 scripts must be rerunnable.  Note that the table script pattern above isn't required with MD3, but I've found this is the best pattern to use.  In this case the script is rerunnable because the table will only be created on the first execution of the script.  On subsequent executions only the PRINT statement will be output.  

Let's assume that you have a new requirement to add a URL column to this table.  Using the above pattern this is how I would do it.  Note that we add the URL code to two places...the CREATE TABLE statement (covers a "net new" database) and as a call to MD3.CreateColumn (covers upgraded databases.  On the second and subsequent executions of this script nothing will happen since the URL column has already been created as declared.  

Interspersing DDL with data migration code

Some shops absolutely insist on separating DDL from data migration code.  This is wrong.  In many cases you must run data migration code either before or after your DDL runs.  In these situations the data migration code is tightly coupled with the DDL and should be stored together.  This aids in documentation and understanding for future developers.  

Let's look at a simple, contrived example.  For some reason you have a requirement to shrink the GroupName column from varchar(500) to varchar(100).  Further for existing data the requirement is simply to retain the first 100 characters.  Here's how you would accomplish that using my MD3 patterns.   First, you would change the definition in the CREATE TABLE statement (omitted from screenshot).  Then you would run the UPDATE statement to retain existing data, then you would make the call to MD3.CreateColumn changing the parameter for @Length.  As a side note, the UPDATE statement must be placed in dynamic sql otherwise the script will not compile on net new databases because the table does not actually exist.  This is goofiness with TSQL, not with MD3.  You can of course add any additional code you may want here.  A backup of the table (or at least the GroupName column) would be a wise idea, just in case.  

Add a non-nullable column to a table

This is another common use case that I see.  This is a three step process:  

  1. Add a nullable column to the table
  2. UPDATE the column to the default value for existing data per your requirements.
  3. Set the column to NOT NULL.  

Here's how we do it with MD3.  If you are sensitive to downtime and this is a large table we can do most of this work while the system is online, if we prepare properly and write the code to accomodate this.  

Here we've added an AccountingCode column and populated existing data with a "default" code, then set the column to NOT NULL.  

Summary

All MD3 scripts must be rerunnable.  This is, frankly, a good habit to get into.  You never know who'll accidentally rerun your script without you knowing so scripts you write should never have "side-effects".  We then looked at the pattern to use for MD3 table scripts.  You don't have to use this pattern, but in my experience this pattern works best.  There are scripts available with MD3 that will reverse-engineer your database tables into this pattern.  

In the next post we'll take a break from the technical aspects of MD3 and look at nodetitle.  When you understand these principles you'll see how a tool like MD3 can solve a lot of problems that DDL can't.  


You have just read "MD3 Script Idempotency and MD3.CreateColumn" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3.CreatePrimaryKey in Action

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last few posts we covered MD3 and "state" which showed how MD3 deploys "stateful database objects" using The Four Rules.  In that post we covered how MD3.CreateIndex handles stateful index deployments.  In The Other MD3.CreateProcedures we covered the other stateful object deployment procedures briefly.  In this post I want to cover some use cases where MD3.CreatePrimaryKey can really help you evolve your database without having to code complex, error-prone DDL.  

In the AdventureWorks sample database there is a table named HumanResources.Department.  The PK is clustered and there is also one non-clustered index called AK_Department_Name.  

After doing some performance testing you determine that the table should really be clustered on Name and the PK should instead be non-clustered.  That seems like an easy request so you start by trying to drop the existing clustered PK first.  

But as you can see, that won't work because there is an FK that references that PK that would need to be dropped first.  Dropping the FKs, in turn, will require you to recreate those FKs later, and any associated indexes, which is a lot of code to write, test, and deploy.  

In MD3 this is much simpler.  In MD3 and "state" I showed how there are really two ways to deploy stateful objects...by calling the MD3 procedures directly or by populating a metadata table with the desired properties.  

Using the metadata table script

This is by far the easiest method.  The metadata table that holds PK information is called MD3.PrimaryKeys.  Here is what the (abbreviated) deployment script looks like for AdventureWorks' primary keys:

Let's a little hard to read so here is just the metadata entries specific for HumanResources.Department:  

Since MD3 is declarative I only need to change the entry for the IsClustered property and I'm done.  Here is the code I changed...the IsClustered property was changed from 1 to 0:

Now let's execute that script just like a standard deployment.  Here is the output:  

This is visible to the developer executing the script on her local development database, as well as the in the logs when this is finally deployed to your production databases.  Since we are modifying a PK that requires the associated FKs to be dropped a message is logged to remind the developer that either the FK MD3 scripts will need to be rerun, or if running a full MD3 deployment this dependency is handled automatically and the FKs will be run.  No custom DDL...very fast.  

For a point of reference, here is FK MD3 script that the developer would run, or modify as necessary.  And of course all of this can be run automatically using the MD3 deployer (RunMD3.ps1:  The MD3 Executor).  

And when you run the FK script you will see this output:  

Note that all FKs already exist EXCEPT the FK that was automatically removed when we altered the PK.  In the post The Other MD3.Create Procedures I mentioned that all FKs will try to first be applied as trusted, and you can see that this is in fact the case.  

Using a direct call to MD3.CreatePrimaryKey

About half of my MD3 clients don't like keeping all of their PK properties in one metadata table (see the screenshots above).  Instead, some like to create their stateful objects using calls to the MD3 procedures directly.  They keep these calls in .sql files segregated by table name.  There is a scripting utilitythat will script your existing stateful database objects into either the metadata format or direct MD3 calls (nodetitle).  

HumanResources.Department is currently configured with its PK as non-clustered.  This is what the direct call would look like.  Note that when I executed this I simply see "Command(s) completed successfully" because, according to MD3 and The Four Rules, the object already exists with the same name and properties, so it silently continues.

Let's say that after a while you determine that your decision to make DepartmentId a non-clustered PK was a bad idea and it really should in fact be a clustered index.  What do we do?  Write a bunch of hand-crafted, error-prone DDL?  Nope, we simply change the @IsClustered flag.   When a developer does this she sees these messages:  

Note that the FKs were automatically dropped and a helpful message tells the developer to update the FK scripts if necessary.  These are the same messages that will be seen during standard deployments as well.  

Summary and Benefits of Using MD3

The goal of this post was to cover how the MD3 scripts in general, and MD3.CreatePrimary in particular, can spare you from writing a lot of complex, error-prone DDL.  It is much faster to make these changes using MD3 and then have the proper CI testing in place (nodetitle) to ensure all valid deployment paths are covered.  

There are certainly lots of GUI-based tools that will also drop and recreate your dependent FKs whenever a PK changes.  However, most of the implementation details are hidden from the developer and DBA.  What if you need a custom deployment mechanism that those tools don't provide?  MD3 can be customized...MD3.CreatePrimaryKey is just TSQL.  In nodetitle I'll give you a bunch of common use cases where MD3 can provide a quick solution where these tools cannot.  Further, since MD3 is just a bunch of TSQL calls these scripts can be properly versioned so you can quickly see the evolution of your database from release to release.  That's much harder to do with a GUI compare tool.  

 

 


You have just read "MD3.CreatePrimaryKey in Action" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

The Other MD3.Create Procedures

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last few posts we covered nodetitle which covered how MD3 deploys "stateful database objects" using The Four Rules.  If you handle The Four Rules correctly (and MD3 can greatly help you with this) then your database deployments should be a breeze.  No more hand-crafted, error-prone DDL.  In this post we are going to cover the other MD3.Create* procedures which help with database deployments and The Four Rules.  

The Other Stateful Database Objects

Here are the other stateful database objects and how MD3 handles their deployment.  In most cases this is handled with an MD3.Create* procedure.  These procedures deploy your stateful objects "smartly".  

Stateful Object MD3's deployment mechanism
Tables/columns Tables are created using standard DDL and can be scripted using SSMS tools.  Columns can be added or altered using MD3.CreateColumn, this will be discussed further in the post nodetitle(source code).  
Indexes This was covered in the post MD3 and "state" using MD3.CreateIndex.  Source code.  
Primary Keys We'll cover this below.  Source code for MD3.CreatePrimaryKey.  
Foreign Keys We'll cover this below.  Source code for MD3.CreateForeignKey.  
CHECK constraints We'll cover this below.  Source code for MD3.CreateCheckConstraint.  
DEFAULT constraints Handled with MD3.CreateDefaultConstraint, and covered below.  This may not seem like a "stateful" database object, but it is.  Source Code.  
Model Data (states/provinces, valid order statuses, other "lookup" data) Covered in the post nodetitle

Each of the above referenced MD3.Create procedures handles The Four Rules correctly.  Let's look at these routines.  

MD3.CreatePrimaryKey

Conceptually, primary keys are very similar to indexes.  You can cluster them, give them a name, give them compound keys, put them on a filegroup, etc.  Therefore it shouldn't be surprising that MD3.CreatePrimaryKey takes the exact same parameters as MD3.CreateIndex.  

The difference is what each procedure does under-the-covers.  If you change a PK you may actually have to deal with objects that depend on it.  Here's a list of objects that are dependent on PKs and how MD3 handles PK changes.  You can of course change this functionality to suit your requirements:  

  • If the table is replicated then the table must be removed from any subscribed publications.  Currently MD3 merely throws an error if this situation exists. 
  • Removes all FKs that reference the PK.  MD3.CreatePrimaryKey simply removes them.  Later, when MD3.CreateForeignKey is executed (remember, MD3 handles the dependent ordering of objects for you) the FKs will be recreated properly with the new keys.  
  • If we are changing the PK to be clustered we first drop an existing clustered index that already exists on the table.  As with foreign keys, MD3 handles the dependencies for you...meaning the dropped clustered index can be recreated later as a non-clustered index if required.  

In the next post, MD3.CreatePrimaryKey in Action I'll show you specific use cases where MD3 can save you vast amounts of time and rework when you have to modify a primary key.  Stay tuned.  

MD3.CreateForeignKey

FKs are stateful database objects because they require a properly-applied "parent" key to be in place first.  FKs can also be very expensive to recreate if the associated tables are large.  Here are the parameters (properties) for MD3.CreateForeignKey:

Like the other stateful object creator routines, MDs3.CreateForeignKey properly implements The Four Rules properly.  The parameters should be self-explanatory, except maybe @AllowNoCheck.  Constraints can be either TRUSTED or UNTRUSTED. When you create a FK using ALTER TABLE...WITH CHECK ADD CONSTRAINT you are saying that existing data in the table must adhere to the constraint, as well as any future data.  When you create a FK using ALTER TABLE...WITH NOCHECK ADD CONSTRAINT you are allowing existing data to violate the constraint, it will be enforced for future data only.  This is an untrusted constraint.  There are 2 reasons why untrusted constraints are frowned upon:  

  1. It means you may have data that violates your constraint and you should probably determine why that is, and fix it.  
  2. Some query plans will be less-optimal with an untrusted constraint, leading to possible performance issues.  

MD3 by default does NOT allow untrusted FKs to be created.  However, the @AllowNoCheck allows you to override that, if needed.  MD3 will try first to create the constraint using WITH CHECK and then will fallback to WITH NOCHECK if you have @AllowNoCheck set.  MD3 throws a WARNING in this case that you can then research later.  

MD3.CreateCheckConstraint

You may wonder why CHECK constraints are stateful database objects.  The reason is that recreating a CHECK constraint on a large table can be EXTREMELY time consuming as all data must be read and evaluated.  We also need to take into consideration whethere the constraint is trusted or not, just like with foreign keys.  Here is the set of parameters that MD3.CreateCheckConstraint uses:  

 

MD3.CreateDefaultConstraint

Defaults are also stateful objects only because they are declared very similarly to CHECK constraints and the code is very similar.  

Summary

In nodetitle I covered how MD3 covers stateful database objects, specifically indexes.  In this post I covered how MD3 deploys the other stateful database objects like PKs, FKs, CHECK constraints, and defaults.  In MD3 Script Idempotency and MD3.CreateColumn I'm going to cover what is probably the most important stateful database object...the table.  In the next post, MD3.CreatePrimaryKey in Action I'll show you specific use cases where using MD3 instead of hand-crafted DDL can save you a lot of grief.  


You have just read "The Other MD3.Create Procedures" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 and The Four Rules

In the last few posts I covered what MD3 is (Metadata Driven Database Deployments) and why it is better than hand-crafted DDL.  In this post we are goingto cover The Four Rules of Database Deployments.  Regardless of whether you decide to use MD3 or not, if you understand The Four Rules your database deployments will run much smoother.  

 
Some Background
In the last post (nodetitle) we covered a scenario where we had to write really complex DDL if we wanted to deploy an index change properly.  We need an index, called MyIndex, created on dbo.Example with a fill_factor of 90, on column Foo, with included_column Bux.  Here is the DDL:
 
DDL is kinda esoteric and difficult to remember, as seen above.  DML for master/seed/model/system data that you may need to deliver is similarly complex (we'll look at this problem in the post nodetitle).  In MD3 DDL is much simpler.  We just "declare" the properties of our object and let MD3 determine what needs to be done to get us to "model".  Basically, we enter "metadata" about our object into an MD3 table (for instance MD3.Indexes), which looks very similar to the "sys" system tables (in this case sys.indexes).  Here's a screenshot that shows how to add an index to the HumanResources.Department table:   
 
 
Pretty simple.  We have a metadata table called MD3.Indexes that looks VERY similar to sys.indexes.  We "declare" our new index with its desired properties right into that table, then we run MD3.  Under the covers MD3 is making calls to MD3.CreateIndex (which you can call instead of populating the MD3.Indexes table) similar to the nearby graphic that is adding the index nciFoo to dbo.Example.  
 
Each parameter to MD3.CreateIndex (and each column in the MD3.Indexes table (see above screenshot) is a metadata property that we can declare for an index.  These properties cover just about every option you can think of regarding an index.  There are some properties missing...for instance, there is no option for filtered indexes because I've never really found a good use case in my career where I absolutely needed a filtered index.  Since MD3 is open source you could always add this functionality if you needed it.  Here are some other index properties that are missing and why:
Index Property Why it is missing?
WHERE <filter predicate> filtered indexes.  You can always add this yourself.  
IGNORE_DUP_KEY I've never found a good use for this option.  
STATISTICS_NORECOMPUTE I've never found a need for this.  
DROP_EXISTING ON|OFF MD3.CreateIndex will smartly determine if this should be set to ON or OFF based on what is being asked.  
ONLINE = ON|OFF MD3.CreateIndex will determine whether this option should be used or not for you, based on whether the index follows the rules for ONLINE index creation.  Here is another case where MD3 is a huge help.  

So the question is...how does MD3 actually build the DDL under the covers and what makes it so much better than hand-crafted DDL? 

The Four Rules of Stateful Object Deployment
 
If you always follow The Four Rules when you code your DDL you will always have a successful deployment (no failures!) and you will never do unnecessary DDL. MD3 does all of this for you.  Assume you have some DDL that you need to apply:  
  1. if the object exists by name and by "properties", then silently continue, there is no work to do.  In other words, there was no change in the object between deployments.  The "properties" are the parameters to the MD3.Create procedures (see the screenshot above).  
  2. if an object exists with the same “properties”, but the name is different, then RENAME the object.  Example:  I have an index with the same column list, order, fill factor, on the same filegroup, with the same compression, etc...but it is named differently.  All MD3 does is run sp_rename.  This scenario occurs, for instance, when your DBA adds an emergency index to a table and you check in the same index structure, but you use your naming conventions.  
  3. if the object exists by name, but the "properties" are different, ALTER the object accordingly.  Example, you have an index with the same name and mostly the same properties, but you need to change the INCLUDED col list and FILLFACTOR.  MD3 handles this without esoteric, hand-crafted DDL. Usually an ALTER can bring the current object up to the desired properties, otherwise DROP/CREATE is used.  The most important thing to remember is that YOU don't have to code the DDL and worry about covering all possible scenarios.  
  4. Else, CREATE object with properties as defined.  At this point MD3 realizes this is a missing object and creates it according to your declarations.  
If you don't do The Four Rules correctly you risk, at a minimum, recreating expensive objects at every deployment.  You don't need MD3 to do a DDL deployment, but it makes it a lot easier and less error-prone.  
 
An Example
Let's take our example code again and determine what MD3 will do in various situations regarding The Four Rules.  The below table is kinda like the "unit tests" for MD3.CreateIndex (tsqlt unit tests available on request).  
 
Database State Expected Result Why? Covers which of The Four Rules
dbo.Example table does not exist FAILURE This scenario will be caught during continuous integration testing and will never happen during a production deployment.  (See nodetitle) None.  
index nciFoo does not exist Index created The index will be created with the declared properties.   4
An index exists called dbaFoo with the same properties Index renamed The index will be renamed using sp_rename to nciFoo.  This ensures that a duplicate index is not created with the same properties.   2
nciFoo exists, but with different "properties" than what is declared Index altered When possible the index will be ALTERed.  Otherwise the existing index will be dropped and the desired index will be created.  In some cases, depending on what is being changed, an index cannot be ALTERed.   3
nciFoo exists with identical properties Silently continue There is no work to do.   1
 
Summary
In this post we looked at The Four Rules of stateful database object deployments.  If you follow The Four Rules when you deploy your objects you will find that you have fewer errors and more reliable, repeatable deployments.  You don't have to use MD3 to use The Four Rules.  You can use any tool, even plain 'ol DDL.  MD3 just makes everything a whole lot easier.  
 
In the next post, nodetitle, we'll cover how The Four Rules are applied to other stateful database objects.  In a later post we'll cover how MD3's model data patterns work with The Four Rules to ensure your model/system/lookup data deployments work just like your other stateful object deployments.  
 
 
 

You have just read "nodetitle" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 and "state"

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last post (Why do we need another database deployment tool like MD3?) I covered some reasons why database deployments give DBAs and developers fits.  In a nutshell, it's because we write too much hand-crafted DDL..  Even when we use third-party GUI tools to manage our deployments we still hit edge cases where the migration isn't smooth.  So why are schema changes so difficult?  

Database Object State and Builds
 
Quick Links
nodetitle
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

When you "build" your Java app (...or C# or whatever...) you are really just compiling the code on your machine.  If compilation is successful then you can rest assured that it will compile on, basically, any machine (OS and processor architecture might be exceptions).  The deployment is simply a matter of copying your compiled code.  In the .NET world you usually build a Windows Installer package or perform an "xcopy deployment".  Build and deploy is really easy, as is building the necessary "CI loops" (...they run your unit tests...) to ensure your builds aren't broken when a developer checks in a change.  

 
Databases work differently.  Just because your scripts compile and deploy on your machine/database does not mean they will deploy anywhere else.
 
Why?
 
Java code, when it compiles, couldn't care less about the "state" of the previous build.  The code is “stateless”.  Some database code, on the other hand, is “stateless” while some is “stateful”.  Database developers need to worry about “history”, or "state", which is the existing data in the tables that must be maintained into the next release.  The code must do more than just compile, it must be respectful of existing data.  And data isn't the only stateful database object.  You don't want to recreate a bunch of indexes needlessly either.
 
Stateless Objects Stateful Objects
Views Tables/columns
Procedures Indexes
Functions Constraints (PKs/FKs/CHECK)
Model Data (sometimes called seed/lookup/master data).  An example is a list of states and abbreviations. This data rarely changes and when it does it changes via an approved process (ie, a Master Data Management update) Model Data (if the data can be altered by the customer.  An example is order status codes.  If you allow your customer to change (or add to) their model data then you don't want to "reset" their model data with your updates...or...maybe you do depending on the scenario.  
 
Stateless Object Patterns
Stateless db objects are the easiest to deploy...you simply compile them.  You do not care about what the previous version of the object looks like, you simply create the new version, overwriting the old.  Just like you would overwrite the last Java build with the current one.  Traditionally you code stateless objects using a pattern similar to the graphic to the right.  
 
Since the object, in this case a stored procedure, is stateless we can simply drop it and re-create it.  When doing that we lose our permissions so we must re-apply them.  It would be nice if we had the CREATE or REPLACE pattern for Transact SQL like Oracle does.  CREATE OR REPLACE is simplified syntax that obviates the need for re-applying permissions and checking if the object exists.  Oh well...maybe in the next replace of SQL Server.  
 
Generally you should deploy your stateless objects near the end of your deployment.  There are some exceptions to this rule:  
  1. You may have object dependencies which require certain stateless objects to deployed in a specific order.  For instance, if viewA references functionB which in turn references viewC then you have to ensure those objects are applied in a certain order.  
  2. if you have SCHEMABINDING applied to any of your objects and you try to first manipulate a stateful object (such as a table) you'll need to drop the object with the SCHEMABINDING clause first.  

MD3 handles all of this for you and in the demo files I give examples of this and how it is very simple to handle these edge cases. 

Stateful Objects and Patterns
As mentioned before, Java has no "stateful" code...if the code compiles you are good...if it doesn't compile you have problems.  In the database world it is the stateful code that causes database developers fits.  
 
An example of "stateful" database code that can be maddening is index DDL.  Let's see why that is.  You can download the example code here.   We start by creating a very simple table with a few columns.  Most developers just create their tables and then either worry about the indexes later or let the DBAs worry about it when performance becomes a problem.  At a later time the developes realize that they need to add an index to the Foo column.  If you are using a tool like Visual Studio db projects then you can use very simple CREATE INDEX syntax.  Otherwise you need to do an "existence check" to ensure that a subsequent execution of your script won't generate an "object already exists" error.  
 
 
The "existence check" syntax isn't terribly difficult to handle...yet.  Now let's say you notice a lot of page splitting and realize that your table design and data access patterns indicate that you need to change the fillfactor to 90.  Quick...do you remember where the fillfactor settings are stored in the metadata tables so you can modify the "existence check"?  You could just code up something quickly like this:  
 
 
The problem is that you will drop and recreate the index at every deployment.  Again, a tool like VS db projects will obviate the need for this.  Better "existence checking" code would be something like this:    
 
 
Our code is getting a little more "advanced" and much more "hand-crafted" than I am comfortable with.  
 
Here's a common scenario...once your index is released to production your DBAs notice that you have a bunch of Bookmark Lookups causing customers to report application slowness and they believe that they should add a couple of included columns to your index.  The DBAs are nice enough to tell you that they've done this and that you need to make this new index "model code".  You determine this is the new index creation code you need: 
 
 
Yikes!  That's getting to be really complicated code.  
 
Here's a slightly different example.  Your DBA calls you and tell you that he saw a gigantic "blocking storm" and needed to add an index on the Bar column.  You agree with his analysis and you both decide to make this a model index. The first block of code is what the DBA added (note the index name doesn't follow your naming conventions) and the second block is the code you made "model".  Do you see the problem?  
 
 
You really should've renamed his index, if his index exists.  Now you risk a duplicate index during a deployment.  
 
A Better Way to Handle Stateful Objects
 
DDL is kinda esoteric and difficult to remember, as seen above.  DML for master/seed/model/system data that you may need to deliver is similarly complex (we'll look at this problem later).  In MD3 DDL is much simpler.  Wouldn't it be nice if you could just enter the "properties" of the desired object in a metadata table and SQL Server just did what it needed to do to make that object for you?  MD3 does that for you.  Here's a screenshot that shows how to add an index to the HumanResources.Department table:   
 
 
Pretty simple.  We have a metadata table called MD3.Indexes that looks VERY similar to sys.indexes.  We "declare" our new index with its desired properties right into that table, then we run MD3.  
 

Now let's say you want to alter that index to change your compression settings from NONE to ROW...just update the entry in MD3.Indexes and rerun MD3. Need to add an Included column?  Change the NULL above to the name of the column you want to include and rerun MD3.  Maybe the index is no longer needed...just change the DropIndex column to 1.  
Under-the-covers MD3 is simply cursor-ing over the metadata table (MD3.Indexes) and determining if there is any work to do (more on how this is done in the next section).  It does this by calling the MD3 stored procedures that do the actual work...for instance, EXEC MD3.CreateIndex.  You can call these procedures and use them yourself without using the MD3.Indexes metadata table.  Here's an example call that creates nciFoo.  Let's say you decide to change this index to have a FillFactor of 90...how do you think you would do that?  That's right, change the parameter and run the procedure.  Want to change the filegroup (or partition)...take a guess at the process?  No complex hand-crafted DDL is required.  
 
So the question is...how does MD3 actually build the DDL under the covers and what makes it so much better than hand-crafted DDL?  MD3 uses The Four Rules.  We'll cover nodetitle in the next post.
 
 
 
 
 
 
 
 
 
 
 
 
 

You have just read "nodetitle" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Pages

Subscribe to RSS - Dave Wentzel's blog