DaveWentzel.com            All Things Data


Why do we need another database deployment tool like MD3?

This is the next post in my MD3 series (Metadata Driven Database Deployment).  MD3 is my build-and-deploy system that I've evolved over the past 15 years. Quick history...I worked at an ISV and we supported 4 versions of our software (Version X back to Version X-3) and we needed to support upgrading any of those versions to "current". We had about 40 customers with the average db size ~40GB. To make matters even more complicated we supported either SQL Server or Oracle.  Both had the same schema...mostly.

Quick Links
Introduction to Metadata Driven Database Deployments
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

So we needed a tool that upgraded a database reliably without forcing non-database developers to write complex, error-prone, hand-crafted DDL. Most of our data developers knew Oracle or SQL Server, but few knew both.  I learned a lot about what works and what doesn't when upgrading a database in a complex data environment like this. I created a database CI loop (also included with MD3) before CI loops were sexy and ubiquitous. Every day we tested our db scripts against real customer databases and ensured everything upgraded and functioned properly. We never worried about the actual customer production upgrade event because our CI loops had already upgraded each customer hundreds of times (albeit on older, stripped, and scrubbed databases).  

I've deployed versions of this tool at many different employers and clients. Each has been radically successful.  

  • The developers love how easy it is to make schema changes without hand-crafting DDL.  It is totally flexible and customizable to whatever you need.
  • DBAs and app admins love it because it is scriptable, has customizable logging, and shows the SQL that is being executed.  Nothing is hidden from the DBA.   
  • Management likes the fact that it requires zero training and creates a repeatable, reliable process.  

The latest incarnation, which I haven't yet open-sourced, is a version that will do a near Zero Downtime database upgrade. Contact me if you might be interested in this.  

Why do we need another database deployment tool?

There are lots of database deployment tools out there.  They each work great...mostly.  But development teams and DBAs don't trust them when it comes to complex schema changes.  Don't believe me?  Assume the following scenario...you need to store new data in your database due to a change in business requirements.  There are lots of ways you could do this but you honestly believe the best way is to refactor and split the largest table in your database into 3 new tables.  
How likely are you to propose this solution to your team?  My guess is not very likely.  The reasons are simple: 
  • Modifying a table's schema requires you to "port" the old database structure to the new one...and the data along with it.  This is fraught with danger.  Perhaps each deployed version of the table has experienced its own schema changes, which you also need to take into consideration.  You then have to change keys (both primary and foreign) and clustering, as well as all of the non-clustered indexes.  What if each customer/version has a different set of indexes and keys?  
  • Then you need to modify each database object that accesses the table.  All of your stored procs, functions, and views.  Then you need to modify your Java (or C# or whatever).  All of this scares the average database developer.  
  • Since this is a large table how can we guarantee this won't cause our customer excessive downtime while we migrate the data?  How do we verify the customers' data afterwards?  
NoSQL:  "RDBMS schemas are too hard to change"
Altering schemas scares developers so much that an entire cottage industry grew around the claim that relational databases are too darn hard to "evolve" with ever-changing business rules.  One of the primary reasons the NoSQL movement (whenever I hear this I think, "technology or bowel?") grew, especially document stores and key-value stores, is that those technologies touted letting you modify your schemas without worrying about modifying all of that data access code that scares us all.  And there was no need to "port" the data from the old schema to the new one.  
But relational schemas really aren't that difficult to evolve.  You don't need to write a bunch of complex, hand-crafted DDL.  And you don't need a third-party tool that falls over on edge cases where your data won't migrate to the new schema properly.  So why are schema changes so difficult?  
Just Enough to Whet Your Appetite
MD3 deployments are "declarative", meaning you simply declare what you want your object to look like and MD3 figures out the best DDL commands to run (and you can review them) to get your schema to that destination.  Here is a simple example.  Assume we have a table named dbo.Example and we decide a non-clustered index on Foo would be beneficial to performance.  The nearby graphic shows how easy this is with MD3.  (Actually, the command is even easier than that, and I'll show you that in the next post.)  But MD3 gets even better.  Let's say a DBA already noticed the performance problem and indexed Foo, but named the index dbaFooPerfProblem.  In that case the index will simply be renamed; you won't get a duplicate index.  Let's say later you decide to make nciFoo a compound index with Foo,Bar as the keys.  Simply changing Line 21 to 'Foo,Bar' will automatically alter the existing nciFoo, if it exists, or build a new nciFoo with the declared properties, if it doesn't.  
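In case the graphic doesn't render, the declarative call looks something like this (a sketch only; the exact procedure and parameter names in the real MD3 scripts may differ):

```sql
-- declare the index you want; MD3 decides whether to create, rename, or alter
EXEC MD3.CreateIndex
      @SchemaName   = 'dbo'
    , @TableName    = 'Example'
    , @IndexName    = 'nciFoo'
    , @IndexColumns = 'Foo'   -- change to 'Foo,Bar' later and MD3 alters it in place
    , @IsUnique     = 0
    , @IsClustered  = 0;
```

Rerunning the script with a changed declaration is all it takes; there is no hand-written ALTER or DROP/CREATE.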
In the next post I'll cover MD3 and "state", which is the key to a good database deployment routine.  

You have just read "Why do we need another database deployment tool like MD3?" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Introduction to Metadata Driven Database Deployments

MD3 (Metadata Driven Database Deployment) is my build-and-deploy system that I've evolved over the past 15 years. Deploying database code tends to be challenging for most organizations, especially ISVs that may support multiple versions of their software at many customers.  MD3 handles this all very elegantly.  Over the next few posts I'll cover why I think MD3 is the best choice for your database deployments.  

Table of Contents

Page Summary
Introduction to Metadata Driven Database Deployments This page.  
Why do we need another database deployment tool like MD3? Covers some scenarios where traditional database deployment tools fail and how MD3 makes hand-crafted DDL a thing of the past.  
MD3 and "state" In this post we cover stateful vs stateless database objects.  It is the stateful objects (those with "history" that we need to consider) that cause our database deployments grief.  We'll cover how MD3 handles this problem.  
MD3 and The Four Rules The Four Rules will ensure you always deploy your stateful database objects correctly.  You don't need MD3 to use The Four Rules, but it helps.  
The Other MD3.Create Procedures Covers the other "stateful" database objects and how MD3 handles their deployment.  

MD3.CreatePrimaryKey in Action Shows some use cases with primary keys where MD3 can really save you a lot of time and rework.  We also further demonstrate how The Four Rules work.  
MD3 Script Idempotency and MD3.CreateColumn We cover the last MD3.Create procedure in this post and cover the one aspect of MD3 scripts that is mandatory...scripts must be idempotent, which is just a fancy way of saying that all scripts must be rerunnable.
The OO Design Principles of MD3 There are certain object-oriented design principles I incorporated into MD3.  These principles are what sets MD3 apart from other database deployment tools.  
MD3 Model Data Patterns Model data is the data, usually static, that you deliver with your application.  MD3 deploys this data in much the same way as other stateful objects are deployed.  We cover how this is done in this post.  
RunMD3.ps1: The MD3 Executor RunMD3.ps1 is a short Powershell script that simply executes each .sql file in each subfolder in your deployment folder.  In this post I'll cover how it does what it does, and why.  
MD3 Deployments in Action In this post we put everything together and actually show how deployments work with MD3.  
How to Reverse-Engineer Your Database with MD3 The hardest part of any new deployment tool is getting your database migrated into the new structure.  This is actually easy with MD3.  I'll demo how to reverse-engineer AdventureWorks into the MD3 format, which takes a couple of hours to do.  
MD3 and CI Loops CI loops are a definite development best practice, yet few shops do this with their database code.  And those that do run CI loops may not be doing them correctly.  In this post I'll share some secrets to doing really good CI testing with your database code and then I'll show you how to make it really great with MD3.  
MD3: Why you should care about declarative database deployments If you still don't see the value of declarative/properties/metadata driven database deployments I'll give you some additional arguments in this post.  
MD3 Extensibility Stories I've deployed MD3 at numerous employers and clients and it has always been a success.  In this post I'll recount some stories about how I saved tons of man-hours refactoring database schemas using MD3.  
What are the benefits of using MD3 over my existing solution? You don't have to use the entirety of MD3 to get some of its benefits.  You can pull out just the pieces that you feel may help you.  This is a summary post that reviews all of the benefits of the tool.  


Quick Links
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

You have just read "Introduction to Metadata Driven Database Deployments" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Move a Drupal site from WAMP to LAMP

There's a dearth of information on Google about how to migrate a Drupal site from WAMP to LAMP (Windows to Linux).  There's nothing particularly wrong with Windows for running Drupal, but let's face it, there is far more documentation available for Linux.  Really useful articles just assume you are running Linux and direct you to place files in /etc/this/that directories that don't exist on Windows.  Even enabling Apache modules is radically different on Windows versus Linux.  Further, my WAMP server runs Windows 2003, which is soon end-of-life, and I'm NOT shelling out more money to M$ for another license.  I can't stand the new UIs.  

I created what I believe is a novel approach for migrating a Drupal website from one server to another (especially when changing OSs) that uses git.  Using git allows everything between the WAMP and LAMP site to stay in sync because the migration will likely take quite a few days/weeks as you find every place where you reference "c:\documents and settings" and need to change it to "/home/drupal" or whatever.  This allows ample time to test the upgrade and refresh your test server often.  This method requires only a cursory knowledge of Linux (or any other piece of the LAMP stack, really).  You don't even need to know much about git.  You can keep your LAMP site running on a git branch and merge into that as needed.  

Get your Existing WAMP server ready

  1. Get git installed with a remote repo for your website.  
  2. Create a folder under your DocumentRoot for your mysql dumps.  I suggest DocumentRoot\<mysqldump>.  This should not be readable by your Apache user if your security settings are correct.   
  3. Create the initial mysql dump into this folder:   mysqldump --host=localhost --user=root --add-drop-table <sitedbname> > <sitedbname>.sql
  4. git add all of your files (make sure you get the hidden .htaccess, settings.php, etc. files) and push them to your remote repo.  

Build your New LAMP server

Build your LAMP server and give it a static IP.  I always use Ubuntu because it is easiest.  VM or physical, it doesn't matter.  If you are using a hosting provider then you can largely skip this section.  Also, if you are installing on an existing LAMP server, you can also skip this.  

  1. sudo apt-get install lamp-server^ (yes you need that final ^)
  2. sudo nano /etc/apache2/sites-available/000-default.conf :  change this as necessary.  
  3. sudo nano /etc/apache2/apache2.conf :  find every occurrence of "AllowOverride None" and replace with "AllowOverride All".  This is needed for mod_rewrite.  Also add ServerName localhost somewhere as well.  
  4. sudo apt-get install git
  5. cd /var/www
  6. sudo mkdir <yoursitefoldername>

Install a new Drupal Site with git

In theory this section is not needed.  You could git clone directly from YOUR git repo that you created above.  I prefer to get a clean, working Drupal installation first, then I git pull my site into it.  The benefit is we can ensure all pieces of the Drupal stack are functioning properly with a vanilla Drupal site before we get nuts and migrate our custom site.  This process only has to be done once.  

  1. sudo git clone --branch 7.x git://git.drupal.org/project/drupal.git ./<yoursitenamefolder>
  2. sudo nano /etc/apache2/sites-enabled/000-default.conf and change the DocumentRoot to <yoursitenamefolder>
  3. sudo service apache2 restart
  4. sudo mkdir /var/www/<yoursitename>/sites/default/files :  need to create this manually
  5. sudo chown www-data:www-data /var.../files :  provide necessary permissions
  6. sudo cp /var/www/.../default/default.settings.php /var/www/.../settings.php :  create the initial php settings file from the default
  7. sudo chown www-data:www-data /var/.../settings.php :  provide necessary permissions
  8. sudo apt-get install libssh2-php : this makes module updates easier later
  9. sudo apt-get install php5-gd : required php connector
  10. sudo service apache2 restart
  11. Create your mysql database.  Use the steps below.  
  12. Now start the browser install (http://<yourwebsite_ipaddress>).  If you get a "white screen of death" then change your php error logging settings.  

I like to test a vanilla install just to make sure php, mysql, and permissions are correct.  Fix anything as needed to get a clean install.  

One last thing to install is drush, so you can script out upgrades and module updates easily in the future:

  1. sudo apt-get install drush
  2. The repository version may not be the latest.  Find where drush is installed: which drush
  3. sudo drush update drush --destination='<folder reported by which drush>'

Once everything is working you can remove this site and database, or keep it around for the future, or you can git pull your WAMP site/db into it.  However you want to do it.  

The First Migration

Follow these steps to get your site working on your LAMP server the first time.  

  1. cd /var/www/<website folder> (mkdir if needed)
  2. sudo git init
  3. sudo git remote add origin http://path/to/your/repo
  4. sudo git fetch origin
  5. sudo git merge origin/master
  6. sudo mysql -u drupal -p <db name> < /path/to/dump/file
  7. See Things That Tend to Break below for more items to check.  

"Refreshing" LAMP 

These steps can be used for subsequent LAMP refreshes

  1. mysqldump --host=localhost --user=root --add-drop-table <sitedbname> > <sitedbname>.sql
  2. git push your WAMP code to your repository 
  3. sudo git pull on LAMP
  4. sudo mysql -u drupal -p <db name> < /path/to/dump/file
  5. Double-check that your site still works and your data was refreshed.  

Things That Tend to Break

  1. Fix up any issues in the site status report (admin/reports/status).  For instance, you'll likely see permissions issues since permissions will NOT copy over via git and Windows permissions are different from Linux permissions.
    1.  "sites/default/files is not writable".  Simple fix:  sudo chown -R www-data:www-data /var/www/<site>/sites/*

  2. Your site email is probably broken.  Check and fix it as needed.  
  3. Cleanurls...do they still work?  
  4. Do your captchas still work?  

Database Creation Steps

These steps will create the Drupal databases for you, referenced throughout this post. You'll need at least one db for the initial "test" Drupal site and then one db for each site you are going to migrate.  

  1. create the mysql db :  mysqladmin -u root -p create <sitename> :  Your database name and dbuser/password should match what you used on your WAMP server, otherwise you'll need to change your settings.php file to reflect the changes.  
  2. mysql -u root -p  :need to create user and give permissions
  3. GRANT ALL PRIVILEGES on <sitename>.* to 'your_user'@'localhost' IDENTIFIED BY 'password';

  4. flush privileges;
  5. \q

You have just read "Move a Drupal site from WAMP to LAMP" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  


Fun With Transactions

Demo files can be found here

This is a short presentation on some of the confusing aspects of transaction handling in SQL Server.  We start with transaction naming:  

Transaction Names

Even on a case-insensitive database this will fail:  
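A sketch of the demo meant here, assuming the point is that ROLLBACK compares transaction names case-sensitively even when the database collation is case-insensitive:

```sql
BEGIN TRANSACTION MyTran;
-- ...some work...
ROLLBACK TRANSACTION MYTRAN;
-- fails: no transaction or savepoint of that name was found
```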

But this will not fail:  
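The same script with the case matching exactly (again a sketch of the likely original demo):

```sql
BEGIN TRANSACTION MyTran;
-- ...some work...
ROLLBACK TRANSACTION MyTran;  -- succeeds: the name matches exactly
```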

And even this will work because transaction names are totally ignored when issuing a COMMIT:
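A sketch of the COMMIT case:

```sql
BEGIN TRANSACTION MyTran;
-- ...some work...
COMMIT TRANSACTION AnyNameAtAll;  -- succeeds: COMMIT ignores the name entirely
```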

Autonomous Transactions

A sorely missed feature of SQL Server is native "autonomous transactions."  This is where you "pause" the current transaction and spin up a new transaction.  When the new transaction "completes" (commits or rolls back) then we continue the paused transaction.  Some use cases:

  • out-of-band logging/auditing
  • custom sequence generators

.NET and Java developers have the concept of REQUIRES_NEW (Propagation.REQUIRES_NEW in Spring, TransactionScopeOption.RequiresNew in .NET) to "pause" an existing transaction and start a new, autonomous transaction.  Under the covers this is implemented as a new spid with potentially different transaction semantics.  I give some demos on how to use linked servers in SQL Server to generate your own autonomous transactions. 
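The linked-server trick looks roughly like this (a sketch; the database and proc names are hypothetical, and the key is disabling "remote proc transaction promotion" so the remote call does not enlist in the caller's transaction):

```sql
-- one-time setup: a loopback linked server pointing at ourselves
DECLARE @srv sysname = @@SERVERNAME;
EXEC sp_addlinkedserver @server = N'loopback', @srvproduct = N'',
     @provider = N'SQLNCLI', @datasrc = @srv;
EXEC sp_serveroption N'loopback', N'rpc out', N'true';
-- do NOT promote the remote call into the caller's (distributed) transaction
EXEC sp_serveroption N'loopback', N'remote proc transaction promotion', N'false';

BEGIN TRANSACTION;
    -- this call runs on its own spid, in its own transaction...
    EXEC loopback.MyDb.dbo.WriteAuditRow @Message = N'attempted the thing';
ROLLBACK TRANSACTION;
-- ...so the audit row survives the rollback: an autonomous transaction
```

The provider name varies by SQL Server version; the pattern is the same either way.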

Implicit Transactions

Implicit Transactions (IT) cause inordinate confusion.  At their worst they can cause what appears to be massive blocking events on your server.  Most developers and data access technologies do not use IT, the notable exception being the Java world (Spring JDBC, WebSphere, etc).  With SET IMPLICIT_TRANSACTIONS ON, a transaction is started for you...if one is needed.  That "if" is what causes problems, because the assumptions behind it are not properly documented by Microsoft.  If your driver forces you to use IT then ensure that sp_reset_connection is called whenever a connection is pulled from the pool; this resets the IT setting.  If that isn't done you can "leak" the IT setting across pooled connections, where a developer might have turned IT OFF to get finer-grained control.  That's a big problem.  
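A minimal sketch of the surprise:

```sql
SET IMPLICIT_TRANSACTIONS ON;

SELECT TOP 1 name FROM sys.objects;  -- this SELECT quietly began a transaction
SELECT @@TRANCOUNT;                  -- 1: a transaction is now open

COMMIT;                              -- forget this and the spid holds its transaction
                                     -- open, which is how the "massive blocking
                                     -- event" starts
SET IMPLICIT_TRANSACTIONS OFF;
```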

I have lots of demos on the confusing aspects of implicit transactions as well as how to monitor your server for IT anomalies that may cause blocking or "abandoned spid" problems for you.  

Demo files can be found here



You have just read "Fun With Transactions" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  


Presenting at PSSUG next week

On Tuesday, December 2, 2014 beginning at 5:30 I'll be doing a short presentation on "Transaction Trivia" (I'll probably come up with a better title later) at the monthly PSSUG meeting at Microsoft's Malvern office.

Service Broker Demystified - Services

Services and the [DEFAULT] contract can be very confusing.  In this post I'll show you why that is and some simple ways to resolve the problems in your mind.  Then we'll look at how to model send-only and receive-only services, which is another constraint you can use in your SSB design. 

SSIS and data contracts

Have you ever tried to use a stored proc with a temp table in an SSIS package?  Did it work?  SSIS is severely hampered by the fact that this doesn't work seamlessly.  In this post I'll demo the problem and I'll give you the best fix for it.  The objective is to run a stored proc from SSIS and optionally send the output to a text file. 


  • Create a very simple stored proc.  You can download the repro script here.  Our stored proc simply builds a temp table, inserts a few rows and then reads those rows as a result set to the caller.  No parameters...very simple.  
  • Create a new SSIS package.  
  • Add a Data Flow Task 
  • Double-click it.  
  • Add an "OLE DB Source" to the designer
  • Connect to your instance/db where you have dbo.SampleDataContract installed
  • Change the "Data access mode" to SQL Command and enter the call to our stored proc
  • Click "Preview..."
  • You should immediately generate an error 
  • The error text is:  

Exception from HRESULT:0xC020204A
Error code: 0x80004005
Description: "Invalid object name ''.".
Unable to retrieve column information from the data source. Make sure your target table in the database is available.  

This error occurs because you decided to use a temp table (#temptable...the kind in tempdb) somewhere in the text of your procedure.  That's not allowed.  SSIS (and SSRS and a number of ORM tools, etc) attempt to read the metadata about your procedure to determine what the output will look like (number of columns, names, and data types).  The error is telling you that the metadata could not be found for the temp table that you are attempting to use.  

BTW, this will not happen if you use @TableVariables.  However, there are performance ramifications to using table variables in some versions of SQL Server.  
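For reference, the repro proc is roughly this shape (a sketch; the downloadable script is authoritative and the column names here are illustrative):

```sql
CREATE PROCEDURE dbo.SampleDataContract
AS
BEGIN
    CREATE TABLE #Results (Id int NOT NULL, Name varchar(50) NOT NULL);

    INSERT INTO #Results (Id, Name)
    VALUES (1, 'Alpha'), (2, 'Bravo');

    -- SSIS fails at design time on this: it cannot resolve #Results' metadata
    SELECT Id, Name FROM #Results;
END;
```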

Data Contracts to the Rescue

I've written about Data Contracts for Stored Procedures before.  Quick recap...in the Oracle world you CANNOT return a result set from a stored procedure.  Instead you create a package to do this.  In the package "header" you declare the "data contract" (inputs and outputs, including ordinal position, name, and datatype).  Then, in the package body you write the actual code, and that code must adhere to the declared data contract.  In the example, emp_actions is the package and it implements 2 procedures plus a cursor (which is nothing like a SQL Server cursor; it is more like an ADO cursor in that it is a "CURrent Set Of Records", i.e. a result set) and the output format of the cursor, which is known as a record type.  Note that the package body actually outputs the data when someone queries for the desc_salary result set.  
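The example referenced is the classic emp_actions package from the Oracle documentation; it looks roughly like this (a trimmed sketch, with the procedure bodies elided):

```sql
CREATE OR REPLACE PACKAGE emp_actions AS       -- the "header": the data contract
   TYPE EmpRecTyp IS RECORD (emp_id NUMBER, salary NUMBER);
   CURSOR desc_salary RETURN EmpRecTyp;        -- declares the result set's shape
   PROCEDURE hire_employee (ename VARCHAR2, job VARCHAR2, sal NUMBER);
   PROCEDURE fire_employee (emp_id NUMBER);
END emp_actions;
/
CREATE OR REPLACE PACKAGE BODY emp_actions AS  -- the body must honor the contract
   CURSOR desc_salary RETURN EmpRecTyp IS
      SELECT empno, sal FROM emp ORDER BY sal DESC;
   PROCEDURE hire_employee (ename VARCHAR2, job VARCHAR2, sal NUMBER) IS
   BEGIN
      NULL;  -- body elided
   END hire_employee;
   PROCEDURE fire_employee (emp_id NUMBER) IS
   BEGIN
      NULL;  -- body elided
   END fire_employee;
END emp_actions;
/
```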

When I first started working with Oracle I thought this syntax was EXTREMELY cumbersome.  Too much typing of redundant information (a CURSOR is declared over a TYPE and then the actual query has to be bound to the CURSOR and to the TYPE).  Too much can go wrong here.  

I thought this was cumbersome until I started working with SQL Server in the Nineties.  SQL Server then only partially declared the full data contract.  Stored procs declared the inputs but not the outputs.  In later releases M$ graciously gave us functions and table-valued parameters which FULLY implement data contracts.  But they never fixed procedures to fully support data contracts.  Why do we need data contracts for stored procs?  See this connect item for a list of reasons...but having a true data contract solves the SSIS problem I noted above.  In this post a blogger lists exactly what he would want to see in a fully implemented data contract for stored procedures.  That's a great list.  

While having full-blown data contracts would be nice, you don't have to wait for M$...you can roll your own data contracts and solve some of these issues.  Here's how I do it.  At the top of your stored procedure add a little code block that can NEVER execute that defines your data contract.  When the "smart tools" like SSIS, SSRS, and most ORMs try to read your metadata they see that you have a SELECT statement and determine that is the "shape" of the result set, even though the SELECT code block is in an IF block that will never be executed (note the IF 1 = 0).  Just doing that little bit of work is enough to fool EVERY "smart tool" that looks for result set metadata.  
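Applied to dbo.SampleDataContract, the pattern looks like this (a sketch; the CAST'd NULLs exist only to spell out the output's names and datatypes):

```sql
ALTER PROCEDURE dbo.SampleDataContract
AS
BEGIN
    IF 1 = 0
    BEGIN
        -- the "data contract": never executes, but metadata readers
        -- (SSIS, SSRS, ORMs) see it as the shape of the result set
        SELECT CAST(NULL AS int)         AS Id,
               CAST(NULL AS varchar(50)) AS Name;
    END;

    CREATE TABLE #Results (Id int NOT NULL, Name varchar(50) NOT NULL);
    INSERT INTO #Results (Id, Name) VALUES (1, 'Alpha'), (2, 'Bravo');
    SELECT Id, Name FROM #Results;
END;
```

The contract SELECT must match the real output column-for-column, or the tools will happily believe the wrong shape.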

Mocking up a data contract really fixes this?

Yes.  Let's demo.  Go back to SSIS to your OLE DB Source component.  Click the "Preview..." button again.  Last time we got an error.  This time we get our result set as expected.  


Creating your own mocked-up data contract is an easy way to overcome tools that attempt to read the metadata of your procedures and fail because they contain references to a temp table.  If your SSIS packages require the IsSorted flag you can even put an ORDER BY clause on your "data contract" and your output will automatically have the IsSorted flag set.  


You have just read "SSIS and data contracts" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Service Broker Demystified - Why do we need Services and Queues?

People claim that they don't want to use Service Broker because it is too complicated. I started a blog series called Service Broker Demystified because SSB really isn't that difficult if you understand some basic concepts. Folks don't understand why both a "Service" and a "Queue" are needed.  Why not just have one object?  In this post I'll show you why and give you a quick rule-of-thumb to avoid the confusion.  

Service Broker Demystified - Why is there no ALTER CONTRACT statement?

The concept of "contracts" in Service Broker is initially confusing to most data professionals.  I like to think of a contract as a constraint applied to a message.  Then why can't you ALTER a contract?  Because a contract is really more like a "legally-binding contract."  There are good reasons why contracts can't/shouldn't be altered.

Service Broker Demystified - Contracts and Message Types

Contracts and Message Types are the "table constraints" of the Service Broker world.  Like table constraints, they aren't required, but they keep you from doing stupid stuff with your Service Broker design.  In this post I'll cover some confusing aspects of contracts and message types.  

