MD3 Extensibility Stories

Over the last several posts we've covered how [[Introduction to Metadata Driven Database Deployments|MD3]] handles metadata-driven, properties-based database deployments.  Perhaps you still aren't convinced that a declarative approach to your deployments is worth the effort.  Today I'm going to cover how I've evolved the MD3 system over the years to overcome problems that would have been much harder to solve if I used hand-crafted DDL or a GUI-based "database compare" tool instead of MD3.  Hopefully this post will give you some ideas regarding how you can use a properties-based deployment mindset to make your database deployments better, even if you don't decide to use MD3.   

Adding support for DATA_COMPRESSION to MD3.CreateIndex

I consulted for a huge ISV with ~60 customers running multi-terabyte databases.  When we (finally) certified our application to run on SQL Server 2008 it wasn't long before our customers asked us to consider supporting DATA_COMPRESSION on our indexes.  I did some testing and determined that ROW compression was an awesome new feature.  In no case did it perform worse for us, and in most cases we not only saved $$$ on storage costs but also fit more data into the buffer cache, resulting in faster query runtimes.  We decided that the default compression setting for any NEW indexes would be ROW, so I modified MD3.CreateIndex to apply ROW to all new indexes unless requested otherwise.  

It also made a ton of sense to migrate all of our huge existing indexes to ROW as well.  But there are some side effects of DATA_COMPRESSION that made this difficult:  

  1. If we ran ALTER INDEX...REBUILD WITH (DATA_COMPRESSION=ROW) on our large indexes it would cause more downtime than we could afford.  
  2. We could run these commands with ONLINE=ON, but there are some cases where SQL Server does not allow an index to be rebuilt ONLINE.  For instance, a clustered index cannot be rebuilt with ONLINE=ON if the underlying table has a column with a LOB data type (see the repro after this list).  
  3. If your table is replicated you may see some goofy issues with how your publications and articles are scripted and snapshotted.  This can cause excessive downtime for your subscribers if you don't plan for it.  It also causes excessive log reader latency and lots of tran log usage.  Sometimes it's better to rebuild these indexes offline, if possible.  
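
To make Problem 2 concrete, here's a minimal repro of the LOB restriction as it stood on SQL Server 2008 (the restriction was later relaxed in SQL Server 2012; the table here is made up for illustration):

```sql
-- Problem 2 illustrated: on SQL Server 2008, a clustered index cannot be
-- rebuilt ONLINE when the table contains a LOB column.
CREATE TABLE dbo.NotesDemo (
    NoteId int NOT NULL CONSTRAINT PK_NotesDemo PRIMARY KEY CLUSTERED,
    Body   nvarchar(max) NULL   -- the LOB column that blocks ONLINE rebuilds
);

-- Fails on pre-2012 versions because of the nvarchar(max) column; the
-- rebuild has to run offline (ONLINE = OFF), which takes blocking locks.
ALTER INDEX PK_NotesDemo ON dbo.NotesDemo REBUILD WITH (ONLINE = ON);
```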

So we settled on the following requirements (a sketch of how the resulting logic might look follows the list):  

  1. By default, all NEW indexes should have ROW compression.  
  2. EXISTING indexes on replicated tables would NOT have ROW compression enabled by our scripts, to avoid Problem 3 above.  However, if a customer wanted to change this value, given the caveats in Problem 3, our scripts would honor those changes.  
  3. There were a few EXISTING indexes on replicated tables where the benefits of compression outweighed the costs of Problem 3.  In that case we had an OVERRIDE flag where we could override Requirement 2.  
  4. All EXISTING indexes would be "queued" using Service Broker to have their compression settings changed off-hours on a rolling schedule.  We didn't want to extend database upgrade downtimes for this feature, even if we did ONLINE=ON rebuilds.  
  5. Since these requirements were really complex there was fear among my colleagues and management.  We added an MD3 configuration parameter to quickly disable the "data compression" code if it was found to be problematic.  
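
Here's a minimal sketch of how that decision logic might hang together inside MD3.CreateIndex.  The names (@RequestedCompression, @OverrideReplication, MD3.Config) are illustrative, not the actual MD3 API; the catalog views are the real ones:

```sql
DECLARE @SchemaName sysname = N'dbo',
        @TableName  sysname = N'Orders',
        @IndexName  sysname = N'IX_Orders_CustomerId',
        @RequestedCompression nvarchar(60) = N'ROW',  -- Requirement 1 default
        @OverrideReplication  bit = 0;                -- Requirement 3 escape hatch

DECLARE @ObjectId int = OBJECT_ID(QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName));

-- Requirement 5: a global kill switch held as MD3 configuration
IF EXISTS (SELECT 1 FROM MD3.Config
           WHERE ConfigName = N'DisableDataCompression' AND ConfigValue = N'1')
    SET @RequestedCompression = NULL;

-- Requirement 2: leave existing indexes on replicated tables alone,
-- unless Requirement 3's override flag is set
IF @OverrideReplication = 0
   AND EXISTS (SELECT 1 FROM sys.tables
               WHERE object_id = @ObjectId
                 AND (is_replicated = 1 OR is_published = 1))
    SET @RequestedCompression = NULL;

-- Compare the requested property to the current setting; only a mismatch
-- queues work, which runs off-hours via Service Broker (Requirement 4)
SELECT p.data_compression_desc AS CurrentCompression,
       @RequestedCompression   AS RequestedCompression
FROM sys.indexes i
JOIN sys.partitions p ON p.object_id = i.object_id AND p.index_id = i.index_id
WHERE i.object_id = @ObjectId AND i.name = @IndexName;
```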

Imagine having to code all of those rules into every index DDL script you have.  Very daunting and error-prone.  With MD3 this was a few days of coding and quite a few days of testing, mostly testing the effects on replication.  We automated as many tests as we could with tSQLt.  We actually didn't have to touch any of the individual MD3 index scripts, merely the code for MD3.CreateIndex.  Although I coded and tested Requirement 5 (the on/off switch) we never had to use it, because we had CIT (continuous integration testing) in place exercising various customers' databases in our performance lab, and we worked out the implementation bugs well before the first production deployment.  

I haven't open-sourced any of this "custom" data compression code because it's frankly confusing and very specific to this ISV's implementation.  However, MD3 does support data compression as a standard property.  

A custom FILLFACTOR

MD3.CreateIndex fully supports FILLFACTOR.  I worked for a large ISV that implemented MD3, and most of their indexes used a FILLFACTOR of 100, with a few carefully-tested exceptions.  Quick review: MD3.CreateIndex takes @FillFactor as a parameter.  If @FillFactor does not match the index's current fill factor, MD3 detects a property change and rebuilds the index so it matches the requested properties.  
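
Here's a sketch of what that property comparison looks like (the table and index names are made up).  One subtlety: sys.indexes reports a fill_factor of 0 when an index uses the server default of 100, so the two values have to be treated as equivalent:

```sql
-- Compare the requested fill factor to the live one; 0 and 100 mean the
-- same thing as far as SQL Server is concerned.
DECLARE @FillFactor tinyint = 90,   -- the value requested by the MD3 script
        @Current    tinyint;

SELECT @Current = fill_factor
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Orders')
  AND name = N'IX_Orders_CustomerId';

IF ISNULL(NULLIF(@Current, 0), 100) <> ISNULL(NULLIF(@FillFactor, 0), 100)
    PRINT 'Property mismatch: the index would be rebuilt with the requested FILLFACTOR.';
```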

Our release's scripts were tested using CIT on various customer databases and worked flawlessly.  But our largest customer had a HUGE database, so huge that they couldn't afford (read: didn't want to pay for) a non-prod FULL COPY of their database.  They instead had some custom scripts that copied all master tables to their staging environments and then a "representative" set of recent transactional data.  These scripts were quite intricate, since they had to copy data to staging while maintaining relational fidelity.  Frankly, I always thought this process would blow up, but it always seemed to work.  Their staging database was less than 1% the size of their prod db using this process.  

Their staging db upgraded to the next release perfectly, and we estimated that applying those MD3 scripts to their HUGE prod db would take about an hour of downtime (we did a "deferred deployment", discussed below).  Using MD3 I never got anxious during a customer upgrade event because I knew our scripts were rigorously tested.  However, that was about to change.  

At 3AM on the Saturday morning of the upgrade event I awoke to my cell phone ringing.  This customer's MD3 upgrade had been running for 6 hours and appeared to be rebuilding EVERY index in their database.  Unfortunately I'd had a few adult beverages the night before, so it took me some time to get ready for a bridge call to discuss the problem.  When I looked at MD3.Log I noticed that, indeed, every index was being rebuilt due to a property change...the fill factors had changed.  I queried sys.indexes and saw that each of their not-yet-upgraded indexes had its fill factor set to 90.  Huh?  

It turns out that over the past year the customer's DBAs had decided to change the fill factor on ALMOST EVERY index in the database to 90.  It took them a year to do this, rebuilding a few indexes each night.  Now MD3 was undoing all of their work.  

People were panicking.  Our engineers were pointing fingers at the customer..."why did you change our indexes?".  The customer DBAs were blaming us for not properly testing our upgrade.  Plans were being discussed to roll back the database and redeploy the previous release's JVMs and try the upgrade again later.  

Meanwhile, this wasn't a problem for MD3.  While tempers flared I stopped the deployment and modified 3 lines of code in MD3.CreateIndex to ignore FILLFACTOR differences when examining existing index properties.  I ran a couple of quick tSQLt tests and told everyone on the call that I was ready to restart the deployment.  I was calm and confident, which caused everyone else to calm down.  We restarted the deployment, which completed this time in about 20 minutes.  
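
The hotfix amounted to gating the fill factor comparison behind a switch, something like this (all names here are illustrative, not the actual MD3 code):

```sql
-- Sketch of the hotfix: while the switch is on, fill factor mismatches no
-- longer mark an index as needing a rebuild.
DECLARE @IgnoreFillFactor    bit = 1,       -- the mid-deployment hotfix switch
        @CurrentFillFactor   tinyint = 90,  -- what the customer's DBAs set
        @RequestedFillFactor tinyint = 100, -- what our scripts requested
        @NeedsRebuild        bit = 0;

IF @IgnoreFillFactor = 0
   AND ISNULL(NULLIF(@CurrentFillFactor, 0), 100)
       <> ISNULL(NULLIF(@RequestedFillFactor, 0), 100)
    SET @NeedsRebuild = 1;

SELECT @NeedsRebuild AS NeedsRebuild;  -- 0: the customer's fill factors survive
```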

Notice how easy it was to overcome a potentially devastating problem with MD3.  That would be much harder to do with other deployment tools.  

As a side note...this customer's "custom fillfactors" spawned talks with them about why they decided to alter our fill factors.  After we looked at their fragmentation statistics and thought through our application data access, we decided that for many tables a fill factor of 90, instead of 100, was indeed warranted.  So we slowly implemented this change for every customer over the next few releases.  

Another MD3 win.  

Daily Software Releases to Production

When I started consulting at one particular ISV they released new versions of their software every 18 months.  Very anti-agile for a company that claimed to be agile.  There are so many problems with 18-month release cycles, and I don't want to get off on a tangent, but the biggest is that when a release comes so infrequently it leads to a mentality of only testing the release deployment in the last few months of the cycle.  This is one of those cases where "waterfall" development really does suck.  

Suffice it to say that the release deployment was rarely tested properly and customers cringed whenever a new release was deployed.  These events always caused excessive downtime and required custom DDL to get the deployment to succeed.  We had customers that flat-out refused to upgrade their application because of this.  One customer was running a 5-year-old version of our software on an unsupported release of SQL Server.  During user group conferences the biggest customer complaint was botched upgrade events.  We asked for beta testers for each release but rarely found any willing customers.  Regulatory changes drove customers to upgrade far more than any desire for new functionality or bug fixes.  

We had to change these mentalities.  MD3 lends itself to proper CIT (see [[Continuous Integration Testing with MD3]]) and repeatable deployments with less hand-crafted DDL.  The first release that utilized MD3 was a complete success.  Our sole beta customer raved about the fact that we met our downtime goal (by HOURS) and we had no "Day One Impacts".  Then our "early adopters" reported similar successes.  We managed to deploy that first MD3 release without any significant issues for any customers.  

Morale improved...customer morale, developer morale, support morale.  

The next release had 3 customers volunteer to be beta testers.  Again, no major issues.  The upgrade experience changed so much that customers were now requesting more frequent releases to get bugs fixed and new functionality quicker.  And we delivered.  I won't lie and say there were no issues and customers were completely happy with our software, but deployments were no longer their top complaint...not even in the Top Ten.  

We even had one customer where we released new versions of our software on a weekly basis.  NO LIE.  The customer realized, as we did, that more frequent releases actually manage risk better than infrequent, monolithic deployments.  With good automation in place it was no more difficult to deploy to that customer than it was to deploy to a QA environment.  Just run the deployment.  

I firmly believe that we could've delivered daily releases.  That may sound radical, but using MD3 we automated database deployments, CIT, and unit testing.  It was common for a bug fix to be properly QA'd and fully regression-tested the same day it was written.  

DevOps and NoOps

DevOps seems to be the new thing.  DevOps is a software development method where your developers and Ops Guys share responsibilities and work together.  MD3 encourages this.  In some shops DBAs may find a perf problem and immediately apply an index with hand-crafted DDL (or via the SSMS GUI).  MD3 syntax is so easy that the same change can be made, and properly checked in to the next build, by an Ops Guy.  Some DBAs don't see the value in source control for day-to-day DBA tasks; I've seen this mentality change first hand with the introduction of MD3.  I don't see this happen with other deployment tools where the DBA may not have a license or access to the tool (Visual Studio database projects come to mind).  
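
To give a flavor of that syntax, here's roughly what a checked-in MD3 index script looks like.  @FillFactor is a real MD3.CreateIndex parameter (discussed above); the other parameter names are illustrative, so check the MD3 documentation for the exact signature:

```sql
-- A declarative, rerunnable index script: MD3 compares the requested
-- properties to what's in the database and only does work on a mismatch.
EXEC MD3.CreateIndex
    @SchemaName = N'dbo',
    @TableName  = N'Orders',
    @IndexName  = N'IX_Orders_CustomerId',
    @KeyColumns = N'CustomerId',   -- illustrative parameter name
    @FillFactor = 90;
```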

As a data architect I can often guess at what indexes a new table may need during the design phase of a feature.  However, I don't overthink indexes anymore with MD3.  Instead I make my best guess and then let our performance engineers and DBAs monitor the system (missing index DMVs, query plan analysis, etc) and make their own index changes by modifying the MD3 index scripts.  This makes everyone much more productive.  Index selection, IMHO, should be the domain of data professionals who understand indexing, not Java developers who create a table for simple persistence but forget to apply a PK or a clustered index.  That job can be passed off to an expert.  

For a while our team was a true DevOps team with data developers, architects, modelers, perf engineers, and DBAs.  We touted it and loved it.  But we realized over time that we were becoming a NoOps team.  There was less operational work to be done as we automated things properly with tools like MD3 that are easy to automate.  We still had DBAs focused on replication, clustering, HADR, backups, etc.  But they spent less time on support tasks as we introduced more automation, usually with MD3.  For instance, I have a fully automated MD3 module that covers general replication tasks; one simple MD3 command adds an article to a publication if it doesn't already exist (sketched below).  
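
At its core, an idempotent "add article" command is an existence check wrapped around the standard replication procedures.  The sketch below runs in the published database; the real MD3 module handles snapshots and error cases that are omitted here:

```sql
-- Add an article to a transactional publication only if it's missing,
-- making the script safe to rerun on every deployment.
DECLARE @Publication sysname = N'MyPublication',
        @SchemaName  sysname = N'dbo',
        @Article     sysname = N'Orders';

IF NOT EXISTS (SELECT 1
               FROM dbo.sysarticles a
               JOIN dbo.syspublications p ON p.pubid = a.pubid
               WHERE p.name = @Publication AND a.name = @Article)
BEGIN
    EXEC sp_addarticle
        @publication   = @Publication,
        @article       = @Article,
        @source_owner  = @SchemaName,
        @source_object = @Article;
END
```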

Don't worry, we didn't fire our Ops Guys.  Instead they embraced these changes and became Dev Guys focused on automating operational tasks and lending operational-domain expertise to Dev Guys.  Our product improved because our ilities (non-functional requirements)...like supportability...improved.  Our Dev Guys now started to question their decisions and think more about the future supportability of the product they were building.  

Here's an example.  I see a lot of shops that automate builds and non-prod deployments using tools like ant and Jenkins.  That's great.  But when it comes time for the production deployment they revert back to manual deployments done by DBAs and Ops Guys.  This makes no sense.  "Prod" is just another environment that should be deployed to using the same tooling as any other environment.  The usual argument is that a prod deployment contains extra steps like backups, notifications, replication teardown/setup, user smoke tests, etc.  But that argument is flawed.  Those items CAN and SHOULD be automated, and performed and tested in the non-prod environments as well.  The goal is repeatable processes.  

Once you get to this level you are no longer a DevOps shop, you are a NoOps shop.  MD3 doesn't magically make this happen, but it is one tool and set of processes and practices that can help you get there better and faster.  

Deferred Deployments

One ISV that implemented MD3 was very concerned about unnecessary downtime caused by schema changes that could just as easily be executed shortly after the system was brought back online.  Index changes are a good example.  Some index changes must occur with the application upgrade because the index may introduce a new unique index/key.  Other index changes are purely for performance and can be deployed after the system is live using ONLINE=ON...if possible.  

This was a breeze to implement in MD3.  We added a @Defer parameter to MD3.CreateIndex (for example) that marks time-consuming changes to be run after the deployment using Service Broker.  Our downtime deployment then consisted of what was minimally needed for the application to be brought back up.  Generally this was only table changes and changes to database "methods" (views, functions, stored procs).  This reduced downtime deployments to just a few minutes.  
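
A minimal sketch of the deferral pattern (MD3.DeferredWork and the variable names are illustrative; the real implementation differs): instead of executing expensive DDL inside the downtime window, the command is queued and drained afterwards by a Service Broker activation procedure:

```sql
-- Defer expensive, non-critical DDL until the system is back online.
DECLARE @Defer    bit = 1,
        @IndexDdl nvarchar(max) =
            N'ALTER INDEX IX_Orders_CustomerId ON dbo.Orders
              REBUILD WITH (ONLINE = ON, FILLFACTOR = 90);';

IF @Defer = 1
    -- queue it; a Service Broker activation procedure drains this table
    -- off-hours, long after the downtime window has closed
    INSERT INTO MD3.DeferredWork (RunOrder, SqlCommand)
    VALUES (100, @IndexDdl);
ELSE
    EXEC sp_executesql @IndexDdl;  -- run now, inside the downtime window
```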

Later we dropped the downtime to 7 seconds using ALTER SCHEMA...TRANSFER and MD3...but that's a different post.  

Summary

The above stories are just a few use cases where MD3 and the underlying principles of declarative, metadata-driven deployments helped my clients overcome daunting database deployment hurdles with ease.  There is a perception in our industry that relational schemas are too difficult to "evolve" over time, leading to the introduction of schema-less NoSQL solutions.  The fact is, relational schemas are easy to evolve when you have the correct tools and practices.  We had difficult problems, and we solved them quickly because good processes were in place.
