DaveWentzel.com            All Things Data

Data Architecture

MD3 Script Idempotency and MD3.CreateColumn

Quick Links
Introduction to Metadata Driven Database Deployments
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

In the last few posts of my MD3 blog series I've focused on how MD3 handles "stateful" database object deployments.  Today I'm going to cover the most important stateful object in your database...the table.  You really want to make sure you get your table scripts correct, otherwise you risk data loss.  Basic CREATE TABLE DDL is already really simple to understand, so MD3 doesn't get in your way with some cumbersome new syntax that you need to learn.  However, MD3 adds some scaffolding code around the DDL to help you make your deployments bulletproof.  Here's an example:  
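A minimal sketch of that scaffolding, with illustrative table and column names (the real script-generator output will differ):

```sql
-- Sketch of the MD3 table-script scaffolding (names are illustrative).
IF NOT EXISTS (SELECT 1 FROM sys.tables t
               JOIN sys.schemas s ON s.schema_id = t.schema_id
               WHERE s.name = 'dbo' AND t.name = 'Example')
BEGIN
    -- CREATE section: runs only on a "net new" database.
    CREATE TABLE dbo.Example (
        ExampleId int          NOT NULL,
        GroupName varchar(500) NOT NULL
    );
END
ELSE
BEGIN
    PRINT 'dbo.Example already exists...skipping CREATE.';
END

-- Alter section: the stateful history that must be hand-maintained.
-- Calls to MD3.CreateColumn and any data migration code go here.
```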

MD3 includes a script generator that will put your existing tables into the above format automatically and will be covered in How to Reverse-Engineer Your Database with MD3.  But the script generator will not handle the "Alter" section of the script since this is the stateful history that you must provide.  Sorry, but there's no way around that...no tool will provide this for you.  

First, all MD3 scripts must be idempotent

An idempotent operation is one that can be applied multiple times without changing the result beyond the initial application.  In SQL this is an operation that can be "rerun" without having any unintended side effects.  For instance, this piece of SQL is NOT idempotent:  

UPDATE dbo.Employees SET Salary = Salary * 1.10

Every time you run this code it will give an additional 10% raise to every employee.  
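One way to make a change like this idempotent is to guard it so a rerun is a no-op.  A minimal sketch; the dbo.AppliedScripts tracking table is hypothetical, just one of several ways to do this:

```sql
-- Guard a one-time data change with a (hypothetical) tracking table.
IF NOT EXISTS (SELECT 1 FROM dbo.AppliedScripts
               WHERE ScriptName = 'raise-salaries-10pct')
BEGIN
    UPDATE dbo.Employees SET Salary = Salary * 1.10;
    INSERT INTO dbo.AppliedScripts (ScriptName) VALUES ('raise-salaries-10pct');
END
-- Rerunning this script is now a no-op: the "F5 F5 test" passes.
```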

Every MD3 script must be idempotent, or "rerunnable", without "side-effects".  This is because MD3 runs ALL scripts during each deployment.  Some folks struggle with this concept.  Two suggestions:  ensure that you create proper CI testing (see Continuous Integration Testing with MD3) and whenever you write a script simply press F5 twice, then see if the results are what you expected.  Some of my clients call this the "F5 F5 test".  After a few weeks you'll be writing rerunnable scripts without even realizing it.  


MD3.CreateColumn will help you add or alter a column based on the properties you provide.  Here are the parameters:  
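The parameter list can be sketched roughly as follows; only @Length and @DropColumn are confirmed by this post, so the remaining parameter names are assumptions (check the MD3 source for the real signature):

```sql
-- Illustrative call; parameter names other than @Length and @DropColumn
-- are assumptions -- check the MD3 source for the actual signature.
EXEC MD3.CreateColumn
    @SchemaName = 'dbo',
    @TableName  = 'Example',
    @ColumnName = 'GroupName',
    @DataType   = 'varchar',
    @Length     = 500,
    @IsNullable = 0,
    @DropColumn = 0;
```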

There are some benefits to using MD3.CreateColumn over standard DDL:  

  • removes constraints (unique, foreign key, check, and PK) if the column change requires those objects to be dropped
  • removes indexes from the column, if needed
  • less esoteric DDL to remember

When you drop a column with @DropColumn (instead of using ALTER TABLE...DROP) MD3 will automatically remove the dependent objects and will print a message reminding the developer to remove those objects from source control.  

Back to our Example

All MD3 scripts must be rerunnable.  Note that the table script pattern above isn't required with MD3, but I've found this is the best pattern to use.  In this case the script is rerunnable because the table will only be created on the first execution of the script.  On subsequent executions only the PRINT statement will be output.  

Let's assume that you have a new requirement to add a URL column to this table.  Using the above pattern this is how I would do it.  Note that we add the URL code in two places...the CREATE TABLE statement (covers a "net new" database) and as a call to MD3.CreateColumn (covers upgraded databases).  On the second and subsequent executions of this script nothing will happen since the URL column has already been created as declared.  
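Hedging on exact parameter names (only @Length and @DropColumn appear in this post; the rest are illustrative), the two additions might look like:

```sql
-- 1) Net new databases: the column is added to the CREATE TABLE statement
--    inside the existence check, e.g.  URL varchar(256) NULL
-- 2) Upgraded databases: the same column is declared in the Alter section.
EXEC MD3.CreateColumn
    @SchemaName = 'dbo',
    @TableName  = 'Example',
    @ColumnName = 'URL',
    @DataType   = 'varchar',
    @Length     = 256,
    @IsNullable = 1;
```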

Interspersing DDL with data migration code

Some shops absolutely insist on separating DDL from data migration code.  This is wrong.  In many cases you must run data migration code either before or after your DDL runs.  In these situations the data migration code is tightly coupled with the DDL and should be stored together.  This aids in documentation and understanding for future developers.  

Let's look at a simple, contrived example.  For some reason you have a requirement to shrink the GroupName column from varchar(500) to varchar(100).  Further, for existing data the requirement is simply to retain the first 100 characters.  Here's how you would accomplish that using my MD3 patterns.  First, you would change the definition in the CREATE TABLE statement (omitted from the screenshot).  Then you would run the UPDATE statement to retain existing data, then you would make the call to MD3.CreateColumn, changing the parameter for @Length.  As a side note, the UPDATE statement must be placed in dynamic SQL, otherwise the script will not compile on net new databases because the table does not actually exist yet.  This is goofiness with TSQL, not with MD3.  You can of course add any additional code you may want here.  A backup of the table (or at least the GroupName column) would be a wise idea, just in case.  
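A sketch of that pattern, using the running dbo.Example/GroupName names; the MD3.CreateColumn parameter names beyond @Length are assumptions:

```sql
-- Only migrate if the column is still wider than the new declared length.
IF EXISTS (SELECT 1 FROM sys.columns
           WHERE object_id = OBJECT_ID('dbo.Example')
             AND name = 'GroupName' AND max_length > 100)
BEGIN
    -- Dynamic SQL so the script still compiles on net new databases.
    EXEC sp_executesql N'UPDATE dbo.Example SET GroupName = LEFT(GroupName, 100);';
END

-- Then declare the new, shorter length.
EXEC MD3.CreateColumn
    @SchemaName = 'dbo', @TableName = 'Example', @ColumnName = 'GroupName',
    @DataType   = 'varchar', @Length = 100, @IsNullable = 0;
```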

Add a non-nullable column to a table

This is another common use case that I see.  This is a three step process:  

  1. Add a nullable column to the table
  2. UPDATE the column to the default value for existing data per your requirements.
  3. Set the column to NOT NULL.  

Here's how we do it with MD3.  If you are sensitive to downtime and this is a large table, we can do most of this work while the system is online if we prepare properly and write the code to accommodate this.  

Here we've added an AccountingCode column and populated existing data with a "default" code, then set the column to NOT NULL.  
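The three steps can be sketched like this; the parameter names beyond @Length are assumptions and the 'UNKNOWN' default code is illustrative:

```sql
-- Step 1: add the column as nullable.
EXEC MD3.CreateColumn
    @SchemaName = 'dbo', @TableName = 'Example', @ColumnName = 'AccountingCode',
    @DataType   = 'varchar', @Length = 20, @IsNullable = 1;

-- Step 2: backfill existing rows (dynamic SQL so net new databases compile).
EXEC sp_executesql
    N'UPDATE dbo.Example SET AccountingCode = ''UNKNOWN''
      WHERE AccountingCode IS NULL;';

-- Step 3: redeclare the column as NOT NULL.
EXEC MD3.CreateColumn
    @SchemaName = 'dbo', @TableName = 'Example', @ColumnName = 'AccountingCode',
    @DataType   = 'varchar', @Length = 20, @IsNullable = 0;
```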


All MD3 scripts must be rerunnable.  This is, frankly, a good habit to get into.  You never know when someone will rerun your script without your knowledge, so scripts you write should never have "side-effects".  We then looked at the pattern to use for MD3 table scripts.  You don't have to use this pattern, but in my experience this pattern works best.  There are scripts available with MD3 that will reverse-engineer your database tables into this pattern.  

In the next post we'll take a break from the technical aspects of MD3 and look at The OO Design Principles of MD3.  When you understand these principles you'll see how a tool like MD3 can solve a lot of problems that DDL can't.  

You have just read "MD3 Script Idempotency and MD3.CreateColumn" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3.CreatePrimaryKey in Action

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last few posts we covered MD3 and "state" which showed how MD3 deploys "stateful database objects" using The Four Rules.  In that post we covered how MD3.CreateIndex handles stateful index deployments.  In The Other MD3.Create Procedures we covered the other stateful object deployment procedures briefly.  In this post I want to cover some use cases where MD3.CreatePrimaryKey can really help you evolve your database without having to code complex, error-prone DDL.  

In the AdventureWorks sample database there is a table named HumanResources.Department.  The PK is clustered and there is also one non-clustered index called AK_Department_Name.  

After doing some performance testing you determine that the table should really be clustered on Name and the PK should instead be non-clustered.  That seems like an easy request so you start by trying to drop the existing clustered PK first.  

But as you can see, that won't work because there is an FK that references that PK that would need to be dropped first.  Dropping the FKs, in turn, will require you to recreate those FKs later, and any associated indexes, which is a lot of code to write, test, and deploy.  

In MD3 this is much simpler.  In MD3 and "state" I showed how there are really two ways to deploy stateful objects...by calling the MD3 procedures directly or by populating a metadata table with the desired properties.  

Using the metadata table script

This is by far the easiest method.  The metadata table that holds PK information is called MD3.PrimaryKeys.  Here is what the (abbreviated) deployment script looks like for AdventureWorks' primary keys:

It's a little hard to read, so here are just the metadata entries specific to HumanResources.Department:  

Since MD3 is declarative I only need to change the entry for the IsClustered property and I'm done.  Here is the code I changed...the IsClustered property was changed from 1 to 0:
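Conceptually, the one-property change amounts to something like this (the MD3.PrimaryKeys column names are assumptions based on this post):

```sql
-- Conceptual equivalent of flipping IsClustered from 1 to 0 in the
-- declared metadata (column names are assumptions).
UPDATE MD3.PrimaryKeys
SET    IsClustered = 0
WHERE  SchemaName = 'HumanResources'
  AND  TableName  = 'Department';
```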

Now let's execute that script just like a standard deployment.  Here is the output:  

This is visible to the developer executing the script on her local development database, as well as in the logs when this is finally deployed to your production databases.  Since we are modifying a PK that requires the associated FKs to be dropped, a message is logged to remind the developer that either the FK MD3 scripts will need to be rerun, or, if running a full MD3 deployment, this dependency is handled automatically and the FKs will be recreated.  No custom DDL...very fast.  

For a point of reference, here is the FK MD3 script that the developer would run, or modify as necessary.  And of course all of this can be run automatically using the MD3 deployer (RunMD3.ps1:  The MD3 Executor).  

And when you run the FK script you will see this output:  

Note that all FKs already exist EXCEPT the FK that was automatically removed when we altered the PK.  In the post The Other MD3.Create Procedures I mentioned that all FKs will try to first be applied as trusted, and you can see that this is in fact the case.  

Using a direct call to MD3.CreatePrimaryKey

About half of my MD3 clients don't like keeping all of their PK properties in one metadata table (see the screenshots above).  Instead, some like to create their stateful objects using calls to the MD3 procedures directly.  They keep these calls in .sql files segregated by table name.  There is a scripting utility that will script your existing stateful database objects into either the metadata format or direct MD3 calls (nodetitle).  

HumanResources.Department is currently configured with its PK as non-clustered.  This is what the direct call would look like.  Note that when I executed this I simply see "Command(s) completed successfully" because, according to MD3 and The Four Rules, the object already exists with the same name and properties, so it silently continues.
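A sketch of what that direct call might look like; per the previous post MD3.CreatePrimaryKey takes the same parameters as MD3.CreateIndex, but the exact names below are assumptions:

```sql
-- Illustrative call; parameter names are assumptions modeled on MD3.CreateIndex.
EXEC MD3.CreatePrimaryKey
    @SchemaName  = 'HumanResources',
    @TableName   = 'Department',
    @PKName      = 'PK_Department_DepartmentID',
    @ColumnList  = 'DepartmentID',
    @IsClustered = 0;
```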

Let's say that after a while you determine that your decision to make DepartmentId a non-clustered PK was a bad idea and it really should in fact be a clustered index.  What do we do?  Write a bunch of hand-crafted, error-prone DDL?  Nope, we simply change the @IsClustered flag.   When a developer does this she sees these messages:  

Note that the FKs were automatically dropped and a helpful message tells the developer to update the FK scripts if necessary.  These are the same messages that will be seen during standard deployments as well.  

Summary and Benefits of Using MD3

The goal of this post was to cover how the MD3 scripts in general, and MD3.CreatePrimaryKey in particular, can spare you from writing a lot of complex, error-prone DDL.  It is much faster to make these changes using MD3 and then have the proper CI testing in place (nodetitle) to ensure all valid deployment paths are covered.  

There are certainly lots of GUI-based tools that will also drop and recreate your dependent FKs whenever a PK changes.  However, most of the implementation details are hidden from the developer and DBA.  What if you need a custom deployment mechanism that those tools don't provide?  MD3 can be customized...MD3.CreatePrimaryKey is just TSQL.  In MD3 Extensibility Stories I'll give you a bunch of common use cases where MD3 can provide a quick solution where these tools cannot.  Further, since MD3 is just a bunch of TSQL calls these scripts can be properly versioned so you can quickly see the evolution of your database from release to release.  That's much harder to do with a GUI compare tool.  



You have just read "MD3.CreatePrimaryKey in Action" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

The Other MD3.Create Procedures

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last few posts we covered nodetitle, which showed how MD3 deploys "stateful database objects" using The Four Rules.  If you handle The Four Rules correctly (and MD3 can greatly help you with this) then your database deployments should be a breeze.  No more hand-crafted, error-prone DDL.  In this post we are going to cover the other MD3.Create* procedures which help with database deployments and The Four Rules.  

The Other Stateful Database Objects

Here are the other stateful database objects and how MD3 handles their deployment.  In most cases this is handled with an MD3.Create* procedure.  These procedures deploy your stateful objects "smartly".  

  • Tables/columns: Tables are created using standard DDL and can be scripted using SSMS tools.  Columns can be added or altered using MD3.CreateColumn; this is discussed further in the post MD3 Script Idempotency and MD3.CreateColumn (source code).  
  • Indexes: This was covered in the post MD3 and "state" using MD3.CreateIndex.  Source code.  
  • Primary keys: We'll cover this below.  Source code for MD3.CreatePrimaryKey.  
  • Foreign keys: We'll cover this below.  Source code for MD3.CreateForeignKey.  
  • CHECK constraints: We'll cover this below.  Source code for MD3.CreateCheckConstraint.  
  • DEFAULT constraints: Handled with MD3.CreateDefaultConstraint, and covered below.  This may not seem like a "stateful" database object, but it is.  Source code.  
  • Model data (states/provinces, valid order statuses, other "lookup" data): Covered in the post MD3 Model Data Patterns.  

Each of the above referenced MD3.Create procedures handles The Four Rules correctly.  Let's look at these routines.  


Conceptually, primary keys are very similar to indexes.  You can cluster them, give them a name, give them compound keys, put them on a filegroup, etc.  Therefore it shouldn't be surprising that MD3.CreatePrimaryKey takes the exact same parameters as MD3.CreateIndex.  

The difference is what each procedure does under-the-covers.  If you change a PK you may actually have to deal with objects that depend on it.  Here's a list of objects that are dependent on PKs and how MD3 handles PK changes.  You can of course change this functionality to suit your requirements:  

  • If the table is replicated then the table must be removed from any subscribed publications.  Currently MD3 merely throws an error if this situation exists. 
  • Removes all FKs that reference the PK.  MD3.CreatePrimaryKey simply removes them.  Later, when MD3.CreateForeignKey is executed (remember, MD3 handles the dependent ordering of objects for you) the FKs will be recreated properly with the new keys.  
  • If we are changing the PK to be clustered we first drop any existing clustered index on the table.  As with foreign keys, MD3 handles the dependencies for you...meaning the dropped clustered index can be recreated later as a non-clustered index if required.  

In the next post, MD3.CreatePrimaryKey in Action I'll show you specific use cases where MD3 can save you vast amounts of time and rework when you have to modify a primary key.  Stay tuned.  


FKs are stateful database objects because they require a properly-applied "parent" key to be in place first.  FKs can also be very expensive to recreate if the associated tables are large.  Here are the parameters (properties) for MD3.CreateForeignKey:

Like the other stateful object creator routines, MD3.CreateForeignKey properly implements The Four Rules.  The parameters should be self-explanatory, except maybe @AllowNoCheck.  Constraints can be either TRUSTED or UNTRUSTED. When you create a FK using ALTER TABLE...WITH CHECK ADD CONSTRAINT you are saying that existing data in the table must adhere to the constraint, as well as any future data.  When you create a FK using ALTER TABLE...WITH NOCHECK ADD CONSTRAINT you are allowing existing data to violate the constraint; it will be enforced for future data only.  This is an untrusted constraint.  There are two reasons why untrusted constraints are frowned upon:  

  1. It means you may have data that violates your constraint and you should probably determine why that is, and fix it.  
  2. Some query plans will be less-optimal with an untrusted constraint, leading to possible performance issues.  

MD3 by default does NOT allow untrusted FKs to be created.  However, the @AllowNoCheck allows you to override that, if needed.  MD3 will try first to create the constraint using WITH CHECK and then will fallback to WITH NOCHECK if you have @AllowNoCheck set.  MD3 throws a WARNING in this case that you can then research later.  
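For reference, here are the two flavors of FK creation in plain DDL (table and constraint names are illustrative):

```sql
-- Trusted: existing rows are validated against the constraint.
ALTER TABLE dbo.Orders WITH CHECK
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);

-- Untrusted: existing rows are NOT validated; only future data is checked.
-- Per the post, MD3 falls back to this form only when @AllowNoCheck is set,
-- and logs a WARNING.
ALTER TABLE dbo.Orders WITH NOCHECK
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);
```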


You may wonder why CHECK constraints are stateful database objects.  The reason is that recreating a CHECK constraint on a large table can be EXTREMELY time consuming as all data must be read and evaluated.  We also need to take into consideration whether the constraint is trusted or not, just like with foreign keys.  Here is the set of parameters that MD3.CreateCheckConstraint uses:  



Defaults are treated as stateful objects because they are declared very similarly to CHECK constraints, and the code that deploys them is very similar.  


In nodetitle I covered how MD3 covers stateful database objects, specifically indexes.  In this post I covered how MD3 deploys the other stateful database objects like PKs, FKs, CHECK constraints, and defaults.  In MD3 Script Idempotency and MD3.CreateColumn I'm going to cover what is probably the most important stateful database object...the table.  In the next post, MD3.CreatePrimaryKey in Action I'll show you specific use cases where using MD3 instead of hand-crafted DDL can save you a lot of grief.  

You have just read "The Other MD3.Create Procedures" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 and The Four Rules

In the last few posts I covered what MD3 is (Metadata Driven Database Deployments) and why it is better than hand-crafted DDL.  In this post we are going to cover The Four Rules of Database Deployments.  Regardless of whether you decide to use MD3 or not, if you understand The Four Rules your database deployments will run much smoother.  

Some Background
In the last post (nodetitle) we covered a scenario where we had to write really complex DDL if we wanted to deploy an index change properly.  We need an index, called MyIndex, created on dbo.Example with a fill_factor of 90, on column Foo, with included_column Bux.  Here is the DDL:
DDL is kinda esoteric and difficult to remember, as seen above.  DML for master/seed/model/system data that you may need to deliver is similarly complex (we'll look at this problem in the post nodetitle).  In MD3 DDL is much simpler.  We just "declare" the properties of our object and let MD3 determine what needs to be done to get us to "model".  Basically, we enter "metadata" about our object into an MD3 table (for instance MD3.Indexes), which looks very similar to the "sys" system tables (in this case sys.indexes).  Here's a screenshot that shows how to add an index to the HumanResources.Department table:   
Pretty simple.  We have a metadata table called MD3.Indexes that looks VERY similar to sys.indexes.  We "declare" our new index with its desired properties right into that table, then we run MD3.  Under the covers MD3 is making calls to MD3.CreateIndex (which you can call instead of populating the MD3.Indexes table) similar to the nearby graphic that is adding the index nciFoo to dbo.Example.  
Each parameter to MD3.CreateIndex (and each column in the MD3.Indexes table; see above screenshot) is a metadata property that we can declare for an index.  These properties cover just about every option you can think of regarding an index.  There are some properties missing...for instance, there is no option for filtered indexes because I've never really found a good use case in my career where I absolutely needed a filtered index.  Since MD3 is open source you could always add this functionality if you needed it.  Here are some other index properties that are missing and why:
  • WHERE <filter predicate>: filtered indexes.  You can always add this yourself.  
  • IGNORE_DUP_KEY: I've never found a good use for this option.  
  • STATISTICS_NORECOMPUTE: I've never found a need for this.  
  • DROP_EXISTING ON|OFF: MD3.CreateIndex will smartly determine whether this should be set to ON or OFF based on what is being asked.  
  • ONLINE ON|OFF: MD3.CreateIndex will determine whether this option should be used, based on whether the index follows the rules for ONLINE index creation.  Here is another case where MD3 is a huge help.  

So the question is...how does MD3 actually build the DDL under the covers and what makes it so much better than hand-crafted DDL? 

The Four Rules of Stateful Object Deployment
If you always follow The Four Rules when you code your DDL you will always have a successful deployment (no failures!) and you will never do unnecessary DDL. MD3 does all of this for you.  Assume you have some DDL that you need to apply:  
  1. if the object exists by name and by "properties", then silently continue, there is no work to do.  In other words, there was no change in the object between deployments.  The "properties" are the parameters to the MD3.Create procedures (see the screenshot above).  
  2. if an object exists with the same “properties”, but the name is different, then RENAME the object.  Example:  I have an index with the same column list, order, fill factor, on the same filegroup, with the same compression, etc...but it is named differently.  All MD3 does is run sp_rename.  This scenario occurs, for instance, when your DBA adds an emergency index to a table and you check in the same index structure, but you use your naming conventions.  
  3. if the object exists by name, but the "properties" are different, ALTER the object accordingly.  Example, you have an index with the same name and mostly the same properties, but you need to change the INCLUDED col list and FILLFACTOR.  MD3 handles this without esoteric, hand-crafted DDL. Usually an ALTER can bring the current object up to the desired properties, otherwise DROP/CREATE is used.  The most important thing to remember is that YOU don't have to code the DDL and worry about covering all possible scenarios.  
  4. Else, CREATE object with properties as defined.  At this point MD3 realizes this is a missing object and creates it according to your declarations.  
If you don't do The Four Rules correctly you risk, at a minimum, recreating expensive objects at every deployment.  You don't need MD3 to do a DDL deployment, but it makes it a lot easier and less error-prone.  
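The Four Rules can be sketched as pseudocode (this is pseudocode only; the real implementation lives in the MD3.Create* procedures):

```sql
-- Pseudocode sketch of The Four Rules; angle brackets are placeholders.
IF <object exists with this name AND these properties>
    RETURN;                 -- Rule 1: silently continue, no work to do
ELSE IF <object exists with these properties under a different name>
    EXEC sp_rename ...;     -- Rule 2: rename to the declared name
ELSE IF <object exists with this name but different properties>
    ALTER ...;              -- Rule 3: alter (or DROP/CREATE if ALTER can't)
ELSE
    CREATE ...;             -- Rule 4: create the object as declared
```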
An Example
Let's take our example code again and determine what MD3 will do in various situations regarding The Four Rules.  The below table is kinda like the "unit tests" for MD3.CreateIndex (tsqlt unit tests available on request).  
  • Database state: dbo.Example table does not exist.  Expected result: FAILURE.  Why: this scenario will be caught during continuous integration testing and will never happen during a production deployment (see nodetitle).  Covers: none of The Four Rules.  
  • Database state: index nciFoo does not exist.  Expected result: index created.  Why: the index will be created with the declared properties.  Covers: Rule 4.  
  • Database state: an index exists called dbaFoo with the same properties.  Expected result: index renamed.  Why: the index will be renamed to nciFoo using sp_rename, which ensures a duplicate index is not created with the same properties.  Covers: Rule 2.  
  • Database state: nciFoo exists, but with different "properties" than what is declared.  Expected result: index altered.  Why: when possible the index will be ALTERed; otherwise the existing index will be dropped and the desired index created (in some cases, depending on what is being changed, an index cannot be ALTERed).  Covers: Rule 3.  
  • Database state: nciFoo exists with identical properties.  Expected result: silently continue.  Why: there is no work to do.  Covers: Rule 1.  
In this post we looked at The Four Rules of stateful database object deployments.  If you follow The Four Rules when you deploy your objects you will find that you have fewer errors and more reliable, repeatable deployments.  You don't have to use MD3 to use The Four Rules.  You can use any tool, even plain 'ol DDL.  MD3 just makes everything a whole lot easier.  
In the next post, nodetitle, we'll cover how The Four Rules are applied to other stateful database objects.  In a later post we'll cover how MD3's model data patterns work with The Four Rules to ensure your model/system/lookup data deployments work just like your other stateful object deployments.  

You have just read "nodetitle" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 and "state"

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last post (Why do we need another database deployment tool like MD3?) I covered some reasons why database deployments give DBAs and developers fits.  In a nutshell, it's because we write too much hand-crafted DDL.  Even when we use third-party GUI tools to manage our deployments we still hit edge cases where the migration isn't smooth.  So why are schema changes so difficult?  

Database Object State and Builds
Quick Links
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

When you "build" your Java app (...or C# or whatever...) you are really just compiling the code on your machine.  If compilation is successful then you can rest assured that it will compile on, basically, any machine (OS and processor architecture might be exceptions).  The deployment is simply a matter of copying your compiled code.  In the .NET world you usually build a Windows Installer package or perform an "xcopy deployment".  Build and deploy is really easy, as is building the necessary "CI loops" (...they run your unit tests...) to ensure your builds aren't broken when a developer checks in a change.  

Databases work differently.  Just because your scripts compile and deploy on your machine/database does not mean they will deploy anywhere else.
Java code, when it compiles, couldn't care less about the "state" of the previous build.  The code is “stateless”.  Some database code, on the other hand, is “stateless” while some is “stateful”.  Database developers need to worry about “history”, or "state", which is the existing data in the tables that must be maintained into the next release.  The code must do more than just compile, it must be respectful of existing data.  And data isn't the only stateful database object.  You don't want to recreate a bunch of indexes needlessly either.
Stateless objects:
  • Views
  • Procedures
  • Functions
  • Model data (sometimes called seed/lookup/master data) that rarely changes, and when it does it changes via an approved process (ie, a Master Data Management update).  An example is a list of states and abbreviations.

Stateful objects:
  • Tables/columns
  • Indexes
  • Constraints (PKs/FKs/CHECK)
  • Model data that can be altered by the customer.  An example is order status codes.  If you allow your customer to change (or add to) their model data then you don't want to "reset" their model data with your updates...or...maybe you do, depending on the scenario.  
Stateless Object Patterns
Stateless db objects are the easiest to deploy...you simply compile them.  You do not care about what the previous version of the object looks like, you simply create the new version, overwriting the old.  Just like you would overwrite the last Java build with the current one.  Traditionally you code stateless objects using a pattern similar to the graphic to the right.  
Since the object, in this case a stored procedure, is stateless we can simply drop it and re-create it.  When doing that we lose our permissions, so we must re-apply them.  It would be nice if we had the CREATE OR REPLACE pattern for Transact-SQL like Oracle does.  CREATE OR REPLACE is simplified syntax that obviates the need for re-applying permissions and checking if the object exists.  Oh well...maybe in the next release of SQL Server.  
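That traditional pattern looks something like this (object and role names are illustrative):

```sql
-- Traditional stateless-object pattern: drop, recreate, re-grant.
IF OBJECT_ID('dbo.GetEmployees', 'P') IS NOT NULL
    DROP PROCEDURE dbo.GetEmployees;
GO
CREATE PROCEDURE dbo.GetEmployees
AS
    SELECT EmployeeId, Salary FROM dbo.Employees;
GO
-- DROP loses the permissions, so they must be re-applied every deployment.
GRANT EXECUTE ON dbo.GetEmployees TO MyAppRole;
```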
Generally you should deploy your stateless objects near the end of your deployment.  There are some exceptions to this rule:  
  1. You may have object dependencies which require certain stateless objects to be deployed in a specific order.  For instance, if viewA references functionB which in turn references viewC then you have to ensure those objects are applied in a certain order.  
  2. if you have SCHEMABINDING applied to any of your objects and you try to first manipulate a stateful object (such as a table) you'll need to drop the object with the SCHEMABINDING clause first.  

MD3 handles all of this for you and in the demo files I give examples of this and how it is very simple to handle these edge cases. 

Stateful Objects and Patterns
As mentioned before, Java has no "stateful" code...if the code compiles you are good...if it doesn't compile you have problems.  In the database world it is the stateful code that causes database developers fits.  
An example of "stateful" database code that can be maddening is index DDL.  Let's see why that is.  You can download the example code here.   We start by creating a very simple table with a few columns.  Most developers just create their tables and then either worry about the indexes later or let the DBAs worry about it when performance becomes a problem.  At a later time the developers realize that they need to add an index to the Foo column.  If you are using a tool like Visual Studio db projects then you can use very simple CREATE INDEX syntax.  Otherwise you need to do an "existence check" to ensure that a subsequent execution of your script won't generate an "object already exists" error.  
The "existence check" syntax isn't terribly difficult to handle...yet.  Now let's say you notice a lot of page splitting and realize that your table design and data access patterns indicate that you need to change the fillfactor to 90.  Quick...do you remember where the fillfactor settings are stored in the metadata tables so you can modify the "existence check"?  You could just code up something quickly like this:  
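"Something quickly" usually means a drop-and-recreate guard, following the post's dbo.Example/nciFoo running example:

```sql
-- "Quick" version: rerunnable, but it drops and recreates the index
-- on EVERY deployment, even when nothing has changed.
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE name = 'nciFoo' AND object_id = OBJECT_ID('dbo.Example'))
    DROP INDEX nciFoo ON dbo.Example;
CREATE NONCLUSTERED INDEX nciFoo ON dbo.Example (Foo) WITH (FILLFACTOR = 90);
```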
The problem is that you will drop and recreate the index at every deployment.  Again, a tool like VS db projects will obviate the need for this.  Better "existence checking" code would be something like this:    
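A sketch of that better existence check, which consults sys.indexes so the index is only rebuilt when its stored fill_factor differs from "model":

```sql
-- Only touch the index when the stored fill_factor differs from "model".
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = 'nciFoo'
                 AND object_id = OBJECT_ID('dbo.Example')
                 AND fill_factor = 90)
BEGIN
    IF EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = 'nciFoo' AND object_id = OBJECT_ID('dbo.Example'))
        DROP INDEX nciFoo ON dbo.Example;
    CREATE NONCLUSTERED INDEX nciFoo ON dbo.Example (Foo)
        WITH (FILLFACTOR = 90);
END
```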
Our code is getting a little more "advanced" and much more "hand-crafted" than I am comfortable with.  
Here's a common scenario...once your index is released to production your DBAs notice that you have a bunch of Bookmark Lookups causing customers to report application slowness and they believe that they should add a couple of included columns to your index.  The DBAs are nice enough to tell you that they've done this and that you need to make this new index "model code".  You determine this is the new index creation code you need: 
Yikes!  That's getting to be really complicated code.  
Here's a slightly different example.  Your DBA calls you and tells you that he saw a gigantic "blocking storm" and needed to add an index on the Bar column.  You agree with his analysis and you both decide to make this a model index. The first block of code is what the DBA added (note the index name doesn't follow your naming conventions) and the second block is the code you made "model".  Do you see the problem?  
You really should've renamed his index, if it exists, rather than blindly creating your own.  As written, you risk deploying a duplicate index.  
A Better Way to Handle Stateful Objects
DDL is kinda esoteric and difficult to remember, as seen above.  DML for master/seed/model/system data that you may need to deliver is similarly complex (we'll look at this problem later).  In MD3 DDL is much simpler.  Wouldn't it be nice if you could just enter the "properties" of the desired object in a metadata table and SQL Server just did what it needed to do to make that object for you?  MD3 does that for you.  Here's a screenshot that shows how to add an index to the HumanResources.Department table:   
Pretty simple.  We have a metadata table called MD3.Indexes that looks VERY similar to sys.indexes.  We "declare" our new index with its desired properties right into that table, then we run MD3.  
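Conceptually the declaration is just a row in the metadata table, something like this (the column list here is illustrative only...see the documentation for the actual MD3.Indexes definition):

```sql
-- Declare the desired index; MD3 generates the DDL at deployment time.
INSERT INTO MD3.Indexes
    (SchemaName, TableName, IndexName, IsUnique,
     KeyColumns, IncludedColumns, [FillFactor], DropIndex)
VALUES
    ('HumanResources', 'Department', 'nciDepartmentGroupName', 0,
     'GroupName', NULL, 100, 0);
```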

Now let's say you want to alter that index to change your compression settings from NONE to ROW...just update the entry in MD3.Indexes and rerun MD3. Need to add an Included column?  Change the NULL above to the name of the column you want to include and rerun MD3.  Maybe the index is no longer needed...just change the DropIndex column to 1.  
Under the covers MD3 simply cursors over the metadata table (MD3.Indexes) and determines if there is any work to do (more on how this is done in the next section).  It does this by calling the MD3 stored procedures that do the actual work...for instance, EXEC MD3.CreateIndex.  You can call these procedures yourself without using the MD3.Indexes metadata table.  Here's an example call that creates nciFoo.  Let's say you decide to change this index to have a FillFactor of 90...how do you think you would do that?  That's right: change the parameter and rerun the procedure.  Want to change the filegroup (or partition)?  Take a guess at the process.  No complex hand-crafted DDL is required.  
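A call might look like this (parameter names are illustrative; see the documentation for the actual signature):

```sql
EXEC MD3.CreateIndex
    @SchemaName      = 'dbo',
    @TableName       = 'Example',
    @IndexName       = 'nciFoo',
    @KeyColumns      = 'Foo',
    @IncludedColumns = NULL,
    @FillFactor      = 100;
```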
So the question is...how does MD3 actually build the DDL under the covers and what makes it so much better than hand-crafted DDL?  MD3 uses The Four Rules.  We'll cover The Four Rules in the next post.

You have just read "MD3 and "state"" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Why do we need another database deployment tool like MD3?

This is the next post in my MD3 series (Metadata Driven Database Deployment).  MD3 is my build-and-deploy system that I've evolved over the past 15 years. Quick history...I worked at an ISV and we supported 4 versions of our software (Version X back to Version X-3) and we needed to support upgrading any of those versions to "current". We had about 40 customers with the average db size ~40GB. To make matters even more complicated we supported either SQL Server or Oracle.  Both had the same schema...mostly.

Quick Links
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

So we needed a tool to upgrade a database reliably without having non-database developers writing complex, error-prone, hand-crafted DDL. Most of our data developers knew Oracle or SQL Server, but few knew both.  I learned a lot about what works and what doesn't when upgrading a database in a complex data environment like this. I created a database CI loop (also included with MD3) before CI loops were sexy and ubiquitous. Every day we tested our db scripts against real customer databases and ensured everything upgraded and functioned properly. We never worried about the actual customer production upgrade event because our CI loops already upgraded the customer hundreds of times (albeit on older, stripped, and scrubbed databases).  

I've deployed versions of this tool at many different employers and clients. Each has been radically successful.  

  • The developers love how easy it is to make schema changes without hand-crafting DDL.  It is totally flexible and customizable to whatever you need.
  • DBAs and app admins love it because it is scriptable, has customizable logging, and shows the SQL that is being executed.  Nothing is hidden from the DBA.   
  • Management likes the fact that it requires zero training and creates a repeatable, reliable process.  

The latest incarnation, which I haven't yet open-sourced, is a version that will do a near Zero Downtime database upgrade. Contact me if you might be interested in this.  

Why do we need another database deployment tool?

There are lots of database deployment tools out there.  They each work great...mostly.  But development teams and DBAs don't trust them when it comes to complex schema changes.  Don't believe me?  Assume the following scenario...you need to store new data in your database due to a change in business requirements.  There are lots of ways you could do this but you honestly believe the best way is to refactor and split the largest table in your database into 3 new tables.  
How likely are you to propose this solution to your team?  My guess is not very likely.  The reasons are simple: 
  • Modifying a table's schema requires you to "port" the old database structure to the new one.  And also the data.  This is fraught with danger.  Perhaps each version of the table has also experienced schema changes which you also need to take into consideration.  You have to then change keys (both primary and foreign) and clustering as well as all of the non-clustered indexes.  What if each customer/version has a different set of indexes and keys?  
  • Then you need to modify each database object that accesses the table.  All of your stored procs, functions, and views.  Then you need to modify your Java (or C# or whatever).  All of this scares the average database developer.  
  • Since this is a large table how can we guarantee this won't cause our customer excessive downtime while we migrate the data?  How do we verify the customers' data afterwards?  
NoSQL:  "RDBMS schemas are too hard to change"
Altering schemas scares developers so much that an entire cottage industry grew around the notion that relational databases are too darn hard to "evolve" with ever-changing business rules.  One of the primary reasons the NoSQL movement (whenever I hear this I think, "technology or bowel?") grew, especially document stores and key-value stores, is that they touted the fact that their technologies allowed you to modify your schemas without needing to worry about modifying all of that data access code that scares us all.  And there was no need to "port" the data from the old schema to the new.  
But relational schemas really aren't that difficult to evolve.  You really don't need to write a bunch of complex hand-crafted DDL.  Nor do you need a third-party tool, with its edge cases where your data won't migrate to the new schema properly.  So why are schema changes so difficult?  
Just Enough to Whet Your Appetite
MD3 deployments are "declarative", meaning you simply declare what you want your object to look like and MD3 figures out the best DDL commands to run (and you can review them) to get your schema to that destination.  Here is a simple example.  Assume we have a table named dbo.Example and we decide a non-clustered index on Foo would be beneficial to performance.  See the nearby graphic to see how easy this is with MD3.  (Actually, the command is even easier than that, and I'll show you that in the next post.)  But MD3 gets even better.  Let's say a DBA already noticed a performance problem and indexed Foo already but named the index dbaFooPerfProblem.  In that case the index will be renamed.  You won't get a duplicate index.  Let's say later you decide to make nciFoo into a compound index with Foo,Bar as the keys.  Simply changing Line 21 to 'Foo,Bar' will automatically change the existing nciFoo, if it exists, or build a new nciFoo with the declared properties, if it doesn't exist.  
In the next post I'll cover MD3 and "state", which is the key to a good database deployment routine.  

You have just read "Why do we need another database deployment tool like MD3?" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Introduction to Metadata Driven Database Deployments

MD3 (Metadata Driven Database Deployment) is my build-and-deploy system that I've evolved over the past 15 years. Deploying database code tends to be challenging for most organizations, especially ISVs that may support multiple versions of their software at many customers.  MD3 handles this all very elegantly.  Over the next few posts I'll cover why I think MD3 is the best choice for your database deployments.  

Table of Contents

Page Summary
Introduction to Metadata Driven Database Deployments This page.  
Why do we need another database deployment tool like MD3? Covers some scenarios where traditional database deployment tools fail and how MD3 makes hand-crafted DDL a thing of the past.  
MD3 and "state" In this post we cover stateful vs stateless database objects.  It is the stateful objects (those with "history" that we need to consider) that cause our database deployments grief.  We'll cover how MD3 handles this problem.  
The Four Rules The Four Rules will ensure you always deploy your stateful database objects correctly.  You don't need MD3 to use The Four Rules, but it helps.  
nodetitle Covers the other "stateful" database objects and how MD3 handles their deployment.  

MD3.CreatePrimaryKey in Action Shows some use cases with primary keys where MD3 can really save you a lot of time and rework.  We also further demonstrate how The Four Rules work.  
MD3 Script Idempotency and MD3.CreateColumn We cover the last MD3.Create procedure in this post and cover the one aspect of MD3 scripts that is mandatory...scripts must be idempotent, which is just a fancy way of saying that all scripts must be rerunnable.
nodetitle There are certain object-oriented design principles I incorporated into MD3.  These principles are what sets MD3 apart from other database deployment tools.  
nodetitle Model data is the data, usually static, that you deliver with your application.  MD3 deploys this data in much the same way as other stateful objects are deployed.  We cover how this is done in this post.  
RunMD3.ps1: The MD3 Executor RunMD3.ps1 is a short Powershell script that simply executes each .sql file in each subfolder in your deployment folder.  In this post I'll cover how it does what it does, and why.  
MD3 Deployments in Action In this post we put everything together and actually show how deployments work with MD3.  
How to Reverse-Engineer Your Database with MD3 The hardest part of any new deployment tool is getting your database migrated into the new structure.  This is actually easy with MD3.  I'll demo how to reverse-engineer AdventureWorks in the MD3 format, which takes about a couple of hours to do.  
Continuous Integration Testing with MD3 CI Loops are a definite development best practice.  Yet few shops do this with their database code.  And if they do CI Loops they may not be doing them correctly.  In this post I'll share some secrets to doing really good CI testing with your database code and then I'll show you how to make it really great with MD3.  
MD3: Why you should care about declarative database deployments If you still don't see the value of declarative/properties/metadata driven database deployments I'll give you some additional arguments in this post.  
nodetitle I've deployed MD3 at numerous employers and clients and it has always been a success.  In this post I'll recount some stories about how I saved tons of man-hours refactoring database schemas using MD3.  
What are the benefits of using MD3 over my existing solution? You don't have to use the entirety of MD3 to get some of its benefits.  You can pull out just the pieces that you feel may help you.  This is a summary post that reviews all of the benefits of the tool.  


Quick Links
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

You have just read "Introduction to Metadata Driven Database Deployments" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Vertica Certification

I have just received news that I am now an HP ATP certified Big Data Solutions architect for Vertica.  I have been using Vertica off and on for a few years now but have really ramped up my exposure over the past year.  A few months ago I began a blog series on getting to know Vertica but I haven't had the time to continue that series.  Now that I am studying for the Vertica exam I hope to continue sharing that knowledge on my blog.  



Understand accounting and you'll understand the trends in software development and IT

(Overheard at the water cooler many years ago)..."Why are we moving to kanban and agile?  I don't understand it, our teams always deliver."

(Overheard at a recent SQL Saturday I attended)..."One of our ISVs (Independent Software Vendors) is now offering subscription-based licensing in addition to their standard release-based licensing.  The subscriptions are almost twice as much as release-based licensing, yet my stupid company just switched to subscription.  There goes my raise."

(Overheard by a DBA in the hallway)..."I can't believe we are moving all of our data to the cloud.  Security is going to suck and by my calculations it's going to actually cost us more money.  I don't understand the rationale for this.  I guess we'll all be laid off soon."  

There are trends in our industry that baffle IT folks.  The above are just a few examples.  There are (somewhat) hidden reasons for these trends.

Accounting "gimmicks".  

If you are involved in software development or IT it behooves you to understand this stuff. YOUR CAREER AND FUTURE IS AT STAKE.  

Let's dive right in.  

Quick Accounting Primer on Expenses

Companies spend money on stuff.  These are expenses.  Expenses are always classified as either Capital Expenses or Operational Expenses.  A side-by-side comparison is a good way to see the differences.  

  • aka: capital expenses are "capex"; operational expenses are "opex".
  • Definition: capex is the cost of developing or providing non-consumable goods that support the business; opex is the ongoing cost of operating a business.  
  • "Value": capex has a usable life of more than a year, hence it has longer-term value; opex is consumed within a year and then has zero value.
  • Example: a 3D printer is capex; the toner material for the 3D printer is opex.
  • Another example: a new delivery truck is capex; toilet paper is opex (it has no value once consumed).
  • One last example: building a new warehouse that the company owns is capex; leasing a new warehouse from another company that owns the land and erected the building for you is opex.
  • Accounting treatment: capex is added to an asset account and to the company's cash flow statement as an investment; opex is shown as a current expense, recorded immediately, and subtracts from income, reducing net profit and thus taxes.
  • Effect on profits and taxes: capex is deducted from earnings/reduces profits over its usable life (this is called depreciation); opex is deducted from earnings and will reduce profits and taxes in the year it is paid/incurred.
  • Extremely simple example: a truck is purchased for 100K and is added as a fixed asset with a useful 10-year life; 10K may be deducted to offset profit/taxes in each year for the next 10 years (a VERY simplified example), and the annual 10K allotment is handled similarly to opex for that year.  Toilet paper purchases are expensed and deduct from profits/taxes in the year the tp is used.  

One quick note...R&D Expenses

The above table shouldn't be difficult to understand.  If it is, just trust me, or do your own research.  

Now let's start to get a bit tricky.  GAAP (Generally Accepted Accounting Principles) has a classification for expenses called "research and development".  These expenses are opex.  This is sometimes counter-intuitive for people.  If I'm a pharma company my R&D could lead to a breakthrough drug that nets me millions in profits over decades.  Shouldn't these be capex?  Is this an "investment"?  

Not generally.  The rationale is that at any point an R&D project might be abandoned and there will be no "asset" to the pharma company.  

If you work in software development then you may consider yourself R&D.  Right?  

Not always.  

But let's not get ahead of ourselves.  

Which is better...capex or opex?

First of all, there are a lot of gray areas where a given expense might be classified as capex or opex depending on just how you are willing to bend the GAAP rules and justify your position.  

In situations where an expense could be either capex or opex some companies will prefer one or the other based on how it will benefit them the most. 

A good auditor/investor/accountant can learn a lot about a company's management and goals by looking at how it handles decisions around whether an expense is on-the-books as capex or opex.  Frankly, many expenses, especially IT expenses like developing and maintaining software, could be either capex or opex depending on your world view.  Here are some generalities that I believe, though others may disagree:

  • Pre-IPO companies prefer capex.  Large expenses can be split amongst many years, which has the appearance of inflating the current year's profits at the expense of future profits.  Pre-IPO companies want to look as profitable as possible to get investors excited.  
  • Investors prefer opex.  See above.  Investors would rather see higher opex because nothing is "hidden" by using depreciation methods that hide expenses in later years.  
  • Companies interested in minimizing taxes prefer opex.  Costs are accounted for sooner.  This also has a "smoothing" effect on profits.  Forecasting is easier and you won't see as many huge "one time charges" to profits for some big capex item.  (Note that investors don't want to lose their profits to taxes, which is why investors like companies that don't try to hide big capex expenses.)  
  • Software development expenses (R&D) at ISVs (Independent Software Vendors) should be opex.  Many will disagree with me on this.  I'll discuss this more below.  
  • IT department expenses (including software development) for non-ISVs (banks, pharma companies, finance, etc.) are capex.  Here is the sticking point and what I feel is the answer to all of those questions at the beginning of this post.  Companies want to move to opex nowadays.

Some companies may want to defer as much expense as possible to look profitable (those pre-IPOs).  Those companies will want to capitalize as much as they can.  Otherwise, generally, companies nowadays prefer opex to capex.  

Even within a company there are conflicting interests.  Some folks want opex, some want capex.  A good CFO/CEO will favor opex because that is what their investors want to see.  But within those companies the CIO/CTO may feel differently.  Many companies view IT costs as out-of-control.  How better to make the budget look smaller than to shift expenses to capex?  So now you have the CIO/CTO working at cross-purposes with the goals of the company.  

Isn't this stuff complicated?  

The Rules for Various IT Expenses

Here are some "rules" for various IT expenditures.  Look at the list carefully and you'll see that the trends in our industry today is to move away from capex and towards opex.  This has been the general trend in business since at least Enron and the Dot Com Bubble.  

  • Software licenses: capex.  They have a usable life of more than one year.  
  • Software subscriptions: opex.  You are only "renting" the software.  Fail to pay the rent and you have no asset.  
  • Purchased laptop: capex.  Has a usable life of more than one year.  
  • Leased laptop: opex.  No asset after the lease expires.  
  • "Cloud": opex.  "Renting"...no asset...do you see the trend?
  • Data centers: capex.  Huge upfront costs.  This is an asset.  
  • Software licenses for software deployed in the cloud: opex.  Yup, you can buy a license for an IDE and deploy it on AWS and move it all to opex.  I'll say it again: if you purchase software that would otherwise run in your data center, yet deploy it on a VM in the cloud, magically the capex becomes opex.  


The Rules for Software Development Expenses

There are actually FASB and IFRS rules that govern how to expense software development.  They are very complex, and this post is a simplification of the issues.  Feel free to use Google to confirm.  You may find a lot of information that conflicts with what I have here.  I suggest you actually read the rules; you may find your company is doing things incorrectly, to its detriment.  But I'm not an accountant, nor do I play one on TV.  

First, your company's primary business activity determines whether you should be using capex or opex for software development/IT costs.   

  • ISV: The company's core business is developing software to sell to others.  Example: DaveSQL LLC creates tools that are sold to other companies to aid them in SQL Server DBA tasks.  IT expenditures should be always opex.  People disagree with this...they are wrong.  Go read the rules.  R&D is always opex.  If the ISV cancels the development effort at any time there is no asset.  
  • Non-ISV: The company's core business is not software development but it undertakes software development to increase efficiencies in its core competencies.  Example: Dave BioPharma, Inc. creates new drugs.  It buys DaveSQL's products to help manage its servers.  IT expenditures are at times capex and at other times opex.  More in the next section.  

For non-ISVs...when is software development capex vs opex?

Remember, all software development costs (well, most I guess) should be opex for an ISV.  This table is solely for IT shops at traditional companies.  The "phase" of the software development project at a non-ISV determines how the expense is handled.  

  • Functional design/"Evaluation Phase": opex.  If the project is not feasible and is scrapped there is no asset, so it is R&D, which is opex in the traditional sense.  
  • Development Phase (including detailed technical specifications): capex.  The outcome is an asset.  Even if the software is useless or obsolete by the end of this phase and is scrapped, it is still capex.  There is still an asset.  That asset may be worthless and can't be sold, but it is still an asset.  
  • Post-implementation: opex.  This one should be obvious.  This is production support.  

If you haven't noticed thus far in this post, there is a tendency for most companies to prefer opex over capex.  This is not an iron-clad rule, but that is the trend in the business world today.  So, if we were accountants/CFOs/analysts/investors we would want to figure out ways to get more opex and less capex from our software development efforts.  This helps us pay less taxes.  

First thing you should note is that the last table is very waterfall-ish in its "phases".  Design to development to ops.  But what if we were agile and used cross-functional teams?  Could we make some of that capex into opex?  Yep.  And there's the trick.  

Waterfall generates assets too quickly under accounting rules.  It has detailed design documents after all...and those are assets.  So there's another reason why agilists tout "Working Software over comprehensive documentation".  I'll bet you didn't know that.  Agile, if practiced properly and understood by your Finance Guys, will have less capex.  

Agile is the best way I've ever seen to shift almost all non-ISV software development costs to opex.  Just get some working software out the door and then bug fix it.  That probably seems like an oversimplification to a technologist, but not so much to an accountant.  

Bending the Rules Even More

You can justify anything if you try hard enough.  For instance, you can front-load opex using waterfall if you lump that comprehensive documentation in as part of your "evaluation phase" documentation.  Using that trick we could re-classify just about anything.  

Please note that pre-IPO companies can also bend the rules in the reverse direction to generate more capex to make their current year profits higher.  Like I said at the beginning of this post, this is all "accounting gimmicks".  

The Ultimate Rule Bender...the Cloud

Quick thought experiment...your customer comes to you and says, "Your software doesn't work because it doesn't do X properly."  You decide that you agree and proceed to rectify it.  Is this work capex or opex?  Here is the rule...upgrades and enhancements to non-ISV software is capex...maintenance and bug fixes are opex.  So, is the work you are about to undertake capex or opex?  That depends.  Your customer would probably label the "issue" a bug (hence opex), but your company may disagree and deem it a "requirements change", hence an enhancement, hence capex.  

But wait, we don't want capex...we want opex, so do we have to admit our software is buggy to get an opex classification?


Enter the cloud.  

All cloud application development, even enhancements and upgrades, is opex because the software is rented.  Nice.  Now you can get an opex expenditure and never admit that your software was buggy.  

More on the Cloud and Software Subscriptions

With traditional release-based licensing an ISV would only make money when the next release was available.  This had an unpredictable effect on profits.  If you missed a release date you may not make any money.  Subscription-based licensing fixes that by "smoothing" out the profits.  Recently Adobe moved their software packages to a subscription-only model.  When they first released their earnings under this model their profits were down radically based on where most of their customers were in the release cycle.  They basically picked an inopportune time to change their model.  

The buyer of software loves subscriptions for the same reason.  "Smoothed" expenses and no capex.  

Open Source Software and "Services"

I'm convinced that the opex/capex debate is one of the key reasons for the rise in the open source software (OSS) movement.  Most OSS vendors offer a version of their software for free and then try to make money by offering services.  To the user of OSS this is very appealing.  There is no upfront cost for the software (capex) and the customization services are opex.  

Not all OSS uses this model, but it is prevalent.  

Think of every blogger that offers free software to do performance analysis for SQL Servers.  Altruism aside, they do this to get you to use their tools, hoping that you will attend their seminars to learn more.  Or purchase their consulting services.  It's really a great model.  


A History Lesson and Concluding Thoughts

Back in the Dot Com Days every company preferred capex to defer the hit to profits.  And times were good for IT guys who didn't look quite so expensive because their salaries were more-or-less capitalized.  Then the dot com bubble burst, the economy tanked, Enron blew up the world, and Sarbox came along.  Now most companies want to be as transparent as possible with their accounting.  And that means opex and less capex "one-time" charges to earnings.  

Every trend in our industry in the last 15 years is geared toward the move to opex.  

  • Why is there a push to be more agile and to use Kanban?  To get us to opex faster.  
  • Why are developers asked to do support work...a la "DevOps"?  To get more people under the support budget (opex).  
  • Why is every company tearing down their 10 year old data centers and moving to the cloud?  (Do I have to say it?)  
  • Why are ISVs moving to subscription-based software models?  So that their customers can opex the "rent".  (This also "smooths" the ISV's monthly recurring revenues too).  
  • Why is your company trying to move your on-premise SQL Servers to SQL Azure (or whatever it is called now)?  I think you got it.  

It behooves all technologists to understand the basics of accounting and economics.  Many of the trends in our industry can be traced to how those trends will ultimately affect profits.  You should be designing your software accordingly.  I have no clue what the future holds in our industry, but I sleep better knowing that I understand the economics of the decisions being made.  

You have just read "Understand accounting and you'll understand the trends in software development and IT" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

How secure is your data?

This isn't another post about locking down your data, auditing access, and generally taking data security seriously.  All of that is important and has been rehashed ad nauseum elsewhere.  This post is about the unseen security vulnerabilities that lurk in the physical location of your servers.  Yesterday a magistrate judge ordered that Microsoft emails that are stored on servers in Dublin, Ireland must be surrendered to US government officials if a valid warrant is issued.  MS has already appealed the ruling and a federal court is set to hear arguments as early as July 31.  

This caused a bit of an outrage yesterday as people began to realize 

  • you have no privacy in the US anymore
  • the US government can seize your electronic data even if it resides in another country's jurisdiction ...or you can surrender your assets as punishment.  

It's interesting to me that this causes a fuss.  There is really nothing new here.  The Stored Communications Act (SCA) that is the basis for all of this has been in place since 1986.  It is worth reading the wikipedia article I linked to if you want to see just how far the Fourth Amendment is being twisted to make all of this "constitutional".  Most of the rest of the world understands just how little privacy Americans have and has taken appropriate measures to secure its citizens' data from the long arm of the US government.  Only Americans could be shocked by this.  

Don't believe me?  Here's a true story.  

About 12 years ago I architected some software that tracked employee certifications and CEUs (continuing education units) for accountants and auditors.  The company I worked for had a presence in about 70 countries but our initial rollout was for the US only.  Being a good little architect I built the system such that it could support any country as well as any extended character sets.  It was even "multi-tenant" in that one country could keep its data totally separate from another country's data just by changing some configuration values.  

Nothing interesting so far.  

We began rolling it out to other countries very quickly, because it was, well, awesome.  But we got stopped dead in our tracks when we wanted to implement this in the EU.  The EU values the privacy of its citizens' data far more than we Americans do and the fear was that if EU data was stored on US servers then laws like the SCA could be used by the US government to snoop on or steal that data.  This was my initiation into the SCA and extra-governmental/international jurisdictional data issues.  

Our EU affiliated "member firms" (we were one BIG company but operated separate legal entities in every country) were, frankly, scared shitless regarding what the US could do with even relatively benign data like accountant certification histories.  And, after I started reading the laws I was scared too.  

I spent a few days with our legal team trying to determine if the EU member firms could even use my software.  Where could we draw the line in the sand regarding data privacy?  Could we just spin up an instance in a data center in Europe and run a copy of my application there?  Could we share any data between the EU and the US...even lookup/master/system data like timezones, currencies, and ISO codes?  That's very benign data.  If we could, was replication acceptable, given that the EU servers would then be accessed by a US application using a US user/password that might be compromised by the US government?  Could our US Ops team still run operations for the EU instance, or would that introduce a possible privacy conflict?  The US government might coerce our Ops Guys into relinquishing passwords!  That opened a new can of worms no one had thought of...we used NT domains and accounts for everything Windows (including the servers for this application)...were the domains set up properly to provide the needed data security/privacy?  

What was supposed to be the rollout of a simple certification tracker turned into a firm-wide data privacy initiative.  Although this certification data was relatively benign if it fell into the wrong hands, how would we handle global finance and audit data?  My company had an ethical duty to its customers, like a lawyer to her client, and we didn't take that lightly.  We finally landed on a separate application instance running in the UK (though there was still connectivity to our US servers), and some data was allowed to be shared.  US Ops Guys were allowed to administer the UK instance as well.  The legal team also developed some elearning courseware, which many of us had to pass, that tested us on exactly what we could do with EU data and how to handle jurisdictional requests for data.  

I don't remember all of the rules and laws around this, but I do remember that IT persons (dev/ops/QA...it didn't matter) were forbidden to release EU data to any entity, regardless of warrant, without going through our Legal department first.  Legal made a point of telling us that they "had our backs" if anything were to happen to us for not complying immediately with a request for data.  I did write about some of the things I learned in that training in a blog post on data retention.  

PS...don't think that this stuff never happens to the average IT guy.  I worked at this firm during the global financial implosion in 2008.  You may remember that certain quasi-banks were accused of shifting money between the US and the UK, possibly for nefarious reasons.  I worked for a different team when that happened, but I was part of the staff that was issued a subpoena to help reproduce some of those questionable transactions.  Reading those subpoenas and warrants gets you scared, especially when individual staff persons are mentioned by name.  Legal stepped in and put the brakes on the process until they could review the requests to ensure other jurisdictional laws were not being violated.  Meanwhile I was sweating, trying to determine whether we had the necessary logging to reproduce those transactions...and how I was going to get it...and what would happen to me if I couldn't.  

What's the point of all of this?  You can be well-versed in security and auditing and theories of "least privilege"...but none of it means anything if all of those safeguards can be overridden by governmental fiat.  It only takes one scared DBA releasing the wrong data to a government agency for your entire business to be erased (it happened in 2008).  If you care about your data, then understanding the legal ramifications of your architectures is even more important than understanding the security features of your vendors' software.  

You have just read "How secure is your data?" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  