DaveWentzel.com            All Things Data

Data Architecture

The Other MD3.Create Procedures

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last few posts we covered MD3 and "state" which covered how MD3 deploys "stateful database objects" using The Four Rules.  If you handle The Four Rules correctly (and MD3 can greatly help you with this) then your database deployments should be a breeze.  No more hand-crafted, error-prone DDL.  In this post we are going to cover the other MD3.Create* procedures which help with database deployments and The Four Rules.  

The Other Stateful Database Objects

Here are the other stateful database objects and how MD3 handles their deployment.  In most cases this is handled with an MD3.Create* procedure.  These procedures deploy your stateful objects "smartly".  

Stateful Object MD3's deployment mechanism
Tables/columns Tables are created using standard DDL and can be scripted using SSMS tools.  Columns can be added or altered using MD3.CreateColumn, this will be discussed further in the post MD3 Script Idempotency and MD3.CreateColumn(source code).  
Indexes This was covered in the post MD3 and "state" using MD3.CreateIndex.  Source code.  
Primary Keys We'll cover this below.  Source code for MD3.CreatePrimaryKey.  
Foreign Keys We'll cover this below.  Source code for MD3.CreateForeignKey.  
CHECK constraints We'll cover this below.  Source code for MD3.CreateCheckConstraint.  
DEFAULT constraints Handled with MD3.CreateDefaultConstraint, and covered below.  This may not seem like a "stateful" database object, but it is.  Source Code.  
Model Data (states/provinces, valid order statuses, other "lookup" data) Covered in the post MD3 Model Data Patterns

Each of the above referenced MD3.Create procedures handles The Four Rules correctly.  Let's look at these routines.  

MD3.CreatePrimaryKey

Conceptually, primary keys are very similar to indexes.  You can cluster them, give them a name, give them compound keys, put them on a filegroup, etc.  Therefore it shouldn't be surprising that MD3.CreatePrimaryKey takes the exact same parameters as MD3.CreateIndex.  

The difference is what each procedure does under-the-covers.  If you change a PK you may actually have to deal with objects that depend on it.  Here's a list of objects that are dependent on PKs and how MD3 handles PK changes.  You can of course change this functionality to suit your requirements:  

  • If the table is replicated then the table must be removed from any subscribed publications.  Currently MD3 merely throws an error if this situation exists. 
  • Removes all FKs that reference the PK.  MD3.CreatePrimaryKey simply removes them.  Later, when MD3.CreateForeignKey is executed (remember, MD3 handles the dependent ordering of objects for you) the FKs will be recreated properly with the new keys.  
  • If we are changing the PK to be clustered we first drop an existing clustered index that already exists on the table.  As with foreign keys, MD3 handles the dependencies for you...meaning the dropped clustered index can be recreated later as a non-clustered index if required.  

In the next post, MD3.CreatePrimaryKey in Action I'll show you specific use cases where MD3 can save you vast amounts of time and rework when you have to modify a primary key.  Stay tuned.  

MD3.CreateForeignKey

FKs are stateful database objects because they require a properly-applied "parent" key to be in place first.  FKs can also be very expensive to recreate if the associated tables are large.  Here are the parameters (properties) for MD3.CreateForeignKey:

Like the other stateful object creator routines, MDs3.CreateForeignKey properly implements The Four Rules properly.  The parameters should be self-explanatory, except maybe @AllowNoCheck.  Constraints can be either TRUSTED or UNTRUSTED. When you create a FK using ALTER TABLE...WITH CHECK ADD CONSTRAINT you are saying that existing data in the table must adhere to the constraint, as well as any future data.  When you create a FK using ALTER TABLE...WITH NOCHECK ADD CONSTRAINT you are allowing existing data to violate the constraint, it will be enforced for future data only.  This is an untrusted constraint.  There are 2 reasons why untrusted constraints are frowned upon:  

  1. It means you may have data that violates your constraint and you should probably determine why that is, and fix it.  
  2. Some query plans will be less-optimal with an untrusted constraint, leading to possible performance issues.  

MD3 by default does NOT allow untrusted FKs to be created.  However, the @AllowNoCheck allows you to override that, if needed.  MD3 will try first to create the constraint using WITH CHECK and then will fallback to WITH NOCHECK if you have @AllowNoCheck set.  MD3 throws a WARNING in this case that you can then research later.  

MD3.CreateCheckConstraint

You may wonder why CHECK constraints are stateful database objects.  The reason is that recreating a CHECK constraint on a large table can be EXTREMELY time consuming as all data must be read and evaluated.  We also need to take into consideration whethere the constraint is trusted or not, just like with foreign keys.  Here is the set of parameters that MD3.CreateCheckConstraint uses:  

 

MD3.CreateDefaultConstraint

Defaults are also stateful objects only because they are declared very similarly to CHECK constraints and the code is very similar.  

Summary

In nodetitle I covered how MD3 covers stateful database objects, specifically indexes.  In this post I covered how MD3 deploys the other stateful database objects like PKs, FKs, CHECK constraints, and defaults.  In MD3 Script Idempotency and MD3.CreateColumn I'm going to cover what is probably the most important stateful database object...the table.  In the next post, MD3.CreatePrimaryKey in Action I'll show you specific use cases where using MD3 instead of hand-crafted DDL can save you a lot of grief.  


You have just read "The Other MD3.Create Procedures" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 and The Four Rules

In the last few posts I covered what MD3 is (Metadata Driven Database Deployments) and why it is better than hand-crafted DDL.  In this post we are goingto cover The Four Rules of Database Deployments.  Regardless of whether you decide to use MD3 or not, if you understand The Four Rules your database deployments will run much smoother.  

 
Some Background
In the last post (nodetitle) we covered a scenario where we had to write really complex DDL if we wanted to deploy an index change properly.  We need an index, called MyIndex, created on dbo.Example with a fill_factor of 90, on column Foo, with included_column Bux.  Here is the DDL:
 
DDL is kinda esoteric and difficult to remember, as seen above.  DML for master/seed/model/system data that you may need to deliver is similarly complex (we'll look at this problem in the post nodetitle).  In MD3 DDL is much simpler.  We just "declare" the properties of our object and let MD3 determine what needs to be done to get us to "model".  Basically, we enter "metadata" about our object into an MD3 table (for instance MD3.Indexes), which looks very similar to the "sys" system tables (in this case sys.indexes).  Here's a screenshot that shows how to add an index to the HumanResources.Department table:   
 
 
Pretty simple.  We have a metadata table called MD3.Indexes that looks VERY similar to sys.indexes.  We "declare" our new index with its desired properties right into that table, then we run MD3.  Under the covers MD3 is making calls to MD3.CreateIndex (which you can call instead of populating the MD3.Indexes table) similar to the nearby graphic that is adding the index nciFoo to dbo.Example.  
 
Each parameter to MD3.CreateIndex (and each column in the MD3.Indexes table (see above screenshot) is a metadata property that we can declare for an index.  These properties cover just about every option you can think of regarding an index.  There are some properties missing...for instance, there is no option for filtered indexes because I've never really found a good use case in my career where I absolutely needed a filtered index.  Since MD3 is open source you could always add this functionality if you needed it.  Here are some other index properties that are missing and why:
Index Property Why it is missing?
WHERE <filter predicate> filtered indexes.  You can always add this yourself.  
IGNORE_DUP_KEY I've never found a good use for this option.  
STATISTICS_NORECOMPUTE I've never found a need for this.  
DROP_EXISTING ON|OFF MD3.CreateIndex will smartly determine if this should be set to ON or OFF based on what is being asked.  
ONLINE = ON|OFF MD3.CreateIndex will determine whether this option should be used or not for you, based on whether the index follows the rules for ONLINE index creation.  Here is another case where MD3 is a huge help.  

So the question is...how does MD3 actually build the DDL under the covers and what makes it so much better than hand-crafted DDL? 

The Four Rules of Stateful Object Deployment
 
If you always follow The Four Rules when you code your DDL you will always have a successful deployment (no failures!) and you will never do unnecessary DDL. MD3 does all of this for you.  Assume you have some DDL that you need to apply:  
  1. if the object exists by name and by "properties", then silently continue, there is no work to do.  In other words, there was no change in the object between deployments.  The "properties" are the parameters to the MD3.Create procedures (see the screenshot above).  
  2. if an object exists with the same “properties”, but the name is different, then RENAME the object.  Example:  I have an index with the same column list, order, fill factor, on the same filegroup, with the same compression, etc...but it is named differently.  All MD3 does is run sp_rename.  This scenario occurs, for instance, when your DBA adds an emergency index to a table and you check in the same index structure, but you use your naming conventions.  
  3. if the object exists by name, but the "properties" are different, ALTER the object accordingly.  Example, you have an index with the same name and mostly the same properties, but you need to change the INCLUDED col list and FILLFACTOR.  MD3 handles this without esoteric, hand-crafted DDL. Usually an ALTER can bring the current object up to the desired properties, otherwise DROP/CREATE is used.  The most important thing to remember is that YOU don't have to code the DDL and worry about covering all possible scenarios.  
  4. Else, CREATE object with properties as defined.  At this point MD3 realizes this is a missing object and creates it according to your declarations.  
If you don't do The Four Rules correctly you risk, at a minimum, recreating expensive objects at every deployment.  You don't need MD3 to do a DDL deployment, but it makes it a lot easier and less error-prone.  
 
An Example
Let's take our example code again and determine what MD3 will do in various situations regarding The Four Rules.  The below table is kinda like the "unit tests" for MD3.CreateIndex (tsqlt unit tests available on request).  
 
Database State Expected Result Why? Covers which of The Four Rules
dbo.Example table does not exist FAILURE This scenario will be caught during continuous integration testing and will never happen during a production deployment.  (See Continuous Integration Testing with MD3) None.  
index nciFoo does not exist Index created The index will be created with the declared properties.   4
An index exists called dbaFoo with the same properties Index renamed The index will be renamed using sp_rename to nciFoo.  This ensures that a duplicate index is not created with the same properties.   2
nciFoo exists, but with different "properties" than what is declared Index altered When possible the index will be ALTERed.  Otherwise the existing index will be dropped and the desired index will be created.  In some cases, depending on what is being changed, an index cannot be ALTERed.   3
nciFoo exists with identical properties Silently continue There is no work to do.   1
 
Summary
In this post we looked at The Four Rules of stateful database object deployments.  If you follow The Four Rules when you deploy your objects you will find that you have fewer errors and more reliable, repeatable deployments.  You don't have to use MD3 to use The Four Rules.  You can use any tool, even plain 'ol DDL.  MD3 just makes everything a whole lot easier.  
 
In the next post, nodetitle, we'll cover how The Four Rules are applied to other stateful database objects.  In a later post we'll cover how MD3's model data patterns work with The Four Rules to ensure your model/system/lookup data deployments work just like your other stateful object deployments.  
 
 
 

You have just read "nodetitle" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

MD3 and "state"

Metadata Driven Database Deployments (MD3) is my build-and-deploy mechanism for SQL Server-based applications.  Check out the other posts in my MD3 blog series.  In the last post (Why do we need another database deployment tool like MD3?) I covered some reasons why database deployments give DBAs and developers fits.  In a nutshell, it's because we write too much hand-crafted DDL..  Even when we use third-party GUI tools to manage our deployments we still hit edge cases where the migration isn't smooth.  So why are schema changes so difficult?  

Database Object State and Builds
 
Quick Links
nodetitle
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

When you "build" your Java app (...or C# or whatever...) you are really just compiling the code on your machine.  If compilation is successful then you can rest assured that it will compile on, basically, any machine (OS and processor architecture might be exceptions).  The deployment is simply a matter of copying your compiled code.  In the .NET world you usually build a Windows Installer package or perform an "xcopy deployment".  Build and deploy is really easy, as is building the necessary "CI loops" (...they run your unit tests...) to ensure your builds aren't broken when a developer checks in a change.  

 
Databases work differently.  Just because your scripts compile and deploy on your machine/database does not mean they will deploy anywhere else.
 
Why?
 
Java code, when it compiles, couldn't care less about the "state" of the previous build.  The code is “stateless”.  Some database code, on the other hand, is “stateless” while some is “stateful”.  Database developers need to worry about “history”, or "state", which is the existing data in the tables that must be maintained into the next release.  The code must do more than just compile, it must be respectful of existing data.  And data isn't the only stateful database object.  You don't want to recreate a bunch of indexes needlessly either.
 
Stateless Objects Stateful Objects
Views Tables/columns
Procedures Indexes
Functions Constraints (PKs/FKs/CHECK)
Model Data (sometimes called seed/lookup/master data).  An example is a list of states and abbreviations. This data rarely changes and when it does it changes via an approved process (ie, a Master Data Management update) Model Data (if the data can be altered by the customer.  An example is order status codes.  If you allow your customer to change (or add to) their model data then you don't want to "reset" their model data with your updates...or...maybe you do depending on the scenario.  
 
Stateless Object Patterns
Stateless db objects are the easiest to deploy...you simply compile them.  You do not care about what the previous version of the object looks like, you simply create the new version, overwriting the old.  Just like you would overwrite the last Java build with the current one.  Traditionally you code stateless objects using a pattern similar to the graphic to the right.  
 
Since the object, in this case a stored procedure, is stateless we can simply drop it and re-create it.  When doing that we lose our permissions so we must re-apply them.  It would be nice if we had the CREATE or REPLACE pattern for Transact SQL like Oracle does.  CREATE OR REPLACE is simplified syntax that obviates the need for re-applying permissions and checking if the object exists.  Oh well...maybe in the next replace of SQL Server.  
 
Generally you should deploy your stateless objects near the end of your deployment.  There are some exceptions to this rule:  
  1. You may have object dependencies which require certain stateless objects to deployed in a specific order.  For instance, if viewA references functionB which in turn references viewC then you have to ensure those objects are applied in a certain order.  
  2. if you have SCHEMABINDING applied to any of your objects and you try to first manipulate a stateful object (such as a table) you'll need to drop the object with the SCHEMABINDING clause first.  

MD3 handles all of this for you and in the demo files I give examples of this and how it is very simple to handle these edge cases. 

Stateful Objects and Patterns
As mentioned before, Java has no "stateful" code...if the code compiles you are good...if it doesn't compile you have problems.  In the database world it is the stateful code that causes database developers fits.  
 
An example of "stateful" database code that can be maddening is index DDL.  Let's see why that is.  You can download the example code here.   We start by creating a very simple table with a few columns.  Most developers just create their tables and then either worry about the indexes later or let the DBAs worry about it when performance becomes a problem.  At a later time the developes realize that they need to add an index to the Foo column.  If you are using a tool like Visual Studio db projects then you can use very simple CREATE INDEX syntax.  Otherwise you need to do an "existence check" to ensure that a subsequent execution of your script won't generate an "object already exists" error.  
 
 
The "existence check" syntax isn't terribly difficult to handle...yet.  Now let's say you notice a lot of page splitting and realize that your table design and data access patterns indicate that you need to change the fillfactor to 90.  Quick...do you remember where the fillfactor settings are stored in the metadata tables so you can modify the "existence check"?  You could just code up something quickly like this:  
 
 
The problem is that you will drop and recreate the index at every deployment.  Again, a tool like VS db projects will obviate the need for this.  Better "existence checking" code would be something like this:    
 
 
Our code is getting a little more "advanced" and much more "hand-crafted" than I am comfortable with.  
 
Here's a common scenario...once your index is released to production your DBAs notice that you have a bunch of Bookmark Lookups causing customers to report application slowness and they believe that they should add a couple of included columns to your index.  The DBAs are nice enough to tell you that they've done this and that you need to make this new index "model code".  You determine this is the new index creation code you need: 
 
 
Yikes!  That's getting to be really complicated code.  
 
Here's a slightly different example.  Your DBA calls you and tell you that he saw a gigantic "blocking storm" and needed to add an index on the Bar column.  You agree with his analysis and you both decide to make this a model index. The first block of code is what the DBA added (note the index name doesn't follow your naming conventions) and the second block is the code you made "model".  Do you see the problem?  
 
 
You really should've renamed his index, if his index exists.  Now you risk a duplicate index during a deployment.  
 
A Better Way to Handle Stateful Objects
 
DDL is kinda esoteric and difficult to remember, as seen above.  DML for master/seed/model/system data that you may need to deliver is similarly complex (we'll look at this problem later).  In MD3 DDL is much simpler.  Wouldn't it be nice if you could just enter the "properties" of the desired object in a metadata table and SQL Server just did what it needed to do to make that object for you?  MD3 does that for you.  Here's a screenshot that shows how to add an index to the HumanResources.Department table:   
 
 
Pretty simple.  We have a metadata table called MD3.Indexes that looks VERY similar to sys.indexes.  We "declare" our new index with its desired properties right into that table, then we run MD3.  
 

Now let's say you want to alter that index to change your compression settings from NONE to ROW...just update the entry in MD3.Indexes and rerun MD3. Need to add an Included column?  Change the NULL above to the name of the column you want to include and rerun MD3.  Maybe the index is no longer needed...just change the DropIndex column to 1.  
Under-the-covers MD3 is simply cursor-ing over the metadata table (MD3.Indexes) and determining if there is any work to do (more on how this is done in the next section).  It does this by calling the MD3 stored procedures that do the actual work...for instance, EXEC MD3.CreateIndex.  You can call these procedures and use them yourself without using the MD3.Indexes metadata table.  Here's an example call that creates nciFoo.  Let's say you decide to change this index to have a FillFactor of 90...how do you think you would do that?  That's right, change the parameter and run the procedure.  Want to change the filegroup (or partition)...take a guess at the process?  No complex hand-crafted DDL is required.  
 
So the question is...how does MD3 actually build the DDL under the covers and what makes it so much better than hand-crafted DDL?  MD3 uses The Four Rules.  We'll cover nodetitle in the next post.
 
 
 
 
 
 
 
 
 
 
 
 
 

You have just read "nodetitle" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Why do we need another database deployment tool like MD3?

This is the next post in my MD3 series (Metadata Driven Database Deployment).  MD3 is my build-and-deploy system that I've evolved over the past 15 years. Quick history...I worked at an ISV and we supported 4 versions of our software (Version X back to Version X-3) and we needed to support upgrading any of those versions to "current". We had about 40 customers with the average db size ~40GB. To make matters even more complicated we supported either SQL Server or Oracle.  Both had the same schema...mostly.

Quick Links
nodetitle
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

So we needed a tool to upgrade a database reliably without having non-database developers writing complex, error-prone, hand-crafted DDL. Most of our data developers knew Oracle or SQL Server, but few knew both.  I learned a lot about what works and what doesn't when upgrading a database in a complex data environment like this. I created a database CI loop (also included with MD3) before CI loops were sexy and ubiquitous. Every day we tested our db scripts against real customer databases and ensured everything upgraded and functioned properly. We never worried about the actual customer production upgrade event because our CI loops already upgraded the customer hundreds of times (albeit on older, stripped, and scrubbed databases).  

I've deployed versions of this tool at many different employers and clients. Each has been radically successful.  

  • The developers love how easy it is to make schema changes without hand-crafting DDL.  It is totally flexible and customizable to whatever you need.
  • DBAs and app admins love it because it is scriptable, has customizable logging, and show the SQL that is being executed.  Nothing is hidden from the DBA.   
  • Management likes the fact that it requires zero training and creates a repeatable, reliable process.  

The latest incarnation, which I haven't yet open-sourced, is a version that will do a near Zero Downtime database upgrade. Contact me if you might be interested in this.  

Why do we need another database deployment tool?

There are lots of database deployment tools out there.  They each work great...mostly.  But development teams and DBAs don't trust them when it comes to complex schema changes.  Don't believe me?  Assume the following scenario...you need to store new data in your database due to a change in business requirements.  There are lots of ways you could do this but you honestly believe the best way is to refactor and split the largest table in your database into 3 new tables.  
 
How likely are you to propose this solution to your team?  My guess is not very likely.  The reasons are simple: 
  • Modifying a table's schema requires you to "port" the old database structure to the new one.  And also the data.  This is fraught with danger.  Perhaps each version of the table has also experienced schema changes which you also need to take into consideration.  You have to then change keys (both primary and foreign) and clustering as well as all of the non-clustered indexes.  What if each customer/version has a different set of indexes and keys?  
  • Then you need to modify each database object that accesses the table.  All of your stored procs, functions, and views.  Then you need to modify your Java (or C# or whatever).  All of this scares the average database developer.  
  • Since this is a large table how can we guarantee this won't cause our customer excessive downtime while we migrate the data?  How do we verify the customers' data afterwards?  
NoSQL:  "RDBMS schemas are too hard to change"
 
Altering schemas scares developers so much that an entire cottage industry grew around the fact that relational databases are too darn hard to "evolve" with ever-changing business rules.  One of the primary reasons the NoSQL movement (whenever I hear this I think, "technology or bowel?") grew, especially document and keystores is because they touted the fact that their technologies allowed you to modify your schemas without needing to worry about modifying all of that data access code that scares us all.  And there was no need to "port" the data from old-to-new schema.  
 
But relational schemas really aren't that difficult to evolve.  You really don't need to write a bunch of complex hand-crafted DDL.  And you don't need a third-party tool either where you hit edge cases where your data won't migrate to the new schema properly.  So why are schema changes so difficult?  
 
Just Enough to Whet Your Appetite
MD3 deployments are "declarative", meaning you simply declare what you want your object to look like and MD3 figures out the best DDL commands to run (and you can review them) to get your schema to that destination.  Here is a simple example.  Assume we have a table named dbo.Example and we decide a non-clustered index on Foo would be beneficial to performance.  See the nearby graphic to see how easy this is with MD3.  (Actually, the command is even easier than that, and I'll show you that in the next post.)  But MD3 gets even better.  Let's say a DBA already noticed a performance problem and indexed Foo already but named the index dbaFooPerfProblem.  In that case the index will be renamed.  You won't get a duplicate index.  Let's say later you decide to make nciFoo into a compound index with Foo,Bar as the keys.  Simply changing Line 21 to 'Foo,Bar' will automatically change the existing nciFoo, if it exists, or build a new nciFoo with the declared properties, if it doesn't exist.  
 
In the next post I'll cover MD3 and "state", which is the key to a good database deployment routine.  
 

You have just read "Why do we need another database deployment tool like MD3?" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Introduction to Metadata Driven Database Deployments

MD3 (Metadata Driven Database Deployment) is my build-and-deploy system that I've evolved over the past 15 years. Deploying database code tends to be challenging for most organizations, especially ISVs that may support multiple versions of their software at many customers.  MD3 handles this all very elegantly.  Over the next few posts I'll cover why I think MD3 is the best choice for your database deployments.  

Table of Contents

Page Summary
nodetitle This page.  
Why do we need another database deployment tool like MD3? Covers some scenarios where traditional database deployment tools fail and how MD3 makes hand-crafted DDL a thing of the past.  
MD3 and "state" In this post we cover stateful vs stateless database objects.  It is the stateful objects (those with "history" that we need to consider) that cause our database deployments grief.  We'll cover how MD3 handles this problem.  
nodetitle The Four Rules will ensure you always deploy your stateful database objects correctly.  You don't need MD3 to use The Four Rules, but it helps.  
nodetitle Covers the other "stateful" database objects and how MD3 handles their deployment.  

MD3.CreatePrimaryKey in Action

Shows some use cases with primary keys where MD3 can really save you a lot of time and rewrote.  We also further demonstrate how The Four Rules work.  
nodetitle We cover the last MD3.Create procedure in this post and cover the one aspect of MD3 scripts that is mandatory...scripts must be idempotent, which is just a fancy way of saying that all scripts must be rerunnable.
The OO Design Principles of MD3 There are certain object-oriented design principles I incorporated into MD3.  These principles are what sets MD3 apart from other database deployment tools.  
nodetitle Model data is the data, usually static, that you deliver with your application.  MD3 deploys this data in much the same way as other stateful objects are deployed.  We cover how this is done in this post.  
RunMD3.ps1: The MD3 Executor RunMD3.ps1 is a short Powershell script that simply executes each .sql file in each subfolder in your deployment folder.  In this post I'll cover how it does what it does, and why.  
MD3 Deployments in Action In this post we put everything together and actually show how deployments work with MD3.  
How to Reverse-Engineer Your Database with MD3 The hardest part of any new deployment tool is getting your database migrated into the new structure.  This is actually easy with MD3.  I'll demo how to reverse-engineer AdventureWorks in the MD3 format, which takes about a couple of hours to do.  
nodetitle  CI Loops are a definite development best practice.  Yet few shops do this with their database code.  And if they do CI Loops they may not be doing them correctly.  In this post I'll share some secrets to doing really good CI testing with your database code and then I'll show you how to make it really great with MD3.  
MD3: Why you should care about declarative database deployments If you still don't see the value of declarative/properties/metadata driven database deployments I'll give you some additonal arguments in this post.  
MD3 Extensibility Stories I've deployed MD3 at numerous employers and clients and it has always been a success.  In this post I'll recount some stories about how I saved tons of man-hours refactoring database schemas using MD3.  
What are the benefits of using MD3 over my existing solution? You don't have to use the entirely of MD3 to get some of its benefits.  You can pull out just pieces that you feel may help you.  This is a summary post that reviews all of benefits of the tool.  

 

Quick Links
Download MD3 (with samples)
Documentation on CodePlex/git
Presentation on MD3
Other Posts about MD3
AdventureWorks2013 reverse-engineered into MD3 (takes 5 mins to do with ANY database)

You have just read "nodetitle" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Vertica Certification

I have just received news that I am now an HP ATP certified Big Data Solutions architect for Vertica.  I have been using Vertica off and on for a few years now but have really ramped up my exposure over the past year.  A few months ago I began a blog series on getting to know Vertica but I haven't had the time to continue that series.  Now that I am doing study for the Vertica exam I hope to continue sharing that knowledge on my blog.  

 

 

Understand accounting and you'll understand the trends in software development and IT

(Overheard at the water cooler many years ago)..."Why are we moving to kanban and agile?  I don't understand it, our teams always deliver."

(Overheard at a recent SQL Saturday I attended)..."One of our ISVs (Independent Software Vendors) is now offering subscription-based licensing in addition to their standard release-based licensing.  The subscriptions are almost twice as much as release-based licensing, yet my stupid company just switched to subscription.  There goes my raise."

(Overhead by a DBA in the hallway)..."I can't believe we are moving all of our data to the cloud.  Security is going to suck and by my calculations it's going to actually cost us more money.  I don't understand the rationale for this.  I guess we'll all be laid off soon."  

There are trends in our industry that baffle IT folks.  The above are just a few examples.  There are (somewhat) hidden reasons for these trends.

Accounting "gimmicks".  

If you are involved in software development or IT it behooves you to understand this stuff. YOUR CAREER AND FUTURE IS AT STAKE.  

Let's dive right in.  

Quick Accounting Primer on Expenses

Companies spend money on stuff.  These are expenses.  Expenses are always classified as either Capital Expenses or Operational Expenses.  A table is a good way to visually represent the differences.  

  Capital Expense Operational Expense
aka capex opex
definition

the cost of developing or providing non-consumable goods that support the business.

the ongoing cost of operating a business.  
"value" the expense has a usable life of more than a year, hence it has longer-term value the expense is consumed within a year and then has zero value
example a 3D printer the toner material for the 3D printer
another example a new delivery truck toilet paper  (it has no value once consumed)
one last example building a new warehouse that the company owns leasing a new warehouse from another company that owns the land and erected the building for you
accounting treatment it is added to an asset account and the company's cash flow statement as an investment.   shown as a current expense, recorded immediately and subtracts from income, reducing net profit and thus taxes
affect on profits and taxes is deducted from earnings/reduces profits over its usable life.  this is called depreciation deducted from earnings and will reduce profits and taxes in the year it is paid/incurred
extremely simple example a truck is purchased for 100K and is added as a fixed asset with a useful 10 year life.  10K may be deducted to offset profit/taxes in each year for the next 10 years (VERY simplified example).  The annual 10K allotment is handled similarly to opex for that year.   Toilet paper purchases are expensed and deduct from profits/taxes in the year the tp is used.  

One quick note...R&D Expenses

The above table shouldn't be difficult to understand.  If it is, just trust me, or do your own research.  

Now let's start to get a bit tricky.  GAAP (Generally Accepted Accounting Principles) has a classification for expenses called "research and development".  These expenses are opex.  This is sometimes counter-intuitive for people.  If I'm a pharma company my R&D could lead to a breakthrough drug that nets me millions in profits over decades.  Shouldn't these be capex?  Is this an "investment"?  

Not generally.  The rationale is that at any point an R&D project might be abandoned and there will be no "asset" to the pharma company.  

If you work in software development then you may consider yourself R&D.  Right?  

Not always.  

But let's not get ahead of ourselves.  

Which is better...capex or opex?

First of all, there are a lot of gray areas where a given expense might be classified as capex or opex depending on just how you are willing to bend the GAAP rules and justify your position.  

In situations where an expense could be either capex or opex some companies will prefer one or the other based on how it will benefit them the most. 

A good auditor/investor/accountant can learn a lot about a company's management and goals by looking at how it handles decisions around whether an expense is on-the-books as capex and opex.  Frankly, many expenses, especially IT, like developing and maintaining software, could be either capex or opex depending on your world view.  Here are some generalities that I believe that others may disagree with:

Entity Capex Opex Reasoning
pre-IPO companies x   Pre-IPO companies prefer large expenses to be capex because the expense can be split amongst many years which has the appearance of inflating current year's profits at the expense of future profits.  Pre-IPO companies want to look as profitable as possible to get investors excited.  
Investors   x See above.  Investors would rather see higher opex because nothing is "hidden" by using depreciation methods that hide expenses in later years.  
Companies interested in minimizing taxes   x Costs are accounted for sooner.  This also has a "smoothing" affect on profits.  Forecasting is easier and you won't see as many huge "one time charges" to profits for some big capex item.  (Note that investors don't want to lose their profits to taxes, which is why investors like companies that don't try to hide big capex expenses).  
software development expenses (R&D) at ISVs (Independent Software Vendors)   x Many will disagree with me on this.  I'll discuss this more below.  
IT department expenses (including software development) for non ISVs (banks, pharma companies, finance, etc) x   Here is the sticking point and what I feel is the answer to all of those questions at the beginning of this post.  Companies want to move to Opex nowadays.

Some companies may want to defer as much expense as possible to look profitable (those pre-IPOs).  Those companies will want to capitalize as much as they can.  Otherwise, generally, companies nowadays prefer opex to capex.  

Even within a company there are conflicting interests.  Some folks want opex, some want capex.  A good CFO/CEO will favor opex because that is what their investors want to see.  But within those companies the CIO/CTO may feel differently.  Many companies view IT costs as out-of-control.  How better to make the budget look smaller than to shift expenses to capex?  So now you have the CIO/CTO working at cross-purposes with the goals of the company.  

Isn't this stuff complicated?  

The Rules for Various IT Expenses

Here are some "rules" for various IT expenditures.  Look at the list carefully and you'll see that the trends in our industry today is to move away from capex and towards opex.  This has been the general trend in business since at least Enron and the Dot Com Bubble.  

Expenditure Capex Opex Reasoning
software licenses x   They have a usable life of more than one year.  
software subscriptions   x You are only "renting" the software.  Fail to pay the rent and you have no asset.  
purchased laptop x   Has a usable life of more than one year.  
leased laptop   x No asset after the lease expires.  
"cloud"   x "renting"...no asset...do you see the trend?
data centers x   Huge upfront costs.  This is an asset.  
software licenses for software deployed in the cloud   x Yup, you can buy a license for an IDE and deploy it on AWS and move it all to opex.  I'll say it again.  If you purchase software that would otherwise run in your data center, yet deploy it on a VM in the cloud, magically the capex becomes opex.  

 

The Rules for Software Development Expenses

There are actually FASB and IFRS rules that govern how to expense software development.   They are very complex.  This post is a simplification of the issues.  Feel free to use google to confirm.  You may find a lot of information that conflicts what I have here.  I suggest you actually read the rules and you may find your company is doing things incorrectly, at its detriment.  But I'm not an accountant nor do I play one on TV.  

First your company's primary business activity determines whether you should be using capex or opex for software development/IT costs.   

  ISV non-ISV
Core Business The company's core business is developing software to sell to others.   The company's core business is not software development but it undertakes software development to increase efficiencies in its core competencies.  
Example DaveSQL LLC creates tools that are sold to other companies to aid them in SQL Server DBA tasks.  Dave BioPharma, Inc creates new drugs.  It buys DaveSQL's products to help manage its servers.  
IT expenditures should be... always opex.  People disagree with this...they are wrong.  Go read the rules.  R&D is always opex.  If the ISV cancels the development effort at any time there is no asset.   at times this is capex, at other times, opex.  More in the next section.  

For non-ISVs...when is software development capex vs opex?

Remember, all software development costs (well, most I guess) should be opex for an ISV.  This table is solely for IT shops at traditional companies.  The "phase" of the software development project at a non-ISV determines how the expense is handled.  

Expenditure Capex Opex Reasoning
Functional design/"Evaluation Phase"   x If the project is not feasible and is scrapped there is no asset, so it is R&D, which is opex in the traditional sense.  
Development Phase including detailed technical specifications x   The outcome is an asset.  Even if the software is useless or obsolete by the end of this phase and is scrapped, it is still capex.  There is still an asset.  That asset may be worthless and can't be sold, but it is still an asset.  
Post-implementation   x This one should be obvious.  This is production support.  

 

 

 

 

 

Expenditure

Capex Opex Reasoning
software licenses x    
software subscriptions   x You are only "renting"
"cloud"   x You are "renting" and therefore there is no asset after the lease expires.  
data centers x   Huge upfront costs.  
software licenses for software deployed in the cloud   x Yup, you can buy a license for an IDE and deploy it on AWS and move it all to opex.  

The Rules for Software Development Expenses

Expenditure Capex Opex Reasoning
software licenses x    
software subscriptions   x You are only "renting"
"cloud"   x You are "renting" and therefore there is no asset after the lease expires.  
data centers x   Huge upfront costs.  
software licenses for software deployed in the cloud   x Yup, you can buy a license for an IDE and deploy it on AWS and move it all to opex.  

The Rules for Software Development Expenses

If you haven't noticed thus far in this post, there is a tendency for most companies to prefer opex over capex.  This is not an iron-clad rule, but that is the trend in the business world today.  So, if we were accountants/CFOs/analysts/investors we would want to figure out ways to get more opex and less capex from our software development efforts.  This helps us pay less taxes.  

First thing you should note is that the last table is very waterfall-ish in its "phases".  Design to development to ops.  But what if we were agile and used cross-functional teams?  Could we make some of that capex into opex?  Yep.  And there's the trick.  

Waterfall generates assets too quickly under accounting rules.  It has detailed design documents after all...and those are assets.  So there's another reason why agilists tout "Working Sofware over comprehensive documentation".  I'll bet you didn't know that.  Agile, if practiced properly and understood by your Finance Guys, will have less capex.  

Agile is the best way I've ever seen to shift almost all non-ISV software development costs to opex.  Just get some working software out the door and then bug fix it.  That probably seems like an oversimplication to a technologist, but not-so-much to an accountant.  

Bending the Rules Even More

You can justify anything you try hard enough.  For instance, you can front-load opex using waterfall if you lump that comprehensive documentation as part of your "evaluation phase" documentation.  Using that trick we could re-classify just about anything.  

Please note that pre-IPO companies can also bend the rules in the reverse direction to generate more capex to make their current year profits higher.  Like I said at the beginning of this post, this is all "accounting gimmicks".  

The Ultimate Rule Bender...the Cloud

Quick thought experiment...your customer comes to you and says, "Your software doesn't work because it doesn't do X properly."  You decide that you agree and proceed to rectify it.  Is this work capex or opex?  Here is the rule...upgrades and enhancements to non-ISV software is capex...maintenance and bug fixes are opex.  So, is the work you are about to undertake capex or opex?  That depends.  Your customer would probably label the "issue" a bug (hence opex), but your company may disagree and deem it a "requirements change", hence an enhancement, hence capex.  

But wait, we don't want capex...we want opex, so do we have to admit our software is buggy to get an opex classification?

Nope.  

Enter the cloud.  

All cloud application development, even enhancements and upgrades, is opex because the software is rented.  Nice.  Now you can get an opex expenditure and never admit that your software was buggy.  

More on the Cloud and Software Subscriptions

With traditional release-based licensing an ISV would only make money when the next release was available.  This had an unpredictable effect on profits.  If you missed a release date you may not make any money.  Subscription-based licensing fixes that by "smoothing" out the profits.  Recently Adobe moved their software packages to a subscription-only model.  When they first released their earnings under this model their profits were down radically based on where most of their customers were in the release cycle.  They basically picked an inopportune time to change their model.  

The buyer of software loves subscriptions for the same reason.  "Smoothed" expenses and no capex.  

Open Source Software and "Services"

I'm convinced that the opex/capex debate is one of the key reasons for the rise in the open source software (OSS) movement.  Most OSS vendors offer a version of their software for free and then try to make money by offering services.  To the user of OSS this is very appealing.  There is no upfront cost for the software (capex) and the customization services are opex.  

Not all OSS uses this model, but it is prevalent.  

Think of every blogger that offers free software to do performance analysis for SQL Servers.  Altruism aside, they do this to get you to use their tools hoping that you will attend their seminars to learn more.  Or purchases their consulting services.  It's really a great model.  

 

A History Lesson and Concluding Thoughts

Back in the Dot Com Days every company preferred capex to defer the hit to profits.  And times were good for IT guys who didn't look quite so expensive because their salaries were more-or-less capitalized.  Then the dot com bubble burst, the economy tanked, Enron blew up the world, and Sarbox came along.  Now most companies want to be as transparent as possible with their accounting.  And that means opex and less capex "one-time" charges to earnings.  

Every trend in our industry in the last 15 years is geared toward the move to opex.  

  • Why is there a push to be more agile and to use Kanban?  To get us to opex faster.  
  • Why are developers asked to do support work...a la "DevOps"?  To get more people under the support budget (opex).  
  • Why is every company tearing down their 10 year old data centers and moving to the cloud?  (Do I have to say it?)  
  • Why are ISVs moving to subscription-based software models?  So that their customers can opex the "rent".  (This also "smooths" the ISV's monthly recurring revenues too).  
  • Why is your company trying to move your on-premise SQL Servers to SQL Azure (or whatever it is called now)?  I think you got it.  

It behooves all technologists to understand the basics of accounting and economics.  Many of the trends in our industry can be traced to how those trends will ultimately affect profits.  You should be designing your software accordingly.  I have no clue what the future holds in our industry, but I sleep better knowing that I understand the economics of the decisions being made.  


You have just read "Understand accounting and you'll understand the trends in software development and IT" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

How secure is your data?

This isn't another post about locking down your data, auditing access, and generally taking data security seriously.  All of that is important and has been rehashed ad nauseum elsewhere.  This post is about the unseen security vulnerabilities that lurk in the physical location of your servers.  Yesterday a magistrate judge ordered that Microsoft emails that are stored on servers in Dublin, Ireland must be surrendered to US government officials if a valid warrant is issued.  MS has already appealed the ruling and a federal court is set to hear arguments as early as July 31.  

This caused a bit of an outrage yesterday as people begin to realize 

  • you have no privacy in the US anymore
  • the US government can seize your electronic data even if it resides in another country's jurisdiction ...or you can surrender your assets as punishment.  

It's interesting to me that this causes a fuss.  There is really nothing new here.  The Stored Communications Act (SCA) that is the basis for all of this has been in place still 1986.  It is worth reading the wikipedia article I linked to if you want to see just how far the Fourth Amendment is being twisted to make all of this "constitutional".  Most of the rest of the world understands just how little privacy Americans have and have taken appropriate measures to secure their citizens' data from the long arm of the US government.  Only Americans could be shocked by this.  

Don't believe me?  Here's a true story.  

About 12 years ago I architected some software that tracked employee certifications and CEUs (continuing education units) for accountants and auditors.  The company I worked for had a presence in about 70 countries but our initial rollout was for the US only.  Being a good little architect I built the system such that it could support any country as well as any extended character sets.  It was even "multi-tenant" in that one country could keep its data totally separate from another country's data just by changing some configuration values.  

Nothing interesting so far.  

We began rolling it out to other countries very quickly, because it was, well, awesome.  But we got stopped dead in our tracks when we wanted to implement this in the EU.  The EU values the privacy of its citizens' data far more than we Americans do and the fear was that if EU data was stored on US servers then laws like SCA could be used to snoop or steal that data by the US government.  This was my intiation into SCA and extra-governmental/international jurisdictional data issues.  

Our EU affiliated "member firms" (we were one BIG company but operated separate legal entities in every country) were, frankly, scared shitless regarding what the US could do with even relatively benign data like accountant certification histories.  And, after I started reading the laws I was scared too.  

I spent a few days with our legal team trying to determine if the EU member firms could even use my software.  Where could we draw the line in the sand regarding data privacy?  Could we just spin up an instance in a data center in Europe and run a copy of my application there?  Could we share any data between EU and US...even lookup/master/system data like timezones, currencies, and ISO codes?  That's very benign data.  If we could, could we use replication because then the EU servers would still be accessed by a US application with a US user/password that might be compromised by the US government?  Could our US Ops team still be operations for the EU instance or would that introduce a possible privacy conflict?  The US government might coerce our Ops Guys into relinquishing passwords!  The opened a new can of worms no one thought of...we used NT domains and accounts for everything Windows (including the servers for this application)...were the domains setup properly to provide the needed data security/privacy?  

What was supposed to be the rollout of a simple certification tracker turned into a firm-wide data privacy initiative.  Although this certification data was relatively benign if it fell into the wrong hands, how would we handle global finance and audit data?  My company had an ethical duty to its customers, like a lawyer to her client, and we didn't take that lightly.  We finally landed on using a separate application instance running in the UK (but there was still connectivity to our US servers) and some data was allowed to be shared.  US Ops Guys were allowed to also administer the UK instance.  But the legal team worked on some elearning courseware that many of us had to pass that tested us on exactly what we could do with EU data and how to handle jurisdictional requests for data.  

I don't remember all of the rules and laws around this, but I do remember that IT persons (dev/ops/QA...it didn't matter) were forbidden to release EU data to any entity, regardless of warrant, without going through our Legal department first.  Legal made a point of telling us know that they "had our backs" if anything were to happen to us for not complying immediately with a request for data.  I did write about some of the things I learned in the training for a blog post I did regarding data retention.  

PS...don't think that this stuff never happens to the average IT guy.  I worked at this firm during the global financial implosion in 2008.  You may remember that certain quasi-banks were accused of shifting money to/from the US/UK, possibly for nefarious reasons.  I worked for a different team when that happened but I was part of the staff that was issued a subpoena to help reproduce some of those questionable transactions.  Reading those subpoenas and warrants gets you scared, especially when individual staff persons are mentioned by name.  Legal stepped in and put the brakes on the process until they could review the requests to ensure other jurisdictional laws were not being violated.  Meanwhile I was sweating trying to determine if we had the necessary logging to reproduce those transactions...and how I was going to get it...and what would happen to me if I couldn't.  

What's the point of all of this?  You can be well-versed in security and auditing and theories of "least privilege"...but none of this means anything if all of those safeguards can be overridden by governmental fiat.  It only takes one scared DBA to release the wrong data to a government agency and your entire business can be erased (it happened in 2008).  If you care about your data then understanding the legal ramifications of your architectures is even more important then understanding the security features of your vendors' software.  


You have just read "How secure is your data?" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Convention over Configuration

"Convention over configuration" (CoC) is one of the newer catch phrases you'll hear as a architect. I really don't know where this started but when I learned Ruby on Rails this started to make a lot of sense to me. Also known as "coding by convention" this is a design paradigm that seeks to limit the number of decisions that developers need to make, keeping things simple. When something needs to be unconventional you simply "manage by exception" and specify that which is unconventional. In other words, lengthy web.config files are no longer needed unless something is unconventional. This leads to less code, more simplicity, less bugs, less documentation, etc.  In this post I'll cover some areas for CoC improvement in SQL including one of my ideas that I've never seen proposed anywhere else that would eliminate TONS of bugs in SQL, not to mention cut down on needles SQL verbosity.  

We need more of this in SQL Server, RDBMS, and data technologies in general.  That's the gist of this post.  

An Example of CoC
Your programming language du jour probably has classes and some of those will loosely map to your database tables. Instead of a class being named "classEmployee" and the table being "dbo.Employee" we have a convention that "the class name is the table name"...unless of course it is specified as unconventional.  So the class would be Employee as would the underlying table.  Now there is no need to map this relationship explicitly.  Best practices suddenly become implicitly enforceable.  

The Need for CoC in SQL Server...Cursors

TSQL and SQL can be very verbose languages.  For the most part, these languages are declarative and not procedural.  If you can eliminate declaring some things by using sensible defaults, then you are improving the language.  IMHO.  Let's look at a simple example of where CoC could help us be less verbose...cursors.  After that I'll show you my biggest pet peeve with the SQL language and how I propose solving it using a CoC approach.  

How do you write cursors? I bet it's a tedious and error-prone process.  If you say that you NEVER use cursors and NEVER found a good use for them then stop reading, you are apparently a genius and I am not worthy of your time.  For the rest of us that know that cursors are sometimes the ONLY way to do something in tsql, read on.  

Here is a screenshot of a very hacky cursor.  It merely prints out a list of database objects and their type.  This is probably the simplest cursor you could create in SQL Server.  You'll notice a bit of verbosity.  For instance, the FETCH NEXT statement in repeated in the code.  No big deal, but we 

shouldn't have repeated code.  I'll bet you that at some point in your career you wrote a cursor and, in haste, forgot that second FETCH NEXT statement and put your code into an infinite loop.  

Ugh.  

You'll also notice on Line 1 that I was too lazy to see what the actual datatypes were for the cols in question so I just set them to varchar(2000) and varchar(200), which should be big enough.  Right?  I'll bet you've done the same thing in haste at some point in your career too.  

Cursors are just too darn tedious.  They follow a pattern (or you can use one of the templates in SSMS) but even so, it is still tedious, and therefore error-prone.  

One way we could eliminate the datatyping tedium, as well as remove the entire declaration of "cursor fetch variables" is if Microsoft gave us a shortcut 

similar to Oracle's %ROWTYPE.  In Oracle if you add %ROWTYPE to a variable the variable will assume the datatype of the underlying referenced object.  It does this by declaring a "record variable" over the input.  A bit of TSQL pseudo-code might help to illustrate this.  

First, you'll notice we no longer have to declare the individual variables for every column referenced in the cursor.  Our example only has 2 columns, but I'm sure you've written cursors with 10 cols being referenced, first in the "declare" section, then at least twice in the FETCH statements.  That's a lot of repeated code to screw up.  

Instead we declare a record type against the cursor on Line 22 and tell the record type that it should inherit the underlying columns and datatypes.  We can now reference the variable using dot notation on Line 26.  Using something like %ROWTYPE saves a lot of typing and it makes things a lot easier to read.  But it's still too wordy.  

What would be REALLY nice is something like this.  

Look at how nice and compact this is.  Further, it follows the FOR EACH...LOOP looping construct that every major language has these days.  Here I don't even need %ROWTYPE because I get the record object right from the cursor (cur) object.  No FETCH NEXT statements, no CLOSE/DEALLOCATE statements, no individual variable declarations for each column.  

Yes I know that this syntax severely limits your ability to use advanced cursor techniques.  This syntax is entirely forward only fetching only the NEXT row.  I'm cool with that.  I'll bet 99% of cursors you've written have been forward only, fetch-one-at-a-time cursors anyway.  And when you need that other functionality you can always fall back to the old TSQL cursor syntax.  

That's in the spirit of Convention Over Configuration...or "managing by exception".  

 

My Contribution to CoC - the KEY JOIN (quasi-NATURAL JOIN)

Here is my (I think) original contribution to CoC in SQL Server.  I've never seen this proposed...if it has been then I'm sorry and did not mean to steal anyone's idea.  It's called the KEY JOIN.  

How many times have you seen a query with a structure like this

No big deal right? I want to see all employees with the data for the department that they are assigned to.  

To me that query seems very "wordy". Think about it, 99% of the time when you are JOINing employee to department it is going to be by DepartmentId. There is no reason why it would ever be anything else.

And, IMHO, in about 99% of EVERY JOIN I've EVER written I've always JOINd the tables by the same cols...and those cols are the PK on the parent side to the FK on the child side.

The 1% of the JOINs that aren't by key are really edge cases.  Sometimes these are analytic queries and sometimes these are queries where I'm looking to find "fuzzy" matches so I don't want to use the key.  

For the other 99%, I see an opportunity for CoC.  We can manage the 1% as an exception.  

NATURAL JOIN?

Well, it just so happens that the ANSI SQL standard has a NATURAL JOIN syntax that seeks to alleviate the ON clause, thus aiding readability.  Here is the above query using the NATURAL JOIN syntax:

So much easier to read without that ON clause.

But few vendors (Oracle is one) support NATURAL JOIN. In fact, NATURAL JOIN is highly discouraged from use. Why?

  • a NATURAL JOIN actually joins tables by like col names, not DRI. 
  • it's generally thought that if a key col name changes then the NATURAL JOIN would break everywhere so the ANSI standard protects us from our own devices. But seriously folks, if you rename a key col you better realize that you are going to have A LOT of code that changes.  This is a ridiculous argument

To be clear, in the employee/department example if I used NATURAL JOIN the join would be by DepartmentId, assuming it is common in both tables.  But if Name is common in both tables then the join would be but that col as well...and clearly dbo.employee.Name is something entirely different from dbo.department.Name.  Also, many tables you work with probably have auditing cols that are always named the same CreUserId and CreDTime for instance.  In the NATURAL JOIN world those cols would also be part of the equi-JOIN, which is clearly not right.  

So NATURAL JOIN, while far more succinct, is worthless in modern RDBMSs, which is why it is discouraged from use.  Darn.  This is also why most RDBMS vendors don't even bother to support it.  

So, I would love to see something in the ANSI specification (or MS could just implement it as their own extension in TSQL) called something like KEY JOIN.  

KEY JOIN's Conventions

  1. A KEY JOIN will always join two objects by their DRI.  This will be a PK or Unique Key on the parent side and a FOREIGN KEY on the child side.  
  2. If DRI is not present between two objects referenced in a KEY JOIN, then an error should be thrown.  
  3. A KEY JOIN will assume INNER (INNER KEY JOIN) unless otherwise specified.  Just like JOIN is short for INNER JOIN, KEY JOIN is short for INNER KEY JOIN.  
  4. LEFT (OUTER) KEY JOIN would indicate an optional JOIN from tableA to tableB, following the same semantics as a LEFT OUTER JOIN.  
  5. RIGHT KEY JOIN and FULL KEY JOIN would work the same as RIGHT JOIN and FULL JOIN, except the ON clause would be assumed.  
  6. If an ON clause is found during query parsing with a KEY JOIN then an error should be thrown.  

Can anyone see a downside to KEY JOIN?  I can't.  This would really solve readability and errors due to accidentally picking the wrong join condition.  Here is a query I wrote that attempts to show all column names for all tables.  But it's not working.  Why?

Yeah, the ON clause is wrong.  That was probably easy to spot check, but as your queries grow larger your odds of JOINing incorrectly increase and the ability to spot the bug becomes more difficult.  With a KEY JOIN it is almost impossible to screw the query up.  That's just much easier to read.  

I have no clue why this isn't part of the ANSI standard.  

 

Lastly...CREATE or REPLACE

Here's another pet peeve of mine with SQL Server...the verbosity of simple tasks like deploying a new version of a procedure in a database.  I'm sure EVERY stored proc you've EVER written followed some pattern like this:

Maybe you construct Line 1 a little different, but I'm sure the basic pattern is the same.  

Why oh why can't we just have this?

Oh well.  

How can you spot an opportunity for CoC?

I like to write code using known patterns that I've used for years.  Those patterns (like the cursor example) work 99% of the time and I don't need to ever think about the verbosity of the code I'm trying to write.  I just use the pattern.  In the past few years I've realized that patterns really aren't that good after all.  A pattern is just needless verbosity and configuration that will lead to subtle errors.  If we had a shorthand for the given pattern then we could make that the convention and merely manage by exception.  You should look for patterns in your own work that you can factor out and set up as an implicit convention.  This will save you a lot of bug grief and make your code easier to document.  

I've proposed and up-voted many of the above items on Connect (such as %ROWTYPE) and Microsoft just doesn't care.  As for KEY JOIN...well, one day I'm going to seriously propose that.  

 


You have just read Convention over Configuration on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Job req o' the day

Saw this in an email from a recruiter

JASON is a .net web service now?  Retirement must be tough.

And LINK?  Um, can't think up a snarky comment for that one.  

Back in the dot com days I got probably 5 calls a day from recruiters wanting someone that knew "ESS-QUE-ELL" Server.  "Nope, sorry, don't know it, bye."  I won't talk with a recruiter that doesn't know the absolute basics of our industry, like how to pronounce the technology.  If the recruiter doesn't know the basics of the technology then how can she fulfill her client's requisition?  These recruiters are relying on the client to have a perfect req prepared and the recruiter functions solely as the middle man, trying to mine for gold.  But you'll never find gold if you don't know what gold looks like.  These recruiters cold call possible candidates by scouring job boards and LinkedIn.  Sorry, but I've wasted far too many hours taking phone interviews for ESS-QUE-ELL Server jr dba positions that I would've had no interest in if the recruiter would've understood the client's ask.  

(I don't know this Jason either)

Now, I've heard people use the auto mechanic analogy to dispute my contention that recruiters should understand what they are recruiting.  "You don't need to know anything about an internal combustion engine to go to a mechanic when your car won't run."  

Wrong analogy.  That analogy would apply if a business person asked me how to write a query.  I expect a business person not to know technical matters.  That's why they hired me.  And I hire a mechanic because I don't know engines.  

A better analogy for a recruiter is a used car dealer.  "I won't buy a used car from a dealer that can't tell me if the car has a V-8 or a V-6."  Exactly.  The used car dealer functions in an "agency" capacity (he finds you a car that meets your requirements) , just like the recruiter finds a candidate that meets the client's requirements.  

Yes, that's right, I just compared technical recruiters to used car salesmen.  But it's true.  I am buying a product from a recruiter, I expect him to have a cursory knowledge of that which he is selling.  

Not all technical recruiters are bad, just the ones that breath.

(kidding of course)

(Jason Sudeikis?...wait...no, I don't know this Jason either.)


 

You have just read "Job req o' the day" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Pages

Subscribe to RSS - Data Architecture