DaveWentzel.com            All Things Data

July 2010

Doing Database CIT via TFS

Continuous Integration Testing (CIT) is an agile concept where you are constantly testing your code checkins to ensure they build, compile, and have successful unit tests.  Notification of failures for smaller shops is usually just an email to an SCM who then tracks down what broke using the version control system and notifies that person.  In larger shops a special CIT machine will sit in the scrum room with a big red or green light indicating if the builds are passing CIT.  When it goes red someone usually steps up to determine cause. 

In most shops this CIT concept is a no-brainer for the application code (tools like Apache Ant and CruiseControl really help with the automation and scheduling) , but is rarely done for the database scripts.  One reason is that most applications have a database build process that doesn't lend itself to CIT-style processing easily (for instance, requiring db-level object diffs to determine what needs to be applied).  My database build process is more of an evolutionary process, so by definition it does lend itself to CIT and change quite nicely.  

This post is the next in a series of posts showing useful TFS tips for the data professional.  I'll cover how I created a very simple script that handles database CIT using TFS as the source control repository.  You could also plug in a different VCS quite easily. It uses vbscript and batch files but PowerShell or a compiled application would work just as well, the concept is the important thing.   

I have the actual files available here for download.  Here is the process:

A controlling batch file (called RunCIT.bat) is scheduled using AT.exe or similar scheduler, this handles the automation component.  Let's look at some of the interesting points of the batch file...

Lines 1 - 14...the batch file has a series of parameters that tell it which server to run the db scripts against, user/pwd, location of the logs, and the TFS connection string.  

It then builds a loop (line 19) and checks  "branches.txt" (a csv file) for a list of branches that we want to run CIT against.  In our environment we have at least 5 active code/release branches at any time so I want to run CIT against all branches.  Our SCM can change this file whenever a new branch is created or deprecated.

As a side note, if you've ever tried to do looping in a batch file, it's a real PITA.  If you've ever tried to expand your DOS variables within that loop you know it's almost impossible.  I found a nifty workaround to make this happen using "SETLOCAL ENABLEDELAYEDEXPANSION" and alternative variable naming schemes.  See Lines 3 and 27 for examples. 

Within the loop we then run a series of tf.exe commands to Get Latest for the source code for that branch.  This means that your build machine needs to have Visual Studio loaded on it (unless someone knows how to get the tf.exe commands to run without it).  

The scripts for the given branch are then executed, first against a "net new" database and then a "ported" (upgraded) database, each is set up via manipulate_db.v-b-s.  I've found that naming a vbscript file as anything other than .vbs and then calling it with cscript/wscript is an easy way to bypass UAC on newer Windows OSs

Why run CIT against a "new" and "upgraded" database?

Unlike C# code, where I simply need to test if I compile and my unit tests succeed, a db build is a little more tricky because I need to be concerned about data history and upgrading data.  If I'm changing a UNIQUE INDEX from ColA,ColB to just ColA I can't just CREATE INDEX WITH DROP_EXISTING since it's possible an existing db will have violations.  I need to test that.  If I'm adding ColC to a table I need to remember to add that to the CREATE TABLE statement (for new databases) and an appropriate ALTER for upgraded dbs.  In each case the "new" and "upgraded" database take different codepaths through the sql scripts. 

So your upgraded or "ported" db should probably be a customer db with some substantial data.  You can even run against multiple upgraded dbs, perhaps one from each version of your software that you may have to upgrade. 

But the key is Line 73, GenerateDiscrepancyReport.vbs.  This compares netnew vs ported dbs and reports on missing (or schematically diferent) tables, views, columns, indexes discrepancies (different cluster keys, cols), and it will even report any data changes if you have defined system or master data tables that you ship with your db.  This is handled via 2 stored procs (here and here). 

Email is sent on success and failure with the logs attached. 

Happy Testing!!

How I automate my mornings with TFS

This is the next installment on my "Tips for using TFS" for a database person.  Every morning I need to Get Latest and Build/Debug our Visual Studio solution.  To do this via the IDE is a PITA considering I never change C# code (I'm a data architect after all).  With VS2005 I simply opened the solution and then did a build and debug.  It took about 30 minutes, during which I couldn't look at anything else in TFS without starting another instance of the IDE (and I rarely have RAM for that).  VS2010 I think is a real dog...it's resource-hungry and it seems to crash a few times a day, even for the simplest of tasks that I do.  So my solution is to avoid the IDE entirely.  

Here is how I automate my mornings using TFS (script) (callout).  This script does a get latest, followed by a build/debug, followed by updating my database(s) using my database build process.  I can do this against each branch as needed as well.  Let's look at the script... 


How to Make ROW_NUMBER Faster

I wrote a little bit about ROW_NUMBER when it first came out.  Officially this is a ranking/windowing function but I'd bet 99% of developers use it to assign an IDENTITY to an result set.  This may have a performance penalty however.

Row Constructors

Starting with SQL Server 2008 we now have row constructors.  "Row constructors"....wow...just using that phrase in a sentence at your job should be worth at least a 5% raise.  Think of row constructors as a string of VALUES clauses that will INSERT multiple rows into a table.  Here's an example.  But, there's an even cooler use.


Cool Uses For Windowing (Ranking) Functions

This is my first in a series of blog posts on window functions in SQL Server.  You are probably most familiar with ROW_NUMBER() OVER().  These are new features as of SQL 2005.  

The SQL:2003 specification now standardizes windowing functions.  Even Oracle and PostgresSQL have implemented them.  Window functions are defined as aggregates computed over a window.  Wow, that's helpful.  "An epistemologist is someone who studies epistemology."  Windowing can be thought of as an aggregation without the result set being filtered where the aggregated data is "mixed in" with the query result set.  Sometimes a few examples can drive the point home.  In the next few posts in this series I'll cover some really cool uses for windowing functions.  

Questions to ask about your company's 401K

Do a google search and you'll see tons of advice on how to handle your 401K investments, what to invest in, how much to set aside, why you shouldn't borrow against your plan, and other pedestrian topics.  But what if your plan is so hideous that you want to approach your committee about radical changes?  Do you even know if your plan is hideous?  I did some research and here's what I found.


TFS Power Tools

Continuing my series on TFS Tips, I thought I'd point out TFS Power Tools.  These are many 2 favorite tools...

I come from a Subversion background and love TortoiseSVN and it's shell integration features. I find that I can't do serious db development from within Visual Studio, I really need SSMS (or TOAD when I'm developing Oracle).  The process becomes tedious:


T Ball Notes

With T Ball wrapping up I wanted to jot down some notes for ideas I had for games and practices, in case I ever have to coach again.  You can find this scattered around the internet but most are paid sites and the rest have suggestions that aren't really workable for kids under age 12 (you really expect T Ballers to practice turning double plays??)


Agile and Scrum tools for TFS

I just stumbled across some nifty tools for those of us trying to practice agile/scrum in a Visual Studio world.  

SQL Server Data Files

I think I've proved earlier (as have others on the net) that autogrowth of data files is *not* your friend.  However, that seems to beg the question...then are more data files better than fewer data files, and if so, what's the magic formula?