TSQL Unit Testing Tool Comparisons

 

I once had a manager who liked to say, "I want to see checked-in code with tests that prove it doesn't work."  Very Yogi Berra-esque.  

I'm a big proponent of unit testing my TSQL code.  The questions then become:

  • What tools are available for the data professional?  
  • Which tool is the "best"?  

Tool Availability

First off, there are a lot of unit testing tools that handle database testing but are not necessarily TSQL unit testing tools.  An example is DbFit, which allows your Fit/FitNesse tests to actually execute against your database.  This is cool, and valuable, but it really doesn't help me unit test a stored procedure.  Another example is PSUnit, which allows you to write unit tests for your PoSH code.  I've seen lots of people, myself included, use PSUnit as a basic assertion framework to test tsql.  Again, this is cool, but it isn't a tool designed specifically for testing tsql.  Frankly, any tool that follows the "Arrange/Act/Assert" model can be retrofitted to call a database and unit test your SQL.  

So, for this blog post, I'm not going to include unit testing tools that only peripherally touch the database; I only want to compare unit testing tools that specifically focus on TSQL.  A very strong argument can be made that db unit tests should be done in something like DbFit so that the db unit tests are stored and executed with the Java tests and code.  This is exactly how DataDude handles it.  I understand the rationale; I am simply questioning whether this is always the best approach.  In many organizations the db experts are not the Java experts, so keeping the tests separate makes a lot of sense in some circumstances too.  Further, I want the best tsql unit testing tool, not the best unit testing tool for my development environment that also handles tsql.  

What Makes a Good TSQL/database TDD Tool

These are just my opinions:

  • Adheres to the AAA (Arrange, Act, Assert) model for unit testing.
  • Has a robust assertion engine specific to the needs of a data developer.  For example, support for asserting that a result set matches the expected result set, down to the column ordinals, names, and datatypes, is important.  
  • Does not require injecting special assertion or debugging code into your stored procedures.  We don't need that stuff in our production code.  
  • Makes mocking data simple.  
  • Can mock views and stored procs so they return a given output for testing.  
  • Supports automatically setting up (Arranging) and tearing down test data without being destructive to existing data in the database, so I don't need to backup/restore a test database to run my tests.  (Never run unit tests on a production system.)  
  • Shouldn't require you to learn another language or tool, such as PoSH, to write or run tests.  It should just work in SSMS.  
  • Generates standard output that is human-readable and automatable so other CI tools can call it.  Bonus points if it can return HTML/XML so I can plug the output into a build automation tool's standard reporting.  
  • Comes with good documentation and tutorials.  Most data professionals have no idea how to start with TDD and unit testing their TSQL code.  (You could probably make the same case for Java jocks too.)  

Evaluating The Available Tools

The only two tools I've found that are specifically unit testing tools for tsql and stored procedures are TSQLUnit and TSQLT.  

Feature | TSQLUnit | TSQLT
------- | -------- | -----
Open source (so it's easy to look at the source code and extend it for your needs) | Yes | Mostly.  The table assertion functionality is in CLR code that is not open source; the rest of the product is written entirely in tsql.  You are using tsql to test tsql.
Mocks for data | Yes | Yes (tsqlt.FakeTable)
Stored proc and view mocking | No | Yes (tsqlt.SpyProcedure).  This is very ingenious.
Can "arrange" and clean up test data | Yes | Yes
Can compare result sets | Yes | Yes
Test output to XML/HTML | No | Yes
Tests can be written and executed entirely in SSMS | No, requires Python | Yes.  There is even a plugin helper available from RedGate.
Support from industry vendors | Not that I'm aware of | RedGate's release of SQLTest is fully supported
Documentation | Kinda sparse | Lots of documentation and tutorials
Support (and ongoing viability) | Doesn't seem to have much new development lately | The two authors actively respond to bug and feature requests.  (Full disclosure: I was a co-worker with one of the authors.)
Deployment | Test code lives in the dbo schema with your production objects | Uses separate schemas that map closely to test classes, just like Java.  They are only deployed when your tests are run; if you follow the test class creation pattern in their tutorial, no testing artifacts remain in your database afterward.

 

Clearly I think the world of TSQLT.  Even if you don't do TDD or don't understand unit testing, you really should download the tsqlt code and peruse it.  I learned quite a bit of transact-sql by reading through their code; it's just totally ingenious.  If you are a database guy and are not familiar with TDD or unit tests, try using it when you write your next stored proc.  Your unit tests will likely be amateurish, but your code will be much easier to maintain and will have fewer bugs.  In a few years you'll look back and laugh at your early unit tests.  That's OK; it's a journey.  
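If you've never seen what a tsqlt test looks like, here is a minimal sketch.  The table, function, and test names are all hypothetical; the tsqlt calls themselves (NewTestClass, FakeTable, AssertEqualsTable, Run) are the real API.

-- create a test class (a schema) to hold related tests
EXEC tSQLt.NewTestClass 'testOrderTotals';
GO

CREATE PROCEDURE testOrderTotals.[test GetOrderTotal sums the line amounts]
AS
BEGIN
    -- Arrange: swap the real table for an empty, constraint-free copy
    EXEC tSQLt.FakeTable 'dbo.OrderLine';
    INSERT INTO dbo.OrderLine (OrderId, Amount) VALUES (1, 10.00), (1, 15.00);

    SELECT CAST(1 AS int) AS OrderId, CAST(25.00 AS money) AS Total
    INTO #expected;

    -- Act: dbo.GetOrderTotal is a hypothetical scalar function under test
    SELECT OrderId, CAST(dbo.GetOrderTotal(OrderId) AS money) AS Total
    INTO #actual
    FROM (SELECT DISTINCT OrderId FROM dbo.OrderLine) AS o;

    -- Assert: compares the columns and rows of the two result sets
    EXEC tSQLt.AssertEqualsTable '#expected', '#actual';
END;
GO

-- run every test in the class
EXEC tSQLt.Run 'testOrderTotals';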

Autonomous Transactions

IMO the most obnoxious shortcoming of Transact-SQL (after maybe the goofy error-handling that won't allow you to rethrow any and every error) is the lack of native autonomous transactions.  I wrote about this here and here.  I've only ever found 2 good reasons for autonomous transactions, but they are HUGE:  

  1. The ability to ROLLBACK a long-running transaction but keep any error logging for later analysis.  The workaround, which has a lot of obnoxious limitations, is to log to a @TableVariable and persist that to an actual table after you have done your ROLLBACK (see the sketch after this list).  
  2. The ability to spawn off a separate transaction to get more keys from a central keystore.  In SQL 2012 we have SEQUENCEs, which function the same way; unfortunately, not all ORMs and custom-built in-house code will be able to use them, at least not right away.  
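Here is a minimal sketch of the workaround for #1.  Table variables are not affected by ROLLBACK, which is exactly what makes this work; dbo.ErrorLog and the column names are hypothetical.

DECLARE @ErrorLog TABLE
(
    ErrorNumber  int,
    ErrorMessage nvarchar(4000),
    LoggedAt     datetime2 DEFAULT SYSDATETIME()
);

BEGIN TRAN;
BEGIN TRY
    -- ...long-running work happens here...
    RAISERROR('something went wrong', 16, 1);
END TRY
BEGIN CATCH
    -- a table variable's data survives the ROLLBACK below;
    -- a #temp table's or a real table's data would not
    INSERT INTO @ErrorLog (ErrorNumber, ErrorMessage)
    VALUES (ERROR_NUMBER(), ERROR_MESSAGE());
    ROLLBACK;
END CATCH;

-- persist the captured errors now that the transaction is gone
INSERT INTO dbo.ErrorLog (ErrorNumber, ErrorMessage, LoggedAt)
SELECT ErrorNumber, ErrorMessage, LoggedAt
FROM @ErrorLog;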

The problem with number 2 is that we can't write stored procedures that spawn the separate transaction needed to get more keys.  Obviously there are more uses for autonomous transactions, but let's continue the demo.  Here I have a call to a key generator from within a transaction.  I ask for 10 keys and I can see my LastUsedObjId has been incremented to 645, as expected.  
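The original post showed this step as a screenshot; below is a rough reconstruction.  The key-generator proc (dbo.GetNextObjIds) and the keystore table (dbo.ObjIdKeys) are hypothetical stand-ins.

BEGIN TRAN;

-- ask the central keystore for a block of 10 keys
EXEC dbo.GetNextObjIds @NumberOfIds = 10;

-- the keystore's high-water mark has moved as expected
SELECT LastUsedObjId FROM dbo.ObjIdKeys;   -- 645

-- the transaction is deliberately left open for the blocking demo below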

Now let's assume that call is a long-running query, and that another similar call is made from another connection/transaction while the first spid is still executing.  What will happen?  Blocking...

 

OK, let's issue a ROLLBACK on both connections.  It turns out that SQL 2008 and above can give you an autonomous transaction if you are OK with using a loopback linked server.  Here I create a loopback linked server and ensure that RPC OUT is set.  The magic is the 'remote proc transaction promotion' option: setting it to false ensures that MSDTC does not enlist calls through this linked server in a distributed transaction.  Essentially, each call is spawned off as an autonomous transaction for you.  
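The setup was also shown as a screenshot; here is a sketch of the equivalent script (the name LOOPBACK is arbitrary):

DECLARE @self sysname = @@SERVERNAME;

-- create a linked server that points back at this very instance
EXEC master.dbo.sp_addlinkedserver
     @server = N'LOOPBACK',
     @srvproduct = N'',
     @provider = N'SQLNCLI',
     @datasrc = @self;

-- allow stored proc calls through the linked server
EXEC master.dbo.sp_serveroption N'LOOPBACK', N'rpc out', N'true';

-- the magic: never promote calls through this linked server
-- into an MSDTC distributed transaction
EXEC master.dbo.sp_serveroption N'LOOPBACK', N'remote proc transaction promotion', N'false';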

So, let's demo this.  All we need to do is alter our original code slightly to call the linked server.  Here is how I do it:  
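Reconstructing the screenshot again, the altered call looks roughly like this (same hypothetical proc and table as above):

BEGIN TRAN;

-- build the stored proc call as a four-part name; nothing is hard-coded,
-- so the same script works in dev, test, and prod
DECLARE @proc nvarchar(400) =
    N'LOOPBACK.' + DB_NAME() + N'.dbo.GetNextObjIds';

-- this call runs in its own autonomous transaction on the loopback connection
EXEC @proc @NumberOfIds = 10;

SELECT LastUsedObjId FROM dbo.ObjIdKeys;   -- 655 -> 665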

Notice I build the stored proc call in a variable that simply assembles the four-part name.  Nothing is hard-coded, so it works in test, prod, or dev environments, which aids supportability.  Note that my LastUsedObjId was 655 and it was correctly incremented to 665.  Now I open a new window/spid, paste the code, and execute it:  

Note that the spid is NOT blocked this time.  That's because the call to the sequence generator ran in an autonomous transaction.  Also note that the new spid correctly saw the first spid's updated LastUsedObjId and properly incremented its own LastUsedObjId. 

That's what we wanted.  

How Big is a BIGINT?

8 bytes.  Who cares?

Like many shops, we use a BIGINT datatype as a sequence for Ids in our tables.  We generate lots and lots of these every second.  There's now a requirement where we may have to generate hundreds of thousands every few seconds.  People started sweating: "Are we sure that we won't overflow a BIGINT after a few years if we are creating this many Identifiers so quickly?"  

Let's do the math.  

How many Identifiers can I create EVERY SECOND, before I will run out of Identifiers in 100 years?  
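Here's the arithmetic, taking a year as 365.25 days:

-- max BIGINT: 2^63 - 1 = 9,223,372,036,854,775,807
-- seconds in 100 years: 100 * 365.25 * 24 * 60 * 60 = 3,155,760,000
SELECT CAST(9223372036854775807 AS decimal(38, 0))
       / (100.0 * 365.25 * 24 * 60 * 60) AS IdsPerSecond;
-- about 2.92 billion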

About 2.9 billion identifiers can be created per second for the next 100 years before we overflow a BIGINT.  

And that's assuming we ONLY use the positive side of ZERO.  We didn't even calculate the negative Identifiers we could create.  Effectively this doubles the available Identifiers we have.  

In my mind, BIGINT really is a BIG datatype.  

Object Relational Mappers Series - So You Want to Use an ORM Anyway

This is the last post in my series on ORMs.  Even after all of the arguments against ORMs, everyone wants to use them.  It seems like there are always two reasons for this: 1) the lure of keeping developers from having to learn SQL is just too great, and 2) the belief that the issues with ORM tools won't happen to me.  In every company I've been involved with, when an ORM was proposed, it was eventually implemented regardless of my protests.

Object Relational Mappers Series - The SELECT * Problem

DBAs have been bitching for years that we should never use SELECT * in our queries (or INSERT INTO statements without a column list).  ORMs, usually by default, violate this principle.  

More on the Halloween Problem

I wrote a few weeks back about Eager Spools in Query Plans and the Halloween Problem.  Three days after I wrote that, Paul White put up an ENTIRE SERIES on the topic at sqlperformance.com.  He is much more thorough and authoritative than I could ever hope to be.  In a nutshell, the Halloween Problem occurs when we read and write the keys of a common index; there are other causes as well.  

I just wanted to pull out a few gems that I either learned or found very important from Paul's work.  I encourage anyone who works on tweaking performance of TSQL to spend some time reading Paul's findings on the Halloween problem.  

  • The Eager Spool reads every row from its child operator and stores them in a temporary structure before proceeding to the next step.  Under memory pressure this means tempdb spills to disk.  
  • One alternative is an index hint so we avoid using the index that requires Halloween protection, i.e., the index whose keys are unstable.  
  • A query plan requiring Halloween protection will take more locks, and hold them longer, than an alternative plan.  
  • If an UPDATE statement modifies one of the keys of a composite index, you may consider making that column an INCLUDED column instead.  
  • Halloween protection is required for INSERT statements where the target table is also referenced in the SELECT statement.  
  • HP is required for DELETE statements with self-join relationships.  
  • If your performance issue is an Eager Spool on an INSERT...WHERE NOT EXISTS (<in table already>) query (Paul calls this a "hole-filling query"), you can try a MERGE statement instead (see the sketch after this list).  The rules:
    • the WHEN NOT MATCHED BY TARGET clause must EXACTLY match the ON clause of the USING clause
    • the target must have a unique key
  • You may see an Eager Spool in a query that uses a scalar function, even if that scalar function does not access any tables.  This happens because SQL Server must assume that any scalar function not declared WITH SCHEMABINDING may read from one of your tables, so it must add Halloween protection.  The solution is to add SCHEMABINDING (also sketched below).
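Here is a sketch of the hole-filling rewrite with hypothetical tables (dbo.Target must have a unique key on Id):

-- the INSERT...WHERE NOT EXISTS form, which may plan an Eager Spool:
INSERT INTO dbo.Target (Id, Payload)
SELECT s.Id, s.Payload
FROM dbo.Source AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.Id = s.Id);

-- the MERGE form; note that WHEN NOT MATCHED BY TARGET is driven by
-- exactly the same predicate as the ON clause
MERGE dbo.Target AS t
USING dbo.Source AS s
    ON t.Id = s.Id
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Payload) VALUES (s.Id, s.Payload);

And the SCHEMABINDING fix for the scalar function case (dbo.AddVat is made up):

-- without SCHEMABINDING, SQL Server assumes the function might read your
-- tables and adds Halloween protection; declaring it removes that assumption
CREATE FUNCTION dbo.AddVat (@amount money)
RETURNS money
WITH SCHEMABINDING
AS
BEGIN
    RETURN @amount * 1.21;
END;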

CREATE or REPLACE pattern for Transact SQL

I just saw a blog post by someone discussing the pros and cons of DROP/CREATE vs ALTER for changing a stored procedure.  I was not able to comment on the post (there seems to be an error with their CMS), but I wanted to mention some issues with it and some better solutions.  

I generally use DROP/CREATE most often in my work, but I'm trying to change that.  

  • If you have zero-downtime requirements, you risk spurious errors with DROP/CREATE: in the window between dropping the proc, recreating it, and issuing the GRANTs, your online users could see errors.  
  • The author mentioned that using ALTER requires dynamic SQL.  That's not true; below I show the pattern I use.  Dynamic SQL is difficult for many developers to write and read, and doubling up quotes tends to be buggy.  
  • Oracle has CREATE OR REPLACE PROCEDURE syntax to avoid all of this.  I wish MS would give us this; we would not need to do existence checking in metadata tables before creating or altering a routine.  

In any case, here is the pattern I use; it is very close to Oracle's CREATE OR REPLACE PROCEDURE syntax.  
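The code block from the original post didn't survive, so here is the common form of the pattern with a placeholder proc.  A stub is created only if the proc doesn't already exist; everything after that is a plain ALTER, so GRANTs survive and the body itself never needs dynamic SQL:

-- the only dynamic SQL is this one-line stub (CREATE PROCEDURE must be
-- the first statement in a batch, so it can't sit inside the IF directly)
IF OBJECT_ID(N'dbo.MyProc', N'P') IS NULL
    EXEC (N'CREATE PROCEDURE dbo.MyProc AS RETURN 0;');
GO

ALTER PROCEDURE dbo.MyProc
    @SomeParam int
AS
BEGIN
    SET NOCOUNT ON;
    SELECT @SomeParam AS EchoedValue;   -- real body goes here
END;
GO

-- grant once to your app role (placeholder name); unlike DROP/CREATE,
-- ALTER preserves existing permissions
GRANT EXECUTE ON dbo.MyProc TO SomeAppRole;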

Object Relational Mappers Series - The N+1 SELECTs Problem

This is the next post on ORMs.  Suppose you have a collection of Category objects, which are really just rows in a Category table.  Each Category has a collection of Products.  It's not uncommon to want a listing of Categories with their Products.  In SQL we would have code that looks something like this:  

SELECT *
FROM Category c
JOIN Product p ON c.Id = p.CategoryId

This will run fairly fast.  An ORM, however, will likely do something like this:  
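The example was cut off in the original, but the classic N+1 pattern it describes looks like this:

-- 1 query to fetch the parent rows...
SELECT * FROM Category;

-- ...then N more queries, one per Category, to lazily load each
-- Products collection:
SELECT * FROM Product WHERE CategoryId = 1;
SELECT * FROM Product WHERE CategoryId = 2;
SELECT * FROM Product WHERE CategoryId = 3;
-- ...one round trip per Category row, instead of a single JOIN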
