Useless Unit Tests

In my last post (Paradox of Unit Testing?) I mentioned that unit tests can give you a false sense of security.  This is especially true if you don't know how and what to unit test.  My examples here will be in pseudo-SQL, not tsqlt.  The concepts are equally applicable to junit or nunit or whatever.  The following is a list of useless unit tests that I see every day.  This list is not exhaustive.  

Having Too Many Tests

How many tests are too many and how many tests are too few?  This is totally subjective.  Assume the following code:
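Something like this minimal sketch (the procedure and test values are illustrative):

    CREATE PROCEDURE dbo.PrintValue (@Value bit)
    AS
    BEGIN
        PRINT CONVERT(varchar(10), @Value);
    END
    GO

    -- six "unit tests" that only exercise parameter conversion rules:
    EXEC dbo.PrintValue @Value = 0;       -- prints 0
    EXEC dbo.PrintValue @Value = 1;       -- prints 1
    EXEC dbo.PrintValue @Value = NULL;    -- prints nothing: NULL
    EXEC dbo.PrintValue @Value = 2;       -- any nonzero number converts to 1
    EXEC dbo.PrintValue @Value = 'true';  -- the string 'true' converts to 1
    EXEC dbo.PrintValue;                  -- error: parameter '@Value' was not supplied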

It's very simple: it takes a bit parameter and PRINTs it to the "standard output."  Note that I included six unit tests that attempt to exercise exactly what happens when I pass in different things for @Value.  But is any of this valuable?  No.  Anyone with some SQL Server experience knows how parameter rules for stored procedures work.  These six tests add no value.  They do not exercise the code and they add no documentation value.  Yet you will often see procedures with hundreds of tests...and many, if not most, are tests like this.  I think developers feel that the more unit tests they have, the better their code will be received by their peers.  Reading hundreds of tests is not useful.  When I write unit tests, if I find I have too many (again, "many" being subjective), I consider breaking the logic up into multiple routines, each with a smaller concern.  

Commutativity and Associativity Tests

This is a contrived example, but I see unit tests where the net effect of the test, usually unintentional, is testing commutativity.  Consider a trivial addition procedure:
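A minimal sketch (names are illustrative):

    CREATE PROCEDURE dbo.AddTwoInts (@a int, @b int, @Sum int OUTPUT)
    AS
    BEGIN
        SET @Sum = @a + @b;
    END
    GO

    -- the pointless "commutativity" test:
    DECLARE @r1 int, @r2 int;
    EXEC dbo.AddTwoInts 2, 3, @r1 OUTPUT;
    EXEC dbo.AddTwoInts 3, 2, @r2 OUTPUT;
    -- assert @r1 = @r2 ... of course they're equal; addition is commutative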

You may see tests like "assert that passing in arguments in a different ordinal order generates the same results."  In the case above, that kind of test is pointless.  Commutativity and associativity can safely be assumed by the developer of a stored procedure or a Java class.  You shouldn't need to assert that the language upholds the rules of arithmetic.  

Attempting to Assert an Invariant and the Null Hypothesis

There are some big words there.  An "invariant" is some element of your code that cannot, or should not, change.  It is impossible to test such an invariant using a unit test.  The correct place is to assert your invariant directly in your code.  

Why is it impossible to test that an invariant is invariant in a unit test?  Simple logic dictates that if you could prove an invariant can vary, then it is NOT truly an invariant, and the entire premise of what you are testing is flawed...your code is flawed, your tests are flawed, possibly even your architecture, framework, and design are flawed.  A "varying invariant" can only be discovered "in the wild," and when that happens you realize your assumptions and code are flawed.  

This is akin to "null hypothesis testing."  I may not be able to prove that my code never breaks in every (possibly infinite) scenario, but that does not prove it is flawless.  Conversely, you need to find only ONE case where my code fails to prove that it is flawed.  In other words, I may never be able to exhaustively prove "correctness," but a single counterexample proves falseness.  

OK.  Enough theory.  Assume this piece of contrived code:  
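A sketch of the sort of procedure I mean (the name AbsPlusOne reappears below):

    CREATE PROCEDURE dbo.AbsPlusOne (@Value int)
    AS
    BEGIN
        SELECT ABS(@Value) + 1 AS Result;
    END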

Clearly, following the principles of mathematics, the result must always be >= 1.  It can be nothing else.  We know this because, a priori, an absolute value function will always return a non-negative value and adding 1 to that result must therefore give me at least 1.  Knowledge that is a priori is simply knowledge that does not need experiential justification.  It just "is".  

So, given the laws of addition and absolute value, we know that the above procedure will always return a value >= 1.  Yet you may encounter developers who insist on writing unit tests to prove that.  It can't be done.  The result is an invariant: it will always be >= 1.  Adding a unit test called something like "assert that the result set is always >= 1" is, first, not actually testable (you would need the null hypothesis; absent a counterexample the assumption must be true), and second, it just clutters up whatever valuable unit tests may exist.  
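If the invariant truly matters, the place to assert it is inline in the code, not in a unit test.  A minimal sketch (THROW requires SQL Server 2012 or later):

    ALTER PROCEDURE dbo.AbsPlusOne (@Value int)
    AS
    BEGIN
        DECLARE @Result int = ABS(@Value) + 1;
        -- assert the invariant where it lives: in the code itself
        IF @Result < 1
            THROW 50000, 'Invariant violated: result must be >= 1.', 1;
        SELECT @Result AS Result;
    END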

Testing Language Elements (a posteriori tests)

First, a definition of a posteriori...a posteriori knowledge is experiential knowledge...you need to experience it, or empirically test it, to understand it.  This is the category of useless unit tests that I see the most.   Recall our AbsPlusOne procedure from earlier.  I ALWAYS see unit tests that exercise what happens when the procedure is passed the wrong data type.  
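For example, against the AbsPlusOne sketch above:

    EXEC dbo.AbsPlusOne @Value = 'not a number';
    -- Msg 8114: Error converting data type varchar to int.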

You hopefully already knew that passing a string to an int parameter generates an error, as shown above.  If you really knew your TSQL esoterica, then you knew this would generate Msg 8114 with that EXACT text.  Most of us don't know our esoterica that well; we simply know that we should never, ever pass a string to an int.  How do we know that?  Through experience...basically a posteriori knowledge.  

So the question becomes, is this a valid unit test?  Should a unit test assume the reader has no knowledge of how the language works?  I say no.  If you don't know TSQL, then the unit tests are way over your head.  These unit tests are just noise to the serious developer who is trying to figure out what your procedure does by looking at your comments and unit tests.  

Here is a more blatant example:
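A sketch of what I mean (the procedure and its five "tests" are illustrative):

    CREATE PROCEDURE dbo.IsPositive (@Value int)
    AS
    BEGIN
        IF @Value > 0
            PRINT 'positive';
        ELSE
            PRINT 'not positive';
    END
    GO

    -- five "tests" that merely assert that IF branching works:
    EXEC dbo.IsPositive @Value = 1;     -- assert output is 'positive'
    EXEC dbo.IsPositive @Value = 100;   -- assert output is 'positive'
    EXEC dbo.IsPositive @Value = 0;     -- assert output is 'not positive'
    EXEC dbo.IsPositive @Value = -1;    -- assert output is 'not positive'
    EXEC dbo.IsPositive @Value = NULL;  -- assert output is 'not positive' (NULL > 0 is not true)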

Here we have five tests for a ridiculously simple procedure.  Simplicity alone does not mean unit tests are unnecessary; I've written many simple procs with a handful of tests that help document assumptions for the next developer.  But in this case we have five tests that simply assert that IF branching works as advertised by Microsoft and Sybase.  Those tests are not helpful; they just add clutter and noise.  

Paradox of Unit Testing?

A quick google search shows no hits for "paradox of unit testing," but there are equivalents, so I don't claim to have invented the term.  What I'm about to tell you is so patently obvious that it galls me that no one else seems to see this and attempt to change it.  My "paradox of unit testing" is quite simple: the more we unit test, the worse our code quality is and the less software we sell.  Please don't construe this to mean I am against testing or unit tests.  I'm not.  I'm against excessive testing that shows no value.  And automated testing may actually be making your product worse!  
 
Over-Reliance on Automated (Unit) Testing Tools
For the last half decade, every one of my employers has tried to automate more and more of the QA work.  True, for the small (less than 5%) portion of the code that the automated tests actually cover, the code is essentially bug-free.  However, a machine cannot test for bugs it doesn't know how to test for.  Here's a well-known story that you can google if you don't believe me.  When M$ was developing the Vista release of Windows, there was an internal management push to use automated testing tools.  The reports indicated that all of the tests passed.  Yet the public reception of Vista was less than stellar.  Users felt the interface was inconsistent, unpolished, and full of bugs.  An interface becomes aesthetically pleasing when it has consistent patterns of behavior and look-and-feel.  How do you automate testing of that?  You don't.  QA testers will only file bugs for those issues after they use features repeatedly and note the inconsistencies.  These problems, obvious to a human, do not fit an automated testing tool's definition of a bug.  Taken in toto, these "bugs" led users to feel that Vista was generally inferior to XP.  
 
I am not at all claiming that automated testing tools should be tossed.  I'm merely saying that an over-reliance on them can be detrimental to the general perception of your software product.  
 
Goddamn Sonar and Code Coverage Metrics!
This is a family blog, but occasional blasphemy is necessary when something is too egregious.  Sonar is code quality management software.  Sonar can tell you, for instance, which Java classes have unit tests and how many code branches have ZERO tests.  It can then drill down into your code and estimate where the bugs likely are, given the technical debt.  This is less than perfect, but it gives you a good feel for where your bugs may be, and I'm all for that.  It's another tool in the toolbelt.  
 
The problem is that the tool gets a bit cute with its management reporting capabilities.  For instance, let's say your "Code Coverage" is a measly 7% (i.e., 7% of your code has identifiable unit tests).  Is that bad?  If I were management, I'd be pissed.  The fact is, you don't need to unit test EVERY line of code.  Do you need to ASSERT that an "if" statement can evaluate a binary proposition and correctly switch code paths?  I think not.  If we needed a formal junit test for every line of code, our projects would be even further behind.  
 
There is another report for "Code Duplication."  It's not uncommon to see this at 25%, which again freaks out management.  Management says, "I thought you said you weren't supposed to copy/paste code.  Shouldn't that be a class or method?  I don't understand you developers."  Well, the fact is that sometimes duplicated code is a good thing.  It's well known that tsql scalar functions (a code reusability feature) perform horrendously.  So in that case, a little duplication is good.  
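For example (a sketch; the table and function are made up):

    -- the "reusable" approach: a scalar UDF, executed row-by-row
    CREATE FUNCTION dbo.FullName (@First nvarchar(50), @Last nvarchar(50))
    RETURNS nvarchar(101)
    AS
    BEGIN
        RETURN @First + N' ' + @Last;
    END
    GO

    SELECT dbo.FullName(FirstName, LastName) FROM dbo.Person;  -- slow at scale
    SELECT FirstName + N' ' + LastName FROM dbo.Person;        -- duplicated expression, but fast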
 
But my favorite report is "Not enough or too many comments."  Whoa.  Is that ever subjective?  Coders:  "Uh oh.  We're only at 26.8% API documentation.  Let's spend the next sprint getting to 75%."  Does that sound like a good use of time?  
 
The bottom line is that management is always looking for metrics to improve productivity.  Tools like Sonar give developers agita as they are forced to refocus from meaningful work to adding possibly unnecessary tests and code comments simply to get higher Sonar scores.  Will quality improve?  Doubtful.  The only metric that management or coders should ever worry about is "software sold."  Everything else is poppycock.  
 
Wasteful unit tests that assert language constructs and invariants
 
I'm going to cover this in the next blog post with examples.  What I see is that many unit tests people write are merely asserting that an IF statement works the way the language vendor has documented it.  That adds no value.  Another example is asserting an invariant has not changed in your code.  A unit test is not the place for this.  More to come in Useless Unit Tests.  
 
Management vs Developers
I hear management saying we have poor quality software.  Sometimes they'll dictate to the coders how to solve the problem: "We are a Kanban shop now.  We will follow good Kanban methods and quality shall improve."  Or, more often, management will ask the coders how to improve quality.  Unfortunately, the coders usually say, "more and better testing."  The answer should really be more time and resources, but we know that ain't happening.  
 
But the solution is rarely more unit tests.  In fact, I don't even agree with the claim that software quality is poor.  The computer software you use daily is of really good quality compared to everything else in your life.  It only *seems* like software quality sucks because you are biased toward remembering only what you don't like.  Sorry if you don't agree with me.  
 

The solution to software quality problems is simple.  There is management and there are coders.  Coders want to solve software quality by adding more software...automation tools, TDD, more unit tests.  These tools are meant to prove to management that software is bug-free.  But how do you prove definitively that software is totally bug-free?  You can't.  The logic we coders use is flawed.  

Management doesn't care about any of this.  They want:

  1. more feature functionality
  2. fewer bugs
  3. faster to market
  4. cheaper

Management couldn't care less (nor should they) about TDD or dynamic logic code or more formal testing.  (But they do like the pretty Sonar graphs that they can use to show that their developers are lunkheads.)  Management does, however, understand good economics and the Law of Diminishing Returns.  If we focus on tracking down and fixing every last bug, we may get better quality, but we will have lost in the marketplace.  

Good management knows that the goal is not bug-free software, the goal is software that has just few enough bugs that a sucker, er customer, will spend money on it.  Many coders don't realize this or have forgotten it.  Stop worrying about endless testing.  

What is the best way for a coder to test her code?

Given that you believe me and my contention that we test too much, what is the ONE thing we absolutely should do as developers to improve quality?  

Code reviews

There is no substitute.  I have spent 10 minutes looking at another person's code and have pulled out handfuls of bugs.  This is code that had full unit tests.  Likewise, I've had other, "junior" level people code review my work and within 10 minutes they've spotted bugs that I thought I had test coverage for.  

Always code review.  

TSQL Unit Testing Tool Comparisons

 

I once had a manager who liked to say, "I want to see checked-in code with tests that prove it doesn't work."  Very Yogi Berra-esque.  

I'm a big proponent of unit testing my TSQL code.  The questions become:

  • What tools are available for the data professional?  
  • Which tool is the "best"?  

Tool Availability

First off, there are a lot of unit testing tools that handle database testing but are not TSQL unit testing tools per se.  An example is DbFit, which allows your Fit/FitNesse tests to execute against your database.  That is cool, and valuable, but it doesn't really help me unit test a stored procedure.  Another example is psunit, which lets you write unit tests for your PoSH code.  I've seen lots of people, myself included, use psunit as a basic assertion framework to test tsql.  Again, this is cool, but it isn't a tool designed specifically for testing tsql.  Frankly, any tool that follows the "Arrange/Act/Assert" model can be retrofitted to call a database and unit test your SQL.  

So, for this blog post, I'm not going to include unit testing tools that only peripherally touch the database; I only want to compare unit testing tools that specifically focus on TSQL.  A very strong argument can be made that db unit tests should be done in something like DbFit so that they are stored and executed alongside the Java tests and code.  This is exactly how DataDude handles it.  I understand the rationale; I am simply questioning whether it is always the best approach.  In many organizations the db experts are not the Java experts, so keeping the tests separate makes a lot of sense in some circumstances too.  Further, I want the best tsql unit testing tool, not the best unit testing tool for my development environment that also happens to handle tsql.  

What Makes a Good TSQL/database TDD Tool

These are just my opinions: 

  • Adheres to the AAA (Arrange, Act, Assert) model of unit testing.
  • Has a robust assertion engine specific to the needs of a data developer.  For example, support for asserting that a result set matches the expected result set, down to the column ordinals, names, and data types, is important.  
  • Does not require injecting special assertion or debugging code into your stored procedure.  We don't need that stuff in our production code.  
  • Makes mocking data simple.  
  • Can mock views and stored procs so they return a given output for testing.  
  • Supports automatically setting up (Arranging) and tearing down test data without being destructive to existing data in the database.  Basically, I shouldn't need to backup/restore a test database to run my tests.  (Never run unit tests on a production system.)  
  • Doesn't require you to learn another language or tool, such as PoSH, to write or run tests.  It should just work in SSMS.  
  • Generates standard output that is both human-readable and automatable so other CI tools can call it.  Bonus points if it can return HTML/XML so I can plug the output into a build automation tool's standard reporting.  
  • Comes with good documentation and tutorials.  Most data professionals have no idea how to start with TDD and unit testing their TSQL code.  (You could probably make the same case for Java jocks too.)  

Evaluating The Available Tools

The only two tools I've found that are specifically unit testing tools for tsql and stored procedures are TSQLUnit and TSQLT.  

Feature-by-feature comparison:

  • Open source (so it's easy to look at the source code and extend it for your needs): TSQLUnit: Yes.  TSQLT: Mostly; the table assertion functionality is in CLR code that is not open source, but the rest of the product is written entirely in tsql.  You are using tsql to test tsql.  
  • Mocks for data: TSQLUnit: Yes.  TSQLT: Yes (tsqlt.FakeTable).  
  • Stored proc and view mocking: TSQLUnit: No.  TSQLT: Yes (tsqlt.SpyProcedure).  This is very ingenious.  
  • Ability to "arrange" and "clean up" test data: TSQLUnit: Yes.  TSQLT: Yes.  
  • Can compare result sets: TSQLUnit: Yes.  TSQLT: Yes.  
  • Test output to XML/HTML: TSQLUnit: No.  TSQLT: Yes.  
  • Tests can be written and executed entirely in SSMS: TSQLUnit: No, it requires Python.  TSQLT: Yes; there is even a plugin helper available from RedGate.  
  • Support from industry vendors: TSQLUnit: Not that I'm aware of.  TSQLT: RedGate's release of SQLTest is fully supported.  
  • Documentation: TSQLUnit: Kinda sparse.  TSQLT: Lots of documentation and tutorials.  
  • Support (and ongoing viability): TSQLUnit: Doesn't seem to have much new development lately.  TSQLT: The two authors actively respond to bug and feature requests.  (Full disclosure: I was a co-worker of one of the authors.)  
  • Deployment: TSQLUnit: Test code lives in the dbo schema alongside your production objects.  TSQLT: Uses separate schemas that map closely to test classes, just like Java; tests are only deployed when they are run, and if you follow the test class creation pattern in their tutorial you will have no testing artifacts remaining in your database.  
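To give a flavor of tsqlt, here is a minimal test: a sketch written against the AbsPlusOne procedure from the earlier post, using the documented tSQLt.NewTestClass, tSQLt.AssertEqualsTable, and tSQLt.Run procedures:

    EXEC tSQLt.NewTestClass 'testAbsPlusOne';
    GO
    CREATE PROCEDURE testAbsPlusOne.[test returns 6 when passed -5]
    AS
    BEGIN
        -- Arrange: the expected result set
        CREATE TABLE #Expected (Result int);
        INSERT INTO #Expected VALUES (6);
        -- Act: capture the procedure's actual result set
        CREATE TABLE #Actual (Result int);
        INSERT INTO #Actual EXEC dbo.AbsPlusOne @Value = -5;
        -- Assert: compare the two result sets
        EXEC tSQLt.AssertEqualsTable '#Expected', '#Actual';
    END
    GO
    EXEC tSQLt.Run 'testAbsPlusOne';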

 

Clearly I think the world of TSQLT.  Even if you do not do TDD or don't understand unit testing, you really should download the tsqlt code and peruse it.  I know that I learned quite a bit about transact-sql by looking through their code.  It's just totally ingenious.  If you are a database guy and are not familiar with TDD or unit tests then try using it when you write your next stored proc.  Your unit tests will likely be amateurish but your code will be much easier to maintain and will have fewer bugs.  In a few years you'll look back and laugh at your early unit tests.  That's ok, it's a journey.  

Unit testing with TSQLT

I mentioned here and here how much I really love tsqlt for unit testing.  I noticed today that RedGate has a product out to assist with tsqlt and unit tests via some GUI enhancements for Management Studio.

Unit Testing Is Not a Panacea

Perhaps you work at a company that REQUIRES all code to be unit tested.  Perhaps you work at a company where only some code is unit tested.  Perhaps only Java code is unit tested, not SQL code, etc.  I've worked in all of these environments, and in each one, some code that was indeed unit tested did not work, and some code that was never unit tested has worked like a champ since Day One.  Certainly unit tests help get you to the Promised Land of bug-free code, but just because you have them doesn't mean your code works.  

Data Contracts for Stored Procedures

For years I've been tasked with fixing performance problems with SQL code, usually stored procedures.  If you've ever done this you know there are a few things you can and can't do when refactoring SQL if you can't change the underlying client.  For instance, say you have a report that is running slowly.  It's easy to quickly cowboy a change to your stored procedure because TSQL is not compiled code, but changing the report definition usually can't be done on the fly, at least not easily.  

I follow a process whenever I refactor TSQL.  I make sure that I never change the inputs or outputs, only the "guts in the middle."  If you adhere to this rule your TSQL refactoring will never break your application.  I call this a data contract.  Any part of my TSQL that is exposed to a client cannot be changed.  This includes:

  • result set column list
  • result set column order
  • result set column names/data types
  • input parameter order (for those developers who do not name their param lists), names, and data types (stored procs only)
  • output parameter order, names, and data types (stored procs only)
  • RETURN values (stored procs only)

SQL Server natively has no concept of a data contract, although plenty of people have proposed the idea with sample syntax changes for TSQL and there is even a Connect request for this.  

 If I can refactor TSQL code without breaking the data contract I don't need a .NET programmer to change and compile *any* of his code.  This actually isn't a "data contract" in the WCF/.NET sense of the term, but I think it adequately describes what is important.  

There will be times when you need to ALTER the data contract to add new functionality.  I handle this based on the requirement:

  • If a new result set column is required, it goes at the end of the select list.  If it is at the end, then developers who coded against a named or ordinal result set column will not break.  From that point forward I can no longer change the name, data type, or ordinal position of that new column.
  • If a new input parameter is required it always goes at the end of the list and is NULLable.  
  • Output params are handled similarly.  
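Here is a sketch of what that looks like in practice (GetOrders, its parameters, and its columns are all illustrative):

    -- before: the published contract
    CREATE PROCEDURE dbo.GetOrders (@CustomerId int)
    AS
        SELECT OrderId, OrderDate
        FROM dbo.Orders
        WHERE CustomerId = @CustomerId;
    GO

    -- after: new column appended to the end of the result set,
    -- new parameter appended to the end of the list and NULLable,
    -- with NULL preserving the old behavior
    ALTER PROCEDURE dbo.GetOrders (@CustomerId int, @OnlySince datetime = NULL)
    AS
        SELECT OrderId, OrderDate, ShipDate  -- new column goes last
        FROM dbo.Orders
        WHERE CustomerId = @CustomerId
          AND (@OnlySince IS NULL OR OrderDate >= @OnlySince);
    GO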

All of this is probably common sense.  But there really are not any good tools that will tell you if your refactoring efforts have broken your data contract.  In the past I've relied on:  

  • My manual testing tools.
  • SET FMTONLY, which gives you the data types of the underlying result set columns; this may be necessary if you are changing the arithmetic in a calculation (see the sketch after this list).  
  • On Oracle, a PACKAGE defines the data contract nicely for the contained procedures.  SQL Server has no equivalent.
  • Most ORM tools can validate your schema for you.  If one reports changes, then you broke the data contract.  
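The SET FMTONLY trick looks like this (note that FMTONLY is deprecated; sp_describe_first_result_set is the replacement as of SQL Server 2012; GetOrders is the illustrative proc from above):

    -- returns only result set metadata (column names, order, data types), no rows
    SET FMTONLY ON;
    EXEC dbo.GetOrders @CustomerId = 1;
    SET FMTONLY OFF;

    -- SQL Server 2012+ alternative:
    EXEC sp_describe_first_result_set N'EXEC dbo.GetOrders @CustomerId = 1';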

My new favorite tool for handling data contract testing (and TSQL unit testing in general) is tsqlt.  It is open-source and works like a champ.  Soon I'll cover some additional details regarding how to test data contracts with tsqlt.  
