
July 2011

ETL Series

I may be getting a contract soon to work on some ETL processes.  I have tons of tips scattered around my website regarding how I've done these things in the past, but I thought I would consolidate my thoughts in a few posts to re-familiarize myself with the patterns I've used successfully.  The next few posts will cover these Best Practices in more detail.  (Links to the other posts)

Re-throwing Errors in TSQL

In my previous post I mentioned the new THROW keyword in Denali.  This allows you to easily re-throw an error from within a CATCH block.  But you can still re-throw an error pre-Denali...sorta.  Here's a pattern I've used since the advent of TSQL TRY/CATCH.

This pattern actually illustrates a number of useful techniques that I find lacking in most TSQL code:

  • How to handle errors when stored procedure calls are nested.
  • How to ROLLBACK gracefully when a stored procedure fails, regardless of nesting.
  • How to use a "universal, generic error handler/logging procedure".
  • How to re-throw an error in pre-Denali TSQL.


--Press CTRL+T before executing to see errors in a more readable fashion
SET NOCOUNT ON
GO
USE tempdb
GO

IF OBJECT_ID('Foo') IS NOT NULL
    DROP TABLE Foo
GO

IF OBJECT_ID('ErrorLog') IS NOT NULL
    DROP TABLE ErrorLog
GO

IF OBJECT_ID('LogError') IS NOT NULL
    DROP PROC LogError
GO

IF OBJECT_ID('SimulateError') IS NOT NULL
    DROP PROC SimulateError
GO

IF OBJECT_ID('Wrapper') IS NOT NULL
    DROP PROC Wrapper
GO

IF OBJECT_ID('WrapItAgain') IS NOT NULL
    DROP PROC WrapItAgain
GO

--this is a table that we will use to generate errors later
CREATE TABLE Foo (ID int)
GO
CREATE TABLE ErrorLog (ErrorMessage varchar(2000))
GO

CREATE PROCEDURE LogError
    @Caller        SYSNAME = NULL,
    @ErrMessage    varchar(2000) = '' OUTPUT  --additional information to pass in for logging purposes
AS
BEGIN
    SET NOCOUNT ON

    --do nothing if there is no error.  LogError is called in the CATCH block of other procs.
    IF ERROR_NUMBER() IS NULL
        RETURN

    DECLARE
        @ErrorMessage     NVARCHAR(2048)
        ,@ErrorProcedure  SYSNAME
        ,@ErrorNumber     INT
        ,@ErrorSeverity   INT
        ,@ErrorState      INT
        ,@ErrorLine       INT

    SELECT
        @ErrorProcedure = COALESCE(@Caller, ERROR_PROCEDURE(), 'Called outside a Stored Proc')
        ,@ErrorMessage = ERROR_MESSAGE()
        ,@ErrorNumber = ERROR_NUMBER()
        ,@ErrorSeverity = ERROR_SEVERITY()
        ,@ErrorState = ERROR_STATE()
        ,@ErrorLine = ERROR_LINE()

    --is this a rethrow?
    IF (@ErrorMessage LIKE 'My Error: %')  --this prefix can be changed to suit your needs
    BEGIN
        --the message was already formatted at a lower level of the call stack;
        --just append any additional information
        SET @ErrorMessage = @ErrorMessage + COALESCE(@ErrMessage, '')
    END
    ELSE
    BEGIN
        --format the error to include the original error information in the message
        --so that it doesn't get lost as it goes up the call stack
        SET @ErrorMessage =
                'My Error: [' + @ErrorProcedure +
                ', line: ' + CAST(@ErrorLine AS VARCHAR) +
                ', ErrorNumber: ' + CAST(@ErrorNumber AS VARCHAR) + '] ' +
                @ErrorMessage + COALESCE(@ErrMessage, '')
    END

    --insert into our log table
    INSERT INTO ErrorLog VALUES (@ErrorMessage)

    --hand the formatted message back so the caller can re-raise it
    SET @ErrMessage = @ErrorMessage
END;
GO

CREATE PROC SimulateError AS
BEGIN
BEGIN TRY
    --simulate an error...we create a table that already exists
    CREATE TABLE Foo (ID int)
END TRY
BEGIN CATCH
    IF (@@TRANCOUNT > 0)
        ROLLBACK;

    DECLARE @ErrMessage varchar(8000), @Caller varchar(2000)
    SELECT @Caller = object_name(@@procid)
    EXEC LogError @Caller = @Caller, @ErrMessage = @ErrMessage OUTPUT
    RAISERROR (@ErrMessage, 16, 1);
END CATCH;
END;
GO

CREATE PROC Wrapper AS
BEGIN
BEGIN TRY
    SELECT 'In Wrapper, now calling SimulateError'
    EXEC SimulateError
END TRY
BEGIN CATCH
    IF (@@TRANCOUNT > 0)
        ROLLBACK;

    DECLARE @ErrMessage varchar(8000), @Caller varchar(2000)
    SELECT @Caller = object_name(@@procid)
    EXEC LogError @Caller = @Caller, @ErrMessage = @ErrMessage OUTPUT
    RAISERROR (@ErrMessage, 16, 1);
END CATCH;
END;
GO

CREATE PROC WrapItAgain AS
BEGIN
BEGIN TRY
    EXEC Wrapper
END TRY
BEGIN CATCH
    IF (@@TRANCOUNT > 0)
        ROLLBACK;

    DECLARE @ErrMessage varchar(8000), @Caller varchar(2000)
    SELECT @Caller = object_name(@@procid)
    EXEC LogError @Caller = @Caller, @ErrMessage = @ErrMessage OUTPUT
    RAISERROR (@ErrMessage, 16, 1);
END CATCH;
END;
GO

SELECT 'Press CTRL+T before executing to see how rethrowing an error behaves.'
PRINT 'Calling SimulateError.'
PRINT 'The workaround to re-throw an error is helpful, BUT we STILL can''t examine @@ERROR natively.'
PRINT 'There is no way around this.  The trick is to remember that 50000 means a bubbled error, so parse the message for the real error if you need it.'
exec SimulateError
PRINT 'What does ErrorLog look like...'
select * from ErrorLog
PRINT 'Purge ErrorLog...'
delete from ErrorLog
PRINT 'Calling Wrapper, which calls SimulateError.'
PRINT 'Here I get a thrown error telling me the error was in Wrapper, nested once (two Msg 50000).  Also ErrorLog is showing the stack.'
exec Wrapper
select * from ErrorLog
PRINT 'Purge ErrorLog...'
delete from ErrorLog
PRINT 'Calling WrapItAgain (which calls Wrapper, which calls SimulateError).  Just to show the stack better.'
exec WrapItAgain
select * from ErrorLog
GO


DROP TABLE Foo

GO


DROP TABLE ErrorLog

GO


DROP PROC LogError

GO


DROP PROC SimulateError

GO


DROP PROC Wrapper

GO


DROP PROC WrapItAgain

GO
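Since a bubbled error always surfaces as Msg 50000, the original error number has to be parsed back out of the message text.  A minimal sketch of that parsing, assuming the 'My Error: [..., ErrorNumber: n]' format built by LogError above (the parsing code is my addition, and it must run before the cleanup DROPs):

DECLARE @msg varchar(2000), @ErrNum int
SELECT TOP 1 @msg = ErrorMessage FROM ErrorLog

IF @msg LIKE 'My Error: %ErrorNumber: %'
BEGIN
    --the error number sits between 'ErrorNumber: ' (13 characters) and the closing bracket
    DECLARE @start int = CHARINDEX('ErrorNumber: ', @msg) + 13
    DECLARE @end int = CHARINDEX(']', @msg, @start)
    SET @ErrNum = CAST(SUBSTRING(@msg, @start, @end - @start) AS int)
END

SELECT @ErrNum AS OriginalErrorNumber  --2714 (object already exists) in the demo above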


Re-throwing Errors in TSQL and Denali

Error handling in TSQL, in a word, sucks.  Prior to 2005 you had to manually check @@ERROR after every command...the equivalent of setting On Error Resume Next in VBScript.  In 2005 we were graced with TRY/CATCH...an improvement, but still limited.  See one of my old posts for more.

One very annoying shortcoming is the inability to re-THROW an error after it is caught. THROW is common in most programming languages.  For example, in PowerShell you can do something like this:
 
try {
    do-something
}
catch [System.Management.Automation.ItemNotFoundException]  #or whatever, it doesn't matter
{
    #I know about this error, I can handle it
    echo "You did blah blah blah but that's OK, continuing"
}
catch [System.Management.Automation.RuntimeException]  #a different example
{
    #let's log this fact and re-throw the error
    log-error $Error
    throw
}
catch {
    #something else happened, and I have no clue what that is.  re-throw the error
    throw
}
 
So, why do we care?  
  1. If the above code is in a function and the error isn't rethrown I lose the error if the function is nested in another routine.  This happens when you want to have a "generic logger" routine that you call.  So you want to log the error, then re-THROW it.  
  2. Because some shops MANDATE that ALL code, even TSQL code, be wrapped in TRY/CATCH, even when the language doesn't support THROW.  
Back to TSQL...
 
I find that I rarely use TSQL TRY/CATCH.  Why?  Before I execute any TSQL that has the possibility of throwing an error, I write a check to see if the data will throw the error up front...an EXISTS/NOT EXISTS block, if you will.  In cases where I can't check up front for an error, I resort to TRY/CATCH.  It's frankly just easier to massage the data before it throws an error.
 
When I tell this to people they cringe because they've always been told to wrap all calls in a TRY/CATCH block (see Point 2 above).  In a programming language that supports THROW, I agree.  But, since SQL Server does not have THROW capabilities, the actual error message is somewhat lost if I nest procedures.  A simple example to illustrate this:  
 
USE tempdb
GO
SET NOCOUNT ON
GO
CREATE PROC TestRethrow AS
BEGIN
    CREATE TABLE #Foo (BarID int PRIMARY KEY)

    INSERT INTO #Foo VALUES (1)
    --generate a PK error
    INSERT INTO #Foo VALUES (1)
END;
GO
SELECT 'Testing with no error handling...'
EXEC TestRethrow;
GO

ALTER PROC TestRethrow AS
BEGIN
    CREATE TABLE #Foo (BarID int PRIMARY KEY)

    BEGIN TRY
        INSERT INTO #Foo VALUES (1)
        --generate a PK error
        INSERT INTO #Foo VALUES (1)
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 2627
        BEGIN
            --assume we "do something" with the error (log it maybe)
            --and now we want to rethrow it
            RAISERROR (2627,16,1)
        END
    END CATCH
END;
GO
SELECT 'Testing RAISERROR to rethrow...'
EXEC TestRethrow;
DROP PROC TestRethrow
 
 
In Denali we can now rethrow an error gracefully:
 
USE tempdb
GO
SET NOCOUNT ON
GO
CREATE PROC TestRethrow AS
BEGIN
    CREATE TABLE #Foo (BarID int PRIMARY KEY)

    INSERT INTO #Foo VALUES (1)
    --generate a PK error
    INSERT INTO #Foo VALUES (1)
END;
GO
SELECT 'Testing with no error handling...'
EXEC TestRethrow;
GO

ALTER PROC TestRethrow AS
BEGIN
    CREATE TABLE #Foo (BarID int PRIMARY KEY)

    BEGIN TRY
        INSERT INTO #Foo VALUES (1)
        --generate a PK error
        INSERT INTO #Foo VALUES (1)
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 2627
        BEGIN
            --assume we "do something" with the error (log it maybe)
            --and now we want to rethrow it
            THROW;
        END
    END CATCH
END;
GO
SELECT 'Testing THROW to rethrow...'
EXEC TestRethrow;
DROP PROC TestRethrow
 
In my next post I'll cover how to re-throw an error in pre-Denali SQL since most of us won't get to deploy Denali for a production project for quite some time. 
 
This is definitely an improvement to TSQL error handling.  
 

Data Contracts for Stored Procedures

For years I've been tasked with fixing performance problems with SQL code, usually stored procedures.  If you've ever done this you know there are a few things you can and can't do when refactoring SQL if you can't change the underlying client.  For instance, say you have a report that is running slowly.  It's easy to quickly cowboy a change to your stored procedure because TSQL is not compiled code, but changing the report definition usually can't be done on the fly, at least not easily.  

I follow a process whenever I refactor TSQL.  I make sure that I never change the inputs or outputs, only the "guts in the middle."  If you adhere to this rule your TSQL refactoring will never break your application.  I call this a data contract.  Any part of my TSQL that is exposed to a client cannot be changed.  This includes:

  • result set column list
  • result set column order
  • result set column names/data types
  • input parameter order (for those developers who do not name their param lists), names, and data types (stored procs only)
  • output parameter order, names, and data types (stored procs only)
  • RETURN values (stored procs only)

SQL Server natively has no concept of a data contract, although plenty of people have proposed the idea with sample syntax changes for TSQL and there is even a Connect request for this.  

 If I can refactor TSQL code without breaking the data contract I don't need a .NET programmer to change and compile *any* of his code.  This actually isn't a "data contract" in the WCF/.NET sense of the term, but I think it adequately describes what is important.  

There will be times when you need to ALTER the data contract to add new functionality.  I always handle this by requirement:

  • If a new result set column is required, it goes at the end of the list.  If it is at the end, developers who coded against a named or ordinal result set column will not break.  From that point forward I can no longer change the name, data type, or ordinal position of that new result set column.
  • If a new input parameter is required, it always goes at the end of the list and is NULLable (see the sketch after this list).
  • Output params are handled similarly.
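A minimal sketch of a contract-preserving change (the GetOrders proc and its columns are hypothetical, not from this post):

--existing contract: GetOrders @CustomerID returns (OrderID, OrderDate)
--contract-preserving change: new column at the end, new parameter at the end and NULLable
ALTER PROCEDURE GetOrders
    @CustomerID int,
    @StatusFilter varchar(20) = NULL   --new param: last position, defaults to NULL
AS
BEGIN
    SELECT
        OrderID,        --existing cols keep their names, types, and ordinal positions
        OrderDate,
        OrderTotal      --new col goes at the end so ordinal consumers don't break
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
      AND (@StatusFilter IS NULL OR OrderStatus = @StatusFilter)
END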

All of this is probably common sense.  But there really are not any good tools that will tell you if your refactoring efforts have broken your data contract.  In the past I've relied on:  

  • My manual testing tools
  • Using SET FMTONLY will give you data types on underlying result set cols which may be necessary if you are changing arithmetic formulas in a calculation.  
  • On Oracle a PACKAGE defines the data contract nicely for the contained procedures.  SQL Server does not have this.
  • Most ORM tools can validate your schema for you.  If it reports changes, then you broke the data contract.

My new favorite tool for handling data contract testing (and TSQL unit testing in general) is tsqlt.  It is open-source and works like a champ.  Soon I'll cover some additional details regarding how to test data contracts with tsqlt.  
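As a taste of what that looks like, here is a sketch of a data contract test using tSQLt's AssertResultSetsHaveSameMetaData (the test class and the GetOrders proc are hypothetical, carried over from the sketch above):

--fail the test if the GetOrders result set metadata drifts from the agreed contract
EXEC tSQLt.NewTestClass 'DataContractTests';
GO
CREATE PROCEDURE DataContractTests.[test GetOrders result set contract]
AS
BEGIN
    EXEC tSQLt.AssertResultSetsHaveSameMetaData
        @expectedCommand = 'SELECT CAST(0 AS int) AS OrderID, CAST(''20110101'' AS datetime) AS OrderDate, CAST(0 AS money) AS OrderTotal',
        @actualCommand   = 'EXEC dbo.GetOrders @CustomerID = 1';
END;
GO
EXEC tSQLt.Run 'DataContractTests';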

Pattern: Overcoming Deferred Name Resolution When Using Stored Procs for Code Reusability

I hate Deferred Name Resolution in SQL Server mostly because it is inconsistent.  If you don't know what DNR is, see this article.  In a nutshell, I can create a stored proc that references a non-existent table, but I can't reference a non-existent column in an existing table.  Here's a simple proof: 

USE tempdb
IF NOT EXISTS (select * from sysobjects where id = object_id('Foo'))
    PRINT 'Foo does not exist, OK'
GO
CREATE PROC BAR AS SELECT ID FROM Foo
GO
SELECT name, type FROM sysobjects WHERE name = 'BAR' and type = 'P'
GO
DROP PROC BAR
GO

And the output...

...confirms that I could create a proc referencing a non-existent table.  But now let's create the table first but reference a non-existent column:

USE tempdb
CREATE TABLE FOO (colA varchar(100))
GO
CREATE PROC BAR AS SELECT ID FROM Foo
GO
SELECT name, type FROM sysobjects WHERE name = 'BAR' and type = 'P'
GO
DROP TABLE FOO
GO
DROP PROC BAR
GO

And the screenshot shows that we failed.  

It seems ridiculous to me that I can create a proc referencing a non-existent table, but can't reference a table with a non-existent column.  You may be saying, "So what? Why would I ever want to reference a non-existent column?"  

Code Reusability and Temp Tables

...that's why.  It's good OO programming design to not copy/paste code everywhere.  Unfortunately in TSQL we often have to do exactly that if we expect performance to be acceptable.  Many people take their copy/paste logic and insert it into a scalar UDF.  That won't work if you want performant code, since a scalar UDF is processed in RBAR (row by agonizing row) fashion.  An inline table-valued function will work; the optimizer treats the TVF as a kind of "macro" and expands the code in a performant manner.  But there are limitations to this approach that make its use very limited.
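For contrast, a minimal sketch of the inline TVF approach (the names are hypothetical), which the optimizer can expand like a macro:

--an inline (single-statement) TVF; the optimizer expands it into the calling query
CREATE FUNCTION dbo.CalcForID (@ID int)
RETURNS TABLE
AS
RETURN (SELECT @ID AS ID, round(PI(),4) * @ID AS calc)
GO

--usage: CROSS APPLY keeps this set-based rather than row-by-agonizing-row
SELECT o.object_id, c.calc
FROM master.sys.objects o
CROSS APPLY dbo.CalcForID(o.object_id) c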

The best approach, by far, in my opinion, is to pass around temp tables into stored procs that are in essence the TVF code, or code you don't want to copy/paste everywhere.  So, let's assume you have a complex calculation that you run over a series of table ids in various procedures that you would like to reuse in a performant manner.  

  1. First create a temp table that holds the IDs and a placeholder col for the calculation.
  2. Then build a utility procedure that looks for that temp table and performs the calculation.
  3. Now in any routine that needs that calculation, simply make a call to first build the temp table, then call the utility procedure.

The whole process should look something like this:

USE tempdb
GO

CREATE PROC Utility AS
BEGIN
    UPDATE #tblUtility SET calc = round(PI(),4) * ID  --your calculation logic goes here
    WHERE ID % 2 = 0
END;
GO
CREATE PROC Caller AS
BEGIN
    --I'm building my list of IDs that I need into my temp table
    CREATE TABLE #tblUtility (ID int, calc decimal(13,4))
    INSERT INTO #tblUtility SELECT TOP 5 object_id, NULL FROM master.sys.objects
    EXEC Utility
    SELECT * from #tblUtility
END
GO
EXEC Caller
GO

DROP Proc Utility
GO
DROP Proc Caller
GO

Very simple and elegant.  The problem comes in when I want to do a lot of calculations in my Utility procedure and hence need a lot of columns; let's assume we need 20 columns declared.  We now have to copy/paste those 20 column declarations into every Caller procedure.  That's redundant code.  The simple solution is to declare only the basic table in Caller with just the ID col, then ALTER the table in the Utility procedure to ADD the calculation columns.  That's much less copy/paste code.  And much less work when we need to change or add new cols in the future.  We don't need 20 new columns to see how the improvement would work.  Using our example, this should be much easier to maintain:

 

USE tempdb
GO

CREATE PROC Utility AS
BEGIN
    ALTER TABLE #tblUtility ADD calc decimal(13,4)  --note that any calc cols are created here
    UPDATE #tblUtility SET calc = round(PI(),4) * ID  --your calculation logic goes here
    WHERE ID % 2 = 0
END;
GO
CREATE PROC Caller AS
BEGIN
    --I'm building my list of IDs that I need into my temp table
    CREATE TABLE #tblUtility (ID int)  --note only the ID col is needed here
    INSERT INTO #tblUtility SELECT TOP 5 object_id FROM master.sys.objects
    EXEC Utility
    SELECT * from #tblUtility
END
GO
EXEC Caller
GO

DROP Proc Utility
GO
DROP Proc Caller
GO

But it doesn't work, thanks to our old friend Deferred Name Resolution.  

Essentially the error is telling us that the UPDATE cannot occur because a column does not exist in an existing table.  Two solutions come to mind:

  1. Use dynamic SQL in the Utility procedure for the actual UPDATE statement.  This works because the column will have been added to the table by the time DNR kicks in when the dynamic SQL is executed (a sketch follows this list).  I don't like dynamic SQL, so I don't like this solution.
  2. Create a second "layer" of Utility procs that handles the DNR goofiness.  In practice the Utility proc will merely have the ALTER...ADD commands.  It will then call into subprocedures that have the actual calculation logic.
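For completeness, a sketch of Option 1 using sp_executesql (my wording of it; the proc mirrors the example above):

--Option 1: hide the UPDATE from DNR by deferring it into dynamic SQL
ALTER PROC Utility AS
BEGIN
    ALTER TABLE #tblUtility ADD calc decimal(13,4)

    --by the time this string is compiled the column exists, so DNR is satisfied
    --(a temp table created by the caller is visible inside the dynamic SQL)
    EXEC sp_executesql N'UPDATE #tblUtility SET calc = round(PI(),4) * ID WHERE ID % 2 = 0'
END;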

Assuming we go with Option 2, this is what the new Code Reusability Pattern would look like:


USE tempdb
GO

CREATE PROC Utility AS
BEGIN
    ALTER TABLE #tblUtility ADD calc decimal(13,4)  --note that any calc cols are created here, in the first-level util proc
    EXEC Utility_SubProc_Calculations
END;
GO
CREATE PROC Utility_SubProc_Calculations AS
BEGIN
    UPDATE #tblUtility SET calc = round(PI(),4) * ID  --your calculation logic goes here, in the subproc
    WHERE ID % 2 = 0
END;
GO
CREATE PROC Caller AS
BEGIN
    --I'm building my list of IDs that I need into my temp table
    CREATE TABLE #tblUtility (ID int)  --note only the ID col is needed here
    INSERT INTO #tblUtility SELECT TOP 5 object_id FROM master.sys.objects
    EXEC Utility
    SELECT * from #tblUtility
END
GO
EXEC Caller
GO

DROP Proc Utility
GO
DROP Proc Utility_SubProc_Calculations
GO
DROP Proc Caller
GO

 

Pattern Summary

The key to performant TSQL code reusability, in my opinion, is utility procedures.  To get the most benefit we need to create an "extra" layer of abstraction to handle the god-awful Deferred Name Resolution implementation of TSQL.  You need to:

  1. Create a Utility procedure that simply does the ALTER...ADD of columns to your temp table and calls a Utility_SubProc.
  2. Utility_SubProc handles the actual calculations and the update of the temp table.
  3. In any Calling procedures that want to utilize our Utility procedure we simply
    1. Add a call to CREATE TABLE #whatever with the only col being the ID
    2. Add a call to Utility

With this simple API pattern we overcome both DNR issues and achieve performant TSQL code reuse.

SQL Server Mutexes and Data Caches

Mutexes in SQL Server can be very useful.  First, you may be wondering what a mutex is.  Mutex is the shortened form of "mutual exclusion" and is an algorithm used in programming to prevent the concurrent use of a shared resource.  Generally a SQL developer is trying to make a database more "concurrent", not less, meaning we are looking for ways to prevent, say, access to a table (or its data) from being serialized.  But in many cases a critical object in the database can only be accessed by one process or thread at a time (single-threaded).  The classic example of where this is needed in a database is a queue table, which I've written about previously.  There are lots of types of mutexes (semaphores, message passing, simple locking, etc).

But you may have other needs for mutexes.  An example is an aggregate or summary table.  In this case an aggregate table stores aggregates of numeric data derived from lots of detail tables.  Somehow we need to keep the aggregate table up-to-date as changes trickle into the detail tables.  Methods to do this include:

  1. Triggers on the detail tables.  
  2. If your CRUD to the detail tables is via stored procedures you can inject the equivalent of the trigger code directly into the stored proc.  
  3. If you can withstand some latency in your data, use an asynchronous update mechanism such as a SQL Agent job or Service Broker.  I wrote a summary of how I used Service Broker with asynchronous triggers here.
  4. And of course there is DTS/SSIS.  

Each of these mechanisms has its Pros and Cons.  
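To make option 1 concrete, a minimal trigger sketch (the OrderDetail and OrderSummary tables are hypothetical, not from this post):

--option 1 sketch: keep a summary table current from an AFTER trigger
CREATE TRIGGER trg_OrderDetail_Aggregate ON dbo.OrderDetail
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    --re-aggregate only the customers touched by this statement
    UPDATE s
    SET s.TotalAmount = agg.TotalAmount
    FROM dbo.OrderSummary s
    JOIN (SELECT CustomerID, SUM(Amount) AS TotalAmount
          FROM dbo.OrderDetail
          WHERE CustomerID IN (SELECT CustomerID FROM inserted
                               UNION SELECT CustomerID FROM deleted)
          GROUP BY CustomerID) agg
        ON agg.CustomerID = s.CustomerID;
END;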

Another solution is to use a true mutex in SQL Server by using sp_getapplock and sp_releaseapplock.  A great use of these mutex devices is to create data caches in SQL Server.  Let's assume you need an aggregate table, but maintaining that data from the detail tables is very expensive relative to the frequency that the data is being read.  But let's say that when the aggregate data is read (such as when a report is run), many reads will occur (because many reports that use the same data run simultaneously).

And that is the exact scenario where I have used this.  Some background...we have a series of Reporting Services reports that need to aggregate data a particular way.  We display these reports "dashboard-style" meaning there are quite a few charts that display on the page, which is customizable, and each chart is a separate report.  SSRS runs these reports in parallel, not serially.  Further, although these are separate reports they each tend to utilize the same aggregate data.   So we want to maintain a data cache on the db server where each report can share the data.  Since SSRS does not guarantee which report will run in what order, and because the dashboard can be customized to include/exclude reports, we need a generic way to determine who builds the cache and to ensure the cache doesn't get poisoned from either another report or another user running the dashboard.  
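The excerpt below picks up after the lock has been acquired.  For context, here is a reconstruction of the acquisition step that precedes it (the proc name is hypothetical; the resource name matches the sp_releaseapplock call below, and the excerpt's final END closes this proc):

--reconstructed acquisition step, not the original code
CREATE PROC GetProjectAggregates AS
BEGIN
    --#ProjectIDList (the requested ProjectIDs) is assumed to be populated by this point
    DECLARE @result int

    BEGIN TRAN

    EXEC @result = sp_getapplock
        @Resource = 'Proj Aggregate table population',
        @LockMode = 'Exclusive',
        @LockOwner = 'Transaction',
        @LockTimeout = 60000   --wait up to a minute for the mutex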

IF @result NOT IN (0, 1)   --only 0 (granted) and 1 (granted after waiting) are successful return codes
BEGIN
    PRINT @result
    RAISERROR ('Lock failed to acquire for Proj Aggregate table population.', 16, 1)
END
ELSE
BEGIN  --mutex acquired
    --cache ageing...we allow 1 min
    DECLARE @CacheAge DATETIME = DATEADD(mi,-1,GETDATE())

    --take this opportunity to purge the cache table of anything older than 1 min
    DELETE FROM dbo.ProjectCache WHERE CacheInsertDateTime < @CacheAge

    IF EXISTS
        (
            SELECT
                input.ProjectID
                ,pam.ProjectID
            FROM #ProjectIDList input
            LEFT JOIN dbo.ProjectCache pam
                ON input.ProjectID = pam.ProjectID          --natural key
                AND pam.CacheInsertDateTime >= @CacheAge    --cache has to be within age
            WHERE pam.ProjectID IS NULL                     --if we are missing cache rows for the ProjectID, fail it
        )
    BEGIN
        --cache is stale or the Project requested does not live in cache
        INSERT INTO dbo.ProjectCache
        SELECT input.ProjectID
            ...
        FROM ...
    END

    --get the data
    SELECT * FROM dbo.ProjectCache

    EXEC @result = sp_releaseapplock @Resource = 'Proj Aggregate table population'
END  --mutex acquired

COMMIT

END

Here we check that we obtained the lock.  This should never fail outright; the call will simply wait to obtain the mutex, which is an exclusive lock on the data cache table.  Now that I have my mutex, I want to see if the data cache is stale and, if so, purge it.  Since each report has a set of params, we need to make sure our cache has rows for those params (ProjectID).  If it doesn't, we rebuild the cache.  And finally we return the data from the cache as requested, release the app lock, and commit the transaction.

Assume now that a split second after the first user enters the procedure a second user makes the same request.  And then a third user makes the same request a second after that.  Here is the order of events.  

User One: the procedure is called.  The mutex is granted, and since no data is in the cache (or it is stale) User One begins to populate the cache.

User Two: the procedure is called a split second later.  The mutex is not granted; User Two is blocked.

User Three: the procedure is called a second after that.  The mutex is not granted; User Three is blocked.

Users Two and Three wait while User One builds the data cache...

User One's procedure finally completes.  The data cache is rebuilt, data is returned to the client, and the mutex is released.

User Two: the mutex is granted.  The data cache is not stale and all ProjectIDs are available, so data is returned immediately to the client and the mutex is released.

User Three: the mutex is granted.  Again, the cache is not stale and all ProjectIDs are available, so data is returned immediately and the mutex is released.

EAGER Spools, Lock Escalations, and Optional Parameters

The presence of an Eager Spool in a SQL Server query execution plan should always be concerning.  Essentially this is telling you that the optimizer wants a read-consistent view of the data, which might be normal if seen in the query plan for an UPDATE statement, but should raise an eyebrow in most other cases.  More importantly, an Eager Spool is a blocking operator, which means that "branch" of the query plan will be "blocked" from continuing until the Eager Spool is satisfied.  

I've written about why this occurs in an UPDATE statement here.  But this blog post is about some situations where an Eager Spool can occur outside of an UPDATE statement.  

I struggled to come up with a repro script that would show the undesirable side effects of an Eager Spool, so I decided to just show you a real-world example that I found in the past and documented.  The "setup" is a large stored proc with lots of "optional parameters", which are performance-draining to start with.  Here is the code:

AND (
    (
        @EncProvObjIds is not null
        AND EXISTS (
            SELECT 1 FROM dbo.Enc
            WHERE RcvGrp.EncObjId = Enc.ObjId
            AND Enc.EncProvObjId in (SELECT EncProvObjIdsTable.ObjId FROM @EncProvObjIdsTable EncProvObjIdsTable)
        )
    )
    OR @EncProvObjIds is null
)

Here we see that we only want to evaluate the EXISTS clause if we have valued @EncProvObjIds.  This is a standard "optional parameter" pattern that I've seen with almost every large database application I've worked with.  Here is the query plan for just this section of code:

The plan shows that the Index Scan on EncEncProvFX5 (an index on a HUGE table) is occurring before it even checks whether @EncProvObjIds was valued!!!  The Eager Spool indicates that that branch of the query "blocks" until the condition is confirmed, and since it's the rightmost operator of that branch, it's scanning everything in the index/table.  It is also attempting to get a read-consistent view of the data (else it would be a Lazy Spool).  No wonder it causes lock escalations...it's scanning a whole index when it doesn't have to, if it would just read the damn param first!!!!  I tried a few things...switching @EncProvObjIds to a #TempTable, OPTION (RECOMPILE), passing in lots of values in the optional parameter vs no values, setting the param to a local variable to avoid parameter sniffing, even adding an OPTIMIZE FOR clause...nothing worked, always EAGER SPOOL.

I eventually rewrote it like this...


JOIN dbo.Enc
    ON RcvGrp.EncObjId = Enc.ObjId
LEFT JOIN #EncProvObjIdsTable EncProvObjIdsTable
    ON Enc.EncProvObjId = EncProvObjIdsTable.ObjId
WHERE COALESCE(EncProvObjIdsTable.ObjId,'') =
    CASE WHEN @EncProvObjIds IS NOT NULL THEN Enc.EncProvObjId ELSE COALESCE(EncProvObjIdsTable.ObjId,'') END

...which generated this query plan:

 

The only way to get the Eager Spool to disappear was by converting the EXISTS to a standard JOIN.  Granted, this *trick* may not work in every situation; the right fix depends on your data and the cardinality of the child table.  Regardless, the solution is not what is important here.  What is important is understanding what an Eager Spool is and how performance-draining it can be.  BTW, performance increased by two orders of magnitude with JUST THIS CHANGE.

Using My Mac Keyboard With Windows

I love my Mac keyboard (Ultra Thin USB).  So much so that I bought an extra one to use with my Windows laptop at work.  The only problem is that some of the standard Windows keys, like Print Screen, are not native to Macs.  Some mappings are obvious: the Windows key is the Mac "command" key (just to the left or right of the spacebar).  But Print Screen, Insert, and a few others are not so obvious.


A Pattern To Create @PrintOnlyMode or @DebugMode Procedures (Part 3)

This is the last part of my series on how to create a stored procedure with a PrintOnly or Debug mode.  Part 1 covered the basic need for this and the pattern overview.  Part 2 covered the ancillary StringBuilder routine and inline code pattern.  This post will cover handling NULLs and escaping quotes easily using another helper routine.  

Handling NULLs and Escaping Quotes

As you are building commands/PRINT statements you need to be able to handle NULLs and escape quotes properly.  Concatenating a NULL with a string results in a NULL unless you change some of the ANSI settings, which is never advisable.  In both PRINT and execution mode a NULL should be represented with the static text "NULL" (without the double quotes).

Since we are building dynamic SQL, we need to be aware that NULL and 'NULL' mean different things.  We can do the COALESCE (or ISNULL) logic while building the string, or we can handle it in our helper function.  The helper function I created handles both of these scenarios.

if exists (select * from sysobjects where id = object_id('dbo.ReplCOALESCE'))
BEGIN
    DROP FUNCTION dbo.ReplCOALESCE
END
GO
CREATE FUNCTION dbo.ReplCOALESCE
(
    @String varchar(max)
)
RETURNS varchar(max)
AS
BEGIN
    --escape any embedded single quotes, wrap the result in quotes,
    --and turn a SQL NULL into the literal text NULL
    RETURN COALESCE('''' + REPLACE(@String,'''','''''') + '''', 'NULL')
END;
GO

GRANT EXEC ON dbo.ReplCOALESCE to SiemensUser;
GO
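A quick usage sketch (my addition) showing the three interesting inputs:

SELECT dbo.ReplCOALESCE('abc')       --returns 'abc', wrapped in quotes for dynamic SQL
SELECT dbo.ReplCOALESCE('O''Brien')  --returns 'O''Brien', with the embedded quote escaped
SELECT dbo.ReplCOALESCE(NULL)        --returns the literal text NULL, not a SQL NULL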


A Pattern To Create @PrintOnlyMode or @DebugMode Procedures (Part 2)

Wouldn't it be nice to create stored procedures that had a "debug" or "Print Only" mode?  In my last post I began to outline a framework I use to do this easily.  I covered the basic changes needed to your existing procedures.  In this post I'll cover the two helper procedures you will need and why.

There are two types of SQL statements you will need to concern yourself with..."executing" and "non-executing" statements.  Here is a matrix that will show you how our framework will handle these statements:

  • Executing statements (e.g. EXEC sp_who2): in execution mode, execute the given command; in print mode, the command should be PRINT'd exactly as it was entered.
  • Non-executing statements (e.g. --this is a comment): in execution mode, it should be ignored...said differently, treated as a comment; in print mode, it becomes PRINT 'this is a comment'.

dbo.StringBuilder

To do all of this work we are going to need a StringBuilder routine.  Here is the code:

CREATE FUNCTION dbo.StringBuilder
(
    @String varchar(max)
)
RETURNS varchar(max)
AS
BEGIN
    IF charindex('--',@String,1) = 1
    BEGIN
        --we found a comment in PRINT only mode:
        -- add PRINT to the head
        -- remove the leading doubledash
        -- place the string in quotes, escaping as necessary
        -- (2147483600 is just shy of the max size of an int)
        --note: QUOTENAME accepts at most 128 characters of input and returns NULL for longer strings
        SELECT @String = 'PRINT ' + QUOTENAME(SUBSTRING(@String,3,2147483600),'''')
    END

    --add a CR to every line
    RETURN @String + char(13)
END;


 

Our StringBuilder function takes @String as a parameter.  

 

We search for "--" which indicates a non-executing statement.  We remove the "--" in favor of a PRINT statement.  We will also need to add single quotes around the statement.  

 

 

The 2147483600 is "almost" the exact size of an INT.  

We add a CR for readability to every line.  I like just char(13), but char(13) + char(10) (a full CRLF) is also viable.

 

String Building Pattern

We need to change the guts of our "PrintOnly" procedure so that "executing" statements look something like this:

SELECT @PrintMsg = @PrintMsg + dbo.StringBuilder ('EXEC sp_addpublication_snapshot ')

And "non-executing" statements need to look like this:

SELECT @PrintMsg = @PrintMsg + dbo.StringBuilder ('--Creating snapshot agent for Pub: ' + @PublicationName)

We are simply doing standard string concatenation and StringBuilder is going to handle the formatting for us.  

Note again that "--" will be replaced with a PRINT statement in PrintOnly mode.  
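To tie the pieces together, a minimal sketch of how a PrintOnly-aware procedure might use the accumulated string (my reconstruction; Part 1 covered the actual proc changes):

--accumulate statements with StringBuilder, then PRINT or EXEC the whole batch
CREATE PROC CreateSnapshotAgent
    @PublicationName sysname,
    @PrintOnlyMode bit = 1
AS
BEGIN
    DECLARE @PrintMsg varchar(max) = ''

    SELECT @PrintMsg = @PrintMsg + dbo.StringBuilder('--Creating snapshot agent for Pub: ' + @PublicationName)
    SELECT @PrintMsg = @PrintMsg + dbo.StringBuilder('EXEC sp_addpublication_snapshot @publication = ''' + @PublicationName + '''')

    IF @PrintOnlyMode = 1
        PRINT @PrintMsg    --show the generated script without running it
    ELSE
        EXEC (@PrintMsg)   --run it; the comment line became a harmless PRINT statement
END;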

In the next post I'll cover some goofiness regarding handling NULLs and escaping single quotes easily.
