DaveWentzel.com            All Things Data

February 2013

More on the Halloween Problem

I wrote a few weeks back about Eager Spools in Query Plans and the Halloween Problem.  Three days after I wrote that, Paul White put up an ENTIRE SERIES on the subject at sqlperformance.com.  He is much more thorough and authoritative than I could ever hope to be.  In a nutshell, the Halloween Problem occurs when we read and write the keys of a common index, though there are other causes as well.  

I just wanted to pull out a few gems that I either learned or found very important from Paul's work.  I encourage anyone who works on tweaking performance of TSQL to spend some time reading Paul's findings on the Halloween problem.  

  • The Eager Spool essentially reads every row from its child operator and stores those rows in a temporary structure before the next operator proceeds.  Under memory pressure this means spills to disk in tempdb.  
  • One alternative is an index hint so we avoid using the index that requires Halloween protection, i.e., the index whose keys are unstable.  
  • A query plan requiring Halloween protection will take more locks, and hold them longer, than an alternative plan.  
  • If an UPDATE statement modifies one of the keys of a composite index you may consider making that column an INCLUDED column instead.  
  • Halloween Protection is required for INSERT statements where the target table is also referenced in the SELECT statement.  
  • HP is required for DELETE statements with self-join relationships.  
  • If your performance issue is an Eager Spool on an INSERT...WHERE NOT EXISTS (<in table already>) query (Paul calls this a "hole-filling query") you can try a MERGE statement instead.  Here are the rules:
    • The WHEN NOT MATCHED BY TARGET clause must EXACTLY match the ON clause of the USING clause.  
    • The target must have a unique key.  
  • You may see an Eager Spool in a query that uses a scalar function, even if that scalar function does not access any tables.  This happens because SQL Server must assume that any scalar function not declared WITH SCHEMABINDING may read from one of your tables, so it must protect against the Halloween Problem.  The solution is to add SCHEMABINDING.  
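For the hole-filling case above, the MERGE rewrite might be sketched like this (table and column names are hypothetical; the point is that the ON clause expresses the entire NOT EXISTS predicate and the target has a unique key on Id):

```sql
-- Hole-filling with MERGE instead of INSERT...WHERE NOT EXISTS.
-- Assumes dbo.Target has a unique key on Id.
MERGE dbo.Target AS t
USING dbo.Source AS s
    ON t.Id = s.Id                       -- must exactly express the "hole" test
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Payload) VALUES (s.Id, s.Payload);
```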

CREATE or REPLACE pattern for Transact SQL

I just saw a blog post by someone discussing the pros and cons of DROP/CREATE vs ALTER for changing a stored procedure.  I was not able to comment on the blog post (seems like an error with their CMS) but I wanted to mention some issues with it and some better solutions.  

I generally use DROP/CREATE most often in my work, but I'm trying to change that.  

  • If you have Zero Downtime requirements you risk spurious errors using DROP/CREATE.  In the time it takes to drop the proc, recreate it, and issue the GRANTs, your online users could see errors.  
  • The author mentioned that using ALTER requires dynamic SQL.  That's not true; below I show you the pattern I use.  Dynamic SQL is difficult for many developers to write and read, and doubling up quotes tends to be buggy.  
  • Oracle has the "CREATE or REPLACE PROCEDURE" syntax to avoid all of this.  I wish MS would give us this.  We would not need to do existence checking in metadata tables before creating or altering a routine.  

In any case, here is the pattern I use; it is very close to Oracle's CREATE or REPLACE PROCEDURE syntax.  
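A minimal sketch of that pattern (the procedure name and body are placeholders): create a stub only if the proc doesn't exist, then ALTER it in place.  The one-line EXEC is needed only because CREATE PROCEDURE must be the first statement in its batch; the real body stays static SQL, and existing GRANTs survive because the proc is never dropped.

```sql
-- Stub the proc if it is missing, then ALTER with the real definition.
IF OBJECT_ID(N'dbo.MyProc', N'P') IS NULL
    EXEC (N'CREATE PROCEDURE dbo.MyProc AS RETURN 0;');
GO
ALTER PROCEDURE dbo.MyProc
AS
BEGIN
    SET NOCOUNT ON;
    SELECT 1;   -- real body goes here
END
GO
```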

Object Relational Mappers Series - The N+1 SELECTs Problem

This is the next post on ORMs.  Suppose you have a collection of Category objects, which are really just rows in a Category table.  Each Category has a collection of Products.  It's not uncommon to want a listing of Categories with their Products.  In SQL we would have code that looks something like this:  

SELECT *
FROM Category c JOIN Product p ON c.Id = p.CategoryId

This will run fairly fast.  An ORM, however, will likely do something like this:  
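The queries the ORM emits look something like the following (a sketch of the classic N+1 behavior; the exact SQL depends on the ORM): one SELECT for the parent collection, then one additional SELECT per parent row.

```sql
-- First the ORM loads the categories...
SELECT * FROM Category;
-- ...then it lazily loads each Category's Products, one query per row:
SELECT * FROM Product WHERE CategoryId = 1;
SELECT * FROM Product WHERE CategoryId = 2;
SELECT * FROM Product WHERE CategoryId = 3;
-- ...and so on, N more round trips for N categories.
```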

Eager Spools in Query Plans and the Halloween Problem

I've written twice in the past on Eager Spools in query plans (here and here).

Service Broker Tickling OR How to Run a Procedure Asynchronously Every x Seconds

Service Broker is great for asynchronous execution.  There are common patterns for async execution in the data tier that are well covered on the Internet, such as async trigger execution.  Remus Rusanu's website is invaluable for all things Service Broker and I find myself visiting it FIRST, even before Google, when I need an answer to a Service Broker question.  Someone on StackOverflow posted the question, "How can I get SQL Server to call a stored proc every n seconds?"  At my current client we have probably 30 of these, scheduled as SQL Agent jobs.  This works very well, but it means that when you restore a database for development or testing you have an affinity with msdb that is not restored with your database.  It also leads to blocking and contention on sysjobhistory when you have a lot of these jobs.  I just don't find a ton of SQL Agent jobs to be very elegant.  

Remus answered the question by noting that Service Broker can do this using a Timer and an Activated Queue.  He hasn't (as of today anyway) posted this answer or pattern on his website; I hope he will.  I've done something like this in the past, when we developed a software product that was deployed on SQL Express.  At the time Express did NOT have SQL Agent, so if you needed an asynchronous scheduling mechanism you had to roll your own, probably using Windows' built-in AT.exe command (yuck).  The solution I used was one I called the "tickler".  It would "wake up" every x seconds, see if there was any work to do in the configuration table, and do it.  

Remus' solution is far more elegant.  I expanded upon his solution and wrote a more robust "tickler" in Service Broker.  You can download the script here.  

Let me point out some interesting code:  

  • We only need a single Q and Service.  Message types, contracts, routes, etc. are not needed.  The "DoSomething" procedure is whatever you want to execute every x seconds.  
  • I created a "StartTickling" procedure.  This simply looks to see if I have an existing conversation (you only need one) for the Tickler.  If not, I create a conversation and begin a CONVERSATION TIMER against it, which sends a message to the TicklerQ.  
  • The TicklerQ is an activated Q.  Its activation stored proc runs when a message is enqueued: we simply RECEIVE the message, execute our DoSomething procedure, then start another CONVERSATION TIMER (basically an infinite loop).  
  • After we have all of the scaffolding in place (the Q, Service, activator proc, etc.), we turn on ACTIVATION for the Q.  We are now ready to seed the first TIMER using the StartTickling proc.  
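Putting those pieces together, a minimal sketch of the pattern might look like this (object names and the 10-second interval are illustrative; the downloadable script is more robust):

```sql
-- One queue and one service; the [DEFAULT] contract is all we need.
CREATE QUEUE dbo.TicklerQ;
CREATE SERVICE TicklerSvc ON QUEUE dbo.TicklerQ ([DEFAULT]);
GO
-- Activation proc: receive the timer message, do the work, re-arm the timer.
CREATE PROCEDURE dbo.TicklerActivator
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @handle UNIQUEIDENTIFIER;
    RECEIVE TOP (1) @handle = conversation_handle FROM dbo.TicklerQ;
    IF @handle IS NOT NULL
    BEGIN
        EXEC dbo.DoSomething;                              -- the real work
        BEGIN CONVERSATION TIMER (@handle) TIMEOUT = 10;   -- fire again in 10 seconds
    END
END
GO
-- Seed the first timer, but only if a tickler conversation does not already exist.
CREATE PROCEDURE dbo.StartTickling
AS
BEGIN
    IF NOT EXISTS (SELECT 1 FROM sys.conversation_endpoints
                   WHERE far_service = N'TicklerSvc' AND state <> 'CD')
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER;
        BEGIN DIALOG CONVERSATION @h
            FROM SERVICE TicklerSvc TO SERVICE 'TicklerSvc'
            WITH ENCRYPTION = OFF;
        BEGIN CONVERSATION TIMER (@h) TIMEOUT = 10;
    END
END
GO
-- Finally, turn on activation.
ALTER QUEUE dbo.TicklerQ WITH ACTIVATION
    (STATUS = ON, PROCEDURE_NAME = dbo.TicklerActivator,
     MAX_QUEUE_READERS = 1, EXECUTE AS OWNER);
```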

Again, I think this is a much more elegant solution than a SQL Agent job.  Here I can run my procedure without having affinity to a job that will not be restored when I restore my db to another server.  The tickler also picks back up whenever the db is restored, comes back online after a restart, etc.  A very elegant solution.  

Service Broker Monitoring Routine

 

UPDATE:  Please see Monitoring Service Broker for my updated monitoring routine.  

There are almost no GUI tools for managing Service Broker.  This isn't a problem for me since I rarely use GUI tools, but it is a problem for our support people.  In the previous post I covered my Service Broker Setup Routine, which enables SB on a given database in a reliable, repeatable manner.  This helped our DBAs immensely.  

The next problem to overcome is how our DBAs can monitor Service Broker and its Queues, Services, etc.  In my experience SB is very "lights out", just like anything else in SQL Server, if done correctly.  But it's difficult to sell this to DBAs who really do not understand what Service Broker is.  So I created a monitoring script to put their minds at ease.  This monitoring routine is generic enough that it can be easily modified to suit any setup or configuration.  There are lots of SB monitoring scripts available on the web.  I've tried to compile them into one simple script that you can run that checks your configuration from the top down.  You can download the script here.  

Let me point out some interesting pieces of code:  

My monitoring procedure is actually a combination of an installation, monitor, and teardown script.  I do this so all logic is contained in one little place.  Here are the options: 

  • SETUP:  sets everything up.  It does a "properties-based" setup.  This means that SETUP can be rerun multiple times without throwing an error and will ensure that the net result of each run is that we have a running system.  
  • TEARDOWN:  the opposite of SETUP.  This is useful for unit testing the SETUP process.  
  • TEARDOWN FORCE OVERRIDE:  TEARDOWN will *not* do anything if there are unprocessed items in any queue, items stuck in the transmission q, etc.  This override option tears everything down regardless of state.  YOU WILL LOSE DATA using this option.  
  • CHECK:  this is the simplest monitoring tool; it checks all Qs, metadata, and DMVs to ensure everything looks good.  I'll cover this in-depth below.  
  • BROKER HELP:  This checks low level components of Service Broker.  I'll cover this in-depth below.  
  • TRACER TOKEN:  this works just like a tracer token in replication.  It sends a test transaction to our Svc and makes sure it properly reaches its destination.  
  • STALLED MESSAGE CLEANOUT:  I'll cover this below.  
  • As part of BROKER HELP, I echo the status of various SB components by looking at DMVs.  There is far more in the script.  
  • I also look at PerfMon statistics for various SB-related counters.  
  • Activator procedure errors are echoed to the SQL Error Log, so we display the last hour of that data.  We also build an SSBDIAGNOSE command for each service and then interrogate it for errors.  
  • STALLED MESSAGE CLEANOUT looks for conversations that are not closed and ENDs them WITH CLEANUP.  
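A sketch of what that cleanup might look like (my reconstruction, not the downloadable script; note that WITH CLEANUP discards any pending messages, so this loses data):

```sql
-- End every conversation that never reached the CLOSED ('CD') state.
DECLARE @handle UNIQUEIDENTIFIER;
DECLARE stalled CURSOR LOCAL FAST_FORWARD FOR
    SELECT conversation_handle
    FROM sys.conversation_endpoints
    WHERE state <> 'CD';
OPEN stalled;
FETCH NEXT FROM stalled INTO @handle;
WHILE @@FETCH_STATUS = 0
BEGIN
    END CONVERSATION @handle WITH CLEANUP;   -- discards messages: data loss
    FETCH NEXT FROM stalled INTO @handle;
END
CLOSE stalled; DEALLOCATE stalled;
```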

 

Other Things I Monitor

This is all in the script:  

  • Ensure SB is enabled in the db
  • We look for DROPPED queue monitors on activated queues
  • We look for queues in the NOTIFIED state for more than 10 seconds.  This could mean a really busy queue, or it could mean a problem with an activator throwing errors.  
  • Activator errors (we use PerfMon for this)
  • activated queues in a DISABLED state
  • "Poison Message" detection (queues that are no longer is_receive_enabled)
  • Conversation Population Explosion...check for an excessive amount of conversations not in a CLOSED state.  This may mean we are not issuing a CLOSE CONVERSATION correctly in our setup.  Or we have a "fire and forget" pattern, which is not right.  
  • conversations "stuck" in sys.transmission_queue
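A few of the checks above can be expressed directly against the catalog views and DMVs; for example (a sketch, not the full script):

```sql
-- Is SB enabled in this db?
SELECT is_broker_enabled FROM sys.databases WHERE database_id = DB_ID();

-- Queues stuck in the NOTIFIED state (busy queue, or a failing activator).
SELECT q.name, qm.state
FROM sys.dm_broker_queue_monitors qm
JOIN sys.service_queues q ON qm.queue_id = q.object_id
WHERE qm.state = N'NOTIFIED';

-- Poison-message detection: activated queues SQL Server has disabled.
SELECT name FROM sys.service_queues
WHERE is_activation_enabled = 1 AND is_receive_enabled = 0;

-- Conversations "stuck" in the transmission queue.
SELECT conversation_handle, to_service_name, transmission_status
FROM sys.transmission_queue;
```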
