DaveWentzel.com            All Things Data

Misc Notes and Thoughts


This is just a compilation of ramblings that I've found trying to optimize query plans.  Someday I should expand upon these thoughts with some examples and more explanation. 
  • On query plans involving joins on columns that have a high number of NULL values the query plans may not always be optimal.  Other than slowness, other symptoms include "fat" estimated row count arrow bars (it's thinking it is bringing back say 100,000 rows when in reality the query returns 10 rows) and index scans vs seeks.  One fix that may or may not work is to add an additional condition to the JOIN to filter out the NULLs, hopefully getting a better query plan in the process.  A little example:

FROM tableA

JOIN tableB

ON tableA.id = tableB.tablea_id  --assume tablea_id has lots of nulls

this becomes

ON tableA.id = tableB.tablea_id AND tableB.tablea_id IS NOT NULL

this example may not be perfect...sometimes you see this problem when the condition occurs on the WHERE clause.  To diagnose this based on the symptoms consider doing a count(*) GROUP BY the tablea_id and see what the distribution is. 

  • often times we join tables together using mutliple columns that comprise the key.  It is often easy to miss one of the conditions, check for this.  Sometimes it is obvious because missing a key column results in more rows than expected.  But it is also an issue if, for instance, we join two tables on a 3 col composite key.  Let's say each table also has a LastUpdateDateTime col that we also filter on based on a parameter to the procedure.  But the developer forgot to apply that condition to the second table.  SQL Server can't always perform "implied predicates" on this...and if your index on the second table is more selective based on also having that column...you may get an index seek instead of a scan.