DaveWentzel.com            All Things Data

Statistics

 

What are statistics?
This seems to cause confusion with many people.  Statistics are collected on columns and indexes within the database to describe the data distribution and selectivity of the data.  The information is represented as a histogram.  This is just a fancy way of saying a table of counts of the occurence of particular data points evenly distributed across your data.  Here's the rub, what if the sample aren't representative samples?  What happens if my data distribution is not normally distributed? 
 
Statistics Algorithm
If the table rowcount <= 500 rows, stats aren't automatically updated until there have been 500 row modifications. On tables with > 500 rows, stats are not automatically updated until 20% of the rowcount, plus 500 rows have been modified. So, if you have a very active OLTP database, then maybe a nightly schedule would benefit. You definitely need to know the database activity to make this determination.  temp tables work differently.  They will update after 6 changes.  OPTION (KEEP PLAN) changes the temp table algorithm to use the permanent table algorithm.  OPTION (KEEPFIXED PLAN) prevents procedures from recompiling due to stats changes.  
 
_WA_Sys statistics
 
You may see statistics named like _WA_Sys_000000012_182AAA3B.  This indicates SQL Server automatically created statistics for col 9 of ObjectID 182AAA3B (in hex).  
When are stats recomputed?
 
If the stat was created when the index was created and STATISTICS_NORECOMPUTE = OFF then they are rebuilt automatically (as needed, see above) or whenever a REBUILD occurs, but not a REORG.  Manually created stats must be updated manually.  
 
Useful DMVs/Metadata
 
sys.indexes
sys.stats
sys.stats_columns
sp_helpstats
DBCC SHOW_STATISTICS
  • the sample size is under the row heading ROWS SAMPLED
 
 
Query for indexes and the last time stats were updated (2005)
 
SELECT 'Index Name' = i.name, 'Statistics Date' = STATS_DATE(i.object_id, i.index_id)
FROM sys.objects AS o
JOIN sys.indexes AS i
ON o.name IN (Select Name from Sys.objects where type = 'u')
AND o.object_id = i.object_id
AND i.name IS NOT NULL
ORDER BY STATS_DATE(i.object_id, i.index_id);
 
Other Notes of Interest
 
Multi-statement Table Valued Functions have no statistics.  The optimizer must guess their size so inefficient plans could result.  Be aware of this. 
 
When might I have a stats issue?
  1. if your actual vs estimated row counts in your query plan is off by a lot.  
 
Transferring statistics from one instance to another
 
Links

Add new comment