The Problems with the Missing Index DMVs

I seem to hear this all too often:  "We can see what indexes we need to add to our db by looking at the '[[List of Useful DMVs and Purpose|missing index DMVs]]', we don't need to do any analysis or testing."  Somehow people think these DMVs are a panacea.  They aren't.  They are a tool to help you with analysis, nothing more.  They are not a replacement for understanding your data.  

In case you've never been exposed to these DMVs they are:

  • sys.dm_db_missing_index_details
  • sys.dm_db_missing_index_group_stats
  • sys.dm_db_missing_index_groups
  • sys.dm_db_missing_index_columns
The Good
Obviously MS gave these DMVs to us for a reason, they must provide something useful.  Certainly they do.  Among other things they give useful information like how many times the proposed index could've been used to satisfy a data request (see caveats below) and estimated cost savings.  These DMVs are a great place to monitor for quick performance hints.  
 
The Bad
  • They are not always accurate.  Their stats can get out of date.  Remember that they use algorithms to determine what additional indexes might be helpful, but no algorithm is a substitute for understanding your data.  If the algorithm was perfect than why would we ever have a DBA perform manual index analysis, or even create any index manually?  Just let the algorithm do it for you!  Perhaps someday the algorithm will have enough logic to eliminate the need for a DBA to do manual index creation, but then imagine the overhead of having that algorithm always executing in the background.  
  • They only provide data since the last sql service restart.  Further, if you have autoclose set on your db your statistics will be eliminated.  Further, data can be aged out of these DMVs, if needed, just like any other "cache".  

The Ugly
MS won't tell you these things.

  • The DMVs will not suggest changes to clustering.  If you don't know your data this can lead you to a false sense of security.  Perhaps the data access patterns are skewed such that a change to the clustered index key would be more beneficial than other covering, non-clustered indexes.  
  • It does not make suggestions to modify existing indexes because it does not know why you created them.  However, if it recommends a 3 col composite index where the first 2 cols are the leading index key for an indexing non-unique index, it won't recommend modifying the existing index.  It should at least provide a notation that says, "Hey, there is already an index that almost has what we are recommending, you might consider modifying that index instead of creating this new, recommended index." 
  • It does not consider any partitioning schemes that you may already have in place or might consider.  I got burned on this.  When you run sp_help against a table or sp_helpindex directly you aren't told what partitions a given index resides on.  At least, not directly.  The output instead leads you to believe the index is merely on a separate filegroup.  So, when I looked at the missing index DMVs it offered a suggestion that seemed reasonable.  I doublechecked existing indexes using sp_help and still I thought everything looked ok...until I remembered the partitioning we had in effect, which negated any benefits the recommended index would've given us.  In fact, the recommended index would've made things worse.  
  • It does not make suggestions on filegroup placement for recommended indexes, nor does it take current filegroup placement into consideration when it makes its suggestions.  
  • The same holds true for disk and LUN considerations for indexes.  
  • It does not suggest filtered indexes.  
  • It does not consider index intersection (see here and here for a definition if you've never heard of this before).  For instance, rather than implement a suggested covered index you might want to consider adding a new index that contains only the missing cols and utilize an existing index for the remainder of the cols.  A hash table is built over the two indexes with the net effect of a covered index without the overhead.  Granted, index intersection is rarely better than a covered index, but I have seen cases where it is.  Again, know your data.  
  • They won't show you the calls that the proposed index would've helped with, so it's hard to test.  You would need to look at the plan cache and determine this.   

Related:  [[sys.dm_db_index_usage_stats]]