Latency and IOPs

Just because your IO latency has increased does not necessarily mean that your IO subsystem is performing poorly.  If your IO workload has increased past what it was designed for then your latencies will increase.  Lots of things can increase your IO workload.  Baselining helps.  Some things that you might want to look for

  • you implemented new features that require more IO
  • you are using snapshot isolation which is heavily dependent on tempdb and IO.  
  • you turned on transactional replication, log shipping, mirroring, or anything else that is IO-intensive.

If you are certain that increased workload isn't responsible for the increased latencies (viewed via perfmon as Avg Disk sec/read.  Then determine which db component (tempdb, data, index filegroups, log) is the io bottleneck.  If you have a dedicated LUN for each of these "IO features" then it is easy.  If you have a virtualized SAN or the database is spread out amongst many LUNs then this is harder.  sys.dm_io_virtual_file_stats can help you in these situations.  You can determine latencies using these formulas:

  • Read latency = io_stall_read_ms / num_of_reads
  • Write latency = io_stall_write_ms / num_of_writes
  • Total latency = io_stall / (num_of_reads + num_of_writes)

As of June 2012 I had an engagement with MS and this is their current guidance:  

ReadsCached Writes
Excellent < 8ms< 1ms
Good < 12ms< 2ms
Fair < 20ms< 4ms
Poor > 20ms> 4ms

[[SAN Home]]