I installed PC and it showed me very little "waiting" or "blocking" or any of the other usual suspects. All I saw was a bunch of IO_COMPLETION
events which usually indicate queries that spill their "temporary worktables" to disk. That was expected due to the nature of the DW workload. But should that really be happening on Fusion? Shouldn't Fusion be able to keep up with the spills?
I asked the DBAs and OS engineers to confirm that there were no hardware "alarms" indicating issues with the IO subsystem. They quickly replied, "everything is green on our end."
I retorted, "I see these IO_COMPLETION waits that I really don't think I should be seeing on a system with Fusion, are you sure the Fusion stuff is set up correctly?"
"Yep, everything is green here."
Being a skeptic, I queried PerformanceCollector.IOStalls
and added the necessary calculations to the raw data to see the disk latencies. Please see the graphic to the left.
Essentially the latencies look awesome for the F drive where the user databases live...exactly what I would expect. But look at the G drive, where tempdb lives. Those latencies are terrible and are NOT the latencies I would expect to see from properly configured Fusion. Those latencies are more reflective of rotational SAN media.
So I continued to query the Ops folks, "Guys, the latencies for tempdb (and G drive in general) are terrible. Are you sure tempdb is on fusion? It can't be. Whenever you have fusion available to you you should always consider putting tempdb on those drives FIRST."
"Um, oh, it looks like when we built the server we left tempdb on the SAN instead of fusion. We got it wrong. We'll fix it."
They did and since then the IO_COMPLETION waits have disappeared and my application is perceived to be at least 3x faster.
Summary
IO stalls and disk latency data is not something that you need to worry about in your daily SQL Server performance tuning. But when you run into situations where you honestly believe your IO subsystem is not keeping up with the workload, and you believe it should, then disk latencies are the best way, from a DBA perspective to get the ammo you need to prove there is a problem.
PerformanceCollector has a table that logs IO stalls (disk latencies). In previous PC versions I aggregated the raw data by database which means some of the data fidelity is lost. In this release I maintain both the raw data and the actual latency calculated data so you no longer have to handle writing the calculations yourself. This data is also helpful for general baselining of your IO subsystem.
You can download PerformanceCollector here
You have just read "[[Monitoring Disk Latencies with PerformanceCollector]]" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.
Dave Wentzel CONTENT
performancecollector performance sql server