
PerfMon Page

 

General Information about Performance Monitor
Performance Monitor consumes CPU and Disk I/O resources, but not much. If a system does not have much disk I/O and/or CPU to spare, consider running Performance Monitor from another computer to monitor the server running SQL Server over the network, but only if you are graphing data. When using log mode, it is more efficient to log Performance Monitor information locally on the server running SQL Server. If you must use log mode over the network, then consider reducing the logging to only the most critical counters.
 
Never use the GUI during a load test or on a system experiencing pressure.  Use .blg log files (binary logging) if possible, which is *much* more efficient than CSV. 
 
Logman
This tool lets you automate most of your PerfMon collection.  (See typeperf below for further ways to automate this.) 
 
logman create counter ServerCollection -s %COMPUTERNAME% -cf <textfile_of_counters>
logman start ServerCollection
logman stop ServerCollection
 
These commands will create files in \PerfLogs named ServerCollection.blg. 
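 
The file you pass to -cf is just a plain-text list of counter paths, one per line.  A minimal example (the exact paths are illustrative; run typeperf -q, described below, to see what your machine actually exposes):
 
\PhysicalDisk(*)\Avg. Disk sec/Transfer
\PhysicalDisk(*)\% Idle Time
\Memory\Pages/sec
\Processor(_Total)\% Processor Time
\SQLServer:Buffer Manager\Page life expectancy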
 
Relog
...is an excellent tool for loading PerfMon logs (.blg or .csv) into a SQL Server database.  It ships with XP and 2003.  It will also convert large CSV files with too many columns into smaller files that Excel can handle (under 256 columns). 
 
relog ServerCollection.blg -f SQL -o SQL:<DSN>!ServerCollection
 
Here <DSN> is an ODBC system DSN pointing at the target database, and the name after the ! identifies this counter log.  This will create a series of tables in that database...DisplayToID, CounterDetails, CounterData. 
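 
Once the data is in SQL Server, a join across those three tables gets you a usable time series.  A minimal sketch, assuming the standard schema relog creates (note that CounterDateTime is stored as a char(24) string, not a datetime):
 
SELECT d.DisplayString,      -- the name of the log set
       cd.CounterDateTime,
       det.ObjectName,
       det.CounterName,
       det.InstanceName,
       cd.CounterValue
FROM CounterData cd
JOIN CounterDetails det ON det.CounterID = cd.CounterID
JOIN DisplayToID d ON d.[GUID] = cd.[GUID]
WHERE det.CounterName = 'Avg. Disk sec/Transfer'
ORDER BY cd.CounterDateTime;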
 
The Absolute Best PerfMon Counters
Eventually I'll consolidate my PerfMon counters, but for now these are the ones I've found most useful recently. 
 
Physical Disk
 
Avg Disk sec/Transfer (or /Read or /Write)
Threshold: > 0.010 (10 ms)
Anything more than that indicates a disk problem.  A virtualized SAN may be the problem. 
 
%Idle Time
Threshold: < 40 for any LUN
At this point you have complete disk saturation and your I/O rate cannot be increased. 
 
%Disk Time
This guy is NOT reliable.  Beware.  It is really 100 * Avg Disk Q Length, which is not what the explanation reads.  If you really want to know busy time, take 100 - %Idle Time. 
 
Avg Disk Q Length
NOT reliable, contrary to urban legend.  Your SAN and RAID controllers abstract this from the OS, and it is especially meaningless on a virtualized SAN.  Unless it is huge and sustained, it's really meaningless.  Just a few years ago it was common to hear that a disk queue length > 2 was suspect and could be a performance problem, but it is not uncommon to see significantly larger values even on RAID 10, and SQL Server will push disk queue lengths higher when it feels it is appropriate.  I have a better calculation below ****. 
 
Disk Transfers/Sec
aka IOPS.  Not really good for performance troubleshooting directly, but SAN vendors like to know this statistic for sizing purposes.  We use this number in the formulas below to calculate queuing.  See IOPS Guidance. 
 
Bytes/Transfer
Still need to complete this for the formulas below ****. 
 
Split I/Os
Can indicate fragmentation or misaligned sectors.  For more on this see Sector, Cluster, and Stripe Sizes. 
 
(See the Disk Notes section below for more info ****.)
 
Memory
 
Page Reads/sec and Pages/Sec
Threshold: near 0
Should be near 0 on a dedicated SQL Server except during backups, full text indexing, etc.  These counters measure hard page faults (memory resolution to the page file). 
 
Processor
 
Processor Q Length
Actually, this is NOT a very reliable counter unless you see long, sustained spikes. 
 
Network**
 
Output Q Length
Threshold: > 2
Could be queuing at the bus. 
 
Bytes Total/Sec * 8 / Current Bandwidth
Threshold: > 0.6
Network bottleneck. 
 
Current Bandwidth
You'd be surprised how many NIC cards are not configured correctly, leaving the database network-bound.  If your NIC is set to 1Gb it should be running at 1Gb...but if something is misconfigured on the network it may only be operating at 100Mb. 
 
SQL Server Counters***
 
Buffer Cache Hit Ratio
Threshold: < 90
Generally this may mean memory pressure.  However, don't rely solely on this metric.  Consider indexes and rewriting queries to read less data.  Also consider looking at the index scan/table scan counters; perhaps your issue is the need for more covering indexes. 
 
Page Life Expectancy
Threshold: < 300 sec
Measures how many seconds, on average, a non-locked page is expected to remain in the buffer cache without being referenced.  Values below 300 seconds can also mean you have memory pressure; consider indexes and rewriting queries to read less data.  Low values indicate that added memory will allow pages to remain longer in cache, while higher values indicate that adding memory won't help you much in this respect.  The actual numbers depend on your expectations and the frequency with which you run queries that rely on the same data and execution plans.  Typically, numbers greater than several hundred indicate a good state of memory, but low values don't mean adding more memory is the best solution; indexes would help decrease I/O as well.  Unfortunately I don't think Microsoft publishes the actual algorithm used (which would be helpful).  When the value drops quickly by, say, 50%, you know a big read just occurred which in essence flushed the cache. 
 
See also: Buffer Pool Health PerfMon Counters 
 
Total Server Memory < Target Server Memory
This could be an AWE issue on 2000; otherwise it means you have insufficient memory. 
 
**It's often easier to just monitor the SQL Server DMVs for ASYNC_NETWORK_IO waits (NETWORKIO in 2000) and similar signs of network issues. 
 
***Remember that you don't have to fire up PerfMon to see these values.  sys.dm_os_performance_counters will give you all of the SQL Server object-related counters, but unfortunately no other counters.  In 2000 the equivalent is sysperfinfo. 
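 
For example, here is a minimal sketch pulling some of the counters discussed above (2005+ only; object_name is prefixed with the instance name, hence the LIKE clauses):
 
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE (object_name LIKE '%Buffer Manager%'
       AND counter_name = 'Page life expectancy')
   OR (object_name LIKE '%Memory Manager%'
       AND counter_name IN ('Target Server Memory (KB)', 'Total Server Memory (KB)'));
 
-- Ratio counters such as Buffer cache hit ratio are stored as a raw value plus
-- a separate base counter; divide the two to get the percentage PerfMon shows.
SELECT 100.0 * r.cntr_value / NULLIF(b.cntr_value, 0) AS [Buffer Cache Hit Ratio]
FROM sys.dm_os_performance_counters r
JOIN sys.dm_os_performance_counters b ON b.object_name = r.object_name
WHERE r.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base'
  AND r.object_name LIKE '%Buffer Manager%';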
 
 
****Disk Notes - Better Metrics
For physical disk, PerfMon really doesn't give you the complete story.  You really need to know your disk utilization, and there is no direct counter for that.  For instance, we really would like to know Avg Disk Service Time/Transfer and Avg Disk Queuing/Transfer.  But with some math we can figure these out.  Here are some basic formulas:
 
I/O Time = Service Time + Queue Time    (the driver reports this total reliably to Windows; it is what Avg Disk sec/Transfer measures)
Disk Utilization = 100 - %Idle Time
Disk Service Time = Disk Utilization (as a fraction) / Disk Transfers/Sec
Disk Queue Time = Avg Disk sec/Transfer - Disk Service Time
 
If we suspect disk problems, where do we go from here given our formulas?  It's not always possible to manage service time, but you usually can do something about queue time.  SQL Server can generate huge amounts of I/O that saturate the I/O subsystem, driving service times high even without queuing; that may be normal, and the solution may not be a change to the I/O subsystem.  What we really want to know is whether queuing is the issue, because queuing we know how to fix: more spindles, different disk configurations, etc. 
 
Let's look at two examples.  The following table has two different disks and their corresponding fictional PerfMon counters.  Let's work the formulas and determine if queuing is a problem. 
 
Counter                   Disk 1    Disk 2
%Idle Time                70%       30%
Disk Transfers/sec        0.50      50
Avg Disk sec/Transfer     700 ms    700 ms
Bytes/Transfer            1000      25000
 
Based on the table your first thought might be that Disk 2 is in worse shape than Disk 1: Disk 2 is "less idle," has much larger I/Os, and is handling far more Disk Transfers/sec.  We also know the Avg Disk sec/Transfer is VERY high in both cases; we shouldn't see a number over 10 ms. 
 
Let's work the formulas to see the truth.
 
Disk Utilization = 100 - %Idle Time
  Disk 1: 100 - 70 = 30%
  Disk 2: 100 - 30 = 70%
  (Disk 2 has higher utilization.)
 
Disk Service Time = Disk Utilization / Disk Transfers/sec
  Disk 1: 0.30 / 0.50 = 0.60 (600 ms)
  Disk 2: 0.70 / 50 = 0.014 (14 ms)
  (The picture is becoming clearer: it's actually taking Disk 1 more time to service far less I/O.)
 
Disk Queue Time = Avg Disk sec/Transfer - Disk Service Time
  Disk 1: 700 ms - 600 ms = 100 ms
  Disk 2: 700 ms - 14 ms = 686 ms
 
So despite first appearances, Disk 1's latency is almost entirely service time (a slow device), while Disk 2's is almost entirely queue time, which is the problem we know how to attack with more spindles or a different disk configuration. 
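 
The same arithmetic is trivial to script.  A minimal T-SQL sketch using the Disk 2 numbers from the table (plug in your own sampled counter values):
 
DECLARE @IdlePct float, @TransfersPerSec float, @SecPerTransfer float;
SET @IdlePct         = 30;     -- %Idle Time
SET @TransfersPerSec = 50;     -- Disk Transfers/sec
SET @SecPerTransfer  = 0.700;  -- Avg Disk sec/Transfer
 
DECLARE @Utilization float, @ServiceTime float, @QueueTime float;
SET @Utilization = (100 - @IdlePct) / 100.0;         -- fraction of time busy
SET @ServiceTime = @Utilization / @TransfersPerSec;  -- seconds per transfer
SET @QueueTime   = @SecPerTransfer - @ServiceTime;   -- seconds spent queued
 
SELECT @Utilization * 100  AS DiskUtilizationPct,    -- 70
       @ServiceTime * 1000 AS ServiceTimeMs,         -- 14 ms
       @QueueTime * 1000   AS QueueTimeMs;           -- 686 ms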
 
 
 
 
 
 
Interesting Ratios
 
Here is a list of some interesting ways to use PerfMon numbers to get a better insight into your system. 
 
SQL Compilations/sec vs. Batch Requests/sec
Shows how much plan reuse is occurring on your system.  The higher the compilations are relative to batch requests, the worse your OLTP performance will be.  Why?  Plans are being discarded quickly.  Things to help the problem: look for parameterization opportunities, and consider whether you could add more memory to your system. 
 
SQL Compilations/sec vs. SQL Recompilations/sec
The first includes initial compiles and recompiles; the latter only the recompiles.  You can use the DMVs to determine what is recompiling most often. 
 
Kernel CPU vs. User CPU
Kernel (privileged) CPU time above roughly 25% of user CPU time may indicate a hardware or driver issue, since SQL Server operates almost entirely in user mode. 
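 
The compilation ratio above is easy to compute from the DMV as well.  A minimal sketch (2005+; use sysperfinfo on 2000); note that cntr_value for these counters is cumulative since instance startup, so this gives a long-run average rather than a point-in-time rate:
 
SELECT 100.0 * c.cntr_value / NULLIF(b.cntr_value, 0) AS CompilesPer100Batches
FROM sys.dm_os_performance_counters c
JOIN sys.dm_os_performance_counters b ON b.object_name = c.object_name
WHERE c.counter_name = 'SQL Compilations/sec'
  AND b.counter_name = 'Batch Requests/sec'
  AND c.object_name LIKE '%SQL Statistics%';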
 
 
 
TYPEPERF.EXE
A command-line tool to capture performance statistics to the command window or a file. 
 
Run typeperf -? to see the various options. 
 
typeperf -q > output.txt outputs a list of the available counters on the given machine.  Some counter names are machine-specific, such as those for named SQL Server instances. 
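 
To actually collect counters rather than just list them, something like the following works (the file name, interval, and sample count are illustrative; see typeperf -? for the flags):
 
typeperf -cf counters.txt -si 15 -sc 240 -o ServerCollection.blg -f BIN
 
This samples the counters listed in counters.txt every 15 seconds, 240 times (one hour), writing binary .blg output per the recommendation above. 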
 
 
