DaveWentzel.com            All Things Data


Other MySQL Performance Management Tooling

In my last post, Introduction to the MySQL Performance Schema, I covered the performance_schema, which is a relatively new feature of MySQL 5.5.  If you are using an earlier version of MySQL you won't have access to performance_schema.  This post covers your alternatives when you find yourself facing a performance problem as an "accidental MySQL DBA".  

Introduction to the MySQL Performance Schema

If you are new to performance troubleshooting with MySQL, specifically using the performance_schema, then this post is for you.  If you are an "accidental DBA" for MySQL, and you need to quickly fix some performance problems, then read on.  This post will give you example queries and background to get your MySQL performance problems solved quickly.  

Monitoring Disk Latencies with PerformanceCollector

PerformanceCollector (PC) is free and open-source software (FOSS) that I have developed to monitor and baseline performance of your SQL Server.  There are certainly plenty of commercial tools and even free scripts available on the net that handle all of this. What I like about PC is that it can be deployed with a simple PoSh call, runs without any dependencies, and is easily customizable to support any additional logging and reporting that you may need.  

PerformanceCollector and the Heisenberg Uncertainty Group

This is the last post in my series on my PerformanceCollector tool.  This post is a warning and an anecdote that you can over-monitor your server.  By default PerformanceCollector (PC) monitors quite a bit of information, but I've had the utility running on high-throughput production OLTP workloads and could not discern any negative performance impact when running it with the default configuration.  It has all of the necessary fail-safes in place (I hope) to ensure it doesn't go off the rails.  For instance, there is a configuration setting that caps the size of the database and auto-purges data if needed.  

The Heisenberg Uncertainty Principle may eventually come into play, however, and you should be aware of that.  Strictly speaking, what applies here is the "observer effect": the act of monitoring a system changes, and can adversely impact, the system being monitored.  The observer effect is commonly conflated with the HUP, so you may hear either term; I'll stick with HUP for the rest of this post.  

If you find that PC is a bit too heavyweight for your environment then you can disable some of the AddIns or extend the interval between runs.  Or you can tweak the code to fit your needs.  The possibilities are endless.  In older versions of the tool I ran into some cases where the system was under such stress that PC could not accurately log the waiting and blocking of the system.  That kinda renders the tool useless when you are trying to diagnose severe performance problems.  But I haven't had that happen in a few years now so either my code is getting better, or M$ has improved the performance of their DMVs to work under harsh environments.  I assume the latter.  

But you can take performance monitoring too far.  Here's a quick story to illustrate the fact.  

I consulted at a company that had deployed two third-party performance monitoring tools that shall remain nameless.  They worked OK, but they did not capture the wait and subsystem information like PC does, so the client decided to deploy my scripts as well.  

There were now 3 tools running.  

PC only collects waits and blocks every 15 seconds, so it likely won't capture EVERY "performance event" that occurs during that interval.  But again, I'm worried about the HUP.  

Primum non nocere.

Like any good doctor, "first, do no harm."  

Frankly, users are generally tolerant of an occasional performance blip, but anything over 15 seconds and rest assured some user will open a help desk ticket.  That is why the monitoring interval is 15 seconds.  But this interval is a bit like Prunes Analysis (when taking prunes...are 3 enough or are 4 too many?).  You can certainly make the case that every 10 seconds is better, or that every minute is optimal.  Decisions like this are very experiential.  

This particular client decided that with a 15 second interval too much valuable information was not being logged.  They decided that if a block or wait occurred, the 15 second monitoring interval would shrink to every second to get even better metrics.  This required one of their developers to change the WHILE 1=1 loop in PC.  If after 60 seconds of monitoring at the one-second interval no blocking/waiting was observed, the code reverted back to every 15 seconds.  

IMHO, that's far too much logging.  It is quite normal for a busy system to constantly experience some variant of waiting, which meant this new code was ALWAYS running in one-second-interval mode.  This hyper-logging mode was implemented after I left the client, and they did it without consulting me, or really anyone who understands even a modicum of SQL Server performance monitoring.  

It didn't take long until performance of the system was worse than ever.  At one point the instance was crashing every few days due to tempdb log space issues.  Further, PerformanceCollector did not have all of its fail-safes at the time, and the db was about 1TB in size even with 30 day purging in place.  Another performance consultant was brought in (I was out-of-pocket) and he quickly determined that PC was the biggest consumer of tempdb and IO resources on the instance.  I looked thoroughly stupid when I was called back in to the client to explain myself.  I quickly determined that my code was not at fault; rather, the change to shorten the monitoring interval was the cause.  

Without finger-pointing I explained that we should put my PC code back to the defaults and re-evaluate why hyper-logging was needed at all.  The other consultant agreed and within a day or so everyone was calm and happy again.  Mistakes like this happen.  

As I was wrapping up this engagement I finally asked, "So, have you identified and corrected many issues that PC has found?  Clearly if you thought hyper-logging was needed you must've corrected quite a few performance problems with the tool."

The DBA responded, pensively, "Well, we keep the trend data but we haven't opened up any tickets to the dev team to fix specific stored procs yet.  We don't have any budget for a DBA to look at the missing indexes either.  But maybe we'll get to do all of that some day."  

The devs, architects, and I choked back the laughter.  All of that monitoring and nothing to show for it.  To this day the other architects at that client call the DBAs the HUGs (Heisenberg Uncertainty Group) behind their backs.  I have since returned to that client for other engagements and I have witnessed this for myself.  Those of us in the know chuckle at the joke.  But the lead HUG didn't get the joke apparently…he got angry once and asked not to be compared with Walter White. 

Moral of the story, monitor as much as you need to, and nothing more.  If you don't have the budget or resources to fix issues you find during your monitoring then it is best to consider NOT monitoring.  No need to make your performance worse with too much monitoring.  That will keep Heisenberg from rolling over in his grave.  

You have just read PerformanceCollector and the Heisenberg Uncertainty Group on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

PerformanceCollector AddIns

How do the AddIns Work?

There are a series of procedures named PerformanceCollector.Addin*. PerformanceCollector.RunAddIns is tickled every 5 minutes from the SQL Agent AddIns job. This proc builds a cursor over every enabled entry in PerformanceCollector.Config and runs it.

The AddIns procedures are responsible for determining when the given AddIn is actually supposed to run (every 8 hours, once a day, etc). It determines this from querying PerformanceCollector.Config. It then calls the actual worker proc if it needs to.

AddIns are responsible for populating the other tables in the PerformanceCollector schema.  The following tables store the performance metrics that I find most interesting.  Each table is structured so the first columns are CurrTime and ServerName, which aids in troubleshooting and lets you aggregate this data across multiple customers/instances into a centralized repository for trending.
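The dispatch logic described above can be sketched roughly like this (a hypothetical sketch only; the column names in PerformanceCollector.Config are assumptions based on the description, not the actual PC source):

```sql
-- Hypothetical sketch of PerformanceCollector.RunAddIns:
-- cursor over every enabled AddIn, run each one that is due.
DECLARE @AddInName sysname, @LastRun datetime, @IntervalHrs int;

DECLARE addins CURSOR LOCAL FAST_FORWARD FOR
    SELECT AddInName, LastRunTime, RunIntervalHours
    FROM PerformanceCollector.Config
    WHERE Enabled = 1;

OPEN addins;
FETCH NEXT FROM addins INTO @AddInName, @LastRun, @IntervalHrs;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- only call the worker proc if its interval has elapsed
    IF @LastRun IS NULL OR DATEADD(HOUR, @IntervalHrs, @LastRun) <= GETDATE()
    BEGIN
        EXEC ('EXEC PerformanceCollector.' + @AddInName);
        UPDATE PerformanceCollector.Config
            SET LastRunTime = GETDATE()
            WHERE AddInName = @AddInName;
    END;
    FETCH NEXT FROM addins INTO @AddInName, @LastRun, @IntervalHrs;
END;
CLOSE addins;
DEALLOCATE addins;
```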

Each PerformanceCollector object below is listed with its default runtime interval and purpose.

  • AdHocPlanCacheBloat (7 hours): Logs single-use ad hoc query plans that are a total waste of plan cache space. If this is really big then the solution is to enable 'optimize for ad hoc workloads', which reclaims the memory and ensures ad hoc plans are not fully cached. This is an "analyze once" script: once you've enabled 'optimize for ad hoc' you are done and can disable this AddIn. But we keep it running in case a new framework (Ruby seems susceptible) is deployed that causes a radical increase in ad hoc plans. You can also use the captured text to determine where you are not doing parameterized queries correctly.
  • BufferStats (5 hours): Gathers stats on the count and size of each object in the buffer cache.  Larger values for some objects might mean a new index is required.  
  • Config: Using this table you can configure certain AddIns to NOT run, or change their schedules.  
  • DeadlockSnapshot (4 hours): Shows all deadlocks that have occurred and tries to decode the deadlock XML to show the Victim and Survivor and what they were doing. Please note that there is a lag time between when a deadlock occurs and when it is decoded here that could be AT MOST 8 hours. You can manually run the Deadlock AddIns to do the decoding at any time if you really need up-to-the-minute deadlock data.
  • IndexBlocking (4 hours): Logs the amount of time each index spends in a blocking/waiting state.  Useful to determine whether better indexes may be needed.  
  • IndexesFromQueryPlanDtls: Not yet implemented. This will read the missing indexes from the query plans instead of relying on the DMVs, which seems to be far more accurate. It will list the TotalCost (somewhat nebulous) as well as the table and index columns that the optimizer believes would be beneficial.
  • IndexesFromQueryPlansSummary: Not yet implemented. An aggregate of the Dtls table that quickly shows the best indexes to create.
  • IOStalls (7 hours): Lists the IOStall metrics by database and filetype.  
  • MissingIndexes (4 hours): Snaps the sys.dm_db_missing_index* DMVs. However, there is a better way to do this that looks just at cached query plans (see IndexesFromQueryPlanDtls). Even so, this will show the most egregious examples of missing indexes.
  • PageLifeExpectancies (5 mins): Logs the PLEs for the instance.  
  • PerfCounters (1 hour): Gathers instance-level perf counters in EAV format for aggregation later.  
  • PlanCacheMetrics (7 hours): Logs stored proc stats (reads/writes/CPU/duration) by metric.  By looking at procs that overlap the different metrics you can quickly determine what to focus on when looking for performance problems:
      - Top Cached SPs By Execution Count
      - Top Cached SPs By Avg Elapsed Time
      - Top Cached SPs By CPU
      - Top Cached SPs By Total Logical Reads
      - Top Cached SPs By Total Physical Reads
      - Top Cached SPs By Total Logical Writes
  • PlanTypeStats (7 hours): Similar to AdHocPlanCacheBloat. This will show you the breakdown of ad hoc vs prepared vs proc, etc. The USE_COUNT will show you whether parameterization is working for you.
  • PossibleBadIndexes (4 hours): Logs cases where writes > reads. This may mean the index can be dropped, but you must understand your workload. However, if Total Reads is ZERO you can generally drop that index, unless you have a year-end process or some other edge case where the index is only needed occasionally.
  • TempDBLogWatch (5 mins): Lists transactions and query plans that are using more than 50MB of tempdb log space.
  • TempDBUsage (5 mins): Shows spids and statements that are using more than 50MB of tempdb space.
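As noted under AdHocPlanCacheBloat, the fix for single-use plan bloat is the documented 'optimize for ad hoc workloads' server setting. It is an advanced option, so it has to be exposed first:

```sql
-- 'optimize for ad hoc workloads' is an advanced option
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- with this on, ad hoc plans are stubbed on first execution and
-- only fully cached on the second execution
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```

The setting takes effect immediately without a restart, but it only affects plans compiled after it is enabled.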

You have just read PerformanceCollector AddIns on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  


This is Part 3 of my blog series on PerformanceCollector.  This utility will help you monitor and baseline the performance of your SQL Server.  You can download the source code here. It is totally open source and there is no copyright.  It follows the Oracle YAPP method of performance monitoring where we monitor our response times and try to improve them by eliminating unnecessary waiting on the server.  Today I am going to cover PerformanceCollector.BlockAndWaitMonitor which is a view that shows you any process that was waiting or blocking/blocked and the cause. This is snapshotted from various DMVs every 15 seconds. We maintain this data for 30 days.  Both the snapshot interval and data retention settings are configurable in PerformanceCollector.Config.  

The Typical Use Case

In general people will complain about application performance by saying, "The app was slow from 1:45 to 2:00 today.  Can you look at the database?" In that case start by querying this view and filtering where CurrTime BETWEEN '1:45' and '2:00'.

You are looking for:  

  • any blocking/blocked spids
  • any spids waiting and what they are waiting on.
  • sql statements/procedures with longer durations than normal (may be missing an index or stats need to be updated).  
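For the 1:45-to-2:00 complaint above, the starting point is a simple filter on the view (CurrTime comes straight from the description; the concrete date values are just examples):

```sql
-- what was waiting or blocking during the window users complained about?
SELECT *
FROM PerformanceCollector.BlockAndWaitMonitor
WHERE CurrTime BETWEEN '2014-06-02 13:45' AND '2014-06-02 14:00'
ORDER BY CurrTime;
```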

There are infinitely more things to look for that I can't possibly cover in a short blog post.  Experience and google are your friends here.  You'll likely find the cause of the problem if you look at the data and think logically.  


It's difficult to baseline wait events and whether they are trending in the wrong direction.  However, you can do basic trending by looking for "canaries in the coal mine".  Every system I've ever worked on has had patterns of performance behavior that, when observed, were good indicators that your users were unhappy with performance.  An example is lock escalations.  When a given query is running in READ COMMITTED isolation and scans an index it runs the risk of locking the entire data structure because the fine-grained row locks are escalated to table locks to conserve resources.  When this happens lots of spids tend to be blocked behind the lock escalating spid.  When the lock escalating spid runs for more than a few seconds users will think your database is down.  So, this is a use case I baseline.  It is fuzzy logic but we know to react to these event chains quickly before the calls start coming in.  

(sample output from PerformanceCollector.BlockAndWaitMonitor)

Manually Capturing a Spid

You can manually run EXEC PerformanceCollector.WaitAndBlockInspector at any time and it will log current waiting and blocking data. This may be useful if you notice your query is taking longer than it should. You can also pass a @spid parameter like this EXEC PerformanceCollector.WaitAndBlockInspector @spid = 100; and it will show you any waiting and blocking as well as any activity for the spid. This data is output to the screen and logged in the table.

Abandoned Spid Handling

BlockAndWaitMonitor will also show "abandoned spids." An abandoned spid is any spid with an open transaction for at least 60 seconds that is blocking something else. Here is the most important aspect...the spid must be sitting on AWAITING COMMAND. That is what differentiates an abandoned spid from an ordinary, run-of-the-mill blocking spid. When the spid is sitting on AWAITING COMMAND that means the SQL Server is waiting for another command from the spid to do something...most likely issue a COMMIT TRAN.

I have never seen a case where the spid will come back and issue the COMMIT.  Hopefully your developers are smart enough that they don't allow user input while a database transaction is open.  In every case an abandoned spid occurs because a data access framework thinks it is NOT in implicit_transaction mode and SQL Server thinks it is. For instance, various combinations of jdbc and websphere produce this behavior because jdbc does not issue sp_reset_connection commands like .NET does.

An abandoned spid is first logged after 60 seconds. After 5 minutes of being in an abandoned state, and after another spid has been blocked by it for 60 seconds, the spid is killed. It is logged to PerformanceCollector.SpidsKilled.
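If you want to see what the abandoned-spid check is looking for, a rough equivalent against the modern DMVs looks like this (a sketch, not the actual PC code; the 60-second threshold is per the description above):

```sql
-- sessions sitting on AWAITING COMMAND (sleeping) with a transaction
-- open for 60+ seconds while blocking at least one other session
SELECT s.session_id, s.login_name, s.host_name, s.program_name,
       t.transaction_begin_time
FROM sys.dm_exec_sessions s
JOIN sys.dm_tran_session_transactions st ON st.session_id = s.session_id
JOIN sys.dm_tran_active_transactions t   ON t.transaction_id = st.transaction_id
WHERE s.status = 'sleeping'   -- "AWAITING COMMAND" in sysprocesses terms
  AND t.transaction_begin_time <= DATEADD(SECOND, -60, GETDATE())
  AND EXISTS (SELECT 1 FROM sys.dm_exec_requests r
              WHERE r.blocking_session_id = s.session_id);
```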

You can of course disable this feature in PerformanceCollector.Config.  


BlockAndWaitMonitor is the most important aspect of the PerformanceCollector tool.  It will show you how the SQL Server subsystems (IO, Memory, lock manager, log manager, etc) are contributing to "waiting" on your SQL Server.  You will see more than just the SQL command that is blocking...you'll see an approximation of why it is blocking.  This is important.  Just because a procedure is long-running does not mean a new index will fix it, nor will a rewrite either.  You have to understand why it is behaving like it is and which subsystem is contributing.  Only then will you know what needs to be fixed.  

In the next post I'll cover some of the other features of Performance Collector.  

You have just read "PerformanceCollector.BlockAndWaitMonitor" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

PerformanceCollector Part 2: How It Works

This is Part 2 of my blog series on PerformanceCollector.  This utility will help you monitor and baseline the performance of your SQL Server.  You can download the source code here. It is totally open source and there is no copyright.  It follows the Oracle YAPP method of performance monitoring where we monitor our response times and try to improve them by eliminating unnecessary waiting on the server.  


PerformanceCollector looks at various DMVs and at given intervals takes a snapshot of the most interesting DMV data that you can then query later.  It is implemented as two SQL Agent jobs.  

  1. PerformanceCollector BlockAndWaitMonitor:  runs in a "WHILE 1=1" loop and will restart every 5 mins if it is stopped. Within the loop, every 15 seconds (configurable in PerformanceCollector.Config) it collects blocking and waiting data into PerformanceCollector.BlockAndWaitMonitor.
  2. PerformanceCollector.RunAddIns: runs every 5 mins to determine which AddIns are scheduled to run; this is discussed more in a future blog post.  The AddIns capture data at less frequent intervals.  You can write your own AddIns as needed.  
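The polling loop in job 1 above is conceptually just this (a simplified sketch; the real proc reads its interval and retention settings from PerformanceCollector.Config):

```sql
-- BlockAndWaitMonitor polling loop, simplified
WHILE 1 = 1
BEGIN
    -- snapshot the waiting/blocking DMV data
    EXEC PerformanceCollector.WaitAndBlockInspector;

    -- default 15-second collection interval
    WAITFOR DELAY '00:00:15';
END;
```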


Initial deployment and upgrades are handled by the same Powershell script: Deploy-PerformanceCollector.ps1. There is one parameter: $servername. It assumes you are running as an NT acct with sysadmin permissions.  Simply download the code and execute it like the screenshot.  

It will install a new database on your instance called PerformanceCollector that contains all of the code.  It will also create the new SQL Agent jobs noted above.  The installer only takes a few seconds to create or upgrade the utility.  The installer uses a variant of my MD3 (Metadata Driven Database Deployments) utility to install or upgrade a database via scripts.  


In the next blog post in this series I'll cover the BlockAndWaitMonitor which is the core component of the tool and the go-to object when I am presented with a performance problem on a production system.  

You have just read "PerformanceCollector Part2: How It Works" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

PerformanceCollector: a performance monitoring tool for SQL Server

About 15 years ago I wrote a suite of performance monitoring scripts for SQL Server 7 that would try to tell you why a given query was running slowly.  Back in those days there was no concept of DMVs to help you out.  Most people I knew relied solely on sqltrace (Profiler).  The problem is, Profiler will tell you how long something will run, but it won't tell you what subsystem of SQL Server is causing those long runtimes.  Is it a memory issue?  Slow IO?  Blocking?  The only information available was sysprocesses.  And it contained excellent information.  

I came from an Oracle background and in those days (Oracle8) there was no equivalent to Profiler.  If you wanted to look at your performance metrics you needed to query the sys and system schemas and the v$ views.  These are functionally equivalent to DMVs.  In fact, the folks at Oracle published a bunch of whitepapers on using these DMV-like tools and called it YAPP (Yet Another Performance Profiling Methodology).  YAPP was quite simply tuning based on the subsystems that caused long response times.  

I never realized how simple and elegant this approach to performance management was until I started working with SQL Server 6.5.  Other than sysprocesses, there was nothing else to tell you where the bottleneck was.  sqltrace was great for baselining but given a big query that had sporadic performance problems, where do you look next?  I actually started to do some independent learning about Performance Management that I began to document.  

But how could I do YAPP in SQL Server, which had no publicly exposed wait interface?  Well, it turns out it did.  sysprocesses would show you the waittype, waittime, and waitresource for any spid that was waiting on some subsystem.  
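For the curious, pre-DMV wait analysis boiled down to a query like this (sysprocesses survives today as a backward-compatibility view, so you can still run it):

```sql
-- pre-DMV wait analysis: who is waiting, on what, and for how long
SELECT spid, blocked, waittype, waittime, lastwaittype, waitresource
FROM master.dbo.sysprocesses
WHERE waittime > 0
   OR blocked <> 0;
```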

I actually started this website in 1997 partially to document my findings with sysprocesses and ways to do YAPP-like analysis.  Subsystem Information Home, which now has lots of information on DMVs and response time tuning techniques, was initially a series of static pages on interesting, undocumented things I found.  Remember, there was no google yet.  

I began to see more performance engineers and DBAs relying on sysprocesses as their tool of choice for performance troubleshooting.  Slowly DBAs started complaining to M$ to expose even more information in sysprocesses and give us something more YAPP-like.  With the release of SQL Server 2005 we were given our first DMVs.  I was thrilled.  But there were so many DMVs and they were still not entirely documented.  I created a new section of my website, DMV Info where I started to document what these DMVs did and how they helped in response time and wait analysis.  

I certainly wasn't the only DBA doing this, nor even the best.  In the last 10 years a complete shift in how SQL Server DBAs do performance engineering has occurred.  Nowadays I don't see anyone using Profiler and everyone uses the DMVs.  How the times have changed.  

Anyway, my first attempt at a performance monitoring and baselining solution was a SQL Executive (now known as SQL Agent) job that merely took a snapshot of sysprocesses and squirrelled it away so I could look at it later when someone complained about poor performance.  It was invaluable.  I rewrote it with the advent of DMVs.  Slowly over time I was monitoring even more "stuff" as every new release of SQL Server gave us new and exciting DMVs.  

I found this so useful I was installing it at every client engagement.  I decided to release a version of it a few years ago and posted it on my blog under the name Performance Engineer-in-a-Box.  I've since installed it at even more client sites and I finally got around to properly releasing this in a manner where I can do controlled releases as needed.  I now call the tool "PerformanceCollector".  The source code is available at my GitLab installation and is open to the public.  Feel free to modify it, fork it, steal it, change it, or whatever you would like.  I really don't care and there is no form of copyright on it.  I would appreciate an attribution if you do use it, but no payment is necessary.  

Over the next few blog posts I'll cover in details what PerformanceCollector does and how it works.  

You have just read "PerformanceCollector: a performance monitoring tool for SQL Server" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

Buffer Pool Extensions in SQL 2014

I've been very busy lately but one of my tasks is to evaluate migrating to SQL 2014.  After poking around some of the new features I stumbled across Buffer Pool Extensions.  If your db server is memory-constrained or you have an IO-intensive workload, and you can afford a cheap SSD drive, you really should research this feature.  Essentially, you can extend your buffer pool onto nonvolatile storage without buying pricey RAM.  Technically you could use rotational media and not even use an SSD, but I can't imagine you would get much of a performance boost.  

How Does It Work?

When your buffer pool is too small for your working set, your disks are constantly servicing small random reads, plus bursts of small random writes whenever a CHECKPOINT is issued.  Each of these small IO operations is competing for your rotational media's attention.  Latency goes up, throughput goes down.  It's not uncommon on large systems to see CHECKPOINTs take a very long time and the Page Life Expectancies (PLEs) drop radically.  

One of the beauties of SSDs is that random and sequential IO have nearly identical performance characteristics, just like volatile RAM.  Buffer Pool Extensions (we'll call it BPE for the rest of this post) essentially becomes a middle man between the RAM and the rotational media.  It would be nice if the small, random writes of the dirtied buffer pool pages could go to the SSD first, and from there be written to our slower media without impacting the buffer pool.  Unfortunately that wouldn't be a very safe thing to do.  Instead, pages that are getting ready to be aged out of the buffer cache (we'll call them warm pages) are moved to the BPE area instead, and only if the pages are "clean".  

Mechanics of Setting It Up

  • This can be enabled, disabled, and altered on-the-fly without a service interruption, though obviously you will see performance implications when you do these things.  
  • You can query the current state of buffer pool extensions by using the DMV sys.dm_os_buffer_pool_extension_configuration.  
  • Ensure your SSD is formatted with a drive letter.  
  • MS's recommendation is for a 16:1 ratio of BPE size to MAX SERVER MEMORY.  
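Putting the bullets above together, a minimal setup looks like this (the file path and size here are just examples; pick values appropriate for your SSD and MAX SERVER MEMORY):

```sql
-- enable a 32 GB extension file on the SSD volume
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'S:\SSDCACHE\ExtensionFile.BPE', SIZE = 32 GB);

-- check the current state of the extension
SELECT path, state_description, current_size_in_kb
FROM sys.dm_os_buffer_pool_extension_configuration;

-- turn it off again if needed
ALTER SERVER CONFIGURATION SET BUFFER POOL EXTENSION OFF;
```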

Performance Implications

I work on an application where everyone complains about the PLEs dropping under 30 seconds when we do certain operations.  I've written about this in the past: IMHO the PLE is a symptom to be monitored, but it doesn't help you determine the root cause of any performance problem I've ever seen.  At best I correlate block and wait statistics during periods of "bad PLEs" to determine exactly what activity was occurring and then determine how to fix it to be less IO intensive.  

I don't have a production load available on my SQL 2014 test box, and I'm not exactly sure how to test BPEs accurately.  Having said that, in my test env it does appear as though PLEs are less "volatile" and more stable.  Elapsed checkpoint time is also better.  It will be interesting to see real perf numbers from some of the SQL experts on the blogosphere.  

This is an awesome feature.  

Yet another reason not to use TSQL scalar user-defined functions

...as if you really needed another one.  

I was poking around in SQL Server 2012 and I noticed a new feature of the XML SHOWPLAN.  You can now determine why a given query plan will be serial instead of parallel.  

I, of course, knew this already, but this will be another arrow in my quiver of proof for why, in many cases, TSQL scalar UDFs are evil.  

I'm actually more interested in finding out what the allowable values are for NonParallelPlanReason.  I haven't seen this documented yet, and I'm not really sure I care why a plan would/would not go parallel, but this is interesting.  
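If you want to see this for yourself, here is a quick repro (the UDF is made up for the demo, and the exact NonParallelPlanReason value varies by version):

```sql
-- any reference to a TSQL scalar UDF forces a serial plan
CREATE FUNCTION dbo.ufn_Double (@i int)
RETURNS int
AS BEGIN RETURN @i * 2; END;
GO

-- request the estimated XML plan instead of executing the query
SET SHOWPLAN_XML ON;
GO
SELECT dbo.ufn_Double(o.object_id) FROM sys.objects o;
GO
SET SHOWPLAN_XML OFF;
GO
-- in the resulting plan XML, the QueryPlan element carries something like
-- NonParallelPlanReason="CouldNotGenerateValidParallelPlan"
```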

BTW, I've written NUMEROUS times about the performance problems with TSQL scalar UDFs:

