CHANGE_TRACKING_CURRENT_VERSION() not always reliable with Log Shipping

There may be a "bug" with CHANGE_TRACKING_CURRENT_VERSION() when using Change Tracking and log shipping.  The returned value will not be accurate if ALLOW_SNAPSHOT_ISOLATION is set to OFF.  Change Tracking is one of a million ways to query for data changes and then do interesting things with those changes...such as ETL.  The best article I've ever read on Change Tracking (CT) is by the folks at Solidq.  Here are parts 1, 2, and 3.  

With the ETL use case the process is to take the CT value from the last ETL run and use that as the basis for the new changes you want to see using CHANGETABLE (CHANGES blah, @last_synchronization_version).  The process Microsoft outlines specifically says that, "...we recommend that you use snapshot isolation. This will help to ensure consistency of change information and avoid race conditions..."  

That's the problem...not EVERY db in the world can easily be converted to snapshot isolation without a lot of code rewrite and regression testing.  

It's also common for your ETL to run on a "copy" of your OLTP prod system.  For instance, I've seen lots of customers run their ETL processes on readable secondaries, transactional replication subscribers, or log shipping destinations.  In these situations you want to be able to run your CT queries against your "replicas" to reduce load on your primary server.  Ah, but that may not work if you are not using ALLOW_SNAPSHOT_ISOLATION ON.  

I don't have the time to fully repro this for a blog post or a Microsoft Connect bug, but here are some screenshots of the issue and a possible solution.  

First, it's important to understand CHANGE_TRACKING_CURRENT_VERSION().  By definition it should always give you the same answer as SELECT MAX(commit_ts) from sys.dm_tran_commit_table, regardless of isolation levels.  Here is a screenshot of my prod system that has many, active databases ... some with snapshot isolation ON, some with it OFF.  Note that the values are the same, regardless of whether snapshot isolation is ON or OFF.  There are cases where the values will be off, slightly, because transactions may be committing between the when the first and second queries run.  Note that the MAX() query can take a bit of time to run on a busy system, but I've never seen it cause blocking.  

So on a "writeable" db, regardless of isolation level, CHANGE_TRACKING_CURRENT_VERSION() works great.  Here is the result from running the same queries on my log shipped replica.  We use this replica as the basis for ETL daily loads.  We only restore log backups around 3 AM daily, so during the day we can assume that the replica is out of sync because we have the log restores turned OFF.  

Note the wild discrepancy between CTCV() and the MAX query.  Knowing what you know about my setup, clearly the MAX query is right but CTCV() is not in cases where ALLOW_SNAPSHOT_ISOLATION is OFF.  

Why?  

When I first saw this symptom I had no idea what could possibly cause this.  We have about 50 dbs on these servers and each is configured slightly differently regarding snapshot isolation settings, ansi settings, Service Broker enabled, etc.  After an hour or so I realized the pattern was that the erroneous data only happened on our older "legacy" dbs where we can't/didn't ALLOW snapshot isolation.  

I doublechecked the MS documentation for Change_Tracking_Current_Version() to see if I missed anything in the documentation that would account for this anamoly.  Otherwise this would be a bug.  I found nothing. 

But I was interested in why this would happen.  We all know that we can't view the source code for system functions to see if we can spot the bug, but perhaps instead we could see what sys.dm_tran_commit_table is "looking at" under the covers.  Sometimes you can do sp_helptext on DMVs...and this is one of them.  Here's what you'll see:  

--sp_helptext 'sys.dm_tran_commit_table'

       SELECT commit_ts, xdes_id, commit_lbn, commit_csn, commit_time

                     FROM sys.syscommittab (READCOMMITTEDLOCK)

       UNION

       SELECT commit_ts, xdes_id, commit_lbn, commit_csn, commit_time

                     FROM OpenRowset (table SYSCOMMITTABLE, db_id(), 0, 0)

That's an interesting little query, isn't it?  Frankly, I found myself doing all kinds of googling on what OpenRowset (table) does under the covers.  I definitely got a little shiny object syndrome looking at that code.  But that's another blog post.  Trust me, you can't run that query from ssms directly unfortunately.  

My thinking is that it is POSSIBLE that CTCV() is doing something like this under the covers and is getting confused and hence is returning the wrong information.  Again, I have no way of knowing without reviewing Microsoft source code...and that ain't happenin'.  

My suggestion is that if you are:

  • using Change Tracking
  • with Log Shipping
  • and are not using at least ALLOW_SNAPSHOT_ISOLATION

...then you should not rely on Change_Tracking_Current_Version on your log shipped copy to accurately determine the last committed row change that exists on your subscriber.  MAX(commit_ts) from sys.dm_tran_commit_table

I hope this helps someone in the future.