
More on Log Backups

After covering tail-of-the-log backups, which most people forget about, I thought I'd cover the other log backup options, what they do, and how they differ.  These have changed in recent releases, so what you thought you knew may no longer apply.  

 

What is the difference between COPY_ONLY and NO_TRUNCATE?

NO_TRUNCATE is COPY_ONLY plus CONTINUE_AFTER_ERROR.  Like NO_TRUNCATE, COPY_ONLY does not mark the appropriate log records as inactive.  COPY_ONLY exists so that an ad hoc log backup does not break the log chain your DBA has established.  Say, for instance, your DBA backs up your logs every 10 minutes using a 3rd party tool, or maybe has a complicated process that copies the backups to a log-shipped warm spare, or writes them directly to a NAS device.  Let's also say you need to run a risky ad hoc UPDATE and want a log backup immediately beforehand.  Taking that backup WITH COPY_ONLY gives you your restore point without disturbing the established chain.
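
A minimal sketch of the two options side by side (the database name and backup paths here are hypothetical):

-- An ad hoc log backup that leaves the DBA's log chain untouched:
BACKUP LOG SalesDB
    TO DISK = N'D:\AdHocBackups\SalesDB_log_copyonly.trn'
    WITH COPY_ONLY, INIT;

-- NO_TRUNCATE behaves like COPY_ONLY plus CONTINUE_AFTER_ERROR, so it can
-- read the log even when the data files are damaged or missing:
BACKUP LOG SalesDB
    TO DISK = N'D:\AdHocBackups\SalesDB_log_tail.trn'
    WITH NO_TRUNCATE, INIT;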

OK, then what's TRUNCATE_ONLY/NO_LOG?

These options were deprecated in SQL Server 2005 and removed entirely in SQL Server 2008, so current releases won't accept them at all.  

But the theory was...log records that are inactive (i.e., the data was committed to the data files) are normally backed up via BACKUP LOG and then truncated.  TRUNCATE_ONLY (or NO_LOG) skips the BACKUP portion and simply truncates the records.  This should only be done when

  1. your disk with your ldf file is entirely and completely full.  When this happens you can't even add a new ldf file to another drive because the process of adding an ldf file actually causes log records to be written to the original ldf.  
  2. your disk with your PRIMARY datafile is entirely and completely full.  In this case the log records to indicate that a new mdf/ndf was added to the db can be written, but the metadata cannot be written properly to the PRIMARY data file.  
  3. you can't find any disk space anywhere to do a proper log backup and you are willing to sacrifice your log recoverability because your ldf is growing and you need to truncate the old log records.

Obviously these scenarios shouldn't happen often.  I've actually seen this happen at a government agency once.  Oy vey.  Whenever you issue a TRUNCATE_ONLY your log chain is broken and you should try to take a full db backup ASAP.  
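
For reference, the old syntax looked like this (hypothetical database name; pre-2008 only, later releases reject it outright):

-- Nothing is written anywhere; the inactive log records are simply discarded
-- and the log chain is broken.
BACKUP LOG SalesDB WITH TRUNCATE_ONLY;   -- NO_LOG was a synonym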

Since these options are discontinued, the appropriate way to handle the situations above is to simply switch your db to the SIMPLE recovery model.  I'm not sure whether that switch requires metadata writes to the mdf or ldf, so I can't say whether my scenarios above are still handled correctly.  I have no time to test, since those scenarios are frankly ridiculous.  
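
If you do find yourself in one of those ugly corners today, a hedged sketch of the SIMPLE-mode approach looks like this (database name and path are hypothetical):

-- Flip to SIMPLE so the next checkpoint truncates the inactive portion of the log,
-- then flip back and restart the log chain with a full backup.
ALTER DATABASE SalesDB SET RECOVERY SIMPLE;
CHECKPOINT;
ALTER DATABASE SalesDB SET RECOVERY FULL;

BACKUP DATABASE SalesDB
    TO DISK = N'D:\Backups\SalesDB_full_restart_chain.bak'
    WITH INIT;   -- point-in-time recovery is impossible until this completes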

Backup And Recovery Page

The Tail of the Log...or...the Various Log Backup Options

Quick...your SQL Server database just went suspect, what do you do first?

Hopefully your answer was not "start restoring from last night's full backup".  What we often forget to do is back up the "tail of the log."  It's a simple concept...the tail of the log is the portion of the log that has not yet been backed up.  If you really want your restore to regenerate your data up to the point in time of the disaster, then you need the tail of the log.  
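
A minimal sketch of what that first step looks like (database name and paths are hypothetical; NO_TRUNCATE lets the backup proceed even when the data files are damaged, while WITH NORECOVERY is the option you'd use against a database that is still online):

-- 1. Grab the tail of the log before touching anything else.
BACKUP LOG SalesDB
    TO DISK = N'D:\Backups\SalesDB_tail.trn'
    WITH NO_TRUNCATE;

-- 2. Restore the full backup and the regular log backups WITH NORECOVERY,
--    then finish with the tail to reach the point of the disaster.
RESTORE DATABASE SalesDB FROM DISK = N'D:\Backups\SalesDB_full.bak' WITH NORECOVERY;
RESTORE LOG SalesDB FROM DISK = N'D:\Backups\SalesDB_log_0100.trn' WITH NORECOVERY;
RESTORE LOG SalesDB FROM DISK = N'D:\Backups\SalesDB_tail.trn' WITH RECOVERY;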

Don't Ban Those SQL Server Cursors!

There's always a push, it seems, to ban TSQL developers from using cursors.  I'm a contrarian: if everyone wants to ban something, then I'm probably for it.  By "cursors" I mean the big bad code blocks that let you iterate, row-at-a-time-style, over a result set, taking some action on each row.  Do a Google search for yourself and you will find many cases where a TSQL cursor performs better than a set-based solution.  Banning cursor usage is asking for trouble.  

First, by banning cursors do you assume your developers will instantly think relationally instead of procedurally?  I doubt it.  What you'll get is a bunch of WHILE loops instead of cursors, which will likely perform even worse.  

Second, it's not uncommon for a shop to have far more procedural coders than TSQL coders, and the procedural folks will be more comfortable with cursors.  That's great: let them code a cursor to get the logic nailed down, then I come back and pair-program with them to refactor it properly.  This becomes a learning exercise, and procedural people love it.  This is the whole "give a man a fish and he eats for a day...teach a man to fish and he eats forever" argument.  With good test data I can even refactor the cursor myself if needed.  Granted, it's always better to code it correctly at the start than to refactor later, but we aren't living in a utopia.  And, frankly, management loves to hear that you fixed their 15-minute report so it now runs subsecond.  

Here I have lots of methods to help you tune and even rewrite cursors so they are a little more performant.  
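
As a small taste, here is a hypothetical row-by-row job declared with the cheapest cursor options (LOCAL FAST_FORWARD gives you a forward-only, read-only cursor with most of the default overhead stripped away; the table and proc names are made up):

DECLARE @OrderID int;

DECLARE order_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT OrderID
    FROM dbo.Orders
    WHERE Status = 'PENDING';

OPEN order_cur;
FETCH NEXT FROM order_cur INTO @OrderID;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.ProcessOrder @OrderID = @OrderID;   -- hypothetical per-row work
    FETCH NEXT FROM order_cur INTO @OrderID;
END;

CLOSE order_cur;
DEALLOCATE order_cur;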

Doing Database CIT via TFS

Continuous Integration Testing (CIT) is an agile concept where you constantly test your code checkins to ensure they build and pass their unit tests.  In smaller shops, notification of failures is usually just an email to an SCM, who then tracks down what broke using the version control system and notifies that person.  In larger shops a dedicated CIT machine will sit in the scrum room with a big red or green light indicating whether the builds are passing CIT.  When it goes red, someone usually steps up to determine the cause. 

In most shops this CIT concept is a no-brainer for the application code (tools like Apache Ant and CruiseControl really help with the automation and scheduling), but it is rarely done for the database scripts.  One reason is that most applications have a database build process that doesn't lend itself to CIT-style processing easily (for instance, requiring db-level object diffs to determine what needs to be applied).  My database build process is more of an evolutionary process, so by definition it lends itself to CIT and change quite nicely.  

This post is the next in a series of posts showing useful TFS tips for the data professional.  I'll cover how I created a very simple script that handles database CIT using TFS as the source control repository.  You could also plug in a different VCS quite easily.  It uses VBScript and batch files, but PowerShell or a compiled application would work just as well; the concept is the important thing.   

I have the actual files available here for download.  Here is the process:

A controlling batch file (called RunCIT.bat) is scheduled using AT.exe or a similar scheduler; this handles the automation component.  Let's look at some of the interesting points of the batch file...

Lines 1 - 14...the batch file has a series of parameters that tell it which server to run the db scripts against, user/pwd, location of the logs, and the TFS connection string.  

It then builds a loop (line 19) and checks "branches.txt" (a CSV file) for a list of branches that we want to run CIT against.  In our environment we have at least 5 active code/release branches at any time, so I want to run CIT against all of them.  Our SCM can change this file whenever a new branch is created or deprecated.

As a side note, if you've ever tried to do looping in a batch file, it's a real PITA.  If you've ever tried to expand your DOS variables within that loop you know it's almost impossible.  I found a nifty workaround to make this happen using "SETLOCAL ENABLEDELAYEDEXPANSION" and alternative variable naming schemes.  See Lines 3 and 27 for examples. 

Within the loop we then run a series of tf.exe commands to Get Latest for the source code for that branch.  This means that your build machine needs to have Visual Studio loaded on it (unless someone knows how to get the tf.exe commands to run without it).  

The scripts for the given branch are then executed, first against a "net new" database and then against a "ported" (upgraded) database; each is set up via manipulate_db.v-b-s.  I've found that naming a VBScript file as anything other than .vbs and then calling it with cscript/wscript is an easy way to bypass UAC on newer Windows OSs.

Why run CIT against a "new" and "upgraded" database?

Unlike C# code, where I simply need to test that it compiles and the unit tests succeed, a db build is a little more tricky because I need to be concerned about data history and upgrading data.  If I'm changing a UNIQUE INDEX from ColA,ColB to just ColA, I can't just CREATE INDEX WITH DROP_EXISTING, since it's possible an existing db will have violations.  I need to test that.  If I'm adding ColC to a table, I need to remember to add it to the CREATE TABLE statement (for new databases) and write an appropriate ALTER for upgraded dbs.  In each case the "new" and the "upgraded" database take different codepaths through the sql scripts. 
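
A hedged sketch of those two upgrade-path concerns (table, column, and index names are hypothetical):

-- Upgrade path: add ColC only if an earlier version of the db doesn't already have it.
IF COL_LENGTH('dbo.MyTable', 'ColC') IS NULL
    ALTER TABLE dbo.MyTable ADD ColC int NULL;

-- Upgrade path: prove the narrowed unique key is safe before rebuilding the index.
IF EXISTS (SELECT ColA FROM dbo.MyTable GROUP BY ColA HAVING COUNT(*) > 1)
    RAISERROR('Duplicate ColA values exist; cannot narrow UQ_MyTable to ColA only.', 16, 1);
ELSE
    CREATE UNIQUE INDEX UQ_MyTable          -- same name as the existing (ColA, ColB) index
        ON dbo.MyTable (ColA)
        WITH (DROP_EXISTING = ON);

The net-new codepath never runs any of this; it simply creates the table and index in their final shape.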

So your upgraded or "ported" db should probably be a customer db with some substantial data.  You can even run against multiple upgraded dbs, perhaps one from each version of your software that you may have to upgrade. 



But the key is Line 73, GenerateDiscrepancyReport.vbs.  This compares the net-new vs ported dbs and reports on missing (or schematically different) tables, views, and columns, index discrepancies (different cluster keys or columns), and it will even report any data changes if you have defined system or master data tables that you ship with your db.  This is handled via 2 stored procs (here and here). 
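
Those procs aren't reproduced here, but the core idea is roughly this kind of query, repeated per object type (the database names NetNewDB and PortedDB are hypothetical):

SELECT
    COALESCE(n.TABLE_NAME, p.TABLE_NAME)   AS TableName,
    COALESCE(n.COLUMN_NAME, p.COLUMN_NAME) AS ColumnName,
    CASE
        WHEN n.COLUMN_NAME IS NULL THEN 'Missing from net-new db'
        WHEN p.COLUMN_NAME IS NULL THEN 'Missing from ported db'
        ELSE 'Data type mismatch'
    END AS Discrepancy
FROM NetNewDB.INFORMATION_SCHEMA.COLUMNS AS n
FULL OUTER JOIN PortedDB.INFORMATION_SCHEMA.COLUMNS AS p
       ON  n.TABLE_SCHEMA = p.TABLE_SCHEMA
       AND n.TABLE_NAME   = p.TABLE_NAME
       AND n.COLUMN_NAME  = p.COLUMN_NAME
WHERE n.COLUMN_NAME IS NULL
   OR p.COLUMN_NAME IS NULL
   OR n.DATA_TYPE <> p.DATA_TYPE;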

Email is sent on success and failure with the logs attached. 

Happy Testing!!

Data Virtualization

For some reason "data virtualization" is one of the new hot buzzwords in my industry.  Data virtualization is really just combining a bunch of data sources (databases, unstructured data, DWs, weblogs, whatever) into a single virtual data tier that provides access to applications.  Haven't we been doing this for years?  I remember when linked servers came out for SQL Server.  I thought they were the coolest thing at the time.  I could now connect to Oracle from my SQL Server and manipulate the data in the more familiar TSQL, in real time, without the need for an ETL process.
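
The modern tooling is fancier, but the old pattern looked something like this (the linked server name ORA_FINANCE and the Oracle table are hypothetical):

-- Four-part name: convenient, though much of the work may happen on the SQL Server side.
SELECT INVOICE_ID, AMOUNT
FROM ORA_FINANCE..FINANCE.INVOICES
WHERE INVOICE_DATE >= '2013-01-01';

-- OPENQUERY ships the whole statement to Oracle and pulls back only the results.
SELECT *
FROM OPENQUERY(ORA_FINANCE,
    'SELECT invoice_id, amount FROM finance.invoices WHERE invoice_date >= DATE ''2013-01-01''');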

Some research on Solid State Drives

I started to do some research on SSDs and began to write a page on the subject.  
