Submitted by Dave Wentzel on Sun, 2013-12-01 21:59
In distributed data architectures like some NoSQL solutions, a mechanism is needed to detect and resolve conflicting updates on different nodes. There are many different ways to do this. You don't need to understand the nitty gritty details of how this is done, but understanding the basic concepts is fundamental to understanding the strengths and limitations of your distributed data system. In this post I'll cover some of the methods.
Distributed Hash Tables
BitTorrent's distributed tracker uses this technology. In fact, some key-value systems like Cassandra and memcached (I'll cover both in a later post) are just giant DHTs. Essentially a data element is passed to a common hashing function to generate a key, which is then passed around to other nodes.
Very simply, technologies that use QP must "commit" on n number of nodes for the transaction to be considered successful. The "commit" can be handled in one of two ways:
Like a traditional two-phase commit (2PC). In this case the write commit is delayed until a quorum of n acknowledge the change. Obviously this can introduce a high amount of transaction latency.
The quorum is obtained on the read side. When the data is read the quorum is obtained then. Again, latency is introduced in this model.
This method allows nodes to become aware of other node crashes or new nodes joining the distributed system. Changes to data are propogated to a set of known neighbors, who in turn propogate to a different set of neighbors. After a certain period of time the data view becomes consistent. The problem is that the more nodes the system contains, the longer it will take for updates to propogate, which in turn means the "eventual consistency" takes longer and therefore the possibility of conflicts occurring increases.
A vector clock is probably the simplest way to handle conflict resolution in a distributed system. A vector clock is a token that distributed systems pass around to keep the order of conflicting updates intact. You could just timestamp the updates and let the last update win...if your requirements are that simple. But if the servers are geographically disparate it may be impossible to keep the clocks synchronous. Even using something like NTP (Network Time Protocol) on a LAN may not keep the clocks synchronized enough. Vector clocks tag the data event so that conflicts can be handled logically by the application developer. This is basically how Git works under-the-covers.
There are many ways to implement vector clocks but the simplest way is for the client to stamp its data event with a "tag" that contains what the client knows about all of the other clients in the distributed system at that point in time. A typical tag may look like this:
Assuming this was client1, the above tag indicates client1 knows that client2 is on its "Revision 100", client3 is on its "Revision 78", client4 is on its "Revision 90" and it is on "Revision 50". "Revision" in this sense is a monotonically increasing identifier specific to that node. You could use a GUID, but that is rarely done.
Problems with Vector Clocks
A vector clock tag grows very large as more clients participate in the system, and as the system generates more messages. Some vector clock-based systems like Riak have a pruning algorithm to keep the vector clock tags more compact. Vector clocks are also used by Amazon's Dynamo architecture.
Submitted by Dave Wentzel on Mon, 2013-11-18 14:52
This is a follow-on to Performant ETL and SSIS Patterns. I really need to do a longer, explanatory post on this. Two of the largest performance problems I see with SSIS packages is their lack of parallelism and the fact that they are written to run large batches during a defined time window. The former is totally unnecessary, the latter is unecessary if you structure your processing smartly.
Submitted by Dave Wentzel on Tue, 2013-09-17 11:22
"The Wooden Badger" ... this is the story of how I got that nickname. But first, let me say that I really don't "get" Monty Python. I rarely laugh at their sketches. But other people do. Especially programmers for some reason. The same thing happens with This is Spinal Tap. Programmers love it.
Here's an experiment:
Ask a random person if he enjoyed This is Spinal Tap
If the response is any of the following:
"But these go to Eleven" followed by excessive belly laughter
"It's like, how much more black could this be? And the answer is none. None ... more black."
"Remember when Harry Shearer got stuck in his pod".
Then you've got yourself a programmer.
I don't frankly find Spinal Tap all that funny, but programmers do, so best to speak on their terms. Basically, when in Rome speak Roman.
But I digress.
There is one scene from Monty Python and the Holy Grail that is hilarious. King Arthur and his Kinghts of the Roundtable lay siege on a French castle. They build a Trojan Horse, except their version is a Trojan rabbit. At the end of the scene we see the English watching at a distance while the French wheel the wooden rabbit into the castle. King Arthur asks how this plan is supposed to work. "Now we wait until nightfall and we all jump out of the rabbit, taking the French by surprise." They all simultaneously realize the flaw in their plan and slap their foreheads. Gallahad responds, "Um, well look, if we built this large, wooden badger." Simultaneous groans could be heard.
So, why do programmers in particular find this so funny? My theory is that we feel we can do the same thing over and over again expecting different results. In my experience, that doesn't happen much. For instance, we all know that not having proper declarative referential integrity is asking for trouble. Yet every "new development" project I work on I invariably hear someone say, "we don't need foreign keys, we'll actually be able to ensure integrity at the Java tier this time." And somehow the data folks always tend to lose the argument and the initial version of the system is delivered without keys.
There is something even worse than "repeated insanity" and that is the Second System Syndrome. In this case, we take an elegant, simple system, and we refactor it and add needless complexity and call it Version 2. This is much worse because you HAD a working system, now you don't. Imagine what King Arthur's wooden badger would've looked like. More comfortable seating, gilded ornamentation...basically a bunch of useless stuff.
So, how did I get the "Wooden Badger" nickname?
It all started with 2 sprints left in the development cycle. I was asked to do a "whiteboard" design review for an ordered queueing system. Generally a whiteboard review session occurs in the first few sprints because if the design is faulty it's best to know that before too much code is written. There is only one reason to hold a design review this late in the cycle...someone is married to their design approach and knows it is not right and that I will refute it which will cause lots of rework. My arguments, architecturally sound as they always are, can be rebutted with a simple sentence, "Dave, you know, you may be right, but we have to get this release out the door, so we can't go back to the drawing board now." And this argument only works if it is used at the last minute.
So I went to whiteboard session and learned about the requirements first:
There are multiple queues, some ordered, some not, some ordered by different, varying keys.
We must support tens of thousands of messages per minute.
This doesn't sound like anything I haven't seen before. Let's look at the design:
The queues will be stored in the database as a single master queue table (not a Service Broker queue).
There will be 3 outrigger tables to handle supporting details. No foreign keys needed. And since we support multiple, varying keys, let's store them as XML and key-value pairs.
Ordering logic, when needed, will be performed during enqueueing. And that will require shredding the XML for existing data to determine the keys.
Since ordering logic is so complex we need to single-thread ordered queue processing, so they used SQL Server's applock feature...which is a big mutex.
The queues will not "drain", they will keep 14 days of successful messages in the queue tables, and 30 days for any failure messages.
Support people will be able to query the queue tables real-time to look at throughput metrics or to look at individual message routing status.
If you didn't understand the queueing mumbo jumbo above, never fear. Rest assured that EVERY bullet point is a well-known, established anti-pattern. So I began my retort:
Why are we not using JMS? It is an anti-pattern to build queues in a database.
"It's too hard to support ordering across multiple JMSs without going single-threaded."
But an applock is a way to make sure a process in SQL Server is single-threaded.
"Well, we like this pattern better."
Why not use Service Broker?
"What is System Broker?"
Why not have separate queues for ordered and unordered processing?
"Well, it's too late to change the design."
Why not drain the queues and on dequeueing simply write the dequeued data to a History table?
"We can do that in Version 2."
Clearly the designers were happy and nothing was going to change. There was consensus that some people would look at my concerns and consider tweaking the design for Version 2.
Six months later our first customer experienced queueing problems. I again expressed my design concerns and the consensus was that we would begin working on them.
But of course break/fix work is not sexy like feature functionality, so the work was de-prioritized. Another 6 months elapsed until the next customer experienced problems. This time we hit the "knee of the curve" and the design simply could not scale any further. I again suggested we prioritize my simple design changes. I made the case that my few changes above yield radical improvements without a full-scale redesign. We could tweak the implementation and design to be more of a standard queueing system.
(Many developers relate to Star Trek too)
"It sounds like you want to build another one of your wooden badgers Dave."
Me: "I'm sorry, but I'm just trying to adapt a bad design a bit and make it follow best practices a little more. I think the final design won't be perfect, but we'll get the scalability without the cost of a full redesign."
"No, you signed off on the last design, now you just want to keep tweaking it hoping for different results every time. When will you learn that we can't keep designing Wooden Badgers, we have to do things using industry best practices. So, you're new nickname is Badger and we are going to redesign this the right way this time."
"Sounds good to me."
UPDATE: We've had a few additional large customer outages due to this design. Each time I've been asked to participate in giving band-aid fixes. And after every incident I always ask my collegues, "So, Badgers, when are you going to have that redesign done?"
Submitted by Dave Wentzel on Fri, 2013-02-15 10:47
Service Broker is great for asynchronous execution. There are common patterns to asynch execution in the data tier that are well-covered on the Internet, such as asynch trigger execution. Remus Rusanu's website is invaluable for all things Service Broker and I find myself visiting it FIRST, even before google, when I need an answer to a Service Broker question. Someone on stackoverflow posted a question, "How can I get SQL Server to call a stored proc every n seconds?" At my current client we have these things, and we have probably 30 of them, scheduled as SQL Agent jobs. This works very well, but it means that when you restore a database for development or testing that you have an affinity with msdb that is not restored with your database. It also leads to Blocking and Contention with sysjobhistory when you have a lot of these jobs. I just don't find a ton of SQL Agent jobs to be very elegant.
Remus answered the question by noting that Service Broker can do this using a Timer and an Activated Queue. He hasn't (as of today anyway) posted this answer or pattern on his website. I hope he will. I've done something like this in the past where we developed a software product that was deployed using SQL Express. At the time Express did NOT have SQL Agent so if you needed to have an asynchronous scheduling mechanism you needed to roll your own, probably using Windows built in AT.exe command (yuck). The solution I used was called I called the "tickler". It would "wake up" every x seconds, see if there was any work to do in the configuration table, and do the needful.
We only need a single Q and Service. Message types, contracts, routes, etc are not needed. The "DoSomething" procedure is basically whatever you want to execute every x seconds.
I created a "Start Tickling" procedure. This simply looks to see if I have an existing conversation (you only need one) for the Tickler. If not, a create a conversation and TIMER against that conversation. That sends a message to the TicklerQ.
The Tickler Q is an activated Q. This is the stored proc that runs when a message is enqueued. Here we simply RECEIVE the message and execute our DoSomething procedure. Then we send another CONVERSATION TIMER (basically this is an infinite loop).
After we have all of the scaffolding in place (the Q, Service, activator proc, etc...we need to turn on ACTIVATION for the Q. We are now ready to seed the first TIMER using the StartTickling proc.
Again, I think this is a much more elegant solution then a SQL Agent job. Here I can run my procedure without having affinity to a job that will not get restored when I restore my db to another server. Tickling also picks up whenever the db is restored, comes back online after a restart, etc. A very elegant solution.
There are almost no GUI tools for managing Service Broker. This isn't a problem for me, I rarely use GUI tools, but it is a problem for our support people. In the previous post I covered my Service Broker Setup Routine. This routine enabled SB on a given database in a reliable, repeatable manner. This helped our DBAs immensely.
The next problem to overcome is how our DBAs can monitor Service Broker and its Queues, Services, etc. In my experience SB is very "lights out", just like anything else in SQL Server, if done correctly. But it's difficult to sell this to DBAs who really do not understand what Service Broker is. So I created a monitoring script to put their minds at ease. This monitoring routine is generic enough that it can be easily modified to suit any setup or configuration. There are lots of SB monitoring scripts available on the web. I've tried to compile them into one simple script that you can run that checks your configuration from the top down. You can download the script here.
Let me point out some interesting pieces of code:
My monitoring procedure is actually a combination of an installation, monitor, and teardown script. I do this so all logic is contained in one little place. Here are the options:
SETUP: sets everything up. It does a "properties-based" setup. This means that SETUP can be rerun multiple times without throwing an error and will ensure that the net result of each run is that we have a running system.
TEARDOWN: the opposite of SETUP. This is useful for unit testing the SETUP process.
TEARDOWN FORCE OVERRIDE: TEARDOWN will *not* do anything if there are unprocessed items in any queue, items stuck in the transmission q, etc. This override option tears everything down regardless of state. YOU WILL LOSE DATA using this option.
CHECK: this is the simplest monitoring tool, it ensures all Qs, metadata, and DMVs are showing everything as being good. I'll cover this in-depth below.
BROKER HELP: This checks low level components of Service Broker. I'll cover this in-depth below.
TRACER TOKEN: this works just like a tracer token in replication. It sends a test transaction to our Svc and makes sure it properly reaches its destination.
STALLED MESSAGE CLEANOUT: I'll cover this below.
This is part of BROKER HELP. I am echoing the status of various SB components by looking at DMVs. There is far more in the script.
Here we are looking at PerfMon statistics for various SB-related counters.
Activator procedure errors are echo'd to the SQL Error Log, so we display the last hour of that data. We also build a SSBDIAGNOSE command for each service and then interrogate it for errors.
STALLED MESSAGE CLEANOUT runs this code. We are looking for conversations that are not closed and END them WITH CLEANUP.
Conversation Population Explosion...check for an excessive amount of conversations not in a CLOSED state. This may mean we are not issuing a CLOSE CONVERSATION correctly in our setup. Or we have a "fire and forget" pattern, which is not right.
Submitted by Dave Wentzel on Tue, 2013-02-05 11:58
I've been using Service Broker more and more and I love it. In my last post on Service Broker I wrote about how I sold my client on using Service Broker. After selling it I needed a simple method to enable it on all of our customer databases reliably. There is a dearth of GUI and monitoring tools for Service Broker. This is a problem for me because the support DBAs are mostly familiar with GUI tools. Further, SB is kinda a new technology (for them) and they find it a little bit scary. This means needless late night phone calls to my staff to troubleshoot relatively simple problems. I've decided that, going-forward, whenever I implement something new in Service Broker I will also deliver a little monitoring tool that my support guys can use that will save my team some grief.
In this post I'll cover my ServiceBrokerEnable script. This script simply enables Service Broker on your db and ensures everything is up-and-running so you can begin installing your queues and services.
You can run into some serious problems when you ENABLE_BROKER on a database restored as a COPY on the same SQL instance. This is because the broker GUID is no longer unique. My routine looks for that and does a SET NEW_BROKER instead. But to do this your database must be in SINGLE_USER mode so SB can get an exclusive db lock.
Your database must also have its TRUSTWORTHY flag set, something that will not happen when your db is restored.
I have instructed our support guys to simply run this new routine whenever they CREATE or RESTORE a database.
The script is "properties-based", meaning that I can re-run it several times, even on a RUNNING system, and it will only do what it needs to do, if it needs to do it. So, for example, it won't set my database to SINGLE_USER unless it has to and it won't rebuild my queues and services unless they do not exist in the correct, running state.
Note that we are checking if SB is enabled before doing any work. We are also CATCHing the case where we cannot ENABLE_BROKER because of the duplicate GUID issue. At the very end of the procedure I am turning on the TRUSTWORTHY flag and ensuring my db is owned by SA. You may not need these last two steps, but our environment and setup requires it.
Submitted by Dave Wentzel on Mon, 2013-02-04 07:53
In the next few posts I want to cover some interesting things I'm doing with Service Broker.
I recently architected and rolled out Service Broker as a replacement for various queueing and asynchronous execution models that did not scale. It was a tough sell initially because we already thought we were SOA (Service-Oriented Architecture) and were already using JMS. What I showed was that using JMS doesn't automatically obviate the need for asynchronous messaging in the data tier. And just because we are already SOA doesn't mean we can't buy a lot of performance by going SODA (Service-Oriented Data Architecture). We would want to use Service Broker and go "SODA" for anything that is heavily data-dependent and asynchronous.
The roll out of Service Broker was a complete success. Like any good architect, I started small with one section of our application that was already asynchronous to the user. The problem was, it was still synchronous to the Java tier. I call it "asynchronously synchronous". Our largest customer had a full 1/3 of their JVMs running at zero CPU utilization and totally thread-starved. All threads were waiting for JMS-called stored procedures to finish. I rewrote this so JMS simply popped a message onto an SB queue and then returned control for further Java processing. After the stored procedures were finished the result sets were passed to another JMS queue for further processing. So we wrote a push queue in Service Broker that would automatically send the results of the procedures to the JMS queue. It would have been faster and quicker to write a pull queue and poll for changes, but we were all very excited to demonstrate as many of the features of SB as possible.
Doing this rather small amount of code refactoring we were able to determine that we can decommission about 1/3 of our JVMs. That's a huge savings in licensing for WebSphere and Windows. And it only took about a month from design to rollout.
There are lots of Service Broker presentations on the web, but what I haven't found is a presentation that argues the merits of Service Broker side-by-side with JMS (or equivalent). Our shop, like many, is also leery of putting any additional processing logic at the data tier. The concern is that the data tier is already generally the bottleneck, so, why make it worse? I believe I address those concerns in this presentation as well. You can download the ppt file here. Here are the slides: