The CAP Theorem
Here is a bit more from my experiences evaluating NoSQL solutions. To evaluate these products, it is important to understand the CAP Theorem. CAP is an acronym for Consistency, Availability, and Partition Tolerance.

If nodes can no longer reach each other (called a network partition), will your data still be queryable (available, but possibly not exactly accurate), or will the system refuse to answer rather than risk returning stale data (consistent)?
As a relational guy this makes sense to me. What the distributed data folks have figured out is that the CAP Theorem can be bent: you can actually have all three, as long as you don't expect all three at any particular moment in time. With many NoSQL solutions the query language has an option that lets a given request honor availability over consistency, or vice versa, depending on your requirements. So, if I must have query accuracy, then my system will be unavailable during a network partition; but if I can sacrifice accuracy/consistency, then I can tolerate a network partition.
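A sketch of what that per-request knob can look like, loosely in the style of tunable consistency levels. The function name, the `(version, value)` tuples, and the replica-as-dict shape are all made up for illustration, not any product's real API:

```python
# Hypothetical sketch: choosing availability vs. consistency per request.
# Each replica is a dict mapping key -> (version, value); a missing key
# simulates an unreachable or not-yet-updated replica.

def read(replicas, key, consistency):
    """Read `key`, honoring the requested consistency level.

    consistency="one"    -> return the first answer we get (availability wins)
    consistency="quorum" -> wait for a majority, return the newest version
    """
    needed = 1 if consistency == "one" else len(replicas) // 2 + 1
    answers = []
    for replica in replicas:
        if key in replica:                  # simulate a reachable replica
            answers.append(replica[key])
        if len(answers) >= needed:
            return max(answers)[1]          # newest version wins
    raise TimeoutError("not enough replicas reachable for requested consistency")
```

With `consistency="one"` the read succeeds even if most replicas are partitioned away, but it may return a stale value; with `"quorum"` it is more accurate but fails when a majority is unreachable, which is exactly the tradeoff described above.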
It's usually not that simple, unfortunately. It's generally not wise, for hopefully obvious reasons, to write data to one and only one node in a distributed datastore. If we have, say, 64 nodes and every write goes to only one of them, we have zero resiliency when a network partition occurs. That node's data is lost until the node comes back online (and you may even need to restore a backup).

Writing each value to multiple nodes introduces its own problem: if we query for a specific scalar data element, any two nodes may hold different values depending on which was updated last. So these datastores, while allowing queries to run against all nodes (map) and then merging the results to determine the correct version (reduce), require a synchronization and versioning mechanism such as a vector clock. I'll discuss [[vector clocks]] and other synchronization mechanisms in the next post.
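To make the versioning idea concrete, here is a minimal vector clock sketch. This is the textbook mechanism, not any particular product's implementation; each node keeps a counter per node id, and comparing two clocks tells us whether one write causally follows another or whether the two conflict:

```python
# Minimal vector clock sketch (illustrative only).

def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped (a local write)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def merge(a, b):
    """Combine two clocks after synchronization: take the max per node."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def descends(a, b):
    """True if clock `a` has seen everything `b` has (a >= b component-wise)."""
    return all(a.get(n, 0) >= count for n, count in b.items())

def conflict(a, b):
    """Neither clock descends from the other: concurrent writes, needs resolution."""
    return not descends(a, b) and not descends(b, a)
```

When `conflict` is true, the "reduce" step can't pick a winner by version alone and must fall back to some resolution policy (last-writer-wins, application-level merge, etc.).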
Other implementations avoid synchronous multi-node writes (and the two-phase commit, or "2PC", they would require) and instead write to one node and perform an asynchronous replication to the other nodes in the background, perhaps using a messaging engine such as Service Broker or JMS. Obviously, this is prone to consistency problems during a network partition. In this type of system the designer is clearly valuing write performance over consistency of data, and a query may not always return transactionally consistent results.
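A toy sketch of that write-one-node-then-replicate pattern, with a background queue standing in for the messaging engine. The class and its shape are invented for illustration:

```python
import queue
import threading

# Sketch: acknowledge the write as soon as the primary has it; replicas
# catch up asynchronously via a queue (the stand-in for Service Broker/JMS).

class AsyncReplicatedStore:
    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.log = queue.Queue()            # pending replication messages
        threading.Thread(target=self._replicate, daemon=True).start()

    def write(self, key, value):
        self.primary[key] = value           # acknowledged immediately: fast...
        self.log.put((key, value))          # ...replicas are updated later

    def _replicate(self):
        while True:
            key, value = self.log.get()
            for replica in self.replicas:
                replica[key] = value
            self.log.task_done()
```

The consistency window is the gap between `write()` returning and the worker draining the queue: a read against a replica in that window returns stale (or missing) data, which is exactly the tradeoff the paragraph above describes.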

An area where a distributed datastore may take liberties that freak out us RDBMS folks is that a write may not *really* be a write. In the RDBMS world writes follow ACID: a write can't be considered successful if it has only reached buffer memory. It has to be written to the REDO/transaction log, and that write has to be acknowledged. We've all been told in years past to "make sure your disk controller has battery backup so you don't lose your write cache in the event of failure".
Well, it's possible to "bend" the write rules a little too. Perhaps we are willing to tolerate the edge case where we lose a write because of a failure between the time the buffer was dirtied and when it was flushed to disk. So some of these solutions let you configure whether you need a "durable write" vs. a plain ol' "write". Again, the system designer must consciously make the tradeoff between durability and performance. This is a "setting" I've never seen in an RDBMS, nor even seen requested as a feature; it's just a totally foreign concept. In many cases the write/durable-write setting can even be changed at the request level, offering still more flexibility. In fact, it appears Microsoft is getting on board with this concept: [[Hekaton]] looks like it will support "delayed durability".
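The durable-vs-plain write distinction boils down to whether you force the data to stable storage before acknowledging. A file-based sketch of a per-request durability flag (the `Log` class is a stand-in, not any real product's API):

```python
import os

# Sketch: a per-write "durable" flag. A plain write lands in the OS buffer
# cache and is fast; a durable write is fsync'd to disk before returning.

class Log:
    def __init__(self, path):
        self.f = open(path, "ab")

    def write(self, record, durable=False):
        self.f.write(record + b"\n")        # may still only be in memory
        if durable:
            self.f.flush()                  # push Python's buffer to the OS...
            os.fsync(self.f.fileno())       # ...and force the OS to hit disk

    def close(self):
        self.f.close()
```

A crash after a non-durable `write()` but before the OS flushes its cache loses the record; that is precisely the edge case the non-durable mode accepts in exchange for speed.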
I see one last concern with these solutions: the cascading failure problem. If a node goes down, its requests get routed to Node', effectively doubling that node's load. That has the potential to bring Node' down too, which causes Node" to service three times its normal load, and so on. This can be overcome with more nifty algorithms; it's just worth noting that it's never wise to overtax your nodes. This is why NoSQL people always opt for more nodes on cheaper commodity hardware.
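A back-of-the-envelope simulation of that cascade, under the deliberately naive assumption that all of a failed node's load is shed onto a single surviving neighbor (real systems spread it across many nodes, which is part of those "nifty algorithms"):

```python
# Toy cascade model: repeatedly fail any node over capacity, dumping its
# entire load on the next survivor. Numbers and names are illustrative.

def cascade(node_loads, capacity):
    """Return the list of nodes that fail, in order."""
    loads = dict(node_loads)
    failed = []
    overloaded = [n for n, load in loads.items() if load > capacity]
    while overloaded:
        victim = overloaded.pop(0)
        failed.append(victim)
        survivors = [n for n in loads if n not in failed]
        if not survivors:
            break
        heir = survivors[0]                 # naive: one neighbor takes it all
        loads[heir] += loads[victim]
        loads[victim] = 0
        if loads[heir] > capacity and heir not in overloaded:
            overloaded.append(heir)
    return failed
```

With three nodes at 120/60/60 against a capacity of 100, one overloaded node takes down the whole cluster; with comfortable headroom, nothing fails. That headroom is the argument for more, cheaper nodes.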
Dave Wentzel
data architecture nosql