DaveWentzel.com            All Things Data

January 2014

Windows Azure Table Service

Another post in my NoSQL series.  Microsoft offers NoSQL-like data persistence engines outside of SQL Server.  The most compelling is Windows Azure Table Service.  This is a simple persistence mechanism for unstructured data that you can certainly use like a key/value store.  A single blob can be up to 1TB in size and each item in the key/value store can be 1MB.  Just like most key/value stores, no schema is needed.  
The service is organized as a series of tables (not at all the same as relational tables), and each table contains one or more entities. Tables can be divided into partitions and each entity has a two-part key that specifies the partition (the partition key) and the entity ID (the row key).  Entities stored in the same partition have the same value for the partition element of this key, but each entity must have a unique row key within a partition. The Table service is optimized for performing range queries, and entities in the same partition are stored in row key order. The data for an entity comprises a set of key/value pairs known as properties. Like other NoSQL databases, the Table service is schema-less, so entities in the same table can have different sets of properties.  
Conceptually this seems to be almost identical to MongoDB.  I'll write about my experiences with MongoDB in a future post.
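To make the two-part key concrete, here is a minimal in-memory sketch in Python.  It is not the Azure SDK and it does not talk to the service; ToyTable and its methods are names I made up for illustration.  It only models the rules described above: a partition key plus a row key identifies an entity, row keys are unique within a partition, entities are kept in row key order so range queries are cheap, and entities in the same table can carry different properties.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for one table (not the real service API)."""

    def __init__(self):
        # partition key -> sorted list of row keys in that partition
        self._row_keys = defaultdict(list)
        # (partition key, row key) -> dict of properties (no fixed schema)
        self._entities = {}

    def insert(self, partition_key, row_key, **properties):
        key = (partition_key, row_key)
        if key in self._entities:
            raise KeyError(f"duplicate row key {row_key!r} in partition {partition_key!r}")
        bisect.insort(self._row_keys[partition_key], row_key)  # keep row key order
        self._entities[key] = dict(properties)

    def get(self, partition_key, row_key):
        return self._entities[(partition_key, row_key)]

    def range_query(self, partition_key, start, end):
        """Return (row_key, properties) pairs with start <= row_key < end, in order."""
        keys = self._row_keys.get(partition_key, [])
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return [(k, self._entities[(partition_key, k)]) for k in keys[lo:hi]]


orders = ToyTable()
orders.insert("customer-42", "2014-01-03", total=19.99)
orders.insert("customer-42", "2014-01-01", total=5.00, coupon="NEWYEAR")
orders.insert("customer-7", "2014-01-01", note="gift")  # different properties, same table
print(orders.range_query("customer-42", "2014-01-01", "2014-01-03"))
```

The range query only touches one partition and uses a binary search on the sorted row keys, which is the reason a store laid out this way favors range scans within a partition.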

Are escrow accounts for suckers?

My "Pay Off Your Mortgage in 9 Years" post is surprisingly very popular.  There are a lot of radio ads for programs that promise to pay off your mortgage early.  Of course these services charge you a fee.  There is NO magic to these programs, you can do these tricks yourself to pay off your mortgage early.  I wrote that post because I rightly assumed that quite a few people do not know these tricks and I love to stick it to the banks whenever I can.  

Today's "stick-it-to-your-bank" post is about escrow accounts and how they benefit the banks and not you, and what you can do about it.  

What is an escrow account?

An escrow account is a bank account held by a third party (your mortgage company) which receives money from one party, such as the mortgagor (that would be you), and later distributes that money to one or many other parties.  These "other" parties might be your hazard insurance company and your local property tax authority (usually your school district).  

Why do banks want you to have an escrow account?

  1. Well, obviously they get to keep the interest on the float (the balance) in your escrow account.  Banks don't need to credit your account for interest, at least not in PA. So they get a few bucks every year.  
  2. People are generally lazy and stupid and banks would rather be the responsible party to make sure your hazard insurance and taxes are paid.  If your house burns down and you didn't pay your insurance you might decide to default on your mortgage and that means the bank loses THEIR property.  They don't like that.  An escrow account lets them keep tabs on you to ensure your bills, that they have a vested interest in, get paid. 
  3. Likewise, with escrow, if you default on your house the bank is guaranteeing that a taxing authority won't come in and get first claim on your house.  Without escrow the bank doesn't know that you are not paying your taxes.   With escrow, they know you are paying your taxes, because they are writing the check for you. 

Many will argue that a couple of lost bucks in interest is worth it if someone else (the mortgage company) has the responsibility to actually get the bills paid.  This is partially true.  Read on.  

Benefits of waiving escrow and paying your own bills

  • With escrow you are locking up your money for potentially 12 months prior to a given payment needing to be made from the escrow account.  
  • Make some interest on the money you would normally have in escrow.  Or pay a little more on a credit card.  Or do anything else you want with YOUR money.  Yes, the interest the bank makes on your escrow funds is minuscule...still, why let them have it?
  • In my area I get a SUBSTANTIAL reduction in property taxes if I pay a month early.  When I had an escrow account the bank only made the payment when it was due.  In other words, the bank had the funds in escrow but did NOT make the early payment to save me money.  
  • In other cases it would have been advantageous from a tax perspective to pay the small late fee on a tax bill and reap HUGE tax benefits by moving the tax payment to the next year.  It's rare, but it happens.  
  • Yes, it is true that it is LEGALLY the bank's responsibility to pay any and all bills for you (according to the terms of your escrow agreement), but that doesn't mean it won't cost you a ton of headaches when they screw it up.  For instance, when I had an escrow account some of my tax bills went directly to the bank, others came to me and I needed to mail them to the bank.  And the ones I needed to mail to the bank were the tax bills for a couple hundred bucks.  It would've been faster for me to just write the check to the taxing authority rather than find the bank's address and escrow account information (and a stamp) to send it to the bank.  
  • This one is controversial, but it's true.  As I mentioned in my previous mortgage post, there are unfortunately times when you may need to default on your mortgage.  No one likes to think about this but it happens.  In my last mortgage post I mentioned that PAYING DOWN (as opposed to PAYING OFF) your mortgage means that you are giving free money to the banks if you ever do, unfortunately, need to default.  The same is true with escrow.  If you can't pay the mortgage and escrow the bank will ensure the escrow is paid in full FIRST so the taxing authority doesn't seize your house.  However, it usually takes a long time for a taxing authority to seize your residence.  When funds are really tight, you are better off paying the mortgage first, and worrying about the taxes later.  With escrow you can't do this.  Obviously ALWAYS pay the hazard insurance first.  ALWAYS.  
  • In the 7 years I had escrow I found discrepancies in my annual statement in 3 of those years.  Check your annual escrow statements for discrepancies.  In every case they paid the hazard insurance late because my insurer, unfortunately, sends out the bill exactly 30 days before it is due.  My bank(s) couldn't handle such a small amount of lead time and decided to pay the penalties using MY escrow funds.  That's not legally allowed, but banks don't always follow the law either.  Once I disputed this in writing they credited my escrow properly, no questions asked.  But this was still VERY annoying and time-consuming.  


  • Once I overpaid escrow throughout the year (another bank calculation foul-up) and had a huge balance at the end of the year.  I received a letter indicating I could either get a refund check or apply some portion of the excess to next year's escrow and lower my monthly payment.  Or I could apply the money to principal.  Or...it becomes exhausting having to think about all of this.  I would rather have a nice static mortgage payment for 30 years than a wildly fluctuating payment due to yearly escrow differences.
  • I noticed at the end of one year that my escrow balance was hugely negative.  The bank paid almost twice what my property taxes should have been.  After a lot of research the problem was with the taxing authority...the wrong tax bill was sent to the bank and they paid it without asking questions.  This is why I like to receive my bills at my home and scrutinize them.  The bank didn't catch the error, but I would have.  
  • In another situation I refi'd in the middle of the year.  My escrow account balance did not properly move from lender to lender.  I was missing one month's escrow payment.  I caught it only after reviewing the annual statement.  What a nightmare trying to fix that.  
  • Lenders are legally allowed to keep an escrow balance equal to twice the monthly escrow payment.  I had a lender actually delay a tax payment so they would not fall under the 2 month cushion.  And they tried to have me pay the late fees.  After I looked at the annual statement I noticed they did this the previous year too.  The conspiracy theorist in me thinks that maybe the banks really do want an extra nickel of float interest at the risk of an irate customer.  I hate banks.  
  • When you apply for credit there is a formula used called "debt-to-income ratio" (DTI) that determines if you are a good credit risk.  The first question is always, "what is your income?".  The second is, "what is your mortgage payment?".  But the creditor never asks what portion of your mortgage payment is escrow or if you even have escrow.  Therefore if you pay your own taxes your mortgage payment is lower and hence your DTI ratio looks better on paper. 
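The DTI point in the last bullet is simple arithmetic, sketched below in Python.  The dollar figures are invented, and real lenders may well count taxes and insurance separately; the sketch only shows why a payment quoted without escrow makes the ratio look smaller.

```python
def debt_to_income(monthly_debt_payments, gross_monthly_income):
    """DTI as a fraction: monthly debt payments divided by gross monthly income."""
    if gross_monthly_income <= 0:
        raise ValueError("income must be positive")
    return monthly_debt_payments / gross_monthly_income


# Hypothetical numbers, for illustration only
income = 6000.0
principal_and_interest = 1200.0
escrow_portion = 400.0

with_escrow = debt_to_income(principal_and_interest + escrow_portion, income)
without_escrow = debt_to_income(principal_and_interest, income)
print(f"{with_escrow:.1%} vs {without_escrow:.1%}")  # 26.7% vs 20.0%
```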

How do you waive escrow?  

  • You may only be able to ditch your escrow if you refinance.  Certainly don't refinance because you don't like escrow, but when the opportunity next presents itself, ditch your escrow.  
  • Ask.  You'll probably need an LTV (loan-to-value) of 80% or less, in other words, you'll need about 20% equity before a bank will consider this.  Otherwise, again, strategic defaulting becomes appealing in bad economic times (see above).  If you ask and you are a good customer it is likely that a bank will waive escrow for you.  Most banks really do want happy customers.

(sample LTV calculation)

  • This is rare:  If you don't escrow when you refinance, read the fine print and make sure you are not paying any extra fee because of this.  If there is a fee it will be on your closing forms, and if it is small (it usually is) consider paying it anyway.  Paying a fee is a rarity though.  I was asked if I wanted escrow or not at my last refi.
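For anyone who wants to check the LTV rule of thumb against their own numbers, here is a small Python sketch.  The loan balance and appraised value are made up, and the 80% threshold is just the rule of thumb from above, not any particular bank's policy.

```python
def loan_to_value(loan_balance, appraised_value):
    """LTV as a fraction: what you owe divided by what the house is worth."""
    if appraised_value <= 0:
        raise ValueError("appraised value must be positive")
    return loan_balance / appraised_value


def meets_rule_of_thumb(loan_balance, appraised_value, max_ltv=0.80):
    """True when LTV is at or below the threshold (roughly 20% equity or more)."""
    return loan_to_value(loan_balance, appraised_value) <= max_ltv


# Hypothetical numbers, for illustration only
ltv = loan_to_value(160_000, 200_000)
print(f"LTV {ltv:.0%}, equity {1 - ltv:.0%}")   # LTV 80%, equity 20%
print(meets_rule_of_thumb(160_000, 200_000))    # True
print(meets_rule_of_thumb(190_000, 200_000))    # False
```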

When is waiving escrow a bad idea?

  • If you must contractually pay higher mortgage rates because you don't have enough equity, then opt for escrow.  
  • If you are really terrible at budgeting and getting your bills paid on time, escrow is your friend.  
  • If your annual escrow statement shows LARGE negative balances then your bank can't properly estimate what your escrow should be and you are therefore getting free money from the bank.  This is rare but happens whenever your property taxes go up by VERY large amounts quickly.  Or when inflation gets out of control.  


This is the next post in my NoSQL series.   As I was starting my first NoSQL POC (using SAP HANA) Microsoft announced Hekaton in one of the CTPs for SQL Server 2014.  I was intrigued because it appeared as though Hekaton, an in-memory optimization feature, was, basically, HANA.  It turns out HANA is much more.  This post has been at least 9 months in the making and since then everyone in the blogosphere has evaluated and posted about Hekaton so I'll keep this post really short.

Hekaton is Greek for a hundred, and that was the targeted performance improvement that Microsoft set out to achieve when building this new technology.  Oracle has an in-memory product called TimesTen and I wonder if the Hekaton name was a bit of one-upmanship.  

Initially I suspected that Hekaton was little more than DBCC PINTABLE.  It's more...a lot more.

In case you've never heard of Hekaton, there are two main features:

  • a table can be declared as in-memory (similar to SAP HANA)
  • stored procedures can be compiled into native DLLs if they only touch in-memory tables.  

Most database managers in existence today (except the "newer" varieties like HANA) were built on the assumption that data lives on rotational media and only small chunks of data will be loaded into memory at any given time. Therefore there is a lot of emphasis on IO operations and latching within the database engine. For instance, when looking up data in SQL Server we traverse a B-Tree structure, which is a rotational media-optimized structure.  Hekaton does not use B-Trees; instead it uses memory pointers to get to the data.  This is orders of magnitude faster.

Hekaton transactions are run in the equivalent of snapshot isolation level.  New versions of changed data are stored in RAM with a new pointer.  Transactions still get logged to the tran log and data is still persisted to disk, but the disk-based table data for Hekaton tables is only read from disk when the database starts up.  And it is not stored in B-Trees.  Instead, the versioned memory pointers are persisted to disk and on database startup those pointers/versions are re-read into memory.  

You will see errors just like with other in-memory database managers if the data grows too large to fit into RAM.  The system does not fall back to traditional disk-based B-Trees.

Other Interesting Features

  • Use DURABILITY = SCHEMA_ONLY when creating a table and it is non-durable and non-logged.  The data will be gone when the instance restarts (or fails over), but the schema remains.  This is good for ETL and session state information.
  • If indexes on these tables are not B-trees then what are they?  Hash indexes...therefore all memory-optimized tables must have an index. Indexes are rebuilt on instance restart as the data is streamed to memory.  Indexes are not persisted to disk and are not part of your backup.  
  • No locks are acquired and there are no blocking waits.  In-memory tables use completely optimistic multi-version concurrency control.  
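The "no locks, new versions instead" idea can be sketched in a few lines of Python.  This is a toy illustration of optimistic multi-version concurrency control in general, not how Hekaton is actually implemented; the class and method names are hypothetical, timestamps are a simple counter, and every write commits immediately.

```python
import itertools

INFINITY = float("inf")


class WriteConflict(Exception):
    """Raised when a transaction updates a row that changed after its snapshot."""


class ToyVersionedTable:
    def __init__(self):
        self._clock = itertools.count(1)  # logical timestamps 1, 2, 3, ...
        self._versions = {}               # key -> list of [begin_ts, end_ts, value]

    def begin(self):
        """Start a transaction; its snapshot is the current logical time."""
        return next(self._clock)

    def read(self, snapshot_ts, key):
        """Return the version visible at snapshot_ts (None if there is none)."""
        for begin_ts, end_ts, value in self._versions.get(key, []):
            if begin_ts <= snapshot_ts < end_ts:
                return value
        return None

    def write(self, snapshot_ts, key, value):
        """Add a new version without locking; fail if someone else wrote first."""
        chain = self._versions.setdefault(key, [])
        if chain and chain[-1][0] > snapshot_ts:
            raise WriteConflict(key)
        commit_ts = next(self._clock)
        if chain:
            chain[-1][1] = commit_ts      # the old version stops being current
        chain.append([commit_ts, INFINITY, value])
        return commit_ts


table = ToyVersionedTable()
table.write(table.begin(), "row1", "v1")
reader = table.begin()
writer = table.begin()
table.write(writer, "row1", "v2")
print(table.read(reader, "row1"))         # v1 (the reader keeps its snapshot)
print(table.read(table.begin(), "row1"))  # v2
```

Readers never block writers here because a reader simply picks the version that was current at its snapshot.  The price is that a late writer can be told to retry, which is the optimistic part.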

Implementation Features

  • the database must have a filegroup that CONTAINS MEMORY_OPTIMIZED_DATA that is used to recover the data.  This makes sense since legacy filegroups are B-Tree-organized.  
  • the tables (or database) must use a Windows BIN2 collation.  
  • tables can have no blobs or XML datatypes, no DML triggers, no FK/check constraints, no Identity cols, no unique indexes other than PK.  
  • maximum 8 indexes. 
  • There are no schema changes to the table once it is created.  That includes indexes.  

There is lots of good information on Hekaton on the internet.  Far more than I can put into a blog post.  This is an interesting development.  

Graph Datastores

This is the next post in my series on NoSQL solutions.  I haven't actually worked with any graph databases in depth because I haven't been presented with any data problems that would warrant the use of a graph data store.

What are the "right" use cases for a graph database?

  • Mapping applications.  "Find me the shortest path between any two nodes in a connected graph."  This would be difficult to solve algorithmically using an RDBMS.  In something like Google Maps the "shortest path" isn't necessarily the ultimate goal.  You may want the "cheapest" path instead (no toll roads or only 4 lane highways).
  • Recommendation generation systems.  If you purchase x online, what else did other buyers of x purchase?  We can display that for upselling purposes.
  • Social networking like LinkedIn.  How does a person relate to another person?  This is another example of a graph.  
  • Data mining to find the interactions among nodes.  
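The "shortest" versus "cheapest" path distinction from the mapping bullet can be sketched with Dijkstra's algorithm in plain Python.  This is not how Neo4j or any mapping product does it; the road graph, its "miles" and "toll" attributes, and the function name are all made up for illustration.  The only point is that the same graph yields different answers depending on which edge attribute you treat as the cost.

```python
import heapq


def best_path(graph, start, goal, weight):
    """Dijkstra's algorithm.

    graph:  node -> list of (neighbor, attributes dict)
    weight: function mapping an edge's attributes to a non-negative cost
    Returns (total cost, path) or None if goal is unreachable.
    """
    frontier = [(0, start, [start])]
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, attrs in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(frontier, (cost + weight(attrs), neighbor, path + [neighbor]))
    return None


# Hypothetical road network: A to D via a short toll road (B) or a longer free road (C)
roads = {
    "A": [("B", {"miles": 5, "toll": 4.0}), ("C", {"miles": 4, "toll": 0.0})],
    "B": [("D", {"miles": 5, "toll": 0.0})],
    "C": [("D", {"miles": 8, "toll": 0.0})],
    "D": [],
}

print(best_path(roads, "A", "D", lambda e: e["miles"]))  # (10, ['A', 'B', 'D'])
print(best_path(roads, "A", "D", lambda e: e["toll"]))   # (0.0, ['A', 'C', 'D'])
```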

Neo4j is the graph data store I've played with.  It uses untyped datatypes and the data is stored much like it would be in a document db...in an ad hoc fashion.  Relationships are declared based on the Edges between nodes.

Free SOHO VPN Solutions

I do a lot of small consulting jobs for "mom and pop" shops...developing small applications, iPhone apps, helping with DBA tasks, etc.  A common theme is that I need to remotely work on their network resources.  How do you do this when the customer is small, remote, and may not have the latest versions of Windows nor expensive Cisco networking gear?

In this post I'll walk you through how I log in to a remote network with minimal setup (and cost).  I wish there was one simple, universal way to do this on every network, but there isn't.  Depending on the client's version of Windows and their router you may need various alternative solutions.  I'll show you how I set up a VPN, remotely, for free, using three different methods that have always (so far) worked.  You can also use these tricks to VPN to your home network from your job, or from your home to your neighbor, which is good for off-site encrypted backups using peer-to-peer networking (but that's another post).

Simplest Solution:  MS Windows PPTP and L2TP VPNs

The easiest way is to set up a PPTP or L2TP Windows VPN, which is about 10 mouse clicks that you can talk "pop" through over the phone in 10 mins.  This method works about 80% of the time in my experience.  No reboot or software necessary, works on any version of Windows since at least XP, and works over FiOS and Comcast.

Ah, but sometimes this solution will not seem to work.  Why is that?  Invariably the client has a SOHO (small office/home office) all-in-one wireless router between their network and their ISP.  Now the client has to poke holes in their router, and even then PPTP requires GRE packets (and L2TP over IPsec needs its own passthrough support), which some of the cheaper SOHO routers won't pass through.  The ubiquitous WRT54G2 by Linksys is a case in point.  It is the best selling home wireless router EVER.  It claims to pass GRE packets around, but it does NOT.  PPTP and L2TP VPNs will not work over these things.

(The original WRT54G2 and the updated model...these routers have VPN "issues")


You could just spend $79 and have the client buy a better SOHO router, but I'm frugal.  So, native Windows VPN isn't working for you...what next?  

LogMeIn Hamachi

It's free and a simple download that runs on each "resource" that you need access to on both the local and remote side.  The actual VPN is managed remotely by LogMeIn and you provide a "network name" and "password" that all clients can use to network.  Some people call this a "cloud-based VPN".  Once everyone is connected you use network resources as you normally would.  Problems:  

  • I have a problem with remotely managed stuff where a third party basically can have complete control of anything unsecured on my network.  These mom and pop shops tend to have file shares where Everyone has full control. Anyone who gets your "network name" and "password" and a copy of Hamachi effectively has control of your network.   
  • It doesn't run as a service

But you can get it setup quickly.  Honestly, I usually have the client install this first when I have PPTP VPN setup issues (darn Linksys routers).  Then I remote in and configure something better and uninstall this.  What I found works best is...


OpenVPN

This is open source software that runs on almost any OS.  It is geared toward the Linux/scripting crowd though.  There is an admin GUI, but it's less than helpful.  And you have to learn about TAP and TUN adapters and when to use them.  And there is no standard Windows authentication (or even user/password) out-of-the-box.  And you must set up a certificate for your server and keys for every VPN user.  Then you must distribute the keys, securely.  And then you must...I'm exhausted just writing this paragraph.  So, this requires me to have a site visit just to get the VPN going.

The above was my, and many people's, first impressions of OpenVPN for many, many years.  OpenVPN just wasn't a one-click install and "lights out" after that.  My biggest frustration was figuring out how to bridge and route some traffic and not other traffic.

This is no longer an issue as the fine folks at OpenVPN have created one-click VPN "appliances" that run on VMWare and VirtualServer.  

I have yet to have a network or hardware configuration that OpenVPN could not support.  Ever.  It just works.  And it is now simple to setup with no Linux experience necessary.  

Here is how you set it up:  

  1. Have your customer download VMWare Player and install it.  There is a VirtualServer version too, but it needs a 64bit Virtual Server.  And many of these mom and pop shops don't have that.
  2. Download and run the VM appliance
  3. It boots right to a wizard where you assign a simple user authentication system.  
  4. Poke a hole in your customer's router for Port 1194.  
  5. On your client machine download the OpenVPN client software.  Don't bother configuring a .ovpn file when it asks you.  Just connect to your customer's IP address:1194.  You will be prompted for a user/password and you are in.  The configuration file is pushed down from the server.  
  6. When you are done "remoting in" have your customer stop the VMWare Player.  When you need to reconnect just have your customer run the VMWare Player again.  

Couldn't be easier when your router won't support M$'s native VPN abilities. 


SAP HANA Evaluation

As I mentioned in the last couple of blog posts, I've been tasked with looking at BigData and NoSQL solutions for one of the projects I'm working on.  We are seriously considering SAP HANA.  HANA is an in-memory data platform that is deployed on an appliance or in the cloud.  It is mostly focused on real-time analytics.  SAP HANA is an optimized platform that combines column-oriented data storage, massively parallel processing, in-memory computing, and partitioning across multiple hosts.
HANA optimizes your data on the fly.  It can compress data and convert to/from columnstore/rowstore data depending on how you are using your data.  For instance if calculations are single column-based then columnstore storage is chosen.  
It has a standard SQL language, called SQLScript, which is very similar to TSQL, supports MDX (just like SQL Server...so it works well with Excel) and ABAP, and has standard JDBC and ODBC drivers.  It is ACID compliant.  This makes the move from a standard SQL RDBMS a little less painful.  SAP provides prepackaged algorithms optimized for HANA.
My project is an Accounting System with lots of aggregate tables that hold summarized data at various grains of detail.  We aggregate, of course, because aggregated data is faster to read than performing the requisite calculations on the fly.  With HANA the aggregate tables are not needed.  SAP believes that we can retrieve the necessary aggregated data by querying the column stores directly, in-memory.  This of course would simplify our data model tremendously since we wouldn't need all kinds of logic to populate the aggregated tables.  ETL goes away.  We simply compute on the fly.  This would eliminate a lot of our data and a lot of our storage costs.
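Here is a toy Python sketch of the "compute on the fly from a column store" idea.  It says nothing about HANA's internals; the ledger rows and function names are invented.  It only shows that when each column is stored contiguously, an aggregate needs to scan just the two columns involved, with no pre-built summary table.

```python
from collections import defaultdict

# Row-oriented ledger entries (hypothetical accounting data)
rows = [
    {"account": "cash", "period": "2014-01", "amount": 100.0},
    {"account": "revenue", "period": "2014-01", "amount": 250.0},
    {"account": "cash", "period": "2014-02", "amount": -40.0},
    {"account": "revenue", "period": "2014-02", "amount": 60.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


def sum_by(columns, group_col, value_col):
    """Aggregate on demand by scanning only the two columns that matter."""
    totals = defaultdict(float)
    for group, value in zip(columns[group_col], columns[value_col]):
        totals[group] += value
    return dict(totals)


ledger = to_columns(rows)
print(sum_by(ledger, "account", "amount"))  # {'cash': 60.0, 'revenue': 310.0}
print(sum_by(ledger, "period", "amount"))   # {'2014-01': 350.0, '2014-02': 20.0}
```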
Like many financial applications, ours is batch-oriented.  There are certain long-running, complex calculations we just can't do real-time with an RDBMS.  With SAP HANA “batch is dead”.  Even if HANA can't support your most demanding batch jobs, at a minimum they become "on demand" jobs instead.  Operational reports can run real-time on the OLTP system...no more need for an ODS.  
It will be interesting to see where this proof of concept leads us.

Where Can I Download It?

You can't.  But SAP gives you FREE development instances with all of the tools you need on their website (see below).  Here's why...SAP HANA is sold as pre-configured hardware appliances through select vendors.  It runs on SUSE Linux SLES 11.  It uses Intel E7 chips, Samsung RAM, Fusion IO SSD cards, and standard 15K rotational media for overflow and logging.  

Where Can I Get More Information?

The SAP HANA website provides lots of free information.  Lots of example use cases and code.  
The SAP HANA Essentials ebook is being written in "real time".  Google around for the free promo code and then you don't have to pay for it.  It is being continuously updated with new chapters as the content becomes available.  


This is my next post in my NoSQL series.  Sharding is not specific to NoSQL, but quite a few BigData/NoSQL solutions use sharding to scale better.  

What is sharding?

A shard is one horizontal partition in a table, relation, or database.  The difference between a shard and horizontal partitioning is that the shard is located on a separate network node.  The benefit of sharding is that you will have less data on each node, so the data will be smaller, more likely to be held in cache, and the indexes will be smaller.  Most importantly, I can now use a grid of nodes to attack queries across shards.  This is an MPP architecture vs an SMP architecture (massively parallel processing vs symmetric multi processing).  You can also view it this way:  sharding is "shared nothing" while horizontal partitioning is generally "shared almost everything."

Sharding works best when each node (or the central query node in some cases) knows exactly which shards a given data element would reside on.  Therefore the shard partition key must be a well-defined range.  You want your shards to be well-balanced and often that means using a contrived hash key as the shard key, but this is not a requirement.
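A "contrived hash key" can be as simple as hashing the natural key and taking it modulo the number of shards.  The Python sketch below is one hypothetical way to do that, not the routing used by any particular product.  It uses hashlib rather than Python's built-in hash() because the built-in is randomized per process for strings, and every node has to agree on where a key lives.

```python
import hashlib
from collections import Counter


def shard_for(key, shard_count):
    """Map a string key to a shard number in [0, shard_count), the same on every node."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical customer ids spread over 8 shards
keys = [f"customer-{i}" for i in range(10_000)]
balance = Counter(shard_for(k, 8) for k in keys)
print(sorted(balance.items()))  # roughly 1,250 keys per shard
```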

What products use sharding?

This is not a comprehensive list, but:

  • MySQL clusters use auto-sharding.  
  • MongoDB
  • Sharding is not so useful for graph databases. The highly connected nature of nodes and edges in a typical graph database can make it difficult to partition the data effectively. Many graph databases do not provide facilities for edges to reference nodes in different databases. For these databases, scaling up rather than scaling out may be a better option.  I'll talk more about graph databases in a future NoSQL blog post.  

Problems with sharding

The concept of sharding is nothing new.  The biggest problem is when you attempt to roll your own sharding.  I've seen this a lot and it never works.  Invariably the end result is a bunch of code in the app tier (or ORM) that tries to determine which shard should have a given data element.  Later we determine that the shards need to migrate or be re-balanced and this causes a lot of code changes.  
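The re-balancing pain is easy to demonstrate.  The Python sketch below assumes the naive "hash modulo shard count" routing that home-grown sharding code often ends up with (the function names are mine) and counts how many keys land on a different shard when one shard is added.

```python
import hashlib


def shard_for(key, shard_count):
    """Naive routing: stable hash of the key, modulo the number of shards."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def fraction_moved(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count))
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 4, 5):.0%} of keys move going from 4 to 5 shards")
```

With a well-mixed hash only about one key in five keeps its shard when going from 4 to 5, so roughly 80% of the data has to migrate.  Products that shard for you typically use smarter placement schemes to avoid exactly this.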

The CAP Theorem

Here is a little more of my experiences evaluating NoSQL solutions.  It is important to understand the CAP Theorem.  CAP is an acronym for Consistency, Availability, Partition Tolerance.  
In a distributed data architecture you can only have 2 of these 3 at any one time.  You, as the data architect, need to choose which 2 are the most important to your solution.  Based on your requirements you can then choose which NoSQL solution is best for you.  
If a node goes down (called a network partition) will your data still be queryable (available, but perhaps not exactly accurate) or will it be consistent (accurate, but perhaps not queryable)?
As a relational guy this makes sense to me.  What distributed data guys have figured out is that the CAP Theorem can be bent by saying that you can actually have all 3, if you don't expect to have all 3 at any particular exact moment in time.  With many NoSQL solutions the query language will have an option to allow a given request to honor availability over consistency or vice versa, depending on your requirements.  So, if I must have query accuracy then my system will be unavailable during a network partition, but if I can sacrifice accuracy/consistency then I can tolerate a network partition.  
It's usually not that simple unfortunately.  It's generally not a wise idea, for hopefully obvious reasons, to write data to one and only one node in the distributed datastore.  If we have, say, 64 nodes and every write only goes to one of those nodes, we have zero resiliency if a network partition occurs.  That node's data is lost until the node comes back online (and you may even need to restore a backup).
Instead, a distributed datastore will want each write to be acknowledged on more than one node before the client is notified of the write's success.  How many nodes a write must occur on is an implementation detail.  But clearly this means that writes will take a little longer in systems designed like this.  It's equivalent to a two phase commit (2PC).  
This "multiple node write" issue also means that if we query for a specific scalar data element, any two nodes may have different values depending on which was updated last.  This means that these datastores, while allowing queries to be run against all nodes (map) and then merged to determine the correct version (reduce), will require a synchronization and versioning mechanism such as a vector clock.  I'll discuss vector clocks and other synchronization mechanisms in the next post.
Other implementations may not require the "2PC" but will instead only write to one node but perform an asynchronous replication to other nodes in the background.  They may use a messaging engine such as Service Broker or JMS to do this.  Obviously, this is prone to consistency problems during a network partition.  In this type of system the designer is clearly valuing performance of writes over consistency of data.  Obviously a query may not return transactionally consistent data, always, in this type of system.  
The system designer has infinite latitude in how to bend the CAP Theorem rules based on requirements.  Most of us RDBMS people don't like this kind of flexibility because we don't like when certain rules, like ACID, are broken.  The interesting thing is, if you think about it, you can actually totally guarantee CAP in a distributed datastore if you remember that both readers and writers would need to execute against exactly half of the total nodes, plus one.  This would assume a "quorum".  Of course your performance on both reads and writes will be at their worst and will totally negate any reason for distributing data in the first place.  IMHO.  
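The "half of the nodes, plus one" rule works because any two majorities must share at least one node, so a reader always touches at least one node that saw the latest write.  The brute-force Python check below illustrates only that overlap property; the function names are hypothetical and it does not model a real datastore.

```python
from itertools import combinations


def quorum_size(node_count):
    """Half of the nodes, plus one."""
    return node_count // 2 + 1


def always_overlap(node_count, write_size, read_size):
    """True if every possible write set shares a node with every possible read set."""
    nodes = range(node_count)
    return all(
        set(w) & set(r)
        for w in combinations(nodes, write_size)
        for r in combinations(nodes, read_size)
    )


print(quorum_size(64))          # 33
print(always_overlap(5, 3, 3))  # True: two majorities of 5 nodes always intersect
print(always_overlap(5, 2, 2))  # False: a read can miss the nodes that took the write
```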
An area where a distributed datastore may take some liberties that freaks out us RDBMS folks is that a write may not *really* be a write.  In the RDBMS world writes follow ACID.  This means that a write can't be considered successful if I just write it to buffer memory.  I have to write it to the REDO/transaction log and get a success from that.  We've all been told in years past to "make sure your disk controller has battery backup so you don't lose your write cache in the event of failure".  
Well, it's possible that we can "bend" the write rules a little too.  Perhaps we are willing to tolerate the "edge case" when we lose a write due to failure between the time the buffer was dirtied and when it was flushed to disk.  So, some of these solutions will allow you to configure whether you need a "durable write" vs a plain ol' "write".  Again, the system designer needs to consciously make the tradeoff between durability and performance.  This is a "setting" I've neither seen in an RDBMS nor even seen it requested as a feature.  It's just a totally foreign concept.  And in many cases the write/durable write setting can be changed at the request level, offering even more flexibility.  In fact, it appears as though Microsoft is getting on board with this concept.  It looks like Hekaton will support the concept of "delayed durability".
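At the file level the difference between a "write" and a "durable write" comes down to whether you wait for the data to reach stable storage.  The Python sketch below is a generic illustration, not the mechanism of any particular datastore; append_record and its durable flag are names I made up, set per request as described above.

```python
import os
import tempfile


def append_record(path, record, durable=False):
    """Append one line to a log file.

    durable=False: return once the data has been handed to the operating system.
    durable=True:  also wait until the OS reports it flushed to the storage device.
    """
    with open(path, "ab") as f:
        f.write(record.encode("utf-8") + b"\n")
        if durable:
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to flush its cache to disk


with tempfile.TemporaryDirectory() as folder:
    log = os.path.join(folder, "writes.log")
    append_record(log, "fast, could be lost in a power failure")
    append_record(log, "slower, should survive a power failure", durable=True)
    with open(log, "rb") as f:
        print(f.read().decode("utf-8"))
```

Both calls look identical to a reader of the file afterwards.  The difference only shows up if the machine dies between the write and the flush, which is exactly the edge case being traded for speed.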
I see one last concern with these solutions, the cascading failure problem.  If Node goes down then its requests will be routed to Node', which effectively doubles its load.  That has the potential to bring Node' down, which will cause Node" to service thrice its load, etc etc.  This can be overcome with more nifty algorithms, it's just worth noting that it's never wise to overtax your nodes.  This is why NoSQL people always opt for more nodes on cheaper commodity hardware. 
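The cascade is easy to model.  The Python sketch below is a deliberately crude simulation under my own assumptions: every node starts with the same load, a failed node's entire load is handed to the next node in line, and a node dies as soon as its load exceeds its capacity.  Real systems spread load differently, so treat the numbers as illustrative only.

```python
def cascade(node_count, load_per_node, capacity_per_node):
    """Fail the first node, pass its load down the line, and count how many nodes go down."""
    failed = 1
    carried = load_per_node  # load orphaned by the first failure
    while failed < node_count:
        next_load = load_per_node + carried
        if next_load <= capacity_per_node:
            return failed    # this node absorbs the extra load and the cascade stops
        failed += 1
        carried = next_load  # the overloaded node dies and passes everything on
    return failed


print(cascade(4, 10, 15))  # 4: with 50% headroom one failure takes out every node
print(cascade(4, 10, 20))  # 1: with 100% headroom the neighbor survives
```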

Metric System Humor

It's fairly cold in most of the continental US this week, like "low-temperatures-that-we-haven't-seen-in-20-years" cold.  Which got me to thinking...I've been saying for years that the main reason the US will never adopt Celsius over Fahrenheit, like the rest of the world has, is because people will have much too hard of a time grasping the fact that when they see temperatures expressed in Celsius it really is not as cold as they think it is.

Well, that logic does not apply when we have these brutal cold spells.  The joke...

A guy walks into a bar..."It must be minus 40 out there."

Bartender:  "Fahrenheit or Celsius?"

Guy:  "Both"

But of course I am in favor of the metric system when it benefits me.  For instance, weights.  I love it when I go for my annual checkup and the doctor expresses my weight in kg.