A shard is one horizontal partition in a table, relation, or database. The difference between a shard and horizontal partitioning is that the shard is located on a separate network node. The benefit of sharding is that you will have less data on each node, so the data will be smaller, more likely to be held in cache, and the indexes will be smaller. Most importantly, I can now use a grid of nodes to attack queries across shards. This is an MPP architecture vs a SMP architecture (massively parallel processing vs symmetric multi processing). You can also view this as sharding is "shared nothing" vs horizontal partitioning which is generally "shared almost everything."
Sharding works best when each node (or the central query node in some cases) knows exactly which shards a given data element would reside on. Therfore the shard partition key must be a well-defined range. You want your shards to be well-balanced and often that means using a contrived hash key as the shard key, but this is not a requirement.
What products use sharding?
This is not a comprehensive list, but:
Problems with sharding
The concept of sharding is nothing new. The biggest problem is when you attempt to roll your own sharding. I've seen this a lot and it never works. Invariably the end result is a bunch of code in the app tier (or ORM) that tries to determine which shard should have a given data element. Later we determine that the shards need to migrate or be re-balanced and this causes a lot of code changes.
Dave Wentzel CONTENT
data architecture nosql