DaveWentzel.com            All Things Data


HBase is another NoSQL solution I evaluated for a recent project to replace some SQL Servers.  HBase is modeled on Google's BigTable and runs on top of Hadoop's HDFS.  It is a distributed, column-oriented (more precisely, column-family-oriented) data storage system.  HDFS and MapReduce are great at batch processing over huge datasets, but Hadoop on its own is not great at random reads and writes.  What if you only need an individual record?  HBase is a huge improvement here, and it is the tool to use.  
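To make the "individual record" point concrete, here is a minimal sketch of the logical data model in plain Python.  It is a toy in-memory stand-in, not the HBase client API: the class name `ToyTable`, its methods, and the sample row key and columns are all made up for illustration.  What it tries to show is that a table is essentially a map from row key to columns (named `family:qualifier`) to timestamped versions, so fetching one record is a direct lookup by row key rather than a batch job over the whole dataset.

```python
from collections import defaultdict
import time


class ToyTable:
    """Toy in-memory model of an HBase-style table (hypothetical, not the real API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, ts=None):
        """Write one cell; older versions of the cell are kept."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time_ns() if ts is None else ts
        versions = self.rows[row_key][column]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row_key):
        """Point lookup: the newest value of every column in a single row."""
        if row_key not in self.rows:
            return {}
        return {col: versions[0][1] for col, versions in self.rows[row_key].items()}


table = ToyTable(families=["info"])
table.put("cust#0042", "info:name", "Ada", ts=1)
table.put("cust#0042", "info:name", "Ada L.", ts=2)   # newer version of the same cell
table.put("cust#0042", "info:city", "London", ts=1)
print(table.get("cust#0042"))
```

Note that qualifiers such as `info:city` are created on write; only the column family has to exist up front.  A real cluster adds distribution, persistence, and caching on top of this idea, none of which the sketch attempts.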

The point of these blog posts is to give the relational data architect a brief, cursory understanding of the various NoSQL offerings.  So how do HBase and other column-oriented databases compare to RDBMSs?  In HBase the table schemas map closely to the physical implementation, whereas this is not always the case for a relational logical/physical model.  So, where the RDBMS abstracts the storage and retrieval of data for you, with a column-oriented system that burden falls on the programmer.  
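One place where that burden shows up is row key design.  HBase keeps rows sorted by row key, so the key layout you choose decides which lookups are cheap; there is no optimizer picking an index for you.  The sketch below is plain Python, not HBase code, and the names `make_row_key` and `SortedRows`, the key layout, and the sample orders are all hypothetical.  It assumes a composite key of customer id, order date, and order id, and shows how "all orders for one customer" becomes a scan over a contiguous range of keys.

```python
import bisect


def make_row_key(customer_id, order_date, order_id):
    """Build a composite key; zero-padding keeps string order equal to numeric order."""
    return f"{customer_id:08d}#{order_date}#{order_id:06d}"


class SortedRows:
    """Toy store that keeps row keys sorted, as a stand-in for sorted row storage."""

    def __init__(self):
        self._keys = []     # row keys in sorted order
        self._values = {}   # row key -> row data

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def prefix_scan(self, prefix):
        """Yield (key, value) for every row whose key starts with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            yield key, self._values[key]


rows = SortedRows()
rows.put(make_row_key(42, "2013-05-01", 7), {"total": "19.99"})
rows.put(make_row_key(7, "2013-05-20", 8), {"total": "5.00"})
rows.put(make_row_key(42, "2013-06-15", 9), {"total": "42.50"})

# All orders for customer 42 sit next to each other in key order.
for key, row in rows.prefix_scan("00000042#"):
    print(key, row)
```

The trade-off is that a question the key was not designed for, such as "all orders on a given date across customers", would need a full scan or a second table keyed differently.  That is exactly the kind of decision the RDBMS would otherwise hide behind an index.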

But that's kinda the whole point of NoSQL, isn't it?  For certain problems an RDBMS may not be a good fit.  Two examples: heavy read/write concurrency and huge dataset sizes.  What I've seen in these cases is that the data professional bends or breaks the logical modeling rules of the relational model to overcome the physical limitations of the RDBMS.  In other words, they denormalize.  HBase is one possible way to avoid doing that by substituting a different storage engine.  Namely, we are swapping a row-oriented storage mechanism for a column-oriented one.  

That's HBase in a nutshell.  Of course, it also loosens the ACID guarantees you are used to: writes are atomic within a single row, but there are no multi-row transactions out of the box.  

