Hekaton

This is the next post in my NoSQL series. As I was starting my first NoSQL POC (using SAP HANA) SQL Server announced Hekaton in one of their CTPs for SQL Server 2014. I was intrigued because it appeared as though Hekaton, an in-memory optimization feature, was, basically, HANA. It turns out HANA is much more. This post has been at least 9 months in the making and since then everyone in the blogosphere has evaluated and posted about Hekaton so I'll keep this post really short.

Hekaton is Greek for a hundred, and that was the targeted performance improvement that Microsoft set out to achieve when building this new technology. Oracle has an in-memory product called TimesTen and I wonder if the Hekaton name was a bit of one-upmanship.

Initially I was skeptical that Hekaton was little more than DBCC PINTABLE. It's more...a lot more.

In case you've never heard of Hekaton, there are two main features:

a table can be declared as in-memory (similar to SAP HANA)
stored procedures can be compiled into native DLLs if they only touch in-memory tables.

Most database managers in existence today (except the "newer" varieties like HANA) were built on the assumption that data lives on rotational media and only small chunks of data will be loaded into memory at any given time. Therefore there is a lot of emphasis on IO operations and latching within the database engine. For instance, when looking up data in SQL Server we traverse a B-Tree structure, which is a rotational media-optimized structure. Hekaton does not use B-Trees, instead it uses memory pointers to get to the data. This is orders of magnitude faster.

Hekaton transactions are run in the equivalent of snapshot isolation level. New versions of changed data are stored in RAM with a new pointer. Transactions still get logged to the tran log and data is still persisted to disk, but the disk-based table data for Hekaton tables is only read from disk when the database starts up. And it is not stored in B-Trees. Instead, the versioned memory pointers are persisted to disk and on database startup those pointers/versions are re-read into memory.

You will see errors just like with other in memory-based database managers if the data grows too large to fit into RAM. The system does not fall back to traditional disk-based B-Trees.

Other Interesting Features

use SCHEMA_ONLY when creating table and it is non-durable and non-logged. The data will be gone when the instance restarts (or fails over), but the schema remains. This is good for ETL and session state information.
If indexes on these tables are not B-trees then what are they? Hash indexes...therefore all memory-optimized tables must have an index. Indexes are rebuilt on instance restart as the data is streamed to memory. Indexes are not persisted to disk and are not part of your backup.
No locks are acquired and there are no blocking waits. In-memory tables use completely optimistic multi-version concurrency control.

Implementation Features

the database must have a filegroup that CONTAINS MEMORY_OPTIMIZED_DATA that is used to recover the data. This makes sense since legacy filegroups are B-Tree-organized.
the tables (or database) must use a Windows BIN2 collation.
tables can have no blobs or XML datatypes, no DML triggers, no FK/check constraints, no Identity cols, no unique indexes other than PK.
maximum 8 indexes.
There are no schema changes to the table once it is created. That includes indexes.

There is lots of good information on Hekaton on the internet. Far more than I can put into a blog post. This is an interesting development.

Oracle in - memory database is called times 10.

Proposed a no SQL language standards: • CQL • UnSQL to

These are high level languages otherwise you need to write a compiler and didn't we learn in the 1970s that that was a bad idea.

Spell CODASyl. We really need to better understand the history of data management. Low level interfaces are not a good idea.

Dave Wentzel 2014-01-17 CONTENT
sql server data architecture nosql