This is the next post in my NoSQL series. As I was starting my first NoSQL POC (using SAP HANA) SQL Server announced Hekaton in one of their CTPs for SQL Server 2014. I was intrigued because it appeared as though Hekaton, an in-memory optimization feature, was, basically, HANA. It turns out HANA is much more. This post has been at least 9 months in the making and since then everyone in the blogosphere has evaluated and posted about Hekaton so I'll keep this post really short.
Hekaton is Greek for a hundred, and that was the targeted performance improvement that Microsoft set out to achieve when building this new technology. Oracle has an in-memory product called TimesTen and I wonder if the Hekaton name was a bit of one-upmanship.
Initially I was skeptical that Hekaton was little more than DBCC PINTABLE. It's more...a lot more.
In case you've never heard of Hekaton, there are two main features:
- a table can be declared as in-memory (similar to SAP HANA)
- stored procedures can be compiled into native DLLs if they only touch in-memory tables.
Most database managers in existence today (except the "newer" varieties like HANA) were built on the assumption that data lives on rotational media and only small chunks of data will be loaded into memory at any given time. Therefore there is a lot of emphasis on IO operations and latching within the database engine. For instance, when looking up data in SQL Server we traverse a B-Tree structure, which is a rotational media-optimized structure. Hekaton does not use B-Trees, instead it uses memory pointers to get to the data. This is orders of magnitude faster.
Hekaton transactions are run in the equivalent of snapshot isolation level. New versions of changed data are stored in RAM with a new pointer. Transactions still get logged to the tran log and data is still persisted to disk, but the disk-based table data for Hekaton tables is only read from disk when the database starts up. And it is not stored in B-Trees. Instead, the versioned memory pointers are persisted to disk and on database startup those pointers/versions are re-read into memory.
- use SCHEMA_ONLY when creating table and it is non-durable and non-logged. The data will be gone when the instance restarts (or fails over), but the schema remains. This is good for ETL and session state information.
- If indexes on these tables are not B-trees then what are they? Hash indexes...therefore all memory-optimized tables must have an index. Indexes are rebuilt on instance restart as the data is streamed to memory. Indexes are not persisted to disk and are not part of your backup.
- No locks are acquired and there are no blocking waits. In-memory tables use completely optimistic multi-version concurrency control.
- the database must have a filegroup that CONTAINS MEMORY_OPTIMIZED_DATA that is used to recover the data. This makes sense since legacy filegroups are B-Tree-organized.
- the tables (or database) must use a Windows BIN2 collation.
- tables can have no blobs or XML datatypes, no DML triggers, no FK/check constraints, no Identity cols, no unique indexes other than PK.
- maximum 8 indexes.
- There are no schema changes to the table once it is created. That includes indexes.
There is lots of good information on Hekaton on the internet. Far more than I can put into a blog post. This is an interesting development.
sql server data architecture nosql