Vertica Concurrency and Isolation Semantics

This is a brief blog post in my Vertica series covering how Vertica handles concurrency and isolation.

Vertica supports the four standard SQL isolation levels (READ COMMITTED, READ UNCOMMITTED, REPEATABLE READ, and SERIALIZABLE). However, it really only honors READ COMMITTED and SERIALIZABLE. READ UNCOMMITTED is mapped to READ COMMITTED and REPEATABLE READ is up-sized to SERIALIZABLE. So, if your app is, for some reason, sensitive to isolation levels, beware. This shouldn't affect the query workloads that Vertica targets, which is read-intensive data warehouse workloads (with a high amount of concurrent data loading). In a future post I'll cover how Vertica is architected such that it can handle massive data loads without affecting online query concurrency.

Vertica supports SQL92-style implicit transactions. This means that you don't really need a BEGIN TRAN clause, although you can certainly do that. It supports SAVEPOINTs and ROLLBACKs in a manner that is semantically identical to SQL Server.

Snapshot Isolation
Vertica has the ability to run queries in snapshot isolation mode similar to SQL Server. It is still READ COMMITTED under the covers. The trick is to preface your queries with AT EPOCH LATEST (or a given time) to your query. When you do this you instruct Vertica to fetch the committed data as of the time the query started. Essentially you are querying a snapshot, although you don't need to create or declare the snapshot like you do in SQL Server. These queries should have the fastest possible execution. Essentially you will hold no locks and won't block write ops. EPOCHs are advanced when new data is committed.

Vertica written with AT EPOCH clauses are called "historical queries".

More on Epochs

Let's take a deeper dive into epochs.

Vertica's SQL dialect doesn't have a name (such as T-SQL or PL/SQL) that I am aware of. Vertica's SQL is based off Postgres but it is a very limited subset of functionality. It's syntactically similar to TSQL and follows most of the ANSI SQL standards. it has CTE's (although not recursive), derived tables in FROM clauses, subqueries (correlated and non-correlated), etc etc. You terminate all SQL statements with a semicolon.

Vertica's SQL does not support variables. The vsql utility supports host-like environment variables but that is useless if using DBeaver or embedding SQL in Java. Instead you have to use a native programming language and handle your variablization there. Because there are no variables I have this script mostly hardcoded for my test environment.

`select current_epoch from system;` will show you the equivalent of the LSN for your system. Whenever the next commit occurs it will, in this case, be 957. This is a system-wide, not connection-specific, value. The `epochs` table will show you the current epochs that you can query against. The oldest listed, in this case 956, is the oldest that can be queried and is called the Ancient History Mark (AHM). Note that creating a table does not advance the current_epoch. However, when we INSERT a row into a table, and commit it, we do advance the current_epoch.
The epochs table now shows that we are able to query two different epochs. The AHM has not advanced. Just a quick note, similar to the bash shell or the autocomplete feature in SSMS if you type the first few letters of a command or object and then hit tab you can either autocomplete the command or the shell will give you a list of suggestions for you to continue typing if there are multiple possibilities. In this case I typed "pub" and then tab and vsql showed me all contextual values that would be available for autocompletion...in this case objects in the `public` schema. After entering another row my current_epoch has again advanced and I can now query 3 different epochs. A basic SELECT against our table will show two rows.
Now I can use the AT EPOCH statement, prepended to the SELECT to see the data at various snapshots in the past. If I choose EPOCH 955 I generate an error since that EPOCH is no longer available. EPOCH 956 shows me no rows. EPOCH 957 shows me one row. EPOCH 958 shows me two rows. If I try to query the current_epoch (959) I also generate an error. I forgot to grab a screenshot but `AT EPOCH LATEST`, in this example would be functionally equivalent to EPOCH 958. `AT EPOCH LATEST` is assumed when the AT EPOCH clause is omitted. It is functionally equivalent to saying, "show me all committed data as of the start of this query".
To reiterate I can now query 3 epochs. If you try this on your own you have to run your queres quickly...
...because if you wait about a minute you'll notice that your list of available epochs is automatically reduced. I show the current_epoch to show you that this is a quiet system and I have done no backups, delete statements, or anything else to advance the current_epoch...this is a quiet system. Epochs, be default, will be available for about a minute until the Tuple Mover process comes along and cleans up older epochs. This is totally configurable and we'll cover it more in a future post.

Epochs is the implementation that allows Vertica to have a READ COMMITTED SNAPSHOT implementation. You are always querying against the latest epoch if you don't specify an epoch. Any currently executing INSERTs, UPDATEs, or DELETEs will be in the current_epoch (and there may be multiple if you are running concurrent data modifications) and are not available for querying outside of the current session scope.

If you think about this, epochs are functionally equivalent to READ COMMITTED SNAPSHOT, it's simply a different implementation. This is one reason why I enjoy Vertica...it helps me understand other database managers and their implementations of things such as concurrency and isolation management.

You have just read "[[Vertica Concurrency and Isolation Semantics]]" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.

Dave Wentzel 2014-09-17 CONTENT
vertica