DaveWentzel.com            All Things Data

Vertica

Vertica Installation Prereqs

The best way for an IT person to learn a new technology is to install it and play with it.  Today we'll get our OS prepared for an install of our first Vertica instance. In a later post we'll install a second node and create a K-safe cluster (more on that in a future post). We'll use the free Community Edition.  

Vertica on Linux
This is what I'm going to use for the remainder of these tutorial blog posts on Vertica.  Specifically, Ubuntu.  

Prereqs

  • Ubuntu or another Debian-based Linux. You can install on other platforms but my tutorial is geared toward Ubuntu.
  • If using Ubuntu, ensure it is 12.04 LTS. Don't load 14.04 (or whatever the "current" LTS is when you read this) unless you have confirmed that Vertica supports the newer release.
  • SSH installed (Vertica will configure passwordless logins for you).  SSH is how cluster nodes communicate with each other.
  • Ensure sudo is installed: which sudo
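If you'd rather check the list above in one go, here is a minimal sketch. The have_cmd and preflight helpers are names I made up for illustration; they are not part of Vertica or Ubuntu, and the release line is only printed if lsb_release happens to be installed.

```shell
#!/bin/sh
# have_cmd NAME: succeed when NAME is found on PATH (hypothetical helper).
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# preflight: report whether sudo and ssh are available, and print the
# distribution release if lsb_release is installed.
preflight() {
    missing=0
    for tool in sudo ssh; do
        if have_cmd "$tool"; then
            echo "ok:      $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    if have_cmd lsb_release; then
        echo "release: $(lsb_release -rs)"
    fi
    return "$missing"
}
```

Run preflight after sourcing the snippet; a non-zero return means at least one tool is missing.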

Storage Prereqs

  • Your data directory must be separate from your catalog directory
  • Both directories must have identical paths across ALL nodes.  Plan your directory structure.  
  • I therefore think it is best to use /home/catalog and /home/data. The installer will create these directories with the requisite permissions, but the parent folders (in this case /home) must be created first (which of course /home is).
  • As with any RDBMS, plan for data growth. For SQL Server you can find formulas on Google that will tell you how much EXTRA disk space you'll need for things like tempdb and logs. Vertica is no different and HP recommends that disk utilization per node be no more than 60% for K-safety=1. This leaves background processes plenty of working space.  K-safety will be explained in a future blog post.  
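A quick way to keep an eye on the 60% guideline is to read the used percentage from df. This is only a sketch: usage_pct and within_budget are hypothetical helper names, and /home is just the parent directory suggested above.

```shell
#!/bin/sh
# usage_pct PATH: print the used-space percentage (digits only) of the
# filesystem that holds PATH, taken from the Capacity column of df -P.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# within_budget PCT [LIMIT]: succeed when PCT is at or below LIMIT.
# LIMIT defaults to 60, the per-node figure quoted above for K-safety=1.
within_budget() {
    [ "$1" -le "${2:-60}" ]
}
```

For example, within_budget "$(usage_pct /home)" || echo "over 60% used" could be run on each node from cron.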

Disk Readahead

  • must be at least 2048
  • sudo /sbin/blockdev --getra /dev/sda1 to view the current readahead
  • sudo /sbin/blockdev --setra 2048 /dev/sda1 (change the device accordingly)
  • sudo /sbin/blockdev --setra 2048 /dev/mapper/VERTICA1--vg-root
  • ensure both commands succeed. Now
  • sudo nano /etc/rc.local and add the EXACT same --setra lines above the exit 0 line
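To confirm the setting survived, something like the following sketch could be used. readahead_ok and check_device are hypothetical helper names; blockdev --getra reports the value in 512-byte sectors, and the device names are whatever applies on your machine.

```shell
#!/bin/sh
# readahead_ok VALUE [MIN]: succeed when VALUE (512-byte sectors) is at
# least MIN. MIN defaults to 2048, the minimum stated above.
readahead_ok() {
    [ "$1" -ge "${2:-2048}" ]
}

# check_device DEV: read the current readahead for DEV and report on it.
# Returns 2 if blockdev itself fails (for example, wrong device name).
check_device() {
    ra=$(sudo /sbin/blockdev --getra "$1") || return 2
    if readahead_ok "$ra"; then
        echo "$1: readahead $ra ok"
    else
        echo "$1: readahead $ra is below the minimum"
        return 1
    fi
}
```

After a reboot, check_device /dev/sda1 tells you whether the rc.local lines took effect.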

Install NTP (network time protocol)

  • node clocks must be synchronized for conflict resolution.
  • sudo apt-get install ntp
  • sudo /etc/init.d/ntp restart
  • cd /usr/sbin
  • sudo ntpq -c rv | grep stratum
  • A stratum of 16 means that you have a problem. You might want to wait an hour or so before freaking out at a 16. Sometimes it takes a while to sync. Not sure why.  
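If you want to script the stratum check instead of eyeballing it, here is a small sketch. stratum_of and is_synced are hypothetical helper names; the only assumption about ntpq -c rv is that its output contains a stratum=N field, which is what the grep above relies on too.

```shell
#!/bin/sh
# stratum_of: read ntpq -c rv output on stdin and print the stratum number.
stratum_of() {
    sed -n 's/.*stratum=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# is_synced STRATUM: succeed when STRATUM is a value below 16.
is_synced() {
    [ -n "$1" ] && [ "$1" -lt 16 ]
}
```

Usage would look like: s=$(ntpq -c rv | stratum_of); is_synced "$s" || echo "not synchronized yet (stratum $s)".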

Swap File

  • change your paths accordingly
  • sudo dd if=/dev/zero of=/media/swap.img bs=1024 count=3M
  • sudo mkswap /media/swap.img
  • sudo swapon /media/swap.img
  • sudo nano /etc/fstab and add the following line
  • /media/swap.img swap swap sw 0 0 
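To see how big a swap file a given dd invocation produces, the arithmetic can be sketched as below. I'm assuming 1024-byte blocks here, and that dd's 3M count suffix means 3 x 1024 x 1024 blocks. The helper names are hypothetical; swap_total_kb simply reads what the Linux kernel reports in /proc/meminfo.

```shell
#!/bin/sh
# dd_bytes BS COUNT: bytes dd writes for a block size of BS bytes and
# COUNT blocks.
dd_bytes() {
    echo $(( $1 * $2 ))
}

# to_gib BYTES: whole GiB contained in BYTES (integer division).
to_gib() {
    echo $(( $1 / 1073741824 ))
}

# swap_total_kb: total active swap in kB, as reported by the kernel.
swap_total_kb() {
    awk '/^SwapTotal:/ { print $2 }' /proc/meminfo
}
```

With those assumptions, to_gib "$(dd_bytes 1024 3145728)" prints 3, so the file is 3 GiB. After swapon, swap_total_kb should grow by roughly that amount.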

Other OS Tasks

  • sudo apt-get install pstack mcelog sysstat (various support and debugging tools)

Reboot

At this point I would do a sudo shutdown -r now.  I know that technically this isn't required but it seems to be the only way I could make some of these settings "stick".  Probably just me doing something incorrectly.  

Other Installer Notes

  • the installer will create a dbadmin Linux user and verticadb group.  These names can be changed but then YOU must ensure you set things like the home directories correctly.  With the defaults, /home/dbadmin will be created with the necessary chown/chmods for you.  
  • dbadmin will own the db catalog and data files on disk.  
  • dbadmin is configured for passwordless SSH communication between nodes.  
  • The binaries that Vertica installs will be located at /opt/vertica.  
  • The installer must be run with su or sudo.  
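Once the installer has run, the defaults described above can be spot-checked with a sketch like this one. user_exists, group_exists and verify_install are hypothetical helper names; the user, group and paths are the installer defaults listed above, so adjust them if you changed the names.

```shell
#!/bin/sh
# user_exists NAME: succeed when a local or directory user NAME exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# group_exists NAME: succeed when group NAME exists.
group_exists() {
    getent group "$1" >/dev/null 2>&1
}

# verify_install: report any installer default that is missing.
verify_install() {
    status=0
    user_exists dbadmin    || { echo "user dbadmin not found"; status=1; }
    group_exists verticadb || { echo "group verticadb not found"; status=1; }
    [ -d /home/dbadmin ]   || { echo "/home/dbadmin not found"; status=1; }
    [ -d /opt/vertica ]    || { echo "/opt/vertica not found"; status=1; }
    return "$status"
}
```

Running verify_install before the install should list everything as missing; running it afterwards should print nothing.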

Summary

We'll cover the actual installation of your first Vertica node in the next blog post.  Installing to the first node requires some extra steps that are not needed for subsequent nodes.  However, the steps outlined in this blog post must be done by you PRIOR to installing Vertica on ANY node.  


You have just read "Vertica Installation Prereqs" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.  

HP Vertica

I've been using HP Vertica since before HP bought the company.  I first wrote about Vertica in 2010 when I wrote about column-oriented databases.  I'm again using Vertica in my daily responsibilities and I decided to try to get my Vertica certification.  I'm not a Vertica expert but I find that taking certification exams also helps you understand where your knowledge of a product falls short.  For instance, I passed my SNIA (SAN) certification on the first try with a perfect score.  However, this was just dumb luck.  I guessed at every question on NAS and tape storage, and the right answers were fairly obvious.  I'm a data professional, not a full-time SAN guy, so I don't have much need for tape and NAS and I really didn't care to learn more about those topics.  But it was interesting learning what SNIA feels is important in the storage world.  

In the process of getting my Vertica certification I thought it would be wise to put together a blog series on Vertica for anyone else that wants to learn this technology rapidly in a hands-on fashion.  In these blog posts I'll cover what Vertica is, how to install it, we'll migrate AdventureWorks to it, and we'll do lots of fun labs along the way.  The posts will be geared toward those data folks who are familiar with MS SQL Server.  I specifically approach learning Vertica by calling out its differences with SQL Server. 

You'll find this is a lot of fun.  Vertica is pretty cool.  

Briefly, what is HP Vertica?

It is a column-oriented, compressed, relational DBMS.  Many people consider this a NoSQL solution, but it does use a dialect of SQL for its manipulations.  It is clustered across grids and nodes like many distributed NoSQL solutions, with built-in data redundancy (called k-safety), which means it has the typical marketing gimmick of "requiring no DBAs".  I can assure you that it performs QUITE NICELY for read-intensive workloads.  There is also an entire analytics platform that comes with Vertica...roughly equivalent to SSAS.  

What makes Vertica unique is that it persists data grouped by column, rather than by row as traditional row-oriented RDBMS offerings do.  If you think of a table as a spreadsheet then retrieving a single row is an ISAM-type of operation.  This works very well for OLTP applications.  But in some reporting applications it is possible that you care more about an entire column than about the individual rows.  If your query is something like "give me the total sales for this year" then querying a columnstore will result in far less IO and will run radically faster.  

Even the amount of disk needed to store the data will be less.  Columnstores compress much better because like data types are grouped together.  Basically, there are more rows than columns and each row will need "offset" information to store the different datatypes together.  You'll need fewer offset markers if you organize your storage by column.  TANSTAAFL (there ain't no such thing as a free lunch), as economists say...the cost is that updates to existing rows require one write per column in the table.  So a columnstore is probably not best for your OLTP tables.  
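To make the IO argument concrete, here is a toy sketch that stores the same three made-up sales rows two ways: as one row-oriented CSV file and as one file per column. This is purely an illustration with invented data and file names; it says nothing about how Vertica actually lays out or compresses data on disk.

```shell
#!/bin/sh
# Build the row-oriented file in a scratch directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/rows.csv" <<'EOF'
1,2014-01-05,widget,100
2,2014-02-11,gadget,250
3,2014-03-20,widget,75
EOF

# "Column store": keep the amount column in its own file.
cut -d, -f4 "$demo_dir/rows.csv" > "$demo_dir/amount.col"

# Total sales from each layout. The row layout must read every field of
# every row; the column layout reads only the amounts.
row_total=$(awk -F, '{ s += $4 } END { print s }' "$demo_dir/rows.csv")
col_total=$(awk '{ s += $1 } END { print s }' "$demo_dir/amount.col")

# Bytes that had to be read to answer the query in each layout.
row_bytes=$(wc -c < "$demo_dir/rows.csv" | tr -d ' ')
col_bytes=$(wc -c < "$demo_dir/amount.col" | tr -d ' ')

echo "row layout:    total=$row_total, bytes read=$row_bytes"
echo "column layout: total=$col_total, bytes read=$col_bytes"
rm -rf "$demo_dir"
```

Both layouts give the same total, but the column file is a fraction of the size of the row file, which is the whole point for "sum one column" style queries. The flip side shows up too: updating one sale would mean touching one file per column.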

That's a quick overview and is not all-inclusive.  I'll cover lots more in the next few posts.  

What's it cost?

The Community Edition is free for up to 1TB of data.  And if you consider that compressed columnstores generally take up a fraction of the space of a traditional row-oriented DBMS...that's actually a lot of data.  

HP Vertica Documentation

Vertica documentation in HTML and PDF format

Certification Process

Certification Exam Prep Guide

Getting Started

You need to register with HP to be able to download the Community Edition.  You can do this at my.vertica.com.  After you register you'll immediately be able to download the CE (or the drivers, Management Console, VMs, AWS documentation, etc).  

In the next post I'll cover how to install Vertica on Ubuntu.  You can of course download a VM instead but I believe the best way to learn a product is to start by actually installing the bits.  

