In [[creating a Vertica database]] I covered how to install the VMart sample database (similar to AdventureWorks in SQL Server) and populate it with sample data. In this post we'll take a deeper dive into VMart, perform basic queries using vsql (similar to sqlcmd), and examine how data is physically persisted to storage media. We haven't covered installing Management Console yet, so we'll explore VMart using good ol' SQL commands. We'll cover GUI tools in a future post.
vsql -h <IP address> -d VMart -U dbadmin
Once connected, we can run a series of commands to explore the VMart schema. SQL Server has the sys catalog views (and ANSI-compliant RDBMSs have the INFORMATION_SCHEMA views). The equivalent in Vertica is the set of views in the V_CATALOG schema.
|Vertica Command||Purpose/SQL Server equivalent|
|select * from schemata where is_system_schema = 'f'||Similar to sys.schemas. This will give you the non-system schemas. Note that booleans are f and t in Vertica.|
|select * from all_tables where table_type = 'SYSTEM TABLE' order by table_name LIMIT 10;||Shows you the first 10 system tables. Note that string comparisons in Vertica are case-sensitive. Using 'system table' will return 0 rows.|
|select table_name from all_tables where schema_name = 'store'||Show me all tables in the store schema|
|select table_name from all_tables where schema_name = 'public'||Show me all tables in the public schema|
|select table_name from all_tables where schema_name = 'online_sales'||Show me all tables in the online_sales schema|
|select * from columns where table_name = 'online_sales_fact'||Similar to sys.columns|
|select constraint_name from foreign_keys where reference_table_name = 'date_dimension'||Similar to sys.foreign_keys. Lists all foreign keys referencing the date_dimension table|
|select constraint_name, constraint_type from table_constraints||Similar to sys.check_constraints and sys.key_constraints. Lists primary key, foreign key, and check constraints|
This should give you a feel for how to explore the catalog and your database.
Physical Data Persistence
Every DBMS persists its data to storage media in a different fashion. In SQL Server you specify the name and location of all data files. Same with Oracle. In MySQL you specify a directory and MySQL does whatever it wants with that folder. Vertica works similarly to MySQL. I noted in [[creating a Vertica database]] that you must specify a single data directory for a database's data. With VMart loaded we can now examine the directory structure. What you'll notice is that Vertica creates a lot of folders, and within each folder is a series of subfolders and files.
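If you'd rather get a summary than scroll through a long directory listing, a few lines of Python will do it. This is only a sketch: the function name `summarize_data_dir` is my own, and the script assumes nothing about Vertica's file layout. It just counts folders, files, and bytes under whatever path you hand it, grouping files by extension.

```python
import os
from collections import Counter


def summarize_data_dir(root):
    """Walk a directory tree and return (dir_count, file_count, total_bytes, by_extension)."""
    dir_count = 0
    file_count = 0
    total_bytes = 0
    by_extension = Counter()
    for current, subdirs, files in os.walk(root):
        dir_count += len(subdirs)
        for name in files:
            path = os.path.join(current, name)
            if os.path.islink(path):
                continue  # skip symlinks so sizes are not counted twice
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # the file disappeared while we were walking a live database
            file_count += 1
            total_bytes += size
            by_extension[os.path.splitext(name)[1] or "(none)"] += 1
    return dir_count, file_count, total_bytes, by_extension
```

Call `summarize_data_dir` with the data directory you chose when you created the database and print the four values it returns. Run it as a user that can read that directory, such as dbadmin.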
I couldn't find anything authoritative regarding the make-up of these structures. But I do know 2 things for sure:
- Once a data file is created it is never updated or modified. The files, and the underlying data, are encoded and compressed (depending on your design settings) with no "air" in the files for new data to be inserted between existing data (i.e., there is no concept of FILLFACTOR). When rows are DELETEd or UPDATEd (an UPDATE is a DELETE followed by an INSERT), a "delete vector" file is created that marks which rows should be removed from any result sets. After a period of time the delete vectors are removed and the surviving data from the original files is written to new files without the deleted rows.
- Since files are never modified, backups can be performed using any standard file backup utility, including rcp (remote copy). There is no special buffer pool in Vertica; data files are buffered using standard Unix mechanisms.
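To make the first point above more concrete, here is a toy model of write-once files plus delete vectors. It is a sketch of the idea only, not Vertica's actual file format or code: the `ToyStore` class and its method names are invented for illustration, and the "files" are just in-memory tuples.

```python
class ToyStore:
    """Toy model of immutable data files with delete vectors (not Vertica's real format)."""

    def __init__(self):
        self.files = []           # each "file" is an immutable tuple of rows
        self.delete_vectors = []  # one set of deleted row positions per file

    def insert(self, rows):
        # New data always lands in a brand-new file; existing files are untouched.
        self.files.append(tuple(rows))
        self.delete_vectors.append(set())

    def delete(self, predicate):
        # Mark matching rows in the delete vectors instead of rewriting files.
        for file_rows, deleted in zip(self.files, self.delete_vectors):
            for pos, row in enumerate(file_rows):
                if pos not in deleted and predicate(row):
                    deleted.add(pos)

    def update(self, predicate, change):
        # UPDATE = DELETE of the old rows followed by INSERT of the new versions.
        new_rows = [change(row) for row in self.scan() if predicate(row)]
        self.delete(predicate)
        if new_rows:
            self.insert(new_rows)

    def scan(self):
        # Readers skip any row position listed in a delete vector.
        for file_rows, deleted in zip(self.files, self.delete_vectors):
            for pos, row in enumerate(file_rows):
                if pos not in deleted:
                    yield row

    def merge(self):
        # Write surviving rows to one new file and drop the old files and delete vectors.
        survivors = tuple(self.scan())
        self.files = [survivors] if survivors else []
        self.delete_vectors = [set() for _ in self.files]
```

Insert three rows, delete one, and `files[0]` still holds all three; only the delete vector changes. After `merge()` there is a single new file containing just the surviving rows, which mirrors the cleanup that happens "after a period of time."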