DaveWentzel.com            All Things Data

Data Virtualization

For some reason "data virtualization" is one of the new hot buzzwords in my industry.  Data virtualization is really just combining a bunch of data sources (databases, unstructured data, DWs, weblogs, whatever) into a single virtual data layer that applications can query.  Haven't we been doing this for years?  I remember when linked servers came out for SQL Server.  I thought they were the coolest thing at the time.  I could now connect to Oracle from my SQL Server and manipulate the data in the more familiar T-SQL, in real time, without the need for an ETL process.  Then my applications could get at the Oracle data over the same connection they used for SQL Server.  It didn't take me long to create linked servers to text files and even Exchange.  That was, like, 12 years ago.  
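To make the linked-server idea concrete, here is a minimal sketch in Python.  It is not SQL Server or Oracle: it assumes SQLite's `ATTACH DATABASE` as a stand-in for a linked server, and the table names, sample rows, and the functions `build_virtual_layer` and `hits_per_customer` are all made up for illustration.  The point is only that one connection can query two separate databases in a single statement, with no copy step in between.

```python
import sqlite3


def build_virtual_layer():
    """Open one connection that can see two separate databases.

    "main" stands in for the local SQL Server database and "weblogs"
    stands in for the remote source (hypothetical names and data).
    """
    conn = sqlite3.connect(":memory:")
    # Attach the second database first; SQLite refuses ATTACH inside
    # an open transaction.
    conn.execute("ATTACH DATABASE ':memory:' AS weblogs")

    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE weblogs.hits (customer_id INTEGER, page TEXT)")

    # Sample rows so the sketch is self-contained.
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO weblogs.hits VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )
    conn.commit()
    return conn


def hits_per_customer(conn):
    """Join across both databases over the single connection."""
    sql = """
        SELECT c.name, COUNT(h.page) AS hits
        FROM main.customers AS c
        LEFT JOIN weblogs.hits AS h ON h.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    connection = build_virtual_layer()
    for name, hits in hits_per_customer(connection):
        print(name, hits)
```

The application only ever sees one connection and one query language, which is the whole appeal.  A real setup would point at existing databases rather than creating sample tables.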

What is Old is New Again

Why is data virtualization a big deal now?  Simple: we are storing so much useful information that someone, somewhere, wants to report off of, yet we can't constantly integrate that data into our data marts/DWs.  And the report people want the data in real time.  I see most of the data virtualization (DV) efforts centered around data generated from web-based apps.  Again, I don't understand why this is new.  When the dot-com era was in full swing we all heard about analyzing "click-through data".  It's really not much different.  

It's also a big deal because all of this data, naturally wanting to be replicated everywhere, costs $$$ in storage.  This is also why "data deduplication" is a hot buzzword in SAN storage.  Data virtualization avoids much of that cost because the data is queried where it already lives instead of being copied yet again.  

What's It Good For

If you have a project that has dynamic requirements, needs to access lots of different data sources, needs real-time reporting, needs "agile" reporting, needs a dashboard, or you hear the terms "Operational BI" or "Data Integration", you should think about data virtualization.  Do you need a special appliance or application for DV?  Maybe, maybe not.  It's really just a mindset.  Once you get into the DV mindset, if you still determine you need better performance, or the ETL processing is a bottleneck to real-time reporting, then a tool is worth researching.  But these tools aren't cheap.  
