DaveWentzel.com            All Things Data

Nested Views and Performance...or...LEFT vs RIGHT JOINs

I have nothing against database views.  They are a great way to display common result sets that may require JOINs and logic that we don’t wish to repeat everywhere.  When I see developers nesting views 4 and 5 deep I start to cringe for possible performance reasons.  Usually developers who do “over nest” tend also to create their views using Query Designer, which compounds the performance problem.  QD likes to take your graphical queries and turn LEFT JOINs into RIGHT JOINs and FULL JOINs if you aren’t careful.  I have no idea why, it may even be user error.  When JOINs change you risk suboptimal query plans and performance problems, not to mention invalid results.  SQL Server does not “materialize” each view and then send the results along the pipeline to the next step.  Instead it tries to reduce the nested views into one simple SELECT and often it gets confused when it comes across mixed LEFT and RIGHT JOINs and has to fall back to pipelining.  In fact, RIGHT JOINs are never needed and should never be used in views.  A RIGHT JOIN is just a LEFT JOIN in reverse.  So, all views (and queries in general) should have:

  1. the “driving” table as the “FROM” table
  2. followed by any INNER JOINs that are used as row restrictors (ie, I’m filtering output rows based on the INNER JOIN's ON clause)
  3. followed by any LEFT JOINs for columns not used as restrictors.

Always prefer a LEFT JOIN to an INNER JOIN unless the LEFT JOIN will return unwanted rows.   Why?  A LEFT JOIN is “optional”, an INNER JOIN is mandatory.  This means that if a column in the view is not referenced in the projection (the SELECT clause) or in the WHERE predicate, it is not considered *if* it is sourced from a LEFT JOINed table.  *But* if an INNER JOIN is present then at least the ON clause must always be evaluated.  If we INNER JOIN 10 tables together where a LEFT JOIN works, we end up with needlessly long runtimes.

If you follow this paradigm you should get the best possible performance, even if you nest your views excessively.  Lets look at a real world example of nesting gone wrong.  I haven't made this into a repro script for you to try, but follow along with the logic.  I have a stored procedure that takes 5 mins to execute, during that time my disk thrashes and I occasionally get out of memory messages.

The result set is a SELECT off of a view that nests other views and returns 196 rows.  However, sql server doesn’t actually *believe* it will return 196 rows, it thinks it should return 1.1 Million rows, which causes the high runtimes and memory issues.  Take a look at the execution plan for a basic SELECT * from that view.  Notice it uses 20GB of memory!

 

 

 

 

 

Yikes.  So what is “Compute Scalar” doing?  Generally it is just concatenating text or doing basic arithmetic.  These operations should *never* be a problem.  Why is it here?  MXW.Report_PR_Timecard_Check is actually calling MXW.Report_PR_Employee (nested view).  Let’s look at that graphical plan for that view.

There are 50 employees in my database.  The screenshot on the right shows the bar going into Compute Scalar with 50 rows, and coming out with 450 (left screenshot).  Something’s not right there.  Well, MXW.Report_PR_Employee calls another nested view…MXW.Report_Company.  Let’s look there next.

This is a small database and rest assured that I do *not* have 450 companies in my database.  I have1…note the second screenshot with the “actual” rowcount input of 1 row, and it can’t compute an actual output, so it estimates it at 450 rows.  Again, Compute Scalar is still confused.  If we look even further into MXW.Report_Company it calls MXW.AddressView which is where the issue lies…too many LEFT and RIGHT mixed JOINs.

The problem is the mixing of RIGHT/LEFT/INNER JOINs that the query designer creates in all of the various nested views.  The rule of thumb is the table in the FROM clause should be the “driving table” (the transactional table if we are querying transactional data or the largest table if we are just returning entity and supporting data).  Any INNER JOINs should come next.  INNER JOINs are needed when the ON condition will further restrict the rowcounts.  Lastly, the LEFT JOINs for supporting cols.  Never use RIGHT JOINs, they are never needed.

Assume you SELECT one column from a large view that INNER JOINs 10 tables together, but could be rewritten as LEFT JOINs without modifying the rowcount of the result set.  In this case all 10 tables will be JOINed to ensure the INNER JOINs won’t restrict the outputs, even though we know it won’t and even though we don’t require data from the other 9 tables.   Change those INNERs to LEFTs and only the single col required is considered.

Now imagine you have nested 6 views together, but only need 1 col in the output…you are traversing a lot of tables needlessly.  If we LEFT JOIN as much as possible we can nest views and reuse view logic without performance fears. 

So, why can’t we use RIGHT JOINs?  Simple, when nesting views the individual views are not “materialized” and pipelined to the next calling view.  Instead, SQL Server will optimize the views and reduce the query plan to the minimum amount of tables required to get at the data logically.  It takes all of the nested views and makes one big SELECT.  Again, this is why we prefer LEFT JOINs.  But, sql server can’t do this always and reliably when it encounters a LEFT and a RIGHT together.  Now it falls back to pipelining results.  If you’ve ever looked at the JOINs created by the Query Designer you know this is a huge problem, not to mention the code is terrible to read.  When you INNER JOIN, then LEFT JOIN your tables together the query tree is easily reduceable by SQL Server.

Are you skeptical?  Let’s look at a different example.  This query is SELECTing a single col from a view that is nesting many views together.  The query plan is horrendous.  Trust me that the extra JOINs should not be restrictors (INNER JOINs) so switching this to LEFT JOINs would help with performance. 

 

 

Here is an example of a view that I’ve already optimized…MXW.Report_SH_Logical.  The query is very simple but note that it is pulling only the minimum amount of data because the LEFTs and INNERs are structured correctly.

 

Summary

Be careful with view nesting.  Be especially careful if your views mix LEFT, RIGHT, and INNER JOINs together.  Consider specifying all INNER JOINs first, and only when the ON clause must be used as a restrictor.  Then specify only LEFT JOINs.  Following these rules will make your nested views must more performant and easier to read. 

Part 2

2 comments

Comment: 
Thank you, I have recently been searching for information about this topic for ages and yours is the best I have discovered so far.

Comment: 
Hey, that's a clever way of tinhnkig about it.

Add new comment