To us SQL Server people this code makes perfect sense...update some values in the first table with rows from a second table. It surprises many SQL Server people when exposed to Oracle for the first time that this is not valid ANSI SQL syntax. I don't want to make this an Oracle post, but there are solutions to the "update one table with values from a second" pattern that work in Oracle. They will usually perform just fine, my only complaint would be that the Oracle syntax is not nearly as easy to convert from a SELECT to an UPDATE so I can "test before executing". What I mean is...in my UPDATE statement above I could change the first line to "SELECT table2.SomeOtherCol, tab.ID" and execute it to see exactly what rows I will be changing.
That is not a jab against Oracle. An Oracle purist will counter that the UPDATE...FROM TSQL syntax is frought with danger. I agree, and that is what this post is about. Let's look at some code:
Requirement: Set #Foo.colA equal to *any value* from #Bar.colB, if there is a value available for the given key. A contrived example to be sure.
It should be obvious that #Foo.ID = 1 will be set to 'BarRow1' since there is only one lookup row. It should also be obvious that #Foo.ID rows 3 and 4 will be set to NULL since there are no corresponding keys in #Bar for those FooIDs.
What is less clear is what will happen to #Foo.ID = 2 ... will it be set to 'BarRow2', 'BarRow3' or 'BarRow4'? There is ambiguity here. In Oracle-land, the equivalent query (written of course without a LEFT JOIN) would throw "ORA-01427: single row subquery returns more than one row." Exactly. And really, isn't that what SQL Server should be throwing? Many people have petitioned Microsoft to deprecate UPDATE...FROM (and DELETE...FROM, which has the same issues) syntax for this reason (as well as some other reasons). It's become something akin to a religious war. With people on both sides of the debate.
I personally like the UPDATE...FROM syntax, mostly because it's simple to convert to SELECT and test. It's the UPDATE...FROM...LEFT JOIN syntax that gets tricky. There are two reasons we might use LEFT JOIN in an UPDATE statement:
In that case, which is my example above, what will be the final value for colA for ID 2? Here you go:
Before you jump to the conclusion that "I'll always get the first matching row back", remember that there is no concept of row ordering in an RDBMS unless there is a specific ORDER BY clause applied. Never assume order. Don't believe me? Let's modify our code to add a CLUSTERED INDEX strategically:
Oops. Note that colA's value has changed due to the new ordering inferred by the clustered index!
So far, so good. I still don't see the big deal with UPDATE...FROM syntax *if* you understand the issues above. Ah, but that's the rub. Too often I see code like this:
What is this code trying to do? Very simply, it is bringing back the FIRST row from Bar that has the given FooID, using a correlated subquery. I'm not a mind reader, but the developer was probably thinking, "I have a situation where I am bringing back possibly more than one row for the given key, I better restrict it since I don't really understand what SQL Server will do. I can accomplish that by using a TOP clause."
I have problems with this approach:
If the requirement were changed to return "the most recent entry from #Bar" then the TOP solution might be acceptable over smaller result sets, assuming of course the addition of an ORDER BY clause.
Dave Wentzel CONTENT
sql server data architecture oracle