DaveWentzel.com            All Things Data

August 2012

Java JOINs, or, How Many Lines of Code Are Too Many For a Stored Procedure

Did you ever work at a place that had a policy that if a given Java/C#/SQL procedure or function was over X lines of code then it needed to be separated into multiple, smaller routines?  If not, then congratulations.  In general, if a routine can be refactored into smaller routines with more finely-grained logic and purpose, then that's a good thing.  But what developers fail to remember is that this needs to be tempered with any performance concerns.  And, it's really easy to oversimplify.  

Make things as simple as possible, but not simpler.

--Albert Einstein 

Let's say you have a bunch of tables with the standard "auditing" column pattern (CreateUserId, CreateDate, LastUpdateUserId, LastUpdateDate) that reference a dbo.User table's UserId column.  In most of your application's screens you need to show the name of the User who created and last updated the row.  Let's say you have hundreds of screens with hundreds of underlying queries that need these names.  

Most developers would simply write their hundreds of stored procs using a pattern similar to this:

SELECT tbl.ColA, tbl.ColB,...,CreateUser.Name, UpdateUser.Name
FROM tbl
JOIN dbo.tblUser As
CreateUserON tbl.CreateUserId = CreateUser.UserId
JOIN dbo.tblUser As
UpdateUserON tbl.LastUpdateUserId = UpdateUser.UserId

This is the correct pattern to use.  

However, many developers begin to over-think this and create unnecessary issues for themselves.  For instance, many developers quickly realize the double JOINs to User are copy/pasted to hundreds of procedures.  Wouldn't it be better to NOT have those JOINs copy/pasted everywhere? Wouldn't it be better to have one function/stored proc that can get this data for us?  The solution they come up with is to have a standard GetUserInfo procedure that looks something like this:

      SELECT UserId, Name
      FROM dbo.tblUser
      WHERE UserId = @UserId

The new pattern is that only the CreateUserId and LastUpdateUserId from tbl are returned in every procedure, not the actual Name column.  Then a new Java function (or whatever language is used, I'll pick on Java) loops through the list of returned UserIds and calls GetUserInfo for each distinct UserId found.  Java guys love the elegance and reduced code footprint of this solution.   The problem of course is performance.  If 100 distinct UserIds are returned for a given function, then 100 distinct calls are made to GetUserInfo.  

Therein lies the rub.  Elegance has cost you performance.  

Java programmers always think they are one better than everyone else.


I call this anti-pattern "Java Joins", but it's equally prevalent in .NET code, C code, etc, that I've seen.  The pattern is to remove redundant code from your application.  The problem is that the JOINs are simply moved to the application tier and away from the most appropriate place to put this logic, which is in the data tier.  

More of my "fun quotes" on this topic...

Business logic in SQL is bad, but data logic in Java is good?  


Java works with everything...as long as it's Java 


Less code is better, no matter how hard it is to support!!  


AMW Donuts

About 5 months ago my wife and I started a business in Phoenixville making fried cake donuts.  It's called AMW Donuts.  AMW officially stands for "America's Most Wanted"...unofficially it is my wife's initials.  So far it the business is going well.  We had a favorable writeup in the local newspaper a few weeks ago that I wanted to share.  


SQL Server: Get the Actual SQL from a Prepared Execution Call

JDBC applications (and some ORMs) are notorious for sending sp_prepexec, followed by sp_execute, calls to SQL Server, instead of sending the actual SQL command.  Generally this is fine until you need to see the actual command with parameters that is being sent from your client to SQL Server.  This is common when troubleshooting.  I have yet to find any DMVs or logging mechanisms on the SQL Server side that can do this. 


The first call to your SQL Server may look like this in Profiler: