DaveWentzel.com            All Things Data


How Big is a BIGINT?

8 bytes.  Who cares?

Like many shops, we use a BIGINT datatype as the sequence for the Ids in our tables.  We generate lots and lots of these every second.  There's now a requirement where we may have to generate hundreds of thousands every few seconds.  People started sweating.  "Are we sure that we won't overflow a BIGINT after a few years if we are creating this many identifiers so quickly?"  

Let's do the math.  

How many Identifiers can I create EVERY SECOND, before I will run out of Identifiers in 100 years?  

About 2.9 BILLION identifiers can be created per second for the next 100 years before we overflow a BIGINT.  (A signed 64-bit integer tops out at 2^63 - 1, or about 9.22 quintillion; 100 years is only about 3.16 billion seconds.)  
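The back-of-envelope arithmetic, sketched in Python (assuming 365.25-day years):

```python
MAX_BIGINT = 2**63 - 1                               # 9,223,372,036,854,775,807
SECONDS_IN_100_YEARS = 100 * 365.25 * 24 * 60 * 60   # ~3.16 billion seconds

rate = MAX_BIGINT / SECONDS_IN_100_YEARS
print(f"{rate:,.0f} identifiers per second")         # roughly 2.9 billion/sec
```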

And that's assuming we ONLY use the positive side of ZERO.  We didn't even count the negative identifiers we could create, which effectively doubles the available range.  

In my mind, BIGINT really is a BIG datatype.  

Object Relational Mappers Series - So You Want to Use an ORM Anyway

This is the last post in my series on ORMs.  Even after all of the arguments against an ORM, everyone wants to use them.  It seems like there are always two reasons for this: 1) the lure of keeping developers from learning SQL is just too great, and 2) the belief that the issues with ORM tools won't happen to me.  In every company I've been involved with, when an ORM was proposed, it was eventually implemented regardless of my protests.

Object Relational Mappers Series - The SELECT * Problem

DBAs have been bitching for years that we should never use SELECT * in our queries (or INSERT INTO statements without a column list).  ORMs, usually by default, violate this principle.  
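To illustrate why (a sketch with hypothetical table names), the danger isn't just wasted I/O; statements without column lists silently depend on column order and count:

```sql
-- Fragile: breaks, or silently misaligns data, the moment a column
-- is added to either table.
INSERT INTO dbo.CustomerArchive
SELECT * FROM dbo.Customer;

-- Robust: explicit column lists survive schema changes.
INSERT INTO dbo.CustomerArchive (Id, Name, CreatedDate)
SELECT Id, Name, CreatedDate FROM dbo.Customer;
```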

More on the Halloween Problem

I wrote a few weeks back about Eager Spools in Query Plans and the Halloween Problem.  Three days after I wrote that, Paul White put up a blog post about an ENTIRE SERIES he did on sqlperformance.com.  He is much more thorough and authoritative than I could ever hope to be.  In a nutshell, the Halloween Problem occurs when a statement reads and writes the keys of a common index.  There are other causes as well.  

I just wanted to pull out a few gems that I either learned or found very important from Paul's work.  I encourage anyone who works on tweaking performance of TSQL to spend some time reading Paul's findings on the Halloween problem.  

  • The Eager Spool essentially reads every row from the child operator and stores that information in a temporary structure before it proceeds to the next step.  Under memory pressure the spool spills to tempdb on disk.  
  • One alternative is an index hint so we avoid using the index that requires Halloween protection, i.e., the index whose keys are unstable.  
  • A query plan requiring Halloween protection will take more locks, and hold them longer, than an alternative plan.  
  • If an UPDATE statement modifies one of the keys of a composite index you may consider making that column an INCLUDED column instead.  
  • Halloween Protection is required for INSERT statements where the target table is also referenced in the SELECT statement.  
  • HP is required for DELETE statements with self-join relationships.  
  • If your performance issue is an Eager Spool on an INSERT...WHERE NOT EXISTS (<in table already>) query (Paul calls this a "hole-filling query") you can try a MERGE statement instead.  Here are the rules:
    • WHEN NOT MATCHED BY TARGET clause must EXACTLY match the ON clause in the USING clause.  
    • The target must have a unique key.  
  • You may see an Eager Spool in a query that uses a scalar function, even if that scalar function does not access any tables.  This happens because SQL Server must assume that any scalar function not declared with SCHEMABINDING may read from one of your tables, so SQL Server must protect against the HP.  The solution is to add SCHEMABINDING.  
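The hole-filling rewrite can be sketched like this (hypothetical table and column names; note how the ON clause carries exactly the key test that the WHERE NOT EXISTS used to, and dbo.Target is assumed to have a unique key on Id):

```sql
-- Instead of: INSERT dbo.Target ... WHERE NOT EXISTS (SELECT ... )
MERGE dbo.Target AS t
USING (SELECT @Id AS Id, @Value AS Value) AS s
    ON t.Id = s.Id                  -- must exactly match the NOT MATCHED test
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Value) VALUES (s.Id, s.Value);
```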
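The scalar-function fix from the last bullet, sketched with a hypothetical function:

```sql
CREATE FUNCTION dbo.AddTax (@Amount MONEY)
RETURNS MONEY
WITH SCHEMABINDING   -- guarantees to the optimizer that the function
AS                   -- cannot read your tables, so no Eager Spool is
BEGIN                -- needed for Halloween Protection
    RETURN @Amount * 1.08;
END
```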

CREATE or REPLACE pattern for Transact SQL

I just saw a blog post by someone discussing the pros and cons of DROP/CREATE vs ALTER for changing a stored procedure.  I was not able to comment on the blog post (seems like an error with their CMS) but I wanted to mention some issues with it and some better solutions.  

I generally use DROP/CREATE most often in my work, but I'm trying to change that.  

  • If you have Zero Downtime requirements you risk spurious errors using DROP/CREATE.  In the window between dropping the proc, recreating it, and issuing the GRANTs, your online users could see errors.  
  • The author mentioned that using ALTER requires dynamic SQL.  That's not true; below I show you the pattern I use.  Dynamic SQL is difficult for many developers to write and read, and doubling up quotes tends to be buggy.  
  • Oracle has the "CREATE or REPLACE PROCEDURE" syntax to avoid all of this.  I wish MS would give us this.  We would not need to do existence checking in metadata tables before creating or altering a routine.  

In any case, this is the pattern I use that is very close to Oracle's CREATE or REPLACE PROCEDURE syntax.  
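A sketch of that pattern (hypothetical procedure name): create a one-line stub only if the procedure doesn't already exist, then always ALTER it with the real body.  Permissions survive because the object is never dropped, and the only dynamic SQL is the trivial stub.  (SQL Server 2016 SP1 later added CREATE OR ALTER, which makes this workaround unnecessary on newer versions.)

```sql
-- Create a stub only on first deployment; GRANTs issued once stay valid.
IF OBJECT_ID('dbo.GetCustomer', 'P') IS NULL
    EXEC ('CREATE PROCEDURE dbo.GetCustomer AS RETURN 0;');
GO
-- The real body is always deployed via ALTER, in plain (non-dynamic) SQL.
ALTER PROCEDURE dbo.GetCustomer
    @Id BIGINT
AS
BEGIN
    SELECT Id, Name FROM dbo.Customer WHERE Id = @Id;
END
GO
```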

Object Relational Mappers Series - The N+1 SELECTs Problem

This is the next post on ORMs.  Suppose you have a collection of Category objects, which are really just rows in a Category table.  Each Category has a collection of Products.  It's not uncommon to want a listing of Categories with their Products.  In SQL we would have code that looks something like this:  

SELECT c.Name, p.Name FROM Category c JOIN Product p ON c.Id = p.CategoryId

This will run fairly fast.  An ORM, however, will likely do something like this:  
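A representative sketch of the N+1 pattern (hypothetical columns): one query for the parent rows, then one additional round trip per parent:

```sql
SELECT Id, Name FROM Category;               -- 1 query for the N categories
-- ...then, for each category returned, a separate round trip:
SELECT * FROM Product WHERE CategoryId = 1;
SELECT * FROM Product WHERE CategoryId = 2;
-- ...and so on, N more queries in total
```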

