DaveWentzel.com            All Things Data

December 2010

SQL (Anti-)Standards

Lots of data professionals like to write SQL development standards.  That's fine if the point is to note some best practices, but strict adherence to these supposed "standards" within an organization often doesn't make any sense.  Do a Google search for "SQL Development Standards" and you'll find a myriad of blog posts and sample documents that have been instituted at many organizations.  Here are some "standards" that are commonly seen that I think are just plain ridiculous:


Never use SELECT * 


Anytime I see "never" in a standards document I cringe.  "Never" is a strong word.  If SELECT * was so bad why did the ANSI SQL guys give it to us?  When people say not to use SELECT * it is because they saw specific cases where it backfired, namely when the client was looking at the column list ordinally (instead of by name) and a new column was added between two columns (instead of at the end of the table), meaning the ordinal column list changed.  Really then, the standard should be all new columns go at the end of the table so the ordinal positions never change.  Also, clients should really be referencing cols in the result set by name and not ordinal.  


There are other reasons for not using SELECT * that may be legitimate.  But there are even more reasons for utilizing SELECT *.  I've worked on applications where we allow the customer to add "user-defined fields" to screens.  On the fly we add a column to an "outrigger" table that has a 1:0 or 1 relationship to the parent.  In the outrigger we dynamically add the cols the customer desires.  Some metadata helps the application handle data type and custom constraint requirements, as well as how to handle CRUD.  Obviously if the customer wanted to track the data he probably wants to add it to reports as well.  Our reporting views/procedures always LEFT JOIN'd to the outrigger tables and added "outrigger.*" to every SELECT list after all of our "system-defined" reporting columns.  This was very handy and allowed totally flexibility for our customers.  As long as our report developers never referenced columns ordinally it worked perfectly.  


Table names should indicate or not indicate plurality


Some will advocate an Employees table, some would call it Employee.  Advocates of plurality insist a table is a collection of rows so it should be plural, opponents point to logical reasons why tables should be singular.  This becomes a religious debate.  Who cares?  Pick a method and go with it.  


Object names should/should not have underscores


FirstName or first_name.  Another religious war.  Proponents of the underscore point to ISO 11179 (an established metadata standard) as a naming convention.  Following the standard aids in EAI activities (or that is the claim, in my experience just following the standard is not a panacea).  Opponents say the underscore is difficult to read depending on font and how your IDE highlights text.  Again, who cares?  Pick a standard and go with it.  If you need ISO certification then of course follow the standard.  But following ISO standards for the sake of following a standard seems to me to be a waste of time, especially in the case where your application likely will never participate in EAI activities.  My main complaint is the shear amount of time we waste with the slavish devotion to these standards.  While the rest of us are busily working to meet deadlines some pencil-pusher is coming up with unworkable standards that will not help us generate revenue.  It just isn't logical to me.  


TSQL code should use fixed-width fonts


.sql files, regardless of whether they are SQL Server or Oracle .sql files, are still text files.  The font is not specified in the ANSI/Unicode/UTF text file standards, that's your choice.  Who cares?  


Align SQL keywords, indent with 5 spaces, capitalize SQL keywords, etc etc


Again, who cares?  In smaller development shops it's actually easier to determine quickly who authored code based on individual coding styles.  Why try to change people who are comfortable with a style they have been using for years?  Some people think block indenting is easier to read, others don't.  I would rather have my people be productive writing working code than worrying about polishing the apple.  


I'll admit that I often format other people's code to make it more readable for me if I need to alter it, but I've seen other people do the same thing to my code.  Some people also use code formatters to do this.  My one complaint is a situation where a developer checks out a lengthy stored procedure, reformats it, alters maybe a single line, then checks it back in.  Later, when we are debugging issues with the procedure, it becomes *impossible* to determine what that developer actually altered since potentially every line of code has changed when versions are diff'd.  I always tell developers who love to reformat code to simply check out the originally routine first, then format it and check it back in with a comment indicating it was a formatting change only, then check it out again to perform the actual required change.  


Never Use Cursors, Temp Tables, etc


People swear they have seen cases where cursors are faster than set based approaches.  I've never seen this with SQL Server (but have with Oracle) but I can imagine it can happen.  When the standard says "never use cursors" people come up with pseudo-cursor constructs anyway (WHILE loops being the biggest culprit) because they don't know any better.  These constructs usually perform far worse than a cursor anyway.  There are many times where I've used a cursor to quickly get code out the door, knowing refactoring would need to occur later.  Instead of a "never use cursors" policy, maybe soften it to "Consider seeking help before writing a cursor".  Encourage learning instead of ruling with an iron fist.  


Never use SELECT...INTO...FROM syntax


Really?  People think this is a good rule because of the old 'select into/bulk copy' option that invalidated transaction log backups in SQL 7 and earlier.  The fact is, this is not applicable to temp tables.  Why declare a temp table and INSERT INTO it in 2 steps?  In SQL 2000 and earlier doing this actually caused your procedure to recompile causing performance degradation under concurrency load.  SELECT INTO doesn't do that.  Also, if your underlying data types change you need to find all of your CREATE #whatever statements and fix them.  SQL Server doesn't have the concept of Oracle's %ROWTYPE and %TYPE (which copies the underlying object's data type to the new object).  This is often known as a "STRUCT" data type in other programming languages.  You don't need to worry about data type changes with SELECT INTO syntax since the new table inherits the data type of the underlying.  


Always code 2 part/3 part object names


Using 3 part names is just stupid, it's likely your db name will change at some point.  Dynamic linked servers would be the only case where a 3 part name might make sense, but I've seen SQL Standards that advocate 3 part names.  


Using 2 part names might be OK given old, obsolete SQL Server development habits that have carried over since the "Pre-Schema" days of SQL Server.  In the Oracle world it's not uncommon to have 2 schemas with the same database objects, one called Prod and one called Test.  If I use 2 part naming I can't simply revert to test by changing my default schema or logging in with a different user.  In SQL Server we usually use separate dbs to do this, and the case can be made that performance is a *teensy* bit faster if you schema qualify your objects.  Still, if you only use the dbo schema I'm not sure I would worry about enforcing this standard solely for performance reasons.  It would seem to me your time would be better spent looking at *real* performance problems instead of enforcing this policy.  


All code must be commented in the form...


Some people prefer flower boxes, some like double dashes.  Who cares?  Some standards say that every variable's purpose must be documented, every RETURN code, etc.  Sometimes things are obvious, do they need to be documented?  Some standards insist on certain formats to get documentation-generation plugins to work properly.  In those cases a standard is important.  


And on and on and on


There are many more examples of documented SQL Standards that are pointless, but I think you get my point.  In my opinion the best standards are the briefest that are easiest for everyone to actually want to follow.  Standards that use "NEVER" are probably not good standards.  Allowing developers to have latitude and creativity probably leads to more productivity than strict adherence to "standards" ever could.  


Thanksgiving and the Tragedy of the Commons

I just realized this was sitting in my unpublished bucket...better late than never...  

As we celebrate Thanksgiving here in the US I thought I'd debunk the myth that the shared, communal aspect of Plymouth Colony, as well as the help of the natives, is what pulled the colonists through that first difficult year in the New World.  We all seem to learn this in grade school history.  It's wrong.  

The full text of the Mayflower Compact is long gone, but the journal of William Bradford (the leader of Plymouth Colony) does survive, called Of Plymouth Plantation.  I'll be honest, I've never read the whole thing, the Olde English prose is difficult for my feeble brain to comprehend.  However, this quote is telling:

3. The persons transported and the adventurers shall continue their joynt stock and partnership togeather, the space of 7. years, (excepte some unexpected impedimente doe cause the whole company to agree otherwise,) during which time, all profits and benifits that are gott by trade, traffick, trucking, working, fishing, or any other means of any person or persons, remaine still in the commone stock untill the division.

and

5. That at the end of the 7. years, the capitall and profits, viz. the houses, lands, goods and chatles, be equally devided betwixte the adventurers, and planters; which done, every man shall be free from other of them of any debt or detrimente concerning this adventure.

This is spelled out further, but the gist is the settlers had no private property and no free trade.  Labor was to be organized by the different capacities of the individuals.  If that doesn't sound like Karl Marx in 1875's Critique of the Gotha Program I don't know what does ("From each according to his ability, to each according to his need.").  The Plymouth Colony is what many interventionists and liberals point to in an effort to show that "fair" distribution of property works.  

It doesn't.  Here's the real history...

So that first year the settlers shared their labor and shared their harvest.  And they almost starved.  That's why the natives helped them out.  The natives showed them their native crops and farming techniques.  The settlers shared nothing (maybe disease) with the natives because they had nothing.  This is the Tragedy of the Commons, so named because when property goes unowned or private property rights are ignored there is no incentive for hard work, thrift, and asset creation.  Value and price discovery cannot occur.  Instead, the goal is to take what you can, when you can, from the communal pool of assets, before someone else beats you to it.  So when the crops are ready for harvest why bother saving some of it for next year's seed stock?  That's someone else's problem.  Let's eat everything instead (this is what happens with our oceans, BTW).  Why even work the harvest at all?  Just fake an illness.  You'll still get fed according to the Mayflower Compact.  

For two years the settlers were near starvation.  Bradford and the elder settlers came up with a better solution:

On the other hand the old planters were affraid that their corne, when it was ripe, should be imparted to the newcommers, whose provissions which they brought with them they feared would fall short before the year wente aboute (as indeed it did). They came to the Govr and besought him that as it was before agreed that they should set corne for their perticuler, and accordingly they had taken extraordinary pains ther aboute, that they might freely injoye the same, and they would not have a bitte of the victails now come, but watee till harvest for their owne, and let the new-commers injoye what they had brought; they would have none of it, excepte they could purchase any of it of them by bargaine or exchainge. Their requeste was granted them, for it gave both sides good contente; for the new-commers were as much afraid that the hungrie planters would have eat up the provissions brought, and they should have fallen into the like condition.

Bingo.  The settlers abandoned communal, interventionist, redistributionist ideas and went back to good 'ol private property.  Everyone, the original settlers and the new arrivals too, agreed that this was the proper way to ensure long term success.  

[...] By this time harvest was come, and in stead of famine, now God gave them plentie, and the face of things was changed, to the rejoysing of the harts of many, for which they blessed God. And the effect of their particuler planting was well scene, for all had, one way and other, pretty well to bring the year aboute, and some of the abler sorte and more industrious had to spare, and sell to others, so as any generall wante or famine hath not been amongst them since to this day.

 

Wow, that's the perfect testimonial to capitalism, private property, and libertarianism.  BTW, why did Bradford and the senior settlers have this change of heart?  They watched the Wampanoag who, contrary to what you may have read about Native Americans in general, had very definite ideas of private property rights (inherited property was passed matrilineally, regardless of marriage status) and limited government (they were a confederation).  They were *not* communal farmers.  So, when we celebrate Thanksgiving, what are we really celebrating?  It's different for every person, but for me I am thankful for free markets, capitalism, and private property rights.  That would seem to be the real lesson.  

GodMode in Windows 7

Create a folder anywhere on your filesystem called

GodMode.{ED7BA470-8E54-465E-825C-99712043E01C}

Note that the icon changes for the folder (by default).

Double click this new folder and this opens an MMC window that allows you to change *a lot* of Windows 7 features from one convenient location:

What can you do here?  These are my favorite features of GodMode, but by no means everything:

  1. Change UAC settings
  2. Use System Restore to restore to an earlier point-in-time.  
  3. Create a Restore Point...this has saved my butt many times.  
  4. View problem reports (the things sent to Microsoft if you enable that feature).
  5. Windows Search Service settings...I swear changing this has dramatically increased my system's performance.  
  6. Disable offline files...again, seems to increase performance for me.  
  7.  

All of these settings can be changed elsewhere, GodMode is just "one stop shopping" for me.  

Tags: 

SQL Azure

I guess the next big thing I need to learn is SQL Azure.  You can even get a one-month free pass to try it out.  

I usually try to learn a new technology by developing a solution against it.  Usually the solution is something personal since I would never architect a solution against a technology that is new and that I don't understand.  This is how I initially learned SharePoint in 2000 when I wanted a quick way to post SQL Server scripts on the web so I could get to them when travelling, without knowing HTML.  I really learned the intricacies and performance bottlenecks of previous SharePoint versions this way.  This is also how I eventually learned MySQL and Drupal...migrating SharePoint content to (in my opinion) a better CMS.  I don't have any time right now to deploy something new to learn SQL Azure so I thought I'd migrate something existing.  It turns out this is pretty easy using SQL Server Migration Assistant.  SSMA can migrate quite a few different sources to SQL Server (Oracle, MySQL,Sybase), but it also migrates to SQL Azure from SQL Server, Access, and MySQL.  So, using a two hop approach you can migrate almost any DBMS backend to SQL Azure...albeit the migrations are never perfect.  

What is SQL Azure?

It's not just hosted SQL Server.  You can still manage your Azure instance/db using the familiar SQL Server tools such as ssms.  The latest information on Azure can be found on the SQL Azure Team Blog.  I would suggest suscribing to it.  

Getting Started With SQL Azure

SQL Azure Feature Limitations

SQL Azure has 99.9% availability guarantees but no performance guarantees.  In fact, you will be throttled and read/write requests denied if you consume too many resources (as yet I can't find a definition for "too many").  

Getting Started

MSDN subscribers can get started here.  If not, the signup for a standard account is here.  In either case you get started with the web portal here.  Pricing info can be found here.  Training Kits can be found here

Differences with SQL Server

SQL Server vs SQL Azure Whitepaper

  1. No heaps except temp tables
  2. uses READ COMMITTED with optimistic concurrency while SQL Server using READ COMMITTED with pessimistic concurrency.
  3. READ COMMITTED SNAPSHOT is the default isolation level, but can be changed on the connection every time if needed.  
  4. No SQL Agent (granted, you won't need to do typical DBA tasks with Azure, but then you also can't schedule off-hours batch processing either).
  5. Only SQL Authentication, no Windows authentication (aside...I find this fascinating.  For years we've been told not to use SQL Authentication in favor of NT auth.  And for years we've all struggled with double hop authentication issues.  After struggling long enough almost every web app I've ever seen using SQL Server has given up in favor of SQL authentication.  With SQL Azure we can now see that MS has given up as well.)
  6. You must set a MAXSIZE when you create your db.  You can ALTER it later.  I'm guessing this is for planning and billing purposes.  
  7. TSQL statements that are (partially) supported.  
  8. No cross-database support (queries, transactions, whatever) or linked servers.  
  9. No CLR types
  10. No: full-text indexing, XML indexes, filestream, sparse cols, SQL Profiler.  
  11. Collation changes are at the col level, not the db level.  

Changes to Accepted "Best Practices"

  1. Close connections manually and don't cache connections like you would with normal web development.  This is cloud development.  
  2. Connection Strings must specify a database, you can't issue a USE statement (see cross-database support above).  Your User ID must be username@servername format.  You should encrypt your connections and your connection string should always be encrypted.  

Other Points

  1. You don't need to use Windows Azure as your front-end.  Any front-end that can use ODBC can use SQL Azure.  
  2. SQL Azure Data Migration Wizard

Can you do local or offline SQL Azure development?  Not yet, but soon with Codename Juneau.  This will also have a rich development environment much like VS.  It's also version-aware so you can target development to a given feature set of SQL Azure.  

How to Handle Interviews - Turnoffs

I thought I'd finish my series on how I like to conduct interviews but covering some items that I feel are turnoffs.  I wouldn't instantly disqualify you as a candidate, but I really don't like to hear or see these things.  I'm not going to cover obviously things like take a shower, look groomed, don't pick your nose, etc.  These are obvious and don't need rehashing:

  • Focusing on "duties" vs "accomplishments"?  I want to hear about times you stepped outside of your area of expertise.  If you were the DBA then I just assume you handled backup and reindexing duties, you don't need to dwell on it.  But when did you go over and above the call of duty?  What was it?  Why?  
  • Mentioning how you were the "Senior" member on your team or you have "Senior level Oracle skills."  "Senior" is a meaningless term.  Were you the senior man because you had seniority, or because you were the "go to guy"?  A "Sr Engineer" at my company wouldn't even be a "Jr Intern Programmer" at Google, unfortunately (nor would I).  
  • Mentioning how you have 10 years of experience with blah blah blah.  Who cares, "years experience" is also meaningless.  Do you have 10 years of experience with Windows just resetting passwords, or doing progressively more difficult tasks like cluster management?  One of my favorite sayings is, "Do you have 10 years of experience or one year of experience repeated 10 times?"  I know people with one month of SQL Server experience that can code better TSQL than folks with 20 years of experience, simply because they are working with much more complex code than the 20 year guy has ever seen.  
  • Telling me how you know 43 different programming languages.  If you are going to do that, make sure you can tell me a few reasons why I might consider one of these over another for a given problem.  Or tell me why you want to know this many languages.  If you want to know all of these languages so you can have a deep and broad understanding of different programming paradigms then that shows amazing maturity.  
  • Focusing on the quantity/quality of your education/training/certifications.  Just because you have dual masters doesn't make you smarter than the rest of us with only a bachelor's degree.  Just because you have 12 certifications from Microsoft doesn't mean you know how to implement their technology in my environment, it really just means you are a good test taker.  Feel free to mention these things, they are important accomplishments, but don't assume the interviewer will immediately fawn over you.  Instead, tell me why you have this much education and why you think it is a benefit.   In my opinion, the sole purpose of an education is to help you to spot people's bullshit easier later in life.  See this video for an example:

You get extra points on an interview if you can mention these things:

  • Understanding the full software lifecycle.  If you can mention how a solution reduced support costs I'll love you for ever.  So many architects and developers think their job is done when the software is released.  
  • Mention "-ilities" and how one of your proposed solutions took -ilities into consideration.  
  • Somehow work in that you understand what requirements are (they are NOT something that you CREATE, rather they are something that you DISCOVER through analysis and then DOCUMENT.  
  • I like to see a wide range of development and business environments.  Some people think you are worth less as a developer if you never worked on a 100 person development team.  

Hiring Process Items I Disagree With

  • I don't like giving proficiency tests to candidates.  I think they tend to focus on minutiae ("if you compile this code what error will you see?" ... who cares?  It's a stupid block of code I would never write that way!) that isn't important.  I would rather have a conversation with a candidate and understand how he thinks, projects he has worked on, teams he has interacted with, etc.  The mechanics of programming can be learned easily on-the-job, if necessary.  
  • Hiring managers who focus on tool-based knowledge when advertising an opening.  They want to list every software package that the candidate may have to use.  If my company uses ClearCase for version control I really don't need to list that on the ad.  I would *never* choose one candidate over another based solely on whether she has experience with my version control product.  The "concept" of version control is what is important.  The tool-based knowledge is not.  Similarly, a lot of ads will mention required experience with SSIS, yet I wouldn't want to disqualify any candidates who have broad ETL experience, but with DataStage or Warehouse Builder and not SSIS.  Granted, if the primary skill is Oracle administration, a SQL Server DBA wouldn't be a good choice.  I'm merely referring to the software used on the periphery.  
  • Ads that mention "Sr Level Experience in Java.  Sr Level Experience in SQL Server, Sr Level Experience in C++" etc.  Again, this is tool-based knowledge and as mentioned above, the "Senior Level" qualifier is meaningless.  If I see this ad I know the company uses C++, Java, and SQL Server, and they think they need a guy who has been this for years.  I would rather see something like "Knowledge of C++, Java, and SQL in a high transaction online environment using SOA".  Less words, far more meaning.  

Tags: 

Denali New Features - columnstore indexes

Column stores and NoSQL are all the rage.  I'm glad to see Microsoft is getting in on the party with SQL 11/Denali.  As of CTP 1 documentation there will be a new index type called "columnstore".  MS is stating DW performance could improve by "hundreds to thousands of times in some cases."  That's a bold statement!  I've written about column stores previously (here and here), but as a quick refresher the basic jist is that the data will be stored physically on disk "by column" than by the traditional "by row" method of all RDBMSs.  So if I tend to do a lot of aggregations against a given ColumnN then with a columnstore index I keep every row's ColumnN values in a nice, neat area on disk.  Pre-built aggregates, summary tables, and materialized views may not be needed now.    In fact, ROLAP in general becomes much more attractive.  Of course, there is no free lunch so I would assume that the build time for a columnstore would be much longer than a simple B-tree index.  Also, it's likely a table with a columnstore will not allow CRUD operations without some kind of "special process"...if you think about it the traversal of this storage type would require too many changes to the lock manager to make direct CRUD feasible, or even performant.  

This is much more attractive than other column store technologies I've used for many reasons:

  1. One vendor...no new software, language to learn, reporting tools, etc.  This is really just a new type of index using the CREATE INDEX syntax.  
  2. No new infrastructure

Apparently MS created columnstore indexes using the Vertipaq technology (basically an in-memory, compact, column-oriented database) that PowerPivot uses.  That's an ingenious way of leveraging existing technology!  I really think this raises the bar...I wonder how long it will be until Oracle has a lightweight column store offering in their main DBMS offering?

With CTP 1 I couldn't get this to execute (nor any variant), but this is supposedly the syntax of this new feature:

CREATE COLUMNSTORE INDEX MyIndex ON schema.table (colA,colB,colC);  

Hopefully this feature makes RTM!