Physical Data Modeling AntiPatterns - the misuse of BIT flags

This is part one of my series on [[Physical Data Modeling AntiPatterns Series|Physical Data Modeling AntiPatterns]]. 

Joe Celko refers to this as attribute splitting.  In this situation we are trying to model "states" of an entity/object in terms of purely binary relationships (on/off, yes/no, true/false).  In many cases the states of a given entity are not really binary at all and when you use BIT flags you are really short-changing your design.  The example most often cited for misuse of BIT flags is a column to denote SEX (ie, male or female).  If you've given blood recently at a Red Cross center you know that they no longer ask you if you are male or female, rather, what is your current sex and what sex were you born?  And yes, you can identify yourself as something other than male or female (sorry, I didn't ask what other choices there were).  

It is common to model a Person as a specific case in a generic Entity table, which means a corporation would have an entry as well.  Assuming we leave the SEX col in the Entity table we need some value other than male/female for the corporate entity, and NULL is frankly frowned upon.  ISO 5218 defines the valid sex codes as 0=unknown, 1=male, 2=female, 9=not applicable.  My gut tells me this list will be expanded in the next 20 years (see the Red Cross problem above).  So, if ISO compliance is a goal, a BIT wouldn't work in your model.  

What appears to be a binary relationship usually is not.  An invoice status may be Open/Closed upon first look, but later other statuses may be needed.  It's best not to use a BIT unless we are positive that there is a true binary relationship.  

A last problem with BIT flags is when modelers take an entire series of binary relationships and shoehorn them into a single bitmask column where a query would need to use bitwise ORs to decode values of interest.   My problem with this solution is that many people don't fully understand bitwise OR calculations.  Those people will write functions and views to decode the values more easily.  In this case, why not just create a simple series of columns at the start.  I have yet to see any proof where a bitmapped column was absolutely required for performance reasons.  

I'm not saying to totally avoid BIT cols, just always ask yourself, "is there any way there might be a third valid value in the future?"  If the answer could at all be YES, then don't use BIT.