DaveWentzel.com            All Things Data

Physical Data Modeling AntiPatterns - MUCK table designs

This is part three of my series on Physical Data Modeling AntiPatterns

 

MUCK stands for Massively Unified Code Key table.

There are many times when you have various similarly structured tables.  Assume tables like CustomerStatus, EmployeeStatus, VendorStatus, etc.  Sometimes a modeler looks at the table design and says, "this is redundant data and rules of normalization say there should be no redundant data, so I'm going to create a universal Status table to hold all of things."

 

 
So you have rows like this
ID
Code
1
Fired
2
Contractor
3
Preferred
4
NewHire
 
Uh oh.  We tried to eliminate redundant data...but we didn't.  Now we have a case where we don't want to assign ID 1 to a vendor and ID 3 should not be assigned to an employee. 
 
So, this isn't really redundant data afterall, it's just similar data.  Some data modelers realize that and decide the correct choice is to keep separate CustomerStatus, EmployeeStatus, VendorStatus, etc tables. 
 
But some don't.  They decide to model it by adding a "type" column such as this...
ID
Code
Type
1
Fired
Employee
2
Contractor
Employee
3
Preferred
Vendor
4
NewHire
Employee
 
Oops.  Now Preferred can only be assigned to Vendors, not to Customers. 
 
I won't bore you with the details of how some modelers handle this.  It ends up being a lot of extra joins when business logic needs to be applied to Preferred customers but not vendors. 
 
Others problems with this approach...how do you ensure domain integrity?  Foreign keys won't work to the universal status table unless we also carry Type in the entity tables.  Now we need composite FK's.  Sounds like separate status tables would've been better.  Our only alternative is to enforce domain integrity via check constraints or triggers...or leave domain integrity to the app code. 
 
Might want to stick with separate status tables in these cases.  Remember...statuses are NOT redundant data to be removed...merely similar data to be maintained. 
 
As usual generalization like this is good in the OO world but not for your db.  Eventually if you generalize too much you end up with an EAV model.  Sure, they are the epitome of flexibility, but perform poorly and tend to be subject to data "correctness" issues.  Chaos is overgeneralization and ultra-flexibility, but it's difficult to get correct data from chaos.  With databases we shouldn't be concerned with simply flexible persistent storage, but rather with ensuring we can answer the query accurately. 
 
Summary of Issues with MUCK designs:
  1. constraints cannot be used properly
  2. data types are always varchar, else the table needs different value cols for dates, ints, etc.  This complicates the design.
  3. MUCKs eventually just become a table of rows where the meaning of an individual row gets overlooked. 
  4. if the tables get large enough you run into performance problems (locking,concurrency, cache usage)

Add new comment