DaveWentzel.com            All Things Data

Physical Data Modeling AntiPatterns - circular references

This is the last part (for now) of my series on Physical Data Modeling AntiPatterns.  Today's topic is circular references, aka "redundant relationships", or "cycles".  An example helps to show the problem:

In the example we need to add a row to X before adding a related row to Y and Y must be populated before Z, which must be populated before X.  To say another way, X is a parent of Y, which is a parent of Z, which is a parent of X.  This is a Catch-22 (you can't get a loan without established credit, but you can't get your credit established without having a previous loan), and we really can't do anything.  In mathematics this is called a cyclic graph.  Generally in data modeling we prefer acyclic graphs, in other words, we can't start at any given table and travel the relationships and loop back to the original table.  

This is a logical modeling problem and obviously if we modeled this physically with the correct foreign key constraints we would quickly determine our physical design was useless.  But many times you will see this anyway where the modeler simply didn't add one DRI constraint so that the physical model actually worked and matched the logical model.  That's a bad idea and is part of the antipattern.  Omitting a DRI constraint is not an antipattern, it's just flat out wrong, but omitting a DRI (or any constraint) to overcome a model problem is the antipattern.  

I see this implemented most often in hierarchies with self-joins.  For instance, I once designed a reporting metadata management system.  A given report could be a parent of another report but could also be the child of a third report.  This is common in custom reporting applications.  I may have a subreport on a master report that is simply a jpg of the company logo so I don't need a separate result set for that logo, nor include the logo on every row of the report, nor worry about logo sizing and consistency across many reports.  Here is the first pass of the model I came up with:

Note that ParentRDID and ChildRDID both refer back to ReportDefinition.  That's OK if we want to ensure solely that we are referencing valid ReportDefinitionIDs, but does nothing to ensure we can't have circular references.  It gets worse...I attempted to use update and delete CASCADE DRI actions when I declared the foreign keys.  That generates (smartly) an error due to the circular nature of the cascading updates:


Msg 1785, Level 16, State 1, Server DWENTZELLP, Line 1
Introducing FOREIGN KEY constraint 'FK_ReportDefinitionXReportDefinition_ReportDefinition1' on table 'ReportDefinitionXReportDefinition' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other

Sometimes self-referencing relationships need to be asymmetric (another way of saying acyclical...you are my friend but I am not your friend).  My requirement didn't need this, but the constraint would've eased some pain for sure.  Self-referencing relationships are sometimes irreflexive as well, meaning a row cannot be a parent/child of itself directly.  This further simplifies the model, but isn't always desired.  Reflexive relationships are rarely stored.  Asymmetric relationships by definition are irreflexive.  

It's also important to determine if transitive relationships will be allowed, such as the TableX to TableY to TableZ example above.  If transitive relationships are disallowed in self-referencing relationships then the constraints become more daunting.  If they are allowed it is important to determine how/if they will be stored physically.  If they will be then you need to consider if the user defined them directly or if they are inherently transitive.  You need some kind of algorithm to determine how/when to store these relationships.  Further, it is important to decide which side (or both) of the transitive relationships will be stored.  For instance, if I'm modeling friendships, and assuming if you are my friend then I am your friend, how do I store this, with you as the "parent" or me?  Or are we both "parents" of each other.  I generally say only store one side of the relationship and the "parent" is the side that comes first alphanumerically.  That has worked in all cases I have seen.  If I remove the assumption that I can be your friend but that does not imply that you can be my friend then I have a symmetric relationship that is neither transitive nor intransitive, but could be both.  


Add new comment