Gotta Stop Violating Foreign Keys

About a year ago I was working on a project that required that data be extracted from one database and inserted into an database.  Not the most earth shattering concept.  This is done quite often and doesn’t necessarily require a discussion of foreign keys.

So why talk about them… well… in this case the source and the destination databases were identical. They had the same table and foreign key structure.  But no reason to be sold on this discussion quite yet… because it would be east to just build the ETL process around this.  It would need to be built with the ordering of the tables controlled with precedence constraints or inserts statements in the proper order.

So here’s the rub… this process needed to move data between hundreds of tables.  When hundreds of tables are in the discussion, I lose interest very quickly in trying to determine by hand how to order the tables.  I did in this case and looked for a method to accomplish this through foreign key meta data.

And of course, there wouldn’t be a post if there wasn’t an answer.  Well, maybe I’d post a question, but I not so nanner nanner.  The solution uses a function with a query to get the necessary data.  There might be better ways to do this, but this way worked quite well and had pretty decent performance.  One variation of this used a single CTE that I let run for a couple hours before I decided it was running sub-optimal.

There are two caveats with this solution:

  1. Self-referencing foreign keys will cause a recursion error.  When row A needs to be inserted before row B can’t really be helped through metadata.  Since the environment I was working in didn’t have this issue, the solution didn’t address it either.
  2. Circular-referencing foreign keys will also cause a recursion error.  There are situations where table A references table B which references table C which then references table A.  I run into these from time to time but not in the environment that this was designed for.

With the caveats out of the way, here is the solution:

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ListFKParents]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[ListFKParents]
GO

/*================================================================================
Function: dbo.ListFKParents
Author: Jason Strate
Date: July 23, 2009

Synopsis:
This function returns a comma separated list of parents for a table.

================================================================================
Revision History:
Date: By Description
----------------------------------------------------------------------------------
================================================================================*/
CREATE FUNCTION [dbo].[ListFKParents] (@ObjectID int)
RETURNS table
AS
RETURN (
WITH cteParentChild (Object_ID, parent_Object_ID, referenced_Object_ID, name)
AS
(
SELECT fk.parent_Object_ID, t1.name, fk.referenced_Object_ID, t2.name
FROM sys.foreign_keys fk
INNER JOIN sys.tables t1 ON fk.parent_Object_ID = t1.Object_ID
INNER JOIN sys.tables t2 ON fk.referenced_Object_ID = t2.Object_ID
WHERE t1.Object_ID = @ObjectID
UNION ALL
SELECT fk.parent_Object_ID, t1.name, fk.referenced_Object_ID, t2.name
FROM sys.foreign_keys fk
INNER JOIN sys.tables t1 ON fk.parent_Object_ID = t1.Object_ID
INNER JOIN sys.tables t2 ON fk.referenced_Object_ID = t2.Object_ID
INNER JOIN cteParentChild cte ON cte.referenced_Object_ID = fk.parent_Object_ID
WHERE fk.referenced_Object_ID  fk.parent_Object_ID
)
SELECT (SELECT ',' + CONVERT(varchar(50), x.referenced_Object_ID) as [text()]
FROM cteParentChild x
WHERE x.referenced_Object_ID  @ObjectID
ORDER BY x.referenced_Object_ID
FOR XML PATH ('')) + ',' AS FKParents
)
GO

IF OBJECT_ID('tempdb..#ParentChild') IS NOT NULL
DROP TABLE #ParentChild

CREATE TABLE #ParentChild
(
object_id int
,name sysname
,parents varchar(max)
)

--List all tables and return a list of all of the parent tables to the table
INSERT INTO #ParentChild
SELECT object_id, name, x.FKParents
FROM sys.tables t
CROSS APPLY dbo.ListFKParents(object_id) x

IF OBJECT_ID('tempdb..#TableOrder') IS NOT NULL
DROP TABLE #TableOrder

CREATE TABLE #TableOrder
(
object_id int
,name sysname
,rank int
)

DECLARE @Loop int
SET @Loop = 0

--Insert all tables with out parents as 0-level rank
INSERT INTO #TableOrder
SELECT object_id, name, @Loop
FROM #ParentChild
WHERE parents IS NULL

WHILE 1=1
BEGIN
SET @Loop = @Loop + 1

--Remove any items that are in #TableOrder and in #ParentChild
DELETE FROM #ParentChild
WHERE object_id IN (SELECT object_id FROM #TableOrder)

;WITH cteParentChild (varchar_object_id, object_id, name, parents)
AS
(
--Select items that match to parent #TableOrder in #ParentChild
SELECT DISTINCT '%,' + CONVERT(varchar(50),t.object_id) + ',%', t.*
FROM #ParentChild t
INNER JOIN (
SELECT '%,' + CONVERT(varchar(50),object_id) + ',%' as object_id
FROM #TableOrder
) x ON parents LIKE x.object_id
)
--Insert items into that have all parents in #TableOrder
INSERT INTO #TableOrder
SELECT pc1.object_id, pc1.name, @Loop
FROM cteParentChild pc1
LEFT OUTER JOIN cteParentChild pc2 ON pc1.parents LIKE pc2.varchar_object_id
WHERE pc2.varchar_object_id IS NULL

IF @@RowCount = 0
BREAK
END

SELECT *
FROM #TableOrder
ORDER BY rank, name

Happy hunting.