Hierarchical Data in SQL Server

Posted on December 23, 2011 By

Often times in applications, I am required to maintain data that has a hierarchical relationship such as a tree structure. The easiest way to approach this problem is to add a key to the table that refers to the record’s parent key. For example, in my last project, I needed to have a table that held tasks:

create table Tasks (
                TaskID int,
                Name varchar(255)
)
TaskID Name
135 Root
153 Child A
314 Child B
522 Node 1
629 Node 2

 

In order to have a hierarchical relationship, I create a new column to the table as below:

create table Tasks (
                TaskID int,
                ParentTaskID int,
                Name varchar(255)
)
TaskID ParentTaskID Name
135 NULL Root
153 135 Child A
314 135 Child B
522 314 Node 1
629 314 Node 2

 

The ParentTaskID is a foreign key back to the Tasks table that allows NULL values. A NULL ParentTaskID would indicate the root level task. This methodology, often referred to as an Adjacency List, is sufficient if you know how many levels in the tree you will be maintaining. If you want to get all the first children of the root node you could do the following:

select 
                child.*
from
                Tasks parent join
                Tasks child on parent.TaskID = child.ParentTaskID
where
                parent.ParentTaskID is null
Returns this:

TaskID ParentTaskID Name
153 135 Child A
314 135 Child B

 

This will return all the first level child nodes for each root node. However, it becomes more difficult using this convention if you want to return all the descendants of a Task, not just the next child level. If you only have three levels of Tasks, you could do repetitive self joins and get to the desired level. However, the number of levels in the tree is not always fixed. Luckily, it becomes much simpler if you store the tree path as a separate column in the Task table:

create table Tasks (
                TaskID int,
                ParentTaskID int,
                Path varchar(1024),
                Name varchar(255)
)
TaskID ParentTaskID Path Name
135 NULL 1 Root
153 135 1.1 Child A
314 135 1.2 Child B
522 314 1.2.1 Node 1
629 314 1.2.2 Node 2

 

Some people prefer to model the Path using a count methodology shown above so that the root will start at 1, its first child will be 1.1, the second, 1.2, etc. I find it better to use the actual TaskID as the path as shown below:

create table Tasks (
                TaskID int,
                ParentTaskID int,
                Path varchar(1024),
                Name varchar(255)
)
TaskID ParentTaskID Path Name
135 NULL 135 Root
153 135 135.153 Child A
314 135 135.314 Child B
522 314 135.314.522 Node 1
629 314 135.314.629 Node 2

 

This method makes is much easier to insert a new task into the tree structure. Just get the new Tasks’s parent path and append the ID separated by a period (or whatever character you prefer).

Now if you want to get all the descendants of the root nodes you could do the following:

select 
t2.*
from
                Tasks t1 join
                Tasks t2 on t2.Path like t1.Path + ‘%’
where
                t1.ParentTaskID is null 
                and t2.TaskID <> t1.TaskID
Returns this:

TaskID ParentTaskID Path Name
153 135 135.153 Child A
314 135 135.314 Child B
522 314 135.314.522 Node 1
629 314 135.314.629 Node 2

 

Similarly, if you wanted to get all the parent tasks of a given task you could do the following:

select
                t1.*
from
                Tasks t1 join
                Tasks t2 on t2.path like t1.path + ‘%’
where
                t2.TaskID = 629
                and t1.TaskID <> t2.TaskID
Returns this:

TaskID ParentTaskID Path Name
135 NULL 135 Root
314 135 135.314 Child B

 

If you need to know the level of the current task, you can use the method discussed in a separate article here to get the number of separators in the path:

select
                t.*,
                len(t.Path) - len(replace(t.Path, ‘.’, ‘’) as Level
from
                Tasks t
where
                t.TaskID = 314
Returns this:

TaskID ParentTaskID Path Name Level
314 135 135.314 Child B 1

 

You can also use this methodology to get all the Tasks at a given level:

select
                t.*
from
                Tasks t
where
                LEN(t.Path) - LEN(REPLACE(t.Path, ‘.’, ‘’) = 1
Returns this:

TaskID ParentTaskID Path Name
153 135 135.153 Child A
314 135 135.314 Child B

 

SQL