Hierarchical Data in SQL Server

Posted on December 23, 2011 By jollyjollybean

Often times in applications, I am required to maintain data that has a hierarchical relationship such as a tree structure. The easiest way to approach this problem is to add a key to the table that refers to the record’s parent key. For example, in my last project, I needed to have a table that held tasks:

create table Tasks (
                TaskID int,
                Name varchar(255)
)

TaskID	Name
135	Root
153	Child A
314	Child B
522	Node 1
629	Node 2

In order to have a hierarchical relationship, I create a new column to the table as below:

create table Tasks (
                TaskID int,
                ParentTaskID int,
                Name varchar(255)
)

TaskID	ParentTaskID	Name
135	NULL	Root
153	135	Child A
314	135	Child B
522	314	Node 1
629	314	Node 2

The ParentTaskID is a foreign key back to the Tasks table that allows NULL values. A NULL ParentTaskID would indicate the root level task. This methodology, often referred to as an Adjacency List, is sufficient if you know how many levels in the tree you will be maintaining. If you want to get all the first children of the root node you could do the following:

select 
                child.*
from
                Tasks parent join
                Tasks child on parent.TaskID = child.ParentTaskID
where
                parent.ParentTaskID is null

Returns this:

TaskID	ParentTaskID	Name
153	135	Child A
314	135	Child B

This will return all the first level child nodes for each root node. However, it becomes more difficult using this convention if you want to return all the descendants of a Task, not just the next child level. If you only have three levels of Tasks, you could do repetitive self joins and get to the desired level. However, the number of levels in the tree is not always fixed. Luckily, it becomes much simpler if you store the tree path as a separate column in the Task table:

create table Tasks (
                TaskID int,
                ParentTaskID int,
                Path varchar(1024),
                Name varchar(255)
)

TaskID	ParentTaskID	Path	Name
135	NULL	1	Root
153	135	1.1	Child A
314	135	1.2	Child B
522	314	1.2.1	Node 1
629	314	1.2.2	Node 2

Some people prefer to model the Path using a count methodology shown above so that the root will start at 1, its first child will be 1.1, the second, 1.2, etc. I find it better to use the actual TaskID as the path as shown below:

create table Tasks (
                TaskID int,
                ParentTaskID int,
                Path varchar(1024),
                Name varchar(255)
)

TaskID	ParentTaskID	Path	Name
135	NULL	135	Root
153	135	135.153	Child A
314	135	135.314	Child B
522	314	135.314.522	Node 1
629	314	135.314.629	Node 2

This method makes is much easier to insert a new task into the tree structure. Just get the new Tasks’s parent path and append the ID separated by a period (or whatever character you prefer).

Now if you want to get all the descendants of the root nodes you could do the following:

select 
t2.*
from
                Tasks t1 join
                Tasks t2 on t2.Path like t1.Path + ‘%’
where
                t1.ParentTaskID is null 
                and t2.TaskID <> t1.TaskID

Returns this:

TaskID	ParentTaskID	Path	Name
153	135	135.153	Child A
314	135	135.314	Child B
522	314	135.314.522	Node 1
629	314	135.314.629	Node 2

Similarly, if you wanted to get all the parent tasks of a given task you could do the following:

select
                t1.*
from
                Tasks t1 join
                Tasks t2 on t2.path like t1.path + ‘%’
where
                t2.TaskID = 629
                and t1.TaskID <> t2.TaskID

Returns this:

TaskID	ParentTaskID	Path	Name
135	NULL	135	Root
314	135	135.314	Child B

If you need to know the level of the current task, you can use the method discussed in a separate article here to get the number of separators in the path:

select
                t.*,
                len(t.Path) - len(replace(t.Path, ‘.’, ‘’) as Level
from
                Tasks t
where
                t.TaskID = 314

Returns this:

TaskID	ParentTaskID	Path	Name	Level
314	135	135.314	Child B	1

You can also use this methodology to get all the Tasks at a given level:

select
                t.*
from
                Tasks t
where
                LEN(t.Path) - LEN(REPLACE(t.Path, ‘.’, ‘’) = 1

Returns this:

TaskID	ParentTaskID	Path	Name
153	135	135.153	Child A
314	135	135.314	Child B

SQL