Hierarchical Data in SQL Server
Often times in applications, I am required to maintain data that has a hierarchical relationship such as a tree structure. The easiest way to approach this problem is to add a key to the table that refers to the record’s parent key. For example, in my last project, I needed to have a table that held tasks:
create table Tasks ( TaskID int, Name varchar(255) ) |
|
In order to have a hierarchical relationship, I create a new column to the table as below:
create table Tasks ( TaskID int, ParentTaskID int, Name varchar(255) ) |
|
The ParentTaskID is a foreign key back to the Tasks table that allows NULL values. A NULL ParentTaskID would indicate the root level task. This methodology, often referred to as an Adjacency List, is sufficient if you know how many levels in the tree you will be maintaining. If you want to get all the first children of the root node you could do the following:
select child.* from Tasks parent join Tasks child on parent.TaskID = child.ParentTaskID where parent.ParentTaskID is null |
Returns this:
|
This will return all the first level child nodes for each root node. However, it becomes more difficult using this convention if you want to return all the descendants of a Task, not just the next child level. If you only have three levels of Tasks, you could do repetitive self joins and get to the desired level. However, the number of levels in the tree is not always fixed. Luckily, it becomes much simpler if you store the tree path as a separate column in the Task table:
create table Tasks ( TaskID int, ParentTaskID int, Path varchar(1024), Name varchar(255) ) |
|
Some people prefer to model the Path using a count methodology shown above so that the root will start at 1, its first child will be 1.1, the second, 1.2, etc. I find it better to use the actual TaskID as the path as shown below:
create table Tasks ( TaskID int, ParentTaskID int, Path varchar(1024), Name varchar(255) ) |
|
This method makes is much easier to insert a new task into the tree structure. Just get the new Tasks’s parent path and append the ID separated by a period (or whatever character you prefer).
Now if you want to get all the descendants of the root nodes you could do the following:
select t2.* from Tasks t1 join Tasks t2 on t2.Path like t1.Path + ‘%’ where t1.ParentTaskID is null and t2.TaskID <> t1.TaskID |
Returns this:
|
Similarly, if you wanted to get all the parent tasks of a given task you could do the following:
select t1.* from Tasks t1 join Tasks t2 on t2.path like t1.path + ‘%’ where t2.TaskID = 629 and t1.TaskID <> t2.TaskID |
Returns this:
|
If you need to know the level of the current task, you can use the method discussed in a separate article here to get the number of separators in the path:
select t.*, len(t.Path) - len(replace(t.Path, ‘.’, ‘’) as Level from Tasks t where t.TaskID = 314 |
Returns this:
|
You can also use this methodology to get all the Tasks at a given level:
select t.* from Tasks t where LEN(t.Path) - LEN(REPLACE(t.Path, ‘.’, ‘’) = 1 |
Returns this:
|