ConnectByRootQuery in Oracle

How to Migrate connect_by_root From Oracle to Postgres

During a recent Oracle to Postgres migration, I found the keyword connect_by_root very tricky to migrate. In this post, I hope to make migrating connect_by_root to Postgres a bit easier for you all!

Understanding connect_by_root

The operator connect_by_root can return the root value from a hierarchical query. This can be quite hard to visualize, but is easier to understand with an example:

Manager_idEmployee_idEmployee
NULL1Nick
12Josh
23Ali
NULL4Joe
45Kyle
SELECT Employee, CONNECT_BY_ROOT Manager_id as "Manager", SYS_CONNECT_BY_PATH(Employee, '/') "Path"
FROM exampleTable
START WITH Manager_id is not null
CONNECT BY PRIOR Employee_id = Manager_id;

The above query would return the results shown below. It includes “Start with Manager_id is not null” to stop the query from trying to find the manager of employees that do not have managers. The way the root node is found is determined by the “Connect By” part of the aforementioned query. In this case, it is set up to traverse the tree by employee id matching the manager id. Note: The sys_connect_by_path is shown here only to indicate the path of each row, to make clear where the Manager value is taken from – it is not needed for a connect_by_root query.

Employee | Manager | Path
Josh	 | 1	   | /Josh
Ali	     | 1	   | /Josh/Ali
Ali	     | 2	   | /Ali
Nick	 | 4	   | /Nick

As you can see, the connect_by_root operator traverses the hierarchy and returns the manager id at each point. If the manager found also has a manager then that value is returned as well – this is what causes the employee “Ali” to appear twice. Ali has a direct manager Josh ( Employee_id = 2) who also has a manager Nick ( Employee_id = 1).

Hierarchical Queries in Postgres

First, to migrate connect_by_root to Postgres, you will need to understand how recursive queries are set up. However, Postgres does not have the recursive query keywords that are available in Oracle. As a result, hierarchical queries are more difficult to set up in Postgres. I recommend checking out my post on recursive queries, before continuing, if you are unsure how they work in Postgres.

Migrating connect_by_root to Postgres

In order to correctly copy connect_by_root in Postgres, your recursive query will need to capture the behaviour of the “connect_by_root“, “start with” and “connect by prior” keywords. The code below shows how this can be achieved. Note: You can ignore the lines labelled “sys_connect_by_path”, they are simply there to help show in the output where each line is coming from. For more info on sys_connect_by_path in Postgres, check out my other article.

With recursive example_query as (
  select 
    Employee,
    Manager_id as Manager, -- This is the connect_by_root root value.
    Employee_id,
    '/' || Employee as name_path  -- sys_connect_by_path NOT NEEDED FOR CONNECT_BY_ROOT
  from exampleTable
  where Manager_id is not null -- This term should match the "start with" term.

  union all

  select 
    child.Employee,
    parent.Manager,  -- For the connect_by_root value, you must take the parent value and not the child!
    child.Employee_id,
    parent.name_path || '/' || child.Employee as name_path  --  sys_connect_by_path NOT NEEDED FOR CONNECT_BY_ROOT 
  from exampleTable as child
  inner join example_query as parent 
    on child.Manager_id = parent.Employee_id -- This term should match the "connect by prior" term.
)
select * from example_query;

In the first half of the recursive query, you need to ensure you have a term in the select clause which will be selecting your root value – in this case, it is “Manager_id as Manager”. You also need to set up a where clause to match the “Start With” term. The first part of a recursive query (before the union) does not actually get read during the recursion, and this is why the where clause is set up here.

In the second part of the recursive query, the connect by prior term is set up. This term allows the query to return values for each row in the hierarchy all the way up to the root. The connect by prior term is used to specify the relationship between the parent and child rows. The key part of the query is to take the parent value, for the connect_by_root term, in the select after the union. The results from this query are the same as the oracle query previously shown:

Employee | Manager | Path
Josh	 | 1	   | /Josh
Ali	     | 1	   | /Josh/Ali
Ali	     | 2	   | /Ali
Nick	 | 4	   | /Nick

Summary

In conclusion, migrating a connect_by_root query from Oracle to Postgres is not that difficult, the key thing is to understand recursive queries. Once you understand how to set up Postgres recursive queries, modifying it to get a root value is not so hard!

Leave a Reply