From my experience, recursive queries in Postgres can be quite complex and daunting to work with. With this post, I aim to break down how recursive queries are structured and then take a look at how they run. Hopefully, giving you the technical skills to use them in your code.
Why Use Recursive Queries?
You will need to use a recursive query if you are navigating through hierarchical data. An example of hierarchical data is a record containing manager and employee data; a nested comment section, with main and sub comments; or even categories and subcategories, such as food groups. The key to the information set up is that a record will contain an id relating to a parent or child.
The table, above, shows an example of data that you could work a recursive query on. It shows the records are connected through matching “main_id” and “parent_id“. For example, Andy has a parent with an ID of 1 – this corresponds to Jane.
Structure of Recursive Queries
The recursive query is held within a common table expression (CTE), with the keyword “recursive” included. A quick note on the recursive keyword: It needs to be declared on the first CTE in a statement and it will apply to all valid CTEs within scope. This does not mean that the first CTE declared needs to be recursive.
A recursive query can be split into two parts – the non-recursive section, and the recursive section. Only the recursive part of the code repeats during the recursion (obviously).
It is important to note that the non-recursive and recursive statements must both select the same number of columns or the query will give an error. You will also see that the recursion is happening due to the inner join statement. The join is calling the whole recursive CTE “example_query” during the second select statement.
How Recursive Queries Run
Now you know the structure of the query I will outline how it runs. Luckily the query can be broken down into four simple steps:
- The select statement in the non-recursive section is run.
- The select statement in the recursive section is run using the results from the previous select statement.
- If there are results then these are saved to the final result and then step 2 is performed again.
- When there are no results returned from step 3, then the query stops and the appended results from all the selects are returned.
In the above example, I added an optional row into the select statements to track the level of recursion. This is optional for a recursive query, but it is useful to show where each row is coming from.
The row with a recursion level of zero is from the non-recursive select statement (Step 1). The rows with a recursion level of 1 are from the first recursive select (Step 2). Then, Step 3 is assessed, and there are results – so step 2 is run again. This returns the row with a recursion level of 2. Again, Step 3 is assessed and there are results – so step 2 is triggered. Finally, step 2 gives no results, which leads to all the previous results being returned (step 4).
This is quite a wordy explanation but, as you can see, the query can keep going round and round. You just need to ensure your query and data will eventually give perform a select that does not return anything. Otherwise, it will be impossible for step 4 to be hit, so your query will not end. Alternatively, force the query to end by adding a limit to the select statement that calls the recursive query.
Writing a Recursive Query
When writing recursive queries I tend to test them as I go using a do block. I have covered how I test code in pgAdmin in a previous article. The main things to watch out for when writing recursive queries are:
- Ensure your select statements make sense. The select statements on either side of the union must have the same number of columns – and they should be selecting the same columns. The recursive select statement may select from the parent or child.
- Use a recursion level column whilst you are building out the query (outlined in the How Recursive Queries Run section). If you don’t want the recursion level in your final query remove it at the end. But it is a very good sanity check whilst you are building out the query, to see if everything is working as expected.
- Don’t get distracted by complex from and join clauses – they are not affecting the recursion structure. All the tables added in the “from” and “joins” in the non-recursive select statement will be the same in the recursive section (with the recursive select also including the join of itself to cause the recursion).
- If your code never stops executing then it will be trapped in an endless loop. Meaning your query isn’t reaching a point in the recursion where no results are returned. As mentioned previously, setting a limit could be useful in this case.
That was a quick look at breaking down recursive queries, understanding how they work, and a few tips for when you are writing your own. Once you get past how complex the syntax looks, you will see that it always boils down to a few simple steps. If you walk through these you will be able to understand all recursive queries you come across in Postgres!