A little late to answer but, to be able to build such a hierarchical structure with nodes where each one might be connected with other and there could be cycles, one would have to keep track of what goes in what place, as also pointed out by @p3consulting -
mainly after the recursive query, you need another CTE to calculate LAG and LEAD levels in the hierarchy + the JSON of the item node, and the final query will add the "children" node and use the lead/lag infos to close the hierarchy correctly
In order to do that (keep a track of what goes where), we'll have to attach some meta data to a row, or calculate it. I approached it with adding some meta data, essentially, which row is a parent to what row and what's the depth at which a particular row would exist. With that meta data being added, we can use it to build the hierarchy incrementally.
Few Things to Note
- This script works under the assumption that, although cycles may be there in the data, we only wish to show all the nodes reached in our hierarchy without any cycles. To wit, there won't be same hierarchical data at any different depths. I'll talk about it further in the answer as well.
- This script works for only this scenario (when there'll be at most 2 connections with each node) and doesn't use a generalized approach, so to say. However, it's easy to take hints from it to extend it further.
- Snippets of scripts are added in-between the explanation, might not work when used standalone, so refer to the final script for that.
- This approach seems pretty dirty.
The Approach
Recursive Breadth Traversal with Max Depth
In order to do a recursive breadth traversal with max depth, I think it's better to always look in both the directions so we can modify OP's script to this -
WITH RECURSIVE connections_and_items AS (
SELECT
c.id AS connection_id,
c.title AS connection_title,
c.origin_item_id AS origin_item_id,
i.title AS origin_item_title,
c.destination_item_id AS destination_item_id,
it.title AS destination_item_title,
1 AS depth,
`${itemId}`::UUID AS parent,
gen_random_uuid() AS child_of
FROM connections c
JOIN items i
ON i.id = c.origin_item_id
JOIN items it
ON it.id = c.destination_item_id
), base_items AS (
SELECT
*
FROM connections_and_items cai
WHERE origin_item_id = `${itemId}`::UUID OR destination_item_id = `${itemId}`::UUID
), connected_items AS (
SELECT
*
FROM base_items
UNION
SELECT
cai.connection_id,
cai.connection_title,
cai.origin_item_id,
cai.origin_item_title,
cai.destination_item_id,
cai.destination_item_title,
ci.depth + 1 AS depth,
CASE WHEN ci.origin_item_id <> cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS parent,
CASE WHEN ci.origin_item_id = cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS child_of
FROM connections_and_items cai, connected_items ci
WHERE ci.depth < `${maxDepth}`
AND ((ci.destination_item_id = cai.destination_item_id)
OR (ci.origin_item_id = cai.origin_item_id))
)
Where base_items is the very first result from the query and then in connected_items we do a recursive union query. Here, depth is, well the depth, child_of is the UUID of the row which the record will be a child of, and parent is going to be the UUID of the parent, in the record. For now the definition might seem ambiguous, but it'll become clearer later on.
Build Initial Objects Record
For all the rows that we've from our query, we'll build an initial jsonb object for them, that's going to go in the result, to a column called json_conn, adding another CTE, taking advantage of the fact that if parent is not equal to the destination_item_id we know that the origin_item_title would be the title and vice-versa -
initial AS (
SELECT
DISTINCT ON (connection_id)
connection_title,
depth,
parent,
child_of,
jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.origin_item_title ELSE cr.destination_item_title END, 'connections', jsonb_agg(jsonb_build_object('title', cr.connection_title, CASE WHEN cr.parent <> cr.destination_item_id THEN 'destination_item' ELSE 'origin_item' END,jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.destination_item_title ELSE cr.origin_item_title END))) ) json_conn
FROM connected_items cr GROUP BY cr.connection_id, cr.connection_title, cr.origin_item_id, cr.origin_item_title, cr.destination_item_id, cr.destination_item_title, cr.depth, cr.parent, cr.child_of
)
With this, we'll get another column json_conn where a value would look like -
{
"title": "Shenzen",
"connections": [
{
"title": "Same Author - Guy Delisle - 3",
"origin_item": {
"title": "Jerusalem"
}
}
]
}
Depicting the relation, between destination and origin, keeping the order in which the nodes were scanned.
Note that we're doing a DISTINCT ON (connection_id) to remove any duplicate connections, and hence remove cycles, if we don't add this, the result might become unexpected, because with only information to keep track of nodes being the depth and parent, the script gets confused. If a hierarchy where all the duplicate cycles are added too is expected, we'll have to add an FQP (Fully Qualified Path) metadata to keep a track.
With this, we'll get non-duplicate connections, with each depth existing at most twice.
Update Initial Record Objects
Our initial object records show the relationship between parent and child, however, we don't have the name of the connection for where the parent is coming from, so we'll add that like so -
intermediate AS (
SELECT * FROM (SELECT
i.depth,
i.parent,
i.child_of,
CASE WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'destination_item',i.json_conn)
WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'destination_item',i.json_conn)
WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'origin_item',i.json_conn)
WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'origin_item',i.json_conn)
ELSE NULL
END AS json_conn
FROM initial i INNER JOIN initial im ON i.child_of = im.parent AND (i.depth - 1) = im.depth) AS _intermediate WHERE json_conn != 'null' UNION SELECT DISTINCT ON (i.depth, i.parent) i.depth, i.parent, i.child_of, jsonb_set(i.json_conn, '{connections}', gc.json_conn) FROM initial i, (SELECT depth, parent, jsonb_agg(elems) AS json_conn FROM initial, jsonb_array_elements(initial.json_conn->'connections') AS elems WHERE initial.depth = 1 GROUP BY initial.depth, initial.parent) AS gc WHERE i.depth = 1)
We make an inner join on the intial CTE with condition to match that a child_of matches a parent where the difference in depth is 1, from there, we can determine, whether the parent was scanned as an origin_item or a destination_item. We also, merge the 2 records that we've for the depth 1, because we can at this point, and it won't cause no harm This is how the record at depth 1 will look -
{
"title": "Pyongyang",
"connections": [
{
"title": "Same Author - Guy Delisle - 2",
"origin_item": {
"title": "Burma"
}
},
{
"title": "Same Author - Guy Delisle - 1",
"destination_item": {
"title": "Shenzen"
}
}
]
}
And this is how any record below depth 1 will look, with connection information (compare it with what we had before in initial) -
{
"title": "Same Author - Guy Delisle - 1",
"destination_item": {
"title": "Shenzen",
"connections": [
{
"title": "Same Author - Guy Delisle - 3",
"origin_item": {
"title": "Jerusalem"
}
}
]
}
}
At this point, all the json_conn's from the all the rows are ready to be merged by depth for their children and then wrapped in an array and added to the connections of their parent.
We can either build the hierarchy from top to bottom or from bottom to top. If we go with former option, we'll have to manipulate a lots of nested data, to avoid that, we'll do a bottom to top build. For that, we'll have to go from the maximum depth to the least and merge the records.
Towards Final Result
We'll add another recursive CTE, to build the hierarchy from bottom up, like so -
max_depth AS (
SELECT MAX(depth) max_depth FROM intermediate
),
final_result AS (
SELECT i.* FROM intermediate i, max_depth WHERE i.depth = max_depth
UNION
(SELECT
fr.depth - 1,
i.parent,
i.child_of,
CASE
WHEN fr.child_of = i.parent THEN
CASE
WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.destination_item.connections[*].destination_item')
THEN
CASE
WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,destination_item}')::jsonb
THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,destination_item}')::jsonb
THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
END
WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.destination_item.connections[*].origin_item')
THEN
CASE
WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,origin_item}')::jsonb
THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,origin_item}')::jsonb
THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
END
WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.origin_item.connections[*].origin_item')
THEN
CASE
WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,origin_item}')::jsonb
THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,origin_item}')::jsonb
THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
END
WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.origin_item.connections[*].destination_item')
THEN
CASE
WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,destination_item}')::jsonb
THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,destination_item}')::jsonb
THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
END
END
ELSE i.json_conn END AS json_conn
FROM final_result fr, intermediate i WHERE (fr.depth - 1) = i.depth)
)
That looks really nasty, however, what we're trying to do is, start from bottom, take that record, and check in our intermediate CTE to see where does the final_result.child_of matches intermediate.parent with a difference of 1 in depth, and for the json_conn column, check the membership with the position of the connection and add it there. All these nested CASE-WHEN-ELSE-END look terrible and could probably be reduced by using jsonb_path_exists, I didn't do it because, well, laziness.
With this query, we'll get the record(s) for depth 2 with all the children merged where they need to be merged, and all we need to do is aggregate it in an array and replace the connections value of the depth 1 record that we'd merged earlier. A simple final statement get the merged result will do the trick -
SELECT CASE WHEN max_depth > 1 THEN jsonb_set(i.json_conn, '{connections}', (SELECT jsonb_agg(json_conn) FROM final_result WHERE depth = 2)) ELSE i.json_conn END AS result FROM intermediate i, max_depth WHERE i.depth = 1 GROUP BY i.json_conn, max_depth;
Note the CASE here on max_depth, when the max_depth is 1, the final_result performs really minor calculations and results in null because there's nothing to merge and we'd already gotten our result in intermediate so we get the result conditionally.
Complete Script
You can find the complete script here. I couldn't add the complete script here because the markdown was giving me a hard time, with code formatting. Feel free to update the answer and add it here, if you know how to go about it. The script has hardcoded values for itemId and maxDepth from my testing.
Test
With this cyclic data in my tables -
// items table
id | title
--------------------------------------+-----------
908b18ea-6a98-45a5-bdf9-1dcf7f97f35c | Software
999c02fd-8112-4f01-864a-d29a423d1704 | Burma
79cd129f-ffd7-468e-9551-2eca974a6308 | Shenzen
cbf5e622-773b-4b42-8002-1be91cf4efda | Jerusalem
bc5ae91c-83b3-4381-a73c-08ed43d0a770 | Hostage
b7f06283-1ba0-4d79-8cda-6b3ee266d166 | Pyongyang
// connections table
id | origin_item_id | destination_item_id | title
--------------------------------------+--------------------------------------+--------------------------------------+-------------------------------
be5e99f2-e272-4d3a-ac0b-183805901ad9 | b7f06283-1ba0-4d79-8cda-6b3ee266d166 | 79cd129f-ffd7-468e-9551-2eca974a6308 | Same Author - Guy Delisle - 1
6a74d269-0573-4ad9-a66a-5e681032250e | 999c02fd-8112-4f01-864a-d29a423d1704 | b7f06283-1ba0-4d79-8cda-6b3ee266d166 | Same Author - Guy Delisle - 2
1bf720a6-27f0-4fca-b4a0-0f4ff34965eb | bc5ae91c-83b3-4381-a73c-08ed43d0a770 | cbf5e622-773b-4b42-8002-1be91cf4efda | Same Author - Guy Delisle - 4
8477cf0f-8a39-42d0-90bf-d9bfaa31a0a0 | cbf5e622-773b-4b42-8002-1be91cf4efda | 79cd129f-ffd7-468e-9551-2eca974a6308 | Same Author - Guy Delisle - 3
01e3b341-0ca3-4049-abdf-0c4d1ab0aa52 | cbf5e622-773b-4b42-8002-1be91cf4efda | 999c02fd-8112-4f01-864a-d29a423d1704 | Same Author - Guy Delisle - 5
d237bc88-1fa3-411f-baec-d003da193a51 | 999c02fd-8112-4f01-864a-d29a423d1704 | bc5ae91c-83b3-4381-a73c-08ed43d0a770 | Same Author - Guy Delisle - 6
And running the script attached (depth - 3), we get this -
{
"title": "Pyongyang",
"connections": [
{
"title": "Same Author - Guy Delisle - 2",
"origin_item": {
"title": "Burma",
"connections": [
{
"title": "Same Author - Guy Delisle - 6",
"destination_item": {
"title": "Hostage"
}
}
]
}
},
{
"title": "Same Author - Guy Delisle - 1",
"destination_item": {
"title": "Shenzen",
"connections": [
{
"title": "Same Author - Guy Delisle - 3",
"origin_item": {
"title": "Jerusalem",
"connections": [
{
"title": "Same Author - Guy Delisle - 5",
"destination_item": {
"title": "Burma"
}
}
]
}
}
]
}
}
]
}
Final script
WITH RECURSIVE connections_and_items AS (
SELECT
c.id AS connection_id,
c.title AS connection_title,
c.origin_item_id AS origin_item_id,
i.title AS origin_item_title,
c.destination_item_id AS destination_item_id,
it.title AS destination_item_title,
1 AS depth,
'b7f06283-1ba0-4d79-8cda-6b3ee266d166'::UUID AS parent,
gen_random_uuid() AS child_of
FROM connections c
JOIN items i
ON i.id = c.origin_item_id
JOIN items it
ON it.id = c.destination_item_id
), base_items AS (
SELECT
*
FROM connections_and_items cai
WHERE origin_item_id = 'b7f06283-1ba0-4d79-8cda-6b3ee266d166'::UUID OR destination_item_id = 'b7f06283-1ba0-4d79-8cda-6b3ee266d166'::UUID
), connected_items AS (
SELECT
*
FROM base_items
UNION
SELECT
cai.connection_id,
cai.connection_title,
cai.origin_item_id,
cai.origin_item_title,
cai.destination_item_id,
cai.destination_item_title,
ci.depth + 1 AS depth,
CASE WHEN ci.origin_item_id <> cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS parent,
CASE WHEN ci.origin_item_id = cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS child_of
FROM connections_and_items cai, connected_items ci
WHERE ci.depth < 3
AND ((ci.destination_item_id = cai.destination_item_id)
OR (ci.origin_item_id = cai.origin_item_id))
),
initial AS (
SELECT
DISTINCT ON (connection_id)
connection_title,
depth,
parent,
child_of,
jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.origin_item_title ELSE cr.destination_item_title END, 'connections', jsonb_agg(jsonb_build_object('title', cr.connection_title, CASE WHEN cr.parent <> cr.destination_item_id THEN 'destination_item' ELSE 'origin_item' END,jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.destination_item_title ELSE cr.origin_item_title END))) ) json_conn
FROM connected_items cr GROUP BY cr.connection_id, cr.connection_title, cr.origin_item_id, cr.origin_item_title, cr.destination_item_id, cr.destination_item_title, cr.depth, cr.parent, cr.child_of
), intermediate AS (
SELECT * FROM (SELECT
i.depth,
i.parent,
i.child_of,
CASE WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'destination_item',i.json_conn)
WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'destination_item',i.json_conn)
WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'origin_item',i.json_conn)
WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'origin_item',i.json_conn)
ELSE NULL
END AS json_conn
FROM initial i INNER JOIN initial im ON i.child_of = im.parent AND (i.depth - 1) = im.depth) AS _intermediate WHERE json_conn != 'null' UNION SELECT DISTINCT ON (i.depth, i.parent) i.depth, i.parent, i.child_of, jsonb_set(i.json_conn, '{connections}', gc.json_conn) FROM initial i, (SELECT depth, parent, jsonb_agg(elems) AS json_conn FROM initial, jsonb_array_elements(initial.json_conn->'connections') AS elems WHERE initial.depth = 1 GROUP BY initial.depth, initial.parent) AS gc WHERE i.depth = 1),
max_depth AS (
SELECT MAX(depth) max_depth FROM intermediate
),
final_result AS (
SELECT i.* FROM intermediate i, max_depth WHERE i.depth = max_depth
UNION
(SELECT
fr.depth - 1,
i.parent,
i.child_of,
CASE
WHEN fr.child_of = i.parent THEN
CASE
WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.destination_item.connections[*].destination_item')
THEN
CASE
WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,destination_item}')::jsonb
THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,destination_item}')::jsonb
THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
END
WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.destination_item.connections[*].origin_item')
THEN
CASE
WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,origin_item}')::jsonb
THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,origin_item}')::jsonb
THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
END
WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.origin_item.connections[*].origin_item')
THEN
CASE
WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,origin_item}')::jsonb
THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,origin_item}')::jsonb
THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
END
WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.origin_item.connections[*].destination_item')
THEN
CASE
WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,destination_item}')::jsonb
THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,destination_item}')::jsonb
THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
END
END
ELSE i.json_conn END AS json_conn
FROM final_result fr, intermediate i WHERE (fr.depth - 1) = i.depth)
) SELECT CASE WHEN max_depth > 1 THEN jsonb_set(i.json_conn, '{connections}', (SELECT jsonb_agg(json_conn) FROM final_result WHERE depth = 2)) ELSE i.json_conn END AS result FROM intermediate i, max_depth WHERE i.depth = 1 GROUP BY i.json_conn, max_depth;
maxDepthor just want the query to go as deep as it can without providing any depth?maxDepthas a parameter, not hardcoded. I think wrapping it in a PSQL function would be better, I kind of forgot about that when I asked the question.