1

(I had originally asked this question on the DBA StackExchange, but didn't get much activity in it).

In my DB, I have tables for items and connections between them, creating a likely sparse graph, with islands:

CREATE TABLE IF NOT EXISTS items (
    --------------------------------------------------------
    id    UUID         NOT NULL DEFAULT uuid_generate_v4(),
    --------------------------------------------------------
    title VARCHAR(300) NOT NULL,
    --------------------------------------------------------
    CONSTRAINT     items_pk
       PRIMARY KEY (id)
    --------------------------------------------------------
);

CREATE TABLE IF NOT EXISTS connections (
    ---------------------------------------------------------------------
    id                  UUID         NOT NULL DEFAULT uuid_generate_v4(),
    ---------------------------------------------------------------------
    origin_item_id      UUID         NOT NULL,
    destination_item_id UUID         NOT NULL,
    ---------------------------------------------------------------------
    title               VARCHAR(300) NOT NULL,
    ---------------------------------------------------------------------
    CONSTRAINT     origin_item_fk
       FOREIGN KEY (origin_item_id)
        REFERENCES items (id),
    CONSTRAINT     destination_item_fk
       FOREIGN KEY (destination_item_id)
        REFERENCES items (id),
    CONSTRAINT     connections_pk
       PRIMARY KEY (id)
    ---------------------------------------------------------------------
);

I'm using node-pg, and I would like to do a breadth traversal from an inital seed (e.g. url parameter on ExpressJS), up to a certain depth (from what I see in the docs, the depth part is probably the easiest).

Ideally, I would like things to be returned in a nested way, but, from reading these docs (WITH queries), I couldn't figure out how, or if it is even possible.

INSERT INTO items (title)
VALUES ('Software'), ('Pyongyang'), ('Burma'), ('Shenzhen'), ('Jerusalem'), ('Hostage');

INSERT INTO connections (origin_item_id, destination_item_id, title)
VALUES
    (
        (SELECT id FROM items WHERE title ~ 'Pyongyang'),
        (SELECT id FROM items WHERE title ~ 'Shenzhen'),
        'Same Author - Guy Delisle - 1'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Burma'),
        (SELECT id FROM items WHERE title ~ 'Pyongyang'),
        'Same Author - Guy Delisle - 2'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Jerusalem'),
        (SELECT id FROM items WHERE title ~ 'Shenzhen'),
        'Same Author - Guy Delisle - 3'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Hostage'),
        (SELECT id FROM items WHERE title ~ 'Jerusalem'),
        'Same Author - Guy Delisle - 4'
    ),

The recursive query would then result in, for Pyongyang and maxDepth = 2 — note neither Hostage nor Software should appear then —:

{
  "title": "Pyongyang",
  "connections": [
    {
      "title": "Same Author - Guy Delisle - 1",
      "destination_item": {
        "title": "Shenzhen",
        "connections": [
          {
            "title": "Same Author - Guy Delisle - 2",
            "origin_item": {
              "title": "Jerusalem"
            }
          }
        ]
      }
    },
    {
      "title": "Same Author - Guy Delisle - 2",
      "origin_item": {
        "title": "Burma"
      }
    }
  ]
}

(I'm not 100% sure if I've covered all the cases with this example.)

Is this type of nesting even possible in PostgreSQL (with, say, jsonb_build_object or jsonb_agg)? How would I do it?

Here is what I've been able to do:

WITH RECURSIVE connections_and_items AS (
    SELECT
         c.id AS connection_id,
         c.title AS connection_title,
         c.origin_item_id,
         i.title AS origin_item_title,
         c.destination_item_id,
        it.title AS destination_item_title

    FROM connections c
    JOIN items       i
      ON  i.id = c.origin_item_id
    JOIN items       it
      ON it.id = c.destination_item_id
),
connected_items AS (
    SELECT *
    FROM connections_and_items
    WHERE origin_item_id = '${itemId}'::UUID

    UNION 
        
    SELECT c.*
    FROM connections_and_items c
    JOIN connected_items       ci
      ON ci.destination_item_id = c.origin_item_id
      OR ci.origin_item_id      = c.destination_item_id
) 

SELECT 
    ci.connection_id, 
    ci.connection_title,
    jsonb_build_object(
        'id',    ci.origin_item_id,
        'title', ci.origin_item_title
    ) origin_item,
    jsonb_build_object(
        'id',    ci.destination_item_id,
        'title', ci.destination_item_title
    ) destination_item

FROM connected_items ci

This actually does traverse the graph successfully, in both directions (!). However, I haven't been able to add a maxDepth constraint (having trouble detecting cycles), and the nesting is still not there.

So, in summary, here's what I have and what I'm missing:

  • ✔ Recursive Breadth Traversal
  • ✗ Max Depth
  • ✗ Nested JSON
  • ✔ Filtering the Initial Seed

References

7
  • your example for the sample query result is pretty different from what you're trying to do with your query, would either of the formatted result work for you? Also, do you want to be able to input a maxDepth or just want the query to go as deep as it can without providing any depth? Commented Aug 23, 2023 at 18:04
  • You mean the query I was able to come up with is wrong or is it only the formatting? If it's the formatting, then it's true that it's missing, but that's kind of the core of the question. If it's wrong, could you please briefly describe what's wrong? Commented Aug 23, 2023 at 18:53
  • I would rather be able to input it maxDepth as a parameter, not hardcoded. I think wrapping it in a PSQL function would be better, I kind of forgot about that when I asked the question. Commented Aug 23, 2023 at 18:55
  • Would there be cyclic connections in the data? Commented Aug 24, 2023 at 2:23
  • 1
    You should really wait for the end of the bounty period to award it. You never know when you might get another, possibly better answer. Commented Aug 26, 2023 at 22:45

4 Answers 4

2

You said "(I'm not 100% sure if I've covered all the cases with this example.)": and you are right, the JSON format you try to achieve is ambiguous when an item is origin or destination of several connections.

Your recursive query can be the basis of a JSON like the one below: the main difference being that there is no mixing of nodes item/connection in the hierarchy traversal, which is the main difficulty of the format you try to achieve where you go from item (while your query is connection-driven) then go the connections and then back to items, and furthermore you have 2 hierarchy: the "from" and the "to" ones...

You recursive query leads easily to the following kind of JSON: mainly after the recursive query, you need another CTE to calculate LAG and LEAD levels in the hierarchy + the JSON of the item node, and the final query will add the "children" node and use the lead/lag infos to close the hierarchy correctly, e.g. the correct number of "]}" required to close the "children" node):

[
  {
    "conn_title" : "Same Author - Guy Delisle - 1",
    "org_title" : "Pyongyang",
    "dest_title" : "Shenzhen",
    "children" :
    [
      {
        "conn_title" : "Same Author - Guy Delisle - 2",
        "org_title" : "Burma",
        "dest_title" : "Pyongyang",
        "children" :
        [
          {
            "conn_title" : "Same Author - Guy Delisle - 1",
            "org_title" : "Pyongyang",
            "dest_title" : "Shenzhen"
          }
        ]
      },
      {
        "conn_title" : "Same Author - Guy Delisle - 3",
        "org_title" : "Jerusalem",
        "dest_title" : "Shenzhen",
        "children" :
        [
          {
            "conn_title" : "Same Author - Guy Delisle - 1",
            "org_title" : "Pyongyang",
            "dest_title" : "Shenzhen"
          },
          {
            "conn_title" : "Same Author - Guy Delisle - 4",
            "org_title" : "Hostage",
            "dest_title" : "Jerusalem",
            "children" :
            [
              {
                "conn_title" : "Same Author - Guy Delisle - 3",
                "org_title" : "Jerusalem",
                "dest_title" : "Shenzhen"
              }
            ]
          }
        ]
      }
    ]
  }
]

As requested, the code but don't forget it's for ORACLE, not PostgreSQL:

with items(id, title) as (
    select 1, 'Software' from dual union all
    select 2, 'Pyongyang' from dual union all
    select 3, 'Burma' from dual union all
    select 4, 'Shenzhen' from dual union all
    select 5, 'Jerusalem' from dual union all
    select 6, 'Hostage' from dual -- union all
),
connections(id, org_item_id, dest_item_id, title) as (
    select 1, 2, 4, 'Same Author - Guy Delisle - 1' from dual union all
    select 2, 3, 2, 'Same Author - Guy Delisle - 2' from dual union all
    select 3, 5, 4, 'Same Author - Guy Delisle - 3' from dual union all
    select 4, 6, 5, 'Same Author - Guy Delisle - 4' from dual -- union all
),
connections_and_items as (
    select 
        c.id as conn_id,
        c.title AS conn_title,
        c.org_item_id,
        ito.title AS org_title,
        c.dest_item_id,
        itd.title AS dest_title
    from connections c
    join items ito on ito.id = c.org_item_id
    join items itd on itd.id = c.dest_item_id
),
connected_items(lvl, parent_id,
    conn_id, conn_title, org_item_id, org_title, dest_item_id, dest_title
) as (
    SELECT 1, null,
        c.conn_id, c.conn_title, 
        c.org_item_id, c.org_title, 
        c.dest_item_id, c.dest_title
    FROM connections_and_items c
    WHERE org_item_id = 2
    UNION ALL   
    SELECT lvl+1, ci.conn_id,
        c.conn_id, c.conn_title, 
        c.org_item_id, c.org_title, 
        c.dest_item_id, c.dest_title
    FROM connections_and_items c
    JOIN connected_items ci
      ON 
        ci.org_item_id      = c.dest_item_id
      OR 
        ci.dest_item_id     = c.org_item_id
      OR 
        (ci.org_item_id     = c.org_item_id AND c.conn_id <> ci.conn_id)
      OR 
        (ci.dest_item_id     = c.dest_item_id AND c.conn_id <> ci.conn_id)
    where lvl < 10 -- set the max as desired
)
search depth first by conn_id set rn
CYCLE org_item_id, dest_item_id SET is_loop TO 1 DEFAULT 0
, rel_hier_with_leadlag AS (
  SELECT r.*
    , LAG(lvl) OVER(ORDER BY rn) AS lag_lvl
    , LEAD(lvl,1) OVER(ORDER BY rn) AS llead_lvl -- we need to know which the latest node
    , LEAD(lvl,1,1) OVER(ORDER BY rn) AS lead_lvl  -- for the latest node we need to use 1 instead of NULL
    , JSON_OBJECT(
      -- 'conn_id'         VALUE conn_id, 
      'conn_title'      VALUE conn_title,
      -- 'org_item_id'     VALUE org_item_id,
      'org_title'       VALUE org_title,
      -- 'dest_item_id'    VALUE dest_item_id,
      'dest_title'      VALUE dest_title
      ABSENT ON NULL
      RETURNING CLOB
    ) js
    FROM connected_items r WHERE is_loop = 0
)
select
  JSON_QUERY(   
     XMLCAST(   
        XMLAGG(
            XMLELEMENT(e,
                  CASE WHEN rn = 1 THEN '[' END ||
                  CASE
                    WHEN lvl - lag_lvl = 1 THEN ',"children":['    -- Level incremented by one, so child level, start array
                    WHEN lvl > 1 then ','                          -- appending when not first level
                    WHEN rn>1 AND parent_id IS NULL THEN ','      -- appending when a root node but not the first one
                  END ||
                  SUBSTR(js, 1, LENGTH(js) - 1) ||                  -- remove last brace, as we are controlling children
                  CASE
                    WHEN lvl >= lead_lvl then '}' ||               -- Level same or greater than next level, so close json_object
                         RPAD(' ', (lvl - lead_lvl) * 2 + 1, ']}') -- and add as many closing array / object blocks as required
                  END ||
                  CASE WHEN llead_lvl IS NULL THEN ']' END          -- when the latest node 
                )
                ORDER BY rn
            )
    AS CLOB
    )
  , '$' RETURNING CLOB PRETTY
) 
    as js
from rel_hier_with_leadlag ;
Sign up to request clarification or add additional context in comments.

2 Comments

well, for me it's not "new" code: I use it (the "algorithm") in another context for my own needs... and furthermore I have no idea about the amount of work required to convert it from ORACLE to PostgreSQL... (it's using JSON_OBJECT, JSON_QUERY, XMLCAST, XMLAGG, XMLELEMENT in the final SELECT and also SEARCH DEPTH FIRST and CYCLE on the recursive CTE) and much less to adapt it to output the desired JSON format (and I have doubt about the feasibility of this part) dbfiddle.uk/xcEdeBnI
@p3consulting put the code in the link in the answer, plz
1

Very late to the party, but I think the easiest way to solve this is with a recursive function. This can take parameters of the starting node and the maximum depth to search to. This function also has a third input of seen nodes to prevent cycling along any branch of the tree:

CREATE OR REPLACE FUNCTION build_node(node INT, max_depth INT, seen INT[] default ARRAY[]::INT[])
RETURNS JSONB
AS $$
DECLARE 
  _conns JSONB;
BEGIN
  IF max_depth = 0 THEN
    RETURN jsonb_build_object(
      'title', (SELECT title FROM items WHERE id = node)
    );
  END IF;
  seen = ARRAY_APPEND(seen, node);
  SELECT jsonb_agg(json_build_object('title', title, 
    CASE WHEN origin_item_id = node THEN 'destination_item' ELSE 'origin_item' END,
    build_node(CASE WHEN origin_item_id = node THEN destination_item_id ELSE origin_item_id END, max_depth - 1, seen)
    )
  ) INTO _conns
  FROM connections
  WHERE origin_item_id = node AND NOT destination_item_id = ANY(seen)
     OR destination_item_id = node AND NOT origin_item_id = ANY(seen);
  IF jsonb_array_length(_conns) > 0 THEN
    RETURN jsonb_build_object(
      'title', (SELECT title FROM items WHERE id = node),
      'connections', _conns
    );
  ELSE
    RETURN jsonb_build_object(
      'title', (SELECT title FROM items WHERE id = node)
    );
  END IF;
END;
$$
LANGUAGE plpgsql

Sample usage:

SELECT build_node(2, 2)

Output (for your sample data):

{
    "title": "Pyongyang",
    "connections": [{
        "title": "Same Author - Guy Delisle - 1",
        "destination_item": {
            "title": "Shenzhen",
            "connections": [{
                "title": "Same Author - Guy Delisle - 3",
                "origin_item": {
                    "title": "Jerusalem"
                }
            }]
        }
    }, {
        "title": "Same Author - Guy Delisle - 2",
        "origin_item": {
            "title": "Burma"
        }
    }]
}

Demo on dbfiddle

2 Comments

Damn. If I could, I would create a second bounty just for this answer. Thanks a lot, @Nick!
@PhilippeFanaro no worries at all - I'm glad you found it useful.
1

For a maximum depth restriction try the following logic (i.e. DIY level counter)

WITH RECURSIVE t(depth) AS (
    SELECT 1
    UNION ALL
    SELECT depth+1 FROM t WHERE depth < 10 -- specify max depth here
)
SELECT n FROM t;

e.g:

WITH RECURSIVE connections_and_items AS (
    SELECT
         c.id AS connection_id,
         c.title AS connection_title,
         c.origin_item_id,
         i.title AS origin_item_title,
         c.destination_item_id,
         it.title AS destination_item_title,
         1 AS depth,
         c.origin_item_id AS parent_id
    FROM connections c
    JOIN items       i
      ON  i.id = c.origin_item_id
    JOIN items       it
      ON it.id = c.destination_item_id
    WHERE c.origin_item_id = '${itemId}'::UUID

    UNION 
        
    SELECT 
        c.connection_id,
        c.connection_title,
        c.origin_item_id,
        c.origin_item_title,
        c.destination_item_id,
        c.destination_item_title,
        ci.depth + 1 AS depth,
        ci.connection_id AS parent_id
    FROM connections_and_items c
    JOIN connected_items       ci
      ON ci.destination_item_id = c.origin_item_id
      OR ci.origin_item_id      = c.destination_item_id
    WHERE ci.depth < ${maxDepth}
)

and (untested) this may help on the nested JSON:

SELECT 
    ci.connection_id, 
    json_build_object(
        'title', ci.connection_title,
        'origin_item', json_build_object(
            'id',    ci.origin_item_id,
            'title', ci.origin_item_title
        ),
        'destination_item', json_build_object(
            'id',    ci.destination_item_id,
            'title', ci.destination_item_title
        ),
        'children', json_agg(ci) FILTER (WHERE ci.parent_id = c.connection_id)
    ) connection
FROM connected_items ci
GROUP BY ci.connection_id,
    ci.connection_title,
    ci.origin_item_id,
    ci.origin_item_title,
    ci.destination_item_id,
    ci.destination_item_title

1 Comment

I think I did try your suggestion about depth. It didn't quite work because it would just get lost in cycles. I then tried everything in section 7.8.2.2. Cycle Detection of the PostgreSQL docs, but with no success.
1
+100

A little late to answer but, to be able to build such a hierarchical structure with nodes where each one might be connected with other and there could be cycles, one would have to keep track of what goes in what place, as also pointed out by @p3consulting -

mainly after the recursive query, you need another CTE to calculate LAG and LEAD levels in the hierarchy + the JSON of the item node, and the final query will add the "children" node and use the lead/lag infos to close the hierarchy correctly

In order to do that (keep a track of what goes where), we'll have to attach some meta data to a row, or calculate it. I approached it with adding some meta data, essentially, which row is a parent to what row and what's the depth at which a particular row would exist. With that meta data being added, we can use it to build the hierarchy incrementally.


Few Things to Note

  • This script works under the assumption that, although cycles may be there in the data, we only wish to show all the nodes reached in our hierarchy without any cycles. To wit, there won't be same hierarchical data at any different depths. I'll talk about it further in the answer as well.
  • This script works for only this scenario (when there'll be at most 2 connections with each node) and doesn't use a generalized approach, so to say. However, it's easy to take hints from it to extend it further.
  • Snippets of scripts are added in-between the explanation, might not work when used standalone, so refer to the final script for that.
  • This approach seems pretty dirty.

The Approach

Recursive Breadth Traversal with Max Depth

In order to do a recursive breadth traversal with max depth, I think it's better to always look in both the directions so we can modify OP's script to this -

WITH RECURSIVE connections_and_items AS (
    SELECT
        c.id AS connection_id,
        c.title AS connection_title,
        c.origin_item_id AS origin_item_id,
        i.title AS origin_item_title,
        c.destination_item_id AS destination_item_id,
        it.title AS destination_item_title,
        1 AS depth,
        `${itemId}`::UUID AS parent,
        gen_random_uuid() AS child_of
    FROM connections c
    JOIN items i
     ON i.id = c.origin_item_id
    JOIN items it
     ON it.id = c.destination_item_id
), base_items AS (
    SELECT 
        *
    FROM connections_and_items cai
    WHERE origin_item_id = `${itemId}`::UUID OR destination_item_id = `${itemId}`::UUID
), connected_items AS (
    SELECT 
        *
    FROM base_items
    
    UNION
    
    SELECT
        cai.connection_id,
        cai.connection_title,
        cai.origin_item_id,
        cai.origin_item_title,
        cai.destination_item_id,
        cai.destination_item_title,
        ci.depth + 1 AS depth,
        CASE WHEN ci.origin_item_id <> cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS parent,
        CASE WHEN ci.origin_item_id = cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS child_of
    FROM connections_and_items cai, connected_items ci
    WHERE ci.depth < `${maxDepth}`
    AND ((ci.destination_item_id = cai.destination_item_id)
    OR (ci.origin_item_id = cai.origin_item_id))
    
)

Where base_items is the very first result from the query and then in connected_items we do a recursive union query. Here, depth is, well the depth, child_of is the UUID of the row which the record will be a child of, and parent is going to be the UUID of the parent, in the record. For now the definition might seem ambiguous, but it'll become clearer later on.

Build Initial Objects Record

For all the rows that we've from our query, we'll build an initial jsonb object for them, that's going to go in the result, to a column called json_conn, adding another CTE, taking advantage of the fact that if parent is not equal to the destination_item_id we know that the origin_item_title would be the title and vice-versa -

initial AS (
    SELECT
    DISTINCT ON (connection_id)
             connection_title,
             depth,
             parent,
             child_of,
             jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.origin_item_title ELSE cr.destination_item_title END, 'connections', jsonb_agg(jsonb_build_object('title', cr.connection_title, CASE WHEN cr.parent <> cr.destination_item_id THEN 'destination_item' ELSE 'origin_item' END,jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.destination_item_title ELSE cr.origin_item_title END))) ) json_conn
    FROM connected_items cr GROUP BY cr.connection_id, cr.connection_title, cr.origin_item_id, cr.origin_item_title, cr.destination_item_id, cr.destination_item_title, cr.depth, cr.parent, cr.child_of
)

With this, we'll get another column json_conn where a value would look like -

 {
    "title": "Shenzen",
    "connections": [
      {
        "title": "Same Author - Guy Delisle - 3",
        "origin_item": {
          "title": "Jerusalem"
        }
      }
    ]
  }

Depicting the relation, between destination and origin, keeping the order in which the nodes were scanned.

Note that we're doing a DISTINCT ON (connection_id) to remove any duplicate connections, and hence remove cycles, if we don't add this, the result might become unexpected, because with only information to keep track of nodes being the depth and parent, the script gets confused. If a hierarchy where all the duplicate cycles are added too is expected, we'll have to add an FQP (Fully Qualified Path) metadata to keep a track.

With this, we'll get non-duplicate connections, with each depth existing at most twice.

Update Initial Record Objects

Our initial object records show the relationship between parent and child, however, we don't have the name of the connection for where the parent is coming from, so we'll add that like so -

intermediate AS (
SELECT * FROM (SELECT 
    i.depth, 
    i.parent, 
    i.child_of,
    CASE WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'destination_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'destination_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'origin_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'origin_item',i.json_conn)
    ELSE NULL
    END AS json_conn
FROM initial i INNER JOIN initial im ON i.child_of = im.parent AND (i.depth - 1) = im.depth) AS _intermediate WHERE json_conn != 'null' UNION SELECT DISTINCT ON (i.depth, i.parent) i.depth, i.parent, i.child_of, jsonb_set(i.json_conn, '{connections}', gc.json_conn) FROM initial i, (SELECT depth, parent, jsonb_agg(elems) AS json_conn FROM initial, jsonb_array_elements(initial.json_conn->'connections') AS elems WHERE initial.depth = 1 GROUP BY initial.depth, initial.parent) AS gc WHERE i.depth = 1)

We make an inner join on the intial CTE with condition to match that a child_of matches a parent where the difference in depth is 1, from there, we can determine, whether the parent was scanned as an origin_item or a destination_item. We also, merge the 2 records that we've for the depth 1, because we can at this point, and it won't cause no harm This is how the record at depth 1 will look -

{
  "title": "Pyongyang",
  "connections": [
    {
      "title": "Same Author - Guy Delisle - 2",
      "origin_item": {
        "title": "Burma"
      }
    },
    {
      "title": "Same Author - Guy Delisle - 1",
      "destination_item": {
        "title": "Shenzen"
      }
    }
  ]
}

And this is how any record below depth 1 will look, with connection information (compare it with what we had before in initial) -

{
  "title": "Same Author - Guy Delisle - 1",
  "destination_item": {
    "title": "Shenzen",
    "connections": [
      {
        "title": "Same Author - Guy Delisle - 3",
        "origin_item": {
          "title": "Jerusalem"
        }
      }
    ]
  }
}

At this point, all the json_conn's from the all the rows are ready to be merged by depth for their children and then wrapped in an array and added to the connections of their parent.

We can either build the hierarchy from top to bottom or from bottom to top. If we go with former option, we'll have to manipulate a lots of nested data, to avoid that, we'll do a bottom to top build. For that, we'll have to go from the maximum depth to the least and merge the records.

Towards Final Result

We'll add another recursive CTE, to build the hierarchy from bottom up, like so -

max_depth AS (
    SELECT MAX(depth) max_depth FROM intermediate
),
final_result AS (
    
    SELECT i.*  FROM intermediate i, max_depth WHERE i.depth = max_depth
    
    UNION
    
    (SELECT
        fr.depth - 1,
        i.parent,
        i.child_of,
        CASE 
          WHEN fr.child_of = i.parent THEN
            CASE 
              WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.destination_item.connections[*].destination_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.destination_item.connections[*].origin_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.origin_item.connections[*].origin_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.origin_item.connections[*].destination_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
                  END
            END
          ELSE i.json_conn END AS json_conn
    FROM final_result fr, intermediate i WHERE (fr.depth - 1) = i.depth)
    
)

That looks really nasty, however, what we're trying to do is, start from bottom, take that record, and check in our intermediate CTE to see where does the final_result.child_of matches intermediate.parent with a difference of 1 in depth, and for the json_conn column, check the membership with the position of the connection and add it there. All these nested CASE-WHEN-ELSE-END look terrible and could probably be reduced by using jsonb_path_exists, I didn't do it because, well, laziness.

With this query, we'll get the record(s) for depth 2 with all the children merged where they need to be merged, and all we need to do is aggregate it in an array and replace the connections value of the depth 1 record that we'd merged earlier. A simple final statement get the merged result will do the trick -

SELECT CASE WHEN max_depth > 1 THEN jsonb_set(i.json_conn, '{connections}', (SELECT jsonb_agg(json_conn) FROM final_result WHERE depth = 2)) ELSE i.json_conn END AS result FROM intermediate i, max_depth WHERE i.depth = 1 GROUP BY i.json_conn, max_depth;

Note the CASE here on max_depth, when the max_depth is 1, the final_result performs really minor calculations and results in null because there's nothing to merge and we'd already gotten our result in intermediate so we get the result conditionally.

Complete Script

You can find the complete script here. I couldn't add the complete script here because the markdown was giving me a hard time, with code formatting. Feel free to update the answer and add it here, if you know how to go about it. The script has hardcoded values for itemId and maxDepth from my testing.

Test

With this cyclic data in my tables -

// items table
                  id                  |   title   
--------------------------------------+-----------
 908b18ea-6a98-45a5-bdf9-1dcf7f97f35c | Software
 999c02fd-8112-4f01-864a-d29a423d1704 | Burma
 79cd129f-ffd7-468e-9551-2eca974a6308 | Shenzen
 cbf5e622-773b-4b42-8002-1be91cf4efda | Jerusalem
 bc5ae91c-83b3-4381-a73c-08ed43d0a770 | Hostage
 b7f06283-1ba0-4d79-8cda-6b3ee266d166 | Pyongyang

// connections table

                  id                  |            origin_item_id            |         destination_item_id          |             title             
--------------------------------------+--------------------------------------+--------------------------------------+-------------------------------
 be5e99f2-e272-4d3a-ac0b-183805901ad9 | b7f06283-1ba0-4d79-8cda-6b3ee266d166 | 79cd129f-ffd7-468e-9551-2eca974a6308 | Same Author - Guy Delisle - 1
 6a74d269-0573-4ad9-a66a-5e681032250e | 999c02fd-8112-4f01-864a-d29a423d1704 | b7f06283-1ba0-4d79-8cda-6b3ee266d166 | Same Author - Guy Delisle - 2
 1bf720a6-27f0-4fca-b4a0-0f4ff34965eb | bc5ae91c-83b3-4381-a73c-08ed43d0a770 | cbf5e622-773b-4b42-8002-1be91cf4efda | Same Author - Guy Delisle - 4
 8477cf0f-8a39-42d0-90bf-d9bfaa31a0a0 | cbf5e622-773b-4b42-8002-1be91cf4efda | 79cd129f-ffd7-468e-9551-2eca974a6308 | Same Author - Guy Delisle - 3
 01e3b341-0ca3-4049-abdf-0c4d1ab0aa52 | cbf5e622-773b-4b42-8002-1be91cf4efda | 999c02fd-8112-4f01-864a-d29a423d1704 | Same Author - Guy Delisle - 5
 d237bc88-1fa3-411f-baec-d003da193a51 | 999c02fd-8112-4f01-864a-d29a423d1704 | bc5ae91c-83b3-4381-a73c-08ed43d0a770 | Same Author - Guy Delisle - 6

And running the script attached (depth - 3), we get this -

{
  "title": "Pyongyang",
  "connections": [
    {
      "title": "Same Author - Guy Delisle - 2",
      "origin_item": {
        "title": "Burma",
        "connections": [
          {
            "title": "Same Author - Guy Delisle - 6",
            "destination_item": {
              "title": "Hostage"
            }
          }
        ]
      }
    },
    {
      "title": "Same Author - Guy Delisle - 1",
      "destination_item": {
        "title": "Shenzen",
        "connections": [
          {
            "title": "Same Author - Guy Delisle - 3",
            "origin_item": {
              "title": "Jerusalem",
              "connections": [
                {
                  "title": "Same Author - Guy Delisle - 5",
                  "destination_item": {
                    "title": "Burma"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}

Final script

WITH RECURSIVE connections_and_items AS (
    SELECT
        c.id AS connection_id,
        c.title AS connection_title,
        c.origin_item_id AS origin_item_id,
        i.title AS origin_item_title,
        c.destination_item_id AS destination_item_id,
        it.title AS destination_item_title,
        1 AS depth,
        'b7f06283-1ba0-4d79-8cda-6b3ee266d166'::UUID AS parent,
        gen_random_uuid() AS child_of
    FROM connections c
    JOIN items i
     ON i.id = c.origin_item_id
    JOIN items it
     ON it.id = c.destination_item_id
), base_items AS (
    SELECT 
        *
    FROM connections_and_items cai
    WHERE origin_item_id = 'b7f06283-1ba0-4d79-8cda-6b3ee266d166'::UUID OR destination_item_id = 'b7f06283-1ba0-4d79-8cda-6b3ee266d166'::UUID
), connected_items AS (
    SELECT 
        *
    FROM base_items
    
    UNION
    
    SELECT
        cai.connection_id,
        cai.connection_title,
        cai.origin_item_id,
        cai.origin_item_title,
        cai.destination_item_id,
        cai.destination_item_title,
        ci.depth + 1 AS depth,
        CASE WHEN ci.origin_item_id <> cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS parent,
        CASE WHEN ci.origin_item_id = cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS child_of
    FROM connections_and_items cai, connected_items ci
    WHERE ci.depth < 3
    AND ((ci.destination_item_id = cai.destination_item_id)
    OR (ci.origin_item_id = cai.origin_item_id))
    
),
initial AS (
    SELECT
    DISTINCT ON (connection_id)
             connection_title,
             depth,
             parent,
             child_of,
             jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.origin_item_title ELSE cr.destination_item_title END, 'connections', jsonb_agg(jsonb_build_object('title', cr.connection_title, CASE WHEN cr.parent <> cr.destination_item_id THEN 'destination_item' ELSE 'origin_item' END,jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.destination_item_title ELSE cr.origin_item_title END))) ) json_conn
    FROM connected_items cr GROUP BY cr.connection_id, cr.connection_title, cr.origin_item_id, cr.origin_item_title, cr.destination_item_id, cr.destination_item_title, cr.depth, cr.parent, cr.child_of
), intermediate AS (
SELECT * FROM (SELECT 
    i.depth, 
    i.parent, 
    i.child_of,
    CASE WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'destination_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'destination_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'origin_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'origin_item',i.json_conn)
    ELSE NULL
    END AS json_conn
FROM initial i INNER JOIN initial im ON i.child_of = im.parent AND (i.depth - 1) = im.depth) AS _intermediate WHERE json_conn != 'null' UNION SELECT DISTINCT ON (i.depth, i.parent) i.depth, i.parent, i.child_of, jsonb_set(i.json_conn, '{connections}', gc.json_conn) FROM initial i, (SELECT depth, parent, jsonb_agg(elems) AS json_conn FROM initial, jsonb_array_elements(initial.json_conn->'connections') AS elems WHERE initial.depth = 1 GROUP BY initial.depth, initial.parent) AS gc WHERE i.depth = 1),
max_depth AS (
    SELECT MAX(depth) max_depth FROM intermediate
),
final_result AS (
    
    SELECT i.*  FROM intermediate i, max_depth WHERE i.depth = max_depth
    
    UNION
    
    (SELECT
        fr.depth - 1,
        i.parent,
        i.child_of,
        CASE 
          WHEN fr.child_of = i.parent THEN
            CASE 
              WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.destination_item.connections[*].destination_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.destination_item.connections[*].origin_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.origin_item.connections[*].origin_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.origin_item.connections[*].destination_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
                  END
            END
          ELSE i.json_conn END AS json_conn
    FROM final_result fr, intermediate i WHERE (fr.depth - 1) = i.depth)
    
) SELECT CASE WHEN max_depth > 1 THEN jsonb_set(i.json_conn, '{connections}', (SELECT jsonb_agg(json_conn) FROM final_result WHERE depth = 2)) ELSE i.json_conn END AS result FROM intermediate i, max_depth WHERE i.depth = 1 GROUP BY i.json_conn, max_depth;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.