2

I have an edges table in my PostgreSQL database that represents the edges of a directed graph, with two columns: node_from and node_to (value is a node's id).

Given a single node (initial_node) I'd like to be able to traverse the entire graph, but in an undirected way.

What I mean is, for instance for the following graph :

(a->b)
(c->b)
(c->d)

If initial_node is a, b, c, or d, in any case, I would get [a, b, c, d].

I used the following SQL query (based on http://www.postgresql.org/docs/8.4/static/queries-with.html ):

WITH RECURSIVE search_graph(uniq, depth, path, cycle) AS (
        SELECT
          CASE WHEN g.node_from = 'initial_node' THEN g.node_to ELSE g.node_from END,
          1,
          CASE WHEN g.node_from = 'initial_node' THEN ARRAY[g.node_from] ELSE ARRAY[g.node_to] END,
          false
        FROM edges g
        WHERE 'initial_node' in (node_from, node_to)
      UNION ALL
        SELECT
          CASE WHEN g.node_from = sg.uniq THEN g.node_to ELSE g.node_from END,
          sg.depth + 1,
          CASE WHEN g.node_from = sg.uniq THEN path || g.node_from ELSE path || g.node_to END,
          g.node_to = ANY(path) OR g.node_from = ANY(path)
        FROM edges g, search_graph sg
        WHERE sg.uniq IN (g.node_from, g.node_to) AND NOT cycle
)
SELECT * FROM search_graph

It worked fine... Until I had a case with 12 nodes that are all connected together, in all directions (for each pair I have both (a->b) and (b->a)), which makes the query loops indefinitely. (Changing UNION ALL to UNION doesn't eliminate the looping.)

Has anyone any piece of advice to handle this issue?

Cheers,

Antoine.

1 Answer 1

1

I got to this, it should not get into infinite loops with any kind of data:

--create temp table edges ("from" text, "to" text);
--insert into edges values ('initial_node', 'a'), ('a', 'b'), ('a', 'c'), ('c', 'd');

with recursive graph(points) as (
  select array(select distinct "to" from edges where "from" = 'initial_node')
  union all
  select g.points || e1.p || e2.p
  from graph g
  left join lateral (
    select array(
      select distinct "to"
      from edges 
      where "from" =any(g.points) and "to" <>all(g.points) and "to" <> 'initial_node') AS p) e1 on (true)
  left join lateral (
    select array(
      select distinct "from"
      from edges 
      where "to" =any(g.points) and "from" <>all(g.points) and "from" <> 'initial_node') AS p) e2 on (true)
  where e1.p <> '{}' OR e2.p <> '{}'
  )
select distinct unnest(points)
from graph
order by 1

Recursive queries are very limiting in terms of what can be selected, and since they don't allow using the recursive results inside a subselect, one can't use NOT IN (select * from recursive where...). Storing results in an array, using LEFT JOIN LATERAL and using =ANY() and <>ALL() solved this conundrum.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks a lot Ziggy ! The query doesn't loop indefinitely for the most complex cases and even for simple graphs, it's twice as fast as my previous implementation.
Just to add something, in my case for the initial part it's: SELECT DISTINCT "to" AS "uniq" FROM edges WHERE "from" = 'initial_node' UNION SELECT DISTINCT "to" AS "uniq" FROM edges WHERE "to" = 'initial_node')
I also had to add initial_node to the array (unnsest(points||array('initial_node')) because in some cases, I wouldn't get it in the result.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.