
I cannot grasp how the JSONB_ARRAY_ELEMENTS function works and why it expands the rows the way it does.

Please consider this simple table:

CREATE TABLE test_jsonb_expand (
   id SERIAL PRIMARY KEY,
   data JSONB NOT NULL
);

We fill it with example data:

INSERT INTO test_jsonb_expand (data) 
VALUES 
('{"events": [{"id": 1, "type": "enter"}, {"id": 141, "type": "exit"}], "other_data": ["whatever#1"]}'),
('{"events": [{"id": 1, "type": "enter"}, {"id": 150, "type": "exit"}], "other_data": ["whatever#1", "whatever#2"]}'),
('{"events": [{"id": 1, "type": "enter"}, {"id": 300, "type": "exit"}], "other_data": ["whatever#1", "whatever#2", "whatever#3"]}');

And then run this query:

SELECT
    id,
    JSONB_ARRAY_ELEMENTS(data->'events')->>'id' AS event_id,
    JSONB_ARRAY_ELEMENTS(data->'events')->>'type' AS event_type,
    JSONB_ARRAY_ELEMENTS(data->'other_data')->>0 AS other_data
FROM
    test_jsonb_expand;

In my mind this is not a query that can be meaningfully run, since it's not clear how the multiple JSONB_ARRAY_ELEMENTS invocations over different parts of the data column should be expanded. However, Postgres returns a meaningful enough result:

 id | event_id | event_type | other_data 
----+----------+------------+------------
  1 | 1        | enter      | whatever#1
  1 | 141      | exit       | 
  2 | 1        | enter      | whatever#1
  2 | 150      | exit       | whatever#2
  3 | 1        | enter      | whatever#1
  3 | 300      | exit       | whatever#2
  3 |          |            | whatever#3

This kind of makes sense (the shorter array seems to be padded with NULLs up to the length of the longer one), but in a way it also doesn't.

  1. Why are the results insertion-ordered?
  2. Is this behavior reliable? Or is it just an implementation detail, and a large enough dataset is not guaranteed to work this way?
  3. How does postgres know how to expand this kind of query?
  4. Is there any documentation regarding this syntax?

Sorry if there are too many questions. A link to the docs would be sufficient, as I haven't been able to find an explanation myself.

  • 1. They happen to be fetched in that order, but at some point they might stop coming back that way; without an explicit ORDER BY, PostgreSQL doesn't guarantee any order. 2. It's not reliable, and it'll break when scaled up. 3. It doesn't. It just goes with whatever seems convenient based on what's currently cached, how it's stored, etc. 4. If the manual isn't sufficient, you can take a look at the source; it's pretty readable, with a lot of clarifying comments. Commented Jun 10, 2024 at 19:49
  • I think the only guarantee you get here is that all three elements in a given result row will originate from the same data value. I don't think it's likely it'll ever be convenient for Postgres to shuffle those results in a way that would alternate between source values, especially not if they come from the same page. Commented Jun 10, 2024 at 19:55
  • @Zegarek Thank you. I would accept this as an answer if you decided to post it. Commented Jun 10, 2024 at 21:00
  • Using set-returning functions in the SELECT list is a Postgres peculiarity, and it has many weird aspects. See link. The only actually reliable behavior is that the lateral joins run in lockstep until they are all exhausted, producing NULLs for any exhausted function (see the first sketch below). But the actual results of the functions are non-deterministic without an ORDER BY. Commented Jun 10, 2024 at 23:49
  • @Zegarek 1. But the NULLs always come at the end, i.e. the functions are always retrieved in lockstep from the first row, with NULLs in the remaining rows for any exhausted function. And you can use WITH ORDINALITY for deterministic results (see the second sketch below). 2. Scaling doesn't necessarily come into it; it might break for many other reasons. 3. Lateral joins in the SELECT are well-defined since v10.0, if rather weird. 4. The duplicate link has some obscure docs linked: postgresql.org/docs/current/… and postgresql.org/docs/10/release-10.html Commented Jun 10, 2024 at 23:56
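
To make the lockstep behavior described in the last two comments concrete, here is a minimal sketch of the explicit equivalent, assuming the PostgreSQL 10+ semantics under which multiple set-returning functions in the SELECT list are evaluated as if packed into a single LATERAL ROWS FROM(...) item in the FROM clause:

-- A sketch of the explicit form: ROWS FROM(...) runs both functions
-- in lockstep and pads the shorter result with NULLs, reproducing
-- the output shown above. The LATERAL keyword is optional here,
-- since function calls in FROM may reference earlier FROM items.
SELECT
    id,
    t.ev->>'id'   AS event_id,
    t.ev->>'type' AS event_type,
    t.od->>0      AS other_data
FROM
    test_jsonb_expand,
    ROWS FROM (
        JSONB_ARRAY_ELEMENTS(data->'events'),
        JSONB_ARRAY_ELEMENTS(data->'other_data')
    ) AS t(ev, od);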
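
And a sketch of the WITH ORDINALITY variant suggested in the last comment: the added ordinality column records each lockstep row's position, so an explicit ORDER BY can pin the result order instead of relying on fetch order:

-- WITH ORDINALITY appends a counter column (ord) that restarts at 1
-- for each source row; ordering by (id, ord) makes the output
-- deterministic.
SELECT
    id,
    t.ev->>'id'   AS event_id,
    t.ev->>'type' AS event_type,
    t.od->>0      AS other_data
FROM
    test_jsonb_expand,
    ROWS FROM (
        JSONB_ARRAY_ELEMENTS(data->'events'),
        JSONB_ARRAY_ELEMENTS(data->'other_data')
    ) WITH ORDINALITY AS t(ev, od, ord)
ORDER BY
    id, t.ord;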
