I cannot grasp how the JSONB_ARRAY_ELEMENTS function works and why it expands the rows the way it does.
Please consider this simple table:
CREATE TABLE test_jsonb_expand (
    id SERIAL PRIMARY KEY,
    data JSONB NOT NULL
);
We fill it with example data:
INSERT INTO test_jsonb_expand (data)
VALUES
('{"events": [{"id": 1, "type": "enter"}, {"id": 141, "type": "exit"}], "other_data": ["whatever#1"]}'),
('{"events": [{"id": 1, "type": "enter"}, {"id": 150, "type": "exit"}], "other_data": ["whatever#1", "whatever#2"]}'),
('{"events": [{"id": 1, "type": "enter"}, {"id": 300, "type": "exit"}], "other_data": ["whatever#1", "whatever#2", "whatever#3"]}');
And then run this query:
SELECT
    id,
    JSONB_ARRAY_ELEMENTS(data->'events')->>'id' AS event_id,
    JSONB_ARRAY_ELEMENTS(data->'events')->>'type' AS event_type,
    JSONB_ARRAY_ELEMENTS(data->'other_data')->>0 AS other_data
FROM
    test_jsonb_expand;
In my mind this is not a query that can be meaningfully run, since it's not clear how the multiple JSONB_ARRAY_ELEMENTS invocations over different values of the data column should be expanded together. However, Postgres returns a meaningful enough result:
id | event_id | event_type | other_data
----+----------+------------+------------
1 | 1 | enter | whatever#1
1 | 141 | exit |
2 | 1 | enter | whatever#1
2 | 150 | exit | whatever#2
3 | 1 | enter | whatever#1
3 | 300 | exit | whatever#2
3 | | | whatever#3
Which kind of makes sense, but kind of doesn't in a way.
- Why are the results insertion-ordered?
- Is this reliable behavior, or is it just an implementation detail, so that a large enough dataset is not guaranteed to work this way?
- How does postgres know how to expand this kind of query?
- Is there any documentation regarding this syntax?
Sorry if there are too many questions. A link to the docs would be sufficient, as I haven't been able to find an explanation.
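For what it's worth, the same expansion pattern appears with plain set-returning functions, no JSONB involved (output observed on a reasonably recent PostgreSQL, v10+):

```sql
-- Two set-returning functions of different lengths in the SELECT list;
-- the shorter one is padded with NULLs once it is exhausted:
SELECT generate_series(1, 2) AS a,
       generate_series(1, 3) AS b;
--  a | b
-- ---+---
--  1 | 1
--  2 | 2
--    | 3
-- (3 rows)
```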
Comments:

- 1. Without an order by, PostgreSQL doesn't guarantee any sort of order. 2. It's not reliable; it'll break when scaled up. 3. It doesn't. It just goes with whatever seems convenient based on what's currently cached, how it's stored, etc. 4. If the manual isn't sufficient, you can take a look at the source - it's pretty readable, with a lot of clarifying comments.
- … data value. I don't think it's likely it'll ever be convenient for Postgres to shuffle those results in a way that would alternate between source values, especially not if they come from the same page.
- … SELECT is a Postgres peculiarity, and has many weird things about it. See link. The only actually reliable behavior is that the lateral joins run in lockstep until they are all exhausted, giving nulls for any exhausted functions. But the actual results of the functions are non-deterministic without an ORDER BY.
- 1. Use WITH ORDINALITY for deterministic results. 2. Scaling doesn't necessarily come into it; it might break for many other reasons. 3. Lateral joins in the SELECT are well-defined since v10.0, if rather weird. 4. Duplicate link has some obscure docs linked: postgresql.org/docs/current/… and postgresql.org/docs/10/release-10.html
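Following the WITH ORDINALITY suggestion, a deterministic rewrite of the original query might look like this (a sketch, untested; the FULL JOIN on the ordinality column zips the two arrays per row and pads the shorter one with NULLs, and the ORDER BY makes the row order explicit):

```sql
SELECT t.id,
       x.ev->>'id'   AS event_id,
       x.ev->>'type' AS event_type,
       x.od #>> '{}' AS other_data  -- extract the jsonb string as text
FROM test_jsonb_expand t
CROSS JOIN LATERAL (
    SELECT e.elem AS ev, o.elem AS od, COALESCE(e.n, o.n) AS n
    FROM jsonb_array_elements(t.data -> 'events')
         WITH ORDINALITY AS e(elem, n)
    FULL JOIN jsonb_array_elements(t.data -> 'other_data')
         WITH ORDINALITY AS o(elem, n)
      ON e.n = o.n
) x
ORDER BY t.id, x.n;
```

Unlike the set-returning-functions-in-SELECT form, the expansion and the ordering here are both spelled out, so the result no longer depends on implementation behavior.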