I have been given the task of importing data from a third party. Not knowing what the data would look like, I imported it into a jsonb column in a PostgreSQL table. Now, to make things easier to work with, I want to import the data into separate tables. I have created a dbfiddle that shows the schema I want to import from/to and the JSON I want to import.
https://www.db-fiddle.com/f/gYWUqVWvwJ7v4NtJCYhkvA/4
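In case the fiddle is unavailable, the source table looks roughly like this (a sketch; the full schema, including the target tables and sample JSON, is in the fiddle):

CREATE TABLE json_imports (
    id   integer PRIMARY KEY,  -- json_imports_pkey in the plans below
    data jsonb                 -- raw third-party payload; data->'Payments' is an array of payment objects
);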
Here is the query I am using:
SELECT
"Id"::integer as id, "TotalAmt" as total_amt, "CreateTime" as create_time,
"InvoiceRef"::integer as invoice_ref, "CustomerRef"::integer as customer_ref,
("CustomFields"->0->>'Value')::integer as order_number
FROM
jsonb_to_recordset((
SELECT data->'Payments'
FROM json_imports
WHERE id = 7
)) x("Id" text, "TotalAmt" Decimal, "CreateTime" TIMESTAMP WITH TIME ZONE, "InvoiceRef" text, "CustomerRef" text, "CustomFields" jsonb)
Here is the EXPLAIN ANALYZE plan, copied from pgAdmin 4:
"Function Scan on jsonb_to_recordset x (cost=8.17..11.67 rows=100 width=56) (actual time=14.781..14.952 rows=333 loops=1)"
" InitPlan 1 (returns $0)"
" -> Index Scan using json_imports_pkey on json_imports (cost=0.15..8.17 rows=1 width=32) (actual time=13.561..14.034 rows=1 loops=1)"
" Index Cond: (id = 7)"
"Planning Time: 0.161 ms"
"Execution Time: 15.040 ms"
I am wondering if the sub-select is necessary. I previously tried something similar:
SELECT *
FROM json_imports
CROSS JOIN LATERAL jsonb_to_recordset(data->'Payments') as x("Id" text, "TotalAmt" Decimal, "CreateTime" TIMESTAMP WITH TIME ZONE, "InvoiceRef" text, "CustomerRef" text, "CustomFields" jsonb)
where id = 7
The CROSS JOIN LATERAL version was taking much more time, and its plan includes a nested loop join, so I didn't think it was what I wanted. Here is the plan for the cross join version:
"Nested Loop (cost=0.15..10.17 rows=100 width=257) (actual time=14.348..14.462 rows=333 loops=1)"
" -> Index Scan using json_imports_pkey on json_imports (cost=0.15..8.17 rows=1 width=89) (actual time=0.020..0.024 rows=1 loops=1)"
" Index Cond: (id = 7)"
" -> Function Scan on jsonb_to_recordset x (cost=0.01..1.00 rows=100 width=168) (actual time=14.321..14.350 rows=333 loops=1)"
"Planning Time: 0.104 ms"
"Execution Time: 15.107 ms"
It seems a little over-the-top to use a sub-select just to drill down into the payments array. Is there a better or more idiomatic way to select the elements of a JSON array in PostgreSQL 15?
Update: Erwin posted the following in his answer, which prompted me to look more closely at the queries in question.
You comment that the variant with a LATERAL join would "take forever". I doubt that. Unnesting a single array won't take long, and barely longer than the subquery variant.
I realized that the SELECT * in the second query is very different from the limited set of columns I was selecting out of data->'Payments' in the first query. There is a lot of data in the other top-level keys, and SELECT * repeats the entire data column for every one of the 333 output rows. With SELECT *, the query took 3 minutes and then pgAdmin 4 didn't show any data. Changing it to SELECT x.* made the execution time similar to the first query: the first query took 0.092 seconds, and after the change the second query took 0.062 seconds.
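For reference, the fast version of the second query just narrows the select list:

SELECT x.*
FROM json_imports
CROSS JOIN LATERAL jsonb_to_recordset(data->'Payments') as x("Id" text, "TotalAmt" Decimal, "CreateTime" TIMESTAMP WITH TIME ZONE, "InvoiceRef" text, "CustomerRef" text, "CustomFields" jsonb)
where id = 7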
(From the comments: one suggestion was to look at JSON path queries as another way to drill down into the JSON.)
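A minimal sketch along those lines (my guess at what was meant, untested; it unnests the array but returns plain jsonb values rather than typed columns):

-- return each element of the Payments array as a jsonb value
SELECT jsonb_path_query(data, '$.Payments[*]') AS payment
FROM json_imports
WHERE id = 7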