
I have been given the task of importing data from a 3rd party. Not knowing what the data would look like, I imported it into a JSON field in a PostgreSQL table. Now to make things easier, I want to import the data into separate tables. I have created a dbfiddle that shows the schema I want to import from/to and the JSON I want to import.

https://www.db-fiddle.com/f/gYWUqVWvwJ7v4NtJCYhkvA/4

Here is the query I am using:

SELECT
    "Id"::integer as id, "TotalAmt" as total_amt, "CreateTime" as create_time,
    "InvoiceRef"::integer as invoice_ref, "CustomerRef"::integer as customer_ref,
    ("CustomFields"->0->>'Value')::integer as order_number
FROM
jsonb_to_recordset((
    SELECT data->'Payments'
      FROM json_imports
      WHERE id = 7
)) x("Id" text, "TotalAmt" Decimal, "CreateTime" TIMESTAMP WITH TIME ZONE, "InvoiceRef" text, "CustomerRef" text, "CustomFields" jsonb)

Here is the explain plan, copied from PgAdmin4:

Function Scan on jsonb_to_recordset x  (cost=8.17..11.67 rows=100 width=56) (actual time=14.781..14.952 rows=333 loops=1)
  InitPlan 1 (returns $0)
    ->  Index Scan using json_imports_pkey on json_imports  (cost=0.15..8.17 rows=1 width=32) (actual time=13.561..14.034 rows=1 loops=1)
          Index Cond: (id = 7)
Planning Time: 0.161 ms
Execution Time: 15.040 ms

I am wondering if the sub-select is necessary. I previously tried to do something similar.

SELECT *
FROM json_imports
CROSS JOIN LATERAL jsonb_to_recordset(data->'Payments') as x("Id" text, "TotalAmt" Decimal, "CreateTime" TIMESTAMP WITH TIME ZONE, "InvoiceRef" text, "CustomerRef" text, "CustomFields" jsonb)
where id = 7

Using the cross join lateral was taking much more time, and its explain plan includes a nested loop join, so I don't think that is what I wanted. Here is the explain plan for the cross join version:

Nested Loop  (cost=0.15..10.17 rows=100 width=257) (actual time=14.348..14.462 rows=333 loops=1)
  ->  Index Scan using json_imports_pkey on json_imports  (cost=0.15..8.17 rows=1 width=89) (actual time=0.020..0.024 rows=1 loops=1)
        Index Cond: (id = 7)
  ->  Function Scan on jsonb_to_recordset x  (cost=0.01..1.00 rows=100 width=168) (actual time=14.321..14.350 rows=333 loops=1)
Planning Time: 0.104 ms
Execution Time: 15.107 ms

It seems a little over-the-top to use a sub-select just to drill down into the payments array. Is there a better or more idiomatic way to select the elements of a JSON array in PostgreSQL 15?

Update: Erwin posted the following in his answer, which prompted me to look at the queries in question more closely.

You comment that the variant with a LATERAL join would "take forever". I doubt that. Unnesting a single array won't take long, and barely longer than the subquery variant.

I realized that the select * in the second query is very different from the limited set of columns I was selecting from data->'Payments' in the first query. There is a lot of data in the other top-level keys. Using select * took 3 minutes and then didn't show any data in PgAdmin4. Changing it to select x.* made the execution time similar to the first query: the first query took 0.092 seconds, and after the change the second query took 0.062 seconds.
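For reference, this is the second query with the narrower select list, returning only the columns from the unnested payment rows:

    SELECT x.*
    FROM json_imports
    CROSS JOIN LATERAL jsonb_to_recordset(data->'Payments') as x("Id" text, "TotalAmt" Decimal, "CreateTime" TIMESTAMP WITH TIME ZONE, "InvoiceRef" text, "CustomerRef" text, "CustomFields" jsonb)
    where id = 7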

  • So in the subselect you are filtering your data by id = 7, and on the cross join you are not; of course it will take more time, as it will return everything. Commented Feb 27 at 23:45
  • 1) It was your choice, I presume, to create a nested data structure instead of putting payments into an actual table. Now you see why that might not be a good idea. 2) You might want to look at JSON path queries for another way to drill down into the JSON. Commented Feb 28 at 0:08
  • Thank you @JorgeCampos. I have edited the question to filter by id = 7 in the cross join lateral as well, and it still takes forever. Commented Feb 28 at 0:23
  • And please provide query plans in human-readable form, not the JSON output: EXPLAIN ANALYZE SELECT ... Commented Feb 28 at 0:59
  • Also be aware that the timing you see in pgAdmin4 (in the query tool) includes network latency and additional overhead. There are more valid metrics. See: stackoverflow.com/questions/9063402/… Commented Feb 28 at 23:41

2 Answers


You comment that the variant with a LATERAL join would "take forever". I doubt that. Unnesting a single array won't take long, and barely longer than the subquery variant.
(Indeed, your question update shows just that.)

Either way, if you run this query on a somewhat regular basis, register the row type once:

CREATE TYPE payment_row AS ("Id" integer, "TotalAmt" Decimal, "CreateTime" timestamptz, "InvoiceRef" integer, "CustomerRef" integer, "CustomFields" jsonb);

And use jsonb_populate_recordset() instead, so you don't have to spell out the column definition list every time:

SELECT *
FROM   jsonb_populate_recordset(null::payment_row
                              , (SELECT data->'Payments' FROM json_imports WHERE id = 7)) p;

Or with a LATERAL join. Almost as fast and less convoluted:

SELECT p.*
FROM   json_imports j
     , jsonb_populate_recordset(null::payment_row, j.data->'Payments') p
WHERE  j.id = 7;

The comma in the FROM clause is just short syntax for CROSS JOIN LATERAL here.
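Spelled out explicitly, the same query reads:

    SELECT p.*
    FROM   json_imports j
    CROSS JOIN LATERAL jsonb_populate_recordset(null::payment_row, j.data->'Payments') p
    WHERE  j.id = 7;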



Thank you @AdrianKlaver for pointing me to JSON path queries. I have come up with the following alternative. It takes about the same amount of time as my previous solution and has a similar explain plan.

select
    (jsonb_path_query(data, '$.Payments[*]')->>'Id')::integer id,
    (jsonb_path_query(data, '$.Payments[*]')->>'TotalAmt')::decimal total_amt,
    cast(jsonb_path_query(data, '$.Payments[*]')->>'CreateTime' as timestamp with time zone) create_time,
    jsonb_path_query(data, '$.Payments[*]')->>'InvoiceRef' invoice_ref,
    jsonb_path_query(data, '$.Payments[*]')->>'CustomerRef' customer_ref,
    (jsonb_path_query(data, '$.Payments[*]')->'CustomFields'->0->>'Value')::integer order_number
from json_imports
where id = 7

Here is the explain plan:

Result  (cost=0.15..78.17 rows=1000 width=112) (actual time=13.389..14.545 rows=333 loops=1)
  ->  ProjectSet  (cost=0.15..13.18 rows=1000 width=32) (actual time=13.364..13.910 rows=333 loops=1)
        ->  Index Scan using json_imports_pkey on json_imports  (cost=0.15..8.17 rows=1 width=32) (actual time=0.020..0.025 rows=1 loops=1)
              Index Cond: (id = 7)
Planning Time: 0.144 ms
Execution Time: 14.602 ms
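As a sketch (untested against the fiddle data), the six jsonb_path_query() calls can be reduced to a single one by moving the set-returning function into a LATERAL join, so each payment element is produced once and its fields are extracted from the resulting jsonb value p:

    select
        (p->>'Id')::integer id,
        (p->>'TotalAmt')::decimal total_amt,
        (p->>'CreateTime')::timestamptz create_time,
        p->>'InvoiceRef' invoice_ref,
        p->>'CustomerRef' customer_ref,
        (p->'CustomFields'->0->>'Value')::integer order_number
    from json_imports
    cross join lateral jsonb_path_query(data, '$.Payments[*]') p
    where id = 7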

