query the jsonb array data by specific field condition with OR and ORDER BY constraint really slow

Ask Question

Asked 3 years ago

Modified 3 years ago

Viewed 46 times

I have a postgresql table named events

Name	Data Type
id	character varying
idx	integer
module	character varying
method	character varying
idx	integer
block_height	integer
data	jsonb

where data is like

["hex_address", 101, 1001, 51637660000, 51324528436, 6003, 235458709417729, 234683610487930] //trade method
["hex_address", 200060013, 1, 250000000000, 3176359357709794, 6006, "0x00000000000000001431f1724a499fcd", 114440794831460] //add method
["hex_address", 200060013, 1, 42658905229340, 407285893749, "0x000000000000000000110204f76c06e2", 6006, 121017475390243, "0x000000000000000013bd9463821aedee"] //remove method

And the table is about 3 million items.The indexes have been created

CREATE INDEX IDX_event_multicolumn_index ON public.events ("module", "method", "block_height") INCLUDE ("id", "idx", "data");
CREATE INDEX event_data_gin_index ON public.events USING gin (data)

When I run the SQL SELECT data FROM public.events WHERE module = 'amm' AND (((data::jsonb->6 IN ('6002', '6003') AND method = 'LiquidityRemoved')) OR ((data::jsonb->5 IN ('6002', '6003') AND method IN ('Traded', 'LiquidityAdded')))) ORDER BY block_height DESC, idx DESC LIMIT 500;

it'll take about 1 minute. the explain analyze result(this is another table with less items):

"Limit  (cost=1909.21..1909.22 rows=5 width=334) (actual time=14019.484..14019.524 rows=100 loops=1)"
"  ->  Sort  (cost=1909.21..1909.22 rows=5 width=334) (actual time=14019.477..14019.504 rows=100 loops=1)"
"        Sort Key: block_height DESC, idx DESC"
"        Sort Method: top-N heapsort  Memory: 128kB"
"        ->  Bitmap Heap Scan on events  (cost=114.28..1909.15 rows=5 width=334) (actual time=703.038..13957.503 rows=25625 loops=1)"
"              Recheck Cond: ((((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
"              Filter: ((((data -> 6) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
"              Rows Removed by Filter: 9435"
"              Heap Blocks: exact=28532"
"              ->  BitmapOr  (cost=114.28..114.28 rows=462 width=0) (actual time=696.569..696.580 rows=0 loops=1)"
"                    ->  Bitmap Index Scan on ""IDX_event_multicolumn_index""  (cost=0.00..4.59 rows=3 width=0) (actual time=24.375..24.382 rows=896 loops=1)"
"                          Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text))"
"                    ->  Bitmap Index Scan on ""IDX_event_multicolumn_index""  (cost=0.00..109.69 rows=459 width=0) (actual time=672.191..672.191 rows=34164 loops=1)"
"                          Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))"

If I delete the multicolumn index, it does be faster and take about 20 seconds.

"Gather Merge  (cost=477713.00..477720.00 rows=60 width=130) (actual time=22151.357..22210.826 rows=79864 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  ->  Sort  (cost=476712.97..476713.05 rows=30 width=130) (actual time=22090.960..22097.308 rows=26621 loops=3)"
"        Sort Key: block_height"
"        Sort Method: external merge  Disk: 5400kB"
"        Worker 0:  Sort Method: external merge  Disk: 5264kB"
"        Worker 1:  Sort Method: external merge  Disk: 5416kB"
"        ->  Parallel Seq Scan on events  (cost=0.00..476712.24 rows=30 width=130) (actual time=5.151..21985.878 rows=26621 loops=3)"
"              Filter: (((module)::text = 'amm'::text) AND ((((data -> 6) = ANY ('{6002,6003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{6002,6003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))))"
"              Rows Removed by Filter: 2858160"
"Planning Time: 0.559 ms"
"Execution Time: 22217.351 ms"

Then I try to create the specific field multicolumn index:

create EXTENSION btree_gin;
CREATE INDEX extcondindex ON public.events USING gin (((data -> 5)), ((data -> 6)), module, method);

The result is just same with the origin multi column index. If I remove one of the OR constraint SELECT data FROM public.events WHERE module = 'amm' AND (((data::jsonb->6 IN ('6002', '6003') AND method = 'LiquidityRemoved'))) ORDER BY block_height DESC, idx DESC LIMIT 500; It's fast enough and takes about 3 seconds.

I want to know why the multi column index slow down the query, and how should I add the index for the specific field to optimize my query.

The io timing and buffer analyze for the multi-column index table

"Sort  (cost=1916.98..1916.99 rows=5 width=142) (actual time=14204.316..14210.109 rows=25683 loops=1)"
"  Sort Key: block_height"
"  Sort Method: external merge  Disk: 5264kB"
"  Buffers: shared hit=93 read=31211, temp read=658 written=659"
"  I/O Timings: read=13533.823"
"  ->  Bitmap Heap Scan on events  (cost=114.30..1916.92 rows=5 width=142) (actual time=926.714..14156.662 rows=25683 loops=1)"
"        Recheck Cond: ((((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
"        Filter: ((((data -> 6) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
"        Rows Removed by Filter: 9503"
"        Heap Blocks: exact=28626"
"        Buffers: shared hit=90 read=31211"
"        I/O Timings: read=13533.823"
"        ->  BitmapOr  (cost=114.30..114.30 rows=464 width=0) (actual time=919.777..919.779 rows=0 loops=1)"
"              Buffers: shared hit=27 read=2648"
"              I/O Timings: read=892.499"
"              ->  Bitmap Index Scan on ""IDX_event_multicolumn_index""  (cost=0.00..4.59 rows=3 width=0) (actual time=32.255..32.255 rows=898 loops=1)"
"                    Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text))"
"                    Buffers: shared hit=7 read=72"
"                    I/O Timings: read=30.095"
"              ->  Bitmap Index Scan on ""IDX_event_multicolumn_index""  (cost=0.00..109.71 rows=461 width=0) (actual time=887.519..887.520 rows=34288 loops=1)"
"                    Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))"
"                    Buffers: shared hit=20 read=2576"
"                    I/O Timings: read=862.404"
"Planning:"
"  Buffers: shared hit=230"
"Planning Time: 0.663 ms"
"Execution Time: 14214.038 ms"

edited Nov 20, 2022 at 11:17

asked Nov 19, 2022 at 16:48

Bruce

112 bronze badges

After you delete the multicolumn index but before you create another one, what is the faster plan that you got?

jjanes
– jjanes

2022-11-19 19:15:31 +00:00
Commented Nov 19, 2022 at 19:15
1

Why show us the plan for a different table? You say it takes a minute, that is not an impossibly long time to wait for the real answer.

jjanes
– jjanes

2022-11-19 19:17:17 +00:00
Commented Nov 19, 2022 at 19:17
You should turn on track_io_timing if it not already, then collect the plans with EXPLAIN (ANALYZE, BUFFERS)

jjanes
– jjanes

2022-11-19 19:26:30 +00:00
Commented Nov 19, 2022 at 19:26
@jjanes I have supply the plan without multi-column index. why I use another table is that the index cannot be recreated after delete, which complains the disk space is not enough. And I miss the plan before delete.

Bruce
– Bruce

2022-11-20 01:15:47 +00:00
Commented Nov 20, 2022 at 1:15
@jjanes But I think the execution process of the plan itself is no different except the data.

Bruce
– Bruce

2022-11-20 01:22:12 +00:00
Commented Nov 20, 2022 at 1:22

Add a comment |

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Collectives™ on Stack Overflow

query the jsonb array data by specific field condition with OR and ORDER BY constraint really slow

0

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest