
I have a table whose primary key consists of two numeric columns. In a transaction, I select the first N rows (for N=500 or so, ordered by the primary key columns) FOR UPDATE, process them, and then update them.

SELECT ...
ORDER BY pk1, pk2
LIMIT 500
FOR UPDATE
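
For context, the whole transaction looks roughly like this (the table name, the payload column, and the SET clause are placeholders; the open question is the WHERE clause of the UPDATE):

BEGIN;

SELECT pk1, pk2, payload          -- "payload" is a placeholder column
FROM my_table                     -- placeholder table name
ORDER BY pk1, pk2
LIMIT 500
FOR UPDATE;

-- ... process the selected rows in the application ...

UPDATE my_table
SET processed = true              -- placeholder
WHERE ...;                        -- a range over (pk1, pk2) covering exactly the selected rows

COMMIT;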

Now, I'm not sure what the optimal approach is for selecting those rows in the UPDATE's WHERE clause. I've tried this:

array[pk1, pk2] >= array[$first_pk1_value, $first_pk2_value]
AND array[pk1, pk2] <= array[$last_pk1_value, $last_pk2_value]

(where pk1 and pk2 are the primary key columns, and ${first,last}_pk{1,2}_value is the value of the corresponding column in the first and last rows returned by the SELECT)

Given that arrays are compared lexicographically, just like ORDER BY pk1, pk2 orders the rows, this finds the right rows.

I've also tried the equivalent:

(pk1 = $first_pk1_value AND pk2 >= $first_pk2_value)
OR (pk1 > $first_pk1_value AND pk1 < $last_pk1_value)
OR (pk1 = $last_pk1_value AND pk2 <= $last_pk2_value)

Both work, but both do a sequential scan. Since the WHERE clause just expresses a range over the primary key, I'd expect Postgres to do an index scan.

Is it that Postgres just doesn't support selecting a range over multicolumn indexes, or am I doing something wrong?

  • Maybe the IN approach could be better for Postgres (sorry, I use Oracle, and MS SQL if forced to ;-) ), like "where (pk1,pk2) in (array[])" stackoverflow.com/questions/6672665/… Commented Jul 16, 2020 at 13:10
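
For reference, in Postgres that suggestion would be written with row constructors rather than an array; a sketch with placeholder names and values:

UPDATE my_table                   -- placeholder table name
SET processed = true              -- placeholder
WHERE (pk1, pk2) IN ((1, 10), (1, 11), (2, 5));   -- one (pk1, pk2) pair per selected row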

1 Answer


Update

In response to Toni's comment below, I tried using a tuple (row) comparison, as I would in Python, and it worked much better than my original suggestion. Based on the EXPLAIN ANALYZE output, the implicit row(id1, id2) comparison is compatible with the index backing the primary key.

select version();
                                                                version                                                                 
----------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.12 (Ubuntu 10.12-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
(1 row)

explain analyze 
 select * from testidx_array 
  where (id1, id2) between (8, 150) and (9, 2000);

                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testidx_array  (cost=423.81..1263.91 rows=19855 width=40) (actual time=1.772..4.148 rows=11851 loops=1)
   Recheck Cond: ((ROW(id1, id2) >= ROW(8, 150)) AND (ROW(id1, id2) <= ROW(9, 2000)))
   Heap Blocks: exact=54
   ->  Bitmap Index Scan on testidx_array_pkey  (cost=0.00..418.84 rows=19855 width=0) (actual time=1.722..1.722 rows=11851 loops=1)
         Index Cond: ((ROW(id1, id2) >= ROW(8, 150)) AND (ROW(id1, id2) <= ROW(9, 2000)))
 Planning time: 0.096 ms
 Execution time: 4.867 ms
(7 rows)
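
Applied back to the question, the UPDATE's WHERE clause can use the same row comparison (the table name and SET clause are placeholders; the $... values are the ones captured from the SELECT, as in the question):

UPDATE my_table                   -- placeholder table name
SET processed = true              -- placeholder
WHERE (pk1, pk2) BETWEEN ($first_pk1_value, $first_pk2_value)
                     AND ($last_pk1_value, $last_pk2_value);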

Old Answer, Below, Superseded

You should be able to force the use of the index by specifying the range for pk1 and then adding an AND with the array[pk1, pk2] condition.

where pk1 between $first_pk1_value and $last_pk1_value
  and array[pk1, pk2] between array[$first_pk1_value, $first_pk2_value] 
                          and array[$last_pk1_value, $last_pk2_value]

This worked for me in a test table:

\d testidx_array
            Table "public.testidx_array"
  Column  |  Type   | Collation | Nullable | Default 
----------+---------+-----------+----------+---------
 id1      | integer |           | not null | 
 id2      | integer |           | not null | 
 somedata | text    |           |          | 
Indexes:
    "testidx_array_pkey" PRIMARY KEY, btree (id1, id2)

explain analyze
 select * from testidx_array 
  where array[id1, id2] between array[8,150] and array[9,2000];

                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on testidx_array  (cost=0.00..1943.00 rows=500 width=40) (actual time=42.011..50.758 rows=11851 loops=1)
   Filter: ((ARRAY[id1, id2] >= '{8,150}'::integer[]) AND (ARRAY[id1, id2] <= '{9,2000}'::integer[]))
   Rows Removed by Filter: 88149
 Planning time: 0.151 ms
 Execution time: 51.325 ms
(5 rows)

explain analyze 
 select * from testidx_array 
  where id1 between 8 and 9 
    and array[id1, id2] between array[8,150] and array[9,2000];

                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testidx_array  (cost=418.85..1258.91 rows=99 width=40) (actual time=2.278..11.109 rows=11851 loops=1)
   Recheck Cond: ((id1 >= 8) AND (id1 <= 9))
   Filter: ((ARRAY[id1, id2] >= '{8,150}'::integer[]) AND (ARRAY[id1, id2] <= '{9,2000}'::integer[]))
   Rows Removed by Filter: 8149
   Heap Blocks: exact=90
   ->  Bitmap Index Scan on testidx_array_pkey  (cost=0.00..418.82 rows=19853 width=0) (actual time=2.138..2.138 rows=20000 loops=1)
         Index Cond: ((id1 >= 8) AND (id1 <= 9))
 Planning time: 0.289 ms
 Execution time: 11.693 ms
(9 rows)




2 Comments

That did the trick, thanks! But I'd like to understand why Postgres can't figure this out by itself. Is there a fundamental reason I'm missing or it's just not smart enough (yet)?
@ToniCárdenas Whoa! Please see my update. It looks like there is a solution.
