
Schema:

create table records(
  id         varchar,
  updated_at bigint
);
create index index1 on records (updated_at, id);

Query. It iterates over recently updated records (keyset pagination): it fetches 10 records, remembers the last one, then fetches the next 10, and so on.

select * from records
where updated_at > '1' or (updated_at = '1' and id > 'some-id')
order by updated_at, id
limit 10;

It uses the index, but not efficiently: the condition is applied as a filter and the scan churns through tens of thousands of rows before producing 10 results; see Rows Removed by Filter: 31575 in the query plan below.

The strange thing is that if you remove the or and keep either the left or the right condition alone, both versions perform well. But the planner apparently can't figure out how to turn the index into a range scan when the two conditions are combined with or.

Limit  (cost=0.42..19.03 rows=20 width=1336) (actual time=542.475..542.501 rows=20 loops=1)
   ->  Index Scan using index1 on records  (cost=0.42..426791.29 rows=458760 width=1336) (actual time=542.473..542.494 rows=20 loops=1)
         Filter: ((updated_at > '1'::bigint) OR ((updated_at = '1'::bigint) AND ((id)::text > 'some-id'::text)))
         Rows Removed by Filter: 31575
 Planning time: 0.180 ms
 Execution time: 542.532 ms
(6 rows)
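For reference, these are the two single-condition variants mentioned above, each of which performs well on its own:

```sql
-- Fast: a range scan on the leading index column.
select * from records
where updated_at > 1
order by updated_at, id
limit 10;

-- Also fast: equality on the leading column, range on the second.
select * from records
where updated_at = 1 and id > 'some-id'
order by updated_at, id
limit 10;
```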

Postgres version is 9.6.

  • ... where updated_at > '1' ... You should not quote integer literals. Commented Sep 24, 2017 at 10:44
  • @wildplasser I tried it without quotes, same thing. Commented Sep 24, 2017 at 10:48
  • width=1336 That is a very wide table. Commented Sep 24, 2017 at 11:13

2 Answers


I would try this as two separate queries, combining their results like this:

select *
from
  (
    (
      select   *
      from     records
      where    updated_at > 1
      order by updated_at, id
      limit    10
    )
    union all
    (
      select   *
      from     records
      where    updated_at = 1
        and    id > 'some-id'
      order by updated_at, id
      limit    10
    )
  ) t
order by updated_at, id
limit    10;

My guess is that the two queries would each optimise pretty well and running both would be more efficient than the current one.
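Another option worth testing (not part of the union all approach above, and assuming the real schema matches the one shown): PostgreSQL supports row-constructor comparisons, which collapse the or pair into a single range condition that follows the index's column order:

```sql
-- "(updated_at, id) > (1, 'some-id')" compares lexicographically,
-- which is exactly the order of index1 (updated_at, id).
select *
from records
where (updated_at, id) > (1, 'some-id')
order by updated_at, id
limit 10;
```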

I would also make those columns NOT NULL if possible.


1 Comment

Yeah, I also thought about that. But I thought PostgreSQL was smart enough, and maybe there's some mistake in my code...

The explanation lies in how PostgreSQL scans multicolumn indexes; quoting the documentation:

For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.

https://www.postgresql.org/docs/9.6/static/indexes-multicolumn.html
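That lexicographic ordering is exactly what keyset pagination relies on. A minimal runnable sketch of the idea, using a single row-value comparison instead of the or pair (Python with SQLite purely for illustration; table contents are made up, and row values require SQLite >= 3.15):

```python
import sqlite3

# Hypothetical data: ids and timestamps are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table records (id text, updated_at integer)")
conn.execute("create index index1 on records (updated_at, id)")
conn.executemany(
    "insert into records values (?, ?)",
    [(f"id-{i:03d}", i // 3) for i in range(30)],  # duplicate timestamps on purpose
)

def fetch_page(cursor_ts, cursor_id, page_size=10):
    # One row-value range condition replaces the OR pair; it means
    # "strictly after (cursor_ts, cursor_id) in (updated_at, id) order",
    # matching the lexicographic order of index1.
    return conn.execute(
        "select id, updated_at from records"
        " where (updated_at, id) > (?, ?)"
        " order by updated_at, id limit ?",
        (cursor_ts, cursor_id, page_size),
    ).fetchall()

page1 = fetch_page(-1, "")                      # everything sorts after (-1, '')
page2 = fetch_page(page1[-1][1], page1[-1][0])  # resume after the last row seen
```

Note how the page boundary falls inside a run of duplicate updated_at values and the cursor still resumes at the right row, because id breaks the tie.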
