I have a simple table that stores precipitation readings from online gauges. Here's the table definition:

    CREATE TABLE public.precip
    (
        gauge_id smallint,
        inches numeric(8, 2),
        reading_time timestamp with time zone
    );

    CREATE INDEX idx_precip3_id
        ON public.precip USING btree
        (gauge_id);

    CREATE INDEX idx_precip3_reading_time
        ON public.precip USING btree
        (reading_time);

    CREATE INDEX idx_precip_last_five_days
        ON public.precip USING btree
        (reading_time)
        TABLESPACE pg_default
        WHERE reading_time > '2017-02-26 00:00:00+00'::timestamp with time zone;

It's grown quite large: about 38 million records that go back 18 months. Queries rarely request rows that are more than 7 days old, so I created the partial index on the reading_time field so that Postgres can traverse a much smaller index. But it's not using the partial index on all queries. It does use the partial index on this query:

    explain analyze select * from precip where gauge_id = 208 and reading_time > '2017-02-27'

    Bitmap Heap Scan on precip  (cost=8371.94..12864.51 rows=1169 width=16) (actual time=82.216..162.127 rows=2046 loops=1)
      Recheck Cond: ((gauge_id = 208) AND (reading_time > '2017-02-27 00:00:00+00'::timestamp with time zone))
      ->  BitmapAnd  (cost=8371.94..8371.94 rows=1169 width=0) (actual time=82.183..82.183 rows=0 loops=1)
            ->  Bitmap Index Scan on idx_precip3_id  (cost=0.00..2235.98 rows=119922 width=0) (actual time=20.754..20.754 rows=125601 loops=1)
                  Index Cond: (gauge_id = 208)
            ->  Bitmap Index Scan on idx_precip_last_five_days  (cost=0.00..6135.13 rows=331560 width=0) (actual time=60.099..60.099 rows=520867 loops=1)
    Total runtime: 162.631 ms

But it does not use the partial index on the following query. Instead, it uses the full index on reading_time:

    explain analyze select * from precip where gauge_id = 208 and reading_time > now() - interval '7 days'

    Bitmap Heap Scan on precip  (cost=8460.10..13007.47 rows=1182 width=16) (actual time=154.286..228.752 rows=2067 loops=1)
      Recheck Cond: ((gauge_id = 208) AND (reading_time > (now() - '7 days'::interval)))
      ->  BitmapAnd  (cost=8460.10..8460.10 rows=1182 width=0) (actual time=153.799..153.799 rows=0 loops=1)
            ->  Bitmap Index Scan on idx_precip3_id  (cost=0.00..2235.98 rows=119922 width=0) (actual time=15.852..15.852 rows=125601 loops=1)
                  Index Cond: (gauge_id = 208)
            ->  Bitmap Index Scan on idx_precip3_reading_time  (cost=0.00..6223.28 rows=335295 width=0) (actual time=136.162..136.162 rows=522993 loops=1)
                  Index Cond: (reading_time > (now() - '7 days'::interval))
    Total runtime: 228.647 ms

Note that today is 3/5/2017, so these two queries are requesting essentially the same rows. But it seems like Postgres won't use the partial index unless the timestamp in the where clause is "hard coded". Is the query planner not evaluating now() - interval '7 days' before deciding which index to use? I posted the query plans as suggested by one of the first people to respond.

I've written several functions (stored procedures) that summarize rainfall in the last 6 hours, 12 hours .... 72 hours. They all use the interval approach in the query (e.g., reading_time > now() - interval '7 days'). I don't want to move this code into the application just to send Postgres a hard-coded timestamp. That would create a lot of messy PHP code that shouldn't be necessary.

Suggestions on how to encourage Postgres to use the partial index instead? My plan is to redefine the date range on the partial index nightly (drop index --> create index), but that seems a bit silly if Postgres isn't going to use it.

Thanks,

Alex

  • You need one or two composite indexes on {gauge_id, reading_time}. They could maybe even be unique. BTW: the smallint makes no sense and could be replaced by an ordinary integer, IMHO. Commented Mar 5, 2017 at 18:18
  • Please edit your question and add the execution plan generated using explain (analyze, verbose). Formatted text please, no screenshots. Commented Mar 5, 2017 at 19:26
  • @wildplasser, I chose small integers (2 bytes) because, according to the postgres documentation, they're half the storage size of integers (4 bytes). And my system will never have more than 32,767 gauges; it'll probably never have more than 300 gauges. I need to index gauge_id, so using smallint should produce an index that's half the size of integer. Commented Mar 5, 2017 at 22:47
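
The composite index suggested in the first comment could look roughly like the sketch below. The index name is made up for illustration, and whether it beats the BitmapAnd of the two single-column indexes on this data set is an assumption that would need to be checked with EXPLAIN ANALYZE.

```sql
-- Hypothetical index name; the column order puts the equality column first
-- so that one gauge's readings are stored contiguously, ordered by time.
CREATE INDEX idx_precip_gauge_time
    ON public.precip USING btree
    (gauge_id, reading_time);

-- The query shape from the question, unchanged. With the composite index the
-- planner can answer both conditions from one index range scan.
EXPLAIN ANALYZE
SELECT *
  FROM public.precip
 WHERE gauge_id = 208
   AND reading_time > now() - interval '7 days';
```

Because this is a full index with no predicate, the planner does not have to prove anything about now() before it can consider it.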

2 Answers

Generally speaking, an index can be used when the indexed column(s) are compared to constants (literal values), to calls of functions marked at least STABLE (meaning that within a single statement, multiple calls with the same parameters produce the same result), or to combinations of those.

now() (equivalent to current_timestamp) is marked as STABLE. The function behind the <timestamp with time zone> - <interval> operator, timestamptz_mi_interval(), is STABLE as well (its counterpart for plain timestamps, timestamp_mi_interval(), is IMMUTABLE, which is even stricter than STABLE), so the whole expression is STABLE. For reference: now(), current_timestamp and transaction_timestamp() return the start of the transaction; statement_timestamp() returns the start of the statement and is still STABLE; clock_timestamp() returns the time as seen on a clock at the moment of the call, so it is VOLATILE.

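So in theory, WHERE reading_time > now() - interval '7 days' should be able to use an index on the reading_time column, and it does: your second plan uses idx_precip3_reading_time. But since you defined a partial index, the planner also has to prove that the query's condition implies the index predicate. From the PostgreSQL documentation on partial indexes: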

However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the WHERE condition of the query mathematically implies the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.

That is what is happening with your query, which contains reading_time > now() - interval '7 days'. The expression now() - interval '7 days' is only evaluated at execution time, after planning has already happened, so at planning time PostgreSQL cannot prove that the index predicate (reading_time > '2017-02-26 00:00:00+00') will be true. When you used reading_time > '2017-02-27', it could prove that.

You could "guide" the planner with constant values, like this:

    where gauge_id = 208
    and   reading_time > '2017-02-26 00:00:00+00'
    and   reading_time > now() - interval '7 days'

This way the planner realizes that it can use the partial index, because indexed_col > index_condition AND indexed_col > something_else implies that indexed_col is larger than (at least) index_condition. It may be larger than something_else too, but that doesn't matter for deciding whether the index is usable.
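
If the queries live in PL/pgSQL functions, another way to put a constant in front of the planner is dynamic SQL. The following is only a sketch: the function name precip_since and its parameters are hypothetical, and it assumes the requested window never reaches back past the bound in the partial index predicate (otherwise the planner falls back to the full index).

```sql
-- Hypothetical helper; name and signature are illustrative only.
CREATE OR REPLACE FUNCTION precip_since(p_gauge_id smallint, p_window interval)
RETURNS SETOF public.precip
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    -- format() with %L embeds the computed timestamp as a quoted literal,
    -- so the statement planned by EXECUTE contains a constant rather than
    -- a call to now(). The gauge id is passed as a parameter via USING.
    RETURN QUERY EXECUTE format(
        'SELECT *
           FROM public.precip
          WHERE gauge_id = $1
            AND reading_time > %L::timestamptz',
        now() - p_window
    )
    USING p_gauge_id;
END;
$$;

-- Usage: readings for gauge 208 over the last 7 days.
SELECT * FROM precip_since(208::smallint, interval '7 days');
```

The statement is re-planned on every call, which is the price for letting the planner see the literal; for queries that touch a few thousand rows that cost is usually small compared to the scan itself.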

I'm not sure if that is the answer you were looking for though. IMHO, if you have a really large amount of data (and PostgreSQL 9.5+) a single BRIN index might suit your needs better.
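
A minimal sketch of the BRIN alternative, assuming PostgreSQL 9.5 or later and assuming rows are inserted roughly in reading_time order (BRIN relies on the physical row order correlating with the indexed column). The index name is hypothetical.

```sql
-- Hypothetical index name. A BRIN index stores only a summary (min/max)
-- per range of table blocks, so it stays very small even on large tables.
CREATE INDEX idx_precip_reading_time_brin
    ON public.precip USING brin
    (reading_time);

-- The original query shape works as-is: the index has no predicate,
-- so nothing has to be proven about now() at planning time.
EXPLAIN ANALYZE
SELECT *
  FROM public.precip
 WHERE gauge_id = 208
   AND reading_time > now() - interval '7 days';
```

Unlike the partial index, this index never needs to be dropped and recreated to move the date boundary.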

3 Comments

Thanks for the detailed explanation. I hope my delayed response doesn't imply that I don't value your response; I've had a lot of fires crop up that needed to be extinguished.
<continuing my comment above>But you've confirmed what I suspected: postgres isn't evaluating now() - interval '7 days' during query planning and execution. What do you think would happen if I used a function to return the 7 day timestamp as a string and then put that string in my where clause? Thanks for the suggestion with guiding the query. But if I have to send it a timestamp as a string, then I'm kind of back to square one. Out of space...
@Debaser if you evaluate now() - interval '7 days' in your client and send the result to the query, it should use the index, just like your query that only does reading_time > '2017-02-27'. However, maintaining that kind of index is still a lot of mess. I strongly suggest that you try the BRIN index, maybe with some CLUSTERing. And if you are on PostgreSQL 9.4 or older, upgrading has more benefits than just BRIN.
0

Queries are planned and then cached for possible later use, which includes the choice of indexes to apply. Since your query includes the volatile function now(), the partial index can not be used because the planner has no certainty about what the volatile function will return and thus if it will match the partial index. Any human reading the query will understand that the partial index would be a match, but the planner is not that smart that it knows what now() does; the only thing it knows is that it is a volatile function.

A better solution in your case would be to partition the table into smaller chunks based on the reading_time. Properly crafted queries will then only access a single partition.
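
As a rough illustration of the partitioning idea, here is a sketch using declarative partitioning, which assumes PostgreSQL 10 or later (a release that postdates this thread). The table and partition names are hypothetical, and monthly ranges are just one possible choice.

```sql
-- Hypothetical partitioned copy of the table, split by month on reading_time.
CREATE TABLE public.precip_part
(
    gauge_id smallint,
    inches numeric(8, 2),
    reading_time timestamp with time zone
) PARTITION BY RANGE (reading_time);

-- One partition per month; the upper bound is exclusive.
CREATE TABLE public.precip_part_2017_02 PARTITION OF public.precip_part
    FOR VALUES FROM ('2017-02-01 00:00:00+00') TO ('2017-03-01 00:00:00+00');

CREATE TABLE public.precip_part_2017_03 PARTITION OF public.precip_part
    FOR VALUES FROM ('2017-03-01 00:00:00+00') TO ('2017-04-01 00:00:00+00');

-- Index each partition on the columns the queries filter on.
CREATE INDEX ON public.precip_part_2017_02 (gauge_id, reading_time);
CREATE INDEX ON public.precip_part_2017_03 (gauge_id, reading_time);
```

Note that the same planning-time limitation applies here: with a literal timestamp the planner can exclude old partitions up front, while for a condition such as reading_time > now() - interval '7 days' partitions are only skipped at execution time, which to my knowledge requires PostgreSQL 11 or later.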

1 Comment

now() is stable, just like current_timestamp. clock_timestamp() is the one, which is volatile.
