I wanted to get any ideas/information on how to attack the following performance issue.
Given the following query:
select *
from station_status_history
where station_id = <...>
order by timestamp desc
limit 1;
It performs rather differently for 2 different station_id values:
- station 1: the query is fast. There are roughly 6000k records for this station id, and the latest records are no more than a few minutes old.
- station 2: the query is very slow. There is only 1 record in the table for this station, and it is generally quite old (e.g. up to one year old).
Here are the two query plans, one for each station:
The fast one:
Limit  (cost=0.56..644.44 rows=1 width=134) (actual time=0.105..0.106 rows=1 loops=1)
  Buffers: shared hit=15 read=1 dirtied=1
  ->  Index Scan Backward using station_status_history_pk on station_status_history  (cost=0.56..3726099.44 rows=5787 width=134) (actual time=0.104..0.104 rows=1 loops=1)
        Index Cond: (station_id = 17453)
        Buffers: shared hit=15 read=1 dirtied=1 <=========
Total runtime: 0.128 ms
The slow one:
Limit  (cost=0.56..644.44 rows=1 width=134) (actual time=5040.417..5040.418 rows=1 loops=1)
  Buffers: shared hit=12804 read=730743 written=1060
  ->  Index Scan Backward using station_status_history_pk on station_status_history  (cost=0.56..3726099.44 rows=5787 width=134) (actual time=5040.415..5040.415 rows=1 loops=1)
        Index Cond: (station_id = 16799)
        Buffers: shared hit=12804 read=730743 written=1060 <=========
Total runtime: 5040.467 ms
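For anyone who wants to reproduce plans of this shape: the output above includes actual timings and buffer counts, which is what EXPLAIN prints when run with the ANALYZE and BUFFERS options. A minimal sketch, using the slow station id from the plan above:

```sql
-- ANALYZE actually executes the query; BUFFERS adds the shared hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM station_status_history
WHERE station_id = 16799
ORDER BY "timestamp" DESC
LIMIT 1;
```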
My suspicion is that the record's age is related to the Buffers figures, but I don't know how to go about fixing it.
PG version:
PostgreSQL 9.3.25 on x86_64-pc-linux-musl, compiled by gcc (Alpine 6.4.0) 6.4.0, 64-bit
Table size and description:
- records: 42 million
- total size: 16 GB
# \d+ station_status_history;
                      Table "public.station_status_history"
    Column    |           Type           | Modifiers | Storage | Stats target | Description
--------------+--------------------------+-----------+---------+--------------+-------------
 timestamp    | timestamp with time zone | not null  | plain   |              |
 station_id   | integer                  | not null  | plain   |              |
 is_resampled | boolean                  | not null  | plain   |              |
 weight       | integer                  |           | plain   |              |
 ...
Indexes:
"station_status_history_pk" PRIMARY KEY, btree ("timestamp", station_id, is_resampled)
"ix_station_status_history_id" UNIQUE, btree (id)
Foreign-key constraints:
"station_status_history_station_id_fkey" FOREIGN KEY (station_id) REFERENCES station(id) ON DELETE CASCADE
"station_status_history_user_id_fkey" FOREIGN KEY (user_id) REFERENCES "user"(id)
Referenced by:
TABLE "operation" CONSTRAINT "operation_station_status_history_id_fkey" FOREIGN KEY (station_status_history_id) REFERENCES station_status_history(id)
TABLE "task" CONSTRAINT "task_station_status_history_id_fkey" FOREIGN KEY (station_status_history_id) REFERENCES station_status_history(id)
TABLE "task_update" CONSTRAINT "task_update_station_status_history_id_fkey" FOREIGN KEY (station_status_history_id) REFERENCES station_status_history(id)
Has OIDs: no
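
Since station_status_history_pk leads with "timestamp", the backward scan in the plans above has to walk index entries starting from the newest timestamp and check station_id on each one until it finds a match, which would fit the large Buffers count for a station whose only record is old. One direction that may be worth testing is an index that leads with station_id. This is an untested sketch: the index name is hypothetical, and whether the planner actually picks it for this query is an assumption.

```sql
-- Hypothetical index name. CONCURRENTLY avoids blocking writes while the
-- index builds, but it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY ix_station_status_history_station_ts
    ON station_status_history (station_id, "timestamp");
```

With an index of this shape, the newest row for a given station sits at the end of that station's range in the index, so the lookup should no longer depend on how old the record is.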