PostgreSQL count performance

Question

I am running a count query on a PostgreSQL table. The table is named simcards and has the fields id, card_state and 10 more. It contains around 13 million records.

My query is

SELECT CAST(count(*) AS INT) FROM simcards WHERE card_state = 'ACTIVATED';

This takes more than 6 seconds and I want to optimize it. I tried creating the partial index below:

CREATE INDEX activated_count on simcards (card_state) where card_state = 'ACTIVATED';

But there was no improvement. I think that is because more than 12 million records have card_state = 'ACTIVATED'. Note that card_state can be 'ACTIVATED', 'PREPROVISIONED' or 'TERMINATED'.

Does anyone have an idea how the count can be drastically improved?

Running EXPLAIN (ANALYZE, BUFFERS) SELECT CAST(count(*) AS INT) FROM simcards WHERE card_state = 'ACTIVATED'; gives

Finalize Aggregate  (cost=540300.95..540300.96 rows=1 width=4) (actual time=7103.814..7103.814 rows=1 loops=1)
  Buffers: shared hit=2295 read=155298
  ->  Gather  (cost=540300.74..540300.95 rows=2 width=8) (actual time=7103.773..7103.810 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2295 read=155298
        ->  Partial Aggregate  (cost=539300.74..539300.75 rows=1 width=8) (actual time=7006.368..7006.368 rows=1 loops=3)
              Buffers: shared hit=5983 read=455025
              ->  Parallel Seq Scan on simcards  (cost=0.00..526282.77 rows=5207186 width=0) (actual time=2.677..6483.503 rows=4166620 loops=3)
                    Filter: (card_state = 'ACTIVATED'::text)
                    Rows Removed by Filter: 10965
                    Buffers: shared hit=5983 read=455025
Planning time: 0.333 ms
Execution time: 7123.739 ms

PostgreSQL will always do a seq scan with count. This doesn't scale well with the number of rows. One way to make it faster is to keep a separate table with the count and put appropriate triggers on the original table. This will dramatically improve read speed at the cost of write speed (which might not be noticeable).
– freakish, Commented Feb 27, 2020 at 12:39
@a_horse_with_no_name most likely an index scan will still be slower than a separate table, unless there are only one or so elements in the index.
– freakish, Commented Feb 27, 2020 at 12:44
@kevin: you can try CREATE INDEX activated_count on simcards (id) where card_state = 'ACTIVATED'; (with id being the PK column of the table); then you might get an index-only scan
– user330315, Commented Feb 27, 2020 at 12:55
@KevinJoymungol: did you run vacuum analyze simcards after creating the index?
– user330315, Commented Feb 27, 2020 at 13:19
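
The two suggestions from user330315 can be tried together. The following is a minimal sketch, assuming (as the comment does) that id is the primary key of simcards. Whether the planner actually chooses an index-only scan depends on the PostgreSQL version, the state of the visibility map and the cost settings, so the plan should be checked rather than assumed.

```sql
-- Drop the earlier partial index on card_state, which uses the same name.
DROP INDEX IF EXISTS activated_count;

-- Partial index on the primary key, as suggested in the comment.
-- The index predicate has to match the predicate of the count query.
CREATE INDEX activated_count ON simcards (id) WHERE card_state = 'ACTIVATED';

-- VACUUM updates the visibility map that an index-only scan relies on;
-- ANALYZE refreshes the planner statistics.
VACUUM ANALYZE simcards;

-- Look for "Index Only Scan" and a low "Heap Fetches" figure in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT CAST(count(*) AS INT) FROM simcards WHERE card_state = 'ACTIVATED';
```

If the plan still shows a sequential scan, or if "Heap Fetches" is high, the index is not helping and one of the approaches in the answers is probably a better fit.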

Laurenz Albe · Accepted Answer · 2020-02-27 12:48:24Z

7

Counting is slow. Here are a few ideas for improving it:

If you don't need exact results, use PostgreSQL's estimates:

/* refresh the table statistics first; this makes the estimate more accurate */
ANALYZE simcards;

SELECT t.reltuples * freqs.freq AS count
FROM pg_class AS t
   JOIN pg_stats AS s
      ON t.relname = s.tablename
         AND t.relnamespace::regnamespace::name = s.schemaname
   CROSS JOIN
      (LATERAL unnest(s.most_common_vals::text::text[]) WITH ORDINALITY AS vals(val,ord)
       JOIN
       LATERAL unnest(s.most_common_freqs::text::float8[]) WITH ORDINALITY AS freqs(freq,ord)
          USING (ord)
      )
WHERE s.tablename = 'simcards'
  AND s.attname = 'card_state'
  AND vals.val = 'ACTIVATED';

If you need exact counts, create an extra “counter table” and triggers on simcards that update the counter whenever rows are added, removed or modified.
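
A counter table of this kind could look roughly like the sketch below. The names simcards_state_count, cnt and simcards_count_trig are hypothetical and not taken from the answer. The sketch also assumes that card_state is never NULL and that the table is not emptied with TRUNCATE, which these row-level triggers would not see.

```sql
-- Hypothetical counter table: one row per card_state value.
CREATE TABLE simcards_state_count (
   card_state text PRIMARY KEY,
   cnt        bigint NOT NULL
);

-- Hypothetical trigger function that keeps the counters in step with simcards.
CREATE FUNCTION simcards_count_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   -- A deleted row, or the old version of an updated row, leaves its state.
   IF TG_OP IN ('UPDATE', 'DELETE') THEN
      UPDATE simcards_state_count
      SET cnt = cnt - 1
      WHERE card_state = OLD.card_state;
   END IF;

   -- An inserted row, or the new version of an updated row, enters its state.
   IF TG_OP IN ('INSERT', 'UPDATE') THEN
      INSERT INTO simcards_state_count (card_state, cnt)
      VALUES (NEW.card_state, 1)
      ON CONFLICT (card_state)
      DO UPDATE SET cnt = simcards_state_count.cnt + 1;
   END IF;

   RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;$$;

BEGIN;

-- Block concurrent writes so that the initial counts are consistent.
LOCK TABLE simcards IN SHARE ROW EXCLUSIVE MODE;

-- EXECUTE PROCEDURE is accepted by old and new versions alike;
-- PostgreSQL 11 and later also accept EXECUTE FUNCTION.
CREATE TRIGGER simcards_count
   AFTER INSERT OR UPDATE OF card_state OR DELETE ON simcards
   FOR EACH ROW EXECUTE PROCEDURE simcards_count_trig();

-- Seed the counters from the current table contents.
INSERT INTO simcards_state_count (card_state, cnt)
SELECT card_state, count(*) FROM simcards GROUP BY card_state;

COMMIT;

-- The count is now a single-row lookup.
SELECT cnt FROM simcards_state_count WHERE card_state = 'ACTIVATED';
```

The trade-off is on the write side: every insert, delete or state change on simcards also updates a counter row, so concurrent writers that touch the same card_state have to wait for each other on that row.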

For a more detailed discussion, read my blog post.


1 Comment

Kevin Joymungol Over a year ago

Correct. I ran VACUUM ANALYZE together with CREATE INDEX activated_count on simcards (id) where card_state = 'ACTIVATED'; and this improved things. Thanks for the counter table solution. I will consider it in case I need better performance.

Anthony Sotolongo · Answer · 2020-02-28 00:27:46Z

0

Have you tried setting the parameter max_parallel_workers_per_gather = 4?

It is likely that some extra workers would help here.

Regards
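
One way to try this without touching the server configuration is to change the setting for the current session only and compare the plans. This is a sketch, not a tuning recommendation: the number of workers actually launched is also capped by max_worker_processes (and by max_parallel_workers in PostgreSQL 10 and later), and the planner may still choose fewer workers based on the table size.

```sql
-- Raise the per-query worker limit for this session only.
SET max_parallel_workers_per_gather = 4;

-- Compare "Workers Launched" and the execution time with the original plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT CAST(count(*) AS INT) FROM simcards WHERE card_state = 'ACTIVATED';

-- Return to the configured default afterwards.
RESET max_parallel_workers_per_gather;
```

Since the plan in the question reads most of its buffers from outside shared buffers (read=455025 against hit=5983), the scan may be limited by I/O, in which case extra workers might bring little improvement.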

