I am running the query below against a PostgreSQL table that has around 2,861,092,854 rows.

The type column is indexed and can hold one of two values: 'customer' or 'vendor'.

Why is it taking so long?

SELECT count(*) FROM companies where type='vendor';

EXPLAIN ANALYZE output:

Finalize Aggregate  (cost=61231320.98..61231320.98 rows=1 width=8) (actual time=756767.565..756774.121 rows=1 loops=1)
   ->  Gather  (cost=61231320.76..61231320.97 rows=2 width=8) (actual time=756767.489..756774.115 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=61230320.76..61230320.77 rows=1 width=8) (actual time=756764.026..756764.028 rows=1 loops=3)
               ->  Parallel Index Only Scan using companies_type on companies  (cost=0.58..59886514.92 rows=537522334 width=0) (actual time=0.256..735750.564 rows=434640967 loops=3)
                     Index Cond: (type = 'vendor'::text)
                     Heap Fetches: 1303955010
 Planning Time: 7.047 ms
 Execution Time: 756774.221 ms
(10 rows)
  • 2.8 billion rows is a lot of data, so depending on your machine it might be a bit slow. You could create a partial index for type = 'vendor', e.g. CREATE INDEX idx_company_type_vendor ON companies (type) WHERE type = 'vendor'. This will reduce query time, but may slow down inserts and updates a little. Commented Nov 11, 2021 at 7:46
  • Given the Heap Fetches value, you should run VACUUM on the companies table to update the visibility map. Commented Nov 11, 2021 at 8:04
  • The output of EXPLAIN (ANALYZE, BUFFERS, TIMING) would be interesting to judge how fast your disk is (set track_io_timing to on before doing that). Commented Nov 11, 2021 at 10:25
  • When I drop the index on the type column, the time taken to do the count drops from 12 minutes to 5 minutes. Commented Nov 12, 2021 at 5:24
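
The two diagnostic suggestions in the comments above could be run roughly as follows. This is a sketch, not a tuned procedure: in the plan, Heap Fetches is about equal to the number of index rows returned, which suggests the visibility map is largely unset, so a vacuum may let the index-only scan skip most heap visits. Changing track_io_timing normally requires superuser or an equivalent privilege.

```sql
-- Refresh the visibility map (and planner statistics) for the table.
-- On a table of this size, expect this to run for a long time.
VACUUM (VERBOSE, ANALYZE) companies;

-- Record I/O timings so the plan shows how long reads actually took.
SET track_io_timing = on;

-- Re-run the count with buffer statistics included in the plan.
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT count(*) FROM companies WHERE type = 'vendor';
```

If Heap Fetches drops close to zero after the vacuum, the remaining time is mostly spent reading the index itself.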

1 Answer

Suggestion 1 - Index

The easiest approach is to add a partial index on the type column that covers only the rows being counted:

CREATE INDEX companies_type_idx ON companies (type) WHERE type = 'vendor';
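
On a table that is still receiving writes, a plain CREATE INDEX blocks inserts and updates while it builds. A possible variant, sketched here with a hypothetical index name (companies_type_vendor_idx), is to build the same partial index with CONCURRENTLY and then confirm that the build succeeded:

```sql
-- CONCURRENTLY avoids blocking writes during the build, but it is slower
-- and cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY companies_type_vendor_idx
    ON companies (type)
    WHERE type = 'vendor';

-- A failed concurrent build leaves an invalid index behind; check for it.
SELECT indexrelid::regclass AS index_name, indisvalid
FROM pg_index
WHERE indrelid = 'companies'::regclass;
```

An index reported with indisvalid = false should be dropped and rebuilt before relying on it.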

Suggestion 2 - Partitioned Table

Partition the table into multiple tables by some criterion. If you know all possible values of type, you can use PARTITION BY LIST (type):

CREATE TABLE companies
(
    id   serial      NOT NULL,
    type varchar(50) NOT NULL,
    PRIMARY KEY (id, type)
)
    PARTITION BY LIST (type);

CREATE TABLE companies_vendor PARTITION OF companies FOR VALUES IN ('vendor');
CREATE TABLE companies_another_key PARTITION OF companies FOR VALUES IN ('another_key');
CREATE TABLE companies_default PARTITION OF companies DEFAULT;

When you insert data, each row is written to the matching partition.

Later, when performing the count

SELECT count(*) FROM companies where type='vendor';

Postgres will search only in the companies_vendor partition and will ignore others.
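
To check that pruning actually happens, one option is to inspect the plan and the per-partition row distribution. This is a sketch against the example tables defined above. Pruning limits the scan to one partition, but the count still has to read every row in that partition.

```sql
-- The plan should mention only companies_vendor, not the other partitions.
EXPLAIN
SELECT count(*) FROM companies WHERE type = 'vendor';

-- Show how many rows ended up in each partition.
SELECT tableoid::regclass AS partition_name, count(*) AS row_count
FROM companies
GROUP BY tableoid
ORDER BY partition_name;
```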

See the PostgreSQL documentation on table partitioning for more details. Also take a look at its caveats, because partitions have some limitations.

Suggestion 3 - Cache the count in the software

You can cache the count to avoid running the query each time. Fetch the count once, and then:

  • Decrease the number when you delete a row
  • Increase the number when you insert a row

The cached count may drift out of date, so resync it from time to time when your server is not busy.
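
The answer describes keeping the counter in application code. A database-side variant of the same idea, shown here only as a sketch, keeps the counter in a small table maintained by a trigger. The table company_counts and the function company_counts_sync are hypothetical names, the EXECUTE FUNCTION syntax assumes PostgreSQL 11 or later, and updates that change type are deliberately not handled.

```sql
-- Hypothetical counter table: one row per type value.
CREATE TABLE company_counts (
    type varchar(50) PRIMARY KEY,
    n    bigint NOT NULL
);

-- Seed it once with the expensive count.
INSERT INTO company_counts (type, n)
SELECT type, count(*) FROM companies GROUP BY type;

-- Adjust the counter on every insert or delete.
CREATE FUNCTION company_counts_sync() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO company_counts (type, n) VALUES (NEW.type, 1)
        ON CONFLICT (type) DO UPDATE SET n = company_counts.n + 1;
        RETURN NEW;
    ELSE  -- DELETE
        UPDATE company_counts SET n = n - 1 WHERE type = OLD.type;
        RETURN OLD;
    END IF;
END;
$$;

CREATE TRIGGER company_counts_sync
AFTER INSERT OR DELETE ON companies
FOR EACH ROW EXECUTE FUNCTION company_counts_sync();

-- Reading the count is now a single-row lookup.
SELECT n FROM company_counts WHERE type = 'vendor';
```

The trade-off is that every insert and delete now updates the same counter row, which can become a point of contention under heavy concurrent writes.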
