
I count the number of users this way, but it takes 5 seconds to produce results. I am looking for a better solution:

SELECT COUNT(*)
FROM (SELECT user_id
      FROM slot_result_primary
      WHERE session_timestamp BETWEEN 1590598800000 AND 1590685199999
      GROUP BY user_id) AS foo
  • Please edit your question and add the execution plan generated using explain (analyze, buffers, format text) (not just a "simple" explain) as formatted text, and make sure you preserve the indentation of the plan. Paste the text, then put ``` on the line before the plan and on a line after it. Please also include complete CREATE INDEX statements for all indexes. Commented May 31, 2020 at 8:41

2 Answers


First of all, you can simplify the query:

SELECT COUNT(DISTINCT user_id)
FROM slot_result_primary
WHERE session_timestamp BETWEEN 1590598800000 AND 1590685199999

Most importantly, make sure you have an index on session_timestamp.
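
For example (the index name here is illustrative, not from the question):

```sql
-- Plain b-tree index on the column used in the BETWEEN filter,
-- so the range predicate can be resolved without a sequential scan
CREATE INDEX slot_result_primary_session_timestamp_idx
    ON slot_result_primary (session_timestamp);
```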


5 Comments

Thank you, I tried it but the results were not better
Tried what? Putting an index on session_timestamp? How many rows are in your table?
Please post EXPLAIN ANALYZE output for your query.
I ran CREATE INDEX session_timestamp_index ON slot_result_primary (session_timestamp) and ran the query again. The table has 10,998,516 rows
You will need to include user_id in your index to make it a bit faster

Counting is a very heavy operation in Postgres and should be avoided if possible. It is hard to make faster because Postgres needs to visit the heap on disk for each row it counts. You can create a better index so the qualifying rows are located faster, but even then the count time grows linearly with the size of the data.

Your index should be:

CREATE INDEX session_timestamp_user_id_index ON slot_result_primary (session_timestamp, user_id)

for best results.
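
One caveat worth noting: with the composite index the count can often be served by an index-only scan, but Postgres can skip the heap fetch only for pages marked all-visible, so keeping the visibility map fresh matters. A sketch:

```sql
-- Index-only scans avoid heap access only for all-visible pages;
-- VACUUM updates the visibility map (ANALYZE also refreshes
-- the planner statistics):
VACUUM (ANALYZE) slot_result_primary;
```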

Still, an index will not fully solve your count problems. In a similar situation I faced two days ago (with a SELECT query running in 3 s and the count in 1 s), dedicated indexes let me push the SELECT down to 0.3 ms, but the best I could do with the count was 700 ms.

Here is a good article summarizing why count is difficult and different ways to make it faster: https://www.citusdata.com/blog/2016/10/12/count-performance/
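
One technique that article covers: when an approximate number is acceptable, the planner's statistics give a near-instant estimate instead of a full scan. A sketch for the whole-table case (note this estimates total rows, not distinct users in a range, and is only as fresh as the last ANALYZE):

```sql
-- Estimated row count from planner statistics; near-instant,
-- but approximate and table-wide only:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'slot_result_primary';
```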

1 Comment

You should also include the query for which this suggested index is to be used. This is the wrong index for the query suggested by @Milney above.
