Improve PostgreSQL query performance

Question

When I run this query on my server, it is very slow, and I can't understand why. Can anyone help me figure it out?
Query:

SELECT
    "t_dat"."t_year" AS "c0",
    "t_dat"."t_month" AS "c1",
    "t_dat"."t_week" AS "c2",
    "t_dat"."t_day" AS "c3",
    "t_purs"."p_id" AS "c4",
    sum("t_purs"."days") AS "m0",
    sum("t_purs"."timecreated") AS "m1"
FROM "t_dat", "t_purs"
WHERE "t_purs"."created" = "t_dat"."t_key"
  AND "t_dat"."t_year" = 2013
  AND "t_dat"."t_month" = 3
  AND "t_dat"."t_week" = 9
  AND "t_dat"."t_day" IN (1,2)
  AND "t_purs"."p_id" IN (
      '4','15','18','19','20','29',
      '31','35','46','56','72','78')
GROUP BY
    "t_dat"."t_year",
    "t_dat"."t_month",
    "t_dat"."t_week",
    "t_dat"."t_day",
    "t_purs"."p_id"

Explain Analyze:

HashAggregate  (cost=12252.04..12252.04 rows=1 width=28) (actual time=10212.374..10212.384 rows=10 loops=1)
  ->  Nested Loop  (cost=0.00..12252.03 rows=1 width=28) (actual time=3016.006..10212.249 rows=14 loops=1)
        Join Filter: (t_dat.t_key = t_purs.created)
        ->  Seq Scan on t_dat  (cost=0.00..129.90 rows=1 width=20) (actual time=0.745..2.040 rows=48 loops=1)
              Filter: ((t_day = ANY ('{1,2}'::integer[])) AND (t_year = 2013) AND (t_month = 3) AND (t_week = 9))
        ->  Seq Scan on t_purs  (cost=0.00..12087.49 rows=9900 width=16) (actual time=0.018..201.630 rows=14014 loops=48)
              Filter: (p_id = ANY ('{4,15,18,19,20,29,31,35,46,56,72,78}'::integer[]))
Total runtime: 10212.470 ms

How many records are there in these tables? Are the relevant columns indexed?
– Travis G, Commented Mar 3, 2013 at 19:14
t_purs has about 600K records, t_dat about 9K. There are indexes on t_purs.id and t_dat.t_key.
– Eli_Rozen, Commented Mar 3, 2013 at 19:15
You need to show us the table and index definitions. Diagnosing slow queries requires full table and index definitions, not just a description or paraphrase. Maybe your tables are defined poorly. Maybe the indexes aren't created correctly. Maybe you don't have an index on that column you thought you did. Without seeing the table and index definitions, we can't tell.
– Andy Lester, Commented Mar 4, 2013 at 4:41
Also, it's entirely unnecessary to put your "column" and "table" names in quotation marks.
– Andy Lester, Commented Mar 4, 2013 at 4:42

mvp · Accepted Answer · 2013-03-03 19:36:29Z

It is difficult to say exactly what you are missing, but if I were you, I would make sure that the following index exists:

CREATE INDEX t_dat_id_date_idx
    ON t_dat (t_key, t_year, t_month, t_week, t_day);

For t_purs, create this index:

CREATE INDEX t_purs_created_p_id_idx
    ON t_purs (created, p_id);
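
To confirm which indexes actually exist before and after creating these, the pg_indexes system view can be queried. The following is a minimal sketch, assuming both tables live in a schema on the search path. The ANALYZE step is an addition not mentioned above: it refreshes planner statistics, which may matter here because the plan in the question estimated rows=1 for t_dat while 48 rows were actually returned.

```sql
-- List the indexes currently defined on the two tables.
SELECT tablename, indexname, indexdef
FROM   pg_indexes
WHERE  tablename IN ('t_dat', 't_purs')
ORDER  BY tablename, indexname;

-- Refresh planner statistics so row estimates reflect the current data.
ANALYZE t_dat;
ANALYZE t_purs;
```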

2 Comments

Eli_Rozen Over a year ago

Hey, I made some changes to the indexes and it's now as fast as light! Thanks!!

mvp Over a year ago

Glad to hear that. Don't underestimate the power of compound indexes! :)

Community · Accepted Answer · 2017-04-13 12:42:39Z

Consider using a single column in your table:

t_date date

instead of (t_year, t_month, t_week, t_day). The data type date occupies 4 bytes. That would shrink your table a bit, make the index smaller and faster, and make grouping a lot easier.
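
If you want to try this on an existing table, a migration could look roughly like the sketch below. It is only an illustration under two assumptions: that t_year, t_month and t_day are integer columns with t_day holding the day of the month, and that make_date() is available (PostgreSQL 9.4 or later). Test it on a copy of the data first.

```sql
-- Add the new column next to the existing ones.
ALTER TABLE t_dat ADD COLUMN t_date date;

-- Assumption: t_day is the day of the month (1-31), not the day of the week.
UPDATE t_dat
SET    t_date = make_date(t_year, t_month, t_day);
```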

Year, month, week and day can easily and quickly be extracted from a date with extract(). Your query could then look like this and would be faster:

SELECT extract (year  FROM t_date) AS c0
      ,extract (month FROM t_date) AS c1
      ,extract (week  FROM t_date) AS c2
      ,extract (day   FROM t_date) AS c3
      ,p.p_id                      AS c4
      ,sum(p.days)                 AS m0
      ,sum(p.timecreated)          AS m1
FROM   t_dat  d
JOIN   t_purs p ON p.created = d.t_key
WHERE  d.t_date IN ('2013-03-01'::date, '2013-03-02'::date)
AND    p.p_id IN (4,15,18,19,20,29,31,35,46,56,72,78)
GROUP  BY d.t_date, p.p_id;
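
As a quick sanity check that extract() reproduces the values filtered on in the question (t_year = 2013, t_month = 3, t_week = 9), the parts of a literal date can be selected directly. Note that the week field is the ISO 8601 week number.

```sql
-- Date parts of 2013-03-01; week is the ISO 8601 week number.
SELECT extract(year  FROM date '2013-03-01') AS c0,  -- 2013
       extract(month FROM date '2013-03-01') AS c1,  -- 3
       extract(week  FROM date '2013-03-01') AS c2,  -- 9
       extract(day   FROM date '2013-03-01') AS c3;  -- 1
```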

More important for performance is the index, which would then simply be:

CREATE INDEX t_dat_date_idx ON t_dat (t_key, t_date);

Or, depending on data distribution:

CREATE INDEX t_dat_date_idx ON t_dat (t_date, t_key);

The order of the columns matters. You may even create both.
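
As a rough illustration of why the order matters: a B-tree index is most useful when the query constrains its leading column. The sketch below shows the query shape each ordering tends to favor. The key value 12345 is made up, t_key is assumed to be an integer, and the planner makes the final choice based on its statistics.

```sql
-- Filtering by date first: tends to favor an index on (t_date, t_key),
-- since t_date is the leading column and t_key can be read from the index.
SELECT t_key
FROM   t_dat
WHERE  t_date IN (date '2013-03-01', date '2013-03-02');

-- Looking up by key first (for example as the inner side of a join):
-- tends to favor an index on (t_key, t_date).
SELECT t_date
FROM   t_dat
WHERE  t_key = 12345;  -- hypothetical key value, assumes an integer t_key
```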

Jeremy Caney · Accepted Answer · 2022-02-07 00:13:07Z

Your query performs sequential scans on both t_purs and t_dat. Creating appropriate indexes should make the query faster by letting the planner avoid those sequential scans.

-- Index names must be unique within a schema, so each index gets its own name.
create index t_purs_created_idx on t_purs(created) where created is not null;
create index t_dat_key_date_idx on t_dat using btree(t_key, t_year, t_month, t_week, t_day);

Run EXPLAIN ANALYZE again after creating the two indexes. The planning and execution times should both drop.
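
For reference, the check could look like the sketch below. It restates the query from the question with table aliases and adds the BUFFERS option, which is not mentioned above and is included only to show how many pages each node reads. In the output, look for index or bitmap scan nodes replacing the Seq Scan on t_purs that ran with loops=48 in the original plan.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT d.t_year, d.t_month, d.t_week, d.t_day, p.p_id,
       sum(p.days)        AS m0,
       sum(p.timecreated) AS m1
FROM   t_dat  d
JOIN   t_purs p ON p.created = d.t_key
WHERE  d.t_year  = 2013
AND    d.t_month = 3
AND    d.t_week  = 9
AND    d.t_day IN (1, 2)
AND    p.p_id IN (4,15,18,19,20,29,31,35,46,56,72,78)
GROUP  BY d.t_year, d.t_month, d.t_week, d.t_day, p.p_id;
```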

answered Feb 6, 2022 at 21:02 by Pratyush Srivastava; edited Feb 7, 2022 at 0:13 by Jeremy Caney
