Optimize time of execution of PSQL query

Question

This is the first time I have run into a problem with long query execution time. The problem is actually pretty big, because the query takes more than 20 seconds to execute, which is highly visible to the end user.

I have a fairly large database of topics (~8k). Each topic has a parameter taken from a dictionary (lookup) table: I have 113 different parameters for the 8k topics.

I would like to show a report of the number of repetitions of those topics.

topic table:
----------------+---------+-----------------------------------------------------
 id             | integer | nextval('topic_id_seq'::regclass)
 topicengine_id | integer |
 description    | text    |
 topicparam_id  | integer |
 date           | date    |

topicparam table:
----------------+---------+----------------------------------------------------------
 id             | integer | nextval('topicparam_id_seq'::regclass)
 name           | text    |

and my query:

select distinct tp.id as tpid, tp.name as desc, (select count(*) from topic where topic.topicparam_id = tp.id) as count, t.date
from topicparam tp, topic t where t.topicparam_id =tp.id

Total runtime: 22372.699 ms

fragment of result :

 tpid |                     topicname               | count |    date
------+---------------------------------------------+-------+---------
 3823 | Topic1                                      |     6 | 2014-03-01
 3756 | Topic2                                      |    14 | 2014-03-01
 3803 | Topic3                                      |    28 | 2014-04-01
 3780 | Topic4                                      |  1373 | 2014-02-01

Is there any way to optimize the execution time of this query?

Please post the output of explain analyze (or upload it to explain.depesz.com). Also, which indexes are defined on the table? And which exact Postgres version are you using?
– user330315, Commented Apr 8, 2014 at 6:05
Please read stackoverflow.com/tags/postgresql-performance/info then edit your question appropriately.
– Craig Ringer, Commented Apr 8, 2014 at 6:09
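
The details requested in the comments above could be collected with something like the following. This is only a sketch: it uses the table names from the question and the standard pg_indexes catalog view, and it should be run in psql against the affected database. Note that EXPLAIN ANALYZE actually executes the query, so expect it to take as long as the query itself.

```sql
-- Exact server version
SELECT version();

-- Indexes currently defined on the two tables
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('topic', 'topicparam');

-- Actual execution plan with timings and buffer usage for the slow query
EXPLAIN (ANALYZE, BUFFERS)
select distinct tp.id as tpid, tp.name as desc,
       (select count(*) from topic where topic.topicparam_id = tp.id) as count,
       t.date
from topicparam tp, topic t
where t.topicparam_id = tp.id;
```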

user330315 · Accepted Answer · 2014-04-08 06:11:22Z

1

A simple group by should do the same thing (if I understood your query correctly):

select tp.id as tpid, 
       max(tp.name) as desc, 
       count(*) as count, 
       max(t.date) as date
from topicparam tp
  join topic t on t.topicparam_id = tp.id
group by tp.id;

Btw: date is a horrible name for a column. For one, because it's also a reserved word, but more importantly because it does not document what the column contains. A "start date", an "end date", a "due date", a "recording date", a "publish date", ...?


2 Comments

Ryx5 Over a year ago

max() on tp.name doesn't make any sense. max() or min() on date can be useful to get the first or the last topic date if there are different dates, but according to the original query, that seems not to be the intent.

user330315 Over a year ago

@Ryx5: the original query uses a distinct, which seems to indicate that the OP just wants some unique combination. It did look like an attempt to get what the group by does, but as the original question lacks a lot of necessary information I had to guess. It could just as well be a group by on all columns as you did in your answer.
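
On the max(tp.name) point discussed above: if topicparam.id is declared as the primary key (an assumption, since the question only shows a sequence default for it), PostgreSQL 9.1 and later accept tp.name in the select list without an aggregate when grouping by tp.id alone. A minimal sketch under that assumption, with illustrative aliases (description, topic_count, latest_date) that are not from the original posts:

```sql
-- Assumption: topicparam.id is the PRIMARY KEY of topicparam,
-- so tp.name is functionally dependent on the GROUP BY column.
SELECT tp.id       AS tpid,
       tp.name     AS description,
       count(*)    AS topic_count,
       max(t.date) AS latest_date   -- one date per parameter, as in the answer
FROM topicparam tp
  JOIN topic t ON t.topicparam_id = tp.id
GROUP BY tp.id;
```

Without a primary key on topicparam.id, PostgreSQL rejects this form, and the max(tp.name) version or a group by on both columns is needed.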

Justin · Accepted Answer · 2014-04-08 06:15:34Z

0

You can try this query:

SELECT tp.id AS tpid,
       tp.name AS DESC,
       topic.cnt AS count,
       t.date
FROM topicparam tp
JOIN topic t
  ON t.topicparam_id =tp.id
JOIN (SELECT topicparam_id,
             count(*) cnt 
      FROM topic
      GROUP BY topicparam_id) topic
  ON topic.topicparam_id = tp.id
GROUP BY tp.id,
         tp.name,
         t.date,
         topic.cnt
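
The query above keeps the semantics of the original one: one row per parameter and date, with the total number of topics for that parameter repeated on every row. As a sketch of another way to express the same result, a window function can compute the per-parameter count without the extra join. This assumes topicparam.id is unique (the question does not show its constraints), and the aliases description and topic_count are illustrative only.

```sql
-- Assumption: topicparam.id is unique, so each topic row joins to exactly one parameter.
SELECT DISTINCT
       tp.id   AS tpid,
       tp.name AS description,
       -- Counts all joined topic rows for this parameter, regardless of date
       count(*) OVER (PARTITION BY tp.id) AS topic_count,
       t.date
FROM topicparam tp
  JOIN topic t ON t.topicparam_id = tp.id;
```

Whether this is faster than the derived-table join depends on the planner and the data, so it is worth comparing both with explain analyze.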


Ryx5 · Accepted Answer · 2014-04-08 06:20:16Z

0

In my opinion, DISTINCT combined with the subquery is what kills your performance. You should use GROUP BY both to deduplicate your data and to count.

SELECT 
    tp.id as tpid
    , tp.name as description
    , count(*) as numberOfTopics
    , t.date
FROM 
    topicparam tp
    INNER JOIN topic t 
        ON t.topicparam_id = tp.id
GROUP BY
    tp.id 
    , tp.name
    , t.date

Given the volume of data, you have to pay attention to indexes:

In this case, use indexes on the join columns, topicparam.id and topic.topicparam_id.

Remove indexes on columns that are never used in join clauses.

Try not to use SQL reserved words like "date", "desc", "count" for aliases or column names.
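
As a rough sketch of the indexing advice, the statements below index the two columns that appear in the join condition t.topicparam_id = tp.id. The index name is hypothetical, and the question does not say which indexes or constraints already exist, so check that first and skip whatever is already in place.

```sql
-- Hypothetical index name; covers the topic side of the join condition.
CREATE INDEX topic_topicparam_id_idx ON topic (topicparam_id);

-- Assumption: topicparam has no primary key yet. Skip this if it does.
-- A primary key creates a unique index on id automatically.
ALTER TABLE topicparam ADD PRIMARY KEY (id);

-- Refresh planner statistics so the new indexes are considered with current data.
ANALYZE topic;
ANALYZE topicparam;
```

With only about 8k rows, the planner may still prefer sequential scans and a hash join, so the query rewrite is likely to matter more than the indexes here.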

edited Apr 8, 2014 at 6:20

answered Apr 8, 2014 at 6:13
