Hi, I can't seem to find the right answer, so I might as well write a post.

Could any DB expert help me improve the following query (see the explain plan below)? It is slowing down our application in production quite a bit.

  • a bid is related to a realty
  • a realty is owned by an agency
  • I'm using postgres
  • a table stores the views per user: HIT(user_id, bid_id, date)

The aim is to retrieve the number of hits per bid for a particular agency.

Here is the query:

select hit.bid_id , count(hit.id)
from hit
  cross join bid
  cross join realty
where hit.bid_id=bid.id
  and realty.id=bid.realty_id
  and realty.agency_id = 91
group by hit.bid_id
order by count(hit.id) desc

Here is the explain plan:

"Sort  (cost=167474.69..167493.30 rows=7445 width=16)"
"  Sort Key: (count(hit.id)) DESC"
"  ->  HashAggregate  (cost=166921.45..166995.90 rows=7445 width=16)"
"        Group Key: hit.bid_id"
"        ->  Nested Loop  (cost=694.81..162541.34 rows=876021 width=16)"
"              ->  Hash Join  (cost=694.38..7217.46 rows=1986 width=8)"
"                    Hash Cond: (bid.realty_id = realty.id)"
"                    ->  Seq Scan on bid  (cost=0.00..6398.98 rows=27798 width=16)"
"                    ->  Hash  (cost=669.92..669.92 rows=1957 width=8)"
"                          ->  Bitmap Heap Scan on realty  (cost=63.45..669.92 rows=1957 width=8)"
"                                Recheck Cond: (agency_id = 91)"
"                                ->  Bitmap Index Scan on agency_idx  (cost=0.00..62.97 rows=1957 width=0)"
"                                      Index Cond: (agency_id = 91)"
"              ->  Index Scan using hit_bid_id_idx on hit  (cost=0.43..61.74 rows=1647 width=16)"
"                    Index Cond: (bid_id = bid.id)"

I tried using EXISTS and IN with a subselect, but they are worse.

[EDIT] I'm using QueryDsl (Java API), which generates the cross joins, but even with inner joins the execution takes too long. Here is the explain plan with verbose:

"Sort  (cost=169479.60..169498.99 rows=7756 width=16) (actual time=15350.858..15351.819 rows=821 loops=1)"
"  Output: hit.bid_id, (count(hit.id))"
"  Sort Key: (count(hit.id)) DESC"
"  Sort Method: quicksort  Memory: 63kB"
"  ->  HashAggregate  (cost=168900.96..168978.52 rows=7756 width=16) (actual time=15348.418..15349.550 rows=821 loops=1)"
"        Output: hit.bid_id, count(hit.id)"
"        Group Key: hit.bid_id"
"        ->  Nested Loop  (cost=699.70..164385.85 rows=903022 width=16) (actual time=17.777..14364.165 rows=582723 loops=1)"
"              Output: hit.bid_id, hit.id"
"              ->  Hash Join  (cost=699.26..7225.23 rows=2013 width=8) (actual time=8.427..146.966 rows=1977 loops=1)"
"                    Output: bid.id"
"                    Hash Cond: (bid.realty_id = realty.id)"
"                    ->  Seq Scan on public.bid  (cost=0.00..6400.88 rows=27988 width=16) (actual time=0.018..84.389 rows=27994 loops=1)"
"                          Output: bid.id, bid.created_by, bid.created_date, bid.last_modified_by, bid.last_modified_date, bid.agency_costs, bid.availability_begin_date, bid.availability_end_date, bid.bail, bid.description, bid.imported_bid, bid.is_availabl (...)"
"                    ->  Hash  (cost=674.46..674.46 rows=1984 width=8) (actual time=8.186..8.186 rows=1977 loops=1)"
"                          Output: realty.id"
"                          Buckets: 2048  Batches: 1  Memory Usage: 94kB"
"                          ->  Bitmap Heap Scan on public.realty  (cost=67.66..674.46 rows=1984 width=8) (actual time=0.533..4.967 rows=1977 loops=1)"
"                                Output: realty.id"
"                                Recheck Cond: (realty.agency_id = 91)"
"                                Heap Blocks: exact=208"
"                                ->  Bitmap Index Scan on agency_idx  (cost=0.00..67.17 rows=1984 width=0) (actual time=0.491..0.491 rows=1978 loops=1)"
"                                      Index Cond: (realty.agency_id = 91)"
"              ->  Index Scan using hit_bid_id_idx on public.hit  (cost=0.43..61.88 rows=1619 width=16) (actual time=2.198..6.376 rows=295 loops=1977)"
"                    Output: hit.id, hit.created_by, hit.created_date, hit.last_modified_by, hit.last_modified_date, hit.date, hit.ip, hit.user_id, hit.bid_id, hit.display_phone"
"                    Index Cond: (hit.bid_id = bid.id)"
"Planning time: 3.037 ms"
"Execution time: 15353.187 ms"

Tables DDL

CREATE TABLE public.bid
(
  id bigint NOT NULL,
  realty_id bigint,
  CONSTRAINT bid_pkey PRIMARY KEY (id),
  CONSTRAINT bid_fkey_realty FOREIGN KEY (realty_id)
      REFERENCES public.realty (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE TABLE public.hit
(
  id bigint NOT NULL,
  bid_id bigint,
  CONSTRAINT hit_pkey PRIMARY KEY (id),
  CONSTRAINT hit_fkey_bid FOREIGN KEY (bid_id)
      REFERENCES public.bid (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE TABLE public.realty
(
  id bigint NOT NULL,
  CONSTRAINT realty_pkey PRIMARY KEY (id)
)
  • Add the DDL of each table please. Commented Nov 6, 2017 at 5:22
  • Please edit your question and add the execution plan generated using explain (analyze, verbose, buffers). But those cross joins don't make any sense. Why don't you just use a regular join, as apparently that is what you want to do? Commented Nov 6, 2017 at 6:44
  • Thanks, I have edited the post with verbose. The cross joins are generated by QueryDsl (Java API); I've tried inner joins, but the exec plan is pretty similar. Commented Nov 10, 2017 at 4:14

2 Answers

You are unnecessarily using cross joins. Beyond that, there is a "Seq Scan" on the bid table in your explain plan, but the plan for the following query may be different:

select hit.bid_id , count(hit.id)
from hit
  inner join bid ON hit.bid_id=bid.id
  inner join realty ON realty.id=bid.realty_id
where realty.agency_id = 91
group by hit.bid_id
order by count(hit.id) desc

While it should not matter, changing the table order may have an effect:

select hit.bid_id , count(hit.id)
from realty
  inner join bid ON realty.id=bid.realty_id
  inner join hit ON hit.bid_id=bid.id
where realty.agency_id = 91
group by hit.bid_id
order by count(hit.id) desc

Can I assume the db statistics are current or "fresh"?
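
If the statistics are in doubt, they can be refreshed by hand before re-checking the plan. A minimal sketch, assuming the three table names from the question and sufficient privileges to run ANALYZE; the plan options are the ones requested in the comments on the question:

```sql
-- Refresh planner statistics for the three tables in the join.
ANALYZE realty;
ANALYZE bid;
ANALYZE hit;

-- Re-run the query with timing and buffer usage to compare
-- estimated and actual row counts.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
select hit.bid_id, count(hit.id)
from hit
  inner join bid on hit.bid_id = bid.id
  inner join realty on realty.id = bid.realty_id
where realty.agency_id = 91
group by hit.bid_id
order by count(hit.id) desc;
```

In the verbose plan from the question, the index scan on hit was estimated at 1619 rows per loop but returned 295 on average, so that estimate is the one to watch after refreshing.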

1 Comment

thanks, I have edited the post, inner join doesn't make any difference, and yes the stats are fresh.

If you provide more information (such as the existing indexes, table descriptions, and an explain plan with more detailed options), I could suggest better solutions, and other people will probably offer good ones. What I offer here is a crude workaround, but it is sometimes helpful.

This example is written in Java, but the same approach applies to other languages.

It is a temporary measure, but it might have an immediate effect.

Try it to test an execution plan that does not use nested loops.

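// Assumes "connection" is an open java.sql.Connection to the Postgres database
// and that java.sql.Statement and java.sql.ResultSet are imported.
try (Statement stmt = connection.createStatement()) {
    // Disable nested loop joins for this session only (for testing).
    stmt.execute("SET enable_nestloop TO false");
    try (ResultSet rs = stmt.executeQuery(
            "select hit.bid_id, count(hit.id) " +
            "from hit " +
            "cross join bid " +
            "cross join realty " +
            "where hit.bid_id = bid.id " +
            "and realty.id = bid.realty_id " +
            "and realty.agency_id = 91 " +
            "group by hit.bid_id " +
            "order by count(hit.id) desc")) {
        while (rs.next()) {
            // rs.getLong(1) is the bid_id, rs.getLong(2) is the hit count
        }
    } finally {
        // Restore the setting so later queries on this connection are unaffected.
        stmt.execute("SET enable_nestloop TO true");
    }
}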
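
The same experiment can also be run directly in psql without touching application code. A minimal sketch, assuming it is acceptable to run the query once inside a throwaway transaction: SET LOCAL limits the setting to the current transaction, so nothing needs to be reset afterwards.

```sql
BEGIN;

-- Applies only until the end of this transaction.
SET LOCAL enable_nestloop = off;

-- Run the query and report actual timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
select hit.bid_id, count(hit.id)
from hit
  cross join bid
  cross join realty
where hit.bid_id = bid.id
  and realty.id = bid.realty_id
  and realty.agency_id = 91
group by hit.bid_id
order by count(hit.id) desc;

ROLLBACK;
```

Comparing the execution time reported here with the 15353 ms of the nested-loop plan in the question should show whether the join method is the bottleneck. Disabling a planner method is best treated as a diagnostic aid rather than a permanent fix.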
