3

I have two tables: A and P. I want to get information out of all rows in A whose id is in a temporary table I created, tmp_ids. However, there is additional information about A in the P table, foo, and I want to get this info as well. I have the following query:

SELECT A.H_id AS hid,
       A.id AS aid,
       P.foo, A.pos, A.size
FROM tmp_ids, P, A
WHERE tmp_ids.id = A.H_id
  AND P.id = A.P_id

I noticed it going slowly, and when I asked Postgres to explain, I noticed that it combines tmp_ids with an index on A I created for H_id with a nested loop. However, it hashes all of P before doing a Hash join with the result of the first merge. P is quite large and I think this is what's taking all the time. Why would it create a hash there? P.id is P's primary key, and A.P_id has an index of its own.

UPDATE: All the datatypes are INTEGER, except A.size which is a DOUBLE PRECISION and P.foo which is VARCHAR. I'm using PostgreSQL version 8.4.

Here is the explain: http://explain.depesz.com/s/WBo .

7
  • you might have to declare explicitly the data type of the join fields Commented Jun 17, 2010 at 20:46
  • Post an explain analyze, or better yet, post the explain on explain.depesz.com and provide the link. Also, what PG version? Commented Jun 17, 2010 at 20:49
  • With some DBMSs you can force a loop join, but I don't think that's possible with PostrgresSQL. That's kind-of a shame, because there are lots of cases where data realities like yours aren't figured out by the optimizer. Commented Jun 17, 2010 at 20:49
  • @Pointy You can "set enable_hashjoin = false" which in this case would probably do that. Commented Jun 17, 2010 at 20:52
  • hmm i realize that the table it is hashing, P, only has 43000 entries, so maybe that isn't why the query runs slowly.. might have been another part of the program Commented Jun 17, 2010 at 21:12

3 Answers 3

3

The query planner estimated it'd be faster to sequentially read all the data and hash it, than to perform an estimated 2100 index scans with their associated much more random disk access.

Sign up to request clarification or add additional context in comments.

Comments

1

Without seeing an explain analyze, these kind of problems usually result from statistics being off or an unusual setting required for random_page_cost or seq_page_cost.

It may run better with

set enable_hashjoin = false;

1 Comment

When you do this, the optimizer would do a mergejoin. Still sub-optimal since it has to sort the keys first before matching. I'd want index driven nested loops. You might want to set enable_hashjoin = true and experiment with setting your work_mem, maintainance_work_mem (and also shared_buffers) per postgresql.org/docs/current/static/… to a smaller size. Sometimes the optimizer thinks there's plenty of memory and it will do hash joins every time.
0

Your problem is that the optimizer doesn't have the right statistics to determine how many matches "A.H_id = tmp_ids.id" is going to create, which is a common problem with temporary tables--they don't have statistics the way a regular one does. It guesses that 21 rows are going to match coming out of the "Index Scan using idx_A_handid on A", but there are actually only 3. It's highlighted in the explain analysis where the lowest level up arrow has a 7 next to it, giving the multiplier for how wrong the estimate was.

That error carries forward to where it thinks it has 2100 rows to scan, at which point it might as well do a full sequential scan and hash the results given that's likely to touch most blocks in the table.

Had it known correctly there were only 300 to probe, it might have done something different involving only a subset of the data. You can't expect to get good plans from joins against temporary tables because of their lack of statistics. This may be a case where it's appropriate to nudge correct behavior by turning off enable_hashjoin before executing the query.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.