2

I have the following tables

  • ORDER (idOrder int, idCustomer int) [PK: idOrder]
  • ORDERLINE (idOrder int, idProduct int) [PK: idOrder, idProduct]
  • PRODUCT (idProduct int, rating hstore) [PK: idProduct]

In the PRODUCT table, 'rating' is a key/value column where the key is an idCustomer, and the value is an integer rating.

The query to count the orders containing a product on which the customer has given a good rating looks like this:

select count(distinct o.idOrder)
from order o, orderline l, product p 
where o.idorder = l.idorder and l.idproduct = p.idproduct
  and (p.rating->(o.idcust::varchar)::int) > 4;

The query plan seems correct, but this query takes forever. So I tried a different query, where I explode all the records in the hstore:

select count(distinct o.idOrder)
from order o, orderline l, 
  (select idproduct, skeys(p.rating) idcustomer, svals(p.rating) intrating from product) as p
where o.idorder = l.idorder and l.idproduct = p.idproduct
  and o.idcustomer = p.idcustomer and p.intrating > 4;

This query takes only a few seconds. How is this possible? I assumed that exploding all values of an hstore would be quite inefficient, but it seems to be the opposite. Is it possible that I am not writing the first query correctly?

1 Answer 1

1

I'm suspecting it is because in the first query you are doing:

p.rating->(o.idcust::varchar)::int

a row at a time as the query iterates over the rest of the operations, whereas in the second query the hstore values are expanded in a single query. If you want more insight use EXPLAIN ANALYZE:

https://www.postgresql.org/docs/12/sql-explain.html

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.