
I have a query where I want to get all transactions for a specific user (the owners table) in my database. The database is pretty normalized, so getting from a transaction to its owner traverses many tables. My tables, with the relevant foreign keys, are as follows:

**owners**
-------
id

**store_shops**
-----------
id
owner_id

**service_shops**
-------------
id
owner_id

**products**
-------------
id
store_shop_id

**services**
------------
id
service_shop_id

**order_services**
------------------
id
service_id
order_id

**order_products**
------------------
id
product_id
order_id


**orders**
----------
id
transaction_id


**transactions**
----------------
id
refund_transaction_id
amount

I have the following query:

SELECT DISTINCT ON (sales.id) sales.id, sales.amount FROM transactions sales 
LEFT OUTER JOIN transactions refunds ON refunds.id = sales.refund_transaction_id
LEFT OUTER JOIN orders ON orders.transaction_id = sales.id OR orders.transaction_id = refunds.id
LEFT OUTER JOIN order_services ON order_services.order_id = orders.id
LEFT OUTER JOIN order_products ON order_products.order_id = orders.id
LEFT OUTER JOIN products ON  products.id = order_products.product_id
LEFT OUTER JOIN services ON services.id = order_services.service_id
LEFT OUTER JOIN service_shops ON service_shops.id = services.service_shop_id
LEFT OUTER JOIN store_shops ON store_shops.id = products.store_shop_id
LEFT OUTER JOIN owners service_shop_owners ON service_shop_owners.id = service_shops.owner_id
LEFT OUTER JOIN owners store_shop_owners ON store_shop_owners.id = store_shops.owner_id
WHERE (service_shop_owners.id = 26930 OR store_shop_owners.id = 26930)

This gives me the desired results. Only trouble is that on a dataset of hundreds of thousands of records, it becomes unusably slow.

I'm not very advanced when it comes to SQL, but I realize that all those LEFT OUTER JOINs aren't very efficient.

Is there a better way for me to handle this query? Or am I going to have to denormalize the database a bit and store more info in the transaction table?

UPDATE: Using Wyzard's answer below, I now have this query:

SELECT trans.id, trans.amount, refunds.id
FROM
  service_shops
  JOIN services ON services.service_shop_id = service_shops.id
  JOIN order_services ON order_services.service_id = services.id
  JOIN orders ON orders.id = order_services.order_id
  JOIN transactions trans ON trans.id = orders.transaction_id
  LEFT JOIN transactions refunds ON refunds.id = trans.refund_transaction_id
WHERE service_shops.owner_id = 26930
UNION
SELECT trans.id, trans.amount, refunds.id
FROM
  store_shops
  JOIN products ON store_shops.id = products.store_shop_id
  JOIN order_products ON order_products.product_id = products.id
  JOIN orders ON orders.id = order_products.order_id
  JOIN transactions trans ON trans.id = orders.transaction_id
  LEFT JOIN transactions refunds ON refunds.id = trans.refund_transaction_id
WHERE store_shops.owner_id = 26930

This is very fast and a big improvement. Only problem now is that the two LEFT JOIN transactions refunds ON refunds.id = trans.refund_transaction_id do not seem to be grabbing associated refund transactions. I'm assuming this is because they do not have an order associated directly with them, so the WHERE clause filters them out.

  • WHERE (service_shop_owners.id = 26930 OR store_shop_owners.id = 26930) will degrade at least two of the LEFT JOINs to plain JOINs (which can be rewritten as EXISTS), and the rest can probably be dropped, since you only select from one table: FROM transactions sales. Commented Apr 1, 2017 at 19:26
  • LEFT OUTER JOIN store_shops ON store_shops.id = products.id — do these two tables really have the same IDs, or is that a mistake? (Comparing with the service_shops join, I'm guessing you might've meant something like store_shops.id = products.store_shop_id.) Commented Apr 1, 2017 at 20:12
  • @Wyzard Yes, sorry. That is a mistake. Edited. Commented Apr 1, 2017 at 20:27

2 Answers

2

Change this:

WHERE (service_shop_owners.id = 26930 OR store_shop_owners.id = 26930)

To this:

WHERE 26930 IN (service_shop_owners.id, store_shop_owners.id)

Using OR usually means the index won't be used, but it should be used with the IN.


The above change should be enough to make a big difference. To further improve the query, reverse the order of the tables; in particular, list service_shop_owners as the first table in the FROM clause. The optimiser should do this for you, but often it doesn't.


1 Comment

I think you mis-read the query. The number is the same; the column is different. IN (26930, 26930) is the same as = 26930, but you've removed store_shop_owners.id from the clause.
1

First of all, EXPLAIN is your friend: it tells you about the query plan that the database will use to run the query, so you can see where the bottlenecks are. The output can be difficult to understand at first, but if you use pgAdmin, its EXPLAIN menu command gives you a nice graphical visualization that's much more intuitive.
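
For instance, here is a minimal sketch of running it from psql against one leg of the schema in the question. The query is deliberately small and only illustrative; in practice you would put EXPLAIN in front of the real query.

```sql
-- Plain EXPLAIN prints the estimated plan without executing the query.
EXPLAIN
SELECT services.id
FROM service_shops
  JOIN services ON services.service_shop_id = service_shops.id
WHERE service_shops.owner_id = 26930;

-- With ANALYZE the query is actually executed, so the plan also reports
-- real row counts and timings; BUFFERS adds page hit/read statistics.
EXPLAIN (ANALYZE, BUFFERS)
SELECT services.id
FROM service_shops
  JOIN services ON services.service_shop_id = service_shops.id
WHERE service_shops.owner_id = 26930;
```

A Seq Scan on a large table where you expected an Index Scan is usually the first thing worth looking for in the output.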


Second, the values used in your WHERE clause are at the end of a long chain of outer joins, which is inefficient because the database probably has to do all the joins and produce every candidate row just to get the owner IDs, only to discard most of the rows because the owner IDs don't match the WHERE condition.

It looks like you've structured the query this way because there are two separate paths from a sale to an owner: via products, or via services. This means you're basically doing two different queries at once, in a way that forces the database to process the product-related join conditions on rows that actually came from services, and vice versa. It'll probably be much more efficient to actually do two separate queries using UNION, and start each one from the table that you're using for filtering:

SELECT col1, col2, etc
FROM
  owners
  JOIN service_shops ON service_shops.owner_id = owners.id
  JOIN services ON services.service_shop_id = service_shops.id
  ...etc...
WHERE owners.id = 26930
UNION
SELECT col1, col2, etc
FROM
  owners
  JOIN store_shops ON store_shops.owner_id = owners.id
  JOIN products ON store_shops.id = products.store_shop_id
  ...etc...
WHERE owners.id = 26930

This should allow the database to quickly look up the owner using an index, then quickly look up the associated shops using another index, and so on. (That's assuming you have indexes on your FK columns, like service_shops.owner_id. If not, you should.)
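
PostgreSQL does not create indexes on foreign-key columns automatically (only on primary keys and unique constraints), so if they're missing, something along these lines should cover the columns the two join chains look up by. The index names are made up for illustration, and I'm assuming none of these indexes exist yet.

```sql
-- Hypothetical index names; the columns are the FK columns from the question.
-- Owner -> shops
CREATE INDEX idx_service_shops_owner_id ON service_shops (owner_id);
CREATE INDEX idx_store_shops_owner_id ON store_shops (owner_id);

-- Shops -> services / products
CREATE INDEX idx_services_service_shop_id ON services (service_shop_id);
CREATE INDEX idx_products_store_shop_id ON products (store_shop_id);

-- Services / products -> order lines
CREATE INDEX idx_order_services_service_id ON order_services (service_id);
CREATE INDEX idx_order_products_product_id ON order_products (product_id);
```

The remaining joins in that direction (order lines to orders, orders to transactions) look rows up by id, which the primary keys already index.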

Note that I've written JOIN instead of LEFT OUTER JOIN. Since you're not mixing both product data and service data in the same query, you won't have product-related rows that can't be joined to a service-related table, or vice versa, so you probably don't need outer joins at all.

Also, if you don't need any attributes from the owners table besides the ID, you can leave that table out of the query. Just do WHERE store_shops.owner_id = 26930.


Third, I've found that it helps to structure the FROM clause to use outer joins only where they're actually needed. Suppose you've written:

FROM
  foo
  LEFT JOIN bar ON bar.foo_id = foo.id
  LEFT JOIN baz ON baz.bar_id = bar.id

Let's suppose that you need to get the foo data even if it has no associated bar, but you don't need the bar data if it has no associated baz — or maybe you know there'll never be a bar without an associated baz. In that case you can rewrite the query like this:

FROM
  foo
  LEFT JOIN (
    bar
    JOIN baz ON baz.bar_id = bar.id
  ) ON bar.foo_id = foo.id

In my experience, this tends to be more efficient in PostgreSQL. (I don't know about other databases.)
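
Applied to the tables in the question, the same pattern might look like the sketch below. It only covers the service path, and whether it actually beats the flat version is something to check with EXPLAIN rather than take for granted.

```sql
-- Sketch only: every order is kept, but service data is attached only when
-- the whole order_services -> services -> service_shops chain exists.
SELECT orders.id AS order_id, service_shops.owner_id
FROM
  orders
  LEFT JOIN (
    order_services
    JOIN services ON services.id = order_services.service_id
    JOIN service_shops ON service_shops.id = services.service_shop_id
  ) ON order_services.order_id = orders.id;
```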

9 Comments

This is a great answer and seems to have gotten me 95% of the way there... So much faster. My problem now is with the "refund" transactions. These are self referential via 'refund_transaction_id' on the transaction table. Because they have no ties up the chain to the WHERE condition, they get left out, even with a LEFT JOIN. (NOTE: Using the UNION method described above)
No ties up the chain? A sale points to a refund, so after you've joined to sales, you should be able to do LEFT JOIN transactions refunds ON refunds.id = sales.refund_transaction_id, and you'll get the refund data if it exists.
Looking closer, I'm not quite sure what you're trying to accomplish going from orders to sales and refunds. It looks like you can have an order that points to a sale, and you can also have an order that points to a refund, and you want both? I'd imagine that every refund must be preceded by a sale, so maybe you just want to go from orders to sales, and then from sales to refunds, not straight from orders to refunds.
Also, this might be a sign of a flaw in your data model: sales and refunds are both the same transactions table, which means both have a refund_transaction_id column. That means a refund transaction can be linked to another refund, which doesn't make sense.
I agree that the model may not be the best, but it is how the system I'm working on is currently set up, so I'm trying to work with it. Yes, sales and refunds are the same transaction table. Essentially, when a refund is issued to a customer we create another transaction that negates the original transaction amount. So the refund_transaction_id of a refund points to the id of the original sale and vice versa. So a refund transaction could theoretically be linked to another refund, but it won't be in our system. It links back to the original sale transaction.