
I would like to select all entries from the orders table where a certain product was ordered before 2019 but not after it. The table has close to 7M rows, and the query below takes almost 4 minutes to run. Note that in the orders table, productid is a foreign key to the products table and is indexed. Can the query below be rewritten to perform better? Any help is greatly appreciated. Thank you.

SELECT distinct *
FROM orders o
WHERE o.year < '2019'
AND o.productid NOT IN (
                        SELECT distinct(productid)
                        FROM orders
                        WHERE year > '2019');

Please find below the output from the EXPLAIN command:

+----+--------------------+-------+------------+------+-------------------+-------------------+---------+---------------------+---------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys     | key               | key_len | ref                 | rows    | filtered | Extra       |
+----+--------------------+-------+------------+------+-------------------+-------------------+---------+---------------------+---------+----------+-------------+
|  1 | PRIMARY            | o     | NULL       | ALL  | NULL              | NULL              | NULL    | NULL                | 2124177 |    33.33 | Using where |
|  2 | DEPENDENT SUBQUERY | o2    | NULL       | ref  | FK_orders_product | FK_orders_product | 4       | test-db.o.productid |       3 |    33.33 | Using where |
+----+--------------------+-------+------------+------+-------------------+-------------------+---------+---------------------+---------+----------+-------------+
2 rows in set, 2 warnings (0.05 sec)
  • Please run explain before your query and post the result in the question. Add the table description as well. Commented Jun 4, 2022 at 20:22
  • Why do you mention a foreign key to products? You don't use that table in this query. Commented Jun 4, 2022 at 22:19
  • @ErgestBasha - Please find the output from explain select in the question above. Thanks Commented Jun 6, 2022 at 2:45
  • Please provide SHOW CREATE TABLE orders Commented Jun 6, 2022 at 19:40

2 Answers

1

You could use not exists.

Hopefully the year column is not a varchar, in which case you should not be using string literals. Presumably using select * means there won't be any duplicates, so you should remove distinct.

Your year ranges also exclude 2019 completely, so presumably one of your predicates should include 2019 (<= or >=)?

select *
from orders o
where o.year < 2019
  and not exists (
    select *
    from orders o2
    where o2.productid = o.productid
      and o2.year >= 2019
  );
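
An alternative that may be worth comparing against not exists is to first find the products whose latest order year is before 2019 and then join back to orders. This is only a sketch, not something from the original query: it assumes year is a numeric, non-NULL column, and the derived-table alias p is purely illustrative.

```sql
-- Sketch: products whose most recent order year is before 2019,
-- joined back to orders to return all of their rows.
-- Assumes orders.year is numeric and NOT NULL.
SELECT o.*
FROM orders o
JOIN (
    SELECT productid
    FROM orders
    GROUP BY productid
    HAVING MAX(year) < 2019
) AS p ON p.productid = o.productid;
```

Whether this beats not exists depends on the data and the available indexes, so it is worth comparing the EXPLAIN output of both forms.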

4 Comments

Thanks for your help. I made the changes you mentioned above, and they have definitely improved/corrected the results returned. The query still takes about the same time, though, and I'm not sure how to optimize it further. Thank you
@shashankhr - Please provide the EXPLAIN SELECT ...
@RickJames - Please find the output from explain select in the question above. Thanks
@shashankhr - And for Stu's NOT EXISTS version?
0

Probably both uses of DISTINCT were useless.

Add this composite index (to at least help the NOT EXISTS):

INDEX(productid, year)
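
As a sketch, the index could be added and the plan re-checked like this. The index name idx_orders_productid_year is made up for illustration, and the column is assumed to be named productid, as in the question's query.

```sql
-- Hypothetical index name; columns as used in the question's query.
ALTER TABLE orders
    ADD INDEX idx_orders_productid_year (productid, year);

-- Re-check the plan afterwards.
EXPLAIN
SELECT *
FROM orders o
WHERE o.year < 2019
  AND NOT EXISTS (
    SELECT 1
    FROM orders o2
    WHERE o2.productid = o.productid
      AND o2.year >= 2019
  );
```

With both columns in one index, the NOT EXISTS probe can typically be resolved from the index alone, which EXPLAIN would report as "Using index" on the o2 row.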
