1

I have a problem with this SQL query on MySQL: it takes 5 seconds to fetch only 25 records, which is pretty bad:

select t.* from table1 t
left join table2 t2 on t.id=t2.transaction_id
where t2.transaction_id is null
and t.custom_type =0 limit 25

All three tables have an estimated 10 million records each.

The structure of the affected tables:

table1:
+---------------------+--------------+------+-----+-------------------+----------------+
| Field               | Type         | Null | Key | Default           | Extra          |
+---------------------+--------------+------+-----+-------------------+----------------+
| id                  | int(11)      | NO   | PRI | NULL              | auto_increment |
| loan_application_id | int(11)      | YES  | MUL | NULL              |                |
| loan_repayment_id   | int(11)      | YES  | MUL | NULL              |                |
| person_id           | int(11)      | YES  | MUL | NULL              |                |
| direction           | tinyint(4)   | NO   |     | NULL              |                |
| amount              | float        | NO   |     | NULL              |                |
| sender_phone        | varchar(32)  | YES  | MUL | NULL              |                |
| recipient_phone     | varchar(32)  | YES  | MUL | NULL              |                |
| sender_name         | varchar(128) | YES  |     | NULL              |                |
| recipient_name      | varchar(128) | YES  |     | NULL              |                |
| date_time           | datetime     | NO   | MUL | NULL              |                |
| local_date_time     | datetime     | YES  |     | NULL              |                |
| payment_method      | varchar(128) | YES  |     | NULL              |                |
| project             | varchar(30)  | YES  | MUL | NULL              |                |
| confirmation_number | varchar(64)  | YES  | MUL | NULL              |                |
| reversal_of         | varchar(32)  | YES  |     | NULL              |                |
| custom_type         | int(11)      | YES  |     | 0                 |                |
| timestamp           | timestamp    | NO   |     | CURRENT_TIMESTAMP |                |
+---------------------+--------------+------+-----+-------------------+----------------+

table2:
+---------------------+-------------+------+-----+---------+----------------+
| Field               | Type        | Null | Key | Default | Extra          |
+---------------------+-------------+------+-----+---------+----------------+
| id                  | int(11)     | NO   | PRI | NULL    | auto_increment |
| transaction_id      | int(11)     | YES  | MUL | NULL    |                |
| type                | int(11)     | NO   | MUL | NULL    |                |
| phone_number        | varchar(16) | NO   | MUL | NULL    |                |
| amount              | double      | NO   |     | NULL    |                |
| description         | text        | YES  |     | NULL    |                |
| person_id           | int(11)     | YES  | MUL | NULL    |                |
| loan_application_id | int(11)     | YES  | MUL | NULL    |                |
| repayment_id        | int(11)     | YES  |     | NULL    |                |
| date_time           | datetime    | YES  |     | NULL    |                |
| local_date_time     | datetime    | YES  |     | NULL    |                |
| last_modified_by    | varchar(32) | YES  |     | NULL    |                |
| last_modified       | timestamp   | YES  |     | NULL    |                |
+---------------------+-------------+------+-----+---------+----------------+

table3:
+--------------------------------+--------------+------+-----+---------+-------+
| Field                          | Type         | Null | Key | Default | Extra |
+--------------------------------+--------------+------+-----+---------+-------+
| id                             | int(11)      | NO   | PRI | NULL    |       |
| transaction_type_id            | int(11)      | NO   | MUL | NULL    |       |
| msisdn                         | varchar(32)  | NO   | MUL | NULL    |       |
| amount                         | float        | NO   |     | NULL    |       |
| mobile_money_provider_id       | int(11)      | YES  |     | NULL    |       |
| mobile_money_provider_code     | varchar(32)  | YES  |     | NULL    |       |
| source_external_id             | varchar(128) | YES  |     | NULL    |       |
| source_user_id                 | int(11)      | YES  |     | NULL    |       |
| payment_server_trx_id          | varchar(64)  | YES  | MUL | NULL    |       |
| customer_receipt               | varchar(64)  | YES  | MUL | NULL    |       |
| transaction_account_ref_number | varchar(64)  | YES  |     | NULL    |       |
| status                         | int(11)      | YES  |     | NULL    |       |
| mno_status                     | int(11)      | YES  |     | NULL    |       |
| mno_status_desc                | text         | YES  |     | NULL    |       |
| mno_transaction_id             | varchar(64)  | YES  |     | NULL    |       |
| date_completed                 | timestamp    | YES  |     | NULL    |       |
| date_acknowledged              | timestamp    | YES  |     | NULL    |       |
| created_at                     | timestamp    | YES  |     | NULL    |       |
| updated_at                     | timestamp    | YES  |     | NULL    |       |
| project                        | varchar(32)  | NO   |     | NULL    |       |
| loan_application_id            | int(11)      | YES  | MUL | NULL    |       |
+--------------------------------+--------------+------+-----+---------+-------+

I have already indexed table1(id, custom_type, confirmation_number), table2(transaction_id), and table3(customer_receipt), without any significant improvement.

How can I bring the execution time of this query down to below 100 ms?

2
  • Just a suggestion: keep in mind that LIMIT 25 shows only 25 rows, but the query may still fetch all the rows of the result. Commented Jan 14, 2018 at 15:04
  • I see only 2 tables in the query. Please use SHOW CREATE TABLE, it is more descriptive than DESCRIBE. Commented Jan 14, 2018 at 20:14

2 Answers

1

This is your query:

select t.*
from table1 t left join
     table2 t2
     on t.id = t2.transaction_id left join
     table3 t3
     on t3.customer_receipt = confirmation_number
where t2.transaction_id is null and t.custom_type = 0
limit 25;

First, you do not seem to need table3, so let's remove that:

select t.*
from table1 t left join
     table2 t2
     on t.id = t2.transaction_id 
where t2.transaction_id is null and t.custom_type = 0
limit 25;

For this query, you want indexes on table1(custom_type, id) and table2(transaction_id).
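
As a rough sketch of what that looks like in practice, the statements below create the two indexes and then ask MySQL for the execution plan. The index names are made up for illustration, and the second index is only needed if table2.transaction_id is not indexed yet (the DESCRIBE output in the question suggests it already is).

```sql
-- Hypothetical index names; use whatever fits your naming scheme.
CREATE INDEX idx_table1_custom_type_id ON table1 (custom_type, id);

-- Skip this one if table2.transaction_id already has an index
-- (DESCRIBE shows Key = MUL for it in the question).
CREATE INDEX idx_table2_transaction_id ON table2 (transaction_id);

-- Check which indexes the optimizer actually chooses.
EXPLAIN
SELECT t.*
FROM table1 t
LEFT JOIN table2 t2 ON t.id = t2.transaction_id
WHERE t2.transaction_id IS NULL
  AND t.custom_type = 0
LIMIT 25;
```

In the EXPLAIN output, the key column shows which index, if any, was picked for each table, and the rows column gives the optimizer's estimate of how many rows it expects to examine.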


2 Comments

Done this already, the query executes at 5 seconds without table3 which is way too slow
@xcoder - So please edit your question to reflect this change.
0

Here are the changes I would try, in the order I would try them.

Adding an index

First, as Gordon Linoff suggests, add the following index:

ALTER TABLE table1
ADD INDEX (`custom_type`,`id`)

Making a column NOT NULL

If that doesn't improve performance enough, then I would change table2.transaction_id to be NOT NULL, if your business rules allow it.

The reason is how the documentation describes the execution of the anti-join you are using (search for "Not exists" on the page):

MySQL was able to do a LEFT JOIN optimization on the query and does not examine more rows in this table for the previous row combination after it finds one row that matches the LEFT JOIN criteria. Here is an example of the type of query that can be optimized this way:

SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id
  WHERE t2.id IS NULL;

Assume that t2.id is defined as NOT NULL. In this case, MySQL scans t1 and looks up the rows in t2 using the values of t1.id. If MySQL finds a matching row in t2, it knows that t2.id can never be NULL, and does not scan through the rest of the rows in t2 that have the same id value. In other words, for each row in t1, MySQL needs to do only a single lookup in t2, regardless of how many rows actually match in t2.

In your query, the role of t2.id is played by your table2.transaction_id column, but that column CAN be NULL. If possible, try changing its definition to NOT NULL and see whether performance improves. (If the column must remain nullable for other reasons, then obviously this solution will not work.)
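
A minimal sketch of that change, assuming the column keeps its current int(11) type and that no other code depends on NULL values in it. On a table of this size the ALTER may take a while and is worth trying on a copy first.

```sql
-- Check first: if any NULLs remain, the ALTER below will either fail
-- or coerce them, depending on the server's SQL mode.
SELECT COUNT(*) FROM table2 WHERE transaction_id IS NULL;

-- MODIFY restates the whole column definition; int(11) is taken from
-- the DESCRIBE output in the question.
ALTER TABLE table2 MODIFY transaction_id INT(11) NOT NULL;

-- Re-check the plan afterwards.
EXPLAIN
SELECT t.*
FROM table1 t
LEFT JOIN table2 t2 ON t.id = t2.transaction_id
WHERE t2.transaction_id IS NULL
  AND t.custom_type = 0
LIMIT 25;
```

If the optimization quoted above applies, the Extra column for t2 should mention "Not exists". The exact wording of the plan may differ between MySQL versions.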

Adding a cache table

The remaining solution is one that has worked well for me in my job. I have a query that basically finds available "items" for users to pick up. The users in question tend to be aggressive in rapidly refreshing the page that calls this query to find their available items.

My query originally worked in the same fashion as yours. It would take a main table, like your table1, and use a LEFT OUTER JOIN table2 ... WHERE table2.xxx IS NULL to exclude those items that someone had already grabbed.

However, since records were never deleted from either table, the query started to slow down once there were around 50,000 "grabbed" items. Basically, it was taking too long for MySQL to check all the items to find the 10-100 or so that were not grabbed yet.

The solution was to create a cache table that contained only the ungrabbed items. The server side code was updated to insert two records, instead of one, whenever a new item was available. For your situation, let's call it available_table1.

CREATE TABLE available_table1 (
`id` INT NOT NULL,
PRIMARY KEY (`id`),
CONSTRAINT `Table1_AvailableTable1_fk`
FOREIGN KEY (`id`)
REFERENCES `table1` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Populate this table once with your original query, without the limit:

INSERT INTO available_table1
(`id`)
SELECT
t.id
FROM table1 t
left join table2 t2 on t.id=t2.transaction_id
where t2.transaction_id is null
and t.custom_type =0

Now your query becomes:

select t.* from table1 t
INNER JOIN available_table1 at
ON at.id = t.id
left join table2 t2 on t.id=t2.transaction_id
where t2.transaction_id is null
and t.custom_type =0 limit 25

You will need to clean up this table periodically (we do it nightly) by removing all records where a table2.transaction_id now exists for a given id.

DELETE at FROM available_table1 at
INNER JOIN table2 t2
ON t2.transaction_id = at.id
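
One possible way to run that cleanup automatically is MySQL's event scheduler. This is only a sketch: the event name is made up, an external cron job would work just as well, and the join is on available_table1.id because that is the only column the cache table has.

```sql
-- The event scheduler must be enabled for events to run.
SET GLOBAL event_scheduler = ON;

-- Hypothetical event name. With no STARTS clause the event first runs
-- when it is created and then repeats every 24 hours.
CREATE EVENT purge_available_table1
ON SCHEDULE EVERY 1 DAY
DO
  DELETE a FROM available_table1 a
  INNER JOIN table2 t2 ON t2.transaction_id = a.id;
```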

If your code can be modified easily enough, you can even remove the available_table1 record at the moment a table2 record is inserted. However, so long as there are few enough records in the available_table1 table, you don't have to be overly aggressive in cleaning it out.
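
To make the two write paths concrete, here is a sketch of what the application could run. All literal values are placeholders, only the NOT NULL columns from the question's table definitions are filled in, and it assumes both tables use InnoDB so that each pair of statements commits or rolls back together.

```sql
-- Path 1: a new transaction arrives, so insert it and register it as available.
START TRANSACTION;
INSERT INTO table1 (direction, amount, date_time, custom_type)
VALUES (1, 100.0, NOW(), 0);          -- placeholder values
INSERT INTO available_table1 (id)
VALUES (LAST_INSERT_ID());            -- id generated by the insert above
COMMIT;

-- Path 2: a table2 record is created for an existing transaction,
-- so the transaction is no longer available.
SET @txn_id = 12345;                  -- hypothetical table1.id
START TRANSACTION;
INSERT INTO table2 (transaction_id, type, phone_number, amount)
VALUES (@txn_id, 1, '0700000000', 100.0);  -- placeholder values
DELETE FROM available_table1 WHERE id = @txn_id;
COMMIT;
```

Only rows with custom_type = 0 belong in the cache table, so the first path should skip the second insert for any other type.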

With this change, our query went from being a major headache that was really slowing down the entire application, to one that doesn't even show up anymore in our slow log, which is set to only show queries longer than 0.03 seconds.
