1

I have a database table that stores imported information. For simplicity, its something like:

CREATE TABLE `data_import` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`amount` DECIMAL(12,2) NULL DEFAULT NULL,
`payee` VARCHAR(50) NULL DEFAULT NULL,
`posted` TINYINT(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
INDEX `payee` (`payee`)
)

I also have a table that stores import rules:

CREATE TABLE `import_rules` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`search` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `search` (`search`)
)

The idea is that for each imported transaction, the query needs to try find a single matching rule - this match is done on the data_import.payee and import_rules.seach fields. Because these are both varchar fields, I have indexed them in the hope of making the query faster.

This is what I have come up with so far, which seems to work fine. Albeit slower than I hoped.

SELECT i.id, i.payee, i.amount, i.posted r.id, r.search
FROM import_data id
LEFT JOIN import_rules ir on REPLACE(i.payee, ' ', '') = REPLACE(ir.search, ' ', '')

One thing that the above query does not cater for, is that if import_data.posted = 1, then I dont need to find a rule for that line - is it possible to stop the query joining on that particular row? Similarly, if the payee is null, then it shouldn't try join either.

Are there any other ways that I can optimise this? I realise that doing text joins is not ideal...not sure if there are any better methods.

2 Answers 2

3

I highly recommend doing anything you can to get rid of the REPLACEs in that JOIN. Using REPLACE on both sides of the join totally eliminate the ability to use an index on either table.

Assuming you can get rid of the REPLACEs (by cleansing the existing data and/or new data):

  • If you need to join on text columns, use a single byte per character charset if you application allows for it (for a smaller/faster index).
  • Make the N in VARCHAR(N) as small as you can as it will affect the side of the index (or arguably, use index prefixes).
  • I imagine you want to make the search index on import_rules UNIQUE -- then you're sure to only going to get 1 row result returned per row of import_data

You can throw an AND into your WHERE clause if you'd like to enforce your 'don't join in this case' rule.

LEFT JOIN import_rules ir ON id.payee=ir.search AND id.posted != 1
Sign up to request clarification or add additional context in comments.

Comments

2

The use of REPLACE() on the join is probably breaking the indexing, as it has an index of the values in the field, not the amended values after REPLACE().

As for not joining, you are already using a LEFT JOIN, so non-matching joins will result in NULLs for the import_rules fields; you should be able to add WHERE clauses to force that.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.