I need to join some tables by product id. The problem is that some tables do not store the id in full format: the full format is 12 digits, but in some cases the same product is stored with only its last 6 digits.

I tried comparing only the last 6 digits with right(product_id, 6), but that is not a reliable comparison because some distinct products share the same last 6 digits.

I also tried checking the length of the product ids and comparing the last 6 digits only when the lengths differ. The results are reliable, but the query runs far too slowly.

Method 1: plain join, fetches in 30-100 ms

SELECT * FROM
table_a ta 
  JOIN table_b tb ON ta.product_id = tb.product_id 

Method 2: join on the last 6 digits, fetches in 100-200 ms, but ~10% of the joined rows are false matches

SELECT * 
FROM table_a ta 
  JOIN table_b tb ON right(ta.product_id,6) = right(tb.product_id,6) 

Method 3: join on the last 6 digits only when the lengths of the product ids differ; the data is reliable, but the fetch takes around 30 seconds

SELECT * 
FROM table_a ta 
  JOIN table_b tb ON
                     CASE 
                       WHEN LENGTH(ta.product_id) <> LENGTH(tb.product_id) 
                         THEN right(ta.product_id,6) = right(tb.product_id,6) 
                       ELSE ta.product_id = tb.product_id    
                     END

Why does the CASE expression make it so slow? How do you suggest I compare these tables? I'm fairly sure there is another method that I'm not familiar with yet.

  • You didn't show the execution plans, but the reason is very likely that if your join condition is not an equality comparison, the optimizer can only choose a nested loop join, which is slow with big tables and sequential scans. Commented Dec 19, 2022 at 7:01

1 Answer

I hope I do not have to say: having distinct values to represent the same product is bad design; the one thing you should do is correct that.
I will set that point aside for the rest of my answer though.

Method 2 is the better candidate for modification; I will explain below why Method 3 cannot be salvaged.
You have already written that the last 6 characters of product_id must be equal. We simply add a second condition: either the two product_id values are equal in full (all characters), or their lengths differ.

SELECT *
FROM table_a
JOIN table_b
ON RIGHT(table_a.product_id, 6) = RIGHT(table_b.product_id, 6)
AND (
    table_a.product_id = table_b.product_id
    OR LENGTH(table_a.product_id) <> LENGTH(table_b.product_id)
)
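
To check which join strategy the planner actually picks for the query above, its execution plan can be inspected. This is a minimal sketch that assumes PostgreSQL, which the question does not state explicitly:

```sql
-- Assumption: PostgreSQL. ANALYZE really executes the query, so this
-- takes about as long as a normal run of the same statement.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM table_a
JOIN table_b
ON RIGHT(table_a.product_id, 6) = RIGHT(table_b.product_id, 6)
AND (
    table_a.product_id = table_b.product_id
    OR LENGTH(table_a.product_id) <> LENGTH(table_b.product_id)
);
```

A Hash Join or Merge Join node on the RIGHT(...) equality would suggest the planner is using the equality condition to drive the join. A Nested Loop with a Join Filter over sequential scans is the pattern described in the comment on the question.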

Some additional notes:

  1. Have you created a suitable index to perform the JOIN? Even though Method 2 does not look too bad in terms of performance, you should probably do:

    CREATE INDEX ON table_a (RIGHT(product_id, 6));
    CREATE INDEX ON table_b (RIGHT(product_id, 6));
    
  2. About the performance of Method 3:
    The CASE WHEN ... THEN ... ELSE ... is probably immensely inefficient. The worst part is the WHEN branch, which will try to join far more records than you expect.
    Example: 100000123456 in table_a will be tested against every 6-character-long id from table_b; you have detected what you call "10% fake join" in cases like 200000123456, but the vast majority of the pairs tested are just random garbage like 654873. Your CASE forces the evaluation of that garbage, hence the huge performance cost.

  3. You are talking about digits in your question, but based on the functions you use, I can see the columns are not integer-based types. I recommend you try converting the type of product_id to bigint.
    The last 6 digits are MOD(product_id, 1000000) and the LENGTH can be replaced by a comparison product_id >= 1000000.
    Make sure you test it before you actually make the change on the real tables.

  4. I repeat myself but the correct way to solve your issue is by ensuring product_id is in 12-digit format everywhere. You should do a massive UPDATE using the query above.
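
The clean-up suggested in point 4 could look roughly like the sketch below. It rests on assumptions that are not in the question: PostgreSQL syntax, a text-typed product_id, table_a holding the full 12-character ids, and table_b holding the shortened 6-character ones. Swap the table roles if your data is the other way round.

```sql
-- Step 1: list suffixes shared by more than one full id in table_a.
-- Short ids with these suffixes cannot be expanded automatically and
-- need to be resolved by hand.
SELECT RIGHT(product_id, 6) AS suffix,
       COUNT(DISTINCT product_id) AS full_id_count
FROM table_a
WHERE LENGTH(product_id) = 12
GROUP BY RIGHT(product_id, 6)
HAVING COUNT(DISTINCT product_id) > 1;

-- Step 2: expand the short ids in table_b, skipping ambiguous suffixes.
UPDATE table_b AS tb
SET product_id = ta.product_id
FROM table_a AS ta
WHERE LENGTH(tb.product_id) = 6
  AND LENGTH(ta.product_id) = 12
  AND RIGHT(ta.product_id, 6) = tb.product_id
  AND NOT EXISTS (
      -- another full id with the same suffix makes the match ambiguous
      SELECT 1
      FROM table_a AS other
      WHERE LENGTH(other.product_id) = 12
        AND RIGHT(other.product_id, 6) = tb.product_id
        AND other.product_id <> ta.product_id
  );
```

Running the UPDATE inside a transaction on a copy of the tables first makes it easy to inspect the result and roll back. Once every table uses the 12-digit format, the plain equality join from Method 1 is enough.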
