SQL Query - slow using replace function

Question

I have a query that is taking a long time to run:

SELECT R.Email
      ,MAX(R.Id)
      ,MAX(R.Postcode)
FROM ParsedCandidates PC
    INNER JOIN Results R
        ON REPLACE(
             REPLACE(
               REPLACE(
                 REPLACE(R.[Resume], 'D:\documents\', '')
               ,'D:\CMT\Resumes\', '')
             , 'internal_', '')
           , 'monster_', '')
      = REPLACE(
          REPLACE(
            REPLACE(
              REPLACE(PC.[File], 'D:\documents\', '')
            ,'D:\CMT\Resumes\', '')
          ,'internal_', '')
        , 'monster_', '') 
WHERE CONTAINS(PC.ParsedCV, '"Marketing Executive"')
    AND R.Email IS NOT NULL
    AND R.Email <> ''
    AND R.Postcode IS NOT NULL
    AND R.Postcode <> ''
    AND EXISTS (SELECT 1
                FROM Candidates_Sourcing CS
                WHERE CS.Email = R.Email
                    AND CS.Email IS NOT NULL
                    AND CS.Email <> ''
               )
GROUP BY R.Email;
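The REPLACE chain in the join strips the known folder prefixes and the internal_/monster_ markers, so paths stored differently in the two tables collapse to the same key. Here is a quick check with two sample paths. The paths are invented for illustration and are not taken from the real data.

```sql
-- Hypothetical sample paths, for illustration only
SELECT v.FilePath,
       REPLACE(
         REPLACE(
           REPLACE(
             REPLACE(v.FilePath, 'D:\documents\', '')
           , 'D:\CMT\Resumes\', '')
         , 'internal_', '')
       , 'monster_', '') AS MatchKey
FROM (VALUES ('D:\documents\internal_12345.doc'),
             ('D:\CMT\Resumes\monster_12345.doc')) AS v(FilePath);
-- Both rows return MatchKey = '12345.doc'
```

REPLACE removes every occurrence, not only a leading prefix. A file name that contains "monster_" or "internal_" somewhere else would therefore also be altered.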

Both the Candidates_Sourcing table and the Results table have many, many rows.

I know the REPLACE calls will cause problems with sargability, but I need them to make the file paths match.

Any ideas how this can be improved?

It will be: REPLACE makes the query non-SARGable, so the data engine has to scan the whole table. My guess is that you are using the REPLACE functions to strip the path and return just the file name? If so, you should really be storing the file name as a separate piece of information, either as a persisted computed column or by storing the path and file name in separate columns (and then indexing appropriately). Then you can simply use a clause like R.ResumeFileName = PC.FileName.
– Thom A ♦, Commented Apr 4, 2019 at 10:58
Would a computed column mean that it can be indexed? Won't the expense of doing the REPLACE just be moved to the computed column and still be expensive?
– Matthew Stott, Commented Apr 4, 2019 at 11:08
That's why I said a persisted computed column. They can be indexed, and (as the name suggests) the value is stored as a persisted value in the table.
– Thom A ♦, Commented Apr 4, 2019 at 11:19
@MatthewStott Yes, it is moving the cost. But you can either take the hit when rows are inserted or updated, or take it every time you run this query. Which do you think is more expensive over time?
– SMor, Commented Apr 4, 2019 at 12:10
And just to add to @SMor's observation, you're not just shifting the workload, you're dividing it into tiny pieces. One row's worth of REPLACE work happens on each insert, instead of thousands or millions of REPLACE calls all running at once during the query. Think of it as a divide-and-conquer approach, not a simple lift-and-shift.
– Eric Brandt, Commented Apr 4, 2019 at 14:11

MarcinJ · Accepted Answer · 2019-04-04 11:35:18Z

What you can do is create persisted computed columns on both tables and index them:

ALTER TABLE Results ADD FixedPath AS REPLACE(
             REPLACE(
               REPLACE(
                 REPLACE([Resume], 'D:\documents\', '')
               ,'D:\CMT\Resumes\', '')
             , 'internal_', '')
           , 'monster_', '') PERSISTED

CREATE NONCLUSTERED INDEX ixResults_FixedPath ON Results (FixedPath) INCLUDE (...) WHERE (...)

The INCLUDE list, and possibly a WHERE filter, for your index will depend on your queries.
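The answer shows the computed column only for Results. Below is a sketch of the same idea applied to ParsedCandidates, followed by the question's query rewritten to join on the precomputed columns.

Some details are assumptions:

- The column name FixedPath on ParsedCandidates and the index name ixParsedCandidates_FixedPath are illustrative.
- The sketch assumes [File] and [Resume] are not (n)varchar(max). Max-length columns cannot be index key columns, so with those types you may need to CAST the expression to a bounded length.

```sql
-- Sketch: mirror Results.FixedPath on ParsedCandidates.
-- FixedPath and ixParsedCandidates_FixedPath are illustrative names.
ALTER TABLE dbo.ParsedCandidates ADD FixedPath AS
    REPLACE(
      REPLACE(
        REPLACE(
          REPLACE([File], 'D:\documents\', '')
        , 'D:\CMT\Resumes\', '')
      , 'internal_', '')
    , 'monster_', '') PERSISTED;
GO

-- Index the persisted column so the join can seek instead of scanning.
CREATE NONCLUSTERED INDEX ixParsedCandidates_FixedPath
    ON dbo.ParsedCandidates (FixedPath);
GO

-- The original query, now joining on the stored columns (sargable).
SELECT R.Email
      ,MAX(R.Id)
      ,MAX(R.Postcode)
FROM ParsedCandidates PC
    INNER JOIN Results R
        ON R.FixedPath = PC.FixedPath
WHERE CONTAINS(PC.ParsedCV, '"Marketing Executive"')
    AND R.Email IS NOT NULL
    AND R.Email <> ''
    AND R.Postcode IS NOT NULL
    AND R.Postcode <> ''
    AND EXISTS (SELECT 1
                FROM Candidates_Sourcing CS
                WHERE CS.Email = R.Email
                    AND CS.Email IS NOT NULL
                    AND CS.Email <> ''
               )
GROUP BY R.Email;
```

The REPLACE work now runs once per row when it is written, not once per row every time the query runs.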

If you don't want to alter the tables, you can instead create an indexed view on each of them and then join the views.

CREATE VIEW v_Results 
WITH SCHEMABINDING
AS
SELECT R.Id
--   , ... other columns ...
     , REPLACE(
                 REPLACE(
                   REPLACE(
                     REPLACE(R.[Resume], 'D:\documents\', '')
                   ,'D:\CMT\Resumes\', '')
                 , 'internal_', '')
               , 'monster_', '') AS FixedPath
  FROM dbo.Results R
 WHERE R.Email IS NOT NULL
   AND R.Email <> ''
   AND R.Postcode IS NOT NULL
   AND R.Postcode <> ''
GO

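However, the index has to be unique here: the first index created on a view must be a unique clustered index, which is why Id is included in the key alongside FixedPath.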

CREATE UNIQUE CLUSTERED INDEX ux ON dbo.v_Results (FixedPath, Id);
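The join below also uses v_ParsedCandidates, which the answer does not define. Here is a minimal sketch following the same pattern as v_Results.

It assumes ParsedCandidates has a unique, non-nullable key column, hypothetically named Id. The required unique clustered index needs such a column to guarantee uniqueness. The index name ux_v_ParsedCandidates is also illustrative.

```sql
-- Indexed views require these SET options when the view is created.
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

-- Sketch only: "Id" is a hypothetical unique key on ParsedCandidates.
CREATE VIEW dbo.v_ParsedCandidates
WITH SCHEMABINDING
AS
SELECT PC.Id
     , REPLACE(
         REPLACE(
           REPLACE(
             REPLACE(PC.[File], 'D:\documents\', '')
           , 'D:\CMT\Resumes\', '')
         , 'internal_', '')
       , 'monster_', '') AS FixedPath
  FROM dbo.ParsedCandidates PC
GO

-- As with v_Results, the first index on the view must be unique and clustered.
CREATE UNIQUE CLUSTERED INDEX ux_v_ParsedCandidates
    ON dbo.v_ParsedCandidates (FixedPath, Id);
```

This view does not carry ParsedCV, so the question's CONTAINS filter still has to be evaluated against ParsedCandidates itself. One option is to join the view back to the base table on Id.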

Having created both views, you can then join them:

SELECT ...
  FROM v_Results R WITH (NOEXPAND)
  JOIN v_ParsedCandidates PC WITH (NOEXPAND)
    ON R.FixedPath = PC.FixedPath

The NOEXPAND hint prevents SQL Server from expanding each view into its underlying query, so the optimizer uses the views' own indexes.
