I have a query that is taking a long time to run; see below.
SELECT R.Email
      ,MAX(R.Id)
      ,MAX(R.Postcode)
FROM ParsedCandidates PC
INNER JOIN Results R
    ON REPLACE(
         REPLACE(
           REPLACE(
             REPLACE(R.[Resume], 'D:\documents\', '')
           , 'D:\CMT\Resumes\', '')
         , 'internal_', '')
       , 'monster_', '')
     = REPLACE(
         REPLACE(
           REPLACE(
             REPLACE(PC.[File], 'D:\documents\', '')
           , 'D:\CMT\Resumes\', '')
         , 'internal_', '')
       , 'monster_', '')
WHERE CONTAINS(PC.ParsedCV, '"Marketing Executive"')
  AND R.Email IS NOT NULL
  AND R.Email <> ''
  AND R.Postcode IS NOT NULL
  AND R.Postcode <> ''
  AND EXISTS (SELECT 1
              FROM Candidates_Sourcing CS
              WHERE CS.Email = R.Email
                AND CS.Email IS NOT NULL
                AND CS.Email <> ''
             )
GROUP BY R.Email;
Both the Candidates_Sourcing table and the Results table have many, many rows.
I know the REPLACE function will cause issues with SARGability; however, I need it to ensure the match.
Any ideas how this can be improved?
REPLACE makes the join non-SARGable, so the data engine has to scan the whole table. My guess is that you are using the REPLACE functions to remove the path and return just the file name? If so, you should really be storing the file name as a separate piece of information: either as a persisted computed column, or by storing the path and file name in separate columns (and then indexing appropriately). Then you can simply use a clause like R.ResumeFileName = PC.FileName. That way the REPLACE work will happen on every insert instead of thousands or millions of occurrences happening all at the same time during the execution of the query. Think of it as a divide and conquer approach, not a simple lift and shift.
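
As a rough sketch of the persisted computed column option, something like the following might work. The dbo schema, the index names, and the nvarchar(260) width are assumptions, not details from the question; ResumeFileName and FileName are the illustrative column names used above. The CAST is there because an index key cannot be a MAX type and has a size limit, so check that 260 characters is wide enough for your data before relying on it.

```sql
-- Assumption: both tables live in dbo, and the cleaned file name fits in nvarchar(260).
-- REPLACE and this CAST are deterministic, so the columns can be PERSISTED and indexed.
ALTER TABLE dbo.Results
    ADD ResumeFileName AS
        CAST(REPLACE(REPLACE(REPLACE(REPLACE([Resume],
                 'D:\documents\', ''),
                 'D:\CMT\Resumes\', ''),
                 'internal_', ''),
                 'monster_', '') AS nvarchar(260)) PERSISTED;

ALTER TABLE dbo.ParsedCandidates
    ADD FileName AS
        CAST(REPLACE(REPLACE(REPLACE(REPLACE([File],
                 'D:\documents\', ''),
                 'D:\CMT\Resumes\', ''),
                 'internal_', ''),
                 'monster_', '') AS nvarchar(260)) PERSISTED;
GO

-- Hypothetical index names. The INCLUDE list covers the columns the query reads from Results.
CREATE NONCLUSTERED INDEX IX_Results_ResumeFileName
    ON dbo.Results (ResumeFileName)
    INCLUDE (Email, Id, Postcode);

CREATE NONCLUSTERED INDEX IX_ParsedCandidates_FileName
    ON dbo.ParsedCandidates (FileName);
GO

-- The join then becomes a plain equality that can use the indexes.
SELECT R.Email
      ,MAX(R.Id)
      ,MAX(R.Postcode)
FROM dbo.ParsedCandidates PC
INNER JOIN dbo.Results R
    ON R.ResumeFileName = PC.FileName
WHERE CONTAINS(PC.ParsedCV, '"Marketing Executive"')
  AND R.Email IS NOT NULL
  AND R.Email <> ''
  AND R.Postcode IS NOT NULL
  AND R.Postcode <> ''
  AND EXISTS (SELECT 1
              FROM dbo.Candidates_Sourcing CS
              WHERE CS.Email = R.Email)
GROUP BY R.Email;
```

The extra CS.Email checks inside the EXISTS are dropped here because R.Email is already filtered to be non-NULL and non-empty, so the equality implies them. Adding the persisted columns rewrites every existing row once, so on large tables it is probably best done in a maintenance window.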