Can anyone please help me with the requirement below?

I need to check whether a column in a record matches any other column, and if it does, replace the duplicate column with an empty string.

Say I have columns x1, x2, and x3. How can I check whether x1 matches any of the x1, x2, x3 columns and, if it does, replace the duplicate column with an empty string?

  • Won't x1 always equal x1? Could you provide a clearer example? Commented Feb 7, 2017 at 13:59
  • No, it may or may not. If it matches, it is a duplicate, and I have to replace that duplicate column with an empty string. Commented Feb 7, 2017 at 14:39
  • Perhaps I am misunderstanding. Are you trying to see if a value of a column in one record is the same as a value of several columns in a different record? Commented Feb 7, 2017 at 15:14
  • The requirement is: I have 10k+ records in a file, and each record has customer details, including three columns for phone numbers. I want to check whether a phone number exists in any other record and, if it does, replace it with an empty string. Commented Feb 7, 2017 at 17:20
  • Just to be clear, you would like to check if the phone numbers in record 1 exist in any of the other phone number columns of the other 9,999 records? Or is it: in record 1, you would like to see if the phone number is duplicated across phone number columns and, if so, blank out the repeated values in phone2 or phone3? Commented Feb 7, 2017 at 18:42

3 Answers

Doing this is more complex than one would expect. Here are two options:

  1. Try the Fuzzy Lookup: duplicate the file and compare it with itself using a high similarity threshold. I suspect you want to check, for the same record, whether there is a match on the other columns, so you will need an exact match on the key (on the Columns tab, right-click the link and choose Edit Mappings) and a fuzzy match on the other columns. You can only link a field once, so duplicate the columns as needed.
  2. Write a stored procedure that checks all the combinations and writes the results to an output table (you can run a stored procedure using the OLE DB Command). I would probably choose this option if I were sure the data is exact. Otherwise, go with the Fuzzy Lookup.
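For the second option, the core of such a procedure might be a set-based query that unpivots the three phone columns and lists every occurrence of a number that appears more than once. This is a minimal sketch, not the answerer's procedure: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite has no stored procedures, so the query is run directly), and the table and column names (Contacts, RowId, Phone1 to Phone3, PhoneList, DuplicatePhones) are hypothetical.

```python
import sqlite3

# Hypothetical staging table: one row per customer record, three phone columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (RowId INTEGER PRIMARY KEY, Phone1 TEXT, Phone2 TEXT, Phone3 TEXT);
INSERT INTO Contacts VALUES
  (1, '555-0100', '555-0101', NULL),
  (2, '555-0200', '555-0100', '555-0200'),
  (3, '555-0300', NULL, '555-0301');

-- Unpivot the three phone columns into (RowId, Col, Phone) rows.
CREATE VIEW PhoneList AS
  SELECT RowId, 'Phone1' AS Col, Phone1 AS Phone FROM Contacts WHERE Phone1 IS NOT NULL
  UNION ALL
  SELECT RowId, 'Phone2', Phone2 FROM Contacts WHERE Phone2 IS NOT NULL
  UNION ALL
  SELECT RowId, 'Phone3', Phone3 FROM Contacts WHERE Phone3 IS NOT NULL;

-- Output table: every occurrence of a number that appears more than once,
-- whether in another record or in another column of the same record.
CREATE TABLE DuplicatePhones AS
  SELECT p.RowId AS RowId, p.Col AS Col, p.Phone AS Phone
  FROM PhoneList AS p
  JOIN (SELECT Phone FROM PhoneList GROUP BY Phone HAVING COUNT(*) > 1) AS d
    ON d.Phone = p.Phone;
""")

rows = conn.execute(
    "SELECT RowId, Col, Phone FROM DuplicatePhones ORDER BY Phone, RowId, Col"
).fetchall()
for row in rows:
    print(row)
```

Here 555-0100 is reported for records 1 and 2, and 555-0200 twice for record 2. Deciding which occurrence to keep is a separate step.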

Since you only have a few columns, you could just run a set of update statements like the following:

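-- Blank out Phone2 when it repeats Phone1 in the same record
-- (use set ... = '' instead of null if empty strings are required)
update Contacts
set Phone2 = null
where Phone2 = Phone1;

-- Blank out Phone3 when it repeats Phone1
update Contacts
set Phone3 = null
where Phone3 = Phone1;

-- Blank out Phone3 when it repeats Phone2
update Contacts
set Phone3 = null
where Phone3 = Phone2;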
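To see what these statements do, here is a small sketch that runs the same three updates, in the same order, against a sample table. It assumes Python's sqlite3 module as a stand-in for SQL Server, and the sample rows and the Id column are made up for illustration. Because a comparison with NULL is never true, a value that has already been blanked is not matched again by a later statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: row 1 repeats one number three times,
# row 2 repeats a number in Phone2/Phone3, row 3 has no repeats.
conn.executescript("""
CREATE TABLE Contacts (Id INTEGER PRIMARY KEY, Phone1 TEXT, Phone2 TEXT, Phone3 TEXT);
INSERT INTO Contacts VALUES
  (1, '555-0100', '555-0100', '555-0100'),
  (2, '555-0200', '555-0201', '555-0201'),
  (3, '555-0300', '555-0301', '555-0302');
""")

# The three UPDATE statements from the answer, in the same order.
conn.executescript("""
UPDATE Contacts SET Phone2 = NULL WHERE Phone2 = Phone1;
UPDATE Contacts SET Phone3 = NULL WHERE Phone3 = Phone1;
UPDATE Contacts SET Phone3 = NULL WHERE Phone3 = Phone2;
""")

result = conn.execute(
    "SELECT Id, Phone1, Phone2, Phone3 FROM Contacts ORDER BY Id"
).fetchall()
for row in result:
    print(row)
```

Note that this only removes repeats within one record. It does not compare a record against the other records, which is what the asker's later comment describes.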

Accomplishing this task within an SSIS data flow would be tricky, because every row would have to be compared against all the other rows in all the buffers.

Instead, I would recommend staging the data in a table, as Gordon Bell has suggested. Then you need to decide which row wins when a duplicate is found. You might have a date column to sort it out, or you could add a row number column in the SSIS data flow and sort by the order in which you received the data.

Here is an example of how you might find the winning row and update others with a self join: Deleting duplicate record in SQL Server
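Here is a minimal sketch of that idea for the cross-record case, assuming a hypothetical RowId column that holds the row number assigned in arrival order, so the lowest RowId wins. A phone value is blanked when an earlier row already holds the same number in any of its phone columns. The lookup is a correlated EXISTS against the same table (a self semi-join) rather than the exact query from the linked question, and Python's sqlite3 module stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staged data; RowId plays the role of the SSIS row number column.
conn.executescript("""
CREATE TABLE Contacts (RowId INTEGER PRIMARY KEY, Phone1 TEXT, Phone2 TEXT, Phone3 TEXT);
INSERT INTO Contacts VALUES
  (1, '555-0100', '555-0101', NULL),
  (2, '555-0200', '555-0100', NULL),
  (3, '555-0101', '555-0300', '555-0200');
""")

# For each phone column, blank the value if a row with a lower RowId already
# holds the same number in any phone column. The earliest holder is never
# blanked, so the result does not depend on the order rows are updated in.
# col comes from this fixed tuple, so the f-string cannot inject user input.
for col in ("Phone1", "Phone2", "Phone3"):
    conn.execute(f"""
        UPDATE Contacts
        SET {col} = NULL
        WHERE {col} IS NOT NULL
          AND EXISTS (
              SELECT 1 FROM Contacts AS earlier
              WHERE earlier.RowId < Contacts.RowId
                AND Contacts.{col} IN (earlier.Phone1, earlier.Phone2, earlier.Phone3)
          )
    """)

result = conn.execute(
    "SELECT RowId, Phone1, Phone2, Phone3 FROM Contacts ORDER BY RowId"
).fetchall()
for row in result:
    print(row)
```

Repeats inside the same record are not touched here. The per-record UPDATE statements from the previous answer cover that case. If empty strings are required instead of NULL, add a guard such as `{col} <> ''` so blanked values are not treated as matching numbers.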
