I have a table with 55 columns. This table is going to be populated with data from a CSV file. I have created a PHP script which reads in the CSV file and inserts the records.

Whilst scanning through the CSV file I noticed there are some rows that are duplicates. I want to eliminate all duplicate records.

My question is, what would be the best way of doing this? I assume it will be one of these two options:

  1. Remove/skip duplicate records at the source, i.e. duplicate records are never inserted into the table.

  2. Insert all records from the CSV file, then query the table to find and remove all duplicate records.

For option 1, would it be possible to do this using MS Excel or even just a text editor?

For option 2, I came across some possible solutions, but with 55 columns the resulting query would surely be rather large. I am looking for something short and simple. Is that at all possible?
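
For reference, here is a minimal sketch of how option 1 could look inside my existing PHP import script, assuming an entire row has to match for it to count as a duplicate (the file name and the insertRecord() helper are placeholders):

    <?php
    $handle = fopen('data.csv', 'r');   // placeholder file name
    $header = fgetcsv($handle);         // read past the header row, if any
    $seen = [];

    while (($row = fgetcsv($handle)) !== false) {
        // Fingerprint the whole row; "\x1f" separates fields unambiguously.
        $hash = md5(implode("\x1f", $row));
        if (isset($seen[$hash])) {
            continue;                   // duplicate row: skip the insert
        }
        $seen[$hash] = true;
        insertRecord($row);             // placeholder for the existing insert logic
    }
    fclose($handle);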

1 Answer

A good way is to define a key for the table. A key is a set of fields that makes each record unique and on which all other fields depend. (In the worst case the key consists of all the columns in your table, but usually you can define a smaller one.) You can then let the database itself enforce that key, for example with a primary key constraint or a unique index.
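For example, assuming MySQL accessed via PDO (my_table and the column names below are placeholders), a single UNIQUE index can span multiple columns, and MySQL's INSERT IGNORE then drops duplicate rows at insert time:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

    // One UNIQUE index can cover several columns (a composite key).
    $pdo->exec('ALTER TABLE my_table ADD UNIQUE KEY uniq_record (col_a, col_b, col_c)');

    // With the key in place, INSERT IGNORE silently skips any row that
    // would violate it, so CSV duplicates never make it into the table.
    $stmt = $pdo->prepare('INSERT IGNORE INTO my_table (col_a, col_b, col_c) VALUES (?, ?, ?)');
    $stmt->execute($row); // $row holds one record read from the CSV

The same idea works with a PRIMARY KEY instead of a UNIQUE index; the point is that the database, not the PHP script, enforces uniqueness.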

1 Comment

Do you mean a UNIQUE key? Can I create just one key and specify multiple columns within it?
