I'm building a system that frequently updates its local database from third-party APIs. I have Python scripts set up as cron jobs, and they mostly do the job fine.
However, the one flaw is that the scripts take ages to run. The first run is quick, but after that it takes nearly 20 minutes to go through a list of 200k+ items received from the third-party API.
The problem is that the script first fetches all the rows from the database and collects their must-be-unique column values into a Python list. Then, while iterating over the API results, it checks whether the current item's must-be-unique value already exists in that list. This gets really slow, because membership tests on a list are linear, and with 200k+ values the whole loop is effectively O(n²).
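To make the bottleneck concrete, here is a minimal sketch of the pattern I'm describing. The table, column, and driver are placeholders (I've used sqlite3 just so the snippet runs; the real script uses a different database):

```python
import sqlite3  # stand-in; the real script's driver is not specified here

api_results: list[dict] = []  # in reality, 200k+ dicts from the third-party API

conn = sqlite3.connect("local.db")
conn.execute("CREATE TABLE IF NOT EXISTS items (unique_col TEXT, payload TEXT)")

# Load every existing must-be-unique value into a plain Python list.
existing = [row[0] for row in conn.execute("SELECT unique_col FROM items")]

for item in api_results:
    # O(n) membership test against a 200k-element list -- the slow part.
    if item["unique_col"] not in existing:
        conn.execute(
            "INSERT INTO items (unique_col, payload) VALUES (?, ?)",
            (item["unique_col"], item["payload"]),
        )
        existing.append(item["unique_col"])
conn.commit()
```

I know that swapping the list for a set would at least make the membership test O(1), but I'd rather have the database enforce uniqueness itself.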
Is there a way to check, in the INSERT query itself, that no duplicate exists based on a single column, and if one does, simply skip the new row?
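In other words, something like the sketch below, if such syntax exists. This is an assumption on my part: it uses SQLite/PostgreSQL-style `ON CONFLICT ... DO NOTHING` (MySQL would use `INSERT IGNORE` instead), and as far as I understand it requires a unique index on the column:

```python
import sqlite3

api_results: list[dict] = []  # the 200k+ items from the API

conn = sqlite3.connect("local.db")
conn.execute("CREATE TABLE IF NOT EXISTS items (unique_col TEXT, payload TEXT)")
# ON CONFLICT only fires if the column actually has a unique index/constraint.
conn.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS idx_items_unique ON items (unique_col)"
)

rows = [(item["unique_col"], item["payload"]) for item in api_results]

# The database skips duplicate rows itself; no Python-side lookup at all.
conn.executemany(
    "INSERT INTO items (unique_col, payload) VALUES (?, ?) "
    "ON CONFLICT (unique_col) DO NOTHING",
    rows,
)
conn.commit()
```

Is this the right approach, and is something equivalent available across the common engines?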
Any help will be appreciated =)