I'm building a system that frequently updates its local database from third-party APIs. I have Python scripts set up as cron jobs, and they mostly do the job fine.
However, the one flaw is that the scripts take ages to run. The first run is quick, but after that it takes nearly 20 minutes to go through a list of 200k+ items received from the third-party API.
The problem is that the script first fetches all the rows from the database and collects their must-be-unique column values into a Python list. Then, while iterating over the API results, it checks whether the current item's must-be-unique value already exists in that list. This gets really slow, because membership tests on a list are linear, and with 200k+ values the whole loop is effectively O(n²).
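To make the bottleneck concrete, here is a minimal sketch of the pattern I'm describing. The table, column, and driver are placeholders (I've used sqlite3 just so the snippet runs; the real script uses a different database):

```python
import sqlite3  # stand-in; the real script's driver is not specified here

api_results: list[dict] = []  # in reality, 200k+ dicts from the third-party API

conn = sqlite3.connect("local.db")
conn.execute("CREATE TABLE IF NOT EXISTS items (unique_col TEXT, payload TEXT)")

# Load every existing must-be-unique value into a plain Python list.
existing = [row[0] for row in conn.execute("SELECT unique_col FROM items")]

for item in api_results:
    # O(n) membership test against a 200k-element list -- the slow part.
    if item["unique_col"] not in existing:
        conn.execute(
            "INSERT INTO items (unique_col, payload) VALUES (?, ?)",
            (item["unique_col"], item["payload"]),
        )
        existing.append(item["unique_col"])
conn.commit()
```

I know that swapping the list for a set would at least make the membership test O(1), but I'd rather have the database enforce uniqueness itself.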
Is there a way to check, in the INSERT query itself, that no duplicate exists based on a single column, and if one does, simply skip the new row?
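In other words, something like the sketch below, if such syntax exists. This is an assumption on my part: it uses SQLite/PostgreSQL-style `ON CONFLICT ... DO NOTHING` (MySQL would use `INSERT IGNORE` instead), and as far as I understand it requires a unique index on the column:

```python
import sqlite3

api_results: list[dict] = []  # the 200k+ items from the API

conn = sqlite3.connect("local.db")
conn.execute("CREATE TABLE IF NOT EXISTS items (unique_col TEXT, payload TEXT)")
# ON CONFLICT only fires if the column actually has a unique index/constraint.
conn.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS idx_items_unique ON items (unique_col)"
)

rows = [(item["unique_col"], item["payload"]) for item in api_results]

# The database skips duplicate rows itself; no Python-side lookup at all.
conn.executemany(
    "INSERT INTO items (unique_col, payload) VALUES (?, ?) "
    "ON CONFLICT (unique_col) DO NOTHING",
    rows,
)
conn.commit()
```

Is this the right approach, and is something equivalent available across the common engines?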
Any help will be appreciated =)