I have two tables: old_data and new_data.
Both tables have the columns id, date, and value.
I want to delete any rows in "old_data" which are not in "new_data", but only between selected dates.
This works in psql:
DELETE FROM old_data
WHERE (id, date) NOT IN (SELECT id, date FROM new_data)
  AND id = my_id
  AND date >= 'my_start_date'
  AND date <= 'my_end_date';
The start and end dates differ for each id, so I have to run the DELETE separately for each distinct id. There are about 1000 distinct ids in "new_data".
The problem is that it is very slow: it takes an hour when "old_data" has 15 million rows and "new_data" has 100,000 rows.
Is there a more efficient way to do this?
... WHERE constraints into the subquery, and skip selecting date since you won't need it. That should help at least partially for a query with this structure.
... WHERE clause is not involved. You've already pointed out that he should list the table definition, which should bring some clarity. I'm also curious whether he has the hardware to support the operations.
... explain your_query
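
Following the suggestions above, here is a rough sketch of how the roughly 1000 per-id DELETEs might be folded into one statement. It makes several assumptions that are not in the question: a hypothetical helper table id_ranges holding each id's start and end date, an integer id column, an invented index name, and id and date being NOT NULL in both tables (NOT EXISTS and NOT IN treat NULLs differently). NOT EXISTS is used in place of NOT IN because PostgreSQL can usually plan it as an anti-join, but whether it is actually faster here would have to be confirmed from the EXPLAIN output.

```sql
-- Hypothetical helper table: one row per id with its own date window.
-- The integer type for id is an assumption; match it to the real column.
CREATE TEMP TABLE id_ranges (
    id         integer PRIMARY KEY,
    start_date date NOT NULL,
    end_date   date NOT NULL
);

-- Assumed index (name is made up) so each probe into new_data is an index lookup.
CREATE INDEX IF NOT EXISTS new_data_id_date_idx ON new_data (id, date);

-- EXPLAIN ANALYZE really executes the DELETE, so run it inside a
-- transaction and roll back while inspecting the plan.
BEGIN;

EXPLAIN ANALYZE
DELETE FROM old_data o
USING id_ranges r
WHERE o.id = r.id
  AND o.date BETWEEN r.start_date AND r.end_date
  AND NOT EXISTS (
      SELECT 1
      FROM new_data n
      WHERE n.id = o.id
        AND n.date = o.date
  );

ROLLBACK;
```

Once the plan looks reasonable, the same DELETE can be run without EXPLAIN ANALYZE and committed. An index on old_data (id, date) may also help, since the date-range filter then avoids scanning all 15 million rows.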