I use a database to represent a list of files, together with some metadata for each of them. I need to update this list regularly, adding only the new files and deleting files that no longer exist; I must not touch the existing rows in the table, because I would lose the metadata associated with them.
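For reference, a minimal sketch of the schema, assuming PostgreSQL (the varchar_pattern_ops indexes below are PostgreSQL-specific); the metadata columns here are simplified placeholders, not the real ones:

CREATE TABLE files (
    path   varchar NOT NULL, -- file path; indexed further below
    tag    varchar,          -- placeholder metadata column
    rating integer           -- placeholder metadata column
);

-- "newfiles" mirrors "files" so that INSERT ... SELECT * works.
CREATE TABLE newfiles (LIKE files);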
My current queries take only a few seconds with around 10,000 files, but take about one hour with my current table of 150,000 files.
After some research on the Internet, I arrived at the following process:
- Populate a table "newfiles" with the results of the scan (a sketch of how I do this follows the queries below).
- Run these two queries:

DELETE FROM files WHERE path NOT IN (SELECT path FROM newfiles);

INSERT INTO files (SELECT * FROM newfiles WHERE path NOT IN (SELECT path FROM files));
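For completeness, this is roughly how "newfiles" gets populated on each scan. It is only a sketch: in reality my application generates the rows from a filesystem walk, and the paths below are made up.

-- Start from an empty staging table on every scan.
TRUNCATE newfiles;

-- Illustrative load; the real rows come from the scan, not a fixed list.
INSERT INTO newfiles (path) VALUES
    ('/data/example1.txt'),
    ('/data/example2.txt');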
I also have these indexes:
CREATE INDEX "files_path" ON "files" ("path");
CREATE INDEX "files_path_like" ON "files" ("path" varchar_pattern_ops);
CREATE INDEX "files_path" ON "newfiles" ("path");
CREATE INDEX "files_path_like" ON "newfiles" ("path" varchar_pattern_ops);
(I mostly use these indexes for searching in the database; my application includes a search engine over the files.)
Each of these two queries takes more than one hour when I have 150,000 files. How can I optimize that?
Thank you.
One approach I came across is to create a child table that INHERITS a parent table, add an appropriate constraint, populate it, and create indexes on it. This only works when the new data can be clearly partitioned on a single constraint.
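If I understand that correctly, a minimal sketch would look like the following, assuming PostgreSQL table inheritance; the child table name and the path prefix in the CHECK constraint are invented for illustration:

-- Child table inherits all columns of "files"; the CHECK constraint
-- expresses the single condition on which the data is partitioned.
CREATE TABLE files_newdisk (
    CHECK (path LIKE '/newdisk/%')
) INHERITS (files);

-- Populate the child table, then index it like the parent.
INSERT INTO files_newdisk SELECT * FROM newfiles;
CREATE INDEX "files_newdisk_path" ON "files_newdisk" ("path");
CREATE INDEX "files_newdisk_path_like" ON "files_newdisk" ("path" varchar_pattern_ops);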