
I'm inserting millions of records that come from C++ structures. With regular inserts I'm getting terrible performance: the database takes up 98% of program time, even after config optimization. I've read that I should use COPY to import the data from a CSV file.

Now I'm not sure whether writing to a CSV file first and then loading it into the DB will be much of an improvement, since the data would be written twice. I've looked at piping the data through STDIN, but at first glance that also seems to have a lot of overhead, and it's limited.

If I've got a string in CSV format, what would be the quickest way to write that data to my DB?

Thank you in advance,

CX

  • Are these millions of records the same every time? Is your database not persistent? Commented Sep 3, 2012 at 18:32
  • The same columns, different data. The database is persistent, afaik. Commented Sep 3, 2012 at 18:41
  • Do you have any indexes on the table? Triggers? Constraints? Are you batching the inserts inside transactions? Commented Sep 3, 2012 at 18:49
  • No indexes besides the PK, no triggers, tons of FKs. No batching. Commented Sep 3, 2012 at 19:07
  • Read this: stackoverflow.com/questions/758945/… Commented Sep 3, 2012 at 19:34

1 Answer

I assume you need to do this once, on an offline database. If you need to do it on an online database, there's not much you can do besides having your program generate a COPY command plus the data and piping it into psql (your-program | psql), or calling PQputCopyData from your program.
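A minimal sketch of the first approach, under stated assumptions: the table `items (id bigint, name text, score double precision)` and the `Record` struct are hypothetical stand-ins for the asker's schema and C++ structures. The program writes a COPY command, then each row in PostgreSQL's text COPY format (tab-separated columns, `\N` for NULL, backslash escapes for backslash, tab, newline and carriage return), then the `\.` end-of-data marker, so its output can be piped straight into psql without an intermediate CSV file.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical record type standing in for the C++ structures being loaded.
struct Record {
    std::int64_t id;
    std::string name;
    std::optional<double> score;  // std::nullopt is written as NULL
};

// Escape a value for PostgreSQL's text COPY format: backslash, tab,
// newline and carriage return must not appear unescaped inside a column.
void appendEscaped(std::string& out, const std::string& value) {
    for (char c : value) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '\t': out += "\\t"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            default:   out += c; break;
        }
    }
}

// Append one row: tab-separated columns, \N for NULL, newline-terminated.
void appendRow(std::string& out, const Record& r) {
    out += std::to_string(r.id);
    out += '\t';
    appendEscaped(out, r.name);
    out += '\t';
    if (r.score) {
        char buf[32];
        // %.17g keeps enough digits for the double to round-trip.
        std::snprintf(buf, sizeof buf, "%.17g", *r.score);
        out += buf;
    } else {
        out += "\\N";
    }
    out += '\n';
}

// Write a complete psql script: the COPY command, the rows, and the
// "\." end-of-data marker. The table name "items" is a placeholder.
void writeCopyStream(std::ostream& out, const std::vector<Record>& rows) {
    out << "COPY items (id, name, score) FROM STDIN;\n";
    std::string line;
    for (const Record& r : rows) {
        line.clear();  // keeps the buffer's capacity across rows
        appendRow(line, r);
        out << line;
    }
    out << "\\.\n";
}
```

Calling `writeCopyStream(std::cout, rows)` from `main` and running something like `./loader | psql mydb` (both names are placeholders) sends everything over a single COPY. The text format is used instead of CSV to avoid quoting rules. With the PQputCopyData route, you would instead issue the COPY command with PQexec, send the `appendRow` output through PQputCopyData, and finish with PQputCopyEnd; the `\.` marker is not needed there.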

  1. Disable all other access to database, terminate all client connections.
  2. Back up the database before loading your bulk data, as this procedure is dangerous (a crash, reboot or power failure during it will leave your database corrupted beyond repair).
  3. Shut down the database.
  4. Move the pg_xlog directory from the data directory to a tmpfs (RAM disk) and symlink it back in its place.
  5. Run postgres -F -c full_page_writes=off -c checkpoint_segments=128 …. It will need about 2GB more free RAM than usual, so be prepared.
  6. Drop the primary key constraint from your table.
  7. Drop the foreign key constraints from your table.
  8. Load the data using COPY ... FROM STDIN, either through a pipe or with PQputCopyData.
  9. Run ANALYZE.
  10. Recreate the primary key constraint.
  11. Recreate the foreign key constraints (a file generated by pg_dump before dropping them will have suitable commands near the end).
  12. Shut down the database.
  13. Delete the pg_xlog symlink and move the pg_xlog directory back into the data directory.
  14. Run the sync command on the server.
  15. Start the database normally.

This is based on the "Populating a Database" guide in the PostgreSQL documentation.
