I have a feature that copies a very large CSV file into my database. To make this fast, I am using the pg gem's COPY support, as explained in this article: POSTGRESQL COPY.

My schema.rb defines unique constraints on a model, so during the upload process I sometimes hit a PG::UniqueViolation error when importing a file.

I need to capture this error and, once it is captured, log the error and its message along with some other details. The problem is that I am currently unable to rescue the exception raised while the data is being written. Here is the pseudocode:

db_conn.copy_data(CODE_COPY_STATEMENT, enco) do
  iterator.each do |line|
    # ....... CODE .......

    begin
      db_conn.put_copy_data([information_to_copy])
    rescue StandardError => e
      # I need to process the error here (log it, record details, etc.)
    end

    # ....... CODE .......
  end
end

So far I have been unable to rescue the error. I have tried rescuing PG::UniqueViolation, StandardError, etc., with no success. Ultimately, I need to skip over the offending row and continue processing the file. Does anyone have anything I can try? Help would be greatly appreciated.

2
  • You can't; COPY is all or nothing. The best practice is to COPY into a staging table that has no constraints and then move the data from there via SQL commands to the final table. Commented Jan 11, 2022 at 17:48
  • Thanks for chiming in, I appreciate your input. Commented Jan 11, 2022 at 21:08

1 Answer

COPY is a single statement and can only apply atomically, even if you are streaming the data in chunks. By the time you catch the exception, the COPY statement has been aborted, none of the rows sent before the violation will be queryable, and the COPY cannot be resumed. If you have, say, 10 rows and a single row causes a violation, the only way to get the other 9 rows into the table is with an INSERT or a COPY that does not include the problematic row. In practice, this means using single-row inserts, possibly with savepoints so you don't have to use a transaction per row.
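The single-row insert approach with savepoints could look roughly like the sketch below. It assumes the pg gem is already loaded (for example by Bundler in a Rails app) and that `conn` is a PG::Connection; the method name, the `codes` table, and its `code` and `name` columns are hypothetical and stand in for your own schema.

```ruby
# A minimal sketch, not a definitive implementation.
# Assumes the pg gem is already loaded and `conn` is a PG::Connection.
# The table `codes` and its columns `code` and `name` are hypothetical.
def insert_rows_skipping_duplicates(conn, rows)
  skipped = []

  conn.transaction do |tx|
    rows.each_with_index do |(code, name), index|
      # Mark a point we can roll back to without losing earlier rows.
      tx.exec("SAVEPOINT row_insert")
      begin
        tx.exec_params(
          "INSERT INTO codes (code, name) VALUES ($1, $2)",
          [code, name]
        )
      rescue PG::UniqueViolation => e
        # Undo only the failed INSERT; the surrounding transaction stays usable.
        tx.exec("ROLLBACK TO SAVEPOINT row_insert")
        skipped << { index: index, row: [code, name], message: e.message }
      end
      # Discard the savepoint so savepoints do not pile up across rows.
      tx.exec("RELEASE SAVEPOINT row_insert")
    end
  end

  skipped
end
```

The returned array holds one entry per rejected row, which is where the logging described in the question could hook in. This will be noticeably slower than COPY for very large files, since every row is its own statement.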

Another approach you may want to consider is to COPY into a staging table with no constraints, use regular DML to copy the non-violating rows into the real table, and then drop or truncate the staging table you used for the import.
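A rough sketch of the staging-table approach follows. It again assumes the pg gem is loaded and `conn` is a PG::Connection. The method name, the `codes` and `codes_staging` tables, and the `code` and `name` columns are hypothetical, and `ON CONFLICT DO NOTHING` is just one way to skip violating rows (it needs PostgreSQL 9.5 or later).

```ruby
# A minimal sketch, not a definitive implementation.
# Assumes the pg gem is already loaded and `conn` is a PG::Connection.
# Table and column names are hypothetical placeholders.
def copy_via_staging(conn, csv_lines)
  inserted = nil

  conn.transaction do |tx|
    # The staging table has no unique constraints, so COPY cannot hit a
    # unique violation here. ON COMMIT DROP removes it automatically.
    tx.exec("CREATE TEMP TABLE codes_staging (code text, name text) ON COMMIT DROP")

    tx.copy_data("COPY codes_staging (code, name) FROM STDIN WITH (FORMAT csv)") do
      # Each element is expected to be one CSV line ending in "\n".
      csv_lines.each { |line| tx.put_copy_data(line) }
    end

    # Move the rows across with regular DML, skipping any row that would
    # violate a unique constraint on the real table.
    result = tx.exec(<<~SQL)
      INSERT INTO codes (code, name)
      SELECT code, name FROM codes_staging
      ON CONFLICT DO NOTHING
    SQL
    inserted = result.cmd_tuples
  end

  inserted
end
```

The method returns the number of rows that actually reached the real table. If the rejected rows need to be logged, they could be found by querying the staging table before it is dropped.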

But fundamentally, a "resumable, violation-tolerant COPY" just isn't a thing; the statement is the finest-grained level at which an operation can succeed or fail in PG, and COPY is still just one statement.

1 Comment

Thanks so much for the answer. I appreciate it.
