I am processing a big CSV file with a lot of columns and rows (tens of thousands, so it's nearly impossible to check cell by cell).

There is probably a bad character somewhere in the file. I've tried wrapping the processing in begin/rescue to skip the currently processed row if there's an error (mainly the error mentioned in the headline), but it doesn't work: the script stops as soon as it stumbles upon the character.

Is there any way to ignore/skip this "bad" character/symbol? For processing the CSV file, I am using SmarterCSV.

EDIT: Some code

datas = SmarterCSV.process(file, {:col_sep => ';', :chunk_size => 100, :remove_empty_values => false, :remove_empty_hashes => false }) do |data|
  begin
    data.each do |d|
      user.something = d[:hobby]
      # ...
      # here is basically just saving data from the file to database tables
      # ...
    end
  rescue => e
    logger.warn "Ooops, an error occurred while processing this record: #{e}" 
  end
end  

I've also tried putting the begin/rescue inside data.each, but that didn't help either.

One solution would be to re-encode every element/cell of the file, but each row has around 70 cells, so I'm looking for a better solution, if there is one.

EDIT2: I added # encoding: UTF-8 at the top of the script that processes the CSV. The CSV file has a us-ascii charset.
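
With tens of thousands of rows, one way to narrow down where the bad bytes are, without checking cell by cell, is to read the file as raw bytes and test each line against the encoding it is supposed to have. The following is only a minimal sketch using plain Ruby; invalid_line_numbers is a hypothetical helper name, not part of SmarterCSV, and the file name in the usage comment is a placeholder.

```ruby
# Hypothetical helper (not part of SmarterCSV): returns the 1-based numbers
# of the lines whose bytes are not valid in the given encoding.
def invalid_line_numbers(path, encoding = Encoding::UTF_8)
  bad = []
  # Binary mode means Ruby does not interpret the bytes while reading,
  # so reading itself cannot fail on a bad character.
  File.foreach(path, mode: "rb").with_index(1) do |line, number|
    # force_encoding only relabels the bytes; valid_encoding? then checks them.
    bad << number unless line.force_encoding(encoding).valid_encoding?
  end
  bad
end

# Usage (placeholder file name):
#   invalid_line_numbers("data.csv")                      # check as UTF-8
#   invalid_line_numbers("data.csv", Encoding::US_ASCII)  # flag any non-ASCII byte
```

Checking against Encoding::US_ASCII flags every line containing a byte above 127, which may be useful if the file is expected to be pure us-ascii.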

  • Any way you could post your script here? Commented Oct 14, 2014 at 21:16
  • Fix the encoding issue in the file before you read it as CSV; read this post -> stackoverflow.com/questions/4697413/… Commented Oct 14, 2014 at 21:23
  • What @DevDude says, or at least the rescue attempt. Commented Oct 14, 2014 at 21:23
  • @steenslag it seems this library is really closed for modification, but reading around, I found that this issue is very common, and the best thing is to actually fix the encoding before trying anything else. Commented Oct 14, 2014 at 21:24
  • @DevDude @steenslag thank you guys for your messages, I added some code that illustrates how I am processing the CSV file. Commented Oct 14, 2014 at 22:10
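
The suggestion in the comments above (fix the encoding before the CSV library sees the file) could look roughly like the sketch below. It assumes the file is meant to be UTF-8 and that replacing each invalid byte with "?" is acceptable; scrubbed_copy is a hypothetical helper name, and String#scrub requires Ruby 2.1 or newer.

```ruby
require "tempfile"

# Hypothetical helper: writes a cleaned copy of the CSV at `path` into a
# Tempfile, replacing bytes that are invalid in UTF-8 with `replacement`.
def scrubbed_copy(path, replacement: "?")
  raw = File.binread(path)                       # raw bytes, no decoding
  clean = raw.force_encoding(Encoding::UTF_8)    # label the bytes as UTF-8
             .scrub(replacement)                 # replace invalid sequences
  out = Tempfile.new(["clean", ".csv"])
  out.binmode                                    # write bytes unchanged
  out.write(clean)
  out.flush
  out.rewind
  out
end

# Usage with the options from the question (assumes the smarter_csv gem is
# loaded and `file` is the path of the original CSV):
#   clean = scrubbed_copy(file)
#   SmarterCSV.process(clean.path, {:col_sep => ';', :chunk_size => 100}) do |data|
#     # ...
#   end
```

This loads the whole file into memory, which is usually fine for tens of thousands of rows but would need a line-by-line variant for much larger files.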

1 Answer


I was having a similar issue and providing encoding while opening the file solved the problem for me:

file = File.open(params[:file].tempfile, "r:bom|utf-8")
SmarterCSV.process(file, {chunk_size: 10000, col_sep: ";"}) do |chunk|
  # ...
end

5 Comments

Hey bodrovis, thank you for your input. What does the r:bom part of "r:bom|utf-8" mean?
Trying it now, we'll see.
I believe you also need to re-encode your file in UTF-8 with BOM.
I've tried the method you suggested, but unfortunately it doesn't work: the same error. What do you mean by "I believe you also need to re-encode your file in UTF-8 with BOM"? Could you give me an example? Thank you @bodrovis
If you have a CSV file you might use Notepad++ (or Excel) to open it up and save in UTF8 with BOM (Encoding menu in Notepad++). Unfortunately, I have no other ideas :(
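
On the "r:bom|utf-8" question: in Ruby's File.open mode string, the part after the colon is the external encoding, and the bom| prefix asks Ruby to skip a byte order mark if the file starts with one. As an alternative to Notepad++, re-encoding as UTF-8 with BOM can also be sketched in Ruby itself. This is only an illustration under the assumption that the source bytes are meant to be UTF-8; write_utf8_with_bom is a hypothetical helper name, the file names in the usage comment are placeholders, and nothing here guarantees it resolves the error reported above.

```ruby
BOM = "\uFEFF" # byte order mark; stored as the bytes EF BB BF in UTF-8

# Hypothetical helper: rewrites `src` to `dest` as UTF-8 with a leading BOM,
# replacing bytes that are invalid in UTF-8 with "?".
def write_utf8_with_bom(src, dest)
  text = File.binread(src).force_encoding(Encoding::UTF_8).scrub("?")
  text = text.delete_prefix(BOM) # avoid writing a second BOM (Ruby 2.5+)
  File.open(dest, "wb:utf-8") { |f| f.write(BOM + text) }
end

# Usage (placeholder file names):
#   write_utf8_with_bom("original.csv", "with_bom.csv")
#   File.open("with_bom.csv", "r:utf-8", &:read)     # BOM is the first character
#   File.open("with_bom.csv", "r:bom|utf-8", &:read) # BOM is skipped
```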
