26

I am generating CSV files that need to be opened and reviewed in Excel once they have been generated. It seems that Excel requires a different encoding than UTF-8.

Here is my config and generation code:

csv_config = {col_sep: ";", 
              row_sep: "\n", 
              encoding: Encoding::UTF_8
             }

csv_string = CSV.generate(csv_config) do |csv|
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end

When opening this in Excel, the special characters are not being displayed properly:

Text a  Text b  Text Ã¦  Text Ã¸  Text Ã¥

Any idea how to ensure proper encoding?

5 Comments
  • Try putting # encoding: UTF-8 as your Ruby file's first line (second if you have a hash-bang line, #!/usr/bin/env ruby). I believe you are writing in UTF-8, but the Ruby source file is taken to be encoded as US-ASCII. (With Ruby 2.0+, source encoding defaults to UTF-8.) Commented May 21, 2015 at 8:30
  • I am using ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-darwin12.4.0] so I suppose that means that my installation is already defaulting to UTF-8. Commented May 21, 2015 at 9:40
  • No experience with Ruby, but Excel can open semicolon-delimited CSV files which are UTF-8 encoded, provided the file has a BOM at its beginning. Note that whether the semicolon can be used as delimiter is locale dependent, so the most locale-independent approach is tab-delimited CSV encoded as UTF-16LE. Commented May 21, 2015 at 12:13
  • What Excel are you using? I had no trouble getting the special characters to display in Excel 2013. Commented Sep 10, 2015 at 9:23
  • Another hint: With the axlsx-gem it is easy to create direct a xlsx-files. Commented Apr 20, 2016 at 19:03
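The tab-delimited UTF-16LE approach suggested in the comments can be sketched as follows (a minimal sketch; the output path is illustrative, and the file is written in binary mode so nothing mangles the two-byte characters):

```ruby
require "csv"

rows = [["Text a", "Text b", "Text æ", "Text ø", "Text å"]]

# Build a tab-delimited "CSV" in UTF-8 first, with Windows line endings
tsv = CSV.generate(col_sep: "\t", row_sep: "\r\n") do |csv|
  rows.each { |row| csv << row }
end

File.open("/tmp/example.tsv", "wb") do |f|
  # UTF-16LE BOM (bytes FF FE), then the transcoded content
  f.write "\uFEFF".encode(Encoding::UTF_16LE)
  f.write tsv.encode(Encoding::UTF_16LE)
end
```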

5 Answers

37

Excel understands UTF-8 CSV if it has a BOM. You can add one like this:

Use CSV.generate

# CSV.generate's first argument is the string the generated CSV is
# appended to, so starting with the BOM character puts it at the front
csv_string = CSV.generate("\uFEFF") do |csv|
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end

Use CSV.open

filename = "/tmp/example.csv"

# Default output encoding is UTF-8
CSV.open(filename, "w") do |csv|
  csv.to_io.write "\uFEFF" # use CSV#to_io to write the BOM directly to the underlying file
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end
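As a quick sanity check (a sketch reusing the example above; the path is illustrative), you can verify that the first three bytes of the written file are the UTF-8 BOM, EF BB BF:

```ruby
require "csv"

filename = "/tmp/example.csv"

CSV.open(filename, "w") do |csv|
  csv.to_io.write "\uFEFF" # write the BOM directly to the underlying IO
  csv << ["Text æ", "Text ø", "Text å"]
end

# The "\uFEFF" character serializes to EF BB BF in a UTF-8 file
head = File.binread(filename, 3)
puts head.bytes.map { |b| format("%02X", b) }.join(" ")  # => "EF BB BF"
```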

7 Comments

Thanks the CSV.open solution worked perfectly for me! Slightly cleaner solution than other answers.
#<NoMethodError: undefined method `to_io' for #<StringIO:0x00000001d0c540> :(
This works for me even when I open on excel online, thanks!
FYI: \uFEFF is the BOM for UTF-16. Use \xEF\xBB\xBF for UTF-8. Here is a list of BOMs for UTF encodings.
this is the only thing that worked for me. I wish we could just do "CSV.open(filename, "w", encoding: "bom|utf-8")". That works while reading the file but it doesn't while writing to the output file.
33

The top voted answer from @joaofraga worked for me, but I found an alternative solution that also worked: no UTF-8 to ISO-8859-1 transcoding required.

From what I've read, Excel can indeed handle UTF-8, but for some reason it doesn't recognize it by default. If you add a BOM to the beginning of the CSV data, this seems to cause Excel to realise that the file is UTF-8.

So, if you have a CSV like so:

csv_string = CSV.generate(csv_config) do |csv|
  csv << ["Text a", "Text b", "Text æ", "Text ø", "Text å"]
end

just add a BOM byte like so:

"\uFEFF" + csv_string

In my case, my controller is sending the CSV as a file, so this is what my controller looks like:

def show
  respond_to do |format|
    format.csv do
      #  add BOM to force Excel to realise this file is encoded in UTF-8, so it respects special characters
      send_data "\uFEFF" + csv_string, type: :csv, filename: "csv.csv"
    end
  end
end

I should note that UTF-8 itself neither requires nor recommends a BOM, but as I mentioned, adding one in this case seemed to nudge Excel into realising that the file was indeed UTF-8.
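If the same file is later read back in Ruby, the leading BOM can be stripped automatically with the "bom|utf-8" external-encoding flag, so it does not leak into the first header field (a minimal sketch; the filename is illustrative):

```ruby
require "csv"

csv_string = CSV.generate do |csv|
  csv << ["Text æ", "Text ø", "Text å"]
end

# Prepend the BOM, as in the answer above
File.write("/tmp/bom_example.csv", "\uFEFF" + csv_string)

# "bom|utf-8" tells Ruby's IO layer to consume a leading BOM if present
rows = CSV.read("/tmp/bom_example.csv", encoding: "bom|utf-8")
puts rows.first.first  # => "Text æ" (no invisible BOM prefix)
```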

1 Comment

Don't be confused like I was: this does not work in Excel versions older than 2010 (2010 and above are fine).
11

You should switch the encoding to ISO-8859-1 as follows:

CSV.generate(encoding: 'ISO-8859-1') { |csv|  csv << ["Text á", "Text é", "Text æ"] }

For your context, you can do this:

config = {
  col_sep: ';',
  row_sep: "\n",
  encoding: 'ISO-8859-1'
}

CSV.generate(config) { |csv|  csv << ["Text á", "Text é", "Text æ"] }

I had the same issue and that encoding fixed it.
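An equivalent, more explicit variant transcodes the finished UTF-8 string in one step with String#encode (a sketch; this only works while every character has an ISO-8859-1 representation, which á, é, and æ do, but e.g. Japanese or Arabic text does not):

```ruby
require "csv"

utf8_csv = CSV.generate(col_sep: ";") do |csv|
  csv << ["Text á", "Text é", "Text æ"]
end

# Characters with no Latin-1 mapping would raise
# Encoding::UndefinedConversionError unless a :replace fallback is given
latin1_csv = utf8_csv.encode(Encoding::ISO_8859_1)

puts latin1_csv.encoding  # => ISO-8859-1
```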

3 Comments

The answer above worked for me, but only after I removed the col_sep and row_sep arguments. Just the encoding: 'ISO-8859-1' was all I needed. For context, the specific issue I was having was é characters appearing as Ã©
Good catch Greg, I will update the example without the context.
yeah, sure, 8859-1 good for everything yay, try writing some japanese or arabic characters in a CSV file like that. OP asked specifically for UTF-8, so why advise him to go decades back in time?
0
config = {
  encoding: 'ISO-8859-1'
}

CSV.generate(config) { |csv|  csv << ["Text á", "Text é", "Text æ"] }

2 Comments

Would you like to augment your code-only answer with some explanation?
OP asked for UTF-8 specifically, not 8859-1 aka ANSI
0

With https://github.com/gtd/csv_builder, I had to:

In the controller action:

@output_encoding = 'UTF-8'
send_data "\uFEFF" + render_to_string, type: :csv, filename: @filename

Atop the csv.csvbuilder template:

faster_csv.to_io.write("\uFEFF")

I don't know why I had to add the BOM twice, but it did not work with either one on its own.

