
I have the following file named test.rb, encoded in UTF-16LE:

# encoding: UTF-16LE

test = "test!"
p test

Running it with the following command produces no output:

ruby ./test.rb

What am I missing here?


In case anyone is wondering, the reason I'm trying to set my source to UTF-16LE encoding is that I'm working with UTF-16LE input and output files. My impression is that if I set the encoding properly when I read a file, set it properly when I write the output, and have the # encoding: comment set properly in my source, everything should just work. If anyone sees anything wrong with this (or knows an easier way), feel free to let me know.

  • The encoding you write your source code in has no bearing on the encoding you use to read or write a file. Commented Nov 21, 2010 at 0:21
  • @Greg So what you're saying is that if my source code is UTF-8 and I write a string to a UTF-16LE file, it will be automatically converted to the proper encoding? Commented Nov 21, 2010 at 0:58
  • No. It won't be automatically converted. You have to tell Ruby what encoding your file I/O is in. See Mladen Jablanović's answer as he's pointing you in the right direction. Commented Nov 21, 2010 at 1:08
  • To clarify: If I tell ruby to encode my output file as UTF-16, will all my strings be converted before they're written? String encoding is whatever the source file encoding is (unless it's specified), right? Commented Nov 21, 2010 at 1:30

1 Answer


Writing your program in UTF-16 in order to process UTF-16 files sounds like naming your variables in Russian in order to make a Russian website. :)

Ruby 1.9 supports string encodings, and James Gray has an excellent series of articles on the topic - I consider them a reference guide to encodings in Ruby.

In short, you can specify the encoding of your input files when you open them:

s = ''
File.open('utf16le.txt', 'rb:UTF-16LE') do |f| # here you set the encoding
  s = f.read
end
p s.encoding
#=> #<Encoding:UTF-16LE>
p s.length
#=> 19
p s
#=> "test\nmladen\n\u0436\u045F\u0446\u0432\u0431\n\n"
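The same mode-string idea applies in the other direction. Here is a minimal sketch, assuming a hypothetical scratch file in the system temp directory and an ordinary UTF-8 (or ASCII) source file: giving an external encoding on write makes Ruby transcode the string as it goes out, and adding a third field on read (external:internal) transcodes it back as it comes in.

```ruby
require 'tmpdir'

# Hypothetical scratch file, used only for this illustration.
path = File.join(Dir.tmpdir, 'utf16le_roundtrip_example.txt')

# An ordinary string literal; its encoding follows the source file, not the I/O.
text = "test\nmladen\n"

# External encoding UTF-16LE: the string is transcoded while being written.
# Binary mode ('b') avoids newline translation on platforms that do it.
File.open(path, 'wb:UTF-16LE') { |f| f.write(text) }

# Size on disk in bytes (UTF-16LE uses two bytes for each of these characters).
raw_size = File.size(path)

# External UTF-16LE, internal UTF-8: the data is transcoded while being read.
round_trip = File.open(path, 'rb:UTF-16LE:UTF-8') { |f| f.read }

p raw_size
p round_trip.encoding
p round_trip == text
```

So the source file can stay in a plain ASCII-compatible encoding; only the File.open calls need to know about UTF-16LE.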

Everything is also in the docs for the Ruby 1.9 IO class:

http://ruby-doc.org/ruby-1.9/classes/IO.html


2 Comments

Any idea why my code doesn't work? My only guess is that UTF-16LE text isn't supported by the terminal and so doesn't display, but I would think it would at least output the \uXXX codes.
In particular, "The Default External and Internal Encodings" section in James Gray's article is germane to the OP's question.
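As a rough illustration of what the default external encoding does (a sketch only, using a hypothetical scratch file; the actual defaults depend on your locale and Ruby version): a file read without an explicit encoding is simply tagged with the default, whereas naming the encoding makes Ruby interpret the bytes correctly.

```ruby
require 'tmpdir'

# Hypothetical scratch file containing raw UTF-16LE bytes.
path = File.join(Dir.tmpdir, 'default_encoding_example.txt')
File.binwrite(path, "test!".encode('UTF-16LE'))

# What Ruby assumes when no encoding is given (environment dependent).
p Encoding.default_external
p Encoding.default_internal   # nil unless it has been set

# No encoding given: the bytes are tagged with the default encoding.
guessed = File.read(path)

# Encoding given explicitly: the same bytes are read as UTF-16LE.
tagged = File.read(path, mode: 'rb', encoding: 'UTF-16LE')

p guessed.encoding
p guessed.bytesize
p tagged.encoding
p tagged.length
```

The byte count is the same either way; what changes is how Ruby groups those bytes into characters.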
