I'm scraping a few websites and eventually I hit a UTF-8 error that looks like this:
/usr/local/lib/ruby/gems/1.9.1/gems/dm-core-1.2.0/lib/dm-core/support/ext/blank.rb:19:in `=~': invalid byte sequence in UTF-8 (ArgumentError)
Now, I don't care about the scraped text being 100% accurate. Is there a way I can take the page I get, strip out any invalid byte sequences, and then pass the cleaned string around inside my program?
I'm using ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.2.0] if that matters.
Update: here is the method from blank.rb where the error is raised:
def self.blank?(value)
  return value.blank? if value.respond_to?(:blank?)
  case value
  when ::NilClass, ::FalseClass
    true
  when ::TrueClass, ::Numeric
    false
  when ::Array, ::Hash
    value.empty?
  when ::String
    value !~ /\S/ # This is line 19 of blank.rb, where the error is raised.
  else
    value.nil? || (value.respond_to?(:empty?) && value.empty?)
  end
end
When I try to save the following line:
What Happens in The Garage Tin Sign2. � � Newsletter Our monthly newsletter,
It throws the error. It's on page: http://www.stationbay.com/. But what is odd is that when I view it in my web browser it doesn't show the funny symbols in the source.
What do I do next?
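One thing that might be worth trying, as a minimal sketch rather than a confirmed fix: drop the invalid bytes before the string ever reaches the regexp match. The helper name strip_invalid_utf8 is hypothetical (it is not part of dm-core or Ruby), and the sample string with a stray 0xA0 byte is made up for illustration; the real page may contain different bytes. The round trip through UTF-16BE is there because on Ruby 1.9 converting a string from UTF-8 to UTF-8 does nothing, so the invalid bytes would survive a direct encode.

```ruby
# Hypothetical helper (not part of dm-core or Ruby itself).
# Returns a copy of +text+ tagged as UTF-8 with invalid byte sequences removed.
def strip_invalid_utf8(text)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Transcode to UTF-16BE and back, dropping anything that is not valid UTF-8.
  # A direct UTF-8 -> UTF-8 encode is a no-op on Ruby 1.9.
  utf16 = utf8.encode(Encoding::UTF_16BE,
                      :invalid => :replace, :undef => :replace, :replace => '')
  utf16.encode(Encoding::UTF_8)
end

# Made-up sample: 0xA0 on its own is not a valid UTF-8 sequence.
raw = "Tin Sign2. \xA0 \xA0 Newsletter"
clean = strip_invalid_utf8(raw)

puts clean.valid_encoding?
puts((clean !~ /\S/).inspect) # the same check blank.rb performs, now without raising
```

This throws information away, which seems acceptable given that accuracy is not a concern here. If the page is really in another encoding such as ISO-8859-1, transcoding from that encoding would preserve the characters instead of dropping them.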
value? That could be the root of the problem.
# encoding: UTF-8 magic comment). Maybe Stack Overflow filters out the invalid chars?