I am working on an application where a Ruby Sidekiq process calls a third-party API and parses the data into a database.

I am using Sequel as my ORM.

I am getting some weird characters back in the results, for example:

"Tweets en Ingl\xE9s y en Espa\xF1ol"

When I attempt to save this to Postgres, the following error occurs:

Sequel::DatabaseError: PG::CharacterNotInRepertoire: ERROR: invalid byte sequence for encoding "UTF8": 0xe9 0x73 0x20

The weird thing is that the string thinks it is UTF-8; if I check the encoding, it says:

name.encoding.name # => "UTF-8"

What can I do to ensure that the data is in the correct format for postgres?

1 Answer

Just because the string claims to be UTF-8 doesn't mean that it is UTF-8. \xe9 is é in ISO-8859-1 (AKA Latin-1) but it is invalid in UTF-8; similarly, \xf1 is ñ in ISO-8859-1 but invalid in UTF-8. That suggests that the string is actually encoded in ISO-8859-1 rather than UTF-8. You can fix it with a combination of force_encoding to correct Ruby's confusion about the current encoding and encode to re-encode it as UTF-8:

> "Tweets en Ingl\xE9s y en Espa\xF1ol".force_encoding('iso-8859-1').encode('utf-8')
=> "Tweets en Inglés y en Español" 

So before sending that string to the database you want to:

name = name.force_encoding('iso-8859-1').encode('utf-8')
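If some records come back correctly encoded and others don't, blindly forcing ISO-8859-1 would mangle the good ones. A defensive variant is to re-encode only when the string is not already valid UTF-8. This is a minimal sketch; `ensure_utf8` is a hypothetical helper name (not part of Sequel), and it assumes the bad input is mis-tagged ISO-8859-1 as in the example above:

```ruby
# Re-encode a string as UTF-8 only when its bytes are not already
# valid UTF-8; correctly encoded input passes through untouched.
def ensure_utf8(str)
  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?

  # Assumption: invalid input is really ISO-8859-1 with a wrong UTF-8 tag.
  str.force_encoding('iso-8859-1').encode('utf-8')
end

ensure_utf8("Tweets en Ingl\xE9s y en Espa\xF1ol") # => "Tweets en Inglés y en Español"
ensure_utf8("already valid")                       # => "already valid"
```

You could call this on each field in the Sidekiq worker before handing the row to Sequel.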

Unfortunately, there is no way to reliably detect a string's real encoding. The various encodings overlap and there's no way to tell if è (\xe8 in ISO-8859-1) or č (\xe8 in ISO-8859-2) is the right character without manual sanity checking.
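To see why detection is ambiguous, you can take the single byte \xE8 and force it into either encoding; both produce a perfectly valid but different character:

```ruby
byte = "\xE8".b  # one raw byte, tagged as ASCII-8BIT

# The same byte decodes to different, equally plausible characters:
byte.dup.force_encoding('iso-8859-1').encode('utf-8') # => "è"
byte.dup.force_encoding('iso-8859-2').encode('utf-8') # => "č"
```

Nothing in the byte itself says which interpretation is right; only knowledge of the data's source (or manual inspection) can settle it.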
