0

While debugging my Rails App I found the following messages in the log file:

(0.1ms)  ROLLBACK
Completed 500 Internal Server Error in 25ms (ActiveRecord: 4.2ms)
ActiveRecord::StatementInvalid (Mysql2::Error: Incorrect string value: '\xF0\x9F\x98\x89 u...' for column 'description' at row 1: INSERT INTO `course` (`title`, `description`) VALUES ('sometitle', '<p>Description containing 😉 and stuff</p>')

This seems to stem from my database being MySQL with not-quite-utf-8:

CREATE TABLE `course` (
  `id` int NOT NULL AUTO_INCREMENT,
  `title` varchar(250) DEFAULT NULL,
  `description` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2080 DEFAULT CHARSET=utf8;

According to the answers to this question CHARSET=utf8 is only capable of handling 3 byte characters, not 4 byte characters.

The Emoticon 😉 needs four bytes - see \xF0\x9F\x98\x89 in the log file.

I am wary of converting the whole database. I would rather forbid the use of emoticons and other 4 byte characters - they are really not necessary on my site.

What is the best way to do this in Rails?

2 Answers 2

0

Building on the regular expressions from these answers I implemented a validator:

# file /lib/three_byte_validator.rb
# forbid characters that use more than three byte
class ThreeByteValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    if value =~ /[\u{10000}-\u{10FFFF}]/
      record.errors.add attribute, (options[:message] || 'Keine Emoticons, keine UTF-8 Zeichen mit 4 Byte')
    end
  end
end

Now I can use this validator on the model that first had the problem:

class Course < ApplicationRecord
  validates :title, length: { in: 3..100 }, three_byte: true
  validates :description, length: { minimum: 50 }, three_byte: true

and also on other models:

class Person < ApplicationRecord
  validates :firstname, :lastname, :country, :city, :address, three_byte: true
Sign up to request clarification or add additional context in comments.

Comments

0

In MySQL, the Winking Face (and most other Emoji) needs utf8mb4 instead of utf8.

1 Comment

yes, thank you! but as I said: I am wary of converting the whole database. it's in production, there is a lot of data. I'd rather not change something so fundamental.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.