
I have a list of words and want to find which ones already exist in the database.

Instead of running dozens of SQL queries (one per word), I decided to use "SELECT word FROM table WHERE word IN (array_of_words)" and then loop through the result.
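A minimal sketch of building that single query with one placeholder per word, so the words can be passed as bind parameters instead of being interpolated into the SQL string. The table name `words`, the column `word` and the helper `in_query` are hypothetical names chosen for illustration.

```ruby
# Build "SELECT ... WHERE word IN (?, ?, ...)" with one placeholder per word.
# Table `words`, column `word` and this helper are hypothetical names.
def in_query(words)
  # "IN ()" is invalid SQL, so refuse an empty list.
  raise ArgumentError, 'need at least one word' if words.empty?

  placeholders = Array.new(words.size, '?').join(', ')
  "SELECT word FROM words WHERE word IN (#{placeholders})"
end

words = ['šuo', 'namas']
puts in_query(words)
# prints: SELECT word FROM words WHERE word IN (?, ?)
```

The `words` array itself would then be handed to your database driver's prepared-statement API as the bind values.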

The problem is database collation.

http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html

Under this collation MySQL treats many different characters as equal. In Ruby, however, two strings that differ in such characters are not equal.

For example, if the word is "šuo", the database may also return "suo" if that is what it finds (which is fine). But when I then check whether anything was found for "šuo", Ruby of course returns false, because "šuo" != "suo".

So, is there any way to compare two strings in Ruby using the same collation?




I've used iconv like this for something similar:

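# Iconv was removed from Ruby's standard library in 2.0; newer Rubies need the iconv gem.
require 'iconv'

class String
  # Transliterate to ASCII where possible (e.g. "š" -> "s"), drop what cannot be
  # converted, then keep only code points below 127.
  def to_ascii_iconv
    Iconv.new('ASCII//IGNORE//TRANSLIT', 'UTF-8').iconv(self).unpack('U*').select { |cp| cp < 127 }.pack('U*')
  end
end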

puts 'suo'.to_ascii_iconv
# => suo
puts 'šuo'.to_ascii_iconv
# => suo
puts 'suo'.to_ascii_iconv == 'šuo'.to_ascii_iconv
# => true
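Since Iconv is no longer part of the standard library, here is a hedged sketch of a similar key using only `String#unicode_normalize` (Ruby 2.2+). This is a swapped-in technique, not the method above: it decomposes to NFD and deletes nonspacing combining marks. The method name `without_diacritics` is hypothetical.

```ruby
# Sketch (Ruby 2.2+): a stdlib-only stand-in for the Iconv-based method.
class String
  # NFD splits "š" into "s" + U+030C (combining caron); removing
  # nonspacing marks (\p{Mn}) then leaves only the base letter.
  def without_diacritics
    unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  end
end

puts 'šuo'.without_diacritics
# prints: suo
puts 'suo'.without_diacritics == 'šuo'.without_diacritics
# prints: true
```

Unlike Iconv transliteration, letters with no canonical decomposition (such as "ł", "ø" or "ß") pass through unchanged.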

Hope that helps!

Zubin


But how do I know which conversion to use based on the db's collation ordering? :/
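A hedged sketch, assuming the column uses a case- and accent-insensitive collation such as utf8_general_ci: approximate it with a Ruby key that strips combining marks and downcases, then use that key to map the words MySQL returned back to the input words. This does not read MySQL's real collation tables, so edge cases can differ; `collation_key` and `match_results` are hypothetical helpers, and full Unicode `downcase` needs Ruby 2.4+.

```ruby
require 'set'

# Approximation of a case- and accent-insensitive collation.
# NOT MySQL's actual weight table; some characters will compare differently.
def collation_key(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').downcase
end

# Return the input words that match some word returned by the database.
def match_results(input_words, db_words)
  found_keys = db_words.map { |w| collation_key(w) }.to_set
  input_words.select { |w| found_keys.include?(collation_key(w)) }
end

p match_results(['šuo', 'namas', 'Katė'], ['suo', 'kate'])
# prints: ["šuo", "Katė"]
```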
