
I'm using Ruby on Rails, but that matters little here (other than how Rails encodes request parameters).

I have a textbox where the user can enter text. I send this text via XHR to my Ruby backend, which does a bunch of string processing: it looks for certain keywords and returns to the client the list of keywords it found, along with their start indexes in the string.
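For reference, the server-side lookup described above might look roughly like the sketch below. It is only an illustration: the helper name `find_keywords` and the result format are assumptions, not the actual backend code. The key point is that the offset it records is whatever `MatchData#begin` returns, which is a character offset on Ruby 1.9+ but a byte offset on Ruby 1.8.

```ruby
# encoding: utf-8

# Hypothetical helper: find every occurrence of each keyword and record
# where it starts. On Ruby 1.9+ MatchData#begin returns a character offset;
# on Ruby 1.8 it returns a byte offset, which is what breaks the indexes.
def find_keywords(text, keywords)
  results = []
  keywords.each do |keyword|
    pattern = Regexp.new(Regexp.escape(keyword))
    # Inside a scan block, Regexp.last_match holds the current match.
    text.scan(pattern) do
      results << { :keyword => keyword, :index => Regexp.last_match.begin(0) }
    end
  end
  results
end

matches = find_keywords('Mary had ä little lamb', ['little lamb'])
puts matches.inspect
```

On a modern Ruby this reports a single match for "little lamb" with `:index` 11.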

I then use the keywords and indexes in JavaScript for further processing.

The problem is that if the text contains non-ASCII characters, Ruby's indexes don't match JavaScript's. JavaScript counts a character such as ä as one character, whereas Ruby counts the bytes of its UTF-8 encoding (two for ä), which changes the length of the string and shifts every index after that character.

Any advice on how to deal with this? Simple escape/unescape or encode/decode won't work.

Here's an example: "Mary had ä little lamb"

My DB has a keyword that matches "little lamb".

Ruby (after Rails parameter parsing) returns a length of 23 for that string and a start index of 12 for "little lamb".

JavaScript returns a string length of 22 and a start index of 11.
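To see where both sets of numbers come from, here is a sketch, assuming Ruby 1.9.3 or later (for `String#byteslice`) and a UTF-8 string. It reproduces the byte count Ruby 1.8 reports as the length, the character count JavaScript reports, and converts the byte offset 12 into the character offset 11. The helper name `char_index_from_byte` is hypothetical.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9.3+ (String#byteslice) and a UTF-8 string.

# Hypothetical helper: map a byte offset to a character offset by counting
# the characters in the bytes that precede it.
def char_index_from_byte(str, byte_index)
  str.byteslice(0, byte_index).length
end

str = 'Mary had ä little lamb'

puts str.bytesize                  # 23: bytes ("ä" is two bytes in UTF-8)
puts str.length                    # 22: characters, as JavaScript reports here
puts char_index_from_byte(str, 12) # 11: byte offset 12 as a character offset
```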

2 Answers

I haven't tried this, since I've never used Ruby 1.8.7, but mb_chars may help you.

http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html

Try running "Mary had ä little lamb".mb_chars.size, which counts characters rather than bytes.
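If you would rather not depend on ActiveSupport, a plain-Ruby sketch is to decode the UTF-8 bytes with `String#unpack`. The `'U*'` and `'a'` directives also exist in Ruby 1.8, so this should work on 1.8.7, though I haven't tested it there. The helper names `utf8_length` and `utf8_char_offset` are hypothetical; the byte offset 12 is the one Ruby 1.8 reports in the question.

```ruby
# encoding: utf-8
# Plain-Ruby sketch without ActiveSupport. The 'U*' and 'a' unpack
# directives exist in Ruby 1.8 as well, but this is untested on 1.8.7.

# Count characters by decoding the UTF-8 bytes into code points.
def utf8_length(str)
  str.unpack('U*').length
end

# Hypothetical helper: take the first byte_index bytes ('a' reads raw bytes)
# and count how many characters they decode to.
def utf8_char_offset(str, byte_index)
  str.unpack("a#{byte_index}").first.unpack('U*').length
end

str = 'Mary had ä little lamb'
puts utf8_length(str)          # 22
puts utf8_char_offset(str, 12) # 11
```

The design choice is to work on bytes throughout, so the result does not depend on whether the Ruby version's own `String#length` counts bytes or characters.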

Either way, you should upgrade to Ruby 2.1, as Ruby 1.8.7 is no longer supported.


1 Comment

mb_chars worked great! For anyone who isn't on Rails, the jcode module (Ruby 1.8 standard library) is also worth trying.

Ruby 1.9 changed string methods such as size and index to count characters instead of bytes. To get the same numbers as JavaScript, you may need to upgrade to Ruby 1.9.3 or higher if you haven't already:

```ruby
RUBY_VERSION
#=> "1.9.3"

str = 'Mary had ä little lamb'
keyword = 'little lamb'

str.size
#=> 22

str.index(keyword)
#=> 11
```
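One caveat, hedged: Ruby 1.9+ counts code points, while JavaScript string indexes count UTF-16 code units. The two agree for "ä", but a character outside the Basic Multilingual Plane (for example an emoji) takes two JavaScript units, so indexes after it would still be off by one. A minimal sketch, assuming Ruby 1.9+ and UTF-8 strings, that converts a Ruby character index into the index JavaScript would report; the helper name `js_index` is hypothetical.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ and UTF-8 strings.

# Hypothetical helper: JavaScript string indexes count UTF-16 code units,
# so encode the prefix before the match as UTF-16 and count 2-byte units.
def js_index(str, char_index)
  str[0, char_index].encode('UTF-16LE').bytesize / 2
end

str = 'Mary had ä little lamb'
puts js_index(str, str.index('little lamb'))     # 11: same as Ruby for "ä"

sheep = "Mary had \u{1F411} little lamb"
puts sheep.index('little lamb')                  # 11: Ruby counts code points
puts js_index(sheep, sheep.index('little lamb')) # 12: the emoji is two UTF-16 units
```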

2 Comments

I'm on Ruby 1.8.7, so I guess that's a problem. Are there any alternatives I can use?
Sorry, none that I know of.
