Regex error: too many multibyte code ranges are specified

Question

I have a regex that needs to match a large set of characters. The code runs without problems in Ruby 1.8.7, but in 1.9 it keels over. I suspect it has to do with encoding. I've done a good number of Google searches without luck, so maybe someone can enlighten me.

Code:

# encoding: utf-8
non_latin_hashtag_chars = [
  (0xA960..0xA97F).to_a, # Hangul Jamo Extended-A
  (0xAC00..0xD7AF).to_a, # Hangul Syllables
  (0xD7B0..0xD7FF).to_a  # Hangul Jamo Extended-B
].flatten.pack('U*').freeze

e = /[a-z_#{non_latin_hashtag_chars}]/io

Error:

~/Desktop: ruby regex_test.rb 
regex_test.rb:13:in `<main>': too many multibyte code ranges are specified: /[a-z_가각갂갃간갅갆갇갈갉갊갋갌갍갎갏감갑값갓갔강갖갗갘같갚갛개객갞갟갠갡갢갣갤갥갦갧갨갩갪갫갬갭갮갯갰갱갲갳갴갵갶갷갸갹갺갻갼갽갾갿걀걁걂걃걄걅걆걇걈걉걊걋걌걍......

Do you mean "it keels over", not "it kills over"?

– Andrew Grimm, Commented Jul 4, 2011 at 22:45

Marc-André Lafortune · Accepted Answer · 2011-07-03 16:35:25Z

7

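As twehad points out, there is a limit of 10,000 multibyte code ranges in a regexp. The string interpolated in the question contains more than 11,000 individual characters, which exceeds it.

In any case, you should use Unicode ranges within the Regexp: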

/[a-z_\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]/io

I'm not an expert in Korean, so I don't know whether it is equivalent, but if you want to match all Hangul characters, you should use the character class for that instead:

/[a-z_\p{Hangul}]/io
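As a minimal sketch of the range-based approach (assuming Ruby 1.9 or later; the constant names `HANGUL_RANGES` and `HASHTAG_CHAR` are hypothetical, not from the question), the character class can be built from the original code point ranges without expanding them into individual characters:

```ruby
# Code point ranges from the question, kept as ranges rather than expanded.
HANGUL_RANGES = [
  0xA960..0xA97F, # Hangul Jamo Extended-A
  0xAC00..0xD7AF, # Hangul Syllables
  0xD7B0..0xD7FF  # Hangul Jamo Extended-B
].freeze

# Turn each range into a "\uXXXX-\uYYYY" pair for use inside a character class.
range_source = HANGUL_RANGES.map { |r|
  format('\u%04X-\u%04X', r.first, r.last)
}.join

# Three ranges instead of thousands of single characters.
HASHTAG_CHAR = Regexp.new("[a-z_#{range_source}]", Regexp::IGNORECASE)
```

Note that `\p{Hangul}` matches by Unicode script, so it also covers Hangul characters outside these three blocks (for example U+1100 in the Hangul Jamo block). The two patterns are therefore not strictly equivalent.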

edited Jul 3, 2011 at 16:35

answered Jul 3, 2011 at 16:24

Marc-André Lafortune


1 Comment

the Tin Man Over a year ago

I think this is the most practical solution. Breaking the individual ranges into separate regex checks joined with `||` would be my second choice. Ordering the tests from most likely to least likely to be encountered might buy some speed.
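The second choice mentioned in this comment could look roughly like the sketch below. The constant names and the `hashtag_char?` helper are hypothetical, and the ordering is only a guess at which ranges are most likely to match first:

```ruby
# One small regex per range, ordered by an assumed likelihood of matching.
LATIN      = /[a-z_]/i
SYLLABLES  = /[\uAC00-\uD7AF]/ # Hangul Syllables
JAMO_EXT_A = /[\uA960-\uA97F]/ # Hangul Jamo Extended-A
JAMO_EXT_B = /[\uD7B0-\uD7FF]/ # Hangul Jamo Extended-B

# Returns true if str contains at least one allowed character.
# || short-circuits, so later regexes only run when earlier ones fail.
def hashtag_char?(str)
  !!(str =~ LATIN || str =~ SYLLABLES || str =~ JAMO_EXT_A || str =~ JAMO_EXT_B)
end
```

Whether this is actually faster than a single character class would need to be measured for the input at hand.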

twehad · Accepted Answer · 2011-07-03 15:51:27Z

4

This is the limit of 10000 multibyte code ranges in a regex.

To raise it, you need to change the ONIG_MAX_MULTI_BYTE_RANGES_NUM config parameter (in /ruby-1.9.2-p*/include/ruby/oniguruma.h):

#define ONIG_MAX_MULTI_BYTE_RANGES_NUM     10000

and then recompile ruby.

answered Jul 3, 2011 at 15:51

twehad
