I am trying to remove invisible characters from a string.

see remove-zero-width-space-characters

iex> str = "\uFEFF<?xml>"
iex> String.replace(str, ~r/[\u200B\u200C\u200D\uFEFF]/, "")   
** (Regex.CompileError) PCRE does not support \L, \l, \N{name}, \U, or \u at position 1
    (elixir) lib/regex.ex:171: Regex.compile!/2
    (elixir) expanding macro: Kernel.sigil_r/2
    iex:44: (file)

How can I write the regex above so that it compiles?

Note: a plain string pattern works in place of the regex, but I would like to use a regex for code efficiency:

iex> String.replace(str, "\uFEFF", "")
"<?xml>"
3 Comments

  • Change it to [\x{200B}\x{200C}\x{200D}\x{FEFF}]. Commented May 24, 2018 at 11:58
  • iex(44)> String.replace(a, ~r/[\x{200B}\x{200C}\x{200D}\x{FEFF}]/, "") ** (Regex.CompileError) character value in \x{} or \o{} is too large at position 8 Commented May 24, 2018 at 11:59
  • Enable the u flag. Commented May 24, 2018 at 11:59

1 Answer

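The error comes from the escape notation. PCRE does not support \uXXXX; use \x{XXXX} instead, and set the u (Unicode) flag. Without the flag, PCRE works on bytes and rejects any \x{} value above 0xFF, which is the "character value in \x{} or \o{} is too large" error from the comments.

Since you are using PCRE, you can also match these characters with the \p{C} property. It covers the Unicode "Other" category (control, format, unassigned, private-use and surrogate code points), which includes all four characters in your class. Note that it also matches control characters such as tabs and newlines, so it is broader than your original pattern.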

/[\x{200B}\x{200C}\x{200D}\x{FEFF}]/u

In code:

iex(33)> str = "\uFEFF<?xml>"
iex(34)> String.replace(str, ~r/[\x{200B}\x{200C}\x{200D}\x{FEFF}]/u, "") 
"<?xml>"
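
If the replacement is needed in more than one place, the corrected pattern can be wrapped in a small helper. The following is only a sketch: the module name InvisibleChars and both function names are made up for illustration and are not part of any library. The second function uses \p{Cf} (the Unicode "Format" category) as a narrower alternative to \p{C}, on the assumption that you want to keep tabs and newlines.

```elixir
defmodule InvisibleChars do
  # Hypothetical helper module for illustration only.

  # Removes exactly the four code points from the question:
  # U+200B, U+200C, U+200D and U+FEFF. The u flag is required
  # so that PCRE accepts \x{} values above 0xFF.
  def strip_zero_width(string) do
    String.replace(string, ~r/[\x{200B}\x{200C}\x{200D}\x{FEFF}]/u, "")
  end

  # Removes every character in the Unicode "Format" category (Cf),
  # which includes the four code points above. Control characters
  # such as "\n" and "\t" are in category Cc and are left untouched.
  def strip_format_chars(string) do
    String.replace(string, ~r/\p{Cf}/u, "")
  end
end
```

Calling InvisibleChars.strip_zero_width("\uFEFF<?xml>") should give the same result as the iex session above. Whether the explicit class or \p{Cf} is the better fit depends on how broad the clean-up should be.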
1 Comment

I believe the u should go after the closing slash: ~r/[\x{200B}\x{200C}\x{200D}\x{FEFF}]/u. If that doesn't work, try ~r/(*UTF8)[\x{200B}\x{200C}\x{200D}\x{FEFF}]/.