-3

I'm having trouble with my regex. I'm sure something is not escaping properly.

function regex(str) {
  
  str = str.replace(/(~|`|!|@|#|$|%|^|&|*|\(|\)|{|}|\[|\]|;|:|\"|'|<|,|\.|>|\?|\/|\\|\||-|_|+|=)/g,"")
  document.getElementById("innerhtml").innerHTML = str;
  
 }
<div id="innerhtml"></div>

<p><input type="button" value="Click Me" onclick="regex('test @ . / | ) this');">

5
  • Why not take a whitelist approach and only match alphanumeric characters? Commented Aug 3, 2015 at 0:17
  • I need to count all accented characters, so not sure how to handle whitelist approach. Commented Aug 3, 2015 at 0:18
  • @blasko whitelist approach would be slower. Commented Aug 3, 2015 at 0:18
  • 1
    I can see { } * + .... by the way str.replace(/[~\!@#$%^&*()\{\}[];:"'<,\.>\?\/\\\|\-_\+=]+/g,"")` seems to work - not sure which is easier to read Commented Aug 3, 2015 at 0:19
  • 1
    @Daniel Why don't you try escaping everything? Escaping doesn't hurt. Commented Aug 3, 2015 at 0:19

2 Answers

6

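* and + need to be escaped. Left bare, they are quantifiers with nothing to repeat, so the pattern fails to compile. $ and ^ should be escaped as well: unescaped, they act as anchors, so literal $ and ^ characters are never removed.

function regex (str) {
    // Each metacharacter is escaped so that it matches itself literally.
    return str.replace(/(~|`|!|@|#|\$|%|\^|&|\*|\(|\)|{|}|\[|\]|;|:|\"|'|<|,|\.|>|\?|\/|\\|\||-|_|\+|=)/g,"")
}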

var testStr = 'test @ . / | ) this'
document.write('<strong>before: </strong>' + testStr)
document.write('<br><strong>after: </strong>' + regex(testStr))
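To make the failure mode concrete, here is a small sketch. The helper name `compiles` is hypothetical (it is not from the question); it only reports whether a pattern source can be turned into a RegExp. The last two lines illustrate that a bare $ inside an alternation behaves as the end-of-input anchor rather than as a literal character.

```javascript
// Hypothetical helper: reports whether a pattern source compiles as a RegExp.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false; // e.g. SyntaxError: nothing to repeat
  }
}

const bareStar = compiles("a|*");      // bare quantifier, does not compile
const barePlus = compiles("a|+");      // bare quantifier, does not compile
const escapedStar = compiles("a|\\*"); // escaped, compiles

// A bare $ is an anchor, so the literal "$" survives the replace.
const anchorResult = "a$b".replace(/(#|$|%)/g, "");
// An escaped \$ matches the literal character and removes it.
const literalResult = "a$b".replace(/(#|\$|%)/g, "");
```

This is why the original pattern throws before it ever reaches the test string: the error comes from parsing the regex, not from matching.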


1 Comment

The answer on the linked duplicate is better. It makes more sense to put the characters in a [character class] than it does to a|or|b them.
6

The accepted answer on the proposed duplicate question doesn't cover all the punctuation characters in the ASCII range. (A comment on that accepted answer does, though.)

A better way to write this regex is to put the characters into a character class.

/[~`!@#$%^&*(){}\[\];:"'<,.>?\/\\|_+=-]/g

In a character class, to match the literal characters:

  • ^ does not need escaping, unless it is at the beginning of the character class.
  • - should be placed at the beginning of the character class (after the ^ in a negated character class) or at the end of the character class. Anywhere else, it forms a range unless it is escaped as \-.
  • ] has to be escaped to be specified as a literal character. [ does not need to be escaped (but I escape it anyway, as a habit, since some languages require [ to be escaped inside a character class).
  • $, *, +, ?, (, ), {, }, |, . lose their special meaning inside a character class.
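The rules in the list above can be checked with a few short replacements. This is only an illustrative sketch; the variable names and sample strings are made up for the example.

```javascript
// ^ is literal when it is not the first character of the class.
const caretLiteral = "2^3$".replace(/[$^]/g, "");
// ^ as the first character negates the class: everything except "$" is removed.
const caretNegated = "2^3$".replace(/[^$]/g, "");

// - at the end of the class is a literal hyphen.
const hyphenAtEnd = "1+1=2".replace(/[+=-]/g, "");
// - between two characters forms a range: "+" to "=" also covers the digits.
const hyphenAsRange = "1+1=2".replace(/[+-=]/g, "");

// ] must be escaped; [ is escaped here only out of habit.
const brackets = "x[1]".replace(/[\[\]]/g, "");

// . * ? are ordinary characters inside a class.
const metachars = "a.b*c?".replace(/[.*?]/g, "");
```

The hyphen case is the easiest one to get wrong, because a misplaced - still compiles and silently matches more characters than intended.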

In a RegExp literal, / has to be escaped.

In a RegExp, since \ is the escape character, a literal \ has to be written as \\.
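Putting it together, a minimal sketch of the question's function rewritten around the character class. The names `ASCII_PUNCTUATION` and `stripPunctuation` are hypothetical. The second half shows the same / and \ class built from a string with the RegExp constructor, where / needs no escape but each \ has to be doubled once more for the string literal.

```javascript
// The character class from this answer, stored once and reused.
const ASCII_PUNCTUATION = /[~`!@#$%^&*(){}\[\];:"'<,.>?\/\\|_+=-]/g;

// Hypothetical helper: removes ASCII punctuation, leaves everything else.
function stripPunctuation(str) {
  return str.replace(ASCII_PUNCTUATION, "");
}

// The test string from the question: only letters and spaces remain.
const cleaned = stripPunctuation("test @ . / | ) this");

// Accented letters are outside the class, so they are kept.
const accented = stripPunctuation("café, déjà-vu!");

// Literal form: / and \ are both escaped inside the pattern.
const viaLiteral = "a/b\\c".replace(/[\/\\]/g, "");
// Constructor form: the string "[/\\\\]" holds the pattern [/\\].
const viaConstructor = "a/b\\c".replace(new RegExp("[/\\\\]", "g"), "");
```

Since the class lists what to remove rather than what to keep, accented characters pass through untouched, which matters for the counting use case mentioned in the comments.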
