Javascript regex to remove punctuation [duplicate]

Question

I'm having trouble with my regex. I'm sure something is not escaping properly.

function regex(str) {
  
  str = str.replace(/(~|`|!|@|#|$|%|^|&|*|\(|\)|{|}|\[|\]|;|:|\"|'|<|,|\.|>|\?|\/|\\|\||-|_|+|=)/g,"")
  document.getElementById("innerhtml").innerHTML = str;
  
 }

<div id="innerhtml"></div>

<p><input type="button" value="Click Me" onclick="regex('test @ . / | ) this');">

Why not take a whitelist approach and only match alphanumeric characters?
– blasko, Commented Aug 3, 2015 at 0:17
I need to count all accented characters, so I'm not sure how to handle a whitelist approach.
– Daniel Williams, Commented Aug 3, 2015 at 0:18
I can see { } * + .... by the way str.replace(/[~\!@#$%^&*()\{\}[];:"'<,\.>\?\/\\\|\-_\+=]+/g,"")` seems to work - not sure which is easier to read
– Jaromanda X, Commented Aug 3, 2015 at 0:19
@Daniel Why don't you try escaping everything? Escaping doesn't hurt.
– Rohcana, Commented Aug 3, 2015 at 0:19

royhowie · Accepted Answer · 2015-08-03 05:33:45Z

6

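* and + need to be escaped. $ and ^ should be escaped as well: left unescaped, they act as anchors and never match the literal characters.

function regex (str) {
    // \*, \+, \$ and \^ are escaped so they match the literal characters
    return str.replace(/(~|`|!|@|#|\$|%|\^|&|\*|\(|\)|{|}|\[|\]|;|:|\"|'|<|,|\.|>|\?|\/|\\|\||-|_|\+|=)/g,"")
}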

var testStr = 'test @ . / | ) this'
document.write('<strong>before: </strong>' + testStr)
document.write('<br><strong>after: </strong>' + regex(testStr))

answered Aug 3, 2015 at 0:26 by Johan Karlsson
edited Aug 3, 2015 at 5:33 by royhowie

1 Comment

royhowie Over a year ago

The answer on the linked duplicate is better. It makes more sense to put the characters in a [matching group] than it does to a|or|b them.

Community · Accepted Answer · 2017-05-23 12:18:11Z

The accepted answer on the proposed duplicate question doesn't cover all the punctuation characters in the ASCII range. (The comment on that accepted answer does, though.)

A better way to write this regex is to put the characters into a character class.

/[~`!@#$%^&*(){}\[\];:"'<,.>?\/\\|_+=-]/g

In a character class, to match the literal characters:

^ does not need escaping, unless it is at the beginning of the character class.
- should be placed at the beginning of the character class (after the ^ in a negated character class) or at the end of a character class.
] has to be escaped to be specified as a literal character. [ does not need to be escaped (but I escape it anyway, as a habit, since some languages require [ to be escaped inside a character class).
$, *, +, ?, (, ), {, }, |, . lose their special meaning inside a character class.
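
The rules above can be checked with a short sketch. It reuses the character class from this answer; the helper name stripPunctuation and the sample strings (apart from the one in the question) are made up for illustration.

```javascript
// Character class from the answer: "-" sits at the end so it is literal,
// "]" is escaped, and $ * + ? ( ) { } | . need no escaping inside [...].
const PUNCTUATION = /[~`!@#$%^&*(){}\[\];:"'<,.>?\/\\|_+=-]/g;

// Hypothetical helper, not part of the original question or answers.
function stripPunctuation(str) {
  return str.replace(PUNCTUATION, "");
}

// Input from the question: only the letters and the spaces remain.
console.log(JSON.stringify(stripPunctuation("test @ . / | ) this")));

// Characters that are special outside a class are removed as literals here.
console.log(stripPunctuation("a$b^c*d+e-f")); // "abcdef"

// Non-ASCII letters are not in the class, so they are left alone.
console.log(stripPunctuation("café, naïve!")); // "café naïve"
```

Because the class lists only ASCII punctuation, accented letters pass through untouched, which suits the requirement mentioned in the question's comments.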

In a RegExp literal, / has to be escaped.

In a RegExp, since \ is the escape character, a literal \ must be written as \\.
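
As a small illustration of these two escaping rules, the sketch below matches a literal / and a literal \ with a RegExp literal. The RegExp constructor form is shown only for comparison and is not used in the answers above; the variable names are assumptions.

```javascript
// Literal form: "/" is escaped so it does not terminate the literal,
// and a literal backslash is written as "\\".
const literal = /[\/\\]/g;

// Constructor form (for comparison): "/" needs no escaping, but every
// regex backslash is doubled again because the string literal consumes
// one level of escaping first.
const constructed = new RegExp("[/\\\\]", "g");

const sample = "a/b\\c"; // the three letters separated by "/" and "\"

console.log(sample.replace(literal, ""));     // "abc"
console.log(sample.replace(constructed, "")); // "abc"
```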
