29

I like the solution provided by "Remove not alphanumeric characters from string. Having trouble with the [\] character" but how would I do this while leaving the spaces in place?

I need to tokenize the string based on the spaces after it has been cleaned.

2 Answers

68
input.replace(/[^\w\s]/gi, '')

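Shamelessly stolen from the other answer. ^ at the start of a character class means "not", so [^\w\s] matches any character that is neither a word character (\w: letters, digits, and underscore) nor a whitespace character (\s: spaces, tabs, newlines, etc.). If you only want to keep plain spaces, you can put a literal space in the class instead of \s.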
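To tie this back to the tokenizing step in the question, here is a minimal sketch. The helper name cleanAndTokenize is made up for illustration; it assumes that any run of whitespace separates tokens and that empty tokens should be dropped.

```javascript
// Hypothetical helper (not from the original answer): remove everything that is
// neither a word character nor whitespace, then split on runs of whitespace.
function cleanAndTokenize(text) {
  // [^\w\s] = any character that is not a word character and not whitespace
  const cleaned = text.replace(/[^\w\s]/g, "");
  // trim() avoids empty tokens at the ends; filter(Boolean) handles empty input
  return cleaned.trim().split(/\s+/).filter(Boolean);
}

console.log(cleanAndTokenize("Hello, world! It's  a test."));
// → [ 'Hello', 'world', 'Its', 'a', 'test' ]
```

Note that the apostrophe is simply removed, so "It's" becomes "Its" rather than two tokens.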


6 Comments

That works except for special characters like the slanted quotes ’ ” “.
What do you mean? Do you want to include those characters or to exclude them?
Not completely. Normal ASCII quotes work, but when copying and pasting text from a PDF the angled quotes don't get removed while regular quotes do.
I really don't know what to say .. I can't reproduce this problem
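[^\w] = \W and [^\s] = \S, but [^\w\s] only matches characters that are both \W and \S, so it cannot be reduced to /[\W\S]/g: that class matches characters that are either \W or \S, which is every character. The ignore case modifier isn't needed, because \w already covers both upper- and lowercase letters.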
3

I know this is an old thread, but it is popular enough to appear at the top of a Google search. So, as an alternative, the accepted answer and the comment from 3limin4t0r inspired me to use:

.replace(/\W+/g, " ")

IMHO

const input = document.querySelector("input");
const button = document.querySelector("button");
const output = document.querySelector("output");

button.addEventListener("click", () => {
    output.textContent = input.value.replace(/\W+/g, " ");
});

<input>
<button>Replace</button>
<p>
  <output></output>
</p>
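The two answers behave differently, which matters for tokenizing. A small comparison, as a sketch with a made-up sample string: the first pattern deletes punctuation outright, while the second replaces each run of non-word characters (whitespace included) with a single space.

```javascript
// Sample input chosen only for illustration
const sample = "rock-n-roll, baby!";

// First answer: punctuation is deleted, whitespace is kept as-is
const stripped = sample.replace(/[^\w\s]/g, "");
console.log(stripped); // "rocknroll baby"

// Second answer: every run of non-word characters becomes one space,
// so hyphenated words are split and a trailing space can appear
const spaced = sample.replace(/\W+/g, " ");
console.log(spaced); // "rock n roll baby "

// trim() before splitting to avoid an empty last token
const tokens = spaced.trim().split(" ");
console.log(tokens); // [ 'rock', 'n', 'roll', 'baby' ]
```

Which one fits depends on whether punctuation between letters should join the neighbouring words or separate them.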
