2

I want to remove stopwords from text, but I can't get a regex built from a variable to work properly. For example, removing the stopword "he" also affects the word "when". I tried to use word boundaries like this:

new RegExp('\b'+stopwords[i]+'\b' , 'g'), but it doesn't work.

See a small example here: jsFiddle

var stopwords = ['as', 'at', 'he', 'the', 'was'];
for (i = 0; i < stopwords.length; i++) {
    str = str.replace(new RegExp(stopwords[i], 'g'), '');
}

3 Answers

9

Something like this maybe

str = str.replace(new RegExp('\\b('+stopwords.join('|')+')\\b', 'g'), '');

FIDDLE

You have to double-escape the backslash when the pattern is passed to RegExp as a string, and you can join all the stopwords into a single pattern, creating

/\b(as|at|he|the|was)\b/g
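
As a minimal sketch of the joined pattern in use: the helper names escapeRegExp and removeStopwords are hypothetical, not part of the original answer, and the escaping step is an added precaution that assumes a stopword list might contain regex metacharacters such as "." or "?".

```javascript
// Hypothetical helper: escape regex metacharacters so each stopword is matched literally.
function escapeRegExp(word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: remove all stopwords with one regex and one replace call.
function removeStopwords(str, stopwords) {
    // '\\b' in the string literal becomes \b (word boundary) in the pattern.
    var pattern = new RegExp('\\b(?:' + stopwords.map(escapeRegExp).join('|') + ')\\b', 'g');
    return str.replace(pattern, '');
}

var stopwords = ['as', 'at', 'he', 'the', 'was'];
var result = removeStopwords('when he was at the door', stopwords);

// "when" is left intact; the spaces around removed words remain, so collapse them if needed.
var tidy = result.replace(/\s+/g, ' ').trim();
console.log(tidy);
```

Note that the match is case-sensitive as written; adding the 'i' flag alongside 'g' would also remove "He" or "The".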

5 Comments

Nice! Do you know if this join is faster than the for loop? Consider that I have a list of 300 stopwords.
I'd assume it is, as it creates one single regex and does the replace once, and not 300 times.
Can you explain the use of .join('|')? Sorry, I only noticed this difference now :-)
You need to enclose the joined words in a group for the \b anchors to work properly, i.e. the pattern should be: /\b(?:word1|word2)\b/ instead of: /\bword1|word2\b/.
@antithesis - join('|') joins the array elements into one string with the pipe as "glue", and as noted by ridgerunner, parentheses are added around the result to create the regex I posted in the answer.
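
The grouping point raised in the comments can be checked with a small sketch. The word list and sample text here are made up for illustration; without the group, each \b applies only to the alternative next to it.

```javascript
var words = ['he', 'the'];

// Without a group: /\bhe|the\b/g means "he at a word start" OR "the at a word end".
var ungrouped = new RegExp('\\b' + words.join('|') + '\\b', 'g');

// With a non-capturing group: both boundaries apply to every alternative.
var grouped = new RegExp('\\b(?:' + words.join('|') + ')\\b', 'g');

var text = 'help bathe the';
var ungroupedResult = text.replace(ungrouped, ''); // also damages "help" and "bathe"
var groupedResult = text.replace(grouped, '');     // removes only the whole word "the"

console.log(JSON.stringify(ungroupedResult));
console.log(JSON.stringify(groupedResult));
```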
2

Use \\b in the string to produce a single \b in the regex.

new RegExp('\\b'+stopwords[i]+'\\b' , 'g')

1

You need to escape the backslash because it's inside a string literal, not a regular expression literal:

new RegExp('\\b' + stopwords[i] + '\\b' , 'g')

Otherwise, '\b' is the BACKSPACE character ('\x08').
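
A short sketch of the difference, using a made-up sample sentence: the single-backslash version builds a regex that looks for literal backspace characters around "he", so it matches nothing in ordinary text.

```javascript
// In a string literal, '\b' is one character (U+0008); '\\b' is two: a backslash and "b".
var isBackspace = '\b' === '\x08';
var singleLength = '\b'.length;
var doubleLength = '\\b'.length;

var fromSingle = new RegExp('\b' + 'he' + '\b', 'g');   // backspace, "he", backspace
var fromDouble = new RegExp('\\b' + 'he' + '\\b', 'g'); // equivalent to /\bhe\b/g

var text = 'when he left';
var singleResult = text.replace(fromSingle, ''); // no match, text is unchanged
var doubleResult = text.replace(fromDouble, ''); // the whole word "he" is removed

console.log(JSON.stringify(singleResult));
console.log(JSON.stringify(doubleResult));
```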
