I have the following regex:

"Test #words".toLowerCase().split(/\b/).filter(function(w){return w.match(/^\w+$/) }) // returns ["test", "words"]

The regex is in a bit of code in our app, and I can see that it splits a string into words. It also removes characters such as $, # and *. I need it to do exactly the same thing but keep the hash sign, since the words can now be #hashtags.

The current regex removes the hash; I want it to remain, so that I get:

["test", "#words"]
  • Why don't you just .split(/\s+/)? Commented Aug 2, 2016 at 20:05
  • Would splitting on \s be sufficient? Commented Aug 2, 2016 at 20:05
  • The regex is in a bit of code in our app, I can see it splits words. It obviously removes characters such as $#* and so on. I need it to do the same thing but allow the hash, since the words can now have #hashtags. Commented Aug 2, 2016 at 20:09

2 Answers

Your "Test #words".toLowerCase().split(/\b/).filter(function(w){return w.match(/^\w+$/) }) does the following:

  • The whole string is turned to lower case
  • The string is split at every word boundary, so test #words becomes ["test", " #", "words"] (the boundaries at the very start and end of the string produce no empty items)
  • The parts that match the ^\w+$ regex (1+ word chars from the start till end of string) are kept in the array.
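
The three steps above can be traced one at a time. This is only an illustrative sketch: the intermediate variable names (lowered, parts, words) are made up here and do not appear in the original code.

```javascript
// Step-by-step trace of the original expression.
// Variable names are illustrative, not part of the original code.
const lowered = "Test #words".toLowerCase();

// Splitting on \b cuts the string at every word boundary.
const parts = lowered.split(/\b/);

// Only chunks made entirely of word characters pass the filter,
// so the " #" chunk is dropped and the hash is lost.
const words = parts.filter(function (w) { return w.match(/^\w+$/); });

console.log(lowered); // "test #words"
console.log(parts);   // ["test", " #", "words"]
console.log(words);   // ["test", "words"]
```

The hash never reaches the filter attached to a word, which is why changing only the filter regex cannot bring it back.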

You may use an identical matching approach to also include # with /(?:\B#)?\w+/g:

console.log("Test #words".toLowerCase().match(/(?:\B#)?\w+/g)) // ["test", "#words"]

The pattern matches:

  • (?:\B#)? - an optional # that is not at a word boundary, i.e. it is at the start of the string or preceded by a non-word char
  • \w+ - 1 or more word chars (from [a-zA-Z0-9_] ranges)

If context is not so important, use a simpler /#?\w+/g regex that will match an optional # anywhere in the string, followed with 1+ word chars.
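
To see where the two patterns differ, here is a small comparison. The input string is made up for illustration: it contains one # glued to the end of a word and one standalone hashtag.

```javascript
// Illustrative input: "foo#bar" has a # directly after a word,
// while "#tag" is a standalone hashtag.
const sample = "foo#bar #tag";

// Context-aware: # is kept only when it does not directly follow a word character.
const strict = sample.match(/(?:\B#)?\w+/g);

// Context-unaware: # is kept wherever it directly precedes word characters.
const loose = sample.match(/#?\w+/g);

console.log(strict); // ["foo", "bar", "#tag"]
console.log(loose);  // ["foo", "#bar", "#tag"]
```

Which one is right depends on whether something like foo#bar should count as containing a hashtag in your data.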

7 Comments

I have added some more detail to the question. I want to make sure that the regex does exactly the same thing but allows a hashtag in front of the word. Sorry if I was not completely clear.
That is impossible without a code change. The reason is simple: JS regex does not support lookbehind (it was only added to the language later, in ES2018). Now, the question is: to what extent can the code be changed? I should say that my code above does exactly what you need: it extracts chunks of word chars optionally preceded with #.
Well, I can change the regex. The approach you offer seems to work. I'm not a regex expert, so I need to know what else the regex you suggest does, other than allowing the # in front of a word.
I tried to explain what your code does, and I think you now see I am doing the same thing, but now, # is also matched. Note that /#?\w+/g is context-unaware, and - from experience - I know that for the majority of people \B#\w+ worked better. That is why I put the (?:\B#)?\w+ version on top.
"If context is not so important, use a simpler /#?\w+/g regex that will match an optional # anywhere in the string, followed with 1+ word chars." This is the correct solution. Thank you!

Just add an optional # at the beginning of the regexp to support #hashtags.

"Test #words".toLowerCase().match(/#?\w+/g);
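
One difference from the original split/filter code is worth guarding against: match with the g flag returns null rather than an empty array when nothing matches. A minimal sketch of a wrapper, where extractWords is a hypothetical helper name and not something from the question:

```javascript
// extractWords is a hypothetical helper name used only for this sketch.
function extractWords(text) {
  // match() with the g flag returns null when nothing matches,
  // so fall back to an empty array, as split().filter() would give.
  return text.toLowerCase().match(/#?\w+/g) || [];
}

console.log(extractWords("Test #words")); // ["test", "#words"]
console.log(extractWords("$ * !"));       // []
```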