I have a string from an input named 'text', and from it I would like to generate hashtags for a different field in my Mongoose model:

req.body.tags = req.body.text
  .split('#')
  .map((tag) =>
    tag.trim().replace(/ +/g, ' ').split(' ').join('-').toLowerCase()
  )
  .filter((tag) => tag.length !== 0)

The code above is almost perfect, but every time I type a comma it gets inserted as a hashtag (or as part of one), which is something I'm trying to avoid. Here is an example of what I mean:

{
    "text": "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined"
}

The text above is the data I insert via Postman and this is the output:

"tags": [
  "hola,-mi-nombre-es-kevin-y-tu-como-te-llamas?",
  "random,",
  "userundefined"
],

What I would like to get is this:

"tags": [
  "random",
  "userundefined"
],

I just want to retrieve the words that come right after a #, and nothing else. I don't want the trailing commas, like the one shown in the random tag.

  • Would a ? be part of the text allowed in tags? Commented Mar 10, 2022 at 1:10
  • .text.match(/(#\w+)/g) ? Commented Mar 10, 2022 at 1:20
  • No, unless it is located inside the word itself, for example #rand?om. Commented Mar 10, 2022 at 1:20

4 Answers

matchAll should be useful here.

The demo below is based on the documentation example. It returns an array of match arrays. In your case, you want the match[1] of each match, therefore the chained map.

let text = "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined, #user-defined, #Grand_Father, #test123Four, #99startWithNumberIsWrong, #911, #Special!characters?"

let validHashtags = [...text
  .toLowerCase()
  .matchAll(/#(([a-z_]+)([\w_]+)?)/g)]
  .map(match => match[1])
  
console.log(validHashtags)

So that would be:

req.body.tags = [...req.body.text
  .toLowerCase()
  .matchAll(/#(([a-z_]+)([\w_]+)?)/g)]
  .map(match => match[1])

I used a regular expression that complies with hashtag.org:

  • No spaces
  • No Special Characters
  • Don't Start With or Use Only Numbers
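
The three rules above can also be checked on a single candidate tag. This is only a minimal sketch: the helper name isValidHashtag is hypothetical, and it assumes that "no special characters" means ASCII letters, digits and underscore only, so accented letters are rejected.

```javascript
// Hypothetical helper (not part of the original answer).
// Assumption: a valid tag starts with an ASCII letter or underscore and
// continues with letters, digits or underscores only.
const isValidHashtag = (tag) => /^[a-z_]\w*$/i.test(tag);

const candidates = ['random', 'Grand_Father', 'test123Four', '911', '99start', 'user-defined', 'has space'];

// Print each candidate next to the result of the check
for (const tag of candidates) {
  console.log(tag, isValidHashtag(tag));
}
```

Tags made only of digits and tags starting with a digit fail because the first character must be a letter or an underscore. Spaces and characters such as - fail because the pattern is anchored at both ends.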

As for length and slang, you should simply advise your users about them when they enter the text.

7 Comments

Your code was almost there, but it was also removing the - characters! Lol!
I did not get the thing about -... You were replacing spaces with - (with split/join), but there is no practical use for that. No one will attempt to write a hashtag with a space. Except my grandpa, maybe.
No, I mean that with input like #some-tag you would have #some saved as a key. Try this text, for example: "Hola, mi nombre es Kevin y tu como te llamas? #random, #user-undefined"
Lol, yours is now working as I mainly wanted xD. This was a good question, lol. Thank you all, love you all, lol.
@LouysPatriceBessette I looked up hashtag "standards" and apparently it's the _ that's allowed not - and also numbers but not ONLY numbers.. I'm going to edit my question to suit.. see here

Well, you could process req.body.text in a slightly different way, now that I see you want to filter the tags out of full texts.

Working example:

//you can declare this function outside of whatever web-server callback you have
function tagsFrom(text){ //function
  var toReturn=[], i=0, hashtag=false
  let isNumber=(n)=>Number(n).toString()==n //!isNaN(n) doesn't work properly
  let isValidChar=(c)=>c=="_"?true:isNumber(c)||(c.toUpperCase()!=c.toLowerCase())
  for(let c of text){
    if(typeof toReturn[i]!="string"){toReturn[i]=""} //because I can't add a character to undefined
    if(c=="#"){hashtag=true;continue} //hashtag found
    if(isValidChar(c)&&hashtag){toReturn[i]+=c} //character of a hashtag word
    else if(hashtag){hashtag=false;i++} //no longer the hashtag
  }
  return toReturn.filter(tag=>tag.length&&!isNumber(tag))
  //no empty values and a tag can't be ONLY a number
}

var req={body:{text:"Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined #1 #spanish101 #, #1name ##underscore_test"}} //test variable
req.body.tags = tagsFrom(req.body.text) //usage
console.log(req.body.tags)

MY EDIT: I parse hashtags based on this article about hashtags

2 Comments

This is exactly what I was looking for. The only modification I made to your solution was in the return statement: return toReturn.filter((tag) => tag.length !== 0). Your function kept returning empty hashtags, but other than that the solution makes sense. Thanks!
oof, empty tags.. didn't think about that one but yea, fun question

You might want to trim the commas off the ends of your tags after you split:

.map((tag) => {
  let newTag = tag.trim().replace(/ +/g, ' ').split(' ').join('-').toLowerCase();
  if (newTag.endsWith(',')) {
    newTag = newTag.substring(0, newTag.length - 1);
  }
  return newTag;
})

EDIT: If you're trying to get rid of anything that isn't preceded by a hashtag, you need to do your split a little differently. I would recommend maybe looking for .indexOf('#') and using .substring() to remove anything up to that index, then do your split.
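
A rough sketch of that idea, reusing the map and filter from the question. The function name tagsAfterFirstHash is made up for illustration, and the early return for text without any # is an assumption about the desired behavior.

```javascript
// Hypothetical helper: drops everything before the first '#', then splits.
function tagsAfterFirstHash(text) {
  const start = text.indexOf('#');
  if (start === -1) return []; // assumption: no '#' means no tags

  return text
    .substring(start) // remove the leading sentence
    .split('#')
    .map((tag) => {
      let newTag = tag.trim().replace(/ +/g, ' ').split(' ').join('-').toLowerCase();
      // strip a single trailing comma, as in the snippet above
      if (newTag.endsWith(',')) {
        newTag = newTag.substring(0, newTag.length - 1);
      }
      return newTag;
    })
    .filter((tag) => tag.length !== 0);
}

console.log(tagsAfterFirstHash('Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined'));
// [ 'random', 'userundefined' ]
```

This only removes text in front of the first #. Any words typed after a tag still end up joined to it with dashes, so it works best when the tags sit at the end of the text.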

5 Comments

Well, so far I've been able to remove the last comma from the strings.
isn't that what you want?
Yes and no. Right now the only problem left is the sentences that do not include the # character. As of now, the code above with your addition keeps adding the text hola,-mi-nombre-es-kevin-y-tu-como-te-llamas?, which is something I also want to get rid of.
@Kirasiris I have the answer that does specifically what you want
@Kirasiris see my edit

If you don't want to match tags that consist of digits only or that start with a digit, you can use a capture group and, for example, matchAll to get the group value.

\B#([^\W\d]\w*)\b

The pattern matches:

  • \B# A non-word boundary, followed by a literal #
  • ( Capture group 1
    • [^\W\d]\w* Match a word character that is not a digit, then optional word characters
  • ) Close group 1
  • \b A word boundary

Example code:

const s = "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined";
const regex = /\B#([^\W\d]\w*)\b/g;
console.log(Array.from(s.matchAll(regex), m => m[1]));
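
To see what the \B and the [^\W\d] parts buy you, here is a small sketch that runs the same pattern over a few edge cases. The names hashtagPattern and extractTags, and the sample strings, are made up for illustration.

```javascript
// Same pattern as above; the variable and function names are hypothetical.
const hashtagPattern = /\B#([^\W\d]\w*)\b/g;
const extractTags = (text) => Array.from(text.matchAll(hashtagPattern), (m) => m[1]);

// A '#' glued to a preceding word character is skipped because of \B
console.log(extractTags('mail a#b.com #ok')); // [ 'ok' ]

// Digits-only tags and tags starting with a digit are skipped
console.log(extractTags('#911 #1name #under_score')); // [ 'under_score' ]

// A tag stops at the first non-word character
console.log(extractTags('#user-defined #rand?om')); // [ 'user', 'rand' ]
```

Note that a tag such as #user-defined is cut at the dash rather than dropped, so you may want to tell users which characters are allowed.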
