I need to automatically generate tags for a text string. In this case, I'll use this string:
var text = 'This text talks about loyalty in the Royal Family with Príncipe Charles';
My current implementation generates tags for words that are 6+ characters long, and it works fine.
// Strip everything except ASCII letters and whitespace
var words = text.replace(/[^a-zA-Z\s]/g, '');
// Keep only words that are 6 or more characters long
words = words.match(/\w{6,}/g);
console.log(words);
This will return (note that the replace also strips the accented "í"):
["loyalty","Family","Prncipe","Charles"]
The problem is that sometimes, a tag should be a specific set of words. I need the result to be:
["loyalty","Royal Family","Príncipe Charles"]
That means the replace/match code should test for:
- words that are 6 characters long (or more); and/or
- consecutive words that each start with an uppercase letter, which should be joined together into the same array element. It doesn't matter if some of those words are less than 6 characters long, but at least one of them has to be 6+. E.g.: "Stop at The UK Guardián in London" should return ["The UK Guardián", "London"]
I'm obviously having trouble with the second requirement. Any ideas? Thanks!
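
For reference, here is a minimal sketch of one way the two requirements above might be combined. It is not a definitive solution. It assumes an environment that supports Unicode property escapes (`\p{L}`, `\p{Lu}` with the `u` flag), and the names `extractTags` and `minLength` are hypothetical, made up for this illustration. It walks the words in order, collects runs of consecutive capitalized words, and keeps a run only if at least one of its words is long enough.

```javascript
// Hypothetical helper (not part of the original code): a sketch only.
function extractTags(text, minLength = 6) {
  // Normalize so accented letters are single code points, then split into
  // words made of Unicode letters (so "Príncipe" stays intact).
  const words = text.normalize('NFC').match(/\p{L}+/gu) || [];
  const tags = [];
  let group = []; // current run of consecutive capitalized words

  const flushGroup = () => {
    // Keep the run only if at least one of its words is long enough.
    if (group.some((w) => w.length >= minLength)) {
      tags.push(group.join(' '));
    }
    group = [];
  };

  for (const word of words) {
    if (/^\p{Lu}/u.test(word)) {
      // Capitalized word: extend the current run.
      group.push(word);
    } else {
      // Lowercase word: close any open run, then apply the length rule.
      flushGroup();
      if (word.length >= minLength) tags.push(word);
    }
  }
  flushGroup(); // close a run that reaches the end of the text
  return tags;
}

console.log(extractTags('This text talks about loyalty in the Royal Family with Príncipe Charles'));
// ["loyalty", "Royal Family", "Príncipe Charles"]
console.log(extractTags('Stop at The UK Guardián in London'));
// ["The UK Guardián", "London"]
```

One limitation of this sketch: punctuation is ignored, so capitalized words on either side of a sentence boundary (e.g. "...Charles. London...") would be joined into one tag.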