0

I have a string like this:

const string =  'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples';

I want to split the string by the following keywords:

const keywords = ['John smith',  '100', 'apples', '200', 'oranges', '300'];

I want to get a result like this:

const result = [
  {isKeyword: true, text: 'John Smith'},
  {isKeyword: false, text: 'I want to buy '}, 
  {isKeyword: true, text: '100'}, 
  {isKeyword: true, text:'apples'}, 
  {isKeyword: false, text:'\r\nI want to buy'}, 
  {isKeyword: true, text:'200'},
  {isKeyword: true, text:'oranges'}, 
  {isKeyword: false, text:'\r\n, and add'},
  {isKeyword: true, text:'300'},
  {isKeyword: true, text:'apples'}];

The keywords may be lowercase or uppercase, but I want each piece in the array to keep exactly the same text as in the original string.

I also want the array to keep the same order as the string, with each piece marked as a keyword or not.

How can I do this?

13
  • Does letter case matter? Your zeroth keyword has a lowercase "smith". Also, pure split? or get rid of the colon too? Commented Jul 7, 2018 at 5:04
  • Keywords could be uppercase or lowercase, but I want to keep the original string format. @SamyokNepal Commented Jul 7, 2018 at 5:15
  • I'm not sure what the pattern is here. Your keywords ask for 100, apples, etc but your results contain seemingly random results. For one, where did the colon go? Secondly, why is "I want to buy" in the result? etc Commented Jul 7, 2018 at 5:15
  • Um..are the keywords in order? Is each keyword guaranteed to be in the string only once? What have you tried so far? SO is not a code-writing service. Commented Jul 7, 2018 at 5:17
  • You just edited it--but where did the colon go?? Commented Jul 7, 2018 at 5:24

2 Answers

2

I would start by finding the indexes of all your keywords. From these you know where every keyword in the sentence starts and stops. You can then sort the matches by the index where each keyword starts.

Then it's just a matter of taking substrings up to the start of each keyword (these will be the isKeyword: false pieces), then adding the keyword substring itself. Repeat until you are done.

const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples Thanks';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];

// find all [start, end] index pairs of a keyword in str (case-insensitive)
function getInd(kw, str) {
  let regex = new RegExp(kw, 'gi'), result, pos = []

  // search the string passed in, not the outer variable
  while ((result = regex.exec(str)) != null)
    pos.push([result.index, result.index + result[0].length]);
  return pos
}

// find the indexes of all keywords
let positions = keywords.reduce((a, word) => a.concat(getInd(word, string)), [])
positions.sort((a, b) => a[0] - b[0])

// go through the string and make the array
let start = 0, res = []

for (let next of positions) {
  // text between the previous keyword and this one; skip it if it is only whitespace
  let between = string.slice(start, next[0]).trim()
  if (between) res.push({isKeyword: false, text: between})

  res.push({isKeyword: true, text: string.slice(next[0], next[1])})
  start = next[1]
}
// get any remaining text
let rest = string.slice(start).trim()
if (rest) res.push({isKeyword: false, text: rest})


console.log(res)

I'm trimming whitespace as I go, but you may want to do something different.
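
If the pieces should keep their whitespace exactly as it appears in the original string (which is what the question seems to ask for), one possible variation is sketched below. It is only a sketch under a few assumptions: `tokenize` and `escapeRegExp` are hypothetical helper names that are not part of the answer above, the keywords are treated as literal text rather than as regex patterns, and `String.prototype.matchAll` is available (ES2020).

```javascript
// Hypothetical helper: escape regex metacharacters so keywords match literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: split `text` into keyword / non-keyword pieces
// without trimming, so the pieces joined together give back `text`.
function tokenize(text, keywords) {
  const words = keywords.filter((k) => k.length > 0);
  if (words.length === 0) return text ? [{ isKeyword: false, text }] : [];

  // Longest keywords first, so a longer keyword wins over a shorter
  // one that starts at the same position.
  const alternation = [...words]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join('|');
  const regex = new RegExp(alternation, 'gi');

  const pieces = [];
  let start = 0;
  for (const match of text.matchAll(regex)) {
    // Untrimmed text between the previous match and this one.
    if (match.index > start) {
      pieces.push({ isKeyword: false, text: text.slice(start, match.index) });
    }
    // match[0] is the keyword as written in the original string.
    pieces.push({ isKeyword: true, text: match[0] });
    start = match.index + match[0].length;
  }
  // Any text left after the last keyword.
  if (start < text.length) {
    pieces.push({ isKeyword: false, text: text.slice(start) });
  }
  return pieces;
}
```

Because nothing is trimmed, joining the `text` fields of the returned pieces reproduces the input string, and whitespace-only gaps (such as the space between '100' and 'apples') show up as their own non-keyword pieces. Filter those out if they are not wanted.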

If you are willing to pick a delimiter


Here's a much more succinct way to do this if you are willing to pick a set of delimiters that can't appear in your text. For example, {} is used below.

Here we simply wrap the keywords with the delimiter and then split them out. Grabbing the keyword with the delimiter makes it easy to tell which parts of the split are your keywords:

const string =  'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples Thanks';
const keywords = ['John smith',  '100', 'apples', '200', 'oranges', '300'];

// wrap every keyword in {}, split on the wrapped parts, then label each piece
let res = keywords.reduce((str, k) => str.replace(new RegExp(`(${k})`, 'ig'), '{$1}'), string)
          .split(/({.*?})/).filter(i => i.trim())
          .map(s => s.startsWith('{')
            ? {isKeyword: true, text: s.slice(1, s.length - 1)}
            : {isKeyword: false, text: s.trim()})
            
console.log(res)


0

Use a regular expression

rx = new RegExp('('+keywords.join('|')+')')

thus

str.split(rx)
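
To tell which pieces of the split are keywords, one option is to rely on how `split` behaves with a single capturing group: the captured matches are inserted between the surrounding pieces, so they land at the odd indexes of the result. The following is a minimal sketch of that idea, not a definitive implementation. It adds the `i` flag for case-insensitive matching, assumes the keywords contain no regex metacharacters, and uses the variable names `str`, `rx`, and `result` purely for illustration.

```javascript
const str = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];

// One capturing group around the alternation; 'i' makes the match case-insensitive.
// Assumption: the keywords contain no regex metacharacters.
const rx = new RegExp('(' + keywords.join('|') + ')', 'i');

// With a single capturing group, split() returns alternating pieces:
// even indexes hold the text between matches, odd indexes hold the captured keywords.
const result = str
  .split(rx)
  .map((text, i) => ({ isKeyword: i % 2 === 1, text }))
  // Drop the empty strings that appear when a keyword starts or ends the string.
  .filter((piece) => piece.text !== '');

console.log(result);
```

The pieces keep their original casing and whitespace, so single spaces between adjacent keywords appear as separate non-keyword pieces.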

7 Comments

Um, I got this: ["John Smith: I want to buy ", "100", " ", "apples", "I want to buy ", "200", " ", "oranges", ", and add ", "300", " ", "apples", ""]
but how can I know which pieces are keywords?
Fix 'John smith' in the keywords.
or add i regex flag (for case insensitive matches)
@Lumaskcete I think we all were assuming you had a known set of keywords.