I have this HTML string:

this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too

I want to split it and get a result array like this:

this simple 
the<b>html string</b>
text test 
that<b>need</b>to<b>spl</b>it
it too

I tried it this way:

    var string = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
    var regex = XRegExp('((?:[\\p{L}\\p{Mn}]+|)<\\s*.*?[^>]*>.*?<\/.*?>(?:[\\p{L}\\p{Mn}]+|))', "g");

    var result = string.split(regex);

It didn't work. I don't want to split word by word. Is there a way to do this?

  • On what condition are you trying to split it? Commented Aug 15, 2020 at 15:25
  • Yes, I want to match whole words that contain one or more tags and split the string as shown in the array I provided. Commented Aug 15, 2020 at 16:00
  • That makes no sense: you have the word "the" in two array elements with no tags around it. And "it". Commented Aug 15, 2020 at 16:15
  • string.split(/(?:^|\s+)([^\s<>]+(?:\s+[^\s<>]+)*)(?:\s+|$)/).filter(Boolean) (demo) Commented Aug 15, 2020 at 16:30
  • string.split(/((?<=\s)\w+<\w>.*?<\/\w>.*?(?=\s))/); - You can also try this. Commented Aug 15, 2020 at 16:31

1 Answer

Use

string.split(/\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/).filter(Boolean);

The capturing group makes split() include the matched text in the resulting array, and filter(Boolean) then drops the empty strings that split() leaves behind.
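
To make the role of the capturing group concrete, here is a minimal sketch on a made-up input (the `sample` string and variable names are illustrative, not from the question). It compares split() with and without a group, and shows why filter(Boolean) is needed afterwards.

```javascript
// Illustrative input, not taken from the question.
const sample = 'a1b2c';

// No capturing group: the matched separators are discarded.
const withoutGroup = sample.split(/\d/);

// Capturing group: each matched separator is spliced into the result.
const withGroup = sample.split(/(\d)/);

// A match at the very start leaves an empty string in front,
// which filter(Boolean) removes.
const cleaned = '1a2b'.split(/(\d)/).filter(Boolean);

console.log(withoutGroup); // ["a", "b", "c"]
console.log(withGroup);    // ["a", "1", "b", "2", "c"]
console.log(cleaned);      // ["1", "a", "2", "b"]
```

In the answer's pattern the "separators" are the plain-text runs, so capturing them keeps them next to the tagged words that fall between matches.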

REGEX EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^\s<>]+                 any character except: whitespace (\n,
                             \r, \t, \f, and " "), '<', '>' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      [^\s<>]+                 any character except: whitespace (\n,
                               \r, \t, \f, and " "), '<', '>' (1 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
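
As a rough illustration of what the two lookarounds in the table contribute, the sketch below runs the answer's pattern next to a variant with `(?<!\S)` and `(?!\S)` removed. The shortened input and the names `strict` and `loose` are assumptions made for this example, and it presumes a runtime that supports lookbehind (ES2018 or later).

```javascript
// Shortened, hypothetical input: one tagged word followed by plain text.
const input = 'the<b>x</b> y';

// The answer's pattern: a plain-text run must start and end at whitespace
// (or at the string boundaries).
const strict = /\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/;

// Same pattern without the lookarounds, for comparison only.
const loose = /\s*([^\s<>]+(?:\s+[^\s<>]+)*)\s*/;

const strictParts = input.split(strict).filter(Boolean);
const looseParts = input.split(loose).filter(Boolean);

console.log(strictParts); // ["the<b>x</b>", "y"]
console.log(looseParts);  // the tagged word is broken apart around "<" and ">"
```

Without the lookarounds, text such as "the" that is glued to a tag is treated as a plain-text run, so the tagged word no longer survives as one element.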

JavaScript:

const string = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
const regex = /\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/;
console.log(string.split(regex).filter(Boolean));

Output:

[
  "this simple",
  "the<b>html string</b>",
  "text test",
  "that<b>need</b>to<b>spl</b>it",
  "it too"
]
6 Comments

What if a tag contains attributes, like "the<b style ='color:red'>html string</b>"?
Also, what if the whole input is only this string: "the<b class ='test test2>html string</b>"? I want the regex to match that as well.
@جومارتميرزا Try string.split(/\s*((?:[^\s<]*<\w[^>]*>[\s\S]*?<\/\w[^>]*>)+[^\s<]*)\s*/)
Thank you again for your help. This is exactly what I wanted, and it solved a big problem for me.
I need help, please: what if the match contains a newline, as in regex101.com/r/20zEyO/3?
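
For reference, the attribute-tolerant pattern suggested in the comments can be tried as a small runnable sketch. The input strings and the names `html`, `tagAware` and `splitTagged` are assumptions for illustration. The quotes in the sample attributes are balanced here, and the pattern is not designed for nested tags or for a ">" inside an attribute value.

```javascript
// Pattern from the comment thread: a "word" is one or more
// <tag ...>...</tag> groups plus any non-space text glued to them.
const tagAware = /\s*((?:[^\s<]*<\w[^>]*>[\s\S]*?<\/\w[^>]*>)+[^\s<]*)\s*/;

// Hypothetical helper wrapping the split-and-filter step.
function splitTagged(text) {
  return text.split(tagAware).filter(Boolean);
}

// The question's string, with an attribute added to the first tag.
const html = 'this simple the<b style="color:red">html string</b> text test that<b>need</b>to<b>spl</b>it it too';
console.log(splitTagged(html));
// ["this simple", "the<b style=\"color:red\">html string</b>", "text test",
//  "that<b>need</b>to<b>spl</b>it", "it too"]

// Input that consists of a single tagged word.
console.log(splitTagged('the<b class="test test2">html string</b>'));
// ["the<b class=\"test test2\">html string</b>"]

// [\s\S]*? also crosses a newline between the opening and closing tag.
console.log(splitTagged('a<b>x\ny</b> z'));
// ["a<b>x\ny</b>", "z"]
```

Whether the last case covers the input behind the regex101 link has not been checked here. For markup more complex than this, a DOM parser is likely a safer choice than a regular expression.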