I've got an expression held in a JS string and I want to split it into tokens. The string could contain any symbols or characters (it's actually a string expression).

I've been using

expr.split(/([^\"]\S*|\".+?\")\s*/)

But when I get a text symbol outside of quotes it splits it wrongly.

E.g. when

expr = "Tree = \"\" Or Tree = \"hello cruel world\" + \" and xyz\""

then the Or gets mixed in with the following string.

Splitting on \b seems to be the way to go (is it?) but I don't know how to keep the strings in quotes together. So ideally in the above I'd get:

Tree
=
\"\"
Or
Tree
=
\"hello cruel world\"
+
\" and xyz\"

I suppose ideally I would find a tokenizer, but if I could do it in regex that would be a major headache solved :)

thanks

  • The + means at least 1 character, so it will never match "". Try a *. Commented Feb 9, 2016 at 15:32
  • Short note: I find regex101.com very helpful when I encounter problems with regex, as it gives short explanations of the regex parts. Commented Feb 9, 2016 at 15:34
  • Thanks both, Kenney's answer looks promising, I'm just testing it out. Commented Feb 9, 2016 at 15:36
  • Note that \b won't solve your problem here if you want to retrieve the = as a match: since = isn't a word character, there won't be any word boundary around it. Commented Feb 9, 2016 at 15:44
  • My whole app just ran through Kenney's answer with a zillion different cases and it works! Kenney, if you'd like to post it as an answer I'll mark it as the answer. Thank you. Commented Feb 9, 2016 at 15:45

1 Answer

A simpler approach is to use .match() instead of .split() and match the characters between the quotes or groups of non-whitespace characters using an alternation:

/"[^"]+"|\S+/g

Explanation:

  • "[^"]+" - Match one or more non-" characters between double quotes (an empty "" does not match this branch and falls through to \S+)...
  • | - Alternation
  • \S+ - ...or match a group of one or more non-whitespace characters

Usage:

var string = 'Tree = \"\" Or Tree = \"hello cruel world\" + \" and xyz\"';
var result = string.match(/"[^"]+"|\S+/g);

document.querySelector('pre').textContent = JSON.stringify(result, null, 4);
<pre></pre>
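
For running outside a browser, here is a minimal sketch of the same idea as a standalone function. The name tokenize is hypothetical, and the + inside the quoted branch is relaxed to * (as suggested in the comments on the question) so that an empty "" is matched as a quoted string rather than by the \S+ branch. This is one possible variation, not the answer's exact pattern.

```javascript
// Hypothetical helper name; the pattern is the answer's, with + relaxed to *
// so that an empty "" is matched by the quoted branch.
function tokenize(expr) {
  // With the g flag, match() returns every match, or null if there are none.
  return expr.match(/"[^"]*"|\S+/g) || [];
}

const tokens = tokenize('Tree = "" Or Tree = "hello cruel world" + " and xyz"');
console.log(tokens);
// [ 'Tree', '=', '""', 'Or', 'Tree', '=', '"hello cruel world"', '+', '" and xyz"' ]
```

Note that this still relies on whitespace between tokens: an input such as Tree="hello world" would not be split at the = sign.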

1 Comment

Is there a way to tweak this regex so that the strings with multiple words don't include the quotes, or would I need to postprocess the results?
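
One possible way to do what this comment asks, sketched under the assumption that String.prototype.matchAll (ES2020) is available: put a capture group around the text between the quotes and use the group when it participated in the match. The name tokenizeUnquoted is hypothetical.

```javascript
// Hypothetical helper: same alternation as the answer, but group 1 captures
// the text between the quotes so the quotes themselves can be dropped.
function tokenizeUnquoted(expr) {
  const pattern = /"([^"]*)"|\S+/g;
  // matchAll yields one match array per token. m[1] is undefined when the
  // \S+ branch matched, so fall back to the whole match m[0] in that case.
  return Array.from(expr.matchAll(pattern), (m) =>
    m[1] !== undefined ? m[1] : m[0]
  );
}

const unquoted = tokenizeUnquoted('Tree = "" Or Tree = "hello cruel world" + " and xyz"');
console.log(unquoted);
// [ 'Tree', '=', '', 'Or', 'Tree', '=', 'hello cruel world', '+', ' and xyz' ]
```

After this step a quoted string can no longer be told apart from a bare symbol, so postprocessing the original match() result may be preferable if that distinction matters later.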
