
Tokens are separated by one or more whitespace characters, and those whitespace runs should be kept as tokens too. "A quoted string" is a single token. Any other run of characters that does not begin with a quote is a token. I tried, and failed, with:

var tokenre = /"[^"]*"|[^"]\S+|\s\s*/g;

For instance I want this input

[4,4]  "This is fun"
 2  2 +
 #

To tokenize as

['[4,4]', '  ', '"This is fun"', '\n ', '2', '  ', '2', ' ', '+', '\n ', '#']

This could be tested with the following code:

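var program = "[4,4]  \"This is fun\"\n 2  2 +\n #";  // the sample input above
var result;
// With the g flag, each exec() call resumes at tokenre.lastIndex and returns null when done.
while ((result = tokenre.exec(program)) !== null) {
    console.log(result[0]);
}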
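For reference, here is what the attempted pattern actually produces on the sample input. The branch [^"]\S+ needs at least two characters, and its first character may itself be whitespace, so single-character tokens such as 2 and # never match and " +" comes out as one token. This sketch uses String.prototype.match, which with the g flag returns the same matches as an exec() loop.

```javascript
// The attempted pattern from the question.
const tokenre = /"[^"]*"|[^"]\S+|\s\s*/g;

// Sample input from the question (its second and third lines start with a space).
const questionInput = '[4,4]  "This is fun"\n 2  2 +\n #';

// With the g flag, match() returns every match, like the exec() loop.
const attempted = questionInput.match(tokenre);

// [^"]\S+ needs at least two characters, so "2" and "#" are skipped,
// and because [^"] also accepts a space, " +" becomes a single token.
console.log(JSON.stringify(attempted));
// ["[4,4]","  ","\"This is fun\"","\n ","  "," +","\n "]
```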
  • While this is for Java, I think this question might be helpful to you: stackoverflow.com/questions/366202/… Commented Jul 17, 2018 at 17:58
  • Looks like you need .match(/"[^"]*"|\S+|\s+/g); please check whether it meets your requirements. Commented Jul 17, 2018 at 17:59
  • Your comment works. Thank you. I will say so in an answer. Commented Jul 17, 2018 at 19:50

1 Answer


It seems you want to tokenize a string into chunks of whitespace and non-whitespace characters, but also keep "..."-like substrings between double quotes together as single elements, even though they contain spaces.

You may achieve it using

s.match(/"[^"]*"|\S+|\s+/g)


Details

  • "[^"]*" - a ", then 0+ chars other than ", and then a closing " (NOTE: to also allow backslash escape sequences such as \" inside the quotes, replace it with "[^"\\]*(?:\\[\s\S][^"\\]*)*")
  • | - or
  • \S+ - 1+ non-whitespace chars
  • | - or
  • \s+ - 1+ whitespace chars.
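The NOTE in the first bullet can be dropped straight into the full pattern. The sketch below compares both versions on a hypothetical input containing escaped quotes; it assumes backslash is the only escape character, which is what the NOTE's pattern handles.

```javascript
// Pattern from the answer: a quoted string cannot contain a quote at all.
const plainRe = /"[^"]*"|\S+|\s+/g;

// Same pattern with the quoted-string branch from the NOTE, which lets a
// backslash escape any character (including ") inside the quotes.
const escapedRe = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"|\S+|\s+/g;

// Hypothetical input; the actual text is: say "a \"quoted\" word" now
const escapedInput = 'say "a \\"quoted\\" word" now';

const naiveTokens = escapedInput.match(plainRe);
const escapedTokens = escapedInput.match(escapedRe);

// The plain pattern ends the string at the first \" and splits the rest,
// while the escape-aware pattern keeps the whole quoted string together.
console.log(naiveTokens.length, escapedTokens.length); // 8 5
console.log(escapedTokens[2]);
```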

JS demo:

var s = "[4,4]  \"This is fun\"\n2  2 +\n#";
console.log(s.match(/"[^"]*"|\S+|\s+/g));
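To check the answer's pattern against the exact input and expected output from the question, here is a small sketch. The helper name tokenize is hypothetical; the || [] fallback covers the case where match() returns null (for example, on an empty string). The exec() loop at the end mirrors the question's test code and collects the same tokens.

```javascript
// Hypothetical helper wrapping the regex from the answer.
function tokenize(text) {
  // match() returns null when there is no match at all (e.g. ""), so fall back to [].
  return text.match(/"[^"]*"|\S+|\s+/g) || [];
}

// Exact input from the question: its second and third lines begin with a space.
const program = '[4,4]  "This is fun"\n 2  2 +\n #';

const tokens = tokenize(program);
console.log(JSON.stringify(tokens));
// ["[4,4]","  ","\"This is fun\"","\n ","2","  ","2"," ","+","\n ","#"]

// The same tokens collected with an exec() loop, as in the question's test code.
const re = /"[^"]*"|\S+|\s+/g;
const viaExec = [];
let m;
while ((m = re.exec(program)) !== null) {
  viaExec.push(m[0]);
}
console.log(viaExec.length === tokens.length); // true
```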
