3

I am splitting an equation string into a string array like this:

String[] equation_array = (equation.split("(?<=[-+×÷)(])|(?=[-+×÷)(])"));

Now, for the test string:

test = "4+(2×5)"

the result is fine:

test_array = {"4", "+", "(", "2",...}

but for the test string:

test2 = "(2×5)+5"

I get the string array:

test2_array = {"", "(", "2",...}

So the problem is: why does it add an empty string before ( in the array after splitting?

2 Comments
  • Regular expressions aren't the right tool for this; a parser like ANTLR would be better for you. Commented Oct 4, 2013 at 14:35
  • Or at least a recursive descent parser and hand-written scanner. Regular expressions are entirely the wrong tool for this job. Commented Feb 25 at 0:50

4 Answers

2

This is actually known behavior in Java regex.

To avoid the empty leading element, use this regex, which adds a negative lookahead:

String[] equation_array = "(2×5)+5".split("(?!^)((?<=[-+×÷)(])|(?=[-+×÷)(]))");
//=> ["(", "2", "×", "5", ")", "+", "5"]

(?!^) means "do not split at the start of the input": ^ matches only at position 0 here, so the negative lookahead rejects a split point there.
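To try this out, here is a minimal sketch that wraps the regex from this answer in a helper. The class name SplitKeepDelimiters and the method tokenize are made up for illustration, and × and ÷ are written as the escapes \u00D7 and \u00F7 only so the file compiles regardless of source encoding.

```java
import java.util.Arrays;

class SplitKeepDelimiters {
    // \u00D7 and \u00F7 are the multiplication and division signs from the question.
    static final String OPS = "[-+\u00D7\u00F7)(]";

    // Split before or after every operator or parenthesis, but never at index 0.
    static final String REGEX = "(?!^)((?<=" + OPS + ")|(?=" + OPS + "))";

    // Hypothetical helper: returns operands, operators and parentheses as tokens.
    static String[] tokenize(String equation) {
        return equation.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("(2\u00D75)+5")));
        System.out.println(Arrays.toString(tokenize("4+(2\u00D75)")));
    }
}
```

Both inputs should come back as seven tokens with no empty string at the front. In the second input the regex also matches at the very end (after the closing parenthesis), but split(regex) drops trailing empty strings by default, so nothing extra shows up there.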


4 Comments

You're welcome. See this Q&A for some discussion over this behavior: stackoverflow.com/questions/18870699/…
Just FYI, it doesn't have to be a lookbehind; (?!^) works just as well. Recall that ^ itself is a zero-width assertion that matches a position that's not preceded by any character. It doesn't need to move backward to do its work, it only has to look at the preceding character (or lack of one). That means you can use this technique in JavaScript, too.
@AlanMoore is correct; even "(2×5)+5".split("(?!^)((?<=[-+×÷)(])|(?=[-+×÷)(]))"); will work just fine.
You didn't really need to switch to a lookahead. The lookbehind version works just as well, and it's more intuitive (i.e. less likely to confuse people).
0

You can add a condition so that the lookahead branch does not split when the position before the token is the start of the string, like this:

"(?<=[-+×÷)(])|(?<!^)(?=[-+×÷)(])"
               ^^^^^^


0

What about looking backwards to make sure we're not at the start of the string, and looking forwards to make sure we're not at the end?

"(?<=[-+×÷)(])(?!$)|(?<!^)(?=[-+×÷)(])"

Here ^ and $ anchor the start and end of the string, and (?!...) and (?<!...) are negative lookahead and negative lookbehind.
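A small sketch to see what the extra (?!$) changes. The class and method names are hypothetical, and × and ÷ are written as \u00D7 and \u00F7 to keep the source ASCII-only. One thing to be aware of: split(regex) with the default limit already discards trailing empty strings, so the end-of-string guard only makes a visible difference when a negative limit is passed.

```java
import java.util.Arrays;

class SplitBothEnds {
    static final String OPS = "[-+\u00D7\u00F7)(]";

    // Guard against splitting at the start only.
    static final String START_GUARD_ONLY = "(?<=" + OPS + ")|(?<!^)(?=" + OPS + ")";

    // Guard against splitting at the start and at the end (regex from this answer).
    static final String BOTH_GUARDS = "(?<=" + OPS + ")(?!$)|(?<!^)(?=" + OPS + ")";

    // A negative limit tells split to keep trailing empty strings.
    static String[] splitKeepingTrailing(String regex, String equation) {
        return equation.split(regex, -1);
    }

    public static void main(String[] args) {
        String input = "4+(2\u00D75)"; // ends with a delimiter
        // Without (?!$) the split after the final ')' leaves a trailing "".
        System.out.println(splitKeepingTrailing(START_GUARD_ONLY, input).length); // 8
        // With (?!$) there is no split at the end of the string.
        System.out.println(splitKeepingTrailing(BOTH_GUARDS, input).length); // 7
        System.out.println(Arrays.toString(splitKeepingTrailing(BOTH_GUARDS, input)));
    }
}
```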


0

the problem is: why does it add an empty string before ( in the array after splitting?

Because for the input (2×5)+5, the regex used for splitting matches right at the start of the string: the positive lookahead (?=[-+×÷)(]) succeeds at position 0, where the next character is (.

(2×5)+5
↖

It matches right here before the (, resulting in an empty string: "".
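You can check where the first match lands with a Matcher instead of split. This is only a sketch; the class name FirstSplitPoint and the method firstMatchIndex are made up, and × and ÷ appear as \u00D7 and \u00F7 so the source stays ASCII-only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstSplitPoint {
    // The split regex from the question.
    static final String REGEX = "(?<=[-+\u00D7\u00F7)(])|(?=[-+\u00D7\u00F7)(])";

    // Returns the index of the first position where the regex matches, or -1 if none.
    static int firstMatchIndex(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMatchIndex("(2\u00D75)+5")); // 0: zero-width match before '('
        System.out.println(firstMatchIndex("4+(2\u00D75)")); // 1: first match is before '+'
    }
}
```

Whether that match at index 0 turns into a leading "" depends on the Java version: since Java 8, String.split is documented to never produce an empty leading substring for a zero-width match at the beginning, so the behavior in the question shows up on Java 7 and earlier.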

My advice would be not to use regular expressions to parse mathematical expressions, there are more suitable algorithms for this.
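As one possible illustration of a non-regex approach, here is a sketch of a hand-written scanner for just the tokenizing step. It is not a full expression parser and not necessarily what the algorithms referred to here look like; EquationScanner and scan are hypothetical names, and it assumes every operator and parenthesis is a single character while anything else (apart from whitespace) belongs to an operand.

```java
import java.util.ArrayList;
import java.util.List;

class EquationScanner {
    // Single-character tokens; \u00D7 and \u00F7 are the multiplication and division signs.
    static final String SYMBOLS = "-+\u00D7\u00F7()";

    static List<String> scan(String equation) {
        List<String> tokens = new ArrayList<>();
        StringBuilder operand = new StringBuilder();
        for (int i = 0; i < equation.length(); i++) {
            char c = equation.charAt(i);
            boolean isSymbol = SYMBOLS.indexOf(c) >= 0;
            if (isSymbol || Character.isWhitespace(c)) {
                // A symbol or whitespace ends the operand collected so far.
                if (operand.length() > 0) {
                    tokens.add(operand.toString());
                    operand.setLength(0);
                }
                if (isSymbol) {
                    tokens.add(String.valueOf(c));
                }
            } else {
                operand.append(c);
            }
        }
        // Emit an operand that runs to the end of the input.
        if (operand.length() > 0) {
            tokens.add(operand.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("(2\u00D75)+5"));
        System.out.println(scan("4 + (12\u00D75)"));
    }
}
```

Because tokens are only added when there is something to add, there is no empty string to filter out at either end, whatever the input starts or ends with.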

2 Comments

"My advice would be not to use regular expressions to parse mathematical expressions, there are more suitable algorithms for this." Like which ones?
@MarkoSerbia Check this, this and this. You can Google for more but they are fine on their own.
