I am looking for a regular expression that splits a string on commas. That sounds simple, but there is one restriction: the items in the string can contain commas enclosed in parentheses, and those commas must not split the string.

Example:

1, 2, 3, add(4, 5, 6), 7, 8
 ^  ^  ^      !  !   ^  ^

The string should be split only at the commas marked with ^, not at those marked with !.

I found a solution for this simple case here: "A regex to match a comma that isn't surrounded by quotes"

Regex:

,(?=([^\(]*\([^\)]*\))*[^\)]*$)

But my string could be more complex:

1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11
 ^  ^  ^      !  !      !  !   !   ^   ^

For this string the result is wrong, and I have no clue how to fix it, or whether it is even possible with regular expressions.
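To make the failure concrete, here is a minimal sketch that feeds both example strings to String.split with the regex above. Java is assumed here only because the answers use it, and the class and method names are made up for illustration. The lookahead pairs each "(" with the next ")", so with nesting it finds a leftover ")" after almost every comma and rejects the commas that should split.

```java
import java.util.Arrays;
import java.util.List;

class LookaheadSplitDemo {
    // The lookahead regex from the question, escaped for a Java string literal.
    static final String FLAT_ONLY = ",(?=([^\\(]*\\([^\\)]*\\))*[^\\)]*$)";

    // Hypothetical helper: split the input on every comma the regex accepts.
    static List<String> split(String input) {
        return Arrays.asList(input.split(FLAT_ONLY));
    }

    public static void main(String[] args) {
        // One level of parentheses: the six expected parts.
        System.out.println(split("1, 2, 3, add(4, 5, 6), 7, 8").size()); // 6

        // Nested parentheses: only the last two commas are accepted.
        System.out.println(split("1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11").size()); // 3
    }
}
```

With the nested input, the first part comes back as "1, 2, 3, add(4, 5, add(6, 7, 8), 9)", i.e. the first three commas are wrongly skipped.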

Does anyone have an idea how to solve this problem?

Thanks for your help!

  • By all means try to avoid using ,(?=([^\(]*\([^\)]*\))*[^\)]*$); it is a last resort. Commented Nov 16, 2016 at 9:56
  • Is it only the parentheses that protect the commas, or do you rely on the keyword add(...)? Commented Nov 16, 2016 at 10:04
  • @LoicM. "add" is just an example here and could be anything else. The main point is that commas inside parentheses should NOT split the string! Commented Nov 16, 2016 at 10:14
  • You will have to write a parser. Commented Nov 16, 2016 at 10:27
  • Agree with @TheLostMind. A regex solution will be too complex to be something you want. Commented Nov 16, 2016 at 10:32

2 Answers

OK, I think a regular expression is not very useful for this. A small block of Java is probably easier.

This is my Java code for solving the problem. It walks the string once and splits only on commas that are outside all parentheses:

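import java.util.ArrayList;
import java.util.List;

public class SplitExample {

  public static void main(String[] args) {
    splitWithJava();
  }

  public static void splitWithJava() {
    String example = "1, 2, 3, add(4, 5, add(7, 8), 6), 7, 8";
    List<String> list = new ArrayList<>();
    int start = 0;   // index where the current item begins
    int pCount = 0;  // current nesting depth of parentheses
    for (int i = 0; i < example.length(); i++) {
      char c = example.charAt(i);
      switch (c) {
        case ',':
          // Split only when the comma is outside all parentheses.
          if (pCount == 0) {
            list.add(example.substring(start, i).trim());
            start = i + 1;
          }
          break;
        case '(':
          pCount++;
          break;
        case ')':
          pCount--;
          break;
        default:
          break;
      }
    }
    // Add the last item, which follows the final top-level comma.
    list.add(example.substring(start).trim());
    for (String str : list) {
      System.out.println(str);
    }
  }
}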
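The same depth-counting idea can be packaged as a reusable method. The following is only a sketch: the class name TopLevelSplitter, the method name splitTopLevel, and the decision to throw on unbalanced parentheses are assumptions made for illustration and are not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {
    // Hypothetical helper: splits on commas that are outside all parentheses.
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        int start = 0; // index where the current item begins
        int depth = 0; // current nesting depth of parentheses
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    // Assumption: unbalanced input is treated as an error.
                    throw new IllegalArgumentException("Unmatched ')' at index " + i);
                }
            } else if (c == ',' && depth == 0) {
                parts.add(input.substring(start, i).trim());
                start = i + 1;
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unclosed '(' in: " + input);
        }
        parts.add(input.substring(start).trim());
        return parts;
    }

    public static void main(String[] args) {
        // Prints one top-level item per line.
        for (String part : splitTopLevel("1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11")) {
            System.out.println(part);
        }
    }
}
```

Returning the list instead of printing it keeps the method testable, and the depth check makes malformed input such as "add(1, 2" fail loudly rather than produce a silently wrong split.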

You can also achieve this with the following regex, which matches the items themselves instead of the separating commas: ([^,(]+(?=,|$)|[\w]+\(.*\)(?=,|$))

Given the text 1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11, it produces one match per item, separated at the commas that are not enclosed in (). Note that the plain items keep their leading space.

So, the output would be:

Match 1
Group 1.    0-1    `1`

Match 2
Group 1.    2-4    ` 2`

Match 3
Group 1.    5-7    ` 3`

Match 4
Group 1.    9-35    `add(4, 5, add(6, 7, 8), 9)`

Match 5
Group 1.    36-39    ` 10`

Match 6
Group 1.    40-43    ` 11`
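For reference, a minimal sketch of applying this regex from Java with Pattern and Matcher. The class name RegexTokenDemo and the method name tokens are hypothetical, and trimming each match is an added convenience, not part of the regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenDemo {
    // The regex from this answer, escaped for a Java string literal.
    static final Pattern TOKEN =
        Pattern.compile("([^,(]+(?=,|$)|[\\w]+\\(.*\\)(?=,|$))");

    // Hypothetical helper: collects group 1 of every match, trimmed.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group(1).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // One call containing a nested call: six items are found.
        System.out.println(tokens("1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11").size()); // 6

        // Two sibling calls: the greedy .* spans from the first "(" to the
        // last ")", so both calls come back as a single item.
        System.out.println(tokens("f(1, 2), g(3, 4)").size()); // 1
    }
}
```

One caveat worth knowing: because .* is greedy, the pattern handles a single parenthesized item per string (with any nesting inside it), but it merges everything between the first and the last top-level call when there are several.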

2 Comments

I asked for a regex solution, and your answer performs well in my tests, so I am accepting it as the correct answer. However, my tests also show that the Java method I posted as an answer is slightly faster, so I will use the Java method to parse the string.
Black magic. I was trying to achieve this with a recursive pattern when I threw in the towel and found this. Bravo!
