23

I'm trying to perform some super simple parsing of log files, so I'm using the String.split method like this:

String [] parts = input.split(",");

And it works great for input like:

a,b,c

Or

type=simple, output=Hello, repeat=true 

Just to say something.

How can I escape the comma, so it doesn't match intermediate commas?

For instance, if I want to include a comma in one of the parts:

type=simple, output=Hello, world, repeate=true

I was thinking of something like:

type=simple, output=Hello\, world, repeate=true

But I don't know how to create the split to avoid matching the comma.

I've tried:

String [] parts = input.split("[^\,],");

But, well, it's not working.

3
  • I'll upvote your question in 2 hours (I'm out of votes for today!) Commented Feb 10, 2011 at 21:43
  • Guava Issue 412:Add escape functionality to Joiner and Splitter github.com/google/guava/issues/412 Commented Feb 10, 2011 at 22:38
  • I have created a generic string splitter. Please refer to stackoverflow.com/a/67707356/730676 Commented May 26, 2021 at 14:56

4 Answers 4

31

You can solve it using a negative lookbehind.

String[] parts = str.split("(?<!\\\\), ");

Basically it says: split on each ", " that is not preceded by a backslash.

String str = "type=simple, output=Hello\\, world, repeate=true";
String[] parts = str.split("(?<!\\\\), ");
for (String s : parts)
    System.out.println(s);

Output:

type=simple
output=Hello\, world
repeate=true

(ideone.com link)


If you happen to be stuck with the non-escaped comma-separated values, you could do the following (similar) hack:

String[] parts = str.split(", (?=\\w+=)");

Which says: split on each ", " that is followed by some word characters and an =.

(ideone.com link)
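To make the lookahead variant concrete, here is a minimal sketch. The class and method names (LookaheadSplitDemo, splitOnKeyBoundaries) are made up for illustration, and the second input in main is a hypothetical line chosen to show where the heuristic misfires: a value that itself contains text of the form ", key=".

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Split on ", " only when it is followed by word characters and '=',
    // i.e. when it looks like the start of the next key=value pair.
    static String[] splitOnKeyBoundaries(String input) {
        return input.split(", (?=\\w+=)");
    }

    public static void main(String[] args) {
        // The comma inside "Hello, world" is kept because "world," is not "key=".
        String[] ok = splitOnKeyBoundaries("type=simple, output=Hello, world, repeate=true");
        System.out.println(ok.length + " parts: " + Arrays.toString(ok));

        // Hypothetical input: the value "see x, y=2" is torn apart, because
        // ", y=" looks exactly like a key boundary.
        String[] broken = splitOnKeyBoundaries("note=see x, y=2");
        System.out.println(broken.length + " parts: " + Arrays.toString(broken));
    }
}
```

The pattern never consumes the key itself, since a lookahead is zero-width, so each part still starts with its key.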


8 Comments

Quite an interesting answer, but not to his question How can I escape the comma, so it doesn't match intermediate commas?
Nice solution, but IMHO for the wrong problem. While such a string can get parsed using this, it'll fail one day with somebody saying type=simple, output=Hello, world, repeat=until tomorrow, or maybe until 0=1. I'd suggest a proper escaping mechanism instead of being too smart.
da daaaa!! Thank you... here's my testing code System.out.println( Arrays.toString( args[0].split("(?<!\\\\), "))); tried with: "a,b\,a,c" produces: [a,b\,a,c]
It works; however, it fails to split on commas preceded by an escaped backslash, as in "type=simple\\\\, output=Hello\\, world\\\\, repeate=true". This would require an unlimited lookbehind, which doesn't work in Java. That's why I said that there's no perfect solution for String.split.
Can't seem to find any simple solution to this, except to "preprocess" the input string, and temporarily replace any escaped backslash with some dummy character.
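One way around the escaped-backslash problem raised in these comments is to drop the regex and scan the string by hand. The following is only a sketch of that different technique, not something proposed in the answer: EscapedSplitter and splitEscaped are hypothetical names, and it assumes that a backslash escapes whatever single character follows it and that whitespace around each part should be trimmed.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedSplitter {
    // Splits on unescaped commas. A backslash makes the next character
    // literal (so "\\," is a comma and "\\\\" is a backslash), and the
    // escaping backslash itself is dropped from the result.
    static List<String> splitEscaped(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (escaped) {
                current.append(c);          // take the escaped character literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;             // remember the escape, emit nothing yet
            } else if (c == ',') {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (escaped) {
            current.append('\\');           // assumption: keep a dangling backslash as-is
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        // The input from the comment above: escaped backslashes before real separators.
        String s = "type=simple\\\\, output=Hello\\, world\\\\, repeate=true";
        for (String part : splitEscaped(s)) {
            System.out.println("'" + part + "'");
        }
    }
}
```

Because the scanner tracks the escape state as it goes, a backslash that is itself escaped no longer hides the comma after it. Note that trim() also removes escaped whitespace at the edges of a part, which may or may not be what you want.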
5

I'm afraid there's no perfect solution for String.split. Using a matcher for the three parts would work. If the number of parts is not constant, I'd recommend a loop with matcher.find. Something like this, maybe:

final String s = "type=simple, output=Hello, world, repeat=true";
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,|$)");
final Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group(1));

You'll probably want to skip the spaces after the comma as well:

final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,\\s*|$)");

It's not really complicated, just note that you need four backslashes in order to match one.

5 Comments

It's easy: The group is a sequence consisting of 1. normal chars (i.e. any except backslash and comma) and 2. any escaped char (i.e. backslash followed by anything). The remainder is either the separating comma or the end anchor.
If you go the Pattern/Matcher route, there should be a simpler matcher.find solution that can find one key/value pair at a time, no?
Simpler? I don't think that mine is complicated; it just looks terrible because of those backslashes. Nor do I think it could be done any simpler, but I may be wrong, or you may understand the question in a different way. Concerning the key/value pairs: I ignored them and did just the split.
I see the strange effect that there is always an extra, empty element at the end of the split list. The reason is probably that when the 'cursor' is at the end of the string, the empty string still matches (so "" matches the given pattern, too). One solution is to add a && m.start(1) != s.length() to the while condition. Or is there a simple solution by tuning the pattern itself?
Omitting |$ from my pattern should do, but for strings like a,b, you may want to get the final empty string. Maybe a look-behind for the last comma could help.
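One possible way to handle the extra empty match discussed in these comments, sketched under assumptions: the separator is captured in a second group (my modification, not part of the answer's pattern), and the loop stops once a field was terminated by the end of input rather than by a comma. MatcherSplitDemo and split are hypothetical names, and the unescaping step assumes a backslash escapes any single following character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherSplitDemo {
    // Group 1: the field (plain characters or backslash-escaped characters).
    // Group 2: the separator; it is empty when the field ended at end of input.
    private static final Pattern FIELD =
            Pattern.compile("((?:[^\\\\,]|\\\\.)*)(,\\s*|$)");

    static List<String> split(String s) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(s);
        while (m.find()) {
            // Drop the escaping backslashes: "\\," becomes "," and so on.
            fields.add(m.group(1).replaceAll("\\\\(.)", "$1"));
            if (m.group(2).isEmpty()) {
                break; // this field ran to the end of input, so nothing follows
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("type=simple, output=Hello\\, world, repeat=true"));
        System.out.println(split("a,b"));   // no spurious empty element
        System.out.println(split("a,b,"));  // the trailing empty field is kept
    }
}
```

With this loop, "a,b" yields two fields while "a,b," yields three, so the trailing empty field appears only when the input really ends with a separator.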
4

Escaping works with the opposite of aioobe's answer (updated: aioobe now uses the same construct but I didn't know that when I wrote this), negative lookbehind

final String s = "type=simple, output=Hello\\, world, repeate=true";
final String[] tokens = s.split("(?<!\\\\),\\s*");
for(final String item : tokens){
    System.out.println("'" + item.replace("\\,", ",") + "'");
}

Output:

'type=simple'
'output=Hello, world'
'repeate=true'

Comments

0

I think

input.split("[^\\\\],");

should work. It will split at all commas that are not preceded by a backslash. BTW, if you are working with Eclipse, I can recommend the QuickRex plugin to test and debug regexes.

1 Comment

This is nearly right, but not perfect, as it doesn't allow escaping backslashes. It'll also eat the character before the comma. A lookbehind would do.
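A small sketch of the difference this comment describes: the character class [^\\] is part of the match, so split discards the character before each comma, whereas a lookbehind only checks that character without consuming it. The class and method names are hypothetical, and the sample input is made up for the demonstration.

```java
import java.util.Arrays;

class CharClassVsLookbehind {
    // The character before the comma is part of the delimiter and gets removed.
    static String[] splitConsuming(String input) {
        return input.split("[^\\\\],");
    }

    // The lookbehind is zero-width: only the comma itself is the delimiter.
    static String[] splitLookbehind(String input) {
        return input.split("(?<!\\\\),");
    }

    public static void main(String[] args) {
        String input = "a,b\\,c,d"; // the text a,b\,c,d with one escaped comma

        // "a," and "c," are matched as delimiters, so 'a' and 'c' are lost.
        System.out.println(Arrays.toString(splitConsuming(input)));

        // Only the two unescaped commas are delimiters; every character survives.
        System.out.println(Arrays.toString(splitLookbehind(input)));
    }
}
```

Neither variant handles an escaped backslash directly before a separator; that limitation is the one discussed under the first answer.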
