If you are parsing a CSV file, some of your values may have double-quotes around them, so you may need something a little more complicated. For example:
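    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Each match is one column; group 1 holds the column's value
    Pattern splitCommas = Pattern.compile("(?:^|,)((?:[^\",]|\"[^\"]*\")*)");
    Matcher m = splitCommas.matcher("11,=\"12,345\",ABC,,JKL");
    while (m.find()) {
        System.out.println(m.group(1));
    }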
Or in Groovy:
    java.util.regex.Pattern.compile('(?:^|,)((?:[^",]|"[^"]*")*)')
        .matcher("11,=\"12,345\",ABC,,JKL")
        .iterator()
        .collect { it[1] }
This code handles:
- blank lines (with no values or commas on them)
- empty columns in the middle or at the end of the line
- values wrapped in double-quotes, including commas inside the double-quotes
It does not handle:
- two double-quotes used to escape a double-quote itself
- an empty first column: on a line that starts with a comma, ^ matches first with an empty value, and the column after the leading comma is then skipped
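The cases listed above can be checked with a small helper. This is only a minimal sketch: the class name CsvLineSplitter and the method splitLine are made up for illustration, and the pattern is the one from the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class CsvLineSplitter {
    // Same pattern as in the answer above.
    private static final Pattern SPLIT_COMMAS =
            Pattern.compile("(?:^|,)((?:[^\",]|\"[^\"]*\")*)");

    // Returns the raw column values of one line; surrounding quotes are kept.
    static List<String> splitLine(String line) {
        List<String> columns = new ArrayList<>();
        Matcher m = SPLIT_COMMAS.matcher(line);
        while (m.find()) {
            columns.add(m.group(1));
        }
        return columns;
    }

    public static void main(String[] args) {
        // Quoted comma stays inside its column; the empty column is kept.
        splitLine("11,=\"12,345\",ABC,,JKL").forEach(System.out::println);
        // A blank line gives a single empty value.
        System.out.println(splitLine("").size());     // 1
        // A trailing comma gives an empty last column.
        System.out.println(splitLine("A,B,").size()); // 3
    }
}
```

Returning a List rather than printing inside the loop makes it easy to compare the result against the expected columns for each case.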
The pattern consists of:
- (?:^|,) matches the start of the line or the comma after the previous column, but does not add it to the group
- ((?:[^",]|"[^"]*")*) matches the value of the column, and consists of:
  - [^",] is a character that's not a comma or a quote
  - "[^"]*" is a double-quote followed by zero or more non-quote characters, ending in another double-quote
  - those are or-ed together, using a non-capturing group: (?:[^",]|"[^"]*")
  - a * repeats the above any number of times: (?:[^",]|"[^"]*")*
  - all of that goes into a capturing group to give the column's value: ((?:[^",]|"[^"]*")*)
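To make the breakdown concrete, the same regex can be assembled from named pieces. This is a sketch for illustration only; the class and constant names are invented here and do not come from the answer.

```java
import java.util.regex.Pattern;

// Hypothetical class and constant names, chosen only to mirror the breakdown.
class CsvPatternParts {
    // Start of the line or a comma; non-capturing, so it is not part of the value.
    static final String SEPARATOR = "(?:^|,)";
    // One character that is neither a double-quote nor a comma.
    static final String PLAIN_CHAR = "[^\",]";
    // A double-quote, any run of non-quote characters, and a closing double-quote.
    static final String QUOTED_RUN = "\"[^\"]*\"";
    // Either of the two, or-ed together in a non-capturing group.
    static final String EITHER = "(?:" + PLAIN_CHAR + "|" + QUOTED_RUN + ")";
    // Repeated any number of times and captured as group 1.
    static final String VALUE = "(" + EITHER + "*)";

    static final Pattern SPLIT_COMMAS = Pattern.compile(SEPARATOR + VALUE);

    public static void main(String[] args) {
        // Prints (?:^|,)((?:[^",]|"[^"]*")*)
        System.out.println(SPLIT_COMMAS.pattern());
    }
}
```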
Handling escaped double quotes is left as an exercise for the reader.
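One possible take on that exercise, as a sketch rather than a full CSV parser: the pattern already keeps a doubled quote inside its column, because "" reads as one quoted run ending and the next one starting. What is left is to strip the outer quotes and turn "" back into ". The names CsvUnquote, unquote and splitAndUnquote are hypothetical, and the rule "only unquote values that are wrapped entirely in quotes" is an assumption made here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; names are not from the answer.
class CsvUnquote {
    // Same pattern as in the answer above.
    private static final Pattern SPLIT_COMMAS =
            Pattern.compile("(?:^|,)((?:[^\",]|\"[^\"]*\")*)");

    // Assumption: only values wrapped entirely in quotes are unquoted;
    // anything else (such as ="12,345") is returned unchanged.
    static String unquote(String raw) {
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            String inner = raw.substring(1, raw.length() - 1);
            return inner.replace("\"\"", "\""); // "" becomes "
        }
        return raw;
    }

    // Splits one line with the pattern, then unquotes each column.
    static List<String> splitAndUnquote(String line) {
        List<String> columns = new ArrayList<>();
        Matcher m = SPLIT_COMMAS.matcher(line);
        while (m.find()) {
            columns.add(unquote(m.group(1)));
        }
        return columns;
    }

    public static void main(String[] args) {
        // Input line: "say ""hi"", ok",="12,345",ABC
        splitAndUnquote("\"say \"\"hi\"\", ok\",=\"12,345\",ABC")
                .forEach(System.out::println);
    }
}
```

Keeping the unquoting separate from the regex leaves the pattern unchanged, so the splitting behaviour described above stays the same.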
a.split(',')?