5

I am trying to split a comma-separated string using a regex.

var a = 'hi,mr.007,bond,12:25PM'; //there are no white spaces between commas
var b = /(\S+?),(?=\S|$)/g;
b.exec(a); // does not catch the last item.

Any suggestions on how to catch all the items?

1
  • How about a.split(',')? Commented Feb 28, 2013 at 19:26

3 Answers

15

Use a negated character class:

/([^,]+)/g

will match groups of non-commas.

< a = 'hi,mr.007,bond,12:25PM'
> "hi,mr.007,bond,12:25PM"
< b=/([^,]+)/g
> /([^,]+)/g
< a.match(b)
> ["hi", "mr.007", "bond", "12:25PM"]
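As a minimal sketch of the same idea in a script rather than the console (the variable names are illustrative): with the g flag, match() returns every run of non-comma characters, so the capture group is not needed. One consequence worth knowing is that + requires at least one character, so empty fields are dropped.

```javascript
const input = 'hi,mr.007,bond,12:25PM';

// With the g flag, match() returns all runs of non-comma characters.
const items = input.match(/[^,]+/g);
console.log(items); // ["hi", "mr.007", "bond", "12:25PM"]

// + needs at least one character, so the empty field between the commas is skipped.
const skipped = 'a,,b'.match(/[^,]+/g);
console.log(skipped); // ["a", "b"]

// split(',') keeps the empty field instead.
const kept = 'a,,b'.split(',');
console.log(kept); // ["a", "", "b"]
```

If empty fields matter, split(',') is probably the simpler choice.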

3 Comments

What if I have to match Juan Gastelum from Juan Gastelum, [email protected], 213-375-3149?
Try it and see.
I mean, I am getting the list, but I want exactly Juan Gastelum and not the whole list.
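For the single-value case raised in the comments, one possible approach is to drop the g flag so that match() stops at the first run of non-commas. This is only a sketch: the e-mail address below is a placeholder, since the original one is obscured in the comment, and the variable names are illustrative.

```javascript
// Placeholder address; the real one is not visible in the comment.
const record = 'Juan Gastelum, someone@example.com, 213-375-3149';

// Without the g flag, match() returns only the first match (or null if there is none).
const first = record.match(/[^,]+/);
const firstField = first ? first[0].trim() : '';

console.log(firstField); // "Juan Gastelum"
```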
7

Why not just use .split?

>'hi,mr.007,bond,12:25PM'.split(',')
["hi", "mr.007", "bond", "12:25PM"]

If you must use regex for some reason:

str.match(/(\S+?)(?:,|$)/g)
["hi,", "mr.007,", "bond,", "12:25PM"]

(note the inclusion of commas).
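If the trailing commas are unwanted, one option is to read capture group 1 from each match instead of the whole match. The sketch below assumes an environment that supports String.prototype.matchAll; the variable names are illustrative.

```javascript
const str = 'hi,mr.007,bond,12:25PM';

// match() with the g flag returns whole matches, so the commas are included.
const withCommas = str.match(/(\S+?)(?:,|$)/g);
console.log(withCommas); // ["hi,", "mr.007,", "bond,", "12:25PM"]

// matchAll() exposes the capture groups; group 1 holds the item without the comma.
const items = Array.from(str.matchAll(/(\S+?)(?:,|$)/g), (m) => m[1]);
console.log(items); // ["hi", "mr.007", "bond", "12:25PM"]
```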

1

If you are parsing a CSV file, some of your values may be wrapped in double quotes, so you may need something a little more complicated. For example:

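import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each match is the start of the line or a comma, followed by one column value (group 1).
Pattern splitCommas = Pattern.compile("(?:^|,)((?:[^\",]|\"[^\"]*\")*)");

Matcher m = splitCommas.matcher("11,=\"12,345\",ABC,,JKL");

// Print every column value in order.
while (m.find()) {
    System.out.println(m.group(1));
}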

or in Groovy:

java.util.regex.Pattern.compile('(?:^|,)((?:[^",]|"[^"]*")*)')
        .matcher("11,=\"12,345\",ABC,,JKL")
            .iterator()
                .collect { it[1] }

This code handles:

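  • blank lines (with no values or commas on them)
  • empty columns, including the last column being empty (an empty first column is the exception: after the zero-length match at the start of the line, the next search begins one character later and skips the comma)
  • values wrapped in double quotes, including commas inside the double quotes
  • but it does not handle two double quotes used to escape a double quote itself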

The pattern consists of:

  • (?:^|,) matches the start of the line or the comma that follows the previous column, without adding it to the captured value

  • ((?:[^",]|"[^"]*")*) matches the value of the column, and consists of:

    • a capturing group, which collects zero or more items that are each either:

      • [^",], a single character that is not a comma or a double quote
      • "[^"]*", a double quote followed by zero or more non-quote characters and a closing double quote
    • the two alternatives are or-ed together in a non-capturing group: (?:[^",]|"[^"]*")

    • a * repeats that group any number of times: (?:[^",]|"[^"]*")*
    • the result is wrapped in a capturing group to give the column's value: ((?:[^",]|"[^"]*")*)

Handling escaped double quotes is left as an exercise for the reader.
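For readers working in JavaScript, like the question, here is one possible take on that exercise. It is a sketch under assumptions, not the answer's own code: splitCsvLine and unquote are hypothetical helper names, and the pattern is a variation that prepends a comma to the line so that every field, including the first, follows a comma. That removes the need for the ^ alternative and avoids zero-length matches. Fields wrapped entirely in double quotes are unwrapped and "" is turned into a single quote; a value such as ="12,345" is left as-is.

```javascript
// Hypothetical helpers for illustration; not part of the answer above.
function unquote(field) {
  // Unwrap only fields that are entirely wrapped in double quotes;
  // inside them, a doubled quote ("") stands for one literal quote.
  const wrapped = /^"([\s\S]*)"$/.exec(field);
  return wrapped ? wrapped[1].replace(/""/g, '"') : field;
}

function splitCsvLine(line) {
  // A comma, then a run of unquoted characters and/or quoted segments.
  // Adjacent quoted segments ("a""b") are how a doubled quote gets matched.
  const fieldPattern = /,((?:[^",]|"[^"]*")*)/g;
  return Array.from((',' + line).matchAll(fieldPattern), (m) => unquote(m[1]));
}

console.log(splitCsvLine('11,="12,345",ABC,,JKL'));
console.log(splitCsvLine('"say ""hi""",x'));
```

This still assumes one record per line and balanced quotes; a quoted field that spans several lines would need a real CSV parser.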
