I'm parsing a dataset that annoyingly comma-delimits items within a TSV field (PharmaGKB pathways, I'm looking at you) but also allows commas inside each logical element.

Basically, a comma followed by a space is not a delimiter, while a comma followed by any non-space character starts a new element.

"This is one, element,two element, three element"

Should be:

  • This is one, element
  • two element
  • three element

I have the regex a.split(",\\S+"), which splits fine but removes the first character after every split:

  • This is one, element
  • wo element
  • hree element

Regex is like going to the dentist for me, help is much appreciated.

  • To anyone reading this -- please note that the given example is incorrect. The actual output should be "This is one, element" "two element, three element". There is a space after the comma between "two element" and "three element" and thus it should not be treated as a delimiter. Commented Apr 2, 2015 at 18:35

1 Answer

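Use a positive lookahead. It asserts that a non-space character follows the comma without consuming that character, so only the comma itself is removed: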

a.split(",(?=\\S)");
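
As a minimal sketch of how the lookahead behaves, the snippet below wraps the split in a small helper. The class name SplitDemo and the method splitElements are hypothetical names chosen for illustration, and the input string is a variant of the question's example with no space before "three element", so that every intended delimiter is a comma followed directly by a non-space character.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Hypothetical helper: splits on a comma only when the next character
    // is non-whitespace. The lookahead (?=\\S) checks that character but
    // does not consume it, so nothing besides the comma is dropped.
    static List<String> splitElements(String line) {
        return Arrays.asList(line.split(",(?=\\S)"));
    }

    public static void main(String[] args) {
        // Assumed input: comma + space stays inside an element,
        // comma + non-space separates elements.
        String a = "This is one, element,two element,three element";
        for (String part : splitElements(a)) {
            System.out.println(part);
        }
    }
}
```

Run against the string exactly as written in the question (with ", three element"), the same call would yield two elements rather than three, because the last comma is followed by a space.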