I'm parsing a dataset that annoyingly comma-delimits items in a TSV (PharmGKB pathways, I'm looking at you) but also allows commas inside each logical element.
Basically, a comma followed by a space is not a delimiter, while a comma followed by any non-space character starts a new element.
"This is one, element,two element,three element"
Should be:
- This is one, element
- two element
- three element
I'm currently using a.split(",\\S+"), which splits in the right places but also removes the first character after every split:
- This is one, element
- wo element
- hree element
Regex is like going to the dentist for me, help is much appreciated.
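One direction that looks promising, sketched here on the assumption that this is Java's String.split: the \\S in the pattern consumes the character it matches, so that character is thrown away together with the comma. Wrapping it in a lookahead, (?=\\S), only checks that a non-space character follows without making it part of the delimiter. The class and method names below are made up for illustration.

```java
class CommaSplitSketch {
    // Split on a comma only when the next character is non-whitespace.
    // (?=\\S) is a zero-width lookahead: it tests the next character
    // but does not consume it, so nothing is cut from the next element.
    static String[] splitElements(String field) {
        return field.split(",(?=\\S)");
    }

    public static void main(String[] args) {
        String a = "This is one, element,two element,three element";
        for (String element : splitElements(a)) {
            System.out.println("- " + element);
        }
        // prints:
        // - This is one, element
        // - two element
        // - three element
    }
}
```

A comma followed by a space is left alone, so "This is one, element" stays in one piece.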