I'd like to split a string at commas (","). The string contains escaped commas ("\,") and escaped backslashes ("\\"). Commas at the beginning and end, as well as several commas in a row, should produce empty strings.

So ",,\,\\,," should become "", "", "\,\\", "", ""

Note that my example strings show each backslash as a single "\". In Java string literals they would be doubled.
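
To make the doubling concrete, here is a minimal sketch that writes the example above as a Java literal. The class name `EscapeDemo` and the constant `INPUT` are illustrative names only, not part of any library.

```java
public class EscapeDemo {
    // The example ,,\,\\,, written as a Java string literal:
    // every backslash in the data is doubled in the source code.
    public static final String INPUT = ",,\\,\\\\,,";

    public static void main(String[] args) {
        System.out.println(INPUT);          // prints ,,\,\\,,
        System.out.println(INPUT.length()); // prints 8
    }
}
```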

I tried with several packages but had no success. My last idea would be to write my own parser.

  • This is my answer from another question that has a similar requirement. It handles the case of multiple \ in a row. However, as fge suggested, you might be better off using a library, since my code was written without knowledge of the corner cases in the CSV format. Commented Feb 11, 2014 at 10:19
  • Thanks for the suggestion. I will have a look at it. Nevertheless, I would like my project to have as few dependencies on additional artifacts as possible (Guava and Apache Commons are OK). This issue is probably the only one that would require that library, so I would prefer not to use it. Commented Feb 11, 2014 at 10:38

4 Answers

While a dedicated library is certainly a good idea, the following will work:

    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*+,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }

The idea is to find each , that is preceded by zero or an even number of \ (i.e. an unescaped ,). Because the , is the last character of the match, each field is cut at end()-1, which is just before the ,.

The function is tested against most edge cases I can think of, except for null input. If you prefer working with a List<String>, you can of course change the return type; I just adapted the approach used by split() to handle escapes.
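
As a hedged sketch of that List<String> variant, the following uses the same pattern and adds a null check. The class name `SplitToList`, the method name `splitValuesToList`, and the choice to return an empty list for null are assumptions made for illustration, not part of the function above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitToList {
    // Same pattern as above: a comma preceded by zero or an even
    // number of backslashes, i.e. a comma that is not escaped.
    private static final Pattern UNESCAPED_COMMA =
            Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*+,");

    // Assumption: null input yields an empty list.
    public static List<String> splitValuesToList(final String input) {
        if (input == null) {
            return Collections.emptyList();
        }
        final List<String> result = new ArrayList<String>();
        final Matcher matcher = UNESCAPED_COMMA.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            // end() - 1 is the index of the separating comma itself
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        // Whatever follows the last unescaped comma (possibly empty)
        result.add(input.substring(previous));
        return result;
    }

    public static void main(final String[] args) {
        System.out.println(splitValuesToList(",,\\,\\\\,,").size()); // prints 5
        System.out.println(splitValuesToList("a\\\\,b").size());     // prints 2
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call; the splitting logic is otherwise unchanged.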

Example class utilizing this function:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Print {
    public static void main(final String[] args) {
        String input = ",,\\,\\\\,,";
        final String[] strings = splitValues(input);
        System.out.print("\""+input+"\" => ");
        printQuoted(strings);
    }

    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*+,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }

    public static void printQuoted(final String[] strings) {
        if (strings.length > 0) {
            System.out.print("[\"");
            System.out.print(strings[0]);
            for(int i = 1; i < strings.length; i++) {
                System.out.print("\", \"");
                System.out.print(strings[i]);
            }
            System.out.println("\"]");
        } else {
            System.out.println("[]");
        }
    }
}

In this case a custom function sounds better to me. Note that the version below also removes the escaping backslashes from the resulting strings. Try this:

public String[] splitEscapedString(String s) {
    //Character that won't appear in the string.
    //If you are reading lines, '\n' should work fine since it will never appear.
    String c = "\n";
    StringBuilder sb = new StringBuilder();
    for(int i = 0;i<s.length();++i){
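        if(s.charAt(i)=='\\' && i + 1 < s.length()) {
            //Escaped character: drop the backslash and copy the next character as is.
            //The bounds check keeps a lone trailing '\' from causing a
            //StringIndexOutOfBoundsException; it is then copied like any other character.
            sb.append(s.charAt(++i));
        }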
        else {
            if(s.charAt(i) == ',') {
                sb.append(c);
            }
            else {
                sb.append(s.charAt(i));
            }
        }
    }
    //A limit of -1 keeps trailing empty strings, which split(c) would discard.
    return sb.toString().split(c, -1);
}
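
If the escape sequences should stay in the output, as in the question's expected result, a single pass without a placeholder character is one option. This is only a sketch under that assumption; the class name `EscapeAwareSplitter` is made up for the example, and keeping a lone trailing backslash as-is is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeAwareSplitter {
    // Splits at unescaped commas in one pass. Escape sequences are copied
    // through unchanged, and empty fields (leading, trailing, consecutive) are kept.
    public static List<String> split(String s) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' && i + 1 < s.length()) {
                // Keep the backslash and the escaped character together.
                field.append(ch).append(s.charAt(++i));
            } else if (ch == ',') {
                // Unescaped comma: the current field is complete.
                fields.add(field.toString());
                field.setLength(0);
            } else {
                // Ordinary character, or a lone trailing backslash (assumption: keep it).
                field.append(ch);
            }
        }
        fields.add(field.toString()); // the last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        for (String f : split(",,\\,\\\\,,")) {
            System.out.println("\"" + f + "\"");
        }
    }
}
```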


Don't use .split() but find all matches between (unescaped) commas:

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile(
    "(?:         # Start of group\n" +
    " \\\\.      # Match either an escaped character\n" +
    "|           # or\n" +
    " [^\\\\,]++ # Match one or more characters except comma/backslash\n" +
    ")*          # Do this any number of times", 
    Pattern.COMMENTS);
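Matcher regexMatcher = regex.matcher(subjectString);
int lastEnd = -1;
while (regexMatcher.find()) {
    // Java reports an extra empty match directly after a non-empty one
    // (at the position of the following comma); skip it.
    if (regexMatcher.start() == lastEnd) {
        continue;
    }
    matchList.add(regexMatcher.group());
    lastEnd = regexMatcher.end();
}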

Result: ["", "", "\\,\\\\", "", ""]

I used a possessive quantifier (++) in order to avoid excessive backtracking due to the nested quantifiers.


I have used the solution below as a generic string splitter that handles quotes (' and ") and the escape character (\).

public static List<String> split(String str, final char splitChar) {
    List<String> queries = new ArrayList<>();
    int length = str.length();
    int start = 0, current = 0;
    char ch, quoteChar;
    
    while (current < length) {
        ch=str.charAt(current);
        // Handle escape char by skipping next char
        if(ch == '\\') {
            current++;
        }else if(ch == '\'' || ch=='"'){ // Handle quoted values
            quoteChar = ch;
            current++;
            while(current < length) {
                ch = str.charAt(current);
                // Handle escape char by skipping next char
                if (ch == '\\') {
                    current++;
                } else if (ch == quoteChar) {
                    break;
                }
                current++;
            }
        }else if(ch == splitChar) { // Split string
            queries.add(str.substring(start, current + 1));
            start = current + 1;
        }
        current++;
    }
    // Add last value
    if (start < current) {
        queries.add(str.substring(start));
    }
    return queries;
}

public static void main(String[] args) {

    String str = "abc,x\\,yz,'de,f',\"lm,n\"";
    List<String> queries = split(str, ',');
    System.out.println("Size: "+queries.size());
    for (String query : queries) {
        System.out.println(query);
    }
}

This prints the following result (each piece keeps its trailing delimiter):

Size: 4
abc,
x\,yz,
'de,f',
"lm,n"

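A hedged sketch of a variant closer to what the question asks for: it omits the delimiter from each piece and keeps empty pieces, including a trailing one. The class name `QuoteAwareSplitter` is illustrative, and the quote and escape handling is assumed to be the same as in the code above.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
    public static List<String> split(String str, char splitChar) {
        List<String> parts = new ArrayList<>();
        int length = str.length();
        int start = 0;
        int current = 0;
        while (current < length) {
            char ch = str.charAt(current);
            if (ch == '\\') {
                current++; // skip the escaped character
            } else if (ch == '\'' || ch == '"') {
                char quoteChar = ch;
                current++;
                // Advance to the matching closing quote, honouring escapes.
                while (current < length && str.charAt(current) != quoteChar) {
                    if (str.charAt(current) == '\\') {
                        current++;
                    }
                    current++;
                }
            } else if (ch == splitChar) {
                parts.add(str.substring(start, current)); // delimiter excluded
                start = current + 1;
            }
            current++;
        }
        // Always add the final piece, even when it is empty.
        parts.add(str.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("abc,x\\,yz,'de,f',\"lm,n\"", ',')) {
            System.out.println(part);
        }
    }
}
```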