1

I am trying to write a utility function that converts a file pattern to a Java regular expression pattern. I need it for wildcard matching of files inside a directory. I came up with 4 cases that need to be considered. Are these cases sufficient?

    regexPattern = filePattern;
    // convert windows backslash to slash
    regexPattern = regexPattern.replace("\\", "/");
    // convert dot to \\.
    regexPattern = regexPattern.replace("\\.", "\\\\.");
    // convert ? wildcard to .+
    regexPattern = regexPattern.replace("?", ".+");
    // convert * wildcard to .*
    regexPattern = regexPattern.replace("*", ".*");
  • Are you limiting yourself to Windows command shell globbing or trying to support Unix-based shell file glob patterns? Commented Feb 26, 2015 at 4:36
  • Use regexPattern = regexPattern.replace("?", "."); since ? matches a single character, the same as a dot in regular expressions, whereas .+ matches one or more characters. Commented Feb 26, 2015 at 5:05
  • @ewh I am also trying to match Unix-based shell file patterns. Commented Feb 26, 2015 at 6:14

2 Answers

7

Someone already did this: http://www.rgagnon.com/javadetails/java-0515.html

As you can see, other reserved regex characters (described in "What special characters must be escaped in regular expressions?", i.e. .^$*+?()[{\|) also have to be escaped, not only the dot.
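To illustrate why every metacharacter matters, here is a minimal sketch. The class name EscapeDemo, the helper matches and the sample file names are made up for illustration; Pattern.quote and Pattern.matches are standard java.util.regex methods. It compares a file name used directly as a regex with the same name quoted as a literal:

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Hypothetical helper: true if the whole name matches the regex.
    static boolean matches(String regex, String name) {
        return Pattern.matches(regex, name);
    }

    public static void main(String[] args) {
        String literal = "a+b.txt";

        // Unescaped: '+' means "one or more a" and '.' means "any character",
        // so the pattern no longer matches its own text.
        System.out.println(matches(literal, "a+b.txt")); // false
        System.out.println(matches(literal, "aab-txt")); // true

        // Pattern.quote wraps the text in \Q...\E so it is matched literally.
        String quoted = Pattern.quote(literal);
        System.out.println(matches(quoted, "a+b.txt")); // true
        System.out.println(matches(quoted, "aab-txt")); // false
    }
}
```

Quoting whole literal runs this way avoids having to maintain a hand-written list of metacharacters, at the cost of a less readable generated pattern.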

Parsing character by character is safer than using the String#replace(..) method. With the latter you have to be careful about the order of replacements, so that a later step does not rewrite something an earlier step produced (imagine what happens if, in your example, you first replace the dot with \\. and then convert Windows backslashes to slashes).
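The ordering problem can be reproduced in a few lines. This is only a sketch: the class and method names (ReplaceOrderDemo, dotsThenSlashes, slashesThenDots) and the sample pattern are invented for the demonstration.

```java
final class ReplaceOrderDemo {
    // Wrong order: the backslash added to escape the dot is later
    // mistaken for a Windows path separator and turned into a slash.
    static String dotsThenSlashes(String pattern) {
        return pattern.replace(".", "\\.").replace("\\", "/");
    }

    // Safer order: normalise separators first, then escape the dot.
    static String slashesThenDots(String pattern) {
        return pattern.replace("\\", "/").replace(".", "\\.");
    }

    public static void main(String[] args) {
        String pattern = "dir\\file.txt"; // the text dir\file.txt

        System.out.println(dotsThenSlashes(pattern)); // dir/file/.txt
        System.out.println(slashesThenDots(pattern)); // dir/file\.txt
    }
}
```

A single pass over the characters sidesteps the issue entirely, because each input character is translated exactly once.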

However, I am afraid the example does not work for all cases, because glob syntax varies across implementations; see the Wikipedia entry.

For simple Windows cmd patterns the code would be:

public static String wildcardToRegex(String wildcard) {
    // StringBuilder suffices: the buffer is never shared between threads
    StringBuilder s = new StringBuilder(wildcard.length());
    s.append('^');
    for (int i = 0, is = wildcard.length(); i < is; i++) {
        char c = wildcard.charAt(i);
        switch (c) {
            case '*': // any run of characters
                s.append(".*");
                break;
            case '?': // exactly one character
                s.append(".");
                break;
            case '^': // escape character in cmd.exe
                s.append("\\");
                break;
            // escape special regexp-characters ('+' included)
            case '(': case ')': case '[': case ']': case '$':
            case '.': case '{': case '}': case '|': case '+':
            case '\\':
                s.append("\\");
                s.append(c);
                break;
            default:
                s.append(c);
                break;
        }
    }
    s.append('$');
    return s.toString();
}

This does not handle escaping of characters other than * and ? well (^w should be converted to w, not to \w, which has a special meaning in regex), but you can easily improve that.
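One possible improvement, offered as a sketch rather than a finished implementation: treat the caret as "take the next character literally" and pass every literal character through Pattern.quote. The class name CmdGlob and the method toRegex are hypothetical, and the handling of a trailing caret (kept as a literal ^) is an assumption, not documented cmd.exe behaviour.

```java
import java.util.regex.Pattern;

final class CmdGlob {
    // Converts a simple cmd-style wildcard into an anchored regex.
    static String toRegex(String wildcard) {
        StringBuilder s = new StringBuilder("^");
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '^' && i + 1 < wildcard.length()) {
                // Caret escapes the next character: emit it as a literal
                // and skip over it.
                s.append(Pattern.quote(String.valueOf(wildcard.charAt(++i))));
            } else if (c == '*') {
                s.append(".*"); // any run of characters
            } else if (c == '?') {
                s.append('.');  // exactly one character
            } else {
                // Everything else, including a trailing '^' (assumption),
                // is matched literally.
                s.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return s.append('$').toString();
    }

    public static void main(String[] args) {
        String regex = toRegex("a^*b?.txt");
        System.out.println(Pattern.matches(regex, "a*bc.txt"));  // true
        System.out.println(Pattern.matches(regex, "axxbc.txt")); // false
    }
}
```

With this approach ^w produces a pattern that matches the plain letter w, and no explicit list of regex metacharacters is needed.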


2

FileSystem.getPathMatcher(String) supports glob syntax.

From the "Finding Files" tutorial:

PathMatcher matcher =
    FileSystems.getDefault().getPathMatcher("glob:*.{java,class}");

Path filename = ...;
if (matcher.matches(filename)) {
    System.out.println(filename);
}
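For a self-contained variant of the snippet above, the following sketch wraps the matcher in a small helper. The class name GlobMatchDemo, the method matchesGlob and the sample file names are illustrative assumptions; the java.nio.file calls are standard since Java 7.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class GlobMatchDemo {
    // Hypothetical helper: tests a bare file name against a glob pattern.
    static boolean matchesGlob(String glob, String fileName) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        Path name = Paths.get(fileName);
        return matcher.matches(name);
    }

    public static void main(String[] args) {
        System.out.println(matchesGlob("*.{java,class}", "Main.java")); // true
        System.out.println(matchesGlob("*.{java,class}", "Main.txt"));  // false
        System.out.println(matchesGlob("?.java", "AB.java"));           // false
    }
}
```

When the goal is to list the files of one directory, Files.newDirectoryStream(Path, String) accepts the glob directly, so no separate matcher is required.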
