
I'm using a regex to validate an input, and I want to get the exact index of the wrong character.

My regex is:

^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])?

If I type the following input:

DATE/201A08

Then matcher.group() (using the lookingAt() method) returns "DATE" instead of "DATE/201", so I can't tell that the wrong character is at position 9.
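A minimal sketch reproducing this behavior (Repro and matchedPrefix are illustrative names, not from the question). The whole date group is optional, so as soon as A fails to match [0-9] the engine drops the entire group and succeeds on DATE alone. group() therefore reflects the longest successful match, not how far the engine got before failing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Repro {
    // The regex from the question, unchanged
    private static final Pattern P =
            Pattern.compile("^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])?");

    // Returns what lookingAt() matched, or null if nothing matched at the start.
    static String matchedPrefix(String input) {
        Matcher m = P.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchedPrefix("DATE/201A08")); // DATE
        System.out.println(matchedPrefix("DATE/201508")); // DATE/201508
    }
}
```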

  • There is no way to know. A close, but not guaranteed, solution is to make every token optional in the form A(B(C(D)?)?)?. Commented Dec 2, 2014 at 12:06
  • Thanks for your response but the result is the same. Commented Dec 2, 2014 at 12:13
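A minimal sketch of the nested form from the first comment, written as ^A(?:B(?:C(?:D)?)?)? so that each token can only match if every token before it did (NestedOptional, nested and firstBadIndex are illustrative names). The original regex still decides whether the input is valid; the nested pattern only measures how far a valid prefix reaches. As the comment warns, this is close but not guaranteed: lookingAt() returns the first match the engine finds while backtracking, which for greedy tokens like these is usually, but not necessarily for every pattern, the longest prefix. Indices are 0-based, so the question's position 9 is index 8.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedOptional {
    // Builds ^(?:p0)(?:p1(?:p2(?:...)?)?)? : part 0 is required, and each
    // later part may only match if all the parts before it matched.
    static Pattern nested(String... parts) {
        String tail = "";
        for (int i = parts.length - 1; i >= 1; i--) {
            tail = "(?:" + parts[i] + tail + ")?";
        }
        return Pattern.compile("^(?:" + parts[0] + ")" + tail);
    }

    static final Pattern FULL =
            Pattern.compile("^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])?");
    static final Pattern PREFIX = nested("[A-Z]{1,4}", "/",
            "[1-2]", "[0-9]", "[0-9]", "[0-9]", "[0-1]", "[0-9]");

    // -1 if 'full' matches the whole input; otherwise the 0-based index where
    // the nested prefix stopped (input.length() means the input ended early).
    static int firstBadIndex(Pattern full, Pattern prefix, String input) {
        if (full.matcher(input).matches()) {
            return -1;
        }
        Matcher m = prefix.matcher(input);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        System.out.println(firstBadIndex(FULL, PREFIX, "DATE/201A08")); // 8
        System.out.println(firstBadIndex(FULL, PREFIX, "DATE/201508")); // -1
    }
}
```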

2 Answers


If I read this right, you can't do this with a single regex. ^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])? describes a string that starts with 1 to 4 uppercase letters, followed either by nothing or by / and exactly 6 digits. Since lookingAt() only needs a match at the start of the input, it correctly returns "DATE": that prefix is valid according to your regex.

Try splitting this into two checks. First check whether the letter part (e.g. DATE) is valid. Then, if there's an actual / part, check it against the non-optional pattern.
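A minimal sketch of the two checks, assuming the letters and the / part are the only two pieces (TwoStepCheck and check are illustrative names). Unlike lookingAt() on the original regex, it validates the whole input. Note that it only tells you which part is wrong: for DATE/201A08 it reports index 4, where the / part starts, not the exact bad character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStepCheck {
    private static final Pattern LETTERS = Pattern.compile("[A-Z]{1,4}");
    private static final Pattern DATE_PART =
            Pattern.compile("/[1-2][0-9][0-9][0-9][0-1][0-9]");

    // -1 if the input is valid; otherwise the 0-based index where the failing
    // part starts (0 for the letters, the end of the letters for the / part).
    static int check(String input) {
        Matcher letters = LETTERS.matcher(input);
        if (!letters.lookingAt()) {
            return 0;                            // first check failed
        }
        String rest = input.substring(letters.end());
        if (rest.isEmpty() || DATE_PART.matcher(rest).matches()) {
            return -1;                           // no / part, or a valid one
        }
        return letters.end();                    // second check failed
    }

    public static void main(String[] args) {
        System.out.println(check("DATE/201508")); // -1
        System.out.println(check("DATE/201A08")); // 4
    }
}
```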


5 Comments

But the question is: if I have a pattern like '[1-2][0-9][0-9][0-9]' and the input 20Y4, how can I detect that the wrong position is 3?
As nhahtdh suggested, make each token optional and look at the result. [1-2]?[0-9]?[0-9]?[0-9]? should result in 20 for your example, so you know there's a non-digit at position 3.
Thanks for the suggestion, but in this case the year 999 will be matched, and that's not what I want. My bigger problem is that I have many different patterns attached to different inputs. I'm looking for a generic solution, if there is one.
Then try dropping regex, iterating over the characters and checking them with things like Character.isDigit(char ch). Maybe there's a pure regex solution, but I don't know it, and it starts to remind me of the epic "Parse HTML with Regex" thread.
It rather reminds me of this quote from Jamie Zawinski: Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.
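A minimal sketch of the character-by-character idea for fixed-width fields, assuming one predicate per position (CharByCharCheck, range, YEAR and firstBadIndex are illustrative names). It uses explicit character ranges rather than Character.isDigit, because isDigit also accepts non-ASCII digits that [0-9] rejects. Indices are 0-based, so the Y in 20Y4 is index 2 (position 3), and 999 is rejected at index 0 instead of being accepted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class CharByCharCheck {
    // Accepts characters lo..hi inclusive, like a regex class such as [0-9].
    static IntPredicate range(char lo, char hi) {
        return c -> c >= lo && c <= hi;
    }

    // [1-2][0-9][0-9][0-9] from the comment, one predicate per position
    static final List<IntPredicate> YEAR = Arrays.asList(
            range('1', '2'), range('0', '9'), range('0', '9'), range('0', '9'));

    // -1 if every position is accepted and the lengths agree; otherwise the
    // 0-based index of the first offending character (input.length() if the
    // input is too short, slots.size() if it has extra characters).
    static int firstBadIndex(String input, List<IntPredicate> slots) {
        for (int i = 0; i < slots.size(); i++) {
            if (i >= input.length()) {
                return input.length();
            }
            if (!slots.get(i).test(input.charAt(i))) {
                return i;
            }
        }
        return input.length() == slots.size() ? -1 : slots.size();
    }

    public static void main(String[] args) {
        System.out.println(firstBadIndex("20Y4", YEAR)); // 2
        System.out.println(firstBadIndex("999", YEAR));  // 0
    }
}
```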

You want to know whether the entire pattern matched, and when not, how far it matched.

That is where a regex falls short. A regex test must succeed for group() to return anything, and if it succeeds on only part of the input, group() alone cannot tell you whether everything was matched.

The sensible thing to do is split the matching.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProgressiveMatch {

    private final String[] regexParts;
    private String group;

    ProgressiveMatch(String... regexParts) {
        this.regexParts = regexParts;
    }

    // Variant 1: one regex of the form ^(part1)?(part2)?...(partN)?
    // Assumes the parts contain no capturing groups of their own,
    // otherwise group(i) would not correspond to part i.
    public boolean lookingAtOptional(String text) {
        StringBuilder sb = new StringBuilder();
        sb.append('^');
        for (String part : regexParts) {
            sb.append('(').append(part).append(")?");
        }
        Matcher m = Pattern.compile(sb.toString()).matcher(text);
        if (m.lookingAt()) {
            boolean all = true;
            StringBuilder matched = new StringBuilder();
            for (int i = 1; i <= regexParts.length; ++i) {
                if (m.group(i) == null) {
                    all = false; // part i did not match; stop here
                    break;
                }
                matched.append(m.group(i));
            }
            group = matched.toString();
            return all;
        }
        group = null;
        return false;
    }

    // Variant 2: try the first n parts for n = N, N-1, ..., 1
    // and keep the longest run of leading parts that matches.
    public boolean lookingAt(String text) {
        for (int n = regexParts.length; n > 0; --n) {
            // Concatenate the first n parts; (?:...) keeps an alternation
            // inside one part from spilling into its neighbours.
            StringBuilder sb = new StringBuilder();
            sb.append('^');
            for (int i = 0; i < n; ++i) {
                sb.append("(?:").append(regexParts[i]).append(')');
            }
            Matcher m = Pattern.compile(sb.toString()).matcher(text);
            if (m.lookingAt()) {
                group = m.group();             // text matched by the first n parts
                return n == regexParts.length; // true only if every part matched
            }
        }
        group = null;
        return false;
    }

    public String group() {
        return group;
    }

    public static void main(String[] args) {
        // ^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])?
        ProgressiveMatch match = new ProgressiveMatch("[A-Z]{1,4}", "/",
                "[1-2]", "[0-9]", "[0-9]", "[0-9]", "[0-1]", "[0-9]");
        boolean matched = match.lookingAt("DATE/201A08");
        System.out.println("Matched: " + matched);     // Matched: false
        System.out.println("Up to: " + match.group()); // Up to: DATE/201
    }
}

One could also build a small DSL in Java, like:

    ProgressiveMatch match = ProgressiveMatchBuilder
         .range("A", "Z", 1, 4)
         .literal("/")
         .range("1", "2")
         .range("0", "9", 3, 3)
         .range("0", "1")
         .range("0", "9")
         .match();
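A minimal sketch of such a builder. ProgressiveMatchBuilder is hypothetical, not an existing library, and this version deviates from the snippet above in two ways. It starts from a begin() method, because a static range(...) invoked in the middle of the chain would create a fresh builder and drop the earlier parts. It also ends with parts(), returning the String[] that the ProgressiveMatch constructor takes, instead of match(). Fixed counts such as range("0", "9", 3, 3) are expanded to one part per character so a progressive match can still stop at the exact character; a variable count such as 1 to 4 stays a single part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical builder for the DSL idea; not part of any library.
public class ProgressiveMatchBuilder {
    private final List<String> parts = new ArrayList<>();

    public static ProgressiveMatchBuilder begin() {
        return new ProgressiveMatchBuilder();
    }

    // Characters from..to (assumed to be single characters needing no escaping
    // inside [...]), repeated min..max times.
    public ProgressiveMatchBuilder range(String from, String to, int min, int max) {
        String cls = "[" + from + "-" + to + "]";
        if (min == max) {
            for (int i = 0; i < min; i++) {
                parts.add(cls); // one part per character
            }
        } else {
            parts.add(cls + "{" + min + "," + max + "}");
        }
        return this;
    }

    public ProgressiveMatchBuilder range(String from, String to) {
        return range(from, to, 1, 1);
    }

    // A literal string, quoted so regex metacharacters are taken literally.
    public ProgressiveMatchBuilder literal(String text) {
        parts.add(Pattern.quote(text));
        return this;
    }

    // The regex parts, in order, ready for new ProgressiveMatch(...).
    public String[] parts() {
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] parts = ProgressiveMatchBuilder.begin()
                .range("A", "Z", 1, 4)
                .literal("/")
                .range("1", "2")
                .range("0", "9", 3, 3)
                .range("0", "1")
                .range("0", "9")
                .parts();
        System.out.println(parts.length); // 8
        System.out.println(String.join("", parts));
    }
}
```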
