1

I am using a regex to validate a string in my application. I want to check whether the string has the following pattern: x=y&a=b&... where x, y, a, b, etc. can be empty.

Examples of correct strings:

abc=def&gef=cda&pdf=cdf
=&gef=def
abc=&gef=def
=abc&gef=def

Examples of incorrect strings:

abc=def&gef=cda&
abc=def&gef==cda&
abc=defgef=cda&abc=gda

This is my current solution:

    String pattern = "[[a-zA-Z0-9]*[=]{1}[a-zA-Z0-9]*[&]{1}]*";
    if(!Pattern.matches(pattern, s)){
        throw new IllegalArgumentException(s);
    }

This solution is bad because it accepts strings like:

abc=def&gef=def&

Can anyone help me with the correct pattern?

2 Comments
  • No, the string contains only letters and digits, or is empty. (Commented May 21, 2017 at 19:46)
  • =&= would be correct. (Commented May 21, 2017 at 19:48)

3 Answers

3

You may use the following regex:

^[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*$


When used with matches(), the ^ and $ anchors may be omitted, because matches() succeeds only if the entire input matches the pattern.
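To illustrate the point about anchors, here is a minimal sketch that runs the unanchored pattern against the sample strings from the question. The class name QueryStringCheck and the helper methods are made up for this example. It contrasts matches(), which must consume the whole input, with find(), which is satisfied by any matching substring and therefore does need the anchors.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class QueryStringCheck {
    // No ^ or $ here: matches() must consume the entire input to succeed.
    private static final Pattern PAIRS = Pattern.compile(
            "[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*");

    // Whole-string check, which is what the question asks for.
    static boolean isValid(String s) {
        return PAIRS.matcher(s).matches();
    }

    // Substring search: without anchors this succeeds if any part of the input fits.
    static boolean containsMatch(String s) {
        return PAIRS.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc=def&gef=cda&pdf=cdf", "=&gef=def", "=&=",
            "abc=def&gef=cda&", "abc=def&gef==cda&", "abc=defgef=cda&abc=gda"
        };
        for (String s : samples) {
            System.out.println(s + " matches=" + isValid(s) + " find=" + containsMatch(s));
        }
    }
}
```

If you switch to find() for some reason, keep the ^ and $ anchors in the pattern; otherwise a string with a trailing & would slip through.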

Details:

  • ^ - start of string
  • [a-zA-Z0-9]* - 0+ alphanumeric chars (may be replaced with \p{Alnum})
  • = - a = symbol
  • [a-zA-Z0-9]* - 0+ alphanumeric chars
  • (?: - start of a non-capturing group matching sequences of...
    • & - a & symbol
    • [a-zA-Z0-9]*=[a-zA-Z0-9]* - same as above
  • )* - ... zero or more occurrences
  • $ - end of string

NOTE: If you want to make the pattern more generic, you may replace the restrictive [a-zA-Z0-9] with [^=&], which matches any char other than = and &:

^[^=&]*=[^=&]*(?:&[^=&]*=[^=&]*)*$
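As a rough sketch of what the more generic pattern buys you, the snippet below checks a few inputs whose keys and values contain spaces and punctuation. The class name GenericPairCheck and the sample strings are invented for this illustration. Anchors are left out because Matcher.matches() already requires a full match.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class GenericPairCheck {
    // [^=&] accepts any character except the two delimiters.
    private static final Pattern ANY_PAIRS =
            Pattern.compile("[^=&]*=[^=&]*(?:&[^=&]*=[^=&]*)*");

    static boolean isValid(String s) {
        return ANY_PAIRS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Made-up inputs: the first one is rejected by the alphanumeric-only pattern.
        String[] samples = {
            "first name=John Doe&e-mail=j.doe@example.com",
            "a==b",
            "a=b&"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Note that [^=&] also matches whitespace and line breaks, so use the stricter class if those must be rejected.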



1

I believe you want this.

([a-zA-Z0-9]*=[a-zA-Z0-9]*&)*[a-zA-Z0-9]*=[a-zA-Z0-9]*

This matches any number of repetitions like x=y, with a & after each one; followed by one repetition like x=y without the following &.
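A quick way to convince yourself that this pattern accepts the right strings is to compare it with the equivalent form that puts the repeated group at the end. The sketch below does that on the samples from the question; the class name PatternComparison and the constant names are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class PatternComparison {
    // Repeated "x=y&" groups first, then one final "x=y".
    static final Pattern GROUP_FIRST = Pattern.compile(
            "([a-zA-Z0-9]*=[a-zA-Z0-9]*&)*[a-zA-Z0-9]*=[a-zA-Z0-9]*");

    // One "x=y" first, then repeated "&x=y" groups.
    static final Pattern GROUP_LAST = Pattern.compile(
            "[a-zA-Z0-9]*=[a-zA-Z0-9]*(?:&[a-zA-Z0-9]*=[a-zA-Z0-9]*)*");

    // True when both patterns give the same verdict for the whole string.
    static boolean agree(String s) {
        return GROUP_FIRST.matcher(s).matches() == GROUP_LAST.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc=def&gef=cda&pdf=cdf", "=&gef=def", "abc=&gef=def", "=abc&gef=def",
            "abc=def&gef=cda&", "abc=def&gef==cda&", "abc=defgef=cda&abc=gda"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + GROUP_FIRST.matcher(s).matches()
                    + " (patterns agree: " + agree(s) + ")");
        }
    }
}
```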

1 Comment

Quantifying the pattern start causes more redundant backtracking than when quantifying the pattern end.

1

Here you go:

^\w*=\w*(?:&(?:\w*=\w*))*$

  • ^ is the starting anchor
  • \w*=\w* represents a parameter like abc=def
    • \w matches a word character [a-zA-Z0-9_]
    • \w* represents 0 or more such characters
  • & represents the actual ampersand literal
  • (?:&(?:\w*=\w*))* matches any subsequent parameters like &b=d etc.
  • $ represents the ending anchor


EDIT: Made all groups non-capturing.

Note: As @WiktorStribiżew has pointed out in the comments, \w matches _ as well, so if underscores are not allowed, replace every \w in the above regex with [A-Za-z0-9].
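The difference is easy to see with a key that contains an underscore. The following is a small sketch, assuming Java's default regex flags (without UNICODE_CHARACTER_CLASS, \w is [a-zA-Z_0-9]). The class name WordCharCheck and the sample inputs are invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class WordCharCheck {
    // \w version: letters, digits and the underscore are allowed.
    static final Pattern WORD = Pattern.compile("\\w*=\\w*(?:&\\w*=\\w*)*");

    // Strict version: letters and digits only.
    static final Pattern ALNUM = Pattern.compile(
            "[A-Za-z0-9]*=[A-Za-z0-9]*(?:&[A-Za-z0-9]*=[A-Za-z0-9]*)*");

    public static void main(String[] args) {
        // Made-up inputs: only the first contains an underscore.
        String[] samples = {"user_id=42", "userid=42"};
        for (String s : samples) {
            System.out.println(s + " word=" + WORD.matcher(s).matches()
                    + " alnum=" + ALNUM.matcher(s).matches());
        }
    }
}
```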

2 Comments

\w also matches _ symbols.
Valid point @WiktorStribiżew, would have to change all \w to [A-Za-z0-9] instead. Thanks!
