I have an input field where both regular text and sprintf tags can be entered.

Example: some text here. %1$s done %2$d times

How do I validate the sprintf parts so that it's not possible to get them wrong, like %$1s? The text is UTF-8, and as far as I know regex only matches Latin-1 characters.

www.regular-expressions.info does not list /u anywhere, which I think is the modifier that tells the engine the string is Unicode.

Is the best way to just search the whole input string for % or $ and, if either is found, apply the regex to validate the sprintf parts?

I think the regex would be: /%\d\$(s|d|u|f)/u

3 Answers

8

I originally used Gumbo's regex to parse sprintf directives, but I immediately ran into a problem when trying to parse something like %1.2f. I ended up going back to PHP's sprintf manual and wrote the regex according to its rules. I'm by no means a regex expert, so I'm not sure if this is the cleanest way to write it:

/%(?:\d+\$)?[+-]?(?:[ 0]|'.{1})?-?\d*(?:\.\d+)?[bcdeEufFgGosxX]/
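As a quick check of the pattern above, here is a minimal sketch that runs it with preg_match_all over a sample string. The sample text and variable names are made up for illustration and are not part of the original answer.

```php
<?php
// Pattern from the answer above, unchanged. Only the single quote is escaped,
// because the pattern sits inside a single-quoted PHP string.
$pattern = '/%(?:\d+\$)?[+-]?(?:[ 0]|\'.{1})?-?\d*(?:\.\d+)?[bcdeEufFgGosxX]/';

// Hypothetical input mixing plain text with three directives.
$input = 'some text here. %1$s done %2$d times, ratio %1.2f';

// Collect every directive the pattern recognises.
preg_match_all($pattern, $input, $matches);
echo implode(', ', $matches[0]), PHP_EOL; // %1$s, %2$d, %1.2f

// The malformed directive from the question is not matched.
var_dump(preg_match($pattern, '%$1s')); // int(0)
```

Note that the pattern is not anchored, so it finds valid directives but does not by itself report a stray % that is followed by something invalid.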

2

The UTF-8 modifier is not necessary unless you use UTF-8 in your pattern. Besides that, the sprintf format is more complex. Try the following:

/%(?:\d+\$)?[dfsu]/

This would match both the %s and %1$s format.

But if you want to check every occurrence of % and whether a valid sprintf() format is following, regular expressions would not be a good choice. A sequential parser would be better.
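One way a sequential parser could look is sketched below. It is only an illustration: the function name findInvalidDirectives is hypothetical, and the accepted grammar is an assumption that mirrors the simplified pattern above (an optional argument number followed by d, f, s or u, plus %% as an escaped percent sign). It walks the string once and records the offset of every % that does not start a valid directive.

```php
<?php
/**
 * Returns the byte offsets of every '%' that does not start a valid directive.
 * Assumed grammar: "%%", or "%" [digits "$"] followed by one of d, f, s, u.
 * Scanning bytes is safe for UTF-8 input, because the byte for '%' never
 * occurs inside a multi-byte UTF-8 sequence.
 */
function findInvalidDirectives(string $input): array
{
    $invalid = [];
    $length = strlen($input);
    $i = 0;

    while ($i < $length) {
        if ($input[$i] !== '%') {
            $i++;
            continue;
        }

        $start = $i;
        $i++; // step past the '%'

        // "%%" is an escaped percent sign, not a directive.
        if ($i < $length && $input[$i] === '%') {
            $i++;
            continue;
        }

        // Optional argument number: one or more digits followed by '$'.
        $j = $i;
        while ($j < $length && ctype_digit($input[$j])) {
            $j++;
        }
        if ($j > $i) {
            if ($j < $length && $input[$j] === '$') {
                $i = $j + 1;
            } else {
                $invalid[] = $start; // digits without a closing '$'
                continue;
            }
        }

        // Mandatory conversion character.
        if ($i < $length && in_array($input[$i], ['d', 'f', 's', 'u'], true)) {
            $i++;
        } else {
            $invalid[] = $start;
        }
    }

    return $invalid;
}

var_export(findInvalidDirectives('some text here. %1$s done %2$d times'));
echo PHP_EOL;
var_export(findInvalidDirectives('bad %$1s here'));
echo PHP_EOL;
```

For the first call the result is an empty array; for the second it contains the offset 4, where the malformed %$1s starts. Supporting width, precision or padding would mean extending the grammar step by step in the same way.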

2 Comments

A sequential parser? I can use preg_match_all to find all %-words, but I'm having problems stopping it at the first space or EOL. Using the above example, I get an array with two entries: [0]="%1$s done ", [1]="%2$d times". $realRegEx=Explode(" ",[0]) works, but there must be some way to do it with regex.
'/(?<!%)%(?:\d+\$)?[dfsu]/' will prevent an escaped percent, e.g. '%%s' from being caught.
0

This is what I ended up with, and it's working.

// Always use server validation even if you have JS validation
if (!isset($_POST['input']) || empty($_POST['input'])) {
  // Do stuff
} else {
  $matches = explode(' ',$_POST['input']);
  $validInput = true;

  foreach ($matches as $m) {
    // Check if a slice contains %$[number] as it indicates a sprintf format
    if (preg_match('/[%\d\$]+/',$m) > 0) {
      // Match found. Now check if it's a valid sprintf format
      if ($validInput === false || preg_match('/^%(?:\d+\$)?[dfsu]$/u',$m)===0) {   // no match found
        $validInput = false;
        break; // Invalid sprintf format found. Abort
      }
    }
  }

  if ($validInput === false) {
    // Do stuff when input is NOT valid
  }
}

Thank you Gumbo for the regex pattern that matches both with and without order marking.

Edit: I realized that searching for % alone is wrong, since nothing will be checked if it's forgotten or omitted. The code above is the new version.

"$validInput === false ||" can be omitted in the inner if-statement, since the loop breaks as soon as $validInput becomes false, but I included it for completeness.

3 Comments

You should change the first regex to “/%[^\s]*/”, since the formats can also occur at the end of a string and therefore have no following whitespace. And the second should be changed to “/^%(?:\d+\$)?[dfsu]$/”, otherwise “%%1$s” would also be accepted as valid.
So the text “10 apples cost $4” would have two invalid sprintf formats, as both “10” and “$4” would match the first but not the second regex. That’s not a good idea, is it?
No, it's okay, since that text should be "%u %s cost %s".
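The two patterns suggested in the first comment can be combined without splitting on spaces. The following is a rough sketch under that assumption: invalidPercentTokens is a hypothetical helper name, “/%\S*/” is equivalent to the suggested “/%[^\s]*/”, and the sample strings are only illustrative.

```php
<?php
// Hypothetical helper: collects every run that starts with '%' and extends
// to the next whitespace (or end of string), then returns the runs that are
// not a valid directive according to the anchored pattern.
function invalidPercentTokens(string $input): array
{
    preg_match_all('/%\S*/', $input, $found);

    $invalid = [];
    foreach ($found[0] as $token) {
        if (preg_match('/^%(?:\d+\$)?[dfsu]$/', $token) !== 1) {
            $invalid[] = $token;
        }
    }

    return $invalid;
}

var_export(invalidPercentTokens('some text here. %1$s done %2$d times'));
echo PHP_EOL;
var_export(invalidPercentTokens('wrong order %$1s'));
echo PHP_EOL;
```

The first call yields an empty array and the second yields the single token %$1s. Because only runs starting with % are inspected, a text such as “10 apples cost $4” is no longer flagged. One limitation of this sketch is that a directive directly followed by punctuation, such as “%s.”, is reported as invalid.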
