I'm after a bit of regex to be used in PHP to validate a UNC path passed through a form. It should be of the format:

\\server\something

... and allow for further sub-folders. It might be good to strip off a trailing slash for consistency although I can easily do this with substr if need be.

I've read online that matching a single backslash in PHP requires 4 backslashes (when using a "C like string"), and I think I understand why: PHP's string escaping halves them first (2 = 1, so 4 = 2), then the regex engine's escaping halves them again (the remaining 2 = 1). I've seen the following two quoted as equivalent regexes that match a single backslash:

$regex = "/\\\\/s";

or apparently this also:

$regex = "/[\\]/s";

However these produce different results, and that is slightly aside from my final aim to match a complete UNC path.

To see if I could match two backslashes I used the following to test:

$path = "\\\\server";
echo "the path is: $path <br />"; // which is \\server
$regex = "/\\\\\\\\/s";
if (preg_match($regex, $path)) 
{
    echo "matched";
}
else
{
    echo "not matched";
}

The above however seems to match on two or more backslashes :( The pattern is 8 slashes, translating to 2, so why would an input of 3 backslashes ($path = "\\\\\\server") match?

I thought perhaps the following would work:

$regex = "/[\\][\\]/s";

and again, no :(

Please help before I jump out a window lol :)

2 Answers

Use this little gem:

$UNC_regex = '=^\\\\\\\\[a-zA-Z0-9-]+(\\\\[a-zA-Z0-9`~!@#$%^&(){}\'._-]+([ ]+[a-zA-Z0-9`~!@#$%^&(){}\'._-]+)*)+$=s';

Source: http://regexlib.com/REDetails.aspx?regexp_id=2285 (adapted to PHP string escaping)

The RegEx shown above validates the hostname (which allows only a few characters) and the path part after the hostname (which allows many, but not all, characters).
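As a rough sketch of how this pattern could be used on form input, the snippet below strips trailing backslashes (as the question suggests) before validating. The helper name normalizeUncPath is hypothetical and only for illustration; the pattern itself is copied unchanged from above.

```php
<?php
// The pattern from the answer, unchanged.
$UNC_regex = '=^\\\\\\\\[a-zA-Z0-9-]+(\\\\[a-zA-Z0-9`~!@#$%^&(){}\'._-]+([ ]+[a-zA-Z0-9`~!@#$%^&(){}\'._-]+)*)+$=s';

// Hypothetical helper: returns the normalized path, or null if it does not match.
function normalizeUncPath(string $path, string $regex): ?string
{
    // Drop trailing backslashes so "\\server\share\" and "\\server\share" are treated alike.
    $trimmed = rtrim($path, '\\');

    return preg_match($regex, $trimmed) === 1 ? $trimmed : null;
}

var_dump(normalizeUncPath('\\\\server\\something\\', $UNC_regex));   // trailing backslash is removed first
var_dump(normalizeUncPath('\\\\\\server\\something', $UNC_regex));   // three leading backslashes
```

Returning null rather than false keeps the "invalid" case distinct from any string result; that is a design choice for this sketch, not something the original answer prescribes.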


Sidenote on the backslashes issue:

  • When you use double quotes (") to enclose your string, you must be aware of PHP's special character escaping: "\\" is a single \ in PHP.

  • Important: even with single quotes (') those backslashes must be escaped.
    A PHP string with single quotes takes everything in the string literally (unescaped) with a few exceptions:
    1. A backslash followed by a backslash (\\) is interpreted as a single backslash.
      ('C:\\*.*' => C:\*.*)
    2. A backslash followed by a single-quote (\') is interpreted as a single quote.
      ('I\'ll be back' => I'll be back)
    3. A backslash followed by anything else is interpreted as a backslash.
      ('Just a \ somewhere' => Just a \ somewhere)

  • Also, you must be aware of PCRE escape sequences.
    The RegEx parser uses \ to introduce escape sequences and character classes such as \d, so a literal backslash must be escaped for the RegEx again.
    To match two \\ you must write $regex = "\\\\\\\\" or $regex = '\\\\\\\\'

    From the PHP docs on PCRE escape sequences:

    Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
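The two escaping layers described in this sidenote can be checked directly. The following is a small sketch (variable names are illustrative) that prints what PHP actually stores for an eight-backslash literal and confirms that single and double quotes give the same pattern here.

```php
<?php
// Eight backslashes in PHP source become four characters in the string,
// which the regex engine then reads as two literal backslashes.
$regex = '/\\\\\\\\/';

echo strlen('\\\\\\\\'), "\n";   // number of backslash characters actually stored
echo $regex, "\n";               // the pattern exactly as PCRE receives it

// Single and double quotes produce the same string for this pattern.
var_dump('/\\\\\\\\/' === "/\\\\\\\\/");

// The subject '\\\\server' holds two backslashes, so the pattern finds them.
var_dump(preg_match($regex, '\\\\server'));
```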


Regarding your Question:

why would an input of 3 backslashes ($path = "\\\\\\server") match with regex "/\\\\\\\\/s"?

The reason is that you have no boundaries defined (use ^ for beginning and $ for end of string), thus it finds \\ "somewhere" resulting in a positive match. To get the expected result, you should do something like this:

$regex = '/^\\\\\\\\[^\\\\]/s';

The RegEx above has 2 modifications:

  • ^ at the beginning to only match two \\ at the beginning of the string
  • [^\\] negative character class to say: not followed by an additional backslash
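To see the effect of the two modifications, the sketch below runs the unanchored pattern from the question and the anchored pattern above against subjects with one, two and three leading backslashes. The variable names are illustrative only.

```php
<?php
$unanchored = '/\\\\\\\\/s';            // two backslashes anywhere in the subject
$anchored   = '/^\\\\\\\\[^\\\\]/s';    // exactly two at the start, then a non-backslash

$paths = [
    '\\server',        // one leading backslash
    '\\\\server',      // two leading backslashes
    '\\\\\\server',    // three leading backslashes
];

foreach ($paths as $path) {
    printf(
        "%-12s unanchored: %d  anchored: %d\n",
        $path,
        preg_match($unanchored, $path),
        preg_match($anchored, $path)
    );
}
```

Only the anchored pattern rejects the three-backslash subject; the unanchored one still finds two adjacent backslashes inside it.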

Regarding your last RegEx:

$regex = "/[\\][\\]/s";

You have a confusion (see above for clarification) with backslash escaping here. "/[\\][\\]/s" is interpreted by PHP as /[\][\]/s. In a RegEx, \] is an escaped closing bracket, so the character class is never closed and the pattern fails to compile; a literal \ inside the class must itself be escaped.

This variant of your RegEx would work, but it would also match any occurrence of two backslashes, for the same reason I already explained above:

$regex = '/[\\\\][\\\\]/s';
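A quick way to compare the two character-class variants is to run both. This sketch assumes the usual preg_match() behaviour of returning false (with a warning) when a pattern does not compile; the @ operator is used here only to keep the warning out of the output.

```php
<?php
$broken  = "/[\\][\\]/s";       // PCRE receives /[\][\]/s
$working = '/[\\\\][\\\\]/s';   // PCRE receives /[\\][\\]/s

// An invalid pattern makes preg_match() return false instead of 0 or 1.
var_dump(@preg_match($broken, '\\\\server'));

// The corrected pattern matches two backslashes, and also any longer run of them.
var_dump(preg_match($working, '\\\\server'));
var_dump(preg_match($working, '\\\\\\server'));
```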

9 Comments

Thank you for your explanation :) I tried the regex from regexlib but it broke things. In Notepad++ a regex encapsulated by '' is usually entirely grey. This is grey, up to: $UNC_regex = '=^\\\[a-zA-Z0-9-]+\[a-zA-Z0-9~!@#$%^&(){}' characters after this are coloured: ._-]+([ ]+[a-zA-Z0-9~!@#$%^&(){}'._-]+)*$=s'; Additionally, taking into consideration what you've said, why do these all report not matched? I would have thought this would match the second if/else: pastebin.com/BSnJrnFQ
Haha, thanks Kaii! Now my understanding matches up with my experience / results. Would I be right in thinking that the $UNC_regex actually has to be modified to: $regex = '=^\\\\\\\[a-zA-Z0-9-]+\\\[a-zA-Z0-9~!@#$%^&(){}\'._-]+([ ]+[a-zA-Z0-9~!@#$%^&(){}\'._-]+)*$=s'; in order to work? (added additional backslashes). It appears to work when I do this.
It appears that stackoverflow strips a single backslash when pasting in that regex. I do have 8 and 4 in my actual code!
Ok, so now I have: pastebin.com/KSByxEwC - using the longer more complex regex we get "not matched" on all 3 conditions. Using $regex = '/^\\\\\\\\[^\\\\]/s'; it works "correctly" (not matched, matched, not matched) but of course doesn't validate any characters other than slashes.
This answer is incorrect and the top result on Google. According to msdn.microsoft.com/en-us/library/gg465305.aspx the UNC path is allowed to have IPv6 addresses or IPv4 addresses and the reg-name part according to RFC3986 can contain "-" / "." / "_" / "~".

Echo your regex as well, so you can see the actual pattern and verify it's correct. Writing those backslashes inside PHP strings can become awkward.

Also you should put ^ at the beginning of the pattern to match from string start and $ to the end to specify that the whole string has to be matched.

\\server\something

Regex:

 ~^\\\\server\\something$~

PHP String:

$pattern = '~^\\\\\\\\server\\\\something$~';

For the repetition, you want to say that a server exists and it's followed by one or more \something parts. If server is like something, this can be simplified:

^\\(?:\\[a-z]+){2,}$

PHP String:

$pattern = '~^\\\\(?:\\\\[a-z]+){2,}$~';
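A minimal sketch of that advice in practice: echo the pattern to see what the regex engine receives, then try it against a few sample paths. The sample paths are made up for illustration, and note that [a-z]+ only accepts lowercase letters, so real server and share names would need a wider character class.

```php
<?php
$pattern = '~^\\\\(?:\\\\[a-z]+){2,}$~';

// Printing the pattern shows what the regex engine actually receives.
echo $pattern, "\n";

$candidates = [
    '\\\\server\\something',
    '\\\\server\\something\\deeper',
    '\\\\server',                // server only, no share part
    '\\\\\\server\\something',   // three leading backslashes
];

foreach ($candidates as $candidate) {
    $verdict = preg_match($pattern, $candidate) === 1 ? 'matched' : 'not matched';
    echo $candidate, ' => ', $verdict, "\n";
}
```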

As there was some confusion about how \ characters should be written inside single quoted strings:

# Output:
#
# * Definition as '\\' ....... results in string(1) "\"
# * Definition as '\\\\' ..... results in string(2) "\\"
# * Definition as '\\\\\\' ... results in string(3) "\\\"

$slashes = array(
    '\\',
    '\\\\',
    '\\\\\\',
);

foreach($slashes as $i => $slashed) {
    $definition = sprintf('%s ', var_export($slashed, 1));
    ob_start();
    var_dump($slashed);
    $result = rtrim(ob_get_clean());    
    printf(" * Definition as %'.-12s results in %s\n", $definition, $result);
}

4 Comments

@Kaii: No, I don't think so. Prove me wrong, but backslash escaping in single quoted strings should work as shown in the examples, see Strings: Single quoted.
yes, i just realized that myself and reworked my own answer. i was about to roll back my edit, but you already did :)
@Kaii: I added a chunk of code, it's just that the massive amount of slashes can become awkward when writing the patterns ... ;)
Thank you hakre, and thank you for helping Kaii help me. Using pastebin.com/KSByxEwC with your regex of $regex = '~^\\\\(?:\\\\[a-z]+){2,}$~'; also results in not matched for all 3 conditions.
