
I've tried to find a Windows file path validation regex for JavaScript, but none seemed to fulfill my requirements, so I decided to build it myself.

The requirements are the following:

  • the path should not be empty
  • it may begin with x:\, x:\\, \ or //, followed by a filename (no file extension required)
  • filenames cannot include the following special characters: <>:"|?*
  • filenames cannot end with dot or space

Here is the regex I came up with: /^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i

But there are some issues:

  • it also validates filenames that include the special characters mentioned in the rules
  • it doesn't implement the last rule (cannot end with a dot or space)

const reg = /^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i;
var startList = [
  'C://test',
  'C://te?st.html',
  'C:/test',
  'C://test.html',
  'C://test/hello.html',
  'C:/test/hello.html',
  '//test',
  '/test',
  '//test.html',
  '//10.1.1.107',
  '//10.1.1.107/test.html',
  '//10.1.1.107/test/hello.html',
  '//10.1.1.107/test/hello',
  '//test/hello.txt',
  '/test/html',
  '/tes?t/html',
  '/test.html',
  'test.html',
  '//',
  '/',
  '\\\\',
  '\\',
  '/t!esrtr',
  'C:/hel**o'
];

startList.forEach(item => {
  console.log(reg.test(item) + '  >>>   ' + item);
});

  • Your regex lacks the $ anchor (end of string), so a path matches as soon as its first few characters satisfy the pattern. \. matches a literal dot and \s matches whitespace; remember to escape them in the RegExp. Commented Jul 24, 2018 at 9:37
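To illustrate the comment, here is a minimal sketch that runs the question's regex with and without a trailing $. The only change is the anchor, so this does not yet address the "cannot end with a dot or space" rule; the sample paths come from the question's list.

```javascript
// The question's pattern as-is: only the start of the string is anchored.
const unanchored = /^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i;
// Same pattern with $ added, so every character after the root must pass the class.
const anchored = /^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+$/i;

for (const path of ['C://te?st.html', 'C:/hel**o', 'C://test.html']) {
  console.log(path, unanchored.test(path), anchored.test(path));
}
// C://te?st.html true false
// C:/hel**o true false
// C://test.html true true
```

Without $, the character class only has to match one or more characters after the root, so "C://te" is enough for a match and the "?" is never examined.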

3 Answers


Unfortunately, JavaScript's regex flavour did not support lookbehinds when this was written (they were added in ES2018 and are available in current engines), but it does support lookaheads, and they are the key to constructing this regex.

Let's start from some observations:

  1. A dot, slash, backslash or space cannot be followed by another dot, slash or backslash. The set of "forbidden" characters also includes \n, because none of these characters can be the last character of the file name or of one of its segments (between dots or slashes/backslashes).

  2. The other characters allowed in the path are everything except the characters you mentioned (other than ...), but this "exclusion list" must also include a dot, slash, backslash, space and \n (the characters mentioned in point 1).

  3. After the "initial part" (C:\) there can be multiple instances of the characters mentioned in points 1 and 2.

Taking these points into account, I built the regex from 3 parts:

  • "Starting" part, matching the drive letter, a colon and up to 2 slashes (forward or backward).
  • The first alternative: a dot, slash, backslash or space, followed by a negative lookahead listing the characters "forbidden" after it (see point 1).
  • The second alternative: the characters mentioned in point 2.
  • Both the above alternatives can occur multiple times (+ quantifier).

So the regex is as follows:

  • ^ - Start of the string.
  • (?:[a-z]:)? - Drive letter and a colon, optional.
  • [\/\\]{0,2} - Either a backslash or a slash, between 0 and 2 times.
  • (?: - Start of the non-capturing group, needed due to the + quantifier after it.
    • [.\/\\ ] - The first alternative.
    • (?![.\/\\\n]) - Negative lookahead - "forbidden" chars.
  • | - Or.
    • [^<>:"|?*.\/\\ \n] - The second alternative.
  • )+ - End of the non-capturing group, may occur multiple times.
  • $ - End of the string.
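Assembled into a single JavaScript regex literal, the parts above read as follows. This is a minimal sketch that tests each path on its own (so only the i flag is used); the sample list is a selection from the question plus one doubled-slash case.

```javascript
// ^ (?:[a-z]:)? [\/\\]{0,2} (?: [.\/\\ ](?![.\/\\\n]) | [^<>:"|?*.\/\\ \n] )+ $
const pathRe = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/i;

const samples = [
  'C://test',
  'C://te?st.html',
  '//10.1.1.107/test/hello.html',
  'C:/hel**o',
  'C://test//hello', // a doubled slash after the root
];
for (const p of samples) {
  console.log(pathRe.test(p) + '  >>>   ' + p);
}
// true  >>>   C://test
// false  >>>   C://te?st.html
// true  >>>   //10.1.1.107/test/hello.html
// false  >>>   C:/hel**o
// false  >>>   C://test//hello
```

The doubled slash is only accepted at the start, where [\/\\]{0,2} consumes it; later in the string the lookahead after the first slash rejects the second one.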

If you match each path separately, use only the i flag.

If you have multiple paths on separate lines and want to match them all in one go, also add the g and m flags.
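A minimal sketch of the multi-line case, using String.prototype.match on a newline-separated list (the list itself is illustrative). With m, ^ and $ match at line boundaries, and because neither alternative matches \n, a match cannot run from one line into the next.

```javascript
// Same regex, with g (find every match) and m (^/$ at line boundaries).
const pathRe = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/gim;

const listing = [
  'C://test',
  'C://te?st.html',
  '//10.1.1.107/test.html',
  'C:/hel**o',
].join('\n');

// With g, match() returns an array of every matching line, or null if none match.
const valid = listing.match(pathRe) ?? [];
console.log(valid); // the two valid lines: C://test and //10.1.1.107/test.html
```

One design note: a regex with the g flag keeps state in lastIndex when used with test() or exec(), so reusing this g-flagged version to test individual strings one after another can give surprising results. For one-path-at-a-time checks, keep a separate copy with only i.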

For a working example see https://regex101.com/r/4JY31I/1

Note: I suppose that ! should also be treated as a forbidden character. If you agree, add it to the second alternative, e.g. after *.
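Following that note, treating ! as forbidden is a one-character change to the second alternative's negated class. A sketch (withBang is just an illustrative name):

```javascript
// Second alternative before: [^<>:"|?*.\/\\ \n]
// Second alternative after:  [^<>:"|?*!.\/\\ \n]   (! added after *)
const withBang = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*!.\/\\ \n])+$/i;

console.log(withBang.test('/t!esrtr')); // false
console.log(withBang.test('/test'));    // true
```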


3 Comments

This example incorrectly matches "\\\foo" and "///foo". According to the given requirements "\\foo" is also incorrectly matched, though I would push back on the requirements in this case since it is a valid match for an SMB server. I would also push back on "x:\\" as a valid case unless "x:\\foo\\bar" et al. are also considered valid. Double slashes anywhere other than the start of a path string are in the class "wrong but Windows will (sometimes) forgive your wrongness."
Your first premise is incorrect (on Windows), as, for example, C:..\..\folder\file.exe is an entirely valid Windows file/path declaration.
It also allows the '+' and '=' signs. Spaces and/or periods at the end of the path, if not preceded by a '\' or '/', are also not caught...
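The last comment is easy to confirm. When a single path is tested (no trailing newline), the lookahead (?![.\/\\\n]) succeeds at the very end of the string, because nothing follows the final dot or space. The tweaked lookahead below, (?![.\/\\\n]|$), is my own assumption and not part of the answer: it also fails at end of input. Note that it rejects a trailing slash or backslash as well, since those share the first alternative; whether that is acceptable depends on whether folder paths like C:/test/ should pass.

```javascript
const original = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/i;
// Hypothetical variant: a dot, slash, backslash or space may not be the last character.
const tweaked = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n]|$)|[^<>:"|?*.\/\\ \n])+$/i;

for (const p of ['C:/test.', 'C:/test ', 'C:/test/', 'C:/test.txt']) {
  // Prints: path, result of the original regex, result of the tweak.
  console.log(JSON.stringify(p), original.test(p), tweaked.test(p));
}
```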

This may work for you: ^(?!.*[\\\/]\s+)(?!(?:.*\s|.*\.|\W+)$)(?:[a-zA-Z]:)?(?:(?:[^<>:"\|\?\*\n])+(?:\/\/|\/|\\\\|\\)?)+$

You have a demo here

Explained:

^
    (?!.*[\\\/]\s+)         # Disallow file/folder names beginning with spaces
    (?!(?:.*\s|.*\.|\W+)$)  # Disallow strings that end with a dot/space or consist only of non-word chars (e.g. bars)
    
    (?:[a-zA-Z]:)? # Drive letter (optional)
    
    (?:
          (?:[^<>:"\|\?\*\n])+  # Word (any char except the disallowed ones, repeated one or more times)
          (?:\/\/|\/|\\\\|\\)?  # Bars (// or / or \\ or \); Optional
     )+ # Repeated one or more times
     
$
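To try this answer's regex in JavaScript, paste the one-line form into a regex literal. This is a sketch with sample paths of my own choosing. One caution: the nested quantifier ((...)+(...)?)+ can backtrack heavily on long strings that fail near the end, so be careful with untrusted input.

```javascript
// One-line form of the regex above (JavaScript has no free-spacing mode).
const julioRe = /^(?!.*[\\\/]\s+)(?!(?:.*\s|.*\.|\W+)$)(?:[a-zA-Z]:)?(?:(?:[^<>:"\|\?\*\n])+(?:\/\/|\/|\\\\|\\)?)+$/;

const paths = [
  'C:\\test\\hello.html', // ordinary path (C:\test\hello.html)
  '//10.1.1.107/test',    // UNC-style path with forward slashes
  'C://te?st.html',       // forbidden character
  'test.',                // trailing dot
  'C:\\test\\ file.txt',  // segment starting with a space
  '\\',                   // a lone backslash
];
for (const p of paths) console.log(julioRe.test(p), p);
```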

11 Comments

Your regex allows a trailing space or a space before a dot, but the requirements say it should not.
At the very least, this permits '+' characters in file names, and it also doesn't balk at repeated periods or slashes...
Hi @NetXpert. + is a valid character for a filename, and the OP didn't say he wanted that character excluded. Also, repeated bars are valid. Try typing C:\\\\\\\Windows into Explorer, for example; it should work.
@Julio -- FWIW, "+" was a file-concatenator operator in DOS, and so it is an invalid character in base (short) file names for the same reason that "<", ">" and "|" are. (See the section "In addition, short file names must not contain the following characters" here learn.microsoft.com/en-us/windows/win32/msi/filename). IMO, if you're invalidating the redirection operators, you should reject the concatenator too. ...and, yes, they've basically compensated for people adding in superfluous backslashes, but that doesn't make them actually "correct".
@Valdi_Bo Where is it stated that trailing spaces or a space before a dot shouldn't be allowed?

Since this post seems to be (one of) the top result(s) in a search for a RegEx Windows path validation pattern, and given the caveats / weaknesses of the above proposed solutions, I'll include the solution that I use for validating Windows paths (and which, I believe, addresses all of the points raised previously in that use-case).

I could not come up with a single viable regex, with or without lookaheads and lookbehinds, that would do the job, but I could do it with two, without any lookaheads or lookbehinds!

Note, though, that successive relative paths (e.g. "..\..\folder\file.exe") will not pass this pattern (though using "..\" or ".\" at the beginning of the string will). Periods and spaces before and after slashes, or at the end of the line, are rejected, as is any character not permitted by Microsoft's short-filename specification: https://learn.microsoft.com/en-us/windows/win32/msi/filename

First Pattern:

^   (?# <- Start at the beginning of the line #)
    (?# validate the opening drive or path delimiter, if present -> #)
        (?: (?# "C:", "C:\", "C:..\", "C:.\" -> #)
                (?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)
            | (?# or "\", "..\", ".\", "\\" -> #)
                (?:[\/\\]{1,2}|\.{1,2}[\/\\])
        )?
    (?# validate the form and content of the body -> #)
        (?:[^\x00-\x1A|*?\v\r\n\f+\/,;"'`\\:<>=[\]]+[\/\\]?)+
$   (?# <- End at the end of the line. #)

This will generally validate the path structure and character validity, but it also allows problematic things like double periods, double backslashes, and periods and backslashes that are preceded and/or followed by spaces or periods. Paths that end with spaces and/or periods are also permitted. To address these problems, I perform a second test with another (similar) pattern:

^   (?# <- Start at the beginning of the line #)
    (?# validate the opening drive or path delimiter, if present -> #)
        (?: (?# "C:", "C:\", "C:..\", "C:.\" -> #)
                (?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)
            | (?# or "\", "..\", ".\", "\\" -> #)
                (?:[\/\\]{1,2}|\.{1,2}[\/\\])
        )?
    (?# ensure that undesired patterns aren't present in the string -> #)
        (?:([^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*
    [^\x00-\x1A|*?\s+,;"'`:<.>=[\]]) (?# <- Ensure that the last character is valid #)
$   (?# <- End at the end of the line. #)

This validates that, within the path body, no multiple-periods, multiple-slashes, period-slashes, space-slashes, slash-spaces or slash-periods occur, and that the path doesn't end with an invalid character. Annoyingly, I have to re-validate the <root> group because it's the one place where some of these combinations are allowed (i.e. ".\", "\\", and "..\") and I don't want those to invalidate the pattern.
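The second pattern can be tried in JavaScript as well. In this sketch the comments and layout are stripped (JavaScript supports neither (?# ) comments nor a free-spacing flag), and the sample paths are illustrative. It rejects, among others, a doubled period before a separator and a trailing space, both of which pass the first pattern. Tracing it by hand also suggests a limitation worth checking against your own data: paths with one-character segments, such as C:\a\b\c.txt, are rejected too, because the [^\/. \\][\/. \\][^\/. \\] alternative consumes the character on both sides of a separator, so a single character cannot close one segment and open the next.

```javascript
// Second pattern without comments. Case-sensitive, as in the C# code: drive letters are A-Z.
const shapeRe = /^(?:(?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)|(?:[\/\\]{1,2}|\.{1,2}[\/\\]))?(?:(?:[^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*[^\x00-\x1A|*?\s+,;"'`:<.>=[\]])$/;

const cases = [
  'C:\\folder\\file.exe',   // ordinary path
  'C:\\folder..\\file.exe', // doubled period before a separator
  'C:\\folder\\file.exe ',  // trailing space
  'C:\\folder\\\\file.exe', // doubled backslash in the body
  'C:\\a\\b\\c.txt',        // one-character segments
];
for (const p of cases) console.log(shapeRe.test(p), p);
```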

Here is an implementation of my test (in C#):

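using System.IO;
using System.Text.RegularExpressions;

/// <summary>Performs pattern testing on a string to see if it's in a form recognizable as a Windows path.</summary>
/// <param name="test">The string to test.</param>
/// <param name="testExists">If TRUE, this also verifies that the specified file or directory exists.</param>
/// <returns>TRUE if the contents of the passed string are valid, and, if requested, the path exists.</returns>
public bool ValidatePath( string test, bool testExists = false )
{
    // Regex.IsMatch throws on a null input, so bail out before any matching.
    if (string.IsNullOrWhiteSpace(test)) return false;

    // Shared prefix: optional drive ("C:", "C:\", "C:.\", "C:..\") or root ("\", "\\", ".\", "..\").
    string drivePattern = /* language=regex */
        @"^(([A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)|([\/\\]{1,2}|\.{1,2}[\/\\]))?";

    // Test 1: overall structure and permitted characters.
    string pattern = drivePattern + /* language=regex */
        @"([^\x00-\x1A|*?\t\v\f\r\n+\/,;""'`\\:<>=[\]]+[\/\\]?)+$";
    if (!Regex.IsMatch( test, pattern, RegexOptions.ExplicitCapture )) return false;

    // Test 2: no doubled separators/periods, no space or period next to a separator,
    // and a valid final character.
    pattern = drivePattern + /* language=regex */
        @"(([^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*[^\x00-\x1A|*?\s+,;""'`:<.>=[\]])$";
    if (!Regex.IsMatch( test, pattern, RegexOptions.ExplicitCapture )) return false;

    // The patterns accept file paths as well as folder paths, so check for either.
    return !testExists || File.Exists( test ) || Directory.Exists( test );
}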
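Since the question asks about JavaScript, here is a rough port of the two-step check. It is a sketch under a few assumptions: validateWindowsPath is a hypothetical name, the existence check is left out (browser JavaScript cannot query the file system), and non-capturing (?:...) groups stand in for RegexOptions.ExplicitCapture, which makes no difference when only test() is used. Like the C# version it is case-sensitive, so the drive letter must be uppercase.

```javascript
// Shared optional root: "C:", "C:\", "C:.\", "C:..\", "\", "\\", ".\", "..\" (either slash).
const root = /^(?:(?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)|(?:[\/\\]{1,2}|\.{1,2}[\/\\]))?/;

// Test 1: overall structure and permitted characters.
const structure = new RegExp(root.source +
  /(?:[^\x00-\x1A|*?\t\v\f\r\n+\/,;"'`\\:<>=[\]]+[\/\\]?)+$/.source);

// Test 2: no doubled separators/periods, no space or period next to a separator,
// and a valid final character.
const shape = new RegExp(root.source +
  /(?:(?:[^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*[^\x00-\x1A|*?\s+,;"'`:<.>=[\]])$/.source);

function validateWindowsPath(test) {
  // Roughly mirrors string.IsNullOrWhiteSpace in the C# version.
  if (typeof test !== 'string' || test.trim() === '') return false;
  return structure.test(test) && shape.test(test);
}

console.log(validateWindowsPath('C:\\folder\\file.exe'));   // true
console.log(validateWindowsPath('..\\..\\folder\\file.exe')); // false (successive relative parts)
```

The regex literals are combined through their .source strings rather than a template literal because the character classes contain a backtick.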

1 Comment

This is the regex without the comments: ^(?:(?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)|(?:[\/\\]{1,2}|\.{1,2}[\/\\]))?(?:([^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*[^\x00-\x1A|*?\s+,;"'`:<.>=[\]])$
