3

I'm having trouble determining whether a file name conforms to the specific convention below, using a regular expression in C# (.NET 4.0).

Valid Format: xxxxT_SSS_sss[i]_t#y.png
where
// x = Any single character.
// T = Digit: 1 to 7 inclusive.
// _SSS = Positive Integer: 000 to 999 inclusive. Always padded with leading zeros.
// _sss = Positive Integer: 000 to 999 inclusive. Always padded with leading zeros.
// i = Random text of any length including any characters. Will always be enclosed in square [] brackets. Optional.
// _t = Positive Integer: 0 to 999 inclusive. Not padded. Optional.
// #y = Positive Integer: 0 to 999 inclusive. Not padded. Optional.

UPDATE

Valid file names:
File1_000_000.png
File1_000_000_1.png
File1_000_000#2.png
File1_000_000_1#2.png
File1_000_000[text].png
File1_000_000[text]_1.png
File1_000_000[text]#2.png
File1_000_000[text]_1#2.png

The regex I've been trying is: ^(.{4}\\d_\\d{3}_\\d{3}(\\[\\w\\s]+\\])?(_\\d{1,3})?(\\#\\d{1,3})?)

This returns true for all the sample file names, but if I change File1_000_000[text]_1#2.png to File1_000_000[text]_#2.png by deleting the digit 1, it still returns true. That name should be rejected, because the underscore is part of the optional _t segment.

4
  • Can you clarify what you mean by "character" for "x"? Digits and word characters? In the meantime, I will assume you mean legal filename characters for Windows (since this is a C# question). Commented Feb 23, 2012 at 16:48
  • "// T = Digit: 2 to 7 inclusive." - but your example is File1. Do you mean [1-7]? Commented Feb 23, 2012 at 17:20
  • "x" could be any letter, digit, whitespace or legal file character. Commented Feb 24, 2012 at 5:59
  • "T": Yes typo. It can range between 1 to 7 inclusive. Commented Feb 24, 2012 at 6:00

6 Answers

2

A question on regex that doesn't involve HTML parsing is a rarity!

Try the following:

@"^.{4}[2-7](_\d{3}){2}(\[.*?\])?(_\d{1,3})?(#\d{1,3})?\.png$"

This breaks down into:

^             Start of string
.{4}          Any character, exactly 4 times
[2-7]         A digit in the range 2-7, once
(_\d{3}){2}   An underscore followed by 3 digits, twice
(\[.*?\])?    An opening square bracket, any number of characters (matched lazily), and a closing square bracket; 0 or 1 times
(_\d{1,3})?   An underscore followed by at least 1 and up to 3 digits; 0 or 1 times
(#\d{1,3})?   A pound sign (#) followed by at least 1 and up to 3 digits; 0 or 1 times
\.png$        Ending in .png
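
As a rough check of the pattern above, the sketch below runs it against a few of the file names from the question. It assumes `[1-7]` rather than `[2-7]`, following the asker's later clarification in the comments. The class and method names (`FileNameCheck`, `IsValid`) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameCheck
{
    // Assumption: T ranges over 1-7 (per the asker's clarification),
    // so [1-7] is used here in place of [2-7].
    private static readonly Regex Pattern = new Regex(
        @"^.{4}[1-7](_\d{3}){2}(\[.*?\])?(_\d{1,3})?(#\d{1,3})?\.png$");

    // Hypothetical helper: true when the whole file name fits the convention.
    public static bool IsValid(string fileName)
    {
        return Pattern.IsMatch(fileName);
    }

    public static void Main()
    {
        string[] samples =
        {
            "File1_000_000.png",
            "File1_000_000[text]_1#2.png",
            "File1_000_000[text]_#2.png" // underscore with no digits after it
        };

        foreach (string name in samples)
        {
            Console.WriteLine("{0} -> {1}", name, IsValid(name));
        }
    }
}
```

Note that in .NET `\d` matches any Unicode decimal digit by default. If only ASCII digits should be accepted, `[0-9]` (or `RegexOptions.ECMAScript`) is the stricter choice.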

3 Comments

Regex tips: [\d] = \d, {0,1} = ?. Also, you need to escape the . in .png.
Thanks. Is there a way to extract these matched parts using the same RegEx instance? I've been manually parsing otherwise.
@RaheelKhan Yes, any portion in parentheses forms a capture group. Have a look at msdn.microsoft.com/en-us/library/bs2twtah.aspx for some more information on grouping.
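
To illustrate the capture-group point from the comments above, here is a minimal sketch that pulls the individual parts out of a matched file name using named groups. The group names (`prefix`, `type`, `first`, `second`, `text`, `t`, `y`) and the `FileNameParser` class are hypothetical, and `[1-7]` is assumed per the asker's clarification.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameParser
{
    // Named groups capture each part; (?: ... ) wrappers group without capturing.
    // Assumption: T ranges over 1-7, so [1-7] is used instead of [2-7].
    private static readonly Regex Pattern = new Regex(
        @"^(?<prefix>.{4})(?<type>[1-7])_(?<first>\d{3})_(?<second>\d{3})" +
        @"(?:\[(?<text>.*?)\])?(?:_(?<t>\d{1,3}))?(?:#(?<y>\d{1,3}))?\.png$");

    // Returns the parts joined with '|', or null when the name does not match.
    // Optional parts that are absent come back as empty strings.
    public static string Describe(string fileName)
    {
        Match m = Pattern.Match(fileName);
        if (!m.Success)
        {
            return null;
        }

        return string.Join("|", new[]
        {
            m.Groups["prefix"].Value,
            m.Groups["type"].Value,
            m.Groups["first"].Value,
            m.Groups["second"].Value,
            m.Groups["text"].Value,
            m.Groups["t"].Value,
            m.Groups["y"].Value
        });
    }

    public static void Main()
    {
        Console.WriteLine(Describe("File1_000_000[text]_1#2.png"));
    }
}
```

To tell an absent optional part from an empty one, `m.Groups["text"].Success` can be checked before reading `Value`.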
2

I'll just rewrite one, here:

^.{4}[2-7](_\d{3}){2}(\[[^\]]*\])?(_\d{1,3})?(\#\d{1,3})?\.png$

The problem right now is that you're not matching .png and you're not anchoring the end, so the match can succeed on just a prefix of the file name. Also, you can avoid the double-escaping by prefixing your string with @:

@"^.{4}[2-7](_\d{3}){2}(\[[^\]]*\])?(_\d{1,3})?(\#\d{1,3})?\.png$"
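
The effect of the missing end anchor can be seen in a small sketch like the one below. It compares a start-anchored-only pattern with the fully anchored one on the problematic name from the question. `AnchorDemo` and its members are illustrative names, and `[1-7]` is assumed per the asker's clarification.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Same body as the pattern above, but with no \.png$ at the end.
    private static readonly Regex StartOnly = new Regex(
        @"^.{4}[1-7](_\d{3}){2}(\[[^\]]*\])?(_\d{1,3})?(\#\d{1,3})?");

    // Fully anchored: the whole file name has to be consumed.
    private static readonly Regex Anchored = new Regex(
        @"^.{4}[1-7](_\d{3}){2}(\[[^\]]*\])?(_\d{1,3})?(\#\d{1,3})?\.png$");

    // The text the start-only pattern actually consumed (empty if no match).
    public static string MatchedPrefix(string fileName)
    {
        return StartOnly.Match(fileName).Value;
    }

    public static bool IsValid(string fileName)
    {
        return Anchored.IsMatch(fileName);
    }

    public static void Main()
    {
        string name = "File1_000_000[text]_#2.png";

        // The optional groups simply match nothing, and the match stops early.
        Console.WriteLine("start-only matched: '{0}'", MatchedPrefix(name));
        Console.WriteLine("anchored is valid:  {0}", IsValid(name));
    }
}
```

Because every trailing group is optional, the start-only pattern is satisfied as soon as the mandatory `xxxxT_SSS_sss` part is present, whatever follows it.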

2 Comments

If I add parentheses to enclose the content after the start and before the end, like so: @"^(.{4}[2-7](_\d{3}){2}(\[[^\]]*\])?(_\d{1,3})?(\#\d{1,3})?\.png)$", does that have any implications?
@RaheelKhan: No. Why would you do it, though?
1

Based on your "valid format", this will do the trick:

^(?i)([a-z]{4}[2-7](_\d{3}){2}(\[.*?[^0-9]\])?(_\d{1,3}?)?(#\d{1,3}?)?\.png)$

Remove (?i) to make the match case-sensitive, and change [2-7] to [1-7] so that it matches the files you gave (you said valid values were 2-7, but your sample files are File1...).

0

This unit test will fail because you have specified that the pattern must have a digit 2-7 in the fifth character position (index 4), whereas File1 has 1.

[Test]
public void StackOverflow()
{
    Regex pattern = new Regex(@"^.{4}[2-7](_\d{3}){2}(\[[^\]]+\])?(_\d{1,3})?(#\d{1,3})?\.png$");

    Assert.IsTrue(pattern.IsMatch("File1_000_000.png"));
    Assert.IsTrue(pattern.IsMatch("File1_000_000_1.png"));
    Assert.IsTrue(pattern.IsMatch("File1_000_000#2.png"));
    Assert.IsTrue(pattern.IsMatch("File1_000_000_1#2.png"));
    Assert.IsTrue(pattern.IsMatch("File1_000_000[text].png"));
    Assert.IsTrue(pattern.IsMatch("File1_000_000[text]_1.png"));
    Assert.IsTrue(pattern.IsMatch("File1_000_000[text]#2.png"));
    Assert.IsTrue(pattern.IsMatch("File1_000_000[text]_1#2.png"));

    Assert.IsFalse(pattern.IsMatch("File1_000_000[text]_#2.png"));
}

0

This one should work:

^.{4}[2-7](_[0-9]{3}){2}(\[.+?\])?(_[1-9][0-9]{0,2})?(#[1-9][0-9]{0,2})?\.png$
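
The `[1-9][0-9]{0,2}` groups in the pattern above reject leading zeros, which fits the "not padded" rule, but they also reject a bare `0`, which the question lists as valid (0 to 999). A possible variant, sketched below, allows either a single `0` or a number with no leading zero. It assumes that "not padded" means "no leading zeros" and uses `[1-7]` per the asker's clarification; `UnpaddedCheck` and `IsValid` are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnpaddedCheck
{
    // Either a lone 0, or 1-3 digits that do not start with 0.
    private const string Unpadded = @"(?:0|[1-9][0-9]{0,2})";

    // Assumption: T ranges over 1-7, and _t / #y must not have leading zeros.
    private static readonly Regex Pattern = new Regex(
        @"^.{4}[1-7](?:_[0-9]{3}){2}(?:\[.+?\])?" +
        @"(?:_" + Unpadded + @")?(?:#" + Unpadded + @")?\.png$");

    public static bool IsValid(string fileName)
    {
        return Pattern.IsMatch(fileName);
    }

    public static void Main()
    {
        string[] samples =
        {
            "File1_000_000_0.png",   // bare zero
            "File1_000_000_01.png",  // padded value
            "File1_000_000_999#0.png"
        };

        foreach (string name in samples)
        {
            Console.WriteLine("{0} -> {1}", name, IsValid(name));
        }
    }
}
```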

0
@"^.{4}[2-7](_\d{3}){2}(\[.+\])?(_\d{1,3})?(\#\d{1,3})?\.png$"
