0

I am sure that has been asked before, but I cannot find the appropriate question(s).

Being new to C#'s Regex, I want to mimic what is possible with e.g. sed and awk, where I would write s/_(20[0-9]{2})[.0-9]{1}/\1/g in order to obtain a 4-digit year after 2000 which has an underscore as prefix and a digit or a dot afterwards. The \1 refers to the value captured within the parentheses.

Example: Both file names fx_201902.csv and fx_2019.csv should give me back myYear=2019. I was not successful with:

string myYear = Regex.Replace(Path.GetFileName(x), @"_20([0-9]{2})[.0-9]{1}", "\1")

How do I have to escape? Or is this kind of replacement not possible? If so, how would I do that?

Edit: My issue is how to do the \1 in C#, in other words how to extract a regex capture group. Please forgive my typos in the original post - I am trying the new SO app and I submitted earlier than intended.

2
  • Is that a typo in the replacement string? It has "\1" but should be either @"\1" or "\\1" or "$1" Commented Feb 11, 2020 at 17:32
  • @AdrianHH: No, it was not a typo, it was me not knowing what it should be. Commented Feb 11, 2020 at 20:09
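
A minimal sketch of how the spellings from the comment above behave with Regex.Replace. It assumes .NET's documented rule that $ is the only special character in a replacement pattern, so a backslash followed by a digit is inserted literally. The variable names are mine, and the block is written as a C# 9 top-level program:

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the question's example file names.
string fileName = "fx_2019.csv";
string pattern = @"_(20[0-9]{2})";

// "$1" is the .NET substitution for the first capturing group.
string withDollar = Regex.Replace(fileName, pattern, "$1");

// @"\1" (the same string as "\\1") is inserted literally, backslash included.
string withBackslash = Regex.Replace(fileName, pattern, @"\1");

Console.WriteLine(withDollar);    // fx2019.csv
Console.WriteLine(withBackslash); // fx\1.csv
```

Note that only the matched part (_2019) is replaced here; the rest of the file name stays in the result.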

4 Answers

1

I'd suggest a more robust regex: _(20(?:0[1-9]|[1-9][0-9]))[\d.]

Explanation:

_ - match _ literally

(...) - first capturing group

20 - match 20 literally

(?:...) - non-capturing group

0[1-9]|[1-9][0-9] - alternation: match 0 followed by a digit other than 0, OR match a digit other than 0 followed by any digit - this allows you to match any year from 2001 to 2099

[\d.] - match dot or digit

And below is how you use capturing groups:

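var regex = new Regex(@"_(20(?:0[1-9]|[1-9][0-9]))[\d.]");
// Groups[1] holds the text captured by the first pair of parentheses.
Console.WriteLine(regex.Match("fx_201902.csv").Groups[1].Value); // 2019
Console.WriteLine(regex.Match("fx_20190.csv").Groups[1].Value);  // 2019
Console.WriteLine(regex.Match("fx_2019.csv").Groups[1].Value);   // 2019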
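
If a file name might not contain a year at all, it may help to check Match.Success before reading the group. The following is only a sketch under that assumption: TryGetYear is a hypothetical helper name (not part of the .NET API), the extra file names are made up for illustration, and the block is written as a C# 9 top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

var yearRegex = new Regex(@"_(20(?:0[1-9]|[1-9][0-9]))[\d.]");

// Hypothetical helper: returns false and an empty string when nothing matches.
static bool TryGetYear(Regex regex, string fileName, out string year)
{
    Match m = regex.Match(fileName);
    year = m.Success ? m.Groups[1].Value : string.Empty;
    return m.Success;
}

// Prints:
// fx_201902.csv: 2019
// fx_2019.csv: 2019
// fx_2000.csv: no year found
// readme.txt: no year found
foreach (string name in new[] { "fx_201902.csv", "fx_2019.csv", "fx_2000.csv", "readme.txt" })
{
    Console.WriteLine(TryGetYear(yearRegex, name, out string year)
        ? $"{name}: {year}"
        : $"{name}: no year found");
}
```

Without the check, Groups[1].Value on a failed match is simply an empty string, which is easy to mistake for a valid result.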

2 Comments

An alternation (\d|\.) is slower than a character class [.0-9] see stackoverflow.com/questions/22132450/…
@Nick thanks for the remark... such an obvious thing to do (a character class is even shorter in terms of code). Corrected.
1

To extract the year using Regex.Replace, you need to capture only the year part of the string into a group and replace the entire string with just the capture group. That means you need to also match the characters before and after the year using (for example)

^.*_(20[0-9]{2})[.0-9].*$

That can then be replaced with $1 e.g.

Regex r = new Regex(@"^.*_(20[0-9]{2})[.0-9].*$");
string filename = "fx_201902.csv";
string myYear = r.Replace(filename, "$1");
Console.WriteLine(myYear);
filename = "fx_2019.csv";
myYear = r.Replace(filename, "$1");
Console.WriteLine(myYear);

Output:

2019
2019

If you want to exclude the year 2000 from your match, change the regex to

^.*_(20(?:0[1-9]|[1-9][0-9]))[.0-9].*$
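
As a variation on the anchored pattern above, the group can be given a name and referenced as ${year} in the replacement. This is a sketch only: the group name year, the variable names and the readme.txt input are my own choices, and the block is written as a C# 9 top-level program. It also shows that Replace hands back the input unchanged when nothing matches.

```csharp
using System;
using System.Text.RegularExpressions;

// Same anchored pattern as above, but the capturing group is named "year".
var r = new Regex(@"^.*_(?<year>20[0-9]{2})[.0-9].*$");

string fromMonthly = r.Replace("fx_201902.csv", "${year}");
string fromYearly = r.Replace("fx_2019.csv", "${year}");

// No underscore followed by a year here, so the pattern does not match
// and Replace returns the input string as it is.
string noMatch = r.Replace("readme.txt", "${year}");

Console.WriteLine(fromMonthly); // 2019
Console.WriteLine(fromYearly);  // 2019
Console.WriteLine(noMatch);     // readme.txt
```

Because a non-matching input comes back untouched, the caller may want to verify the result (for example with Regex.IsMatch) before treating it as a year.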


1

You might use a capturing group for the first 4 digits and match what is before and after the 4 digits.

.*_(20[0-9]{2})[0-9]*\.\w+$

Explanation

  • .*_ Match any characters up to and including the last underscore
  • (20[0-9]{2}) Capture 20 and 2 digits in group 1
  • [0-9]*\. Match 0 or more digits followed by a dot
  • \w+$ Match 1 or more word chars till the end of the string.


In the replacement use:

$1

For example

string[] strings = {"fx_2019.csv", "fx_201902.csv"};
foreach (string s in strings)
{
    string myYear = Regex.Replace(s, @".*_(20[0-9]{2})[0-9]*\.\w+$", "$1");
    Console.WriteLine(myYear);
}

Output

2019
2019

4 Comments

The Regex is not my issue, the issue is how to do that in c#
@B--rian I have updated the answer with an example.
@Nick You are right of course, it is the sin of copy pasting
@Thefourthbird regrettably I have sinned that way myself too many times... :-)
0

Your second example does not contain the month's digits. If you still want to capture them, make that group optional:

Regex.Replace(Path.GetFileName(x), @"_20([1-9]{2})([.0-9]{2})?", "\1")

Note that I only added 3 characters to your query: (, ) and ?

If you want the returned value to be as expected, change the replacement from \1 to $1 as documented (with the parentheses placed correctly), and capture 2020, 2030, etc. (still excluding 2000) by using the or operator with the combinations of [0-9]{1} and [1-9]{1}:

Regex.Replace(Path.GetFileName(x), @"_(20(([1-9]{1})([0-9]{1})|([0-9]{1})([1-9]{1})))([.0-9]{2})?", "$1")

It is worth mentioning that $3 and $4 match the second-to-last and the last digit of the year when the first alternative matches (otherwise $5 and $6 do), and that $2 matches the last two digits (i.e. whichever of the combinations [1-9]{1}[0-9]{1} or [0-9]{1}[1-9]{1} matched).
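
To see what the numbered groups hold, here is a small sketch. It assumes the pattern above written with a single | for the alternation, uses variable names of my own, and is written as a C# 9 top-level program. Note that, since the pattern is not anchored, Regex.Replace only substitutes the matched part and keeps the rest of the file name.

```csharp
using System;
using System.Text.RegularExpressions;

// Groups are numbered by their opening parenthesis, left to right.
string pattern = @"_(20(([1-9]{1})([0-9]{1})|([0-9]{1})([1-9]{1})))([.0-9]{2})?";

Match m = Regex.Match("fx_201902.csv", pattern);
string year = m.Groups[1].Value;        // whole year
string lastTwo = m.Groups[2].Value;     // last two digits of the year
string thirdDigit = m.Groups[3].Value;  // set because the first alternative matched
string fourthDigit = m.Groups[4].Value;
string month = m.Groups[7].Value;       // optional two characters after the year

Console.WriteLine(year);        // 2019
Console.WriteLine(lastTwo);     // 19
Console.WriteLine(thirdDigit);  // 1
Console.WriteLine(fourthDigit); // 9
Console.WriteLine(month);       // 02

// Only "_201902" is matched and replaced; "fx" and ".csv" are kept.
string replaced = Regex.Replace("fx_201902.csv", pattern, "$1");
Console.WriteLine(replaced);    // fx2019.csv
```

To get the year alone, one option is to read m.Groups[1].Value as above instead of calling Replace.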
