Optional part of regex expression [duplicate]

Question

I could have 2 types of strings:

234_are223422_nextdate_2210.txt
234_are223422.txt

and I want a general expression that returns the string whether or not "_nextdate_2210" is present. The expression should depend on the keyword "_nextdate", because I do not know the numerical part precisely. I tried something like this:

re.search(r'[\w|\w]*?' + r'(nextdate[\w|\w]?)' + r'.txt', '234_are223422_nextdate_2210.txt')
re.search(r'[\w|\w]*?' + r'?(nextdate[\w|\w])?' + r'.txt', '234_are223422_nextdate_2210.txt')
re.search(r'[\w|\w]*?' + r'?:(nextdate[\w|\w])' + r'.txt', '234_are223422_nextdate_2210.txt')

I know this might be easy, but I could not manage to find the correct form.

Maybe review the Stack Overflow regex tag info page for guidance and answers to several beginner FAQs. It's unclear what you hope [\w|\w] will match, but it is exactly equivalent to [\w|], i.e. a character class which matches one character that is either a "word" character (\w) or the literal |.
– tripleee, commented Oct 14, 2023 at 17:36
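A quick way to check the equivalence claimed in the comment is to run both classes over a few sample characters. This is a minimal sketch; the sample characters are arbitrary choices for illustration:

```python
import re

# Inside a character class, | is a literal character and repeating \w adds nothing,
# so [\w|\w] and [\w|] describe the same set: word characters plus "|".
samples = ["a", "7", "_", "|", ".", "-", " "]

matches_a = [ch for ch in samples if re.fullmatch(r"[\w|\w]", ch)]
matches_b = [ch for ch in samples if re.fullmatch(r"[\w|]", ch)]

print(matches_a)  # ['a', '7', '_', '|']
print(matches_b)  # ['a', '7', '_', '|']
```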

Andrej Kesely · Accepted Answer · 2023-10-14 17:19:25Z

0

IIUC, you can use an optional non-capturing group (?:...)? (note the ? at the end) (Regex101 demo):

import re

text = """\
234_are223422_nextdate_2210.txt
234_are223422.txt
this_should_not_be_matched.txt"""

for m in re.findall(r"\d+_are\d+(?:_nextdate_\d+)?\.txt", text):
    print(m)

Prints:

234_are223422_nextdate_2210.txt
234_are223422.txt

answered Oct 14, 2023 at 17:19

Andrej Kesely


colidyre · Accepted Answer · 2023-10-14 15:33:37Z

An approach closer to the attempts in the OP's question is this one:

import re

string1 = "234_are223422.txt"
string2 = "234_are223422_nextdate_2210.txt"

regex = re.compile(r"\w+(?:nextdate\w+)?\.txt")

print(regex.search(string1).group())  # outputs "234_are223422.txt"
print(regex.search(string2).group())  # outputs "234_are223422_nextdate_2210.txt"

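\w+ is very broad (here it even swallows "_nextdate_2210" on its own). Consider using \d+_are\d+_? or similar.
(?:...) is a non-capturing group, which also makes it suitable for re.findall (with a capturing group, re.findall returns only the group's contents).
The trailing ? makes the whole non-capturing group optional.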
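To illustrate why a non-capturing group matters with re.findall, here is a small sketch reusing the sample filenames from the question. With a capturing group, re.findall returns the group's text for each match (an empty string when the optional group did not take part) instead of the whole match:

```python
import re

text = "234_are223422_nextdate_2210.txt\n234_are223422.txt"

# Non-capturing group: findall returns the whole match for each hit.
non_capturing = re.findall(r"\d+_are\d+(?:_nextdate_\d+)?\.txt", text)

# Capturing group: findall returns only what the group captured.
capturing = re.findall(r"\d+_are\d+(_nextdate_\d+)?\.txt", text)

print(non_capturing)  # ['234_are223422_nextdate_2210.txt', '234_are223422.txt']
print(capturing)      # ['_nextdate_2210', '']
```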

Some comments on your approaches:

You don't need to concatenate regex strings. The verbose flag (re.VERBOSE) is very helpful: it lets you spread the pattern over several lines and even comment each part of it.
With [\w|\w] you may have wanted to express \w or \W (just a guess). In that case it is easier to write . instead, which matches any character except a newline.
A non-greedy quantifier *? is not necessary if the nextdate part is optional.
With ?:(...) you probably wanted to express a non-capturing group, but the ?: has to be the first element inside the parentheses, i.e. (?:...), as in my example above.
To make something optional, put ? right after it. Applied to a group like (...)?, it makes the whole group optional; applied to a single character like a?, it makes only the a optional. A character class also stands for a single character, so [\w|\w]? matches at most one character (I'm not sure what you want to achieve with this expression).
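As a sketch of the verbose-flag suggestion, the pattern from the first answer can be written with re.VERBOSE and a comment per part. The layout and comments are only one possible way to document it:

```python
import re

# re.VERBOSE ignores unescaped whitespace in the pattern and lets "#" start a comment,
# so each part of the regex can be documented on its own line.
pattern = re.compile(
    r"""
    \d+_are\d+          # leading digits, "_are", more digits
    (?:_nextdate_\d+)?  # optional non-capturing group: "_nextdate_" plus digits
    \.txt               # literal ".txt" (the dot is escaped)
    """,
    re.VERBOSE,
)

for name in ["234_are223422_nextdate_2210.txt", "234_are223422.txt", "this_should_not_be_matched.txt"]:
    m = pattern.fullmatch(name)
    print(name, "->", m.group() if m else None)
```

The first two names match in full; the last one prints None.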
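The difference between making a group optional and making a single character optional can be seen with a few toy patterns. The strings "ab", "abc", "xay" and so on are made up purely for illustration:

```python
import re

# "?" applies only to the item immediately before it.
group_optional = re.compile(r"ab(cd)?")    # the whole "cd" group is optional
char_optional = re.compile(r"abcd?")       # only "d" is optional
one_char = re.compile(r"x[\w|\w]?y")       # the class stands for at most one character

print(bool(group_optional.fullmatch("ab")))    # True: "cd" may be absent entirely
print(bool(group_optional.fullmatch("abc")))   # False: the group is all-or-nothing
print(bool(char_optional.fullmatch("ab")))     # False: "c" is still required
print(bool(char_optional.fullmatch("abc")))    # True
print(bool(one_char.fullmatch("xay")))         # True: one class character
print(bool(one_char.fullmatch("xaby")))        # False: the class matches only one character
```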
