
I can have two types of strings:

  • 234_are223422_nextdate_2210.txt
  • 234_are223422.txt

and I want a general expression that matches the string whether or not "_nextdate_2210" is present. The expression should depend on the keyword "_nextdate", because I do not know the numerical part in advance. I tried something like this:

  • re.search(r'[\w|\w]*?' + '(nextdate[\w|\w]?)' + '.txt', '234_are223422_nextdate_2210.txt')
  • re.search(r'[\w|\w]*?' + '?(nextdate[\w|\w])?' + '.txt', '234_are223422_nextdate_2210.txt')
  • re.search(r'[\w|\w]*?' + '?:(nextdate[\w|\w])' + '.txt', '234_are223422_nextdate_2210.txt')

I know this might be easy, but I could not manage to find the correct form.
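A quick way to see why none of the three attempts works is to run them. This is a sketch: the pattern pieces are written as raw strings here (the question's non-raw '\w' pieces produce the same pattern, but recent Python versions warn about the invalid escape), and the variable names are illustrative.

```python
import re

target = "234_are223422_nextdate_2210.txt"

# The three patterns from the question, concatenated the same way (as raw strings).
attempts = [
    r'[\w|\w]*?' + r'(nextdate[\w|\w]?)' + r'.txt',
    r'[\w|\w]*?' + r'?(nextdate[\w|\w])?' + r'.txt',
    r'[\w|\w]*?' + r'?:(nextdate[\w|\w])' + r'.txt',
]

results = []
for pattern in attempts:
    try:
        m = re.search(pattern, target)
        results.append(m.group() if m else None)
    except re.error as exc:
        # Patterns 2 and 3 contain "*??": a lazy "*?" followed by another "?".
        results.append(f"re.error: {exc.msg}")
    print(results[-1])
```

The first pattern finds nothing because after nextdate it allows at most one character, then any single character, then txt, while the text actually continues with _2210.txt. The other two are rejected by the regex compiler before any matching happens.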

  • Maybe review the Stack Overflow regex tag info page for guidance and answers to several beginner FAQs. It's unclear what you hope [\w|\w] will match, but it is exactly equivalent to [\w|], i.e. a character class that matches one character which is either a "word" character (\w) or the literal |. Commented Oct 14, 2023 at 17:36
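To illustrate the comment, a small check (the sample characters are chosen arbitrarily) shows that [\w|\w] and [\w|] accept exactly the same single characters, including the literal |, and that a character class never matches more than one character:

```python
import re

# Inside [...], "|" is a literal character, not alternation,
# and repeating \w adds nothing.
cls = re.compile(r"[\w|\w]")
same = re.compile(r"[\w|]")

samples = ["a", "7", "_", "|", ".", "-"]
for ch in samples:
    # Both classes agree on every sample character.
    assert bool(cls.fullmatch(ch)) == bool(same.fullmatch(ch))
    print(repr(ch), bool(cls.fullmatch(ch)))

# A character class matches exactly one character, so "ab" is not a full match.
print(cls.fullmatch("ab"))  # None
```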

2 Answers


IIUC, you can use an optional non-capturing group (?:...)? (note the ? at the end) (Regex101 demo):

import re

text = """\
234_are223422_nextdate_2210.txt
234_are223422.txt
this_should_not_be_matched.txt"""

for m in re.findall(r"\d+_are\d+(?:_nextdate_\d+)?\.txt", text):
    print(m)

Prints:

234_are223422_nextdate_2210.txt
234_are223422.txt
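If you check one filename at a time instead of scanning a block of text (an assumption about the use case), a variant is re.fullmatch with the same pattern. It requires the whole string to match, so no ^/$ anchors are needed:

```python
import re

# Same pattern as above, applied to each filename separately.
pattern = re.compile(r"\d+_are\d+(?:_nextdate_\d+)?\.txt")

filenames = [
    "234_are223422_nextdate_2210.txt",
    "234_are223422.txt",
    "this_should_not_be_matched.txt",
]

# fullmatch succeeds only if the entire filename fits the pattern.
matching = [name for name in filenames if pattern.fullmatch(name)]
print(matching)  # ['234_are223422_nextdate_2210.txt', '234_are223422.txt']
```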

An approach closer to the attempts in the OP's question is this one:

import re

string1 = "234_are223422.txt"
string2 = "234_are223422_nextdate_2210.txt"

regex = re.compile(r"\w+(?:nextdate\w+)?\.txt")

regex.search(string1).group()  # outputs "234_are223422.txt"
regex.search(string2).group()  # outputs "234_are223422_nextdate_2210.txt"
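  • \w+ is very broad: \w also matches _, so \w+ alone already consumes the whole name before .txt. Consider using \d+_are\d+_? or similar.
  • (?:...) is a non-capturing group, which is also suitable for re.findall: with a capturing group, findall would return only the group's text instead of the whole match.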
  • The last ? will make the whole non-capturing group optional
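The two notes above can be checked directly. This sketch compares the broad pattern with the tighter pattern from the other answer (the third filename is the other answer's negative example), and shows what re.findall returns with a capturing versus a non-capturing group:

```python
import re

names = [
    "234_are223422_nextdate_2210.txt",
    "234_are223422.txt",
    "this_should_not_be_matched.txt",
]

broad = re.compile(r"\w+(?:nextdate\w+)?\.txt")
tight = re.compile(r"\d+_are\d+(?:_nextdate_\d+)?\.txt")

# \w covers letters, digits and "_", so the broad pattern accepts every name.
broad_hits = [n for n in names if broad.fullmatch(n)]
tight_hits = [n for n in names if tight.fullmatch(n)]
print(broad_hits)
print(tight_hits)

text = "\n".join(names)
# Capturing group: findall returns the group text ('' when the group did not take part).
grouped = re.findall(r"\d+_are\d+(_nextdate_\d+)?\.txt", text)
# Non-capturing group: findall returns the whole match.
ungrouped = re.findall(r"\d+_are\d+(?:_nextdate_\d+)?\.txt", text)
print(grouped)    # ['_nextdate_2210', '']
print(ungrouped)  # ['234_are223422_nextdate_2210.txt', '234_are223422.txt']
```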

Some comments on your approaches:

  • You don't need to concatenate regex strings. The verbose flag (re.VERBOSE) is very helpful: it lets you spread the pattern over several lines and even comment each part.
  • With [\w|\w] you may have meant "\w or \W" (that is just a guess). If so, it is easier to write . instead, which matches any character on a single line (everything except a newline).
  • A non-greedy match *? is not necessary if the part with the nextdate is optional.
  • With ?:(...) you probably wanted a non-capturing group, but the ?: has to be the first element inside the parentheses, (?:...), as in my example above.
  • Making something optional is done with ? at the end of an expression. Applied to a group, (...)?, it makes the whole group optional; applied to a single character, a?, it makes only the a optional. A character class also stands for a single character, so [\w|\w]? matches at most one character (I am not sure what you intended to gain from this expression).
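As an illustration of the verbose flag and of the optional group, here is a sketch of the pattern from the other answer written with re.VERBOSE. The comments and the extra test filename 234_are223422_nextdate.txt are illustrative additions, not part of either answer:

```python
import re

# In VERBOSE mode, whitespace in the pattern is ignored and "#" starts a comment.
pattern = re.compile(
    r"""
    \d+ _are \d+            # fixed prefix, e.g. 234_are223422
    (?: _nextdate_ \d+ )?   # optional suffix; the trailing ? makes the whole group optional
    \.txt                   # literal extension (the dot is escaped)
    """,
    re.VERBOSE,
)

checks = {}
for name in ["234_are223422_nextdate_2210.txt", "234_are223422.txt", "234_are223422_nextdate.txt"]:
    checks[name] = bool(pattern.fullmatch(name))
    print(name, checks[name])
```

The last name fails because the group requires digits after _nextdate_, and once the group is skipped, _nextdate.txt does not match \.txt.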
