Hello, I'm trying to use a regex to search through a markdown file for a date, and only get a match if a specific string appears before the next date.

This is what I have right now, and it definitely doesn't work: (\d{2}\/\d{2}\/\d{2})(string)?(^(\d{2}\/\d{2}\/\d{2}))

So in this instance it should match, since the string comes before the next date:

01/20/20

string

01/21/20

Here it shouldn't match since the string is after the next date:

01/20/20

this isn't the phrase you're looking for

01/21/20

string

Any help on this would be greatly appreciated.

  • Do you mean like this? \d{2}\/\d{2}\/\d{2}(?:(?!\d{2}\/\d{2}\/\d{2}).)*string.*?\d{2}\/\d{2}\/\d{2} regex101.com/r/FREPRt/1 Commented Jan 12, 2020 at 14:58

2 Answers

You could match a date-like pattern first. Then use a tempered greedy token, (?:(?!\d{2}\/\d{2}\/\d{2}).)*, to reach string without crossing another date on the way.

Once the string has been matched, use a non-greedy dot, .*?, to match up to the first occurrence of the next date.

\d{2}\/\d{2}\/\d{2}(?:(?!\d{2}\/\d{2}\/\d{2}).)*string.*?\d{2}\/\d{2}\/\d{2}

For example, using re.DOTALL so that the dot also matches a newline:

import re

regex = r"\d{2}\/\d{2}\/\d{2}(?:(?!\d{2}\/\d{2}\/\d{2}).)*string(?:(?!string|\d{2}\/\d{2}\/\d{2}).)*\d{2}\/\d{2}\/\d{2}"

# Adjacent string literals inside parentheses are joined into one string.
test_str = ("01/20/20\n\n"
    "string\n\n"
    "01/21/20\n\n"
    "01/20/20\n\n"
    "this isn't the phrase you're looking for\n\n"
    "01/21/20\n\n"
    "string")

print(re.findall(regex, test_str, re.DOTALL))

Output

['01/20/20\n\nstring\n\n01/21/20']

If the string must not occur twice between the dates, you might use

\d{2}\/\d{2}\/\d{2}(?:(?!\d{2}\/\d{2}\/\d{2}|string).)*string(?:(?!string|\d{2}\/\d{2}\/\d{2}).)*\d{2}\/\d{2}\/\d{2}

Note that if you don't want the string or the dates to match as part of a larger word, you could add word boundaries (\b) around them.
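As a rough illustration of the word-boundary note, the sketch below compares the pattern with and without \b. The names DATE, loose and strict, and the sample text containing "substring", are made up for this example and are not part of the original answer.

```python
import re

# Date-like token from the answer (the slash does not need escaping in Python).
DATE = r"\d{2}/\d{2}/\d{2}"

# Without boundaries, "string" is also found inside "substring".
loose = rf"{DATE}(?:(?!{DATE}).)*string.*?{DATE}"

# With \b around the phrase and the dates, only the whole word qualifies.
strict = rf"\b{DATE}\b(?:(?!\b{DATE}\b).)*\bstring\b.*?\b{DATE}\b"

text = "01/20/20\n\nsubstring\n\n01/21/20"

loose_hits = re.findall(loose, text, re.DOTALL)
strict_hits = re.findall(strict, text, re.DOTALL)

print(loose_hits)   # one match spanning both dates
print(strict_hits)  # no match, because "string" is part of "substring"
```

Whether the boundaries are wanted depends on the real phrase: \b only behaves as expected when the phrase starts and ends with a word character.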

8 Comments

@TimBiegeleisen It is just the answer for my comment, which looks like your answer afterwards :) But you will get my vote anyway
@The Fourth Bird That does match correctly, but two things. First, how do I get it to return all instances of this match? Second, how do I get it to return only the initial date once it has confirmed that it is a match?
@CaptainPo-Po Fourth's answer already matches all instances of the match +1. For the second requirement, you only need to change the capture group in the call to re.findall.
@CaptainPo-Po You could indeed use a capturing group for the first date: regex101.com/r/khGQ8G/1. When the pattern contains a capturing group, re.findall returns only that group. If you want the full match as well as the groups, you could use re.finditer. This is an example of the code auto-generated by regex101: ideone.com/2M19fp
Okay, yeah, I see that now. I've been tinkering and I've run into an error when running this: TypeError: findall() missing 1 required positional argument: 'string'. I've replaced the phrase "string" with the actual string I need. Is one of those a command of some sort?
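For the "only the initial date" request in the comments, one possible sketch is to wrap the first date in a capturing group. The helper name build_pattern and the DATE constant are hypothetical, and re.escape is added on the assumption that the real phrase may contain regex metacharacters.

```python
import re

DATE = r"\d{2}/\d{2}/\d{2}"  # date shape taken from the question

def build_pattern(phrase):
    """Compile a pattern whose group 1 is a date followed by `phrase` before the next date."""
    return re.compile(
        rf"({DATE})"              # group 1: the initial date
        rf"(?:(?!{DATE}).)*?"     # text that does not start another date
        rf"{re.escape(phrase)}"   # the phrase, taken literally
        rf".*?{DATE}",            # lazily up to the next date
        re.DOTALL,
    )

sample = (
    "01/20/20\n\nstring\n\n01/21/20\n\n"
    "01/20/20\n\nthis isn't the phrase you're looking for\n\n01/21/20\n\nstring"
)

pattern = build_pattern("string")

# findall returns only group 1 because the pattern has exactly one capturing group.
first_dates = pattern.findall(sample)
print(first_dates)

# finditer gives access to both the full match and the group.
for m in pattern.finditer(sample):
    print(m.group(1), repr(m.group(0)))
```

As for the TypeError quoted above: that message is what Python raises when re.findall receives the pattern but not the text to search. Here string is the name of the function's second parameter, not the phrase inside the pattern, so a call of the form re.findall(pattern, text, re.DOTALL) avoids it.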

One approach here would be to use a tempered dot to ensure that the regex engine does not cross over the ending date while trying to find the string after the starting date. For example:

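import re

# The first "string" sits between the starting and ending dates, so it is matched.
# The last "string" comes after the second ending date, so it is not matched.
inp = """01/20/20

string

01/21/20

01/20/20

01/21/20

string"""

matches = re.findall(r'01/20/20(?:(?!\b01/21/20\b).)*?(\bstring\b).*?\b01/21/20\b', inp, flags=re.DOTALL)
print(matches)  # ['string']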

This prints ['string']: only the first occurrence is matched, because it is the only one that sits between a starting date and its ending date.
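The same "do not cross the next date" rule can also be checked without a tempered dot. The sketch below is an alternative that neither answer proposes: it locates every date with re.finditer and then looks for the phrase in the text between each date and the one after it. The names DATE_RE and dates_with_phrase are made up for illustration.

```python
import re

DATE_RE = re.compile(r"\d{2}/\d{2}/\d{2}")

def dates_with_phrase(text, phrase):
    """Return each date whose block, up to the next date, contains `phrase`."""
    hits = list(DATE_RE.finditer(text))
    found = []
    # Pair every date with the following one; the last date has no closing
    # date, so it is never reported (the regex answers behave the same way).
    for current, following in zip(hits, hits[1:]):
        block = text[current.end():following.start()]
        if phrase in block:
            found.append(current.group())
    return found

sample = (
    "01/20/20\n\nstring\n\n01/21/20\n\n"
    "01/20/20\n\nthis isn't the phrase you're looking for\n\n01/21/20\n\nstring"
)

result = dates_with_phrase(sample, "string")
print(result)
```

Because this version uses a plain substring test, the phrase needs no escaping, at the cost of losing word-boundary control.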
