
I have strings like these:

Beginning through June 18, 2022 at Noon standard time
Jan 20, 2022
Beginning through April 26, 2022 at 12:01 a.m. standard time

I want to use a Python regex to extract the date that appears after the word "through" and before the word "at", or the whole string when those words are absent. Expected output:

June 18, 2022
Jan 20, 2022
April 26, 2022

I can extract it from the longer strings using a regex group:

import re
s = "Beginning through June 18, 2022 at Noon standard time"
re.search(r'(.*through)(.*) (at.*)', s).group(2)  # ' June 18, 2022' (note the leading space)

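However, it does not work for a string that contains neither "through" nor "at", such as the one below, because re.search then returns None and calling .group() on None raises an AttributeError:

s = "June 18, 2022"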
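Running the original pattern against this string shows the failure. This is a minimal check using only the built-in re module; the variable name m is just for illustration.

```python
import re

s = "June 18, 2022"
# There is no "through" and no " at" in this string, so the pattern cannot match.
m = re.search(r'(.*through)(.*) (at.*)', s)
print(m)  # None, so m.group(2) would raise AttributeError
```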

Can anyone help me with that?

  • What language? Remove or extract? Commented Jun 17, 2022 at 10:38
  • @bobblebubble it is python Commented Jun 17, 2022 at 10:54
  • Edited your question title, please undo if my edit is inappropriate. Commented Jun 17, 2022 at 12:29

2 Answers


How about playing with optional groups and backtracking?

^(?:.*?through )?(.*?)(?: at.*)?$

See this demo at regex101 or a Python demo at tio.run

Note that if only one of the substrings is present, it captures either from the first one to the end of the string, or from the start of the string up to the latter. If neither is present, it captures the full string.
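A runnable sketch of this pattern with Python's built-in re module, applied to the three strings from the question. The helper name extract_date is hypothetical, not part of the answer. Because the pattern is anchored with ^ and $ and both outer groups are optional, re.match succeeds on any single-line string, so group(1) can be read without a None check.

```python
import re

# The answer's pattern: the "through " prefix and the " at..." suffix are both
# optional; the lazy (.*?) stops as soon as " at..." can match the rest.
PATTERN = re.compile(r'^(?:.*?through )?(.*?)(?: at.*)?$')


def extract_date(text):
    # Hypothetical helper. The ^...$ anchors plus the optional groups mean
    # any single-line string matches, so group(1) is always available.
    return PATTERN.match(text).group(1)


samples = [
    'Beginning through June 18, 2022 at Noon standard time',
    'Jan 20, 2022',
    'Beginning through April 26, 2022 at 12:01 a.m. standard time',
]

for s in samples:
    print(extract_date(s))
# June 18, 2022
# Jan 20, 2022
# April 26, 2022
```

With only one marker present it still behaves as described: extract_date('June 18, 2022 at Noon') and extract_date('Beginning through June 18, 2022') both return 'June 18, 2022'.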


Another idea is to use the PyPI regex module, which supports branch reset groups.

^(?|.*?through (.+?) at|(.+))

This one extracts the part in between if both words are present, and the full string otherwise. As far as I know, the regex module is largely compatible with Python's built-in re module, so you can just use import regex as re instead.

Demo at regex101 or Python demo at tio.run
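If installing regex is not an option, a rough stdlib-only approximation (a swapped-in technique, not the answer's branch reset pattern) keeps the same alternation but lets each branch keep its own group number, then picks whichever group participated in the match. The helper name extract_between is hypothetical.

```python
import re

# Stdlib stand-in for ^(?|.*?through (.+?) at|(.+)): without (?|...),
# the first branch captures into group 1 and the second into group 2.
PATTERN = re.compile(r'^(?:.*?through (.+?) at|(.+))')


def extract_between(text):
    # Hypothetical helper.
    m = PATTERN.match(text)
    if m is None:  # e.g. an empty string, since (.+) needs at least one character
        return None
    # Exactly one of the two groups participated; the other is None.
    return m.group(1) or m.group(2)


for s in ['Beginning through June 18, 2022 at Noon standard time',
          'Jan 20, 2022',
          'Beginning through April 26, 2022 at 12:01 a.m. standard time']:
    print(extract_between(s))
# June 18, 2022
# Jan 20, 2022
# April 26, 2022
```

As with the branch reset version, a string containing "through" but no " at" falls through to the second branch and comes back whole.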


You may use this regex with a capture group:

(?:.* through |^)(.+?)(?: at |$)

RegEx Demo

RegEx Details:

  • (?:.* through |^): Match anything followed by " through ", or the start position
  • (.+?): Lazily match 1 or more of any character and capture it in group #1
  • (?: at |$): Match " at " or end of string

Code:

import re

arr = ['Beginning through June 18, 2022 at Noon standard time',
       'Jan 20, 2022',
       'Beginning through April 26, 2022 at 12:01 a.m. standard time']

for s in arr:
    print(re.findall(r'(?:.* through |^)(.+?)(?: at |$)', s))

Output:

['June 18, 2022']
['Jan 20, 2022']
['April 26, 2022']
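If you want a plain string per input instead of a one-element list, a small variation (not part of the answer) runs the same pattern through re.search and reads group 1. The helper name first_capture is hypothetical.

```python
import re

PATTERN = re.compile(r'(?:.* through |^)(.+?)(?: at |$)')


def first_capture(text):
    # Hypothetical helper. re.search returns only the first match, and
    # group(1) is a str, unlike re.findall, which returns a list of captures.
    m = PATTERN.search(text)
    return m.group(1) if m else None


for s in ['Beginning through June 18, 2022 at Noon standard time',
          'Jan 20, 2022',
          'Beginning through April 26, 2022 at 12:01 a.m. standard time']:
    print(first_capture(s))
# June 18, 2022
# Jan 20, 2022
# April 26, 2022
```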
