Using a list as an argument for a regular expression in Python

Question

I am building regular expressions to find dates in my text. I have created lists for the month names, day names, and special characters that can be part of a date.

dict_month_name =['january','february','march','april','may','june','july','august','september','october','november','december']

dict_day =['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']

dict_special_char = ['-', '/', '.', ',' ,'',' ']

I have also compiled them as shown below.

month_name = re.compile('|'.join(dict_month_name))

day = re.compile('|'.join(dict_day))

special_char = re.compile('|'.join(dict_special_char))

Now, in my regular expression shown below, I want to use different variations of the lists I created earlier. For example, to search for dates like "Monday, January 2017", the regex would be:

regexp1 = re.findall('.*?^(day+,\s,month_name+\s[0-9][0-9][0-9][0-9])$.*', text)

However, the regex does not return any output. I need to solve this using regex and not the datetime module. Is there a way I can include my lists inside the regular expression as shown above?

regexp1 is not using any of the precompiled regexes, and is literally searching for 'day' and 'month_name' in text.
– DeepSpace, Commented Mar 1, 2018 at 13:32
I don't think there's a way to directly combine compiled regexes. Closest I could find is this.
– glibdud, Commented Mar 1, 2018 at 13:32
@DeepSpace Is there a way I can tell the re.findall function to read "day" and "month_name" as a list and not text to search for a pattern as you mentioned?
– user8929822, Commented Mar 1, 2018 at 13:41
Thanks for the advice. I have reviewed my questions and accepted answers as appropriate.
– user8929822, Commented Mar 6, 2018 at 18:57

Wiktor Stribiżew · Accepted Answer · 2018-03-01 21:56:24Z

You may combine the regex the following way:

import re
dict_month_name =['january','february','march','april','may','june','july','august','september','october','november','december']
dict_day =['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
dict_special_char = ['-', '/', '.', ',' ,'',' ']

s = 'For e.g. to search for dates like - Monday, January 2017 the regex would be'
rx = r"\b(?:{day})[{special}]\s+(?:{month_name})\s+[0-9]{{4}}\b".format(
    day="|".join(dict_day), 
    special="".join([re.escape(x) for x in dict_special_char]), 
    month_name="|".join(dict_month_name))

print(re.findall(rx, s, re.I)) # => ['Monday, January 2017']

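In this example, the regex will be as follows (this is the output on Python versions before 3.7; from 3.7 on, re.escape leaves / and , unescaped, which matches the same text)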

\b(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)[\-\/\.\,\ ]\s+(?:january|february|march|april|may|june|july|august|september|october|november|december)\s+[0-9]{4}\b

You can see that the lists are now part of a bigger pattern. re.I enables case-insensitive matching.
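If the lists have already been compiled, as in the question, one possible way to reuse them is through the `.pattern` attribute of a compiled regex, which holds the source string it was built from. The following is a minimal sketch under that assumption; the sample `text` is made up for illustration, and flags set on the sub-patterns are not carried over, so `re.I` is passed again to the combined pattern.

```python
import re

# Lists from the question.
dict_month_name = ['january', 'february', 'march', 'april', 'may', 'june',
                   'july', 'august', 'september', 'october', 'november', 'december']
dict_day = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Precompiled exactly as in the question.
month_name = re.compile('|'.join(dict_month_name))
day = re.compile('|'.join(dict_day))

# .pattern gives back the source string, which can be embedded in a larger pattern.
# The alternations are wrapped in non-capturing groups so that | stays local.
rx = re.compile(
    r"\b(?:{day}),\s+(?:{month})\s+[0-9]{{4}}\b".format(
        day=day.pattern, month=month_name.pattern),
    re.I,
)

# Hypothetical sample text, not from the original post.
text = 'Dates like Monday, January 2017 or friday, march 2018 should match.'
matches = rx.findall(text)
print(matches)  # ['Monday, January 2017', 'friday, march 2018']
```

This only reuses the text of the compiled patterns; the compiled objects themselves cannot be nested inside another regex.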

Also note that special chars should be escaped with [re.escape(x) for x in dict_special_char] in order to get matched as literal chars.
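To illustrate why the escaping matters, here is a small sketch comparing the question's unescaped join with an escaped one. The variable names are illustrative only. Without escaping, the `.` entry is treated as "any character", so the separator pattern accepts far more than intended.

```python
import re

# Separator characters from the question (empty string left out for clarity).
separators = ['-', '/', '.', ',', ' ']

# Joined as in the question: '.' keeps its regex meaning of "any character".
unescaped = '|'.join(separators)
# Joined after escaping: every entry is matched literally.
escaped = '|'.join(re.escape(x) for x in separators)

letter_matches_unescaped = re.fullmatch(unescaped, 'x') is not None
letter_matches_escaped = re.fullmatch(escaped, 'x') is not None
dot_matches_escaped = re.fullmatch(escaped, '.') is not None

print(letter_matches_unescaped)  # True: 'x' is accepted as a "separator"
print(letter_matches_escaped)    # False
print(dot_matches_escaped)       # True
```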


2 Comments

user8929822 Over a year ago

Thanks. That seemed to work. One more thing though. I have many variations within my date data. Do I need to write a regex for each unique format or is there a more efficient way to solve this using a regex dictionary method?

Wiktor Stribiżew Over a year ago

@user8929822 I think you need to handle them with their respective patterns, but you may use | to add alternatives to one single regex.
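As a rough sketch of that suggestion, several format-specific patterns can be kept in a list and joined with | into one regex. The three formats and the sample text below are assumptions chosen for illustration, not the asker's actual data.

```python
import re

months = ['january', 'february', 'march', 'april', 'may', 'june',
          'july', 'august', 'september', 'october', 'november', 'december']
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

month_alt = "|".join(months)
day_alt = "|".join(days)

# One template per date format; doubled braces survive str.format as literal braces.
formats = [
    r"(?:{day}),\s+(?:{month})\s+[0-9]{{4}}",         # Monday, January 2017
    r"(?:{month})\s+[0-9]{{1,2}},\s+[0-9]{{4}}",      # March 5, 2018
    r"[0-9]{{1,2}}[-/.][0-9]{{1,2}}[-/.][0-9]{{4}}",  # 12/03/2019
]

# Fill in each template, then join the results as alternatives of a single regex.
alternatives = "|".join(f.format(day=day_alt, month=month_alt) for f in formats)
rx = re.compile(r"\b(?:" + alternatives + r")\b", re.I)

# Hypothetical sample text.
text = "Meeting on Monday, January 2017, then March 5, 2018 and 12/03/2019."
found = rx.findall(text)
print(found)  # ['Monday, January 2017', 'March 5, 2018', '12/03/2019']
```

Each new format then becomes one more entry in the list rather than a separate findall call. Alternatives are tried left to right, so more specific formats are best listed first.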
