validate multiple rows with multiple regex from two pandas df

Question

I'm trying to validate, if zip code for many different countries which stored in a table are in a correct format, as an example:

ZIP	COUNTRY_CODE
1033 SC	NL
60593	DE

To do that I have a separate DF with country code and regular expressions as a pattern for every zip code.

REGEX	COUNTRY
\d{4}[ ]?[A-Z]{2}	NL
\d{5}	DE

I'm trying to merge this to tables based on a coutry code and then create which indicates as True or False if the zip code based on regex is correct.

Here is my currecnt code:

df_merged = pd.merge(regex_df, zip_df,  left_on = 'CODE',   right_on= 'COUNTRY_CODE')
df_merged['zip_correct'] = df_mergedf.CODE_y.str.contains(df_merged.REGEX.str, regex= True, na=False)

Hovewer I'm etting only false results since pandas is cheking the regex pattern in every row. How could I limit it to check it row by row?

Expected output:

ZIP	COUNTRY_CODE	ZIP_CORRECT
1033 SC	NL	TRUE
60593	DE	TRUE
6059TT	DE	FALSE

Could you please help?

mozway · Accepted Answer · 2022-01-12 11:43:50Z

2

I assume you might have many countries. You can apply re.match with the matching country regex on each row:

import re

s = regex_df.set_index('COUNTRY')['REGEX']

df['ZIP_CORRECT'] = df.apply(lambda x: bool(re.match('%s$' % s[x['COUNTRY_CODE']], x['ZIP'])), axis=1)

output:

       ZIP COUNTRY_CODE  ZIP_CORRECT
0  1033 SC           NL         True
1    60593           DE         True
2   6059TT           DE        False

answered Jan 12, 2022 at 11:43

mozway

267k13 gold badges56 silver badges106 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Wiktor Stribiżew · Accepted Answer · 2022-01-12 12:00:17Z

1

You can use

import pandas as pd
import re

zip_df = pd.DataFrame({"ZIP":["1033 SC", "60593"], "COUNTRY_CODE": ["NL","DE"]})
regex_df = pd.DataFrame({"REGEX":[r"\d{4}[ ]?[A-Z]{2}", "\d{5}"], "COUNTRY": ["NL","DE"]})

d = dict(regex_df[['COUNTRY', 'REGEX']].to_numpy())
zip_df['ZIP_CORRECT'] = zip_df.apply(lambda x: bool(re.fullmatch(d[x['COUNTRY_CODE']], x['ZIP'])), axis=1)

The d dictionary will look like {'NL': '\\d{4}[ ]?[A-Z]{2}', 'DE': '\\d{5}'}, and within the .apply(), each ZIP value from the zip_df dataframe is checked against the appropriate regex pattern from the dictionary as it is obtained form the d dictionary using the zip_df COUNTRY_CODE value as a key.

Note that re.fullmatch only returns True if the regex pattern fully matches the string, i.e. matches the entire string.

Test in Pandas (copy/pasted):

>>> import pandas as pd
>>> import re
>>> zip_df = pd.DataFrame({"ZIP":["1033 SC", "60593", "6059TT"], "COUNTRY_CODE": ["NL","DE","DE"]})
>>> regex_df = pd.DataFrame({"REGEX":[r"\d{4}[ ]?[A-Z]{2}", "\d{5}"], "COUNTRY": ["NL","DE"]})
>>> d = dict(regex_df[['COUNTRY', 'REGEX']].to_numpy())
>>> zip_df['ZIP_CORRECT'] = zip_df.apply(lambda x: bool(re.fullmatch(d[x['COUNTRY_CODE']], x['ZIP'])), axis=1)
>>> zip_df
       ZIP COUNTRY_CODE  ZIP_CORRECT
0  1033 SC           NL         True
1    60593           DE         True
2   6059TT           DE        False
>>>

edited Jan 12, 2022 at 12:00

answered Jan 12, 2022 at 11:46

Wiktor Stribiżew

631k41 gold badges502 silver badges633 bronze badges

8 Comments

mozway Over a year ago

nice, I forgot about re.fullmatch ;)

Wiktor Stribiżew Over a year ago

@mozway Yeah, it is convenient here. Something is happening to SO editing feature, I have to confirm every edit today :(

mozway Over a year ago

me too, and then it shows a blank post until I refresh, hope it will be fixed soon ;)

hpchavaz Over a year ago

@WiktorStribiżew Q2 : why going through a dict? Would it not be simpler to regex_df.set_index('COUNTRY') then in the lambda get the regex via regex_df['REGEX'][x['COUNTRY_CODE']]

Wiktor Stribiżew Over a year ago

@hpchavaz Because it works. Why not?

|

Collectives™ on Stack Overflow

validate multiple rows with multiple regex from two pandas df

2 Answers 2

Comments

8 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

8 Comments

Your Answer

Sign up or log in

Post as a guest

Related