Pattern Matching Error in Python Using Pandas.series.str.contains for String Replacement

Question

I am attempting to replace a string within a pandas DataFrame with a string pulled from a dictionary whose keys contain multiple sets of parentheses. When running the script, I get an error about match groups, and the string is not replaced. I'm fairly confident that the parentheses cause this error.

To resolve it, I have been attempting to use regular expression pattern matching with the str.contains() method. I have reviewed other solutions on Stack Overflow, but haven't been able to resolve my error.

Here is the script I am using for testing purposes. It's important that the parentheses are maintained in the strings (i.e. I don't want to have to remove them):

import pandas as pd
import numpy as np

dict= {'2017() (pat)':'2000',
       '2018() (pat)':'2001'}

df = pd.DataFrame({'YEAR': ['test2017end','test2018end','test2019end'],
                   'MONTH': ['Jan','Feb','Mar'],
                   'DD': ['1','12','22']})

for init, repl in dict.items():
    df.loc[df['YEAR'].str.contains(init),'YEAR'] = repl

print(df)

Can someone please provide guidance on using pattern matching so that the strings are properly replaced?

Thanks!

Don't name dictionaries dict

– user3483203, Commented Aug 11, 2018 at 5:23

jezrael · Accepted Answer · 2018-08-11 05:28:54Z

1

Don't use dict as a variable name, because it shadows the Python built-in dict type.

One solution is to extract the first integer from each dictionary key:

import re
import pandas as pd

d= {'2017() (pat)':'2000',
       '2018() (pat)':'2001'}

df = pd.DataFrame({'YEAR': ['test2017end','test2018end','test2019end'],
                   'MONTH': ['Jan','Feb','Mar'],
                   'DD': ['1','12','22']})

for init, repl in d.items():
    i = re.findall(r'\d+', init)[0]  # first run of digits in the key, e.g. '2017'
    df.loc[df['YEAR'].str.contains(i),'YEAR'] = repl

print(df)
          YEAR MONTH  DD
0         2000   Jan   1
1         2001   Feb  12
2  test2019end   Mar  22
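
For context on why the original pattern never matched, here is a minimal sketch (the sample strings are made up for illustration). str.contains treats its argument as a regular expression by default, so the parentheses in the key are read as groups rather than literal characters:

```python
import re
import warnings
import pandas as pd

key = '2017() (pat)'

# Compiled as a regex, the key means: "2017", an empty group, a space, then the group "pat".
compiled = re.compile(key)
n_groups = compiled.groups                                # capture groups in the pattern
matches_plain = bool(compiled.search('2017 pat'))         # text without parentheses
matches_literal = bool(compiled.search('2017() (pat)'))   # text with the parentheses

s = pd.Series(['test2017end', 'test2018end'])
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the warning about match groups
    hits = s.str.contains(key).tolist()

print(n_groups, matches_plain, matches_literal, hits)
```

So the pattern would match '2017 pat' but not the key itself, and it never matches 'test2017end', which is why nothing is replaced.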



Prayson W. Daniel · Accepted Answer · 2018-08-11 07:55:18Z

0

Have you tried methods that don't involve looping? Something in this direction:

import re
import pandas as pd

dict_= {'2017() (pat)':'2000',
       '2018() (pat)':'2001'}

df = pd.DataFrame({'YEAR': ['test2017end','test2018end','test2019end'],
                   'MONTH': ['Jan','Feb','Mar'],
                   'DD': ['1','12','22']})

pat = r'(\d{4})'

# Re-key the dictionary by the four-digit year found in each original key
dict_b = {re.search(pat, key).group(1): item for key, item in dict_.items()}

# Rows whose year has no entry in dict_b become NaN
df['YEARX'] = df['YEAR'].str.extract(pat, expand=False).map(dict_b)

# Rows whose year has no entry in dict_b keep the extracted year
df['YEARY'] = df['YEAR'].str.extract(pat,
                  expand=False).apply(lambda x: dict_b.get(x, x))
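
If the goal is to overwrite YEAR in place and leave unmatched rows untouched (the same result as the loop version), one possible variation is to fill the NaN values from the original column. This is only a sketch; the names mapping and by_year are placeholders, not part of the code above:

```python
import re
import pandas as pd

mapping = {'2017() (pat)': '2000',
           '2018() (pat)': '2001'}

df = pd.DataFrame({'YEAR': ['test2017end', 'test2018end', 'test2019end']})

pat = r'(\d{4})'

# Re-key the mapping by the four-digit year inside each original key
by_year = {re.search(pat, key).group(1): value for key, value in mapping.items()}

# Map the extracted year; where there is no entry, fall back to the original string
df['YEAR'] = df['YEAR'].str.extract(pat, expand=False).map(by_year).fillna(df['YEAR'])

print(df)
```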



Diedrich · Accepted Answer · 2018-08-11 13:55:14Z

Thank you for the quick responses. My code was a little more complicated than what I posted, and I was actually matching characters rather than numbers. I modified jezrael's response for this and the script now works correctly. Here is the test script I used:

import pandas as pd
import numpy as np
import re

dct= {'love (one)()':'john',
       'smith (two)()':'doe',
       'ken (three)()':'yearns'}

df = pd.DataFrame({'MAN': ['test|smith (two)()end','test|love (one)()end','test|ken (three)()end'],
                   'MONTH': ['Jan','Feb','Mar'],
                   'DD': ['1','12','22']})

for init, repl in dct.items():
    i = re.findall(r'\w+', init)[0]
    df.loc[df['MAN'].str.contains(i),'MAN'] = repl

print(df)
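
Since the keys in this test data appear verbatim inside the MAN strings, another option might be to skip the regex step and match each key literally with regex=False. This is a sketch under that assumption; it would not help with the original YEAR example, where the keys do not occur in the column as written:

```python
import pandas as pd

dct = {'love (one)()': 'john',
       'smith (two)()': 'doe',
       'ken (three)()': 'yearns'}

df = pd.DataFrame({'MAN': ['test|smith (two)()end', 'test|love (one)()end', 'test|ken (three)()end'],
                   'MONTH': ['Jan', 'Feb', 'Mar'],
                   'DD': ['1', '12', '22']})

for init, repl in dct.items():
    # regex=False treats the key as plain text, so the parentheses are matched as-is
    df.loc[df['MAN'].str.contains(init, regex=False), 'MAN'] = repl

print(df)
```

If a regular expression is still needed for other reasons, passing re.escape(init) as the pattern should have the same effect of treating the parentheses as literal characters.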

For beginners like me, the Regular Expression HOWTO in the Python documentation is a must-read (https://docs.python.org/3/howto/regex.html#regex-howto).

Cheers
