I have a dataframe and a dictionary as follows (but much bigger),

import pandas as pd
df = pd.DataFrame({'text': ['can you open the door?','shall you write the address?']})

dic = {'Should': ['can','could'], 'Could': ['shall'], 'Would': ['will']}

I would like to replace words in the text column when they appear in one of the value lists in dic, using the corresponding key as the replacement. I tried the following; it works for the lists that contain a single value but not for the list with several values:

for key, val in dic.items():
    if df['text'].str.lower().str.split().map(lambda x: x[0]).str.contains('|'.join(val)).any():
       df['text'] = df['text'].str.replace('|'.join(val), key, regex=False)
print(df)

My desired output would be:

              text
0   Should you open the door?
1  Could you write the address?

2 Answers

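Your attempt fails for the multi-word lists because regex=False makes str.replace search for the literal text 'can|could', which never occurs in the data. It is simpler to change the logic and minimize the number of pandas steps.

You can build an inverted dictionary that maps each word directly to its replacement: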

dic2 = {v:k for k,l in dic.items() for v in l}
# {'can': 'Should', 'could': 'Should', 'shall': 'Could', 'will': 'Would'}

# or if not yet formatted:
# dic2 = {v.lower():k.capitalize() for k,l in dic.items() for v in l}

import re
regex = '|'.join(map(re.escape, dic2))

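# raw f-string, so that \b is a regex word boundary and not a backspace character
df['text'] = df['text'].str.replace(rf'\b({regex})\b',
                                    # lowercase the match so 'Can' also finds the key 'can'
                                    lambda m: dic2[m.group().lower()],
                                    case=False, # only if case doesn't matter
                                    regex=True)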

output (as text2 column for clarity):

                           text                         text2
0        can you open the door?     Should you open the door?
1  shall you write the address?  Could you write the address?
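
The same technique can be tried on plain strings with only the standard library, which makes the two details that matter easy to check: the pattern should be a raw string so that `\b` is read as a word boundary, and the lookup should lowercase the match when matching is case-insensitive. This is a minimal sketch rather than the exact pandas call; `replace_words` is a hypothetical helper name, and it assumes the words to find are stored in lowercase.

```python
import re

dic = {'Should': ['can', 'could'], 'Could': ['shall'], 'Would': ['will']}

# Invert the mapping: each word to find points to its replacement.
dic2 = {word.lower(): key for key, words in dic.items() for word in words}

# Escape each word, then join them into one alternation.
alternation = '|'.join(map(re.escape, dic2))

# Raw f-string so that \b stays a regex word boundary (not a backspace).
pattern = re.compile(rf'\b({alternation})\b', re.IGNORECASE)


def replace_words(text):
    """Hypothetical helper: replace every whole-word match using dic2."""
    return pattern.sub(lambda m: dic2[m.group().lower()], text)


print(replace_words('can you open the door?'))
print(replace_words('Shall you scan the canvas?'))
```

Because of the word boundaries, "scan" and "canvas" are left alone even though both contain "can".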

You can flatten the dictionary into d with lowercase keys and values, replace the matches using word boundaries, and finally apply Series.str.capitalize:

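import re

# flatten: each lowercase word maps to its lowercase replacement
d = {x.lower(): k.lower() for k, v in dic.items() for x in v}

# escape each word and wrap it in word boundaries
regex = '|'.join(r"\b{}\b".format(re.escape(x)) for x in d)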
df['text'] = (df['text'].str.lower()
                        .str.replace(regex, lambda x: d[x.group()], regex=True)
                        .str.capitalize())
print(df)
                           text
0     Should you open the door?
1  Could you write the address?
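
Lowercasing and then capitalizing the whole string also changes the casing of every other word, so a name such as "John" would come out as "john". If only the first word of each sentence should be swapped, which the first-word check in the question hints at, one possible variant anchors the pattern at the start of the string and leaves the rest untouched. This is a minimal sketch using only the standard library; `replace_leading_word` is a hypothetical name, and treating "first word only" as the requirement is an assumption.

```python
import re

dic = {'Should': ['can', 'could'], 'Could': ['shall'], 'Would': ['will']}

# Lowercased word -> replacement key (the keys keep their own capitalization).
d = {word.lower(): key for key, words in dic.items() for word in words}

alternation = '|'.join(re.escape(word) for word in d)

# ^ anchors the match to the start of the string; \b ends it at a word boundary.
leading = re.compile(rf'^(?:{alternation})\b', re.IGNORECASE)


def replace_leading_word(text):
    """Hypothetical helper: swap only the first word, keep the rest as written."""
    return leading.sub(lambda m: d[m.group().lower()], text)


print(replace_leading_word('can you call John?'))
print(replace_leading_word('John can call you'))
```

With pandas, a function like this could be applied row by row with `df['text'].map(replace_leading_word)`.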
