Replace exact substrings in a column based on a table / list / dataframe in python

Question

I have a pandas dataframe df, where one column has a string in it:

columnA
'PSX - Judge A::PSK-Ama'
'VSC - Jep::VSC-Da'
'VSO - Jep::VSO-Da'
...

And I have another dataframe, where I have the new strings:

old new
PSX PCC
VSO VVV

My desired outcome would be:

columnA
'PCC - Judge A::PCC-Ama'
'VSC - Jep::VSC-Da'
'VVV - Jep::VVV-Da'
...

My idea would be to write:

df['columnA'] = df['columnA'].replace('PSX', 'PCC', regex=True)
df['columnA'] = df['columnA'].replace('VSO', 'VVV', regex=True)

For two replacements this is fine, but how can I do it for several replacements? Is there a smarter way to do it?

The dataframes can be created like this (thanks to Daniel):

import pandas as pd

df = pd.DataFrame(data=['PSX - Judge A::PSK-Ama',
                        'VSC - Jep::VSC-Da',
                        'VSO - Jep::VSO-Da'], columns=['columnA'])
replace = pd.DataFrame(data=[['PSX', 'PCC'],
                             ['VSO', 'VVV']], columns=['old', 'new'])

Why didn't you involve the "other dataframe"? What's the point of mentioning it?
– RomanPerekhrest, Commented Nov 18, 2019 at 14:23

Dani Mesejo · Accepted Answer · 2019-11-18 14:38:40Z

1

You could use the fact that the repl argument of str.replace can be a function:

import pandas as pd

df = pd.DataFrame(data=['PSX - Judge A::PSK-Ama',
                        'VSC - Jep::VSC-Da',
                        'VSO - Jep::VSO-Da'], columns=['columnA'])

replace = pd.DataFrame(data=[['PSX', 'PCC'],
                             ['VSO', 'VVV']], columns=['old', 'new'])

lookup = dict(zip(replace.old, replace.new))


def repl(w, lookup=lookup):
    return lookup.get(w.group(), w.group())


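# regex=True is required for a callable replacement (the default is False since pandas 2.0)
df['columnA'] = df['columnA'].str.replace(r'\w+', repl, regex=True)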

print(df)

Output

                  columnA
0  PCC - Judge A::PSK-Ama
1       VSC - Jep::VSC-Da
2       VVV - Jep::VVV-Da

The idea is to match every word in columnA and, if the word is a key in lookup, replace it with the mapped value; all other words are left unchanged. This is inspired by a linked answer, in which benchmarking shows this approach to be the more competitive one.
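As a variation on the same idea, the pattern can be built from the lookup keys themselves, so that the function is only called for actual matches. This is a minimal sketch, not part of the original answer: the names keys, pattern and result are made up for illustration, and the \b word boundaries assume every key starts and ends with a word character.

```python
import re

import pandas as pd

df = pd.DataFrame({'columnA': ['PSX - Judge A::PSK-Ama',
                               'VSC - Jep::VSC-Da',
                               'VSO - Jep::VSO-Da']})
replace = pd.DataFrame({'old': ['PSX', 'VSO'], 'new': ['PCC', 'VVV']})

lookup = dict(zip(replace['old'], replace['new']))

# Longest keys first, so a longer key wins over a key that is its prefix.
keys = sorted(lookup, key=len, reverse=True)

# re.escape protects keys that contain regex metacharacters;
# \b restricts matches to whole words (assumes keys are made of word characters).
pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b'

# Every match is one of the keys, so a plain dict lookup is enough.
result = df['columnA'].str.replace(pattern, lambda m: lookup[m.group()], regex=True)

print(result.tolist())
# ['PCC - Judge A::PSK-Ama', 'VSC - Jep::VSC-Da', 'VVV - Jep::VVV-Da']
```

Whether this is faster than matching every word probably depends on the number of keys and the length of the strings, so it is worth timing on your own data.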


1 Comment

PV8 Over a year ago

This takes 1.26359 seconds in my case (only measured once)

rbcvl · Accepted Answer · 2019-11-18 14:26:23Z

1

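# iterrows() yields (index, row) pairs, so iterate over the two columns directly
for old, new in zip(df_map['old'], df_map['new']):
    df['columnA'] = df['columnA'].replace(old, new, regex=True)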

Where df_map is your mapping DataFrame.
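The loop can probably be avoided altogether, because Series.replace also accepts a dictionary. The following is a sketch under the assumption that df_map has the columns old and new from the question; mapping and replaced are illustrative names. With regex=True the dictionary keys are treated as regular expressions, so keys containing special characters would need re.escape first.

```python
import pandas as pd

df = pd.DataFrame({'columnA': ['PSX - Judge A::PSK-Ama',
                               'VSC - Jep::VSC-Da',
                               'VSO - Jep::VSO-Da']})
# Assumed layout of the mapping DataFrame (taken from the question).
df_map = pd.DataFrame({'old': ['PSX', 'VSO'], 'new': ['PCC', 'VVV']})

mapping = dict(zip(df_map['old'], df_map['new']))

# regex=True makes the keys match as substrings instead of whole cell values.
replaced = df['columnA'].replace(mapping, regex=True)

print(replaced.tolist())
```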


1 Comment

PV8 Over a year ago

This takes 9.5315 seconds in my case (only measured once)

Erfan · Accepted Answer · 2019-11-18 14:33:35Z

1

You can build a "replacement dictionary" from your second dataframe, then iterate over its key/value pairs and call str.replace once per pair. This solution should be quite fast:

replacements = dict(zip(df2['old'], df2['new']))

for k, v in replacements.items():
    # regex=False treats k as a literal substring in every pandas version
    df['columnA'] = df['columnA'].str.replace(k, v, regex=False)

                  columnA
0  PCC - Judge A::PSK-Ama
1       VSC - Jep::VSC-Da
2       VVV - Jep::VVV-Da
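One thing to keep in mind with a loop of successive replacements: each pass works on the result of the previous one, so the outcome can depend on the order of the pairs. The sketch below uses plain Python strings and a made-up mapping (not the one from the question) in which one new value is also an old key, and compares it with a single-pass re.sub.

```python
import re

# Hypothetical mapping: the new value 'VSO' is itself a key.
mapping = {'PSX': 'VSO', 'VSO': 'VVV'}
text = 'PSX - Judge A::VSO-Da'

# Sequential replacement: the second pass also rewrites the output of the first.
sequential = text
for old, new in mapping.items():
    sequential = sequential.replace(old, new)

# Single pass: every match is looked up once, against the original text.
pattern = '|'.join(re.escape(k) for k in mapping)
single_pass = re.sub(pattern, lambda m: mapping[m.group()], text)

print(sequential)   # VVV - Judge A::VVV-Da
print(single_pass)  # VSO - Judge A::VVV-Da
```

For the mapping in the question this makes no difference, since none of the new values appears among the old ones.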


1 Comment

PV8 Over a year ago

This takes 3.5880 seconds in my case (only measured once)
