0

I have a pandas dataframe df, where one column has a string in it:

columnA
'PSX - Judge A::PSK-Ama'
'VSC - Jep::VSC-Da'
'VSO - Jep::VSO-Da'
...

And I have another dataframe, where I have the new strings:

old new
PSX PCC
VSO VVV

My desired outcome would be:

columnA
'PCC - Judge A::PCC-Ama'
'VSC - Jep::VSC-Da'
'VVV - Jep::VVV-Da'
...

My idea would be to write:

df['columnA'] = df['columnA'].replace('PSX', 'PCC', regex=True)
df['columnA'] = df['columnA'].replace('VSO', 'VVV', regex=True)

For two replacements this is fine, but how do I do it for several replacements? Is there a smarter way?

Code to build the two dataframes (thanks to Daniel):

df = pd.DataFrame(data=['PSX - Judge A::PSK-Ama',
                        'VSC - Jep::VSC-Da',
                        'VSO - Jep::VSO-Da'], columns=['columnA'])
replace = pd.DataFrame(data=[['PSX', 'PCC'],
                             ['VSO', 'VVV']], columns=['old', 'new'])
5
  • How about a for loop? Commented Nov 18, 2019 at 14:22
  • Hm, is there any vectorized solution? Commented Nov 18, 2019 at 14:23
  • Is it always a single-word replacement? Commented Nov 18, 2019 at 14:23
  • Why didn't you use the "other dataframe" in your code? What is the point of mentioning it? Commented Nov 18, 2019 at 14:23
  • It is always this type of replacement, so yes, a single word. Commented Nov 18, 2019 at 14:24

3 Answers

1

You could use the fact that the replacement argument (repl) of str.replace can be a function that receives each regex match:

import pandas as pd

df = pd.DataFrame(data=['PSX - Judge A::PSK-Ama',
                        'VSC - Jep::VSC-Da',
                        'VSO - Jep::VSO-Da'], columns=['columnA'])

replace = pd.DataFrame(data=[['PSX', 'PCC'],
                             ['VSO', 'VVV']], columns=['old', 'new'])

lookup = dict(zip(replace.old, replace.new))


def repl(w, lookup=lookup):
    return lookup.get(w.group(), w.group())


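# regex=True is required for a callable replacement (the default is regex=False since pandas 2.0)
df['columnA'] = df['columnA'].str.replace(r'\w+', repl, regex=True)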

print(df)

Output

                  columnA
0  PCC - Judge A::PSK-Ama
1       VSC - Jep::VSC-Da
2       VVV - Jep::VVV-Da

The idea is to match every word in columnA and replace it if it appears in lookup; any other word is returned unchanged. This is inspired by this answer, in which benchmarking shows this to be the more competitive approach.
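
A variation on the same idea, as a minimal sketch rather than the method from the answer above: instead of matching every word with \w+, build one alternation pattern from the dictionary keys so the callback only fires on words that actually need replacing. The names pattern and replace_many are illustrative, and the use of re.escape and \b word boundaries is an assumption that keys should match literally and only as whole words.

```python
import re

import pandas as pd

df = pd.DataFrame({'columnA': ['PSX - Judge A::PSK-Ama',
                               'VSC - Jep::VSC-Da',
                               'VSO - Jep::VSO-Da']})
replace = pd.DataFrame({'old': ['PSX', 'VSO'], 'new': ['PCC', 'VVV']})

lookup = dict(zip(replace['old'], replace['new']))

# One pattern that matches any key as a whole word: \b(?:PSX|VSO)\b
# re.escape keeps keys with special characters literal (assumption).
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, lookup)) + r')\b')


def replace_many(text):
    # Every match is guaranteed to be a key of lookup.
    return pattern.sub(lambda m: lookup[m.group()], text)


result = df['columnA'].map(replace_many)
print(result.tolist())
# ['PCC - Judge A::PSK-Ama', 'VSC - Jep::VSC-Da', 'VVV - Jep::VVV-Da']
```

This sketch assumes columnA contains only strings; missing values would need to be handled before calling map.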


1 Comment

This takes 1.26359 seconds in my case (only measured once)
1
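# iterrows() yields (index, row) pairs, so iterate over the column values instead
for old, new in zip(df_map['old'], df_map['new']):
    df['columnA'] = df['columnA'].replace(old, new, regex=True)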

Where df_map is your mapping DataFrame.
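
The loop can also be avoided by passing the whole mapping at once. The following is a sketch, assuming the mapping DataFrame has the columns old and new as in the question: Series.replace accepts a dictionary, and with regex=True each key is treated as a regular expression that is replaced inside the strings. The name mapping is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'columnA': ['PSX - Judge A::PSK-Ama',
                               'VSC - Jep::VSC-Da',
                               'VSO - Jep::VSO-Da']})
df_map = pd.DataFrame({'old': ['PSX', 'VSO'], 'new': ['PCC', 'VVV']})

# Build {old: new} from the mapping DataFrame.
mapping = dict(zip(df_map['old'], df_map['new']))

# regex=True makes replace() substitute matches inside each string
# instead of requiring the whole cell to equal the key.
df['columnA'] = df['columnA'].replace(mapping, regex=True)

print(df)
```

Because the keys are interpreted as regular expressions, keys containing characters such as . or + would need escaping, for example with re.escape.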

1 Comment

This takes 9.5315 seconds in my case (only measured once)
1

You can build a "replacement dictionary" from your second dataframe (called df2 here), then iterate over its keys and values and apply str.replace for each pair. This solution should be quite fast:

replacements = dict(zip(df2['old'], df2['new']))

for k, v in replacements.items():
    df['columnA'] = df['columnA'].str.replace(k, v)

Output

                  columnA
0  PCC - Judge A::PSK-Ama
1       VSC - Jep::VSC-Da
2       VVV - Jep::VVV-Da
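
Since the timings reported in the comments were each measured only once, a small harness like the following could make the comparison more repeatable. It is a sketch under stated assumptions: the data size, the function names loop_str_replace and callback_replace, and the number of repeats are all hypothetical, and the results will depend on the real data and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: the three sample rows repeated many times.
base = pd.Series(['PSX - Judge A::PSK-Ama',
                  'VSC - Jep::VSC-Da',
                  'VSO - Jep::VSO-Da'] * 1_000)
lookup = {'PSX': 'PCC', 'VSO': 'VVV'}


def loop_str_replace(s):
    # One literal str.replace pass per dictionary entry.
    for old, new in lookup.items():
        s = s.str.replace(old, new, regex=False)
    return s


def callback_replace(s):
    # One regex pass; every word is looked up in the dictionary.
    return s.str.replace(r'\w+',
                         lambda m: lookup.get(m.group(), m.group()),
                         regex=True)


for fn in (loop_str_replace, callback_replace):
    # Take the best of several runs to reduce noise.
    seconds = min(timeit.repeat(lambda: fn(base), number=1, repeat=3))
    print(f'{fn.__name__}: {seconds:.4f} s')
```

Note that the two functions are not equivalent in general: the loop replaces substrings anywhere, while the callback only replaces whole words.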

1 Comment

This takes 3.5880 seconds in my case (only measured once)
