Pandas DataFrame - replace substring in column if a substring exists

Question

I am trying to update DataFrame column names if a substring exists within the column name, however I only need to update the substring, retaining all information either side of it.

Example:

import pandas as pd

lookup = [('abc_param_one_new', 'param_one'), 
          ('abc_param_two_new', 'param_two'), 
          ('abc_param_three_new', 'param_three')]

d = {'p1_abc_param_ONE_new1': [1,2],
     'p2_abc_param_one_new2': [45,3],
    'p3_abc_Param_One_new3': [76,5],
    'p4_abc_param_two_new1': [4321,6],
    'p5_abc_param_Two_new2': [3,2],
    'p6_abc_param_THREE_new1': [6,5]}

df = pd.DataFrame(d)

# make all lowercase 
df.columns = [x.lower() for x in df.columns]

I am trying to use the lookup var (this could be a different format though) to search and replace the substring within each column name.

e.g.

'p1_abc_param_ONE_new1' -> 'p1_param_one1'

I can do this primitively looping over the lookup and the checking for each column header, but can't manage to replace the substring.

Additionally it feels that the nested loop approach is not very efficient. Is there a more pythonic way to do this perhaps using built in pandas methods and/or list comprehensions?

Shubham Sharma · Accepted Answer · 2022-03-18 12:35:08Z

2

You can use series.replace with replacement dictionary

repl = {fr'(?i){k}': v for k, v in lookup}
df.columns = df.columns.to_series().replace(repl, regex=True)

   p1_param_one1  p2_param_one2  p3_param_one3  p4_param_two1  p5_param_two2  p6_param_three1
0              1             45             76           4321              3                6
1              2              3              5              6              2                5

answered Mar 18, 2022 at 12:35

Shubham Sharma

71.8k6 gold badges26 silver badges58 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

mozway Over a year ago

Note this is is much less efficient than using str.replace with a single regex as I suggest here. replace will test all the items in the dictionary for all the columns!

mozway · Accepted Answer · 2022-03-18 12:32:04Z

1

You can use a regex and str.replace:

dic = dict(lookup)
pat = '|'.join(dic)
df.columns = df.columns.str.replace(pat, lambda x: dic.get(x.group(0)), regex=True)

output (df.columns):

Index(['p1_param_one1', 'p2_param_one2', 'p3_param_one3', 'p4_param_two1',
       'p5_param_two2', 'p6_param_three1'],
      dtype='object')

answered Mar 18, 2022 at 12:32

mozway

267k13 gold badges56 silver badges106 bronze badges

Collectives™ on Stack Overflow

Pandas DataFrame - replace substring in column if a substring exists

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related